<?xml version="1.0" standalone="yes"?> <Paper uid="N03-4005"> <Title>A Spoken Dialogue Interface to a Geologist's Field Assistant</Title>
<Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Example Dialogue </SectionTitle>
<Paragraph position="0"> The language capabilities developed so far largely support direct commanding, with the user controlling task initiative.</Paragraph>
<Paragraph position="1"> A sample of user commands is given in Table 1. A system response is always given, but is usually omitted below for the sake of brevity. When given, the system response appears in italics.</Paragraph> </Section>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Architecture </SectionTitle>
<Paragraph position="0"> This spoken dialogue system shares a common architecture with several prior systems: CommandTalk (Stent et al., 1999), PSA (Rayner et al., 2000), WITAS (Lemon et al., 2001), and the Intelligent Procedure Assistant (Aist et al., 2002). The architecture has been well described in prior work. The critical feature of the architecture relevant to this work is the use of a grammar-based language model for speech recognition that is automatically derived from the same Unification Grammar that is used for parsing and interpretation.</Paragraph> </Section>
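[Editor's sketch] The derivation step can be made concrete with a toy example. Under our own simplifying assumption that every grammar feature ranges over a finite value set, a unification grammar can be expanded into the context-free form a grammar-based recognizer accepts, so one source grammar serves both recognition and parsing. The rules, feature names, and helpers below are illustrative only, not the actual toolchain used by these systems:

from itertools import product

# Toy unification-style rules: categories are (name, feats), where feats maps
# a feature to a constant ("sg") or a variable ("?n") that must unify across
# the rule; terminals are plain strings.
RULES = [
    (("S", {}), [("NP", {"num": "?n"}), ("VP", {"num": "?n"})]),
    (("NP", {"num": "sg"}), ["the rover"]),
    (("VP", {"num": "sg"}), ["is ready"]),
    (("NP", {"num": "pl"}), ["the samples"]),
    (("VP", {"num": "pl"}), ["are ready"]),
]
VALUES = {"?n": ("sg", "pl")}  # finite value set assumed for each variable

def ground(cat, binding):
    """Instantiate variables in a category and flatten it to a CFG symbol."""
    name, feats = cat
    vals = [binding.get(v, v) for v in feats.values()]
    return "_".join([name, *vals]) if vals else name

def compile_cfg(rules):
    """Expand feature variables over their value sets, yielding CFG rules."""
    cfg = set()
    for lhs, rhs in rules:
        cats = [lhs] + [c for c in rhs if isinstance(c, tuple)]
        variables = sorted({v for _, feats in cats
                            for v in feats.values() if v in VALUES})
        for combo in product(*(VALUES[v] for v in variables)):
            binding = dict(zip(variables, combo))
            new_rhs = tuple(c if isinstance(c, str) else ground(c, binding)
                            for c in rhs)
            cfg.add((ground(lhs, binding), new_rhs))
    return sorted(cfg)

for lhs, rhs in compile_cfg(RULES):
    print(lhs, "->", " ".join(rhs))

The expanded rules (e.g., S -> NP_sg VP_sg) are plain context-free productions of the kind a commercial recognizer grammar format can encode, while the original feature-bearing rules remain available for interpretation.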
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Data Collection </SectionTitle>
<Paragraph position="0"> The Mobile Agents project conducted two field tests in 2002: a one-week dress rehearsal at JSC in the Mars yard in May, and a two-week field test in the Arizona desert in September, split between two sites of geological interest, one near the Petrified Forest National Park and the other on the ejecta field at Meteor Crater. We collected approximately 5,000 recorded sound files from 8 subjects during the September tests, some from space-suit subjects and the rest in shirt-sleeve walk-throughs (still a high-wind condition). We transcribed 1,059 wave files.</Paragraph>
<Paragraph position="1"> All conditions were performed open-mic, and all sounds picked up by the microphone were recorded, so not all of these files contained within-domain utterances intended for the system. Of the transcribed sound files, 208 contained no speech (mostly wind noise) and 243 contained out-of-domain speech that was intended for other hearers. That left 608 within-domain utterances, which were split 80%/20% into training and test utterances.</Paragraph> </Section>
<Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Technical Challenges </SectionTitle>
<Paragraph position="0"> The Geologist's Field Assistant requires the ability to make voice notes that can be stored and transmitted.</Paragraph>
<Paragraph position="1"> We implemented this by adding a recording mode to the speech recognizer agent and temporarily increasing the speech end-pointing interval. This allows us to record multi-sentence voice notes without treating inter-sentence pauses as end-of-voice-note markers. Entering recording mode is triggered by specific speech acts, such as Take a voice note or Annotate sample bag one.</Paragraph>
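[Editor's sketch] A minimal illustration of such a recording mode, assuming a hypothetical recognizer-agent wrapper; the class, method names, and timeout values below are our own, not the system's actual API:

# Entering recording mode lengthens the end-pointing (end-of-speech silence)
# interval so pauses between sentences do not terminate the voice note.
NORMAL_ENDPOINT_MS = 750        # assumed default silence timeout
VOICE_NOTE_ENDPOINT_MS = 3000   # assumed longer timeout while recording

TRIGGER_PHRASES = ("take a voice note", "annotate sample bag")

class RecognizerAgent:
    """Hypothetical stand-in for the speech recognizer agent."""

    def __init__(self):
        self.end_point_ms = NORMAL_ENDPOINT_MS
        self.recording = False

    def handle_utterance(self, utterance: str) -> None:
        # Specific speech acts (e.g., "Take a voice note") trigger the mode.
        if any(utterance.lower().startswith(p) for p in TRIGGER_PHRASES):
            self.enter_recording_mode()

    def enter_recording_mode(self) -> None:
        self.recording = True
        self.end_point_ms = VOICE_NOTE_ENDPOINT_MS

    def exit_recording_mode(self) -> None:
        # Restore normal end-pointing once the note is complete.
        self.recording = False
        self.end_point_ms = NORMAL_ENDPOINT_MS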
<Paragraph position="2"> When considering recognition accuracy in the open-mic condition, we consider additional metrics beyond word-error rate (WER). Since the recognizer can fail to find a hypothesis for an utterance, we compute the false-rejection rate (FREJ) for within-domain utterances and an adjusted word-error rate (AWER) counting only the word errors on the non-rejected utterances. We also consider misrecognitions of out-of-domain utterances as within-domain, and compute the false-accept rate (FACC). Table 2 gives the performance results for the grammar-based language model that was used in the September test. This model gives reasonable performance on within-domain utterances, but falsely accepts 25.5% of out-of-domain utterances. After the September test, we used the training data we had collected to build a Probabilistic Context-Free Grammar using the compute-grammar-probs tool that comes with Nuance (Nuance, 2002). Using only 485 utterances of training data, there was improvement in both the AWER and FACC rates, resulting in a language model where both FREJ and FACC were under 10%. There was also a substantial improvement in recognition speed, as measured in multiples of CPU real-time.</Paragraph>
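[Editor's sketch] To make the three metrics concrete, the following sketch computes FREJ, AWER, and FACC from per-utterance recognition results; the data structure and field names are our own assumptions, not the evaluation code behind Table 2:

from dataclasses import dataclass

@dataclass
class Result:
    in_domain: bool   # was the utterance within-domain?
    rejected: bool    # did the recognizer fail to produce a hypothesis?
    word_errors: int  # substitutions + insertions + deletions (if accepted)
    ref_words: int    # reference word count (if accepted)

def metrics(results):
    in_dom = [r for r in results if r.in_domain]
    out_dom = [r for r in results if not r.in_domain]

    # FREJ: fraction of within-domain utterances the recognizer rejected.
    frej = sum(r.rejected for r in in_dom) / len(in_dom)

    # AWER: word errors counted only over non-rejected within-domain utterances.
    accepted = [r for r in in_dom if not r.rejected]
    awer = (sum(r.word_errors for r in accepted)
            / sum(r.ref_words for r in accepted))

    # FACC: fraction of out-of-domain utterances accepted as within-domain.
    facc = sum(not r.rejected for r in out_dom) / len(out_dom)
    return frej, awer, facc

# Example: one accepted in-domain, one rejected in-domain, one accepted
# out-of-domain utterance -> FREJ 0.5, AWER 0.25, FACC 1.0.
print(metrics([Result(True, False, 1, 4), Result(True, True, 0, 3),
               Result(False, False, 0, 2)]))
</Section> </Paper>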