<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0312">
  <Title>Building a Robust Dialogue System with Limited Data *</Title>
  <Section position="4" start_page="61" end_page="61" type="metho">
    <SectionTitle>
3 The One-Grammar Approach
</SectionTitle>
    <Paragraph position="0"> In a domain with limited data, the inability to collect a sufficient corpus for training a statistical language model can be a significant problem. For CommandTalk, we did not create a statistical language model. Instead, with information gathered from interviews of subject matter experts (SME's), we developed a handwritten grammar using Gemini (Dowding et al., 1993), a unification-based grammar formalism. We used this unification grammar for both natural language understanding and generation, and, using a grammar compiler we developed, compiled it into a context-free form suitable for the speech recognizer as well.</Paragraph>
    <Paragraph position="1"> The effe~s_ of this single-grammar approach on the robustness of the CommandTalk system were twofold. On the negative side, we presumably ended up with a recognition language model with less coverage than a statistical model would have had. Our attempts to deal with this are discussed in the next section. On the positive side, we eliminated the usual discrepancy in coverage between the recognizer and the natural language parser. This was advantageous, since no fragment-combining or other parsing robustness techniques were needed.</Paragraph>
    <Paragraph position="2"> Our approach had other advantages as well. Any changes we made to the understanding grammar were automatically reflected in the recognition and generation grammars, making additions and modifications efficient. Also, anecdotal evidence suggests that the language used by the system often influences the language used by speakers, so maintaining consistency between the input and output of the system is desirable.</Paragraph>
  </Section>
  <Section position="5" start_page="61" end_page="62" type="metho">
    <SectionTitle>
4 Utterance-Level Robustness
</SectionTitle>
    <Paragraph position="0"> It is difficult to write a grammar that is constrained enough to be useful without excluding some reasonable user utterances. To alleviate this problem, we modified the speech recognition grammar and natural language parser to allow certain &amp;quot;closeto-grammar&amp;quot; utterances. Utterances with inserted words, such as Center on Checkpoint 1 now or zoom way out (where Center on Checkpoint 1 and zoom out are grammatical) were permitted by allowing the recognizer to skip unknown words. We also allowed utterances with deleted words, as long as those words did not contribute to the semantics of the utterance as determined by the Gemini semantic rules constraining logical forms. For example, a user could say, Set speed, 40 kph rather than Set speed to 40 kph.</Paragraph>
    <Paragraph position="1"> The idea behind these modifications was to allow utterances with a slightly broader range of wordings than those in the grammar, but with essentially the same meanings: We began by testing the effects of these modifications on in-grammar utterances, to ensure that  they did not significantly decr egse recognition performance. We used a small test corpus of approximately 800 utterances read by SRI employees. We collected four measures of performance: * Recognition time, measured, in multiples of CPU real time (CPURT). A recognition time of lxCPURT means that on,our CPU (a Sun Ultra2), recognition took exactly as~ long as the duration of the utterance. : * Sentence reject rate (SRR).' The percentage of sentences that the recognizer rejects.</Paragraph>
    <Paragraph position="2"> * Adjusted word error rate (A:WER). The percentage of words in non:rejected sentences that are misrecognized.</Paragraph>
    <Paragraph position="3"> * Sentence error rate (SER). The percentage of sentences in which some sort of error occurred, either a complete rejection or misrecognized word.</Paragraph>
    <Paragraph position="4"> Several parameters affected the results, most notably the numerical penalties assigned for inserting or deleting words, and the pruning threshold of the recognizer. Raising the pruning threshold caused both reject and error rates to go down, but slowed recognition. Lowering the penalties caused rejection rates to go down, but word and Sentence error rates to go up, since some sentences which had been rejected were now recognized partially correctly, and some sentences which had been recognized correctly now included some errors. Lowering the penalties also led to slower recognition.</Paragraph>
    <Paragraph position="5"> Table 1 shows recognition results for the non-robust and robust versions 0f the recognition grammar on in-grammar utterances: Th e pruning threshold is the same for both versions and the insertion and deletion penalties are set to intermediate values. Recognition times for the robust grammar are about 60% slower than those of the control grammar, but still at acceptable levels. Reject and error rates are fairly close for the two grammars. Overall, adding robustness to the recognition grammar did not severely penalize in-grammar recognition performance. null We had very little out-of-grammar data for CommandTalk, and finding subjects in this highly specialized domain would have been difficult and expensive. To test our robustness techniques on out- null of-grammar utterances, we decided to port them to another domain with easily accessible users and data; namely, the ATIS air travel domain. We wrote a small grammar covering part of the ATIS data and ,compiled it into a recognition grammar using the same techniques as in CommandTalk. Unfortunately, we were unable to carry out any experiments, because the recognition grammar we derived yielded recognition times that were so slow as to be impractical. We discuss these results further in Section 6.</Paragraph>
  </Section>
  <Section position="6" start_page="62" end_page="62" type="metho">
    <SectionTitle>
5 Dialogue-Level Robustness
</SectionTitle>
    <Paragraph position="0"> To be considered robust at the dialogue level, a system must be able to deal with situations where an utterance is recognized and parsed, but cannot be interpreted withi~4he current system state or dialogue context. In addition~it must be easy for the user to correct faulty interpretations on the part of the system. Contextual interpretation problems may occur for a variety of reasons, including misrecognitions, incorrect reference resolution, and confusion or incompleteness on the part of the user.</Paragraph>
    <Paragraph position="1"> The CommandTalk dialogue manager maintains a Stack to ~keep 'track of the current discourse context and uses small finite-state machines to represent different~ types of subdialogues. Below we illustrate some types of subdialogues and other techniques which provide robustness at the dialogue level. Note that for each utterance, we write what the system recognizes, not what the user actually says.</Paragraph>
    <Section position="1" start_page="62" end_page="62" type="sub_section">
      <SectionTitle>
5.1 Correction Subdialogues
</SectionTitle>
      <Paragraph position="0"> Allowing the user to correct full or partial utterances can remedy interpretation problems caused by misrecognitions, incorrect reference resolution, or user error.</Paragraph>
      <Paragraph position="1"> In Example 1, the system responds to the user's first utterance by producing a rising tone, illustrated by the (r) symbol, to indicate successful interpretation and execution of the command, in this case creation of a CEV, a type of vehicle. (Unsuccessful interpretation is indicated by a falling tone, illustrated by the (r) symbol.) In utterances 3 through 6, a misrecognition causes the system to perform the wrong behavior. The user initiates a correction subdialogue, and the system goes on to correctly re-interpret the full utterance.</Paragraph>
    </Section>
    <Section position="2" start_page="62" end_page="62" type="sub_section">
      <SectionTitle>
5.2 Implicit Confirmation
</SectionTitle>
      <Paragraph position="0"> Use of implicit confirmation in combination with correction subdialogues makes it easy to correct faulty interpretations as soon as possible by alerting the user to possible sources of error.</Paragraph>
      <Paragraph position="1"> In utterances 7 and 8, the system must resolve the user's reference, &amp;quot;CEV&amp;quot;, to a particular unit. It therefore echoes the user's command using the CEV's unique call sign. This makes explicit the system's interpretation of the user's utterance, giving the user a chance to correct the system if necessary.</Paragraph>
      <Paragraph position="2"> Note that utterance 4 also contains an implicit confirmation, since the system has resolved the user's gesture to a set of coordinates.</Paragraph>
    </Section>
    <Section position="3" start_page="62" end_page="62" type="sub_section">
      <SectionTitle>
5.3 Clarification Subdialogues
</SectionTitle>
      <Paragraph position="0"> Clarification subdialogues are generally initiated by the system as a result of errors or incomplete commands on the part of the user.</Paragraph>
      <Paragraph position="1"> Example 3 illustrates three different types of problems that can be corrected by system questions.</Paragraph>
      <Paragraph position="2"> First, the user's reference to &amp;quot;CEV&amp;quot; in utterance 11 is ambiguous, so the system asks a question to determine which CEV the user is referring to. Next, the system asks the user to supply a missing piece of information that is required to carry out the command. Finally, when the user makes an error by referring to a point that doesn't exist, the system prompts for a correction.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML