<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1053"> <Title>Integrating multiple knowledge sources</Title> <Section position="4" start_page="415" end_page="415" type="metho"> <SectionTitle> 4 Verification of the Framework </SectionTitle> <Paragraph position="0"> To test this framework, data was examined from 31 TRAINS 93 dialogs (Heeman and Allen, 1995), a series of human-human problem-solving dialogs in a railway transportation domain. (Specifically, the dialogs were d92-1 through ...) There were 3441 utterances, 19,189 words, 259 examples of overlapping utterances, and 495 speech repairs.</Paragraph> <Paragraph position="1"> The framework presented above covered all the overlapping utterances and speech repairs with three exceptions. Ordering the words of two speakers strictly by word ending points neglects the fact that speakers may be slow to interrupt or may anticipate the original speaker and interrupt early. The latter was a problem in utterances 80 and 81 of dialog d92a-1.2, as shown below. The numbers in the last row represent the times of word endings; for example, so ends at 255.5 seconds into the dialog. Speaker s uttered the complement of u's sentence before u had spoken the verb. The overlapping speech was confusing enough to the speakers that they felt they needed to reiterate utterances 80 and 81 in the next utterances. The same is true of the other two such examples in the corpus. A more sophisticated model of interruption may not be necessary if speakers cannot follow completions that lag behind or precede the correct interruption area.</Paragraph> </Section>
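To make the ordering convention concrete, the following minimal Python sketch interleaves two speakers' word streams strictly by word ending points, as the framework above prescribes. The Word structure, its field names, and all sample timings except the 255.5-second ending of so are illustrative assumptions, not part of the corpus annotation.

```python
from dataclasses import dataclass

@dataclass
class Word:
    speaker: str     # "u" or "s"
    text: str
    end_time: float  # seconds into the dialog at which the word ends

def interleave_by_end_time(stream_a, stream_b):
    """Merge two speakers' word streams into one sequence ordered
    strictly by word ending points; sorted() is stable, so ties
    keep stream_a's words first."""
    return sorted(stream_a + stream_b, key=lambda w: w.end_time)

# Hypothetical fragment in the spirit of utterances 80-81 of d92a-1.2:
u_words = [Word("u", "so", 255.5), Word("u", "it", 256.2)]
s_words = [Word("s", "okay", 255.9)]
for w in interleave_by_end_time(u_words, s_words):
    print(f"{w.speaker}: {w.text} ({w.end_time}s)")
```

As the section notes, this strict end-time ordering is exactly what breaks down when a speaker interrupts early or late: the merged sequence places a completion where it ended acoustically, not where it fits syntactically.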
<Section position="5" start_page="415" end_page="417" type="metho"> <SectionTitle> 5 The Dialog Parser Implementation </SectionTitle> <Paragraph position="0"> In addition to manually checking the adequacy of the framework on the cited TRAINS data, we tested a parser implemented as discussed in section 3 on the same data. The parser was a modified version of the one in the TRIPS dialog system (Ferguson and Allen, 1998). Users of this system participate in a simulated evacuation scenario where people must be transported along various routes to safety. Interactions of users with TRIPS were not investigated in detail because they contain few speech repairs and virtually no interruptions. (The low speech recognition accuracy encourages users to produce short, carefully spoken utterances, leading to few speech repairs. Moreover, the system does not speak until the user releases the speech input button, and once it responds it will not stop talking even if the user interrupts the response. This virtually eliminates interruptions.) But the domains of TRIPS and TRAINS are similar enough to allow us to run TRAINS examples on the TRIPS parser.</Paragraph> <Paragraph position="1"> One problem, though, is the grammatical coverage of the language used in the TRAINS domain. TRIPS users keep their utterances fairly simple (partly because of speech recognition problems), while humans talking to each other in the TRAINS domain felt no such restrictions. Based on a 100-utterance test set drawn randomly from the TRAINS data, parsing accuracy is 62%. (... unique utterance interpretation. The parser was counted as being correct if one of the interpretations it returned was correct. The usual cause of failure was the parser finding no interpretation; only 3 failures were due to the parser returning only incorrect interpretations.) However, 37 of these utterances are one word long (okay, yeah, etc.) and 5 utterances were question answers (two hours, in Elmira); thus on interesting utterances, accuracy is 34.5% (20 of the remaining 58). Assuming perfect speech repair detection, only 125 of the 495 corrected speech repairs parsed. (In 19 cases, the parser returned interpretation(s), but they were incorrect and are not included in the above figure.)</Paragraph> <Paragraph position="2"> Of the 259 overlapping utterances, 153 were simple backchannels consisting only of editing terms (okay, yeah) spoken by a second speaker in the middle of the first speaker's utterance. If the parser's grammar handles the first speaker's utterance, these can be parsed, as the second speaker's interruption can be skipped (see the sketch at the end of this section). The experiments focused on the 106 overlapping utterances that were more complicated. In only 24 of these cases did the parser's grammar cover both of the overlapping utterances.</Paragraph> <Paragraph position="3"> One of these examples, utterances utt39 and 40 from d92a-3.2 (see below), involves three independently formed utterances that overlap. We have omitted the beginning of s's utterance ("so that would be five a.m.") for space reasons. Figure 2 shows the syntactic structure of s's utterance (a relative clause) under the words of the utterance; u's two utterances are shown above the words of figure 2. The purpose of this figure is to show how interpretations can be formed around interruptions by another speaker and how these interruptions themselves form interpretations. The specific syntactic structure of the utterances is not shown. Typically, triangles are used to represent a parse tree without showing its internal structure; here, polygonal structures must be used due to the interleaved nature of the utterances.

s: when it would get to bath
u: okay how about to dansville</Paragraph> <Paragraph position="4"> Figure 3 is an example of a collaboratively built utterance, utterances 132 and 133 from d92a-5.2, as shown below. u's interpretation of the utterance (shown below the words in figure 3) does not include s's contribution, because until utterance 134 (where u utters right) u has not accepted this continuation.

u: and then I go back to avon
s: via dansville</Paragraph> </Section>
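The backchannel-skipping test described above can be stated compactly. This is a sketch under assumptions: it reuses the hypothetical Word structure from the earlier sketch, and the set of editing terms contains just the two the text names.

```python
# Editing terms whose bare occurrence mid-utterance counts as a
# backchannel; the section names only okay and yeah.
EDITING_TERMS = {"okay", "yeah"}

def words_to_parse(merged_words, main_speaker):
    """If every word from the other speaker is a bare editing term,
    return the main speaker's words with the interruption skipped;
    otherwise return None, signalling one of the more complicated
    overlaps (106 such cases in the corpus)."""
    other = [w for w in merged_words if w.speaker != main_speaker]
    if other and all(w.text in EDITING_TERMS for w in other):
        return [w for w in merged_words if w.speaker == main_speaker]
    return None
```

On this test, 153 of the 259 overlapping utterances reduce to parsing the first speaker's words alone, provided the grammar covers them.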
<Section position="6" start_page="417" end_page="418" type="metho"> <SectionTitle> 6 Rescoring a Pre-parser Speech Repair Identifier </SectionTitle> <Paragraph position="0"> One of the advantages of providing speech repair information to the parser is that the parser can then use its knowledge of grammar and the syntactic structure of the input to correct speech repair identification errors. As a preliminary test of this assumption, we used an older version of Heeman's language model (the current version is described in (Heeman and Allen, 1997)) and connected it to the current dialog parser. Because the parser's grammar only covers 35% of input sentences, corrections were only made based on global grammaticality.</Paragraph> <Paragraph position="1"> The effectiveness of the language module without the parser on the testing corpus is shown in table 1. (Note that current versions of this language model perform significantly better.) The testing corpus consisted of TRAINS dialogs containing 541 repairs, 3797 utterances, and 20,069 words. (Specifically, the dialogs used were d92-1 through d92a-5.2; d93-10.1 through d93-10.4; and d93-11.1 through d93-14.2. The language model was never simultaneously trained and tested on the same data.) For each turn in the input, the language model output the n-best predictions it made (up to 100) regarding speech repairs, part-of-speech tags, and boundary tones.</Paragraph> <Paragraph position="2"> The parser starts by trying the language model's first choice. If this results in an interpretation covering the input, that choice is selected as the correct answer. Otherwise the process is repeated with the model's next choice. If all the choices are exhausted and no interpretation is found, then the first choice is selected as correct (see the sketch at the end of this section). This approach is similar to an experiment in (Bear et al., 1992), except that Bear et al. were more interested in reducing false alarms: if a sentence parsed without the repair, the repair was ruled a false alarm. Here the goal is to increase recall by trying lower-probability alternatives when no parse can be found.</Paragraph> <Paragraph position="3"> The results of such an approach on the test corpus are listed in table 2. Recall increases by 4.8% (13 cases out of 541 repairs), showing promise in the technique of rescoring the output of a pre-parser speech repair identifier. With a more comprehensive grammar, a strong disambiguation system, and the current version of Heeman's language model, the results should improve further. The drop in precision is a worthwhile tradeoff, as the parser is never forced to accept posited repairs but is merely given the option of pursuing alternatives that include them.</Paragraph> <Paragraph position="4"> Adding actual speech repair identification (rather than assuming perfect identification) gives us an idea of the performance improvement (in terms of parsing) that speech repair handling brings. Of the 284 repairs correctly guessed in the augmented model, 79 parsed. (In 11 cases, the parser returned interpretation(s), but they were incorrect and are not included in the above figure.) Out of 3797 utterances, this means that 2.1% of the time the parser would have failed without speech repair information. Although failures due to the grammar's coverage are much more frequent (38% of the time), as the parser is made more robust, these 79 successes due to speech repair identification will become more significant. Further evaluation is necessary to test this model with an actual speech recognizer rather than transcribed utterances.</Paragraph> </Section>
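The selection loop described above can be sketched as follows. The parser interface (a parse method returning the interpretations that cover the input, or an empty list) is an assumption made for illustration; the control flow itself follows the description in the text.

```python
def select_nbest_choice(nbest_choices, parser):
    """Walk the language model's n-best predictions for a turn (up to
    100, covering speech repairs, part-of-speech tags, and boundary
    tones) and return the first choice the parser can cover with an
    interpretation; if none parses, fall back to the first choice."""
    for choice in nbest_choices:
        interpretations = parser.parse(choice)  # assumed interface
        if interpretations:  # an interpretation covers the input
            return choice, interpretations
    return nbest_choices[0], []  # trust the model's top prediction anyway

```

Because the loop only ever explores additional repair hypotheses and never vetoes the model's top choice, it can raise recall while letting precision drop, the tradeoff reported in table 2.
</Paper>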