<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1005">
  <Title>Discourse Processing of Dialogues with Multiple Threads</Title>
  <Section position="8" start_page="35" end_page="37" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> The evaluation was conducted on a corpus of 8 previously unseen spontaneous English dialogues containing a total of 223 sentences. Because spoken language is imperfect to begin with, and because the parsing process is imperfect as well, the input to the discourse processor was far from ideal. We are encouraged by the promising results presented in Figure 6, which indicate both that it is possible to successfully process a good measure of spontaneous dialogues in a restricted domain with current technology (see footnote 5) and that our extension of TST yields an improvement in performance.</Paragraph>
    <Paragraph position="1"> The performance of the discourse processor was evaluated primarily on its ability to assign the correct speech act to each sentence. We do not claim that speech act recognition is the best way to evaluate the validity of a theory of discourse. However, because speech act recognition is the main aspect of the discourse processor that we have implemented, and because recognizing the discourse structure is part of the process of identifying the correct speech act, we believe it was the best way to evaluate the difference between the two focusing mechanisms in our implementation at this time. (Footnote 5: It should be noted that we do not claim to have solved the problem of discourse processing of spontaneous dialogues. Our approach is coarse-grained and leaves much room for future development in every respect.)</Paragraph>
    <Paragraph position="2"> Prior to the evaluation, the dialogues were analyzed by hand and sentences were assigned their correct speech act for comparison with those eventually selected by the discourse processor. Because the speech acts for the test dialogues were coded by one of the authors and we do not have reliability statistics for this encoding, we would draw the attention of the readers more to the difference in performance between the two focusing mechanisms than to the absolute performance in either case.</Paragraph>
    <Paragraph position="3"> For each sentence, if the correct speech act, or either of two equally preferred best speech acts, was recognized, it was counted as correct. If a weaker form of a correct speech act was recognized, it was counted as acceptable. See the previous section for more discussion about weaker forms of speech acts.</Paragraph>
    <Paragraph position="4"> Note that if a stronger form is recognized when only the weaker one is correct, it is counted as wrong.</Paragraph>
    <Paragraph position="5"> All other cases were counted as wrong as well, for example recognizing a suggestion as an acceptance. In each category, the number of speech acts determined based on plan inference is noted. In some cases, the discourse processor is not able to assign a speech act based on plan inference. In these cases, it randomly picks a speech act from the list of possible speech acts returned from the matching rules. The number of sentences to which the discourse processor was able to assign a speech act based on plan inference increases from 164 (74%) with Standard TST to 186 (83%) with Extended TST. As Figure 6 indicates, in many of these cases, the discourse processor guesses correctly. It should be noted that although the correct speech act can be identified without plan inference in many cases, it is far better to recognize the speech act by first recognizing the role the sentence plays in the dialogue with the discourse processor, since this makes it possible for further processing to take place, such as ellipsis and anaphora resolution. (Footnote 6: Ellipsis and anaphora resolution are areas for future development.)</Paragraph>
    <Paragraph position="6"> Figure 6 indicates that the biggest difference in terms of speech act recognition between the two mechanisms is that Extended TST got more correct where Standard TST got more acceptable. This is largely because of cases like the one in Figure 4. Sentence 5 is an acceptance of the suggestion made in sentence 3. With Standard TST, the inference chain for sentence 3 would no longer be on the active path when sentence 5 is processed. Therefore, the inference chain for sentence 5 cannot attach to the inference chain for sentence 3. This makes it impossible for the discourse processor to recognize sentence 5 as an acceptance. It will try to attach it to the active path. Since it is a statement informing the listener of the speaker's schedule, a possible speech act is State-Constraint. Any State-Constraint can attach to the active path as a confirmation because the constraints on confirmation attachments are very weak. Since State-Constraint is weaker than Accept, it is counted as acceptable. While this is acceptable for the purposes of speech act recognition, and while it is better than failing completely, it is not the correct discourse structure. If the reply, sentence 5 in this example, contains an abbreviated or anaphoric expression referring to the date and time in question, and if the chain of inference attaches to the wrong place on the plan tree as in this case, the normal procedure for augmenting the shortened referring expression from context could not take place correctly as the attachment is made.</Paragraph>
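The three-way scoring rule described above (correct, acceptable, wrong) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Score`, `WEAKER_FORMS`, and `score_sentence` are hypothetical, and the weaker-form table contains only the State-Constraint and Accept pair mentioned in the text; a full table of weaker forms is assumed to exist but is not given here.

```python
from enum import Enum


class Score(Enum):
    CORRECT = "correct"
    ACCEPTABLE = "acceptable"
    WRONG = "wrong"


# Hypothetical table: maps a speech act to the acts regarded as weaker forms
# of it. Only the Accept / State-Constraint pair comes from the text.
WEAKER_FORMS = {"Accept": {"State-Constraint"}}


def score_sentence(recognized, gold_acts, weaker_forms=WEAKER_FORMS):
    """Score one recognized speech act against the hand-coded act(s).

    gold_acts is a set holding the correct act, or two equally preferred
    best acts.
    """
    if recognized in gold_acts:
        return Score.CORRECT
    # A weaker form of a correct act is acceptable. The reverse direction
    # (a stronger form where only the weaker one is correct) falls through
    # to WRONG, as do all unrelated acts.
    if any(recognized in weaker_forms.get(gold, set()) for gold in gold_acts):
        return Score.ACCEPTABLE
    return Score.WRONG
```

Under these assumptions, recognizing State-Constraint where Accept was hand-coded scores as acceptable, while recognizing Accept where only State-Constraint was hand-coded scores as wrong.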
    <Paragraph position="7"> In a separate evaluation with the same set of dialogues, performance in terms of attaching the current chain of inference to the correct place in the plan tree for the purpose of augmenting temporal expressions from context was evaluated. The results were consistent with what would have been expected given the results on speech act recognition. Standard TST achieved 64.3% accuracy while Extended TST achieved 70.4%.</Paragraph>
    <Paragraph position="8"> While the results are less than perfect, they indicate that Extended TST outperforms Standard TST on spontaneous scheduling dialogues. In summary, as Figure 6 makes clear, with the extended version of TST the number of speech acts identified correctly increases from 161 (72%) to 171 (77%), and the number of sentences to which the discourse processor was able to assign a speech act based on plan inference increases from 164 (74%) to 186 (83%).</Paragraph>
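As a quick arithmetic check, the reported percentages can be recomputed from the raw counts and the corpus size of 223 sentences. The sketch below is illustrative only; the names `percent`, `REPORTED`, and `summarize` are hypothetical, and rounding to the nearest whole percent is an assumption about how the figures were produced.

```python
TOTAL_SENTENCES = 223  # corpus size stated in the evaluation

# (Standard TST count, Extended TST count), as reported in the text.
REPORTED = {
    "correct speech acts": (161, 171),
    "assigned by plan inference": (164, 186),
}


def percent(count, total=TOTAL_SENTENCES):
    """Share of the corpus, rounded to the nearest whole percent."""
    return round(100 * count / total)


def summarize(reported=REPORTED, total=TOTAL_SENTENCES):
    """Return (standard %, extended %, gain in sentences) per measure."""
    rows = {}
    for label, (standard, extended) in reported.items():
        rows[label] = (
            percent(standard, total),
            percent(extended, total),
            extended - standard,
        )
    return rows
```

With these inputs the recomputed values agree with the reported ones: 72% and 77% for correct speech acts (a gain of 10 sentences), and 74% and 83% for sentences assigned by plan inference (a gain of 22 sentences).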
  </Section>
</Paper>