<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2009">
  <Title>Spkr ID Words Discourse Chunk</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Discussion and future work
</SectionTitle>
    <Paragraph position="0"> Table 5.1 shows the results from two DA tagging runs using the case-based reasoning tagger: one run without discourse chunks, and one with.</Paragraph>
    <Paragraph position="1"> Without discourse chunks With discourse chunks</Paragraph>
    <Paragraph position="3"> Table 5.1: Overall accuracy for the CBR tagger To put these results in perspective, human performance has been estimated at about 84% (Stolcke 2000), since human taggers sometimes disagree about intentions, especially when speakers perform more than one dialogue act in the same utterance. Much of the recent DA tagging work (using 18-25 tags) scores around the mid-fifty to mid-sixty percentiles in accuracy (see Stolcke 2000 for a review of similar work). This work uses the Verbmobil-2 tagset of 32 tags.</Paragraph>
    <Paragraph position="4"> It could be argued that the discourse chunk information, being based on tags, gives the DA tagger extra information about the tags themselves, and thus gives an unfair boost to the performance. At present it is difficult to say if this is the only reason for the performance gains. If this were the case, we would expect to see improvement in recognition for the four tags that are chunk starters, and less of a gain in those that are not.</Paragraph>
    <Paragraph position="5"> In the test run with discourse chunks, however, we see across-the-board gains in almost all categories, regardless of whether they begin a chunk or not. Table 5.2 shows performance measured in terms of the well-known standards of precision, recall, and f-measure.</Paragraph>
    <Paragraph position="6"> One notable exception to the upward trend is EXCLUDE, a beginning-of-chunk marker, which performed slightly worse with discourse chunks.</Paragraph>
    <Paragraph position="7"> This would suggest that chunk information alone is not enough to account for the overall gain. Both ACCEPT and FEEDBACK_POSITIVE improved slightly, suggesting that discourse chunks were able to help disambiguate these two very similar tags.</Paragraph>
    <Paragraph position="8"> Table 5.3 shows the improvement in tagging scores for one-word utterances, often difficult to tag because of their general use and low information. These words are more likely to be tagged ACCEPT when they appear near the beginning of a chunk, and FEEDBACK_POSITIVE when they appear nearer the end. Discourse chunks help their classification by showing their place in the dialogue cycle.</Paragraph>
    <Paragraph position="9"> One weakness of this project is that it assumes knowledge of the correct chunk tag. The test corpus was tagged with the right answers for the chunks. Under normal circumstances, the corpus would be tagged with the best guess, based on the DA tags from an earlier run. However, the goal for this project was to see if, given perfect information, discourse chunking would aid DA tagging performance. The performance gains are persuasive evidence that it does. Ongoing work involves seeing how accurately a new corpus can be tagged with discourse chunks, even when the DA tags are unknown.</Paragraph>
  </Section>
class="xml-element"></Paper>