<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1103">
  <Title>Context Management with Topics for Spoken Dialogue Systems</Title>
  <Section position="6" start_page="634" end_page="635" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We tested the Predict-Support algorithm using cross-validation on our corpus. The accuracy results of the first predictions are given in Table 4. PP is the corpus perplexity which represents the average branching factor of the corpus, or the number of alternatives from which to choose the correct label at a given point.</Paragraph>
    <Paragraph position="1"> For the pruned topic types, we reserved 10 randomly picked dialogues for testing (each test file contained about 400-500 test utterances), and used the other 70 dialogues for training in each test cycle.</Paragraph>
    <Paragraph position="2"> The average accuracy rate, 78.68 % is a satisfactory result. We also did another set of cross-validation tests using 75 dialogues for training and 5 dialogues for testing, and as expected, a bigger training corpus gives better recognition results when perplexity stays the same.</Paragraph>
    <Paragraph position="3"> To compare how much difference a bigger number of topic tags makes to the results, we conducted cross-validation tests with the original 62 topic types. A finer set of topic tags does worsen  the accuracy, but not as much as we expected: the Support-part of the algorithm effectively remedies prediction inaccuracies.</Paragraph>
    <Paragraph position="4"> Since the same corpus is also tagged with speech acts, we conducted similar cross-validation tests with speech act labels. The recognition rates are worse than those of the 62 topic types, although perplexity is almost the same. We believe that this is because speech acts ignore the actual content of the utterance. Although our speech act labels are surface-oriented, they correlate with only a few fixed phrases (I would like to; please), and are thus less suitable to convey the semantic focus of the utterances, expressed by the content words than topics, which by definition deal with the content.</Paragraph>
    <Paragraph position="5"> As the lower-bound experiments we conducted cross-validation tests using the trigram backoffmodel, i.e. relying only on the context which records the history of topic types. For the first ranked predictions the accuracy rate is about 40%, which is on the same level as the first ranked speech act predictions reported in Reithinger and Mater (1995).</Paragraph>
    <Paragraph position="6"> The average precision of the Predict-Support algorithm is also calculated (Table 5). Precision is the ratio of correctly assigned tags to the total number of assigned tags. The average precision for all the pruned topic types is 74.64%, varying from 95.63% for ROOM to 37.63% for MIx. If MIx is left out, the average precision is 79.27%. The poor precision for MIX is due to the unknown word problem with mutual information.</Paragraph>
    <Paragraph position="7">  The results of the topic recognition show that the model performs well, and we notice a considerable improvement in the accuracy rates compared to accuracy rates in speech act recognition cited in section 2 (modulo perplexity). Although the rates are somewhat optimistic as we used transcribed dialogues (= the correct recognizer output), we can still safely conclude that topic information provides a promising starting point in attempts to provide an accurate context for the spoken dialogue systems. This can be further verified in the perplexity measures for the word recognition: compared to a general language model trained on non-tagged dialogues, perplexity decreases by 20 % for a language model which is trained on topic-dependent dialogues, and by 14 % if we use an open test with unknown words included as well (Jokinen and Morimoto, 1997).</Paragraph>
    <Paragraph position="8"> At the end we have to make a remark concerning the relevance of speech acts: our argumentation is not meant to underestimate their use for other purposes in dialogue modelling, but rather, to emphasise the role of topic information in successful context management: in our opinion the topics provide a more reliable and straighforward approximation of the utterance meaning than speech acts, and should not be ignored in the definition of context models for spoken dialogue systems.</Paragraph>
  </Section>
class="xml-element"></Paper>