<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2014"> <Title>Dialogue Act Tagging for Instant Messaging Chat Sessions</Title> <Section position="7" start_page="82" end_page="83" type="evalu"> <SectionTitle> 6 Discussion and Future Research </SectionTitle>
<Paragraph position="0"> As indicated by the Kappa statistics in §3.1, labelling utterances with dialogue acts can sometimes be a subjective task. Moreover, there are many possible tag sets to choose from. These two factors make it difficult to compare tagging methods accurately, and they are one reason why Kappa statistics and perplexity measures are useful. The work presented in this paper shows that even the relatively simple bag-of-words approach with a naive Bayes classifier can produce very good results.</Paragraph>
<Paragraph position="1"> One important area not tackled by this experiment was utterance boundary detection. Multiple utterances are often sent in one message, sometimes in one sentence, and each utterance must be tagged.</Paragraph>
<Paragraph position="2"> Approximately 40% of the messages in our corpus contain more than one utterance. Utterances were marked manually in this experiment because the study focussed only on dialogue act classification given a sequence of utterances. It is rare, however, to be given text that is already segmented into utterances, so this segmentation must be performed before automated dialogue act tagging can begin. Utterance boundary detection is therefore an important area for further research.</Paragraph>
<Paragraph position="3"> The methods used to detect dialogue acts presented here do not take sentential structure into account. The sentences in (1) would thus be treated identically by the bag-of-words approach.</Paragraph>
<Paragraph position="4"> (1) a. john has been to london b. has john been to london
Without punctuation (as is often the case in informal typed dialogue) the bag-of-words approach cannot differentiate these sentences, whereas the ordering of even the first two words shows that &quot;john has ...&quot; is likely to be a STATEMENT while &quot;has john ...&quot; would be a question. It would be interesting to investigate other types of features, such as phrase structure, or the order and parts of speech of the first x words of an utterance, to determine its dialogue act.</Paragraph>
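The word-order cue above is easy to bolt onto a bag-of-words model. The following minimal sketch is not the authors' implementation: it trains a naive Bayes classifier on bag-of-words counts plus a hypothetical first-two-words feature, and the tiny training set and tag names are illustrative assumptions.

```python
from collections import Counter, defaultdict
import math

# Toy labelled utterances; tags and examples are illustrative assumptions,
# not the paper's corpus.
TRAIN = [
    ("john has been to london", "STATEMENT"),
    ("i think the meeting is at noon", "STATEMENT"),
    ("has john been to london", "YES-NO-QUESTION"),
    ("is the meeting at noon", "YES-NO-QUESTION"),
    ("where is the meeting", "WH-QUESTION"),
]

def features(utterance):
    """Bag-of-words counts plus a first-two-words feature, one cheap way
    to encode the word-order cue discussed above."""
    words = utterance.split()
    feats = Counter(words)                                # bag of words
    if len(words) >= 2:
        feats["FIRST2=" + words[0] + "_" + words[1]] += 1
    return feats

# Accumulate feature counts per tag for naive Bayes with add-one smoothing.
tag_counts = Counter()
feat_counts = defaultdict(Counter)
vocab = set()
for utt, tag in TRAIN:
    tag_counts[tag] += 1
    for f, c in features(utt).items():
        feat_counts[tag][f] += c
        vocab.add(f)

def classify(utterance):
    """Return the tag maximising log P(tag) + sum of log P(feature | tag)."""
    feats = features(utterance)
    best_tag, best_lp = None, float("-inf")
    for tag, n in tag_counts.items():
        lp = math.log(n / sum(tag_counts.values()))       # prior P(tag)
        total = sum(feat_counts[tag].values())
        for f, c in feats.items():
            # add-one smoothed P(feature | tag)
            p = (feat_counts[tag][f] + 1) / (total + len(vocab))
            lp += c * math.log(p)
        if lp > best_lp:
            best_tag, best_lp = tag, lp
    return best_tag

print(classify("john has been to london"))  # STATEMENT
print(classify("has john been to london"))  # YES-NO-QUESTION
```

With plain bag-of-words the two test sentences score identically; only the FIRST2 feature separates them, which is the point the example in (1) makes.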
<Paragraph position="5"> Aspects of dialogue macrogame theory (DMT) (Mann, 2002) may help to increase tagging accuracy. In DMT, sets of utterances are grouped together to form a game. Games may be nested, as in the following example:
A: May I know the price range please?
B: In which currency?
A: $US please
B: 200-300
Here, B has nested a clarification question that was required before providing the price range. The bigram model presented in this paper will incorrectly capture this interaction as the sequence YES-NO-QUESTION, OPEN-QUESTION, STATEMENT, STATEMENT, whereas DMT would be able to extract the nested question, recovering the correct question and answer pairs.</Paragraph>
<Paragraph position="6"> Although other studies have attempted to tag utterances with dialogue acts automatically (Stolcke et al., 2000; Jurafsky et al., 1997; Kita et al., 1996), it is difficult to compare results fairly because the data differed significantly (transcribed spoken dialogue versus typed dialogue) and the dialogue act sets also differed, ranging from 9 tags (Kita et al., 1996) to 42 (Stolcke et al., 2000). It may be possible to use a standard set of dialogue acts for a particular domain, but inventing a set that could be used for all domains seems unlikely, primarily because different applications have different needs. A superset of dialogue acts covering all domains would necessarily contain a large number of tags (at least the 42 identified by Stolcke et al. (2000)), with many tags inappropriate for any given domain.</Paragraph>
<Paragraph position="7"> The best result from our dialogue act classifier was obtained using a bigram discourse model, giving an average tagging accuracy of 81.6% (see Table 3). Although this is higher than the results of the 13 recent studies surveyed by Stolcke et al. (2000), whose accuracies range from approximately 40% to 81.2%, the tasks, data, and tag sets were all quite different, so any comparison should be treated only as a guideline.</Paragraph> </Section> </Paper>
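To make the bigram discourse model concrete: one standard way to combine per-utterance likelihoods with dialogue act transition probabilities is Viterbi decoding over the utterance sequence. The sketch below is a generic illustration, not the paper's trained model; the tag set, transition probabilities, and the stand-in scorer are all invented for illustration (a real system would plug in, e.g., naive Bayes log-likelihoods like those in the previous sketch).

```python
import math

# Hypothetical tag set and bigram transition probabilities P(tag | prev_tag);
# the numbers are invented for illustration, not the paper's trained model.
TAGS = ["STATEMENT", "YES-NO-QUESTION", "YES-ANSWER"]
TRANS = {
    "<start>":         {"STATEMENT": 0.6, "YES-NO-QUESTION": 0.35, "YES-ANSWER": 0.05},
    "STATEMENT":       {"STATEMENT": 0.5, "YES-NO-QUESTION": 0.4,  "YES-ANSWER": 0.1},
    "YES-NO-QUESTION": {"STATEMENT": 0.2, "YES-NO-QUESTION": 0.1,  "YES-ANSWER": 0.7},
    "YES-ANSWER":      {"STATEMENT": 0.7, "YES-NO-QUESTION": 0.25, "YES-ANSWER": 0.05},
}

def viterbi(utterances, loglik):
    """Most probable tag sequence: at each step combine the classifier's
    log P(utterance | tag) with the discourse model's log P(tag | prev tag)."""
    # best[i][t] = (score of best path ending in tag t at utterance i, backpointer)
    best = [{}]
    for t in TAGS:
        best[0][t] = (math.log(TRANS["<start>"][t]) + loglik(utterances[0], t), None)
    for i in range(1, len(utterances)):
        best.append({})
        for t in TAGS:
            score, prev = max(
                (best[i - 1][p][0] + math.log(TRANS[p][t]) + loglik(utterances[i], t), p)
                for p in TAGS
            )
            best[i][t] = (score, prev)
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(utterances) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

# Stand-in per-utterance scorer (hypothetical); a real system would use the
# naive Bayes log-likelihoods from the earlier sketch.
def toy_loglik(utt, tag):
    if utt.endswith("?"):
        return math.log(0.8 if tag == "YES-NO-QUESTION" else 0.1)
    if utt in ("yes", "yeah"):
        return math.log(0.8 if tag == "YES-ANSWER" else 0.1)
    return math.log(0.8 if tag == "STATEMENT" else 0.1)

dialogue = ["are you coming to the meeting?", "yes", "i will bring the notes"]
print(viterbi(dialogue, toy_loglik))
# ['YES-NO-QUESTION', 'YES-ANSWER', 'STATEMENT']
```

Because the decoder assumes each tag depends only on its immediate predecessor, it has no way to represent a nested clarification game of the kind DMT captures, which is exactly the failure mode described for the price-range example above.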