<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1088"> <Title>FLSA: Extending Latent Semantic Analysis with features for dialogue act classification</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Discussion and future work </SectionTitle> <Paragraph position="0"> In this paper, we have presented a novel extension to LSA, that we have called Feature LSA. Our work is the first to show that FLSA is more effective than LSA, at least for the specific task we worked on, DA classification. In parallel, we have shown that FLSA can be effectively used to train a DA classifier. We have reached performances comparable to or better than published results on DA classification, and we have used an easily trainable method.</Paragraph> <Paragraph position="1"> FLSA also highlights the effectiveness of other dialogue related features, such as Game, to classify DAs. The drawback of features such as Game is that a dialogue system may not have them at its disposal when doing DA classification in real time. However, this problem may be circumvented. The number of different games is in general rather low (8 in CallHome Spanish, 6 in MapTask), and the game label is constant across DAs belonging to the same game. Each DA can be classified by augmenting it with each possible game label, and by choosing the most accurate match among those returned by each of these classification attempts. Further, if the system can reliably recognize the end of a game, the method just described needs to be used only for the first DA of each game. Then, the game label that gives the best result becomes the game label used for the next few DAs, until the end of the current game is detected.</Paragraph> <Paragraph position="2"> Another reason why we advocate FLSA over other approaches is that it appears to be close to human performance for DA classification, in the same way that LSA approximates well many aspects of human competence / performance (Landauer and Dumais, 1997).</Paragraph> <Paragraph position="3"> To support this claim, first, we used the k coefficient (Krippendorff, 1980; Carletta, 1996) to assess the agreement between the classification made by FLSA and the classification from the corpora -see Table 8. A general rule of thumb on how to interpret the values of k (Krippendorff, 1980) is to require a value of k [?] 0.8, with 0.67 < k < 0.8 allowing tentative conclusions to be drawn. As a whole, Table 8 shows that FLSA achieves a satisfying level of agreement with human coders. To put Table 8 in perspective, note that expert human coders achieved k = 0.83 on DA classification for MapTask, but also had available the speech source (Carletta et al., 1997).</Paragraph> <Paragraph position="4"> We also compared the confusion matrix from (Carletta et al., 1997) with the confusion matrix we obtained for our best result on MapTask (FLSA using Game + Speaker). For humans, the largest sources of confusion are between: check and queryyn; instruct and clarify; and acknowledge, reply-y and ready. Likewise, our FLSA method makes the most mistakes when distinguishing between instruct and clarify; and acknowledge, reply-y, and ready.</Paragraph> <Paragraph position="5"> Instead it performs better than humans on distinguishing check and query-yn. Thus, most of the sources of confusion for humans are the same as for FLSA.</Paragraph> <Paragraph position="6"> Future work includes further investigating how to select promising feature combinations, e.g. 
<Paragraph position="4"> Returning to the comparison with human coders, we also compared the confusion matrix from (Carletta et al., 1997) with the confusion matrix we obtained for our best result on MapTask (FLSA using Game + Speaker). For humans, the largest sources of confusion are between check and query-yn; instruct and clarify; and acknowledge, reply-y and ready. Likewise, our FLSA method makes the most mistakes when distinguishing between instruct and clarify, and among acknowledge, reply-y, and ready.</Paragraph>
<Paragraph position="5"> However, it performs better than humans at distinguishing check from query-yn. Thus, most of the sources of confusion for humans are the same as for FLSA.</Paragraph>
<Paragraph position="6"> Future work includes further investigating how to select promising feature combinations, e.g. by using logistic regression.</Paragraph>
<Paragraph position="7"> We are also exploring whether FLSA can be used as the basis for semi-automatic annotation of dialogue acts, to be incorporated into MUP, an annotation tool we have developed (Glass and Di Eugenio, 2002). The problem is that large corpora are necessary to train methods based on LSA. This would seem to defeat the purpose of using FLSA as the basis for semi-automatic dialogue annotation, since, to train FLSA in a new domain, we would need a large hand-annotated corpus to start with. Co-training (Blum and Mitchell, 1998) may offer a solution to this problem. In co-training, two different classifiers are initially trained on a small set of annotated data, using different features. Afterwards, each classifier is allowed to label some unlabelled data and picks its most confidently predicted positive and negative examples; these examples are added to the annotated data. The process repeats until the desired performance is achieved (a sketch of this loop is given after this section). In our scenario, we will experiment with training two different FLSA models, or one FLSA model and a different classifier, such as a naive Bayes classifier, on a small portion of annotated data that includes features like DAs, Game, etc. We will then proceed as described on the unlabelled data.</Paragraph>
<Paragraph position="8"> Finally, we have started applying FLSA to a different problem: judging the coherence of texts. Whereas LSA has already been successfully applied to this task (Foltz et al., 1998), the issue is whether FLSA could perform better by also taking into account those features of a text that enhance its coherence for humans, such as appropriate cue words.</Paragraph>
</Section>
</Paper>
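To make the co-training procedure above concrete, here is a minimal sketch of the loop, assuming two scikit-learn-style classifiers with fit/predict_proba methods trained on two different feature views (e.g., word features vs. Game/Speaker features). All names are illustrative, and the selection step simply takes each classifier's most confident predictions rather than the positive/negative split of the original binary formulation in Blum and Mitchell (1998).

```python
import numpy as np

def co_train(clf_a, clf_b, view_a, view_b, labels,
             unl_a, unl_b, rounds=10, per_round=5):
    """Grow a small labelled pool by letting each classifier (trained
    on its own feature view) label unlabelled data for both."""
    X_a, X_b, y = list(view_a), list(view_b), list(labels)
    pool = list(range(len(unl_a)))          # indices of unlabelled items
    for _ in range(rounds):
        if not pool:
            break
        clf_a.fit(np.array(X_a), np.array(y))
        clf_b.fit(np.array(X_b), np.array(y))
        for clf, view in ((clf_a, unl_a), (clf_b, unl_b)):
            if not pool:
                break
            probs = clf.predict_proba(np.array([view[i] for i in pool]))
            picks = np.argsort(probs.max(axis=1))[-per_round:]
            # Pop from the end so earlier pool positions stay valid.
            for p in sorted(picks, reverse=True):
                i = pool.pop(p)
                X_a.append(unl_a[i])
                X_b.append(unl_b[i])
                y.append(clf.classes_[probs[p].argmax()])
    return clf_a, clf_b
```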