<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2032"> <Title>Latent Semantic Analysis for dialogue act classification</Title> <Section position="3" start_page="1" end_page="1" type="intro"> <SectionTitle> 3 Methods </SectionTitle> <Paragraph position="0"> We have experimented with four methods: LSA proper, which we call plain LSA; two versions of clustered LSA, in which we 'cluster' the document dimension in the Word-Document matrix; and FLSA, in which we incorporate features other than words to train LSA (specifically, the preceding n dialogue acts).</Paragraph> <Paragraph position="1"> Plain LSA. The input to LSA is a Word-Document matrix with a row for each word and a column for each document (for us, a document is a unit such as a sentence or paragraph tagged with a dialogue act). Cell c(i,j) contains the frequency with which word i appears in document j.</Paragraph> <Paragraph position="3"> Clearly, this w*d matrix will be very sparse. Next, LSA applies SVD to the Word-Document matrix, obtaining a representation of each document in a k-dimensional space; crucially, k is much smaller than the dimension of the original space. As a result, words that did not appear in certain documents now appear, as an estimate of their correlation to the meaning of those documents. The number of dimensions k retained by LSA is an empirical question; the results we report below are for the best k we experimented with.</Paragraph> <Paragraph position="4"> To choose the best tag for a document in the test set, we compare the vector representing the new document with the vector of each document in the training set. The tag of the document which has the highest cosine with our test vector is assigned to the new document.</Paragraph> <Paragraph position="5"> (In the tutoring corpus, these would more appropriately be termed tutor moves.)</Paragraph> <Paragraph position="6"> Clustered LSA. 
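As a concrete illustration of the Plain LSA pipeline just described, here is a minimal sketch in Python with NumPy. The toy documents, tags, the value of k, and all variable names are illustrative assumptions, not the paper's corpora or implementation; the fold-in step is the standard LSA projection of a new document into the reduced space.

```python
# Minimal sketch of plain LSA dialogue act tagging (toy data; illustrative only).
import numpy as np

train_docs = ["what is the fault", "the fault is in the relay",
              "check the relay", "good job"]
train_tags = ["question", "answer", "instruction", "feedback"]

vocab = sorted({w for d in train_docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Word-Document matrix A: cell (i, j) = frequency of word i in document j.
A = np.zeros((len(vocab), len(train_docs)))
for j, doc in enumerate(train_docs):
    for w in doc.split():
        A[idx[w], j] += 1.0

# SVD, truncated to k dimensions (k much smaller than the original space).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
doc_vecs = Vtk.T                     # row j = training document j in k-space

def classify(text):
    # Fold the new document into the k-space, then assign the tag of the
    # training document with the highest cosine similarity.
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in idx:
            v[idx[w]] += 1.0
    q = (v @ Uk) / sk                # standard fold-in of a new document
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1)
                           * np.linalg.norm(q) + 1e-12)
    return train_tags[int(np.argmax(sims))]

print(classify("is the relay at fault"))
```

At test time this compares the new document against all d training documents, which is the cost the clustered variants below reduce.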
Instead of building the Word-Document matrix, we build a Word-Tag matrix, where the columns refer to all the possible dialogue act types (tags). Cell c(i,j) tells us how many times word i occurs in documents tagged j.</Paragraph> <Paragraph position="8"> The Word-Tag matrix is w*t instead of w*d. We then apply Plain LSA to the Word-Tag matrix.</Paragraph> <Paragraph position="9"> SemiClustered LSA. In Clustered LSA we lose the distribution of words in the documents. Moreover, if the number of tags is small, as for DIAG, SVD loses its meaning. SemiClustered LSA tries to remedy these problems. We still produce the k-dimensional space by applying SVD to the Word-Document matrix. We then reduce the Word-Tag matrix to the k-dimensional space using a transformation based on the SVD of the Word-Document matrix. Note that both Clustered and SemiClustered LSA are much faster at test time than Plain LSA, as the test document needs to be compared only with t tag vectors, rather than with d document vectors (t is much smaller than d).</Paragraph> <Paragraph position="10"> Feature LSA (FLSA). We add extra features to Plain LSA. Specifically, we have experimented with the sequence of the previous n dialogue acts. We compute the input WordTag-Document matrix by computing the Word-Document matrix, computing the Tag-Document matrix, and then concatenating them vertically to get the final (w+t)*d matrix. Otherwise, the method is the same as Plain LSA.</Paragraph> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"> Table 1 reports the best results we obtained for each corpus and method. In parentheses, we include the dimension k and, for FLSA, the number of previous tags we considered.</Paragraph> <Paragraph position="1"> In all cases, Plain LSA performs much better than the baseline, computed as picking the most frequent dialogue act in each corpus. 
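To make the matrix constructions above concrete, a small sketch of the Clustered LSA and FLSA inputs. The counts and variable names are toy illustrations; for simplicity the Tag-Document features here are each document's own tag, whereas the paper uses the preceding n dialogue acts.

```python
# Sketch of the Clustered LSA and FLSA input matrices (toy data).
import numpy as np

tags = ["question", "answer"]                  # t possible dialogue acts
doc_tags = ["question", "answer", "question"]  # tag of each of d documents

# Toy w x d Word-Document count matrix (w = 5 words, d = 3 documents).
WD = np.array([[1., 0., 1.],
               [1., 1., 0.],
               [1., 2., 1.],
               [1., 1., 0.],
               [0., 1., 0.]])

# Clustered LSA: collapse document columns by tag into a w x t Word-Tag
# matrix; cell (i, j) counts word i across all documents tagged j.
WT = np.zeros((WD.shape[0], len(tags)))
for j, tag in enumerate(doc_tags):
    WT[:, tags.index(tag)] += WD[:, j]

# FLSA: build a t x d Tag-Document matrix (here a simple indicator of each
# document's tag; the paper instead encodes the preceding n dialogue acts)
# and stack it under the Word-Document matrix, giving a (w + t) x d input.
TD = np.zeros((len(tags), len(doc_tags)))
for j, tag in enumerate(doc_tags):
    TD[tags.index(tag), j] = 1.0

WTD = np.vstack([WD, TD])
print(WT.shape, WTD.shape)   # (5, 2) and (7, 3)
```

The rest of the pipeline is unchanged: Plain LSA is run on WT (Clustered) or WTD (FLSA) in place of the original Word-Document matrix.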
As regards DIAG, we can also see that SemiClustered LSA improves on Plain LSA by 3%, while no other method does.</Paragraph> <Paragraph position="2"> As regards CallHome, the results with Plain LSA are comparable to published ones, even though the comparison is not straightforward, because it is often unclear what target classification and features were used. For example, (Ries, 1999) reports 76.2% accuracy using neural networks augmented with the sequence of the n previous speech acts. However, (Ries, 1999) does not mention the target classification; the reported baseline appears compatible with both CallHome37 and CallHome10. The training features in (Ries, 1999) include part-of-speech (POS) tags for words, which we do not have; this may explain the higher performance. Including POS tags in our FLSA method is left for future work.</Paragraph> <Paragraph position="3"> No variation on LSA performs better than Plain LSA in our CallHome experiments. In fact, Clustered and SemiClustered LSA perform vastly worse on the larger classification problem in CallHome37. It appears that the smaller the corpus and target classification are, the better Clustered and SemiClustered LSA perform; indeed, SemiClustered LSA outperforms Plain LSA on DIAG.</Paragraph> <Paragraph position="4"> Our experiments with FLSA do not support the hypothesis that adding features other than words to LSA helps with classification. (Wiemer-Hastings, 2001) reports mixed results when augmenting LSA: adding POS tags did not improve performance, but adding some syntactic information did. Note that, in our experiments, adding more than one previous speech act did not help.</Paragraph> </Section> </Section> </Paper>