<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4025">
  <Title>Automated Team Discourse Annotation and Performance Prediction Using LSA</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Latent Semantic Analysis
</SectionTitle>
    <Paragraph position="0"> LSA is a fully automatic corpus-based statistical method for extracting and inferring relations of expected contextual usage of words in discourse (Landauer et al., 1998).</Paragraph>
    <Paragraph position="1"> LSA has been used for a wide range of applications and for simulating knowledge representation, discourse and psycholinguistic phenomena. These approaches have included: information retrieval (Deerwester et al., 1990), and automated text analysis (Foltz, 1996). In addition, LSA has been applied to a number of NLP tasks, such as text segmentation (Choi et al., 2001).</Paragraph>
    <Paragraph position="2"> More recently Serafin et al. (2003) used LSA for dialogue act classification, finding that LSA can effectively be used for such classification and that adding features to LSA showed promise.</Paragraph>
    <Paragraph position="3"> To train LSA we added 2257 documents to the corpus UAV transcripts. These documents consisted of training documents and pre- and post-training interviews related to UAVs, resulting in a total of 22802 documents in the final corpus. For the UAV-Corpus we used a 300 dimensional semantic space.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Automatic Discourse Tagging
</SectionTitle>
    <Paragraph position="0"> Our goal was to use semantic content of team dialogues to better understand and predict team performance. The approach we focus on here is to study the dialogue on the turn level. Working within the limitations of the manual annotations, we developed an algorithm to tag transcripts automatically, resulting in some decrease in performance, but a significant savings in time and resources. null We established a lower bounds tagging performance of 0.27 by computing the tag frequency in the 12 transcripts tagged by two taggers. If all utterances were tagged with the most frequent tag, the percentage of turns tagged correctly would be 27%.</Paragraph>
    <Paragraph position="1"> Automatic Annotation with LSA. In order to test our algorithm to automatically annotate the data, we computed a &amp;quot;corrected tag&amp;quot; for all 2916 turns in the 12 team-at-mission transcripts tagged by two taggers. This was necessary due to the only moderate agreement between the taggers. We used the union of the sets of tags assigned by the taggers as the &amp;quot;corrected tag&amp;quot;. The union, rather than the intersection, was used since taggers sometimes missed relevant tags within a turn. The union of tags assigned by multiple taggers better captures all likely tag types within the turn. A disadvantage to using corrected tags is the loss of sequential tag information within individual turns.</Paragraph>
    <Paragraph position="2"> However the focus of this study was on identifying the existence of relevant discourse, not on its order within the turn.</Paragraph>
    <Paragraph position="3"> Then, for each of the 12 team-at-mission transcripts, we automatically assigned &amp;quot;most probable&amp;quot; tags to each turn, based on the corrected tags of the &amp;quot;most similar&amp;quot; turns in the other 11 team-at-missions. For a given turn, T, the algorithm proceeds as follows: Find the turns in the other 11 team-at-mission transcripts, whose vectors in the semantic space have the largest cosines, when compared with T's vector in the semantic space. We choose either the ones with the top n (usually top 10) cosines, or the ones whose cosines are above a certain threshold (usually 0.6). The corrected tags for these &amp;quot;most similar&amp;quot; turns are retrieved. The sum of the cosines for each tag that appears is computed and normalized to give a probability that the tag is the corrected tag. Finally, we determine the predicted tag by applying a cutoff (0.3 and 0.4 seem to produce the best results): all of the tags above the cutoff are chosen as the predicted tag. If no tag has a probability above the cutoff, them the single tag with the maximum probability is chosen as the predicted tag.</Paragraph>
    <Paragraph position="4"> We also computed the average cosine similarity of T to its 10 closest tags as a measure of certainty of categorization. For example, if T is not similar to any previously categorized turns, then it would have a low certainty. This permits the flagging of turns that the algorithm is not likely to tag as reliability.</Paragraph>
    <Paragraph position="5"> In order to improve our results, we considered ways to incorporate simple discourse elements into our predictions. We added two discourse features to our algorithm: for any turn with a question mark, &amp;quot;?&amp;quot;, we increased to probability that uncertainty, &amp;quot;U&amp;quot;, would be one of the tags in its predicted tag; and for any turn following a turn with a question mark, &amp;quot;?&amp;quot;, we increased to probability that response, &amp;quot;R&amp;quot;, would be one of the tags in its predicted tag.</Paragraph>
    <Paragraph position="6"> We refer to our original algorithm as LSA and our algorithm with the two discourse features added as LSA+. Using LSA+ with our two methods now performs only 11% and 15% below human-human agreement (see Table 1).</Paragraph>
    <Paragraph position="7"> We realize that training our system on tags where humans had only moderate agreement is not ideal. Our failure analyses indicated that the distinctions our algorithm has difficulty making are the same distinctions that the humans found difficult to make, so we believe that improved agreement among human annotators would result in similar improvements for our algorithm.</Paragraph>
    <Paragraph position="8"> The results suggest that we can automatically annotate team transcripts with tags. While the approach is not quite as accurate as human taggers, LSA is able to tag an hour of transcripts in under a minute. As a comparison, it can take half an hour or longer for a trained tagger to do the same task.</Paragraph>
    <Paragraph position="9"> Measuring Agreement. The C-value measures the proportion of inter-coder agreement, but does not take into account agreement by chance. In order to adjust for chance agreement we computed Cohens Kappa (Cohen 1960), as shown in Table 1.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Predicting Overall Team Performance
</SectionTitle>
    <Paragraph position="0"> Throughout the CERTT Lab UAV-STE missions a performance measure was calculated to determine each teams effectiveness at completing the mission. The performance score was a composite of objective measures including: amount of fuel/film used, number/type of photographic errors, time spent in warning and alarm states, and un-visited waypoints. This composite score ranged from 0 to 1000. The score is highly predictive of how well a team succeeded in accomplishing their mission. We used two approaches to predict these overall team performance scores: correlating the tag frequencies with the scores and by correlating entire mission transcripts with one another.</Paragraph>
    <Paragraph position="1"> Team Performance Based on Tags. We computed correlations between the team performance score and tag frequencies in each team-at-mission transcript.</Paragraph>
    <Paragraph position="2"> The tags for all 20545 utterances were first generated using the LSA+ method. The tag frequencies for each team-at-mission transcript were then computed by counting the number of times each individual tag appeared in the transcript and dividing by the total number of individual tags occurring in the transcript.</Paragraph>
    <Paragraph position="3"> Our preliminary results indicate that frequency of certain types of utterances correlate with team performance. The correlations for tags predicted by computer are shown in Table 2.</Paragraph>
    <Paragraph position="4">  useful results that can be interpreted in terms of team processes. Teams that tend to state more facts and acknowledge other team members more tend to perform better. Those that express more uncertainty and need to make more responses to each other tend to perform worse. These results are consistent with those found in Bowers et al. (1998), but were generated automatically rather than by the hand-coding done by Bowers.</Paragraph>
    <Paragraph position="5"> Team Performance Based on Whole Transcripts.</Paragraph>
    <Paragraph position="6"> Another approach to measuring content in team discourse is to analyze the transcript as a whole. Using a method similar to that used to score essays with LSA (Landauer et al. 1998), we used the transcripts to predict the team performance score. We generate the predicted team performance scores was as follows: Given a sub-set of transcripts, S, with known performance scores, and a transcript, t, with unknown performance score, we can estimate the performance score for t by computing its similarity to each transcript in S. The similarity between any two transcripts is measured by the cosine between the transcript vectors in the UAV-Corpus semantic space. To compute the estimated score for t, we take the average of the performance scores of the 10 closest transcripts in S, weighted by cosines. A holdout procedure was used in which the score for a teams transcript was predicted based on the transcripts and scores of all other teams (i.e. a teams score was only predicted by the similarity to other teams). Our results indicated that the LSA estimated performance scores correlated strongly with the actual team performance scores (r = 0.76, p &lt; 0.01), as shown in Figure 1. Thus, the results indicate that we can accurately predict the overall performance of the team (i.e. how well they fly and complete their mission) just based on an analysis of their transcript from the mission.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
CODERS-AGREEMENT C-VALUE KAPPA
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
class="xml-element"></Paper>