<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0304">
  <Title>User-directed Sentiment Analysis: Visualizing the Affective Content of Documents</Title>
  <Section position="7" start_page="28" end_page="29" type="evalu">
    <SectionTitle>
5 Evaluation
</SectionTitle>
    <Paragraph position="0"> IN-SPIRE is a document visualization tool that is designed to explore the thematic content of a large collection of documents. In this paper, we have described the added functionality of exploring affect as one of the possible dimensions. As an exploratory system, it is difficult to define appropriate evaluation metric. Because the goal of our system is not to discretely bin the documents into affect categories, traditional metrics such as precision are not applicable. However, to get a sense of the coverage of our lexicon, we did compare our measurements to the hand annotations provided for the customer review dataset.</Paragraph>
    <Paragraph position="1"> The dataset had hand scores (-3-3) for each feature contained in each review. We summed these scores to discretely bin them into positive (&gt;0) or negative (&lt;0). We did this both at the feature level and the review level (by looking at the cumulative score for all the features in the review). We compared these categorizations to the scores output by our measurement tool. If a document had a higher proportion of positive words than negative, we classified it as positive, and negative if it had a higher proportion of negative words. Using a chi-square, we found that the categorizations from our system were related with the hand annotations for both the whole reviews (chi-square=33.02, df=4, p&lt;0.0001) and the individual features (chisquare=150.6, df=4, p&lt;0.0001), with actual agreement around 71% for both datasets. While this number is not in itself impressive, recall that our lexicon was built independently of the data for which is was applied. W also expect some agreement to be lost by conflating all scores into discrete bins, we expect that if we compared the numeric values of the hand annotations and our scores, we would have stronger correlations.</Paragraph>
    <Paragraph position="2"> These scores only provide an indication that the lexicon we used correlates with the hand annotations for the same data. As an exploratory system, however, a better evaluation metric would be a user study in which we get feedback on the usefulness of this capability in accomplishing a variety of analytical tasks. IN-SPIRE is currently deployed in a number of settings, both commercial and government. The added capabilities for interactively exploring affect have recently been deployed. We plan to conduct a variety of user evaluations in-situ that focus on its utility in a number of different tasks. Results of these studies will help steer the further development of this methodology.</Paragraph>
  </Section>
class="xml-element"></Paper>