<?xml version="1.0" standalone="yes"?>
<Paper uid="J04-3002">
  <Title>Learning Subjective Language</Title>
  <Section position="6" start_page="303" end_page="305" type="concl">
    <SectionTitle>
6. Conclusions
</SectionTitle>
    <Paragraph position="0"> Knowledge of subjective language promises to be beneficial for many NLP applications, including information extraction, question answering, text categorization, and summarization. This article has presented the results of an empirical study of acquiring knowledge of subjective language from corpora, in which a number of feature types were learned and evaluated on different types of data with positive results. We showed that unique words are subjective more often than expected and are valuable clues to subjectivity. We also presented a procedure for automatically identifying potentially subjective collocations, including fixed collocations and collocations with placeholders for unique words. In addition, we used the results of a method for clustering words according to distributional similarity (Lin 1998) to identify adjectival and verbal clues of subjectivity.</Paragraph>
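The unique-word idea above can be sketched briefly. This is a minimal illustration, not the article's actual procedure: the function name, the toy corpus, and the choice of corpus-level (rather than document-level) uniqueness are all assumptions made here.

```python
from collections import Counter

def unique_word_clues(corpus):
    """Identify hapax legomena: words occurring exactly once in the corpus.

    `corpus` is a list of tokenized sentences (lists of lowercase tokens).
    Returns the set of words that appear only once overall; per the article,
    such unique words are subjective more often than expected by chance.
    """
    counts = Counter(tok for sent in corpus for tok in sent)
    return {tok for tok, n in counts.items() if n == 1}

corpus = [
    ["the", "report", "was", "published", "today"],
    ["what", "a", "bizarro", "decision", "that", "was"],
]
clues = unique_word_clues(corpus)  # "was" occurs twice, so it is excluded
```

In practice such a filter would run over much larger unannotated text, where a word occurring only once is a far stronger signal than in this toy corpus.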
    <Paragraph position="1"> Table 9 summarizes the results of testing all of the above types of PSEs. All show increased precision in the evaluations, and together they show consistent performance: in almost all cases they perform better or worse on the same data sets, even though different kinds of data and procedures were used to learn them. In addition, PSEs learned using expression-level subjective-element data have precisions higher than baseline on document-level opinion piece data, and vice versa.</Paragraph>
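The precision-versus-baseline comparison summarized above can be made concrete with a small sketch. The function and the toy instance lists are hypothetical; the article's evaluations are over annotated corpora, not hand-picked examples.

```python
def pse_precision(instances):
    """Precision of a PSE set: the fraction of its instances that occur in
    subjective context. `instances` is a list of (token, is_subjective)
    pairs; an empty set is given precision 0.0 by convention here.
    """
    if not instances:
        return 0.0
    return sum(1 for _, subj in instances if subj) / len(instances)

# Hypothetical instances of one PSE set, and an all-words baseline sample
pse_hits = [("bizarro", True), ("alas", True), ("great", False)]
baseline = [("the", False), ("alas", True), ("was", False), ("great", False)]
improved = pse_precision(pse_hits) > pse_precision(baseline)
```

"Increased precision" in the article's sense is exactly this comparison: a PSE set's instances are subjective more often than words drawn at random from the same data.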
    <Paragraph position="2"> Given a large stable of PSEs, it was important to disambiguate whether or not PSE instances are subjective in the contexts in which they appear. We discovered that the density of other potentially subjective expressions in the surrounding context is important: if a clue is surrounded by a sufficient number of other clues, it is more likely to be subjective than if it were not. Parameter values were selected using training data manually annotated at the expression level for subjective elements and then tested on data annotated at the document level for opinion pieces. All of the selected parameters led to increases in precision on the test data, and most led to increases of over 100%. Once again we found consistency between expression-level and document-level annotations: PSE sets defined by density have high precision in both the subjective-element data and the opinion piece data. The large differences between the training and test data suggest that our results are not brittle.</Paragraph>
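The density idea can be sketched as a simple windowed count. This is a hypothetical simplification: the function name is invented, and the `window` and `min_neighbors` parameters stand in for the parameter values the article tunes on expression-level training data.

```python
def dense_pse_instances(tokens, pse_lexicon, window=5, min_neighbors=2):
    """Disambiguate PSE instances by density: keep an instance only if at
    least `min_neighbors` other PSE instances occur within `window` tokens
    on either side. Returns a list of (position, token) pairs that pass.
    """
    positions = [i for i, tok in enumerate(tokens) if tok in pse_lexicon]
    dense = []
    for i in positions:
        neighbors = sum(1 for j in positions
                        if j != i and abs(j - i) <= window)
        if neighbors >= min_neighbors:
            dense.append((i, tokens[i]))
    return dense

tokens = "frankly this is a terrible , awful plan".split()
lexicon = {"frankly", "terrible", "awful"}
kept = dense_pse_instances(tokens, lexicon)
```

In the toy sentence only "terrible" has two other clues within the window, so isolated clue occurrences are filtered out; raising `min_neighbors` trades frequency for precision, which matches the high-precision, low-frequency behavior described below.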
    <Paragraph position="3"> Using a density feature selected from a training set, sentences containing high-density PSEs were extracted from a separate test set, and manually annotated by two judges. Fully 93% of the sentences extracted were found to be subjective or to be near subjective sentences. Admittedly, the chosen density feature is a high-precision, low-frequency one. But since the process is fully automatic, the feature could be applied to more unannotated text to identify regions containing subjective sentences. In addition, because the precision and frequency of the density features are stable across data sets, lower-precision but higher-frequency options are available.</Paragraph>
    <Paragraph position="4"> Finally, the value of the various types of PSEs was demonstrated with the task of opinion piece classification. Using the k-nearest-neighbor classification algorithm with leave-one-out cross-validation, a classification accuracy of 94% was achieved on a large test set, with a reduction in error of 28% from the baseline.</Paragraph>
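The classification setup described above can be sketched in a few lines. The feature vectors, labels, and `k` below are toy stand-ins; the article's actual features are PSE-based counts over a large opinion-piece corpus.

```python
def knn_loo_accuracy(vectors, labels, k=3):
    """Leave-one-out k-nearest-neighbor accuracy: each document is
    classified by the majority label of its k nearest neighbors (squared
    Euclidean distance) among all the other documents.
    """
    correct = 0
    for i, v in enumerate(vectors):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(v, w)), labels[j])
            for j, w in enumerate(vectors) if j != i
        )
        votes = [lab for _, lab in dists[:k]]
        pred = max(set(votes), key=votes.count)
        correct += pred == labels[i]
    return correct / len(vectors)

# Toy data: opinion pieces ("op") have a higher first feature than news
vecs = [(9, 1), (8, 2), (7, 1), (1, 8), (2, 9), (1, 7)]
labs = ["op", "op", "op", "news", "news", "news"]
acc = knn_loo_accuracy(vecs, labs)
```

Leave-one-out is the natural cross-validation regime for k-NN, since removing one point and reusing the rest as the training set requires no retraining.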
    <Paragraph position="5"> Future work is required to determine how to exploit density features to improve the performance of text categorization algorithms. Another area of future work is searching for clues to objectivity, such as the politeness features used by Spertus (1997). Still another is identifying the type of a subjective expression (e.g., positive or negative evaluative), extending work such as Hatzivassiloglou and McKeown (1997) on classifying lexemes to the classification of instances in context (compare, e.g., &amp;quot;great!&amp;quot; and &amp;quot;oh great.&amp;quot;). In addition, it would be illuminating to apply our system to data annotated with discourse trees (Carlson, Marcu, and Okurowski 2001). We hypothesize that most objective sentences identified by our system are dominated in the discourse by subjective sentences and that we are moving toward identifying subjective discourse segments.</Paragraph>
    <Paragraph position="6">  Wiebe, Wilson, Bruce, Bell, and Martin Learning Subjective Language</Paragraph>
  </Section>
</Paper>