<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1017">
  <Title>Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Much of the earlier research in automated opinion detection has been performed by Wiebe and colleagues (Bruce and Wiebe, 1999; Wiebe et al., 1999; Hatzivassiloglou and Wiebe, 2000; Wiebe, 2000; Wiebe et al., 2002), who proposed methods for discriminating between subjective and objective text at the document, sentence, and phrase levels. Bruce and Wiebe (1999) annotated 1,001 sentences as subjective or objective, and Wiebe et al. (1999) described a sentence-level Naive Bayes classifier using as features the presence or absence of particular syntactic classes (pronouns, adjectives, cardinal numbers, modal verbs, adverbs), punctuation, and sentence position. Subsequently, Hatzivassiloglou and Wiebe (2000) showed that automatically detected gradable adjectives are a useful feature for subjectivity classification, while Wiebe (2000) introduced lexical features in addition to the presence/absence of syntactic categories.</Paragraph>
    <Paragraph position="1"> More recently, Wiebe et al. (2002) reported on document-level subjectivity classification, using a k-nearest neighbor algorithm based on the total count of subjective words and phrases within each document.</Paragraph>
    <Paragraph position="2"> Psychological studies (Bradley and Lang, 1999) found measurable associations between words and human emotions. Hatzivassiloglou and McKeown (1997) described an unsupervised learning method for obtaining positively and negatively oriented adjectives with accuracy over 90%, and demonstrated that this semantic orientation, or polarity, is a consistent lexical property with high inter-rater agreement. Turney (2002) showed that it is possible to use only a few of those semantically oriented words (namely, &quot;excellent&quot; and &quot;poor&quot;) to label other phrases co-occurring with them as positive or negative. He then used these phrases to automatically separate positive and negative movie and product reviews, with accuracy of 66-84%.</Paragraph>
    <Paragraph position="3"> Pang et al. (2002) adopted a more direct approach, using supervised machine learning with words and n-grams as features to predict orientation at the document level with up to 83% precision.</Paragraph>
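    The seed-word idea attributed to Turney (2002) above can be illustrated with a small sketch. It scores a phrase by comparing how often it co-occurs with a positive seed word versus a negative one, using a pointwise-mutual-information-style log ratio, and then labels a review by the average score of its phrases. This is only an illustration under stated assumptions, not the implementation used in any of the cited papers: the function names, the count arguments, and the smoothing constant are all hypothetical.

```python
import math


def semantic_orientation(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """PMI-style orientation score for one phrase (illustrative sketch).

    near_pos, near_neg: assumed counts of the phrase co-occurring with the
        positive seed ("excellent") and the negative seed ("poor").
    hits_pos, hits_neg: assumed corpus counts of the two seed words.
    smoothing: small assumed constant that avoids division by zero.
    """
    numerator = (near_pos + smoothing) * hits_neg
    denominator = (near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


def classify_review(phrase_scores):
    """Label a review by the average orientation of its phrases.

    A review with no phrases, or an average of exactly zero, falls back
    to "negative" here; that tie-breaking rule is an arbitrary assumption.
    """
    if not phrase_scores:
        return "negative"
    average = sum(phrase_scores) / len(phrase_scores)
    return "positive" if average > 0 else "negative"


# Made-up counts, for illustration only.
scores = [
    semantic_orientation(near_pos=50, near_neg=2, hits_pos=1000, hits_neg=1000),
    semantic_orientation(near_pos=5, near_neg=8, hits_pos=1000, hits_neg=1000),
]
print(classify_review(scores))
```

    A positive score means the phrase appears relatively more often near the positive seed; the magnitude reflects how lopsided the association is.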
    <Paragraph position="4"> Our approach to document and sentence classification of opinions builds upon the earlier work by using extended lexical models with additional features. Unlike the work cited above, we do not rely on human annotations for training but only on weak metadata provided at the document level. Our sentence-level classifiers introduce additional criteria for detecting subjective material (opinions), including methods based on sentence similarity within a topic and an approach that relies on multiple classifiers. At the document level, our classifier uses the same document labels as the method of Wiebe et al. (2002), but automatically detects the words and phrases of importance without further analysis of the text. For determining whether an opinion sentence is positive or negative, we have used seed words similar to those produced by Hatzivassiloglou and McKeown (1997) and extended them to construct a much larger set of semantically oriented words with a method similar to that proposed by Turney (2002). Unlike Pang et al. (2002) and Turney (2002), our focus is on the sentence level; we employ a significantly larger set of seed words, and we explore words from syntactic classes other than adjectives (nouns, verbs, and adverbs) as indicators of orientation.</Paragraph>
  </Section>
</Paper>