<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1200">
  <Title>Determining the Sentiment of Opinions</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Algorithm
</SectionTitle>
    <Paragraph position="0"> Given a topic and a set of texts, the system operates in four steps. First, it selects sentences that contain both the topic phrase and holder candidates. Next, it delimits the holder-based regions of opinion. Then, the sentence sentiment classifier calculates the polarity of each sentiment-bearing word individually. Finally, the system combines these individual sentiments to produce the holder's sentiment for the whole sentence.</Paragraph>
    <Paragraph position="1"> Figure 1 shows the overall system architecture.</Paragraph>
    <Paragraph position="2"> Section 2.1 describes the word sentiment classifier and Section 2.2 describes the sentence sentiment classifier.</Paragraph>
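A minimal Python sketch of this four-step pipeline. The tiny lexicon, the title-case holder heuristic, the whole-sentence region, and the sign-sum combination are toy stand-ins chosen to keep the example self-contained; they are not the authors' components.

# Toy pipeline: select sentences with topic + holder, take a region,
# score sentiment-bearing words, and combine the scores per holder.
POS_LEX = {"agreed": 0.5, "constitutional": 0.4}
NEG_LEX = {"condemned": -0.8, "rejected": -0.7}

def find_holders(tokens):
    # stand-in for named-entity holder detection (PERSON / ORGANIZATION)
    return [t for t in tokens if t.istitle()]

def analyze(sentences, topic):
    results = []
    for tokens in sentences:
        if topic not in tokens:                      # step 1: topic filter
            continue
        holders = find_holders(tokens)               # step 1: holder candidates
        if not holders:
            continue
        region = tokens                              # step 2: simplest region = whole sentence
        scores = [POS_LEX.get(w.lower(), NEG_LEX.get(w.lower(), 0.0))
                  for w in region]                   # step 3: per-word sentiment
        polarity = "positive" if sum(scores) > 0 else "negative"   # step 4: combine
        results.append((holders[0], topic, polarity))
    return results

print(analyze([["Senate", "condemned", "the", "census", "vote"]], "census"))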
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Word Sentiment Classifier
2.1.1 Word Classification Models
</SectionTitle>
      <Paragraph position="0"> For word sentiment classification we developed two models. The basic approach is to assemble a small set of seed words by hand, sorted by polarity into two lists--positive and negative--and then to grow this set by adding words obtained from WordNet (Miller et al.</Paragraph>
      <Paragraph position="1"> 1993; Fellbaum et al. 1993). We assume synonyms of positive words are mostly positive and antonyms mostly negative, e.g., the positive word &amp;quot;good&amp;quot; has synonyms &amp;quot;virtuous, honorable, righteous&amp;quot; and antonyms &amp;quot;evil, disreputable, unrighteous&amp;quot;. Antonyms of negative words are added to the positive list, and synonyms to the negative one.</Paragraph>
      <Paragraph position="2"> To start the seed lists we selected verbs (23 positive and 21 negative) and adjectives (15 positive and 19 negative), adding nouns later.</Paragraph>
      <Paragraph position="3"> Since adjectives and verbs are structured differently in WordNet, we obtained from it synonyms and antonyms for adjectives but only synonyms for verbs. For each seed word, we extracted its expansions from WordNet and added them back into the appropriate seed lists. Using these expanded lists, we extracted an additional cycle of words from WordNet, finally obtaining 5880 positive adjectives, 6233 negative adjectives, 2840 positive verbs, and 3239 negative verbs.</Paragraph>
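A sketch of this expansion using NLTK's WordNet interface, which is an assumed stand-in for the authors' WordNet access; the seed sets and the two-cycle loop are illustrative, and the WordNet corpus is assumed to be installed.

from nltk.corpus import wordnet as wn

def expand_once(words, pos):
    """Collect WordNet synonyms and antonyms of every word in `words`."""
    synonyms, antonyms = set(), set()
    for w in words:
        for synset in wn.synsets(w, pos=pos):
            for lemma in synset.lemmas():
                synonyms.add(lemma.name())
                for ant in lemma.antonyms():   # antonyms are mainly available for adjectives
                    antonyms.add(ant.name())
    return synonyms, antonyms

def grow_seed_lists(pos_seeds, neg_seeds, pos="a", cycles=2):
    positive, negative = set(pos_seeds), set(neg_seeds)
    for _ in range(cycles):
        pos_syn, pos_ant = expand_once(positive, pos)
        neg_syn, neg_ant = expand_once(negative, pos)
        positive |= pos_syn | neg_ant   # synonyms of positives, antonyms of negatives
        negative |= neg_syn | pos_ant   # synonyms of negatives, antonyms of positives
    return positive, negative

# For verbs the paper uses synonyms only (e.g. pos="v" with the antonym step skipped).
positive_adj, negative_adj = grow_seed_lists({"good", "virtuous"}, {"evil", "bad"})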
      <Paragraph position="4"> However, not all synonyms and antonyms could be used: some had opposite sentiment or were neutral. In addition, some common words such as &amp;quot;great&amp;quot;, &amp;quot;strong&amp;quot;, &amp;quot;take&amp;quot;, and &amp;quot;get&amp;quot; occurred many times in both positive and negative categories. This indicated the need to develop a measure of strength of sentiment polarity (the alternative was simply to discard such ambiguous words)--to determine how strongly a word is positive and also how strongly it is negative. This would enable us to discard sentiment-ambiguous words but retain those with strengths over some threshold.</Paragraph>
      <Paragraph position="5"> Armed with such a measure, we can also assign strength of sentiment polarity to as yet unseen words. Given a new word, we use WordNet again to obtain a synonym set of the unseen word to determine how it interacts with our sentiment seed lists. That is, we compute</Paragraph>
      <Paragraph position="6"> \hat{c} = \arg\max_{c} P(c \mid w) \approx \arg\max_{c} P(c \mid syn_1, syn_2, \ldots, syn_n) \quad (1)</Paragraph>
      <Paragraph position="7"> where c is a sentiment category (positive or negative), w is the unseen word, and syn_1, ..., syn_n are the WordNet synonyms of w. To compute Equation (1), we tried two different models:</Paragraph>
      <Paragraph position="8"> \hat{c} = \arg\max_{c} P(c) \prod_{k} P(f_k \mid c)^{\mathrm{count}(f_k,\, \mathrm{synset}(w))} \quad (2)</Paragraph>
      <Paragraph position="9"> where f_k is the k-th feature (list word) of sentiment class c which is also a member of the synonym set of w, and count(f_k, synset(w)) is the total number of occurrences of f_k in the synonym set of w. P(c) is the number of words in class c divided by the total number of words considered. This model derives from document classification. We used the synonym and antonym lists obtained from WordNet instead of learning word sets from a corpus, since the former is simpler and does not require manually annotated data for training. Equation (3) shows the second model for a word sentiment classifier.</Paragraph>
      <Paragraph position="10"> \hat{c} = \arg\max_{c} P(c)\, P(w \mid c) \quad (3)</Paragraph>
      <Paragraph position="11"> To compute the probability P(w|c) of word w given a sentiment class c, we count the occurrences of w's synonyms in the list of c.</Paragraph>
      <Paragraph position="12"> The intuition is that the more of a word's synonyms occur in c, the more likely the word belongs to c. We computed both positive and negative sentiment strengths for each word and compared their relative magnitudes. Table 2 shows several examples of the system output, computed with Equation (2), in which &amp;quot;+&amp;quot; represents positive category strength and &amp;quot;-&amp;quot; negative. The word &amp;quot;amusing&amp;quot;, for example, was classified as carrying primarily positive sentiment, and &amp;quot;blame&amp;quot; as primarily negative. The absolute value of each category represents the strength of its sentiment polarity. For instance, &amp;quot;afraid&amp;quot; with strength -0.99 represents strong negativity while &amp;quot;abysmal&amp;quot; with strength -0.61 represents weaker negativity.</Paragraph>
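A rough Python sketch of the synonym-counting word classifier in the spirit of Equation (2). The add-one smoothing and the toy seed lists are assumptions made to keep the example self-contained; they are not specified in the text above.

import math
from collections import Counter

def word_sentiment(synonyms, seed_lists):
    """Score an unseen word from its WordNet synonyms against each seed list:
    score(c) = log P(c) + sum_k count(f_k, synset(w)) * log P(f_k | c)."""
    syn_counts = Counter(synonyms)
    total_seed = sum(len(lex) for lex in seed_lists.values())
    scores = {}
    for c, lex in seed_lists.items():
        score = math.log(len(lex) / total_seed)              # log P(c)
        for f, n in syn_counts.items():
            p_f_given_c = ((f in lex) + 1) / (len(lex) + 2)  # smoothed P(f | c)
            score += n * math.log(p_f_given_c)
        scores[c] = score
    return max(scores, key=scores.get), scores

seeds = {"positive": {"good", "virtuous", "honorable", "amusing"},
         "negative": {"evil", "disreputable", "bad", "blame"}}
print(word_sentiment(["virtuous", "honorable", "just"], seeds))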
      <Paragraph position="14"/>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Sentence Sentiment Classifier
</SectionTitle>
      <Paragraph position="0"> As shown in Table 1, combining sentiments in a sentence can be tricky. We are interested in the sentiments of the Holder about the Claim. Manual analysis showed that such sentiments can be found most reliably close to the Holder; without either Holder or Topic/Claim nearby as anchor points, even humans sometimes have trouble reliably determining the source of a sentiment. We therefore included in the algorithm steps to identify the Topic (through direct matching, since we took it as given) and any likely opinion Holders (see Section 2.2.1). Near each Holder we then identified a region in which sentiments would be considered; any sentiments outside such a region we take to be of undetermined origin and ignore (Section 2.2.2). We then defined several models for combining the sentiments expressed within a region (Section 2.2.3).</Paragraph>
      <Paragraph position="1">  We used BBN's named entity tagger IdentiFinder to identify potential holders of an opinion. We considered PERSON and ORGANIZATION as the only possible opinion holders. For sentences with more than one Holder, we chose the one closest to the Topic phrase, for simplicity. This is a very crude step. A more sophisticated approach would employ a parser to identify syntactic relationships between each Holder and all dependent expressions of sentiment.</Paragraph>
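A sketch of holder identification with spaCy's named entity recognizer used as an assumed stand-in for BBN's IdentiFinder (spaCy labels organizations ORG rather than ORGANIZATION); the closest-to-topic tie-break follows the description above.

import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this English model is installed

def find_holder(sentence, topic):
    """Return the PERSON/ORG entity closest to the topic phrase, or None."""
    doc = nlp(sentence)
    topic_pos = sentence.find(topic)
    candidates = [ent for ent in doc.ents if ent.label_ in ("PERSON", "ORG")]
    if topic_pos < 0 or not candidates:
        return None
    return min(candidates, key=lambda ent: abs(ent.start_char - topic_pos))

print(find_holder("Public officials throughout California have condemned "
                  "a U.S. Senate vote to exclude illegal aliens from the 1990 census.",
                  "census"))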
      <Paragraph position="2">  Lacking a parse of the sentence, we were faced with a dilemma: How large should a region be? We therefore defined the sentiment region in various ways (see Table 3) and experimented with their effectiveness, as reported in Section 3.</Paragraph>
      <Paragraph position="3"> Window1: full sentence
Window2: words between Holder and Topic
Window3: Window2 +- 2 words
Window4: Window2 to the end of the sentence
We built three models to assign a sentiment category to a given sentence, each combining the individual sentiments of sentiment-bearing words, as described above, in a different way. Model 0 simply considers the polarities of the sentiments, not the strengths:
Model 0: ∏ (signs in region)
The intuition here is something like &amp;quot;negatives cancel one another out&amp;quot;. Here the system assigns the same sentiment to both &amp;quot;the California Supreme Court agreed that the state's new term-limit law was constitutional&amp;quot; and &amp;quot;the California Supreme Court disagreed that the state's new term-limit law was unconstitutional&amp;quot;. For this model, we also included negation words such as not and never to reverse the sentiment polarity.</Paragraph>
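A sketch of the four region definitions listed above, assuming holder_i and topic_i are the token indices of the Holder and the Topic within the sentence (an illustrative encoding, not the authors' code).

def region(tokens, holder_i, topic_i, window):
    """Return the tokens that form the sentiment region for a given window type."""
    lo, hi = sorted((holder_i, topic_i))
    if window == 1:                          # Window1: full sentence
        return tokens
    if window == 2:                          # Window2: words between Holder and Topic
        return tokens[lo:hi + 1]
    if window == 3:                          # Window3: Window2 widened by 2 words on each side
        return tokens[max(lo - 2, 0):hi + 3]
    if window == 4:                          # Window4: Window2 extended to the end of the sentence
        return tokens[lo:]
    raise ValueError("window must be 1, 2, 3, or 4")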
      <Paragraph position="4"> Model 1 is the harmonic mean (average) of the sentiment strengths in the region:</Paragraph>
      <Paragraph position="6"> Here n(c) is the number of words in the region whose sentiment category is c. If a region contains more and stronger positive than negative words, the sentiment will be positive.</Paragraph>
      <Paragraph position="7"> Model 2 is the geometric mean:</Paragraph>
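A hedged sketch of the three combination models. The paper's exact formulas are not reproduced in this extract, so the per-category harmonic and geometric means below, and the comparison of their magnitudes, are one plausible reading of the text rather than the paper's equations; negation handling is omitted.

import math

def model0(strengths):
    """Model 0: product of the signs of the sentiment-bearing words in the region."""
    signs = [1 if s > 0 else -1 for s in strengths if s != 0]
    return "positive" if signs and math.prod(signs) > 0 else "negative"

def _split(strengths):
    pos = [s for s in strengths if s > 0]
    neg = [-s for s in strengths if s < 0]
    return pos, neg

def model1(strengths):
    """Model 1: compare harmonic means of positive vs. negative strengths."""
    hmean = lambda xs: len(xs) / sum(1.0 / x for x in xs) if xs else 0.0
    pos, neg = _split(strengths)
    return "positive" if hmean(pos) >= hmean(neg) else "negative"

def model2(strengths):
    """Model 2: compare geometric means of positive vs. negative strengths."""
    gmean = lambda xs: math.exp(sum(map(math.log, xs)) / len(xs)) if xs else 0.0
    pos, neg = _split(strengths)
    return "positive" if gmean(pos) >= gmean(neg) else "negative"

strengths = [0.4, -0.7, -0.2]   # per-word sentiment strengths inside one region
print(model0(strengths), model1(strengths), model2(strengths))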
      <Paragraph position="9"> The following are two example outputs.</Paragraph>
      <Paragraph position="10"> Public officials throughout California have condemned a U.S. Senate vote Thursday to exclude illegal aliens from the 1990 census, saying the action will shortchange California in Congress and possibly deprive the state of millions of dollars of federal aid for medical emergency services and other programs for poor people.</Paragraph>
      <Paragraph position="12"> SENTIMENT_POLARITY: negative
For that reason and others, the Constitutional Convention unanimously rejected term limits and the First Congress soundly defeated two subsequent term-limit proposals.</Paragraph>
      <Paragraph position="14"> SENTIMENT_POLARITY: negative</Paragraph>
    </Section>
  </Section>
</Paper>