<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1026">
  <Title>Identifying and Analyzing Judgment Opinions</Title>
  <Section position="3" start_page="200" end_page="204" type="metho">
    <SectionTitle>
2 Analysis of Judgment Opinions
</SectionTitle>
    <Paragraph position="0"> In this section, we first describe our methodology for detecting opinion bearing words and for identifying their valence, which is described in Section 2.1. Then, in Section 2.2, we describe our algorithm for identifying opinion holders. In Section 3, we show how to use our methodology for detecting opinions in short emails.</Paragraph>
    <Section position="1" start_page="200" end_page="201" type="sub_section">
      <SectionTitle>
2.1 Detecting Opinion-Bearing Words
and Identifying Valence
</SectionTitle>
      <Paragraph position="0"> We introduce an algorithm to classify a word as being positive, negative, or neutral classes. This classifier can be used for any set of words of interest and the resulting words with their valence tags can help in developing new applications such as a public opinion monitoring system. We define an opinion-bearing word as a word that carries a positive or negative sentiment directly such as &amp;quot;good&amp;quot;, &amp;quot;bad&amp;quot;, &amp;quot;foolish&amp;quot;, &amp;quot;virtuous&amp;quot;, etc. In other words, this is the smallest unit of opinion that can thereafter be used as a clue for sentence-level or text-level opinion detection.</Paragraph>
      <Paragraph position="1"> We treat word sentiment classification into Positive, Negative, and Neutral as a three-way classification problem instead of a two-way classification problem of Positive and Negative. By adding the third class, Neutral, we can prevent the classifier from assigning either positive or negative sentiment to weak opinion-bearing words. For example, the word &amp;quot;central&amp;quot; that Hatzivassiloglou and McKeown (1997) included as a positive adjective is not classified as positive in our system. Instead  we mark it as &amp;quot;neutral&amp;quot; since it is a weak clue for an opinion. If an unknown word has a strong relationship with the neutral class, we can therefore classify it as neutral even if it has some small connotation of Positive or Negative as well.</Paragraph>
      <Paragraph position="2"> Approach: We built a word sentiment classifier using WordNet and three sets of positive, negative, and neutral words tagged by hand. Our insight is that synonyms of positive words tend to have positive sentiment. We expanded those manually selected seed words of each sentiment class by collecting synonyms from WordNet. However, we cannot simply assume that all the synonyms of positive words are positive since most words could have synonym relationships with all three sentiment classes. This requires us to calculate the closeness of a given word to each category and determine the most probable class. The following formula describes our model for determining the category of a word:</Paragraph>
      <Paragraph position="4"> feature of class c which is also a member of the synonym set of the given word w. count(f k ,synset(w)) is the total number of occurrences of the word feature f k in the synonym set of word w. In section 4.1, we describe our manually annotated dataset which we used for seed words and for our evaluation.</Paragraph>
    </Section>
    <Section position="2" start_page="201" end_page="201" type="sub_section">
      <SectionTitle>
2.2 Identifying Opinion Holders
</SectionTitle>
      <Paragraph position="0"> Despite successes in identifying opinion expressions and subjective words/phrases (See Section 1), there has been less achievement on the factors closely related to subjectivity and polarity, such as identifying the opinion holder. However, our research indicates that without this information, it is difficult, if not impossible, to define 'opinion' accurately enough to obtain reasonable inter-annotator agreement. Since these factors co-occur and mutually reinforce each other, the question &amp;quot;Who is the holder of this opinion?&amp;quot; is as important as &amp;quot;Is this an opinion?&amp;quot; or &amp;quot;What kind of opinion is expressed here?&amp;quot;.</Paragraph>
      <Paragraph position="1"> In this section, we describe the automated identification for opinion holders. We define an opinion holder as an entity (person, organization, country, or special group of people) who expresses explicitly or implicitly the opinion contained in the sentence.</Paragraph>
      <Paragraph position="2"> Previous work that is related to opinion holder identification is (Bethard et al. 2004) who identify opinion propositions and holders. However, their opinion is restricted to propositional opinion and mostly to verbs. Another related work is (Choi et al.</Paragraph>
      <Paragraph position="3"> 2005) who use the MPQA corpus  to learn patterns of opinion sources using a graphical model and extraction pattern learning. However, they have a different task definition from ours. They define the task as identifying opinion sources (holders) given a sentence, whereas we define it as identifying opinion sources given an opinion expression in a sentence. We discussed their work in Section 1.</Paragraph>
      <Paragraph position="4"> Data: As training data, we used the MPQA corpus (Wilson and Wiebe, 2003), which contains news articles manually annotated by 5 trained annotators. They annotated 10657 sentences from 535 documents, in four different aspects: agent, expressive-subjectivity, on, and inside. Expressivesubjectivity marks words and phrases that indirectly express a private state that is defined as a term for opinions, evaluations, emotions, and speculations. The on annotation is used to mark speech events and direct expressions of private states. As for the holder, we use the agent of the selected private states or speech events. While there are many possible ways to define what opinion means, intuitively, given an opinion, it is clear what the opinion holder means. Table 1 shows an example of the annotation. In this example, we consider the expression &amp;quot;the U.S. government 'is the source of evil' in the world&amp;quot; with an expres- null dan, responding to Bush's 'axis of evil' remark, said the U.S. government 'is the source of evil' in the world.</Paragraph>
      <Paragraph position="5"> Expressive subjectivity the U.S. government 'is the source of evil' in the world</Paragraph>
    </Section>
    <Section position="3" start_page="201" end_page="204" type="sub_section">
      <SectionTitle>
Strength Extreme
Source Iraqi Vice President Taha Yassin Ramadan
</SectionTitle>
      <Paragraph position="0"> sive-subjectivity tag as an opinion of the holder &amp;quot;Iraqi Vice President Taha Yassin Ramadan&amp;quot;.</Paragraph>
      <Paragraph position="1"> Approach: Since more than one opinion may be expressed in a sentence, we have to find an opinion holder for each opinion expression. For example, in a sentence &amp;quot;A thinks B's criticism of T is wrong&amp;quot;, B is the holder of &amp;quot;the criticism of T&amp;quot;, whereas A is the person who has an opinion that B's criticism is wrong. Therefore, we define our task as finding an opinion holder, given an opinion expression. Our earlier work (ref suppressed) focused on identifying opinion expressions within text. We employ that system in tandem with the one described here.</Paragraph>
      <Paragraph position="2"> To learn opinion holders automatically, we use a Maximum Entropy model. Maximum Entropy models implement the intuition that the best model is the one that is consistent with the set of constraints imposed by the evidence but otherwise is as uniform as possible (Berger et al. 1996). There are two ways to model the problem with ME: classification and ranking. Classification allocates each holder candidate to one of a set of predefined classes while ranking selects a single candidate as answer. This means that classification modeling  can select many candidates as answers as long as they are marked as true, and does not select any candidate if every one is marked as false. In contrast, ranking always selects the most probable candidate as an answer, which suits our task better. Our earlier experiments showed poor performance with classification modeling, an experience also reported for Question Answering (Ravichandran et al. 2003).</Paragraph>
      <Paragraph position="3"> We modeled the problem to choose the most probable candidate that maximizes a given conditional probability distribution, given a set of holder  l is a model parameter indicating the weight of its feature function.</Paragraph>
      <Paragraph position="4">  In our task, there are two classes: holder and non-holder. Figure 1 illustrates our holder identification system. First, the system generates all possible holder candidates, given a sentence and an opinion expression &lt;E&gt;. After parsing the sentence, it extracts features such as the syntactic path information between each candidate &lt;H&gt; and the expression &lt;E&gt; and a distance between &lt;H&gt; and &lt;E&gt;. Then it ranks holder candidates according to the score obtained by the ME ranking model. Finally the system picks the candidate with the highest score. Below, we describe in turn how to select holder candidates and how to select features for the training model.</Paragraph>
      <Paragraph position="5"> Holder Candidate Selection: Intuitively, one would expect most opinion holders to be named entities (PERSON or ORGANIZATION)  . However, other common noun phrases can often be opinion holders, such as &amp;quot;the leader&amp;quot;, &amp;quot;three nations&amp;quot;, and &amp;quot;the Arab and Islamic world&amp;quot;. Sometimes, pronouns like he, she, and they that refer to a PERSON, or it that refers to an ORGANIZATION or country, can be an opinion holder. In our study, we consider all noun phrases, including common noun phrases, named entities, and pronouns, as holder candidates. Feature Selection: Our hypothesis is that there exists a structural relation between a holder &lt;H&gt; and an expression &lt;E&gt; that can help to identify opinion holders. This relation may be represented by lexical-level patterns between &lt;H&gt; and &lt;E&gt;, but anchoring on surface words might run into the data sparseness problem. For example, if we see the lexical pattern &amp;quot;&lt;H&gt; recently criticized &lt;E&gt;&amp;quot; in the training data, it is impossible to match the expression &amp;quot;&lt;H&gt; yesterday condemned &lt;E&gt;&amp;quot;. These, however, have the same syntactic features in our  We use BBN's named entity tagger IdentiFinder to collect named entities.</Paragraph>
      <Paragraph position="6">  model. We therefore selected structural features from a deep parse, using the Charniak parser.</Paragraph>
      <Paragraph position="7"> After parsing the sentence, we search for the lowest common parent node of the words in &lt;H&gt; and &lt;E&gt; respectively (&lt;H&gt; and &lt;E&gt; are mostly expressed with multiple words). A lowest common parent node is a non-terminal node in a parse tree that covers all the words in &lt;H&gt; and &lt;E&gt;. Figure 2 shows a parsed example of a sentence with the holder &amp;quot;China's official Xinhua news agency&amp;quot; and the opinion expression &amp;quot;accusing&amp;quot;. In this example, the lowest common parent of words in &lt;H&gt; is the bold NP and the lowest common parent of &lt;E&gt; is the bold VBG. We name these nodes Hhead and Ehead respectively. After finding these nodes, we label them by subscript (e.g., NP</Paragraph>
      <Paragraph position="9"> indicate they cover &lt;H&gt; and &lt;E&gt;. In order to see how Hhead and Ehead are related to each other in the parse tree, we define another node, HEhead, which covers both Hhead and Ehead. In the example, HEhead is S at the top of the parse tree since it</Paragraph>
      <Paragraph position="11"> To express tree structure for ME training, we extract path information between &lt;H&gt; and &lt;E&gt;. In the example, the complete path from Hhead to Ehead is &amp;quot;&lt;H&gt; NP S VP S S VP VBG &lt;E&gt;&amp;quot;. However, representing each complete path as a single feature produces so many different paths with low frequencies that the ME system would learn poorly. Therefore, we split the path into three parts: HEpath, Hpath an Epath. HEpath is defined as a path from HEhead to its left and right child nodes that are also parents of Hhead and Ehead.</Paragraph>
      <Paragraph position="12"> Hpath is a path from Hhead and one of its ancestor nodes that is a child of HEhead. Similarly, Epath is defined as a path from Ehead to one of its ancestors that is also a child of HEhead. With this splitting, the system can work when any of HEpath, Hpath or Epath appeared in the training data, even if the entire path from &lt;H&gt; to &lt;E&gt; is unseen. Table 2 summarizes these concepts with two holder candidate examples in the parse tree of Figure 2.</Paragraph>
      <Paragraph position="13"> We also include two non-structural features. The first is the type of the candidate, with values NP, PERSON, ORGANIZATION, and LOCATION. The second feature is the distance between &lt;H&gt; and &lt;E&gt;, counted in parse tree words. This is motivated by the intuition that holder candidates tend to lie closer to their opinion expression. All features are listed in Table 3. We describe the performance of the system in Section 4.</Paragraph>
    </Section>
    <Section position="4" start_page="204" end_page="204" type="sub_section">
      <SectionTitle>
3 Emails
</SectionTitle>
      <Paragraph position="0"> In this section, we describe a German email analysis system into which we included the opinion-bearing words from Section 2.1 to detect opinions expressed in emails. This system is part of a collaboration with the EU-funded project QUALEG (Quality of Service and Legitimacy in eGovernment) which aims at enabling local governments to manage their policies in a transparent and trustable way  . For this purpose, local governments should be able to measure the performance of the services they offer, by assessing the satisfaction of its citizens. This need makes a system that can monitor and analyze citizens' emails essential. The goal of our system is to classify emails as neutral or as bearing a positive or negative opinion.</Paragraph>
      <Paragraph position="1"> To generate opinion bearing words, we ran the word sentiment classifier from Section 2.1 on 8011 verbs to classify them into 807 positive, 785 negative, and 6149 neutral. For 19748 adjectives, the system classified them into 3254 positive, 303 negative, and 16191 neutral. Since our opinion-bearing words are in English and our target system is in German, we also applied a statistical word alignment technique, GIZA++  (Och and Ney 2000). Running it on version two of the European Parliament corpus, we obtained statistics for 678,340 German-English word pairs and 577,362 English-German word pairs. Obtaining these two lists of translation pairs allows us to convert English words to German, and German to English, without a full document translation system. To utilize our English opinion-bearing words in a German opinion analysis system, we developed two models,  http://www.fjoch.com/GIZA++.html outlined in Table 4, each of which is triggered at different points in the system.</Paragraph>
      <Paragraph position="2"> In both models, however, we still need to decide how to apply opinion-bearing words as clues to determine the sentiment of a whole email. Our previous work on sentence level sentiment classification (ref suppressed) shows that the presence of any negative words is a reasonable indication of a negative sentence. Since our emails are mostly short (the average number of words in each email is 19.2) and we avoided collecting weak negative opinion clue words, we hypothesize that our previous sentence sentiment classification study works on the email sentiment analysis. This implies that an email is negative if it contains more than certain number of strong negative words. We tune this parameter using our training data. Conversely, if an email contains mostly positive opinion-bearing words, we classify it as a positive email. We assign neutral if an email does not contain any strong opinion-bearing words.</Paragraph>
      <Paragraph position="3"> Manually annotated email data was provided by our joint research site. This data contains 71 emails from citizens regarding a German festival. 26 of them contained negative complaints, for example, the lack of parking space, and 24 of them were positive with complimentary comments to the organization. The rest of them were marked as &amp;quot;questions&amp;quot; such as how to buy festival tickets, &amp;quot;only text&amp;quot; of simple comments, &amp;quot;fuzzy&amp;quot;, and &amp;quot;difficult&amp;quot;. So, we carried system experiments on positive and negative emails with precision and recall. We report system results in Section 4.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="204" end_page="206" type="metho">
    <SectionTitle>
4 Experiment Results
</SectionTitle>
    <Paragraph position="0"> In this section, we evaluate the three systems described in Sections 2 and 3: detecting opinion-bearing words and identifying valence, identifying opinion holders, and the German email opinion analysis system.</Paragraph>
    <Section position="1" start_page="204" end_page="205" type="sub_section">
      <SectionTitle>
4.1 Detecting Opinion-bearing Words
</SectionTitle>
      <Paragraph position="0"> We described a word classification system to detect opinion-bearing words in Section 2.1. To examine its effectiveness, we annotated 2011 verbs and 1860 adjectives, which served as a gold standard null  . These words were randomly selected from a  Although nouns and adverbs may also be opinion-bearing, we focus only on verbs and adjectives for this study.  collection of 8011 English verbs and 19748 English adjectives. We use training data as seed words for the WordNet expansion part of our algorithm (described in Section 2.1). Table 5 shows the distribution of each semantic class. In both verb and adjective annotation, neutral class has much more words than the positive or negative classes. We measured the precision, recall, and F-score of our system using 10-fold cross validation. Table 6 shows the results with 95% confidence bounds. Overall (combining positive, neutral and negative), our system achieved 77.7% +- 1.2% accuracy on verbs and 69.1% +- 2.1% accuracy on adjectives. The system has very high precision in the neutral category for both verbs (97.2%) and adjectives (89.5%), which we interpret to mean that our system is really good at filtering non-opinion bearing words. Recall is high in all cases but precision varies; very high for neutral and relatively high for negative but low for positive.</Paragraph>
    </Section>
    <Section position="2" start_page="205" end_page="206" type="sub_section">
      <SectionTitle>
4.2 Opinion Holder Identification
</SectionTitle>
      <Paragraph position="0"> We conducted experiments on 2822 &lt;sentence; opinion expression; holder&gt; triples and divided the data set into 10 &lt;training; test&gt; sets for cross validation. For evaluation, we consider to match either fully or partially with the holder marked in the test data. The holder matches fully if it is a single entity (e.g., &amp;quot;Bush&amp;quot;). The holder matches partially when it is part of the multiple entities that make up the marked holder. For example, given a marked holder &amp;quot;Michel Sidibe, Director of the Country and Regional Support Department of UNAIDS&amp;quot;, we consider both &amp;quot;Michel Sidibe&amp;quot; and &amp;quot;Director of the Country and Regional Support Department of UNAIDS&amp;quot; as acceptable answers.</Paragraph>
      <Paragraph position="1"> Our experiments consist of two parts based on the candidate selection method. Besides the selection method we described in Section 2.2, we also conducted a separate experiment by excluding pronouns from the candidate list. With the second method, the system always produces a non-pronoun holder as an answer. This selection method is useful in some Information Extraction application that only cares non-pronoun holders.</Paragraph>
      <Paragraph position="2"> We report accuracy (the percentage of correct answers the system found in the test set) to evaluate our system. We also report how many correct answers were found within the top2 and top3 system answers. Tables 7 and 8 show the system accuracy with and without considering pronouns as alias candidates, respectively. Table 8 mostly shows lower accuracies than Table 7 because test data often has only a non-pronoun entity as a holder and the system picks a pronoun as its answer. Even if the pronoun refers the same entity marked in the test data, the evaluation system counts it as wrong because it does not match the hand annotated holder.</Paragraph>
      <Paragraph position="3"> To evaluate the effectiveness of our system, we set the baseline as a system choosing the closest candidate to the expression as a holder without the Maximum Entropy decision. The baseline system had an accuracy of only 21.3% for candidate selection over all noun phrases and 23.2% for candidate selection excluding pronouns.</Paragraph>
      <Paragraph position="4"> The results show that detecting opinion holders is a hard problem, but adopting syntactic features (F2, F3, and F4) helps to improve the system. A promising avenue of future work is to investigate the use of semantic features to eliminate noun  noun phrases as candidates)  phrases such as &amp;quot;cheap energy subsidies&amp;quot; or &amp;quot;possible strikes&amp;quot; from the candidate set before we run our ME model, since they are less likely to be an opinion holder than noun phrases like &amp;quot;three nations&amp;quot; or &amp;quot;Palestine people.&amp;quot;</Paragraph>
    </Section>
    <Section position="3" start_page="206" end_page="206" type="sub_section">
      <SectionTitle>
4.3 German Emails
</SectionTitle>
      <Paragraph position="0"> For our experiment, we performed 7-fold cross validation on a set of 71 emails. Table 9 shows the average precision, recall, and F-score. Results show that our system identifies negative emails (complaints) better than praise. When we chose a system parameter for the focus, we intended to find negative emails rather than positive emails because officials who receive these emails need to act to solve problems when people complain but they have less need to react to compliments. By highlighting high recall of negative emails, we may misclassify a neutral email as negative but there is also less chance to neglect complaints.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>