<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1025"> <Title>Determining Term Subjectivity and Term Orientation for Opinion Mining</Title> <Section position="4" start_page="193" end_page="195" type="intro"> <SectionTitle> 2 Related work </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="193" end_page="194" type="sub_section"> <SectionTitle> 2.1 Determining term orientation </SectionTitle> <Paragraph position="0"> Most previous work dealing with the properties of terms from an opinion mining perspective has focused on determining term orientation.</Paragraph> <Paragraph position="1"> Hatzivassiloglou and McKeown (1997) attempt to predict the orientation of subjective adjectives by analysing pairs of adjectives (conjoined by and, or, but, either-or, or neither-nor) extracted from a large unlabelled document set.</Paragraph> <Paragraph position="2"> The underlying intuition is that the act of conjoining adjectives is subject to linguistic constraints on the orientation of the adjectives involved; e.g.</Paragraph> <Paragraph position="3"> and usually conjoins adjectives of equal orientation, while but conjoins adjectives of opposite orientation. The authors generate a graph where terms are nodes connected by &quot;equal-orientation&quot; or &quot;opposite-orientation&quot; edges, depending on the conjunctions extracted from the document set. A clustering algorithm then partitions the graph into a Positive cluster and a Negative cluster, based on a relation of similarity induced by the edges.</Paragraph> <Paragraph position="4"> Turney and Littman (2003) determine term orientation by bootstrapping from two small sets of subjective &quot;seed&quot; terms (with the seed set for Positive containing terms such as good and nice, and the seed set for Negative containing terms such as bad and nasty). 
Their method is based on computing the pointwise mutual information (PMI) of the target term t with each seed term ti as a measure of their semantic association.</Paragraph> <Paragraph position="5"> Given a target term t, its orientation value O(t) (where a positive value means positive orientation, and a higher absolute value means stronger orientation) is given by the sum of the weights of its semantic association with the seed positive terms minus the sum of the weights of its semantic association with the seed negative terms. For computing PMI, term frequencies and co-occurrence frequencies are measured by querying a document set by means of the AltaVista search engine with a &quot;t&quot; query, a &quot;ti&quot; query, and a &quot;t NEAR ti&quot; query, and using the number of matching documents returned by the search engine as estimates of the probabilities needed for the computation of PMI.</Paragraph> <Paragraph position="6"> Kamps et al. (2004) consider instead the graph defined on adjectives by the WordNet synonymy relation, and determine the orientation of a target adjective t contained in the graph by comparing the lengths of (i) the shortest path between t and the seed term good, and (ii) the shortest path between t and the seed term bad: if the former is shorter than the latter, then t is deemed to be Positive, otherwise it is deemed to be Negative.</Paragraph> <Paragraph position="7"> Takamura et al. (2005) determine term orientation (for Japanese) according to a &quot;spin model&quot;, i.e. a physical model of a set of electrons each endowed with one of two possible spin directions, and where electrons propagate their spin direction to neighbouring electrons until the system reaches a stable configuration. The authors equate terms with electrons and term orientation with spin direction. 
They build a neighbourhood matrix connecting each pair of terms if one appears in the gloss of the other, and iteratively apply the spin model on the matrix until a &quot;minimum energy&quot; configuration is reached. The orientation assigned to a term then corresponds to the spin direction assigned to the corresponding electron.</Paragraph> <Paragraph position="8"> The system of Kim and Hovy (2004) tackles orientation detection by attributing, to each term, a positivity score and a negativity score; interestingly, terms may thus be deemed to have both a positive and a negative correlation, maybe with different degrees, and some terms may be deemed to carry a stronger positive (or negative) orientation than others. Their system starts from a set of positive and negative seed terms, and expands the positive (resp. negative) seed set by adding to it the synonyms of positive (resp. negative) seed terms and the antonyms of negative (resp. positive) seed terms. The system then classifies a target term t into either Positive or Negative by means of two alternative learning-free methods based on the probabilities that synonyms of t also appear in the respective expanded seed sets. A problem with this method is that it can classify only terms that share some synonyms with the expanded seed sets.</Paragraph> <Paragraph position="9"> Kim and Hovy also report an evaluation of human inter-coder agreement. We compare this evaluation with our results in Section 5.</Paragraph> <Paragraph position="10"> The approach we have proposed for determining term orientation (Esuli and Sebastiani, 2005) is described in more detail in Section 3, since it will be extensively used in this paper.</Paragraph> <Paragraph position="11"> All these works evaluate the performance of the proposed algorithms by checking them against precompiled sets of Positive and Negative terms, i.e. checking how good the algorithms are at classifying a term known to be subjective into either Positive or Negative. 
When tested on the same benchmarks, the methods of (Esuli and Sebastiani, 2005; Turney and Littman, 2003) have performed with comparable accuracies (however, the method of (Esuli and Sebastiani, 2005) is much more efficient than the one of (Turney and Littman, 2003)), and have outperformed the method of (Hatzivassiloglou and McKeown, 1997) by a wide margin and the one by (Kamps et al., 2004) by a very wide margin. The method described in (Hatzivassiloglou and McKeown, 1997) is also limited by the fact that it can only decide the orientation of adjectives, while the method of (Kamps et al., 2004) is further limited in that it can only work on adjectives that are present in WordNet. The methods of (Kim and Hovy, 2004; Takamura et al., 2005) are instead difficult to compare with the other ones since they were not evaluated on publicly available datasets.</Paragraph> </Section> <Section position="2" start_page="194" end_page="195" type="sub_section"> <SectionTitle> 2.2 Determining term subjectivity </SectionTitle> <Paragraph position="0"> Riloff et al. (2003) develop a method to determine whether a term has a Subjective or an Objective connotation, based on bootstrapping algorithms.</Paragraph> <Paragraph position="1"> The method identifies patterns for the extraction of subjective nouns from text, bootstrapping from a seed set of 20 terms that the authors judge to be strongly subjective and have found to have high frequency in the text collection from which the subjective nouns must be extracted. The results of this method are not easy to compare with the ones we present in this paper because of the different evaluation methodologies. While we adopt the evaluation methodology used in all of the papers reviewed so far (i.e. checking how good our system is at replicating an existing, independently motivated lexical resource), the authors do not test their method on an independently identified set of labelled terms, but on the set of terms that the algorithm itself extracts. 
This evaluation methodology only allows precision to be tested, and not accuracy tout court, since no quantification can be made of false negatives (i.e. the subjective terms that the algorithm should have spotted but has not spotted). In Section 5 this will prevent us from drawing comparisons between this method and our own.</Paragraph> <Paragraph position="2"> Baroni and Vegnaduzzo (2004) apply the PMI method, first used by Turney and Littman (2003) to determine term orientation, to determine term subjectivity. Their method uses a small set Ss of 35 adjectives, marked as subjective by human judges, to assign a subjectivity score to each adjective to be classified. Therefore, their method, unlike our own, does not classify terms (i.e. take firm classification decisions), but ranks them according to a subjectivity score, on which they evaluate precision at various levels of recall.</Paragraph> </Section> </Section> </Paper>