<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1023"> <Title>Predicting the Semantic Orientation of Adjectives</Title> <Section position="4" start_page="174" end_page="175" type="metho"> <SectionTitle> 3 Data Collection </SectionTitle> <Paragraph position="0"> For our experiments, we use the 21-million-word 1987 Wall Street Journal corpus (footnote 4), automatically annotated with part-of-speech tags using the PARTS tagger (Church, 1988).</Paragraph> <Paragraph position="1"> In order to verify our hypothesis about the orientations of conjoined adjectives, and also to train and evaluate our subsequent algorithms, we need a set of adjectives with predetermined orientation labels.</Paragraph> <Paragraph position="2"> Footnote 3: Certain words inflected with negative affixes (such as in- or un-) tend to be mostly negative, but this rule applies only to a fraction of the negative words. Furthermore, there are words so inflected which have positive orientation, e.g., independent and unbiased. Footnote 4: Available from the ACL Data Collection Initiative as CD ROM 1.</Paragraph> <Paragraph position="3"> Figure 1: Randomly selected adjectives with positive and negative orientations. Positive: adequate central clever famous intelligent remarkable reputed sensitive slender thriving. Negative: contagious drunken ignorant lanky</Paragraph> <Paragraph position="4"> We constructed this set by taking all adjectives appearing in our corpus 20 times or more, then removing adjectives that have no orientation. These are typically members of groups of complementary, qualitative terms (Lyons, 1977), e.g., domestic or medical.</Paragraph> <Paragraph position="5"> We then assigned an orientation label (either + or -) to each adjective, using an evaluative approach. The criterion was whether the use of this adjective ascribes in general a positive or negative quality to the modified item, making it better or worse than a similar unmodified item. 
For several adjectives we were unable to reach a unique label out of context, and we removed them from consideration; for example, cheap is positive if it is used as a synonym of inexpensive, but negative if it implies inferior quality. The operations of selecting adjectives and assigning labels were performed before testing our conjunction hypothesis or implementing any other algorithms, to avoid any influence on our labels. The final set contained 1,336 adjectives (657 positive and 679 negative terms). Figure 1 shows randomly selected terms from this set.</Paragraph> <Paragraph position="6"> To further validate our set of labeled adjectives, we subsequently asked four people to independently label a randomly drawn sample of 500 of these adjectives. They agreed with us that the positive/negative concept applies to 89.15% of these adjectives on average. For the adjectives where a positive or negative label was assigned by both us and the independent evaluators, the average agreement on the label was 97.38%. The average inter-reviewer agreement on labeled adjectives was 96.97%. These results are extremely significant statistically and compare favorably with validation studies performed for other tasks (e.g., sense disambiguation) in the past. They show that positive and negative orientation are objective properties that can be reliably determined by humans.</Paragraph> <Paragraph position="7"> To extract conjunctions between adjectives, we used a two-level finite-state grammar, which covers complex modification patterns and noun-adjective apposition. Running this parser on the 21-million-word corpus, we collected 13,426 conjunctions of adjectives, expanding to a total of 15,431 conjoined adjective pairs. 
After morphological transformations, the remaining 15,048 conjunction tokens involve 9,296 distinct pairs of conjoined adjectives (types).</Paragraph> <Paragraph position="8"> Each conjunction token is classified by the parser according to three variables: the conjunction used (and, or, but, either-or, or neither-nor), the type of modification (attributive, predicative, appositive, resultative), and the number of the modified noun (singular or plural). [Table caption fragment: ... or more extreme results would have been obtained if same- and different-orientation conjunction types were actually equally distributed.]</Paragraph> </Section> <Section position="5" start_page="175" end_page="175" type="metho"> <SectionTitle> 4 Validation of the Conjunction </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="175" end_page="175" type="sub_section"> <SectionTitle> Hypothesis </SectionTitle> <Paragraph position="0"> Using the three attributes extracted by the parser, we constructed a cross-classification of the conjunctions in a three-way table. We counted types and tokens of each conjoined pair that had both members in the set of pre-selected labeled adjectives discussed above; 2,748 (29.56%) of all conjoined pairs (types) and 4,024 (26.74%) of all conjunction occurrences (tokens) met this criterion. We augmented this table with marginal totals, arriving at 90 categories, each of which represents a triplet of attribute values, possibly with one or more &quot;don't care&quot; elements.</Paragraph> <Paragraph position="1"> We then measured the percentage of conjunctions in each category with adjectives of same or different orientations. 
Under the null hypothesis of equal proportions of adjective pairs (types) of same and different orientation in a given category, the number of same- or different-orientation pairs follows a binomial distribution with p = 0.5 (Conover, 1980).</Paragraph> <Paragraph position="2"> We show in Table 1 the results for several representative categories, and summarize all results below: * Our conjunction hypothesis is validated overall and for almost all individual cases. The results are extremely significant statistically, except for a few cases where the sample is small.</Paragraph> <Paragraph position="3"> * Aside from the use of but with adjectives of different orientations, there are, rather surprisingly, only small differences in the behavior of conjunctions between linguistic environments (as represented by the three attributes). There are a few exceptions, e.g., appositive and-conjunctions modifying plural nouns are evenly split between same and different orientation. But in these exceptional cases the sample is very small, and the observed behavior may be due to chance.</Paragraph> <Paragraph position="4"> * Further analysis of different-orientation pairs in conjunctions other than but shows that conjoined antonyms are far more frequent than expected by chance, in agreement with Justeson and Katz (1991).</Paragraph> </Section> </Section> <Section position="6" start_page="175" end_page="176" type="metho"> <SectionTitle> 5 Prediction of Link Type </SectionTitle> <Paragraph position="0"> The analysis in the previous section suggests a baseline method for classifying links between adjectives: since 77.84% of all links from conjunctions indicate same orientation, we can achieve this level of performance by always guessing that a link is of the same-orientation type. However, we can improve performance by noting that conjunctions using but exhibit the opposite pattern, usually involving adjectives of different orientations. 
Thus, a revised but still simple rule predicts a different-orientation link if the two adjectives have been seen in a but conjunction, and a same-orientation link otherwise, assuming the two adjectives were seen connected by at least one conjunction.</Paragraph> <Paragraph position="1"> Morphological relationships between adjectives also play a role. Adjectives related in form (e.g., adequate-inadequate or thoughtful-thoughtless) almost always have different semantic orientations. We implemented a morphological analyzer which matches adjectives related in this manner. This process is highly accurate, but unfortunately does not apply to many of the possible pairs: in our set of 1,336 labeled adjectives (891,780 possible pairs), 102 pairs are morphologically related; among them, 99 are of different orientation, yielding 97.06% accuracy for the morphology method. This information is orthogonal to that extracted from conjunctions: only 12 of the 102 morphologically related pairs have been observed in conjunctions in our corpus. Thus, we add to the predictions made from conjunctions the different-orientation links suggested by morphological relationships.</Paragraph> <Paragraph position="2"> We improve the accuracy of classifying links derived from conjunctions as same or different orientation with a log-linear regression model (Santner and Duffy, 1989), exploiting the differences between the various conjunction categories. This is a generalized linear model (McCullagh and Nelder, 1989) with a linear predictor $\eta = \mathbf{w}^{T}\mathbf{x}$, where $\mathbf{x}$ is the vector of the observed counts in the various conjunction categories for the particular adjective pair we try to classify and $\mathbf{w}$ is a vector of weights to be learned during training. 
The response y is non-linearly related to $\eta$ through the inverse logit function, $y = \frac{e^{\eta}}{1 + e^{\eta}}$.</Paragraph> <Paragraph position="4"> Note that $y \in (0, 1)$, with each of these endpoints associated with one of the possible outcomes.</Paragraph> <Paragraph position="5"> We have 90 possible predictor variables, 42 of which are linearly independent. Since using all 42 independent predictors invites overfitting (Duda and Hart, 1973), we have investigated subsets of the full log-linear model for our data using the method of iterative stepwise refinement: starting with an initial model, variables are added or dropped if their contribution to the reduction or increase of the residual deviance compares favorably to the resulting loss or gain of residual degrees of freedom. This process led to the selection of nine predictor variables.</Paragraph> <Paragraph position="6"> We evaluated the three prediction models discussed above with and without the secondary source of morphology relations. For the log-linear model, we repeatedly partitioned our data into equally sized training and testing sets, estimated the weights on the training set, and scored the model's performance on the testing set, averaging the resulting scores (footnote 5). Table 2 shows the results of these analyses. Footnote 5: When morphology is to be used as a supplementary predictor, we remove the morphologically related pairs from the training and testing sets.</Paragraph> <Paragraph position="7"> Although the log-linear model offers only a small improvement in pair classification over the simpler but prediction rule, it confers the important advantage of rating each prediction between 0 and 1. We make extensive use of this in the next phase of our algorithm. 
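As a minimal sketch of the log-linear scoring step described above, the following Python code forms the linear predictor as a weighted sum of conjunction counts and passes it through the inverse logit. The function names, the two-category layout, and the weight values are hypothetical illustrations, not the fitted model.

```python
import math

def inverse_logit(eta):
    """Map a linear predictor eta to a value between 0 and 1."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # this branch avoids overflow for very negative eta
    return z / (1.0 + z)

def predict_same_orientation(weights, counts):
    """Return y for one adjective pair.

    weights: hypothetical learned weight for each conjunction category
    counts:  observed conjunction counts for the pair, in the same order
    """
    eta = sum(w * x for w, x in zip(weights, counts))
    return inverse_logit(eta)

# Hypothetical layout: category 0 counts "and" conjunctions, category 1 counts "but".
weights = [0.8, -1.5]
y_and = predict_same_orientation(weights, [3, 0])  # pair seen three times with "and"
y_but = predict_same_orientation(weights, [0, 2])  # pair seen twice with "but"
```

With these invented weights, y_and lies above 0.5 and y_but below it, which matches the convention that values near 1 indicate a same-orientation link.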
</Paragraph> </Section> <Section position="7" start_page="176" end_page="177" type="metho"> <SectionTitle> 6 Finding Groups of Same-Oriented Adjectives </SectionTitle> <Paragraph position="0"> The third phase of our method assigns the adjectives to groups, placing adjectives of the same (but unknown) orientation in the same group. Each pair of adjectives has an associated dissimilarity value between 0 and 1; adjectives connected by same-orientation links have low dissimilarities, and conversely, different-orientation links result in high dissimilarities. Adjective pairs with no connecting links are assigned the neutral dissimilarity 0.5.</Paragraph> <Paragraph position="1"> The baseline and but methods make qualitative distinctions only (i.e., same-orientation, different-orientation, or unknown); for them, we define dissimilarity for same-orientation links as one minus the probability that such a classification is correct and dissimilarity for different-orientation links as the probability that such a classification is correct. These probabilities are estimated from separate training data. Note that for these prediction models, dissimilarities are identical for similarly classified links.</Paragraph> <Paragraph position="2"> The log-linear model, on the other hand, offers an estimate of how good each prediction is, since it produces a value y between 0 and 1. We construct the model so that 1 corresponds to same-orientation, and define dissimilarity as one minus the produced value.</Paragraph> <Paragraph position="3"> Same and different-orientation links between adjectives form a graph. To partition the graph nodes into subsets of the same orientation, we employ an iterative optimization procedure on each connected component, based on the exchange method, a non-hierarchical clustering algorithm (Späth, 1985). 
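The dissimilarity values defined earlier in this section can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the accuracy figures passed in are invented rather than estimated from training data.

```python
NEUTRAL = 0.5  # dissimilarity for pairs with no connecting link

def qualitative_dissimilarity(link, p_same_correct, p_diff_correct):
    """Dissimilarity for the baseline and but rules.

    link is "same", "different", or None when the pair was never linked.
    The two probabilities are assumed to come from separate training data.
    """
    if link == "same":
        return 1.0 - p_same_correct
    if link == "different":
        return p_diff_correct
    return NEUTRAL

def loglinear_dissimilarity(y):
    """Dissimilarity for the log-linear model, where y near 1 means same orientation."""
    return 1.0 - y

# Hypothetical accuracies, chosen only to illustrate the mapping.
d_same = qualitative_dissimilarity("same", 0.80, 0.70)
d_diff = qualitative_dissimilarity("different", 0.80, 0.70)
d_none = qualitative_dissimilarity(None, 0.80, 0.70)
d_model = loglinear_dissimilarity(0.92)
```

Under the qualitative rules every same-orientation link receives the same value, whereas the log-linear variant yields a separate value for each pair.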
We define an objective function $\Phi$ scoring each possible partition</Paragraph> <Paragraph position="5"> where $|C_i|$ stands for the cardinality of cluster $i$, and $d(x, y)$ is the dissimilarity between adjectives $x$ and $y$. We want to select the partition $\mathcal{P}_{min}$ that minimizes $\Phi$, subject to the additional constraint that for each adjective $x$ in a cluster $C$,</Paragraph> <Paragraph position="7"> where $\bar{C}$ is the complement of cluster $C$, i.e., the other member of the partition. This constraint, based on Rousseeuw's (1987) silhouettes, helps correct wrong cluster assignments.</Paragraph> <Paragraph position="8"> To find $\mathcal{P}_{min}$, we first construct a random partition of the adjectives, then locate the adjective that will most reduce the objective function if it is moved from its current cluster. We move this adjective and proceed with the next iteration until no movements can improve the objective function. At the final iteration, the cluster assignment of any adjective that violates constraint (1) is changed. This is a steepest-descent hill-climbing method, and thus is guaranteed to converge. However, it will in general find a local minimum rather than the global one; the problem is NP-complete (Garey and Johnson, 1979). We can arbitrarily increase the probability of finding the globally optimal solution by repeatedly running the algorithm with different starting partitions.</Paragraph> <Paragraph position="9"> 7 Labeling the Clusters as Positive or Negative. The clustering algorithm separates each component of the graph into two groups of adjectives, but does not actually label the adjectives as positive or negative. To accomplish that, we use a simple criterion that applies only to pairs or groups of words of opposite orientation. We have previously shown (Hatzivassiloglou and McKeown, 1995) that in oppositions of gradable adjectives where one member is semantically unmarked, the unmarked member is the most frequent one about 81% of the time. 
This is relevant to our task because semantic markedness exhibits a strong correlation with orientation, the unmarked member almost always having positive orientation (Lehrer, 1985; Battistella, 1990).</Paragraph> <Paragraph position="10"> We compute the average frequency of the words in each group, expecting the group with higher average frequency to contain the positive terms. This aggregation operation increases the precision of the labeling dramatically since indicators for many pairs of words are combined, even when some of the words are incorrectly assigned to their group.</Paragraph> </Section> </Paper>
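The clustering and labeling phases described in Sections 6 and 7 can be illustrated with the rough Python sketch below. It rests on several assumptions: the objective used here (for each cluster, the summed pairwise dissimilarity divided by the cluster size) is an assumed form, the final silhouette-style constraint check is omitted, and the words, dissimilarities, and frequencies are invented toy data. All function names are hypothetical.

```python
import itertools
import random

def objective(clusters, d):
    """Assumed objective: per cluster, summed pairwise dissimilarity divided by cluster size."""
    total = 0.0
    for cluster in clusters:
        if cluster:
            pair_sum = sum(d(a, b) for a, b in itertools.combinations(sorted(cluster), 2))
            total += pair_sum / len(cluster)
    return total

def exchange_partition(words, d, seed=0):
    """Steepest-descent exchange: repeatedly apply the single move that lowers the objective most."""
    rng = random.Random(seed)
    clusters = [set(), set()]
    for word in words:
        clusters[rng.randrange(2)].add(word)  # random starting partition
    current = objective(clusters, d)
    while True:
        best_move, best_value = None, current
        for i in (0, 1):
            for word in sorted(clusters[i]):
                # Tentatively move the word, score the partition, then undo the move.
                clusters[i].remove(word)
                clusters[1 - i].add(word)
                value = objective(clusters, d)
                clusters[1 - i].remove(word)
                clusters[i].add(word)
                if best_value - value > 1e-12:  # strictly better than the best so far
                    best_move, best_value = (word, i), value
        if best_move is None:
            return clusters  # no single move improves the objective
        word, i = best_move
        clusters[i].remove(word)
        clusters[1 - i].add(word)
        current = best_value

def label_clusters(clusters, frequency):
    """Call the cluster with the higher average word frequency positive."""
    def average(cluster):
        return sum(frequency[w] for w in cluster) / len(cluster) if cluster else 0.0
    first, second = clusters
    if average(first) >= average(second):
        return {"positive": first, "negative": second}
    return {"positive": second, "negative": first}

# Hypothetical toy data: low dissimilarity inside each intended group, high across groups.
words = ["good", "nice", "bad", "poor"]
same = {frozenset(("good", "nice")), frozenset(("bad", "poor"))}

def d(a, b):
    return 0.1 if frozenset((a, b)) in same else 0.9

frequency = {"good": 100, "nice": 40, "bad": 60, "poor": 30}  # invented corpus counts
clusters = exchange_partition(words, d)
labels = label_clusters(clusters, frequency)
```

Calling exchange_partition with several different seeds and keeping the partition with the lowest objective value would mirror the restart strategy mentioned in the text.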