Predicting the Semantic Orientation of Adjectives 
Vasileios Hatzivassiloglou and Kathleen R. McKeown 
Department of Computer Science 
450 Computer Science Building 
Columbia University 
New York, N.Y. 10027, USA 
{vh, kathy}@cs.columbia.edu
Abstract 
We identify and validate from a large cor- 
pus constraints from conjunctions on the 
positive or negative semantic orientation 
of the conjoined adjectives. A log-linear 
regression model uses these constraints to 
predict whether conjoined adjectives are 
of same or different orientations, achiev- 
ing 82% accuracy in this task when each 
conjunction is considered independently. 
Combining the constraints across many ad- 
jectives, a clustering algorithm separates 
the adjectives into groups of different orien- 
tations, and finally, adjectives are labeled 
positive or negative. Evaluations on real 
data and simulation experiments indicate 
high levels of performance: classification 
precision is more than 90% for adjectives 
that occur in a modest number of conjunc- 
tions in the corpus. 
1 Introduction 
The semantic orientation or polarity of a word indi- 
cates the direction the word deviates from the norm 
for its semantic group or lexical field (Lehrer, 1974).
It also constrains the word's usage in the language 
(Lyons, 1977), due to its evaluative characteristics 
(Battistella, 1990). For example, some nearly synonymous
words differ in orientation because one implies
desirability and the other does not (e.g., simple
versus simplistic). In linguistic constructs such
as conjunctions, which impose constraints on the se- 
mantic orientation of their arguments (Anscombre 
and Ducrot, 1983; Elhadad and McKeown, 1990), 
the choices of arguments and connective are mutu- 
ally constrained, as illustrated by: 
The tax proposal was
    simple and well-received
    simplistic but well-received
    *simplistic and well-received
by the public.
In addition, almost all antonyms have different
semantic orientations. 1 If we know that two words
relate to the same property (for example, members 
of the same scalar group such as hot and cold) but 
have different orientations, we can usually infer that 
they are antonyms. Given that semantically similar 
words can be identified automatically on the basis of 
distributional properties and linguistic cues (Brown 
et al., 1992; Pereira et al., 1993; Hatzivassiloglou and 
McKeown, 1993), identifying the semantic orienta- 
tion of words would allow a system to further refine 
the retrieved semantic similarity relationships, ex- 
tracting antonyms. 
Unfortunately, dictionaries and similar sources 
(thesauri, WordNet (Miller et al., 1990)) do not
include semantic orientation information. 2 Explicit
links between antonyms and synonyms may also be 
lacking, particularly when they depend on the do- 
main of discourse; for example, the opposition bear- 
bull appears only in stock market reports, where the 
two words take specialized meanings. 
In this paper, we present and evaluate a method 
that automatically retrieves semantic orientation in- 
formation using indirect information collected from 
a large corpus. Because the method relies on the cor- 
pus, it extracts domain-dependent information and 
automatically adapts to a new domain when the cor- 
pus is changed. Our method achieves high preci- 
sion (more than 90%), and, while our focus to date 
has been on adjectives, it can be directly applied to 
other word classes. Ultimately, our goal is to use this 
method in a larger system to automatically identify 
antonyms and distinguish near synonyms. 
2 Overview of Our Approach 
Our approach relies on an analysis of textual corpora 
that correlates linguistic features, or indicators, with 
1 Exceptions include a small number of terms that are
both negative from a pragmatic viewpoint and yet stand
in an antonymic relationship; such terms frequently
lexicalize two unwanted extremes, e.g., verbose-terse.
2 Except implicitly, in the form of definitions and us- 
age examples. 
semantic orientation. While no direct indicators of 
positive or negative semantic orientation have been 
proposed 3, we demonstrate that conjunctions be- 
tween adjectives provide indirect information about 
orientation. For most connectives, the conjoined ad- 
jectives usually are of the same orientation: compare 
fair and legitimate and corrupt and brutal which
actually occur in our corpus, with *fair and brutal and
*corrupt and legitimate (or the other cross-products 
of the above conjunctions) which are semantically 
anomalous. The situation is reversed for but, which 
usually connects two adjectives of different orienta- 
tions. 
The system identifies and uses this indirect infor- 
mation in the following stages: 
1. All conjunctions of adjectives are extracted 
from the corpus along with relevant morpho- 
logical relations. 
2. A log-linear regression model combines information
from different conjunctions to determine
whether each pair of conjoined adjectives is of the same
or different orientation. The result is a graph
with hypothesized same- or different-orientation 
links between adjectives. 
3. A clustering algorithm separates the adjectives 
into two subsets of different orientation. It 
places as many words of same orientation as 
possible into the same subset. 
4. The average frequencies in each group are com- 
pared and the group with the higher frequency 
is labeled as positive. 
In the following sections, we first present the set 
of adjectives used for training and evaluation. We 
next validate our hypothesis that conjunctions con- 
strain the orientation of conjoined adjectives and 
then describe the remaining three steps of the algo- 
rithm. After presenting our results and evaluation, 
we discuss simulation experiments that show how 
our method performs under different conditions of 
sparseness of data. 
3 Data Collection 
For our experiments, we use the 21 million word 
1987 Wall Street Journal corpus 4, automatically an- 
notated with part-of-speech tags using the PARTS 
tagger (Church, 1988). 
In order to verify our hypothesis about the ori- 
entations of conjoined adjectives, and also to train 
and evaluate our subsequent algorithms, we need a 
3 Certain words inflected with negative affixes (such
as in- or un-) tend to be mostly negative, but this rule 
applies only to a fraction of the negative words. Further- 
more, there are words so inflected which have positive 
orientation, e.g., independent and unbiased. 
4 Available from the ACL Data Collection Initiative
as CD ROM 1.
Positive: adequate central clever famous 
intelligent remarkable reputed 
sensitive slender thriving 
Negative: contagious drunken ignorant lanky 
listless primitive strident troublesome 
unresolved unsuspecting 
Figure 1: Randomly selected adjectives with positive 
and negative orientations. 
set of adjectives with predetermined orientation la- 
bels. We constructed this set by taking all adjectives 
appearing in our corpus 20 times or more, then re- 
moving adjectives that have no orientation. These 
are typically members of groups of complementary, 
qualitative terms (Lyons, 1977), e.g., domestic or 
medical. 
We then assigned an orientation label (either + or 
-) to each adjective, using an evaluative approach. 
The criterion was whether the use of this adjective 
ascribes in general a positive or negative quality to 
the modified item, making it better or worse than a 
similar unmodified item. We were unable to reach 
a unique label out of context for several adjectives 
which we removed from consideration; for example, 
cheap is positive if it is used as a synonym of in- 
expensive, but negative if it implies inferior quality. 
The operations of selecting adjectives and assigning 
labels were performed before testing our conjunction 
hypothesis or implementing any other algorithms, to 
avoid any influence on our labels. The final set con- 
tained 1,336 adjectives (657 positive and 679 nega- 
tive terms). Figure 1 shows randomly selected terms 
from this set. 
To further validate our set of labeled adjectives, 
we subsequently asked four people to independently 
label a randomly drawn sample of 500 of these 
adjectives. They agreed with us that the posi- 
tive/negative concept applies to 89.15% of these ad- 
jectives on average. For the adjectives where a pos- 
itive or negative label was assigned by both us and 
the independent evaluators, the average agreement 
on the label was 97.38%. The average inter-reviewer 
agreement on labeled adjectives was 96.97%. These 
results are extremely significant statistically and 
compare favorably with validation studies performed 
for other tasks (e.g., sense disambiguation) in the 
past. They show that positive and negative orien- 
tation are objective properties that can be reliably 
determined by humans. 
To extract conjunctions between adjectives, we 
used a two-level finite-state grammar, which covers 
complex modification patterns and noun-adjective 
apposition. Running this parser on the 21 mil- 
lion word corpus, we collected 13,426 conjunctions 
of adjectives, expanding to a total of 15,431 conjoined
adjective pairs. After morphological transformations,
the remaining 15,048 conjunction tokens involve 9,296
distinct pairs of conjoined adjectives (types). Each
conjunction token is classified by the parser according
to three variables: the conjunction used (and, or, but,
either-or, or neither-nor), the type of modification
(attributive, predicative, appositive, resultative), and
the number of the modified noun (singular or plural).

Conjunction category                Conjunction   % same-       % same-       P-value
                                    types         orientation   orientation   (for types)
                                    analyzed      (types)       (tokens)
All conjunctions                    2,748         77.84%        72.39%        < 1 · 10^-16
All and conjunctions                2,294         81.73%        78.07%        < 1 · 10^-16
All or conjunctions                 305           77.05%        60.97%        < 1 · 10^-16
All but conjunctions                214           30.84%        25.94%        2.09 · 10^-8
All attributive and conjunctions    1,077         80.04%        76.82%        < 1 · 10^-16
All predicative and conjunctions    860           84.77%        84.54%        < 1 · 10^-16
All appositive and conjunctions     30            70.00%        63.64%        0.04277

Table 1: Validation of our conjunction hypothesis. The P-value is the probability that similar
or more extreme results would have been obtained if same- and different-orientation conjunction
types were actually equally distributed.
4 Validation of the Conjunction 
Hypothesis 
Using the three attributes extracted by the parser, 
we constructed a cross-classification of the conjunc- 
tions in a three-way table. We counted types and to- 
kens of each conjoined pair that had both members 
in the set of pre-selected labeled adjectives discussed 
above; 2,748 (29.56%) of all conjoined pairs (types) 
and 4,024 (26.74%) of all conjunction occurrences 
(tokens) met this criterion. We augmented this ta- 
ble with marginal totals, arriving at 90 categories, 
each of which represents a triplet of attribute values, 
possibly with one or more "don't care" elements. 
We then measured the percentage of conjunctions 
in each category with adjectives of same or differ- 
ent orientations. Under the null hypothesis of same 
proportions of adjective pairs (types) of same and 
different orientation in a given category, the num- 
ber of same- or different-orientation pairs follows a 
binomial distribution with p = 0.5 (Conover, 1980). 
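The test can be illustrated with a minimal sketch that computes an exact two-sided binomial P-value under p = 0.5. The function name is ours, and the count of 21 same-orientation pairs is an assumption obtained from the 70.00% reported for the 30 appositive and conjunction types in Table 1; the original computation may have differed in detail.

```python
from math import comb


def two_sided_binomial_p(same, total):
    """Exact two-sided binomial P-value for `same` same-orientation pairs
    out of `total` pairs under the null hypothesis p = 0.5. The null
    distribution is symmetric, so the two-sided value is twice the tail
    that is at least as extreme as the observation (capped at 1)."""
    extreme = max(same, total - same)
    tail = sum(comb(total, k) for k in range(extreme, total + 1))
    return min(1.0, 2 * tail / 2 ** total)


# Assumed example: 21 of 30 pairs (70.00%) have the same orientation.
p_value = two_sided_binomial_p(21, 30)
print(f"{p_value:.5f}")
```

Under these assumptions the sketch yields a value that rounds to 0.04277, in line with the last row of Table 1.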
We show in Table 1 the results for several repre- 
sentative categories, and summarize all results be- 
low: 
• Our conjunction hypothesis is validated overall 
and for almost all individual cases. The results 
are extremely significant statistically, except for 
a few cases where the sample is small. 
• Aside from the use of but with adjectives of 
different orientations, there are, rather surpris- 
ingly, small differences in the behavior of con- 
junctions between linguistic environments (as 
represented by the three attributes). There are 
a few exceptions, e.g., appositive and conjunc- 
tions modifying plural nouns are evenly split 
between same and different orientation. But 
in these exceptional cases the sample is very 
small, and the observed behavior may be due 
to chance. 
• Further analysis of different-orientation pairs in 
conjunctions other than but shows that con- 
joined antonyms are far more frequent than ex- 
pected by chance, in agreement with (Justeson 
and Katz, 1991). 
5 Prediction of Link Type 
The analysis in the previous section suggests a base- 
line method for classifying links between adjectives: 
since 77.84% of all links from conjunctions indicate 
same orientation, we can achieve this level of perfor- 
mance by always guessing that a link is of the same- 
orientation type. However, we can improve perfor- 
mance by noting that conjunctions using but exhibit 
the opposite pattern, usually involving adjectives of 
different orientations. Thus, a revised but still sim- 
ple rule predicts a different-orientation link if the 
two adjectives have been seen in a but conjunction, 
and a same-orientation link otherwise, assuming the 
two adjectives were seen connected by at least one 
conjunction. 
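The two simple rules can be sketched as follows. This is an illustration only: the function names and the representation of a pair as a dictionary of per-conjunction counts are assumptions, not part of the system described here.

```python
def predict_baseline(conjunction_counts):
    """Baseline: any pair seen in at least one conjunction is predicted
    to be a same-orientation link; unseen pairs get no prediction."""
    if not any(conjunction_counts.values()):
        return None
    return "same"


def predict_but_rule(conjunction_counts):
    """But rule: predict a different-orientation link if the pair has been
    seen in a 'but' conjunction, and a same-orientation link otherwise."""
    if not any(conjunction_counts.values()):
        return None
    return "different" if conjunction_counts.get("but", 0) > 0 else "same"


# Hypothetical counts for two adjective pairs.
print(predict_but_rule({"and": 3}))            # same
print(predict_but_rule({"and": 2, "but": 1}))  # different
```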
Morphological relationships between adjectives al- 
so play a role. Adjectives related in form (e.g., ade- 
quate-inadequate or thoughtful-thoughtless) almost 
always have different semantic orientations. We im- 
plemented a morphological analyzer which matches 
adjectives related in this manner. This process is 
highly accurate, but unfortunately does not apply 
to many of the possible pairs: in our set of 1,336 
labeled adjectives (891,780 possible pairs), 102 pairs 
are morphologically related; among them, 99 are of 
different orientation, yielding 97.06% accuracy for 
the morphology method. This information is orthog- 
onal to that extracted from conjunctions: only 12 
of the 102 morphologically related pairs have been 
observed in conjunctions in our corpus. Thus, we
add to the predictions made from conjunctions the
different-orientation links suggested by morphological
relationships.

Prediction method    Morphology   Accuracy on reported      Accuracy on reported          Overall
                     used?        same-orientation links    different-orientation links   accuracy
Always predict       No           77.84%                    n/a                           77.84%
same orientation     Yes          78.18%                    97.06%                        78.86%
But rule             No           81.81%                    69.16%                        80.82%
                     Yes          82.20%                    78.16%                        81.75%
Log-linear model     No           81.53%                    73.70%                        80.97%
                     Yes          82.00%                    82.44%                        82.05%

Table 2: Accuracy of several link prediction models.
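A morphological matcher of this kind might look like the sketch below. The rules of the actual analyzer are not given in the text, so the prefix list and the -ful/-less rule are assumptions chosen only to cover the two examples above.

```python
# Assumed list of negating prefixes; the real analyzer may use other rules.
NEGATIVE_PREFIXES = ("in", "un", "im", "ir", "il", "dis", "non")


def morphologically_related(a, b):
    """Return True if one adjective looks like a negated form of the other."""
    for base, other in ((a, b), (b, a)):
        # Prefix negation, e.g. adequate / inadequate.
        if any(other == prefix + base for prefix in NEGATIVE_PREFIXES):
            return True
        # Suffix opposition, e.g. thoughtful / thoughtless.
        if base.endswith("ful") and other == base[:-3] + "less":
            return True
    return False


print(morphologically_related("adequate", "inadequate"))     # True
print(morphologically_related("thoughtful", "thoughtless"))  # True
print(morphologically_related("fair", "brutal"))             # False
```

String matching alone would also pair words such as famous and infamous, so a practical matcher would likely need an exception list or a lexicon check.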
We improve the accuracy of classifying links de- 
rived from conjunctions as same or different orienta- 
tion with a log-linear regression model (Santner and 
Duffy, 1989), exploiting the differences between the 
various conjunction categories. This is a generalized 
linear model (McCullagh and Nelder, 1989) with a 
linear predictor 
$$\eta = w^T x$$
where x is the vector of the observed counts in the
various conjunction categories for the particular
adjective pair we try to classify and w is a vector of
weights to be learned during training. The response
y is non-linearly related to $\eta$ through the inverse
logit function,
$$y = \frac{e^{\eta}}{1 + e^{\eta}}$$
Note that $y \in (0, 1)$, with each of these endpoints
associated with one of the possible outcomes.
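The prediction step of such a model can be sketched in a few lines. The weights and counts below are made-up values for illustration; the learned weights are not reported here, and the function names are ours.

```python
import math


def inverse_logit(eta):
    """Map the linear predictor eta to a response y between 0 and 1.
    The two branches avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def predict_response(weights, counts):
    """Compute eta = w^T x for one adjective pair and return y."""
    eta = sum(w * x for w, x in zip(weights, counts))
    return inverse_logit(eta)


# Hypothetical weights for two conjunction categories (say "and", "but")
# and the observed counts of one adjective pair in those categories.
weights = [0.8, -1.5]
counts = [3, 0]
print(predict_response(weights, counts))
```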
We have 90 possible predictor variables, 42 of 
which are linearly independent. Since using all the 
42 independent predictors invites overfitting (Duda 
and Hart, 1973), we have investigated subsets of the 
full log-linear model for our data using the method 
of iterative stepwise refinement: starting with an ini- 
tial model, variables are added or dropped if their 
contribution to the reduction or increase of the resid- 
ual deviance compares favorably to the resulting loss 
or gain of residual degrees of freedom. This process 
led to the selection of nine predictor variables. 
We evaluated the three prediction models dis- 
cussed above with and without the secondary source 
of morphology relations. For the log-linear model, 
we repeatedly partitioned our data into equally sized 
training and testing sets, estimated the weights on 
the training set, and scored the model's performance 
on the testing set, averaging the resulting scores. 5 
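The evaluation protocol for the log-linear model can be sketched as below. The `fit` and `predict` callables, the number of rounds, and the toy majority classifier are placeholders we introduce for illustration; they stand in for the actual model estimation.

```python
import random


def repeated_split_accuracy(examples, fit, predict, rounds, rng):
    """Repeatedly split `examples` (a list of (features, label) pairs) into
    equally sized training and testing halves, fit on the first half,
    score on the second, and return the average accuracy."""
    scores = []
    for _ in range(rounds):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        model = fit(train)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Toy stand-in model: always predict the majority label of the training half.
def fit_majority(train):
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def predict_majority(model, features):
    return model


data = [((i,), "same") for i in range(8)] + [((i,), "different") for i in range(2)]
print(repeated_split_accuracy(data, fit_majority, predict_majority, 20, random.Random(0)))
```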
Table 2 shows the results of these analyses. Although
the log-linear model offers only a small improvement
in pair classification over the simpler but
prediction rule, it confers the important advantage
5 When morphology is to be used as a supplementary
predictor, we remove the morphologically related pairs 
from the training and testing sets. 
of rating each prediction between 0 and 1. We make 
extensive use of this in the next phase of our algo- 
rithm. 
6 Finding Groups of Same-Oriented 
Adjectives 
The third phase of our method assigns the adjectives 
into groups, placing adjectives of the same (but un- 
known) orientation in the same group. Each pair 
of adjectives has an associated dissimilarity value 
between 0 and 1; adjectives connected by same- 
orientation links have low dissimilarities, and con- 
versely, different-orientation links result in high dis- 
similarities. Adjective pairs with no connecting links 
are assigned the neutral dissimilarity 0.5. 
The baseline and but methods make qualitative 
distinctions only (i.e., same-orientation, different- 
orientation, or unknown); for them, we define dis- 
similarity for same-orientation links as one minus 
the probability that such a classification link is cor- 
rect and dissimilarity for different-orientation links 
as the probability that such a classification is correct.
These probabilities are estimated from separate
training data. Note that for these prediction
models, dissimilarities are identical for similarly
classified links.
The log-linear model, on the other hand, offers 
an estimate of how good each prediction is, since it 
produces a value y between 0 and 1. We construct 
the model so that 1 corresponds to same-orientation, 
and define dissimilarity as one minus the produced 
value. 
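The mapping from link predictions to dissimilarities can be sketched as follows. The function names and the probabilities passed in are assumptions for illustration; in the system these probabilities come from separate training data.

```python
NEUTRAL = 0.5  # dissimilarity for adjective pairs with no connecting link


def dissimilarity_from_label(label, p_same_correct, p_different_correct):
    """Baseline and but rule: `label` is 'same', 'different', or None."""
    if label == "same":
        return 1.0 - p_same_correct
    if label == "different":
        return p_different_correct
    return NEUTRAL


def dissimilarity_from_model(y):
    """Log-linear model: y near 1 means same orientation; None means no link."""
    return NEUTRAL if y is None else 1.0 - y


# Hypothetical correctness probabilities for the two link types.
print(dissimilarity_from_label("same", 0.8, 0.7))
print(dissimilarity_from_label("different", 0.8, 0.7))
print(dissimilarity_from_model(0.9))
```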
Same and different-orientation links between ad- 
jectives form a graph. To partition the graph nodes 
into subsets of the same orientation, we employ an 
iterative optimization procedure on each connected 
component, based on the exchange method, a
non-hierarchical clustering algorithm (Späth, 1985). We
define an objective function $\Phi$ scoring each possible
partition $\mathcal{P}$ of the adjectives into two subgroups $C_1$
and $C_2$ as
$$\Phi(\mathcal{P}) = \sum_{i=1}^{2} \left( \frac{1}{|C_i|} \sum_{x, y \in C_i} d(x, y) \right)$$
$\alpha$   Number of             Number of             Average number   Accuracy   Ratio of average
           adjectives in         links in              of links for                group frequencies
           test set              test set              each adjective
           ($|A_\alpha|$)        ($|L_\alpha|$)
2          730                   2,568                 7.04             78.08%     1.8699
3          516                   2,159                 8.37             82.56%     1.9235
4          369                   1,742                 9.44             87.26%     1.3486
5          236                   1,238                 10.49            92.37%     1.4040

Table 3: Evaluation of the adjective classification and labeling methods.
where $|C_i|$ stands for the cardinality of cluster $i$, and
$d(x, y)$ is the dissimilarity between adjectives $x$ and
$y$. We want to select the partition $\mathcal{P}_{\min}$ that
minimizes $\Phi$, subject to the additional constraint that
for each adjective $x$ in a cluster $C$,
$$\frac{1}{|C| - 1} \sum_{y \in C,\, y \neq x} d(x, y) < \frac{1}{|\bar{C}|} \sum_{y \in \bar{C}} d(x, y) \qquad (1)$$
where $\bar{C}$ is the complement of cluster $C$, i.e., the
other member of the partition. This constraint,
based on Rousseeuw's (1987) silhouettes, helps
correct wrong cluster assignments.
To find $\mathcal{P}_{\min}$, we first construct a random partition
of the adjectives, then locate the adjective that
will most reduce the objective function if it is moved 
from its current cluster. We move this adjective and 
proceed with the next iteration until no movements 
can improve the objective function. At the final it- 
eration, the cluster assignment of any adjective that 
violates constraint (1) is changed. This is a steepest- 
descent hill-climbing method, and thus is guaran- 
teed to converge. However, it will in general find a 
local minimum rather than the global one; the problem
is NP-complete (Garey and Johnson, 1979). We
can arbitrarily increase the probability of finding the 
globally optimal solution by repeatedly running the 
algorithm with different starting partitions. 
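The procedure can be sketched as below for a single connected component. This is a simplified illustration, not the system's implementation: the function names, the toy dissimilarities, and the handling of singleton clusters in the constraint check are our assumptions, and the objective is recomputed from scratch at every step for clarity rather than speed.

```python
import random
from itertools import combinations


def objective(assign, d):
    """Phi: for each cluster, the summed pairwise dissimilarity divided by
    the cluster size (unordered pairs; ordered pairs would only double Phi)."""
    total = 0.0
    for cluster in (0, 1):
        members = [w for w, c in assign.items() if c == cluster]
        if members:
            total += sum(d(x, y) for x, y in combinations(members, 2)) / len(members)
    return total


def exchange_partition(words, d, rng):
    """Start from a random partition and repeatedly move the single word
    whose move lowers Phi the most, until no move helps (a local minimum)."""
    assign = {w: rng.randrange(2) for w in words}
    current = objective(assign, d)
    while True:
        best_word, best_score = None, current
        for w in words:
            assign[w] ^= 1                  # tentatively move w
            score = objective(assign, d)
            assign[w] ^= 1                  # undo the move
            if score < best_score - 1e-12:
                best_word, best_score = w, score
        if best_word is None:
            return assign
        assign[best_word] ^= 1
        current = best_score


def fix_constraint_violations(assign, d):
    """Flip every word that violates constraint (1), i.e. whose average
    dissimilarity to its own cluster is not below that to the other one."""
    fixed = dict(assign)
    for x, c in assign.items():
        own = [y for y, cy in assign.items() if cy == c and y != x]
        other = [y for y, cy in assign.items() if cy != c]
        if own and other:
            own_avg = sum(d(x, y) for y in own) / len(own)
            other_avg = sum(d(x, y) for y in other) / len(other)
            if own_avg >= other_avg:
                fixed[x] = 1 - c
    return fixed


# Toy dissimilarities (assumed): low within each pair, high across pairs.
low = {frozenset(("fair", "legitimate")), frozenset(("corrupt", "brutal"))}


def toy_d(x, y):
    return 0.1 if frozenset((x, y)) in low else 0.9


words = ["fair", "legitimate", "corrupt", "brutal"]
result = fix_constraint_violations(exchange_partition(words, toy_d, random.Random(0)), toy_d)
print(result)
```

Running the sketch from several random starting partitions and keeping the partition with the lowest objective value corresponds to the restart strategy described above.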
7 Labeling the Clusters as Positive 
or Negative 
The clustering algorithm separates each component 
of the graph into two groups of adjectives, but does 
not actually label the adjectives as positive or neg- 
ative. To accomplish that, we use a simple criterion 
that applies only to pairs or groups of words of oppo- 
site orientation. We have previously shown (Hatzi- 
vassiloglou and McKeown, 1995) that in oppositions 
of gradable adjectives where one member is semanti- 
cally unmarked, the unmarked member is the most 
frequent one about 81% of the time. This is relevant 
to our task because semantic markedness exhibits 
a strong correlation with orientation, the unmarked 
member almost always having positive orientation 
(Lehrer, 1985; Battistella, 1990). 
We compute the average frequency of the words 
in each group, expecting the group with higher av- 
erage frequency to contain the positive terms. This 
aggregation operation increases the precision of the 
labeling dramatically since indicators for many pairs 
of words are combined, even when some of the words 
are incorrectly assigned to their group. 
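The labeling criterion amounts to a comparison of two averages, as in this sketch. The function name and the corpus frequencies are invented for illustration only.

```python
def label_groups(group_a, group_b, frequency):
    """Label the group with the higher average word frequency as positive."""
    mean_a = sum(frequency[w] for w in group_a) / len(group_a)
    mean_b = sum(frequency[w] for w in group_b) / len(group_b)
    if mean_a >= mean_b:
        return {"positive": group_a, "negative": group_b}
    return {"positive": group_b, "negative": group_a}


# Hypothetical frequencies; these are not counts from the corpus.
frequency = {"good": 900, "honest": 120, "harmful": 80, "reckless": 40}
labels = label_groups(["harmful", "reckless"], ["good", "honest"], frequency)
print(labels["positive"])  # ['good', 'honest']
```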
8 Results and Evaluation 
Since graph connectivity affects performance, we de- 
vised a method of selecting test sets that makes this 
dependence explicit. Note that the graph density is 
largely a function of corpus size, and thus can be 
increased by adding more data. Nevertheless, we 
report results on sparser test sets to show how our 
algorithm scales up. 
We separated our sets of adjectives A (containing
1,336 adjectives) and conjunction- and morphology-based
links L (containing 2,838 links) into training
and testing groups by selecting, for several values
of the parameter $\alpha$, the maximal subset of A, $A_\alpha$,
which includes an adjective x if and only if there
exist at least $\alpha$ links from L between x and other
elements of $A_\alpha$. This operation in turn defines a
subset of L, $L_\alpha$, which includes all links between
members of $A_\alpha$. We train our log-linear model on
$L - L_\alpha$ (excluding links between morphologically
related adjectives), compute predictions and
dissimilarities for the links in $L_\alpha$, and use these to classify
and label the adjectives in $A_\alpha$. $\alpha$ must be at least
2, since we need to leave some links for training.
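One way to compute such a maximal subset is to repeatedly discard adjectives that have too few links to the remaining ones, as in the sketch below. The function name and the toy graph are assumptions; the text does not say how the subset was actually computed.

```python
def select_test_set(adjectives, links, alpha):
    """Return (test_adjectives, test_links): the largest subset in which
    every adjective has at least `alpha` links to other members, together
    with all links whose two endpoints are both in that subset."""
    keep = set(adjectives)
    changed = True
    while changed:
        changed = False
        degree = {a: 0 for a in keep}
        for x, y in links:
            if x in keep and y in keep:
                degree[x] += 1
                degree[y] += 1
        for a, deg in degree.items():
            if deg < alpha:
                keep.discard(a)   # removal may lower neighbors' degrees,
                changed = True    # so the loop runs until nothing changes
    test_links = [(x, y) for x, y in links if x in keep and y in keep]
    return keep, test_links


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_links = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
test_adjectives, test_links = select_test_set("abcd", toy_links, 2)
print(sorted(test_adjectives), len(test_links))
```

The links left out of the returned list play the role of the training links.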
Table 3 shows the results of these experiments for
$\alpha$ = 2 to 5. Our method produced the correct
classification from 78% of the time on the sparsest
test set up to more than 92% of the time when a
higher number of links was present. Moreover, in all
cases, the ratio of the two group frequencies correctly 
identified the positive subgroup. These results are 
extremely significant statistically (P-value less than
$10^{-16}$) when compared with the baseline method of
randomly assigning orientations to adjectives, or the 
baseline method of always predicting the most fre- 
quent (for types) category (50.82% of the adjectives 
in our collection are classified as negative). Figure 2 
shows some of the adjectives in set A4 and their clas- 
sifications. 
Classified as positive: 
bold decisive disturbing generous good 
honest important large mature patient 
peaceful positive proud sound 
stimulating straightforward strange 
talented vigorous witty 
Classified as negative: 
ambiguous cautious cynical evasive 
harmful hypocritical inefficient insecure 
irrational irresponsible minor outspoken 
pleasant reckless risky selfish tedious 
unsupported vulnerable wasteful 
Figure 2: Sample retrieved classifications of adjec- 
tives from set A4. Correctly matched adjectives are 
shown in bold. 
9 Graph Connectivity and 
Performance 
A strong point of our method is that decisions on 
individual words are aggregated to provide decisions 
on how to group words into a class and whether to 
label the class as positive or negative. Thus, the 
overall result can be much more accurate than the 
individual indicators. To verify this, we ran a series 
of simulation experiments. Each experiment mea- 
sures how our algorithm performs for a given level 
of precision P for identifying links and a given av- 
erage number of links k for each word. The goal is 
to show that even when P is low, given enough data 
(i.e., high k), we can achieve high performance for 
the grouping. 
As we noted earlier, the corpus data is eventually 
represented in our system as a graph, with the nodes 
corresponding to adjectives and the links to predic- 
tions about whether the two connected adjectives 
have the same or different orientation. Thus the pa- 
rameter P in the simulation experiments measures 
how well we are able to predict each link indepen- 
dently of the others, and the parameter k measures 
the number of distinct adjectives each adjective ap- 
pears with in conjunctions. P therefore directly rep- 
resents the precision of the link classification algo- 
rithm, while k indirectly represents the corpus size. 
To measure the effect of P and k (which are re- 
flected in the graph topology), we need to carry out a 
series of experiments where we systematically vary 
their values. For example, as k (or the amount of 
data) increases for a given level of precision P for in- 
dividual links, we want to measure how this affects 
overall accuracy of the resulting groups of nodes. 
Thus, we need to construct a series of data sets, 
or graphs, which represent different scenarios cor- 
responding to a given combination of values of P 
and k. To do this, we construct a random graph 
by randomly assigning 50 nodes to the two possible 
orientations. Because we don't have frequency and 
morphology information on these abstract nodes, we 
cannot predict whether two nodes are of the same 
or different orientation. Rather, we randomly as- 
sign links between nodes so that, on average, each 
node participates in k links and 100 × P% of all
links connect nodes of the same orientation. Then 
we consider these links as identified by the link pre- 
diction algorithm as connecting two nodes with the 
same orientation (so that 100 × P% of these
predictions will be correct). This is equivalent to the
baseline link classification method, and provides a 
lower bound on the performance of the algorithm 
actually used in our system (Section 5). 
Because of the lack of actual measurements such 
as frequency on these abstract nodes, we also de- 
couple the partitioning and labeling components of 
our system and score the partition found under the 
best matching conditions for the actual labels. Thus 
the simulation measures only how well the system 
separates positive from negative adjectives, not how 
well it determines which is which. However, in all 
the experiments performed on real corpus data (Sec- 
tion 8), the system correctly found the labels of the 
groups; any misclassifications came from misplacing 
an adjective in the wrong group. The whole proce- 
dure of constructing the random graph and finding 
and scoring the groups is repeated 200 times for any 
given combination of P and k, and the results are 
averaged, thus avoiding accidentally evaluating our 
system on a graph that is not truly representative of 
graphs with the given P and k. 
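The graph construction and the scoring step of the simulation can be sketched as follows. The rounding of link counts, sampling links without replacement, and the function names are our assumptions; the clustering step itself is omitted here.

```python
import random
from itertools import combinations


def random_link_graph(n_nodes, k, p, rng):
    """Return (orientation, links). `orientation` maps each node to +1 or -1;
    `links` holds about n_nodes * k / 2 distinct node pairs, a fraction p of
    which connect nodes of the same orientation."""
    nodes = list(range(n_nodes))
    orientation = {v: rng.choice((1, -1)) for v in nodes}
    same = [e for e in combinations(nodes, 2) if orientation[e[0]] == orientation[e[1]]]
    diff = [e for e in combinations(nodes, 2) if orientation[e[0]] != orientation[e[1]]]
    n_links = round(n_nodes * k / 2)      # gives an average degree of k
    n_same = round(p * n_links)
    if n_same > len(same) or n_links - n_same > len(diff):
        raise ValueError("k is too large for this P and node assignment")
    links = rng.sample(same, n_same) + rng.sample(diff, n_links - n_same)
    return orientation, links


def best_match_accuracy(orientation, groups):
    """Score a two-way partition (node -> 0 or 1) under whichever assignment
    of the two labels to the two groups matches the true orientations best."""
    hits = sum((orientation[v] == 1) == (groups[v] == 0) for v in orientation)
    return max(hits, len(orientation) - hits) / len(orientation)


orientation, links = random_link_graph(50, 12, 0.8, random.Random(0))
perfect = {v: 0 if o == 1 else 1 for v, o in orientation.items()}
print(len(links), best_match_accuracy(orientation, perfect))
```

The `ValueError` branch reflects the fact that, for a given P, only a limited number of same-orientation pairs exists among the nodes, which bounds the largest usable k.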
We observe (Figure 3) that even for relatively low
P, our ability to correctly classify the nodes
approaches very high levels with a modest number of
links. For P = 0.8, we need only about 7 links
per adjective for classification performance over 90%
and only 12 links per adjective for performance over
99%. 8 The difference between low and high values
of P is in the rate at which increasing data increases 
overall precision. These results are somewhat more 
optimistic than those obtained with real data (Sec- 
tion 8), a difference which is probably due to the uni- 
form distributional assumptions in the simulation. 
Nevertheless, we expect the trends to be similar to 
the ones shown in Figure 3 and the results of Table 3 
on real data support this expectation. 
10 Conclusion and Future Work 
We have proposed and verified from corpus data con- 
straints on the semantic orientations of conjoined ad- 
jectives. We used these constraints to automatically 
construct a log-linear regression model, which, com- 
bined with supplementary morphology rules, pre- 
dicts whether two conjoined adjectives are of same 
8 Twelve links per adjective for a set of n adjectives requires
6n conjunctions between the n adjectives in the corpus.
[Figure 3: four plots with x-axis "Average neighbors per node"; panels (a) P = 0.75, (b) P = 0.8, (c) P = 0.85, (d) P = 0.9.]
Figure 3: Simulation results obtained on 50 nodes. In each figure, the last x coordinate indicates the
(average) maximum possible value of k for this P, and the dotted line shows the performance of a random
classifier.
or different orientation with 82% accuracy. We then 
classified several sets of adjectives according to the 
links inferred in this way and labeled them as posi- 
tive or negative, obtaining 92% accuracy on the clas- 
sification task for reasonably dense graphs and 100% 
accuracy on the labeling task. Simulation experi- 
ments establish that very high levels of performance 
can be obtained with a modest number of links per 
word, even when the links themselves are not always 
correctly classified. 
As part of our clustering algorithm's output, a 
"goodness-of-fit" measure for each word is com- 
puted, based on Rousseeuw's (1987) silhouettes. 
This measure ranks the words according to how well 
they fit in their group, and can thus be used as 
a quantitative measure of orientation, refining the 
binary positive-negative distinction. By restricting 
the labeling decisions to words with high values of 
this measure we can also increase the precision of 
our system, at the cost of sacrificing some coverage. 
We are currently combining the output of this sys- 
tem with a semantic group finding system so that we 
can automatically identify antonyms from the cor- 
pus, without access to any semantic descriptions. 
The learned semantic categorization of the adjec- 
tives can also be used in the reverse direction, to 
help in interpreting the conjunctions in which they
participate. We will also extend our analyses to nouns and
verbs. 
Acknowledgements 
This work was supported in part by the Office 
of Naval Research under grant N00014-95-1-0745, 
jointly by the Office of Naval Research and the 
Advanced Research Projects Agency under grant 
N00014-89-J-1782, by the National Science Founda- 
tion under grant GER-90-24069, and by the New 
York State Center for Advanced Technology un- 
der contracts NYSSTF-CAT(95)-013 and NYSSTF- 
CAT(96)-013. We thank Ken Church and the 
AT&T Bell Laboratories for making the PARTS 
part-of-speech tagger available to us. We also thank 
Dragomir Radev, Eric Siegel, and Gregory Sean 
McKinley who provided models for the categoriza- 
tion of the adjectives in our training and testing sets 
as positive and negative. 

References 
Jean-Claude Anscombre and Oswald Ducrot. 1983. 
L'Argumentation dans la Langue. Philosophie et
Langage. Pierre Mardaga, Brussels, Belgium. 
Edwin L. Battistella. 1990. Markedness: The Eval- 
uative Superstructure of Language. State Univer- 
sity of New York Press, Albany, New York. 
Peter F. Brown, Vincent J. della Pietra, Peter V. 
de Souza, Jennifer C. Lai, and Robert L. Mercer. 
1992. Class-based n-gram models of natural language.
Computational Linguistics, 18(4):467-479.
Kenneth W. Church. 1988. A stochastic parts 
program and noun phrase parser for unrestricted 
text. In Proceedings of the Second Conference on 
Applied Natural Language Processing (ANLP-88), 
pages 136-143, Austin, Texas, February. Associa- 
tion for Computational Linguistics. 
W. J. Conover. 1980. Practical Nonparametric 
Statistics. Wiley, New York, 2nd edition. 
Richard O. Duda and Peter E. Hart. 1973. Pattern 
Classification and Scene Analysis. Wiley, New 
York. 
Michael Elhadad and Kathleen R. McKeown. 1990. 
A procedure for generating connectives. In Pro- 
ceedings of COLING, Helsinki, Finland, July. 
Michael R. Garey and David S. Johnson. 1979. 
Computers and Intractability: A Guide to the 
Theory of NP-Completeness. W. H. Freeman, San
Francisco, California. 
Vasileios Hatzivassiloglou and Kathleen R. McKe- 
own. 1993. Towards the automatic identification 
of adjectival scales: Clustering adjectives accord- 
ing to meaning. In Proceedings of the 31st Annual 
Meeting of the ACL, pages 172-182, Columbus, 
Ohio, June. Association for Computational Lin- 
guistics. 
Vasileios Hatzivassiloglou and Kathleen R. McKeown.
1995. A quantitative evaluation of linguistic
tests for the automatic prediction of semantic
markedness. In Proceedings of the 33rd Annual
Meeting of the ACL, pages 197-204, Boston, Mas- 
sachusetts, June. Association for Computational 
Linguistics. 
John S. Justeson and Slava M. Katz. 1991. Co- 
occurrences of antonymous adjectives and their 
contexts. Computational Linguistics, 17(1):1-19. 
Adrienne Lehrer. 1974. Semantic Fields and Lexical
Structure. North Holland, Amsterdam and New 
York. 
Adrienne Lehrer. 1985. Markedness and antonymy. 
Journal of Linguistics, 21(3):397-429, September.
John Lyons. 1977. Semantics, volume 1. Cambridge 
University Press, Cambridge, England. 
Peter McCullagh and John A. Nelder. 1989. Gen- 
eralized Linear Models. Chapman and Hall, Lon- 
don, 2nd edition. 
George A. Miller, Richard Beckwith, Christiane Fell- 
baum, Derek Gross, and Katherine J. Miller. 
1990. Introduction to WordNet: An on-line lexi- 
cal database. International Journal of Lexicogra- 
phy (special issue), 3(4):235-312. 
Fernando Pereira, Naftali Tishby, and Lillian Lee. 
1993. Distributional clustering of English words. 
In Proceedings of the 31st Annual Meeting of the
ACL, pages 183-190, Columbus, Ohio, June. As- 
sociation for Computational Linguistics. 
Peter J. Rousseeuw. 1987. Silhouettes: A graphical 
aid to the interpretation and validation of cluster 
analysis. Journal of Computational and Applied 
Mathematics, 20:53-65. 
Thomas J. Santner and Diane E. Duffy. 1989. The 
Statistical Analysis of Discrete Data. Springer- 
Verlag, New York. 
Helmuth Späth. 1985. Cluster Dissection and Analysis:
Theory, FORTRAN Programs, Examples.
Ellis Horwood, Chichester, West Sussex, England.
