Effects of Adjective Orientation and Gradability 
on Sentence Subjectivity 
Vasileios Hatzivassiloglou 
Department o1' Computer Science 
Columbia Universily 
New York, NY 10027 
vh@cs, columbia, edu 
Janyce M. Wiebe 
Department of Computer Science 
New Mexico State University 
Las Cruces, NM 88003 
wiebe@cs, nmsu. edu 
Abstract 
Subjectivity is a pragmatic, sentence-level feature that 
has important implications for texl processing applica- 
lions such as information exlractiou and information ic- 
lricwd. We study tile elfeels of dymunic adjectives, se- 
mantically oriented adjectives, and gradable ad.ieclivcs 
on a simple subjectivity classiiicr, and establish lhat 
lhcy arc strong predictors of subjectivity. A novel train- 
able mclhod thai statistically combines two indicators of 
gradability is presented and ewlhlalcd, complementing 
exisling automatic Icchniques for assigning orientation 
labels. 
1 Introduction 
In recent years, computalional tcchniqt,es for the deter- 
mination of &:deal semantic features have been proposed 
and ewdualed. Such features include sense, register, do- 
main spccilicity, pragmatic restrictions on usage, scnlan- 
lic markcdncss, and orientation, as well as automatically 
ictcnlifiecl links between words (e.g., semantic rclalcd- 
hess, syllollynly, antonylny, and tneronymy). Aulomal- 
ically learning features of this type from hugc corpora 
allows the construction or augmentation of lexicons, and 
the assignment of scmanlic htbcls lo words and phrases 
in running text. This information in turn can bc used to 
help dcterlninc addilional features at the It×teal, clause, 
sentence, or document level. 
Tiffs paper explores lira benelits that some lexical fea- 
tures of adjectives offer l'or the prediction of a contexlual 
sentence-level feature, suOjectivity. Subjectivity in nat- 
ural  re\['crs to aspects of  used to ex- 
press opinions and ewfluations. The computatiomtl task 
addressed here is to distinguish sentences used to present 
opinions and other tbrms of subjectivity (suOjective sen- 
tences, e.g., "At several different layers, it's a fascinating 
title") from sentences used to objectively present factual 
information (objective sentences, e.g., "Bell industries 
Inc. increased its quarterly to 10 cents from 7 cents a 
share"). 
Much research in discourse processing has focused 
on task-oriented and insmmtional dialogs. The task ad- 
dressed here comes to the fore in other genres, especially 
news reporting and lnternet lorums, in which opinions 
of various agents are expressed and where subjectivity 
judgements couht help in recognizing inllammatory rues- 
sages ("llanles') and mining online sources for product 
reviews. ()thor (asks for whicll subjectivity recognition 
is potentially very useful include infornmtion extraction 
and information retrieval. Assigning sub.icctivity labels 
to documents or portions of documents is an example of 
non-topical characteri×ation of information. Current in- 
formation extraction and rolricval lechnology focuses al- 
most exclusively on lhe subject matter of the documcnls. 
Yet, additiomtl components of a document inllucncc its 
relevance to imrlicuhu • users or tasks, including, for ex- 
alnple, the evidential slatus el: lhc material presented, and 
attitudes adopted in fawn" or against a lmrticular person, 
event, or posilion (e.g., articles on a presidenlial cam- 
paign wrillen to promote a specific candidate). In sum- 
marization, subjectivity judgmcnls could be included in 
documcllt proiilcs to augment aulomatically produced 
docunacnt summaries, and to hel l) the user make rele- 
vance judgments when using a search engine. 
()thor work on sub.iectivity (Wicbc et al., 1999; Bruce 
and Wicbc, 2000) has established a positive and statisti- 
cally signilicant correlation with the presence of adiec- 
lives. £incc the mere presence of one or iDoi'c adjectives 
is useful for prcdicling (hat a scntcrtce is subjective we 
investigate ill this paper (lie cfl'ccts of additional lcxical 
scmanlic lcalurcs of adjectives that can be automatically 
learned from corpora. We consider two such l%atures: se- 
mantic orientation, which represents an ewdualivc char- 
acterization of a word's deviation from the norm for its 
semantic group (e.g., beauti/'ul is positively oriented, as 
opposed to ugly); and gradability, which characterizes a 
word's ability to express a property in wlrying degrees. 
In lira remainder of this paper, we \[irst address adjec- 
tive orientation in Section 2, summarizing a previously 
published method for automatically separating oriented 
adjectives into positive and negative classes. Then, Sec- 
tion 3 presents a novel method for learning gradablc ad- 
jectives using a largo corpus and a statistical feature com- 
bination naodel. In Section 4, we review earlier exper- 
iments on testing subjectivity using wu'ious fcatt, res as 
predictors, and then present comparative analyses of the 
effects that orientation and gradability have on our abil- 
ity to In'edict sentence subjectivity from adjectives. Wc 
show that both give us higher-quality features for recog- 
nizing st@icctive sentences, and conclude by discussing 
future extensions to Ibis work. 
299 
Ct' 
Number of Number of Average nnmber Ratio o1' average 
adjectives in links in of links for Accuracy 
test set (IAc~l) test set (IL~I) each adjective group frequencies 
730 2,568 7.04 78.08% 1.8699 
516 2,159 8.37 82.56% 1.9235 
369 1,742 9.44 87.26% 1.3486 
236 1,238 10.49 92.37% 1.4040 
Table 1: Evaluation o1' the adjective orientation classification and labeling methods (from (Hatzivassiloglou and McK- 
eown, 1997)). 
2 Semantic Orientation 
The semantic orientation or polarity of a word indicates 
the direction the word deviates fl'om the norm for its se- 
mantic group or lexicalfield (Lehrer, 1974). It is an eval- 
uative characteristic (Battistella, 1990) of the meaning of 
the word which restricts its usage to appropriate prag- 
matic contexts. Words that encode a desirable state (e.g., 
beautiful, unbiased) have a positive orientation, while 
words that represent undesirable states have a negative 
orientation. Within tile particular syntactic class o1' ad- 
jectives, orientation can be expressed as the ability of an 
adjective to ascribe in general a positive or negative qual- 
ity to the modified item, making it better or worse than a 
similar unmodilied item. 
Most antonymous adjectives can be contrasted on 
the basis of orientation (e.g., beautil)d-ugly); similarly, 
nearly synonymous terms are often distinguished by dill 
fcrent orientations (e.g., simple-siml)listic). While ori- 
entation applies to many adjectives, there are also those 
that have no orientation, typically as members of groups 
of complementary, qualitative terms (Lyons, 1977) (e.g., 
domestic, medical, or red). Since orientation is inher- 
ently connected with cwduative judgements, it appears 
to be a promising feature for predicting subjectivity. 
Hatzivassiloglou and McKeown (1997) presented a 
method for autonmtically assigning a + or - orientation 
label to adjectives known to have some semantic orien- 
tation. Their method is based on information extracted 
fi'om conjunctions between adjectives in a large corpusI 
because orientation constrains the use of the words in 
specific contexts (e.g., compare corrupt and brutal with 
*corrupt but brutal), observed conjunctions of adjectives 
can be exploited to inl'er whether the conjoined words 
are of the same or different orientation. Using a shallow 
parser on a 21 million word corpus of Wall Street Jour- 
nal articles, Hatzivassiloglou and McKeown developed 
and trained a log-linear statistical model that predicts 
whether any two adjectives have the same orientation 
with 82% accuracy. The predicted links o1' same or dil L 
ferent orientation are automatically assigned a strength 
value (essentially, a confidence estimate) by tile model, 
and induce a graph that can be partitioned with a clus- 
tering algorithm into components so that all words in the 
same component belong to the same orientation class. 
Once the classes have been determined, fl'equency infor- 
mation is used to assign positive or negative labels to 
each class (there are slightly fewer positive terms, but 
with a significantly higher rate of occurrence than nega- 
tive terms). 
Hatzivassiloglou and McKeown applied their method 
to 1,336 (657 positive and 679 negative) adjectives which 
were all the oriented adjectives appearing in the corpus 
20 times or more. Orientation labels were assigned to 
these adjectives by hand. I Subsequent validation of the 
initial selection and label assignment steps with indepen- 
dent human judges showed an agreement of 89% t'or tile 
first step and 97% for the second step, establishing that 
orientation is a fairly objective semantic property. Be- 
cause the accuracy ol' the method depends on the den- 
sity of conjunctions per adjective, Hatzivassiloglou and 
MeKeown tested separately their algorithm for adjectives 
appearing in at least 2, 3, 4, or 5 conjunctions in the col 
pus; their results are shown in Table I. 
In this paper, we use the model labels assigned by 
hand by Hatzivassiloglou and McKeown, and tile labels 
automatically obtained by their method and reported in 
(Hatzivassiloglou and McKeown, 1997) with the follow- 
ing extension: An adjective that appears in k conjunc- 
tions will receive (possibly different) labels when ana- 
lyzed together with all adjectives appearing in at least 2, 
3 ..... k conjunctions; since performance generally in- 
creases with the number of conjunctions per adjective, 
we select as the orientation label the one assigned by 
the experi,nent t, sing the highest applicable conjunctions 
threshold. Overall, we have labels for 730 adjectives 2, 
with a prediction accuracy of 81.51%. 
3 Gradability 
Gradability (or grading) (Sapir, 1944; Lyons, 1977, p. 
27 I) is the semantic property that enables a word to par- 
ticipate in comparative constructs and to accept mod- 
ifying expressions that act as intensitiers or diminish- 
ers. Gradable adjectives express properties in varying 
degrees ot' strength, relative to a norm either explicitly 
ISome adjectives with unclem; mnbiguous, or conlexl,-dependenl 
orientation were excluded. 
2Those appearing in the corpus in two conjunctions or inore, since 
some conjunction data nlust be left out to h'ain the link prediction algo- 
rithm. 
300 
cold 
Unmodilied by 
grading words 
Moditied by 
grading words 
civil 
Unmodilied by 
grading words 
Modified by 
grading words 
Uninllected 392 20 1,296 1 
Inllected for degree 18 0 0 0 
'litble 2: Extracted wdues of gradability indicators, i.e., frequencies of the word with or without the specitied intlection 
or moditication, for two adjectives, one gradable (cold) and one primarily non-gradable (civil). The frequencies were 
compt, ted li'om the 1987 Wall Street Journal corpus. 
mentioned or implicitly supplied by the modilied noun 
(for example, a small planet is usually much larger thart a 
large house; cf. the distinction between absolute and tel- 
alive adjectives made by Katz (1972, p. 254)). This rel- 
ativism in the interpetation of gradable words indicates 
that gradability is likely to be a good predictor ¢71' subjec- 
tivity. 
3.1 Indicators ofgradability 
Most gradable words appear at least several times in a 
large corpt, s either in forms inflected for degree (i.e., 
comparative and superlative), or in tile context of grading 
modilicrs such as veo,. However, non-gradable words 
may also occasionally appear in such contexts or forms 
under exceptional circumstances. For example, ve O, 
dead can be used tk)r emphasis, and re&let am~ re&let 
(as in "her lhce became redder and redder") can be used 
to indicate a progression of coloring, qb distinguish be- 
tween truly gradablc adjectives and non-gradable adjec- 
tives in these exceptional contexts, we have developed 
a trainable log-linear statistical model that lakes into ac- 
count tile number of times an ad.iective has been observed 
in a form or context indicating gradability relative to the 
number of limes it has been seen in non-gradable con- 
texts. 
We use a shallow parser to retrieve from a large corpus 
tagged for part-of-speech with Church's PARTS tagger 
(Church, 1988) all adjectives and their modifiers. Al- 
though the most common use of an adverb modifying 
an adjective is to function as an intensilier or diminisher 
(Quirk et al., 1985, p. 445), adverbs can also add to tile 
semantic content of the adjectival phrase instead of pro- 
viding a grading effect (e.g., immediately available, po- 
litically vuhmrable), or function as cmphasizers, adding 
to the force o1' tile base adjective and not lo its degree 
(e.g., virtually impossible; compare *re O, impossible). 
Therefore, we compiled by hand a list of 73 adverbs and 
noun phrases (such as a little, exceedingly, somewhat, 
and veo') that are fi'equently used as grading moditicrs. 
The number of times each adjective appears mod ilied by 
a term form this list becomes a first indicator of gradabil- 
ity. 
To detect inflected forms o1' adjectives (which, in 15> 
glish, always indicate gradability st, bject to the excel> 
tions discussed earlier), we have implemented an auto- 
matic lnorphology analysis component. This program 
recognizes several irregular forms (e.g., good-better- 
best) and strips tile grading suffixes -er and -est Dora 
regularly inllected adjectives, producing a list of candi- 
date base forms that if inflected would yield tilt origi- 
nal adjective (e.g., bigger produces three potential forms, 
big, bigg, and bigge). The frequency of these candi- 
date base words is checked against tile corpus, and tile 
form with signilicantly higher frequency is selected. To 
guard against cases of base adjective forms that end in -er 
or-est (e.g., sih,er), the original word is also included 
alllong tile candidates. The total number of times this 
procedure is successfully applied for each adjective be- 
comes a second indicator of gradability. 
3.2 l)etermlnlng gradabillty 
The presence or absence of each of the above two indica- 
tors results in a 2 x 2 frequency table IBr each adjective; 
examples for one gradable and one non-gradable adjec- 
tive are given in "lhble 2. "lb convert lhese four numbers 
to a single decision on tile gradability of tile ad.iective, we 
use a log-linear model. Ix)g-linear models (Nantnef and 
l)ufly, 1989) construct a linear combination (weighted 
sum) of the predictor wlriables 1~, 
i=1 
and relate it to the actual response H. (in this case, 0 for 
non-gmdable and 1 for gradable) via the so-called logis- 
tic trcm,sJbrmation, 
1~- I -t- e'J 
Maximum likelihood estimates for the coefficients fli 
are obtained from training samples for which the correct 
response H, is known, using the iterative reweighted non- 
linear least squares algorithm (Bates and Watts, 1988). 
This statistical model is particularly suited for model- 
ing variables with a "yes"-"no" (binary) value, because, 
unlike linear models, it captures the dependency of IFs 
variance on its mean (Santner and Dully, 1989). 
We normalize the counts for the two indicators of 
g,'adability, and the cot, at ot'joint occurrences of both in- 
tleetion and modilication by grading moditiers, by divid- 
ing with the total frequency of the adjective in the corpus. 
In this manner, we obtain three real-valued predictors 
301 
Classitied as gradable: 
acceptable accurate afraid aware busy careful 
cautious el~eap creative critical dangerous 
different disappointing equal fair fanfiliar far 
favorable formal free frequent good grand 
inadequate intense interesting legitimate likely 
positive professional reasonable rich 
short-term significant slow solid sophisticated 
sound speculative thin tight tough uucertain 
widespread worth 
Classilied as non-gradable: 
additional alleged alternative annual antitrust 
automatic certain criminal cumulative daily 
deputy domestic cldcrly false linaneial 
first-quarter full hefty illegal institutional 
internal legislative long-distance military 
minimum monthly moral national official 
one-time other outstanding present prior 
prospective punitive regional scientific 
secondary sexual subsidiary taxable 
three-nmnth three-year total tremendous 
two-year unfifir unsolicited upper vohmtary 
white wholesale world-wide wrong 
Figure 1: Automatically obtained classification of a 
sample of 100 adjectives as gradable or not. Correct 
decisions (according to the COBU1LD-based reference 
model) are indicated in bold. 
14", i = 1,..., 3 for the log-linear model. We also con- 
sider a modilied model, where any adjective for which 
any occurrence of simultaneous inflection and modilica- 
tion has been detected is automatically labeled gradable; 
the remaining two predictors are used to classify the ad- 
jectives that do not fullill this condition. This modilica- 
tion is motivated by the fact that observing an adjective 
in such a context offers a very high likelihood o1' grad- 
ability. 
3.3 Experimental results 
We extracted from the 1987 Wall Street Journal corpus 
(21 million words) all adjectives with a frequency o1' 300 
or more; this produced a collection of 496 words. Grad- 
ability labels specifying whether each word is gradable 
or not were manually assigned, using tim designations 
of the Collins COBUILD (Collins Birmingham Univer- 
sity International Language Database) dictionary (Sin- 
clair, 1987). COBUILD marks each sense of each adjec- 
tive with one of the labels QUALIT, CLASSIF, or COLOR, 
corresponding to gradable, non-gradable, and color ad- 
jectives. In cases where COBUILD supplies conflicting 
labels for different senses of a word, we either omitted 
that word or, if a sense were predominant, gave it the 
label of that sense. In some cases, the word did not 
appear in COBUILD; these typically were descriptive 
compounds speci\[ic to the domain (e.g., anti-takeover, 
over-the-coullter) and were in most cases marked as non- 
gradable adjectives. Overall, 453 of tile 496 adjeclives 
(91.33%) were assigned gradability labels by hand, while 
the remaining 53 words were discarded because they 
were misclassitied as adjectives by the part-ol:speech 
tagger (e.g., such) or because they coukt not be assigned 
a unique gradability label in accordance with COBUILD. 
Out of these words, 235 (51.88%) were manually classi- 
lied as gradable adjectives, and 218 (48.12%) were clas- 
silied as non-gradablc adjectives. 
Following the methodology of the preceding subsec- 
tion, we recovered the inflection and modilication indica- 
tors for these 453 adjectives, and trained both the unmod- 
ified and modilied log-linear models rcpcatedly, using a 
randomly selected subset ol' 300 adjectives for training 
and 100 adjectives for testing. The entire cycle of se- 
lecting random test and training sets, fitting the model's 
coefficients, making predictions, and evaluating the pre- 
dicted gradability labels is repeated 100 times, to ensure 
that the ewtluation is not affected by a lucky (or unlucky) 
partition of the data between training and test sets. This 
procedure yields over the 453 adjectives gradability clas- 
sifications with an average precision o1' 93.55% and av- 
erage recall o1' 82.24% (in terms of the gradable words 
reported or recovered, respectively). The overall accu- 
racy of the predicted gradability labels is 87.97%. These 
results were obtained with the modified log-linear model, 
which slightly ot, tperformed the model that uses all three 
predictors (in that case, we obtained an average precision 
of 93.86%, average recall ol' 81.70%, and average over- 
all accuracy o1' 87.70%). Figure I lists the gradability 
labels that were automatically assigned to one of the 100 
random test sets ttsing the moditied prediction algorithm. 
We also assigned automatically labels to the entire set of 
453 adjectives, using 4-fold cross-validation (repeatedly 
training on three-fourths of tim 453 adjectives and test- 
ing on the rest). This resulted in precision of 94.15%, 
recall of 82.13%, and accuracy of 88.08% for the entire 
adjective set. 
4 Subjectivity 
The main motivation for the present paper is to examine 
the effect that information about an adjective's semantic 
orientation and gradability has on its probability of oc- 
curring in a subjective sentence (and hence on its quality 
as a subjectivity predictor). We tirst review related work 
on subjectivity recognition and then present our results. 
4.1 Previous work on subjectivity recognition 
In work by Wiebc, Bruce, and O'Hara (Wiebe ct al., 
1999; Bruce and Wicbe, 2000), a corpus of 1,001 sen- 
tences 3 of the Wall Street Journal TreeBank Corpus 
3Conlpoutld sentences were manually segmented into their con- 
juncts, and each conjtmct treated as a scparale sentence. 
302 
(Marcus et al., 1993) was nlanually annotated with sub- 
jeciivity chlssifications. Specifically, each sentence was 
assigned a subjective or objective classitication, accord- 
ing to concensus lags derived by a slalistical analysis of 
lhe chisses assigned by three human judges (see (Wiebe 
et al., 1999) for further infornmtion). The total nulnber 
of subjective sentences in lhe data is 486, and the total 
number of objeclive sentences is 515. 
Bruce and Wiebe (2000) performed a statistical anal- 
ysis of the assigned classitications, linding lhat ac(iec- 
tivcs are statistically signilicantly and positively corre- 
lated with subjective sentences in the corpus on the basis 
(, . The proba- of the log-likelihood ratio test statistic -,2 
bility of a sentence being subjective, simply given din! 
there is at least one adjective in lhe sentellee, is 56%, 
even though there are more objective than subjective sen- 
lences in the corpus. In addition, Bruce and Wicbe iden- 
tiffed a type of adjective that is indicative of subjective 
sentences: those Quirk et al. (1985) term dynamic, which 
"denote qualities that a,'e thoughl to be subjecl to con- 
trol by the possessor" (p. 434). IZxamples are "kind" and 
"careful". Bruce and Wiebe nianually applied synlactic 
tests to identify dynamic adjectives in hall' of the corpus 
nlentioned above. We inclutle such adjectives in the anal- 
ysis below, to assess whether additional lexical seinantic 
features associated with subjectivity hel I ) improve pro- 
dictability. 
Wiebe el al. (1999) developed an automatic system to 
perform st, bjectivily lagging. In 10-fold cross valida- 
lion experiments applied to the corpus described above, 
a probabilislic classilier oblaincd an average accuracy on 
subjectivity lagging of 72.17%, nlorc Ihan 20 perccnlage 
poinls higher than the baseline accuracy obtained by al- 
ways choosing tile nlore frcquent class. A binary feature 
is included for each of lhe lbllowing: lhe presence in lhe 
sentence of a pl'ollotln, an adjective, a cardinal number, 
a modal other fllan will, and an adverb other than #lot. 
They also inchlded a binary feature representing whether 
or not the sentence begins a new lxu'agraph, l:inally, a 
feature was included representing co-occurrence of word 
tokens and punciuation marks with tile sul~jective and ob- 
jective classilicfition. An analysis of the system showed 
that the adjective \['cature was imporlant to realizing the 
inlprovolncnts over lllO baseline accuracy, in this \])apci', 
we use lhe performance of the simple adjcclive fealtu'e as 
a baseline, and identify higher quality adjeclive features 
based on gradability and orienlalion. 
4.2 Orientation and gradability as subjectivity 
predictors: Results 
We measure the precision of a simple prediction method 
for subjectivity: a sonlence is classilicd as subjcclivc il' at 
least one nlonlbor of a set of adjectives N occurs in 1he 
sontonco, alld objeclive otherwise. By wirying 1tlo sot 
(e.g., all adjeclives, only gradable adjectives, only nega- 
tively orienied adjectives, etc.) we call assess the t, seful- 
heSS of ihe additional knowledge for predicting subjec- 
livity. 
For the present study, we use tile set of all adjectives 
automatically identified in tile corpt, s by Wiebc et al. 
(1999) (Section 4.1 ); the set of dynamic adjectives Ill,{Inu- 
ally identified by Bruce and Wiebe (2000) (Section 4.1); 
tile set of scnmntic orientation labels assigned by Hatzi- 
vassiloglou and McKeown (1997), both manually and 
automatically with our extension described in Section 2; 
and the set of gradability labels, both manually and att- 
tomatically assigned according to the revised log-linear 
model of Section 3. We calculate restllts (shown in 'hi- 
ble 3) for each of lhese sets of all adjectives, dynamic, 
oriented and gradable adjectives, as well as for unions 
and intersections of lhose sets. Nole fliat these four sets 
have been extracted l'rom comparable but different cor- 
pora (different years of the Wall Street Journal), therefore 
sometimes adjectives in one corpus may not be present 
in the other corpus, reducing the size of intersection sets. 
Also, for gradability, we worked with a sample set of 100 
adjectives rather than all possible adjectives we could 
automatically calcuhtte gladabiliiy vahles for, since our 
goal in the present work is to measure correlations be- 
tween these sets and sul~jeciivity, rather than building a 
system for predicling subjectivity for as many ac\[iectives 
as possible. 
In Table 3, the second cohmm identifies 8, the set 
of ac\[iective types in question. The third cohimn gives 
the number of subjective sentences that contain one or 
more instances of members of S, and the fourth colunul 
gives lhe same ligure for ol~jective sentences. Therefore 
these two cohinuls together specify lhe coverage of tlm 
subjectivity indicator examined. The lifth cohimn gives 
111c conditional probability that a sentence is subjective. 
givell that (tile of iilorc illstatices of ti/enlbcl+S of +5; ap- 
pears. This is a precishm inetrie that assesses feature 
quality: if inslances of <"7 appear, how likely is the son- 
tence to be subjective? The last two colunuls contrast the 
observed conditional probability with the a priori prob- 
ability of subjective sentellees (i.e., chalice; sixth col- 
ulnn) and with the probability assigned by the baseline 
all-adjectives model (i.e., the lirst row in the table; sev- 
enth colunm). 
The nlost striking aspect of these results is lhat all sets 
involviug dynamic adiectives positive or negative po- 
larity, or gradability are better predictors of sul~jective 
sentenccs than the class of adjectives as a whole, lqve 
of the sets are at least 25 points better (LI4, LI6, L21, 
L23, and L24); four others are at least 20 points better 
(L2, L9, L13, and 1,15); and live others are at least 15 
points better (L4, LI I, 1,18, L20, and 1,22). In most of 
these cases, the difference between these predictors and 
all adjectives is statistically signiticant 4 fit the 5% level or 
less; ahnost all of these predictors offer statistically sig- 
nificantly better than even odds in predicting subjectivity 
correctly. In nlany cases where statistical signilicance 
'Iwe applied a chi-square lest Oll the 2 x 2 cross-classificalion lable 
(Fleiss, 1981). 
303 
LI. 
L2. 
L3. 
L4. 
L5. 
L6. 
L7. 
L8. 
L9. 
L10. 
Lll. 
Ll2. 
LI3. 
LI4. 
LI5. 
L16. 
LI7. 
LI8. 
LI9. 
L20. 
L21. 
L22. 
L23. 
L24. 
Adjeclive Set S # Subj Sents with (s G ,5') + 
Dyn Adjs fq S of L5. 
# Obj Sents l'(Subj Sent I Significance 
with (s G ,5') + (~ e S) +) Against maiority Against all adjs 
All Adjectives 403 321 0.56 0.0041 N/A 
Dynamic Adjectives 92 32 0.74 1.1989 • 1.0 -r 1..6369 - 10 -4 
Pol+, man 138 87 0.61 0.0007 0.1546 
Pol-, man 79 37 0.67 0.0001 0.0158 
Pol+ U Pol-, man 197 114 0.63 6.91.91 • 10 -~ 0.0260 
Grad, man 193 115 0.63 1.9633 • 10 -~ 0.0440 
Not Grad, man 172 147 0.54 0.1084 0.6496 
t'o1+, auto 121 79 0.60 0.0026 0.2537 
Pol-, auto 61 21 0.74 1.1635 • 10 -~ 0.0017 
PoI+ U I'o1--, auto 170 95 0.64 8.5888 - 10 -~ 0.0202 
Grad, auto 30 14 0.68 0.0166 0.1418 
Not Grad, auto 63 51 0.55 0.2079 0.9363 
51 19 0.73 0.0001 0.0081 
8.0397.10 -~ Dyn Adjs 71 S of L6. 39 8 0.83 
l)yn Adjs 71 S of L I0. 50 19 0.72 0.0002 0.0103 
Dyn Adjs 71 S ofLl I 7 2 0.78 0.1582 0.3220 
Grad 71 Pol+, man 90 58 0.61 0.0070 0.2891 
Grad 71 Pol-, man 35 I6 0.69 0.0080 0.09711 
Grad 71 (Pol+ U Pol-), man 119 71 0.63 0.0005 0.1000 
Grad fl Pol+, auto 13 6 0.68 0.1376 0.3833 
Grad n Pol-, auto 2 0 1.00 0.4556 0.5838 
Grad 71 (Pol+ U Pol-), auto 15 6 0.71 0.0636 0.2255 
l)yn Adjs N S o1' L22. 4 0 1.00 0.1203 0.2019 
I)yn Adjs ('1 ,_"; of L19. 24 5 0.83 0.0006 0.0070 
Key: (s G ,5')+: one or more instances of members ofS. Ib/+: positive polarity, lbl-: negalive polarity. 
GtzM: gradable. Dyn: dynamic. Matt: manually identilied. Auto: automalically identified. 
Table 3: Subjectivity prediction results. 
4.3671.. 10 -4 
could not established this is due to small counts, caused 
by the small size of the set of adjectives automatically 
labeled for gradability. 
It is also important to note that, in most cases, 
tile automatically-classified adjectives are comparable 
or better predictors of subjective sentences than the 
manually-assigned ones. Comparing tile automatically 
generated classes with the manually identilied ones, the 
positive polarity set decreases by 1 percentage point (L3 
and L8), while the negative polarity set increases by 7 
points (L4 and L9), and the gradable sot increases by 5 
percentage points (L6 and LI 1). Among the intersection 
sets, in two cases the results are lower for tile computer- 
generated sets (Ll 3/LI 5 and L 14/L 16), but in tile other 4 
eases, the results are higher (LI 7/L20, L 18/L21, L19/L2, 
L24/L23). 
Finally, the table shows that, in most cases, pro- 
dictability improves or at worst remains essentially tile 
same as additional lexical features are considered. For 
tile set of dynamic adjectives, the predictability is 74% 
(L2), and improves in 4 of the 6 cases in which it is in- 
tersected with other sets (LI4, L l6, L23, and L24). For 
the other two (L 13 and LI 5), predictability is only 1 or 2 
points lower (not statistically significant). For the man- 
ually assigned polarity and gradability sets, in one case 
predictability is lower (L17 < L6), but in the other cases 
it remains the same or improves. The results are even 
better for the automatically assigned polarity and grad- 
ability sets: predictability improves when both features 
are considered in all but one case, when predictability 
remains the same (L20 > L8; L21 > L9; L22 > LI0; 
and LI 1 _< L20, L21, and L22). 
5 Conclusion and Future Work 
This paper presents an analysis of different adjective fea- 
tures for predicting subjectivity, showing that tlmy are 
more precise than those previously used for this task. Wc 
establish that lexical semantic features such as seman- 
tic orientation and gradability determine in large part the 
subjectivity status of sentences in which they appear. We 
also present an automatic meflmd for extracting gradabil- 
ity values reliably, complementing earlier work on se- 
mantic orientation and dynamic adjectives. 
In addition to finding more precise features for auto- 
marie subjectivity recognition, this kind of analysis could 
help efforts to encode subjective features in ontologies 
such as those described in (Knight and Luk, 1994; Ma- 
hesh and Nirenburg, 1995; Hovy, 1998). These on- 
tologies are useful for many NLP tasks, such as ma- 
chine translation, word-sense disambiguation, and gen- 
eration. Some subjective features are included in exist- 
ing ontologies (for example, Mikrokosmos (Mahesh and 
304 
Nirenburg, 1995) includes atlitude slots). Our corpus- 
based methods could help in idenlifying more or exlend- 
ing their coverage. 
To be able to use automatic subjectivily recognition 
in texl-processing applications, good ch,cs o1' sub.iccliv- 
ity mttst be found. The features developed in lhis paper 
are not only good clues of subjectivity, lhey can be Men- 
tilied automatically from corpora (see (Hatzivassiloglou 
and McKeown, 1997), and Section 3 in the present pa- 
per). In fact, the results in "Iable 3 show that the pre- 
dictability of the automatically determined gradability 
and polarity sets is better than or at least comparable to 
the predictability of the manually determined sets. Thus, 
tile oriented and gradable adjectives in the particular ap- 
plication genre can be idenlified fo," use in subjectivity 
recognition. 
Ou, efforts in this paper are largely exploratory, aim- 
ing to establish correlations among tim wlrious features 
examined. In related work, we have begun to incorporale 
the features developed herc into systems for recognizing 
flames and mining reviews in lnternel forums, extend- 
ing subjectivity judgments froth the sentence to the doc- 
ument level. In addition, we are seeking ways lo extend 
the orientation and gradability methods so that individual 
word occurrences, rather than word lypes, are character- 
ized as oriented or gradable. We also pla n l{7 incorpo- 
rate the new features presented here in machine learning 
models for tile prediction of subjectivity (e.g., (Wiebe ct 
al., 1999)) and lest lheir interaclions wilh olhcr proposed 
features. 
Acknowledglnents 
This research was SUl~ported in part by the National Sci- 
ence Foundation under grant number IIS-9817434, and 
by |he Of lice of Nawtl Research under grant number 
N00014-95-1-0776. Any opinions, tindings, or recom- 
mendations a,e those of tile authors, and do not neces- 
sarily rellect the views of the above agencies. 

References 

I)ouglas M. Bates and 1)onald G. Watts. 1988. NoMi~> 
ear Regression Analysis and its Applicatiolls. Wiley, 
New York. 

Edwin L. Battistella. 1990. Markedness: 7he Evahiative 
Siq~etwtructure qf'Language. State University of New 
York Press, Albany, New York. 

Rebecca Bruce and ,lanyce Wiebe. 2000. Recognizing 
subjectivity: A case study of rllanual tagging. Natural 
Language E, gineering, 6(2). 

Kenneth W. Church. 1988. A stochastic parts program 
and noun phrase parser for unrestricted text. In Proceedings of the Second Conference of Applied Natural Language Processing (ANLP-88), pages 136-143, 
Austin, Texas, February. Association for Computational Linguistics. 

Joseph 1~. Fleiss. 1981. Statistical Methods for Rates 
and l'mportions. Wiley, New York, 2rid edition. 

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 
1997. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting 
of the ACL and the 8th Conference of the European 
Chapter of the ACL, pages 174-181, Madrid, Spain, 
July. Association of Computational Linguistics. 

Eduard Hovy. 1998. Combining and standardizing 
large-scale practical ontologies for machine translation and other uses. In Proceedings of the 1st International Conference on Language Resources and Evahtalien (LREC), Granada, Spain. 

Jerrold J. Kalz. 1972. Sema,tic Theory. Harper and 
Row, New York. 

Kevin Knight and Steve K. Luk. 1994. Building a large- 
scale knowledge base liw machine h'anslation. Ill Pro- 
ceedi,gs o\[" the 12th Natio,al Co,ference o, Artifi- 
cial l,telli,q,e, ce (AAAI-94), w)lume 1, pages 773-778, 
Sealtlc, Washinglon, July-Augt, st. American Associ- 
ation for Artificial Intelligence. 

Adrienne Lehrer. 1974. Sema, tic l,}'elds and Lexical 
Structztre. North Holland, Amster&tm and New York. 

John I,yons. 1977. Sema,tics, volume 1. Cambridge 
University Press, Cambridge, England. 

K. Mahesh and S. Nirenburg. 1995. A siluated ontol- 
ogy for practical NLP. In Pmceedi,gs of the Work- 
shol~ oil Basic Ontological Issues in Knowledge Shar- 
ing, 14th lntenmtio,al.loi, t Co,ference oil Artificial 
Intelligence (LICAI-95), MontrEal, Canada, Augusl. 

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann 
Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313-330, June. 

I~tandolph Quirk, Sidney Grecnbaum, Geoffrey l,eech, 
alld Jall Svartvik. 1985. A Complvhe,sive Grammar 
el'the English l.cmguage. Longman, London and New 
York. 

Thomas .l. Sanlner and Diane E. l)uffy. 1989. The Statis- 
tical Analysis of Discrete Data. Springer-Verlag, New 
York. 

tTAward Sapir. 1944. ()n grading: A study ill semantics. 
l~hilosol;hy qfScie,ce, 2:93-116. Reprinted in (Sapir, 
1949). 

Edwa,d Sapir. 1949. Selected Wiqtings i, Language, 
Culture and Personality. University of California 
Press, Be,'keley, California. Edited by David G. Mat> 
delbat, m. 

John M. Sinclair (editor in chiet). 1987. Collins 
COBU1LD English Language Dictionary. Collins, 
London. 

J. Wiebe, R. Bruce, and T. O'Hara. 1999. Development and use of a gold standard data set for subjectivity classifications. In Proceedings of the 37th Annual Meeting of the Association for Computational 
Linguistics (ACL-99), pages 246-253, Universily of 
Maryhmd, June. 
