Ordering Among Premodifiers 
James Shaw and Vasileios Hatzivassiloglou 
Department of Computer Science 
Columbia University 
New York, N.Y. 10027, USA 
{shaw, vh}@cs, columbia, edu 
Abstract 
We present a corpus-based study of the se- 
quential ordering among premodifiers in noun 
phrases. This information is important for the 
fluency of generated text in practical appli- 
cations. We propose and evaluate three ap- 
proaches to identify sequential order among pre- 
modifiers: direct evidence, transitive closure, 
and clustering. Our implemented system can 
make over 94% of such ordering decisions cor- 
rectly, as evaluated on a large, previously un- 
seen test corpus. 
1 Introduction 
Sequential ordering among premodifiers affects 
the fluency of text, e.g., "large foreign finan- 
cial firms" or "zero-coupon global bonds" are 
desirable, while "foreign large financial firms" 
or "global zero-coupon bonds" sound odd. The 
difficulties in specifying a consistent ordering of 
adjectives have already been noted by linguists 
\[Whorf 1956; Vendler 1968\]. During the process 
of generating complex sentences by combining 
multiple clauses, there are situations where mul- 
tiple adjectives or nouns modify the same head 
noun. The text generation system must order 
these modifiers in a similar way as domain ex- 
perts use them to ensure fluency of the text. For 
example, the description of the age of a patient 
precedes his ethnicity and gender in medical do- 
main as in % 50 year-old white female patient". 
Yet, general lexicons such as WordNet \[Miller et 
al. 1990\] and COMLEX \[Grishman et al. 1994\], 
do not store such information. 
In this paper, we present automated tech- 
niques for addressing this problem of determin- 
ing, given two premodifiers A and B, the pre- 
ferred ordering between them. Our methods 
rely on and generalize empirical evidence ob- 
tained from large corpora, and are evaluated 
objectively on such corpora. They are informed 
and motivated by our practical need for order- 
ing multiple premodifiers in the MAGIC system 
\[Dalal et al. 1996\]. MAGIC utilizes co-ordinated 
text, speech, and graphics to convey informa- 
tion about a patient's status after coronary by- 
pass surgery; it generates concise but complex 
descriptions that frequently involve four or more 
premodifiers in the same noun phrase. 
To demonstrate that a significant portion of 
noun phrases have multiple premodifiers, we 
extracted all the noun phrases (NPs, exclud- 
ing pronouns) in a two million word corpus of 
medical discharge summaries and a 1.5 million 
word Wall Street Journal (WSJ) corpus (see 
Section 4 for a more detailed description of the 
corpora). In the medical corpus, out of 612,718 
NPs, 12% have multiple premodifiers and 6% 
contain solely multiple adjectival premodifiers. 
In the WSJ corpus, the percentages are a little 
lower, 8% and 2%, respectively. These percent- 
ages imply that one in ten NPs contains mul- 
tiple premodifiers while one in 25 contains just 
multiple adjectives. 
Traditionally, linguists study the premodifier 
ordering problem using a class-based approach. 
Based on a corpus, they propose various se- 
mantic classes, such as color, size, or national- 
ity, and specify a sequential order among the 
classes. However, it is not always clear how 
to map premodifiers to these classes, especially 
in domain-specific applications. This justifies 
the exploration of empirical, corpus-based al- 
ternatives, where the ordering between A and 
B is determined either from direct prior evi- 
dence in the corpus or indirectly through other 
words whose relative order to A and B has al- 
ready been established. The corpus-based ap- 
proach lacks the ontological knowledge used by 
linguists, but uses a much larger amount of di- 
135 
rect evidence, provides answers for many more 
premodifier orderings, and is portable to differ- 
ent domains. 
In the next section, we briefly describe prior 
linguistic research on this topic. Sections 3 and 
4 describe the methodology and corpus used in 
our analysis, while the results of our experi- 
ments are presented in Section 5. In Section 6, 
we demonstrate how we incorporated our or- 
dering results in a general text generation sys- 
tem. Finally, Section 7 discusses possible im- 
provements to our current approach. 
2 Related Work 
The order of adjectives (and, by analogy, nom- 
inal premodifiers) seems to be outside of the 
grammar; it is influenced by factors such as 
polarity \[Malkiel 1959\], scope, and colloca- 
tional restrictions \[Bache 1978\]. Linguists \[Goy- 
vaerts 1968; Vendler 1968; Quirk and Green- 
baum 1973; Bache 1978; Dixon 1982\] have per- 
formed manual analyses of (small) corpora and 
pointed out various tendencies, such as the facts 
that underived adjectives often precede derived 
adjectives, and shorter modifiers precede longer 
ones. Given the difficulty of adequately describ- 
ing all factors that influence the order of pre- 
modifiers, most earlier work is based on plac- 
ing the premodifiers into broad semantic classes, 
and specifying an order among these classes. 
More than ten classes have been proposed, with 
some of them further broken down into sub- 
classes. Though not all these studies agree on 
the details, they demonstrate that there is fairly 
rigid regularity in the ordering of adjectives. 
For example, Goyvaerts \[1968, p. 27\] proposed 
the order quality -< size/length/shape -< 
old/new/young -< color -< nationality -< 
style -< gerund -< denominall; Quirk and 
Greenbaum \[1973, p. 404\] the order general 
-< age -< color -< participle -< provenance 
-< noun -< denominal; and Dixon \[1982, p. 
24\] the order value -< dimension -< physical 
property -< speed -< human propensity -< age 
-< color. 
Researchers have also looked at adjective or- 
dering across languages \[Dixon 1982; Frawley 
1992\]. Frawley \[1992\], for example, observed 
that English, German, Hungarian, Polish, Turk- 
ish, Hindi, Persian, Indonesian, and Basque, all 
1Where A ~ B stands for "A precedes B'. 
order value before size and both of those before 
color. 
As with most manual analyses, the corpora 
used in these analyses are relatively small com- 
pared with modern corpora-based studies. Fur- 
thermore, different criteria were used to ar- 
rive at the classes. To illustrate, the adjec- 
tive "beautiful" can be classified into at least 
two different classes because the phrase "beau- 
tiful dancer" can be transformed from either the 
phrase "dancer who is beautiful", or "dancer 
who dances beautifully". 
Several deep semantic features have been pro- 
posed to explain the regularity among the po- 
sitional behavior of adjectives. Teyssier \[1968\] 
first proposed that adjectival functions, i.e. 
identification, characterization, and classifica- 
tion, affect adjective order. Martin \[1970\] car- 
ried out psycholinguistic studies of adjective 
ordering. Frawley \[1992\] extended the work 
by Kamp \[1975\] and proposed that intensional 
modifiers precede extensional ones. However, 
while these studies offer insights at the complex 
phenomenon of adjective ordering, they cannot 
be directly mapped to a computational proce- 
dure. 
On the other hand, recent computational 
work on sentence planning \[Bateman et al. 
1998; Shaw 1998b\] indicates that generation re- 
search has progressed to a point where hard 
problems such as ellipsis, conjunctions, and or- 
dering of paradigmatically related constituents 
are addressed. Computational corpus stud- 
ies related to adjectives were performed by 
\[Justeson and Katz 1991; Hatzivassiloglou and 
McKeown 1993; Hatzivassiloglou and McKeown 
1997\], but none was directly on the ordering 
problem. \[Knight and Hatzivassiloglou 1995\] 
and \[Langkilde and Knight 1998\] have proposed 
models for incorporating statistical information 
into a text generation system, an approach that 
is similar to our way of using the evidence ob- 
tained from corpus in our actual generator. 
3 Methodology 
In this section, we discuss how we obtain the 
premodifier sequences from the corpus for anal- 
ysis and the three approaches we use for estab- 
lishing ordering relationships: direct corpus ev- 
idence, transitive closure, and clustering analy- 
sis. The result of our analysis is embodied in a 
136 
function, compute_order(A, B), which returns 
the sequential ordering between two premodi- 
tiers, word A and word B. 
To identify orderings among premodifiers, 
premodifier sequences are extracted from sim- 
plex NPs. A simplex NP is a maximal noun 
phrase that includes premodifiers such as de- 
terminers and possessives but not post-nominal 
constituents such as prepositional phrases or 
relative clauses. We use a part-of-speech tag- 
ger \[Brill 1992\] and a finite-state grammar to 
extract simplex NPs. The noun phrases we ex- 
tract start with an optional determiner (DT) or 
possessive pronoun (PRP$), followed by a se- 
quence of cardinal numbers (CDs), adjectives 
(JJs), nouns (NNs), and end with a noun. We 
include cardinal numbers in NPs to capture the 
ordering of numerical information such as age 
and amounts. Gerunds (tagged as VBG) or past 
participles (tagged as VBN), such as "heated" 
in "heated debate", are considered as adjectives 
if the word in front of them is a determiner, 
possessive pronoun, or adjective, thus separat- 
ing adjectival and verbal forms that are con- 
flared by the tagger. A morphology module 
transforms plural nouns and comparative and 
superlative adjectives into their base forms to 
ensure maximization of our frequency counts. 
There is a regular expression filter which re- 
moves obvious concatenations of simplex NPs 
such as "takeover bid last week" and "Tylenol 
40 milligrams". 
After simplex NPs are extracted, sequences 
of premodifiers are obtained by dropping deter- 
miners, genitives, cardinal numbers and head 
nouns. Our subsequent analysis operates on the 
resulting premodifier sequences, and involves 
three stages: direct evidence, transitive closure, 
and clustering. We describe each stage in more 
detail in the following subsections. 
3.1 Direct Evidence 
Our analysis proceeds on the hypothesis that 
the relative order of two premodifiers is fixed 
and independent of context. Given two premod- 
ifiers A and B, there are three possible under- 
lying orderings, and our system should strive 
to find which is true in this particular case: ei- 
ther A comes before B, B comes before A, or 
the order between A and B is truly unimpor- 
tant. Our first stage relies on frequency data 
collected from a training corpus to predict the 
order of adjective and noun premodifiers in an 
unseen test corpus. 
To collect direct evidence on the order of 
premodifiers, we extract all the premodifiers 
from the corpus as described in the previous 
subsection. We first transform the premodi- 
tier sequences into ordered pairs. For example, 
the phrase "well-known traditional brand-name 
drug" has three ordered pairs, "well-known -< 
traditional", "well-known -~ brand-name", and 
"traditional -~ brand-name". A phrase with n 
premodifiers will have (~) ordered pairs. From 
these ordered pairs, we construct a w x w matrix 
Count, where w the number of distinct modi- 
fiers. The cell \[A, B\] in this matrix represents 
the number of occurrences of the pair "A -~ B", 
in that order, in the corpus. 
Assuming that there is a preferred ordering 
between premodifiers A and B, one of the cells 
Count\[A,B\] and Count\[B,A\] should be much 
larger than the other, at least if the corpus be- 
comes arbitrarily large. However, given a corpus 
of a fixed size there will be many cases where 
the frequency counts will both be small. This 
data sparseness problem is exacerbated by the 
inevitable occurrence of errors during the data 
extraction process, which will introduce some 
spurious pairs (and orderings) of premodifiers. 
We therefore apply probabilistic reasoning to 
determine when the data is strong enough to 
decide that A -~ B or B -~ A. Under the null 
hypothesis that the two premoditiers order is ar- 
bitrary, the number of times we have seen one of 
them follows the binomial distribution with pa- 
rameter p -- 0.5. The probability that we would 
see the actually observed number of cases with 
A ~ B, say m, among n pairs involving A and 
B is 
k----m 
which for the special case p = 0.5 becomes 
(0 (0 k=m k=rn 
If this probability is low, we reject the null hy- 
pothesis and conclude that A indeed precedes 
(or follows, as indicated by the relative frequen- 
cies) B. 
137 
3.2 Transitivity 
As we mentioned before, sparse data is a seri- 
ous problem in our analysis. For example, the 
matrix of frequencies for adjectives in our train- 
ing corpus from the medical domain is 99.8% 
empty--only 9,106 entries in the 2,232 x 2,232 
matrix contain non-zero values. To compen- 
sate for this problem, we explore the transi- 
tive properties between ordered pairs by com- 
puting the transitive closure of the ordering re- 
lation. Utilizing transitivity information corre- 
sponds to making the inference that A -< C fol- 
lows from A -~ B and B -< C, even if we have no 
direct evidence for the pair (A, C) but provided 
that there is no contradictory evidence to this 
inference either. This approach allows us to fill 
from 15% (WSJ) to 30% (medical corpus) of the 
entries in the matrix. 
To compute the transitive closure of the order 
relation, we map our underlying data to special 
cases of commutative semirings \[Pereira and Ri- 
ley 1997\]. Each word is represented as a node of 
a graph, while arcs between nodes correspond to 
ordering relationships and are labeled with ele- 
ments from the chosen semiring. This formal- 
ism can be used for a variety of problems, us- 
ing appropriate definitions of the two binary op- 
erators (collection and extension) that operate 
on the semiring's elements. For example, the 
all-pairs shortest-paths problem in graph the- 
ory can be formulated in a rain-plus semiring 
over the real numbers with the operators rain 
for collection and + for extension. Similarly, 
finding the transitive closure of a binary relation 
can be formulated in a max-rain semi-ring or a 
or-and semiring over the set {0, 1}. Once the 
proper operators have been chosen, the generic 
Floyd-Warshall algorithm \[Aho et al. 1974\] can 
solve the corresponding problem without modi- 
fications. 
We explored three semirings appropriate to 
our problem. First, we apply the statistical de- 
cision procedure of the previous subsection and 
assign to each pair of premodifiers either 0 (if 
we don't have enough information about their 
preferred ordering) or 1 (if we do). Then we use 
the or-and semiring over the {0,1} set; in the 
transitive closure, the ordering A -~ B will be 
present if at least one path connecting A and B 
via ordered pairs exists. Note that it is possible 
for both A -~ B and B -~ A to be present in the 
transitive closure. 
This model involves conversions of the corpus 
evidence for each pair into hard decisions on 
whether one of the words in the pair precedes 
the other. To avoid such early commitments, 
we use a second, refined model for transitive 
closure where the arc from A to B is labeled 
with the probability that A precedes indeed B. 
The natural extension of the ({0, 1}, or, and) 
semiring when the set of labels is replaced with 
the interval \[0, 1\] is then (\[0, 1\], max, rain). 
We estimate the probability that A precedes B 
as one minus the probability of reaching that 
conclusion in error, according to the statistical 
test of the previous subsection (i.e., one minus 
the sum specified in equation (2). We obtained 
similar results with this estimator and with the 
maximal likelihood estimator (the ratio of the 
number of times A appeared before B to the 
total number of pairs involving A and B). 
Finally, we consider a third model in which 
we explore an alternative to transitive closure. 
Rather than treating the number attached to 
each arc as a probability, we treat it as a cost, 
the cost of erroneously assuming that the corre- 
sponding ordering exists. We assign to an edge 
(A, B) the negative logarithm of the probability 
that A precedes B; probabilities are estimated 
as in the previous paragraph. Then our prob- 
lem becomes identical to the all-pairs shortest- 
path problem in graph theory; the correspond- 
ing semiring is ((0, +c~), rain, +). We use log- 
arithms to address computational precision is- 
sues stemming from the multiplication of small 
probabilities, and negate the logarithms so that 
we cast the problem as a minimization task (i.e., 
we find the path in the graph the minimizes 
the total sum of negative log probabilities, and 
therefore maximizes the product of the original 
probabilities). 
3.3 Clustering 
As noted earlier, earlier linguistic work on 
the ordering problem puts words into seman- 
tic classes and generalizes the task from order- 
ing between specific words to ordering the cor- 
responding classes. We follow a similar, but 
evidence-based, approach for the pairs of words 
that neither direct evidence nor transitivity can 
resolve. We compute an order similarity mea- 
sure between any two premodifiers, reflecting 
whether the two words share the same pat- 
138 
tern of relative order with other premodifiers 
for which we have sufficient evidence. For each 
pair of premodifiers A and B, we examine ev- 
ery other premodifier in the corpus, X; if both 
A -~ X and B -~ X, or both A ~- X and B ~- X, 
one point is added to the similarity score be- 
tween A and B. If on the other hand A -~ X and 
B ~- X, or A ~- X and B -~ X, one point is sub- 
tracted. X does not contribute to the similarity 
score if there is not sufficient prior evidence for 
the relative order of X and A, or of X and B. 
This procedure closely parallels non-parametric 
distributional tests such as Kendall's T \[Kendall 
1938\]. 
The similarity scores are then converted into 
dissimilarities and fed into a non-hierarchical 
clustering algorithm \[Sp~th 1985\], which sep- 
arates the premodifiers in groups. This is 
achieved by minimizing an objective function, 
defined as the sum of within-group dissimilari- 
ties over all groups. In this manner, premodi- 
tiers that are closely similar in terms of sharing 
the same relative order with other premodifiers 
are placed in the same group. 
Once classes of premodifiers have been in- 
duced, we examine every pair of classes and de- 
cide which precedes the other. For two classes 
C1 and C2, we extract all pairs of premodifiers 
(x, y) with x E C1 and y E C2. If we have evi- 
dence (either direct or through transitivity) that 
x -~ y, one point is added in favor of C1 -~ C2; 
similarly, one point is subtracted if x ~- y. After 
all such pairs have been considered, we can then 
predict the relative order between words in the 
two clusters which we haven't seen together ear- 
lier. This method makes (weak) predictions for 
any pair (A, B) of words, except if (a) both A 
and B axe placed in the same cluster; (b) no or- 
dered pairs (x, y) with one element in the class 
of A and one in the class of B have been identi- 
fied; or (c) the evidence for one class preceding 
the other is in the aggregate equally strong in 
both directions. 
4 The Corpus 
We used two corpora for our analysis: hospi- 
tal discharge summaries from 1991 to 1997 from 
the Columbia-Presbyterian Medical Center, and 
the January 1996 part of the Wall Street Jour- 
nal corpus from the Penn TreeBank \[Marcus et 
al. 1993\]. To facilitate comparisons across the 
two corpora, we intentionally limited ourselves 
to only one month of the WSJ corpus, so that 
approximately the same amount of data would 
be examined in each case. The text in each cor- 
pus is divided into a training part (2.3 million 
words for the medical corpus and 1.5 million 
words for the WSJ) and a test part (1.2 million 
words for the medical corpus and 1.6 million 
words for the WSJ). 
All domain-specific markup was removed, and 
the text was processed by the MXTERMINATOR 
sentence boundary detector \[Reynar and Rat- 
naparkhi 1997\] and Brill's part-of-speech tag- 
ger \[Brill 1992\]. Noun phrases and pairs of pre- 
modifiers were extracted from the tagged corpus 
according to the methods of Section 3. From 
the medical corpus, we retrieved 934,823 sim- 
plex NPs, of which 115,411 have multiple pre- 
modifiers and 53,235 multiple adjectives only. 
The corresponding numbers for the WSJ cor- 
pus were 839,921 NPs, 68,153 NPs with multiple 
premodifiers, and 16,325 NPs with just multiple 
adjectives. 
We separately analyze two groups of premodi- 
tiers: adjectives, and adjectives plus nouns mod- 
ifying the head noun. Although our techniques 
are identical in both cases, the division is moti- 
vated by our expectation that the task will be 
easier when modifiers are limited to adjectives, 
because nouns tend to be harder to match cor- 
rectly with our finite-state grammar and the in- 
put data is sparser for nouns. 
5 Results 
We applied the three ordering algorithms pro- 
posed in this paper to the two corpora sepa- 
rately for adjectives and adjectives plus nouns. 
For our first technique of directly using evidence 
from a separate training corpus, we filled the 
Count matrix (see Section 3.1) with the fre- 
quencies of each ordering for each pair of pre- 
modifiers using the training corpora. Then, we 
calculated which of those pairs correspond to a 
true underlying order relation, i.e., pass the sta- 
tistical test of Section 3.1 with the probability 
given by equation (2) less than or equal to 50%. 
We then examined each instance of ordered pre- 
modifiers in the corresponding test corpus, and 
counted how many of those the direct evidence 
method could predict correctly. Note that if A 
and B occur sometimes as A -~ B and some- 
139 
Corpus Test pairs 
Medical/ adjectives 27,670 
Financial/ adjectives 9,925 
Medical/ 
adjectives 74,664 
and nouns 
Financial/ 
adjectives 62,383 
and nouns 
Direct evidence Transitivity Transitivity (maxomin) (min-plus) 
92.67% (88.20%-98.47%) 89.60% (94.94%-91.79%) 94.93% (97.20%-96.16%) 
75.41% (53.85%-98.37%) 79.92% (72.76%-90.79%) 80.77% (76.36%-90.18%) 
88.79% (80.38%-98.35%) 87.69% (90.86%-91.50%) 90.67% (91.90%-94.27%) 
65.93% (35.76%-95.27%) 69.61% (56.63%-84.51%) 71.04% (62.48%-83.55%) 
Table 1: Accuracy of direct-evidence and transitivity methods on different data strata of our test 
corpora. In each case, overall accuracy is listed first in bold, and then, in parentheses, the percentage 
of the test pairs that the method has an opinion for (rather than randomly assign a decision because 
of lack of evidence) and the accuracy of the method within that subset of test cases. 
times as B -< A, no prediction method can get 
all those instances correct. We elected to follow 
this evaluation approach, which lowers the ap- 
parent scores of our method, rather than forcing 
each pair in the test corpus to one unambiguous 
category (A -< B, B -< A, or arbitrary). 
Under this evaluation method, stage one of 
our system achieves on adjectives in the medi- 
cal domain 98.47% correct decisions on pairs for 
which a determination of order could be made. 
Since 11.80% of the total pairs in the test corpus 
involve previously unseen combinations of ad- 
jectives and/or new adjectives, the overall accu- 
racy is 92.67%. The corresponding accuracy on 
data for which we can make a prediction and the 
overall accuracy is 98.35% and 88.79% for adjec- 
tives plus nouns in the medical domain, 98.37% 
and 75.41% for adjectives in the WSJ data, and 
95.27% and 65.93% for adjectives plus nouns in 
the WSJ data. Note that the WSJ corpus is 
considerably more sparse, with 64.24% unseen 
combinations of adjective and noun premodi- 
tiers in the test part. Using lower thresholds 
in equation (2) results in a lower percentage of 
cases for which the system has an opinion but a 
higher accuracy for those decisions. For exam- 
ple, a threshold of 25% results in the ability to 
predict 83.72% of the test adjective pairs in the 
medical corpus with 99.01% accuracy for these 
cases. 
We subsequently applied the transitivity 
stage, testing the three semiring models dis- 
cussed in Section 3.2. Early experimentation 
indicated that the or-and model performed 
poorly, which we attribute to the extensive 
propagation of decisions (once a decision in fa- 
vor of the existence of an ordering relationship is 
made, it cannot be revised even in the presence 
of conflicting evidence). Therefore we report re- 
sults below for the other two semiring models. 
Of those, the min-plus semiring achieved higher 
performance. That model offers additional pre- 
dictions for 9.00% of adjective pairs and 11.52% 
of adjective-plus-noun pairs in the medical cor- 
pus, raising overall accuracy of our predictions 
to 94.93% and 90.67% respectively. Overall ac- 
curacy in the WSJ test data was 80.77% for ad- 
jectives and 71.04% for adjectives plus nouns. 
Table 1 summarizes the results of these two 
stages. 
Finally, we applied our third, clustering ap- 
proach on each data stratum. Due to data 
sparseness and computational complexity is- 
sues, we clustered the most frequent words in 
each set of premodifiers (adjectives or adjectives 
plus nouns), selecting those that occurred at 
least 50 times in the training part of the cor- 
pus being analyzed. We report results for the 
adjectives selected in this manner (472 frequent 
adjectives from the medical corpus and 307 ad- 
jectives from the WSJ corpus). For these words, 
the information collected by the first two stages 
of the system covers most pairs. Out of the 
111,176 (=472.471/2) possible pairs in the med- 
ical data, the direct evidence and transitivity 
stages make predictions for 105,335 (94.76%); 
the corresponding number for the WSJ data is 
40,476 out of 46,971 possible pairs (86.17%). 
140 
The clustering technique makes ordering pre- 
dictions for a part of the remaining pairs--on 
average, depending on how many clusters are 
created, this method produces answers for 80% 
of the ordering cases that remained unanswered 
after the first two stages in the medical corpus, 
and for 54% of the unanswered cases in the WSJ 
corpus. Its accuracy on these predictions is 56% 
on the medical corpus, and slightly worse than 
the baseline 50% on the WSJ corpus; this lat- 
ter, aberrant result is due to a single, very fie- 
quent pair, chief executive, in which executive 
is consistently mistagged as an adjective by the 
part-of-speech tagger. 
Qualitative analysis of the third stage's out- 
put indicates that it identifies many interest- 
ing relationships between premodifiers; for ex- 
ample, the pair of most similar premodifiers on 
the basis of positional information is left and 
right, which clearly fall in a class similar to the 
semantic classes manually constructed by lin- 
guists. Other sets of adjectives with strongly 
similar members include {mild, severe, signifi- 
cant} and {cardiac, pulmonary, respiratory}. 
We conclude our empirical analysis by test- 
ing whether a separate model is needed for pre- 
dicting adjective order in each different domain. 
We trained the first two stages of our system 
on the medical corpus and tested them on the 
WSJ corpus, obtaining an overall prediction ac- 
curacy of 54% for adjectives and 52% for adjec- 
rives plus nouns. Similar results were obtained 
when we trained on the financial domain and 
tested on medical data (58% and 56%). These 
results are not much better than what would 
have been obtained by chance, and are clearly 
inferior to those reported in Table 1. Although 
the two corpora share a large number of ad- 
jectives (1,438 out of 5,703 total adjectives in 
the medical corpus and 8,240 in the WSJ cor- 
pus), they share only 2 to 5% of the adjective 
pairs. This empirical evidence indicates that ad- 
jectives are used differently in the two domains, 
and hence domain-specific probabilities must be 
estimated, which increases the value of an au- 
tomated procedure for the prediction task. 
6 Using Ordered Premodifiers in 
Text Generation 
Extracting sequential ordering information of 
premodifiers is an off-line process, the results of 
(a) "John is a diabetic male white 74- 
year-old hypertensive patient 
with a red swollen mass in the 
left groin." 
(b) "John is a 74-year-old 
hypertensive diabetic white male 
patient with a swollen red mass 
in the left groin." 
Figure 1: (a) Output of the generator without 
our ordering module, containing several errors. 
(b) Output of the generator with our ordering 
module. 
which can be easily incorporated into the over- 
all generation architecture. We have integrated 
the function compute_order(A, B) into our mul- 
timedia presentation system MAGIC \[Dalai et 
al. 1996\] in the medical domain and resolved 
numerous premodifier ordering tasks correctly. 
Example cases where the statistical prediction 
module was helpful in producing a more fluent 
description in MAGIC include placing age infor- 
mation before ethnicity information and the lat- 
ter before gender information, as well as spe- 
cific ordering preferences, such as "thick" before 
"yellow" and "acute" before "severe". MAGIC'S 
output is being evaluated by medical doctors, 
who provide us with feedback on different com- 
ponents of the system, including the fluency of 
the generated text and its similarity to human- 
produced reports. 
Lexicalization is inherently domain depen- 
dent, so traditional lexica cannot be ported 
across domains without major modifications. 
Our approach, in contrast, is based on words 
extracted from a domain corpus and not on 
concepts, therefore it can be easily applied to 
new domains. In our MAGIC system, aggre- 
gation operators, such as conjunction, ellip- 
sis, and transformations of clauses to adjectival 
phrases and relative clauses, are performed to 
combine related clauses together and increase 
conciseness \[Shaw 1998a; Shaw 1998b\]. We 
wrote a function, reorder_premod(... ), which is 
called after the aggregation operators, takes the 
whole lexicalized semantic representation, and 
reorders the premodifiers right before the lin- 
guistic realizer is invoked. Figure i shows the 
difference in the output produced by our gener- 
141 
ator with and without the ordering component. 
7 Conclusions and Future Work 
We have presented three techniques for explor- 
ing prior corpus evidence in predicting the order 
of premodifiers within noun phrases. Our meth- 
ods expand on observable data, by inferring 
new relationships between premodifiers even for 
combinations of premodifiers that do not occur 
in the training corpus. We have empirically val- 
idated our approach, showing that we can pre- 
dict order with more than 94% accuracy when 
enough corpus data is available. We have also 
implemented our procedure in a text generator, 
producing more fluent output sentences. 
We are currently exploring alternative ways 
to integrate the classes constructed by the third 
stage of our system into our generator. In 
the future, we will experiment with semantic 
(rather than positional) clustering of premodi- 
tiers, using techniques such as those proposed in 
\[Hatzivassiloglou and McKeown 1993; Pereira et 
al. 1993\]. The qualitative analysis of the output 
of our clustering module shows that frequently 
positional and semantic classes overlap, and we 
are interested in measuring the extent of this 
phenomenon quantitatively. Conditioning the 
premodifier ordering on the head noun is an- 
other promising approach, at least for very fre- 
quent nouns. 
8 Acknowledgments 
We are grateful to Kathy McKeown for numer- 
ous discussions during the development of this 
work. The research is supported in part by 
the National Library of Medicine under grant 
R01-LM06593-01 and the Columbia University 
Center for Advanced Technology in High Per- 
formance Computing and Communications in 
Healthcaxe (funded by the New York State Sci- 
ence and Technology Foundation). Any opin- 
ions, findings, or recommendations expressed in 
this paper are those of the authors and do not 
necessarily reflect the views of the above agen- 
cies. 

References 
Alfred V. Aho, John E. Hopcroft, and Jeffrey D. 
Ullman. The Design and Analysis of Com- 
puter Algorithms. Addison-Wesley, Reading, 
Massachusetts, 1974. 
Carl Bache: The Order of Premodifying Adjec- 
tives in Present-Day English. Odense Univer- 
sity Press, 1978. 
John A. Bateman; Thomas Kamps, Jorg Kleinz, 
and Klaus Reichenberger. Communicative 
Goal-Driven NL Generation and Data-Driven 
Graphics Generation: An ArchitecturM Syn- 
thesis for Multimedia Page Generation. In 
Proceedings of the 9th International Work- 
shop on Natural Language Generation., pages 
8-17, 1998. 
Eric Brill. A Simple Rule-Based Part of Speech 
Tagger. In Proceedings of the Third Confer- 
ence on Applied Natural Language Process- 
ing, Trento, Italy, 1992. Association for Com- 
putational Linguistics. 
Mukesh Dalal, Steven K. Feiner, Kathleen R. 
McKeown, Desmond A. Jordan, Barry Allen, 
and Yasser al Safadi. MAGIC: An Exper- 
imental System for Generating Multimedia 
Briefings about Post-Bypass Patient Status. 
In Proceedings of the 1996 Annual Fall Sym- 
posium of the American Medical Informat- 
ics Association (AMIA-96), pages 684-688, 
Washington, D.C., October 26-30 1996. 
R. M. W. Dixon. Where Have All the Adjectives 
Gone? Mouton, New York, 1982. 
William Frawley. Linguistic Semantics. 
Lawrence Erlbaum Associates, Hillsdale, New 
Jersey, 1992. 
D. L. Goyvaerts. An Introductory Study on the 
Ordering of a String of Adjectives in Present- 
Day English. Philologica Pragensia, 11:12- 
28, 1968. 
Ralph Grishman, Catherine Macleod, and 
Adam Meyers. COMLEX Syntax: Building 
a Computational Lexicon. In Proceedings of 
the 15th International Conference on Com- 
putational Linguistics (COLING-9~), Kyoto, 
Japan, 1994. 
Vasileios Hatzivassiloglou and Kathleen McKe- 
own. Towards the Automatic Identification of 
Adjectival Scales: Clustering Adjectives Ac- 
cording to Meaning. In Proceedings of the 
31st Annual Meeting of the ACL, pages 172- 
142 
182, Columbus, Ohio, June 1993. Association 
for Computational Linguistics. 
Vasileios Hatzivassiloglou and Kathleen McKe. 
own. Predicting the Semantic Orientation of 
Adjectives. In Proceedings of the 35th Annual 
Meeting of the A CL, pages 174-181, Madrid, 
Spain, July 1997. Association for Computa- 
tional Linguistics. 
John S. Justeson and Slava M. Katz. Co- 
occurrences of Antonymous Adjectives and 
Their Contexts. Computational Linguistics, 
17(1):1-19, 1991. 
J. A. W. Kamp. Two Theories of Adjectives. 
In E. L. Keenan, editor, Formal Semantics 
of Natural Language. Cambridge University 
Press, Cambridge, England, 1975. 
Maurice G. Kendall. A New Measure of 
Rank Correlation. Biometrika, 30(1-2):81- 
93, June 1938. 
Kevin Knight and Vasileios Hatzivassiloglou. 
Two-Level, Many-Paths Generation. In Pro- 
ceedings of the 33rd Annual Meeting of the 
A CL, pages 252-260, Boston, Massachusetts, 
June 1995. Association for Computational 
Linguistics. 
Irene Langkilde and Kevin Knight. Genera- 
tion that Exploits Corpus-Based Statistical 
Knowledge. In Proceedings of the 36th An- 
nual Meeting of the A CL and the 17th Inter- 
national Conference on Computational Lin- 
guistics (ACL//COLING-98), pages 704-710, 
Montreal, Canada, 1998. 
Yakov Malkiel. Studies in Irreversible Bino- 
mials. Lingua, 8(2):113-160, May 1959. 
Reprinted in \[Malkiel 1968\]. 
Yakov Malkiel. Essays on Linguistic Themes. 
Blackwell, Oxford, 1968. 
Mitchell P. Marcus, Beatrice Santorini, and 
Mary Ann Marcinkiewicz. Building a large 
annotated corpus of English: The Penn Tree- 
bank. Computational Linguistics, 19:313- 
330, 1993. 
J. E. Martin. Adjective Order and Juncture. 
Journal of Verbal Learning and Verbal Behav- 
ior, 9:379-384, 1970. 
George A. Miller, Richard Beckwith, Christiane 
Fellbaum, Derek Gross, and Katherine J. 
Miller. Introduction to WordNet: An On- 
Line LexicM Database. International Journal 
of Lexicography (special issue), 3(4):235-312, 
1990. 
Fernando C. N. Pereira and Michael D. Ri- 
ley. Speech Recognition by Composition of 
Weighted Finite Automata. In Emmanuel 
Roche and Yves Schabes, editors, Finite- 
State Language Processing, pages 431-453. 
MIT Press, Cambridge, Massachusetts, 1997. 
Fernando Pereira, Naftali Tishby, and Lillian 
Lee. Distributional Clustering of English 
Words. In Proceedings of the 31st Annual 
Meeting of the ACL, pages 183-190, Colum- 
bus, Ohio, June 1993. Association for Com- 
putational Linguistics. 
Randolph Quirk and Sidney Greenbaum. A 
Concise Grammar of Contemporary English. 
Harcourt Brace Jovanovich, Inc., London, 
1973. 
Jeffrey C. Reynar and Adwait Ratnaparkhi. A 
Maximum Entropy Approach to Identifying 
Sentence Boundaries. In Proc. of the 5th Ap- 
plied Natural Language Conference (ANLP- 
97), Washington, D.C., April 1997. 
James Shaw. Clause Aggregation Using Lin- 
guistic Knowledge. In Proceedings of the 9th 
International Workshop on Natural Language 
Generation., pages 138-147, 1998. 
James Shaw. Segregatory Coordination and El- 
lipsis in Text Generation. In Proceedings of 
the 36th Annual Meeting of the ACL and the 
17th International Conference on Computa- 
tional Linguistics (A CL/COLING-98), pages 
1220-1226, Montreal, Canada, 1998. 
Helmuth Sp~th. Cluster Dissection and Anal- 
ysis: Theory, FORTRAN Programs, Exam- 
ples. Ellis Horwood, Chichester, England, 
1985. 
J. Teyssier. Notes on the Syntax of the Adjec- 
tive in Modern English. Behavioral Science, 
20:225-249, 1968. 
Zeno Vendler. Adjectives and Nominalizations. 
Mouton and Co., The Netherlands, 1968. 
Benjamin Lee Whorf. Language, Thought, and 
Reality; Selected Writings. MIT Press, Cam- 
bridge, Massachusetts, 1956. 
