AUTOMATICALLY EXTRACTING AND REPRESENTING 
COLLOCATIONS FOR LANGUAGE GENERATION* 
Frank A. Smadja t 
and 
Kathleen R. McKeown 
Department of Computer Science 
Columbia University 
New York, NY 10027 
ABSTRACT 
Collocational knowledge is necessary for language gener- 
ation. The problem is that collocations come in a large 
variety of forms. They can involve two, three or more 
words, these words can be of different syntactic cate- 
gories and they can be involved in more or less rigid 
ways. This leads to two main difficulties: collocational 
knowledge has to be acquired and it must be represented 
flexibly so that it can be used for language generation. 
We address both problems in this paper, focusing on the 
acquisition problem. We describe a program, Xtract, 
that automatically acquires a range of collocations from 
large textual corpora and we describe how they can be 
represented in a flexible lexicon using a unification based 
formalism. 
1 INTRODUCTION 
Language generation research on lexical choice has fo- 
cused on syntactic and semantic constraints on word 
choice and word ordering. Colloca~ional constraints, 
however, also play a role in how words can co-occur in 
the same sentence. Often, the use of one word in a par- 
ticular context of meaning will require the use of one or 
more other words in the same sentence. While phrasal 
lexicons, in which lexical associations are pre-encoded (e.g., 
\[Kukich 83\], \[Jacobs 85\], \[Danlos 87\]), allow for the 
treatment of certain types of collocations, they also have 
problems. Phrasal entries must be compiled by hand 
which is both expensive and incomplete. Furthermore, 
phrasal entries tend to capture rather rigid, idiomatic 
expressions. In contrast, collocations vary tremendously 
in the number of words involved, in the syntactic cat- 
egories of the words, in the syntactic relations between 
the words, and in how rigidly the individual words are 
used together. For example, in some cases, the words of 
a collocation must be adjacent, while in others they can 
be separated by a varying number of other words. 
*The research reported in this paper was partially sup- 
ported by DARPA grant N00039-84-C-0165, by NSF grant 
IRT-84-51438 and by ONR grant N00014-89-J-1782. 
tMost of this work is also done in collaboration with Bell 
Communication Research, 445 South Street, Morristown, NJ 
07960-1910 
In this paper, we identify a range of collocations that 
are necessary for language generation, including open 
compounds of two or more words, predicative relations (e.g., subject-verb), 
and phrasal templates represent- 
ing more idiomatic expressions. We then describe how 
Xtract automatically acquires the full range of colloca- 
tions using a two stage statistical analysis of large do- 
main specific corpora. Finally, we show how collocations 
can be efficiently represented in a flexible lexicon using a 
unification based formalism. This is a word based lexicon 
that has been macrocoded with collocational knowledge. 
Unlike a purely phrasal lexicon, we thus retain the flexi- 
bility of word based lexicons which allows for collocations 
to be combined and merged in syntactically acceptable 
ways with other words or phrases of the sentence. Unlike 
pure word based lexicons, we gain the ability to deal with 
a variety of phrasal entries. Furthermore, while there has 
been work on the automatic retrieval of lexical informa- 
tion from text \[Garside 87\], \[Choueka 88\], \[Klavans 88\], 
\[Amsler 89\], \[Boguraev & Briscoe 89\], \[Church 89\], none 
of these systems retrieves the entire range of collocations 
that we identify and no real effort has been made to use 
this information for language generation \[Boguraev & 
Briscoe 89\]. 
In the following sections, we describe the range of col- 
locations that we can handle, the fully implemented ac- 
quisition method, results obtained, and the representa- 
tion of collocations in Functional Unification Grammars 
(FUGs) \[Kay 79\]. Our application domain is the domain 
of stock market reports and the corpus on which our ex- 
pertise is based consists of more than 10 million words 
taken from the Associated Press news wire. 
SINGLE WORDS TO WHOLE 
PHRASES: WHAT KIND OF 
LEXICAL UNITS ARE NEEDED? 
Collocational knowledge indicates which members of a 
set of roughly synonymous words co-occur with other 
words and how they combine syntactically. These affini- 
ties can not be predicted on the basis of semantic or syn- 
tactic rules, but can be observed with some regularity in 
• text \[Cruse 86\]. We have found a range of collocations 
from word pairs to whole phrases, and as we shall show, 
252 
this range will require a flexible method of representa- 
tion. 
3 THE ACQUISITION METHOD: 
Xtract 
Open Compounds . Open compounds involve unin- 
terrupted sequences of words such as "stock mar- 
ket," "foreign ezchange," "New York Stock Ez- 
change," "The Dow Jones average of $0 indust~- 
als." They can include nouns, adjectives, and closed 
class words and are similar to the type of colloca- 
tions retrieved by \[Choueka 88\] or \[Amsler 89\]. An 
open compound generally functions as a single con- 
stituent of a sentence. More open compound exam- 
ples are given in figure 1. x 
Predicative Relations consist of two (or several) 
words repeatedly used together in a similar syn- 
tactic relation. These lexical relations axe harder 
to identify since they often correspond to inter- 
rupted word sequences in the corpus. They axe also 
the most flexible in their use. This class of col 
locations is related to Mel'~uk's Lexical Functions 
\[Mel'~uk 81\], and Benson's L-type relations \[Ben- 
son 86\]. Within this class, Xtract retrieves subject- 
verb, verb-object, noun-adjective, verb-adverb, verb- 
verb and verb-particle predicative relations. Church 
\[Church 89\] also retrieves verb-particle associations. 
Such collocations require a representation that al- 
lows for a lexical function relating two or more 
words. Examples of such collocations axe given in 
figure 2. 2 
Phrasal templates: consist of idiomatic phrases con- 
taining one, several or no empty slots. They axe 
extremely rigid and long collocations. These almost 
complete phrases are quite representative of a given 
domain. Due to their slightly idiosyncratic struc- 
ture, we propose representing and generating them 
by simple template filling. Although some of these 
could be generated using a word based lexicon, in 
general, their usage gives an impression of fluency 
that cannot be equaled with compositional genera- 
tion alone. Xtract has retrieved several dozens of 
such templates from our stock market corpus, in- 
eluding: 
"The NYSE's composite indez of all its listed com- 
mon stocks rose 
*NUMBER* to *NUMBER*" 
"On the American Stock Ezchange the market value 
indez was up 
*NUMBER* at *NUMBER*" 
"The Dow Jones average of 30 industrials fell 
*NUMBER* points to *NUMBER*" 
"The closely watched indez had been down about 
*NUMBER* points in 
the first hour of trading" 
"The average finished the week with a net loss of 
*NUMBER *" 
I All the examples related to the stock market domain have 
been actually retrieved by Xtract. 
2In the examples, the "~" sign, represents a gap of zero, 
one or several words. The "¢*" sign means that the two 
words can be in any order. 
In order to produce sentences containing collocations, a 
language generation system must have knowledge about 
the possible collocations that occur in a given domain. 
In previous language generation work \[Danlos 87\], \[Ior- 
danskaja 88\], \[Nirenburg 88\], collocations are identified 
and encoded by hand, sometimes using the help of lexi- 
cographers (e.g., Danlos' \[Daulos 87\] use of Gross' \[Gross 
75\] work). This is an expensive and time-consuming pro- 
cess, and often incomplete. In this section, we describe 
how Xtract can automatically produce the full range of 
collocations described above. 
Xtract has two main components, a concordancing 
component, Xconcord, and a statistical component, 
Xstat. Given one or several words, Xconcord locates 
all sentences in the corpus containing them. Xstat is 
the co-occurrence compiler. Given Xconcord's output, 
it makes statistical observations about these words and 
other words with which they appear. Only statistically 
significant word pairs are retained. In \[Smadja 89a\], and 
\[Smadja 88\], we detail an earlier version of Xtract and 
its output, and in \[Smadja 891)\] we compare our results 
both qualitatively and quantitatively to the lexicon used 
in \[Kukich 83\]. Xtract has also been used for informa- 
tion retrieval in \[Maarek & Smadja 89\]. In the updated 
version of Xtract we describe here, statistical signifi- 
cance is based on four parameters, instead of just one, 
and a second stage of processing has been added that 
looks for combinations of word pairs produced in the 
first stage, resulting in multiple word collocations. 
Stage one- In the first phase, Xconcord is called for a 
single open class word and its output is pipeIined to 
Xstat which then analyses the distribution of words 
in this sample. The output of this first stage is a list 
of tuples (wx,w2, distance, strength, spread, height, 
type), where (wl, w2) is a lexical relation between 
two open-class words (Wx and w2). Some results 
are given in Table 1. "Type" represents the syn- 
tactic categories of wl and w2. 3. "Distance" is the 
relative distance between the two words, wl and w2 
(e.g., a distance of 1 means w~ occurs immediately 
after wx and a distance of-i means it occurs imme- 
diately before it). A different tuple is produced for 
each statistically significant word pair and distance. 
Thus, ff the same two words occur equally often sep- 
arated by two different distances, they will appear 
twice in the list. "Strength" (also computed in the 
earlier version of Xtract) indicates how strongly the 
two words are related (see \[Smadja 89a\]). "Spread" 
is the distribution of the relative distance between 
the two words; thus, the larger the "spread" the 
more rigidly they are used in combination to one 
another. "Height" combines the factors of "spread" 
3In order to get part of speech information we use a 
stochastic word tagger developed at AT&T Bell Laborato- 
ries by Ken Church \[Church 88\] 
253 
wordl 
stock 
president 
trade 
Table 1: Some binary lexical relations. 
word2 
market 
vice 
deficit 
distance 
-I 
strength 
47.018 
40.6496 
30.3384 
spread 
28.5 
29.7 
28.4361 
11457.1 
10757 
7358.87 
 vre  r avmcm'am     
;,,,Lo¢,~,c-- i~fft~,,,,~l , illll(;t£1 I~.'lgl~:l~i Ig~llI,~lt:.. 
composite 
blue 
totaled 
closing 
-1 12.3874 29.0682 3139.89 index 
chip -1 
-4 
-1 
-2 
-1 
-1 
10.078 
shares 
price 
stocks 
volume 
20.7815 
23.0465 
27.354 
16.8724 
19.3312 
13.5184 
5.43739 
listed 
takeover 
takeovers 
takeover 
takeovers 
30 
29.3682 
25.9415 
23.8696 
29.7 
28.1071 
29.3682 
25.7917 
totaled 
bid 
hostile 
o~er 
2721.06 
5376.87 
4615.48 
4583.57 
4464.89 
4580.39 
3497.67 
1084.05 
I ll"i~.~ l ' _ll-~,'l I~,\[lll Jill '\[ Ib'\]l~$'l 
\[ Type 
NN 
NN 
NN 
NN 
NN 
NN 
NJ 
NJ 
NJ 
NV 
NV 
NV 
NV 
NN 
NJ 
iNN 
I NV 
Table 2: Concordances for "average indus~rial" 
On Tuesday the Dow Jones industrial average rose 26.28 points to 2 304.69. 
The Dow 
... a selling spurt that sent the Dow 
On Wednesday the Dow 
The Dow 
The Dow 
... Thursday with the Dow 
... swelling the Dow 
The rise in the Dow 
Jones industrial average 
Jones industrial average 
Jones industrial average 
Jones industrial average 
Jones industrial average 
Jones industrial average 
Jones industrial average 
Jones industrial average 
went up 11.36 points today. 
down sharply in the first hour of trading. 
showed some strength as ... 
was down 17.33 points to 2,287.36 ... 
had the biggest one day gain of its history ... 
soaring a record 69.89 points to ... 
by more than 475 points in the process ... 
was the biggest since a 54.14 point jump on ... 
Table 
The NYSE s composite index 
The NYSE s composite index 
The NYSE s composite index 
The NYSE s composite index 
The NYSE s composite index 
The NYSE s composite index 
The NYSE s composite index 
The NYSE s composite index 
The NYSE s composite index 
3: Concordances for "composite indez" 
of all its listed common stocks fell 1.76 to 164.13. 
of all its listed common stocks fell 0.98 to 164.91. 
of all its listed common stocks fell 0.96 to 164.93. 
of all its listed common stocks fell 0.91 to 164.98. 
of all its listed common stocks rose 1.04 to 167.08. 
of all its listed common stocks rose 0.76 
of all its listed common stocks rose 0.50 to 166.54. 
of all its listed common stocks rose 0.69 to 166.73. 
of all its listed common stocks fell 0.33 to 170.63. 
254 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
open compound 
qeading industrialized countries" 
"the Dow Jones average of .90 industriais" 
"bear/buil market" 
"the Dow Jones industrial average" 
"The NYSE s composite indez of all it8 listed common stocks" 
"Advancing/winuing/losing/declluing issues" 
"The NASDAQ composite indez for the over the counter market" 
"stock market" 
"central bank 
'qeveraged buyout" 
"the gross national product" 
'q~lue chip stocks" 
"White House spokesman Marlin Fitztoater" 
"takeover speculation/strategist/target/threat/attempt" 
"takeover bid /battle/ defense/ efforts/ flght /law /proposal / rumor" 
Figure 1: Some examples of open compounds 
noun adjective 
noun adjective 
noun adjective 
subject verb 
subject verb 
subject verb 
verb adverb 
verb object 
verb object 
verb particle 
verb verb 
verb verb 
examples 
"heavy/Hght D tradlng/smoker/traffic" 
"hlgh/low ~ fertility/pressure/bounce" 
"large/small D crowd/retailer/client" 
"index ~ rose 
"stock ~ \[rose, fell, closed, jumped, continued, declined, crashed, ...\]" 
"advancers D \[outnumbered, outpaced, overwhelmed, outstripped\]" 
"trade ¢~ actively," "mix ¢~ narrowly," "use ¢~ widely," "watch ¢~ closely" 
~posted ~ gain 
'~momentum D \[pick up, build, carry over, gather, loose, gain\]" 
"take ~ from," "raise ~ by," "mix D with" 
"offer to \[acquire, buy"\] 
"agree to \[acquire, buy"\] 
Figure 2: Some examples of predicative collocations 
and "strength" resulting in a ranking of the two 
words for their "distances". Church \[Church 89\] 
produces results similar to those presented in the 
table using a different statistical method. However, 
Church's method is mainly based on the computa- 
tion of the "strength" attribute, and it does not take 
into account "spread" and "height". As we shall 
see, these additional parameters are crucial for pro- 
ducing multiple word collocations and distinguish- 
ing between open compounds (words are adjacent) 
and predicative relations (words can be separated 
by varying distance). 
Stage two: In the second phase, Xtraet first uses the 
same components but in a different way. It starts 
with the pairwise lexical relations produced in Stage 
one to produce multiple word collocations, then 
classifies the collocations as one of three classes iden- 
tified above, end finally attempts to determine the 
syntactic relations between the words of the collo- 
cation. To do this, Xtract studies the lexical re- 
lations in context, which is exactly what lexicogra- 
phers do. For each entry of Table 1, Xtract calls 
Xconcord on the two words wl and w~ to pro- 
duce the concordances. Tables 2 and 3 show the 
concordances (output of Xconcord) for the input 
pairs: "average-industrial" end "indez-composite". 
Xstat then compiles information on the words sur- 
rounding both wl and w2 in the corpus. This stage 
allows us to filter out incorrect associations such 
as "blue.stocks" or "advancing-market" and replace 
them with the appropriate ones, "blue chip stocks," 
"the broader market in the NYSE advancing is. 
sues." This stage also produces phrasal templates 
such as those given in the previous section. In short, 
stage two filters inapropriate results and combines 
word pairs to produce multiple word combinations. 
To make the results directly usable for language gen- 
eration we are currently investigating the use of a 
bottom-up parser in combination with stage two in 
order to classify the collocations according to syn- 
tactic criteria. For example if the lexical relation 
involves a noun and a verb it determines if it is a 
subject-verb or a verb-object collocation. We plan 
to do this using a deterministic bottom up parser 
developed at Bell Communication Research \[Abney 
89\] to parse the concordances. The parser would 
analyse each sentence of the concordances and the 
parse trees would then be passed to Xstat. 
Sample results of Stage two are shown in Fig- 
ures 1, 2 and 3. Figure 3 shows phrasal templates and 
open compounds. Xstat notices that the words "com- 
posite and "indez" are used very rigidly throughout the 
corpus. They almost always appear in one of the two 
255 
lexical relation 
composite-indez 
composite-indez 
collocation 
"The NYSE's composite indez of all its listed common 
stocks fell *NUMBER* to *NUMBER*" 
"the NYSE's composite indez of all its listed common 
stocks rose *NUMBER* to *NUMBER*." 
\[ "close-industrial" "Five minutes before the close the Dow Jones average of 30 industrials 
~as up/down *NUMBER* to/from *NUMBER*" 
"the Dow Jones industrial average." "average industrial" 
"advancing-market" 
"block- trading" 
"cable- television" 
"the broader market in the NYSE advancing issues" 
"Jack Baker head of block trading in Shearson Lehman Brothers Inc." 
"cable television" 
Figure 3: Example collocations output of stage two. 
sentences. The lexical relation composite-indez thus pro- 
duces two phrasal templates. For the lexical relation 
average-industrial Xtract produces an open compound 
collocation as illustrated in figure 3. Stage two also con- 
firms pairwise relations. Some examples are given in 
figure 2. By examining the parsed concordances and 
extracting recurring patterns, Xstat produces all three 
types of collocations. 
4 HOW TO REPRESENT THEM 
FOR LANGUAGE GENERATION? 
Such a wide variety of lexical associations would be dif- 
ficnlt to use with any of the existing lexicon formalisms. 
We need a flexible lexicon capable of using single word 
entries, multiple word entries as well as phrasal tem- 
plates and a mechanism that would be able to gracefully 
merge and combine them with other types of constraints. 
The idea of a flexible lexicon is not novel in itself. The 
lexical representation used in \[Jacobs 85\] and later re- 
fined in \[Desemer & Jabobs 87\] could also represent a 
wide range of expressions. However, in this language, 
collocational, syntactic and selectional constraints are 
mixed together into phrasal entries. This makes the lex- 
icon both difficnlt to use and difficult to compile. In the 
following we briefly show how FUGs can be successfully 
used as they offer a flexible declarative language as well 
as a powerful mechanism for sentence generation. 
We have implemented a first version of Cook, a sur- 
face generator that uses a flexible lexicon for express- 
in~ co-occurrence constraints. Cook uses FUF \[Elhadad 
90J, an extended implementation of PUGs, to uniformly 
represent the lexicon and the syntax as originally sug- 
gested by Halliday \[Halliday 66\]. Generating a sentence 
is equivalent to unifying a semantic structure (Logical 
Form) with the grammar. The grammar we use is di- 
vided into three zones, the "sentential," the "lezical" 
and "the syntactic zone." Each zone contains constraints 
pertaining to a given domain and the input logical form 
is unified in turn with the three zones. As it is, full 
backtracking across the three zones is allowed. 
• The sentential zone contains the phrasal templates 
against which the logical form is unified first. A 
sententiai entry is a whole sentence that should be 
used in a given context. This context is specified by 
subparts of the logical form given as input. When 
there is a match at this point, unification succeeds 
and generation is reduced to simple template filling. 
• The lezical zone contains the information used to 
lexicalize the input. It contains collocational infor- 
mation along with the semantic context in which 
to use it. This zone contains predicative and open 
compound collocations. Its role is to trigger phrases 
or words in the presence of other words or phrases. 
Figure 5 is a portion of the lexical grammar used 
in Cook. It illustrates the choice of the verb to be 
used when "advancers" is the subject. (See below 
for more detail). 
• The syniacgic zone contains the syntactic grammar. 
It is used last as it is the part of the grammar en- 
suring the correctness of the produced sentences. 
An example input logical form is given in Figure 4. In 
this example, the logical form represents the fact that on 
the New York stock exchange, the advancing issues (se- 
mantic representation or sere-R: c:winners) were ahead 
(predicate c:lead)of the losing ones (sem-R: c:losers)and 
that there were 3 times more winning issues than losing 
ones ratio). In addition, it also says that this ratio is 
of degree 2. A degree of 1 is considered as a slim lead 
whereas a degree of 5 is a commanding margin. When 
unified with the grammar, this logical form produces the 
sentences given in Figure 6. 
As an example of how Cook uses and merges co- 
occurrence information with other kind of knowledge 
consider Figure 5. The figure is an edited portion of 
the lexical zone. It only includes the parts that are rel- 
evant to the choice of the verb when "advancers" is the 
subject. The lex and sem-R attributes specify the lex- 
eme we are considering ("advancers") and its semantic 
representation (c:winners). 
The semantic context (sere-context) which points to 
the logical form and its features will then be used in order 
256 
logical-form 
predicate-name = p : lead 
leaders = \[ sem-R L 
ratio 
trailers 
: c : winners \] J 
: 3 
sem-R : c : losers \] 
: ratio ---- I 
degree = 2 
Figure 4: LF: An example logical form used by Cook 
o,, °°° ooo 
lex = "advancer" 
sam-R = c:~oinners 
sem-context = <logical-form> 
OO0 
10e 
o,o 
sem-context 
SV-collocates = 
predicate-name = p: lead \] 
degree = 2 
lex ---- "o.u~nurn, ber" / 
lex = "lead" 
lex = "finish" 
lex = "hold" 
lex = "~eept' 
lex = "have" 
,,° 
sem-context 
SV-collocates = 
predicate-name : p:lead 
= degree : 4 
lex : U°verp°~er" 1 lex = "outstrip" 
lex : "hold" 
lex : "keel' 
• 
Figure 5: A portion of the lexical grammar showing the verbal collocates of "advancers". 
"Advancers outnumbered declining issues by a margin of 3 4o 1." 
"Advancers had a slim lead over losing issues wi~h a margin of 3 4o 1." 
"Advancers kep~ a slim lead over decliners wi~h a margin of 3 ~o 1" 
Figure 6: Example sentences that can be generated with the logical form LF 
257 
to select among the alternatives classes of verbs. In the 
figure we only included two alternatives. Both are rela- 
tive to the predicate p:lead but they axe used with dif- 
ferent values of the degree attribute. When the degree is 
2 then the first alternative containing the verbs listed un- 
der SV-colloca~es (e.g. "outnumber") will be selected. 
When the degree is 4 the second alternative contain- 
ing the verbs listed under SV-collocal;es (e.g. "over- 
power") will be selected. All the verbal collocates shown 
in this figure have actually been retrieved by Xtract at 
a preceding stage. 
The unification of the logical form of Figure 4 with 
the lexical grammar and then with the syntactic gram- 
mar will ultimately produce the sentences shown in Fig- 
ure 6 among others. In this example, the sentencial zone 
was not used since no phrasal template expresses its 
semantics. The verbs selected are all listed under the 
SV-collocates of the first alternative in Figure 5. 
We have been able to use Cook to generate several 
sentences in the domain of stock maxket reports using 
this method. However, this is still on-going reseaxch and 
the scope of the system is currently limited. We are 
working on extending Cook's lexicon as well as on de- 
veloping extensions that will allow flexible interaction 
among collocations. 
5 CONCLUSION 
In summary, we have shown in this paper that there 
axe many different types of collocations needed for lan- 
guage generation. Collocations axe flexible and they can 
involve two, three or more words in vaxious ways. We 
have described a fully implemented program, Xtract, 
that automatically acquires such collocations from large 
textual corpora and we have shown how they can be 
represented in a flexible lexicon using FUF. In FUF, co- 
occurrence constraints axe expressed uniformly with syn- 
tactic and semantic constraints. The grammax's function 
is to satisfy these multiple constraints. We are currently 
working on extending Cook as well as developing a full 
sized from Xtract's output. 
ACKNOWLEDGMENTS 
We would like to thank Kaxen Kukich and the Computer 
Systems Research Division at Bell Communication Re- 
search for their help on the acquisition part of this work. 
References 
\[Abney 89\] S. Abney, "Parsing by Chunks" in C. Tenny~ 
ed., The MIT Parsing Volume, 1989, to appeax. 
\[Amsler 89\] R. Amsler, "Research Towards the Devel- 
opment of a Lezical Knowledge Base for Natural 
Language Processing" Proceedings of the 1989 SI- 
GIR Conference, Association for Computing Ma- 
\[Benson 86\] M. Benson, E. Benson and R. Ilson, Lezi- 
cographic Description of English. John Benjamins 
Publishing Company, Philadelphia, 1986. 
\[Boguraev & Briscoe 89\] B. Boguraev & T. Briscoe, in 
Computational Lezicography for natural language 
processing. B. Boguraev and T. Briscoe editors. 
Longmans, NY 1989. 
\[Choueka 88\] Y. Choueka, Looking for Needles in a 
Haystack. In Proceedings of the RIAO, p:609-623, 
1988. 
\[Church 88\] K. Church, A Stochastic Par~s Program and 
Noun Phrase Parser for Unrestricted Tezt In Pro- 
ceedings of the Second Conference on Applied Nat- 
ural Language Processing, Austin, Texas, 1988. 
\[Church 89\] K. Church & K. Hanks, Word Association 
Norms, Mutual Information, and Lezicography. In 
Proceedings of the 27th meeting of the Associ- 
ation for Computational Linguistics, Vancouver, 
B.C, 1989. 
\[Cruse 86\] D.A. Cruse, Lezical Semantics. Cambridge 
University Press, 1986. 
\[Danlos 87\] L. Danlos, The linguistic Basis of Tezt 
Generation. Cambridge University Press, 1987. 
\[Desemer & Jabobs 87\] D. Desemer & P. Jacobs, 
FLUSH: A Flezible Lezicon Design. In proceedings 
of the 25th Annual Meeting of the ACL, Stanford 
University, CA, 1987. 
\[Elhadad 90\] M. Elhadad, Types in Functional Unifica- 
tion Grammars, Proceedings of the 28th meeting 
of the Association for Computational Linguistics, 
Pittsburgh, PA, 1990. 
\[Gaxside 87\] R. Gaxside, G. Leech & G. Sampson, edi- 
tors, The computational Analysis of English, a cor- 
pus based approach. Longmans, NY 1987. 
\[Gross 75\] M. Gross, Mdthodes en Syntaze. Hermann, 
Paxis, France, 1975. 
\[Halliday 66\] M.A.K. Halliday, Lezis as a Linguistic 
Level. In C.E. Bazell, J.C. Catford, M.A.K Hal- 
liday and R.H. Robins (eds.), In memory of J.R. 
Firth London: Longmans Linguistics \]la Libraxy, 
1966, pp: 148-162. 
\[Iordanskaja88\] L. Iordanskaja, R. Kittredge, A. 
Polguere, Lezical Selection and Paraphrase in a 
Meaning-Tezt Generation Model Presented at the 
fourth International Workshop on Language Gen- 
eration, Catalina Island, CA, 1988. 
\[Jacobs 85\] P. Jacobs, PHRED: a generator for natu- 
ral language interfaces, Computational Linguis- 
tics, volume 11-4, 1985 
\[Kay 79\] M. Kay, Functional Grammar, in Proceedings 
of the 5th Meeting of the Berkeley Linguistic So- 
ciety, Berkeley Linguistic Society, 1979. 
\[Klavans 88\] J. Klavans, "COMPLEX: a computational 
lezicon for natural language systems." In proceed- 
ing of the 12th International Conference on Corn- 
chinery. Cambridge, Ma, June 1989. 
258 
putational Linguistics, Budapest, Hungary, 1988. 
\[Kukich 83\] K. Kukich, Knowledge-Based Report Gen- 
eration: A Technique for Automatically Gener- 
ating Natural Language Reports from Databases. 
Proceedings of the 6th International ACM SIGIR 
Conference, Washington, DC, 1983. 
\[Maarek & Smadja 89\] Y.S Maarek & F.A. Smadja, Full 
Tezt Indezing Based on Lezical Relations, An Ap. 
plication: Software Libraries. Proceedings of the 
12th International ACM SIGIR Conference, Cam- 
bridge, Ma, June 1989. 
\[Mel'~uk 81\] I.A Mel'euk, Meaning-Tezt Models: a Re- 
cent Trend in Soviet Linguistics. The annual re- 
view of anthropology, 1981. 
\[Nirenburg 88\] S. Nirenburg et al., Lezicon building in 
natural language processing. In program and ab- 
stracts of the 15 th International ALLC, Confer- 
ence of the Association for Literary and Linguistic 
Computing, Jerusalem, Israel, 1988. 
\[Smadja 88\] F.A. Smadja, Lezical Co-occurrence: The 
Missing link. In program and abstracts of the 
15 th International ALLC, Conference of the As- 
sociation for Literary and Linguistic Computing, 
Jerusalem, Israel, 1988. Also in the Journal for 
Literary and Linguistic computing, Vol. 4, No. 3, 
1989, Oxford University Press. 
\[Smadja 89a\] F.A. Smadja, Microcoding the Lezicon for 
Language Generation, First International Work- 
shop on Lexical Acquisition, IJCAI'89, Detroit, 
Mi, August 89. Also in "Lezical Acquisition: Using 
on-line resources to build a lezicon", MIT press, 
Uri Zeruik editor, to appear. 
\[Smadja 89b\] F.A. Smadja, On the Use of Flezible Col- 
locations for Language Generation. Columbia Uni- 
versity, technical report, TR# CUCS-507-89. 
259 
