REQUIREMENTS OF TEXT PROCESSING LEXICONS 
Kenneth C. Litkowski
16729 Shea Lane, Gaithersburg, Md. 20760 
Five years ago, Dwight Bolinger [1] wrote that efforts to represent meaning had not yet made use of the insights of lexicography. The few substantial efforts, such as those spearheaded by Olney [2,3], Mel'čuk [4], Smith [5], and Simmons [6,7], made some progress but never came to fruition. Today, lexicography and its products, the dictionaries, remain an untapped resource of uncertain value. Indeed, many who have analyzed the contents of a dictionary have concluded that it is of little value to linguistics or artificial intelligence. Given the size and complexity of a dictionary, such a conclusion is perhaps inevitable, but I believe it is wrong. To avoid becoming irretrievably lost in the minutiae of a dictionary and to see the real potential of this resource, it is necessary to develop a comprehensive model within which a dictionary's details can be tied together. Once this is done, I believe one can identify the requirements for a semantic representation of a lexicon entry to be used in natural language processing systems. I describe here what I have learned from this effort.
I began with the objective of identifying primitive words or concepts by following definitional paths within a dictionary. To search for these, I developed a model of a dictionary based on the theory of labeled directed graphs. In this model, a point (node) represents a definition and a line (arc) represents a derivational relationship between definitions. With such a model, I could use theorems of graph theory to predict the existence and form of primitives within the dictionary. This justified a continued attempt to find such primitives.
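As an illustration of the kind of graph-theoretic argument involved, here is a minimal sketch, not the procedure used in this work. It assumes an arc runs from each definition to every definition derived from it (that is, defined in terms of it). Under that assumption, a standard result for digraphs says that a point basis, a minimal set of points from which every point can be reached, contains exactly one point from each strong component that no arc enters from outside. Those components are natural candidates for primitives. The sense labels such as `change.1` and the arcs themselves are hypothetical.

```python
def strong_components(graph):
    """Tarjan's algorithm: return the strongly connected components of a digraph
    given as a dict mapping each node to a list of successor nodes."""
    index, low, stack, on_stack, components = {}, {}, [], set(), []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop it off the stack
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            components.append(comp)

    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


def primitive_candidates(graph):
    """Strong components that no arc enters from another component
    (the sources of the condensation); a point basis takes one point from each."""
    comps = strong_components(graph)
    comp_of = {v: i for i, comp in enumerate(comps) for v in comp}
    entered = {comp_of[w] for v, targets in graph.items() for w in targets
               if comp_of[v] != comp_of[w]}
    return [comp for i, comp in enumerate(comps) if i not in entered]


# Hypothetical arcs: "X -> Y" means definition X is used in defining Y.
derives = {
    "become.1": ["change.1"],
    "different.1": ["change.1"],
    "alter.1": ["change.2"],
    "change.2": ["alter.1", "vary.1"],  # alter.1 and change.2 define each other
    "change.1": ["vary.1"],
}

print(sorted(sorted(c) for c in primitive_candidates(derives)))
# [['alter.1', 'change.2'], ['become.1'], ['different.1']]
```

A candidate component with more than one point, like `{alter.1, change.2}`, is a definitional circle: any one of its points would serve equally well in a basis.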
The model showed that the main obstacle to finding the primitives is the apparently rampant circularity of defining relationships. To eliminate these apparent vicious circles, it is necessary to identify derivational relationships precisely, that is, to find the particular definition that supplies the sense in which its definiendum is used in defining another word. When this is done, the spurious cycles are broken and precise derivational relationships are identified. Although this can be done manually, the sheer bulk of a dictionary requires that it be done with well-defined procedures, i.e., with a syntactic and semantic parser. It is in the attempt to lay out the elements of such a parser that the requirements of semantic representations have emerged.
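The point about spurious cycles can be made concrete with a small hypothetical example. If arcs are drawn between words, any word used in one of another word's definitions appears to depend on it, and circles abound; if arcs are drawn between the specific senses actually used, some of those circles disappear. The entries below are invented for illustration, and the sense-level arcs are assumed to be already known, which is exactly the job the text assigns to the parser.

```python
def has_cycle(graph):
    """Depth-first search with three colors; True if the digraph has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(v):
        color[v] = GRAY  # v is on the current search path
        for w in graph.get(v, ()):
            state = color.get(w, WHITE)
            if state == GRAY:
                return True  # back arc to a node on the current path
            if state == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color.get(v, WHITE) == WHITE and visit(v) for v in list(graph))


# Hypothetical arcs: "X -> Y" means sense X is defined using sense Y.
uses_sense = {
    "change.1": ["alter.2"],   # change.1 is defined with sense 2 of "alter"
    "alter.1": ["change.3"],   # alter.1 is defined with sense 3 of "change"
    "alter.2": ["become.1"],
    "change.3": ["become.1"],
}


def collapse_to_words(sense_graph):
    """Merge senses into their headwords ("change.1" -> "change"), as a word-level view would."""
    words = {}
    for sense, targets in sense_graph.items():
        bucket = words.setdefault(sense.split(".")[0], [])
        bucket.extend(t.split(".")[0] for t in targets)
    return words


print(has_cycle(collapse_to_words(uses_sense)))  # True: change -> alter -> change
print(has_cycle(uses_sense))                     # False: no sense-level circle
```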
The parser must first be capable of handling the syntactic complexity of the definitions within a dictionary. This can be done by modifying and extending existing ATN (augmented transition network) parsers to cover the syntactic patterns found in dictionary definitions. Incidentally, a dictionary is an excellent large corpus upon which to base such a parser.
The parser must go beyond syntax; that is, it must be capable of identifying which sense of a word is being used. Rieger [8,9] has argued for the necessity of sense selection or discrimination nets. To develop such a net for each word in the lexicon, I suggest using a parser to analyze the definitions of a word and thereby to create a net capable of discriminating among all the definitions of that word.
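One way to picture such a net, offered only as a sketch: suppose parsing has already reduced each definition to a set of diagnostic features. A net can then be built recursively by choosing a feature that some, but not all, of the remaining senses carry and branching on whether the context supplies it, until one sense remains or no feature separates the rest. The feature names and sense labels are hypothetical, and choosing the most evenly splitting feature is an assumed heuristic, not a procedure from the paper.

```python
def build_net(senses):
    """Build a discrimination net from senses: dict of sense label -> frozenset of features.

    Returns ("leaf", [labels]) or ("test", feature, yes_branch, no_branch)."""
    labels = sorted(senses)
    if len(labels) == 1:
        return ("leaf", labels)
    features = sorted(set().union(*senses.values()))
    counts = {f: sum(f in senses[s] for s in labels) for f in features}
    # Keep only features that some, but not all, of the remaining senses carry.
    splitting = [f for f in features if 0 < counts[f] < len(labels)]
    if not splitting:
        return ("leaf", labels)  # the given features cannot separate these senses
    # Assumed heuristic: choose the feature that splits the senses most evenly.
    best = min(splitting, key=lambda f: abs(2 * counts[f] - len(labels)))
    yes = {s: senses[s] for s in labels if best in senses[s]}
    no = {s: senses[s] for s in labels if best not in senses[s]}
    return ("test", best, build_net(yes), build_net(no))


def traverse(net, context_features):
    """Walk the net, answering each test from the features found in the context."""
    while net[0] == "test":
        _, feature, yes, no = net
        net = yes if feature in context_features else no
    return net[1]


# Hypothetical diagnostic features for four intransitive senses of "change".
change_vi = {
    "change.1": frozenset({"inanimate_subject"}),
    "change.2": frozenset({"inanimate_subject", "from_to"}),
    "change.3": frozenset({"animate_subject", "location"}),
    "change.4": frozenset({"animate_subject", "clothing"}),
}

net = build_net(change_vi)
print(traverse(net, {"animate_subject", "location"}))  # ['change.3']
```

Treating a feature the context does not mention as a "no" answer is a simplification; the requirement that a definition drive the parser's search for context suggests that a fuller net would trigger a search at each test instead.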
The following requirements must be satisfied 
by such a parser and its resulting nets. 
Diagnostic or differentiating components are needed for each definition. Each definition must have a distinct semantic representation, even though there may be a core meaning common to all the definitions of a word. Since the ability to traverse a net successfully depends on the context in which a word is used, each definition, i.e., each semantic representation, must include slots to be filled by that context. The slots provide a unique context for each sense of a word, and context is what permits disambiguation. Since the search through a net is inherently complex, a definition must drive the parser in the search for the context that will fill its slots. These notions are consistent with Rieger's; however, I identified them independently, from my analysis of dictionary definitions. Their viability depends on being able to describe procedures for developing a parser of this type that generates the desired semantic representations.
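A minimal sketch of these requirements, under assumptions that are mine rather than the paper's: each sense carries its own slots, each slot names a semantic case and a selectional restriction, and a sense is selected only if the context can fill all of its required slots. The case labels, feature names, and the three senses of intransitive "change" are hypothetical placeholders, not parsed dictionary output, and the context is assumed to have already been mapped to cases by the parser.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    case: str            # hypothetical case label, e.g. "AGENT"
    restriction: str     # feature the filler must carry (selectional restriction)
    required: bool = True


@dataclass
class Sense:
    label: str
    slots: list = field(default_factory=list)

    def fill(self, context):
        """context maps case -> (word, set of features). Return the slot bindings,
        or None if some required slot cannot be filled by a suitable word."""
        bindings = {}
        for slot in self.slots:
            filler = context.get(slot.case)
            if filler is not None and slot.restriction in filler[1]:
                bindings[slot.case] = filler[0]
            elif slot.required:
                return None
        return bindings


def select_sense(senses, context):
    """Return (label, bindings) for every sense whose slots the context can fill."""
    matches = []
    for sense in senses:
        bindings = sense.fill(context)
        if bindings is not None:
            matches.append((sense.label, bindings))
    return matches


# Hypothetical senses of intransitive "change"; labels are illustrative only.
senses = [
    Sense("change.vi.become_different", [Slot("THEME", "entity")]),
    Sense("change.vi.transfer", [Slot("AGENT", "human"), Slot("LOCATION", "place")]),
    Sense("change.vi.put_on_clothes", [Slot("AGENT", "human"), Slot("GOAL", "garment")]),
]

traveler = {"AGENT": ("traveler", {"human", "entity"}), "LOCATION": ("Chicago", {"place"})}
weather = {"THEME": ("weather", {"entity"})}

print(select_sense(senses, traveler))
# [('change.vi.transfer', {'AGENT': 'traveler', 'LOCATION': 'Chicago'})]
print(select_sense(senses, weather))
# [('change.vi.become_different', {'THEME': 'weather'})]
```

Returning every matching sense, rather than the first, leaves room for the residual ambiguity that a given context sometimes cannot resolve.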
As mentioned before, observation of syntactic patterns will lead to improved syntactic parsing; to a limited extent, the syntactic parser will permit some discrimination, e.g., distinguishing transitive from intransitive verbs or identifying verbs that take particles. Further procedures for developing semantic representations are described using the intransitive senses of the verb "change" as examples. Procedures are described for (1) using the definitions of prepositions to identify semantic cases, which will operate as slots in the semantic representation, (2) showing how selectional restrictions on what can fill such slots are derived from the definitional matter, and (3) identifying the semantic components present within a definition. It is pointed out that these representations will eventually have to be expressed in terms of primitives. Procedures are also described for building discrimination nets from the results of parsing the definitions and for encoding in these nets how the parser should be driven.
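Procedure (1) can be illustrated in miniature. Assume that analysis of the prepositions' own definitions has already produced a small table from preposition to semantic case; a definition is then scanned for prepositions, and each one found proposes a case slot for the sense being defined. The table, the case labels, and the sample definitions are hypothetical, and the word-scanning shortcut stands in for a parser that would attach each prepositional phrase to its proper head.

```python
import re

# Hypothetical preposition-to-case table, standing in for what an analysis of the
# prepositions' own dictionary definitions would supply.
PREP_CASE = {
    "from": "SOURCE",
    "to": "GOAL",
    "into": "GOAL",
    "at": "LOCATION",
    "with": "INSTRUMENT",
}

SKIP = {"a", "an", "the", "one"}  # words passed over when guessing a preposition's object


def case_slots(definition):
    """Propose (case, preposition, object) slots from the prepositions in a definition.

    The leading infinitive marker "to" is dropped, and the object is crudely taken to be
    the first following word not in SKIP."""
    words = re.findall(r"[a-z']+", definition.lower())
    if words and words[0] == "to":
        words = words[1:]
    slots = []
    for i, word in enumerate(words):
        if word in PREP_CASE:
            rest = [w for w in words[i + 1:] if w not in SKIP]
            slots.append((PREP_CASE[word], word, rest[0] if rest else None))
    return slots


print(case_slots("to pass from one phase to another"))
# [('SOURCE', 'from', 'phase'), ('GOAL', 'to', 'another')]
print(case_slots("to become different"))
# []
```

The object word recovered for a slot ("phase" above) is where procedure (2) would take over: its own definitions suggest a selectional restriction on what may fill that slot.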
The emphasis of this paper is on describing the procedures developed thus far. Finally, it is shown how these procedures are used to identify explicit derivational relationships within a dictionary in order to move toward the identification of primitives. Such relationships are very similar to the lexical functions used by Mel'čuk, except that here both the function and the argument are elements of the lexicon, rather than the argument alone.
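To make the contrast concrete: in Mel'čuk's scheme a lexical function such as Magn is an abstract operator applied to a lexical argument, whereas here both parts come from the lexicon. The sketch below shows one possible reading, assuming definitions of the simple form "to V X", with the defining verb V taken as the function and its complement X as the argument. The sense labels and definitions are hypothetical, and choosing which sense of V and X is meant is left to the parser described above.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Derivation(NamedTuple):
    defined: str    # sense being defined, e.g. "change.1" (hypothetical label)
    function: str   # defining verb, itself a lexicon entry
    argument: str   # its complement, also drawn from the lexicon


def derive(label: str, definition: str) -> Optional[Derivation]:
    """Read a definition of the assumed form 'to V X' as V applied to X."""
    words = definition.lower().split()
    if len(words) >= 3 and words[0] == "to":
        return Derivation(label, words[1], " ".join(words[2:]))
    return None  # not of the assumed form


def by_function(derivations):
    """Group derivations by function, so senses built from the same verb line up."""
    index = defaultdict(list)
    for d in derivations:
        index[d.function].append((d.argument, d.defined))
    return dict(index)


# Hypothetical definitions, for illustration only.
entries = [("change.1", "to become different"),
           ("darken.1", "to become dark"),
           ("alter.1", "to make different")]
derivations = [d for d in (derive(label, text) for label, text in entries) if d]
print(by_function(derivations))
# {'become': [('different', 'change.1'), ('dark', 'darken.1')], 'make': [('different', 'alter.1')]}
```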
It has become clear that semantic representations of definitions in the form described must ultimately be the elements out of which semantic representations of multisentence texts are created, perhaps with two foci: (1) describing entities (centered around nouns) and (2) describing events (centered around verbs). If multisentence texts can then be studied empirically, our account of the structure of ordinary discourse can be based on observation rather than theory.
Although this paradigm may seem incredibly complex, I believe it is nothing more than what the lexicons of present AI systems are becoming. I believe that more rapid progress can be made through an explicit effort to exploit, rather than duplicate, the work of lexicographers.
REFERENCES 
1. Bolinger, D., Aspects of Language, 2nd ed., Harcourt Brace Jovanovich, Inc., New York, 1975, p. 224.
2. Olney, J., C. Revard, and P. Ziff, Toward the Development of Computational Aids for Obtaining a Formal Semantic Description of English, SP-2766/001/00, System Development Corporation, Santa Monica, California, 1 October 1968.
3. Olney, J. and D. Ramsey, "From machine-readable dictionaries to a lexicon tester: Progress, plans, and an offer," Computer Studies in the Humanities and Verbal Behavior, Vol. 3, No. 4, November 1972, pp. 213-220.
4. Mel'čuk, I.A., "A new kind of dictionary and its role as a core component of automatic text processing systems," T.A. Informations, 1978, No. 2, pp. 3-8.
5. Smith, R.N., "Interactive lexicon updating," Computers and the Humanities, Vol. 6, No. 3, January 1972, pp. 137-145.
6. Simmons, R.F. and R.A. Amsler, Modeling Dictionary Data, Computer Science Department, University of Texas, Austin, April 1975.
7. Simmons, R.F. and W.P. Lehmann, A Proposal to Develop a Computational Methodology for Deriving Natural Language Semantic Structures via Analysis of Machine-Readable Dictionaries, University of Texas, Austin, 1976 (research proposal submitted to the National Science Foundation, Sept. 28, 1976).
8. Rieger, C., Viewing Parsing as Word Sense Discrimination, TR-511, Department of Computer Science, University of Maryland, College Park, Maryland, January 1977.
9. Rieger, C. and S. Small, Word Expert Parsing, TR-734, Department of Computer Science, University of Maryland, College Park, Maryland, March 1979.
