Machine Tractable Dictionaries 
as Tools and Resources 
for Natural Language Processing 
Yorick WILKS, Dan FASS, Cheng-ming GUO, 
James E. MCDONALD, Tony PLATE, and Brian M. SLATOR 
Computing Research Laboratory, 
Box 30001/3CRL, 
New Mexico State University, 
Las Cruces, 
NM 88003-0001, 
USA. 
ABSTRACT 
This paper discusses three different but related large-scale compu- 
tational methods for the transformation of machine readable dictionaries 
(MRDs) into machine tractable dictionaries, i.e., MRDs converted into a 
format usable for natural language processing tasks. The MRD used is 
The Longman Dictionary of Contemporary English. 
1 Introduction 
Machine readable dictionaries (MRDs) contain knowledge about 
language and the world essential for tasks in natural language process- 
ing (NLP). However, this knowledge, collected and recorded by lexi- 
cographers for human readers, is not presented in a principled enough 
manner for MRDs to be used directly as tools for such tasks. What is 
badly needed is machine tractable dictionaries (MTDs): MRDs 
transformed into a format usable for NLP tasks. 
This paper discusses three different but related large-scale
computational methods for the transformation of MRDs into MTDs. The
MRD used is The Longman Dictionary of Contemporary English 
(LDOCE). The three approaches differ in the amount of knowledge 
they start with and the kinds of knowledge they produce. All begin 
with some hand-coding of initial information but are largely automatic. 
Approach I, a connectionist approach, uses the least hand-coding but
then generates data for the co-occurrence of words, which is the sim- 
plest form of semantic information produced by any of the approaches. 
Approach II requires the hand-coding of a grammar and semantic
patterns used by its parser, but not the hand-coding of any lexical material.
This is because the approach builds up lexical material from sources 
wholly within LDOCE. Approach III employs the most hand-coding 
because it develops and builds lexical entries for a very carefully con- 
trolled defining vocabulary of 3,600 word senses (1,200 words). The 
payoff is that the approach will produce a MTD containing highly 
structured semantic information. 
The three approaches are all processes: tools for transforming 
MRDs into MTDs. Such tools will be applicable to MRDs other than 
LDOCE. The products of these tools are MTDs which are resources
useful not just for NLP tasks but for artificial intelligence (AI) gen- 
erally. 
2 Background: The Value of Machine Readable Dictionaries 
Dictionaries are language texts whose subject matter is language. 
The purpose of dictionaries is to provide definitions of senses of words 
and, in so doing, they supply knowledge about not just language, but 
the world. For years, researchers in computational linguistics (CL) and 
AI have viewed dictionaries (a) with theoretical interest as a means of 
investigating the semantic structure of natural language, and (b) with 
practical interest as a resource for overcoming the knowledge acquisi- 
tion bottleneck in AI. The knowledge acquisition bottleneck has been 
viewed by most researchers as too hard a problem to tackle at present. 
However, more optimistic researchers have recently begun to seek 
methods to overcome it, and have had some success. This difference in 
attitudes regarding the knowledge acquisition bottleneck is reflected in a 
long-standing difference between two alternative methods of lexicon
building: the demo approach and the book approach (Miller 1985; a 
similar distinction appears in Amsler 1982). 
The demo approach, which has been the dominant paradigm in 
natural language processing (and AI in general) for the last two decades, 
does not address the knowledge acquisition bottleneck. This approach 
is to hand-code a small but rich lexicon for a system that analyses a 
small number of linguistic phenomena. This is an expensive method as 
each entry in the lexicon is prepared individually. Every entry is
constructed with foreknowledge of its intended use and hence of the
knowledge it should contain. Being designed with only a specific pur- 
pose in mind, the knowledge representation runs into problems when 
scaled up to cover additional linguistic phenomena. 
The alternative, the book approach, confronts the problem of the 
knowledge acquisition bottleneck. This approach attempts to develop 
methods for transforming the knowledge within dictionaries or encyclo- 
paedias into some format usable for CL and AI tasks, usually with the 
aim of covering as large a portion of the language as possible. The 
problem with dictionary and encyclopaedia entries is that, although they 
are constructed in a principled manner over many years by professional 
lexicographers and encyclopaedists, they are designed for human use. 
Recently, interest in the book approach has greatly expanded 
because a number of MRDs have become available, each causing a 
flurry of research interest, e.g., The Merriam-Webster New Pocket
Dictionary (Amsler and White 1979; Amsler 1980, 1981), Webster's
Seventh New Collegiate Dictionary (Evens and Smith 1983; Chodorow, 
Byrd, and Heidorn 1985; Markowitz, Ahlswede, and Evens 1986; Binot
and Jensen 1987), and The Longman Dictionary of Contemporary
English (Michiels, Mullenders, and Noel 1980; Michiels and Noel 1982; 
Walker and Amsler 1986; Boguraev, Briscoe, Carroll, Carter, and
Grover 1987; Boguraev and Briscoe 1987; Wilks, Fass, Guo, 
McDonald, Plate, and Slator 1987). 
The big advantage of MRDs is that now both theoretical and prac- 
tical concerns are investigable by large-scale computational methods. 
Some of the above research has been into the underlying semantic
structure of dictionaries (e.g., Amsler and White 1979; Amsler 1980, 1981;
Chodorow, Byrd, and Heidorn 1985). The remainder of the research
has been seeking to develop practical large-scale methods to extract
syntactic information from MRD entries (e.g., Boguraev and Briscoe 
1987) and transform that information into a format suitable for other 
users. This latter research has the effect of transforming a MRD into a 
limited MTD. We use the word "limited" because such a MTD has 
only syntactic information presented in a format usable by others; 
semantic information remains buried in the MRD, though this semantic
information is the knowledge about language and the world that is
needed as a resource for many CL and AI tasks. The next step is
therefore to develop large-scale methods to extract both the syntactic and
semantic information from MRD entries and present that information as
a data base of acceptable format for potential users of that information.
Within the book approach there are a number of ways one can 
construct such a MTD. One way is to automatically extract the semantic 
information and build the MTD. We firmly advocate automatic
extraction. A second way is to extract the semantic information manually and
hand-code the entire MTD, as is being attempted in the CYC Project
(Lenat, Prakash, and Shepherd 1986; Lenat and Feigenbaum 1987). The
main problem with this approach is the volume of effort required: the
CYC Project aims to hand-code one million entries from an
encyclopaedia, which will take an estimated two person-centuries of work. We
believe this is a mistaken approach because it wastes precious human
resources and makes dubious theoretical assumptions, despite Lenat's
claims that their work is theory free (see section 5). 
Whichever form of the book approach is taken, there are two sets
of issues that must be faced by those developing methods for the
transformation of MRDs into MTDs. One set of issues concerns the
nature of the knowledge in MRDs. The second set of issues concerns
the design of the database format of a MTD. Both sets of issues rest on 
understanding the structure and content of the knowledge that is both 
explicitly and implicitly encoded in dictionaries, but such understanding 
rests on certain key issues in semantics. We examine some of these 
issues in the next section. 
3 Background: The State of Semantic Theory
There are obstacles to the development of methods (whether
manual or automatic) for the transformation of the semantic information
from MRDs into a MTD that have not been present for those
developing methods for syntactic analysis only. The main obstacle is that,
compared to syntactic theory, understanding of semantic theory is much less
advanced, as shown by the lack of consensus about even some of the
general underlying principles of semantics. Nevertheless there is some
understanding and some local consensus on semantics that can allow
work to proceed. 
Positions on certain basic issues in semantics affect one's stance
concerning what semantic information should be extracted from a MRD 
and represented in a MTD. In developing our own methods for the 
transformation of MRDs into MTDs, we have adopted a certain 
approach from computational semantics. Examples of this approach are 
Preference Semantics (Wilks 1973, 1975a, 1975b) and Collative
Semantics (Fass 1986, 1987, 1988). The main assumptions of this approach
are the inescapable problem of the word sense and the inseparability
of knowledge and language. 
Lexical ambiguity is pervasive in language: the lexical ambiguity 
of words has been a problem since before the advent of dictionaries and
is particularly apparent when translating between languages; tasks such
as translation cannot be modelled by computer without some
representation of lexical ambiguity. Furthermore, lexical ambiguity is pervasive in
most forms of language text, including dictionary definitions: the words
used in dictionary definitions of words and their senses are themselves
lexically ambiguous and must be disambiguated. 
We also believe that it is acceptable for a semantics to be based on 
the approach to lexical ambiguity taken by traditional lexicography that
constructs dictionaries. The major problem with the approach comes 
from its arbitrariness in the selection of senses for a word. This arbi- 
trariness appears between dictionaries in different sense segmentations 
of the same word. It is also observable within a single dictionary when 
the sense-distinctions made for the definition of a word do not match
with the uses of that word in the definitions of other words in the
dictionary. These problems can be overcome by methods that reconcile
different sense selections of a word within and across dictionaries by
extending (or reducing) the coverage of individual word senses.
Our position on the inseparability of knowledge and language is
that common principles underlie the semantic structure of language text
and of knowledge representations, and that some kinds of language text
structures are a model for knowledge structures (Wilks 1978). Examples
of such knowledge structures include the planes of Quillian's Memory
Model (1967, 1968), pseudo-texts from Preference Semantics,
sense-frames from Collative Semantics, and integrated semantic units or ISUs
(Guo 1987). Supporting evidence comes from comparisons between the
semantic structure of dictionaries and the underlying organisation of
knowledge representations, which have observed similarities between
them (Amsler 1980; Chodorow, Byrd, and Heidorn 1985).
These positions on semantics suggest the following for those 
engaged in transforming MRDs into MTDs. First, the problem of
lexical ambiguity must be faced by any approach seeking to extract
semantic information from a MRD to build a MTD. Because lexical
ambiguity exists in the language of dictionary definitions and in language
generally, it follows that the language in MRD definitions needs to be
analysed to the word sense level and must be represented in the format
of the MTD. Second, the format of the MTD, while being of principled 
construction, should be as language-like as possible. 
Next, we focus attention on some basic issues in transforming
MRDs concerning the nature and accessibility of the knowledge in dic- 
tionaries. 
4 The Analysis of MRDs 
We hold that those who advocate the extraction (both manual and
automatic) of semantic information from dictionaries (and even
encyclopaedias) have made certain assumptions about the extent of knowledge
in a dictionary, about where that knowledge is located, and about how that
knowledge can be extracted from the language of dictionary definitions.
These are not assumptions about semantics, but rather are assumptions
about the extraction of semantic information from text. These
assumptions are methodological assumptions because they underlie the
decisions made in choosing one method for semantic analysis rather than
another. These assumptions are about (a) sufficiency, (b) extricability, 
and (c) bootstrapping. 
Sufficiency addresses whether a dictionary is a strong enough
knowledge base for English, specifically as regards the linguistic 
knowledge and, above all, the knowledge of the real world needed for 
subsequent text analysis. Sufficiency is of general concern, including 
hand-coding projects like CYC, where they attempt to make explicit (a) 
the facts and heuristics which one would need in order to understand 
sentences, (b) generalisations of those facts and heuristics, and (c)
inferences that fill inter-sentential gaps (Lenat and Feigenbaum 1987,
p. 1180).
Extricability is concerned with whether it is possible to specify a
set of computational procedures that operate on a MRD and extract,
through their operation alone and without any human intervention,
general and reliable semantic information on a large scale, and in a general
format suitable for, though independent of, a range of subsequent text
analysis processes. 
Bootstrapping refers to the process of collecting the initial
information that is required by a set of computational procedures that are
able to extract semantic information from the sense definitions in an 
MRD. The initial information needed is commonly linguistic informa- 
tion, notably syntactic and case information, which is used during the 
parsing of sense-definitions into an underlying representation from 
which semantic information is then extracted. 
Bootstrapping methods can be internal or external. Internal
bootstrapping methods obtain the initial information needed for their
procedures from the dictionary itself and use the procedures to extract 
that information. This is not as circular as it may seem. A process may 
require information for the analysis of some sense-definition (e.g., some 
knowledge of the words used in the definition) and may be able to find 
that information elsewhere in the dictionary. By contrast, external 
bootstrapping methods obtain the initial information for their procedures 
by some method other than the use of those procedures. The initial 
information may be from some source external to the dictionary or may
be in the dictionary but impossible to extract without the use of the very
same information. For example, the word 'noun' may have a definition
in a dictionary but the semantic information in that definition cannot be
extracted without prior knowledge of a sentence grammar that contains
knowledge of syntactic categories, including what a noun is.
Those who advocate hand-coding presumably have pessimistic 
views about extricability and bootstrapping.
5 The Production of MTDs 
The main issue here concerns the format that MTDs should have.
One thing is clear: the format must be versatile if a variety of consu- 
mers in CL and AI are to use it. The most likely initial consumers are 
those that place a considerable emphasis on the availability of words,
such as spelling correction, and those that already use large lexicons,
such as machine translation (MT) and word processing (Amsler 1982).
Within CL, two primary consumers are the semantics mentioned in
section 3, Preference Semantics and Collative Semantics.
These consumers need a variety of semantic information. To meet 
these needs MTD formats should be clean, unambiguous, preserve much 
of the semantic structure of natural language, and contain as much 
information as is feasible. However, this does not mean that the format 
of a MTD must consist of just a single type of representation because it 
is possible that different kinds of information require different types of 
representation. For example, two kinds of information about word use 
are: (a) the use of senses of words in individual dictionary sense 
definitions, and (b) the use of words throughout a dictionary. It is not 
clear that a single representation can record both (a) and (b) because (a)
requires a frame-like representation of the semantic structure of sense 
definitions that records the distinction between genus and differentia, 
the subdivision of differentia into case roles, and the representation of
sense ambiguity, whereas (b) requires a matrix or network-like
representation of word usages that encodes the frequency of occurrence of words
and the frequency of co-occurrence of combinations of words. Hence, a
MTD may consist of several representations, each internally uniform. 
Given the arguments presented in section 3, we believe that the 
first of these representations should be modelled on natural language 
though it should be more systematic and without its ambiguity. Hence, 
this component representation should be as language text-like as possi- 
ble and should represent word senses, whether explicitly or implicitly. 
Other approaches to the building of representations that contain 
semantic information extracted from dictionaries and encyclopaedias 
(e.g., Binot and Jensen 1987; Pustejovsky and Bergler 1987; CYC)
separate knowledge and language and overlook the problem of the lexi- 
cal ambiguity of the words in dictionary definitions (these are the under- 
lying theoretical assumptions made by these approaches). 
The other form of representation can be construed
as a connectionist network representation, based on either localist (e.g.,
Cottrell and Small 1983; Waltz and Pollack 1985) or distributed
approaches to representation (e.g., Hinton, McClelland and Rumelhart
1986; St. John and McClelland 1986). Like our position on semantics,
connectionism emphasises the continuity between knowledge of the
language and the world, and many connectionist approaches have paid
special attention to representing word senses, especially the fuzzy
boundaries between them (e.g., Cottrell and Small 1983; Waltz and Pollack
1985; St. John and McClelland 1986). Localist approaches assume
symbolic network representations whose nodes are word senses and whose
arcs are weights that indicate the relatedness of the word senses at the 
ends of the arcs. An interesting new approach, which we shall outline 
shortly in section 6.1, uses a network whose nodes are words and whose 
arc weights indicate co-occurrence data between words. Although this
approach initially appears to be localist, it is being used to derive more
distributed representations which offer ways of avoiding some serious 
problems inherent in localist representations. Such
frequency-of-association data is not represented in standard knowledge representation
schemes, is complementary to the knowledge in such schemes, and may
be useful in its own right for CL tasks such as lexical ambiguity
resolution and spelling correction.
To summarise so far, we have outlined: (1) some basic theoretical 
assumptions about semantics and our position regarding those
assumptions (inseparability of language and knowledge, tackling the problem
of the word sense), (2) some basic methodological assumptions about
the extraction of semantic information from dictionaries (sufficiency,
extricability, bootstrapping), and (3) some basic theoretical assumptions
regarding the format of a MTD (language-like format, inclusion of
different kinds of semantic information, notably lexical ambiguity).
6 Three Approaches to the Transformation of MRDs into MTDs
At CRL, we are pursuing three approaches to the automatic trans- 
lation of the information in The Longman Dictionary of Contemporary 
English (Proctor et al 1978) into a MTD. LDOCE is a full-sized
dictionary designed for learners of English as a second language that
contains over 55,000 entries in book form and 41,100 entries in
machine-readable form (a type-setting tape). The preparers of LDOCE claim that
entries are defined using a "controlled" vocabulary of about 2,000 
words and that the entries have a simple and regular syntax. We have 
analysed the machine-readable tape of LDOCE and found that about 
2,219 words are commonly used. 
The three CRL approaches all fall within the general position on 
computational semantics outlined above and are extensions of fairly
well established lines of research. All three approaches also pay special 
attention to their underlying methodological assumptions concerning the 
extraction of semantic information from dictionaries. With respect to 
sufficiency and extricability, all three approaches assume that
dictionaries do contain sufficient knowledge for at least some CL
applications, and that such knowledge is extricable. But the approaches differ
over bootstrapping, i.e., over what knowledge, if any, needs to be 
hand-coded into an initial analysis program for extracting semantic 
information. 
The three approaches differ in the amount of knowledge they start 
with and the kinds of knowledge they produce. All begin with a degree 
of hand-coding of initial information but are largely automatic. In each 
case, moreover, the degree of hand-coding is related to the source and 
nature of semantic information sought by the approach. Approach I, a 
connectionist approach, uses the least hand-coding but then the
co-occurrence data it generates is the simplest form of semantic
information produced by any of the approaches. Approach II requires the
hand-coding of a grammar and semantic patterns used by its parser, but 
not the hand-coding of any lexical material. This is because the 
approach builds up lexical material from sources wholly within
LDOCE. Approach III employs the most hand-coding because it
develops and builds lexical entries for a very carefully controlled
defining vocabulary of 3,600 word senses (1,200 words). The payoff is
that the approach will produce a MTD containing highly structured
semantic information. 
6.1 Approach I: Obtaining and Using Co-Occurrence Statistics 
from LDOCE (Tony Plate) 
Our first approach extracts semantic information from text
(specifically LDOCE) and does not require any semantic information to
bootstrap it. Central to this technique is that all sentences that contain a
word are used as sources of information about the use of that word, 
rather than just the definition of the word. This technique is based on 
some experimental findings that the frequency of co-occurrence of a 
pair of words provides a reasonable measure of the strength of the 
semantic relationship between them. 
This approach bears some resemblance to Sparck Jones's (1964) 
investigation into the semantic classification of the uses of words. Her 
underlying linguistic assumption was that the uses of words may be dis- 
tinguished, described, or analyzed by the semantic relations which hold 
between them and the vocabulary of a language has a semantic structure 
determined by these relations. Of twelve possible semantic relations, 
synonymy was chosen as the fundamental feature of natural language. 
Despite some surface similarities to Sparck Jones's technique,
there are many differences, some of which are discussed below. First,
Sparck Jones's data collection method is much more laborious than the
co-occurrence method (see Wilks, Fass, Guo, McDonald, Plate, and
Slator 1987).
Second, Sparck Jones's method requires that the data must contain 
all the senses of words that need to be considered. In the co-occurrence 
method, it is not necessary that the text contain examples of all senses, 
because the sense definitions are used to provide information about the 
senses. The text need only use enough senses of words to define all 
words, but should make frequent use of the senses it does use. 
The approach proposed here finds relationships that are much more
distant and general than synonymy, and that involve the combination of
many semantic relations. Co-occurrence data for the LDOCE controlled
vocabulary has been collected. This data contains nearly two-and-a-half
million frequencies of co-occurrence (the triangle of a 2200 by 2200
matrix). This is too much data to examine in raw form, so we have
used two techniques to convert the data into a more understandable
format.
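The collection step can be illustrated with a minimal sketch in Python.
This is not the program used at CRL: the function names, the toy
sentences, and the choice of the sentence as the unit within which two
words count as co-occurring are assumptions made for illustration. The
helper triangle_size shows why a 2200 by 2200 matrix yields nearly
two-and-a-half million entries; whether the diagonal is counted is
likewise an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, vocabulary):
    """Count, for each unordered pair of vocabulary words, the number of
    sentences in which both words appear (assumed window: one sentence)."""
    counts = Counter()
    for sentence in sentences:
        # Keep each vocabulary word once per sentence, sorted so that a
        # pair always maps to the same key (one triangle of the matrix).
        present = sorted(set(sentence) & vocabulary)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def triangle_size(n, include_diagonal=True):
    """Number of cells in one triangle of an n by n symmetric matrix."""
    return n * (n + 1) // 2 if include_diagonal else n * (n - 1) // 2


# Toy data, invented for illustration (not taken from LDOCE).
toy_sentences = [
    ["an", "instrument", "for", "measuring", "electric", "current"],
    ["a", "flow", "of", "electric", "current"],
    ["an", "instrument", "for", "measuring", "time"],
]
toy_vocabulary = {"instrument", "measuring", "electric", "current", "flow", "time"}

counts = cooccurrence_counts(toy_sentences, toy_vocabulary)
print(counts[("current", "electric")])   # pairs are stored in sorted order
print(triangle_size(2200))               # size of the full triangle
```

Storing only sorted pairs keeps one triangle of the symmetric matrix,
which is all that is needed when co-occurrence is treated as unordered.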
We have written a program called BROWSE which can manipu- 
late the entire co-occurrence matrix and can select groups of words 
based on whether the values of various probability functions pass 
selected thresholds. These groups of words can be manipulated as sets, 
and one technique we are using is to build sets of words that are either 
related to a certain word or to a certain sense of a word. 
The other technique involves using BROWSE to extract sub- 
matrices which are then given to the PATHFINDER program 
(Schvaneveldt, Durso and Dearholt 1985). This program was designed 
to discover the network structure of psychological data and it reduces 
the total amount of information while not eliminating much of the useful
information. We have applied this program to LDOCE co-occurrence
data with some success; it produces sparsely connected networks which
are easy to examine by eye and which appear to contain much useful
world knowledge.
In both formats (groups of words and PATHFINDER networks)
the data is a potentially useful resource for a number of applications.
Of particular interest is the possibility of sense disambiguation. To
investigate this, we have written a number of processes that use the
co-occurrence data. One process we are studying involves rating the
coherence of particular sense assignments for sentences, based on the set
overlap of the groups of words related to each of the assigned senses. 
Another process we are studying is how activation spreading from the 
nodes in a network produced by the PATHFINDER program can select
the appropriate senses of words in context. 
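The first of these processes, rating the coherence of a sense assignment
by set overlap, might be sketched as follows. The scoring rule (the sum
of pairwise overlaps), the sense labels, and the related-word sets are
all hypothetical; the paper does not specify how overlaps are combined.

```python
from itertools import combinations, product


def coherence(assignment, related):
    """Score an assignment (one sense per word) as the total size of the
    pairwise overlaps between the word groups related to each sense."""
    return sum(
        len(related[a] & related[b]) for a, b in combinations(assignment, 2)
    )


def best_assignment(candidate_senses, related):
    """Try every combination of senses and keep the most coherent one."""
    return max(product(*candidate_senses), key=lambda a: coherence(a, related))


# Hypothetical groups of words related to each sense (invented data).
related = {
    "bank_river": {"river", "water", "side", "land"},
    "bank_money": {"money", "keep", "pay"},
    "current_water": {"water", "river", "flow", "move"},
    "current_electric": {"electric", "wire", "power"},
}

# Two ambiguous words in one sentence, each with two candidate senses.
candidates = [
    ["bank_river", "bank_money"],
    ["current_water", "current_electric"],
]
chosen = best_assignment(candidates, related)
print(chosen)
```

Exhaustive search over sense combinations is only workable for short
sentences; it is used here purely to keep the sketch small.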
The work has strong links to connectionism, and indeed we are
investigating how this work can proceed within the connectionist
paradigm. We are developing a theory of representation, utilisation, and
learning of networks within distributed connectionist models. In
addition, we have been developing a connectionist simulator for the Intel
hypercube; this work is well under way (see Plate 1987).
6.2 Approach II: A Lexicon-Producer (Brian M. Slator)
While the first approach begins with no prior knowledge needed at 
all, the other two approaches begin with certain kinds of external
information supplied. The second approach (Slator and Wilks 1987; Slator
1988) hand-codes a grammar, some semantic patterns, and a list of the 
2,219 words of the LDOCE controlled vocabulary. The approach seeks 
to build dictionary entries for the words of the controlled vocabulary 
and the other words in LDOCE using semantic information extracted
not only from the dictionary entries of LDOCE, as in the other two
approaches, but also from the box and pragmatic codes found on the
machine readable version of LDOCE (though not in the book). The box
codes use a special set of primitives such as "abstract," "concrete,"
and "animate," organised into a type hierarchy. The primitives are
used to assign type restrictions on nouns and adjectives, and type
restrictions on the arguments of verbs. The pragmatic codes (also called
"subject" codes but referred to here as "pragmatic" codes to avoid
confusion with the grammatical subject) are another special set of
primitives organised into a hierarchy. The hierarchy consists of main
headings such as "engineering" and subheadings like "electrical." The
primitives are used to classify words by their subject, for example, one
sense of 'current' is classified as "geography:geology" while another
sense is marked "engineering/electrical."
The semantic information is extracted from LDOCE dictionary
entries using a large-scale parser that isolates the genus and differentia
terms in each entry, expanding upon other similar work (e.g.,
Chodorow, Byrd, and Heidorn 1985; Alshawi, Boguraev, and Briscoe 1985;
Boguraev and Briscoe 1987; Binot and Jensen 1987). 
The dictionary entries that are built for individual word senses are 
frame-based lexical semantic structures, intended for subsequent use in
knowledge-based parsing. The process of building a frame for a word
sense begins by first assigning the box and pragmatic code information
from LDOCE for that word sense. The parser then analyses the 
definition of that word sense from LDOCE. 
The parser is a chart parser (taken from Slocum 1985) which is
left-corner and bottom-up with top-down filtering and early constituent
tests. Chart parsing was selected because of its utility as a grammar
testing and development tool. The parser is driven by a context free
grammar of over 100 rules and a lexicon composed of the 2,219 words from
the LDOCE controlled vocabulary. It must be emphasised that this chart
parser is not a parser for English; it is a parser for just the language
of LDOCE definitions. The grammar is still being tuned, but currently
covers over 90% of the language used in LDOCE definitions of content
words. 
The parser produces a phrase-structure tree of an LDOCE word
sense definition which is passed to an interpreter for pattern matching
and inferencing. The interpreter extracts the dominating phrase,
reorganises the phrase into genus and differentia components, and attempts to
infer and fill in case roles that subdivide the differentia information. The
interpreter then accesses the pre-existing frame for the word sense,
which already contains the relevant box and pragmatic code information
for the word sense, and enriches the frame by adding the genus and
differentia information extracted from its definition.
Consider, for example, how the frame for 'ammeter' is built. 
From the box and pragmatic codes, the following hierarchical informa- 
tion is extracted and used to create an initial frame for 'ammeter': from 
the box code, that an ammeter is of type "solid," and from the prag- 
matic code, that an ammeter is classified under the subject 
"engineering/electrical." 
Next, the chart parser is used to analyse the LDOCE definition of 
an 'ammeter', which is that it "is an instrument for measuring ... elec- 
tric current." The definition is parsed into a phrase-structure tree which 
is passed to the interpreter. The interpreter adds to the frame for
'ammeter' that 'instrument' is its genus and "for measuring electric
current" is its differentia information. Furthermore, the interpreter
notes the phrase "for measuring" and creates the case role slot
PURPOSE, i.e., that the purpose of an ammeter is for measuring electric
current. 
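The two stages of frame construction for 'ammeter' can be pictured with
a small Python sketch. The class SenseFrame, the function enrich, and
the rule that a differentia beginning with "for" fills the PURPOSE slot
are hypothetical stand-ins; the actual frame format and the
interpreter's patterns are not given here, and the chart parser is
replaced by hand-supplied genus and differentia strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SenseFrame:
    """Illustrative frame for one word sense (field names are assumed)."""
    headword: str
    box_type: str                 # from the LDOCE box code
    subject: str                  # from the LDOCE pragmatic code
    genus: Optional[str] = None
    differentia: Optional[str] = None
    case_roles: Dict[str, str] = field(default_factory=dict)


def enrich(frame, genus, differentia):
    """Add genus and differentia to an existing frame, then apply one
    hypothetical pattern: 'for <activity>' fills the PURPOSE case role."""
    frame.genus = genus
    frame.differentia = differentia
    if differentia.startswith("for "):
        frame.case_roles["PURPOSE"] = differentia[len("for "):]
    return frame


# Stage 1: initial frame from the box and pragmatic codes.
ammeter = SenseFrame("ammeter", box_type="solid", subject="engineering/electrical")

# Stage 2: enrichment from the (hand-supplied) analysis of the definition.
enrich(ammeter, genus="instrument", differentia="for measuring electric current")
print(ammeter)
```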
6.3 Approach III: Building a MTD from a Key Defining Vocabulary
(Cheng-ming Guo)
The third approach, unlike the first and the second, argues that a
small amount of hand-coding of world knowledge is necessary before
the bootstrapping process can begin. The amount of hand-coding 
required, though more than the other approaches, is still relatively small 
because over 95% of its MTD is built automatically. The prior world 
knowledge that requires hand-coding is a set of 1,200 words, called the 
Key Delining Vocabulary (KDV), which has been found to define the 
controlled vocabulary of LDOCE, and thence all 27,758 words defined 
in LDOCE. The .senses of all the words in LDOCE call be defined by 
the KDV ill a series of four "defining cycles." 
When a candidate word enters a defining cycle, the stems of the 
words used in the definitions of the first three senses of that candidate 
word are examined. If all the word stems in those three sense 
definitions occur in the KDV, then the candidate word is put into a 
"success" file and added to the KDV at the end of the defining cycle; if 
not, the word is put into a "fail" file and addition of the word to the 
KDV is postponed until a later cycle. In this way, the size of the KDV
expands with each cycle until, after three cycles, all the words from the
LDOCE controlled vocabulary are accounted for. The remaining words
in LDOCE are expected to be defined in the next defining cycle.
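The defining-cycle procedure can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' program: the function name `run_defining_cycles`, its data layout, and the toy vocabulary are all hypothetical, and word stems are assumed to have been extracted already.

```python
def run_defining_cycles(kdv, definitions, max_cycles=4):
    """Grow the KDV cycle by cycle.

    kdv: set of seed word stems.
    definitions: dict mapping each candidate word to a list of sense
        definitions, each given as a list of word stems.
    Returns (final_kdv, history); history[i] lists the words that
    entered the "success" file in cycle i + 1.
    """
    known = set(kdv)
    pending = [w for w in definitions if w not in known]
    history = []
    for _ in range(max_cycles):
        success, fail = [], []
        for word in pending:
            # Only the first three senses of the candidate are examined.
            stems = {s for sense in definitions[word][:3] for s in sense}
            (success if stems <= known else fail).append(word)
        known.update(success)  # words join the KDV at the end of the cycle
        history.append(success)
        pending = fail
        if not pending or not success:
            break
    return known, history


# Toy data (invented for illustration, not taken from LDOCE).
seed = {"a", "move", "from", "place", "to", "by", "sea"}
toy_definitions = {
    "travel": [["move", "from", "place", "to", "place"]],
    "journey": [["a", "travel"]],
    "voyage": [["a", "journey", "by", "sea"]],
}
final_kdv, history = run_defining_cycles(seed, toy_definitions)
```

In this toy run each word succeeds one cycle after the word it depends on, because successful words are added to the KDV only when a cycle ends.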
The discovery of the KDV and the use of defining cycles are valuable
for a number of reasons. First, in building a MTD, a KDV reduces
the initial number of knowledge structures for dictionary entries that
have to be hand-coded before such structures can be constructed
automatically by some bootstrapping process. The knowledge structures
used in this particular study are called "integrated semantic units" or
ISUs. Though the preliminary study reported here uses a KDV of
around 1,200 words, the number can probably be reduced to about 1,000.
Second, the use of defining cycles helps to identify vacuous circular
definitions. Circular definitions that use circles of just two words
pose special problems for building a MTD from a MRD. For example,
in LDOCE a "trip" is defined as a "journey", and a "journey" as a
"trip". A MTD built from a MRD should be free of such circular
definitions. One way to overcome such circular definitions is to
include just one of the words involved as a KDV word, but not the
other. The word selected for the KDV will be the one whose first three
senses fulfil the criteria of a defining cycle given earlier.
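Two-word circles of the "trip"/"journey" kind can be found with a simple lookup, sketched below. The sketch assumes a hypothetical mapping `genus_of` from each word to the single head word of its definition, which is a strong simplification of real LDOCE entries; the function name is likewise an assumption.

```python
def two_word_circles(genus_of):
    """Return sorted pairs (a, b) where a is defined as b and b as a."""
    pairs = set()
    for word, genus in genus_of.items():
        if word != genus and genus_of.get(genus) == word:
            pairs.add(tuple(sorted((word, genus))))
    return sorted(pairs)


# Hypothetical head-word table; only the trip/journey pair is from the text.
genus_of = {"trip": "journey", "journey": "trip", "ammeter": "instrument"}
circles = two_word_circles(genus_of)
```

Once such a pair is found, one member would be chosen for the KDV according to the defining-cycle criteria, and the other left to be defined from it.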
Thirdly, when constructing a MTD, use of the defining cycles 
ensures that all definitions of words and their senses that are built con- 
tain only words that already have definitions. In the case of LDOCE, 
use of the defining cycles sorts out words in the LDOCE controlled 
vocabulary whose definitions include words outside of that vocabulary. 
This has proved to be not uncommon in LDOCE definitions. 
Fourthly, in building a MTD, the main senses of these empirically
found KDV words are taken as the "semantic primitives" of the MTD.
The use of defining cycles ensures that a set of primitives that best
suits a particular MRD can be found empirically.
An estimated 3,600 ISUs for an average of three basic senses of
the 1,200 KDV words are to be hand-coded to start the bootstrapping
process (the bootstrapping process is shown schematically in Figure 2
of the Appendix, p.14). A language analyzer and learner (LAAL) carries
out the bootstrapping process according to a bootstrapping schedule
(as with approach II, any grammar rules or semantic patterns used by
the LAAL will have to be hand-coded). The bootstrapping schedule is
concerned with which word senses are to be processed first, and which
later. The necessity for the bootstrapping schedule stems from the fact
that the ISUs for the basic senses of the words in the definition of a
word sense have to be in the ISU database before that definition can be
analysed and its ISU produced. After the ISUs for the basic word senses
of the words from the LDOCE controlled vocabulary are built into the
database, the non-basic senses of these words will be processed. When
all of the controlled vocabulary words are finished, words from outside
the controlled vocabulary will be attended to. Following the
bootstrapping schedule, the LAAL system processes word sense definitions
to produce more and more ISUs until the entire LDOCE is turned into a
full MTD of ISUs.
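The three phases of the bootstrapping schedule (basic senses of controlled vocabulary words, then their non-basic senses, then words outside the controlled vocabulary) can be sketched as an ordering function. The tuple layout and the name `schedule` are assumptions made for illustration, and the sketch covers only the ordering, not the analysis that the LAAL performs on each definition.

```python
def schedule(senses, controlled_vocab):
    """Order word senses into the three phases of the schedule.

    senses: iterable of (word, sense_number, is_basic) tuples.
    controlled_vocab: set of words in the controlled vocabulary.
    """
    def phase(item):
        word, _, is_basic = item
        if word in controlled_vocab:
            return 0 if is_basic else 1
        return 2

    # sorted() is stable, so the input order is kept within each phase.
    return sorted(senses, key=phase)


# Invented sample senses; "run" is assumed to be a controlled vocabulary word.
sample = [("ammeter", 1, True), ("run", 4, False), ("run", 1, True)]
ordered = schedule(sample, {"run"})
```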
Further details about the three approaches may be found in the
Appendix (Wilks, Fass, Guo, McDonald, Plate, and Slator 1987). The
MTDs produced by these approaches are fed into a number of consumers:
a Lexicon-Consumer (Slator and Wilks 1987) and Collative
Semantics.
7 Summary
We do not expect to produce a single format for representing the
knowledge extracted from LDOCE because the three approaches use
different sources of knowledge and different processes. The formats
produced by approaches II and III are notationally the most alike but the
knowledge they contain is different. Unlike the others, the format of
approach I contains co-occurrence data. The format of II contains box
and pragmatic code information not present in the format of approach
III; but the underlying organisation of the knowledge in approach III is
very systematic, unlike the equivalent knowledge in approach II. We
expect that the comparison of formats will be very fruitful, as will the
comparison of underlying approaches to the extraction of semantic
information, and will produce clearer understanding for future work on
transforming MRDs into MTDs.

References 

Alshawi, Hiyan, Bran Boguraev, and Ted Briscoe (1985) Towards a
Dictionary Support Environment for Real Time Parsing. In
Proceedings of the European Conference on Computational
Linguistics, Pisa, Italy.

Amsler, Robert A. (1980) The Structure of the Merriam-Webster Pocket
Dictionary. Technical Report TR-164, University of Texas at
Austin.

Amsler, Robert A. (1981) A Taxonomy of English Nouns and Verbs.
In Proceedings of the 19th Annual Meeting of the Association for
Computational Linguistics, Stanford, CA, pp.133-138.

Amsler, Robert A. (1982) Computational Lexicology: A Research
Program. AFIPS Conference Proceedings, 1982 National Computer
Conference, pp.657-663.

Amsler, Robert A. (1986) Deriving Lexical Knowledge-Base Entries
from Existing Machine-Readable Information Sources.
Unpublished Ms.

Amsler, Robert A., and John S. White (1979) Development of a
Computational Methodology for Deriving Natural Language Semantic
Structures via Analysis of Machine-Readable Dictionaries. NSF
Technical Report MCS77-01315.

Binot, Jean-Louis, and Karen Jensen (1987) A Semantic Expert Using
an Online Standard Dictionary. Proceedings of the 10th
International Joint Conference on Artificial Intelligence (IJCAI-87),
Milan, Italy, pp.709-714.

Boguraev, Bran, and Ted Briscoe (1987) Large Lexicons for Natural
Language Processing: Exploring the Grammar Coding System of
LDOCE. Computational Linguistics, 13.

Boguraev, Branimir K., Ted Briscoe, John Carroll, David Carter, and
Claire Grover (1987) The Derivation of a Grammatically Indexed
Lexicon from the Longman Dictionary of Contemporary English.
Proceedings of the 25th Annual Meeting of the ACL, Stanford
University, Stanford, CA, pp.193-200.

Chodorow, Martin S., Roy J. Byrd, and George E. Heidorn (1985)
Extracting Semantic Hierarchies from a Large On-Line Dictionary.
Proceedings of the 23rd Annual Meeting of the ACL, Chicago,
Illinois, USA, pp.299-304.

Cottrell, Garrison W., and Steven L. Small (1983) A Connectionist
Scheme for Modelling Word-Sense Disambiguation. Cognition
and Brain Theory, 6, pp.89-120.

Evens, M., and R.N. Smith (1983) Determination of Adverbial Senses
from Webster's Seventh Collegiate Definitions. Paper presented at
Workshop on Machine Readable Dictionaries, SRI-International,
April 1983.

Fass, Dan C. (1986) Collative Semantics: An Approach to Coherence.
Memorandum in Computer and Cognitive Science, MCCS-86-56,
Computing Research Laboratory, New Mexico State University,
New Mexico.

Fass, Dan C. (1987) Semantic Relations, Metonymy, and Lexical
Ambiguity Resolution: A Coherence-Based Account. In Proceedings of
the 9th Annual Cognitive Science Society Conference, University
of Washington, Seattle, Washington, pp.575-586.

Fass, Dan C. (1988) Collative Semantics: A Semantics for Natural
Language Processing. Memorandum in Computer and Cognitive
Science, MCCS-88-118, Computing Research Laboratory, New
Mexico State University, New Mexico.

Guo, Cheng-ming (1987) Interactive Vocabulary Acquisition in XTRA.
Proceedings of the 10th International Joint Conference on
Artificial Intelligence (IJCAI-87), Milan, Italy, pp.715-717.

Lenat, Douglas B., Mayank Prakash, and Mary Shepherd (1986) CYC:
Using Common Sense Knowledge to Overcome Brittleness and
Knowledge Acquisition Bottlenecks. AI Magazine, 7, (4), pp.65-85.

Lenat, Douglas B., and Edward A. Feigenbaum (1987) On The
Thresholds of Knowledge. Proceedings of the 10th International Joint
Conference on Artificial Intelligence (IJCAI-87), Milan, Italy,
pp.1173-1182.

Hinton, Geoff E., James L. McClelland, and David E. Rumelhart (1986)
Distributed Representations. In James L. McClelland, David E.
Rumelhart, and the PDP Research Group (1986) Parallel
Distributed Processing: Explorations in the Microstructure of
Cognition. Volume 2: Psychological and Biological Models, MIT
Press/Bradford Books: Cambridge, MA, chapter 3, pp.77-109.

McClelland, Jay, David E. Rumelhart, and the PDP Research Group
(1986) Parallel Distributed Processing: Explorations in the
Microstructure of Cognition. Volume 2: Psychological and
Biological Models, MIT Press/Bradford Books: Cambridge, MA.

Markowitz, Judith, Thomas Ahlswede, and Martha Evens (1986)
Semantically Significant Patterns in Dictionary Definitions. In
Proceedings of the 24th Annual Meeting of the Association for
Computational Linguistics, New York, pp.112-119.

Michiels, A., J. Mullenders, and J. Noel (1980) Exploiting a Large Data
Base by Longman. Proceedings of the 8th International
Conference on Computational Linguistics (COLING-80), Tokyo, Japan,
pp.374-382.

Michiels, A., and J. Noel (1982) Approaches to Thesaurus Production.
Proceedings of the 9th International Conference on Computational
Linguistics (COLING-82), Prague, Czechoslovakia, pp.227-232.

Miller, George A. (1985) Dictionaries of the Mind. In Proceedings of
the 23rd Annual Meeting of the Association for Computational
Linguistics, Chicago, pp.305-314.

Plate, Tony (1987) FWCON: A Design for the Simulation of
Connectionist Models on Coarse Grained Parallel Computers.
Memorandum in Computer and Cognitive Science, MCCS-87-106,
Computing Research Laboratory, New Mexico State University,
New Mexico.

Procter, Paul, Robert F. Ilson, John Ayto, et al (1978) Longman
Dictionary of Contemporary English. Longman Group Limited:
Harlow, Essex, England.

Pustejovsky, James, and Sabine Bergler (1987) The Acquisition of
Conceptual Structure for the Lexicon. Proceedings of the 6th
National Conference on Artificial Intelligence (AAAI-87), Seattle,
WA, pp.556-570.

Quillian, M. Ross (1967) Word Concepts: A Theory and Simulation of
Some Basic Semantic Capabilities. Behavioral Science, 12,
pp.410-430. Also reprinted in Ronald J. Brachman and Hector J.
Levesque (Eds.) (1985) Readings in Knowledge Representation,
Morgan Kaufmann: Los Altos, CA, pp.98-118.

Quillian, M. Ross (1968) Semantic Memory. In Marvin Minsky (Ed.)
Semantic Information Processing, Cambridge, Mass: MIT Press,
pp.216-270.

St. John, Mark F., and James L. McClelland (1986) Reconstructive
Memory for Sentences: A PDP Approach. Ohio University Infer-

Schvaneveldt, Roger W., Frank T. Durso, and Don W. Dearholt (1985)
Pathfinder: Scaling with Network Structure. Memorandum in
Computer and Cognitive Science, MCCS-85-9, Computing
Research Laboratory, New Mexico State University, New Mexico.

Sinclair, John, Patrick Hanks, Gwyneth Fox, Rosamund Moon, Penny
Stock, et al (1987) Collins COBUILD English Language
Dictionary. William Collins and Sons: Glasgow, Scotland.

Slator, Brian M. (1988) Lexical Semantics and a Preference Semantics
Parser. Memorandum in Computer and Cognitive Science,
MCCS-88-116, Computing Research Laboratory, New Mexico
State University, New Mexico.

Slator, Brian M., and Yorick A. Wilks (1987) Toward Semantic
Structures from Dictionary Entries. In Proceedings of the Second
Annual Rocky Mountain Conference on Artificial Intelligence,
Boulder, Colorado, pp.85-96. Also Memorandum in Computer and
Cognitive Science, MCCS-87-96, Computing Research Laboratory,
New Mexico State University, New Mexico.

Slocum, Jonathan (1985) Parser Construction Techniques: A Tutorial.
Tutorial held at the 23rd Annual Meeting of the Association for
Computational Linguistics, Chicago.

Sparck Jones, Karen (1964) Synonymy and Semantic Classification.
Ph.D. Thesis, University of Cambridge, England. Published in
Edinburgh Information Technology Series (EDITS), Sidney
Michaelson and Yorick A. Wilks (Eds.), Edinburgh University
Press: Edinburgh, Scotland, 1986.

Walker, Donald E., and Robert A. Amsler (1986) The Use of
Machine-Readable Dictionaries in Sublanguage Analysis. In Ralph
Grishman and Richard Kittredge (Eds.) Analyzing Language in
Restricted Domains, Lawrence Erlbaum: Hillsdale, NJ.

Waltz, David L., and Jordan B. Pollack (1985) Massively Parallel
Parsing: A Strongly Interactive Model of Natural Language
Interpretation. Cognitive Science, 9, pp.51-74.

Wilks, Yorick A. (1973) An Artificial Intelligence Approach to Machine
Translation. In Roger C. Schank and Kenneth M. Colby (Eds.)
Computer Models of Thought and Language, San Francisco: W.H.
Freeman, pp.114-151.

Wilks, Yorick A. (1975a) A Preferential Pattern-Seeking Semantics for
Natural Language Inference. Artificial Intelligence, 6, pp.53-74.

Wilks, Yorick A. (1975b) An Intelligent Analyser and Understander of
English. Communications of the ACM, 18, pp.264-274.

Wilks, Yorick A. (1978) Making Preferences More Active. Artificial
Intelligence, 11, pp.197-223.

Wilks, Yorick A., Dan C. Fass, Cheng-ming Guo, James E. McDonald,
Tony Plate, and Brian M. Slator (1987) A Tractable Machine
Dictionary as a Resource for Computational Semantics.
Memorandum in Computer and Cognitive Science, MCCS-87-105,
Computing Research Laboratory, New Mexico State University, New
Mexico. To appear in Bran Boguraev and Ted Briscoe (Eds.)
Computational Lexicography for Natural Language Processing,
Longman: Harlow, Essex, England.
