EXPLOITIN(~ A LARGE DATA BASE By LONG\IAN 
i 
A. blICtlIELS (English Dept), J. 5~LLB~)ERS (Computer Centre), J. NOi~L (English Dept) 
University of Li}ge, Belgium 
We wish to explore some of the aspects 
of the exploitation of two dictionary files 
by LONGHAN Ltd, one for 'core' \[mglish and 
one for Imglish idioms. 
We'll try to show the feasibility of an 
approach to language processing based on a 
lexicon, conceived of as the repository of 
grammatical, semantic and knowledge-of-the- 
world information. 
After giving a brief description of the 
computer :files (Section I) we'll focus on the 
following points : 
a) a lexical approach to granmar allows 
a considerable simplification of the PSG 
component of a parsing system (Section II, 
Part One)~ 
b) the syntactic potential of many 
lexemes (at surface structure level) can 
serve as a guide to their deep structure 
configurations (Section IIsPart Two)j 
c) provided that a dictionary makes use 
of a limited defining vocabulary, the texts of 
the dictionary definitions can be processed on 
the basis of correlations between syntactic 
structures (filled with individual lexenms or 
lexemes belonging to specifiable classes) and 
semantic relationships such as that between a 
process verb and an instrument (Section III). 
SECTION I. DESCRIPTION OF THE COxlPUTER FILES .......................................... 
A contract with LON@IAN Ltd has made it 
possible for us to have access to the computer 
files of two dictionaries, LDOCE (LONDON 
DICTIONARY OF CONTF~IPORARY ENGLISH) and LDOEI 
(LON(~IAN DICTIONAI~OF ENGLISII IDIOMS). I% 
have had the LDOCE file for some time but have 
only just received the LDOEI one. 
spe~c~aThe features LDOCE make of which it 
lly useful for language processing are 
the following : 
a) it reflects the surface structure 
environment of its entries by means of a 
sophisticated system of grammatical codes, 
most of which can be thought of as strict 
subcategorization features. For instance, 
IDOCE specifies 
i.- that nouns like FACT or CLAIM can be 
followed by a THAT-clause, 
2.- that a verb such as WATCH can occur follo- 
wed by an NP followed by an ING-form (we 
watched the soldiers bleeding). 
Though it is mainly concerned with 
SURFACE structure, LDOCE nevertheless distin- 
guishes between an NP pair follow in~ GIVE (He 
gaveJhi.s b.rotl!e~la new bicycle) \[DJcode 
NP I N 2 
and one following CO~SIDER (He considered 
this brother, a f~) ~X1lcode 
T NP 1 NP 2 -- 
b) through a system of semantic codes of 
the Katz-and-Fodor type (these codes do ~mt 
appear in the printed version of the dictiona- 
ry), LDOCE places semantic restrictions on the 
subjects and objects of verbs (or on the type 
of noun that an adjective can modify), speci-  + 
ing for instance that PERSUADE requires a 
t~A~ object, and EXTFNPORIZE a \[+ ItU~4AN~ 
subject. 
c) LDOCE makes use of a defining vocabula- 
ry of some 2,000 items - all the definitions 
and all the examples associated with the 60,000 
entries are couched in that restricted vocabu- 
lary. 
Concerning points a and b it should be 
emphasized that the gra~natical and semantic 
codes can appear at two different levels : 
i.- ENTRY level : the code is appropriate to 
all the definitions of the entry in ques- 
tion, 
2.- DEFINITION level : the code is not appro- 
priate to the whole entry (i.e. in all its 
senses) but only to those readings that 
correspond to the definitions that the 
code is tagged to. 
For instance, READ cannot be assigned the 
same grammatical and semantic codes in senten- 
ces 1 and 2 : 
1.- He rnana~e~ tr~ ro.ad nt \]~a.~t r~no honk 
every day 
2.- Your paper doesn't read too well. 
This second level makes it possible to 
avoid a proliferation of indiscriminate dis- 
junctions in the specification of the codes to 
be associated with a given lexeme. It seems to 
us that by restricting the occurrence of code 
specifications at only one level (nmnely, the 
374 
i~NTRY level), one reduces the predictive power 
of both grammatical and semantic codes to 
practically nil in the case of complex entries. 
On the ot\]~r hand, the codes that are appro- 
priate at DEFINITION level provide an interes- 
ting type of correlation between strict sub- 
categorization and selection rules on the one 
hand and choice of appropriate reading on the 
other : such a type of correlation is bound to 
prove very useful for machine translation pur- 
poses. 
voc~au ~ing to the use of the same defining 
lary, LDOEI is a natural extension of 
LDOCE. Whereas the latter merely lists the 
idiomatic phrases under the relevant headwords, 
LDOEI gives the information necessary for re- 
cognizing and generating all the syntactic and 
morphological variants of each idi~n. To give 
only one example, in the entry "TELL ° I WHERE 
TO GET OFF IV : Pass 2\]" the sign o indicates 
that ~LL admits of morphological variation 
in this phrase, I specifies the place of the 
indirect object (which does not belong to the 
idiomatic phrase as such) and the grammatical 
note iV : Pass 2Jinforms the user that the 
syntactic value of the ~4~ole phrase is verbal 
(i. e. that it functions as a VP) and that the 
passive is to be formed by selecting the 
indirect object as subject (,'Ib was told where 
to get off"). 
(LONGMAN LEXICON) 
~\]is forthcoming thesaurus is also 
designed to tie in with LDOCE, of which it is 
partly a by-product. As Section III will make 
clear, our analysis of LDOCE definitions will 
have to rely on a thesaurus, but we do not 
know yet whether LOLEX will be available in 
machine-readable form. 
SECTION II. TOWAPd)S A S~4ANTICALLY ENRICI~JD 
SURFACE PARSER BASED ON LDOCE 
I.- ~_!~!~!!z_~_~. 
It stands to reason that automatic parsing 
programmes have to have access to at least two 
linguistic components : a grammar and a lexicon. 
In most systems that we know something 
about, the gra~nar is a good deal more sophis- 
ticated than the lexicon. The latter includes 
only a small sub-part of the total lexicon for 
the language under study, while the grsmar 
takes care of a large proportion of the basic 
gran~natical structures. 
We would like to explore a diametrically 
opposed approach : our starting-point is a 
sophisticated lexicon for co:re English and our 
aim is to make maximum use of the information 
it contains to keep our grmmnar within strict 
bounds. 
An obvious first step in developing a 
parser based on LDOCE is to write algorithms 
that translate the various grammatical codes 
into scanning procedures . Most of these algo- 
rithms are fairly straightforward and have 
already been written. What we would like to 
focus on here is the simplification of the cate- 
gorial component that such a lexically based 
syntax permits. Consider 3 : 
3. The claim that he has succeeded is patent- 
ly false. Since there is a code (namely, 5 ) 
that stipulates whether an element (in this 
case, a countable noun coded C - the whole 
code is therefore \[~51 ) can be followed by a 
~IAT-clause, we will not attempt to account 
for T~T-clauses via rewrite rules for the cate- 
gory NP, i. e. we won't have such a rule as : 
NP---~NP ~T S 
Naturally enough, there is no LDOCE code sti- 
pulating that a noun can be followed by a rela- 
tive clause (such a code would be meaningless 
since virtually all nouns can have a relative 
clause - if not a restrictive, thln at least an 
appositive one - tagged on their right). We will 
therefore have to include relative clauses some- 
where in our rewrite rules for the category NP. 
Here too, however, the lexical approach to syn- 
tax can prove useful. To show this, let us 
first define a CONCATENATION as a string every 
member of which is tied to some other by means 
of a LDOCE grammatical code (it requires the 
other member for the satisfaction of its code 
or it serves to satisfy the other member's 
code). The concept of CONCATENATION can be 
equated with that of CLAUSE if it is extended 
to cover : 
i.- free elements, i, e. elements which are not 
bound to one particular word or phrase in- 
side the clause (both sentential adjuncts 
and linking words such as conjunctions 
would fall into this category). 
2.- a subject role, i. e. the creation of a 
link between a tensed V (the starting-point 
for the concatenation - see below) and an 
NP to be found on its right or on its left. 
We have already looked into the mechanisms 
of tensed V searches and subject role assign- 
ments and we have found that various properties 
of English make the task of algorithmizing 
these mechanisms less formidable than it 
appears at first sight. The most prominent 
among these properties are the following : 
I.- the conditions of use of the auxiliary DO; 
2.- the fact that only tensed Vs require a 
subject; 
3.- the fact that only the first (i. e. left- 
most) member of a verbal ~roup can bear 
tense; 
4.- the fact that it must bear tense; 
S.- the morphological contrast between verb and 
noun with respect to m~ber (- S marks sin- 
gular verbs but plural nouns). 
Turning now to relative clauses, we see 
that we can characterize them with great ease : 
a relative clause is s concatenation that opens 
with a relative phrase (one of whose realiza- 
tions is ~ and another the multi-purpose word 
THAT, so that a recognition procedure based on 
375- 
the occurrence of particular morphemes is bound 
to fail in some cases) and that misses an NP 
(it is this second property that has to be re- 
garded as essential). 
The readers who are familiar with Hudson 
1976 will have realized that the approach 
advocated here is nearer to Hudson's version of 
systemic gra~nar than to transfornmtional gram- 
mar : we make full use of sister-dependencies, 
starting with the tensed V, which we believe to 
provide the best entry-point into the network 
of relationships woven by the various code- 
bearing elements in a sentence. 
II.- _Dee_~_st~Ljcture_conf_igkjratitins _. 
It is obvious that our parser will have to 
be able to : 
I.- recognize the situations in which the basic 
order of the constituents (i. e. the one 
stipulated in the scanning procedures asso- 
ciated with the gra~atical codes) is dis- 
rupted under the effect of transformations 
such as PASSIVIZATION, TOPICALIZATION, 
PJ~LATIVE CLAUSE FOt~IATION, GAPPING, ...) 
2.- keep track of the constituents that have 
been moved. 
We do not intend to deal with these 
points here but we would like to stress that 
the problems for RECOGNITION are very different 
from those for GENERATION. RAISING and EQUI, 
for instance, are rather formidable and 
problem-ridden rules from the point of view 
of generation but we shall argue that we do not 
need their counterparts for recognition pur- 
poses. We shall illustrate this point by look- 
ing at verb complementation - at the same time 
we will show that the syntactic potential of a 
verb can be used as a guide to its deep 
structure configuration. 
In a VP the SYNTACTIC head is always the 
first, i. e. tensed verb. As we have seen, the 
way the parser builds up concatenations re- 
flects this property. As for the SEMANTIC head, 
it is very often another verb than the first 
one. This, however, does not matter in so far as 
the auxiliaries and semi-m~xiliaries (IIAPPiZ~4, 
SEEM, ...) do not have any semantic code asso- 
ciated with them and can therefore be regarded 
as semantically transparent : they have no 
effect whatsoever on the pailts that the 
semantic component will be called on to examine 
for compatibility. Consider such a sentence 
as4 : 
4.- b{y father seems to have been reading too 
many strips. 
11-te starting-point for building the 
concatenation would be the tensed V, i. e. 
SED{S : the concatenation would be allowed to 
grow both to the left (assigrunent of subject 
role to the NP 'my father') and to the right : 
he~appropriate syntactic code for SEI~,IS is 
3J here (i. e. followed by an infinitive 
with TO) : g.~ 
ub~j e~t fathers seem~ NP/ ~atisfigs g3\] code of SED.~S 
SI!Di is not coded semantically, so that the 
semantic component would not be called on at 
this stage. In the next step, IIAVE would be 
examined and its ~I ~ code seen to be 
applicable ~ 8Jspecifies that the code-bearing 
element be followed by an EN-form) so that a 
new sister dependency would be established : 
_ ./4---~ ~ 
My father seems to have |been J 
sa~tisfies CI 8J code of HAVE 
In similar fashion, BEEN would have an 
~13~ code (i. e. + ING-form) satisfied by 
IIEADING : 
!ly lather seems to nave been reading 
Neither HAVE nor BE are semantically 
coded with restrict to the definitions that have 
been chosen onYDasis of the grammatical codes 
that are satisfied in the sentence .~ READING 
on the other hand, will be coded sy~ttactically 
(it requires one NP as object-code ~Ti| ) and 
semantically (it requires a ~ ~ANJ subject). 
Since SED4, HAVE and BEEN are semantically 
transparent, the semantic component will exa- 
mine the pair ~JX father and re.ading and find 
them to be compatible as a subject-verb configu- 
ration. But how does the parser know that 
fathe__j is the subject of reading ? A very 
simple-minded rule states that there is no 
change in subject in a verbal complex as long 
as there is no interrupting NP; if there is one, 
it is to be regarded as the subject of the 
following verb(s) : 
I want to read -~ 
I started to read ~ subject of READ 
"i happened to be reading. 
Y want g~. to ready ~" 
I saw you reading~you subject of READ 
I made ~ read 
This rule admits of at least one exception, 
namely PROMISE : 
I promised you to read (I subject of READ 
in spite of interrupting YOU). 
Another problem relating to deep structure 
configurations is that of determining, in 
v + NP + J (TO) + INFINrrIVE l 
+ ING-FOR~ J 
structures, whether the NP is to be regarded as 
the object of the V or not (contrast 'I want 
him to go' with 'I persuaded him to go'). 
Instead of going into each deep structure 
distinction that can be drawn within the field 
of verb complementation, we will show that the 
verb classes which Akmajian and Heny 1975 
(p. 364 and fell.) find it necessary to set up 
in their introduction to transformational 
grammar to account for deep structure distinc- 
tions (Figure i) can be held apart on the 
basis of their surface structure potential as 
captured in their LDOCE gran~natical codes. 
Figure 1 
A1qnajian and Heny's verb classes 
See appendix I . 
376- 
The raised numbers on the features in 
the matrix below refer to the following 
list of test sentences : 
I. I want to go 
2. I want him to go 
3. a) ? I want that he should go 
b) * I want that he goes 
4. * I persuaded to go 
5. I persuaded him to go 
6. * I persuaded that he went 
7. * I believe to have gone 
8. I believe him to have gone 
9. I believe that he has gone 
10. I failed to go 
11. * I failed him to go 
12. * I failed that he went 
CLASS NUM- 
BER + ONE 
TYPICAL 
EXPONENT 
CODES I 
T3/I 3 i v3/x (to be)... 1 T5/T5a 
I : WANT +I 
II : PER- _4 
SUADH 
III: BE- I _7 LIEV  
IV : FAIL i +10 
i 
+2 
+5 
+8 
11 
3 
6 
+9 
12 
The NP following the verb is its deep 
object only in the case of Class II 
verbs (I persuaded him to go ~I 
persuaded him); there is no NP in Class 
IV (* I failed him to go) and the NP is 
not the object in Class I or in Class 
III (I want him to go ~I want him; I 
believe him to have gone-4-~I believe |r 
him) . 
As for PROMISE (not discussed in 
Akmajian and Heny 1975) it could be 
defined by means of the following 
feature row : + T3, + T5, + V 3 : 
I promised to go (T3) 
I promised him to ~o (V3) 
I promised that I would go (T5) 
The NP between PROMISE and the TO- 
INFINITIVE is the object (as in the 
PERSUADE class) but it is not the sub- 
ject of the infinitive. 
SECTION THREE : LDOCE DEFINITIONS : AN 
IR APPROACH TO SEMANTIC AND KNOWLEDGE-OF- 
THE-WORLD INFORMATION. 
/ 
LDOCE definitions convey semantic infor- 
mation in a fairly explicit, but non- 
formatted, form. Even though all defini- 
tions are written in a DEFINING VOCABU- 
LARY (not to be confused with a BASIC 
VOCABULARY - see below), no attempt has 
been made to stick to a limited number 
of DEFINING FORMULAE. To givean example 
of what we mean by DEFINING FORMULA, and 
to anticipate on what will be the main 
concern of this section, we wish to look 
at the class of INSTRUMENTS. In theory, 
it could be agreed by the dictionary- 
makers that all instruments have to 
include the phrase "instrument used for 
Ving" in their definitions. In such a 
defining formula the word INSTRUMENT 
would be a DEFINING PRIMITIVE and the 
predicate USED FOR would be a DEFINING 
RELATION (in this case, between an 
instrument and a predicate). Such a kind 
of formatted definition would be less 
precise and less exact, but infinitely 
more usable, than a common type defini- 
tion. Smith and Maxwell 1973 (p2) point 
out that in a typical dictionary 
approximately 50 % of the vocabulary 
appears in the definitions. LDOCE is a 
major improvement on such a typical 
dictionary in that its defining 
--377-- 
vocabulary is restricted to some 2,000 
items (used to define some 60,000 
entries). My purpose in this section is 
to reflect on the possibility of 
turning a significant number of LDOCE 
definitions into fully formatted ones 
(i.e. making use of defining formulae). 
Consider the sentence : 
I saw the man in the park with a 
telescope 
\[Woods in Rustin 1973, p. 17~ 
The PREFERRED reading is the one that 
associates 'with a telescope' with the 
predicate 'saw' rather than with either 
of the NP heads 'man' or 'park' : 'saw 
with a telescope' rather than 'man with 
a telescope' or 'park with a telescope'. 
If we had available a formatted defini- 
tion of TELESCOPE ("instrument used for 
seeing ..."), there would be no 
problem in a system of preferential 
semantics : the link between 'saw' and 
'telescope' (embodied in the definition 
of the latter) would lead to the 
selection of the preferred reading on 
the basis of the DENSEST MATCH FIRST 
principle. As a matter of fact, the 
LDOCE definition for 'telescope' is 
very nearly what we need : 
"a tubelike scientific instrument 
used for seeing distant objects by 
making them appear nearer and 
larger" 
A simple matching procedure between our 
suggested defining formula for 
instruments and the LDOCE definition 
for 'telescope' would have been 
sufficient in this case. The problem, 
of course, is that there is absolutely 
no guarantee that the defining formula 
will be part of the definition of all 
instruments. HAMMER, for instance, is 
defined as : 
"a tool with a heavy head for 
driving nails into wood or for 
striking things to break them or 
move them" (Definition I) 
No simple procedure will associate 
INSTRUMENT with HAMMER. The fact that 
LDOCE makes use of a defining vocabu- 
lary, however, ensures that the defining 
noun (TOOL in this case) is a member of 
a finite list, namely the LDOCE defining 
vocabulary itself. One can go a step 
further and make the hypothesis that 
the defining noun will belong to a 
definite subset within the defining 
vocabulary. One can go through that 
vocabulary and select the words that 
could stand for INSTRUMENTS. The subset 
that this procedure yields can fairly 
easily be divided into two further 
groups : on the one hand one finds 
such general words as TOOL and APPARATUS 
(note that the latter would not be 
included in a BASIC VOCABULARY) which 
could also be used in defining formulae; 
on the other hand one has to include 
such specific items as BOAT, BICYCLE 
and GUN, which are instances of instru- 
ments. The second group is of course 
much more problematic than the first : 
one has to be concerned with TYPICAL 
instruments, otherwise all PHYSICAL 
OBJECTS would have to be included : 
He hit her with the tail of a dead 
snake. 
The INSTRUMENT reading of the 'with' - 
phrase is not due to any intrinsic 
property of either 'tail' or 'snake', 
but rather to four factors : 
--378-- 
a) WITH often introduces an instrumental 
adjunct; 
b) the 'with' -phrase in this sentence 
cannot be read as postmodifying 
'her'; 
c) it cannot be read as an accompaniment 
adjunct for 'he' either; 
d) the predicate 'hit' can take an 
instrumental adjunct. 
The reader will have noticed that 
factors a, c and d also apply - mutatis 
mutandis - to the example involving the 
predicate SEE. This, however, does not 
imply that the link between TELESCOPE 
and SEE was of no use in preferring the 
instrument reading for the 'with' 
-phrase - note that 'with a telescope' 
COULD postmodify the NP heads 'man' and 
'park'; besides, even if it could not, 
we would still have to find a way of 
telling the system and this task may 
well prove considerably more formidable 
than that of associating instruments 
and predicates. 
The following items in the LDOCE 
defining vocabulary could be regarded 
as making up the subset for the 
concept INSTRUMENT : 
GROUP I 
apparatus 
instrument 
machine 
machinery 
means 
organ 
tool 
GROUR II 
arm \[R\] 
arms CR~ 
army 
arrow 
axe 
beak 
GROUP II (continued) 
belt gun prayer 
bicycle hammer proof 
boat hand \[R~ pump 
boot handle \[R\] radio 
brain hook railway 
brick key road 
bridge knife rod 
brush knot roof 
bullet ladder rope 
bus lamp sail 
button law scales 
camera letter scissors 
candle map screw 
car mat servant ~R\] 
card medicine shield 
cart message shoe 
chain microscope sign 
coin mirror signal 
comb motor \[R\] slave \[R~ 
control \[R\] nail spade 
cover needle spring 
curtain network stairs 
drum pan stone 
engine \[R\] pen string 
factory pin support 
fence plane sword 
fork poison system 
gate pole taxi 
gift post telephone 
glass pot telegram 
telegraph 
television 
thread 
thumb 
ticket 
tooth 
train 
trap 
vehicle 
weapon \[R\] 
w~l 
whip 
whistle 
NOTES 
I. For all items in both groups, 
POS (Part of Speech) = n 
2. All items in Group I - except MEANS, 
which is itself a head - appear 
under the head TOOL in Roget's 
Thesaurus. 
3. In Group II the items followed by 
\[R\] occur in Roger's Thesaurus under 
the head TOOL. 
--379-- 
4. The underlined items in Group II are 
more general and could perhaps be 
singled out in a third group, 
intermediate between I and II. 
Obviously, the lists as such are not 
sufficient for our purpose : words such 
as SPRING and MEDICINE are not relevant 
to the INSTRUMENT concept in some of 
their most frequent uses - for our 
purposes the defining vocabulary should 
not have been limited to a list of 
LEXICAL ITEMS; in case of polysemic 
words, numbers should have been added 
to make clear which definitions were to 
be associated with the defining word : 
SPRING I (= a source), 2 (= a season), 
4 (= elasticity), 5 (= an active 
healthy quality) and 6 (= an act of 
springing), are not relevant to the 
INSTRUMENT concept. Since - in theory - 
the noun SPRING can be used with all 
six meanings in LDOCE definitions, its 
inclusion in our list is liable to 
prove detrimental : it can lead the 
system to associate the INSTRUMENT 
concept with a defining word that has 
nothing to do with instrumentality• 
Going back to the LDOCE definition for 
HAMMER, we realize that the algorithm 
that will associate instruments and 
predicates will have to take into 
account, not only the Ving form (in the 
formula 'for Ving'), but also its 
object; otherwise a hammer is going to 
be thought of as a kind of vehicle : 
Compa r e 
a tool ... for driving DRIVE 1 2/3 in LDOCE 
with 
a tool ... for driving DRIVE 1 5/6 in LDOCE 
nails 
A second difficulty that we must face 
up to is that there may be no defining 
NOUN, but an all-purpose indefinite 
such as SOMETHING or ANYTHING. In that 
case, however, the INSTRUMENT concept 
is likely to be expressed somewhere 
else in the definitions, by means of 
(USED) FOR Ving, for instance. This 
last point leads us to an examination 
of the various ways in which the link 
between instrument and predicate can be 
conveyed; the existence of a defining 
vocabulary is a help but the range of 
SYNTACTIC possibles remains enormous; 
however, there is something that could 
be called the LEXICOGRAPHICAL TRADITION 
and familiarity with that tradition can 
help cut down on the number of possible 
formulae - the following stand a good 
chance of being rather heavily used : 
\[~OME THING \] •.. \] 
THING 
"INSTRUMENT ... 
TOOL 
Q I O 
FIG. 2 
USED rIN1 TO V I. Y.I 
MADE TO V 
'I%IAT \[ CAN V 
IS USED TO 
I MADE TO V I 
USED TO V 
(USED) FOR VING 
Obviously, processing LDOCE definitions 
is a lot of work in terms of the 
necessary algorithms and in terms of 
the sheer volume of language data to be 
scrutinized. We suggest that a useful 
approach is provided by IR (Information 
Retrieval) techniques as embodied in 
--380-- 
the IBM system known as STAIRS. 
STAIRS processes various objects, which 
can be worked into the following 
hierarchy : 
DOCUMENTS (TOP) / 
PARAGRAPHS / 
SENTENCES / 
WORDS (BOTTOM) 
The various paragraphs of a given 
document can be assigned labeL5, so 
that the search within a single 
document can be oriented. 
STAIRS provides a number of SEARCH 
OPERATORS, which will be briefly char- 
acterized below. A ~77~,£/ search 
operator can be used to link any o{ the 
following three categories : 
I. word tokens, e.g. DISEASES, APPLIES, 
COMPUTERIZED, IF~ 
2.a) stems, e.g. RUN-, ANTAGONIZ-, 
MOTHER- (the use of a character mask 
enables the system to assign RUNNING 
RUNS, RUNNER, RUNNERS, etc. to the 
stem RUN-) and b) lexemes for which 
STAIRS generates the morphological 
variants~ 
3. any expression consisting of elements 
of type I or 2 linked by STAIRS 
SEARCH OPERATORS (the definition is 
therefore recursive, and allows any 
degree of embedding). 
Let A and B stand for elements 
belonging to any of the above three 
types. The operato~that STAIRS 
works with are the following : 
A ADJ B : A and B occur next to 
each other and in that 
order in the document 
to be retrieved. 
A SYN B : A and B are to be re- 
garded as synonymS\[or 
a given search operation 
A WITH B : A and B occur in the 
same sentence 
A SAME B : A and B occur in the 
same paragraph 
NOT B : B doesn't occur in the 
document to be retrieved 
A AND B : both A & B 
A OR B : inclusive OR 
A XOR B : exclusive OR 
In our system the STAIRS hierarchy would 
correspond to the following : 
A. A DOCUMENT -A HOMOGRAPH (e.g. 
DOUBT 2) or AN ENTRY 
WITHOUT HOMOGRAPHS 
(e.g. PONDEROUS) in 
the LDOCE file 
B. A PARAGRAPH -a specified FIELD 
within A, e.g. POS 
(part of speech), 
GRAMMATICAL CODE, 
SEMANTIC CODE, TEXT 
OF THE DEFINITION,... 
C. A SENTENCE -any sentence included 
in the text of a given 
definition 
D. A WORD -the various words 
within a definition 
or the various codes 
and POS within a code 
field or a POS field. 
It will be apparent that in order to 
rewrite FIG. 2 as a set of search oper- 
ations in a STAIRS - like system we 
need to be able to refer to specified 
morphological analyses. Moreover, V and 
NP are neither word tokens nor stems : 
they refer to categories (respectively 
lexical and phrase structure) and we 
will have to extend the possibilities of 
the system so that such categories can 
be included in the expressions guiding 
the search and retrieval operations. 
Phrase structure categories are a hard 
nut to crack, and we will probably have 
to do without them in a first stage, but 
lexical categories such as V can be 
housed in a STAIRS - like system : a V 
is the name of any document that includes 
V among its POS - paragraph. 
Leaving aside the reference to NPs, 
FIG. 2 can be rewritten as a complex 
STAIRS - like search operation with six 
levels. The embedding of STAIRS ex- 
pressions within STAIRS expressions 
gives rise to the use of labels such as 
--381-- 
At, BI, etc; the colon is to be read 
as "can be defined as"" 
AI OR A2 
AI : BI WITH B2 
BI :~NYTHING' OR 'SOMETHING' 
B2 : CI OR C2 OR C3 
CI :---'USE~- WITH 'FOR' 
ADJ V-ING 
C2 : ~-~R ~ ADJ V-ING 
C3 : 'MADE' ADJ 'TO' ADJ V 
A2 : B3 WITH B4 
B3 .'~NSTRUMENT' OR SYN- 
INSTRUMENT 
B4 : C4 OR C5 OR C6 OR C7 
C4 ~--DI AD--J D2 -- 
DI : 'WHICH' OR 'THAT' 
D2 : Vs OR EI-'OR E2 
El : 'CANrADJ V 
E2 : 'IS' ADJ 
'USED' 
WITH 'TO' 
ADJ V 
C5 : 'MADE' AD-J-"T~ ' ADJ V 
C6 : 'USED' WITH 'TO' ADJ V 
C7 : D3 OR D4 
D3 'USED' WITH 
'FOR' ADJ V-ING 
D4 : 'FOR' ADJ V-ING __ ----- 
Note that to be really useful, the 
algorithms that associate predicates 
and instruments should have access to 
a thesaurus-like classification of 
predicates. Take for instance the defi- 
nition of MICROSCOPE : 
"an instrument that makes very small 
near objects seem larger, and so can 
be used for examining them" 
The preferential link is between 
MICROSCOPE and EXAMINE and a sentence 
such as : 
"He examined the new virus with an 
extremely powerful microscope" 
will be interpreted the right way. But 
what about 
"He studied the new virus with an 
extremely powerful microscope" ? 
We could get around this problem if we 
had access to a thesaurus like Roget's: 
since STUDY and EXAMINE share a SUBHEAD 
in Roget's, viz. SCAN in 438 VISION, a 
link between STUDY and MICROSCOPE 
could be established. 
~_~ 
Figure I Akmajian and Heny's verb classes 
CLASS I : prefer, want, hate, like, hope, 
desire, love. 
CLASS II : force, persuade, allow, coax, 
help, order, permit, make, cause. 
CLASS III: believe, assume, know, perceive, 
find, prove, understand, imagine. 
CLASS IV : condescend, dare, endeavour, fail, 
manage, proceed, refuse. 
CLASS V : seem, appear, happen, turn out. 
--382 - 

References

-LONGMAN DICTIONARY OF CONTEMPORARY 
ENGLISH, 1978. 

-LONGMAN DICTIONARY OF ENGLISH IDIOMS, 
1979. 

-AKMAJIAN AND HENY, 1975 : Akmajian, A. 
and Heny, F., An Introduction to the 
Principles of Transformational Syntax, 
MIT Press, Cambridge and London, 1975. 

-HUDSON 1976 : Hudson, R.A., Arguments 
for a Non-transformational Grammar, 
The University of Chicago Press, 
Chicago and London, 1976. 

-Roget's Thesaurus of English Words and 
Phrases, Penguin Books, 1962 Longman 
Edition. 

-RUSTIN, 1973 : Rustin, R. (ed.), Natural 
Language Processing, Algorithmics Press 
New York, 1973. 

-SMITH & MAXWELL 1973 : Smith, R.N., and 
Maxwell, E., An English Dictionary 
for Computerized Syntactic and Semantic 
Processing Systems, mimeo, Pisa 1973. 
