ACQUISITION OF SEMANTIC INFORMATION 
FROM AN ON-LINE DICTIONARY 
Nicoletta CALZOLARI - Eugenio PICCHI
Dipartimento di Linguistica, Università di Pisa
Istituto di Linguistica Computazionale, CNR, Pisa
Via della Faggiola 32
I-56100 PISA - ITALY
Abstract 
After the first work on machine-readable dictionaries
(MRDs) in the seventies, and with the recent development of
the concept of a lexical database (LDB) in which interaction,
flexibility and multidimensionality can be achieved, but
everything must be explicitly stated in advance, a new
possibility which is now emerging is that of a procedural
exploitation of the full range of semantic information implicitly
contained in MRDs. The dictionary is considered in this
framework as a primary source of basic general knowledge. In
the paper we describe a project to develop a system which has
word-sense acquisition from information contained in
computerized dictionaries and knowledge organization as its
main objectives. The approach consists in a discovery procedure
technique operating on natural language definitions, which
is recursively applied and refined. We start from free-text
definitions, in natural language linear form, analyzing and
converting them into informationally equivalent structured
forms. This new approach, which aims at reorganizing free text
into elaborately structured information, could be called the
Lexical Knowledge Base (LKB) approach.
1. Background
For a considerable period in theoretical and computational
linguistics, there was a predominant lack of interest in lexical
problems, which were regarded as being of minor importance
with respect to "core" issues concerning linguistic phenomena,
mainly of a syntactic nature. During the last few years,
however, this trend has been almost reversed. The role of the
lexicon in both linguistic theories and computational
applications is now being greatly revalued and one aspect on
which a number of research groups are now focussing their
attention is the possibility of reusing the large quantity of data
contained in already existing machine-readable lexical sources,
mainly dictionaries prepared for photocomposition, as a short
cut in the construction of extensive NLP-oriented lexicons.
This position was formulated very clearly in a number of papers
presented at a recent workshop organized in Grosseto (Italy)
and sponsored by the European Community (see Walker,
Zampolli, Calzolari, forthcoming), and can be found in the set
of recommendations which was one of the results of this
workshop (Zampolli 1987, pp. 332-335).
After the first work on machine-readable dictionaries
(MRDs) in the seventies (see Olney 1972, Sherman 1974), and
with the recent development of the concept of a lexical database
(LDB) in which interaction, flexibility and multidimensionality
can be achieved, but everything must be explicitly stated in
advance (see e.g. Amsler 1980, Byrd 1983, Calzolari 1982,
Michiels 1980), a new possibility which is now emerging is that
of a procedural exploitation of the full range of semantic
information implicitly contained in MRDs (see Wilks 1987,
Binot 1987, Alshawi forthcoming, Calzolari forthcoming).
The dictionary is now considered as a primary source not
only of lexical knowledge but also of basic general knowledge
(ranging over the entire "world"), and some of the dictionary
systems which are being developed have knowledge acquisition
and knowledge organization as their principal objectives (see
also Lenat and Feigenbaum 1987). In this paper we describe a
project which we are now conducting on the acquisition of
semantic information from computerized dictionaries.
2. Data and established methods for hierarchical semantic
classifying
The data we use in our research include the lexical
information contained in the Italian Machine Dictionary
(DMI), which is already structured as a LDB and is mainly
based on the Zingarelli Italian dictionary (1970); the DMI-LDB
has different types of linguistic information already accessible
on-line. A morphological module generates and analyzes the
inflected word-forms: approximately 1 million from 120,000
lemmas. Lemmas, word-forms, derivatives/suffixes, POS, usage
codes, and specialized terminology codes can be used as direct
access search keys through which the user can query the
database dictionary. On the semantic side, synonyms,
hyponyms, and hypernyms constitute already implemented
access paths covering all of the approximately 200,000
definitions contained in the dictionary. Examples of possible
queries are the following: give me all the nouns defined as
names of vehicles, of sounds, of games, all the verbs defined by
a particular genus term, for example 'MUOVERE' (to move),
'TAGLIARE' (to cut), etc. The procedures used to find
hypernyms in definitions and to create taxonomies are similar
to those used by other groups (see Chodorow 1985, Calzolari
1983, Amsler 1981).
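The kind of query just described (all the nouns defined as names of vehicles, all the verbs defined by a given genus term) can be pictured as a lookup in an index from genus terms to the lemmas they define. The following is only a minimal sketch of that idea: the toy triples, the function names and the index layout are assumptions made for illustration and do not describe the actual DMI implementation.

```python
from collections import defaultdict

# Hypothetical toy data: (lemma, POS, genus term) triples of the kind a
# hypernym-finding procedure might extract from definitions.
DEFINITIONS = [
    ("automobile", "N", "veicolo"),
    ("bicicletta", "N", "veicolo"),
    ("veicolo", "N", "mezzo"),
    ("camminare", "V", "muovere"),
    ("correre", "V", "muovere"),
]

def build_hyponym_index(definitions):
    """Map each (genus term, POS) pair to the lemmas defined by it."""
    index = defaultdict(list)
    for lemma, pos, genus in definitions:
        index[(genus, pos)].append(lemma)
    return index

def defined_by(index, genus, pos):
    """Return, in alphabetical order, the lemmas whose genus term is `genus`."""
    return sorted(index.get((genus, pos), []))

index = build_hyponym_index(DEFINITIONS)
print(defined_by(index, "veicolo", "N"))  # nouns defined as names of vehicles
print(defined_by(index, "muovere", "V"))  # verbs defined by the genus MUOVERE
```

Chaining such lookups (from 'veicolo' to its own genus term, and so on) is what yields a taxonomy.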
We have now begun work on restructuring another dictionary 
available in MRF, the Garzanti Italian dictionary (1984). A 
parser has been implemented which, on the basis of the 
typesetting codes for photocomposition, identifies the rough 
structure of each lexical entry. Fig. 1 displays the output of a 
parsed entry of the Garzanti dictionary. Fig. 2 represents the 
provisional model for a monolingual lexical entry as we have 
defined it so far. Fig. 3 gives the projection of the first 
interpretation of the typesetting codes into this model; other 
kinds of information will be added afterwards (for example, that 
obtained by the inductive procedures described in the paper). 
............................................................ 
[1] = arnese
[3] = [-ne ~] {s.m.}
[4] = 1
[3] = utensile; attrezzo o strumento da lavoro:
      {gli arnesi del falegname}
[4] = 2
[3] = qualsiasi oggetto che non si sappia o non si voglia
      determinare: {a che serve quell'-?} / {quell'uomo
      e' un pessimo} -, e' un tipo poco raccomandabile
[4] = 3
[3] = abito, vestimento; maniera di vestire (anche {fig.})
      / {essere bene}, {male in} -, trovarsi in buone, cattive
      condizioni fisiche o economiche.
............................................................
Fig. 1 - Output of the photocomposition codes.
(the number in the first column identifies the type
of data)
Entry # 
Homograph # 
Pronunciation 
Paradigm Label 
POS 
Syntactic Codes 
Usage Label 
Pointers to the base-lemma and/or to all derivatives 
Pointers to graphical variants 
Sense #
Field Label
Syntactic Codes
Figurative, extended, etc.
Definitions 
Pointers to Synonyms 
Pointers to Antonyms 
Pointers to Hyponyms, Hyperonyms 
Pointers to other Entries through other Relations 
Semantic (inherent) Features 
Formalized Word-sense Representation 
Examples # 
Example 
Figurative, rare, ...
Definitions of a particular contextual usage 
Idioms 
Citations 
Proverbs 
............................................................ 
Fig. 2 - Provisional structure of a monolingual entry. 
001 Entry  = arnese
005 PoS    = s.m.
003 Pron   = -ne ~
006 Sense  = 1
007 Def    = utensile
007 Def    = attrezzo o strumento da lavoro
008 Exampl = gli arnesi del falegname
006 Sense  = 2
007 Def    = qualsiasi oggetto che non si sappia o
             non si voglia determinare
008 Exampl = a che serve quell'-
012 Idiom  = quell'uomo e' un pessimo -
013 Expl   = e' un tipo poco raccomandabile
006 Sense  = 3
007 Def    = abito, vestimento
014 Field  = anche {fig.}
007 Def    = maniera di vestire
012 Idiom  = essere bene, male in -
013 Expl   = trovarsi in buone, cattive condizioni
             fisiche o economiche.
...................................................... 
Fig. 3 - Example of a parsed Entry. 
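To make the projection shown in Fig. 3 more concrete, the sketch below reads records of the form "code Field = value" and groups them into an entry with a list of senses. The record layout is taken from Fig. 3; the grouping logic, the handling of wrapped lines and all function names are assumptions of this illustration, not a description of the parser actually implemented.

```python
import re

# One record per line: three-digit code, field name, "=", value (as in Fig. 3).
RECORD = re.compile(r"^(\d{3})\s+(\w+)\s*=\s*(.*)$")

def parse_entry(lines):
    """Group 'NNN Field = value' records into an entry with a list of senses."""
    entry = {"senses": []}
    target = entry   # records go to the entry until the first Sense record
    last = None      # (container, field) of the last record, for wrapped lines
    for line in lines:
        m = RECORD.match(line.strip())
        if not m:
            # A line without a code is treated as the continuation of the
            # previous value (an assumption about the layout).
            if last is not None and line.strip():
                container, field = last
                container[field][-1] += " " + line.strip()
            continue
        _code, field, value = m.groups()
        if field == "Sense":
            target = {"Sense": [value]}
            entry["senses"].append(target)
        else:
            target.setdefault(field, []).append(value)
        last = (target, field)
    return entry

SAMPLE = """\
001 Entry = arnese
005 PoS = s.m.
006 Sense = 1
007 Def = utensile
007 Def = attrezzo o strumento da lavoro
008 Exampl = gli arnesi del falegname
006 Sense = 2
007 Def = qualsiasi oggetto che non si sappia o
non si voglia determinare
""".splitlines()

entry = parse_entry(SAMPLE)
print(entry["Entry"], len(entry["senses"]))
```

Keeping every field as a list allows repeated records, such as the two Def records of sense 1, to be stored without loss.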
The merging of part of the data of the DMI and the Garzanti
dictionary into a single LDB has already been completed, e.g.
for lemmas, POSs, usage codes, etc. We now have to tackle the
problem of reorganizing the semantic data (definitions and
examples). Here our strategy is to design a new procedural
system which is able to gradually "learn" and acquire semantic
information from dictionary definitions, going well beyond the
IS-A hierarchies constructed so far, in order to attempt to also
capture what is present in the "differentia" part of the definition.
This can be achieved with some success given the particular
nature of lexicographic definitions, with: a) a generic (and
perhaps oversimplistic) description of the "world"; b) a rather
lexically and syntactically constrained and somewhat regular
natural language text (Calzolari 1984, Wilks 1987).
After having mapped the codes for photocomposition into
linguistically relevant codes, all the preliminarily parsed data of
the Garzanti have been organized on a PC in the form of a
Textual Database (DBT), a full-text Information Retrieval (IR)
system in which all occurrences of any word-form or lemma can
be directly accessed (Picchi 1983). The DBT has been found to
be a very powerful tool in highlighting lexical units and particular
syntagms which can then be exploited in our "pattern-matching"
procedure. With the text in DBT form it is possible
to search occurrences of single word-forms in definitions and
examples, lemmas, codes of various types (POS, specialized
languages, usage labels, etc.), and also cooccurrences of any of
these items throughout the entire dictionary. In addition,
structures composed of combinations of the above elements
connected by the logical operators "and" and "or" to any degree
of complexity can also be searched. The results of such queries
are returned together with the pertinent dictionary entries.
Obviously frequencies can also be obtained. All this
information can be retrieved with fast interactive access.
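Cooccurrence queries combined with "and" and "or" can be illustrated with a small inverted index. The sketch below is not the DBT itself: the index layout, the tuple-based query syntax and the function names are assumptions, and the three definitions are taken from the Garzanti examples listed later in Fig. 6.

```python
from collections import defaultdict

# Three definitions as they appear in Fig. 6.
ENTRIES = {
    "ASTROFISICA": "scienza che studia la natura fisica degli astri.",
    "DIETETICA": "branca della medicina che studia la composizione dei cibi "
                 "necessari a un'alimentazione razionale.",
    "PAPIROLOGIA": "scienza che si occupa dello studio e "
                   "dell'interpretazione degli antichi papiri.",
}

def build_index(entries):
    """Inverted index: word-form -> set of entries whose definition contains it."""
    index = defaultdict(set)
    for headword, text in entries.items():
        for token in text.lower().replace(",", " ").replace(".", " ").split():
            index[token].add(headword)
    return index

def search(index, query):
    """Evaluate a query: a word-form, or ('and' | 'or', subquery, subquery, ...)."""
    if isinstance(query, str):
        return set(index.get(query.lower(), set()))
    op, *operands = query
    results = [search(index, q) for q in operands]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op}")

index = build_index(ENTRIES)
hits = search(index, ("and", ("or", "scienza", "branca"), "studia"))
print(sorted(hits))           # entries matching (SCIENZA or BRANCA) and STUDIA
print(len(index["studia"]))   # frequency of the word-form in this toy sample
```

Because queries nest, combinations of any depth can be expressed in the same way.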
We have therefore already implemented two types of 
organization for dictionary data: 
1) DB-type organization with the DMI (we have not used a
standard DBMS, but an ad hoc designed relational DB system);
2) a full-text IR system for the Garzanti dictionary. 
Although both types of organization have proved to be very
powerful tools for different purposes, each also presents certain
drawbacks and difficulties, due to the particular nature of
dictionary data, which neither organization has been able to
exploit fully. Dictionary data is in fact of a very
particular nature, consisting of a combination of free text in a
highly organized structure. The DB approach copes well with
the second characteristic, while the IR approach is successful in
handling free text. However, neither is capable of fully
exploiting the two features in combination. A new method
must be envisaged, capable of reorganizing free text into
elaborately structured information: this could be called the
Lexical Knowledge Base (LKB) approach, and is the aim of the
project described here.
3. Techniques for word-sense acquisition
Discovery procedure techniques prove to be useful in
extracting semantic information from definition texts. In
general, our approach consists in starting from free-text
definitions, in natural language linear form, analyzing and
converting them into informationally equivalent structured
forms. The preliminary step of the work consisted in applying
the morphological analyzer to the definitions; the result of this
process for one definition appears in Fig. 4. A program
designed for homograph disambiguation was then run on the
output produced by this morphological processor. The
disambiguator consists partly in rules generally valid for Italian,
based on the immediate right and left context, and partly in ad
hoc rules written for the particular syntax used in lexicographic
definitions. Fig. 5 shows the result of applying this
disambiguation procedure to all the homographs shown in the
preceding example. We then had to implement a set of
discovery procedures acting on dictionary definitions.
Entry ( EDITORE )
Def ( che o chi stampa e pubblica libri, periodici
      o musica, a scopo commerciale )
F ( che )
   L (che, ['PR',['NN']], ['PT',['NS']], ['DT',['NN']],
      ['DE',['NN']], ['PI',['MS']], ['C ',['  ']] )
F ( o )
   L (o, ['SN',['NS']], ['C ',['  ']], ['I ',['  ']] )
F ( chi )
   L (chi, ['PR',['NS']] )
F ( stampa )
   L (stampa, ['SF',['FS']] )
   L (stampare, ['VTP',['S3IP','S2MP']] )
F ( e )
   L (e, ['SN',['NS']], ['CC',['  ']] )
F ( pubblica )
   L (pubblico, ['A',['FS']] )
   L (pubblicare, ['VT',['S3IP','S2MP']] )
F ( libri )
   L (libro, ['SM',['MP']] )
   L (librare, ['VTR',['S2IP','S1CP','S2CP','S3CP']] )
P ( , )
F ( periodici )
   L (periodico, ['A',['MP']], ['SM',['MP']] )
F ( o )
   L (o, ['SN',['NS']], ['C ',['  ']], ['I ',['  ']] )
F ( musica )
   L (musica, ['SF',['FS']] )
   L (musicare, ['VTI',['S3IP','S2MP']] )
P ( , )
F ( a )
   L (a, ['SN',['NN']], ['E ',['  ']] )
F ( scopo )
   L (scopo, ['SM',['MS']] )
   L (scopare, ['VT',['S1IP']] )
F ( commerciale )
   L (commerciale, ['A',['NS']] )
..........................................................
Fig. 4 - Output of the morphological analyzer
..........................................................
Entry ( EDITORE )
Def ( che o chi stampa e pubblica libri, periodici
      o musica, a scopo commerciale )
F ( che )
   L (che, ['PR',['NN']] )
F ( o )
   L (o, ['C ',['  ']] )
F ( chi )
   L (chi, ['PR',['NS']] )
F ( stampa )
   L (stampare, ['VTP',['S3IP']] )
F ( e )
   L (e, ['CC',['  ']] )
F ( pubblica )
   L (pubblicare, ['VT',['S3IP']] )
F ( libri )
   L (libro, ['SM',['MP']] )
P ( , )
F ( periodici )
   L (periodico, ['SM',['MP']] )
F ( o )
   L (o, ['C ',['  ']] )
F ( musica )
   L (musica, ['SF',['FS']] )
P ( , )
F ( a )
   L (a, ['E ',['  ']] )
F ( scopo )
   L (scopo, ['SM',['MS']] )
F ( commerciale )
   L (commerciale, ['A',['NS']] )
..........................................................
Fig. 5 - Output of the disambiguation procedure
The first analysis of the definitional data was performed
manually for single definitions, and quantitatively for the most
frequently occurring words and syntagms. From this analysis
we have established a number of broadly defined and simplified
Categories of knowledge and Relations, which on the one hand
intuitively reflect basic "conceptual categories" and on the other
represent attested lexicographic definitional categories. They
also rely on past experience of similar work (both on Italian and
on English), or of AI research. In order to allow the inductive
pattern-matching rules to perform the successive phases
correctly and so that more coherent retrieval operations are
possible, a "basic vocabulary" has been established (both for the
"Categories" and for the "Relations") mainly on the basis of
quantitative and intuitive considerations, and is constituted by
words acting as Labels. As an example, the following lemmas:
'arnese, attrezzo, dispositivo, strumento, congegno', which
altogether appear in dictionary definitions 761 times, have been
grouped under the Label 'INSTRUMENT'. Other examples
of Labels belonging to the basic vocabulary which have been
established for hypernyms are the following: SET, PART,
SCIENCE, HUMAN, ANIMAL, PLACE, ACT, EFFECT,
LIQUID, PLANT, INHABITANT, SOUND, GAME,
TEXTILE, MOVE, BECOME, LOSE, etc.
This is, therefore, our approach. We begin with a system
which has simple and general purpose pattern-matching
capabilities, designing it as an incremental system. To cope
with the fact that there are variations in the way the same
conceptual category or the same relation is linguistically
(lexically and/or syntactically) rendered in natural language
definitions, each such category or relation is associated with a
list of specified lexical units and/or syntactic features which give
the variant forms. The search is then driven by these lists of
patterns to handle the grammatical and lexical variations.
The "pattern-matching" strategy has obviously been
integrated with the Italian morphological analyzer to handle
inflectional variation. The patterns may contain either Labels,
or Lemmas, or Word-forms. For the Labels, the system
searches for all the associated lemmas and all their word-forms
(unless otherwise specified); in the same way Lemmas are
automatically expanded to cover their inflected word-forms.
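The expansion from Labels to lemmas to word-forms can be sketched as follows. The lemmas grouped under INSTRUMENT are those given in the text; the small table of word-forms is a hypothetical stand-in for the morphological generator, and the function name and the three element kinds are assumptions of this illustration.

```python
# Lemmas grouped under the Label INSTRUMENT, as listed in the text.
LABELS = {
    "INSTRUMENT": ["arnese", "attrezzo", "dispositivo", "strumento", "congegno"],
}

# Hypothetical stand-in for the morphological generator: lemma -> word-forms.
WORD_FORMS = {
    "arnese": ["arnese", "arnesi"],
    "attrezzo": ["attrezzo", "attrezzi"],
    "dispositivo": ["dispositivo", "dispositivi"],
    "strumento": ["strumento", "strumenti"],
    "congegno": ["congegno", "congegni"],
}

def expand(kind, value):
    """Expand a pattern element into the set of word-forms it should match."""
    if kind == "form":
        return {value}                      # a word-form matches only itself
    if kind == "lemma":
        return set(WORD_FORMS.get(value, [value]))
    if kind == "label":
        forms = set()
        for lemma in LABELS[value]:         # a Label covers all its lemmas
            forms |= expand("lemma", lemma)
        return forms
    raise ValueError(f"unknown pattern element kind: {kind}")

print(sorted(expand("label", "INSTRUMENT")))
```

A pattern element written as a Label therefore matches any inflected form of any lemma grouped under it, while a Word-form matches only itself.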
Generally, we look for recurring patterns in the definitions
and attempt to associate them with corresponding relations or
conceptual categories. Fig. 6 lists some of the entries and
definitions obtained when querying the dictionary in DBT form
for cooccurrences of items such as 'science, discipline,
branch,...' together with 'studies, concerns, ...'. Analyzing the
results of similar queries to the dictionary we are able to better
identify a number of patterns to be used in the semantic
scanning of the definitions.
Textual Data Base                          Dizionario Garzanti
................................................................
searching for ... SCIENZA & STUDIA
3) ANATOMIA : PoS s.f. S#1 scienza che mediante la dissezione
   e altri metodi di ricerca studia gli organismi
   viventi nella loro forma esteriore e ...
6) ARALDICA : PoS s.f. scienza del blasone, che studia e
   regola la composizione degli stemmi gentilizi.
9) ASTROFISICA : PoS s.f. scienza che studia la natura
   fisica degli astri.
18) BIOLOGIA : PoS s.f. scienza che studia i fenomeni della
   vita e le leggi che li governano.
35) ETIMOLOGIA : PoS s.f. S#1 scienza che studia le origini
   delle parole di una lingua.
37) FISICA : PoS s.f. scienza teorico-sperimentale che studia
   i fenomeni naturali e le leggi relative
56) MERCEOLOGIA : PoS s.f. scienza applicata che studia le
   merci secondo la loro origine, i caratteri fisici,
   gli usi, la produzione e ...
................................................................
searching for ... BRANCA & STUDIA
3) DIETETICA : PoS s.f. branca della medicina che studia la
   composizione dei cibi necessari a un'alimentazione
   razionale.
8) FARMACOLOGIA : PoS s.f. branca della medicina che studia
   i farmaci e la loro azione terapeutica sull'organismo.
21) TOSSICOLOGIA : PoS s.f. branca della medicina che studia
   la natura e gli effetti delle sostanze velenose e
   dei loro antidoti.
................................................................
searching for ... SPECIALITA' & STUDIA
1) CARDIOLOGIA : PoS s.f. ({med.}) la specialita' che studia
   le funzioni e le malattie del cuore.
................................................................
searching for ... RAMO & STUDIA
3) ONOMASTICA : PoS s.f. ramo della linguistica che studia
   i nomi propri di persona o di luogo.
................................................................
searching for ... SCIENZA & OCCUPA
4) PAPIROLOGIA : PoS s.f. scienza che si occupa dello studio
   e dell'interpretazione degli antichi papiri.
1) AUXOLOGIA : PoS s.f. disciplina delle scienze biologiche
   che si occupa dell'accrescimento degli organismi,
   in particolare di quello umano.
2) NEUROPSICHIATRIA : PoS s.f. disciplina medica che si
   occupa delle malattie nervose e mentali.
................................................................
searching for ... DISCIPLINA & STUDIA
1) ALGOLOGIA : PoS s.f. disciplina medica che studia le
   cause e le terapie del dolore.
13) IMMUNOLOGIA : PoS s.f. disciplina biologica che studia
   i fenomeni immunitari.
................................................................
Fig. 6 - Some examples of queries to the dictionary
in DBT form.
This is an example of a pattern where the Labels SCIENCE
and STUDY appear:
[Det/Adj] SCIENCE [di NP/*Adj/e NP] "che" (mediante NP)
STUDY NP-OBJ
where the following are the lemmas associated with the Labels:
SCIENCE = (scienza, disciplina, specialita', branca, ramo, parte)
STUDY = (studia, si occupa di).
NP-OBJ is the subject matter of the science.
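A rough way to see how such a pattern applies to definition texts is to approximate it with a regular expression. The sketch below is a deliberate simplification: noun phrases are not parsed but captured as raw text, the optional first word stands in for [Det/Adj], and the widening of "si occupa di" to the articulated forms of "di" (dello, della, ...) is an assumption added here. It does not reproduce the pattern-matcher of the project.

```python
import re

SCIENCE = ["scienza", "disciplina", "specialita'", "branca", "ramo", "parte"]
SCIENCE_RE = "|".join(re.escape(w) for w in SCIENCE)
# Assumption: "si occupa di" also covers del, dello, della, dell', dei, ...
STUDY_RE = r"(?:studia|si occupa d(?:ello|ella|elle|ell'|egli|el|ei|i))"

PATTERN = re.compile(
    r"^(?:\w+\s+)?"               # optional Det/Adj before the SCIENCE word
    rf"({SCIENCE_RE})(?!\w)"      # group 1: lemma realizing the Label SCIENCE
    r"(.*?)"                      # group 2: [di NP / Adj / e NP], unanalysed
    r"\bche\b"
    r"(?:\s+mediante\s+.+?)?"     # optional (mediante NP), skipped
    rf"\s+{STUDY_RE}\s*"          # realization of the Label STUDY
    r"(.+?)\.?$"                  # group 3: NP-OBJ as raw text
)

def match_science(definition):
    """Return the SCIENCE word, its modifiers and NP-OBJ, or None if no match."""
    m = PATTERN.match(definition.strip())
    if m is None:
        return None
    science, modifiers, obj = m.groups()
    return {"science": science, "modifiers": modifiers.strip(), "object": obj.strip()}

print(match_science("branca della medicina che studia i farmaci e la loro "
                    "azione terapeutica sull'organismo."))
```

Definitions that the expression does not cover simply return None, which is the kind of failure that prompts the addition of a new variant to the pattern lists.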
The results of a first run through the whole dictionary using
an initial set of patterns can afterwards be recursively revised
when new data are acquired. Our practical global research
strategy is to develop a system which at the beginning has only
a generalized expertise. This system obviously breaks down at
many points on its first run; we can then evaluate all these
points, and consider when and where measures must be taken
to overcome specific difficulties. In this way, new capabilities
can be added incrementally to the system so that gradually it is
able to cope with increasingly difficult data. Thus we
systematically add new "knowledge" to the system, prompted
each time by a failure to cope with the given data. It seems to
us that this is a practical research strategy for eliciting and
modelling vague and fuzzy knowledge.
Even though the methodological approach has been
deliberately simplified at the beginning (in order to introduce
problems gradually, a few at a time), the dimensions of the data
have not been limited in any way.
4. The knowledge organization. 
Although the body of knowledge with which we are dealing 
is at least partly based on intuition, on vague and not even 
coherent data (as lexicographic definitions often are), and on 
inductive empirical strategies, we must attempt to model the 
knowledge as the system acquires it. The formalism for the 
representation of word-senses is as follows. 
Each element is defined as a Function characterized by a
Type and Arguments. The Type qualifies the function. The
main types include: Hypernym, Relation, Qualifier, etc.
Examples of the Type Relation are: USED, PRODUCED,
IN-THE-FORM, STUDY, LACK, etc. The type Hypernym
can be instantiated by: Hypernym proper, PART, SET, etc.
Arguments may be either Terms, or Terms plus Function, or
Functions. A Term can be a Label, a Word, or a combination
of these with the logical operators 'and/or'. A Word can be
either a Word-form, or a Lemma plus Grammatical
Information (e.g. |Npl| means plural Noun).
The following definitions:
Batterio, s.m., microrganismo vegetale unicellulare privo di
clorofilla.
Batteriologia, s.f., parte della microbiologia che studia i batteri.
are now represented as:
Batterio --def--> f(T.HYP,|N|microrganismo,
                    f(T.QUAL,|A|vegetale,|A|unicellulare),
                    f(T.REL-LACK,|N|clorofilla))
Batteriologia --def--> f(T.HYP-PART,
                         f(T.REL-SPEC,|N|microbiologia),
                         f(T.REL-STUD,|Npl|batterio)).
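As an illustration of this formalism, a Function with a Type and Arguments can be modelled as a small recursive data structure. The class names, field names and the rendering function below are assumptions introduced for the sketch; only the Types and the two example structures come from the text.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Word:
    pos: str     # grammatical information, e.g. "N", "A", "Npl"
    lemma: str

@dataclass(frozen=True)
class Function:
    type: str                                   # e.g. "HYP", "QUAL", "REL-LACK"
    args: Tuple[Union["Function", Word], ...]   # Terms and/or nested Functions

def render(node):
    """Print a structure in the f(T.TYPE, ...) notation used in the text."""
    if isinstance(node, Word):
        return f"|{node.pos}|{node.lemma}"
    return f"f(T.{node.type}," + ",".join(render(a) for a in node.args) + ")"

batterio = Function("HYP", (
    Word("N", "microrganismo"),
    Function("QUAL", (Word("A", "vegetale"), Word("A", "unicellulare"))),
    Function("REL-LACK", (Word("N", "clorofilla"),)),
))

batteriologia = Function("HYP-PART", (
    Function("REL-SPEC", (Word("N", "microbiologia"),)),
    Function("REL-STUD", (Word("Npl", "batterio"),)),
))

print(render(batterio))
print(render(batteriologia))
```

Making the nodes immutable (frozen) is a design choice of the sketch: it allows whole structures to be compared and stored in sets when definitions are later matched against each other.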
As the metalanguage and the rules are declared separately 
from the pattern-matching parser, the system is incremental, 
flexible, portable (it can be used with other languages or other 
dictionaries), and testable. In fact, the system has been 
designed so that it is easy to test alternative strategies or sets 
of rules or constraints. 
This kind of organization will allow us to draw inferences,
using part of the formal structure associated with an entry and
inserting it in other structures in which that entry appears as
an Argument. For example, 'microbiologia' present in the
second definition above is defined in its turn as 'parte della
biologia che studia i microrganismi...', translated as
(T.HYP-PART,f(T.REL-SPEC,|N|biologia)), and 'biologia'
which is 'scienza che studia i fenomeni della vita...' is finally
defined as T.HYP-SCIENCE. This last Label SCIENCE is
obviously also inherited by 'Batteriologia' and by
'Microbiologia'.
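The inheritance just described can be sketched as a walk along PART links until an entry with a Label-type hypernym is reached. The table below encodes only the three definitions discussed in the text; its layout, the function name and the depth limit (a guard against circular definitions) are assumptions of this illustration.

```python
# For each entry: (hypernym type, entry it is PART of), or a Label-type
# hypernym with no further link. Taken from the three definitions above.
HYPERNYMS = {
    "batteriologia": ("HYP-PART", "microbiologia"),
    "microbiologia": ("HYP-PART", "biologia"),
    "biologia": ("HYP-SCIENCE", None),
}

def inherited_label(lemma, lexicon, max_depth=10):
    """Follow PART links until a Label-type hypernym is found; None otherwise."""
    for _ in range(max_depth):          # guard against circular definitions
        if lemma not in lexicon:
            return None
        hyp_type, whole = lexicon[lemma]
        if whole is None:
            return hyp_type.split("-", 1)[1]   # "HYP-SCIENCE" -> "SCIENCE"
        lemma = whole
    return None

for word in ("batteriologia", "microbiologia", "biologia"):
    print(word, inherited_label(word, HYPERNYMS))
```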
5. Some experimental results
Already after just one run, by looking at cooccurrences of
hypernyms and particular relations, we can identify those
environments in which certain relations are most likely to
appear, or in which certain ambiguous lexical and/or syntactic
cues (e.g. the prepositions PER 'for', DI 'of', A 'to', etc.) can
be disambiguated as referring to only one relation, or in which
certain relations are never found, and so on.
A set of constraining rules can be associated to certain
conceptual units (Hypernyms or Relations, expanded
automatically to all the pertinent lexical realizations) in order
to disambiguate their immediate context. Some units therefore
activate particular subroutines for an ad-hoc interpretation of
what follows. These rules explicitly look for items to which a
determined meaning is associated. In the following pattern,
we have a rule which, after a USED relation, links the word
"in" to a 'place' relation, the words "per, a" (for/to) to the
purpose, "da" (by) to the agent, and "come" (as) to the way of
usage. Other kinds of relations are not activated by a particular
rule, but have a meaning in themselves, e.g. CONSTITUTED
BY, SIMILAR TO, etc.
HYPER .... USED
        come NP        (= way)
        per/a Vinf/NP  (= purpose)
        in NP          (= place)
        da NP          (= agent)
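The rule above can be read as a small table that is consulted only after a USED relation has been recognized. The sketch below illustrates this reading: the table mirrors the pattern, while the tokenization, the function name and the example fragment are assumptions, and articulated prepositions (dal, nella, ...) are deliberately left out.

```python
# Cue words interpreted only in the context of a USED relation.
USED_RULES = {
    "come": "way",
    "per": "purpose",
    "a": "purpose",
    "in": "place",
    "da": "agent",
}

def interpret_after_used(tokens):
    """Pair each cue word with the tokens that follow it up to the next cue."""
    relations = []
    current = None
    for tok in tokens:
        if tok in USED_RULES:
            current = (USED_RULES[tok], [])
            relations.append(current)
        elif current is not None:
            current[1].append(tok)      # material before the first cue is ignored
    return [(rel, " ".join(words)) for rel, words in relations]

# Hypothetical fragment following a word that realizes USED (e.g. "usato").
print(interpret_after_used("in cucina per tagliare".split()))
```

Outside the USED context the same prepositions remain ambiguous, which is why the table is attached to the relation rather than applied globally.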
The analysis in some cases is therefore purposely delayed
until more relevant information has been acquired, and will
eventually be based on the results of definitions already
successfully handled. This analysis of the first results will lead
to an improvement of the system, adding other patterns or
other surface realizations of already existing patterns to the first
simple list of patterns, and also imposing constraints on given
hypernyms or on given relations. Therefore, after the first
stage, the system consists of patterns augmented with
conditioning rules which will then drive subsequent runnings of
the procedure (for those cases which are lexically or
grammatically conditioned). In this way, the system can be
gradually refined. The analysis procedure is envisaged as a
series of cycles which find relevant cooccurrences of categories
and relations that can then be set as conditioning rules to
further guide successive searches. An interactive phase is also
foreseen so that, when necessary, definitions can be modified for
a normalization in accordance with acceptable analysis structures.
From successive passes through the data, applying different and
increasingly refined sets of patterns and rules, the procedure
builds up, as completely as possible with this methodology, a
formal description of the structure of the lexical definitions.
At the end, from a comparison of the different formalized
structures generated, we will be able to associate structures
which differ in only one element (a conceptual category or
relation). In this way, we can construct something like
"minimal pairs" of sense-definitions, which only differ in one
conceptual or relational feature. It can be reasonably supposed
that this feature is related to, or realizes, one of the differences
between these words. It will also be possible to build
hierarchies not only for hypernyms, but also, and more
interestingly, for complex conceptual structures considered as a
whole.
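The search for such "minimal pairs" can be sketched by comparing flattened structures slot by slot. The flat slot/value encoding, the lemmatized values and the function names are assumptions of this illustration; the three entries are simplified from the definitions in Fig. 6, with the hypernym reduced to the Label SCIENCE.

```python
from itertools import combinations

# Hypothetical flattened structures: one value per slot.
STRUCTURES = {
    "DIETETICA": {"HYP": "SCIENCE", "REL-SPEC": "medicina", "REL-STUD": "cibo"},
    "FARMACOLOGIA": {"HYP": "SCIENCE", "REL-SPEC": "medicina", "REL-STUD": "farmaco"},
    "ONOMASTICA": {"HYP": "SCIENCE", "REL-SPEC": "linguistica", "REL-STUD": "nome"},
}

def differing_slots(a, b):
    """Slots whose values differ (a slot missing on one side counts as differing)."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

def minimal_pairs(structures):
    """Pairs of entries whose structures differ in exactly one slot."""
    pairs = []
    for w1, w2 in combinations(sorted(structures), 2):
        diff = differing_slots(structures[w1], structures[w2])
        if len(diff) == 1:
            pairs.append((w1, w2, diff[0]))
    return pairs

print(minimal_pairs(STRUCTURES))
```

In this toy sample the only minimal pair is DIETETICA and FARMACOLOGIA, which differ in the object of study alone.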

References

H. Alshawi, Processing dictionary definitions with phrasal
pattern hierarchies, in Special Issue of CL on the Lexicon,
forthcoming.

R. Amsler, A taxonomy for English nouns and verbs, in
Proceedings of the 19th Annual Meeting of the ACL, Stanford
(Ca), 1981, 133-138.

J.L. Binot, K. Jensen, A semantic expert using an on-line
standard dictionary, in Proceedings of the 10th IJCAI, Milano, 1987,
709-714.

B.K. Boguraev, Machine-readable dictionaries in computational
linguistics research, in D. Walker, A. Zampolli, N. Calzolari
(eds.), forthcoming.

R.J. Byrd, Word formation in natural language processing
systems, in Proceedings of the 8th IJCAI, Karlsruhe, 1983,
704-706.

R.J. Byrd, N. Calzolari, M.S. Chodorow, J.L. Klavans, M. Neff,
O.A. Rizk, Tools and methods for Computational Lexicology,
in Journal of Computational Linguistics, forthcoming.

N. Calzolari, Towards the organization of lexical definitions on
a database structure, in COLING 82, Prague, Charles
University, 1982, 61-64.

N. Calzolari, Lexical definitions in a computerized dictionary,
in Computers and Artificial Intelligence, II(1983)3, 225-233.

N. Calzolari, Detecting patterns in a lexical database, in
Proceedings of the 10th International Conference on
Computational Linguistics, Stanford (Ca), 1984, 170-173.

N. Calzolari, Structure and access in an automated dictionary
and related issues, in D. Walker, A. Zampolli, N. Calzolari (eds.),
forthcoming.

M.S. Chodorow, R.J. Byrd, G.E. Heidorn, Extracting semantic
hierarchies from a large on-line dictionary, in Proceedings of the
23rd Annual Meeting of the ACL, Chicago (Ill), 1985, 299-304.

Garzanti, Il nuovo dizionario Italiano Garzanti, Garzanti:
Milano, 1984.

D.B. Lenat, E.A. Feigenbaum, On the thresholds of knowledge,
in Proceedings of the 10th IJCAI, Milano, 1987, 1173-1182.

J. Markowitz, T. Ahlswede, M. Evens, Semantically significant
patterns in dictionary definitions, in Proceedings of the 24th
Annual Meeting of the ACL, New York, 1986, 112-119.

A. Michiels, Exploiting a large dictionary database, Ph.D. thesis,
Liege, 1982.

J. Olney, D. Ramsey, From machine-readable dictionaries to a
lexicon tester: progress, plans, and an offer, in Computer
Studies in the Humanities and Verbal Behavior, 3(1972)4,
213-220.

E. Picchi, Textual Data Base, in Proceedings of the International
Conference on Data Bases in the Humanities and Social
Sciences, Rutgers University Library: New Brunswick, 1983.

E. Picchi, N. Calzolari, Textual perspectives through an
automatized lexicon, in Proceedings of the XII International
ALLC Conference, Slatkine: Geneve, 1986.

D. Sherman, A new computer format for Webster's Seventh
Collegiate Dictionary, in Computers and the Humanities,
VIII(1974), 21-26.

D. Walker, A. Zampolli, N. Calzolari (eds.), Towards a
polytheoretical lexical database, Pisa, ILC, 1987.

D. Walker, A. Zampolli, N. Calzolari (eds.), Automating the
Lexicon: Research and Practice in a Multilingual Environment,
Proceedings of a Workshop held in Grosseto, Cambridge
University Press, forthcoming.

Y. Wilks, D. Fass, C.M. Guo, J.E. McDonald, T. Plate, B.M.
Slator, A tractable machine dictionary as a resource for
computational semantics, MCCS-87-105, New Mexico State
University, 1987.

A. Zampolli, Perspectives for an Italian Multifunctional Lexical
Database, in A. Zampolli (ed.), Studies in honour of Roberto
Busa S.J., Giardini: Pisa, 1987.

N. Zingarelli, Vocabolario della Lingua Italiana, Zanichelli:
Bologna, 1970.
