TOWARDS THE ORGANIZATION OF LEXICAL DEFINITIONS ON A DATABASE
STRUCTURE
Nicoletta Calzolari
Istituto di Glottologia - Università di Pisa, Italy
Printed dictionaries are great repositories of information,
and it is important that they can be exploited as fully as
possible, with regard to all the different types of data
they contain. This was one of the aims when organizing the
Machine Dictionary of the Italian language on a database
structure.
The design and organization of the lexical database for
the first two relations implemented, i.e. the set of Lemmas
(106,091) and the set of Word-forms (1,016,320), has been
described in other papers (see for example Calzolari and
Ceccotti, 1980).
These two very large archives are maintained continuously
on-line and are interactively invoked through a query
language which permits the user to access the data in
transparent mode, and to have his particular "view" of the
data. The database concept and methodology give rise, in fact,
to a radical change in perspective when compared with a
sequential organization of data. We have a dynamic rather than
a static object which is flexible and easy to query, update
and extend.
This lexical database is now being extended by the
insertion of lexical definitions (185,899) and semantic data.
The guiding principle behind this project is the conviction
that the study of the defining vocabulary of an actual
dictionary can provide a precious tool in the semantic analysis
of a language (see Noel, 1981). 
The logical organization of this definitional information
is not a trivial task, and must be performed bearing in
mind the goals to be achieved. It must in fact be possible to
have direct access to each and every piece of information
contained in the definitions. The significance of "piece of
information" in this context is in direct relationship to the
eventual use to be made of it. By "piece of information"
inside the definitions, we mean not only the single word-forms,
as they are written in the definitions, but also the lemma
to which every word-form is connected; moreover, at a further
stage of analysis, the specific sense of every polysemic lemma
in the particular context (context: definition) must be
considered.
The logical organization of the definitional part of the
database must, therefore, be structured to provide, for each
word in every definition, direct access to: a) the word-form
itself, with the associated information (morphological, usage
level, etc.); b) the lemma to which the word-form pertains,
with the associated information (part of speech, variants,
usage level, other word-forms i.e. paradigm); c) the specific
sense of the lemma. The implementation of a definitional
archive thus requires an enormous task of disambiguation at all
the three levels: word-forms, lemmas and senses, in order to
produce material which can be used effectively to extract
semantic information from the dictionary.
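
The three levels of access listed above can be pictured with a
small sketch. It is only an illustration under assumptions: the
class names, field names and the toy Italian entries are
hypothetical and are not taken from the Machine Dictionary itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lemma:
    headword: str
    part_of_speech: str
    senses: List[str] = field(default_factory=list)


@dataclass
class WordForm:
    spelling: str
    morphology: str
    lemma_id: int  # link from the word-form to its lemma


@dataclass
class DefinitionToken:
    """One word inside a definition (hypothetical layout)."""
    word_form_id: int                   # level a) the word-form
    lemma_id: int                       # level b) the lemma
    sense_index: Optional[int] = None   # level c) filled in at a later stage


# Toy archives, invented for illustration only.
lemmas = {0: Lemma("cane", "noun", ["domestic animal", "part of a firearm"])}
word_forms = {0: WordForm("cani", "masculine plural", 0)}


def resolve(token: DefinitionToken):
    """Follow a token to its word-form, its lemma and (if known) its sense."""
    form = word_forms[token.word_form_id]
    lemma = lemmas[token.lemma_id]
    sense = None
    if token.sense_index is not None:
        sense = lemma.senses[token.sense_index]
    return form.spelling, lemma.headword, sense


token = DefinitionToken(word_form_id=0, lemma_id=0, sense_index=0)
resolved = resolve(token)
```

Leaving `sense_index` empty until sense disambiguation has been
carried out mirrors the staged analysis described in the text.
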
The first step in this direction is the lemmatization of
the definitions themselves. For this task, the other two
archives of the database (the word-form and lemma archives) are
being used, together with ad hoc procedures, to produce an
automatic lemmatization of a large percentage of the words
contained in the definitions. For the other words, those for
which automatic lemmatization has not yet been achieved, a
disambiguation strategy has been developed in which the human
operator works interactively with the computer, and the
computer can memorize choices on homographic forms as they 
are made. 
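
A minimal sketch of such an interactive strategy follows. It
rests on assumptions the text does not confirm: the function and
variable names are hypothetical, the candidate table is toy data,
and choices are memorized per pair (previous word, homograph),
which is only one possible way of keying them.

```python
def lemmatize(words, candidates, memo, ask_operator):
    """Return one lemma per word.

    Unambiguous forms are resolved automatically; for the others the
    operator is asked once and the choice is memorized in `memo`.
    """
    result = []
    for i, form in enumerate(words):
        options = candidates.get(form, [])
        if len(options) == 1:
            # Automatic lemmatization: only one lemma is possible.
            result.append(options[0])
            continue
        # Assumption: a choice is reused when the same form follows
        # the same preceding word.
        key = (words[i - 1] if i > 0 else None, form)
        if key not in memo:
            memo[key] = ask_operator(words, i, options)
        result.append(memo[key])
    return result


# Toy candidate table, invented for illustration.
candidates = {
    "una": ["uno"],
    "porta": ["porta (noun)", "portare (verb)"],
}

asked = []


def operator(words, i, options):
    """Stand-in for the human operator: records the question, picks option 0."""
    asked.append(words[i])
    return options[0]


memo = {}
first = lemmatize(["una", "porta"], candidates, memo, operator)
second = lemmatize(["una", "porta"], candidates, memo, operator)
```

On the second call the operator is not consulted again, because
the earlier choice is found in `memo`.
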
After lemmatization, each word is associated in the
computer memory with the addresses of its word-form and of its
lemma. Therefore, the definitions are organized in the memory
not as actual strings of words, but as lists of addresses of
word-forms and lemmas. In this way, a number of important
results are achieved: a) a great reduction in storage size;
b) data types (addresses i.e. binary numbers) which are
easily handled by the computer; c) data which are strictly
associated with the first two archives, aiming at the eventual
construction of an integrated system; d) much more rapid data
processing and direct access to each kind of data, in each
position of the definition itself; e) the possibility of
immediately retranslating addresses into character
strings, and lists of addresses into phrases, i.e. definitions;
f) the possibility of correcting, updating and inserting
within the definitions.
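
The idea of storing a definition as a list of addresses rather
than as a string can be sketched as follows. This is an
assumption-laden illustration: `AddressTable`, `encode` and
`decode` are hypothetical names, list indices stand in for memory
addresses, and the sample definition is invented.

```python
class AddressTable:
    """Maps each distinct item to a small integer 'address' and back."""

    def __init__(self):
        self.items = []   # address -> item
        self.index = {}   # item -> address

    def address_of(self, item):
        if item not in self.index:
            self.index[item] = len(self.items)
            self.items.append(item)
        return self.index[item]

    def item_at(self, address):
        return self.items[address]


word_forms = AddressTable()
lemmas = AddressTable()


def encode(pairs):
    """Turn (word-form, lemma) pairs into (address, address) pairs."""
    return [(word_forms.address_of(f), lemmas.address_of(l)) for f, l in pairs]


def decode(encoded):
    """Retranslate a list of addresses into the definition string."""
    return " ".join(word_forms.item_at(wf) for wf, _ in encoded)


# Invented toy definition, already lemmatized.
definition = encode([("animale", "animale"), ("domestico", "domestico")])
original_text = decode(definition)

# Updating a definition means replacing one pair of addresses.
definition[1] = encode([("selvatico", "selvatico")])[0]
updated_text = decode(definition)
```

Each position of the definition is directly addressable, and a
word-form shared by many definitions is stored only once in its
table.
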
Only once this preliminary stage has been completed is
it possible to extract many kinds of semantic information
from the dictionary. The memorized definitions have an
internal logical structure which permits the construction of
semantic chains (to evidence taxonomic relationships) and also of
other types of semantic links (to evidence other types of
semantic relationships, such as "part of", "set of", "in the
form of", "apt to", etc.) between words in the lexicon. These
chains and links, which can be not only displayed, but also
handled by computer procedures in many different ways, surely
provide a good starting point for the study of the semantic
structure of the lexicon. In fact, it is hoped that the
computerized dictionary will offer a model of the Italian lexical
system in the various aspects which can be associated with a
lexicon (phonology, morphology, syntax i.e. verbal frames,
lexical semantics). This approach is included in the general
theoretical view which considers the lexicon as a central
reference point both for language analysis and for many
linguistic applications.
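
As a rough illustration of a taxonomic chain, the sketch below
follows, for each lemma, the genus term of its definition. It
assumes that the genus term has already been identified for every
definition, which the text does not describe; the function name
and the toy entries are hypothetical.

```python
def taxonomic_chain(lemma, genus_of):
    """Follow genus terms upward from `lemma` until none is left.

    `genus_of` maps a lemma to the genus term of its definition.
    The walk stops on a repeated lemma, since dictionary definitions
    can be circular.
    """
    chain = [lemma]
    seen = {lemma}
    while chain[-1] in genus_of:
        parent = genus_of[chain[-1]]
        if parent in seen:
            break
        chain.append(parent)
        seen.add(parent)
    return chain


# Invented toy data, not taken from the dictionary.
genus_of = {"bassotto": "cane", "cane": "animale"}
chain = taxonomic_chain("bassotto", genus_of)

# A circular pair of definitions terminates instead of looping.
circular = taxonomic_chain("a", {"a": "b", "b": "a"})
```

Other link types such as "part of" or "set of" could be held in
separate mappings of the same shape.
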

References

Calzolari, N., M. L. Ceccotti, "A project for an exhaustive
lexical database system", in Proceedings of the Second
International Conference on Data Bases in the Humanities
and Social Sciences, 1980, Madrid, in press.

Noel, J., "The Longman-Liege Dictionary project", Congres
International Informatique et Sciences Humaines, Liege,
18-21 nov. 1981.

Procter, P., "Problems in dictionary making", Congres
International Informatique et Sciences Humaines, Liege, 18-21
nov. 1981.
