Lexicon Design Using a Paradigmatic Approach 
Cristian Dumitrescu 
Research Institute for Informatics 
Alex. Averescu Blvd. 8-10, 
71316 Bucharest 1, Romania, Fax: 653095 
Abstract 
The paper describes models for representation and methods to handle lexicographic structures supplied by the
MORPHO-2 system. It was built to manage monolingual lexicons and to incorporate lexical processing.
1 Introduction 
Most advanced systems for natural language processing use powerful lexicons and morpho-lexical processing
environments. The system described below enables monolingual lexicon handling and incorporates morpho-lexical
processes (i.e. word-form analysis and synthesis) at lexicon level. At present, it works as a component of the
IURES environment for building natural language applications (Tufiş and Cristea, 1985).
Since in our approach the morphological processes obey a paradigmatic morphology (Tufiş, 1989), word-form
analysis and synthesis take into account only grammatical endings (which include both desinences and suffixes),
and the lexicons handled by the MORPHO-2 system are root- or lemma-oriented. By lexicon we don't mean only a
collection of roots and associated features. At lexicon level we also encounter the control structures needed
for morphological processing, morpho-lexical acquisition menus, root modification rules, word-form synthesis
rules, etc.
The services provided by the system may be classified according to the following groups: morphological model
design, lexical stock building and morpho-lexical processing (Dumitrescu, 1991).
2 Morphological Model Design 
In order to build the morphological model, an integrated environment which allows editing, viewing and
compiling the morphological model description is available to the linguist.
Defining the morphological model takes place in several steps, during which the linguist has to specify the
following:
a) the categories, subcategories, features and their values, in a hierarchical manner 
b) the paradigmatic descriptions 
c) the default feature specifications associated to each paradigmatic description 
d) the lemma - entry correspondence, for each paradigmatic description
e) the inflectional paradigms and root detection rules.
The hierarchical description of features is achieved by correlating several feature specifications. A feature
specification is given in the form of a (feature: value) pair. We call a paradigmatic description a
hierarchical description built of several simple (feature: value) pairs.
Figure 1 presents, in the form of an incomplete tree, the hierarchical description of features from the
morphological model for the Romanian language. By tree traversal, all paradigmatic descriptions of the model
may be generated.
Each non-terminal node contains a single feature specification. The leaf nodes may contain one or more feature
specifications. According to the successor selection criterion, which is applied when visiting a non-terminal
node, we can distinguish CHOOSE nodes (when only one successor is selected) or FOREACH nodes (when the
individual selection of each successor is required). In the figure, a FOREACH node is outlined by a curve
drawn over the emerging edges. By traversing the tree across the longest path which starts from the root node,
through CHOOSE nodes only, the selector of a paradigmatic description is obtained
(e.g. CAT = NOUN & SCAT = COMMON & GEN = ...; CAT = VERB).
[Figure 1. Hierarchical description of features. Legible node labels: COMMON, PROPER; IND, CJ, ..., GER; PL;
1, 2; N/A, G/D]
The description attached to a leaf node is represented by means of a morpho-lexical acquisition scenario. A
scenario entry (further on referred to as a slot) corresponds to a point of the paradigmatic description space.
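The selector construction can be illustrated with a minimal Python sketch. It is not the MORPHO-2
implementation: the Node class, the artificial root, and the feature values MASC, FEM, NUM and MOOD are
hypothetical, and the sketch assumes that a selector ends at the first FOREACH node (or leaf) met on a path
that otherwise passes through CHOOSE nodes only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Spec = Tuple[str, str]  # a (feature, value) pair


@dataclass
class Node:
    spec: Optional[Spec]                 # None only for the artificial root
    kind: str = "LEAF"                   # "CHOOSE", "FOREACH" or "LEAF"
    children: List["Node"] = field(default_factory=list)


def selectors(node: Node, prefix: Tuple[Spec, ...] = ()) -> Iterator[Tuple[Spec, ...]]:
    """Yield one selector per maximal root path that passes through CHOOSE nodes only."""
    path = prefix + ((node.spec,) if node.spec else ())
    if node.kind == "CHOOSE":
        for child in node.children:      # one successor is selected per path
            yield from selectors(child, path)
    else:
        yield path                       # the first FOREACH node or leaf ends the selector


def show(selector: Tuple[Spec, ...]) -> str:
    return " & ".join(f"{feature} = {value}" for feature, value in selector)


# Hypothetical, heavily simplified fragment of a feature tree.
tree = Node(None, "CHOOSE", [
    Node(("CAT", "NOUN"), "CHOOSE", [
        Node(("SCAT", "COMMON"), "CHOOSE", [
            Node(("GEN", "MASC"), "FOREACH", [Node(("NUM", "SG")), Node(("NUM", "PL"))]),
            Node(("GEN", "FEM"), "FOREACH", [Node(("NUM", "SG")), Node(("NUM", "PL"))]),
        ]),
        Node(("SCAT", "PROPER")),
    ]),
    Node(("CAT", "VERB"), "FOREACH", [Node(("MOOD", "IND")), Node(("MOOD", "GER"))]),
])

for s in selectors(tree):
    print(show(s))
```

Under these assumptions the successors of a FOREACH node do not contribute to the selector; they would
enumerate the slots of the description instead.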
Selectors of those descriptions allowing default feature specifications are attached with (feature: value)
pairs which are default inheritances of the corresponding slots. In our example the following association is
feasible: (CAT = VB) -> (PER 1 2 3).
The area of the morphological model where the lemma - entry (from paradigmatic description) correspondences
are described consists in a specification of the points from the paradigmatic description spaces which
characterize the lemma field from the lexicon entry. This way, the lexical level required by the lexical
transfer is ensured.
The last step in the morphological model description is to inform the system about how to build inflectional
paradigms and root detection rules. For each paradigmatic description the linguist may specify several
paradigmatic ending families from which the system then builds the inflectional paradigms. For the Romanian
language, 136 inflectional paradigms have been identified.
Based on the inflectional paradigms the system will determine the rules for root detection and word-form
generation. Such a rule has the following form:
<inflexion> := (<inflectional-paradigm> <slot-number>)
with the following meaning:
a) if a word ends in <inflexion> then
   - the root is what remains from the word after dropping the <inflexion>
   - the root belongs to the <inflectional-paradigm>
   - the contextual information corresponding to the current word is given by <slot-number>
b) if a root belongs to the <inflectional-paradigm> and it is used in the context given by <slot-number> then
   - the word is obtained by concatenating the given root with the <inflexion>.
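The two readings of such a rule can be sketched in Python as follows. This is only an illustration under
stated assumptions: the rule table RULES, the paradigm name "P1" and the slot numbers are hypothetical toy
data, and the function names detect_roots and generate are not taken from MORPHO-2.

```python
# Hypothetical rule table: <inflexion> := (<inflectional-paradigm> <slot-number>)
RULES = {
    "ă": [("P1", 1)],   # toy slot 1, e.g. singular
    "e": [("P1", 2)],   # toy slot 2, e.g. plural
}


def detect_roots(word):
    """Reading a): propose (root, paradigm, slot) for every inflexion the word ends in."""
    results = []
    for inflexion, targets in RULES.items():
        if word.endswith(inflexion):
            # Drop the inflexion; written this way so an empty inflexion would also work.
            root = word[:len(word) - len(inflexion)]
            for paradigm, slot in targets:
                results.append((root, paradigm, slot))
    return results


def generate(root, paradigm, slot):
    """Reading b): concatenate the root with the inflexion of (paradigm, slot)."""
    for inflexion, targets in RULES.items():
        if (paradigm, slot) in targets:
            return root + inflexion
    return None  # no rule covers this paradigm and slot


print(detect_roots("case"))
print(generate("cas", "P1", 1))
```

The candidates returned by detect_roots are only hypotheses; in a full system each proposed root and paradigm
would still have to be checked against the lexicon.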
The lexicographer's interface is strictly dependent on the specifications from the linguist's interface, since
a large part of the former is built automatically from the specifications of the latter.
3 Lexical Stock Building 
MORPHO-2 lets the lexicographer define new entries in the lexicon by means of a user-friendly window-oriented
interface. A lexicon entry has the following formal structure:
<entry> ::= (<lemma>
             (<paradigmatic-description-selector>
              <inflexional-paradigm>
              (<root> <morphologic-description>) ...
              (<syntactic-description> <semantic-description>)))
The fields <lemma>, <paradigmatic-description-selector> and <inflexional-paradigm> have the obvious meaning.
The field <root> may contain one or more roots. Inserting roots in the lexicon takes place in such a way that
these should inherit the morphological descriptions belonging to the slots where they occur.
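As an illustration of how roots might inherit slot descriptions, here is a small Python sketch. The Entry
class, the SLOT_FEATURES table and the add_root helper are hypothetical names introduced for this example, and
the slot table is deliberately reduced to a single feature; the actual MORPHO-2 data layout is not described
at this level of detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical slot table of one paradigmatic description: slot number -> features.
SLOT_FEATURES: Dict[int, Dict[str, str]] = {
    1: {"NUM": "SG"},
    2: {"NUM": "PL"},
}


@dataclass
class Entry:
    lemma: str
    selector: str                      # paradigmatic description selector
    inflexional_paradigm: str
    roots: List[Tuple[str, Dict[str, str]]] = field(default_factory=list)
    descriptions: List[Tuple[str, str]] = field(default_factory=list)  # (syntactic, semantic)


def add_root(entry: Entry, root: str, slots: List[int]) -> None:
    """Store the root once per slot, copying that slot's morphological description."""
    for slot in slots:
        entry.roots.append((root, dict(SLOT_FEATURES[slot])))


# A lemma with two roots: "fat" in the singular slot, "fet" in the plural slot.
entry = Entry(lemma="fată", selector="CAT = NOUN & SCAT = COMMON", inflexional_paradigm="P1")
add_root(entry, "fat", [1])
add_root(entry, "fet", [2])
print(entry.roots)
```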
By <syntactic-description> we refer to restrictions on co-occurrence with other words (or phrases). In order
to specify such restrictions for the Romanian language we have performed subcategorization of verbs based on
their valency, the object categories which they govern (e.g. a direct object may be an accusative noun without
preposition, a reflexive pronoun or a non-finite form of a verb) and semantic features. The latter allow a
noncontextual subcategorization (for example of nouns) and a contextual one, by selectional restrictions (in
the case of verbs).
Typically, verbs have a valency between 1 and 3 (though impersonal verbs may have valency 0). The intransitive
verbs are classified according to semantic criteria (verbs of motion, state) or by their syntactic usage (like
predicatives, auxiliaries, unipersonal verbs with dative). We should notice that the same verb may be
transitive or intransitive, according to its meaning; for example a ajunge (to get to) with the meaning
a prinde (to catch) is transitive and with the meaning a fi suficient (to be enough) is intransitive.
Trivalent verbs include verbs taking:
- two direct objects which have different meanings and are not coordinated (the first one is doubled by an
  accusative personal pronoun).
    Pe Ion l-am ascultat lecţia.
    I examined Ion about the lesson.
- a direct object and an object clause
    L-am rugat să-mi împrumute pixul.
    I asked him to lend me the pen.
- a direct object and an indirect object
    L-am întrebat despre carte.
    I asked him about the book.
For each syntactic description, the lexicographer may provide one or more semantic descriptions. The
<semantic-description> field contains the name of a case-frame structure placed in a generic-specific
hierarchy. The actual semantic descriptions are stored in a data area separate from the rest of the lexicon,
and they are managed independently of MORPHO-2.
A lexicon editor offers the lexicographer commands for deleting or modifying a lexicon entry and for listing
entries according to different requests with respect to entry fields (Dumitrescu, 1991).
4 Morpho-Lexical Processing 
The target natural language processing system is the beneficiary of the morpho-lexical processes executed by
MORPHO-2. Word-form analysis and synthesis are mediated by a process interface.
In the case of lexical analysis, if the interface is given a sequence of words, it will return a sequence of
morpho-lexical atoms. The structure of these atoms is presented below:
(<root> (<lemma>
         (<paradigmatic-description-selector> <morphologic-description> ...
          (<syntactic-description> <semantic-description>))))
A morphological description contains both contextual and context-free information. The former is obtained from
the ending and the latter from the lexicon entry corresponding to the root. The information for the other
fields from the atom structure is taken from the lexicon entry corresponding to the root.
With respect to the result of morphological congruence and root retrieval within the lexicon, we may classify
the morpho-lexical atoms as unambiguous, ambiguous and undetermined.
The unambiguous morpho-lexical atoms associate the analyzed word with a single lemma. In the case of a root
which corresponds to one lemma and several possible morphological descriptions, for the same paradigmatic
description selector, the system will attempt to compact them.
The ambiguous morpho-lexical atoms come from words to which several lemmae may be attached. The association of
a root with several lemmae is possible either due to ambiguity of category (e.g. noun vs. verb) or to apparent
homography, generated by the absence of prosodic markers in the Romanian language (módele, modéle, acéle,
ácele, modúl, módul, etc.). The possible interpretations are ordered in such a way that those which come from
shorter roots (that means longer endings) have priority.
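The ordering criterion can be shown with a short Python sketch. The representation of an interpretation as a
(root, lemma) pair and the function name order_interpretations are assumptions made for this example, and the
two segmentations of "modele" are given only as an illustration.

```python
def order_interpretations(interpretations):
    """Sort (root, lemma) pairs so that shorter roots (longer endings) come first.

    sorted() is stable, so interpretations with equally long roots keep their
    original relative order.
    """
    return sorted(interpretations, key=lambda pair: len(pair[0]))


# Illustrative segmentations of the written form "modele":
#   model + e   (lemma "model")
#   mod   + ele (lemma "modă")
candidates = [("model", "model"), ("mod", "modă")]
ordered = order_interpretations(candidates)
print(ordered)
```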
The undetermined morpho-lexical atoms correspond to words which have no entry in the lexicon. The atoms
generated in this situation have the following structure:
(UNKNOWN <unknown-word>
         (<possible-root> <morphologic-description>)*)
The unknown word is associated with all legal segmentations and for each of them the morphological information
deduced from the identified endings is provided.
Lexical synthesis is the reverse of lexical analysis. The process interface ensures conversion of a
morpho-lexical atom sequence into a word sequence. The morpho-lexical synthesis requires the description of
morpho-lexical atoms according to the pattern:
(<entry-identifier> <morphologic-description> <syntactic-description>)
where <entry-identifier> may be a lemma, a root or a semantic description.
We have to point out that, previous to morpho-lexical analysis and synthesis, the target processor may
configure the structure of morpho-lexical atoms according to the desired application, by means of a
communication protocol.
5 Implementation 
The MORPHO project, started in 1986, has achieved as a first result a prototype version now available on a
PDP-11 compatible computer. The second version of the system, the one presented in this paper, is implemented
in C on an IBM-PC compatible.
The network representation of data and the techniques used for implementation, like lexicon indexing using
prefixed virtual B-trees (based on which, for 20000 pseudo-random generated words of variable length, retrieval
requires 2 external accesses only), have led to an average response time of lexical processes quite
independent of the lexicon's size (for more details on performance analysis see (Tufiş and Dumitrescu, 1990)).

References 

Dumitrescu, C. MORPHO user manual. ICI, Bucharest, 1991.

Tufiş, D., Cristea, D. IURES: A human engineering approach to natural language question answering.
Artificial Intelligence: Methodology, Systems, Applications, North Holland, Elsevier, 1985.

Tufiş, D. It would be much easier if WENT were GOED. Proceedings of the 4th Conference of ECACL,
Manchester, 1989.

Tufiş, D., Dumitrescu, C. MORPHO - A dictionary management system. Proceedings of the 13th International
Seminar on DBMS, Mamaia, 1990.
