A High-level Morphological Description Language 
Exploiting Inflectional Paradigms 
Peter Anick and Suzanne Artemieff 
Digital Equipment Corporation 
111 Locke Drive, LMO2d/DI2 
Marlboro, MA 01752 
anick@aiag.enet.dec.cnm 
Abstract 
A high-level language lor the description of inflectional 
morphology is presented, in which the organization of 
word lormation rules into an ii~herilance hierarchy of 
paradigms allows lora natural encoding of the kinds of 
nfles typically pre~uted in grammar txroks. We show 
how tim language, composed of orthographic rides, word 
formation rules, and paradigm inheritance, can be com- 
piled into a run-time data structure for efficient morpho- 
logical analysis and generation with a dynamic secondary 
storage lexicon. 
1 Introduction 
Pedagogical grmnmar Nalks typically organize their de- 
scriptinns of the inflectiomd morphology of a langtmge in 
terms of paradigms, groups of rnlas which characterize 
the inflectional behavior of some subset of the language's 
vocabulary. A French grannnar may divide verbs into the 
first, secoud, and third conjugations; German grammars 
speak of "weak" and "strong" verbs; Spanish grammars 
classify verbs by their infiuitival endings, etc. The family 
nf word forms that each v(x:abuhu'y item may have can 
thus he describexl by a combination of a ba~ stem (such 
as the "citation lbrm" used to index words in a dictionary) 
and the paradigm the word belongs to. Irregular words, 
which exhibit belu~viors not completely captured by gen- 
eral paradigms, often tend to be partially describable by 
reference to regular parudigmatic patterns. 
The word formation rules that comprise a paradigm are 
usually expressed in terms of a sequence of stem change 
and affixation operations. For example, one French text- 
Imok \[NEBEL741, in describing first conjugation verbs, 
shows how to fi)rm present tense forms nsing the inliniti- 
val stem with its "er" suffix rmnoved. Future tense is 
tormed by appending ',fffixes to the fifll infinitival stem, 
while the stem of the imperfect tense is Ionnd by taking 
the first person plural of the present tense and dropping 
the "ons". Ill addition to such word formation roles, there 
are spelling change rules wbich describe variations ill 
spelling, often ctmditioned by file phonologic~d or ortho- 
graphic context th which a word lbrnlation rule is applied. 
While the above characterization of morphological behav- 
ior is a huniliar oue, inost description languages that have 
been developed for cumputatioual morphology (e.g., I KC)- 
SKENNIEMI841, \[G(}RZ881 ) have tended to locus more 
on the orthographic and of fixation rules, and pay less at- 
tention to explicitly captaring the regularities within and 
between parudignts. Recently, some researchers have be- 
gun exploring the advantages to be derived from a nora. 
tion in which paradigms play a more central role (e.g., 
\[CALDER891. IRUSSELL911). This paper presents such 
a notation, called PDL (for Paradigm Description l.au- 
guage), which we are using as the basis of the morpho- 
logical an~dyzer for A1-STARS, a multi-lingual "lexicon- 
assisted" informatiml retrieval system (\[ANICK901). It 
has been a goal of our high-level language design to pre- 
~rve, as umch as possible, the kinds of descriptive de- 
vices traditiorually used in grammar books. 
Our approach to the representation of pmacfigms borrows 
from the Artificial Intelligence cmnmunity's notion of 
"frames", data structures made up (ff slots with attached 
procedares, orgmlized hierarchically to snpport default 
slot inheritance and overtkles (e.g., \[BOBROW771). In a 
paradigm's "frume", the slots correspond to surlace anti 
stem li)nns, whose values are either explicitly stoxvd (in 
the lexicon) or else computed by word formation rules. 
The hierarchical organization of paradigms helps to cap- 
tore the sharexl linguistic behaviors among classes of 
words in all explicit and concise mlnnler. 
Our ;qlplicatiou domain introdnces several constraints on 
the design of its morphological component: 
- The morphological recognizer must work with a dy- 
namic secondary storage lexicnn access~xl via an in- 
dex on stem tornls. 'Ibis constratht rnles out ap- 
proaches relying on a left to right scan of file wool 
using special in-mmnory letter Iree eucodings of the 
dictionary (e.g., \[GORZ881). It requires an approach 
Acri?s DE COLING-92, NANTES, 23-28 AO~n" 1992 6 7 Proc. of COLING-92, NANTES, AUG. 23-28, 1992 
in which potential stems are derived by affix rc- 
moral/addition and/or stem chat,ges and then probed 
for in the lexicon. 
• The morphoh)gical information must additionally 
support surface form genemtiun and "guessing". The 
guesser, to be employed in computer-assisted lexicon 
acquisition, mast he able to construct potenti:tl cita- 
tion forms (e.g., infinitive lorms lot verbs), not just 
stripped stems. 
• The high-level language (PDL) mast be compilable 
into a lonn suitable for efficient run-time perfonn- 
ancc. 'Illis implies not only efficient in-memory data 
structures but also a system which minimizes disk 
(lexicon) accesses. 
Our aim is to develop morphological rcpmsenlations tbr a 
number of (primarily European) hmguages. We have 
built t~firly complete representations h)r English, French, 
and Gennan, and have begun invcsfigating Spanish. 
While it is premature to predict how well our approach 
will apply across the range of European langnages, we 
have fimnd it contains a nnmher of desirable aspects for 
applications such as AI-STARS. 
in the next section, we provide an overview of the PDL 
hmguage, describing how word fonnation rules are organ- 
ized into a hierarchy of paradigms and how the lexicon 
and morphological rules interact. Then we provide an il- 
lustration of the use of paradigm inheritance to construct a 
concise encoding of French verb forms. Next we present 
algorithms for the compilation of PDL rote efficient run- 
time data structures, and lot the recognition and genera- 
tion of word fi)rms. We conclude with an evaluation of 
the strengths anti weaknesses of rite approach, and areas 
for future research. 
2 Paradigm Description Language 
Oar paradigm description language (PDL) is composcd of 
three major components - form rules, an inheritance hier- 
archy of paradigms, and orthographic rules. 
2.1 Form Rules 
We divide word lorms into 
• surface forms, which are those that show tip in a 
text, 
• lexical forms, which are those that are stored di- 
rectly in the lexicon, and 
• intermediate forms, those forms created by affixa- 
tion or stem-change operations applied to other 
lorms. These terms may not ever show up in a text 
but are useful in describing intermediate steps in the 
construction of surlhce lorms from lexical fi)rms. 
In the form ennstruction rules, we distinguish between 
two major categories of strings. Stems are any forms 
which include the primary \[exical base of tile word, 
whereas affixes comprise tile prefixes and suffixes which 
can be concatenated with a stein in the process of word 
formation. Once an affix is appended to or removed from 
a stem, the result is also a stem, since tire result also in- 
cludes the primary lexical base. Form construction rides 
,are restrictexl to the five cases below: 
• <form> : <stem> + <affix> 
• <form> : <stem> - <affix> 
• <form> : + <affix> <stem> 
• <form>: - <affix> <stem> 
• <forul> : <stein> 
The <lotto> is a name for the string form created by the 
rule. <stem> is the name of a stein form. <affix> may be 
a prefix or suffix string (or string variable), its type (i.c., 
prefix or suffix) impliexl by its position before or after the 
<stcm> in the rulc. The operator (+ or -) always precexles 
the affix. If +, then the affix is appended to the stem as a 
prefix or suffix. If -, then the affix is removexl from the 
stem. The rest,lting <lorm> name may in turn be used as 
a stem in the consU'uction el some other k}rm. In this 
way, the construction of a surface form may be described 
via a succession of affixatinn or stem-change operations, 
each operation described in a single rule. 
The special syndml LEX may be used in the right-hand- 
side of a form rule to imlicate that the tonn is stored as a 
lexical stem in the lexicon. 
Grammatical \[~ttures may be associated with form names, 
as follows: 
<form> \[<feature> = <vahte>, 
<feature> = <value>, ...\] 
2.2 Paradigms 
A paradigm in PDL is composed of a set of term con- 
struction rules which collectively characterize the filmily 
of surface forms for those words which belong tn that 
paradigm. To capture the similarities among paradigms 
and to avoid redundancy in the description of a language, 
we allow one paradigm to be based on another paradigm. 
If paradigm B is based on paradigm A, then all the fimns 
and fi)rm construction rules that have been defined R)r 
paradigm A also apply, by default, to paradigm B. We 
can then differentiate paradigm B ti'om A in three ways: 
I. We can add new lorms and their conslrnction rules 
fi~r tbrms that do not exist in A. 
ACN-~S DE COLING-92, NAiVFES, 23-28 AO6"r 1992 6 8 PREC. O1.' COLING-92, NANTES, AUG. 23-28, 1992 
2. We cue rewrite (override) tile construction rnles 
tor tornls Ihal do exist in A. 
3, if a li)rra in A is no longer applicable in \[I, we can 
delete it lionl t3. 
Note that the l~ttnre set(s) associated with lornl names 
cannot change froin paradignl to l)aradignl; fornl nanles 
are nniversal, denoting tim same lcatures regardless of 
where they appear. 
Ill order to facilitate the capture of generalizations across 
paradigms, we allow tile definition of abstract pamdignls. 
These ;ire paradigms to which no words of a langnago ac- 
tnally belong, hut which contain a sot of tbrnls and con- 
smictions which other paradigms have in connnon. Thus 
a COllCrCic paradignl nlay be based on shine ()tiler concrete 
paradigm or on an abstract l)aradigm. Likewise, air ab- 
stract paradigm nlay itself be based on yet another ab- 
stract (or concrete) paradigm. 
The ability to base one paradignl on another, combined 
with the ability to represent intermediate stenl forms ;IS 
slols in a paradigm, is a very lXlwerful feature of our mor- 
phological description langnage. Not only does it allow 
for paradign/descriptions that correspond closely with the 
kinds of descriptions lonnd in graminar hooks, but, since 
the roguhirilies alnong paradignls can Ix: ahstracled ont 
and shared hy nniliil/te llaradiglns, it alklws for very con- 
cise descrilltions el ioiloctional hehavinr (inchlding 
subregularities often overlooked in graulnlar hooks), ;.is il- 
Inslrated in section 3. 
2.3 Orthographic Rules 
l,'orm COllSlfnction rules describe which stems can coln- 
bine with which aflixes to create new |orms. The con- 
catenation or removal of all affix may in some cases result 
ill fl spoiling change other than tile mere concatenation or 
removal of tile affix string. In English, inany words end- 
ing in a vowel followed by a consonant will donble the fi- 
nal consonant whml an affix starting with a vowel is ap- 
pended, ill French, the addition of certain affixes requires 
that ;in "e" in the stein of some verbs be rewritten as "~,". 
Since these spelling change rules ;ire often hased on gen- 
eral phonological/orthographic llroperties nf alfixes and 
steins, rather lhnn llle specific forln rules Ihe, lnsolvos, and 
hmlce may apply acrnss paradigms, we supllort the m¢le- 
poudent st×~cificatinn of spelling rules caplnring lheso 
changes. Each rnle is written to allply to the orthographic 
context of a slen/and affix at tile point el the concatena- 
tion or deletion opontiion. Thus, there ;ire two kinds of 
spelling rules: 
1, Suffix rules, which describe spelling changes ap- 
plying to the end of tile stem and the hoginnmg of 
the snffix, and 
2. Prefix rnles, which describe spelliug changes al I- 
IIlying to tile end el lhc prelix and tim beginning of 
the stein, 
A sllelling rule can make reference to literal strings and 
variables. A vnriahle refers to a nanled Set of characters 
and/or slrings, snch as Vowel f,a,e,i,oai) or Dental 
(d,t,dn,m,chn,fn,gn). The grammar writer nray define 
snch sets and variables ranging over those gets. 
The general feral of a suffix spelling rule is ;is fklllows: 
(<parameter>*) \[<slcm-paneHl>l <opcrator>{<aflix paneul> I 
> \[<mergtul panern>\] {<lots>} 
The opelator may he either ~ or , indicnting concatena- 
tion and deletion respectively. The <incrged-pattern> re 
liars to tile term constructed by perfornlmg tile operation 
on a Stem and alfix. The two pattelns tin tile left of tile 
arrow refer lo tile slem anti affix parlicipating ill tile con 
struction. Each pattern is a list of variables and/or literal 
strings. Whenever tile stlnle variable nanle appears more 
I\]lan once ill the rule, it is assnlned to take on tile salile 
value throughout. 
<paranletcr> is a lloole:in condition on the applicallility of 
tile spelling ride, It it necessary for ttloso cases wilere tile 
application of the rnle depends on iuik)rlnntion al)ont the 
lexical ilcln whk'h is not inclnded in Ih¢ orlhograllhy. 
(Like {BEAR88 I, we choose to represeot these conditions 
;is featnres rather tllan ;is diacritics I KOSKENNIEMIB4 I,) 
All exanlllle in linglish where a parameter is necessary is 
lhe case of gonlinating final consonants. GelninaLinn tle- 
pends on llhonological ciiaracteristics which ;ire not pro- 
dictahle fronl tile spelling alone. Only words whose lexi- 
cnl entries contain the specified parameter valne will 
nndergo spelling changes sensitive to that parameter. 
Specifying orthographic rules indel~ndently el the spe- 
cific affixe, s to which they apply allows for a more coucise 
declarative rcpresenlu\[ioll, as regnklritics across pal'a- 
digms and Ibrms can I~,, abstracted out. However, there 
are cases in which the application of ;in orthographic rnle 
is constrained to specific paradigms or to specific forms 
wilhin a paradigin. The oplional <h/cs> qualifier can Ix: 
nsed to liniit the paradignis and/or specific lornis it) which 
the orthographic rifle applies. 
Prefixntion rules are exliressed ill a similar nlalnler, c, xcept 
that tile <operator> precedes the first pattern in tile left 
haud side. Stein changes fin whk;h a stein undergoes a 
spclliug change in the absence of ally affixalion ot)elation ) 
are llandled hy the association of an orthographic rule 
wilh a fornl rule el tile lorni <:folul> : <stem>. The, ortho- 
graphic rule in snch a case wonhl contain no affix pattern. 
t lore we illnslrato a hypothelical spelling rule: 
I"a" Cons Consl i/Vowell > "t2" (?tills Vowel 
Ac+rEs DI! COLING 92, NANqES. 23-28 AO~r 1992 6 9 F"XoC. OF C()1,IN(3-92, NANTES. AUG. 23-28, 1992 
This is a suffix rule, since the operator precedes the sec- 
ond left-hand-side pattern. Accordingly, the <stem- 
pattern> refers to the characters at the end of the stem 
while the <affix-pattern> refers to the letters at the begin- 
ning of the affix. This rule states that, if we are append- 
ing an affix which begins with a vowel to a stem which 
ends in the character "a" followed by two identical conso- 
nants, then we construct the resulting form (<merged- 
pattern>) as follows: 
1. Remove the last three characters from the stem, 
leaving <sub-stem>. 
2. Remove the first character from the suffix, leaving 
<sub-allix>. 
3. Construct the string <spell-change> by concatenat- 
ing the strings and iastantiated character variables 
described by the right-hand-side pattern. 
4. Construct the resulting form as the concatenation 
of the strings <sub-stem>, <spell-change>, and 
<sub-affix>. 
2.4 The Lexicon 
We have seen above how one paradigm can be based on 
another, thereby allowing lorm conslruction roles to be 
"inherited" by paradigms. This inherit~mce is controlled 
through the form names themselves. If we have a para- 
digm B based on paradigm A, then any form rules in A 
for which there is no rule in B with the same form name 
are by detroit assumed to be part of paradigm B. 
Although onr lexicon is maintained as a secondary storage 
database with entries represented and indexed differently 
from the (memory resident) paradigms, it is useful to 
think of a lexical entry as "inheriting" rules from its para- 
digm ~ts well. The inflectional behavior of any individnal 
word will depend on both the information inherited from 
its paradigm and the information stored in the lexicon. 
Lexicon entries contain the equivalent of a single kind of 
form construction rule: 
<fi)rm> : <stem>/{ supersede I augment} 
The interaction of lexical information with the word's 
p~tradigm is as fi)llows: 
• If <form> correspends to a lexical stem nile in the 
paradigm (i.e., one whose right-hand-side is the 
special symbol LEX), then this form provides the 
stem fi)r that rule. 
• If <form> correspomLs to a surface form in the 
paradigm or an iutermediate form qualified with the 
qualifier/allow lexical override , then the lcxical 
fornl either supersedes or augments the consU'nc- 
tion rule in the paradigm, depending on the value of 
the stem's /\[supersede I augment} qualifier. 
The qualifier/allow_lexical override is necessary to in- 
form the run-time inflectional analyzer when to attempt a 
lexical lookup of an intermediate form stem. By default, 
the analyzer looks up any form found directly in the text 
(surface form) and any forms whose right hand side is 
LEX. The use of the /allow lexical override flag can 
save disk accesses by limiting lexical lookups of interme- 
diate forms to just those cases in which lexical overrides 
may actually occur. 
Utilizing the/allow lexical_override qualifier and the de- 
fault lookup of suri~,ce forms, one could write lexical en- 
tries in which all the rules in a paradigm were overridden 
by lexical information. In general, this is not a good idea, 
since it fails to take advantage of the generalizations that 
paradigms provide, but there are exceptional cases, such 
as the verb "be", fl~r which there must necessarily be a 
large number of lexical stems. Allowing lexical overrides 
in this manner eliminates the need to create tm excessive 
number of highly idiosyncratic paradigms specifically to 
accomodate irregular verbs in languages like French and 
German (see section 3). 
3 Using Paradigm Inheritance to Capture 
Linguistic Generalizations 
In PDL, word formation is characterized as a sequence of 
discrete transformational steps, lu many cases, paradigms 
(as well as iudividual lexical items) will differ with re- 
spect to one or more of these intermediate steps, yet share 
the bulk of the rules that apply to the results of the inter- 
mediate operations. Default inheritance, including the in- 
heritance of the partially derived forms, makes it possible 
to express such facts very succinctly. Figure I depicts the 
hierarchy of paradigms we have developed for the French 
verbs. The root of the hierarchy (VERBROOT) repre- 
sents the "greatest common denominator" of all the para- 
digms in the hierarchy. (All of the inteianediate form 
rules in the root paradigm are shown in Figure 1, but 
many of the surface form rules are omitted because of 
space limitations. However, all of the form rules, both in- 
termediate and surface, in the other paradigms are listed.) 
The first sub-paradigm, VERB ER, represents wlmt are 
commonly referred to ,as first conjugation verbs, 
VERB_IR represents the second conjugation, and 
VERB_RE_IR, VERB OIR, and VERBRE together rep- 
resent the third conjugation, which includes virtually all of 
the "irregular" verbs. 
\[BESCHERELLE90\] describes over 70 conjugation types 
that fall within one of the three basic groups, the third 
group being subdivided iuto three sections, one for the ir- 
regular verbs ending in -ir, one tier the -oir verbs and one 
for the -re verbs. These sections map directly onto para- 
A¢_'TES DI,: COLING-92, NAI'CrHS, 23-28 Aotrr 1992 7 0 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
digms VERB. RE IR, VERB OIR, and VERBRE, re- 
spectively, with the exception of several types (which ac- 
tually fit VERBROOT directly.) Through the use of 
form rule inheritance, intermediate form odes, lexical 
override and orthographic rules, we arc able to condense 
the rules for the 78 types into these six paradigms, which 
capture in a straightforward way most of the linguistic 
regularities within and among the paradigms. 
The useful role played by intermediate form rides in in- 
heritance can be seen by comparing the VERB ER and 
VERB IR paradigms. Both share (inherit) the imp inter- 
mediate form and the set of six surface forms that do- 
scribe the imperfect tense (e.g., imp Is). However, they 
differ in the siirface lbrm prt~s_lp, which is overridden in 
VERB IR, and in the interlnediate form bllse, which is 
overridden in VERB_ER. The interesting point here is 
that even though the imperfect indicative tetras employ 
the stein imp, a form that is generated from a form that is 
not shared (prOs lp) and wliich is in turn generated from 
an unshared form (base), both the imp stem ~md the set of 
imperfect indicative forms may still be shared. 
Another example of how ovcrridable intermediate fonn 
niles can be used to condense paradiguls is provided by 
the VERB_RE IR paradignt (which handles all of the ir- 
regular verbs ending in -it that behave nlore like the -re 
verbs, e.g., dormir and v~tir) and its sub-paradigms. This 
is accomplished by first defining a new intermediate form, 
prl~s_s, which may be oveIliden by a lexical entry (or 
stem change rule). This ,,dlows for au irregular stem in the 
singular fonns of the present indicative (e.g., dormir -> 
dot. mouvoir -> meu) whilc lint overriding the base 
form, which is used elsewhere. Secoudly, allowing lexi- 
cal override of the stems used to generate the fliture and 
t)res, ent conditional tense forms (fur) and the past simple 
and impedcct subjunctivc terms (pas), respectively, al- 
lows for irregular stems such as valoir --> vaudr (fur) anti 
mouvoir --> mu (pas). 
We have found this combination of intermediate form 
niles and lexical override uscful for defining paradignis 
for Gemlan verbs as well. Bccausc some strong verbs un.. 
dergo a stem changc in the 2nd and 3rd person singular 
forms of the prescnt tease, an additional intermediate 
feint uiay bc defiued to accoulotklte ix)ssible stem 
VF, RI~ R(K)T 
intermediate forms { 
base: inf - "Jr" 
imp: prds lp "ons" 
prds sub: prfis 3p - "ent" 
/Allow lexicaloverride 
t fur: mf /allow lcxical_overdde 
17xas: base + 'T' /allow lexical_.over ride 
pas~: base + "\[" /aallow lexical override 
/ , 
.... .7(i.f=,'<:<>,,,,,i: ' '" lil 
p~,td,a~e -,~_: = "co,,,,,:,t") I ,i t 
cueillir (inf ="cucillir" I / 
flit= "cueiller"} I / 
assaillir (inf = "assaillir") \] I// 
VERB_El( i i 
I. 
base: inf "er" II 
surface_forms { 11 prds_; 
pass6 ls:lxasd "i" II prds.: 
,:~'sst-3s: t'as II prds- 
i~ass6 3p: base -<. +Ic:rcllt" prds; 
part pass6 masc s: base + "6" II prds i 
jeter (inf = "jeler") 
nlener (inf = "mencr"), , 
surface totals { 
inf: I.L:X 
pfds Is: base + "c" 
imp Is: imp + "ais" 
hit 1 s: flit t. "at" 
coil ls: fur + "ais" 
passt_ls: pas + "s" 
tmss6 lp: pas p + ",lies" 
sub prts 1 s: prds sub + "e' ~1~ 
palt~ass6 masc s: pas 
""1 
l / / I domlir (i,ff, "domlir" pros .s ="dor") I 
VERB RE 
VER B JtE_IR 
intermediate t\]mns { 
prtss: base /allow lexicaLoverfide 
prts p: base/allow lexical ovelride } 
surface Iorm~ { 
prds Is: prts-s t- "s" 
prts2s: prts- s 1%" 
prts 3s: pl~s s b "l" 
prds \[p: lll~s~l + "ons" 
prds_2p: prds~p e "ez" 
p~s 3p: pros p 4 "ent" } 
intermediate l~mns { 
base: ill|" "tC" 
fur: lid" "e" /allow lexical ovenide 
surface totals { 
prts 3s: prds s 
part i.~tss6 mast s: base ~, "u" ) 
_IR 
isurface totals { 
pr6s ls: base 4 "is" 
prts 2s: base + "is" 
pr6s 3s: base ~ "it" 
pros Ip: base ~ "issotzs" 
prts 2p: base + "issez" 
prds 3p: base b"isscnt" ) t 
\[ \]prds s :: '%at"} linir(int= "linir") \[ 
partir (inf - "parlir") \] 
\ 
VERI~ OIR " - 
intermediate l'omls { 
mouvoif (inf = "lllouvoir" 
prts s ="meu" 
pft~s 3p ~ "lnl~llVellt" 
pas ~: "lltll" 
pas p = "rot?' } 
Figure 1. Paradigm inheritance hierarchy for \[,'reuch verbs, l'aradigms are surrounded by douhle boxes. Exanlple lexical 
items for each paradigm are in single boxes. 
AcrEs DE COIJNG-92, NANn~s, 23-28 Ao(rr 1992 7 l PRo('_ oF COI,ING-92, NANrI!s, AUG. 23~28, 1992 
changes in these two \[orms, just as the intermediate form 
pr~s_s was employed in the French paradigm 
VERB_RE IR. This alh)ws all of the st,'ong verbs to he 
combined into a single l)aradigm. 
4 Compilation and Run-time Algorithms 
A PDL description is con}piled into a non-ileterministic 
transition network, suilable tor the recognition and gen- 
eration of word forms, as tollows. First, the form rules 
arc chained into a network based on the form i}antcs ap- 
pearing in the rules' left aud right hand sides. The full set 
el paradigms u) which each form lule applies is calculated 
and stored ;.it each corresponding node in the network. 
Then the orthographic rules are conftated with lhe word 
formatirnl rules by unifying tile orthographic patterns with 
tile affixes th the form rules, Finally, a character dis- 
crimination net is constructed front all suffix surface lorm 
rules to optimize tile rul}-linlc inatehing of the outermost 
suffix patterns in the form rule transition net. 
During morphological analysis, tile conflated patterns arc 
matched against the input string and the string undergoes 
whatever Iranslormation tile correspontling word lk}rma- 
tion rule diclates. At each step through the network, the 
set of paradigms for which that step is valid is intersected 
with the set that has been valid tip to that point in the deri- 
vation. If this intersection becomes NULL, then the path 
is abanthmed as iuvalid. Traversal through the net pro- 
ceetls ahmg ;.ill possible paths for as h)ng its patterns con- 
tinue to match. Lexicou Iookups of candidate stem strings 
occur only when a I,EX node or node marked ;is Icxically 
overritkthle is reached. If a lexical stein matching the 
fern} mune, paradigm set, and tcaturc constrnints acquired 
from the uet is found, then its len}lna is returned. 
For generation, the traversal is reversed, llowever, m or- 
tier to calcuhtte the sequence uf rules to traverse tu gener- 
ate a surface lorm, we must work backwards from the nile 
that prty.luces the desired surtitce form (given the para- 
digm of tile lemma) to the rule that precedes that rule, and 
s(I on, untd we reach a lorm whose stem is salted with the 
lemma in the lexicon. At this point, we know both the 
proper starting lexical stem li)rm and tile sequence nf 
rules to apply to that stem. 
5 Discussion 
A number of researchers have proposed tile use of tither\[- 
lance in representing aspects of natural language (e.g., 
\[IIUDSON841, \[EVANS891 IDAELEMANS9I)I, 
\[PUSTEJOVSKY91\]). The wnrk described here is most 
similar in spirit to the wurk of \[CALDER89} and \[R/JS- 
SELL91\], who also al)ply principles of del;casible inheri- 
lance to 111e domain of conllltltational morphology, Cal- 
dot's word Rmnation rules make use of string equations, 
an elegant and powerful tlechtrative device which, while 
more expressive than our (deliberately) conslrainetl wm'd 
lbnnatioa and orthographic rules, may bc less amenable to 
efficient compilation anti appears geared towards an th- 
memory lexicon. By di~llowing recursion in our form 
rules, limiting each form rule to at most one affixatkm op- 
eration, and encoding directionality within our nrthtl- 
graphic patterns, wc are able to cOral)lie rules into transi- 
tion networks in a swaightforward manner, reducing the 
need for extensive run-time unification. In oar experience 
to date, these language limitations have m)t interfered wilt} 
the concise capture of morphological behavior. Indeed, 
our separation of orthographic rules and fonn talcs allows 
us to capture orthographic gmtcralizatimts that Calder 
(1989) canm)t. Furthermore, whereas Calder's system 
"disallows the possibility of inheritance of partial derived 
string forms," we have found that the thheritanee of inter- 
mediate stmns contributes considerably to the descriptive 
power of our h}rmalism. 
Russell ct al (\[RUSSELL911) have tlevelnpctl language 
extensions to the PATR II style aniiicatiou grammar Ibr- 
realism which allow lot multiple defanlt inheritance in the 
description of lexical entries. Multil)le inheritance is a 
useful tool fur partitioning syntactic, semantic, and mor- 
phological classes el behavior. However, while we have 
encountered occasional cases iu which a word appears to 
derive variants Item multil)le paradigms, we have so f,'~r 
npted to preserve the simplicity ol a single itthcritance hi- 
erarchy in PDL, utilizing extra lexical stems to accomo- 
date such variants when they arise. 
Byrd and Tzoukermann (\[BYRD881) nolo that Iheir 
French word grammar contains 165 verb stem rules and 
another 110 affix rules; and they question the rehltive 
value nf storing rules versus inflected Iorms. This is a 
concern of ours as well, as we wish to minimize the num- 
ber of run-time "\[~dse alan'as", lXltential stems generated 
during morpl}ological analysis which do not actually exist 
in the lexicon. Our mlxtel of the French verb inflections 
uses 81 form rules and 17 orthogml}hic rules. We have 
tried to tlesign our paradigms to minimize the numtxzr of 
inflected stems that must be stored m the lexicon, while at 
the same time avoiding roles that woukl conlribnte to a 
prolit)ration (ff false alarms during analysis. We 1)clieve 
that the use of lexically overridable intermediate ff)rms is 
a key to strikiug this balance. 
For the purtx}se n\[ acquiring moqthnlogical information 
about unknown words m a coqms, however, it is useful tn 
have a single canonical furm (citation lorm) t~)r each para- 
dignl, from which all inflected fornls in the paradignt can 
be derived. Thus we have opted to extend our language 
with the notkm el "acquisition-only" paradigms. These 
paradigu/s are essentially tile saute as those used for rec- 
ognition; however, they include extra form rules (typically 
siren change rules) to reduce all lexical steins wilhth a 
AclIis m! COLIN(L92, NAr'rrus, 23 28 hOLq" 1992 7 2 PROC. el: COLING 92, NANTES, AUG. 23-28, 1992 
paradigm to a single citation stem. The intleritance provi- 
sions of PDL make it very easy to add sucb paradigms. 
I lowever, any lemum created dnring Ihe acqnisition pro- 
cedure nsing an acquisition-only paradigm must be 
nlappe{| to iks eqnivalent lelnma based ou Ihe correspond- 
ing recognition-thne paradigm. This iuvolves generating 
tile extra lexical stems required by Ihe rec{}gnition-lime 
paradigm, so that these stems, in addition to tile citation 
stem, call be stored directly ill the lexicon. 
Several traditionally problematic aspects of German mor 
pholtlgy have proved problematic for our fllrnlalism as 
well aod we lulve adoptexl extensions to tile language to 
acconmdate thenl. Modeling tile stem changes revolving 
German "l.lmlantmtg" (FI'ROST90\]) has required tbc a{l- 
dition of a variable mappiug function to tile spccificatinn 
of orthographic rales. German separablc prefixes are han- 
dled via tile use of an affix variable, which retains tile 
value of the separable prefix for later unificalion with tile 
separable-pretix fcature of potential lexical stems. Ger- 
inatl conlpounding renlains impossible to capture witllin 
our current I{)rlrl rules, as they are, constrained to a single 
<stenr> component. While we expect t{} store nlost COlO- 
ponnds directly in Ihc lexicon, we arc looking rote henris- 
tics Ibr analyzing componnds that minimize the number 
of probes needed into our secondary slorage lexicon. 
6 Conclusions 
Our experience so far with PDL bas suplxnted our hy- 
pofllesis that organizing moq}lllllogical behavior in terms 
of hierarchically related inflectiomd pamdignls belps to 
explicitly characterize tile similarities and differences 
among classes of words an{I makes it easier tl} capture in a 
COIICISB and transparent nlallller tile kinds of word fornla- 
tion rules describ{xt iu many gralninar books. The lan- 
guage Call be compiled into a form anlenable to efficieut 
analysis and generation with a dynamic secondary Stl}rage 
lexiclm, Future work includes further "tuumg" of existing 
ndesets, extending our coverage of European languages, 
and interfacing the inflectional system with roles {}f {led- 
vational moq)hology and compouuding. 
Aeknowledl~ements 
The anthers gratefully acknllwledge tile invaluable ass{s- 
lance of Alain Couillanlt, Ilcngfimeh Irandoust, and 
Michael Carl ill developing grammars \]i}r I:rench and 
German. We also wish IN thank Rex Elyun, David 
Hansseu, and Mayank Prakash R/r their thoughtful fixxl- 
I)ack on our design of PDI,. 
A{:rEs l}L,: COI,ING 92, NANIES, 23 28 AO\[:I' 11)92 7 3 PRO{:. (): COt IN{- -92 NANrES. AUC. 23-28. 1992 

References 

\[ANICK9{}I An{ok, P. G., J. I) Brcnnan, P,. A, Flynn, D. 
R. tlansseu, lk Alvey and J. M. Robbms. A Direct Ma- 
nipulatkm \[utcrface f{}L t;ooIcan hff{}onation Removal via 
Natulal language Query, ill Proceedings of ACM/SIGIR 
'9{}, B uisscl s, 1990. 

IBEAR88\] P, ear, John. Morphology with Two-l.evel 
I;',ulcs and Negative Rule Features, ill Pr{zccedings of CO- 
LING '88, Budapest, 1988. 

IBIkS{'tlEREI,I,Ik9{)I Bescherelle. I~a Conjngaison 120(}() 
Verbes. ltatie/: Pads, 1990. 

\[BOI~ROW771 Bowbrow, l). G. and T, Wmograd. Au 
Overview of KRI,, a Knowledge l,(epresentation l.au- 
guage. C{}gnitivc Scie, ncc, 1:3-46, 19"t7. 

I\]:IYRI)881 llyrd, Roy J. and F, velync :l'zotlkerlnann. 
Adalllmg au English Morphological Analyzer for French, 
in Proceedings oI ACE, 1988. 

{CAIA)ER89\[ Calder, Jonathan, Paradignlalic MorphoF 
ogy, in Prl}ceedings of the 4th tZACI., Manchester, 1989. 

\]I)AEI.tiMANS9(}I Daelemans, W. and G. Gazllar (eds.) 
lnherilance in Nattlral Lauguage Processing: Workshop 
Proceedings. 1TK, '\['ilbnrg University, 1990. 

IEVANS891 Iivans, Roger and (}craM Gazdar. InfcrcllcC 
in DATR, iu Pr{}ccedmgs el the 4th I';A{'L, Mancheswr, 
1989. 

\[(;()RZSN\] G/irz, Giinther and I)ietrich F'aulus. A Fiaitc 
State Approach to Geunan Verb Morphuklgy, in Prllceed 
ings of COI,ING '88, Budapest, 1988. 

IHUDSON841 lludson, Richard. Word (;rumlnur. Basil 
Blackwell: Oxlord, 1984. 

IKOSKENNIEMI841 Koskenuienli, K. A General Com- 
\[}utational Model lor Word-forln RecclgnilR}n aud Produc- 
t{ira, in Proceedings of COLIN{\] '8,l, SIanlord, 1984. 

\[NEBEL741 Nebel, Cdcile and Frederick F. Falcs. 
French Grammar, Monarch Press: New York, 1974. 

\[I}US'IT:JOVSKY91\] Pusteiovsky. James. The Genera- 
tive I,exicon. Colnl)UlUtional I,inp, uistics, 17(4), 1991. 

II¢,I, JSSEI,I,911 Russell, (;mhauL John Carroll, and Susan 
Warwick-Armstrong. Mnhiplc I)cfault Inherilancc in a 
Unification-.Based l,cxicou, m Pr{}ccexlings el ACI., Ber- 
keley, 1991. 

\[TROST9()I Trost, l larald. The Api}lication of Two-level 
Mllrl}llt}l(lgy to Non-couealcnative Gerlnan Morphology, 
in Proceedings of COLING '91, I lelsiuki, 1990. 1991
