MACHINE LEARNING OF MORPHOLOGICAL RULES 
BY GENERALIZATION AND ANALOGY 
Klaus Wothke 
Arbeitsstelle Linguistische Datenverarbeitung
INSTITUT FÜR DEUTSCHE SPRACHE
Mannheim, West Germany
ABSTRACT: This paper describes an experimental procedure for the
inductive automated learning of morphological rules from examples. First
an outline of the problem is given. Then a formalism for the
representation of morphological rules is defined. This formalism is used
by the automated procedure, whose anatomy is subsequently presented.
Finally the performance of the system is evaluated and the most important
unsolved problems are discussed.
1. Outline of the Problem
Learning algorithms for the domain of natural languages were in the past
mainly developed to model the acquisition of syntax and to generate
syntactic descriptions from examples (cf. Pinker 1979, Cohen/Feigenbaum
1982: 494-511). There also exist some systems which learn rules for the
automatic phonetic transcription of orthographic text (cf.
Oakey/Cawthorn 1981, Wolf 1977). Like the system presented in this paper,
all these systems are still experimental. The inductive automatic
learning of morphological rules has until now been investigated only to
a small degree. Research on this problem was carried out by Ring (1978),
Jansen-Winkeln (1985) and Wothke (1985).
The task of the system described here is to learn rules for inflectional
and derivational morphology. The system is not designed as a standard
program, but as an experimental system. It is used for the experimental
development and the testing of fundamental algorithmic learning
strategies. Later these strategies could perhaps become necessary
components of a standard learning program devised for the interactive
development of linguistic algorithms for the domain of morphology.
Input to the system is a set of examples called a learning corpus. Each
example is an ordered pair of words. We call the first word of each pair
the source; the second word is called the target. Between the source and
the target of each given pair there must exist an inflectional or a
derivational morphological relation. By applying the processes of
generalization and detection of analogies, the system has to construct a
set of instructions which describe on a purely graphemic basis how the
target of each pair is generated from the source. (Semantic features of
morphemes are at present ignored by the system.) Such a set of
instructions should not only generate correct targets for the sources
given in the learning corpus: the instructions should also generate
correct targets for the majority of the sources not in the corpus which
participate in the same inflectional or derivational relationship as the
source-target pairs in the learning corpus. Suppose, for example, that
the following learning corpus is fed into the system:

"assembly"      "assemblies"
"bath"          "baths"
"box"           "boxes"
"boy"           "boys"
"bus"           "buses"
"bush"          "bushes"
"buzz"          "buzzes"
"calf"          "calves"
"copy"          "copies"
"cry"           "cries"
"door"          "doors"
"field"         "fields"
"house"         "houses"
"knife"         "knives"
"lady"          "ladies"
"mother"        "mothers"
"switch"        "switches"
"university"    "universities"
Figure 1.
In this case the learning algorithm has to construct a set of instructions
which generates, for each singular noun of this corpus (= source, in the
left column), a string which is identical with the corresponding plural
form (= target, in the right column). Furthermore, the instructions should
also generate the correct plural form for the majority of English singular
nouns which are not members of the learning corpus. For instance, the
instructions should also generate "flies" from "fly", "tables" from
"table", "foxes" from "fox", "lays" from "lay", "classes" from "class",
and "thieves" from "thief". Of course there will also be singular nouns
for which the instructions will not be adequate. These will include all
nouns whose pattern of pluralization is not represented by examples in
the learning corpus. With the given learning corpus one
could not expect the inferred instructions to be adequate, e.g., for the
pluralizations "ox" -> "oxen", "tooth" -> "teeth", "index" -> "indices",
"foot" -> "feet", and "addendum" -> "addenda". As this example
illustrates, the linguistic adequacy of the instructions depends not only
on the quality of the automated learning strategies but also on how
representative a given learning corpus is of a morphological pattern.
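The representativeness point can be made concrete with a small Python
sketch. It hard-codes the suffixal instructions that PRISM inferred from
the Figure 1 corpus (instructions (2)-(6) of Figure 4 in section 4) and
applies them to the nouns mentioned above. The tuple encoding
(X, Y, contexts) and the name `pluralize` are assumptions made for
illustration; they are not PRISM's internal representation.

```python
# Suffixal instructions (X, Y, contexts) in the order of Figure 4.
# An empty contexts tuple means "no context restriction".
# Hypothetical encoding for illustration only.
FIGURE_4_SUFFIXAL = [
    ("f", "ves", ()),
    ("fe", "ves", ()),
    ("y", "ies", ("d", "l", "p", "r", "t")),
    ("", "es", ("ch", "sh", "s", "x", "z")),
    ("", "s", ()),
]


def pluralize(source):
    """Apply the first applicable instruction X -> Y/(Z1|...|Zn) __ #."""
    for x, y, contexts in FIGURE_4_SUFFIXAL:
        if not source.endswith(x):
            continue
        left = source[: len(source) - len(x)]  # material left of X
        if not contexts or any(left.endswith(z) for z in contexts):
            return left + y
    return source  # unreachable here: the last instruction is unrestricted


held_out = {
    "fly": "flies", "table": "tables", "fox": "foxes", "lay": "lays",
    "class": "classes", "thief": "thieves",
    # pluralization patterns not represented in the Figure 1 corpus
    "ox": "oxen", "tooth": "teeth", "index": "indices",
    "foot": "feet", "addendum": "addenda",
}

for source, expected in held_out.items():
    generated = pluralize(source)
    status = "ok" if generated == expected else "wrong"
    print(f"{source:10} -> {generated:12} ({status}, expected {expected})")
```

In this sketch the six nouns whose patterns occur in Figure 1 come out
right, while the five irregular nouns are forced into the regular
patterns ("oxes", "tooths", "indexes", "foots", "addendums").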
2. Formalism for the Representation of Morphological Rules
There are two main types of instruction which the learning algorithm uses
for the formulation of morphological rules:

- Prefixal substitution instructions change the beginning of a source in
  order to generate the corresponding target. They have the general form

      X -> Y/# __ (Z(1) | ... | Z(i) | ... | Z(n)).

  Such an instruction means: If a source begins with the string X and if
  immediately to the right of X follows the string Z(1) or ... or Z(i) or
  ... or Z(n), then substitute X by Y. ("#" signifies the word boundary,
  and "__" marks the position where X must occur in order to be
  substitutable by Y, namely at the beginning of a source (right of "#")
  and immediately before Z(1) or ... or Z(i) or ... or Z(n).)

- Suffixal substitution instructions change the end of a source in order
  to generate the corresponding target. They have the form

      X -> Y/(Z(1) | ... | Z(i) | ... | Z(n)) __ #.

  The meaning of such an instruction is: If a source ends with the string
  X and if immediately to the left of X is the string Z(1) or ... or Z(i)
  or ... or Z(n), then substitute X by Y.

An instruction may also lack the parenthesized contexts, e.g. X -> Y/# __;
it is then applicable to every source that begins (or, for a suffixal
instruction, ends) with X.
Each set of instructions constructed by the learning algorithm is ordered,
i.e. when the instructions are later applied to a given source, they must
be tried in a fixed sequence in order to generate a target: the first
applicable instruction in the sequence of prefixal substitution
instructions must be determined, and likewise the first applicable
instruction in the sequence of suffixal substitution instructions. Then
both must be applied to the source concurrently, thus generating the
target.
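The ordered, concurrent application just described can be sketched in
Python. The `Instruction` class, its field names, and the handling of two
corner cases (no applicable instruction, or prefixal and suffixal X
overlapping in a very short source) are assumptions of this sketch, not
details given for PRISM. The demo uses the negation instructions of
Figure 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """X -> Y with contexts Z(1)..Z(n); an empty tuple means no context restriction."""
    x: str
    y: str
    contexts: tuple = ()
    prefixal: bool = True  # True: X -> Y/# __ (Z...); False: X -> Y/(Z...) __ #

    def applies_to(self, word: str) -> bool:
        if self.prefixal:
            if not word.startswith(self.x):
                return False
            rest = word[len(self.x):]  # material immediately right of X
            return not self.contexts or any(rest.startswith(z) for z in self.contexts)
        if not word.endswith(self.x):
            return False
        rest = word[: len(word) - len(self.x)]  # material immediately left of X
        return not self.contexts or any(rest.endswith(z) for z in self.contexts)


def first_applicable(instructions, word):
    """Return the first instruction in the fixed sequence that applies, or None."""
    return next((ins for ins in instructions if ins.applies_to(word)), None)


def generate(prefixal, suffixal, source):
    """Apply the first applicable prefixal and suffixal instructions concurrently."""
    p = first_applicable(prefixal, source)
    s = first_applicable(suffixal, source)
    if p is None or s is None:
        return None  # assumption: no target if either sequence has no applicable instruction
    if len(p.x) + len(s.x) > len(source):
        return None  # assumption: the two X's must not overlap
    middle = source[len(p.x): len(source) - len(s.x)]
    return p.y + middle + s.y


# Figure 2: negation of English adjectives
NEG_PREFIXAL = [
    Instruction("", "il", ("l",)),
    Instruction("", "ir", ("r",)),
    Instruction("", "im", ("m", "p")),
    Instruction("", "in"),
]
NEG_SUFFIXAL = [Instruction("", "", prefixal=False)]

for adjective in ["perfect", "legal", "regular", "mature", "active"]:
    print(adjective, "->", generate(NEG_PREFIXAL, NEG_SUFFIXAL, adjective))
```

Running it prints "perfect -> imperfect", matching the derivation traced
in the text after Figure 2, and likewise "illegal", "irregular",
"immature" and "inactive" for the other adjectives.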
The order and application of sets of instructions may be illustrated by a
small example. Suppose the learning algorithm has constructed the
following set of instructions for the negation of English adjectives (the
set is not fully adequate linguistically; "" is the null string, i.e. the
string of length 0):
(1) "" -> "il"/# __ "l"
(2) "" -> "ir"/# __ "r"
(3) "" -> "im"/# __ ("m" | "p")
(4) "" -> "in"/# __
(5) "" -> ""/__ #
Figure 2.
Then the negation of "perfect" is formed by first determining the first
applicable prefixal substitution instruction:
(1) is not applicable, since "perfect" does not begin with "l".
(2) is not applicable, since "perfect" does not begin with "r".
(3) is applicable, since "perfect" begins with "p".
The first applicable suffixal substitution instruction is the only
suffixal instruction at hand, namely (5): "perfect" ends with "". By the
concurrent application of (3) and (5) to "perfect" the target "imperfect"
is generated, which is the negation of "perfect".
3. Anatomy of the System for the Automated Learning of Morphological Rules
The system is written in the programming language PL/I. It has the name
PRISM, which is an acronym for "PRogram for the Inference and Simulation
of Morphological rules".
PRISM has the macro structure shown in Figure 3. When PRISM is activated,
its main procedure MONITOR first activates GETOPTN, which reads the user's
options for the control of PRISM and checks them for syntactic
well-formedness and for plausibility. Then MONITOR activates the component
indicated by the user's control options. There are three alternative
components:

- A learning component which infers sets of instructions from a learning
  corpus given by the user of PRISM. This component comprises the
  procedures CHKCRPS, DISCOV, STMTOUT, TODSET, and others. The learning
  process is performed by DISCOV; the other procedures perform peripheral
  functions.
- A component for the application of instructions which were inferred by
  the learning component. This component comprises the procedures
  FRODSET, APPLY, DERIVE, and others.
- A third, marginal component which prepares instructions for their
  printout. It consists of FRODSET, STMTOUT, and other procedures.
[Chart: MONITOR calls GETOPTN and one of three components.
  Learning of instructions: CHKCRPS and DISCOV (reading the LEARNING
  CORPUS), STMTOUT, TODSET (writing the KNOWLEDGE BASE).
  Application of instructions: FRODSET (reading the KNOWLEDGE BASE),
  APPLY, DERIVE (SOURCES -> TARGETS).
  Printout of instructions: FRODSET (reading the KNOWLEDGE BASE), STMTOUT.]
Figure 3. Macro structure of PRISM. (For reasons of lucidity some macro
features of PRISM have been ignored in this chart.)

The activation of the learning algorithm starts with a call of CHKCRPS by
MONITOR. CHKCRPS checks a given learning corpus for formal errors. The
procedure activated next is DISCOV, which performs the learning
processes. DISCOV first determines the different types of substitution
patterns in the given learning corpus. Types of substitution patterns are
the different (X, Y)-pairs which are implicitly present in the learning
corpus. (For the status of X and Y compare the definition of the formalism
for the representation of morphological rules.) The second step of DISCOV
computes the frequency of each substitution pattern in the corpus.

DISCOV's learning strategy presupposes that the substitution patterns
occurring more frequently in a language also occur more frequently in the
learning corpus. Therefore DISCOV creates more general instructions for
the more frequent patterns of a learning corpus and more specific
instructions for the less frequent ones: the more frequently the
substitution pattern (X, Y) occurs, the more general are the contextual
strings Z(i) of an instruction X -> Y/# __ (Z(1) | ... | Z(i) | ... | Z(n))
or X -> Y/(Z(1) | ... | Z(i) | ... | Z(n)) __ #; the more rarely it
occurs, the more specific they are. Provided that a learning corpus is
representative of the morphological substitution patterns of a language
and of their contextual strings Z(i), this general strategy for the
determination of the Z(i) increases the probability that the inferred
instructions generate correct targets for sources which are not elements
of the given learning corpus.

DISCOV arranges the substitution instructions in such a way that the more
specific instructions precede the more general ones. This order
guarantees that, during their later application, every instruction can
potentially be applied: a more general instruction placed first could
also match the sources intended for a more specific one and so prevent
the latter from ever being reached. STMTOUT transforms the substitution
instructions inferred by DISCOV from their internal representation, which
allows their easy and fast automatic treatment, into an external
representation and prints them out. This external representation uses
the notation introduced above in the definitions of the two types of
substitution instructions. Finally TODSET stores the instructions in an
external knowledge base, from which they can later be read by the other
two components of PRISM. (In the knowledge base the instructions are
stored in their internal representation.)
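The text does not specify exactly how DISCOV derives the contexts Z(i)
from the pattern frequencies. The following Python sketch is one
hypothetical reconstruction, restricted to suffixal instructions. It
assumes that X and Y are what remains of source and target after their
longest common prefix; that a pattern whose competing patterns are all
rarer gets no context; that every other pattern receives, for each of its
examples, the shortest left context that no competing example shares; and
that instructions are sorted from rarest to most frequent, with ties
keeping corpus order. These choices and all function names are
assumptions, not PRISM's documented algorithm.

```python
from collections import Counter


def suffixal_pattern(source, target):
    """(X, Y) = what remains of source and target after their longest common prefix."""
    k = 0
    while k < min(len(source), len(target)) and source[k] == target[k]:
        k += 1
    return source[k:], target[k:]


def left_of(word, x):
    """The part of `word` to the left of a word-final X."""
    return word[: len(word) - len(x)]


def shortest_context(left, rival_lefts):
    """Shortest ending of `left` that none of the rivals' left parts ends with."""
    for k in range(1, len(left) + 1):
        if not any(r.endswith(left[-k:]) for r in rival_lefts):
            return left[-k:]
    raise ValueError(f"no distinguishing context for {left!r}")


def learn_suffixal(corpus):
    """Infer ordered suffixal instructions (X, Y, contexts) from (source, target) pairs."""
    labelled = [(s, suffixal_pattern(s, t)) for s, t in corpus]
    freq = Counter(p for _, p in labelled)  # dict order = order of first occurrence
    learned = []
    for (x, y), n in freq.items():
        # rivals: sources of other patterns to which this instruction would also apply
        rivals = [(s, p) for s, p in labelled if p != (x, y) and s.endswith(x)]
        contexts = ()
        if any(freq[p] >= n for _, p in rivals):  # not the most frequent: specialize
            rival_lefts = [left_of(s, x) for s, _ in rivals]
            found = {shortest_context(left_of(s, x), rival_lefts)
                     for s, p in labelled if p == (x, y)}
            contexts = tuple(sorted(found, key=lambda z: z[::-1]))  # right-to-left order
        learned.append((x, y, contexts, n))
    learned.sort(key=lambda ins: ins[3])  # rare (specific) first; stable for ties
    return [(x, y, contexts) for x, y, contexts, _ in learned]


def notation(x, y, contexts):
    """External notation of a suffixal instruction, e.g. "" -> "s"/__ #."""
    ctx = "(" + " | ".join(f'"{z}"' for z in contexts) + ") " if contexts else ""
    return f'"{x}" -> "{y}"/{ctx}__ #'


FIGURE_1 = [
    ("assembly", "assemblies"), ("bath", "baths"), ("box", "boxes"),
    ("boy", "boys"), ("bus", "buses"), ("bush", "bushes"), ("buzz", "buzzes"),
    ("calf", "calves"), ("copy", "copies"), ("cry", "cries"), ("door", "doors"),
    ("field", "fields"), ("house", "houses"), ("knife", "knives"),
    ("lady", "ladies"), ("mother", "mothers"), ("switch", "switches"),
    ("university", "universities"),
]

for instruction in learn_suffixal(FIGURE_1):
    print(notation(*instruction))
```

On the Figure 1 corpus this sketch prints the suffixal instructions
(2)-(6) of Figure 4 in the same order, and `notation` plays the role the
text assigns to STMTOUT. Agreement on this single corpus does not show
that DISCOV actually works this way.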
The application component starts with FRODSET, which loads a set of
instructions to be applied from the knowledge base into the central
memory. Then the two procedures APPLY and DERIVE apply the instructions to
words given by the user and thereby generate targets, which are written to
an output data set. The kind of morphological relation between the
generated targets and the given words depends on the specific set of
instructions which is applied.
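The division of labour between storing an instruction set once (TODSET)
and loading it later for application (FRODSET) can be illustrated with a
Python sketch. JSON, the file name and the functions `store_instructions`
and `load_instructions` are stand-ins chosen for this example; the text
only says that PRISM keeps the instructions in their internal
representation in an external knowledge base, not what format that is.

```python
import json
import tempfile
from pathlib import Path


def store_instructions(path, prefixal, suffixal):
    """Write an instruction set to an external file (the role of TODSET)."""
    data = {"prefixal": prefixal, "suffixal": suffixal}
    Path(path).write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")


def load_instructions(path):
    """Read an instruction set back into memory (the role of FRODSET)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))

    def as_tuples(rules):
        # JSON has no tuples, so (X, Y, contexts) comes back as lists
        return [(x, y, tuple(contexts)) for x, y, contexts in rules]

    return as_tuples(data["prefixal"]), as_tuples(data["suffixal"])


# Figure 2 (negation of English adjectives) as (X, Y, contexts) tuples
prefixal = [("", "il", ("l",)), ("", "ir", ("r",)), ("", "im", ("m", "p")), ("", "in", ())]
suffixal = [("", "", ())]

with tempfile.TemporaryDirectory() as tmp:
    kb = Path(tmp) / "negation.json"  # hypothetical knowledge-base file
    store_instructions(kb, prefixal, suffixal)
    loaded = load_instructions(kb)

print(loaded == (prefixal, suffixal))
```

Because the ordered lists are preserved, the loaded set can be applied in
exactly the sequence in which it was learned.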
4. Evaluation of the System
The performance of PRISM was evaluated under the following conditions:
1. A set of instructions should always generate correct targets if it is
applied to the sources of the learning corpus from which it was
inferred.
2. The larger the learning corpus for a given morphological relation, the
higher on average should be the percentage of correctly generated
targets for such sources as are not elements of the learning corpus
(but nevertheless
participate in the given morphological 
relation). 
3. A set of instructions inferred from a
linguistically representative learning
corpus should generate correct targets
for at least 90% of the sources which are
not elements of the learning corpus (but
which nevertheless participate in the 
morphological relationship under discus- 
sion). 
4. If a linguistically representative 
learning corpus is given, the learning 
algorithm should classify as regular 
those morphological patterns which 
linguists also usually classify as 
regular. 
Condition 1 is fulfilled. This could be
proved deductively with reference to the 
structure of the learning algorithm. (The 
proof is given in Wothke 1985, 144-154.) 
The fulfilment of conditions 2-4 could 
only be tested inductively by applying 
PRISM's learning algorithm to different 
learning corpora and evaluating the results. 
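Conditions 1 and 3 both reduce to a percentage of correctly generated
targets. The Python sketch below computes it for any target generator;
`generate` stands for an applied instruction set, and the toy generator,
word lists and function names are hypothetical. Since the toy corpus is
not representative, its failing condition 3 says nothing about PRISM; the
sketch only shows the arithmetic of the check.

```python
def percent_correct(generate, pairs):
    """Percentage of (source, target) pairs for which generate(source) == target."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    hits = sum(1 for source, target in pairs if generate(source) == target)
    return 100.0 * hits / len(pairs)


def check_conditions(generate, corpus, held_out, threshold=90.0):
    """Condition 1: 100% on the learning corpus; condition 3: >= threshold % on held-out pairs."""
    on_corpus = percent_correct(generate, corpus)
    on_held_out = percent_correct(generate, held_out)
    return {
        "condition_1": on_corpus == 100.0,
        "condition_3": on_held_out >= threshold,
        "corpus_percent": on_corpus,
        "held_out_percent": on_held_out,
    }


def naive_plural(word):
    """Toy stand-in for an inferred instruction set: always append "s"."""
    return word + "s"


corpus = [("door", "doors"), ("boy", "boys")]
held_out = [("table", "tables"), ("fox", "foxes"), ("lay", "lays"), ("fly", "flies")]
result = check_conditions(naive_plural, corpus, held_out)
print(result)
```

Condition 2 could be checked with the same function by computing the
held-out percentage for learning corpora of increasing size.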
Condition 2 was tested by applying the learning component to learning
corpora of different sizes compiled for two morphological relations:
derivation of nomina actionis from verbs in German (e.g. "betreuen" ->
"Betreuung") and derivation of female nouns from male nouns in French
(e.g. "spectateur" -> "spectatrice"). With the sets of instructions
inferred from these learning corpora, PRISM's application component
generated targets for a set of words not in the learning corpora. The
statistical results of these tests showed that, on average, the larger
the learning corpus, the higher the percentage of correctly generated
targets for sources which are not elements of the learning corpus. A
further important result was that the more regular the morphological
relation, the higher the percentage of correctly generated targets: the
tests yielded better results for the more regular derivation of female
nouns from male nouns in French than for the less regular derivation of
nomina actionis from verbs in German.
To test the fulfilment of the third condition, representative learning
corpora were manually compiled for the derivation of nomina actionis from
verbs in German (9,167 source-target pairs) and for the derivation of
female nouns from male nouns in French (89 source-target pairs). The two
sets of instructions automatically inferred from these two corpora were
applied to large sets of sources which were not members of the learning
corpora (4,793 sources for German, 211 sources for French). In both cases
the percentage of correctly generated targets was 100%.
Condition 4 was tested with learning corpora for the pluralization of
English nouns and for the derivation of female nouns from male nouns in
French. An exact quantification of the degree of accuracy is not
possible, since this condition contains some vague expressions such as
"regular" and "usually". My subjective judgement is that the instructions
constructed by the learning algorithm for (approximately) representative
corpora are quite similar to the morphological regularities described in
traditional grammars. This may be illustrated by an example: the learning
corpus shown in Figure 1 is approximately representative of the regular
pluralization patterns of English nouns. From this corpus PRISM inferred
the following set of instructions, which represent the most important
pluralization rules:
(1) "" -> ""/# __
(2) "f" -> "ves"/__ #
(3) "fe" -> "ves"/__ #
(4) "y" -> "ies"/("d" | "l" | "p" | "r" | "t") __ #
(5) "" -> "es"/("ch" | "sh" | "s" | "x" | "z") __ #
(6) "" -> "s"/__ #
Figure 4.
5. Unsolved Problems 
- The formalism which PRISM uses for the representation of the
  instructions is designed for the description of graphemic changes at
  the beginning and/or at the end of a word. Thus this formalism is
  inadequate for the description of changes in the interior of a word.
  These, however, occur more rarely than the changes at the beginning or
  at the end. A solution to this problem, which could consist in the
  design of a new formalism whose expressions could also be learned
  automatically, has not yet been found.
- PRISM cannot recognize exceptions in a learning corpus and treat them
  adequately. If, for instance, the learning corpus in Figure 1 also
  contained the pair ("goose", "geese"), PRISM would infer the prefixal
  substitution instruction "goo" -> "gee"/# __ and insert it in the set
  of instructions shown in Figure 4 before instruction (1). Furthermore
  PRISM would infer the suffixal instruction "" -> ""/"ose" __ # and
  insert it before instruction (3). If this new set of instructions is
  applied to the nouns "good", "goodness" and "goon", the incorrect
  plurals "geeds", "geednesses" and "geens" are generated. It would be
  preferable for PRISM to identify exceptions as such and store them in a
  list of exceptions instead of inferring overgeneralizing instructions
  from them.
- If a set of instructions is linguistically inadequate, the user of
  PRISM must first make the learning corpus more representative by adding
  suitable examples. Then he must activate the learning component of
  PRISM, which infers a totally new set of instructions. It would perhaps
  be better if PRISM could infer new instructions only from the new
  examples and then synthesize these new instructions with the formerly
  inferred, linguistically inadequate instructions to give a new, more
  adequate set of instructions.
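The overgeneralization caused by the pair ("goose", "geese") can be
reproduced with a Python sketch. It encodes the Figure 4 instructions,
adds the two instructions that the text says PRISM would infer from that
pair, and contrasts the result with the exception-list alternative
suggested above. The tuple encoding and the `exceptions` dictionary are
assumptions of this sketch; PRISM itself has no such list.

```python
# Instructions as (X, Y, contexts); empty contexts = unrestricted (hypothetical encoding).
FIG4_PREFIXAL = [("", "", ())]
FIG4_SUFFIXAL = [
    ("f", "ves", ()),
    ("fe", "ves", ()),
    ("y", "ies", ("d", "l", "p", "r", "t")),
    ("", "es", ("ch", "sh", "s", "x", "z")),
    ("", "s", ()),
]
# The two instructions PRISM would infer from ("goose", "geese"):
GOOSE_PREFIXAL = [("goo", "gee", ())] + FIG4_PREFIXAL  # before instruction (1)
GOOSE_SUFFIXAL = FIG4_SUFFIXAL[:1] + [("", "", ("ose",))] + FIG4_SUFFIXAL[1:]  # before (3)


def first_applicable(rules, word, prefixal):
    """Return (X, Y) of the first applicable instruction, or None."""
    for x, y, contexts in rules:
        if prefixal and word.startswith(x):
            rest = word[len(x):]  # material right of X
            if not contexts or any(rest.startswith(z) for z in contexts):
                return x, y
        elif not prefixal and word.endswith(x):
            rest = word[: len(word) - len(x)]  # material left of X
            if not contexts or any(rest.endswith(z) for z in contexts):
                return x, y
    return None


def generate(prefixal, suffixal, word, exceptions=None):
    """Look up listed exceptions first; otherwise apply both rule lists concurrently."""
    if exceptions and word in exceptions:
        return exceptions[word]
    p = first_applicable(prefixal, word, True)
    s = first_applicable(suffixal, word, False)
    if p is None or s is None:
        return None
    (px, py), (sx, sy) = p, s
    if len(px) + len(sx) > len(word):
        return None  # assumption: the two X's must not overlap
    return py + word[len(px): len(word) - len(sx)] + sy


for noun in ["goose", "good", "goodness", "goon"]:
    overgeneralized = generate(GOOSE_PREFIXAL, GOOSE_SUFFIXAL, noun)
    with_exceptions = generate(FIG4_PREFIXAL, FIG4_SUFFIXAL, noun, {"goose": "geese"})
    print(f"{noun:10} {overgeneralized:12} {with_exceptions}")
```

The overgeneralized set yields "geeds", "geednesses" and "geens", as
stated in the text, while the exception list keeps "geese" and leaves
"goods", "goodnesses" and "goons" to the regular instructions.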
References 

Cohen, P. R./Feigenbaum, E. A. (Eds.) (1982): The handbook of artificial
intelligence. Vol. 3. London.

Jansen-Winkeln, R. M. (1985): Induktives Lernen von Grammatikregeln aus
ausgewählten Beispielen. In: Savory, S. E. (Ed.) (1985): Künstliche
Intelligenz und Expertensysteme. Ein Forschungsbericht der Nixdorf AG.
2nd ed. München/Wien, pp. 211-223.

Oakey, S./Cawthorn, R. C. (1981): Inductive learning of pronunciation
rules by hypothesis testing and correction. In: Proceedings of the 7th
International Joint Conference on Artificial Intelligence. August 1981.
Vol. 1, pp. 109-114.

Pinker, S. (1979): Formal models of language learning. In: Cognition 7,
pp. 217-283.

Ring, H. (1978): PELIKAN - ein Lernsystem für linguistische
Klassifikationsalgorithmen. In: Nachrichten für Dokumentation 6,
pp. 224-226.

Wolf, E. (1977): Vom Buchstaben zum Laut. Maschinelle Erzeugung und
Erprobung von Umsetzautomaten am Beispiel Schriftenglisch -
Phonologisches Englisch. Braunschweig.

Wothke, K. (1984): PRISM User's Guide. Bonn. (= IKP-Arbeitsbericht No. 5)

Wothke, K. (1985): Maschinelle Erlernung und Simulation morphologischer
Ableitungsregeln. Bonn. (Doctoral dissertation.)

A detailed treatment of the theme dealt with in this paper is given in
Wothke (1985).
