Unsupervised learning of derivational morphology from 
inflectional lexicons 
l~ric Gaussier 
Xerox Research Centre Europe 6, Chemin de Maupertuis 38240 Meylan F. 
Eric.Ga.ussier@xrce.xerox.com 
Abstract 
We present in this paper an unsupervised method 
to learn suffixes and suffixation operations from an 
inflectional lexicon of a language. The elements ac- 
quired with our method are used to build stemming 
procedures and can assist lexicographers in the de- 
velopment of new lexical resources. 
1 Introduction 
Development of electronic morphological resources 
has undergone several decades of research. The first 
morphologicM analyzers focussed on inflectional pro- 
cesses (inflection, for English, mainly covers verb 
conjugation, and number and gender variations). 
With the development of Information Retrieval, peo- 
ple have looked for ways to build simple analyzers 
which are able to recognize the stem of a given word 
(thus addressing both inflection and derivation1). 
These analyzers are known as stemmers. 
Faced with the increasing demand for natural lan- 
guage processing tools for a variety of languages, 
people have searched for procedures to (semi- 
)automatically acquire morphological resources. On 
the one hand, we find work fl'om the IR community 
aimed at building robust stemmers without much 
attention given to the morphologicM processes of 
a language. Most of this work relies on a list of 
affixes, usually built by the system developer, and 
a set of rules to stem words (Lovins, 1968; Porter, 
1980). Some of these works fit within an unsuper- 
vised setting, (Hafer and Weiss, 1974; Adamson and 
Boreham, 1974) and to a certain extent (Jacquemin 
and Tzoukerman, 1997), but do not directly address 
the problem of learning naorphological processes. On 
the other hand, some researchers from the compu- 
tational linguistics community have developed tech- 
niques to learn affixes of a language and software to 
segment words according to the identified elements. 
The work described in (Daelemans et al., 1999) is a 
good example of this trend, based on a supervised 
1The distinction between inflectional and derivational 
morphology is fax from clearcut. However, in practice, such 
a distinction allows one to divide the problems at hand and 
was implicitly adopted in our lexicon development plan. 
learning approach. However, it is difficult ill most of 
these studies to infer the underlying linguistic frame- 
work assumed. 
We present in this paper an unsupervised method 
to learn suffixation operations of a language from an 
inflectional lexicon. This method also leads to the 
development of a stemming procedure for the lan- 
guage under consideration. Section 2 presents the 
linguistic view we adopt on derivation. Section 3 de- 
scribes the preliminary steps of our learning method 
and constitutes the core of our stemming procedure. 
Finally, section 4 describes the learning of suffixation 
operations. 
2 Derivation in a language 
The derivational processes of a language allow speak- 
ers of that language to analyse and generate new 
words. Most recent linguistic theories view these 
processes as operations defined on words to produce 
words. From a linguistic point, of view, a word can be 
represented as an element made of several indepen- 
dent layers (feature structures, for example, could 
be chosen for this representation. We do not want 
to focus on a particular formalism here, but rather 
to explain the model we will adopt). The different 
layers and the information they contain vary from 
one author to the other. We adopt here the layers 
used in (Fradin, 1994), as exemplified on the French 
noun table: 
(G) table 
(F) (teibl) 
(M) fem-sg 
(SX) N 
(S) table 
where (G) corresponds to the graphemic form of the 
word, (F) to the phonological form, and (M), (SX) 
and (S) respectively contain morphological, syntac- 
tic and semantic information. A derivation process 
then operates on such a structure to produce a new 
structure. Each layer of the original structure is 
transformed via this operation. 
We can adopt the following probabilistic model to 
account for such a derivation process: 
24 
P(w, = p(w:(G) = opG(w,(a)), 
Op 
w~(F) = Opp(wl(F)), w2(M) = OpM(Wl(M)), 
w~(S.V) = 01,s.x(w~(,5'X)), wE(S) = Op.s.(w~(S))) 
where Op is a derivation process, Opt is the compo- 
nent of Op which operates on the graphemic layer, 
and w(G) is the graphemic layer associated to word 
W. 
The different layers can be divided up into three 
main dimensions, used in linguistic studies to iden- 
tify and classify suffixes of a language: the formal 
dimension (corresponding to G and F), the morpho- 
syntactic dimension (M and SX), and the semantic 
dimension (S). The nature of the operation along 
these dimensions mainly depend on the language un- 
der consideration. For example, for Indo-European 
languages, for the formal dimension, a suffixation 
operation consists in the concatenation of a suffix 
to the original form. Morphographemic as well as 
phonological rules are then applied to turn the ideal 
form obtained via concatenation into a valid surface 
form. 
We focus in this article on concatenative lan- 
guages, id est languages for which derivation cor- 
responds, for the formal dimension, to a concate- 
nation operation. We also restrict ourselves to the 
study of suffixes and suffixation. Nevertheless, the 
principles and methods we use can be extended to 
non-concatenative languages and prefixes. 
The a.im of the current work is two-fold. On the 
one hand we want to develop stemming procedures 
for Information Retrieval. On the other hand, we 
want to develop methods to assist lexicographers in 
the development of derivational lexicons. We posses 
inflectional lexicons for a variety of different lan- 
guages, and we'll use these lexicons as input to our 
system. Furthermore, we are interested in making 
the method as language independent as possible, 
which means that we will explore languages with- 
out a priori knowledge 2 and thus we wish to rely on 
an unsupervised learning framework. 
The probabilistic model above suggests that, in 
order to learn suffixation operations of a language, 
one should look at word pairs (wl, w2) of that lan- 
guage for which w2 derives from wl, In an unsu- 
pervised setting, such pairs are not directly accessi- 
ble, and we need to find ways to extract the infor- 
mation we are interested in from a set of pairs the 
words of which are not always related via deriva- 
tion. The method we designed first builds, for a 
given language, relational families, which are an ap- 
proximation of derivational families. These families 
2 In particular, it is interesting to avoid relying on suffix 
lists, which vary from one author to the other. 
are then used to produce pairs of words which are 
a first approximation of the pairs of related words. 
From this set of pairs, we then extract suffixes and 
suffixation operations. 
The next section addresses the construction of re- 
lational families. 
3 Construction of relational families 
Our goal is to build families which are close to the 
derivational families of a language. This contruction 
relies on the notion of suffix pairs that we explain 
below. 
3.1 Extraction of suffix pairs 
The intuition behind the extraction of suffixes is that 
long words of a given language tend to be obtained 
through derivation, and more precisely through suf- 
fixation, and thus could be used to identify regular 
suffixes. 
We first define a measure of similarity between words 
based on the comparison of truncations. 
Definition 1: two words Wl and we of a given lan- 
guage L are said to be p-similar if and only if 
i trunc(wl,p) =- trunc(w2,p), where trunc(w,k) 
is composed of the first k characters of w, 
ii there is no q such that: q > p and 
t,'une(wl, q) _= trune(,v2, V) 
The equivalence relation (=) defined on tile alpha.- 
bet of L allows one to capture orthographic variants, 
such as the alternation c - ¢ in French. The char- 
acter strings sl and s2 obtained after truncation of 
the first p characters fi'om two p-similar words are 
called pseudo-suffixes. The pair (sl,s2) will be called 
a pseudo-suffix pair of the language L, and will be 
said to link Wl and w2. Note that the strings sl 
and/or s2 may be empty: they are both empty if the 
words wl and w2 differ only in their part. of speech, 
in which case we speak about conversion. 
The above definition allows us to state that the En- 
glish words "deplorable" and "deploringly" are 6- 
similar and that (able,ingly) is an English pseudo- 
suffix pair. Since "deplorable" is an adjective and 
"deploringly" is an adverb, we can provide a more 
precise form for the pseudo-suffix pair, and write 
(able+AJ,ingly+AV) (where +AJ stands for adjec- 
tive and +AV fbr adverb) with the following inter- 
pretation: we can go from an adjective (resp. ad- 
verb) to an adverb (resp. Adjective) by removing the 
string "able" (resp. "ingly") and adding the string 
"ingly" (resp. "able"). 
Definition 2: a pseudo-suffix pair of a given lan- 
guage L is valid when the pseudo-suffixes involved 
are actual suffixes of the language L, and when the 
pair can be used to describe the passage fl'om one 
25 
word of a given derivational family of L to another 
word of the same family. 
Two parameters are used to determine as precisely 
as possible valid pseudo-suffix pairs: the p-similarity 
and the number of occurrences of a pseudo-suffix 
pair. This last parameter accounts for the fact that 
the pseudo-suffix pairs encountered fi'equently are 
associated to actual suffixation processes whereas 
the less frequent ones either are associated to ir- 
regular phenomena or are not valid. But, in order 
to design a procedure which can be applied on sev- 
eral languages, and to avoid missing too many valid 
pseudo-suffix pairs, we have set these two parame- 
ters in the following loose way: 
Definition 3: a suffix pair of a language L is a 
pseudo-suffix pair of L which occurs re, ore than once 
in the set of word couples of L which are at least 
5-similar. 
Rein ar ks: 
• two words are at least k-similar if they are p- 
similar with p >_ k, 
• all the suffix pairs are not valid. The above 
definition provides a set of pseudo-suffix pairs 
which is approximately contains the set of valid 
pseudo-suffix pairs. Our purpose here is not to 
miss any valid pseudo-suffix pair, 
• the number of occurrences of a pseudo-suffix 
pair is set at 2, the minilnal value one can think 
of, and which corresponds to our desire to re- 
main language independent,, 
• the choice of the value 5 for the similarity fac- 
tor represents a good trade off between the no- 
tion of long words and the desire to be language 
independent. We believe anyway that a slight 
change in this parameter won't lead to a set of 
pseudo-suffix pairs significantly different fi'om 
the one we have. 
Here is an example of French suffix pairs extracted 
from the French lexicon, with their number of oc- 
currences: 
ation+N er+V 782 
+AJ ment+AV 460 
eur+AJ ion+N 380 
er+V on+N 50 
sation+N tarisme+N 5 
All these suffix pairs are valid except the last one 
which is encountered in cases such as "autorisation 
- autoritarisme" (authorisation- authoritarianism). 
One can note that a valid suffix pair does not al- 
ways link words which belong to the same deriva- 
tiona.l t'amily. For example, the pair (er+V,on+N) 
yields the following link "saler - salon" (salt - lounge) 
though the two words refer to different concepts. 
The notion of validity only requires that two words 
of a same derivational family can be related by the 
suffix pair, which is the case for the previous pair in 
so far as it relates "friser - frison" (curl (+V) - curl 
(+N)) 
3.2 Clustering words into relational 
families 
The problem we have to face now is the one of 
grouping words which belong to the same deriva- 
tional family, and to avoid grouping words which do 
not belong to the same derivational family. A sim- 
ple idea one can try consists in adding words into 
a family to the extent they are p-similar, with a 
value of p to be determined, and related through 
suffix pairs. For example, given the two English suf- 
fix pairs (+V,able+AJ) and (+V,ment+N), we can 
first group the 6-similar words "deploy" and "de- 
ployable", and then add to this family the word 
"deployment". But such a procedure will also lead 
to group "depart" and "departlnent" into the same 
family. The problem here is that suffix pairs relate 
words which do not belong to the same derivational 
family. 
There is however one way we can try to automate 
the control of the removal of a suffix, based on the 
following intuitive idea. If the string "lnent" is not. a 
suffix, as in "department", then it is likely that the 
word obtained after removal of the string, that is 
"depart", will support suffixes which do not usually 
co-occur with "ment", such as "ure" which produces 
"departure". The underlying notion is that of suffix 
families, notion which accounts for the fact. that the 
use of a suffix usually coincides with the use of other 
suffixes, and that suffixes from different families do 
not co-occur. Such an idea is used in (Debili, 1982), 
with manually created suffix families. 
To take advantage of this idea, we used hierarchi- 
cal agglomerative clustering methods. The following 
general algorithm can be given for hierarchicM clus- 
tering methods: 
1. identify the two most similar points (with sim- 
ilarity greater than 0) 
2. combine them in a cluster 
3. go back to step 1, treating clusters as points, 
till no more points can be merged (similarity 0) 
Particular methods differ in the way sinfilarity is 
computed. In our case, the initial points consist 
of words, and we define the similarity between two 
words, wl and w2, as the number of occurrences of 
the suffix pair of L which links wt and u,~. If such a 
suffix pair does not exist, then the similarity equals 
0. The similarity between clusters (or points as ref- 
ferred to in the above Mgorithm) depends on the 
method chosen. We tested 3 methods: 
26 
• single link; the similarity between two clusters is 
defined as the similarity between the two most 
sin:dlar words, 
• group average; the sin:dlarity between two clus- 
ters is defined as the average similarity between 
words, 
• complete link; the similarity between two clus- 
ters is defined as the similarity between the two 
less similar words, 
The single link method makes no use at all of 
the notion of suffix families, and corresponds to the 
naive procedure described above. The group average 
method makes partial use of this notion, whereas the 
complete link heavily relies on it. 
The clusters thus obtained represent an approx- 
imation of the derivational families of a language, 
and consitute our relational families. 
Here is an exemple of some relational falnilies ob- 
tained with the complete link method: 
deprecate deprecation deprecator deprecative dep- 
lvcativeness dep~vcativelg deprecatwity deprecatorily 
dep~vcato'rg dep~vcatingly 
deposabilitg deposable deposableness deposablg de- 
pose deposer deposal 
department departmentality departmental depart- 
mentalness departmentally 
depart departure departer 
3.3 Evaluation 
We performed an evaluation on English considering 
as the gold reference a hand-built derivational lexi- 
con that, we have. We extracted derivational families 
from this lexicon, and compared them to the rela- 
tional families obtained. This comparison is based 
on the number of words which have to be moved 
to go from one set of families to another set. Due 
to overstemming errors, which characterise the fact 
that some unrelated words are grouped in the same 
relational family, as well as to understemming er- 
rors, which correspond to the fact that some related 
words are not grouped in the same relational family, 
relational and derivational families often overlap. 
To account for this fact, we made the assumption 
that a word wi was correctly placed in a relational 
family ri if this relational family comprised in major- 
ity words of the derivational family of wi, and if the 
derivational family of wi was composed in majority 
by words of ri. That is there must be some strong 
agreement between the derivational and relational 
families to state that a word is not to move. All the 
words which did not follow the preceding constraints 
were qualified as "to move". We directly used the 
ratio of words "not to move" to compute the prox- 
imity between relational and derivational fa.milies. 
We, in fact, evaluated several versions of the rela- 
tional families we built, in order validate or inval- 
idate some of our hypotheses. The following table 
summarises the results obtained for the three clus- 
tering methods tested, with the parameters set as 
described above: 
Single link 47% 
Group average 77% 
Complete link 85% 
These results show the importance of the notion 
of suffix families, at least with the parameters we 
used. As a comparison, we performed the same 
evaluation with families obtained by two stemmers, 
the SMART and Porter's stemmer, well-known in 
the Information Retrieval community. To construct 
families with these stemmers, we took the whole 
lemmatised lexicon, submitted it to the stemmers 
and grouped words which shared the same stem. 
We then ran the evaluation above and obtained the 
following results: SMART stemmer: 0.82 Porter's 
stemmer: 0.65 Not surprisingly, the SMART stem- 
mer, which is the result of twenty years of (level- 
opment, is a better approximation of derivational 
processes than Porter's stemmer. 
4 From relational to derivational 
morphology 
Once the relational families have been constructed, 
they can be used to search for actual suffixes. Rather 
than performing this search directly from our lexi- 
con, i.e. from all the possible word pairs, the clus- 
tering made to obtain word families allows us to re- 
strict ourselves to a set of word pairs motivated by 
the broad notion of suffix we used in the previous 
section. 
We thus use the following general algorithm, 
which allows us to estimate the parameters of the 
general probabilistic model given above: 
• 1. from the lexicon, build relational families, 
• 2. from relational families, build a set of word 
pairs and suffixes, 
• 3. from this set, estimate some parameters of 
the general model, 
• 4. use these parameters to induce a derivation 
tree on each relational family, 
• 5. use these trees to refine the previous set of 
word pairs and suffixes, and go back to step 3 
till an end criterion is found 
• 6. the trees obtained can then be used to ex- 
tract dependencies between suffixation opera- 
tions, as well as morphographemic rules. 
We will now describe steps 3, 4 and 5, and give an 
outline of step 6. 
27 
4.1 Extraction of sufl=ixation operations 
Since our lexicons contain neither phonological 
nor semantic information, the general probabilistic 
model given in the introduction can be simplified, so 
that it is based only on the graphemic and morpho- 
syntactic dimensions of words. Furthermore, since 
we restrict ourselves to concatenative languages, we 
adopt the following form for a suffixation operation 
S: 
S= (ad=c°ncat(G°'s) )MSo~MSd 
where G4 (MSa) stands for the graphemic (morpho- 
syntactic) form of the derived word produced by the 
suffixation operation, Go (MSo) for the graphemic 
(morpho-syntactic) form of the original word on 
which the suffixation operation operates, conea* is 
the concatenation operation, and s is the suffÉx as- 
sociated to the suffixation operation S. 
We can then write the probability that a word w2 
derives, through a suffixation process, from a. word 
wl as follows: 
P(Wl--+w2) = 
= ~s p(S)p(Ga -+ G2, MS1 -+ MS',IS) 
= ~s p(S)p(G1 -+ G2\]S)p(MS1 ~ MSu\]G1 -+ G~, S) 
~- ~s p(5')p(Gi ~ G21S)p(M& ~ M&IS) 
the last equation being based on an independence 
assumption between the graphetnic form and the 
morpho-syutactic information attached to words. 
Even though some morpho-syntactic information 
can be guessed from the graphical form of words, 
it is usually done via the suffixes involved in the 
words. Thus, conditioning our probabilities on the 
mere suffixation operations represents a. good ap- 
proximation to the kind of dependence that exists 
between graphemic form and morpho-syntactie in- 
formation. 
The t.erm involving morpho-syntactic information, 
i.e. the probability to produce MS2 from MS1 
knowing the suffixation operation S, can be directly 
rewritten as: 
p(MS~ --+ MS2\]S) = ~(MS1, MSo)5(MS2, MSd) 
where 5 is the Kronecker symbol (6(x, y) equals 1 if 
the two arguments are equal and 0 otherwise). 
The words we observe do not exactly reflect 
the different elements they are made of. Mor- 
phographemie rules, allomorphy and truncation phe- 
nomena make it difficult to identify the underlying 
structure of words (see (Anderson, 1992; Corbin, 
1987; Bauer, 198:3) for discussions on this topic). 
That is, the graphemic forms we observe are the re- 
sults of different operations, concatenation being, in 
most cases, only the first. 
Since: allomorphy, truncation and mor- 
phographemic phenomena do not depend on 
the words themselves but on some subparts of the 
words; direct concatenation gives a better access to 
the suffix used; and suffixation usually adds element 
to the original form 3, we use the following forln for 
p(G, ~ G21S): 
p(G1 ~ G2\]S) = 0 if l(G1) > l(G2) 
else p(G1 ~ G.~IS) = 
co if diff(G1, G2, s) = 0 
el if diff(Gx,G2,s) = 1 
e:, if diff(G1,G;,s) = 2 
ca if diff(G1,G~,s) = 3 
0 otherwise 
where I(G) is the length of G, diff(strl, str2, suff) 
represents the number of characters differing be- 
tween strl and str2- suff (i.e. the string obtained 
via removal of surf from str2, proceeding backward 
from the end of str2), and ei, 0 < i < 3 are arbitrary 
constants, the stun of which equals 1, which control 
the confidence we have on a suffix with respect to 
the edit distance between G1 and G2. 
For our first experiments, we set the four constants 
co, cl , c,., ca according to the constraint: 
1 1 1 
ca = = = gc0 
which accounts for tile fact that we give more weight. 
to direct concatenation, then to concatenation with 
only 1 differing character, etc. 
To estimate the probabilities p(S), we first built 
a set of suffixation operations from relational fami- 
lies; for each word pair (Wl, w~) found in a relational 
family, we consider all the suffixation operations S 
such that: 
( s ) 
S = MS1 ---+ MS2 
with s being a sequence of letters ending G2 such 
that: 
p(G, --+ G21S) > 0 
This process yields, besides the set of suffixation 
operations, a set of word pairs (wl, w:~) possibly 
linked through a suffixation process. We will denote 
aDue to truncation and subtraction, there may be cases 
where the derived form is shorter or the same length as the 
original form. However, these eases are not frequent, and 
should be recovered by the procedures which follow. 
28 
this last set by WP. Some of the pairs in W'P are 
valid, in the sense that the second element of the pair 
directly derives from the first element, whereas other 
pairs relate words which may or may not belong to 
the same family, through a set of deriva.tional pro- 
cesses. However, since relational families represent 
a good approximation to actual derivational fami- 
lies, regular suffixation processes should emerge from 
WP. 
We then used the EM algorithm, (Dempster et 
al., 1977), to estimate the probabilities p(S). Via 
the introduction of Lagrange multipliers, we obtain 
the following reestimation formula: 
po,(S) = 
p~(S)p(G1 ~ G~IS)p(MS1 --+ MS, IS) A-1 
A-, ~s, po(S')p(G~ --+ G2\[S')p(MS~ --+ MS2\[S') ~,vp 
where A is a normalizing factor to assure that prob- 
abilities sum up to 1. 
This method applied to French yields the follow- 
ing results (we display only the first 10 suffixes, i.e. 
the string s associated with the suffixation opera- 
tion S, together with the POS of the original and 
derived words. The first number corresponds to the 
probability estimated ): 
0.071671 Noun ---+ er ~ Verb 
0.019032 Adj ~ er ~ Verb 
0.018231 Verb ~ ion ~ Noun 
0.017365 Noun ~ ion ---+ Noun 
0.017123 Noun ~ ur --+ Noun 
0.012864 Noun ~ eur ---+ Noun 
0.011034 Noun ~ on ~ Noun 
0.010780 Noun ~ te ~ Noun 
0.009955 Adj --+ ation --+ Noun 
0.009881 Noun ---+ nt ---+ Adj 
As can be seen on these results, certain elements, 
such as ur are extracted even though the appro- 
priate suffix is eur, our procedure privileging the 
element with direct concatenation (this concatena- 
tion happens after a word ending with an e). Note, 
however, that the true suffix is close enough to be 
retrieved. 
4.2 Extraction of suffixal paradigms 
The suffixes we extracted are derived from relational 
families. In these families, some words are related 
even though they do not derive from each other. The 
set of related words in a relational family defines a 
graph on this family, whereas the natural represen- 
tation of a derivational family is a tree. We want to 
present here a method to discover such a tree. 
A widely used tree construction method from 
a graph is the Minimal(Maximum) Spanning Tree 
method. We have adapted this method in the fol- 
lowing way: 
1. Step 1: for each word pair (w~, w~) in the family, 
compute a = p(tv~ --+ w~), 
2. Step 2: sort the pairs in decreasing order ac- 
cording to their a value, 
3. Step 3: select the first pair (wl, u,~), add a link 
with wl as father and w2 as daughter, 
4. Step 4: for each possible suffixation operation 
S such that p(wl ~ w21S) > 0, add to the 
node wl the potential allomorph obtained by 
removing s from G2, proceeding backward from 
the end of G2, 
5. Step 5: select the following pair, compute the 
set of allomorphs A, and add a link between the 
elements, if: 
(a) it does not create a loop, 
(b) if the first element of the pair, w~, is al- 
ready present in the tree, then the set, of 
allomorphs of w~ in the tree is either empty 
or has common elements with A. In the lat- 
ter case, replace the set of allomorphs of w~ 
in the tree by its intersection with A, 
6. Step 6: go back to Step 5 till all the pairs have 
been examined 
This algorithm calls for the following remarks: 
we use allomorph in a broad sense, for lexelnes: 
an allomorph of a word is simply a form associ- 
ated to this word and which can be used as the 
support to derivation in place of the word itself, 
if two sets of allomorphs are not empty and do 
not have elements in common, then we face a 
conflict between which elements serve as a sup- 
port for the different derivation processes. If 
they have common elements then the common 
elements can be used in the associated deriva- 
tion processes. If one set is empty, then the 
word itself is used for one derivation process, 
and the allomorphs in the other. 
Let us illustrate this algorithm on a simple exemple. 
Let us assume we have, in the same relational fam- 
ily, the three French words produire (En. produce), 
production (En. production), producteur (En. pro- 
ducer). Step 2 yields the two ordered pairs (produire, 
production); (produire, producteur). Steps 3 and 4 
for the first pair provide the suffixes (on, ion, tion, 
ction) and the associated allomorphs for produire: 
(produ, produc, product, p~vducti). When examin- 
ing the pair (produire, producteur), we obtain the 
suffixes (ur, cur, teur, cteur) with the allomorphs 
for produire: (produ, produc, product). The two sets 
of allomorphs have common elements. The final set 
of allomorphs for produire will obtained by intersect- 
ing the two previous sets, leading to: (produ, produc, 
29 
product). Note that the elimination of the form pro- 
ducti will lead to the rejection of the suffix on in 
subsequent treatment (namely the learning of suf- 
fixes from the trees, step 5 of the algorithm given at 
the beginning of section 2). 
Once the trees have been constructed for all rela- 
tional families (note that with our procedure, more 
than one tree may be used to cover the whole fam- 
ily), it is possible to reestimate the probabilities 
p(S). This time, the word pairs are directly ex- 
tracted from the trees, and, due to the sets of al- 
lomorphs, the probabilities p(G1 ---+ G21S) are not 
necessary anymore, since we will only rely on direct 
concatenation. Lastly, as described in the general al- 
gorithm, the new suffixation operations can be used 
again to build new trees, and so on and so forth, 
until an end condition is reached. A possible end 
condition can be the stabilization of the set of suf- 
fixation operations. Since our procedure gradually 
refines this set (at one iteration, the set ofsuffixation 
operations is a subset of the one used in the previous 
iteration), the algorithm will stop. 
Another extension we can think of is the extrac- 
tion, from the final set of trees, of morphographemic 
rules. Methods borrowed to Inductive Logic Pro- 
gramming seem good candidates for such an extrac- 
tion, since these rules can be formulated as logical 
clauses, and since we can start from specific exam- 
ples to the least general rule covering them (sev- 
eral researchers have addressed this problem, such 
as (Dzeroski and Erjavec, 1997)). 
5 Conclusion 
We have presented an unsupervised method to ac- 
quire derivational rules from an inflectional lexicon. 
In our opinion, the interesting points of our method 
lie in its ability to automatically acquire suffixes, as 
well as to induce a linguistically motivated structure 
in a lexicon. This structure, together with the ele- 
ments extracted, can easily be revised and corrected 
by a lexicographer. 
Acknowledgements 
I thank two anonymous reviewers for useful com- 
ments on a first version of this paper. 

References 

G. Adamson and J. Boreham. 1974. The use of 
an association measure based on character struc- 
ture to identify semantically related pairs of words 
and ocuments titles. Information Storage and Re- 
trieval, 10. 

S. R. Anderson. 1992. A-morphous morphology. 
Cambridge University Press. 

L. Bauer. 1983. English word-formation. Cam- 
bridge University Press. 

V. Cherkassky and F. Muller. 1998. Learning ffrom 
data. John Wiley and Sons. 

D. Corbin. 1987. Morphologie d(rivationnelle et 
structuration du lexique. Presses Universitaires de 
Lille. 

W. Daelemans, J. Zavrel, K. Van der Sloot, and 
A. Van den Bosch. 1999. Timbh Tilbury memory 
based learner, version 2.0, reference guide. Tech- 
nical report, ILK, Tilburg. 

J. Dawson. 1974. Suffix removal and word confla- 
tion. ALLC Bulletin. 

F. Debili. 1982. Analyse syntaxico-sdmantique 
fondde sur une acquisition automatique de rela- 
tions lexicales-sdmantiques. Ph.D. thesis, Univ. 
Paris 11. 

A. P. Dempster, N. M. Laird, and D. B. Dubin. 
1977. Maximum likelihood from incomplete data 
via the em algorithm. Royal Statistical Society, 39. 

S. Dzeroski and T. Erjavec. 1997. Induction of 
slovene nominal paradigms. In Proceedings of 7th 
International Workshop on Inductive Logic PTv- 
gramming. 

B. Fradin. 1994. L'approche £ deux niveaux en mor- 
phologie computationnelle et les d6veloppements 
r~cents de la morphologie. Traitement aatoma- 
tique des langues, 35(2). 

M. Hafer and S. Weiss. 1974. Word segmentation 
by letter successor varieties. Information Storage 
and Retrieval, 10. 

C. Jacquemin and E. Tzoukerman. 1997. Guessing 
morphology from terms and corpora. In Proceed- 
ings of A CM SIGIR. 

J.B. Lovins. 1968. Development of a stemming al- 
gorithm. Mechanical Translation and Computa- 
tional Linguistics, 11. 

C.D. Manning. 1998. The segmentation problem 
in morphology learning. In Proceedings of New 
Methods in Language Processing and Computa- 
tional Natural Language Learning. 

C. Paice. 1996. Method for evaluation of stemming 
algorithms based on error counting. Journal of the 
American Society for Information Science, 47(8). 

M. F. Porter. 1980. An algorithm for suffix strip- 
ping. Program, 14(3). 

A. Stolcke and S. Omohundro. 1994. Best-first 
model merging for hidden markov model induc- 
tion. Technical report, ICSI, Berkeley. 
