Statistical Morphological Disambiguation for 
Agglutinative Languages 
Dilek Z. Hakkani-Tfir, Kemal Oflazer, GSkhan Tiir 
\])el)aItlncn(; of COmlmtcr Engin(',ering, 
Bilkent University, 
Ankara, 06533, TUll.KEY 
{hakkani, ko, tur}~cs, bilkent, edu. tr 
Abstract 
In this 1)aper, we present sta.tistical models for 
morphological disambiguation in Tm'kish. Turkish 
presents an interesting problem for statistical ,nodcls 
since the potential tag set size is very large because 
of the productive, derivational morl/hology. \Ve pro- 
pose to handle this by breaking Ul) 1;11(; morhosyn- 
tactic tags into inflectional groups, each of which 
contains the inflectional features ti)r each (internm- 
diate) derived tbrm. Our statistical models score the 
probability of each morhosyntactic tag by consider- 
ing statistics over the individual inflection groups 
in a trigram model. Among the three models that 
we have deveh)l)ed and tested, (;11(; simplest model 
ignoring the lo(:al mort)hota(:ties within words l)er- 
tbrms the best. Ollr })('.st; trigram model 1)erfornls 
with 93.95% accuracy on otir test data getting all 1;11o 
lllorhosyllta(;ti(; aild semantic fc.atul'es correct. If we 
are just interested in syntactically relevant features 
alld igilore a very sinall set of semantic features, then 
(;tie accuracy increases to 95.07%. 
1 Introduction 
Re(:ent advances in (:onltmter har(lware and avail- 
al)ility of very large corpora have made (;t1(`- al)pli- 
cation of s(;atistical techniques to natural  
processing a t)asible and a very at)pealing resem'ch 
area. Many useflll results have 1)cell obtained by 
applyilig these techniques to English (and similar 
s) in parsing, word sense dismnbiguation, 
part-of speech (POS) tagging, speech recognition, 
et;c. However, s like Turkish, Czech, Hun- 
garian and Finnish, displ W a substantially different 
behavior than English. Unlike English, these lan- 
guages have agglutinative or inflective morphology 
and relatively free constituent order. Such s 
have. received little previous attention in statistical 
processing. 
In this lmper, we t)resent our work on modeling 
Turkish using statistical methods, and present re- 
suits on morphological disainbiguation. The meth- 
ods developed here are certainly al)plicable to other 
agglutinative s, especially those involving 
productive derivational phenomena. The Iml)er is 
organized as follows: After a brief overview of re- 
lated previous work, we smnma.rize relevant aspects 
of Turkish and present details of various statistical 
models for nlorlfliological disanfl/iguation for Turk- 
ish. We then present results and analyses fronl our 
experiments. 
2 Related Work 
There has been a large numl)er of studies in tag- 
ging and mori)hological disambiguation using vari- 
ous techniques. POS taggiug systems have used ei- 
ther a statistical or a rule-based approach, hi the 
statistical api)roach, a large corpus \]ms been used to 
train a t)rotmbilistic model wlfieh then has been used 
to tag new text, assigning the most likely tag for a 
given word in a given context (e.g., Church (1.988), 
Cutting el; al. (1992)). In the rule-based approach, 
a large mmfl>e.r of hand-craft;ed linguisiic constraints 
are used to elinfinate impossible tags or morpho- 
logical t)arse.s tbr a given word in a given context 
(Ka.rlsson et al., 1995). Brill (1995a) has presented 
a transfl)rnmtioi>based lea.rning at)l)roach, whi(:h in- 
duces disanlbiguation rules from tagged corpora. 
Morphologi(:al disanlbiguation in inflecting or ag- 
glutinative s with COlnl)lex morphology in- 
volves more than determining the major or minor 
Imrts-of-sl)cech of the lexiea.l items. Typically, roof 
phology marks a mlmber of inflectional or deriva- 
tioiml features and this involves ambiguity. For in- 
stance, a given word nlay be chopl)ed up in difl'erent 
ways into mort)heroes , a given mort)heine may inark 
different features depending on the morphotactics, 
or lexicalized variants of derived words may interact 
with productively derived versions (see Ottazer and 
Tiir (1997) for the difl'erent kinds of morphological 
ambiguities in Turkish.) We assume that all syn- 
tactically relevant fcat'urcs of word forms have to be 
determined correctly for morphological disambigua.- 
tion. 
In this context, there have l)een some interesting 
previous studies for difl'erent s. Levinger 
ct al. (1995) have reported on an approach that 
learns morpholexical probabilities fi'om an mltagged 
eorlms mid have. used the resulting infornlation in 
285 
morphological disambiguation in Hebrew. Haji~: 
and Hla(lk~i (1998) have used ntaximunl entropy 
modeling approach for morphological dismnbigua- 
tion in Czech. Ezeiza et al. (1998) have combined 
stochastic and rule-based disambiguation methods 
for Basque. Megyesi (19991 has adapted Brill's POS 
tagger with extended lexical templates to Itungar- 
tan. 
Previous ai)proaches to morphological dismnbi- 
guation of Turkish text; had employed a constraint- 
based approach (Otlazer and KuruSz, 1994; Oflazer 
and Tiir, 1996; Oflazer and Tiir, 1997). Although 
results obtained earlier in these at)preaches were rea- 
sonable, the fact that tim constraint rules were hand 
crafted posed a rather serious impediment to the 
generality and improvement of these systems. 
3 Turkish 
Turkish is a flee constituent order . Tlm 
order of the constituents may clmnge freely accord- 
ing to tim discourse context and the syntactic role of 
the constituents is indicated by their case marking. 
Turkish has agglutinative morphology with produc- 
tive inflectional and derixmtional suflixations. The 
number of word forms one can derive from a Turkish 
root; form mW be in the millions (ttankmner, 19891. 
Hence, the number of distinct word forms, i.e., the 
vocabulary size, can be very large. For instance, Ta- 
ble 1 shows the size of the vocabulary for I and 10 
million word corl)ora of Turkish, collected from on- 
line newspaI)ers. This large vocabulary is the reason 
Corpus size Vocabulary size 
1M words 106,547 
10M words 41.7,775 
Table 1: Vocabulary sizes for two Turkish corpora. 
for a serious data sparseness problem and also sig- 
nificantly increases the number of parameters to be 
estimated even for a bigram  model. The 
size of the vocabulary also causes the perplexity to 
be large (although this is not an issue in morpho- 
logical disambiguation). Table 2 lists tlm training 
and test set perplexities of trigram  models 
trained on 1 and 10 million word corpora for ri51rkish. 
For each corpus, tile first cohmm is the perplexity 
for the data the  model is trained on, and 
the second column is the pert)lexity for previously 
unseen test data of 1 million words. Another ma- 
jor reason for the high perplexity of Turkish is the 
high percentage of out-of vocabulary words (words 
in the test; data which did not occur in the training 
data); this results from the productivity of the word 
tbrmation process. 
Training Training Set Test Set (1M words) 
Data Perplexity Perplexity 
1M words 66.13 t449.81 
10M words 94.08 1084.13 
Table 2: The pert)lexity of Turkish corpora using 
word-based trigram  models. 
The issue of large vocabulary brought in by pro- 
ductive inflectional and derivational processes also 
makes tagset design an important issue. 111 lan- 
guages like English, the nunlber of POS tags that can 
be assigned to the words in a text; is rather linfited 
(less than 100, though some researchers have used 
large tag sets to refine g,:anularity, but they are still 
small compared to Turkish.) But, such a finite tagset 
al)proach for s like Turkish may lead to an 
inevitable loss of information. The reason for this 
is that the lnorphological features of intermediate 
derivations can contain markers for syntactic rela- 
tionshil)s. Thus, leaving out this information witlfin 
a fixed-tagset scheme may prevent crucial syntac- 
tic information fl'om being represented (Oilazer et 
al., 1999). For examl)le , it; is not clear what POS 
tag sllould be assigned to the word sa.~lamlaqtwmak 
(below), without losing any information, the cate- 
gory of the root; (Adjective), tile final category of 
the word as a whole (Noun) or one of the interme- 
diate categories (Verb). 1 
s a\[flam + laq + t,r + ma~: 
saglam+kdj ^DB+yerb+Become^DB 
+gerb+Caus+Pos ^DB+Noun+Inf +A3sg+Pnon+lqom 
to ca'ass (,s'ometh, i.ng) to become stron 9 / 
to strength, or,/fortify (somcth, ing) 
Ignoring the fact that the root; word is an adjec- 
tive may sever any relationslfips with a.n adverbial 
modifier modi~ying the root. Thus instead of a sim- 
I)le POS tag, wc use the full rno~Tflt, oIogical a'nahtscs 
of the words, rcprcscntcd as a combination of \]'ca- 
tures (including any dcrivational markers) as their 
morphosyntactic tags. For instance in the exami)le 
above, we would use everything including the root; 
form as the morphosyntactic tag. 
In order to alleviate the data sparseness probleln 
we break down the flfll tags. We represent each word 
as a sequence of inflectional groups (IGs hereafter), 
separated by "DBs denoting derivation boundaries, 
as described by Oflazer (1999). Thus a morphologi- 
cal parse would be represented in the following gen- 
eral tbrm: 
tThe morphological features other than the POSs are: 
+Become: become verb, +Cans: causative verb, +Pos: Positive 
polarity, +Inf: marker that derives an infinitive form fl'om a 
verb, +Aasg: 3sg number-person agreement, +Pnon: No pos- 
sessive agreement, m~d +Nora: Nominative case. "DB's mark 
derivational boundaries. 
286 
Full ~\[hgs (No roots) 
hfltectional GrouI)s 
Possil)le 
OO 
9,129 
O1)served 
10,531 
2,194 
%fl)le 3: Numbers of q2tgs and iGs 
root+IGi ~DB+IG2 ~DB+- - -^DB+IG. 
where IGi denotes relevant inflectional feal;urcs of 
the inflectional groul)s, including the 1)art-ofsl)eech 
for the root or any of the derived forms. 
For exalnlfle , the infinitive, tbtm s(u.~lamla.#'trmak 
given above would be ret)resented with the adjective 
reading of the root sa.rflam mM the tbllowing 4 IGs: 
1. Adj 
2. Verb+Become 
3. Verb+Caus+Pos 
d. Noun+Inf+A3sg+Pnon+Nom 
Table 3 1)rovides a (',oml)arison of the mnnl)er dis- 
l, in(:t full morl)hosyntactic tags (ignoring the root 
words in this case) mid IGs, generativ(dy 1)ossil)le 
a.nd observed in a (:ortms of 1M words (considering 
a\]\[ ambiguities). One can see thai; the' nmnber ob- 
served till tags ignoring the root words is very high, 
significantly higher than quoted tbr Czech by Ita.ji5 
and Itladk5 (1998). 
4 Statistical Morphological 
Disambiguation 
Morphoh)gica.1 disambiguation is the prol)lcun of 
tinding the. corresponding s(;qucnce, of morl/hological 
parses (including l;he root), 7' = t~ ~ = ll,12,...,l,,, 
given a sequence of words 1¥ = 'w~' = 1u 1 , 'W2, ...~ 'lU n. 
Our at)proach ix to model the (listrilmtion of lilOr- 
phological I)arscs give, n the words, using a hidden 
Markov model, and then to seek the variable 7', I.hat 
maxilnizes .I'(TII'V): 
:/' T P(W) ) 
= ~.~,,laxP(T) × P(WlT)(2) 
T 
The term P(W) ix a constant for all choices of T, and 
can thus be ignored when choosing the most prob- 
able 7'. \Ve C~lll further simplify the t)roblem using 
the aSSUlnlil;ion that words arc indc'i)endent of each 
other given their tags. In Tm'ldsh we can use the 
additional simplification that \]'(wilti) = l since l,i 
illcludes tim root fbrm and all morphosyntactic t~a- 
tures to uniquely determine the word f'orm. 2 Since 
2'l'hat is, we assume that there is no morphological gen- 
eration ambiguity. This is ahnost always true. There are 
a tb.w word fin'ms like flelirkcne and horde, which have the 
m o,l,-~/so z'(',,,;ItT) : P(*,,~I*~) = 1, w(; ~u~ ,vri,,: 
7 b 
P(WIT) = I\]P(w;It~') = 1 
i-- \] 
a, lld 
~/rgl~.,x P(SqW) = arg£11Hix \])(T) (3) 
7' !1' 
N o w 
P(~') = P(t,,lt~'-' ) x P(t,-ilt~ '-2) x... 
xP(t~ltl) × l)(tl) 
Simplifying fin%her with the trigram tag model, we 
get: 
P(T) = P(t,dt,,_.,.,Z,_,) × 
l'(t,,_l It,,-:~, t,,-.,) x ... 
v(l,:~l~,,~._,) × e(t~lt.,) × z'(~,) 
= flP(l, ilti_e,ti_l) (4) 
i=1 
win,to w,, d~,ti,,; P(z., It-,, z.0) = I"(z:1), p(z., I*,,, 1,1 ) = 
P(tuItl) to simplif3, the notation. 
If we consider morl)hoh)gi(:al analyses as a se- 
(t11011(:(2 of root ;111(l \]Gs, each parse t,i can \])e rcp- 
¢ res(;nted as (Ii, IGi, ,..., IGi,,,), where ni is the 
nuinber ()t" IG's in the, in, word.:~ This rel)resental.ioil 
changes the l)ro})lem as shown in Figure 1 wher(', the, 
chain rule has been used to factor out the individual 
comt)oncnts. 
This f(irtlttll~ltioll still suffers from 1:,t1(! data spat'so- 
ll(}SS 1)roblclll. To allo, viatc l;\]lis~ wc ill~ll,\[c thQ folkiw- 
ing siml)lifying assumlitions: 
1. A root wor(1 del)ends only on l;he roots of the 
1)revious words, alld ix indet)cndent of the inflec- 
tional and derivationa\] productions on thein: 
l'(nl(n-~, IG~_.,4,..., 1G~_.,,,, .,), 
(ri-a, I6'i_~,~,...,/G~_~,,,_,)) = 
P("~l"~-~,n-,) (~') 
The intention here is that this will be useflll 
in tile disambigua£ion of the root word when a 
given form has mori)hological parses with dif- 
fiwent root words. So, tbr instance, for disam- 
biguating the surface, form adam with the fol- 
lowing two parses: 
same morphological parses with the word forms gclirkcn and 
heretic, respectively but are i)ronounced (and writte.n) sllghlly 
differently. These. m'e rarely seen in written te.xts, and can 
thus l)e. ignored. 
aln our training and W.st data, the nmnbcr of 1Gs in a word 
form is on the average 1.6, the.refore, ni is usually 1 or 2. We. 
have seen, occasionally, word tbrms with 5 or 6 inflectional 
groups. 
287 
I)(ti\[t1-1) 
z 
I)(tiIti-2,ti-1) 
P( (ri, IGi,l . . . IGi,n~ )\[(ri-2, IGi-2,1. . . IGi-'2,ni_2 ), (ri-1, IGi-l,1. . . IGi-i ,,i-, )) 
P(ril(ri-2, \[Gi-2,1... IGi-~,,,_2), (ri-1, IGi-La... IGi-1,,zi_, )) x 
P(IGij \[(ri-2, \[Gi-'e,1...IGi-'e,ni_2), (I'i-1, IOi-l,1 ...\[Gi_l,ni_, ), I'i) x 
... X 
P( IGi,m \[(ri- u, IG i- 2,1 ...\[Gi-2,,~,_=), (ri-l , IG~-I,1 ...\[Gi-l,,,_~ ), ri, IGi,1, .,., i Gi,m-l) 
Figure 1: Equation tbr morphological disambiguation 
(a) adam+Noun+A3sg+Pnon+Nom (man) 
(b) ada+Noun+A3sg+Plsg+Nom (my island) 
in the iloun phrase k'trm~z~ kazakh adam (the 
man with a red sweater), only the roots (along 
with the part-of speech of the root) of the pre- 
vious words will be used to select the right root. 
Note that tile selection of the root has some im- 
pact on what the next IG in the word is, but we 
assuine that IGs are determined by the syntac- 
tic context and not by the root. 
2. An interesting observation that we can make 
about q_hrkish is that when a word is consid- 
ere(l as a sequence of IGs, syntactic relations 
are between the last IG of a (dependent) word 
and with some (including the last) IG of the 
(head) word on the right (with nfinor eXCel)- 
tions) (Oflazer, 1999). 
Based on these assumptions and the equation in Fig- 
ure 1, we define three models, all of which are based 
on word level trigrams: 
1. Model 1: The presence of IGs in a word only 
depends on the final IGs of the previous words. 
This model ignores any morphotactical relation 
between an IG and any previous IG in the same 
word. 
2. Model 2: The presence of IGs in a word only 
depends on the final IGs of the previous words 
and the previous IG in tile same word. In 
this model, we consider morphotactical rela- 
tions and assume that an IG (except the first 
one) in a word form has some dependency on 
tile previous IG. Given that on the average a 
word has about 1.6 IGs, IG bigrams should be 
sufficient. 
3. Model 3: This is the same as Model 2, except 
that the dependence with the previous IG in a 
word is assumed to be indelmndent of the de- 
pendence on the final IGs of the previous words. 
This allows the formulation to separate the con- 
tributions of the morphotactics and syntax. 
The equations for these models are shown in Figure 
2. We also have built a baseline model based on 
when tags are decomposed into inflectional groups. 
tile standard definition of the tagging problem in 
Equation 2. For the baseline, we have assumed that 
the part of the morphological analysis after the root 
word is the tag in the conventional sense (and the 
assumption that P(wi\]ti) = 1 no longer holds). 
5 Experiments and Results 
To evaluate our models, we frst trained our models 
and then tried to morphologically disambiguate our 
test data. For statistical modeling we used SRILM 
- the SRI  modeling toolkit (Stolcke, 1999). 
Both the test data and training data were 
collected from the web resources of a Turkish 
daily newspN)er. The tokens were analyzed using 
the morphological analyzer, developed by Oflazer 
(1994). The mnbiguity of the training data was 
then reduced fl'om 1.75 to 1.55 using a preprocessor, 
that disambiguates lexicalized and non-lexicalized 
collocations and removes certain obviously impossi- 
ble parses, and trigs to analyze unknown words with 
all unkllown word processor. The training data con- 
sists of the unambiguous sequences (US) consisting 
of about 650K tokens in a corpus of i million tokens, 
and two sets of manually dismnbiguated corpora of 
12,000 and 20,000 tokens. Tile idea of using unam- 
biguous sequences is similar to Brill's work on un- 
supervised learning of disambiguation rules for POS 
tagging (199517). 
The test data consists of 2763 tokens, 935 (~34°/0) 
of which have more than one morphological analysis 
after preprocessing. The ambiguity of the test data 
was reduced from 1.74 to 1.53 after prct)rocessing. 
As our evaluation metric, we used accuracy de- 
fined as follows: 
# of correct parses x 100 
accuracy = # o.f tokens 
The accuracy results are given in Table 4. For all 
cases, our models pertbrmed better than baseline tag 
model. As expected, the tag model suffered consid- 
erably from data sparseness. Using all of our train- 
ing data., we achieved an accuracy of 93.95%, wlfich 
is 2.57% points better titan tile tag model trained us- 
ing the same amount of data. Models 2 and 3 gave 
288 
In all three models we assume that roots and IGs are indel)cn(tenl. 
Model 1: This model assumes that un IG in ~ word depends on the last IGs of the two previous words. 
P(IGi,t:\[(ri-~, 1Gi-2,~ ... Gi-'~,,~_.2), (ri-1, IGi-l,~,..., IGi-~,,,~_~ ), ri, IGi,~,..., IGi,t,-~) 
) I" l 1 ( G~,~.IIG~-~,,,,_~,-/Gi-l,ni_l ) 
Ther(;fore, 
l)(ti\[ti-~,ti-1) 
II i 
k=.l 
(0) 
Model 2: The model a ssmn(~s that in addition to th(~ dcl)(mdonci(;s in Model 1, an IG also (lot)ends on tim 
previous IG in the s~mm word. 
P(1Gi,~.l(ri_.~, IGi-~,~ ...IGi-~,~,i .,), (ri-~ 
Th(,r(',for(;, 
IGi_~,,,...,IGi 1,,~__,),ri,lGi,1,...,lGi,~.--1) = 
× \[I ~'(~c~,~.l~(;~--~,,,,_~, ~(;~-~,,,,_,, IG~,k._, ) (7) 
k=l 
Model 3: This is same as Model 2, except the mort)hotactic and syntactic dt'4)t;ntlenci(;s arc considered to 
bc independent. 
\])(\]Gi,kl(l"i-2,lGi-2,1....\[Gi-2,,,i .,),(ri-l,.ld/i-.i,1,...,\]Gi-l,,,~_ 1),ri,lGi,l,...,l-Gi,k-l) = 
) ~ 
\])(IGi l.\[IGi..2,,,~ .,, t'(~i_l,,,, .l (~ri,t,.--l) 
r.l?lwr(;forc, 
P(ti\[ti-2,l.i-j) = l)(~'i\]ri_.~,ri_l) x \] (1(,,,11. ,. IG,_u), .... ,,, IOi-i,,,,_,) x 
Ic--I 
In order to simpli\[y the uotation, wc lmve dctlncd the follc, v:ing: 
) ,t • I (IGi,t. llGi,k_l 
= l'(1Gi,~.llGi ._,,,. ,.IGi j,,,,_,) x P(IGi,~,.) 
I'(IGi,t. IIGi,,~-I) ) 
F(','~l','-,,','o) = IX',',) j~(,.:+.(,,,.,) 
= r,(,.+.,) l(m,,~l±(,_, .... ,,mo,,,o) = J (.rc~,~,,) , , , = 17 , ,. I (IG2,t\[IGo ...... ICt,,,, ) (IG2,11IG,,,,,) 
1 (IGi,l\[IGi-2,,~i_~, IGi-1,,~i_l, IGi,o) 
P(IGI,I~I IG-I .... 1, IGo,~o, IGx,k-1) 
P( IG2,t\[IGo,~o, IG1,~1, IG'~,t-l ) 
= P(IGi,lllGi_.2,,,i_,,IGi_l,,,i_~) 
= P(IGI,~IlGI,,,_I) 
= I (IG2,t\[IG1,,~I,IG2,t_I) 
P(IG2,~\[IGI,,.,IG2,o) = P(IG2,,IIG~ .... ) 
P(fGi,ll\[ai,o) = I;' ( ~r (.T~i, 1 ) 
for k = 1,2,...,'hi, 1 = 1,2, ...,n~, and i = 1,2, ..., 'n. 
Figure 2: Equutions for Modcls \], 2, and 3. 
289 
Training Data ~Lag Model Model 1 Model 1 Model 2 Model 3 
(Baseline) (Bigram) 
Unambiguous sequences (US) 86.75% 88.21% 89.06% 87.01% 87.19% 
US + 12,000 words 91.34% 93.52% 93.34% 92.d3% 92.72% 
US + 32,000 words 91.34% 93.95% 93.56% 92.87% 92.94% 
Table 4: Accuracy results for difli;rent models. 
similar results, Model 2 suffered from data sparse- 
hess slightly more than Model 3, as expected. 
Surprisingly, the bigram version of Model I (i.e., 
Equation (7), but with bigrams in root and IG mod- 
els), also performs quite well. If we consider just the 
syntactically relevant morl)hological features and ig- 
nore any senlantic features that we mark in ulorphol- 
ogy, the accuracy increases a bit flirt, her. These stem 
ti'om two properties of %lrkish: Most Turkish root 
words also have a proper noun reading, when writ- 
ten with the first letter cai)italized. 4 We (:ount it; as 
an error if the tagger does not get the correct 1)roper 
noun marking, for a proper noun. But this is usua\]ly 
impossil)le especially at the begimfing of sentences 
where the tagger can not exploit caI)italization and 
has to back-off to a lower-order model. In ahnost all 
of such cases, all syntactically relevant morl)hosyn- 
tactic features except the proper noun marking are 
actually correct. Another imi)ortant ease is the pro- 
noun o, which has t)oth personal prollottll (s/he) and 
demonstrative 1)ronoun readings (it) (in addition to 
a syntactically distinct determiner reading (that)). 
Resolution of this is always by semantic cousi(ler- 
atious. When we count as (:orreet m~y errors in- 
volving such selnantic marker cases, we get an ac- 
curacy of 95.07% with the best (',as(; (cf. 93.91% 
of the Model 1). This is slightly 1)etter than the 
precision figures that is reported earlier on morpho- 
logical disambiguation of Turkish using constraint- 
based techniques (Oflazer and T/Jr, 1997). Our re- 
suits are slightly better than the results on Czech 
of Haji~ and Hla(lkg (1998). Megyesi (1999) reports 
a 95.53% accuracy on Hungarian (a  whose 
features relevant to this task are very close to those 
of Turkish), with just the POS tags 1)eing correct. In 
our model this corresponds to the root and the POS 
tag of the last IG 1)eing correct and the accuracy 
of our best model with this assumi)tion is 96.07%. 
When POS tags and subtags are considered, the re- 
ported accuracy for Hungarian is 91.94% while the 
corresl)onding accuracy in our case is 95.07%. We. 
can also note that the results presented by Ezeiza 
et al. (1998) for Basque are better titan ours. The 
main reason tbr this is that they eml)loy a much 
more sot)histicated (comt)ared to our t)reprocessor) 
din fact, any word form is a i)otential first name or a last 
naII10. 
constraint-grammar based system which imI)roves 
t)recision without reducing recall. Statistical tech- 
niques applied at'~er this disaml)iguation yield a bet- 
ter accuracy compared to starting from a more am- 
1)iguous initial state. 
Since our models assmned that we have indepen- 
dent models for disambiguating the root words, and 
the IOs, we ran experiments to see the contribution 
of the individual models. Table 5 summm'izes the ac- 
curacy results of the individual models for the best 
case (Model 1 in Table 4.) 
Model Accuracy 
IG Model 92.08% 
Root Model 80.36% 
Combined Model 93.95% 
Table 5: The contril)ution of the individual models 
ibr the 1)est case. 
There are quite a number of classes of words which 
are always ambiguous and the t)reprocessing that 
we have employed in creating the unambiguous se- 
quences ca.n never resolve these cases. Tlms sta- 
tistical models trained using only the unambiguous 
sequences as the training data do not handle these 
ambiguous cases at all. This is why the accuracy 
results with only unalnbiguous sequences are sig- 
nificantly lower (row 1 in Table 4). The manually 
dismnl)iguated training sets have such mnbiguities 
resolved, so those models perform much better. 
An analysis of the errors indicates the following: 
In 15% of the errors, the last IG of the word is in- 
correct t)ut the root and the rest of the IOs, if any, 
are correct. In 3% of the errors, the last IG of the 
word is correct but the either the root or SOlne of 
the previous IGs are incorrect. In 82% of the errors, 
neither the last IG nor any of the previous IOs are 
correct. Along a different dimension, in about 51% 
of the errors, the root and its part-of-speech are not 
determined correctly, while in 84% of the errors, the 
root and the tirst IG combination is not correctly 
determined. 
290 
6 Conclusions 
W(; have 1)resented an ai)l)roach t() slatisti(:al mod- 
eling fl)r agglutinativ(: lmlguages, esi)(;(:ially those 
having l)roducl;ive d(;rivational 1)\]:(ulomena. ()ur ai)- 
l)roa('h essentia.lly involves l)re.al:ing u t) the full m(/r- 
t)hological ana.lysis across (l(~'rivational boundaries 
mid l;reai;ing the (:Onll)On(mt;s ;Is sul)tags, and l;helt 
determining the corre(:l; se(tuenc( ', of tags via sl;al;is- 
tical l;echniques. This, to our knowl(~.(lge, is th('. first 
detailed attempt in statistical modeling of agghttina- 
rive langua.g('~s and (:an cerl;aJnly l)('. al)plied to other 
such lmLguages like ltmlgari:m ;rod Fimfish with 1)re - 
duetive derivational morl)hology. 
7 Acknowledgments 
We tlmnk Andreas Stoleke of SIH STAI{ lml) tbr pro- 
vi(ling us wil;h the \]anguag(; mod('.ling t;oolkit and ti)r 
very helpflfl dis(:ussions on I;his work. IAz Stlriberg 
of SRI STAR Labs, and Bilge Say of Middle East 
Te(:hnical University hfformal;i(:s \]nstitul;e, 1)rovided 
hell)rid insights and (:ommenl;s. 

References 

Eric Brill. 1995a. Transformation-based Error-driven learning and natural  processing: A case study in part-of-speech tagging. Computational 
Linguistics , 21(4):543-566, l)ceeinl)er. 

Eric Brill. 1995b. Unsupervised learning of disambiguation rules for part of sspeech tagging. In PRoceedings of the Third Workshop on Very Large Corpora, Cambridge, MA. 

Kenneth W. Church. 1988. A stochastic parts program and a noun phrase parser for unrestricted text.  In Proceedings of the Second Conference
on Applied Natural Language Processing, Austin, 
Texas. 

Doug Cutting, Julian Kupiec, Jan Pedersen, and 
Penelope Sibun. 1992. A practical part-of-speech 
tagger, In Proceedings of the Third Conference 
on Applied Natural Language Processing, Trento, 
Italy. 

N. Ezeiza, I. Alegria, J. M. Arriola, R. Urizar, and 
I. Aduriz. 1998. Combining stochastic and rule-based methods for disambiguation in agglutinative s. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 379 
384, Montreal, Quebec, Canada, August. 

Jan Hajic and Barbora Hladka. 1998. Tagging 
inflective s: Prediction of morphological categories for a rich, structured tagset. In 
Proceedings of COLING/ACL '98, pages 483-490, 
Montreal, Canada, Augusl;. 

Jorge Hankmner, 1989. Lezieal Ilepresentation and 
Process, chapter Mort)hological Parsing and the 
Lexicon. The MIT Press. 

l,¥ed Karlsson, Atro Voutilainen, Juha \]teikkilii, 
and Arto Anl:tila. 1995. Constraint Gramm, ar-A 
Language Inde.pcndent System for I>arsi'n9 Unre- 
.~trieted Tezt. Mouton de Gruyter. 

Moshe Levinger, Uzzi Ornan, and Alon Itai. 1995. 
Learning morpho-lexical probabilities from an 
untagged corpus with an application to Hebrew. Computational Linguistics, 21(3):383-404, Sei)t(nnb(~u'. 

Beata Megyesi. 1999. Improving Brill's POS tagger for an agglutinative . In Pascale Fung and Joe Zhou, editors, Proceedings of the 
Joint SIGDAT Conference on Empirical Methods 
in Natural Language Processing and Very Large Corpora, pages 275-284, College Park, Maryland, 
USA, June. 

Kemal Oflazer and Ilker Kuruoz. 1994. Tagging and 
morphological disambiguation of Turkish text. In 
Proceedings of the 4th Applied Natural Language 
Processing Conference, pages 144-149. ACL, October.

Kemal Oflazer and Gokhan Tur. 1996. Combining hand-crafted rules and unsupervised learning in constraint-based morphological disambiguation. In Eric Brill and Keneth Church, editors, 
Proceedings of the ACL-SIGDAT Conference on 
Empirical Methods in Natural Language Processing. 

Kemal Oflazer and Gokhan Tur. 1997. Morphological disambiguation by voting constraints. In Proceedings of the 35th Annual Meeting of 
the Association for Computational Linguistics 
(ACL '97/EACL '97), Madrid, Spain, July. 

Kemal Oflazer, Dilek Z. Hakkani-Tur and Gokhan Tur. 1999. Design for a Turkish: Treebank. In Proceedings of Workshop on Linguistically Interpreted Corpora, at EACL '99, Bergen, Norway, June.

Kcmal Otlazer. 1994. Two-level description of Turk- 
ish morphology. Literary and l, inguistie Co'reput- 
ing, 9(2):137 148. 

Kemal Oflazer. 1999. Dependency parsing with an 
extended finite state approach. In Proceedings of 
the 37th Annual Meeting of the Association for 
Computational Linguistics, College Park, Maryland, June. 

Andreas Stolcke. 1999. SRILM--the S1H  
mo(teling toolkit, http ://www. speech, sri. corn/ 
proj ects/srilm/.
