PARSING AGGLUTINATIVE WORD STRUCTURES 
AND ITS 
APPLICATION TO SPELLING CHECKING FOR TURKISH 
by 
AY,~IN SOI,AK and KEMAL OFI,AZER 
l)el>arl, In<'nt, of ('onlputer Engineering and llfformation Sciences 
Bilkent Uniw~rsity 
Ililkenl. Ankara, 06533 Tilrkiye 
ABSTRACT 
Most of tile research on parsing natnral \[allguages has beetl concerned with I",nglish, or wil, h other languages 
nlOrl)hologically similar Io English. Parsing agglntinat.ive word st, ructures ha.s altracted relatively little attcnl;ion 
most probal~ly becanse agghlfinatiw? lallgllages COlll~aill word s/ructtlres of considerable complexity, and parsing 
WOrdS ill Stlch languages I'(?(llliros morphok~gical analysis techniques. Ill this pal)er, we pi'eSell(r the design and 
implementation of a morphological root-driven parser tor Turkish word structures which has been mcorporatoed 
into a spelling checking kerllel for on-line Tiirkish texl, The agghltmative Ilatllre of the language and the resulting 
('Olll\[)l<?x Wol'd \['ornlatiollS, V;ll'iOllS pholleLic llall/lOlly l'tlleS alld sill)tie eKcepLiOllS \[)reselll, cel'taill difficulties llOl 
usually on('ountered in the spelling checking of laagua,ges like English and make this a very challenging probhnH. 
1. Introduction 
Morphological cbussilicat, ion of natttral languages ac- 
cording to their word Stl'tI('ttlrt+s idaces languages like 
Turkish, Finnish, and lhmgarian Io a class called "ag 
ghfl+inalive langua.ges". \[n sllch hmguages, words are 
COlllbillaLiOll of several Iilorphel\]les. There is a root 
and several suffixes are conlbined lo this root in order 
to modil},' or extend its meaning. Whal characterizes 
agglut, inative languages is thai stem fornlation hy at" 
fixation 1o previously derived st.oms is extremely pro- 
ductive. A given stellL ew'n Ihough itself qlnt0 corn 
ph+x, call generally serve as basis for evell lllol'l' ('o111 
l)lex words. Consequ.ntly, agglutinative languages 
contain words of considerable COnll~lexity, and parsing 
such languages necessitates a thorough morphological 
analysis. 
Morphological parsing has allracted relatively \]itl,le 
attention ill con'tputational linguistics. Tile reason is 
that nearly all parsing research has been concerned 
wMl English, or wit.h languages morl)hologicaII ) sim- 
ilar to English. ,qillce in such languages words con- 
tain only a fi~w nalldJer of affixes, or none at all. 
alhnost all of the parsing mod<+ls \[br Ill(!lll consider 
recognizing those affix<+s +Is being trivial and thus 
do nol require a mOt'l+hological nnalysis, hi agghni 
native langaages, words C(/lll,ail111o direct indication 
Of t/lOrl;llel/le bOtlltdarios whMi at,, i. gellela\[ (IOpOll 
dent on tit(? inorpho\]ogieal and pllon(Jh)gical conlex\[+ 
A morphological parser requires a nlorphold/OaOlog- 
lest\[ COlllpollellt which l/lediat, es I)olwl?ell I\[he Sill\[kl('t • 
1\['o1'111 of a \[llorp\]lellll! as ellco/llllel'l?d ill Ihe il/ptll text 
aud the lexical form in which the t\]torl)h<~me is stor<.d 
ill tile lllOl'phellle illVelltory, ie, a i\[WallS of i'e('oglliZ- 
ing variallt forms of \[l/Ol'phelllOS as tll~! SaltlO. alld a 
nlorl)hotactic component which specilies which corn 
hi.rot,ions of Inorl)henws at," Iwrn:itt,'d \[7\] 
\lorphotogical parsing algorithms ma+x he divided 
it/to Ix',() classes as ollir .slrtpl~la 9 ;llt(I rool-df'iv~ It ;nlal+ 
ysis met.hods. FIolh approaches hawr beell Ilse/l frOlll 
very early on in l.he history of morphologicM parsing, 
For instance, I)ackal'd's parser flw ancien| Greek \[15). 
aud Brodda and Karlsson's for Finnish \[3\] used affix 
slripping. Sagval\[, on tile other hand, devised a root- 
driwnl morpllological analyzer for Russian \[17\]. In 
addition, other tool; driwm morphological parsers for 
tile agglutinative langmtges Quechna \[9, 10\], Finnish 
\[l 1\], and Turkish \[6\] were developed independently ill 
the early 1980's+ All of these Ihree pars(~rs proceed 
from left to righl,. Iltlot, s ~tre SOllgh| ill the lexicon that, 
mat.oh imtial suhstl'ings of the word, and t, he gram 
Iltatica\[ category o\[ the root del, ermines what (:lass 
of sutlixes may follow. When a suttix in the permil- 
ted class is found to match a furttler substring of t,he 
word, grammatical mfornlation in 1he lexical entry 
fl)r that sulflx del,ernlines once again what class of 
suffixes may follow. If the end of tile word can be 
reached hy il.eration of this process, and if the last 
sullix analyzed is one which illay elld ;i word. t,\]le 
parse is successful \[7\]. 
Another Icft-t+o-right parsing algol'itllni for autolnttlic 
analysis of Turkish words was proposed and ap 
plied by I(iiksal ill his Ph.l), thesis II2} Ills algo 
rithm called 'qdentified Maxillllllll Mat, ch (IMM) AI 
golithnl", tries to find the Ill;IXinllllll h'ngth subslring, 
which is present, in a reel dict.ionary, h'OI\]l the left of 
tim word. If a soltltriOll is ollLailled, ie., the rool IllOl+ 
\])ht?lllU iS identilledL the retnainhlg I)art of the word is 
considered as th( search (?\[elllellL. This part is looked 
tbr in the suffix ItlOrl)henle forms dictionary aml the 
nlorphemes are idl!ntified one by one. '\['he process 
StOpS whell there is no relllaillillg part. \[\]owevet ill 
SOllle casi.s, ;llt\[iotlgll it nolat+ioll is ohtained furl, her 
consistency analysis proves that this solution is tLot 
the corrccl one. In such cases Ill. previotts pseudo 
solution is reduced by one character alld all t,he search 
procedure is initiated once \[ll()l'C. 
'l'heso approaches to tnorphologicaL parsing of Turk 
Ac+~+s DE COLING-92, NANTES, 23-28 Ao~r 1992 3 9 l'roc. OF COLING-92, NANTES, AUG. 23-28, 1992 
ish words have tim following short.coming: They do 
not consider the fact that in Turkish, words contain 
l, rPlllelldOllS alllOlln\[, of selnantic illfOrlllat, iOll that has 
to })e taken into account. Ill these parsers, it is only 
the granlniatical category of the stein that detrrmine 
*lie suffixes that may follow, l|owever, niost of the 
sultixes in Turkish, especially the derivational oaes, 
call be at.taclled only to a linlited number of reels or 
sleltlS Inostl} duo to Sel/lallliC reasollS. 
Another shortcolnhig of the previous parsers for Turk- 
ish is ihat they allow ille iterativr ilsage of derNa- 
iional su\[fixes. Although, bi6ksal \[12\], prevelltS the 
COIISeC/liiVe |lsagl, of the Sallle ltlOl'i)hellle lwicc, lie 
slill l)arsos the word G(3ZI,(II,2('{iI,('YI,~('.i)L{31,; cor- 
rectly, so do llankalner \[7\]. It is tl'lli" l\]lat. SOltle Turk- 
ish sutlixes can form aa iteraiive loop. but usually 
th,' number of iteratioli is not too high. rl'he above 
word ran I)e parsed correctly Ul; to lhe point G(3Z- 
l,{'l((i:l~!L{'tl,; (the occultation of oculists), but the 
words GOZI,UI,2(,'UI,UI,;(,'{: and (IOZLUI((,:trLI l(- 
('(!l,{iK are meaningh'ss, and tllerefore sonle conlro\] 
llle('}lalliSlllS IlSilig semantic iii\[oriliat, iOll SilOllid be ill- 
eluded wilhin the parser Io avoid parsing StlCli inealt- 
inglrss words as if lhey werr corrl>ci. 
One of t.lie loosl iniportant application areas el' pars- 
ins words in natural lallguages is clleeking their 
spellings. Altllough ltianv spelling checkers for l'\]l> 
glish and soltle other bu/guages \]lave been developed, 
st) far no such t.oo\] was present for 'lTurkish. The 
reason for Ibis is l)rol)ably the conlp\]exity of parsing 
problem for Turkish as explained al)ow~. Wrong or- 
(l('l'illg Of li\]orphellleS alld errors ill re;re\] o1' consollaal 
harntcmies Inay C~lllSP till, Wl'Oltg spelling of Turkish 
words (ionsequently. in order t.o check Ihe spelling 
of a Turkish ~ol'd, it is iit, cessai'y to lllake si<gnilieanl 
phonological mid ntorphological analyses. 
This paper describes a ntorphologieal root-driw'n 
parser developed for Turkish language and its appli 
c;itiOli to spoiling cllecking. A lllajor porlion of lhis 
work depends Oll a drlailrd and careful research on 
stJilW \[{'alllres of Turkish ilial lliakc t\]w parsing prob- 
lent for this languagr rsprcially hard and ini.eresling. 
'lh,. lbllowing svclion pr+,sonts all ovrrview of eel 
thin illorl)bOl)bOlielilic alid llioriihological aslwrts of 
tlw turkish language which are especially r,'le~anl to 
i lie probirl,, ulidrr con~idrl'alion (for delails se,' {70\]) 
2. The Turkish Language 
Turkish is an agglutinative languageihat belongs Io 
a group of \]anguagrs known as A\]taic \]anguages. For 
all agg\]ulinative laligllagc, t\]/c collrepl of" word is 
iuiir\]l \]arger than lhe sol c)\[ vocabillary ileilts. ~Vord 
slrllrl tll'es Call grow Io hi, relatively long b} addition 
of suttixes and solnetiiries contain an amount of se- 
nlantic information equivalent to a complete sentence 
in another language. A I)opular example of coin- 
plex Turkish word formation is (,71'\]KOSI,OVAKYA- 
LILAf~TIItAMADI\['~.LAF{,IM1Z\])ANMI~SINIZ whose 
equivalent in English is "(it is speculated that) you 
had been one of those whom we could not convert 
to a Czechoslovakian." In this example, one word 
m Turkisll corresponds to a fllll sentence in English. 
Each suitix has a certain flmction and modifies the 
semantic information in the steni preceding it. In our 
example, the root mori'~heme ~EKOSLOVAKYA is 
the nalne of the country Czechoslovakia and the suffix 
- /,I converts the meaning into Czechoslovakian, while 
the following suffix LA~ makes a verb from the pre- 
vious stem meaning to become a Czechoslovakian, t, 
and so o11. 
2.1. Turkish Phonetic Model 
Being phonetic, the Turkish language can be adal)ted 
t.o a number of different alphabets. In the, past, var- 
ious alphabets haw~ been used to transcribe Turkish, 
e.g., Arabic. Since 1928, Latin characters have been 
used. The Turkish alphabet consists of 29 letters of 
which 8 (A, E, I, L O, (3, U, (~) are vowels, and 21 
(B,C,C~,D,F,G,(L H,J,K, L,M.N,P,R,S,,q,, 
q, V, Y, Z) are consonants. 
Turkish word formation uses a number of phonetic 
harlrlony rules. Vowels and COltSOllants change in cer- 
tain ways when a suffix is apl)ended to a root, so that 
sucll harnlony constraints are not violated. 
2.1.1. Vowcl Change iti Suffixes 
Ahnost all suffixes in Turkish ilse one of two basic 
vowels and their allophones. We have denoted these 
sets of allophones with braces around the main vowels 
A and 1. as {A} and {I}. The allophones of {At are A 
and E, where {It represents I, i, U, or {r. The vowels 
O and (} are only used in root inorl)hemes (especially 
in the first syllable) of Turkish words. ~ 
The vowel harltlOllV rtlies require thai vowels in a silt L 
fix challge according to certain rules whell they are af- 
fixed to a stem. The first vowel in thr suftix changes 
according to the last vowel of the sl.em. Succeeding 
vowels ili tile suffix change according to the vowel pre- 
reding it. If we denole the preceding vowel (lie it in 
the sten, or in the suffix) by v then {At is resolved as 
A if r is A, 1. O. or U. otherwise it is resolved as E. 
OiL the other band, {I} is resolw~d as \[ if e is A or 1. as 
iifeisEori, as U if t: is O or U. and as 0 if v is (3 or 
U. For examl)le the word "YAPMAYACAI,7.TINIZ" 
can be broken htto sutiixes as: 
YAI'/M {A)/\[Y\]:'{ A )C{A} { t'~)4/{l)}S {I}/N {I}Z 
i \[qom nm~ on. ~, wilt indicate lhe I;ng\]ish meatlh/g tff a iVlbl'll ill Turkish ill p,~l'etlllwsl!s following il. 
~ I'h,' proglrssivo lense suffix {\]}YO\[( is an exceptioll. 
<\[ \] iudicates an opti,mal IllOi'l)heilie that nniM Iw inseried before it sulllx to satisfy cel'l&in harniony rules. In this case. \[Y\] 
indi<atrs Illltl liw COllS~lllillll ~l" IIitl~,l I~i" ilisl I'ted if Ihr last lel ICl of (lie $Lflll is ,~. vowel, otherwise il is dl'Op|)ed: e.g., OI'~U (read) 
. ()\[<.1%'AC:\\],{ is/lie will lead), bul 7()R (ask) -- 54C)\[}A(':\1< fs/i,,' ~ill ask) 
i'\[hr iu<, ;tilol)holies <if {K } al'r K and (i 
'l'he I~l~ alloph,mrs of {It} ale |) alld \]'. 
AcrEs DE COLING-92, NAh'TES, 23-28 AOOT 1992 4 0 PRec. Of COLING-92, NANTES, AUG. 23-28, 1992 
1( can bc seen that the vowels ill the correct spelling 
of the word obey the rules almve, while a spelling like 
• ";APMAYACEI<TiNiz violates the harlnony rules 
because all {A} in the sullix call not resolve to all I'\] as 
tile prereding vowel is all A, It shouh\] be nlenlioned 
in passing i,hai t, here are also SOllle suffixes, sucli as 
-\[{l;\].'~, whose vowels llOVOr ch~lllgP. 
~,\]L,2, Consonallt; }lili'lll(llly 
Another basic asperl of Turkish pllonology is con- 
sonant harniony. It is based on the classilicalion of 
'hlrkis\[i (1OllSOllalllS illlO two lllaill groiips, voit'fless 
a.d ,o,c<:d. Th,, voiceh>ss COliSOliaFlls arc (', F, T. 
1t, $. K. 1 >, ~. 'fh<' reluainmg ronsolianls are voiced 
lnterosied readers call find tile complete lisi of con- 
SOllant harlnony lqlh's in l.;oksal \[12\], and Solak \[20J 
To give ~ll/ examl)h', one of thr rules says that if a 
sulIix begins with ore, of t.h( consonants I). (:, (;. 
Ibis COllSOllalll changes iillo T, ('. I{ l'eSlWCl \[rely, if" a 
%oiceless COllSOllalll is iH'(,~('nt as the final I)h(HieillO o\[' 
the pr(wious illl)rpllonl0, e.g.. ~l'()I,\])a (Ol/ l'o~id), bill 
I:(\]AK'IA (ou plalw), 
~oiii(' lilOrl)henles are allixcd wiill ihc insertion ot 
either N. ,q, ~. "l" when Iwo vowels llal)pcn Io fol- 
lmv each otll0r (e.g. ilAIIt:ES;i (his/l~er garden), 
II,tll(.:l::Yi (aecusali;e of garden), il.:i,5_l,;l/ (two 
each)j, or when there is anoLher nloi'phenie follow- 
ing (e.g. BAII(!IC,q_'iNDI'; (in his/iwr ga,'d,~.), or in 
Colltexl of sonic lirOllOtlllS (c.g,, BUNA (to tiffs), 
I<I;;NDiNI)EN (|'rein yourself)) and thr prononiial 
sut/ix I,~i (,,.g. SI':NiNV;i*i (accusaliv,' of .yours)). 
lit OII1' ('xanll)\]l' HI)O%'/', the flllllr(' It,liSt, sii|l\]x 
{'~'\]{A}( :{ Ai{IC} ........... I'le,' till' ~4i ...... YAt'MA ..... I 
since thr \[asl ph()llrnir is a vowel "f is ms('rl,'d. 
2.1..3. D(~forniatioll of ll.(mts 
No/'nlally 'lurkish rliols arc nilI t\[oxi>d. \]\[owovel-, 
tllerc ~tl'P SOlllO ('il'4or, whrr¢ f4Ollll' i~honenws ~ll'(' 
ch,qllgod by aSSilnilalion or variolls olher (icforlllaliOl/S 
\[12\] An ex(:eptioilal cas,' related io ih,' tlexion of 
IOOIs iS observed ill \])l,lSOlla\] i)rOllOllllS BI",N (1) alid 
ql;N (you)\]laving ,lalivo~ lIANA (to ine)and SAN:\ 
(Io yell t rcspeciivc\[~. 'l'hrsr ar(" individual cases and 
Clill hi, Ircated as excc\[lli(lllS. 
,% lil,)ll syslelnatic ,qlipsis OrClii's when il. su\[\[ix 
{1} k(.)ll ('Olll(?S all,el i it(' ~elbal reels alld SlOlllS ('li(I.- 
iiig wil\[i I,ho llholieillc {A} In SilC\[i cases, ttlc wid,, 
\()1\('1 ;ii Ih,' end of lhe sielil is i/arrow~,d, c,g, YAP 
- ','AI>IYOlt (s/h,'/ii is doin-; \[ii\]). but Alia • 
AI'IIYOII (s/ho/it is ,.earchmg). 
AIIOl\[lcq' rool deforlualion o(('tlrs ;is (i vowel ellipsis. 
~,Vlien a sut\[ix brginnin<e> wilh a vowel COllies afler 
SCllllt, ilOtlllS, gener;i\]l) dcsigiiating paris of thr hu 
ma. body. wim'h has . vowd {i} i. ils lasl syllabi,', 
Ihis vm~el drops, e.-. IIl'ltl:X (lieS,,) - BVIINUM 
(mS nose). '-;imilarty. who. lh(" passiw.uess suffix 
{I}L is affixed to some ;crl~s. whose lasl vowel is {I}, 
tNi <, vowel also drops. ~,.~. ('AC, II/MAb; (io call) -- 
('A(;ItlI,MAK (io Iw calh,d). Other root delk)rl.a 
('l{,,f+q 5olak \[20\] fra delailed ilffOil,,aiion oil ,'at, h of th,' ~ullixes 
tions and their exceptions call be found ill Solak \[20\]. 
2.2. Turkish morphology 
Turkish roots can be classified into two main classes: 
,ominal and verbal. The verbal class comprises the 
verbs, while l/Olllillal chess COlllprises llOIlllS, |)rOllOilllS 
and adjecl.iw's, etc. Tile sulfixes that can be receiw~d 
by either of these groups are different, i.e., a suffix 
which C~lll bt, a|Iix(!d to ~1 llOlllillRl l'oot ci%11 llot b(! 
affixed to a w.'rbal root wil, h tile same semantic func- 
tion. 
Turkish suffixes can bc classified as derivalio~lal and 
co~ljuyallonaL l)eriwttional sutfixes change thr mean- 
ing and sometinies lhe class of tim stenls they are 
affixed, while a conjtlgated verb or noun renmins x'~ 
such after the atlixation. Conjugational suffixes call 
b. affixed to all of the roots in |,he class thai \[,hry 
belong. On t, hc el, her hand, 1hr nuniber of roots that 
,,ach derivational suffix can }>e affixed changes. The 
nominal model 
'lhr shnplili,~d models for nominal and verbal grain- 
lllgll'S rail be giVI211 ~lS tollows: 6 
The nomimd niodel: 
nommal root + phu'al suffix + possessive sutflx + case 
suffix -I relalive suffix 
Tim wwlml model: 
verbal root -\[ voic(" sultixes + negation sulfix + corn 
pound verb suffix t- Illaili It'llse suttix -i- qllestioll 
suffix + secoitd I.l!llsl? suffix + Iwrsoll sutllx 
3. Implementation 
\\',' have ilnph,lnrnted n rool-driwul lnorphological 
analyzer lbr '\['urkish ;tlld llSe,,I il as a spelltn 9 chcckl*ui 
,4'e 7~t;I that can be integrated t.o <li\[fiercnt a.I;pli¢iations 
Oil a variely of plattbrnis. 
The progranl takes a list of Turkish words as inpul, 
and thcli checks then10lit? I)y one ill the order t, hey 
appear. If the Slmllhlg of aii hipul word if. iileorrecl, 
il is oulput as inissI>elh'd Each word is allalyzcd 
individually wil, h 11o at, telllion to the Selllalltics or |,o 
the coillcxl. If a r, ord is spellrd corrrcll3 Inil is l,h~, 
wrOllgj ':.oi'd ill lhe ¢Ollll'Xl. w(> liave 11o inl,elll,ion for, 
and way of tlagging it. ;is ci'rOllOOllb, '\['hils, as in all 
oliwr Sl)elling prog~i'alllS, th(> lexl is CXalnilied wilh 
leSliecl Lo words, liol willi rcspccl Io SClilC-iices. hi 
addition, w~, (1o 1101 )'{'t give ally stiggr'stion aboul the 
iliOSl likely correci words afler dole<ling a nlisspelh~d 
word. ie, spelling corl'rClirm is ilol dent, Word 
anal.~sis is handh'd in four step as syllabificaliou 
chrrk, reel dclcrniilial, ion, niorphol)llonenlie check. 
and morphological analvsis. I)uring lhese steps a dic- 
liOllal' 3 of Turkisli root words, and a set o\[' rllles for 
'lurkisli syllable structure, njorlihophonenlics, and 
inorpholog;y arc nsed coucurrenily. All these steps 
will I~e ,'×plain,'d i. llw following sections, after a 
ill Illose m.dol~ an, I I lie e ×cept irma\] ,:;tses ~liJoill \[ heill 
ACRES DE COLING-92, N^t, rrEs, 23-28 hot'n" 1992 4 1 PROC. OF COLING-92, NANTrS, AUG. 23-28, 1992 
brief infornlation on tile dictionary used in this im- 
lllenlentaliou. 
3.1. Dictionary 
The dictionary is bmsed oil the Turkish \Vriting Guide 
\[2,3\] as the source. Some words in the dictionary haw, 
to lie marked ~s having certaiu semantic and struc- 
tural properties such as being a verbal root or a nom- 
iua\] root, being a proper noun, not obeying to vowel 
harmony rules, deformiug under certain conditions, 
and so on. For examph ~, tile word BUII.UN (nose) 
have to be niarked as being a nolllilla\[ root, and de- 
forming by vowel ellipsis. For this reason, for each 
word in Ihe dictiouary a series of flags represeuting 
certain properties of that word are heht. Tllus. each 
elitry of tim dictionary Colltains a word in Turkish 
and a series of flags showing certain properties of that 
wor( \[. 
Nearly 2:1,500 words..'ach having 7 h, lters on the 
averagiN are listed ill otir Ctil'roilt diclionary, 41 flags 
per e.'ord 7 have been lised so far, bill later it iliay 
h¢" liecessary to iise illore, \[leCallSO of this, two long 
inl.egers (whose bits rel/reselll flags. 17)r a toial of 64 
flags) arv assigned for every word. 
3.2. Syllabification Check 
Analyzing all t, he words in Turkish \Vritithg Guide \[23\] 
and all the suffixes ill Turkish \[1, b\]. w~" have con- 
structed a legular expression and a corresponding fi 
nile stale automaton for validating if a word matches 
the syllablestructttre rules of Turkish \[18\] This reg+ 
/llar t?xpr0ssiOll is tised as a heuristic ill oltr spelling 
checker. The input word is first processed with the 
regular expression. It is reported as misspelled if its 
syllaMe structure can not be mat.ched wilh this ex- 
pression, i.e., tile phonemes of Ihe word do no! form 
valid sequences accordiug to Turkish syllable struc- 
iurcs. ()n tile other hand, if it. can lie matched, it, is 
flu'ther analyzed as it. tuay still be a non-Turkish or a 
misspelled word. 
With th(- hell> of tile syllal)ificat.ion cheek, most of the 
typographical e.rrors Call be detected. For examph~. 
if the word YAPMAI( (to make) were typed as YP- 
.\I,\I,2 or YAPMKA. the word would not be matched 
by the expression and its spelling wouhl be reported 
incorrect. On tile other hand, ifil, wew written as 
YAPMEI(, where a vowel harmony error is made, it 
would pass the syllabification cheek, but would lie re- 
porled as misspelled during morl/holJhonemic checks. 
3.3. Root Determination 
Before analyzing the morpholAmnenfic and morpho 
logical structures of a Tm'kish word, the root has to 
be determined. If \[he word passes the syllabification 
check, its root is searched in the dictionary rising a 
maxilnal match algorithm. In this algorithm, lirst 
;\[he list of all \[lags can Im hmnd in Solak \[2(1\]. 
AcrEs DE COLING-92, NANTES, 23-28 AO6-r 1992 42 
the whole word is searched in the dictionary, If it 
is found then the word has no suffixes and therefore 
its spelling is correct. Otherwise, we remove a letter 
from tile right and search tile restllting substring. We 
continue this by removiug letters from the right until 
we find a root. If no root can be found although the 
first letter of the word is reached, tile word is reported 
as misspelled. 
The maximum length substring of the word that is 
present in tile dictionary is riot always its root. If 
fin't.her analyses show that the word is misspelled, a 
new root is searched m the dictionary, this time re- 
moving letters from the end of the previous root.. If 
a Ilew root can be found the same operations are re- 
peated, otherwise tile word is reported &s misspelled. 
Root determination presents some dittieulties wheu 
the root of the word is deformed. For the root words 
which have to be deformed during certain aggluti- 
nations, a flag indicating that property is set in the 
dictionary. For example, the root of the word ,~EHRE 
(to the city) must be found as ~jgltiR (city). In order 
to determine it correctly, when the substring SEHR is 
not found in the dictionary, considering that it illay 
be a deforined root by vowel ellipsis, the vowel I is 
inserted between the consonants 11 and R, and the 
word ~EHIR is searched in the dictionary. When it 
is fotmd, tile flag corresponding to vowel ellipsis is 
checked. Since it is set for this word, the root of the 
word S,'I';IIRE is dcterlnined as ~EIIiR, and remain- 
ins analyses are contiuued. If that word were written 
as .~EHiRE, we should report it ms incorrect although 
~EltiR + dative ease suffix form looks correct. For all 
other root defin'mations, the real root of the word can 
be fotnld by u/aking such cheeks and some necessary 
chauges (see \[20\]). 
For some roots both of the refills above are valid. 
For example, both METN\[ (accusative of text) and 
METiN\]\[ (accusative of strong) are correct although 
the root of both words is MET\[N (text, strong) be- 
catlse this word call be used in twodifferent meanings. 
3.4. Morphophonemic Check 
Turkish words obey vowel and COllSOll~lllt harmouy 
rules during agglutination (see sections :3.2.1 and 
3.2.2). The vowel harmony check may be done jnst 
after tile root determination, but other morphophone- 
mic checks should be done during morphological anal- 
3sis, 
Afier tile root of the word is found, tile rest of tile 
word is considered as its suli\]xes. The first, vowel in 
the sutfixes part must be in harmony with tile last 
vowel of the root, while tile succeediug vowels must 
be in harnmny with the vowel preceding them. Since 
there are some sulllxes, such as --KEN, whose vow- 
e\]s ilever chaugo, when a disharl!lony is fouud, we 
cimck whether it, is tile result of such a snffix (e.g., 
YANARI,2I'\]N (while iI is burning)). 
PRec. OF COLING-92. NANrEs, AUG. 23-28, 1992 
SomP words of foreign origin do uot ohey vowel har- 
mony rules during agglutination (e.g., KONTIIOL 
(control)). Before ttae w)wel harmony cheeks are 
doue, the tlag correslJonding to that property must 
I,e checked, If it is sol for the root of the word, 
du, vowel harmony check must he apl)lied inversely 
Thus, the first vowel in I, he sulllxes part must be 
in disharmony with the last vowel of the root (e.g,, 
I(ONTIIOI,LEI/, (controls)). As another interesting 
('aS(', SOI\]le roots that ii\]ay he used ill tWO illeanings. 
\[,e, |lie holnol\]ylllS, obey vowel fiarulony ruh!s whel/ 
tile3' are used with a eertaiu lllealling, whih' they do 
lie\[ ob,'y thelll when tile)' are used in tit(! other mean- 
ing. For example, both SOLA (to Om left) and SOl,I); 
(t(} the Itote sol) pass the vowel harmony cheek sine,, 
tileir refit ~OI, has two iPl{!anil\]gs ;is "left" slid "'tliti- 
sical u(}t,e. "8 
The suffixes must I}e deierinin,xl before the conso 
llaUl }larlUolly checks are doue. Becanse of this. I hese 
checks are done during niorl}hological anal)sis, after 
eacli sulfix is isolated. 
It' a woM does not pass any of ll\]e nlorphophoiil!ulic 
checks, consideriug the possihility that lhe root may 
have i)eell determined wrollgly, a liew root is searched 
ill the dictionary. 
• 3.5. Morphological Analysis 
Tim spoiling checker has two separate set. of ruh,s for 
I.he two IIKLill root. classes. For tile illlplelllent~d.ioll of 
tile lexical analyzers and parsers in which the rules 
arc inchlded, two standard UNIX utilities, lea" mid 
(lace, have been utilized respecliw~ly \[1;I\]. Lea: is used 
Io separate tile suffixes of a word from left to right, 
;111(I I/ace is tlsed to p;q'se tilose su{\[ixes tlsil\]g Illorpilo- 
logical rules of Turkish granllrlar. 
The models given in various books on Turkish gram 
mar \[I. 2, 1. 5. 14} and previous research on Turkish 
COml)utational linguistics \[12. 16\] have been ul,ilized 
in for generating the rufi's used in the parsers. Addi- 
tionally, all of tim known exceplioua\] cases \]lave also 
been considered (see \[20\]). Although all the eonju- 
gational suffixes flaw? been included into the rules, 
only a mlallsubset of the derivalional suttixes have 
heen ha\]idled, The reasons lot Ihis sre dial majorily 
of Ihe derivatioual sullixes may he receiw~d by only a 
small group of roots, and deternfining such groups is ;i 
rat her dilficult an(I time-consuming job, and depends 
on wmous sen(antic criteria. The derivational sutfixos 
that may I)c. alfixed to all of Ill,? roots ill a {'lawn and 
those which can he affixed to large I\]{rcentage, Illll 
UOi all, of the roots in their clas~ are inclu{led in lhe 
rules. That makes it i)ossible to , limioate a number 
of words from the dictionary. 
'l~ho two p~ll'Sers ~11'(, allerllalively llscd. First parser 
Io I}e limed is deternlined accordilig to Ill,. class of/It{' 
root, hilt its the parsfilg COll\]illlWs it IlHty be IleC(?S:-;&try 
1(} s\\ilt.h frolll o11{, plll'S(?r I(} ill\]other ~llld eOl/til\]ll{' 
8 i'IIC WOlf\[ ~(}l, iN l)l'OtlOlllll?Cd slighl b' dilfel'elll ill Ihc I;~tlCl', 
there, or ~tgain pass hltck to the previous ()lie, since 
the da.ss of a stem can change when it, receives certain 
suffixes. "\['lie switches between parsers C~l\] SOllletinles 
he very complicated. Some suffixes can have two dif- 
ferent usages. In such eases both possibihties haw~ to 
he considered. 
\[f a word has receiw~d more than one derivationM 
sutfixes then mauy switches between parsers will be 
necessary. For example, the root of tile word BEYAZ- 
LA~TIRMAYANLARI}AN (from those which do not 
cause to hecome white) is found as the noun BEYAZ 
(white) in our dictionary. Then comes the suffix 
L{A},5, which makes a verb from a noun, tfierefor," 
a switch t.o the verb parser ha~s to be ulade. Parsillg 
contimles there until I.he suffix M{A} is nlatched. 
This sulfix can either make a w~rh a noun or negate 
it First cousideriug the possibility that it is used 
as a derivationM suffix, tile noun parser is invoked. 
'file rmnaiuing part of the word can not be parsed 
by 1his parser. So accepthlg M{A} as the negation 
suffix, tile verb parser is returned to ;hid parsing con- 
tinues there. Later comes the sullix \[Y\]{A}N, which 
is a sulfix i.fiat illakes ;t lIOill\] fronl a verb, so ~lgS.ill 
a switch to the noun parser is made. Continuing in 
this p~trser, the word is parsed correctly. 
Some Turkish roots call take the sullixes helonging 
to both nominal or verhal chLsses. \[:or such roots if 
parsing is unsuccessfld in the first parser chosen, the 
other olle UlnSt alsG be tried. For exalnphL (fie root of 
the word A(\]LAR (hungry I)eOl)fi~) is At7. 'Ellis root 
may either he used as a verb (open) or as a uoun 
(hungry). If parsing is first attempted with tile ver- 
bal parser it will he unsuccessful. So we backtrack 
aud use the nominal parser. With the nominal parser 
the word can be parsed successfully. 
Figure 1 shows the block diagram of the word anM- 
ysis. Smumurizhlg, first, the syllable structure of the 
word is checked. If it is wrong the word is added 
into the output list of misspelled words, otherwise 
the root is detemfined. If no root can be found the 
word is reported as misspelled. If a root is tbund, 
lirst Ihe vowel Ilarmony check is done. Then, ac- 
cording to lhe ('lass of the root, ol\]e of the parsers is 
actiwlted Ill Ihe parsers, an the sutIixes ;(re isolated 
OI/C Ily oue, ilecessary luorphophollenlic cfieeks a, re 
done. l)epending on the sulfixes, switches between 
the parsers are possihle. When the cud of the word 
is reached, if no errors ('all he tfUllld then the spelfillg 
of the word is correct. If any error is found in itny 
of the parsers or during ulorphophonenlic checks, a 
new root is searched. If another reel is found same 
operations are doile. If no suceessfld parsing can b,> 
done although lilt! Iirst. letter of the word is reached, 
the word is added into the OUtl)ut hist. 
4. Performance Results 
This spelling checker has been i,nl)lenumted in 
UNIX ellvirol\]lllellL, Oil SUN St}AI{C workstations, 
AcrEs DE COLlNG-92, NAmT.s, 23-28 AOm' 1992 4 3 PRec. OF COLING-92, NM, rrES, Auo. 23-28, 1992 
Word 
Syllabification 
Check 
Reel 
Determination 
Misspelled 
verb root 
noun root 
Figure 1 : 
at l~ilkent University, using tile C i)rogramnfing lan- 
guage. Its current version takes nearly 600 Kbyl,es 
including t, he dictionary. 
The checker can be inserted to different word process- 
iagapplieat.ions or can be used separately. We haw' 
integrated it to GNU-I!EMACS text edit, or for use on 
IgI'EX document.s, ht this form, the program is avail 
able for use within the university and around a nun> 
her of sites oa luternot. It is also I~ossible to obtain 
solnp statistical hiforniatiou 1) 3" running the progranl 
willi -s option. 
()Ill' resilll.s indicate thai the llIlitti)er of distinct words 
withiil a document is relaliwdy small, and more par 
ticularly, the percentage of distinct words l.o total 
words processc(l hicreases as the \[eligill of the docu 
Inelll decreases. Approxiniately 40% of the ufissl)e\]led 
word:, are delecled by syllal)ifiealion check and flu' 
resl ale detected by other checks. The nul'lll)er of 
disimct words all\]?cl the execulion linie ill()re than 
Ihc lotai nuuiber of words, as expoclod, because a 
word is fully aualyzod only ouce If it occtlrs again in 
Ihe text, the resillt o1" the pt'~,ViOllS check is used, Iu 
geiiera\]. Ihc slwlling ch,wker can process lit Ill00-::;000 
words (roughly 7-6 pages) per s,'eond, depending on 
Ihe docuutent. The functioilal perforlnance of the. 
spelling checker ('au hi, title trilled \])V analyzing the 
word \[isl and inserting the additional al)l)ropriale 
flags 
5. Conclusions 
lit tlfi~ paper, we have presenled a lnorl)hological 
parser for all agghlthi;ttive lallguagc. Turkisli. all(\] its 
~verb suffix 
Verb 
Parser 
T ~. 
verb \[ l \[ noun Morphophonemic 
suffix \[ suffix Checks 
T 
Noun 
Parser 
rloun suffix 
Word analysis 
application to spelling checking of this language. 
Parsing agglutinative word structures necessitates 
rather nontrivial phonological and morphological 
analyses which present special difficulties in the de- 
velopment of parsers for such languages, not usually 
encountered in parsers for other languages. As a re- 
sub, the number of parsers developed for agghitina- 
five languages, and particularly for Turkish, is quite 
limit.ed, and they have certain shortcomings. We have 
solved most of the problelns encountered in the previ- 
ous parsers by lnaking a detailed and careful research 
on Turkish word formation rules and their exceptions 
\[20\]. These results may hopefully be helpful for fu- 
ture researchers on Turkish linguistics. We should 
note t.hat ewm though it is claimed that word for- 
marion rules in Turkish are well-defined and Turkish 
is a very regular language, as used today it shows 
many irregularities that cause the ln'oblem of parsing 
I.his language to become a very hard and interesting 
problem 
Many grammar books haw~ been referred to collect, 
Turkish word format.ion rules, hi those books, af- 
ter each rule is defined, usually it is reminded that 
tliel'e lllay occur sonte exceptions to that, rule ill SOllle 
condit,ions, hut mostly those conditions can not be 
"well" defined, For example, in all Turkish grammar 
books, it is said that "When a Turkish word ending 
wit.h one of the consonants P, ~', T, K receives a suf- 
fix beginning with a consonant, that final consonant 
is soft,cued, bul t, here are some such words whose fi- 
nal consonant does not change." \]lowever, none of 
the books says what the common propert.y of those 
words which do not obey t,o that rule is, because most 
AcrEs DE COLING-92, NA~n'ES. 23-28 Ao\[rr 1992 4 4 PRec. OF COLING-92, NAI~rrES, AUG. 23-28, 1992 
probably it is not known yet. Ill order t,o inchlde that 
rule correctly in the parser, all words having the in- 
dicated prol)erty have been examined, the list of t,he 
irregular ones have been obtained, and speeial checks 
have hi!ell dolle t,o catch those irregularit.ies. Ill order 
t.o obtain reliabh? resuhs front the spelling checker, 
all of the known rifles and theh' except, ions have been 
inlplellll!tlt,Od 
The spelling checker ,'4OllletillleS i'e\[)ol'LS correcI woFds 
aS illcorreet, ()lie reason ell, his is the absellce of SOllle 
words in ore' dictionary. Although the dict, ionary is 
reasonably complete, there still remains many technl 
cal terms and proper names which are not included. 
Adding more and nlore words will obviously increase 
tile flmelional performance of the checker. Another 
reason is that, most of the derivational sultixes are 
not mchtded rote'die rules. If( stem that is derived 
by such a suffix is not present ill the dictionary, it is 
reported as misspelh~d. Additionally. for th( deriw~- 
lionel sullixes that. are included in our rules, the lis~ 
of the roots that they can be a\[lixed to may no( he 
full~ determined. This problem can also be solved by 
examining the dictionary As far as execution pertbr- 
mance goes. our iml)hmtentation is very S;atisfa¢lory 
giving an ahnost. 1000 words/second word analysis 
throughput \[19\]. 
Ac31~s DECOLING-92, NANTES, 23-28 Aour 1992 4 5 Plot. OF COLING-92, NANTES, AU6. 23-28, 1992 

References 

\[1\] Adah, O., "Tiirkiye '1 iirkwsmde bigiml)irimh'r", 
TI)K, Ankara, 1979. 

\[2\] Bal~guoglu. T.. '"l'{irkqenin grameri", 'I'I)K, 
Ankara, 198(i. 

\[3\] Brodda, B., I.~arlsson, F., "An exl)erm.'alt with 
morphological analysis of FilmislF', Papers from 
the hlst, itude of l,inguistics, University of St,eel,: 
holm. Publicat.ion ,1{1. Stockhohn. 1980. 

\[1\] ('an, K., "Yal~ancflar i,~in 'l'iirkw-ingilizee 
agklamah Tfirk W dersleri", METU, Ankara. 
I {}87 

\[5} I)emircan, (:5.. "'Tiirkiye Tiirkwsmde kSk-ek 
Ifih'..,mehM", T1)K, Ankara. 1977. 

\[61 llankamer, .I., ":l'urkish generativ,~ morphology 
and morphological parsing", a pal)or presented 
al ,¢,eeolld Int.erlLational (',onferellce on Turkish 
Linguistics, !.stanbul. 198.1. 

\[7\] llankamer, d., "Morphological parsing and the 
lexicon", edit,ed by William Marslen-Wilson. 
MIT Press. 

\[8\] llatiboglu, V.. "Tiirkgmin elderi". TI)K, 
Ankara, 1981. 

\[9\] l{aspm'. R,, "~Veb,,r. 1)., "llser's referem'e nlalltial 
\[or t, he ("s Qut'chua adaptat.ion progranf'. Oc 
casional Publications in Academic Computing. 
Nmnber 8. Smmner lnstitude of Linguistic, Inc., 
1982. 

\[10\] Kasper, It., Weber, 1)., 'Trogrammer's refer- 
ence manual for tile C's Quechua adaptatiml 
progranf'. Occasional Publications ill Academic 
Computing, Nmnber 9, Summer Instil, uric of Lin- 
guistic, Inc., 1982, 

\[11\] b:oskenniend, K., "'l'wodevel morl)hology", Uni- 
versity of tlelsinki, l)epartment of General Lin- 
guistics, l>ublication No, 11, tlelsinki, Finland, 
1983, 

\[\[2\] KSksal. A, "Autmmttic morphological analysis 
ofTurkislf', PILl). Thesis, llaeettepe Uniw~rsity, 
Ankara, 1975. 

\[13\] Mason, T., Brown, D., "lax & yaec", edited 
by l)ale Dougherty, O'Reilly & Associates, Inc., 
lISA, May \[990 

\[14\] ()zel, S,, "Tfirkiye Tiirkqesinde sSzciik t, iiretme 
ve bilc.~tmne', TI)K. Ankara, 1977, 

\[15\] Packard, D., "C, omputer-assisted morphological 
analysis of Aucient (;reek", C, omputational and 
Mathematical Linguistics: Proceedings of the ln- 
t,ernational Cont;~:rence on Compul, ationa\] Lin- 
guistics, Pisa Leo S. Olschki, Firenzc, 343 355, 
1973. 

\[16\] Sagay, Z., "SSzciik ~;ekimi", Bili,~im'78, Ankara, 
1978. 

\[17\] Sagvall, A., "A system for automatic inflectional 
analysis implement.ed for R,ussian, Data Lmguis 
tica 8, Ahmluist and Wiksell, ,qt,ockhohn, 1973 

\[18\] Solak, A., Oflazer, K,, "A finite state ma- 
chine lot Turkish syllabh~ structure analysis", 
tn Proceedings of 0w Fifth hlt.ernational Syln- 
positlm 01( COllll)tll;er all(/ Informal;ion ,qciellces, 
Nev,whir , Tftrkiy% 1990. 

\[19\] Solak. A,, ()flazer, K.. "lmtAementation details 
and I)erformance results of a spelling checker for 
Turkish", m Proceedings of the Sixth Interna- 
tional Symposium on (:omput,er and Information 
Sciences, Side, Altt.alya, Tiirkiye. Oct. 1991 

\[29\] Solak, A, "l)esign and inllflementation of ;, 
spelling checker R)r Turkish", M.S, Thesis, 
Bilkent Universit.y, Ankara, 1991. 

{21\] Underhill, R,, "Turkislf'. Studies in Turkish Lin- 
guist, ics, edited by Dan Isaac Slobin and Karl 
Zimmer. 7 -- 21, 1986. 

\[22\] "Tiirk(ie sSzlfik", "I'DK, Ankara, 1988. 

\[23\] "Yeni yazlm kdavuzu", l!;lewmth Edition, TIll(, 
Ankara, 1981. 
