A word-grammar based morl)hoh)gieal analyzer 
for agglutinative s 
Aduriz 1.+, Agirre E., Aldezabal I., Alegria I., Arregi X., Arriohl J. M., Artola X., Gojenola K., 
Marilxalar A., Sarasola K., Urkia M.+ 
l)ept, of Colllptiier 1Aulgtlages and Systems, University of lhe Basqtlo Cotlnlry, 64.9 P. K., 
E-20080 1)onostia, Basque Counh'y 
tUZEI, Aldapeta 20, E-20009 1)onostia, Basque Country 
+Universidad de Barcelona, Grin Vfii de Isis Cortes CalaiallaS, 585, E-08007 Flarcelona 
j ipgogak @ si.elm, es. 
Abstracl 
Agglutinative s presenl rich 
morphology and for sonic applications 
they lleed deep analysis at word level. 
Tile work here presenled proposes a 
model for designing a full nlorpho- 
logical analyzer. 
The model integrates lhe two-level 
fornlalisnl alld a ullificalion-I)asod 
fornialisni. In contrast to other works, 
we propose to separate the treatment of 
sequential and non-sequetTtial mou)ho- 
lactic constraints. Sequential constraints 
are applied in lhe seglllenlalion phase, 
and non-seqtlontial OlleS ill the filial 
feature-combination phase. Early appli- 
cation of sequential nlorpholactic 
coilsli'aiills during tile segnloillaiioi/ 
process nlakes feasible :,ill officienl 
iinplenleilialion of tile full morpho- 
logical analyzer. 
The result of lhis research has been tile 
design and imi)len~entation of a full 
nlorphosynlactic analysis procedure for 
each word in unrestricted Basque texts. 
Introduction 
Morphological analysis of woMs is a basic 
tool for automatic  processing, and 
indispensable when dealing willl highly 
agglutinative s like Basque (Aduriz el 
al., 98b). In lhis conlext, some applications, 
like spelling corfeclion, do ilOI need illOl'e lhan 
the seglllOlltation of each word inlo its 
different COlllponenl nlorphellles alollg with 
their morphological information, ltowever, 
there are oiher applications such as lemnializa- 
tion, lagging, phrase recognition, and 
delernlinaiion of clause boundaries (Aduriz el 
al., 95), which need an additional global 
morphological i)arsing j of the whole word. 
Such a complete nlorphological analyzer has 
lo consider three main aspects (l~,ilchie et al., 
92; Sproal, 92): 
1 Morl)hographenfics (also called morpho- 
phonology). This ternl covers orthographic 
variations that occur when linking 
IllOfphellleS. 
2) morpholactics. Specil'ication of which 
nlorphenles can or cannot combine with 
each other lo form wflid words. 
3) Feature-combination. Specification of how 
these lnorphemes can be grouped and how 
their nlorphosyntactic features can be 
comlfined. 
The system here presented adopts, oil the one 
hand, tile lwo-level fornlalisnl to deal with 
morphogralfilemics and sequential morl)ho- 
lactics (Alegria el al., 96) and, on the other 
hand, a unification-based woM-grammar 2 to 
combine the grammatical information defined 
in nlorphemes and to tackle complex 
nlorphotactics. This design allowed us to 
develop a full coverage analyzer that processes 
efl'iciently unrestricted texts in Basque. 
The remainder of tills paper is organized sis 
follows. After a brief' description of Basque 
nlorphology, section 2 describes tile 
architecture for morphological processing, 
where the morphosynlactic component is 
included. Section 3 specifies tile plaenomena 
covered by the analyzer, explains its desigi~ 
criteria, alld presents implementation and 
ewthialion details. Section d compares file 
I This has also been called mo*7)hOSh,ntactic 
parsitlg. When we use lhc \[(fill #11017~\]lOSyltl~/X WC 
will always refer to il~c lficrarchical structure at 
woM level, conlbining morphology and synlax. 
2 '\]'\]lt3 \[IDl'll\] WOl'd-gF(lllllllUl" should not be confused 
with the synlaclic lilcory presented in (Hudson, 84). 
system with previous works. Finally, the paper 
ends with some concluding renmrks. 
1 Brief description of Basque 
morphology 
These are the most important features of 
Basque morphology (Alegria et al., 96): 
• As prepositional functions are realized by 
case suffixes inside word-fornls, Basque 
presents a relatively high power to generate 
inflected word-forms. For instance, froth a 
single noun a minimum of 135 inflected 
forms can be generated. Therefore, the 
number of simple word-forms covered by 
the current 70,000 dictionary entries woukl 
not be less than 10 million. 
• 77 of the inflected forms are simple 
combinations of number, determination, 
and case marks, not capable of further 
inflection, but the other 58 word-forms 
ending in one of the two possible genitives 
(possessive and locative) can be further 
inflected with the 135 morphemes. This 
kind of recursive construction reveals a 
noun ellipsis inside a noun phrase and 
could be theoretically exteuded ad 
infinitum; however, in practice it is not 
usual to fiud more than two levels of this 
kind of recursion in a word-form. Taking 
into account a single level of noun ellipsis, 
the number of word-forum coukl be 
estimated over half a billion. 
• Verbs offer a lot of grammatical 
information. A verb tbrln conveys informa- 
tion about the subject, the two objects, as 
well as the tense and aspect. For example: 
diotsut (Eng.: 1 am telling you something). 
o Word-formation is very productive in 
Basque. It is very usual to create new 
compounds as well as derivatives. 
As a result of this wealth of infornmtion 
contained within word-forms, complex struc- 
tures have to be built to represent complete 
morphological information at word level. 
2 An architecture for the full 
morphological analyzer 
The framework we propose for the 
morphological treatment is shown in Figure 1. 
The morphological analyzer is the fiont-end to 
all present applications for the processing of 
Basque texts. It is composed of two modules: 
the segmentation module and the 
morphosyntactic analyzer. 
conformant .................. ~ U~atabas N TEZ-conf~ 
\[Segmentation module 
____~| HorphograDhemics 
Morphotactics I 
TEI-FS .............. ~ ~~~ ~-p~ 
conformant Cegmented TexN 
Morphosyntactic analyzer 
Feature- combination 
Morphotactics II 
TEI-FS \] ............................ ~ actically 
Lermnatization, linguistic Analysis tagging tools 
Figure 1. Architecture 1"o1" morphological processing. 
The segmentation ,nodule was previously 
implemented in (Alegria et al., 96). This 
system applies two-level morphology 
(Koskenniemi, 83) for the morphological 
description and obtains, for each word, its 
possible segmentations (one or many) into 
component morphemes. The two-level system 
has the following components: 
® A set of 24 morphograf~hemic rules, 
compiled into transducers (Karttunen, 94). 
• A lexicon made up of around 70,000 items, 
grouped into 120 sublexicons and stored in 
a general lexical database (Aduriz et al., 
98a). 
This module has full coverage of free-running 
texts in Basque, giving an average number of 
2.63 different analyses per word. The result is 
the set of possible morphological segmenta- 
tions of a word, where each morpheme is 
associated with its corresponding features in 
the lexicon: part of speech (POS), 
subcategory, declension case, number, 
definiteness, as well as syntactic function and 
some semantic features. Therefore, the output 
of the segmeutation phase is very rich, as 
shown in Figure 2 with the word amarengan 
(Eng.: on the mother). 
grammar 
mother) 
POS noun) subc~t common 
:count: +) 
(animate +) 
(nleasurable "-) 
aren 
(of life) 
(POS decl-suffix) 
(definite +) 
(number sing) 
(case genitive) 
(synt-f @nouncomp) 
J gan \] (o.1 / 
(POS decl-suf fix) I 
(case inossivo) \] 
(synt-f @adverbial)I 
=> 
amarengan 
(o. the mother) 
POS noun) 
subcat common) 
number sing) 
definite +) 
case inessive) 
count +) 
animate +) 
measurable -) 
synt-f @adverbial) 
iq:e, ure 2. Morphosynlactic analysis e of (unureugun (l{ng.: (m 
The architecture is a modular envhoument that 
allows different types of output depending on 
the desired level of analysis. The foundation of 
the architecture lies in the fact lhat TEI- 
confommnt SGML has been adopted for the 
comnmnication allloIlg modules (Ide and 
VCFOIIiS, 95). l~'eature shucluleS coded 
accoMing TIU are used to represent linguistic 
information, illcluding tile input mM outl)ut of 
the morplaological analyzer. This reprcscnta- 
tion rambles the use of SGML-aware parsers 
and tools, and Call he easily filtered into 
different formats (Artola et ill., 00). 
3 Word level morl)hosyntactic analysis 
This section Hrst presents the l~henomena lhat 
must be covered by the morphosyntactic 
analyzer, then explains ils design criteria, and 
finally shows implementation and ewfluation 
details. 
3.1 Phenomena covered by the analyzer 
There are several features that emphasized the 
need of morphosyntactic almlysis in order to 
build up word level information: 
I) Multiplicity of values for the same feature 
in successive morphemes. In the analysis 
of Figure 2 there are two different values 
for the POS (noun and declension suffix), 
two for the case (genitive and inessive), 
and two for the syntactic function 
(@nouncomp and @adverbial). Multiple 
values at moq~hemc-level will have to be 
merged to obtain the word level infer 
mation. 
2) Words with phrase structure. Although the 
segmentation is done for isolated words, 
independently of context, in several cases 
3 l?calurc wtlues starling with the "@" character 
correspond to syntactic functions, like @noullcomp 
(norm complement) or @adverbial. 
the mother) 
tile resulting structure is oquiwflent to the 
aualysis of a phrase, as can be seen i, 
Figure 2. 111 this case, although there are 
two different cases (genitive and inessive), 
lhe case of the full word-form is simply 
inessive. 
3) Noun ellipsis inside word-lbrms. A noun 
ellipsis can occur withi, the word 
(oceasi(mally more than once). This 
information must be made explicit in the 
resulting analysis. For example, Figure 3 
shows the analysis of a single word-forln 
like diotsudumtrel&z (Eng.: with what I am 
lelling you). The first line shows its 
segmentation into four morphemes 
(die tsut+en+ 0 +arekin). The feature 
compl ill tile final analysis conveys the 
information for the verb (l um lelliHg you), 
that carries information about pc'rson, 
number and case o1' subject, object and 
indirect object. The feature comp2 
represents an elided noun and its 
declension stfffix (with). 
4) l)erivation and composition are productive 
in Basque. There arc more than 80 deri- 
w/tion morphemes (especially suffixes) 
intensively used in word-fornlatioll. 
3.2 Design of the word-grammar 
The need to impose hierarchical structure upon 
sequences of morphemes and to build complex 
constructions from them forced us to choose a 
unil'ication mechanism. This task is currently 
unsolwlble using finite-state techniques, clue to 
the growth in size of the resulting network 
(Beesley, 98). We have developed a unifica- 
tion based word-grammar, where each rule 
combines information flom different 
mot+lJlemes giving as a result a feature 
structure for each interpretation of a word- 
fol'nl, treating the previously mentioned cases. 
3 
diotsut 
I am tellh,g you) 
POS verb) 
(tense present) 
(pers-ergative is)\[ 
(pets-dative 2s) 
(pers-absol 3s) 
en 
(what) 
(POS relation) 
(subcat subord) 
(relator relative 
(synt-f @rel-clause 
0 () 
(POS ellipsis) 
arekin 
(wire) 
(POS declension-suffix)) 
(case sociative) 
(number sing) 
(definite +) 
(synt-f @adverbial) 
=> diotsudanarekin (wi~ what lamtellingyou) 
(POS verb-noun_ellipsis) 
(case sociative) 
(number sing) 
(definite +) 
(synt-f @adverbial) 
(compl (POS verb) 
(subcat subord) 
(relator relative) 
(synt-f @tel-clause) 
(tense present) 
(pers-ergative is) 
(pets-dative 2s) 
(pers-absol 3s)) 
(comp2 (POS noun) 
(subcat common) 
(number sing) 
(definite+) 
(synt-f @adverbial)) 
Figure 3. Morphosyntactic analysis of diotxudanarekin (Eng.: with what I am tellittg you) 
As a consequence of the rich naorphology of 
Basque we decided to control morphotactic 
phenomena, as much as possible, in the 
morphological segmentation phase. Alterna- 
tively, a model with minimal morphotactic 
treatment (Ritchie et al., 92) would produce 
too many possible analyses after segmentation, 
which should be reiected in a second phase. 
Therefore, we propose to separate sequential 
morphotactics (i.e., which sequences of 
morphemes can or cannot combine with each 
other to form valid words), which will be 
recognized by the two-level system by means 
of continuation classes, and non-sequential 
morphotactics like long-distance dependencies 
that will be controlled by the word-gmnunar. 
The general linguistic principles used to define 
unification equations in the word-grannnar 
rules are the following: 
1) Information risen from the lemma. The 
POS and semantic features are risen flom 
the lemnm. This principle is applied to 
common nouns, adjectives and adverbs. 
The lemma also gives the mnnber in 
proper nouns, pronouns and determiners 
(see Figure 2). 
2) lnfornmtion risen from case suffixes. 
Simple case suffixes provide information 
on declension case, number and syntactic 
function. For example, tile singular 
genitive case is given by the suffix -tell in 
ama+ren (Eng.: of the mother). For 
compound case suffixes the number and 
determination are taken from the first 
suffix and the case from the second one. 
First, both suffixes are joined and after 
that they are attached to the lemma. 
3) Noun ellipsis. When an ellipsis occurs, the 
POS of the whole word-form is expressed 
by a compound, which indicates both the 
presence of the ellipsis (always a noun) 
and the main POS of the word. 
For instance, the resulting POS is 
verb-noun_ellipsis when a noun- 
ellipsis occurs after a verb. All the 
information corresponding to both units, 
the explicit lemma and the elided one, is 
stored (see Figure 3). 
4) Subordination morl~hemes. When a 
subordination morpheme is attached to a 
verb, the verb POS and its featm'es are 
risen as well as the subordhmte relation 
and the syntactic fnnction conveyed by the 
naorpheme. 
5) Degree morphemes attached to adjectives, 
past participles and adverbs. The POS and 
diotsudan 
(diotsut + en) 
(POS verb) 
(tense present) 
(relator relative) 
/ \ / 
diotsut 
(POS verb) 
(tense present 
diotsudanarekin 
(diotsut + en -I 0 + arekin) 
(POS verb-noun_ellipsis) 
(case sociative) 
arekin 
(0 + arekin) 
(POS noun ellipsis) 
(case sociative) 
en 
(pos 
• . . 
o 
(POS ellipsis relation) 
arekin 
(case sociative) 
Figure 4. Parse tree for diotmuhmarekitl (Eng.: with what I am lellittg yott) 
main features arc taken from the lemma 
and the features corresponding to the 
degrees of comparison (comparative, 
supcrhttive) aft taken from the degree 
morphemes. 
6) l)efiwttion. 1)miwttion suffixes select tile 
POS of the base-form to create the deriw> 
tive anti in most cases to change its POS. 
For instance, the suffix -garri (Eng.: -able) 
is applied to verbs and the derived word is 
an adjective. When the derived form is 
obtained by means o1' a prefix, it does not 
change the POS of the base-form. In both 
cases the morphosyntactic rules add a new 
feature representing the structure of tile 
word as a derivative (root and affixes). 
7) Composition. At the moment, we only 
treat the most freqttent kind of 
composition (noun-noun). Since Basque is 
syntactically characterized as a right-head 
hmguage, the main information of the 
compound is taken from the second 
element. 
8) Order of application of the mofphosyn- 
tactic phenomena. When several morpho- 
syntactic phenomena are applied to the 
same lemllla, so as to eliminate 
nonsensical readings, the natural order to 
consider them in Basque is the following: 
lemmas, derbation prefixes, deriwltion 
suffixes, composition and inflection (see 
Figure 4). 
9) Morl)hotactic constraints. Elimination of 
illegal sequences of morphemes, such as 
those due to long-distance dependencies, 
which are difficult to restrict by means of 
conti.uation classes. 
The first and second principles are defined lo 
combine information of previously recognized 
mOrl~hemcs, but all the other principles arc 
related to both feature-combination and non- 
sequential moq~hotactics. 
3.3 Implementation 
We have chosen the PATR formalism 
(Shiebcr, 86) for the definition of the moqflm- 
syntactic rules. There were two main reasons 
for this choice: 
• The formalism is based o. unification. 
Unification is adequate for the treatment of 
complex phenomena (e.g., agreement of 
conslituents in case, tmmber and definite- 
hess) and complex linguistic structures. 
® Simplicity. The grammar is not linked to a 
linguistic theory, e.g. GPSG in (Ritchie et 
al., 92)• The fact that PATR is simpler than 
more sophisticated formalisms will allow 
that in @e future the grammar could be 
adapted to any of them. 
25 rules have been defined, distributed in the 
following way: 
• 11 rules for the merging of declension 
morphemes and their combination with the 
main categories, 
° 9 rules for the description of verbal 
subordination morphenles, 
® 2 general fulcs for derivation, 
• 1 rule for each of the following 
phenomeml: ellipsis, degree of COlnpavison 
of adjectives (comparative and SUl)erlative) 
and noun composition. 
3.4 Evaluation 
As a cousequence of the size of the lexical 
database and tile extensive treatment of 
nlorphosyntax, the resulting analyzer offers 
full coverage when applied to real texts, 
capable of treating unknown words and non- 
standard forms (dialectal wtriants and typical 
errors). 
We performed four experilnents to ewtluate 
tile efficiency of the implemented analyzer 
(see Table 1). A 10,832-word text was 
randomly selected from newspapers. We 
measured tile number of words per second 
analyzed by the morphosyntactic analyzer and 
also by the whole morphological analyzer 
(results taken on a Sun Ultra 10). Ill the first 
experiment all tile word-t'ornls were analyzed 
one-by-one; while ill tile other three experi- 
ments words with more than one occurrence 
were analyzed only once. Ill the last two 
experimeuts a memory with the analysis of tile 
most frequent word-forms (MFW) in Basque 
was used, so that only word-forms not found 
in the MFW were analyzed. 
Test 
description 
All 
word forms 
Diffcrent 
word forms 
MFW 
10,000 words 
(I 5 Mb) 
MFW 
50,000 words 
(75 mb) 
# words/scc 
analyzed Morphosynt. 
words analyzer 
10,832 
3,692 
1,483 
533 
15,13 
44 40 
111 95 
308 270 
words/see 
Full 
morphological 
analyzer 
13,5 
Table 1. Evaluation results. 
Even when our  is agglutinative, and 
its morphological phenomena need more 
computational resources to build complex and 
deep structures, the results prove tile feasibility 
of implementiug efficiently a fifll 
morphological analyzer, although efficiency 
was not the main concern of our 
implementation. The system is currently being 
applied to unrestricted texts in real-time 
applications. 
4 Related work 
(Koskeniemmi, 83) defined the formalism 
named two-level morphology. Its main 
contributiou was the treatment of 
morl)hographemics and morphotactics. The 
formalisnl has been stmcessfully applied to a 
wide wlriety ot' s. 
(Karttunen, 94) speeds the two-level model 
compiling two-level rules into lexical 
transducers, also increasing the expressiveness 
of the model 
The morphological analyzer created by 
(Ritchie et al., 92) does not adopt finite state 
mechanisms to control morphotactic 
phenomena. Their two-level implementation 
incorporates a straightforward morphotactics, 
reducing tile number of sublexicons to the 
indispensable (prefixes, lemmas and suffixes). 
This approximation would be highly 
inefficient for agglutinative s, as it 
would create lnany nonsensical interpretatiolas 
that should be rejected by tile unification 
phase. They use the word-grammar for both 
morphotactics and feature-conlbination. 
ill a similar way, (Trost, 90) make a proposal 
to combine two-level morphology and non- 
sequential morphotactics. 
The PC-Kimmo-V2 system (Antworth, 94) 
presents an architecture similar to ours applied 
to English, using a finite-state segmentation 
phase before applying a unification-based 
grammar. 
(Pr6szdky and Kis, 99) describe a morpho- 
syntactic analyzer for Hungarian, an agglu- 
tinative . The system clots not use the 
two-level model for segmentation, precom- 
piling suffix-sequences to improve efficiency. 
They claim the need of a word-grammar, 
giving a first outline of its design, although 
they do not describe it in detail. 
(Oflazer, 99) presents a different approach for 
the treatment of Turkish, an agglutinative 
, applying directly a dependency 
parsing scheme to morpheme groups, that is, 
merging morphosyntax and syntax. Although 
we are currently using a similar model to 
Basque, there are several applications that are 
word-based and need full morphological 
parsing of each word-t'orm, like the word- 
oriented Constraint Graminar formalism for 
disambiguation (Karlsson et aI., 95). 
Conclusion 
We propose a model for fllll morphological 
analysis iutegrating two different components. 
On tile one hand, the two-level formalism 
deals with morphographenfics and sequential 
morphotactics and, on the other hand, a 
unil\]cation-based word-grammar combines lhe 
granlll-iatical in\['ornlatioli defined in illoi'- 
phelllOS alld also handles COlllplcx illori)ho- 
tactics. 
Early application of sCqtloniial I/lOrl)hotactic 
conslraints dtu-ing the segmentation process 
avoids all excessive laUlllber of nleaningless 
segmentation possibilities before the 
coulputationally lllOlO expensive unification 
process. Unification permits lhe resohition of a 
wide variety of morl)hological phenonlena, 
like ellipsis, thal force the definition of: 
complex and deep structures Io roprosenl the 
output of the analyzer. 
This design allowed us io develop a full 
coverage allalyzor that processes efficiently 
unrestricted loxis in Basque, a strongly 
agglulinafive langttage. 
The anaiyzcl" has bccll integrated ill a gCllOl'al 
franlework for the l)lOCessing of l~asquc, with 
all the linguistic inodulos communicating by 
llleallS O\[: foattll'C stltlCllll'eS ill accord {o the 
principles of ihe Text Encoding Initiative. 
Acknowledgements 
This research was partially supported by the 
Basque Government, the University of the 
\]71aS(lUe Cotlntry {/lid the CICYq' (Cotllisidn 
lntcrministorial de Ciencia y Tecnologfil). 

References 

Aduriz 1., Aldczabal I., Ansa ()., Arlola X., I)faz de 
Ilarraza A., Insau.~li .I.M. (1998a) EI)BL: a 
Mttlli-l~ttrposed Lexica/ Sttl)l)c;rl .lot the 
Treatment of Ba,s'que. Proceedings of the l;irst 
Inlernational Confcncncc on l Auiguagc Resources 
and Ewduation, Granada. 

Aduriz I., Agirre E., Aldczabal 1., Alegria 1., Ansa 
O., Arrcgi X., Arriola J.M., ArtolaX., I)faz de 
lhu'raza A., Ezciza N., Gqicnola K., Maritxahu" 
A., Maritxalar M., Oronoz M., Sarasola K., 
Soroa A., Urizar R., Urkia M. (1998b) A 
Framework .for the Automatic Pmce.vsi#~g (if" 
Basqtte. Proceedings o1 the First Ii~ternational 
Con \[elelite on Lall.gtlagc Resources turf 
Evaluation, Granada. 

Aduriz I., Alegria I., Arriola J.M., Artola X., Diaz 
de Ilarraza A., Eceiza N., Gojenola K., 
Maritxalar M. (1995) Different Issues in the 
Design of a lemmatizer/Tagger for Basque. From 
Texts to Tags: Issues in Mullilingual Language 
Analysis. ACL SIGIDAT Workshop, Dublin. 

Alcgria 1., Art(Ha X., Sarasoht K., Urkia M. (1996) 
Automatic moqdzological analysis of Basque. 
IAtcrary and IAnguistic Computing, 11 (4): 193- 
203. Oxford University. 

Aniworlh E. I.. (1994) Morphological Par, ffng with 
a lhl(fication-ba,s'ed Word Grcmmutr. Norlh 
Te, xas Natural l~anguage Processing Workshop, 
Texas. 

Arlola X., Dfaz de \]larraza A., Ezciza N., Oo.icnohi 
K., Marilxahu' A., Soma A. (2000) A proposal 
for the integration of NLP tools using SGML- 
lagged documeHls. Proceedings of ll~e Second 
Cotfforence or1 Language Resources and 
Evaltmfion (IA~,EC 2000). Athens, Greece 2000. 

Bcesl%, K. (1998)AraDic Morphological Analysis 
(m the lnlernet, l'rocccdings of the International 
Conference on Mulii-IAngual Computing (Arabic 
& lhlglish), Cambridge. 

Hudson R. (1990) English Word Grammmar. 
Oxford: Basil Blackwcll. 

ldc N., Vcronis J. K. (1995) Text-Ettcoding hHtia- 
tire, Bac:kgmtmd and Context. Kluwcr Academic 
Publishers. 

Karlsson F., Voulilaincn A., Heikkiht J., Anltila A. 
(1995) Constrai, t Gnmmmr: A lxm,¢tmge- 
i#ldcpcndent System Jor Pm:ffng Um'estricled 
Text, Mouton do Gruyicr ed.. 

Karttunen L. (1994) Constructing lexical 
Transducers. Proc. of COLING '94, 406-411. 

Koskcnniemi, K, (1983) Two-level Mc;qdlo\[ogy: A 
ge,eral Comptttational Model ./br Word-Form 
Recognition and Pmduclioth University of 
Ilclsinki, l)clmrtmcnt of General IAnguisiics. 
l~ublications n" 11. 

Oflazcr K (1999) Dependency Parsing with an
Extended Finite State Approach. ACL'99, 
Maryland. 

Proszeky G., Kis B (1999) A Unification-based 
Approach to Morpho-syntactic Parsing of 
Agglutinative and Other (Highly) lnflectional 
Languages. ACL 99, Ma,yhmd. 

Ritchie G., Pulhnan S. G., FJlack A. W., Russcl G. 
J. (1992) Computational Morphology Practical 
Mechanisms for the English Lexison ACL-MIT 
Series on Natural Language Processing, MIT 
Press. 

Shicbcr S. M. (1986) At/ lntroductiotz to 
Unification-Based Approaches to Grammar. 
CSLI, Slanford. 

Sproat R. (1992) Morphology and Computation. 
ACL-MIT Press series in Natural Language 
Processing. 

Trost H. (1990) The application of two-level 
morphology to non-concatenative German 
morphology. COLING'90, Hclsinki. 
