Incremental Construction of a Lexical Transducer for 
Korean 
Hyuk-Chul Kwon*, Lauri Karttunen 
Dept. of Computer Science, Pusan National Univ., Pusan, 609-735, South Korea*
Xerox PARC, 3333 Coyote Hill Road, Palo Alto, CA 94304
ABSTRACT 
The paper describes the construction of a lexical transducer
for Korean that can be used for stemming and generation.
The method contains two innovations: (1) two-level rules as
well-formedness constraints in the initial phase; (2) the
combination of intersection and composition of rule
transducers in a deep cascade for the final result.
Keywords 
Korean Lexical Transducer, Two-level Morphology,
Morphotactics, Ordered Rules
1 Introduction 
This paper presents an incremental construction method of a
lexical transducer (LT) for Korean. A lexical transducer,
first described by Karttunen, Kaplan, and Zaenen (KKZ)
(Karttunen, 1992a), is a specialized finite-state transducer
(FST) that maps canonical citation forms of words and
morphological categories to inflected surface forms, and
vice versa. LTs have many advantages for stemming,
morphological analysis, and generation. They are
(i) bidirectional: the same structure can be used for
stemming and generation.
(ii) efficient: the recognition and generation of word
forms does not require the application of any
morphological rules at runtime.
LTs for English and French have been built at Xerox PARC
within a framework known as two-level morphology
(Koskenniemi, 1983). As described by KKZ (Karttunen, 1992a),
this can be done in three steps: (i) we construct a simple
finite-state automaton (LA) that defines all valid lexical
forms (LFs) of the language. An LF is a concatenation of
stems and morphemes in their canonical dictionary
representation. (ii) We describe morphological alternations
by means of two-level rules (Koskenniemi, 1983; Karttunen,
1993), compile the rules to finite-state transducers, and
intersect them to form a single rule transducer (RT).
(iii) We merge the LA with the RT by composition, producing
the LT that has on its lexical side every valid lexical form
of the language and on the surface side the corresponding
realization as determined by the morphological alternations
of the language.

¹ This paper was partially supported by Korean Science and
Engineering Foundation.
KKZ argued that for French, it was best to divide step (ii)
into two stages. A three-level description was required to
give a linguistically satisfactory account of the plural
formation of compound nouns. KKZ opted for two cascading
two-level rule systems that are compiled separately, then
intersected laterally and finally composed to a single RT.
The task of building a morphological analyzer for a language
such as Korean or Japanese is a much greater challenge than
it is for English and French. A Korean verb may have more
than fifty thousand inflected forms.² The Korean writing
system (Hangul) does not consistently distinguish between
single and compound nouns. Because Hangul uses syllabic
characters, changes in syllable structure are directly
reflected in the orthography.
Because of the complexity of the morphological alternations
in Korean, it is very difficult, although not impossible in
principle, to describe them in a single two-level rule
system or in a system that is limited to just three levels
like the KKZ system for French. The most natural description
of the Korean alternation is a cascade of rules of greater
depth.
2 Morphological Alternations 
in Korean 
Hangul is a phonemic, syllable-based script where
morphological alternations that change the syllable
structure of the word are reflected in the orthography
(Korean Ministry of Education, 1988; Kim, 1990). This paper
uses the so-called Yale system for representing Hangul in a
Romanized form, except that we use wue and oa instead of we
and wa of the Yale system because we and wa do not show that
they are diphthongs, composed of wu and e and of o and a
respectively.

² An "e-jel" (word), which is a spacing unit of Hangul, can
consist of a verb stem, several endings and postpositions.
The AI Lab of Dept. of Computer Science, Pusan National
Univ. has more than 50,000 "e-jel" generated from
"mek-ta (eat)".
Examples (1) and (2) involve three morphophonological
alternations: (i) the realization of a stem-final p in
irregular predicates as a vowel in front of vowel-initial
suffixes; (ii) left-to-right vowel harmony based on
partitioning of vowels into 'light' ([+light]: a, o, oa),
'dark' and 'neutral' ([-light]); (iii) the realization of a
morpheme boundary as a syllable boundary or as nothing.
A syllable boundary is introduced before the last consonant
of irregular-p verbs/adjectives when a vowel-initial suffix
follows, and the -p itself is realized as o if the preceding
vowel is [+light], otherwise wu by vowel harmony. Only some
of the predicates ending in -p are irregular. In verbs that
end in a vowel such as cwu 'to give', the vowel may merge
with a suffix-initial vowel to form a diphthong or it may
retain its syllabic status in a two-vowel sequence.
We use "+" in the lexical representation to mark morpheme
boundaries, "-" to mark syllable boundaries, "0" to
represent deletion (surface side) and epenthesis (lexical
side), and two diacritic markers, {pVerb} for an irregular
-p verb and {rVerb} for a regular verb, to represent classes
of verbal stems.
(1) (a) c wu 0 p {pVerb} + a/e - s e   (cwup {pVerb} + a/e - se)
    (b) c wu - wu 0 0 e - s e   (cwu-wue-se: to pick up)
(2) (a) c wu {rVerb} + a/e - s e   (cwu {rVerb} + a/e - se)
    (b) c wu 0 - e - s e   (cwu-e-se: to give)
    (c) c wu 0 0 e - s e   (cwue-se: to give)
The (a) parts of both (1) and (2) are lexical forms and (b)
and (c) are the corresponding surface words. Because cwup is
an irregular-p verb, the following phoneme a/e is a vowel
and the preceding syllable wu is [-light], p in (1)(a) is
realized as wu. The a/e is realized as e because the
preceding surface vowel wu is [-light]. At the same time, wu
and e are contracted into a diphthong wue, which is
described as the deletion of "+" in (a) of (1). These two
changes are linked in that one must not be allowed to happen
without the other. Otherwise cwu-wu-e-se and cwu-wue-se
would be generated, but only cwu-wue-se is grammatical. On
the other hand, in the case of the regular verb cwu, both
cwu-e-se and the contracted variant cwue-se are acceptable.
These rules can be described easily by two-level morphology
as follows.
(3) (i) A syllable boundary ("-") is introduced before a
        stem-final p in irregular -p verbs/adjectives when a
        vowel-initial suffix follows.
    (ii) A stem-final p in irregular -p verbs/adjectives is
        realized as o if the preceding vowel is [+light],
        otherwise wu.
    (iii) a/e is realized as a if the preceding vowel is
        [+light], otherwise e.
    (iv) (a) The morpheme boundary following irregular-p
             verbs/adjectives is deleted before a
             vowel-initial suffix and realized as a syllable
             boundary elsewhere.
         (b) The morpheme boundary in regular
             verbs/adjectives can be deleted or realized as
             a syllable boundary depending on context.
With the help of the Xerox two-level rule compiler ('twolc')
(Karttunen, 1992b) the rules can be compiled to finite-state
transducers and intersected to a single transducer.
Describing such phenomena as parallel rules may be
complicated because each rule may be a formulation of
effects caused by several phonological rules. For example,
in formalizing (ii) as a two-level rule we must take into
account both the irregular conjugation of -p
verbs/adjectives and vowel harmony. This is not a desirable
state of affairs. We will come back to this point later.
3 Construction of a Korean 
Lexical Transducer (LT)
The first step in the construction of a lexical transducer
is to create a simple finite-state automaton for all valid
lexical forms of Korean. The lexical automaton (LA) is
composed with the first set of rule transducers (RT). The
resulting transducer has on its "upper" side the valid
lexical forms, and on the "lower" side, intermediate
representations derived by the first set of rules. This
intermediate transducer is composed with the second set of
rule transducers and the process is iterated several times.
At each stage in the process, the lexical side remains
unchanged and the intermediate representations are changed
by the new set of rules. The final result is a transducer
that associates the valid lexical forms with their proper
surface realizations. Conceptually this is similar to what
happens in a traditional phonological derivation. However,
note that rules apply to the lexicon as a whole rather than
to individual words and the result of each application is a
new transducer. Because the intermediate levels disappear in
the composition, the resulting LT is equally well suited for
morphological analysis as it is for generation.
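The staged composition described above can be pictured with a small Python sketch. It is only an illustration under strong simplifying assumptions: a finite set of (upper, lower) string pairs stands in for a transducer, each rule set is a plain function from one lower-side form to its possible outputs, and the two toy stages and the abstract symbols x, y, z, A, B are hypothetical. None of this reflects the actual twolc or ifsm interfaces.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Relation = Set[Tuple[str, str]]         # (lexical form, current lower-side form)
Stage = Callable[[str], Iterable[str]]  # one rule set: lower form -> new lower forms


def identity_relation(lexicon: Iterable[str]) -> Relation:
    """Start with every lexical form paired with itself."""
    return {(form, form) for form in lexicon}


def compose(relation: Relation, stage: Stage) -> Relation:
    """Keep the upper side; replace each lower side by the stage's outputs."""
    return {(upper, new) for upper, lower in relation for new in stage(lower)}


def stage_one(form: str) -> List[str]:
    # Hypothetical rule set 1: rewrite the abstract symbol A as B.
    return [form.replace("A", "B")]


def stage_two(form: str) -> List[str]:
    # Hypothetical rule set 2: delete the morpheme boundary "+".
    return [form.replace("+", "")]


def build(lexicon: Iterable[str], stages: Iterable[Stage]) -> Relation:
    """Apply every stage to the lexicon as a whole, one stage after another."""
    relation = identity_relation(lexicon)
    for stage in stages:
        relation = compose(relation, stage)
    return relation


def index(relation: Relation) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build lookup tables for both directions of the same relation."""
    generate: Dict[str, Set[str]] = {}
    analyze: Dict[str, Set[str]] = {}
    for upper, lower in relation:
        generate.setdefault(upper, set()).add(lower)
        analyze.setdefault(lower, set()).add(upper)
    return generate, analyze


LT = build({"xA+y", "z+y"}, [stage_one, stage_two])
GENERATE, ANALYZE = index(LT)
```

The intermediate form xB+y never appears in LT, which mirrors the remark that the intermediate levels disappear in the composition, and the same relation serves both generation and analysis.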
The compilation and intersection of rule transducers was
done with the twolc compiler; the construction of the LA and
the compositions were carried out with the Xerox interactive
finite-state calculus ('ifsm').
3.1 Construction of Lexical Automaton (LA)
The ifsm-utility enabled us to assemble the LA
incrementally. The first step was to divide the total list
of morphemes into sublexicons on the basis of their
morphological type and to make a text file for each
sublexicon. We added diacritic markers to the edges of
certain types of morphemes in order to be able to enforce
morphotactic constraints on valid morpheme sequences.
Each sublexicon was compiled separately to a finite-state
automaton. The sublexicons were used to construct the LA
with the help of the regular expression facility in the
ifsm-toolkit. For example, having compiled a simple
automaton from the list of simple nouns, we could expand it
to an infinite lexicon of compound nouns with the regular
expression
"noun.auto" [# "noun.auto"]*
This regular expression reads the noun automaton from a file
and concatenates it with itself any number of times and
marks the internal word boundaries with #.
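As a rough analogue of this expression, the following Python sketch builds an ordinary regular expression of the same shape, NOUN (# NOUN)*, from a noun list. The function names and the three sample entries are hypothetical placeholders; the standard `re` module stands in for the ifsm regular expression facility, which it does not reproduce.

```python
import re
from typing import Iterable, Pattern


def compound_noun_pattern(nouns: Iterable[str]) -> Pattern[str]:
    """Compile NOUN (# NOUN)* where NOUN is any entry of the noun list."""
    noun = "(?:" + "|".join(re.escape(n) for n in nouns) + ")"
    return re.compile(noun + "(?:#" + noun + ")*")


# Hypothetical sample entries standing in for the simple-noun sublexicon.
PATTERN = compound_noun_pattern(["san", "mul", "kil"])


def is_noun_or_compound(form: str) -> bool:
    """True if the whole form is one noun or nouns joined by '#'."""
    return PATTERN.fullmatch(form) is not None
```

Because of the Kleene star, the pattern accepts compounds of any length, which corresponds to the infinite lexicon of compound nouns mentioned above.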
The first version of the LA was made in this way by 
combining sublexicons with regular operations (con- 
catenation, union, iteration). 
In order to enforce morphotactic constraints on the
concatenation of some classes of suffixes, we wrote a set of
two-level rules that require or prohibit the occurrence of
particular diacritics at certain suffix boundaries. Lexical
forms that do not satisfy the morphotactic constraints get
eliminated in the composition with the well-formedness
rules. The diacritics themselves are realized as zero so
that they are not present in the lower side of the resulting
transducer. The final form of the lexical automaton is
obtained by extracting the lower side from that transducer
as a simple automaton.
We believe that this incremental method of lexicon
construction is better suited to morphologically complex
languages than the lexicon format commonly used in two-level
morphology. In standard two-level lexicons, individual
entries contain information about which sublexicon they may
concatenate with. The entire lexical structure is compiled
in one step to a large letter tree (Karttunen, 1993;
Antworth, 1990). Our method is more tractable in two ways.
Firstly, the lexicon can be developed and refined stepwise.
Secondly, the morphotactic rules of the language are
described explicitly as the regular expressions that
construct the LA in conjunction with the well-formedness
constraints that eliminate certain types of concatenations.
In two-level lexicons of the standard variety, the
morphotactic structure of the language is not described
explicitly at all. Rather, it is expressed in a very opaque
and indirect way, in the sequences of links between entries
and sublexicons.
Sproat argued that the two-level treatment of morphotactics
leads to a somewhat inelegant model of long-distance
dependencies and suggested the unification scheme, due to
Bear, as a solution (Sproat, 1992). But the unification
scheme introduces additional runtime overhead. The above
approach can easily and explicitly describe the fact that
"-able" attaches to verbs formed with the prefix "en-" and
does not require additional runtime overhead.
We give a few examples of the difficulties in the
description of Korean morphotactics. There are two different
types of endings: (i) non-final (verbal) endings for tense,
modality, subject honorific or aspect, and (ii) final
(verbal) endings as complementizer, nominalizer and
adjectivizer. The non-final endings are placed in front of
final endings and must be followed by a suffix of the second
type.
(4) shows the ordering restrictions of non-final endings.
The parentheses indicate optionality.
(4) (+ Hon) (+ Past + Perf (+ Will)
    | (+ Past) (+ Will) (+ Retro))
(Hon: Honorific; Retro: Retrospection;
Perf: Perfect Aspect)
(4) compiles to a lexicon covering 20 different compound
non-final ending sequences including null. This
representation is clearly more informative than a simple
listing of the members of the class. The prohibition of
"Past+Perf+Will+Retro" in (4) cannot be described by an
adjacency table.
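The count of 20 can be checked by expanding the pattern mechanically. The sketch below assumes the following reading of (4): an optional Hon, followed either by Past Perf with an optional Will, or by any combination of optional Past, Will and Retro in that order. The helper names are hypothetical, and tuples of tag names stand in for the compiled sublexicon.

```python
from itertools import product
from typing import List, Set, Tuple

Sequence = Tuple[str, ...]


def optional(tag: str) -> List[Sequence]:
    """An optional element: either absent or present once."""
    return [(), (tag,)]


def expand_non_final_endings() -> Set[Sequence]:
    """Enumerate every ending sequence licensed by pattern (4)."""
    # First alternative: + Past + Perf (+ Will)
    with_perf = [("Past", "Perf") + will for will in optional("Will")]
    # Second alternative: (+ Past) (+ Will) (+ Retro)
    without_perf = [
        past + will + retro
        for past, will, retro in product(
            optional("Past"), optional("Will"), optional("Retro")
        )
    ]
    tails = with_perf + without_perf
    # Optional honorific in front of either alternative.
    return {hon + tail for hon in optional("Hon") for tail in tails}


SEQUENCES = expand_non_final_endings()
```

Under this reading the set has 2 x (2 + 8) = 20 members, the empty sequence is among them, and Past+Perf+Will+Retro is not, in agreement with the text.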
In (4) we do not need any morphotactic diacritics on the
left, because all non-final endings can combine with any
verb and adjective stems and the combination of non-final
and final endings is controlled by the diacritics of the
latter group.
(5) shows three entries in the sublexicon of final endings.
The elements in square brackets are morphotactic diacritics.
(Square brackets indicate grouping, the vertical bar marks a
disjunction.) The diacritics are deleted by well-formedness
rules when the final endings are combined with other
morphemes. The diacritics on the left of nun and nun-ka show
that they cannot combine with adjectives.
(5) [Verb | Adj | Hon | Past | Will | Perf] + ta {Dec} ;
    [Verb | Hon] + nun {Con} ;
    [Verb | Hon | Past | Will] + nun-ka {Que}
({Con}: Conjunction; {Dec}: Declarative; {Que}: Question
marking; ";": the end of declaration, the meaning is the
same as "|")
The diacritic markers {Dec}, {Que} and {Con} have two roles:
as the feature of the morphemes and as the right-hand
context. They remain in the final LA because they are the
feature of each morpheme.
By concatenating the subnetworks of compound non-final
endings and final endings, we get a sublexicon of ending
sequences as shown in (6). The [Verb | Adj] diacritics
indicate that non-final endings can combine with any verb
stems and adjective stems.
(6) ([Verb|Adj] "compound non-final ending.auto" +)
    "final ending.auto"
This concatenation produces an initial lexicon of 97498
(2*20*2378 + 2378) different sequences, where 20 is the
number of compound non-final endings and 2378 is the number
of sequences of final endings with their morphotactic
diacritics. This set is reduced to 7888 by a set of
well-formedness rules that eliminate unwanted sequences and
delete the morphotactic diacritics. The composition of the
initial lexicon with the well-formedness rules produces a
transducer from which the lower side is extracted as a
simple automaton and used in the construction of the final
LA.
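The size of the initial lexicon and the filtering effect of the diacritics can be sketched in Python as follows. The sketch rests on an assumption that is not stated in this form in the text: that the left diacritics of a final ending must include the category of the element immediately to its left (the last non-final ending, or the stem class if there is none). The diacritic sets are those of the three entries in (5), keyed by their feature markers; all function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def initial_lexicon_size(stem_classes: int = 2,
                         non_final: int = 20,
                         final: int = 2378) -> int:
    """Count sequences of (6): an optional prefix group plus a final ending."""
    with_prefix = stem_classes * non_final * final
    without_prefix = final
    return with_prefix + without_prefix


# Left diacritics of the three final-ending entries shown in (5).
LEFT_DIACRITICS: Dict[str, Set[str]] = {
    "Dec": {"Verb", "Adj", "Hon", "Past", "Will", "Perf"},
    "Con": {"Verb", "Hon"},
    "Que": {"Verb", "Hon", "Past", "Will"},
}


def well_formed(stem_class: str,
                non_final_endings: Tuple[str, ...],
                final_feature: str) -> bool:
    """Assumed check: the element just before the final ending must be licensed."""
    left = non_final_endings[-1] if non_final_endings else stem_class
    return left in LEFT_DIACRITICS[final_feature]
```

In the construction described above, this kind of filtering is performed by composing the initial lexicon with well-formedness rules rather than by a procedural check; the sketch only shows which sequences such rules would have to remove.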
Allowing nouns to freely compound with nouns creates a
problem because it gives rise to many unacceptable or
unlikely compounds. For example, the form in (7) has five
alternate analyses:
(7) [Example not recoverable from the source: the form is
    followed by five analyses, (a) through (e), of which (b)
    through (e) are starred as unacceptable; the glosses
    include "subject marker".]
Our solution was to constrain compounding with a
well-formedness rule that excludes compounds with
monosyllabic nouns (Kwon, 1990). The complexity of the
morphological alternations in Korean is so high that we need
an easy way to give constraints incrementally. Our approach
is a consistent and explicit way of describing morphotactic
rules including long-distance dependencies.
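The compounding constraint can be pictured with a short Python sketch. It assumes that an analysis is written with "#" between member nouns and "-" between syllables, as in the notation used in this paper, and it reads the constraint as rejecting any compound that has a monosyllabic member. The function names and the placeholder nouns in the usage note are hypothetical.

```python
def syllable_count(noun: str) -> int:
    """Count syllables of a noun whose syllables are separated by '-'."""
    return noun.count("-") + 1


def allowed_compound(analysis: str) -> bool:
    """Accept simple nouns; reject compounds with a monosyllabic member."""
    nouns = analysis.split("#")
    if len(nouns) == 1:
        return True  # a simple noun is not a compound, so it is unaffected
    return all(syllable_count(noun) > 1 for noun in nouns)
```

With placeholder nouns, allowed_compound("ka-na#ta-la") holds while allowed_compound("ka-na#ta") does not, because the second member has a single syllable.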
3.2 Composition of Lexical Automaton with Rule Transducers
After constructing the Korean LA, we derive from it a
lexical transducer by composing the LA with rule transducers
(RTs) in several stages. At each stage the previous result
is composed with an RT derived by intersection from several
two-level rules. The rule sets include (i) morpheme
generation rules, (ii) rules for irregular
verbs/adjectives, (iii) deletion rules, (iv) vowel harmony
rules and (v) contraction rules. Morpheme generation rules
give a surface realization to morphological tags, such as
Past, Hon(orific), etc. Rules for irregular verbs deal with
final consonants and syllabification. Deletion rules
eliminate one of two adjacent vowels on morpheme boundaries.
Vowel harmony rules realize the harmonizing archiphonemes WU
as o or wu and E as a or e, depending on the quality of the
preceding vowel. Contraction rules involve the merging of
adjacent vowels to a diphthong or a single vowel as a result
of the loss of the intervening syllable boundary.
Although it is possible in principle to write just one
two-level rule system that describes all the alternations in
parallel, it is very difficult in practice to create a rule
system with that degree of complexity. The complexity arises
from the fact that the formulation of every rule in a
two-level system depends on every rule that has some effect
on the context of the rule that we are trying to express.
For example, if there is a rule that forces X to be deleted
in front of a Y and another rule that introduces Z between X
and Y, great care must be exercised by the rule writer to
make sure that both rules are specified in a way that leaves
room for the other rule to have its effect but does not
depend on it if the deletion of X and the insertion of Z are
two independent alternations.
The partitioning of rules into sets and the interleaving of
intersection and composition greatly simplifies the task of
creating and updating the rule system. Rules that apply in
different environments and do not affect each other can be
compiled and intersected easily, whereas rules that involve
alternations in overlapping contexts are most easily handled
by placing them in different levels in the cascade. In
effect, the rules are partially ordered. Sproat also noticed
that rule interactions which may be easy to state in terms
of ordered rules are often much more difficult to state in
one two-level rule system (Sproat, 1992).
For Korean, the partitioning of the rules for morphological
alternations into the five sets described above appears to
be the optimal choice. Each of the rules in the Hangul
standard orthography published in March of 1988 (Korean
Ministry of Education, 1988) is described in the
corresponding two-level rule separately in our
implementation. The order of rules takes the role of rule
interactions. In this cascade, the alternations described in
section 2 as example (3) are split between three levels:
(8) (i) Rules for irregular predicates:
        A syllable boundary is introduced before the
        stem-final p in irregular-p verbs/adjectives when a
        vowel-initial suffix follows. The following morpheme
        boundary is deleted and p is realized as the
        harmonizing archiphoneme WU.
    (ii) Vowel harmony rules:
        (a) The WU is realized as o if the vowel of the
            preceding surface syllable is [+light],
            otherwise wu.
        (b) The E is realized as a if the vowel of the
            preceding surface syllable is [+light],
            otherwise e.
    (iii) Contraction rules:
        The morpheme boundary "+" can optionally be deleted
        between wu and e.
The effect of these rules with respect to the irregular -p
verb cwup 'to pick up' is shown in (9).
(9) (a) c wu 0 p {pVerb} + a/e - s e
    (b) c wu - WU 0 0 E - s e
    (c) c wu - wu 0 0 e - s e
The intermediate level, (b), is eliminated in the cascade,
thus the final lexical transducer maps (a) directly to (c).
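The three levels in (8) can be traced with a small Python sketch that rewrites symbol sequences stage by stage. It is a procedural approximation, not the compiled transducers: the zero padding of (9) is omitted, the lexical a/e is written directly as the archiphoneme E, only a small subset of vowels is listed, and the removal of remaining diacritics and the realization of a surviving "+" as "-" in the last stage are assumptions based on the rules in (3). All names are hypothetical.

```python
from typing import List, Set, Tuple

Form = Tuple[str, ...]

LIGHT = {"a", "o", "oa"}                           # [+light] vowels from the text
SURFACE_VOWELS = {"a", "e", "o", "wu", "oa", "i"}  # small illustrative subset
VOWEL_SYMBOLS = SURFACE_VOWELS | {"WU", "E"}


def irregular_stage(form: Form) -> List[Form]:
    """(8)(i): p {pVerb} + before a vowel becomes '-' WU (boundary deleted)."""
    out = list(form)
    for i in range(len(out) - 3):
        if (out[i] == "p" and out[i + 1] == "{pVerb}"
                and out[i + 2] == "+" and out[i + 3] in VOWEL_SYMBOLS):
            out[i:i + 3] = ["-", "WU"]
            break
    return [tuple(out)]


def harmony_stage(form: Form) -> List[Form]:
    """(8)(ii): realize WU and E according to the preceding surface vowel."""
    out: List[str] = []
    last_vowel = None
    for sym in form:
        if sym == "WU":
            sym = "o" if last_vowel in LIGHT else "wu"
        elif sym == "E":
            sym = "a" if last_vowel in LIGHT else "e"
        if sym in SURFACE_VOWELS:
            last_vowel = sym
        out.append(sym)
    return [tuple(out)]


def contraction_stage(form: Form) -> List[Form]:
    """(8)(iii): '+' between wu and e is optionally deleted, else becomes '-'."""
    syms = [s for s in form if not s.startswith("{")]  # assumption: drop diacritics
    results: List[Form] = [()]
    for i, sym in enumerate(syms):
        if sym != "+":
            results = [r + (sym,) for r in results]
            continue
        between = 0 < i < len(syms) - 1 and syms[i - 1] == "wu" and syms[i + 1] == "e"
        realized = [r + ("-",) for r in results]
        if between:
            realized += results  # the optional deletion keeps the bare prefix
        results = realized
    return results


def run_cascade(lexical: List[str]) -> Set[str]:
    """Feed each stage's outputs to the next and return surface strings."""
    forms: List[Form] = [tuple(lexical)]
    for stage in (irregular_stage, harmony_stage, contraction_stage):
        forms = [new for f in forms for new in stage(f)]
    return {"".join(f) for f in forms}


IRREGULAR = run_cascade("c wu p {pVerb} + E - s e".split())
REGULAR = run_cascade("c wu {rVerb} + E - s e".split())
```

Under these assumptions the irregular verb yields only cwu-wue-se, while the regular verb yields both cwu-e-se and cwue-se, matching examples (1) and (2).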
4 Conclusion 
The success of our work on Korean further underscores the
point made by KKZ (Karttunen, 1992a) that the most salient
property of two-level morphology is not the number of levels
but the fact that two-level rules describe regular relations
(just like classical phonological rewrite rules) (Kaplan,
1988; Ritchie, 1992). Consequently, it is possible to
combine sets of parallel two-level rules by intersection and
merge them with the lexicon and other rule systems in a
cascade of compositions. The complexities of Korean
morphology make it desirable both for linguistic and
computational reasons to allow for many more intermediate
levels than assumed in previous works on English and French.
Regardless of the number of intervening levels, the outcome
is a single lexical transducer that directly maps lexical
forms to their inflected surface realizations, and vice
versa.
In the construction of the lexical automaton for Korean, we
have put two-level rules to a novel use as well-formedness
constraints on lexical forms. The sublexicons from which the
LA is constructed contain diacritic marks on the outer edge
that identify the type of morphological constituents that
the lexicon contains. The role of rules in the LA
constructions is to enforce morphotactics and, at the same
time, to eliminate the diacritics that encode them.
Theoretically, we can get the same LT either by composing
the morphotactic and phonological rules all together into
one rule and composing it with the initial LA, or by
composing the initial LA with each of the morphotactic and
phonological rules one by one in order. Practically, the
composition of all the morphotactic and phonological rules
into one rule causes a combinatorial explosion of states.
This shows that ordered rules can be used to avoid the
combinatorial explosion of states in one two-level rule
system too.
References 
[1] Antworth, Evan L. (1990) PC-KIMMO: a two-level
processor for morphological analysis. Occasional
Publications in Academic Computing, No. 16, Summer
Institute of Linguistics, Dallas, Texas. 1990.
[2] Kaplan, R. M. (1988) "Regular models of phonological
rule systems". Alvey Workshop on Parsing and Pattern
Recognition. Oxford University, April, 1988.
[3] Korean Ministry of Education. (1988) Hangul Standard
Orthography (Revised in 1988), Document number 88-1,
Published in March 1988.
[4] Karttunen, Lauri, Kaplan, Ronald M., and Zaenen,
Annie. (1992a) "Two-Level Morphology with Composition".
Coling-92. Proceedings of the fifteenth International
Conference on Computational Linguistics. Volume I.
pp. 141-148. 1992.
[5] Karttunen, Lauri and Beesley, Kenneth R. (1992b)
Two-Level Rule Compiler. Technical Report. Xerox Palo Alto
Research Center. ISTL-92-2. October 1992. [P92 000149].
Palo Alto, California. 1992.
[6] Karttunen, Lauri. (1993) "Finite-State Constraints". To
appear in The Last Phonological Rule, John Goldsmith, ed.
Chicago University Press. Chicago. 1993.
[7] Kim, C. (1990) The Explanation of New Hangul Standard
Orthography, Kul-Sup Press. Seoul. 1990.
[8] Koskenniemi, K. (1983) Two-level Morphology. A General
Computational Model for Word-Form Recognition and
Production. Department of General Linguistics. University
of Helsinki. 1983.
[9] Kwon, H. and Chae, Y. (1991) "A Dictionary-Based
Morphological Analysis". Proceedings of the Natural
Language Processing Pacific Rim Symposium '91,
pp. 178-185, 1991.
[10] Ritchie, Graeme D. (1992) "Languages Generated by
Two-level Morphological Rules". Computational Linguistics,
No. 18, Volume 1, pp. 41-59. March 1992.
[11] Sproat, R. (1992) Morphology and Computation, MIT
Press, 1992.
