<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1087"> <Title>APPENDIX: Running Examples Lexicon Verbal Lexicon Ending Lexicon Postposition Lexicon Others</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> The two-level morphology model (Koskenniemi, 1983; Antworth, 1990; Barton, 1986; Ritchie, 1991; Sproat, 1992) is a well-known computational model of morphology that combines adaptability with simplicity. In practice, this model has been successfully applied to several languages, including Finnish, English, Japanese, Russian, and French. However, the two-level model has been considered inappropriate for Korean (Kang, 1992; Kwon, 1991). That is, two-level morphological analysis of Korean is believed to be difficult and infeasible because the complex conjugation (inflection) and agglutination in word formation, together with the syllable-based representation of words, may lead to a huge number of two-level morphological rules. In this paper, we show that the two-level model can be successfully applied to Korean and that its rule set is limited to only 52 rules.</Paragraph> <Paragraph position="1"> This paper presents a successful two-level system for Korean morphological analysis. The system is based on the shareware PC-KIMMO (Antworth, 1990), with three additions: we extended the I/O component of PC-KIMMO to handle the Korean alphabet, HANGUL; we constructed a Korean dictionary and a Korean morphological grammar (i.e., morphotactics and spelling rules) for PC-KIMMO; and we used the shareware KGEN (Miles, 1991) to translate the linguistic spelling rules into executable automata (i.e., finite state transducers (FSTs)). 
This paper focuses on the dictionary and the morphological grammar for Korean.</Paragraph> </Section> <Section position="3" start_page="0" end_page="536" type="metho"> <SectionTitle> TWO-LEVEL REPRESENTATION OF KOREAN WORDS </SectionTitle> <Paragraph position="0"> The two-level model is concerned with a direct mapping between two representations of a word: (1) the surface form (SF) as it appears in the text, and (2) the lexical form (LF), which is represented as a sequence of basic morphs and diacritics (e.g., '+' to mark a morpheme</Paragraph> <Paragraph position="1"> boundary and '~' for a word boundary). As a result, an input word in the two-level model is analyzed by mapping the word itself (SF) to a sequence of lexical forms in the dictionary without intermediate stages.</Paragraph> <Paragraph position="2"> In this section, we present a two-level representation of Korean words.</Paragraph> <Paragraph position="3"> To understand the two-level description of Korean morphology, one should be familiar with the Korean alphabet and its transcription system, so we describe them first. In the ordinary writing system, the Korean alphabet consists of 40 letters: 10 pure vowels, 11 compound vowels, 14 basic consonants, and 5 double consonants. 
A Korean word is represented as a sequence of syllables, and a syllable can be made up of a consonant, a vowel, and a consonant. There are several</Paragraph> <Paragraph position="4"> forms of syllables (e.g., CV, CVC, VC, V, and C forms), and an initial consonant letter may not be distinguished from a final consonant letter. However, the initial consonant and the final consonant must be distinguished from each other for a successful two-level system. Table 1 (transcription system for the Korean alphabet; fragment): vowel rows &quot;ya y0 yo yu yC ye wo we wa wE iy&quot; and &quot;ya ye yo yu y8 y9 we w9 wa w8 yi&quot;; consonant rows &quot;k n t l m p ...&quot; and &quot;g n d l m b s j c&quot;.</Paragraph> <Paragraph position="6"> If they are not distinguished, a lot of useless work (i.e., invalid mappings) and incorrect results may follow, because it is unclear whether the i-th consonant in a word is an initial consonant or a final consonant. Furthermore, to write two-level spelling rules for PC-KIMMO, each letter of the Korean alphabet must be mapped to an ASCII character on the keyboard. Therefore, we devised a transcription system for the Korean alphabet, shown in Table 1, which has the following features: * There is no letter corresponding to the initial consonant 'ㅇ'. We did not consider this letter because it is a sort of orthographic filler in the ordinary writing system and is not pronounced.</Paragraph> <Paragraph position="7"> * The initial consonant letters are not the same as the final consonant letters. (To see this, compare the initial consonants MYCODE(I) with the final consonants MYCODE(F) in Table 1.) 
* Each compound vowel is represented by a pair of letters: a semi-vowel letter (i.e., y or w) and one of the pure vowel letters excluding 'ㅟ' /ü/ and 'ㅚ' /ö/; here ,?\], and '-~t' are treated as compound vowels.</Paragraph> <Paragraph position="8"> * There are two archiphonemic letters: (1) the archiphoneme A for the proper treatment of vowel harmony (see footnote 1), which can be changed into the NULL symbol 0, the vowel letter a, or the vowel letter 9 by context; and (2) the archiphoneme I for the proper treatment of the predicative postposition '이' /i/, which can be changed into either 0 or the vowel letter i by context. Footnote 1: Modern Korean has a &quot;diagonal&quot; vowel harmony (Ahn, 1985) kept in only one area of word formation, that is, between the final vowel of a verbal stem and the following ə-initial suffix. In this harmony, ə alternates with a if the final vowel of the verbal stem is a light vowel, a or o. For example, the verb stem '보' /bo/ (to see) plus the suffix '어' /ə/ (INFINITIVE) becomes the verbal word '보아' /bo-a/. However, the verb stem '주' /cu/ (to give) plus the suffix '어' /ə/ (INFINITIVE) becomes the verbal word '주어' /cu-ə/. As a result, the archiphoneme A is used for the initial vowel ə of suffixes, to distinguish it from ə elsewhere.</Paragraph> <Paragraph position="9"> We believe that our transcription system makes it simple and clear to describe the two-level spelling rules of Korean, and that it enables the two-level processor to handle the complex spelling changes efficiently.</Paragraph> <Paragraph position="10"> Here, three special symbols are used to treat the lexical irregularities of Korean verbal morphology properly: + for regularity, X for '러'-irregularity, and $ for all irregularities other than '러'-irregularity. X must be differentiated from $ for the following reasons. In Korean morphology, most verbal stems ending in the syllable '르' /lɨ/ are irregular. The final syllable '르' /lɨ/ of the stem, when followed by the vowel '어' /ə/ and preceded by any vowel other than the light vowels ('아' /a/ and '오' /o/), is changed into '러' /lə/, and the consonant 'ㄹ' /l/ is added to the preceding syllable. We call this '르'-irregularity. For example, the verb stem '흐르' /hɨ-lɨ/ (to flow) plus the suffix '어' /ə/ (INFINITIVE) becomes the verbal word '흘러' /hɨl-lə/. However, there is also '러'-irregularity, which occurs in the same context as '르'-irregularity: it only causes the following vowel '어' /ə/ to be changed into '러' /lə/. For example, the verb stem '이르' /i-lɨ/ (to arrive) plus the suffix '어' /ə/ (INFINITIVE) becomes the verbal word '이르러' /i-lɨ-lə/. Therefore, a mechanism is needed to treat them properly.</Paragraph> <Paragraph position="11"> One of the special symbols is used to represent a specific lexical form, and it is almost always placed at the end of the lexical form. For example, the verbal stem guB has two meanings, i.e., &quot;curved&quot; as an adjective and &quot;grill&quot; as a verb. In this case, the problem lies in the difference between the variant forms of the adjective and those of the verb: when the stem is combined with the suffix A, the surface form becomes either guB9 as the adjective or guw9 as the verb. To distinguish between them, the following lexical forms can be listed in the dictionary: guB+ for the regular adjective, and guB$ for the 'ㅂ'-irregular verb.</Paragraph> </Section> <Section position="4" start_page="536" end_page="536" type="metho"> <SectionTitle> WORD STRUCTURE AND LEXICONS </SectionTitle> <Paragraph position="0"> The word structure in general denotes knowledge of the internal morpheme combinations of known words.</Paragraph> <Paragraph position="1"> As a result, it shows how morphemes can combine to form valid words; it is important for proper word recognition. 
In the two-level model it is represented with linked lexicons, i.e., with continuation classes of morphemes.</Paragraph> <Paragraph position="2"> The continuation classes used in our lexicons are as follows: interjection (IS), prenoun (PR), adverb (AB), noun (NN), pronoun (PN), numeral (NU), verb (VB), adjective (AJ), verbalizer (VR), postposition (PP), I-postposition (IP), nominal-prefix (NF), verbal-prefix (VF), prefinal-ending (PE), final-ending (FE), nominal ending (NE), Begin, and End. Every class indicates a lexicon. However, Begin and End are special lexicons: Begin amounts to the initial state in automata, and End plays the same role as the final state; in fact, they have no lexical entries. The following shows our linked lexicons.</Paragraph> <Paragraph position="3"> The right arrow '→' indicates that a class on its left side can continue with one of the classes on its right side; a vertical bar '|' indicates OR.</Paragraph> </Section> <Section position="5" start_page="536" end_page="537" type="metho"> <SectionTitle> TWO-LEVEL RULES AND FINITE STATE AUTOMATA </SectionTitle> <Paragraph position="0"> Based on the work on Korean morphology by Lee (1991), 52 two-level rules have been developed for the Korean morphological alternations. By way of example, we explain the following Korean morphological alternation in the two-level framework.</Paragraph> <Paragraph position="1"> In Korean, some verbals ending in the final consonant B are irregular. 
The final consonant B of the stem, when followed by a vowel, is changed into w.</Paragraph> <Paragraph position="2"> But it is not changed when followed by a consonant.</Paragraph> <Paragraph position="3"> For example, when the irregular verb doB (to help) is combined with the suffix A, it is changed into dowa.</Paragraph> <Paragraph position="4"> In the two-level system, it is represented as follows:
Lexical Representation: d o B $ + A
Surface Representation: d o w 0 0 a
This shows a correspondence between the lexical representation and the surface representation. In PC-KIMMO, such a correspondence is represented with the notation lexical-character:surface-character, as in d:d, o:o, B:w, $:0, +:0, and A:a. Here the lexical character $ is a signal indicating that the basic word or stem preceding it is irregular, and it corresponds to a surface 0 (the NULL symbol), which is not printed in the output form. The lexical + (a morpheme boundary symbol) also corresponds to a surface 0.</Paragraph> <Paragraph position="5"> The above alternation may be described by the following two-level rule:
B:w &lt;=&gt; ___ $:0 +:0 A:@ (B Variation Rule)
This rule states that a lexical B is realized as a surface w if and only if it is followed by the conjugation information $, the morpheme boundary +, and a linking suffix A. The surface @ in the above rule stands for any alphabetic character that constitutes a feasible pair with a lexical A. For example, the surface @ may be realized as a, 9, or 0 when the feasible pairs with lexical A are A:a, A:9, and A:0.</Paragraph> <Paragraph position="6"> The two-level rules can be automatically translated into state transition tables by a rule compiler such as TWOL (Karttunen, 1987) or KGEN (Miles, 1991). The tables built by KGEN can be used directly in PC-KIMMO. 
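As a rough illustration of what the rule for B:w requires, the following minimal Python sketch checks an aligned lexical/surface string against it. This is not the state table that KGEN produces and not PC-KIMMO code; the names FEASIBLE_A, align, and obeys_b_variation are hypothetical, and the sketch assumes that the feasible pairs for lexical A are A:a, A:9, and A:0.

```python
# Hypothetical sketch of the B Variation Rule check; not PC-KIMMO or KGEN code.

# Assumed surface realizations of lexical A (feasible pairs A:a, A:9, A:0).
FEASIBLE_A = {"a", "9", "0"}


def align(lexical, surface):
    """Pair space-separated lexical and surface characters position by position."""
    lex_chars = lexical.split()
    surf_chars = surface.split()
    if len(lex_chars) != len(surf_chars):
        raise ValueError("lexical and surface strings must have equal length")
    return list(zip(lex_chars, surf_chars))


def obeys_b_variation(pairs):
    """Return True if every lexical B in the aligned pairs satisfies the rule.

    Lexical B must surface as w if and only if it is followed by
    $:0, +:0, and A paired with one of its feasible surface characters.
    """
    for i, (lex, surf) in enumerate(pairs):
        if lex != "B":
            continue
        context = pairs[i + 1:i + 4]
        in_context = (
            len(context) == 3
            and context[0] == ("$", "0")
            and context[1] == ("+", "0")
            and context[2][0] == "A"
            and context[2][1] in FEASIBLE_A
        )
        # "If and only if": B:w is allowed exactly when the context matches.
        if (surf == "w") != in_context:
            return False
    return True


# The doB + A example from the text: d o B $ + A  /  d o w 0 0 a
example = align("d o B $ + A", "d o w 0 0 a")
print(obeys_b_variation(example))
```

For the doB example the check prints True. Keeping a surface B in the same context, or using a surface w before a consonant-initial suffix, makes the function return False, which mirrors the two directions of the "if and only if" condition.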
The above rule for B:w is translated by KGEN into a state transition table. The rows of the table represent the seven states, in which final states are marked with colons and nonfinal states are marked with periods. The columns represent arcs from one state to another. A zero transition indicates that there is no valid transition from that state for that input symbol.</Paragraph> </Section> </Paper>