File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/92/c92-1025_abstr.xml
Size: 10,709 bytes
Last Modified: 2025-10-06 13:47:22
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1025"> <Title>Two-Level Morphology with Composition</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 2. Desiderata </SectionTitle> <Paragraph position="0"> Figure 1: lexical level: happy +Comp +Adj; surface level: happi er 0. Lexical level: good +Comp +Adj; surface level: bett er 0. The stems are presented as the lemmas found in a dictionary, followed by morphological tags. 0 serves here as the epsilon symbol. Because there is no need to have other annotations on the lexicon trees, problems I and II in Section 1 have been eliminated. Lexical forms are always sequences of morphemes in their canonical representation.</Paragraph> <Paragraph position="1"> The only obstacle to this approach is that the rules that constrain the surface realization of lexical forms become more difficult to write when there is little or no similarity between the two levels of representation. Designing such rules and understanding their interactions is a hard task even with the computational assistance provided by a complete compiler for the two-level formalism (Karttunen et al. [6]). We follow two simple principles: (1) Inflected forms of the same word are mapped to the same canonical dictionary form. This applies to both regular and irregular forms. For example, in our English analyzer the surface forms happier and better are directly matched with the lexical forms happy and good, respectively, rather than some nonwords.</Paragraph> <Paragraph position="2"> As the distance between lexical and surface form increases, the mapping is easier to describe by allowing one or more intermediate levels of representation. The solution we adopted combines the two-level rule formalism with the cascade model of finite-state morphology discussed by Kaplan & Kay [7].</Paragraph> <Paragraph position="3"> 3.
Composition of two-level rules (2) Morphological categories are represented as part of the lexical form. Instead of encoding morphological categories such as Plural, Comparative, 1stPerson as annotations on strings that realize them, we include them directly in the lexical representation. Consequently, our two-level representations of happier and better are: [ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992; PROC. OF COLING-92, NANTES, AUG. 23-28, 1992] Our formal understanding of finite-state morphology is based on the demonstrations that both rewriting rules and two-level rules denote regular relations on strings (Kaplan [9]). The correspondence between regular relations and finite-state transducers and the closure properties of regular relations provide the computational and mathematical tools that our approach depends on. One of the earliest results of finite-state morphology is the observation that regular relations are closed under composition (Johnson [8], Kaplan & Kay [7], Kaplan [9]). Consequently, a single transducer can be constructed whose behavior is exactly the same as a set of transducers arranged in an ordered feeding cascade (Figure 2). This observation was originally made about transducers corresponding to phonological rewrite rules, but it applies to regular relations or transducers no matter how they are specified. Although regular relations in general are not closed under intersection, the subclass of relations denoted by standard two-level rules is closed under this operation (Kaplan [9]). Thus fst1 and fst2 in Figure 2 may represent either a single two-level rule or the intersection of any number of rules.</Paragraph> <Paragraph position="4"> When the relationship between lexical and surface forms is complex, the descriptive task of setting up rules that relate the two levels can be simplified by decomposing the complex relation into a series of less opaque matches.
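As a minimal sketch of what closure under composition buys, the following Python fragment treats each stage of a cascade as a finite set of string pairs and composes two stages into one. This is only an illustration under stated assumptions: real two-level rules denote regular relations that are generally infinite and are compiled to transducers, and the name compose, the two toy stages, and the intermediate strings are all hypothetical, not taken from the analyzers described here.

```python
def compose(first, second):
    """Compose two finite string relations given as sets of (upper, lower) pairs.

    A pair (a, c) is in the result exactly when some intermediate string b
    exists with (a, b) in `first` and (b, c) in `second`.
    """
    return {
        (upper, lower)
        for (upper, middle) in first
        for (other_middle, lower) in second
        if middle == other_middle
    }


# Hypothetical two-stage cascade; the intermediate strings are invented.
stage_one = {("good+Comp", "bett+Comp"), ("happy+Comp", "happy+Comp")}
stage_two = {("bett+Comp", "better"), ("happy+Comp", "happier")}

# The composed relation no longer mentions the intermediate level.
single = compose(stage_one, stage_two)
```

In this toy setting, single pairs each lexical string directly with its surface string, which is the finite analogue of replacing the cascade by one transducer.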
For efficient recognition and generation, the resulting cascade can be reduced to a single transducer. Although it would be possible in principle to produce the same single transducer directly from two-level rules, we have found many cases in our descriptions of English and French where the composition approach is not only easier but also linguistically more justified. We describe one such case in detail.</Paragraph> <Paragraph position="5"> 4. French compound plurals French plurals can be formed in a variety of ways. Some of the most common patterns are illustrated in Figure 3.</Paragraph> <Paragraph position="6"> We omit here the actual two-level rules; what Figure 3 illustrates is simply the joint effect of several rules that constrain the realization of the plural morpheme and the shape of the stem in regular nouns. Note that the constraints here are local; the stem and the plural morpheme are in a fixed position with respect to each other.</Paragraph> <Paragraph position="7"> In compound nouns and adjectives, several patterns are possible: (1) only the first part of the compound is marked for the plural, (2) both are, (3) neither is, or (4) only the last is. The possible patterns and some examples are given in Figure 4.</Paragraph> <Paragraph position="8"> The interesting cases are those in which the first part needs to be pluralized. In a simple two-level system, the information about plural formation summarized in Figure 3 would have to be rewritten and adapted so that the rules could apply over a distance in the position just before the hyphen.</Paragraph> <Paragraph position="9"> The simple rules for regular plural formation illustrated in Figure 3 do not work for first parts of compounds because the affected elements are not in the same configuration relative to each other.
Although it is possible to modify the rules, the new versions would be rather complicated and would not capture the simple fact that the plurals portes and fenêtres in portes-fenêtres are in themselves regular; the only thing that is special about the word is that plurality is expressed in both parts of the compound. We avoid these complications by creating a cascade of two-level rules in which the first stage is only concerned with the plurals of compounds. It starts from a lexical form in which the words are marked for the pattern that they take and creates an intermediate level in which the information about number and gender is distributed over the agreeing parts. This is illustrated in Figure 5 for the masculine plural of social-démocrate, a word in which both parts get pluralized.</Paragraph> <Paragraph position="10"> Figure 5: lexical level: social 0 0 -démocrate +DPL +masc +pl; intermediate level: social +masc +pl -démocrate 0 +masc +pl. The effect of the first stage of rules is to copy the morphological tags from the end of the compound to the middle whenever the +DPL (double plural) diacritic is present. The second layer of rules applies uniformly to simple nouns as well as compounds. In the case at hand, the two plurals in sociaux-démocrates are realized in the regular way, as shown in Figure 6.</Paragraph> <Paragraph position="11"> Figure 6: sociau 0 x -démocrate 0 s. By first intersecting the rules in each set and then composing the results in the way shown in Figure 2, we end up with a transducer that eliminates the intermediate level altogether and maps the lexical representation directly to the correct surface form, and vice versa. Figure 7 illustrates the result. The representation in Figure 7 fulfills the desiderata laid out in Section 2 except that it contains a special diacritic +DPL that marks the behavior of social-démocrate with respect to plural formation.
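The two-stage treatment of social-démocrate can be sketched in Python as two ordinary string functions applied in sequence. This is a hedged illustration only: the paper's stages are sets of two-level rules compiled to bidirectional transducers, whereas the functions below run in the generation direction only, and the names distribute_tags, realize_part and generate, as well as the two toy plural rules (final -al becomes -aux, otherwise add -s), are assumptions made for the example.

```python
def distribute_tags(lexical):
    """Stage 1 (toy): copy the tags after +DPL onto the first part of a compound."""
    if "+DPL" not in lexical:
        return lexical
    stem, tags = lexical.split("+DPL", 1)
    first, rest = stem.split("-", 1)
    return first + tags + "-" + rest + tags


def realize_part(part):
    """Stage 2 (toy): realize the plural of one simple word locally."""
    stem, _, tags = part.partition("+")
    if "pl" not in tags.split("+"):
        return stem
    if stem.endswith("al"):
        return stem[:-2] + "aux"
    return stem + "s"


def generate(lexical):
    """Apply stage 1, then stage 2 to every hyphen-separated part."""
    intermediate = distribute_tags(lexical)
    return "-".join(realize_part(part) for part in intermediate.split("-"))


intermediate_form = distribute_tags("social-démocrate+DPL+masc+pl")
surface_form = generate("social-démocrate+DPL+masc+pl")
```

The point of the design is visible even in this toy: realize_part knows nothing about compounds, so the regular plural rules stay local, and only distribute_tags refers to the +DPL diacritic.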
In the next section, we show how that diacritic can be eliminated.</Paragraph> <Paragraph position="12"> 5. Composition with the lexicon By choosing the canonical dictionary form as the lexical form in our English and French analyzers and by including morphological categories directly as part of that representation, we have eliminated the need for additional annotations in the lexical structure that are common in existing Kimmo systems. We can treat the letter tree as a simple finite-state network in which all morphological information is carried on the branches of the tree and not on the leaves.</Paragraph> <Paragraph position="13"> Taking this idea one step further, we may think of the lexicon as a trivial first stage in a cascade of transducers that maps between the lexical and the surface levels. The second stage is the two-level rule system. In the case of our analyzers for English and French, the rule system starts out with three levels but reduces to two by intersection and composition. The final stage is the composition of the rule system with the lexicon.</Paragraph> <Paragraph position="14"> This progression of pushing the original Kaplan & Kay [7] program to its logical conclusion is depicted in Figure 8.</Paragraph> <Paragraph position="15"> Figure 8 (Stage 1, Stage 2, Stage 3): ... our morphological analyzers for English and French. Arrows labeled with & represent intersection, arrows marked with o stand for composition. (We have simplified this picture slightly by omitting the composition of small bookkeeping relations that are necessary to model properly the interpretation of epsilon transitions in two-level rules.) Stage 1 consists of two parallel two-level rule systems arranged in a cascade, as illustrated in Section 4. In Stage 2, the rules on each level have been intersected into a single transducer.
Stage 3 shows the composition of the two-level rule systems into a single transducer and Stage 4 represents the final result: a transducer that maps sequences of canonical dictionary forms and morphological categories to the corresponding surface forms, and vice versa. Although the conceptual picture is quite straightforward, the actual computations to produce the structures can be resource-intensive and in some cases quite impractical.</Paragraph> <Paragraph position="16"> At the last stage, when the idiosyncratic behavior of particular lexical items has been taken into account in the composition of the lexicon with the rule transducers, all morphological diacritics such as the +DPL tag for French nouns with double plurals can be eliminated because the rules that depend on them have been applied. In full compliance with our desiderata in Section 2, the resulting transducer maps, among other things, social-démocrate+masc+pl directly to sociaux-démocrates, and vice versa.</Paragraph> </Section> </Paper>