<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1016"> <Title>Automatic Acquisition of Two-Level Morphological Rules</Title> <Section position="3" start_page="0" end_page="103" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Computational systems based on the two-level model of morphology (Koskenniemi, 1983) have been remarkably successful for many languages (Sproat, 1992). The language-specific information of such a system is stored as (1) a morphotactic description of the words to be processed and (2) a set of two-level morphonological (or spelling) rules.</Paragraph> <Paragraph position="1"> Up to now, these two components have had to be coded largely by hand, since no automated method existed to acquire a set of two-level rules for input source-target word pairs. Hand-coding a 100% correct rule set from word pairs becomes almost impossible once a few hundred pairs are involved. Furthermore, there is no guarantee that such a hand-coded lexicon does not contain redundant rules or rules with unnecessarily large contexts. The usual approach is instead to construct general rules from small subsets of the input pairs. However, these general rules usually allow overrecognition and overgeneration, even on the subsets from which they were inferred. Simons (Simons, 1988) describes methods for studying morphophonemic alternations (using annotated interlinear text) and Grimes (Grimes, 1983) presents a program for discovering affix positions and cooccurrence restrictions. Koskenniemi (Koskenniemi, 1990) provides a sketch of a discovery procedure for phonological two-level rules. Golding and Thompson (Golding and Thompson, 1985) and Wothke (Wothke, 1986) present systems to automatically calculate a set of word-formation rules. These rules are, however, ordered one-level rewrite rules and not unordered two-level rules, as in our system. Kuusik (Kuusik, 1996) also acquires ordered one-level rewrite rules, for stem sound changes in Estonian. 
Daelemans et al. (Daelemans et al., 1996) use a general symbolic machine learning program to acquire a decision tree for matching Dutch nouns to their correct diminutive suffixes. The input to their process is the syllable structure of the nouns and a given set of five suffix allomorphs. They do not learn rules for possible sound changes. Our process automatically acquires the necessary two-level sound-changing rules for prefix and suffix allomorphs, as well as the rules for stem sound changes. Connectionist work on the acquisition of morphology has been more concerned with implementing psychologically motivated models than with the acquisition of rules for a practical system (Sproat, 1992, p. 216; Gasser, 1994).</Paragraph> <Paragraph position="2"> The contribution of this paper is to present a complete method for the automatic acquisition of an optimal set of two-level rules (i.e. the second component above) for source-target word pairs. It is assumed that the target word is formed from the source through the addition of a prefix and/or a suffix. Furthermore, we show how a partial acquisition of the morphotactic description (component one) results as a by-product of the rule-acquisition process. For example, the morphotactic description of the target word in the input pair</Paragraph> <Paragraph position="4"> The right-hand side of this morphotactic description is then mapped on the left-hand side,</Paragraph> <Paragraph position="6"> For this example the two-level rule</Paragraph> <Paragraph position="8"> can be derived. These processes are described in detail in the rest of the paper: Section 2 provides an overview of the two-level rule formalism, Section 3 describes the acquisition of morphotactics through segmentation and Section 4 presents the method for computing the optimal two-level rules. Section 5 evaluates the experimental results and Section 6 summarizes.</Paragraph> </Section> </Paper>