File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/a92-1034_metho.xml
Size: 7,681 bytes
Last Modified: 2025-10-06 14:12:56
<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1034"> <Title>MORPHĒ: A Practical Compiler for Reversible Morphology Rules</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The Processing Model </SectionTitle> <Paragraph position="0"> In Morphē, the process of inflection is seen as consisting of two basic steps: 1. By making a series of feature- and orthographically-based decisions, choose an inflection procedure. 2. Apply that procedure to the uninflected root. To implement the first step, Morphē uses a feature-based discrimination network with orthographically-based inflection rules at the leaves. Each node in the discrimination network specifies a set of features common to all of its descendants. For example, at the top of a subtree for nouns, a node might contain the features { (cat: noun) }, which would be inherited by the nodes for single-noun and plural-noun, and so on.</Paragraph> <Paragraph position="1"> That Morphē explicitly divides feature-based decisions from orthographic decisions has two important consequences: * ...ical and/or morphological features (e.g. paradigm) can be checked alongside syntactic features (e.g. category). * A single morpheme can be split across several leaf nodes if feature tests below the morpheme level are necessary.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 The Rule Formalism </SectionTitle> <Paragraph position="0"> As shown in Figure 1, a rule consists of a set of clauses, each of which contains an orthographic pattern on the left-hand side and a set of inflection operations on the right-hand side.</Paragraph> <Paragraph position="1"> * Orthographic patterns. The orthographically-based decisions are made by matching against regular expression-based patterns. Standard regular expression operations (e.g. Kleene closure and wildcards) are included. 
In addition, non-standard operations for matching against a pre-defined class of strings, and for binding and retrieval of portions of the word, are included.</Paragraph> <Paragraph position="2"> * Inflection Operations. The application of the inflection procedure is implemented as the sequential execution of the inflection operations in the right-hand side. The inflection operations include affixation, deletion, and the combined operation of &quot;replacement&quot; in prefix, suffix, and infix positions. Also included is an operation for performing regular string-to-string mapping within a word.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="233" type="metho"> <SectionTitle> 3 Processing </SectionTitle> <Paragraph position="0"> During generation, processing begins with a feature structure entering the tree at the root node and trickling down to the appropriate leaf node. Once at the leaf node, the word root is compared against each clause's orthographic pattern in turn.</Paragraph> <Paragraph position="1"> When a match is found, the inflection procedure for that clause is applied to the word root and the result is returned.</Paragraph> <Paragraph position="2"> During parsing, processing begins with an inflected form entering the tree at each leaf node, where the inflection rules are applied &quot;in reverse&quot; and the non-passing results are discarded. Applying a rule &quot;in reverse&quot; means that the word is matched against the inflected forms and the operations perform deinflection, rather than vice versa. After all clauses in all leaves have been tried, and presumably most results have been discarded, each remaining parse follows the network upwards, collecting the features of each node it traverses until a set of full feature structures arrives at the root node. When this process is finished, a lexicon check is made to ensure that only valid words (of the proper category, paradigm, etc.) 
are kept.</Paragraph> </Section> <Section position="5" start_page="233" end_page="233" type="metho"> <SectionTitle> 4 Handling Common Morphological Processes </SectionTitle> <Paragraph position="0"> This section explains how common morphological processes are handled by Morphē.</Paragraph> <Paragraph position="1"> * Affixation. Prefixation, suffixation, and infixation are handled directly by the +p, +s, and +i inflection operators. To determine the insertion point, infixes must be placed either before or after some portion of the word that was bound during pattern matching.</Paragraph> <Paragraph position="2"> * Deletion. Word-initial, word-final, and word-internal deletion are handled directly by the -p, -s, and -i inflection operators. As with infixation, some bound part of the word must act as an &quot;anchor&quot; for the deletion. * Gemination and Reduplication. Since expressions may be bound during pattern matching, bound expressions can be affixed to the word to create the effects of gemination or reduplication. For example, when forming the present participle, certain English verbs repeat the final consonant before adding the suffix &quot;ing&quot; (e.g. &quot;cut&quot; → &quot;cutting&quot;). This simple twinning is encoded by the third clause in the above sample rule. Reduplication, as found in Warlpiri [Sproat and Brunson, 1988] or Latin [Matthews, 1974], can be handled in a similar manner (i.e. by binding the appropriate portion of the root and retrieving it during affixation).</Paragraph> <Paragraph position="3"> * Paradigmatic Alternation. Alternations that consist of a single mapping of one string to another, such as the &quot;-fe/-ve&quot; alternation for the plural of English nouns like &quot;wife&quot; or &quot;knife&quot;, can be handled by a single replacement operation. 
Alternations that consist of a number of related alternations, such as the {&quot;-us/-i&quot; &quot;-um/-a&quot; &quot;-a/-ae&quot;} alternation for the plural of English nouns like &quot;octopus&quot;, &quot;spectrum&quot;, and &quot;vertebra&quot;, could be handled as separate cases, but it is convenient to be able to refer to the entire class of alternations. The map operator invokes a string-to-string mapping on a bound portion of a word. Alternations such as vowel rounding in the comparative forms of German adjectives, and consonant and vowel alternation in Rumanian, can be handled by this method.</Paragraph> <Paragraph position="4"> * Suppletion. Morphē currently handles suppletion by requiring suppletive forms (e.g. &quot;went&quot; for &quot;go&quot;) to be included in the lexicon. In this, it is not unlike many other systems, such as KIMMO and DIMORPH.</Paragraph> </Section> <Section position="6" start_page="233" end_page="233" type="metho"> <SectionTitle> 5 Current Uses and Future Research </SectionTitle> <Paragraph position="0"> Morphē is presently being used for French and German generation morphology in the Kant project, a knowledge-based machine translation system being developed at Carnegie Mellon University [Mitamura et al., 1991]. In addition, a rule file has been developed for English and one is currently being designed for Spanish. Future research will be directed towards morphological phenomena that cannot currently be handled in an elegant fashion. Certain types of suppletion, such as irregular stems with regular endings in Latin, should be handled more generally and with less reliance on the lexicon as a storehouse of irregularities. In addition, the design of mechanisms appropriate to the handling of prosodic inflection will be investigated.</Paragraph> </Section> </Paper>