<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1038">
  <Title>A GENERAL COMPUTATIONAL MODEL FOR WORD-FORM RECOGNITION AND PRODUCTION</Title>
  <Section position="1" start_page="0" end_page="178" type="abstr">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> A language-independent model for recognition and production of word forms is presented. This &quot;two-level model&quot; is based on a new way of describing morphological alternations. All rules describing the morphophonological variations are parallel and relatively independent of each other. Individual rules are implemented as finite state automata, as in an earlier model due to Martin Kay and Ron Kaplan.</Paragraph>
    <Paragraph position="1"> The two-level model has been implemented as operational computer programs in several places. A number of operational two-level descriptions have been written or are in progress (Finnish, English, Japanese, Rumanian, French, Swedish, Old Church Slavonic, Greek, Lappish, Arabic, Icelandic). The model is bidirectional and it is capable of both analyzing and synthesizing word-forms.</Paragraph>
    <Paragraph position="2"> 1. Generative phonology The formalism of generative phonology has been widely used since its introduction in the 1960's. The morphology of any language may be described with the formalism by constructing a set of rewriting rules. The rules start from an underlying lexical representation, and transform it step by step until the surface representation is reached.</Paragraph>
    <Paragraph position="3"> The generative formalism is unidirectional and has proven to be computationally difficult; it has therefore found little use in practical morphological programs.</Paragraph>
    <Paragraph position="4"> The work described in this paper is a part of the project 593 sponsored by the Academy of Finland.</Paragraph>
    <Paragraph position="5"> 2. The model of Kay and Kaplan Martin Kay and Ron Kaplan from Xerox PARC noticed that each of the generative rewriting rules can be represented by a finite state automaton (or transducer) (Kay 1982). Such an automaton would compare two successive levels of the generative framework: the level immediately before application of the rule, and the level after application of the rule. The whole morphological grammar would then be a cascade of such levels and automata:</Paragraph>
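    <Paragraph> The cascade idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two rules, their names and the form kaNpat are hypothetical and do not come from the paper, and each rule is written as a plain string function standing in for the automaton that would relate one level to the next.

```python
from typing import Callable, List

Rule = Callable[[str], str]


def nasal_assimilation(form: str) -> str:
    # Hypothetical rule: N becomes m before p.
    return form.replace("Np", "mp")


def stop_assimilation(form: str) -> str:
    # Hypothetical rule: p becomes m after m.
    return form.replace("mp", "mm")


def cascade(lexical: str, rules: List[Rule]) -> List[str]:
    """Return every level, from the lexical form down to the surface form."""
    levels = [lexical]
    for rule in rules:
        # Each rule reads the previous level and writes the next one.
        levels.append(rule(levels[-1]))
    return levels


levels = cascade("kaNpat", [nasal_assimilation, stop_assimilation])
print(" -> ".join(levels))  # prints: kaNpat -> kampat -> kammat
```

The rules apply in a fixed order and in one direction only, which is why a cascade of this kind is of little direct help for analysis.</Paragraph>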
    <Paragraph position="7"> A cascade of automata is not operational as such, but Kay and Kaplan noted that the automata could be merged into a single, larger automaton by using the techniques of automata theory. The large automaton would be functionally identical to the cascade, although single rules could no longer be identified within it. The merged automaton would be operational, efficient and bidirectional. Given a lexical representation, it would produce the surface form, and, vice versa, given a surface form it would guide lexical search and locate the appropriate endings in the lexicon.</Paragraph>
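    <Paragraph> One standard way to merge two transducers is the product construction for composition. The sketch below assumes deterministic, symbol-by-symbol transducers and two hypothetical rules over a toy alphabet; it is not the construction used by Kay and Kaplan, only a small instance of the general technique. The names make_rule_a, make_rule_b, run and compose are invented for the example.

```python
from typing import Dict, Tuple

# A deterministic letter transducer: (state, input) -> (output, next state).
Table = Dict[Tuple[object, str], Tuple[str, object]]

ALPHABET = "kaNptm"  # toy alphabet, assumed for this example


def make_rule_a() -> Table:
    # Hypothetical rule A: N is realised as m; everything else is copied.
    return {(0, c): ("m" if c == "N" else c, 0) for c in ALPHABET}


def make_rule_b() -> Table:
    # Hypothetical rule B: p is realised as m directly after an input m.
    table: Table = {}
    for c in ALPHABET:
        after_m = 1 if c == "m" else 0
        table[(0, c)] = (c, after_m)
        table[(1, c)] = ("m" if c == "p" else c, after_m)
    return table


def run(table: Table, start: object, form: str) -> str:
    """Feed `form` through the transducer and collect its output."""
    state, out = start, []
    for symbol in form:
        output, state = table[(state, symbol)]
        out.append(output)
    return "".join(out)


def compose(first: Table, second: Table) -> Table:
    """Product construction: each output of `first` is the input of `second`."""
    merged: Table = {}
    for (q1, a), (b, p1) in first.items():
        for (q2, b2), (c, p2) in second.items():
            if b == b2:
                merged[((q1, q2), a)] = (c, (p1, p2))
    return merged


rule_a, rule_b = make_rule_a(), make_rule_b()
merged = compose(rule_a, rule_b)
print(run(merged, (0, 0), "kaNpat"))  # prints: kammat
```

The merged table gives the same result as running rule A and then rule B, but its states are pairs of the original states, so the state count can grow multiplicatively with every rule that is merged in.</Paragraph>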
    <Paragraph position="8"> In principle, the approach seems ideal. But there is one vital problem: the size of the merged automaton. Descriptions of languages with complex morphology, such as Finnish, seem to result in very large merged automata. Although there are no conclusive numerical estimates yet, it seems probable that the size may grow prohibitively large.</Paragraph>
    <Paragraph position="9"> 3. The two-level approach My approach is computationally close to that of Kay and Kaplan, but it is based on a different morphological theory. Instead of abstract phonology, I follow the lines of concrete or natural morphology (e.g. Linell, Jackendoff, Zager, Dressler, Wurzel). Using this alternative orientation I arrive at a theory where there is no need for merging the automata in order to reach an operational system.</Paragraph>
    <Paragraph position="10"> The two-level model rejects abstract lexical representations, i.e. there need not always be a single invariant underlying representation. Some variations are considered suppletion-like and are not described with rules. The role of rules is restricted to one-segment variations, which are fairly natural. Alternations which affect more than one segment, or where the alternating segments are unrelated, are considered suppletion-like and handled by the lexicon system.</Paragraph>
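    <Paragraph> The idea of parallel rules over one-segment correspondences can be sketched as follows. This is a minimal illustration, not the paper's rule formalism or implementation: the feasible pairs, the two rules and all function names are hypothetical. Each rule is a small automaton that reads lexical:surface symbol pairs, and a pairing is accepted only if every rule accepts it. The same rules serve both directions.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (lexical symbol, surface symbol)
Step = Callable[[int, Pair], Optional[int]]  # None means the rule rejects

# Hypothetical feasible pairs: identity pairs plus three alternations.
PAIRS: List[Pair] = [(c, c) for c in "kaptm"] + [("N", "m"), ("N", "n"), ("p", "m")]


def nasal_rule(state: int, pair: Pair) -> Optional[int]:
    # N:m is allowed if and only if the next lexical symbol is p.
    lexical, _ = pair
    if state == 1 and lexical != "p":  # N:m was not followed by p
        return None
    if state == 2 and lexical == "p":  # N:n was followed by p
        return None
    return {("N", "m"): 1, ("N", "n"): 2}.get(pair, 0)


def stop_rule(state: int, pair: Pair) -> Optional[int]:
    # p:m is allowed if and only if the previous surface symbol is m.
    lexical, surface = pair
    if lexical == "p" and (surface == "m") != (state == 1):
        return None
    return 1 if surface == "m" else 0


# Each rule: (step function, accepting states); every rule starts in state 0.
RULES = [(nasal_rule, {0, 2}), (stop_rule, {0, 1})]


def search(form: str, side: int) -> List[str]:
    """Match `form` on one side (0 lexical, 1 surface); return the other side."""
    results: List[str] = []

    def walk(i: int, states: Tuple[int, ...], out: str) -> None:
        if i == len(form):
            if all(s in final for s, (_, final) in zip(states, RULES)):
                results.append(out)
            return
        for pair in PAIRS:
            if pair[side] != form[i]:
                continue
            # All rules advance in lockstep on the same symbol pair.
            nxt = [step(s, pair) for s, (step, _) in zip(states, RULES)]
            if None not in nxt:
                walk(i + 1, tuple(nxt), out + pair[1 - side])

    walk(0, tuple(0 for _ in RULES), "")
    return sorted(results)


print(search("kaNpat", 0))  # production: prints ['kammat']
print(search("kammat", 1))  # recognition: prints ['kaNpat', 'kammat', 'kampat']
```

Recognition returns several candidate lexical strings here because the sketch has no lexicon; in a full system the lexicon would restrict the search to forms that actually exist.</Paragraph>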
  </Section>
</Paper>