<?xml version="1.0" standalone="yes"?>
<Paper uid="J00-1006">
  <Title>Multitiered Nonlinear Morphology Using Multitape Finite Automata: A Case Study on Syriac and Arabic</Title>
  <Section position="2" start_page="0" end_page="78" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> This paper presents a general finite-state framework for computing complex nonlinear (i.e., nonconcatenative) morphophonological descriptions. The framework subsumes previous models in that it is capable of handling not only the complex nonlinear morphological phenomena we are about to describe, but also the usual linear ones found in many languages. It is thus a multitiered model that encompasses both linear and nonlinear morphology.</Paragraph>
    <Paragraph position="1"> The elegance of the proposed framework lies in the fact that it is capable of handling sophisticated linguistic descriptions, such as those of autosegmental phonology, in a computationally tractable way. The model consists of three components. The first is a multitier lexicon that consists of several sublexica, each describing a lexical tier as in autosegmental phonology. The second component is a bidirectional rewrite rules system that maps, inter alia, the multitier lexical representations to a linearized surface form and vice versa. The third component provides morphotactic constraints. The proposed framework has the computational property that its lexical and rule formalisms are compiled algorithmically into finite-state machinery using operators under which finite-state machines are closed.</Paragraph>
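    <Paragraph> The mapping performed by the second component can be pictured with a toy example. The sketch below is not the formalism of this paper or its compiled machinery. It assumes a three-tier lexical entry (a consonantal root, a CV pattern, and a vocalism, in the style of root-and-pattern analyses of Arabic) and a hypothetical function linearize that interleaves the tiers into a surface form, reusing the last vowel when the vocalism tier runs out.

```python
def linearize(root, pattern, vocalism):
    """Interleave the root and vocalism tiers according to a CV pattern.

    All three arguments are illustrative assumptions: `root` is a string of
    consonants, `pattern` a string over {"C", "V"}, `vocalism` a string of vowels.
    """
    consonants = iter(root)
    vowels = list(vocalism)
    surface = []
    v_index = 0
    for slot in pattern:
        if slot == "C":
            # Each C slot takes the next consonant of the root tier.
            surface.append(next(consonants))
        elif slot == "V":
            # Each V slot takes the next vowel; the last vowel is reused
            # (spread) once the vocalism tier is exhausted.
            surface.append(vowels[min(v_index, len(vowels) - 1)])
            v_index += 1
        else:
            raise ValueError(f"unknown slot: {slot!r}")
    return "".join(surface)


print(linearize("ktb", "CVCVC", "a"))
print(linearize("ktb", "CVCVC", "ui"))
```

    Under these assumptions, the root ktb and the pattern CVCVC yield katab with the vocalism a and kutib with the vocalism ui. A grammar in the proposed framework would state such a correspondence declaratively as rewrite rules, so that it can also be applied from the surface form back to the lexical tiers.</Paragraph>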
    <Paragraph position="2"> * 700 Mountain Ave., Murray Hill NJ 07974. E-mail: gkiraz@research.bell-labs.com. © 2000 Association for Computational Linguistics. Computational Linguistics, Volume 26, Number 1. The rest of this section discusses the importance of nonlinear morphology and outlines the research objectives of this work.</Paragraph>
    <Section position="1" start_page="0" end_page="78" type="sub_section">
      <SectionTitle>
1.1 Nonlinear Morphology
</SectionTitle>
      <Paragraph position="0"> Early generative morphophonological theory was mainly based on the linear segmental approach of The Sound Pattern of English (Chomsky and Halle 1968). In the mid seventies, however, linguists departed from this linear framework in favor of a nonlinear one. Goldsmith (1976), working on the phonology of African tone languages, proposed autosegmental phonology with multitiered representations. Goldsmith made use of two tiers to describe tone languages: one to represent sequences of vowels and consonants, and another to describe tone segments. McCarthy (1979) applied autosegmental phonology to Semitic root-and-pattern morphology, resulting in what is now known as the theory of nonconcatenative (or nonlinear) morphology, as opposed to concatenative morphology. McCarthy's findings have become ubiquitous. In fact, &quot;every aspect of the theory of morphology and morphophonology,&quot; remarks Spencer (1991, 134), &quot;has had to be reappraised in one way or another in the wake of [McCarthy's] analysis of Semitic and other languages.&quot; Spencer's statement could well apply to the theory of computational morphology.</Paragraph>
      <Paragraph position="1"> Two-level morphology (Koskenniemi 1983), as well as its predecessor in the work of Kay and Kaplan (1983), is also deeply rooted in the linear concatenative tradition. Indeed, &quot;if Koskenniemi had been interested in Arabic or Warlpiri rather than Finnish,&quot; notes Sproat (1992, 206), &quot;his system might have taken on a rather different character from the start.&quot; It would prove difficult, if not impossible, to implement Semitic languages with traditional two-level morphology while using linguistically motivated theoretical models such as those of McCarthy and others in the field.</Paragraph>
      <Paragraph position="2"> Kay (1987) was the first computational linguist to make use of McCarthy's findings. He proposed that a four-tape finite-state machine, as opposed to the traditional two-tape machines of two-level morphology, be used to describe the autonomous morphemes of Arabic. Kay devised a system for manipulating the multitape machine, albeit using an ad hoc procedure to control the movements of the machine's head(s).</Paragraph>
      <Paragraph position="3"> We shall build upon Kay's work by providing higher-level lexical and rule formalisms, and algorithms for compiling the formalisms into multitape machines, eliminating the need for the ad hoc procedure that controls head movements. We shall revisit Kay's approach in Section 6.1.</Paragraph>
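      <Paragraph> To make the notion of a four-tape machine concrete, a small simulation follows. It is a sketch under stated assumptions and not Kay's procedure: the tape order (surface, pattern, root, vocalism), the state names, the helper names build_machine and accepts, and the vowel-spreading arc are all hypothetical. Each arc carries one symbol per tape, and an empty string marks a tape whose head does not move.

```python
EPS = ""  # marks a tape whose head stays put on this arc


def build_machine(consonants, vowels):
    """Build arcs over the assumed tapes (surface, pattern, root, vocalism)."""
    transitions = {}
    for state in ["start"] + list(vowels):
        arcs = []
        for c in consonants:
            # A C slot copies the next root consonant to the surface.
            arcs.append(((c, "C", c, EPS), state))
        for v in vowels:
            # A V slot copies the next vocalism vowel; the target state remembers it.
            arcs.append(((v, "V", EPS, v), v))
        if state != "start":
            # Assumed spreading arc: reuse the remembered vowel without
            # advancing the vocalism tape.
            arcs.append(((state, "V", EPS, EPS), state))
        transitions[state] = arcs
    return transitions


def accepts(transitions, start, tapes):
    """Return True if some path consumes every tape completely.

    Every state is treated as final in this sketch.
    """
    stack = [(start, (0,) * len(tapes))]
    seen = set()
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if all(p == len(t) for p, t in zip(pos, tapes)):
            return True
        for symbols, target in transitions[state]:
            new_pos = []
            for sym, p, tape in zip(symbols, pos, tapes):
                if sym == EPS:
                    new_pos.append(p)          # head does not move
                elif tape[p:p + 1] == sym:
                    new_pos.append(p + 1)      # head advances by one symbol
                else:
                    break                      # arc does not match this tape
            else:
                stack.append((target, tuple(new_pos)))
    return False


machine = build_machine("ktb", "aui")
print(accepts(machine, "start", ("katab", "CVCVC", "ktb", "a")))
print(accepts(machine, "start", ("kutib", "CVCVC", "ktb", "ui")))
print(accepts(machine, "start", ("katab", "CVCVC", "ktb", "u")))
```

      Under these assumptions the first two calls print True and the third prints False, since the vocalism u cannot account for the surface vowels of katab. Here the head movements are hand-coded into the arcs, which is the kind of detail that higher-level lexical and rule formalisms are meant to take off the grammar writer's hands.</Paragraph>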
      <Paragraph position="4"> Previous work on implementing Semitic languages, namely, Akkadian (Kataja and Koskenniemi 1988), Arabic (Beesley, Buckwalter, and Newton 1989; Beesley 1990, 1991, 1996, 1998a, 1998b, 1998c, forthcoming) and Hebrew (Lavie, Itai, and Ornan 1990), employed traditional two-level morphology with some augmentation to handle the nonlinearity of stems, but did not make any use of the then-available theory of nonconcatenative morphology. The challenge here lies in the fact that two-level morphology assumes the lexical representation of a surface form to be the concatenation of the corresponding lexical morphemes. To resolve the problem, these authors (with the exception of Beesley's work from 1996 on) provided for a simultaneous search of various root and affix lexica, the result of which served as the lexical tape of the two-level system. We shall revisit these approaches in Sections 6.2 and 6.3.</Paragraph>
      <Paragraph position="5"> Other publications dealing with Semitic computational morphology are confined to proposals for compiling autosegmental descriptions into automata (Kornai 1991; Wiebe 1992; Bird and Ellison 1992). They revolve around encoding autosegmental representations (by various encoding mechanisms) and providing ways for compiling such encodings into finite machines. None provides lexicon and rule formalisms that can be compiled into their respective encodings or directly into automata. No Semitic language, to the best of the author's knowledge, has been implemented with these proposals. We shall revisit these approaches in Section 6.4.</Paragraph>
    </Section>
    <Section position="2" start_page="78" end_page="78" type="sub_section">
      <SectionTitle>
1.2 Research Objectives
</SectionTitle>
      <Paragraph position="0"> The purpose of this work is to provide a theoretical computational framework under which nonlinear morphology can be handled in a linguistically and computationally motivated manner with the following objectives in mind: 1. The framework is to present a general multitiered computational morphology model that allows for both linear and nonlinear morphophonological descriptions.</Paragraph>
      <Paragraph position="1"> 2. The formalism of the framework is to handle various linguistic theories and models including McCarthy's initial findings as well as later models for Hebrew (Bat-El 1989), moraic theory (McCarthy and Prince 1990a, 1995), the affixational approach to handling templates in Arabic (McCarthy 1993), and others. That is, the formalism is to be flexible, leaving the grammar writer ample room to choose the appropriate linguistic theory for an application.</Paragraph>
      <Paragraph position="2"> 3. Multitiered lexica and grammars written in the formalism are to be compiled into finite-state machines (multitape automata in this case). The multitape machines are created by a compiler that employs a finite-state engine with an algebraic interface to n-way regular expressions.</Paragraph>
      <Paragraph position="3"> 4. The multitape machines are to be as close as possible in spirit to two-level morphology in that surface forms map to lexical morphemes.</Paragraph>
      <Paragraph position="4"> Our lexical level is a multitiered representation.</Paragraph>
      <Paragraph position="5"> This paper provides an overall description of the theoretical framework, compilation algorithms, and illustrations. Additionally, it discusses other related topics crucial to developing Semitic grammars.</Paragraph>
      <Paragraph position="6"> Results that emerged earlier from this work appear elsewhere (Kiraz 1994b, 1996, 1997a, 1997b, 1997c, in press), but have been thoroughly enhanced and reworked since. New contributions include enhancing the theoretical framework (Section 3), compiling lexica and rules into multitape finite-state machines (Section 4), and evaluating the current model with respect to previous ones (Section 6).</Paragraph>
    </Section>
  </Section>
</Paper>