<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1312"> <Title>Constraining Separated Morphotactic Dependencies in Finite-State Grammars</Title> <Section position="2" start_page="0" end_page="118" type="abstr"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In finite-state morphotactics, the efficient constraint of separated (non-adjacent) morpheme dependencies is a serious practical challenge. This paper will examine some typical separated dependencies, using examples from Modern Standard Arabic, showing various methods that have been invented, and perhaps reinvented several times, to block lexical overgeneration. The challenge in working systems is to enforce the necessary constraints without causing the lexicons to explode in size and without slowing the runtime performance too badly.</Paragraph> <Paragraph position="1"> The term MORPHOLOGY, as used by linguists in the Two-Level and Finite-State traditions, encompasses both MORPHOTACTICS (also called MORPHOSYNTAX), and the phonological or orthographical VARIATION rules that map between LEXICAL strings (i.e. abstract or underlying strings) and SURFACE strings. The theory and practical use of finite-state variation rules are well documented (Koskenniemi, 1983; Karttunen, 1983; Antworth, 1990; Karttunen and Beesley, 1992; Sproat, 1992; Karttunen, 1994) and will not be dealt with here. In the area of morphotactics, the commonly available languages for finite-state lexical specification provide linguists with a notation wherein related classes of morphemes, e.g. verb endings, noun endings, direct-object clitic suffixes, etc., are grouped together into sublexicons, and each individual morpheme is assigned a CONTINUATION CLASS which designates which subclasses of morphemes can follow it in a valid word (Karttunen, 1993). 
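The continuation-class notation just described can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the notation of any particular lexicon compiler: the sublexicon names (Root, VerbEnding, NounEnding, End), the English morphemes, and the helper generate are all hypothetical and do not come from the paper.

```python
# Toy lexicon in the spirit of continuation classes (hypothetical data).
# Each sublexicon maps to a list of (morpheme, continuation class) pairs;
# the continuation class names the sublexicon whose morphemes may follow.
LEXICON = {
    "Root": [
        ("walk", "VerbEnding"),
        ("talk", "VerbEnding"),
        ("dog", "NounEnding"),
    ],
    "VerbEnding": [("", "End"), ("s", "End"), ("ed", "End"), ("ing", "End")],
    "NounEnding": [("", "End"), ("s", "End")],
}


def generate(sublexicon="Root", prefix=""):
    """Yield every string reachable by following continuation classes to End."""
    if sublexicon == "End":
        # A complete word has been built; emit it.
        yield prefix
        return
    for morpheme, continuation in LEXICON[sublexicon]:
        # Append the morpheme, then continue in the class it points to.
        yield from generate(continuation, prefix + morpheme)


words = sorted(generate())
print(words)
```

In this sketch each morpheme constrains only what may immediately follow it, which is why a noun root such as "dog" never reaches the verb endings.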
In formal terms, the grouping together of related morphemes into sublexicons translates into the union operation, and continuations translate into the concatenation operation. As far as concatenating languages are concerned, these two finite-state operations are often sufficient for defining the language of possible lexical strings.</Paragraph> <Paragraph position="2"> Where there are morphotactic dependencies, i.e. where some morphemes require or prohibit the appearance of other morphemes in a word, and where the morphemes in question are adjacent, the necessary dependencies can be constrained via appropriate definition of the continuation classes. However, when similar co-occurrence restrictions exist between morphemes that are physically separated in a word, then the continuation-class notation breaks down and must be supplemented by one of the mechanisms to be discussed below. We shall conclude with a presentation of FLAG DIACRITICS as a practical compromise that keeps lexicons small, runs efficiently, provides linguists with a notation reminiscent of feature-unification, and is compatible with general finite-state computation.</Paragraph> </Section> </Paper>