File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/c96-2105_intro.xml

Size: 4,826 bytes

Last Modified: 2025-10-06 14:05:57

<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2105">
  <Title>Parallel Replacement in Finite State Calculus</Title>
  <Section position="2" start_page="0" end_page="622" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A replacement expression specifies that a given symbol or a sequence of symbols should be replaced by another one in a certain context or contexts.</Paragraph>
    <Paragraph position="1"> Phonological rewrite rules (Kaplan and Kay, 1994), two-level rules (Koskenniemi 1983), syntactic disambiguation rules (Karlsson et al. 1994; Koskenniemi, Tapanainen, and Voutilainen 1992), and part-of-speech assignment rules (Brill 1992; Roche and Schabes 1995) are examples of replacement in context in finite-state grammars.</Paragraph>
    <Paragraph position="2"> Kaplan and Kay (1994) describe a general method for representing a replacement procedure as a finite-state transduction. Karttunen (1995) takes a somewhat simpler approach by introducing into the calculus of regular expressions a replacement operator that is defined solely in terms of the other regular expression operators. We follow the latter approach here.
In the regular expression calculus, the replacement operator, -&gt;, is similar to crossproduct in that a replacement expression describes a relation between two simple regular languages. Consequently, replacement expressions can be conveniently combined with other kinds of operations, such as composition and union, to form complex expressions.
A replacement relation consists of pairs of strings that are related to one another in the manner sketched below:
    x  u_i  y  u_j  z    upper string    [1]
    x  l_i  y  l_j  z    lower string
We use u_i and u_j to represent instances of U_i and U_j (with i, j ∈ [1, n]), and l_i and l_j to represent instances of L_i and L_j. The upper string contains zero or more instances of U_i, possibly interspersed with other material (denoted here by x, y, and z). In the corresponding lower string, the sections corresponding to U_i are instances of L_i, and the intervening material remains the same (Karttunen, 1995).</Paragraph>
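    <Paragraph>
The relation sketched above can be made concrete when every U_i and L_i is a finite set of strings. The following Python sketch is a brute-force enumeration under that assumption, not the finite-state construction developed in this paper; the function names `contains_any` and `replace_relation` are hypothetical. It further assumes that no U_i contains the empty string and, because the replacement is obligatory, that the intervening material (x, y, z) contains no instance of any U_i.

```python
def contains_any(s, patterns):
    """True if some string in `patterns` occurs in `s` as a substring."""
    return any(p in s for p in patterns)


def replace_relation(upper, rules):
    """Lower strings paired with `upper` by obligatory parallel replacement.

    `rules` is a list of (U_i, L_i) pairs of finite string sets.
    Assumption of this sketch: no U_i contains the empty string.
    """
    all_upper = set().union(*(u_set for u_set, _ in rules))

    def lowers(s):
        out = set()
        # Base case: the remaining material contains no instance of any U_i,
        # so it is copied to the lower side unchanged.
        if not contains_any(s, all_upper):
            out.add(s)
        for start in range(len(s)):
            x = s[:start]  # intervening material before the next instance
            if contains_any(x, all_upper):
                break  # intervening material may not contain an instance
            for u_set, l_set in rules:
                for u in u_set:
                    if s.startswith(u, start):
                        # Replace this instance of U_i by every string of L_i.
                        for tail in lowers(s[start + len(u):]):
                            for low in l_set:
                                out.add(x + low + tail)
        return out

    return lowers(upper)


# Parallel swap of a and b: both rules apply to the same input at once.
print(replace_relation("ab", [({"a"}, {"b"}), ({"b"}, {"a"})]))  # {'ba'}
```

With the rules a -&gt; b and b -&gt; a applied in parallel, "ab" is paired only with "ba". Applying the same rules one after the other (first a -&gt; b, then b -&gt; a) would instead map "ab" to "bb" and then to "aa", which illustrates how parallel and sequential application can differ. When instances overlap, the relation may pair one upper string with several lower strings: for a single rule whose U_i is {"ab", "ba"}, the upper string "aba" has two decompositions.
    </Paragraph>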
    <Paragraph position="3"> The -&gt; operator makes the replacement obligatory, (-&gt;) makes it optional. For the sake of completeness, we also define the inverse operators, &lt;and (&lt;-), and the bidirectional variants, &lt;-&gt; and (&lt;-&gt;). We have incorporated the new replacement expressions into our implementation of the finite-state calculus (Kempe and Karttunen, 1995).</Paragraph>
    <Paragraph position="4"> Thus, we can construct transducers directly from replacement expressions as part of the general calculus, without invoking any special rule compiler.</Paragraph>
    <Section position="1" start_page="0" end_page="622" type="sub_section">
      <SectionTitle>
1.1 Simple regular expressions
</SectionTitle>
      <Paragraph position="0"> The table below describes the types of regular expressions and special symbols that are used to define the replacement operators.</Paragraph>
      <Paragraph position="1">
(A)        option, [A | 0]                                    [2]
A*         Kleene star
A+         Kleene plus
A/B        ignore (A possibly interspersed with strings from B)
~A         complement (negation)
$A         contains (at least one) A
A B        concatenation
A | B      union
A &amp; B      intersection
A - B      relative complement (minus)
A .x. B    crossproduct (Cartesian product)
A .o. B    composition
0 or [ ]   epsilon (the empty string)
[. .]      affects empty string replacement (see Section 2.2)
?          any symbol
?*         the universal ("sigma-star") language (contains all possible strings of any length, including the empty string)
.#.        string beginning or end (see Section 2.1)
Note that expressions that contain the crossproduct (.x.) or the composition (.o.) operator describe regular relations rather than regular languages. A regular relation is a mapping from one regular language to another one. Regular languages correspond to simple finite-state automata; regular relations are modelled by finite-state transducers.
In the relation A .x. B, we call the first member, A, the upper language and the second member, B, the lower language. This choice of words is motivated by the linguistic tradition of writing the result of a rule application underneath the original form. In a cascade of compositions, R1 .o. R2 ... .o. Rn, which models a linguistic derivation by rewrite rules, the upper side of the first relation, R1, contains the "underlying lexical form", while the lower side of the last relation, Rn, contains the resulting "surface form".
We recognize two kinds of symbols: simple symbols (a, b, c, etc.) and fst pairs (a:b, y:z, etc.). An fst pair a:b can be thought of as the crossproduct of a and b, the minimal relation consisting of a (the upper symbol) and b (the lower symbol).</Paragraph>
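      <Paragraph>
For finite relations, crossproduct and composition can be illustrated directly with sets of string pairs. The Python sketch below relies on that finiteness assumption; regular relations in general may be infinite and are represented by finite-state transducers rather than enumerated. The function names (`cross`, `compose`, `upper_side`, `lower_side`) and the toy strings are hypothetical.

```python
def cross(a_lang, b_lang):
    """A .x. B for finite languages: pair every string of A with every string of B."""
    return {(a, b) for a in a_lang for b in b_lang}


def compose(r1, r2):
    """R1 .o. R2: relate x to z whenever R1 relates x to y and R2 relates y to z."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


def upper_side(rel):
    """The upper language of a finite relation."""
    return {x for x, _ in rel}


def lower_side(rel):
    """The lower language of a finite relation."""
    return {y for _, y in rel}


# The fst pair a:b viewed as the crossproduct of the symbols a and b.
pair_ab = cross({"a"}, {"b"})
print(pair_ab)  # {('a', 'b')}

# A two-step cascade: the upper side of the first relation holds the
# "underlying" form, the lower side of the last one the "surface" form.
r1 = {("kaNpat", "kampat")}
r2 = {("kampat", "kammat")}
cascade = compose(r1, r2)
print(upper_side(cascade), lower_side(cascade))  # {'kaNpat'} {'kammat'}
```

The composed relation relates the first upper string directly to the last lower string; the intermediate form "kampat" no longer appears on either side.
      </Paragraph>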
    </Section>
  </Section>
</Paper>