<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1015">
  <Title>Directed Replacement</Title>
  <Section position="3" start_page="0" end_page="109" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Transducers compiled from simple replace expressions UPPER -&gt; LOWER (Karttunen 1995, Kempe and Karttunen 1996) are generally nondeterministic in the sense that they may yield multiple results even if the lower language consists of a single string. For example, let us consider the transducer in Figure 1, representing a b | b | b a | a b a -&gt; x. [Figure 1: The four paths with &quot;aba&quot; on the upper side are: &lt;0 a 0 b:x 2 a 0&gt;, &lt;0 a 0 b:x 2 a:0 0&gt;, &lt;0 a:x 1 b:0 2 a 0&gt;, and &lt;0 a:x 1 b:0 2 a:0 0&gt;.]</Paragraph>
    <Paragraph position="1"> The application of this transducer to the input &quot;aba&quot; produces four alternate results, &quot;axa&quot;, &quot;ax&quot;, &quot;xa&quot;, and &quot;x&quot;, as shown in Figure 1, since there are four paths in the network that contain &quot;aba&quot; on the upper side with different strings on the lower side.</Paragraph>
    <Paragraph position="2"> This nondeterminism arises in two ways. First of all, a replacement can start at any point. Thus we get different results for &quot;aba&quot; depending on whether we start at the beginning of the string or in the middle at the &quot;b&quot;. Secondly, there may be alternative replacements with the same starting point. At the beginning of &quot;aba&quot;, we can replace either &quot;ab&quot; or &quot;aba&quot;. Starting in the middle, we can replace either &quot;b&quot; or &quot;ba&quot;. The underlining in Figure 2 shows the four alternate factorizations of the input string, that is, the four alternate ways to partition the string &quot;aba&quot; with respect to the upper language of the replacement expression. The corresponding paths in the transducer are listed in Figure 1. [Figure 2: the input &quot;aba&quot; under its four factorizations, with outputs &quot;a x a&quot;, &quot;a x&quot;, &quot;x a&quot;, and &quot;x&quot;.]</Paragraph>
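The four factorizations described above can be reproduced with a small brute-force enumeration. This is only an illustrative sketch, not the transducer construction of the paper: it assumes UPPER is a finite set of non-empty strings and LOWER is a single string, and the names all_replacements and contains_upper are hypothetical. It also assumes that segments copied unchanged may not contain any instance of UPPER, which is why the unmodified string "aba" is not among the results.

```python
def contains_upper(segment, upper):
    """True if any string of UPPER occurs somewhere inside segment."""
    return any(u in segment for u in upper)


def all_replacements(text, upper, lower):
    """Collect every output of the nondirected replacement (assumed model).

    The input is split into copied segments, which must not contain an
    instance of UPPER, and replaced segments, which must belong to UPPER.
    """
    results = set()
    n = len(text)

    def walk(pos, out):
        # Option 1: copy the remainder unchanged, if it holds no match.
        rest = text[pos:]
        if not contains_upper(rest, upper):
            results.add(out + rest)
        # Option 2: copy text[pos:start], then replace text[start:end].
        for start in range(pos, n):
            gap = text[pos:start]
            if contains_upper(gap, upper):
                break  # longer gaps would also contain a match
            for end in range(start + 1, n + 1):
                if text[start:end] in upper:
                    walk(end, out + gap + lower)

    walk(0, "")
    return results


UPPER = {"ab", "b", "ba", "aba"}
print(sorted(all_replacements("aba", UPPER, "x")))
```

Under these assumptions the call yields the four results named in the text: "axa", "ax", "xa", and "x".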
    <Paragraph position="3"> For many applications, it is useful to define another version of replacement that produces a unique outcome whenever the lower language of the relation consists of a single string. To limit the number of alternative results to one in such cases, we must impose a unique factorization on every input.</Paragraph>
    <Paragraph position="4"> The desired effect can be obtained by constraining the directionality and the length of the replacement. Directionality means that the replacement sites in the input string are selected starting from the left or from the right, not allowing any overlaps. The length constraint forces us always to choose the longest or the shortest replacement whenever there are multiple candidate strings starting at a given location. We use the term directed replacement to describe a replacement relation that is constrained by directionality and length of match. (See the end of Section 2 for a discussion about the choice of the term.) With these two kinds of constraints we can define four types of directed replacement, listed in Figure 3. For reasons of space, we discuss here only the left-to-right, longest-match version. The other cases are similar.</Paragraph>
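The two constraints can be illustrated with a string-level sketch that takes the direction and the match length as parameters. It is an assumption-laden model rather than the paper's definition: UPPER is taken to be a finite set of non-empty strings, LOWER a single string, and the right-to-left cases are simulated by mirroring the input, the patterns, and the result. The function name directed_replace and its parameters are hypothetical.

```python
def directed_replace(text, upper, lower, left_to_right=True, longest=True):
    """Apply one of the four directed replacement variants (assumed model)."""
    if not left_to_right:
        # Assumption: right-to-left behaves like left-to-right on mirrored data.
        mirrored_upper = {u[::-1] for u in upper}
        mirrored = directed_replace(
            text[::-1], mirrored_upper, lower[::-1], True, longest
        )
        return mirrored[::-1]

    pick = max if longest else min  # longest or shortest candidate
    out = []
    pos = 0
    n = len(text)
    while pos != n:
        # End positions of all UPPER matches that start at pos.
        ends = [e for e in range(pos + 1, n + 1) if text[pos:e] in upper]
        if ends:
            out.append(lower)
            pos = pick(ends)  # continue after the chosen match, no overlaps
        else:
            out.append(text[pos])  # no match here: copy one symbol
            pos += 1
    return "".join(out)


UPPER = {"ab", "b", "ba", "aba"}
for ltr in (True, False):
    for lng in (True, False):
        print(ltr, lng, directed_replace("aba", UPPER, "x", ltr, lng))
```

In this model each combination of direction and length gives exactly one output for "aba", in contrast to the four outputs of the unconstrained replacement.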
    <Paragraph position="5"> The effect of the directionality and length constraints is that some possible replacements are ignored. For example, a b | b | b a | a b a @-&gt; x maps &quot;aba&quot; uniquely into &quot;x&quot;, as shown in Figure 4. [Figure 4: The single path with &quot;aba&quot; on the upper side is: &lt;0 a:x 1 b:0 2 a:0 0&gt;.]</Paragraph>
    <Paragraph position="6"> Because we must start from the left and have to choose the longest match, &quot;aba&quot; must be replaced, ignoring the possible replacements for &quot;b&quot;, &quot;ba&quot;, and &quot;ab&quot;. The @-&gt; operator allows only the last factorization of &quot;aba&quot; in Figure 2.</Paragraph>
    <Paragraph position="7"> Left-to-right, longest-match replacement can be thought of as a procedure that rewrites an input string sequentially from left to right. It copies the input until it finds an instance of UPPER. At that point it selects the longest matching substring, which is rewritten as LOWER, and proceeds from the end of that substring without considering any other alternatives. Figure 5 illustrates the idea.</Paragraph>
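The procedure just described can be traced step by step. The sketch below records each copy and replace action instead of only returning the output string, which makes the unique factorization of the input visible. It assumes a finite UPPER of non-empty strings and a single-string LOWER; the name trace_longest_match and the tuple layout are hypothetical choices for illustration.

```python
def trace_longest_match(text, upper, lower):
    """Return the list of (action, input_segment, output_segment) steps."""
    steps = []
    pos = 0
    n = len(text)
    while pos != n:
        # All end positions of UPPER matches beginning at the current position.
        ends = [e for e in range(pos + 1, n + 1) if text[pos:e] in upper]
        if ends:
            end = max(ends)  # longest match wins
            steps.append(("replace", text[pos:end], lower))
            pos = end  # resume after the match; alternatives are ignored
        else:
            steps.append(("copy", text[pos], text[pos]))
            pos += 1
    return steps


UPPER = {"ab", "b", "ba", "aba"}
steps = trace_longest_match("cabac", UPPER, "x")
for step in steps:
    print(step)
print("".join(output for _, _, output in steps))
```

Joining the third field of every step gives the rewritten string, while the second fields concatenate back to the original input.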
    <Paragraph position="8">  It is not obvious at the outset that the operation can in fact be encoded as a finite-state transducer for arbitrary regular patterns. Although a unique substring is selected for replacement at each point, in general the transduction is not unambiguous because LOWER is not required to be a single string; it can be any regular language.</Paragraph>
    <Paragraph position="9"> The idea of treating phonological rewrite rules in this way was the starting point of Kaplan and Kay (1994). Their notion of obligatory rewrite rule incorporates a directionality constraint. They observe (p. 358), however, that this constraint does not by itself guarantee a single output. Kaplan and Kay suggest that additional restrictions, such as longest-match, could be imposed to further constrain rule application (see footnote 2). We consider this issue in more detail. The crucial observation is that the two constraints, left-to-right and longest-match, force a unique factorization on the input string, thus making the transduction unambiguous if the LOWER language consists of a single string. In effect, the input string is unambiguously parsed with respect to the UPPER language. This property turns out to be important for a number of applications. Thus it is useful to provide a replacement operator that implements these constraints directly.</Paragraph>
    <Paragraph position="10"> The definition of the UPPER @-&gt; LOWER relation is presented in the next section. Section 3 introduces a novel type of replace expression for constructing transducers that unambiguously recognize and mark instances of a regular language without actually replacing them. Section 4 identifies some useful applications of the new replacement expressions.</Paragraph>
    <Paragraph position="11"> Footnote 2: The tentative formulation of the longest-match constraint in (Kaplan and Kay, 1994, p. 358) is too weak. It does not cover all the cases.</Paragraph>
  </Section>
</Paper>