<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1003">
  <Title>The Replace Operator</Title>
  <Section position="2" start_page="16" end_page="18" type="metho">
    <SectionTitle>
[ NO_UPPER [UPPER .x. LOWER] ]*
NO_UPPER ;
</SectionTitle>
    <Paragraph position="0"> where NO_UPPER abbreviates ~$[UPPER - []].</Paragraph>
    <Paragraph position="1"> The definition describes a regular relation whose members contain any number (including zero) of iterations of [UPPER .x. LOWER], possibly alternating with strings not containing UPPER that are mapped to themselves.</Paragraph>
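    <Paragraph> The definition can be tried out on small inputs with a minimal Python sketch. It assumes that UPPER and LOWER are finite sets of non-empty literal strings rather than arbitrary regular languages, and the function name replace is hypothetical. The input is split into stretches that contain no member of UPPER, which are copied unchanged, and matches of UPPER, which are rewritten as a member of LOWER.

```python
def replace(text, uppers, lowers):
    """All outputs of obligatory replacement for finite sets of literal strings."""
    if "" in uppers:
        raise ValueError("this sketch assumes non-empty UPPER strings")

    def clean(segment):
        # True if the segment contains no member of UPPER as a substring.
        return not any(u in segment for u in uppers)

    results = set()

    def walk(pos, out):
        # Option 1: the rest of the input is the final NO_UPPER stretch.
        if clean(text[pos:]):
            results.add(out + text[pos:])
        # Option 2: an unchanged NO_UPPER stretch text[pos:start],
        # followed by one match of UPPER that is mapped to LOWER.
        for start in range(pos, len(text)):
            if not clean(text[pos:start]):
                break
            for u in uppers:
                if text.startswith(u, start):
                    for low in lowers:
                        walk(start + len(u), out + text[pos:start] + low)

    walk(0, "")
    return results


print(replace("abaca", ["ab", "c"], ["x"]))
print(replace("abc", ["ab", "bc"], ["x"]))
```

The result is a set because an input may be related to more than one output when matches of UPPER overlap.</Paragraph>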
    <Paragraph position="2"> 1.1. Examples We illustrate the meaning of the replacement operator with a few simple examples. The regular expression [8] a b | c -&gt; x ; (same as [[a b] | c] -&gt; x) describes a relation consisting of an infinite set of pairs such as</Paragraph>
    <Paragraph position="4"> where all occurrences of ab and c are mapped to x, interspersed with unchanging pairings. It also includes all possible pairs like</Paragraph>
    <Paragraph position="6"> that do not contain either ab or c anywhere.</Paragraph>
    <Paragraph position="7"> Figure 1 shows the state diagram of a transducer that encodes this relation. The transducer consists of states and arcs that indicate a transition from  state to state over a given pair of symbols. For convenience we represent identity pairs by a single symbol; for example, we write a : a as a. The symbol ? represents here the identity pairs of symbols that are not explicitly present in the network. In this case, ? stands for any identity pair other than a : a, b : b, c : c, and x : x. Transitions that differ only with respect to the label are collapsed into a single multiply labelled arc. The state labeled 0 is the start state. Final states are distinguished by a double circle.</Paragraph>
    <Paragraph position="8">  Every pair of strings in the relation corresponds to a path from the initial 0 state of the transducer to a final state. The abaca to xaxa path is 0-1-0-2-0-2, where the 2-0 transition is over a c : x arc. If a given input string matches the replacement relation in two ways, two outputs are produced. For example, a b | b c -&gt; x maps abc to both ax and xc. The corresponding transducer paths in Figure 2 are 0-1-3-0 and 0-2-0-0, where the last 0-0 transition is over a c arc.</Paragraph>
    <Paragraph position="9"> If this ambiguity is not desirable, we may write two replacement expressions and compose them to indicate which replacement should be preferred if a choice has to be made. For example, if the ab match should have precedence, we write</Paragraph>
    <Paragraph position="11"> Figure 3: a b -&gt; x .o. b c -&gt; x. This composite relation produces the same output as the previous one except for strings like abc, where it unambiguously makes only the first replacement, giving xc as the output. The abc to xc path in Figure 3 is 0-2-0-0.</Paragraph>
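    <Paragraph> The effect of ordering the two replacements can be sketched in Python. This is an illustration under a narrow assumption: for a single literal pattern that cannot overlap itself, such as ab or bc, obligatory replacement has exactly one output and coincides with str.replace, so composition reduces to chaining two calls. The function names prefer_ab and prefer_bc are hypothetical.

```python
def prefer_ab(text):
    # a b -> x is applied first; b c -> x then applies to its output,
    # mirroring the composition a b -> x .o. b c -> x.
    return text.replace("ab", "x").replace("bc", "x")


def prefer_bc(text):
    # The opposite order gives the bc match precedence instead.
    return text.replace("bc", "x").replace("ab", "x")


print(prefer_ab("abc"))
print(prefer_bc("abc"))
```

Only the order of the two steps differs, which is what decides the outcome for an input like abc.</Paragraph>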
    <Paragraph position="12"> 1.2. Special cases Let us illustrate the meaning of the replacement operator by considering what our definition implies in a few special cases.</Paragraph>
    <Paragraph position="13"> If UPPER is the empty string, as in [] -&gt; a | b, the expression compiles to a transducer that freely inserts as and bs in the input string.</Paragraph>
    <Paragraph position="14"> If UPPER describes the null set, the LOWER part is irrelevant because there is no replacement. Such an expression is a description of the sigma-star language.</Paragraph>
    <Paragraph position="15"> If LOWER describes the empty string, replacement becomes deletion. The optional replacement relation maps UPPER to both LOWER and UPPER. The optional version of &lt;- is defined in the same way.</Paragraph>
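    <Paragraph> The optional reading can be illustrated with a short Python sketch. It assumes a single non-empty literal UPPER string and a literal LOWER string, and the function name optional_replace is hypothetical. At every match of UPPER the sketch both rewrites the match and leaves it alone, so the input itself is always among the outputs.

```python
def optional_replace(text, upper, lower):
    """All outputs when each match of upper may or may not be replaced."""
    if not upper:
        raise ValueError("this sketch assumes a non-empty UPPER string")
    results = set()

    def walk(pos, out):
        if pos == len(text):
            results.add(out)
            return
        # Keep the next symbol unchanged.
        walk(pos + 1, out + text[pos])
        # Or replace a match of upper that starts at this position.
        if text.startswith(upper, pos):
            walk(pos + len(upper), out + lower)

    walk(0, "")
    return results


print(sorted(optional_replace("abab", "ab", "x")))
```

With two matches of ab in the input, each can be kept or rewritten independently, giving four outputs.</Paragraph>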
  </Section>
  <Section position="3" start_page="18" end_page="22" type="metho">
    <SectionTitle>
2. Conditional replacement
</SectionTitle>
    <Paragraph position="0"> We now extend the notion of simple replacement by allowing the operation to be constrained by a left and a right context. A conditional replacement expression has four components: UPPER, LOWER, LEFT, and RIGHT. They must all be regular expressions that describe a simple language. We write the replacement part UPPER -&gt; LOWER, as before, and the context part as LEFT _ RIGHT, where the underscore indicates where the replacement takes place.</Paragraph>
    <Paragraph position="1"> In addition, we need a separator between the replacement and the context part. We use four alternate separators, ||, //, \\ and \/, which give rise to four types of conditional replacement expressions. We define UPPER -&gt; LOWER || LEFT _ RIGHT and the other versions of conditional replacement in terms of expressions that are already in our regular expression language, including the unconditional version just defined. Our general intention is to make the conditional replacement behave exactly like unconditional replacement except that the operation does not take place unless the specified context is present.</Paragraph>
    <Paragraph position="2"> This may seem a simple matter but it is not, as Kaplan and Kay 1994 show. There are several sources of complexity. One is that the part that is being replaced may at the same time serve as the context of another adjacent replacement. Another complication is the fact just mentioned: there are several ways to constrain a replacement by a context. We solve both problems using a technique that was originally invented for the implementation of phonological rewrite rules (Kaplan and Kay 1981, 1994) and later adapted for two-level rules (Kaplan, Karttunen, Koskenniemi 1987; Karttunen and Beesley 1992). The strategy is first to decompose the complex relation into a set of relatively simple components, define the components independently of one another, and then define the whole operation as a composition of these auxiliary relations. We need six intermediate relations, to be defined shortly: (1) InsertBrackets, (2) ConstrainBrackets, (3) LeftContext, (4) RightContext, (5) Replace, and (6) RemoveBrackets. Relations (1), (5), and (6) involve the unconditional replacement operator defined in the previous section. Two auxiliary symbols, &lt; and &gt;, are introduced in (1) and (6). The left bracket, &lt;, indicates the end of a left context. The right bracket, &gt;, marks the beginning of a complete right context. The distribution of the auxiliary brackets is controlled by (2), (3), and (4). The relations (1) and (6) that introduce the brackets internal to the composition at the same time remove them from the result.</Paragraph>
    <Section position="1" start_page="19" end_page="22" type="sub_section">
      <SectionTitle>
2.2. Basic definition
</SectionTitle>
      <Paragraph position="0"> The full specification of the six component relations is given below. Here UPPER, LOWER, LEFT, and RIGHT are placeholders for regular expressions of any complexity.</Paragraph>
      <Paragraph position="1"> In each case we give a regular expression that precisely defines the component followed by an English sentence describing the same language or relation. In our regular expression language, we have to prefix the auxiliary context markers with the escape symbol % to distinguish them from other  The unconditional replacement of &lt;UPPER&gt; by &lt;LOWER&gt;, ignoring irrelevant brackets.</Paragraph>
      <Paragraph position="2"> The redundant brackets on the lower side are important for the other versions of the operation.  The relation that maps the strings of the upper language to the same strings without any context markers. The upper side brackets are eliminated by the inverse replacement defined in (1).</Paragraph>
      <Paragraph position="3"> 2.3. Four ways of using contexts The complete definition of the first version of conditional replacement is the composition of these six relations:  The composition with the left and right context constraints prior to the replacement means that any instance of UPPER that is subject to replacement is surrounded by the proper context on the upper side. Within this region, replacement operates just as it does in the unconditional case.</Paragraph>
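      <Paragraph> The intended behaviour of the upward-oriented version can be approximated in Python without the bracket machinery. The sketch below is a simplification, not the six-relation construction: it assumes that UPPER is a single non-empty literal string that cannot overlap itself and that LEFT and RIGHT are literal strings, and the function name upward_replace is hypothetical. Both contexts are checked against the input, that is, the upper side.

```python
def upward_replace(text, upper, lower, left, right):
    """Replace upper by lower wherever the INPUT has left before it and right after it."""
    if not upper:
        raise ValueError("this sketch assumes a non-empty UPPER string")
    out = []
    pos = 0
    while pos != len(text):
        end = pos + len(upper)
        if (text.startswith(upper, pos)
                and text[:pos].endswith(left)
                and text.startswith(right, end)):
            # The match sits in the required upper-side context: rewrite it.
            out.append(lower)
            pos = end
        else:
            # Otherwise copy one symbol unchanged.
            out.append(text[pos])
            pos += 1
    return "".join(out)


print(upward_replace("abababa", "ab", "x", "ab", "a"))
```

Because the contexts are read from the input only, an earlier replacement never changes whether a later match qualifies.</Paragraph>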
      <Paragraph position="4"> Three other versions of conditional replacement can be defined by applying one, or the other, or both context constraints on the lower side of the relation. This is done by varying the order of the three middle relations in the composition. In the right-oriented version (//), the left context is checked on the lower side of replacement.  The first three versions roughly correspond to the three alternative interpretations of phonological rewrite rules discussed in Kaplan and Kay 1994. The upward-oriented version corresponds to simultaneous rule application; the right- and left-oriented versions can model rightward or leftward iterating processes, such as vowel harmony and assimilation.</Paragraph>
      <Paragraph position="5"> The fourth logical possibility is that the replacement operation is constrained by the lower context.  When the component relations are composed together in this manner, UPPER gets mapped to LOWER just in case it ends up between LEFT and RIGHT in the output string.</Paragraph>
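      <Paragraph> The four orderings can be summarized in a small Python sketch. It is inferred from the prose above rather than copied from the definitions: a context constraint composed before Replace is taken to apply to the upper side, one composed after Replace to the lower side. The function name composition_order is hypothetical, and the relative order of LeftContext and RightContext when both fall on the same side is an assumption.

```python
def composition_order(separator):
    """Order of the six component relations for one of the four separators."""
    # For each separator: is the (left, right) context checked on the lower side?
    lower_side = {
        "||": (False, False),   # upward-oriented
        "//": (True, False),    # right-oriented
        r"\\": (False, True),   # left-oriented
        r"\/": (True, True),    # downward-oriented
    }
    left_lower, right_lower = lower_side[separator]
    contexts = [("LeftContext", left_lower), ("RightContext", right_lower)]
    before = [name for name, on_lower in contexts if not on_lower]
    after = [name for name, on_lower in contexts if on_lower]
    return ["InsertBrackets", "ConstrainBrackets", *before,
            "Replace", *after, "RemoveBrackets"]


for sep in ["||", "//", r"\\", r"\/"]:
    print(sep, " .o. ".join(composition_order(sep)))
```

Only the three middle relations move; InsertBrackets and ConstrainBrackets always come first and RemoveBrackets always comes last.</Paragraph>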
      <Paragraph position="6"> 2.4. Examples Let us illustrate the consequences of these definitions with a few examples. We consider four versions of the same replacement expression, starting with the upward-oriented version a b -&gt; x || a b _ a ; applied to the string abababa. The resulting relation maps abababa to abxxa. The second and the third occurrence of ab are replaced by x here because they are between ab and a on the upper side language of the relation. A transducer for the relation is shown in Figure 4.  following the path 0-1-2-5-6-1-2-3. The last occurrence of ab must remain unchanged because it does not have the required left context on the lower side.</Paragraph>
      <Paragraph position="7"> The left-oriented version of the rule shows the opposite behavior because it constrains the left context on the upper side of the replacement relation and the right context on the lower side. [37] a b -&gt; x \\ a b _ a ; The first two occurrences of ab remain unchanged because neither one has the proper right context on the lower side to be replaced by x.</Paragraph>
      <Paragraph position="8"> Finally, the downward-oriented fourth version, a b -&gt; x \/ a b _ a ; This time, surprisingly, we get two outputs from the same input: [40] abababa is mapped to both abxaba and ababxa. Path 0-1-2-5-6-1-2-3 yields abxaba; path 0-1-2-3-4-5-6-1 gives us ababxa. It is easy to see that if the constraint for the replacement pertains to the lower side, then in this case it can be satisfied in two ways.</Paragraph>
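      <Paragraph> The four outcomes for abababa can be checked with a brute-force Python sketch. It does not use the bracket-based composition; it rests on one possible declarative reading, stated here as an assumption: a choice of occurrences to replace is accepted if every replaced occurrence stands in the required context and no intact occurrence does, with each context read from the input or from the output as the version demands. UPPER is assumed to be a single non-empty literal string, and the function name conditional_replace and its flags are hypothetical.

```python
from itertools import combinations


def conditional_replace(text, upper, lower, left, right,
                        left_on_lower=False, right_on_lower=False):
    """Outputs of conditional replacement under the reading described in the lead-in."""
    if not upper:
        raise ValueError("this sketch assumes a non-empty UPPER string")
    starts = [i for i in range(len(text)) if text.startswith(upper, i)]
    results = set()
    for size in range(len(starts) + 1):
        for chosen in combinations(starts, size):
            # Replaced occurrences must not overlap one another.
            if not all(b - a >= len(upper) for a, b in zip(chosen, chosen[1:])):
                continue
            # Build the output and map input positions to output positions.
            out, where, pos = "", {}, 0
            while pos != len(text):
                where[pos] = len(out)
                if pos in chosen:
                    out += lower
                    pos += len(upper)
                else:
                    out += text[pos]
                    pos += 1
            where[pos] = len(out)

            def in_context(i):
                end = i + len(upper)
                before = out[:where[i]] if left_on_lower else text[:i]
                after = out[where[end]:] if right_on_lower else text[end:]
                return before.endswith(left) and after.startswith(right)

            # Occurrences that survive intact in the output.
            intact = [i for i in starts if i not in chosen
                      and i in where and i + len(upper) in where]
            if all(in_context(i) for i in chosen) and not any(in_context(i) for i in intact):
                results.add(out)
    return results


args = ("abababa", "ab", "x", "ab", "a")
print(conditional_replace(*args))                                          # ||
print(conditional_replace(*args, left_on_lower=True))                      # //
print(conditional_replace(*args, right_on_lower=True))                     # \\
print(conditional_replace(*args, left_on_lower=True, right_on_lower=True)) # \/
```

On this input the sketch agrees with the outcomes described in the text: one output for each of the first three versions and two outputs for the downward-oriented one.</Paragraph>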
      <Paragraph position="9">  3. Comparisons 3.1. Phonological rewrite rules  Our definition of replacement is in its technical aspects very closely related to the way phonological rewrite-rules are defined in Kaplan and Kay 1994 but there are important differences. The initial motivation in their original 1981 presentation was to model a left-to-right deterministic process of rule application. In the course of exploring the issues, Kaplan and Kay developed a more abstract notion of rewrite rules, which we exploit here, but their 1994 paper retains the procedural point of view. Our paper has a very different starting point. The basic case for us is unconditional obligatory replacement, defined in a purely relational way without any consideration of how it might be applied. By starting with obligatory replacement, we can easily define an optional version of the operator. For Kaplan and Kay, the primary notion is optional rewriting. It is quite cumbersome for them to provide an obligatory version. The results are not equivalent.</Paragraph>
      <Paragraph position="10"> Although people may agree, in the case of simple phonological rewrite rules, what the outcome of a deterministic rewrite operation should be, it is not clear that this is the case for replacement expressions that involve arbitrary regular languages. For that reason, we prefer to define the replacement operator in relational terms without relying on an uncertain intuition about a particular procedure. 3.2. Two-level rules Our definition of replacement also has a close connection to two-level rules. A two-level rule always specifies whether a context element belongs to the input (= lexical) or the output (= surface) context of the rule. The two-level model also shares our pure relational view of replacement as it is not concerned about the application procedure. But the two-level formalism is only defined for symbol-to-symbol replacements.</Paragraph>
    </Section>
  </Section>
</Paper>