<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2105">
  <Title>Parallel Replacement in Finite State Calculus</Title>
  <Section position="3" start_page="622" end_page="624" type="metho">
    <SectionTitle>
2 Parallel Replacement
</SectionTitle>
    <Paragraph position="0"> Conditional parallel replacement denotes a relation which maps a set of n expressions Ui (i ∈ [1, n]) in the upper language into a set of n corresponding expressions Li in the lower language if, and only if, they occur between a left and a right context (li, ri).</Paragraph>
    <Paragraph position="2"> Unconditional parallel replacement denotes a similar relation where the replacement is not constrained by contexts.</Paragraph>
    <Paragraph position="3"> Conditional parallel replacement corresponds to what Kaplan and Kay (1994) call &quot;batch rules&quot;, where a set of rules (replacements) is collected together in a batch and performed in parallel, at the same time, so that all of them work on the same input, i.e. none of them applies to the output of another replacement.</Paragraph>
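    <Paragraph><![CDATA[ The batch idea can be illustrated with a minimal Python sketch. It is not part of the calculus described here: the function names are hypothetical, and the rules are restricted to single symbols without contexts. Parallel application lets every rule read the original input, whereas sequential application lets a later rule rewrite the output of an earlier one.

```python
def parallel_replace(text, rules):
    """Apply all single-symbol rules at once: each symbol of the input
    is looked up in the rule table exactly once."""
    return "".join(rules.get(symbol, symbol) for symbol in text)


def sequential_replace(text, rules):
    """Apply the rules one after another, so that a later rule may
    rewrite the output of an earlier one."""
    for upper, lower in rules.items():
        text = text.replace(upper, lower)
    return text


swap = {"a": "b", "b": "a"}
print(parallel_replace("ab", swap))    # ba
print(sequential_replace("ab", swap))  # aa
```

 Only the parallel version swaps the two symbols; the sequential version first turns ab into bb and then into aa.]]></Paragraph>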
    <Section position="1" start_page="622" end_page="622" type="sub_section">
      <SectionTitle>
2.1 Examples
</SectionTitle>
      <Paragraph position="0"> Regular expressions based on [3] can be abbreviated if some of the UPPER-LOWER pairs, and/or some of the LEFT-RIGHT pairs, are equivalent. The complex expression { a -&gt; b , b -&gt; c || x _ y } ; [4] which contains multiple replacements in one left and right context, can be written in a more elementary way as two parallel replacements:</Paragraph>
      <Paragraph position="2"> with more than one label actually stands for a set of arcs with one label each.) Figure 1 shows the state diagram of a transducer resulting from [4] or [5]. The transducer maps the string xaxayby to xaxbyby following the path 0-1-2-1-3-0-0-0 and the string xbybyxa to xcybyxa following the path 0-1-3-0-0-0-1-2.</Paragraph>
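      <Paragraph><![CDATA[ The two mappings just quoted can be checked with a small Python sketch. It is only a string-level simulation under simplifying assumptions (single-symbol UPPER, literal contexts tested on the input string), not the transducer of Figure 1; the names replace_in_context and RULES_4 are hypothetical.

```python
def replace_in_context(text, rules):
    """rules: list of (upper, lower, left, right) with single-symbol UPPER.
    Both contexts are tested on the input string only."""
    output = []
    for pos, symbol in enumerate(text):
        replacement = symbol
        for upper, lower, left, right in rules:
            if (symbol == upper
                    and text.endswith(left, 0, pos)
                    and text.startswith(right, pos + 1)):
                replacement = lower
                break
        output.append(replacement)
    return "".join(output)


# The two elementary replacements abbreviated by expression [4]
RULES_4 = [("a", "b", "x", "y"), ("b", "c", "x", "y")]
print(replace_in_context("xaxayby", RULES_4))  # xaxbyby
print(replace_in_context("xbybyxa", RULES_4))  # xcybyxa
```
]]></Paragraph>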
      <Paragraph position="3"> The complex expression { a -&gt; b , b -&gt; c || x _ y , v _ w } , { a -&gt; c || p _ q } ; [6] contains five single parallel replacements:</Paragraph>
      <Paragraph position="5"> where a is replaced by b only when occurring between x and y, or after v, or before w.</Paragraph>
      <Paragraph position="6"> An unspecified context is equivalent to ?*, the universal (sigma-star) language. Similarly, a specified context, such as x _ y, is actually interpreted as ?* x _ y ?*, that is, implicitly extending the context to infinity on both sides of the replacement.</Paragraph>
      <Paragraph position="7"> This is a useful convention, but we also need to be able to refer explicitly to the beginning or the end of a string. For this purpose, we introduce a special symbol, .#. (Kaplan and Kay, 1994, p. 349).</Paragraph>
      <Paragraph position="8"> In the example { a -&gt; b || .#. _ , v _ ? ? .#. } ; [9] a is replaced by b only when it is at the beginning of a string or between v and the two final symbols of a string.</Paragraph>
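      <Paragraph><![CDATA[ As a rough analogy only, the boundary contexts of [9] can be imitated with Python's standard re module, assuming that \A and \Z stand in for .#. and that lookbehind and lookahead stand in for contexts that are tested but not rewritten. The names RULE_9 and apply_rule_9 are hypothetical.

```python
import re

# \A and \Z play the role of .#. ; the lookbehind and lookahead test the
# contexts without consuming them. Searching anywhere in the string
# corresponds to the implicit ?* on both sides.
RULE_9 = re.compile(r"\Aa|(?<=v)a(?=..\Z)")


def apply_rule_9(text):
    return RULE_9.sub("b", text)


print(apply_rule_9("avaxy"))   # bvbxy
print(apply_rule_9("xvaxyz"))  # xvaxyz
```

 In the second string the a follows v but is followed by three symbols rather than two, so neither context pair applies.]]></Paragraph>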
    </Section>
    <Section position="2" start_page="622" end_page="622" type="sub_section">
      <SectionTitle>
2.2 Replacement of the Empty String
</SectionTitle>
      <Paragraph position="0"> The language described by the UPPER part of a</Paragraph>
      <Paragraph position="2"> can contain the empty string ε. In this case, every string in the upper-side language of the relation is mapped to an infinite set of strings in the lower-side language, as the upper-side string can be considered as a concatenation of empty and non-empty substrings, with ε at any position and in any number. E.g.</Paragraph>
      <Paragraph position="4"> maps the string bb to the infinite set of strings bb, xbb, xbxb, xbxbx, xxbb, etc., since the language described by a* contains ε, and the string bb can be considered as a result of any one of the concatenations b⌢b, ε⌢b⌢b, ε⌢b⌢ε⌢b, ε⌢b⌢ε⌢b⌢ε, ε⌢ε⌢b⌢b, etc.</Paragraph>
      <Paragraph position="5"> For many practical purposes it is convenient to construct a version of empty-string replacement that allows only one application between any two adjacent symbols (Karttunen, 1995). In order not to confuse the notation by a non-standard interpretation of the notion of empty string, we introduce a special pair of brackets, [. .], placed around the upper side of a replacement expression that presupposes a strict alternation of empty substrings and non-empty substrings of exactly one symbol: ε x ε y ε z ε ... [12] In applying this to the above example, we obtain [. a* .] -&gt; x || _ ; [13] that maps the string bb only to xbxbx since bb is here considered exclusively as a result of the concatenation ε⌢b⌢ε⌢b⌢ε.</Paragraph>
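      <Paragraph><![CDATA[ The difference between the two readings of the empty string can be sketched in Python for the input bb. This is an illustration under simplifying assumptions: the input contains no non-empty UPPER, the infinite set is sampled up to a fixed number of insertions per slot, and both function names are hypothetical.

```python
from itertools import product


def unrestricted_insertions(text, lower, max_per_slot):
    """Finite sample of the infinite set: up to max_per_slot copies of
    lower in each slot before, between and after the symbols."""
    slots = len(text) + 1
    results = set()
    for counts in product(range(max_per_slot + 1), repeat=slots):
        pieces = [lower * counts[0]]
        for symbol, count in zip(text, counts[1:]):
            pieces.append(symbol)
            pieces.append(lower * count)
        results.add("".join(pieces))
    return results


def single_insertion(text, lower):
    """Exactly one copy of lower in every slot, as with [. .]."""
    return lower + "".join(symbol + lower for symbol in text)


sample = unrestricted_insertions("bb", "x", 2)
print(all(s in sample for s in ["bb", "xbb", "xbxb", "xbxbx", "xxbb"]))  # True
print(single_insertion("bb", "x"))  # xbxbx
```
]]></Paragraph>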
      <Paragraph position="6"> If contexts are specified (in contrast to the above example), then they are taken into account.</Paragraph>
    </Section>
    <Section position="3" start_page="622" end_page="624" type="sub_section">
      <SectionTitle>
2.3 The Algorithm
</SectionTitle>
      <Paragraph position="0"> The replacement of one substring by another one inside a context requires the introduction of auxiliary symbols (e.g. brackets). Kaplan and Kay (1994) motivate this step.</Paragraph>
      <Paragraph position="1"> If we used an expression like li [Ui .x. Li] ri [14] to map a particular Ui (i ∈ [1, n]) to Li when occurring between a left and a right context, li and ri, then every li and ri would map a substring adjacent to Ui.</Paragraph>
      <Paragraph position="2"> However, this approach is impossible for the following reason (Kaplan and Kay, 1994): In an example like { a -&gt; b || x _ x } ; [15] where we expect xaxax to be replaced by xbxbx, the middle x serves as a context for both a's. A relation described by [14] could not accomplish this. The middle x would be mapped either by an ri or by an li but not by both at the same time. That is why only one a could be replaced and we would get two alternative lower strings, xbxax and xaxbx. Therefore, we have to use the contexts, li and ri, without mapping them. For this purpose we introduce auxiliary brackets &lt;i after every left context li and &gt;i before every right context ri. The replacement maps those brackets without looking at the actual contexts.</Paragraph>
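      <Paragraph><![CDATA[ The shared-context problem has a close analogue in ordinary regular-expression substitution, shown here as a Python sketch with hypothetical function names. When the contexts are consumed as part of the match, the middle x can serve only one occurrence of a; when they are merely tested by lookaround, both occurrences are replaced. Note that re.sub returns only the leftmost alternative, xbxax, whereas the relation [14] would allow xaxbx as well.

```python
import re


def replace_consuming_contexts(text):
    """Contexts are part of the match, as in expression [14]."""
    return re.sub("xax", "xbx", text)


def replace_testing_contexts(text):
    """Contexts are tested by lookaround but not consumed."""
    return re.sub("(?<=x)a(?=x)", "b", text)


print(replace_consuming_contexts("xaxax"))  # xbxax
print(replace_testing_contexts("xaxax"))    # xbxbx
```
]]></Paragraph>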
      <Paragraph position="3"> We need separate brackets for empty and non-empty UPPER. If we used the same bracket for both, this would mean an overlap of the substrings to replace in an example like x &gt;1 &lt;1 a &gt;1. Here we might have to replace &gt;1 &lt;1 and &lt;1 a &gt;1, where &lt;1 is part of both substrings. Because of this overlap, we could not replace both substrings in parallel, i.e. at the same time. To make the two replacements sequentially is also impossible in either order, for reasons explained in detail in (Kempe and Karttunen, 1995).</Paragraph>
      <Paragraph position="4"> A regular relation describing replacement in context (and a transducer that represents it) is defined by the composition of a set of &quot;simpler&quot; auxiliary relations. Context brackets occur only in intermediate relations and are not present in the final result.  Before the replacement we make the following three transformations:  (1) Complex regular expressions like [4] are transformed into elementary ones like [5], where every single replacement consists of only one UPPER, one LOWER, one LEFT and one RIGHT expression.</Paragraph>
      <Paragraph position="5"> E.g.</Paragraph>
      <Paragraph position="6"> { [. (a) .] -&gt; b || x _ y } , { [ ] -&gt; c , e -&gt; f || v _ w } ; [16] would be expanded to { [. (a) .] -&gt; b || x _ y } ,</Paragraph>
      <Paragraph position="8"> (2) Since we have to use different types of brackets for the replacement of empty and non-empty UPPER (cf. 2.3.1), we split the set of parallel replacements into two groups, one containing only replacements with empty UPPER and the other one only with non-empty UPPER. If an UPPER contains the empty string but is not identical with it, the replacement will be added to both groups but with a different UPPER. E.g. [17] would be split into  the group of empty UPPER.</Paragraph>
      <Paragraph position="9"> (3) All empty UPPER of type [ ] are transformed into type [. .] and the corresponding LOWER are replaced by their Kleene star function. E.g. [19] would be transformed into</Paragraph>
      <Paragraph position="11"> The following algorithm of conditional parallel replacement will consider all empty UPPER as being of type [. .], i.e. as not being adjacent to another empty string.</Paragraph>
      <Paragraph position="12">  Apart from the previously explained symbols, we will make use of the following symbols in the next regular expressions: [21] &lt;allE = [ &lt;1E | ... | &lt;mE ], union of all left brackets for empty UPPER.</Paragraph>
      <Paragraph position="13">  inside the string abc, i.e. between a and b and between b and c, all x will be ignored any number of times.  We compose the conditional parallel replacement of the six auxiliary relations described by Kaplan and Kay (1994) and Karttunen (1995), which are:  (1) InsertBrackets (2) ConstrainBrackets (3) LeftContext (4) RightContext (5) Replace (6) RemoveBrackets [22]  The composition of these relations in the above order defines the upward-oriented replacement. The resulting transducer maps UPPER inside an input string to LOWER when UPPER is between LEFT and RIGHT in the input context, leaving everything else unchanged. Other variants of the replacement operator will be defined later.</Paragraph>
      <Paragraph position="14"> For every single replacement { Ui -&gt; Li || li _ ri } we introduce a separate pair of brackets &lt;i and &gt;i with i ∈ [1E ... mE] if UPPER is identical with the empty string and i ∈ [1 ... n] if UPPER does not contain the empty string. A left bracket &lt;i indicates the end of a complete left context. A right bracket &gt;i marks the beginning of a complete right context.</Paragraph>
      <Paragraph position="15"> We define the component relations in the following way. Note that UPPER, LOWER, LEFT and RIGHT (Ui, Li, li and ri) stand for regular expressions of any complexity but restricted to denote regular languages. Consequently, they are represented by networks that contain no fst pairs.</Paragraph>
      <Paragraph position="17"> The relation inserts instances of all brackets on the lower side (everywhere and in any number and order).</Paragraph>
      <Paragraph position="18">  (2) ConstrainBraekets</Paragraph>
      <Paragraph position="20"> The language does not apply to single brackets but to their types and allows them to be only in the following order: &gt;allNE* &gt;allE* &lt;allE* &lt;allNE* [25] The composition of the steps (1) and (2) invokes this constraint, which is necessary for the following reasons: If we allowed sequences like &lt;3 U3 &lt;1 &gt;3 U1 &gt;1 we would have an overlap of the two substrings &lt;3 U3 &gt;3 and &lt;1 U1 &gt;1 which have to be replaced. Here, either U1 or U3 could be replaced but not both at the same time.</Paragraph>
      <Paragraph position="21"> If we permitted sequences like &gt;1E &lt;2 &lt;1E U2 &gt;2 we would also have an overlap of the two replacements, which means we could either replace &lt;2 U2 &gt;2 or &gt;1E &lt;1E but not both.</Paragraph>
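      <Paragraph><![CDATA[ The ordering constraint on bracket types can be sketched as a check over a contiguous run of brackets. The Python code below is an illustration only: it assumes brackets are written as strings such as ">1" or "<1E", with a trailing E marking empty UPPER, and the names ORDER, bracket_type and run_is_allowed are hypothetical.

```python
import re

# One letter per bracket type: R = right bracket for non-empty UPPER,
# r = right bracket for empty UPPER, l = left bracket for empty UPPER,
# L = left bracket for non-empty UPPER.
ORDER = re.compile(r"R*r*l*L*")


def bracket_type(bracket):
    side = bracket[0]               # "<" or ">"
    empty = bracket.endswith("E")   # an index such as 1E marks empty UPPER
    if side == ">":
        return "r" if empty else "R"
    return "l" if empty else "L"


def run_is_allowed(brackets):
    """True if the run respects the order right-NE, right-E, left-E, left-NE."""
    types = "".join(bracket_type(b) for b in brackets)
    return ORDER.fullmatch(types) is not None


print(run_is_allowed([">2", ">1", ">1E", "<1E", "<2", "<1"]))  # True
print(run_is_allowed(["<1", ">3"]))                            # False
print(run_is_allowed([">1E", "<2", "<1E"]))                    # False
```

 The two rejected runs are the bracket sequences from the overlap examples discussed above.]]></Paragraph>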
      <Paragraph position="22"> (3) LeftContext</Paragraph>
      <Paragraph position="24"> The constraint forces every instance of a left bracket &lt;i to be immediately preceded by the corresponding left context li and every instance of li to be immediately followed by &lt;i, ignoring all brackets that are different from &lt;i in between, and all brackets (&lt;i included) inside li. We separately make the constraints λi for every &lt;i and li and then intersect them in order to get the constraint for all left brackets and contexts.</Paragraph>
      <Paragraph position="25"> (4) RightContext</Paragraph>
      <Paragraph position="27"> The constraint relates instances of right brackets &gt;i and of right contexts ri, and is the mirror image of step (3). We derive it from the left context constraint by reversing every right context ri before making the single constraints λi (not ρi), and reversing the result again after having intersected all λi.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="624" end_page="625" type="metho">
    <SectionTitle>
(5) Replace
</SectionTitle>
    <Paragraph position="0"> [ N R ]* N [28] The relation maps every bracketed UPPER, &lt;i Ui &gt;i for non-empty UPPER and &gt;i&lt;i for empty UPPER, to the corresponding bracketed LOWER, &lt;i Li &gt;i, leaving everything else unchanged.</Paragraph>
    <Paragraph position="1"> The term N in [28] means a string that does not contain any bracketed UPPER: N = N1E &amp; ... &amp; NmE &amp; N1 &amp; ... &amp; Nn [29] A particular bracketed empty UPPER &gt;i&lt;i is excluded from the corresponding Ni (i ∈ [1E, mE])</Paragraph>
    <Paragraph position="3"> and a bracketed non-empty UPPER &lt;i Ui &gt;i is excluded from the corresponding Ni (i ∈ [1, n]) by</Paragraph>
    <Paragraph position="5"> The term R in expression [28] abbreviates a relation that maps any bracketed UPPER to the corresponding bracketed LOWER. It is the union of all single Ri relations mapping all occurrences of one Ui (empty and non-empty) to the corresponding</Paragraph>
    <Paragraph position="7"> To illustrate this: Suppose we have a set of replacements containing among others a -&gt; b || x _ y ; [34] This particular replacement is done by mapping inside an input string every substring that looks like (underlined part) ...x &gt;2&gt;1&gt;1E&lt;1E&lt;2 &lt;1 a &gt;1 &gt;2&gt;1E&lt;1E&lt;1&lt;2 y... [35]</Paragraph>
    <Paragraph position="8"> using the brackets &lt;1 and &gt;1 to a substring (underlined part)  In the following example we replace the empty U2E by L2E. Suppose we have in total one replacement of non-empty UPPER and two of empty UPPER, one of which is [. .] -&gt; b || x _ y ; [38] This replacement is done by mapping inside a string every substring that looks like (underlined part) ...x &gt;1&gt;1E &gt;2E &lt;2E &lt;1E&lt;1 y... [39] using the brackets &gt;2E&lt;2E into a substring (underlined part) ...x &gt;1&gt;1. I&gt;1 I&lt;1. I&lt;d* [40]</Paragraph>
    <Paragraph position="10"> The occurrence of exactly one bracket pair &gt;iE and &lt;iE between a left and a right context actually corresponds to the definition of a (single) empty string expressed by [. .] (cf. sec. 2.2).</Paragraph>
    <Paragraph position="11"> The brackets [&gt;2E | &gt;1E | &lt;1E | &lt;1] and [&gt;1 | &gt;1E | &lt;1E | &lt;2E] in [40] are inserted on the lower side any number of times (including zero), i.e.</Paragraph>
    <Paragraph position="12"> they exist optionally, which makes them present if checking for the left or right context requires them, and absent if they are not allowed in this place.</Paragraph>
    <Paragraph position="13"> This set of brackets does not contain those ones used for the replacement, &gt;i&lt;i, because if we later check for them we do not want this check to be always satisfied but only when the specified contexts are present, in order to be able to confirm or to cancel the replacement a posteriori.</Paragraph>
    <Paragraph position="14"> This set of optionally inserted brackets equally does not contain those which potentially could be used for the replacement of adjacent non-empty strings, i.e. &gt;allNE on the left and &lt;allNE on the right side of the expression. Otherwise, checking later for the legitimacy of the adjacent replacements would no longer be possible.</Paragraph>
  </Section>
  <Section position="5" start_page="625" end_page="625" type="metho">
    <SectionTitle>
(6) RemoveBrackets
</SectionTitle>
    <Paragraph position="0"> -&gt; [ ] [41] The relation eliminates from the lower-side language all brackets that appear on the upper side.</Paragraph>
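    <Paragraph><![CDATA[ The overall flow of the six steps can be approximated by a deterministic string-level sketch in Python. It is a strong simplification and not the construction described above: steps (1) to (4) are collapsed into a single pass that places brackets exactly where the literal contexts hold, instead of inserting brackets freely and filtering by constraints; UPPER is a single symbol; empty UPPER is not handled; and all function names are hypothetical.

```python
def is_bracket(token):
    return token[0] in "<>" and len(token) > 1


def mark_contexts(text, rules):
    """Steps (1)-(4) collapsed: emit >i before every right context ri
    and <i after every left context li (right brackets first)."""
    tokens = []
    for pos in range(len(text) + 1):
        for i, (_, _, _, right) in enumerate(rules, 1):
            if text.startswith(right, pos):
                tokens.append(f">{i}")
        for i, (_, _, left, _) in enumerate(rules, 1):
            if text.endswith(left, 0, pos):
                tokens.append(f"<{i}")
        if pos < len(text):
            tokens.append(text[pos])
    return tokens


def bracket_run(tokens, start, step):
    """Collect the contiguous brackets adjacent to a symbol."""
    run = set()
    k = start
    while 0 <= k < len(tokens) and is_bracket(tokens[k]):
        run.add(tokens[k])
        k += step
    return run


def replace_bracketed(tokens, rules):
    """Step (5): map <i Ui >i to <i Li >i, leaving the rest unchanged."""
    result = list(tokens)
    for k, token in enumerate(tokens):
        if is_bracket(token):
            continue
        before = bracket_run(tokens, k - 1, -1)
        after = bracket_run(tokens, k + 1, +1)
        for i, (upper, lower, _, _) in enumerate(rules, 1):
            if token == upper and f"<{i}" in before and f">{i}" in after:
                result[k] = lower
                break
    return result


def remove_brackets(tokens):
    """Step (6): delete all auxiliary brackets."""
    return "".join(t for t in tokens if not is_bracket(t))


def parallel_replacement(text, rules):
    return remove_brackets(replace_bracketed(mark_contexts(text, rules), rules))


RULE_15 = [("a", "b", "x", "x")]
print("".join(mark_contexts("xaxax", RULE_15)))  # >1x<1a>1x<1a>1x<1
print(parallel_replacement("xaxax", RULE_15))    # xbxbx
```

 Because the middle x is only marked by brackets and never rewritten, it serves as right context of the first a and as left context of the second.]]></Paragraph>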
    <Section position="1" start_page="625" end_page="625" type="sub_section">
      <SectionTitle>
3 Variants of Replacement
3.1 Application of context constraints
</SectionTitle>
      <Paragraph position="0"> We distinguish four ways in which context can constrain the replacement. The difference between them is where the left and the right contexts are expected, on the upper or on the lower side of the relation, i.e.</Paragraph>
      <Paragraph position="1"> LEFT and RIGHT contexts can be checked before or after the replacement.</Paragraph>
      <Paragraph position="2"> We obtain these four different applications of context constraints (denoted by ||, //, \\ and \/) by varying the order of the auxiliary relations (steps (3) to (5)) described in section 2.3.3 (cf. [22]): (a) Upward-oriented { U1 -&gt; L1 || l1 _ r1 } , ... , { Un -&gt; Ln || ln _ rn } [42] ... LeftContext .o. RightContext .o. Replace ... (b) Right-oriented { U1 -&gt; L1 // l1 _ r1 } , ... [43] ... RightContext .o. Replace .o. LeftContext ... (c) Left-oriented { U1 -&gt; L1 \\ l1 _ r1 } , ... [44] ... LeftContext .o. Replace .o. RightContext ... (d) Downward-oriented { U1 -&gt; L1 \/ l1 _ r1 } , ... [45] ... Replace .o. LeftContext .o. RightContext ... The versions (a) to (c) roughly correspond to the three alternative interpretations of phonological rewrite rules discussed in Kaplan and Kay (1994). The upward-oriented version corresponds to the simultaneous rule application; the right- and left-oriented versions can model rightward or leftward iterating processes, such as vowel harmony and assimilation.</Paragraph>
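      <Paragraph><![CDATA[ The contrast between the upward-oriented and the right-oriented version can be sketched in Python for a single rule with a left context only. The sketch assumes a single-symbol UPPER and a literal left context, and it models the right-oriented case by a left-to-right scan that tests the left context on the output built so far; the function names are hypothetical.

```python
def upward(text, upper, lower, left):
    """Left context is tested on the input (upper) side."""
    out = []
    for pos, symbol in enumerate(text):
        if symbol == upper and text.endswith(left, 0, pos):
            out.append(lower)
        else:
            out.append(symbol)
    return "".join(out)


def right_oriented(text, upper, lower, left):
    """Left context is tested on the output (lower) side, so a
    replacement can feed the context of the next one to its right."""
    out = ""
    for symbol in text:
        if symbol == upper and out.endswith(left):
            out += lower
        else:
            out += symbol
    return out


# Rule: a -> b with left context b
print(upward("baaa", "a", "b", "b"))          # bbaa
print(right_oriented("baaa", "a", "b", "b"))  # bbbb
```

 The rightward spreading in the second call is the kind of iterating behaviour that harmony-like processes require.]]></Paragraph>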
      <Paragraph position="3"> In the downward-oriented replacement the operation is constrained by the lower (left and right) context. Here the Ui get mapped to the corresponding Li just in case they end up between li and ri in the output string.</Paragraph>
    </Section>
    <Section position="2" start_page="625" end_page="625" type="sub_section">
      <SectionTitle>
3.2 Inverse, bidirectional and optional
replacement
</SectionTitle>
      <Paragraph position="0"> Replacement as described above, -&gt;, maps every Ui on the upper side unambiguously to the corresponding Li on the lower side but not vice versa.</Paragraph>
      <Paragraph position="1"> An Li on the lower side gets mapped to Li or Ui on the upper side.</Paragraph>
      <Paragraph position="2"> The inverse replacement, &lt;-, maps unambiguously from the lower to the upper side only. The bidirectional replacement, &lt;-&gt;, is unambiguous in both directions.</Paragraph>
      <Paragraph position="3"> Replacements of all of these three types (directions) can be optional, (-&gt;) (&lt;-) (&lt;-&gt;), i.e. they are either made or not. We define such a relation by changing N (the part not containing any bracketed UPPER) in expression [28] into ?*, which accepts every substring: [ ?* R ]* ?* [46] Here a Ui is either mapped by the corresponding Ri contained in R (cf. [32]) and therefore replaced by Li, or it is mapped by ?* and not replaced.</Paragraph>
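      <Paragraph><![CDATA[ The effect of optionality on the set of outputs can be sketched in Python for a single-symbol rule without contexts: every occurrence of UPPER is independently either replaced or kept. The function name optional_replace is hypothetical.

```python
from itertools import product


def optional_replace(text, upper, lower):
    """All outputs of an optional single-symbol replacement."""
    choices = [(symbol, lower) if symbol == upper else (symbol,)
               for symbol in text]
    return {"".join(picked) for picked in product(*choices)}


print(sorted(optional_replace("aba", "a", "x")))
# ['aba', 'abx', 'xba', 'xbx']
```
]]></Paragraph>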
    </Section>
  </Section>
  <Section position="6" start_page="625" end_page="626" type="metho">
    <SectionTitle>
4 A Practical Application
</SectionTitle>
    <Paragraph position="0"> In this section we illustrate the usefulness of the replace operator using a practical example.</Paragraph>
    <Paragraph position="1"> We show how a lexicon of French verbs ending in -ir, inflected in the present tense subjunctive mood, can be derived from a lexicon containing the corresponding present indicative forms. We assume here that irregular verbs are encoded separately.</Paragraph>
    <Paragraph position="2"> It is often proposed that the present subjunctive of -ir verbs be derived, for the most basic case, from a stem in -iss- (e.g. finir/finiss) rather than from a more general root (e.g. fin(i)), because once this stem is assumed, the subjunctive ending itself becomes completely regular: (that I finish) que je finiss-e, que tu finiss-es, qu'ils finiss-ent; (that I run) que je cour-e, que tu cour-es, qu'ils cour-ent. The algorithm we propose here is straightforward: We first derive the present subjunctive stem from the third person plural present indicative (e.g. finiss, cour), then append the suffix corresponding to the given person and number.</Paragraph>
    <Paragraph position="3"> The first step can be described as follows:  dego.</Paragraph>
    <Paragraph position="4"> [ e n t &lt;-&gt; SUFF || _ TAG ] ; The first transducer in [49] inserts the tags of the third person plural present indicative between the word and the tags of the actually required subjunctive form. The second transducer in [49], which is an indicative lexicon of -ir verbs concatenated with a sequence of at least one tag, provides the indicative form and keeps the initial subjunctive tags. The last transducer in [49] replaces the suffix -ent by the symbol SUFF. E.g.: finir SubjP PL P2 Verb ; finir IndP PL P3 Verb SubjP PL P2 Verb ; finissent SubjP PL P2 Verb ; finiss SUFF SubjP PL P2 Verb. To append the appropriate suffix to the subjunctive stem, we use the following transducer, which maps the symbol SUFF to a suffix and deletes all tags: [50]  The complete generation of subjunctive forms can be described by the composition: define LexSubjP StemRegular .o. Suffix ; [51] The resulting (single) transducer LexSubjP represents a lexicon of present subjunctive forms of French verbs ending in -ir. It maps the infinitive of those verbs, followed by a sequence of subjunctive tags, to the corresponding inflected surface form and vice versa.</Paragraph>
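    <Paragraph><![CDATA[ The two-stage derivation can be mirrored by a small Python sketch, given here only as an analogy in one direction (from infinitive and tags to surface form), whereas the transducer works in both. The miniature lexicon, the SG tag and the function names are assumptions made for illustration; the endings are the regular French present subjunctive endings.

```python
# Hypothetical miniature indicative lexicon: infinitive -> 3rd person plural
INDICATIVE_P3_PL = {"finir": "finissent", "courir": "courent"}

# Regular present subjunctive endings, keyed by (number, person)
SUFFIX = {
    ("SG", "P1"): "e", ("SG", "P2"): "es", ("SG", "P3"): "e",
    ("PL", "P1"): "ions", ("PL", "P2"): "iez", ("PL", "P3"): "ent",
}


def stem_regular(infinitive):
    """Look up the indicative form and strip the suffix -ent."""
    form = INDICATIVE_P3_PL[infinitive]
    if not form.endswith("ent"):
        raise ValueError(f"unexpected indicative form: {form}")
    return form[:-len("ent")]


def lex_subj_p(infinitive, number, person):
    """Compose stem derivation and suffixation."""
    return stem_regular(infinitive) + SUFFIX[(number, person)]


print(lex_subj_p("finir", "SG", "P2"))   # finisses
print(lex_subj_p("courir", "SG", "P1"))  # coure
print(lex_subj_p("finir", "PL", "P3"))   # finissent
```
]]></Paragraph>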
    <Paragraph position="5"> All intermediate transducers mentioned in this section will contribute to this final transducer but will themselves disappear.</Paragraph>
    <Paragraph position="6"> The regular expressions in this section could also be written in the two-level formalism (Koskenniemi, 1983). However, some of them can be expressed more conveniently in the above way, especially when the replace operator is used. E.g., the first line of [49], written above as: [. .] &lt;-&gt; IndP PL P3 Verb || LETTER _ TAG [52] would have to be expressed in the two-level formalism by four rules:  Here, the difficulty comes not only from the large number of rules we would have to write in the above example, but also from the fact that writing one of these rules requires keeping all the others in mind, to avoid inconsistencies between them.</Paragraph>
  </Section>
</Paper>