<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1003"> <Title>The Replace Operator</Title> <Section position="1" start_page="0" end_page="16" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper introduces a replace operator to the calculus of regular expressions and defines a set of replacement expressions that concisely encode different variants of the operation. Replace expressions denote regular relations, defined in terms of other regular expression operators. The basic case is unconditional obligatory replacement. We develop several versions of conditional replacement that allow the operation to be constrained by context. 0. Introduction. Linguistic descriptions in phonology, morphology, and syntax typically make use of an operation that replaces some symbol or sequence of symbols by another sequence or symbol. We consider here the replacement operation in the context of finite-state grammars.</Paragraph> <Paragraph position="1"> Our purpose in this paper is twofold. The first is to define replacement in a very general way, explicitly allowing replacement to be constrained by input and output contexts, as in two-level rules (Koskenniemi 1983), but without restricting it to single-symbol replacements. The second is to define replacement within a general calculus of regular expressions so that replacements can be conveniently combined with other kinds of operations, such as composition and union, to form complex expressions.</Paragraph> <Paragraph position="2"> Our replacement operators are close relatives of the rewrite operator defined in Kaplan and Kay (1994), but they are not identical to it. We discuss their relationship in a section at the end of the paper. 0.1. Simple regular expressions. The replacement operators are defined by means of regular expressions.
Some of the operators we use to define them are specific to Xerox implementations of the finite-state calculus, but equivalent formulations could easily be found in other notations. The table below describes the types of expressions and special symbols that are used to define the replacement operators.</Paragraph> <Paragraph position="3"> A .x. B crossproduct (Cartesian product); A .o. B composition. Square brackets, [ ], are used for grouping expressions. Thus [A] is equivalent to A while (A) is not. The order in the above table corresponds to the precedence of the operations. The prefix operators (~, \, and $) bind more tightly than the postfix operators (*, +, and /), which in turn rank above concatenation. Union, intersection, and relative complement bind less tightly than concatenation but more tightly than crossproduct and composition. Operators sharing the same precedence are interpreted left to right. Our new replacement operator belongs to a precedence class between the Boolean operators and composition. Taking advantage of all these conventions, the fully bracketed expression</Paragraph> <Paragraph position="5"> can be rewritten more concisely as</Paragraph> <Paragraph position="7"> Expressions that contain the crossproduct (.x.) or the composition (.o.) operator describe regular relations rather than regular languages. A regular relation is a mapping from one regular language to another. Regular languages correspond to simple finite-state automata; regular relations are modeled by finite-state transducers. In the relation A .x. B, we call the first member, A, the upper language and the second member, B, the lower language. To make the notation less cumbersome, we systematically ignore the distinction between the language A and the identity relation that maps every string of A to itself. Correspondingly, a simple automaton may be thought of as representing a language or as a transducer for its identity relation.
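As a rough illustration of how crossproduct, composition, and the identity relation behave, the Python sketch below models a finite language as a set of strings and a finite relation as a set of (upper, lower) string pairs. The function names (identity, crossproduct, compose, upper_language, lower_language) are hypothetical and are not part of the Xerox calculus. Real regular languages may be infinite and are compiled into automata and transducers, so this is only a set-level sketch under the simplifying assumption that every language is finite.

```python
from itertools import product

# Toy model (assumption): a finite language is a set of strings and a
# finite relation is a set of (upper, lower) string pairs. This shows the
# set-level meaning of the operators, not how a transducer computes them.

def identity(language):
    """Identity relation on a language: maps every string to itself."""
    return {(s, s) for s in language}

def crossproduct(upper, lower):
    """A .x. B: pairs every string of A with every string of B."""
    return set(product(upper, lower))

def compose(r, s):
    """R .o. S: x maps to z if R maps x to some y and S maps that y to z."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def upper_language(relation):
    """The first members of the pairs (the upper side)."""
    return {x for (x, _) in relation}

def lower_language(relation):
    """The second members of the pairs (the lower side)."""
    return {y for (_, y) in relation}

if __name__ == "__main__":
    a = {"cat", "dog"}
    b = {"chat"}
    r = crossproduct(a, b)          # contains ("cat", "chat") and ("dog", "chat")
    s = crossproduct(b, {"gato"})   # contains ("chat", "gato")
    print(sorted(compose(r, s)))    # [('cat', 'gato'), ('dog', 'gato')]
    print(sorted(upper_language(r)), sorted(lower_language(r)))  # ['cat', 'dog'] ['chat']
    # Composing the identity relation on A with A .x. B returns A .x. B.
    print(compose(identity(a), r) == r)  # True
```

The last check is consistent with the convention of treating a language A and its identity relation as interchangeable inside relational expressions.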
For the sake of convenience, we also equate a language consisting of a single string with the string itself. Thus the expression abc may denote, depending on the context, (i) the string abc, (ii) the language consisting of the string abc, or (iii) the identity relation on that language.</Paragraph> <Paragraph position="8"> We recognize two kinds of symbols: simple symbols (a, b, c, etc.) and fst pairs (a:b, y:z, etc.). An fst pair a:b can be thought of as the crossproduct of a and b, the minimal relation that maps a (the upper symbol) to b (the lower symbol). Because we regard the identity relation on A as equivalent to A, we write a:a as just a. There are two special symbols: 0, epsilon (the empty string), and ?, any symbol in the known alphabet and its extensions.</Paragraph> <Paragraph position="9"> The escape character, %, allows characters that have a special meaning in the calculus to be used as ordinary symbols. Thus %&amp; denotes a literal ampersand as opposed to &amp;, the intersection operator; %0 is the ordinary zero symbol.</Paragraph> <Paragraph position="10"> The following simple expressions appear frequently in our formulas:</Paragraph> <Paragraph position="12"> ?* the universal (&quot;sigma-star&quot;) language: all possible strings of any length, including the empty string.</Paragraph> <Paragraph position="13"> 1. Unconditional replacement. To the regular-expression language described above, we add the new replacement operator. The unconditional replacement of UPPER by LOWER is</Paragraph> <Paragraph position="15"> Here UPPER and LOWER are any regular expressions that describe simple regular languages. We define this replacement expression as</Paragraph> </Section> </Paper>