File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/99/e99-1008_intro.xml

Size: 8,367 bytes

Last Modified: 2025-10-06 14:06:49

<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1008">
  <Title>Range Concatenation Grammars</Title>
  <Section position="3" start_page="53" end_page="54" type="intro">
    <SectionTitle>
2 Range Concatenation Grammars
</SectionTitle>
    <Paragraph position="0"> This section introduces the notion of RCG and presents some of its properties; more details appear in [Boullier 98a].</Paragraph>
    <Paragraph position="1"> Definition 1 A positive range concatenation grammar (PRCG) $G = (N, T, V, P, S)$ is a 5-tuple where $N$ is a finite set of predicate names, $T$ and $V$ are finite, disjoint sets of terminal symbols and variable symbols respectively, $S \in N$ is the start predicate name, and $P$ is a finite set of clauses $\psi_0 \to \psi_1 \ldots \psi_m$ where $m \ge 0$ and each of $\psi_0, \psi_1, \ldots, \psi_m$ is a predicate of the form $A(\alpha_1, \ldots, \alpha_p)$ where $p \ge 1$ is its arity, $A \in N$, and each $\alpha_i \in (T \cup V)^*$, $1 \le i \le p$, is an argument.</Paragraph>
    <Paragraph position="2"> Each occurrence of a predicate in the RHS of a clause is a predicate call; an occurrence in its LHS is a predicate definition. Clauses which define predicate $A$ are called $A$-clauses. This definition assigns a fixed arity to each predicate name. The arity of $S$, the start predicate name, is one. The arity $k$ of a grammar (which is then called a $k$-PRCG) is the maximum arity of its predicates.</Paragraph>
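As a concrete reading of Definition 1 and of the arity conventions, the following Python sketch shows one possible in-memory encoding of a PRCG. The encoding is an assumption made for illustration, not the paper's: each argument is a string of one-character symbols (lower-case for terminals, upper-case for variables, with "" standing for $\varepsilon$), a clause pairs its LHS predicate with a tuple of RHS predicates, and the helper `grammar_arity` (a hypothetical name) checks that each predicate name keeps one fixed arity of at least one and that the start predicate is unary before returning the grammar arity $k$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """A predicate A(alpha_1, ..., alpha_p); each argument is a string over T and V."""
    name: str
    args: tuple


@dataclass(frozen=True)
class Clause:
    """A clause psi_0 -> psi_1 ... psi_m, with m = len(rhs) possibly 0."""
    lhs: Pred
    rhs: tuple = ()


def grammar_arity(clauses, start="S"):
    """Return the arity k of the grammar after checking the arity rules."""
    arity = {}
    for clause in clauses:
        for pred in (clause.lhs, *clause.rhs):
            if not pred.args:
                raise ValueError(f"{pred.name}: arity must be at least 1")
            # The first occurrence fixes the arity of this predicate name.
            p = arity.setdefault(pred.name, len(pred.args))
            if p != len(pred.args):
                raise ValueError(f"{pred.name} used with arities {p} and {len(pred.args)}")
    if arity.get(start) != 1:
        raise ValueError("the start predicate must have arity 1")
    return max(arity.values())


# Usage: the three-copy grammar of this section in this encoding ("" is epsilon).
three_copy = [
    Clause(Pred("S", ("XYZ",)), (Pred("A", ("X", "Y", "Z")),)),
    Clause(Pred("A", ("aX", "aY", "aZ")), (Pred("A", ("X", "Y", "Z")),)),
    Clause(Pred("A", ("bX", "bY", "bZ")), (Pred("A", ("X", "Y", "Z")),)),
    Clause(Pred("A", ("", "", ""))),
]
print(grammar_arity(three_copy))
```

Applied to the three-copy grammar, the helper reports $k = 3$, so that grammar is a 3-PRCG.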
    <Paragraph position="3"> Lower-case letters such as a, b, c, … will denote terminal symbols, while upper-case letters from the end of the alphabet, such as T, W, X, Y, Z, will denote elements of $V$.</Paragraph>
    <Paragraph position="4"> The language defined by a PRCG is based on the notion of range. For a given input string $w = a_1 \ldots a_n$, a range is a pair $(i, j)$ of integers with $0 \le i \le j \le n$, which denotes the occurrence of the substring $a_{i+1} \ldots a_j$ in $w$. The number $i$ is its lower bound, $j$ is its upper bound and $j - i$ is its size. If $i = j$, the range is empty. We will use several equivalent notations for ranges: an explicit dotted notation such as $w_1 \bullet w_2 \bullet w_3$ or, if $w_2$ extends from positions $i + 1$ through $j$, a tuple notation $(i..j)_w$, or simply $(i..j)$ when $w$ is understood or of no importance. Only consecutive ranges can be concatenated into new ranges: $(i..j)$ and $(j..k)$ concatenate into $(i..k)$. In any PRCG, the terminals, variables and arguments of a clause are bound to ranges by a substitution mechanism. An instantiated clause is a clause in which variables and arguments are consistently (w.r.t. the concatenation operation) replaced by ranges; its components are instantiated predicates.
1 Since these closure properties can be obtained without changing the structure (grammar) of the constituents (i.e., the intersection of two grammars $G_1$ and $G_2$ can be obtained without changing either $G_1$ or $G_2$), they allow a form of modularity which may lead to the design of libraries of reusable grammatical components.</Paragraph>
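The range operations just described are small enough to write down directly. In this Python sketch (the class and method names are illustrative assumptions), a range is an immutable pair of bounds; concatenation is only defined for consecutive ranges, and Python's 0-based slicing `w[i:j]` yields exactly the substring $a_{i+1} \ldots a_j$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A range (i..j) over an input string, with 0 ≤ i ≤ j."""
    lower: int
    upper: int

    def __post_init__(self):
        if not self.upper >= self.lower >= 0:
            raise ValueError(f"invalid range ({self.lower}..{self.upper})")

    @property
    def size(self):
        return self.upper - self.lower

    def is_empty(self):
        return self.lower == self.upper

    def concat(self, other):
        """(i..j) concatenated with (j..k) gives (i..k); other pairs are rejected."""
        if self.upper != other.lower:
            raise ValueError("only consecutive ranges can be concatenated")
        return Range(self.lower, other.upper)

    def substring(self, w):
        """The occurrence a_{i+1} ... a_j in w (0-based slicing does the shift)."""
        return w[self.lower:self.upper]

    def dotted(self, w):
        """Dotted notation w1 • w2 • w3, where w2 is the selected substring."""
        return f"{w[:self.lower]}•{self.substring(w)}•{w[self.upper:]}"


w = "abcab"
xy = Range(1, 3).concat(Range(3, 5))
print(xy, xy.size, xy.substring(w), xy.dotted(w))
```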
    <Paragraph position="5"> For example, $A((g..h), (i..j), (k..l)) \to B((g+1..h), (i+1..j-1), (k..l-1))$ is an instantiation of the clause $A(aX, bYc, Zd) \to B(X, Y, Z)$ if the source text $a_1 \ldots a_n$ is such that $a_{g+1} = a$, $a_{i+1} = b$, $a_j = c$ and $a_l = d$. In this case, the variables $X$, $Y$ and $Z$ are bound to $(g+1..h)$, $(i+1..j-1)$ and $(k..l-1)$ respectively.² For a grammar $G$ and a source text $w$, a derive relation, denoted by $\underset{G,w}{\Rightarrow}$, is defined on strings of instantiated predicates: if an instantiated predicate is the LHS of some instantiated clause, it can be replaced by the RHS of that instantiated clause.</Paragraph>
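The instantiation conditions of this example can be checked mechanically. The Python function below is a hypothetical helper hard-wired to the clause $A(aX, bYc, Zd) \to B(X, Y, Z)$; it assumes 0-based indexing, so that $a_{g+1}$ is `w[g]` and $a_j$ is `w[j - 1]`. Given the six bounds of the LHS ranges, it returns the ranges bound to $X$, $Y$ and $Z$, or `None` when the bounds do not yield an instantiation.

```python
def instantiate_example(w, g, h, i, j, k, l):
    """Try to instantiate A(aX, bYc, Zd) -> B(X, Y, Z) with the LHS ranges
    (g..h), (i..j), (k..l) over the source text w.

    Returns the ranges bound to X, Y and Z, or None if these bounds do not
    give an instantiation of the clause.
    """
    n = len(w)
    if not (n >= h >= g >= 0 and n >= j >= i >= 0 and n >= l >= k >= 0):
        return None
    # Each terminal covers exactly one position: a, then b and c, then d.
    if not (h - g >= 1 and j - i >= 2 and l - k >= 1):
        return None
    # With 0-based indexing, a_{g+1} is w[g] and a_j is w[j - 1].
    if w[g] != "a" or w[i] != "b" or w[j - 1] != "c" or w[l - 1] != "d":
        return None
    return {"X": (g + 1, h), "Y": (i + 1, j - 1), "Z": (k, l - 1)}


w = "axxbyyczzd"
print(instantiate_example(w, 0, 3, 3, 7, 7, 10))
```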
    <Paragraph position="6"> Definition 2 The language of a PRCG $G = (N, T, V, P, S)$ is the set $L(G) = \{ w \in T^* \mid S((0..|w|)) \overset{+}{\underset{G,w}{\Rightarrow}} \varepsilon \}$.</Paragraph>
    <Paragraph position="8"> An input string $w = a_1 \ldots a_n$ is a sentence if and only if the empty string (of instantiated predicates) can be derived from $S((0..n))$, the instantiation of the start predicate on the whole source text.</Paragraph>
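Definition 2 suggests a direct, if inefficient, recognizer: try every instantiated clause whose LHS matches, and recurse on the instantiated RHS predicates until the empty string is reached. The Python sketch below does this with memoization on instantiated predicates. It is not the parsing algorithm of [Boullier 98a], and it makes several simplifying assumptions: symbols are single characters with upper-case letters as variables, every RHS argument is a non-empty sequence of variables that also occur in the LHS, and the grammar has no cyclic derivations (otherwise the recursion does not terminate). The example grammar at the end is hypothetical.

```python
from functools import lru_cache


def is_var(sym):
    # Assumed convention: upper-case letters are variables, other symbols terminals.
    return sym.isupper()


def match_arg(arg, i, j, w, env):
    """Yield each extension of env that binds the symbol string arg to (i..j)."""
    if not arg:
        if i == j:
            yield env
        return
    sym, rest = arg[0], arg[1:]
    if not is_var(sym):
        # A terminal must match exactly one symbol of the source text.
        if j > i and w[i] == sym:
            yield from match_arg(rest, i + 1, j, w, env)
    elif sym in env:
        # A variable seen before must denote the very same range.
        lo, hi = env[sym]
        if lo == i and j >= hi:
            yield from match_arg(rest, hi, j, w, env)
    else:
        # A fresh variable may cover any prefix of (i..j).
        for k in range(i, j + 1):
            yield from match_arg(rest, k, j, w, {**env, sym: (i, k)})


def match_args(args, ranges, w, env):
    """Bind all LHS arguments to their ranges, consistently across arguments."""
    if not args:
        yield env
        return
    i, j = ranges[0]
    for env2 in match_arg(args[0], i, j, w, env):
        yield from match_args(args[1:], ranges[1:], w, env2)


def rhs_ranges(args, env):
    """Ranges of RHS arguments, or None if a concatenation is not consecutive."""
    ranges = []
    for arg in args:
        lo, hi = env[arg[0]]
        for sym in arg[1:]:
            nxt_lo, nxt_hi = env[sym]
            if nxt_lo != hi:
                return None
            hi = nxt_hi
        ranges.append((lo, hi))
    return tuple(ranges)


def recognize(clauses, w, start="S"):
    """Decide whether S((0..n)) derives the empty string for source text w."""
    by_name = {}
    for (name, args), rhs in clauses:
        by_name.setdefault(name, []).append((args, rhs))

    @lru_cache(maxsize=None)
    def derives(name, ranges):
        for args, rhs in by_name.get(name, []):
            if len(args) != len(ranges):
                continue
            for env in match_args(args, ranges, w, {}):
                calls = [(callee, rhs_ranges(a, env)) for callee, a in rhs]
                if all(r is not None and derives(c, r) for c, r in calls):
                    return True
        return False

    return derives(start, ((0, len(w)),))


# Hypothetical example grammar for {a^n b^n c^n | n ≥ 0} (not from the paper).
anbncn = [
    (("S", ("XYZ",)), [("A", ("X", "Y", "Z"))]),
    (("A", ("aX", "bY", "cZ")), [("A", ("X", "Y", "Z"))]),
    (("A", ("", "", "")), []),
]
print(recognize(anbncn, "aabbcc"), recognize(anbncn, "aabbc"))
```

Each call to `derives` corresponds to one instantiated predicate, so the cache plays the role of a table of already-decided instantiations; the enumeration of splits in `match_arg` is what makes this naive version expensive.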
    <Paragraph position="9"> The arguments of a given predicate may denote discontinuous or even overlapping ranges. Fundamentally, a predicate name $A$ defines a notion (property, structure, dependency, …) between its arguments, whose ranges can be arbitrarily scattered over the source text. PRCGs are therefore well suited to describing long-distance dependencies. Overlapping ranges arise as a consequence of the non-linearity of the formalism. For example, the same variable (denoting the same range) may occur in different arguments in the RHS of some clause, expressing different views (properties) of the same portion of the source text.</Paragraph>
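The effect of non-linearity can be illustrated with a hypothetical clause $S(X) \to A(X)\, B(X)$, in which the same variable is passed to two predicate calls that therefore inspect the same range. In this Python sketch, the predicates are assumed for illustration and written directly as tests on a range rather than as clauses: $A$ accepts $a^n b^n c^m$ and $B$ accepts $a^m b^n c^n$, so $S$ accepts their intersection $a^n b^n c^n$, which is not context-free.

```python
import re


def abc_counts(s):
    """Return (p, q, r) if s = a^p b^q c^r, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return tuple(len(part) for part in m.groups()) if m else None


def pred_a(w, x):
    """Hypothetical predicate A: the string of range x is in a^n b^n c^m."""
    c = abc_counts(w[x[0]:x[1]])
    return c is not None and c[0] == c[1]


def pred_b(w, x):
    """Hypothetical predicate B: the string of range x is in a^m b^n c^n."""
    c = abc_counts(w[x[0]:x[1]])
    return c is not None and c[1] == c[2]


def pred_s(w, x):
    # S(X) -> A(X) B(X): both calls receive the same range x, so the two
    # "views" overlap completely and both properties must hold at once.
    return pred_a(w, x) and pred_b(w, x)


def is_sentence(w):
    return pred_s(w, (0, len(w)))


print(is_sentence("aabbcc"), is_sentence("aabbc"))
```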
    <Paragraph position="10"> 2 Often, for a variable $X$, instead of saying "the range which is bound to $X$" or "the range denoted by $X$", we will simply say "the range $X$"; similarly, instead of "the string whose occurrence is denoted by the range which is bound to $X$", we will say "the string $X$".</Paragraph>
    <Paragraph position="11"> Proceedings of EACL '99
Note that the order of RHS predicates in a clause is of no importance.</Paragraph>
    <Paragraph position="12"> As an example of a PRCG, the following set of clauses describes the three-copy language $\{ www \mid w \in \{a, b\}^* \}$, which is not a context-free language (CFL) and even lies beyond the formal power of TAGs.</Paragraph>
    <Paragraph position="13">
$S(XYZ) \to A(X, Y, Z)$
$A(aX, aY, aZ) \to A(X, Y, Z)$
$A(bX, bY, bZ) \to A(X, Y, Z)$
$A(\varepsilon, \varepsilon, \varepsilon) \to \varepsilon$
Definition 3 A negative range concatenation grammar (NRCG) $G = (N, T, V, P, S)$ is a 5-tuple, like a PRCG, except that some predicates occurring in RHS have the form $\overline{A}(\alpha_1, \ldots, \alpha_p)$. A predicate call of the form $\overline{A}(\alpha_1, \ldots, \alpha_p)$ is said to be a negative predicate call. The intuitive meaning is that an instantiated negative predicate succeeds if and only if its positive counterpart (always) fails. The idea is that the language defined by $\overline{A}(\alpha_1, \ldots, \alpha_p)$ is the complement w.r.t. $T^*$ of the language defined by $A(\alpha_1, \ldots, \alpha_p)$. More formally, the pair $(\overline{A}(\vec{\rho}), \varepsilon)$ is in the derive relation $\underset{G,w}{\Rightarrow}$ if and only if $A(\vec{\rho}) \overset{+}{\underset{G,w}{\Rightarrow}} \varepsilon$ does not hold. This definition is therefore based on a "negation by failure" rule. However, in order to avoid the inconsistencies which occur when an instantiated predicate is defined in terms of its negative counterpart, we prohibit derivations exhibiting this possibility.³ Thus sentences are defined only by so-called consistent derivations. We say that a grammar is consistent if all its derivations are consistent.</Paragraph>
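The three-copy grammar can be transcribed clause by clause. In the Python sketch below (function names are illustrative), each predicate becomes a Boolean function over ranges: $A$ is memoized, its clauses are tried in turn, and $S(XYZ) \to A(X, Y, Z)$ tries every cut of $(0..n)$ into three consecutive ranges. To illustrate Definition 3, it also evaluates a hypothetical negative clause $S'(X) \to \overline{S}(X)$, which is not part of the example grammar: by negation by failure, $\overline{S}(\rho)$ succeeds exactly when $S(\rho)$ fails, so $S'$ accepts the complement of the three-copy language w.r.t. $T^*$.

```python
from functools import lru_cache


def three_copy(w):
    """Evaluate S((0..n)) and the hypothetical S'((0..n)) on source text w."""

    @lru_cache(maxsize=None)
    def pred_a(x, y, z):
        ranges = (x, y, z)
        # A(eps, eps, eps) -> eps
        if all(lo == hi for lo, hi in ranges):
            return True
        # A(tX, tY, tZ) -> A(X, Y, Z) for each terminal t in {a, b}
        for t in "ab":
            if all(hi > lo and w[lo] == t for lo, hi in ranges):
                return pred_a(*((lo + 1, hi) for lo, hi in ranges))
        return False

    def pred_s(x):
        # S(XYZ) -> A(X, Y, Z): try every cut of x into three consecutive ranges.
        lo, hi = x
        return any(pred_a((lo, p), (p, q), (q, hi))
                   for p in range(lo, hi + 1) for q in range(p, hi + 1))

    def pred_s_prime(x):
        # Hypothetical S'(X) -> not-S(X): negation by failure.
        return not pred_s(x)

    whole = (0, len(w))
    return pred_s(whole), pred_s_prime(whole)


print(three_copy("abaabaaba"), three_copy("abab"))
```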
    <Paragraph position="14"> Definition 4 A range concatenation grammar (RCG) is either a PRCG or an NRCG.</Paragraph>
    <Paragraph position="15"> The term PRCG (resp. NRCG) will be used to emphasize the absence (resp. presence) of negative predicate calls.</Paragraph>
    <Paragraph position="16"> 3 As an example, consider the NRCG $G$ with the two clauses $S(X) \to \overline{S}(X)$ and $S(\varepsilon) \to \varepsilon$, and the source text $w = a$. Let us consider the sequence $S(\bullet a \bullet) \underset{G,w}{\Rightarrow} \overline{S}(\bullet a \bullet) \underset{G,w}{\Rightarrow} \varepsilon$. If, on the one hand, we consider this sequence as a (valid) derivation, this shows, by definition, that $a$ is a sentence, and thus $(\overline{S}(\bullet a \bullet), \varepsilon) \notin \underset{G,w}{\Rightarrow}$. This last result contradicts our hypothesis. If, on the other hand, this sequence is not a (valid) derivation, then, since the second clause cannot produce a (valid) derivation for $S(\bullet a \bullet)$ either, we can conclude that $\overline{S}(\bullet a \bullet) \underset{G,w}{\Rightarrow} \varepsilon$. Since, by the first clause, $S(\rho) \underset{G,w}{\Rightarrow} \overline{S}(\rho)$ for any binding $\rho$ of $X$, we conclude that, in contradiction with our hypothesis, the initial sequence is a derivation.</Paragraph>
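The prohibition of inconsistent derivations can be sketched as a check performed during evaluation. The Python code below is specialized to the two clauses of this footnote and is only an illustration, not the paper's mechanism: it remembers which instantiated predicates are being evaluated through a negative call and raises the hypothetical exception `InconsistentDerivation` when evaluating $S(\rho)$ leads, through $\overline{S}(\rho)$, back to $S(\rho)$ itself.

```python
class InconsistentDerivation(Exception):
    """Raised when an instantiated predicate is defined through its own negation."""


def eval_s(x, active=frozenset()):
    """Evaluate S(x) for the footnote grammar
        S(X) -> not-S(X)        S(eps) -> eps
    where x = (i, j) is a range, refusing inconsistent derivations."""
    # Clause S(eps) -> eps: succeeds on an empty range.
    if x[0] == x[1]:
        return True
    # Clause S(X) -> not-S(X): the negative call needs S(x) again.
    if x in active:
        raise InconsistentDerivation(f"S{x} depends on its own negation")
    return not eval_s(x, active | {x})


print(eval_s((0, 0)))
try:
    eval_s((0, 1))  # w = a, the range •a•
except InconsistentDerivation as err:
    print("rejected:", err)
```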
    <Paragraph position="17"> In [Boullier 98a], we presented a parsing algorithm which, for an RCG $G$ and an input string of length $n$, produces a parse forest in time polynomial in $n$ and linear in $|G|$. The degree of this polynomial is at most the maximum number of free (independent) bounds in a clause. Intuitively, if we consider an instantiation of a clause, all its terminal symbols, variables and arguments are bound to ranges. This means that each position (bound) in its arguments is mapped onto a source index, that is, a position in the source text. However, knowing a basic subset of the (bound, source index) pairs is sometimes sufficient to deduce the full mapping.⁴ We call the minimum cardinality of such a basic subset the number of free bounds.</Paragraph>
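The counting of free bounds can be sketched as a union-find over the bounds of a clause. The model is a simplification assumed for this Python sketch, not the paper's exact procedure: a bound is a position between two symbols of an argument; adjacent symbols share a bound; a terminal, having size one, fixes its upper bound relative to its lower bound; and all occurrences of a variable share the same lower and upper bound. Each resulting class is determined by a single source index, so the number of classes is the number of free bounds in this model. For the example clause $A(aX, bYc, Zd) \to B(X, Y, Z)$ it finds six, one for each of the indices g, h, i, j, k, l used in its instantiation.

```python
def free_bounds(clause):
    """Number of free bounds of a clause in the simplified model described above.

    clause = (lhs, rhs), where each predicate is (name, args) and each argument
    is a string of symbols: upper-case letters are variables, others terminals.
    """
    parent = {}

    def find(b):
        parent.setdefault(b, b)
        while parent[b] != b:
            b = parent[b]
        return b

    def union(b1, b2):
        parent[find(b1)] = find(b2)

    lhs, rhs = clause
    for p, (_name, args) in enumerate([lhs, *rhs]):
        for a, arg in enumerate(args):
            find(("pos", p, a, 0))  # an empty argument still has one bound
            for t, sym in enumerate(arg):
                here, nxt = ("pos", p, a, t), ("pos", p, a, t + 1)
                if sym.isupper():
                    # All occurrences of a variable share its lower and upper bounds.
                    union(here, (sym, "lower"))
                    union(nxt, (sym, "upper"))
                else:
                    # A terminal has size one: its upper bound is fixed by its lower one.
                    union(here, nxt)
    return len({find(b) for b in parent})


example = (("A", ("aX", "bYc", "Zd")), [("B", ("X", "Y", "Z"))])
print(free_bounds(example))
```

Under this model, an upper bound on the degree of the parsing polynomial for a whole grammar would be `max(free_bounds(c) for c in clauses)`.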
    <Paragraph position="18"> In the sequel we will assume that the predicate names len and eq are defined:⁵ len(l, X) checks that the size of the range denoted by the variable X is the integer l, and eq(X, Y) checks that the substrings selected by the ranges X and Y are equal.</Paragraph>
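Both predefined predicates reduce to simple tests on ranges. In this Python sketch, `len_pred` and `eq_pred` are hypothetical names (chosen to avoid shadowing Python's built-in `len`), ranges are $(i, j)$ pairs and $w$ is the source text. The helper `is_copy` shows how a hypothetical clause $S(XY) \to eq(X, Y)$ would define the copy language $\{ ww \}$ by trying every split of $(0..n)$.

```python
def len_pred(l, x):
    """len(l, X): the range x = (i, j) has size l."""
    i, j = x
    return j - i == l


def eq_pred(w, x, y):
    """eq(X, Y): ranges x and y select equal substrings of w."""
    return w[x[0]:x[1]] == w[y[0]:y[1]]


def is_copy(w):
    # Hypothetical clause S(XY) -> eq(X, Y): try every split of (0..n) into X and Y.
    n = len(w)
    return any(eq_pred(w, (0, m), (m, n)) for m in range(n + 1))


w = "abcabc"
print(len_pred(3, (0, 3)), eq_pred(w, (0, 3), (3, 6)), is_copy(w), is_copy("abcab"))
```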
  </Section>
</Paper>