<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1101">
  <Title>Finite-state Approximation of Constraint-based Grammars using Left-corner Grammar Transforms</Title>
  <Section position="2" start_page="0" end_page="619" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper describes methods for approximating grammars with finite-state machines. Unlike the method derived from the LR(k) parsing algorithm described in Pereira and Wright (1991), these methods use grammar transformations based on the left-corner grammar transform (Rosenkrantz and Lewis II, 1970; Aho and Ullman, 1972). One advantage of the left-corner methods is that they generalize straightforwardly to complex feature "unification based" grammars, unlike the LR(k) based approach. For example, the implementation described here translates a DCG version of the example grammar given by Pereira and Wright (1991) directly into an FSM without constructing an approximating CFG.</Paragraph>
    <Paragraph position="1"> Left-corner based techniques are natural for this kind of application because (with the simple optimization described below) they can parse pure left-branching or pure right-branching structures with a stack depth of one (two if terminals are pushed onto and popped from the stack). Higher stack depth occurs with center-embedded structures, which humans find difficult to comprehend. This suggests that we may get a finite-state approximation to human performance by simply imposing a stack depth bound. Below we provide a simple tree-geometric description of the configurations that increase a left-corner parser's stack depth.</Paragraph>
    <Paragraph position="2"> The rest of this paper is structured as follows.</Paragraph>
    <Paragraph position="3"> The remainder of this section outlines the "grammar transform" approach, summarizes the top-down parsing algorithm, and discusses how finite-state approximations of top-down parsers can be constructed. The fact that this approximation is not exact for left-linear grammars (which define finite-state languages) motivates a finite-state approximation based on the left-corner parsing algorithm (which is presented as a grammar transform in section 2).</Paragraph>
    <Paragraph position="4"> * This research was supported by NSF grant SBR526978. I began this research while I was on sabbatical at the Xerox Research Centre in Grenoble, France. I would like to thank them and my colleagues at Brown for their support.</Paragraph>
    <Paragraph position="5"> In its standard form the approximation based on the left-corner parsing algorithm suffers from the complementary problem to the top-down approximation: it is not exact for right-linear grammars, but the "optimized" variants presented in section 3 overcome this deficiency, resulting in finite-state CFG approximations which are exact for left-linear and right-linear grammars. Section 4 discusses how these techniques can be combined in an implementation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.1 Parsing strategies as grammar transformations
</SectionTitle>
      <Paragraph position="0"> The parsing algorithms discussed here are presented as grammar transformations, i.e., functions T that map a context-free grammar G into another context-free grammar T(G). The transforms have the property that a top-down parse using the transformed grammar is isomorphic to some other kind of parse using the original grammar. Thus grammar transforms provide a simple, compact way of describing various parsing algorithms, as a top-down parser using T(G) behaves identically to the kind of parser we want to study using G.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="619" type="sub_section">
      <SectionTitle>
1.2 Mappings from trees to trees
</SectionTitle>
      <Paragraph position="0"> The transformations presented here can also be understood as isomorphisms from the set of parse trees of the source grammar G to parse trees of the transformed grammar which preserve terminal strings.</Paragraph>
      <Paragraph position="1"> Thus it is convenient to explain the transforms in terms of their effect on parse trees. We call a parse tree with respect to the source grammar G an analysis tree, in order to distinguish it from parse trees with respect to some transform of G. The analysis tree t in Figure 1 will be used as an example throughout this paper.</Paragraph>
      <Paragraph position="3"> Note that the phonological forms are treated here as annotations on the nodes drawn above them, rather than independent nodes. That is, DET (annotated with the) is a terminal node.</Paragraph>
    </Section>
    <Section position="3" start_page="619" end_page="619" type="sub_section">
      <SectionTitle>
1.3 Top-down parsers and parse trees
</SectionTitle>
      <Paragraph position="0"> The "predictive" or "top-down" recognition algorithm is one of the simplest CFG recognition algorithms. Given a CFG $G = (N, T, P, S)$, a (top-down) stack state is a sequence of terminals and nonterminals. Let $Q = (N \cup T)^*$ be the set of stack states for $G$. The start state $q_0 \in Q$ is the sequence $S$, and the final state $q_f \in Q$ is the empty sequence $\epsilon$.</Paragraph>
      <Paragraph position="1"> The state transition function $\delta : Q \times (T \cup \{\epsilon\}) \rightarrow 2^Q$ maps a state and a terminal or epsilon into a set of states. It is the smallest function $\delta$ that satisfies the following conditions: $\gamma \in \delta(a\gamma, a) : a \in T, \gamma \in (N \cup T)^*$.</Paragraph>
      <Paragraph position="2"> $\beta\gamma \in \delta(A\gamma, \epsilon) : A \in N, \gamma \in (N \cup T)^*, A \rightarrow \beta \in P$.</Paragraph>
      <Paragraph position="3"> A string $w$ is accepted by the top-down recognition algorithm if $q_f \in \delta^*(q_0, w)$, where $\delta^*$ is the reflexive transitive closure of $\delta$ with respect to epsilon moves. Extending this top-down parsing algorithm to a 'unification-based' grammar is straightforward and is described in many textbooks, such as Pereira and Shieber (1987).</Paragraph>
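      <Paragraph> The transition function above can be sketched as a small set-of-stacks recognizer. This is a minimal illustration rather than the paper's implementation: the function name top_down_accepts, the dictionary encoding of the productions, the example grammars, and the max_depth bound are all assumptions introduced here. The bound keeps the set of reachable stack states finite, which is one simple way to obtain a finite-state machine from the recognizer, and it also guarantees termination on left-recursive grammars.</Paragraph>
      <Paragraph>
```python
from collections import deque


def top_down_accepts(productions, start, terminals, word, max_depth=8):
    """Return True if `word` is accepted using stacks of at most `max_depth` symbols.

    `productions` maps each nonterminal to a list of right-hand sides
    (hypothetical encoding chosen for this sketch).
    """

    def closure(states):
        # Epsilon moves: a nonterminal A on top of the stack is replaced
        # by beta for every production A -> beta.
        seen = set(states)
        queue = deque(states)
        while queue:
            stack = queue.popleft()
            if not stack or stack[0] in terminals:
                continue
            top, rest = stack[0], stack[1:]
            for beta in productions.get(top, []):
                new = tuple(beta) + rest
                # Discard stacks beyond the bound (assumption of this sketch).
                if len(new) > max_depth or new in seen:
                    continue
                seen.add(new)
                queue.append(new)
        return seen

    states = closure({(start,)})
    for a in word:
        # Scanning move: pop the terminal a from the top of the stack.
        states = closure({s[1:] for s in states if s and s[0] == a})
    # The final state is the empty stack.
    return () in states


# Hypothetical example grammars, not taken from the paper.
toy = {"S": [["NP", "VP"]], "NP": [["det", "n"]], "VP": [["v"], ["v", "NP"]]}
toy_terminals = {"det", "n", "v"}
left_linear = {"E": [["E", "+", "n"], ["n"]]}
ll_terminals = {"n", "+"}
```
      </Paragraph>
      <Paragraph> With the hypothetical left-linear grammar E -> E + n | n, calling top_down_accepts(left_linear, "E", ll_terminals, ["n", "+", "n"], max_depth=2) returns False even though the grammar generates that string, while the default bound accepts it. This illustrates why a depth-bounded top-down recognizer is not exact for left-linear grammars.</Paragraph>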
      <Paragraph position="4"> It is easy to read off the stack states of a top-down parser constructing a parse tree from the tree itself. For any node X in the tree, the stack contents of a top-down parser just before the construction of X consists of (the label of) X followed by the sequence of labels on the right siblings of the nodes encountered on the path from X back to the root.</Paragraph>
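      <Paragraph> The rule for reading stack states off a tree can be sketched as a short traversal. This is an illustrative sketch under stated assumptions: trees are encoded as nested tuples (label, child, child, ...), the names stack_states and max_stack_depth are hypothetical, and example_tree is a small made-up tree that is not claimed to be the tree t of Figure 1.</Paragraph>
      <Paragraph>
```python
def stack_states(tree, below=()):
    """Yield (label, stack) pairs in top-down, left-to-right order.

    `stack` is the top-down parser's stack just before the node is built:
    the node's own label, then the labels of the right siblings of the
    nodes on the path from the node back to the root.
    """
    label, children = tree[0], tree[1:]
    yield label, (label,) + below
    for i, child in enumerate(children):
        # Right siblings of this child sit on the stack above `below`.
        right_siblings = tuple(c[0] for c in children[i + 1:])
        yield from stack_states(child, right_siblings + below)


def max_stack_depth(tree):
    """Largest stack size needed by a top-down parser to build `tree`."""
    return max(len(stack) for _, stack in stack_states(tree))


# Hypothetical example tree (leaves are one-element tuples).
example_tree = ("S", ("NP", ("DET",), ("N",)), ("VP", ("V",), ("ADV",)))
# A purely left-branching tree, also hypothetical.
left_branching = ("A", ("A", ("A", ("a",)), ("b",)), ("b",))
```
      </Paragraph>
      <Paragraph> For example_tree the deepest stack is (DET, N, VP), so max_stack_depth(example_tree) is 3. For left_branching the stack grows by one symbol with each additional level of left embedding, which is the behavior that makes a depth-bounded top-down recognizer inexact for left-linear grammars.</Paragraph>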
      <Paragraph position="5"> It is easy to check that a top-down parser requires a stack of depth 3 to construct the tree t depicted in</Paragraph>
    </Section>
  </Section>
</Paper>