<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2106">
  <Title>Parsing Schemata for Grammars with Variable Number and Order of Constituents</Title>
  <Section position="2" start_page="0" end_page="733" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper addresses the question of how to define (tabular) parsing algorithms on a greater level of abstraction, in order to apply them to larger classes of grammars (as compared to parsing algorithms for context-free grammars). Such an abstraction is useful because it allows us to study properties of parsing algorithms, and to compare different parsing algorithms, independently of the properties of an underlying grammar formalism. While previous attempts to define more general parsers have only aimed at expanding the domain of the nonterminal symbols of a grammar (Pereira and Warren, 1983), this paper aims at a generalization of parsing in a different dimension, namely to include grammars with a flexible constituent structure, i.e., where the sequence of subconstituents specified by a grammar production is not fixed. We consider two grammar formalisms: extended context-free grammars (ECFG) and ID/LP grammars.</Paragraph>
    <Paragraph position="1"> ECFG's (sometimes called regular right part grammars) are a generalization of context-free grammars (CFG) in which a grammar production specifies a regular set of sequences of subconstituents of its left-hand side instead of a fixed sequence of subconstituents. The right-hand side of a production can be represented as a regular set, or a regular expression, or a finite automaton, which are all equivalent concepts (Hopcroft and Ullman, 1979). ECFG's are often used by linguistic and programming language grammar writers to represent a (possibly infinite) set of context-free productions as a single production rule (Kaplan and Bresnan, 1982; Woods, 1973). Parsing of ECFG's has been studied, for example, in Purdom, Jr. and Brown (1981) and Leermakers (1989). Tabular parsing techniques for CFG's can be generalized to ECFG's in a natural way by using the computations of the finite automata in the grammar productions to guide the recognition of new subconstituents. ID/LP grammars are a variant of CFG's that were introduced into linguistic formalisms to encode word order generalizations (Gazdar et al., 1985). Here, the number of subconstituents of the left-hand side of a production is fixed, but their order can vary. ID rules (immediate dominance rules) specify the subconstituents of a constituent but leave their order unspecified.</Paragraph>
    <Paragraph position="2"> The admissible orderings of subconstituents are specified separately by a set of LP constraints (linear precedence constraints).</Paragraph>
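As an illustration of the ECFG idea described above, the following Python sketch encodes the right-hand side of a single production as a deterministic finite automaton and checks whether a sequence of subconstituent categories belongs to its regular set. The production (an NP with an optional Det, any number of Adj, then N), the state names, and the function name accepts are hypothetical and are not taken from the paper.

```python
# Hypothetical ECFG production: NP -> Det? Adj* N
# The right-hand side is encoded as a DFA over category symbols.
# States and transitions are assumptions made for illustration only.
NP_RHS = {
    "start": "q0",
    "final": {"q2"},
    "delta": {
        ("q0", "Det"): "q1",
        ("q0", "Adj"): "q1",
        ("q0", "N"): "q2",
        ("q1", "Adj"): "q1",
        ("q1", "N"): "q2",
    },
}


def accepts(rhs, categories):
    """Return True if the category sequence is in the regular set of rhs."""
    state = rhs["start"]
    for cat in categories:
        state = rhs["delta"].get((state, cat))
        if state is None:  # no transition: the sequence is not admissible
            return False
    return state in rhs["final"]
```

A tabular parser could, under these assumptions, keep the current automaton state in each item and advance it one subconstituent at a time, which is the role the dotted position plays for an ordinary CFG production.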
    <Paragraph position="3"> A simple approach to ID/LP parsing (called indirect parsing) is to fully expand a grammar into a CFG, but this increases the number of productions significantly. Therefore, direct parsing algorithms for ID/LP grammars were proposed (Shieber, 1984). It is also possible to encode an ID/LP grammar as an ECFG by interleaving the ID rules with LP checking without increasing the number of productions. However, for unification ID/LP grammars, expansion into a CFG or encoding as an ECFG is ruled out because the information contained in the ID rules is only partial and has to be instantiated, which can result in an infinite number of productions. Moreover, Seiffert (1991) has observed that, during the recognition of subconstituents, a subconstituent recognized in one step can instantiate features on another subconstituent recognized in a previous step. Therefore, all recognized subconstituents must remain accessible for LP checking (Morawietz, 1995).</Paragraph>
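To make the cost of indirect parsing concrete, here is a minimal Python sketch that expands one ID rule into the CFG productions permitted by a set of LP constraints. It assumes atomic (non-unification) categories and distinct daughters; the function name expand_id_rule, the encoding of LP constraints as pairs, and the example categories are hypothetical and not taken from the paper.

```python
from itertools import permutations


def expand_id_rule(lhs, daughters, lp):
    """Expand one ID rule into the CFG productions allowed by the LP constraints.

    daughters: distinct category names (an assumption of this sketch).
    lp: a set of pairs (a, b) meaning that a must precede b.
    """
    productions = []
    for order in permutations(daughters):
        # Reject the ordering if, for some LP pair (a, b), b occurs before a.
        violated = any(
            (order[j], order[i]) in lp
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
        if not violated:
            productions.append((lhs, order))
    return productions
```

Without LP constraints, an ID rule with n distinct daughters expands to n factorial productions: four daughters already give 24, and a single constraint such as ("NP", "V") only halves that to 12.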
    <Paragraph position="4"> We define an intermediate formalism between grammars and parsers (called state transition grammars, STG) in which different grammar formalisms, including CFG's, ECFG's, and ID/LP grammars, can be represented. Moreover, admissible sequences of subconstituents are defined in a way that allows a parser to access subconstituents that were recognized in previous parsing steps. Next, we describe an Earley algorithm for STG's, using the parsing schemata formalism of Sikkel (1993). This gives us a very high level description of Earley's algorithm, in which the definition of parsing steps is separated from the properties of the grammar formalism. An Earley algorithm for a grammar may be obtained from this description by representing the grammar as an STG.</Paragraph>
    <Paragraph position="5"> The paper is organized as follows. In Section 2, we define STG's and give a characterization of various grammar formalisms in terms of properties of STG's. In Section 3, we present an Earley parsing schema for STG's and give a characterization of the valid parse items. In Section 4, we introduce a variant of STG's for head-corner parsing. In Section 5, we discuss the usability of STG's to define parsers for grammars that define constituent structures by means of local tree constraints, i.e., formulae of a (restricted) logical language. Section 6 presents final conclusions.</Paragraph>
  </Section>
</Paper>