File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/c00-2106_metho.xml
Size: 15,786 bytes
Last Modified: 2025-10-06 14:07:15
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2106"> <Title>Parsing Schemata for Grammars with Variable Number and Order of Constituents</Title> <Section position="3" start_page="733" end_page="733" type="metho"> <SectionTitle> 2 State Transition Grammars </SectionTitle> <Paragraph position="0"> Wc denote nonterminal symbols with A, B, terminal symbols with a, terminal and nonterminal symbols with X, states with F, strings of symbols with/3, % and the empty string with c. An STG is defined as tbllows: Definition 1 (ST(\]). Art STG G is a tuple</Paragraph> <Paragraph position="2"> Note thai; we do not allow final states in the right-hand side of a production. A pair (F,/3) is called a configuration. If F is a fnal state then (P,/3) is called a final configuration. The reflexive and transitive closure of \[-c, is denoted with H~. The state projection of Hc is the binary relation (Ho) = {(r, r')l /3x: (p,/3) (p',/3x)}.</Paragraph> <Paragraph position="3"> Ha is called context:free iff a transition from (P,/3) does not del)end on fl, tbrmally: for all /3, fl', r, r', x: (r,/3) Ha (r', fiX) iff (r,/3') He; (F',/3'X). The set of terminal states of G is the</Paragraph> <Paragraph position="5"> The language defined by a state P is the set of strings in the final configurations reachable</Paragraph> <Paragraph position="7"> Note that if A --> F is a production then e L(P) (i.e., there are no ~-productions). The derivation relation is defined by 7A5 ==> 7fl5 itf for some production A ~ P: /3 C L(P). The language defined by G is the set of strings in E* that are derivable fi'om the start symbol.</Paragraph> <Paragraph position="8"> We denote a CFG as a tuple (N,E,P,S) where N, E, S are as betbre and P C_ N x V + is a finite set of productions A -+/~. We assume that there are no e-productions.</Paragraph> <Paragraph position="9"> An ECFO can be represented as an extension of a CFO with productions of the tbrm A -+ A, where .A = (V, Q, qo, 5, Of) is a nondeterministic finite automaton (NFA) without e-transitions,</Paragraph> <Paragraph position="11"> with input alphalmt V, state set Q, initial state q0, final (or accepting) states Q f, m~(t tr~msition relation 5 C_ Q x V x Q (I{opcroft and Ullman, 1979). A accepts ~ string fl ill tbr some final st;;~l;e q C Q f, (qo,/'-\], q) ~ 5&quot;. Furl;hermore, we assume that q0 ~ Q f, i.e., ..4 does nol; ac(:ept the emi)l;y word. We can assmne wit;hour loss of generalizal;ion thai, the mfl;omal;a in the right-lmnd sides of a grammar are nll disjoint.</Paragraph> <Paragraph position="12"> Then we cml rel)resent ml ECFG as a tul)le (N, E, Q, Q f, 5,1 ), S) where N, E, Q, Q f, 5, S m'e as befbre and P C N x (2 is ~t finite set of productions A -> q0 (q0 is ml initial st~te.). For rely production p = A ~ q0 let A p = (17, Q, q0, (t, Oj.) l)e the NFA with initiM state q0. The, deriwd;ion relation is detined by 7A5 ~ 7/35 itf fbr some 1)roduction p = A ---> q0, A p accet)ts fl.</Paragraph> <Paragraph position="13"> An ID/LP grnmm~tr is represented as a l;upie (N~ E, \] , LP, S) whoa'e. N, E, S are as before nnd P is a finite set of productions (ID rules) A --+ M, where. A C N ;uid ~4 is ~ multiset over V, and LP is a set ()f line~r l)re(:edence constraints. We are not concerned with de.tails of the LP constra.ints here. We write fl ~ LP to denote that the sl;ring fi s~d;isties all the constraints in l,P. 
CFG's, ECFG's and ID/LP grammars can be characterized by appropriate restrictions on the transition relation and the final states of an STG:

* CFG: ⊢_G is context-free and deterministic, σ(⊢_G) is acyclic, M_F = T(G).
* ECFG: ⊢_G is context-free.
* ID/LP: σ(⊢_G) is acyclic, M_F = T(G), and for all Γ: if β, γ ∈ L(Γ) then γ is a permutation of β.

These conditions define normal forms of STG's; that is, for STG's that do not satisfy the conditions for some type there can nevertheless be strongly equivalent grammars of that type. Such STG's are regarded as degenerate and are not considered further.

For instance, if G is an STG that satisfies the conditions for CFG's, then a CFG G' can be constructed as follows: for every production A → Γ in G, let A → β be a production in G' where L(Γ) = {β}. Then the derivation relations of G and G' coincide. Similarly for the other grammar types. Conversely, if a grammar is of a given type, then it can be represented as an STG satisfying the conditions for that type, by specifying the states and transition relation, as shown in Table 1 (⊎ denotes multiset union).

3 Earley Parsing

Parsing schemata were proposed by Sikkel (1993) as a framework for the specification (and comparison) of tabular parsing algorithms. Parsing schemata provide a well-defined level of abstraction by abstracting from control structures (i.e., the ordering of operations) and data structures. A parsing schema can be implemented as a tabular parsing algorithm in a canonical way (Sikkel, 1998).

A parsing schema for a grammar class is a function that assigns to each grammar and each input string a deduction system, called a parsing system. A parsing schema is usually defined by presenting a parsing system. A parsing system consists of a finite set I of parse items, a finite set H of hypotheses, which encode the input string, and a finite set D of deduction steps of the form x_1, ..., x_n ⊢ x where x_i ∈ I ∪ H and x ∈ I. The hypotheses can be represented as deduction steps with empty premises, so we can assume that all x_i are items, and represent a parsing system as a pair (I, D).

Correctness of a parsing system is defined with respect to some item semantics. Every item denotes a particular derivation of some substring of the input string. A parsing system is correct if an item is deducible precisely when it denotes an admissible derivation. Items that denote admissible derivations are called correct.

STG's constitute a level of abstraction between grammars and parsing schemata because they can be used to encode various classes of grammars, while the mechanism by which a parsing algorithm recognizes admissible sequences of subconstituents is built into the grammar. Therefore, STG's allow the parsing steps to be defined separately from the mechanism in the grammar that specifies admissible sequences of subconstituents.
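A parsing system (I, D) can be turned into a tabular algorithm by computing the closure of the deduction steps over a chart and an agenda, in the canonical way mentioned above. The following minimal sketch shows one such engine; representing deduction steps as functions over tuples of antecedent items is an assumption made for this illustration, not the formulation used here.

def closure(axioms, steps):
    """Compute all deducible items of a parsing system.

    axioms: items deducible from empty premises (the hypotheses).
    steps:  functions that take a tuple of antecedent items and return an
            iterable of consequent items (empty if the step does not apply).
    Every step is assumed to have at most two antecedents.
    """
    chart = set()
    agenda = list(axioms)
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        # Try every step with every antecedent combination involving the new item.
        others = list(chart)
        combinations = [(item,)] + [(item, o) for o in others] + [(o, item) for o in others]
        for step in steps:
            for antecedents in combinations:
                for consequent in step(antecedents):
                    if consequent not in chart:
                        agenda.append(consequent)
    return chart

# Toy usage: items are integers; a single unary step derives n + 1 while n < 3.
def grow(antecedents):
    """Toy unary deduction step over integer items."""
    return [antecedents[0] + 1] if len(antecedents) == 1 and antecedents[0] < 3 else []

print(sorted(closure([0], [grow])))  # [0, 1, 2, 3]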
A generalization of Earley's algorithm for CFG's (Earley, 1970) to STG's is described by the parsing schema shown in Fig. 1. An item [A → β.Γ, i, j] denotes an A-constituent that is partially recognized from position i through j in the input string, where β is the sequence of recognized subconstituents of A, and a sequence of transitions that recognizes β can lead to state Γ. Note that the length of β can be restricted to the length of the input string because there are no ε-productions.

In order to give a precise definition of the semantics of the items, we define a derivation relation which is capable of describing the partial recognition of constituents. This relation is defined on pairs (γ, Δ) where γ ∈ V* and Δ is a finite sequence of states (a pair (γ, Δ) could be called a super configuration). γ represents the front (or yield) of a partial derivation, while Δ contains one state for every partially recognized constituent.

Definition 2. The Earley derivation relation ↪ is defined by two clauses. The first clause describes the partial recognition of an A-constituent, where β is the recognized part and the state Γ is reached when β is recognized. The second clause describes the complete recognition of an A-constituent; in this case, the final state is discarded.

Each step in the derivation of a super configuration (γ, Δ) corresponds to a sequence of deduction steps in the parsing schema. As a consequence of the second clause we have that w ∈ L(G) iff (S, ε) ↪* (w, ε). Note that ↪ is too weak to describe the recognition of the next subconstituent of a partially recognized constituent, but it is sufficient to define the semantics of the items in Fig. 1. The following theorem is a generalization of the definition of the semantics of Earley items for CFG's (Sikkel, 1993) (a_1 ... a_n is the input string).

Theorem 1 (Correctness). ⊢* [A → β.Γ, i, j] iff the following conditions are satisfied:

* for some Δ, (S, ε) ↪* (a_1 ... a_i A, Δ).

The first and third conditions are sometimes called the top-down and the bottom-up condition, respectively. The second condition refers to the partial recognition of the A-constituent.

Table 2 shows some valid parse items for the recognition of the string a * a, together with the conditions according to Theorem 1.
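As a rough illustration of how such a schema can be executed, the following sketch drives the familiar predict, scan and complete operations with STG transitions. The grammar interface (an initial state per production, a step function returning successor states, a final-state test) and all names are assumptions made for this sketch and are not the notation of Fig. 1.

# Items are (A, state, i, j): an A-constituent partially recognized from i to j,
# with `state` the STG state reached after the recognized subconstituents.
# The grammar object is assumed to provide:
#   productions(A) -> list of initial states of productions for A,
#   step(state, X) -> list of successor states after appending symbol X,
#   is_final(state) -> bool,
#   nonterminals (set), start (symbol).

def recognize(grammar, words):
    n = len(words)
    chart = set()
    agenda = [(grammar.start, g, 0, 0) for g in grammar.productions(grammar.start)]
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        a, state, i, j = item
        # Scan: consume the next input symbol.
        if j < n:
            for s2 in grammar.step(state, words[j]):
                agenda.append((a, s2, i, j + 1))
        # Predict: start any nonterminal that could be appended in this state.
        for b in grammar.nonterminals:
            if grammar.step(state, b):
                for g in grammar.productions(b):
                    agenda.append((b, g, j, j))
        # Complete: combine with adjacent items already in the chart.
        for (b, sb, k, l) in list(chart):
            if k == j and grammar.is_final(sb):     # finished b-constituent to the right
                for s2 in grammar.step(state, b):
                    agenda.append((a, s2, i, l))
            if l == i and grammar.is_final(state):  # this item finishes an a-constituent
                for s2 in grammar.step(sb, a):
                    agenda.append((b, s2, k, j))
    return any(a == grammar.start and i == 0 and j == n and grammar.is_final(s)
               for (a, s, i, j) in chart)

class ToyGrammar:
    """A tiny STG-style encoding of S -> a S | a (states are plain strings)."""
    nonterminals = {"S"}
    start = "S"
    def productions(self, a):
        return ["q0"] if a == "S" else []
    def step(self, state, x):
        if state == "q0" and x == "a":
            return ["q1"]          # q0 --a--> q1
        if state == "q1" and x == "S":
            return ["q2"]          # q1 --S--> q2
        return []
    def is_final(self, state):
        return state in {"q1", "q2"}

print(recognize(ToyGrammar(), ["a", "a", "a"]))  # True
print(recognize(ToyGrammar(), ["b"]))            # False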
4 Bidirectional Parsing

STG's describe the recognition of admissible sequences of subconstituents in unidirectional parsing algorithms, like Earley's algorithm. Bidirectional parsing strategies, e.g., head-corner strategies, start the recognition of a sequence of subconstituents at some position in the middle of the sequence and proceed to both sides. We can define appropriate STG's for bidirectional parsing strategies as follows.

Definition 3. A headed, bidirectional STG G is like an STG except that P is a finite set of productions of the form A → (Γ, X, Δ), where A ∈ N, X ∈ V and Γ, Δ are states.

The two states in a production account for the bidirectional expansion of a constituent. The derivation relation for a headed, bidirectional STG is defined by γAδ ⇒ γβ_l X β_r δ iff for some production A → (Γ, X, Δ): (β_l)⁻¹ ∈ L(Γ) and β_r ∈ L(Δ), where (β_l)⁻¹ denotes the reversal of β_l. Note that Γ defines the left part of an admissible sequence from right to left.

A bottom-up head-corner parsing schema uses items of the form [A → Γ.β.Δ, i, j] (Schneider, 2000). The semantics of these items is given by the following clauses:

* for some production A → (Γ_0, X, Δ_0) and some β_l, β_r: β = β_l X β_r and (Γ_0, ε) ⊢*_G (Γ, (β_l)⁻¹) and (Δ_0, ε) ⊢*_G (Δ, β_r);
* β ⇒* a_{i+1} ... a_j.
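As an illustration of this derivation relation, the following sketch checks whether a string is an admissible expansion of a headed, bidirectional production A → (Γ, X, Δ) by trying every occurrence of the head X and verifying that the reversed left part is in L(Γ) and the right part is in L(Δ). The NFA-style encoding of the states, the treatment of empty parts, and all names are assumptions made for this example.

def accepts(delta, finals, state, symbols):
    """Does the automaton reach a final state from `state` on `symbols`?
    delta maps (state, symbol) to a set of successor states."""
    current = {state}
    for x in symbols:
        current = {s2 for s in current for s2 in delta.get((s, x), ())}
    return bool(current & finals)

def admissible(beta, head, left_state, right_state, delta, finals):
    """beta = beta_l head beta_r with reversed(beta_l) in L(left_state) and
    beta_r in L(right_state).  An empty part is admitted only if the
    corresponding state is itself final (a choice made for this sketch)."""
    def part_ok(state, part):
        return accepts(delta, finals, state, part) if part else state in finals
    return any(part_ok(left_state, beta[:k][::-1]) and part_ok(right_state, beta[k + 1:])
               for k, x in enumerate(beta) if x == head)

# Hypothetical production VP -> (L, V, R): L accepts exactly one NP to the
# left of the head, R accepts any number of NP complements to the right.
delta = {("L", "NP"): {"L1"}, ("R", "NP"): {"R"}}
finals = {"L1", "R"}
print(admissible(["NP", "V", "NP", "NP"], "V", "L", "R", delta, finals))  # True
print(admissible(["V", "NP"], "V", "L", "R", delta, finals))              # False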
5 Local Tree Constraints

In this section we discuss the usability of STG's for the design of direct parsing algorithms for grammars that use a set of well-formedness conditions, or constraints, expressed in a logical language, to define the admissible syntactic structures (i.e., trees), in contrast to grammars that are based on a derivation mechanism (i.e., production rules). Declarative characterizations of syntactic structures provide a means to formalize grammatical frameworks, and thus to compare theories expressed in different formalisms. There are also applications in theoretical explorations of the complexity of linguistic theories, based on results which relate language classes to the definability of structures in certain logical languages (Rogers, 2000).

From a model-theoretic point of view, such a grammar is an axiomatization of a class of structures, and a well-formed syntactic structure is a model of the grammar (Blackburn et al., 1993). The connection between models and strings is established via a yield function, which assigns each syntactic structure a string of terminal symbols. The parsing problem can then be stated as follows: given a string w and a grammar G, find the models ℳ with ℳ ⊨ G and yield(ℳ) = w.

In many cases, there are effective methods to translate logical formulae into equivalent tree automata (Rogers, 2000) or rule-based grammars (Palm, 1997). Thus, a possible way to approach the parsing problem is to translate a set of tree constraints into a grammar and use standard parsing methods. However, depending on the expressive power of the logical language, the complexity of the translation often limits this approach in practice.

In this section, we consider the possibility of applying tabular parsing methods directly to grammars that consist of sets of tree constraints. The idea is to interleave the translation of formulae into production rules with the recognition of subconstituents. It should be noted that this approach suffers from the same complexity limitations as the pure translation.

In Schneider (1999), we used a fragment of a propositional bimodal language to express local constraints on syntactic structures. The two modal operators ⟨↓⟩ and ⟨→⟩ refer to the leftmost child and the right sibling, respectively, of a node in a tree. Furthermore, the nesting of ⟨↓⟩ is limited to depth one. A so-called modal grammar consists of a formula that represents the conjunction of a set of constraints that must be satisfied at every node of a tree. In addition, a second formula represents a condition for the root of a tree.

In Schneider (1999), we have also shown how an extension of a standard method for automatic proof search in modal logic (so-called analytic labelled tableaux), in conjunction with dynamic programming techniques, can be employed to parse input strings according to a modal grammar. Basically, a labelled tableau procedure is used to construct a labelled tableau, i.e., a tree labelled with formulae, by breaking formulae up into subformulae; this tableau may then be used to construct a model for the original formula. The extended tableau procedure constructs an infinite tableau from which all admissible trees (i.e., models of the grammar) can be obtained.

The approach can be described as follows: an STG is defined by using certain formulae that appear on the tableau as states, and by defining the transition relation in terms of the tableau rules (i.e., the operations that are used to construct a tableau). The states are formulae of the form χ ∧ ⋀⟨↓⟩φ ∧ ⋀[↓]φ′ ∧ ⋀⟨→⟩ψ ∧ ⋀[→]ψ′, where χ is a propositional variable and [↓], [→] are the dual operators to ⟨↓⟩, ⟨→⟩. χ is used as a node label in a tree model. The transition relation can be regarded as a simulation of the application of tableau rules to formulae, and a tabular parser for this STG can be viewed as a tabulation of the (infinite) tableau construction. In particular, it should be noted that this construction makes no reference to any particular parsing strategy.