<?xml version="1.0" standalone="yes"?> <Paper uid="P88-1032"> <Title>AN EARLEY-TYPE PARSING ALGORITHM FOR TREE ADJOINING GRAMMARS *</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Although formal properties of Tree Adjoining Grammars (TAGs) have been investigated (Vijay-Shanker, 1987), and there is, for example, an O(n^6)-time CKY-like algorithm for TAGs (Vijay-Shanker and Joshi, 1985), so far there has been no attempt to develop an Earley-type parser for TAGs.</Paragraph> <Paragraph position="1"> This paper presents an Earley parser for TAGs and discusses modifications to the parsing algorithm that make it possible to handle extensions of TAGs such as constraints on adjunction, substitution, and feature structure representation for TAGs.</Paragraph> <Paragraph position="2"> *This work is partially supported by ARO grant DAA29-84-9-007, DARPA grant N0014-85-K0018, NSF grants MCS-82-191169 and DCR-84-10413. The authors would like to express their gratitude to Vijay-Shanker for his helpful comments relating to the core of the algorithm, and to Richard Billington and Andrew Chalnick for their graphical TAG editor, which we integrated in our system, and for their programming advice. Thanks are also due to Anne Abeillé and Ellen Hays.</Paragraph> <Paragraph position="3"> TAGs were first introduced by Joshi, Levy and Takahashi (1975) and Joshi (1983). We describe the Tree Adjoining Grammar formalism very briefly. For more details we refer the reader to Joshi (1983), Kroch and Joshi (1985) or Vijay-Shanker (1987). Definition 1 (Tree Adjoining Grammar): A TAG is a 5-tuple G = (V_N, V_T, S, I, A) where V_N is a finite set of non-terminal symbols, V_T is a finite set of terminals, S is a distinguished non-terminal, I is a finite set of trees called initial trees and A is a finite set of trees called auxiliary trees. 
The trees in I ∪ A are called elementary trees.</Paragraph> <Paragraph position="4"> Initial trees (see left tree in Figure 1) are characterized as follows: internal nodes are labeled by non-terminals; leaf nodes are labeled by either terminal symbols or the empty string.</Paragraph> <Paragraph position="5"> Auxiliary trees (see right tree in Figure 1) are characterized as follows: internal nodes are labeled by non-terminals; leaf nodes are labeled by a terminal or by the empty string, except for exactly one node (called the foot node) labeled by a non-terminal; furthermore, the label of the foot node is the same as the label of the root node.</Paragraph> <Paragraph position="6"> We now define a composition operation called adjoining or adjunction, which builds a new tree from an auxiliary tree β and a tree α (α is any tree: initial, auxiliary, or derived by adjunction). The resulting tree is called a derived tree. Let α be a tree containing a node n labeled by X and let β be an auxiliary tree whose root node is also labeled by X. Then the adjunction of β to α at node n will be the tree γ shown in Figure 2. The resulting tree, γ, is built as follows: We then define the tree set of a TAG G, T(G), to be the set of all derived trees starting from initial trees in I. Furthermore, the string language generated by a TAG, L(G), is defined to be the set of all terminal strings of the trees in T(G).</Paragraph> <Paragraph position="7"> TAGs factor recursion and dependencies by extending the domain of locality. They offer novel ways to encode the syntax of natural language grammars, as discussed in Kroch and Joshi (1985) and Abeillé (1988).</Paragraph> <Paragraph position="8"> In 1985, Vijay-Shanker and Joshi introduced a CKY-like algorithm for TAGs. They thereby established O(n^6) time as an upper bound for parsing TAGs. The algorithm was implemented, but in our opinion the result was more theoretical than practical, for several reasons. 
First, the algorithm assumes that elementary trees are binary branching and that there are no empty categories on the frontiers of the elementary trees. Second, since it works on nodes that have been isolated from the tree they belong to, it isolates them from their domain of locality. However, all important linguistic and computational properties of TAGs follow from this extended domain of locality. And most importantly, although it runs in O(n^6) worst-case time, it also runs in O(n^6) best-case time. As a consequence, the CKY algorithm is in practice very slow.</Paragraph> <Paragraph position="9"> Since the average time complexity of Earley's parser depends on the grammar, and in practice it runs much better than its worst-case time complexity, we decided to try to adapt Earley's parser for CFGs to TAGs. Earley's algorithm for CFGs (Earley, 1970; Aho and Ullman, 1973) is a bottom-up parser which uses top-down information. It manipulates states of the form A → α • β [i] while using three processors: the predictor, the completer and the scanner. The algorithm for CFGs runs in O(|G|^2 n^3) time and in O(|G| n^2) space in all cases, and parses unambiguous grammars in O(n^2) time (n being the length of the input, |G| the size of the grammar).</Paragraph> <Paragraph position="10"> Given a context-free grammar in any form and an input string a_1 ... a_n, Earley's parser for CFGs maintains the following invariant: The state A → α • β [i] is in states set S_k iff S ⇒* δAγ, δ ⇒* a_1 ... a_i and α ⇒* a_{i+1} ... a_k. The correctness of the algorithm is a corollary of this invariant.</Paragraph> <Paragraph position="11"> Finding an Earley-type parser for TAGs was a difficult task because it was not clear how to parse TAGs bottom-up using top-down information while scanning the input string from left to right. In order to construct an Earley-type parser for TAGs, we will extend the notions of dotted rules and states to trees. 
Anticipating the proof of correctness and soundness of our algorithm, we will state an invariant similar to Earley's original invariant. Then we present the algorithm and its main extensions.</Paragraph> </Section> </Paper>