<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1503"> <Title>A Simple String-Rewriting Formalism for Dependency Grammar</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Dependency grammar has a long tradition in syntactic theory, dating back to at least Tesnière's work from the thirties. Recently, it has gained renewed attention as empirical methods in parsing have emphasized the importance of relations between words (see, e.g., (Collins, 1997)), which is what dependency grammars model explicitly, but context-free phrase-structure grammars do not. In this paper, we address an important issue in using grammar formalisms: the compact representation of the parse forest. Why is this an important issue? It is well known that for non-toy grammars and non-toy examples, a sentence can have a staggeringly large number of analyses (for example, using a context-free grammar (CFG) extracted from the Penn Treebank, a sentence of 25 words may easily have 1,000,000 or more analyses). 
As an example of an ambiguous sentence (though one with only two readings), the two dependency representations for sentence (1) are given in Figure 1.</Paragraph> <Paragraph position="1"> (1) Pilar saw a man with a telescope It is clear that if we want to evaluate each possible analysis (be it using a probabilistic model or a different method, for example a semantic checker), we cannot do so efficiently if we enumerate all analyses.1 We have two options: we can either use a greedy heuristic method for checking, which does not examine all possible solutions and may therefore miss the optimal one, or we can perform our checking operation on a representation which compactly encodes all options.</Paragraph> <Paragraph position="2"> This is possible because the exponential number of possible analyses (exponential in the length of the input sentence) share subanalyses, thus making a polynomial-size representation possible. This representation is called the shared parse forest, and it has been extensively studied for CFGs (see, for example, (Lang, 1991)). To our knowledge, there has been no description of the notion of shared parse forest for dependency trees to date. In this paper, we propose a formalization which is very closely based on the shared parse forest for CFG. 
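The sharing idea can be illustrated with a minimal sketch (ours, not from the paper; Python is used only for illustration): a memoized chart stores one entry per subproblem, so the exponentially many analyses of a span can be counted in polynomial time even though enumerating them is infeasible.

```python
from functools import lru_cache

# Count the binary bracketings (analyses) of a span of n words.
# The count grows exponentially (Catalan numbers), but the memoized
# "chart" stores only one entry per span length -- the same sharing of
# subanalyses that keeps a shared parse forest polynomial in size.

@lru_cache(maxsize=None)
def bracketings(n: int) -> int:
    if n <= 1:
        return 1
    # Split the span at every position and combine the sub-counts.
    return sum(bracketings(k) * bracketings(n - k) for k in range(1, n))

print(bracketings(5))   # 14
print(bracketings(25))  # far more than 1,000,000
```

Enumerating the analyses of a 25-word span one by one would touch over a trillion trees; the memoized version performs only a quadratic number of chart lookups.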
We achieve this by defining a generative string-rewriting formalism whose derivation trees are dependency trees.</Paragraph> <Paragraph position="3"> The formalism, and the corresponding shared parse forests, are used in a probabilistic chart parser for dependency grammar, which is described in (Nasr and Rambow, 2004b).</Paragraph> <Paragraph position="4"> While there has been much work on formalizing dependency grammar and on parsing algorithms for dependency grammars in the past, we observe that there is not, to our knowledge, a complete generative formalization2 of dependency grammar based on string-rewriting in which the derivation structure is exactly the desired dependency structure. 2We refer to a type of mathematical formalization, not to the school of linguistics known as &quot;Generative Grammar&quot;. The most salient reason for the lack of such a generative dependency grammar is the absence of nonterminal symbols in a dependency tree, which prevents us from interpreting it as a derivation structure in a system that distinguishes between terminal and nonterminal symbols. The standard solution to this problem, proposed by Gaifman (1965), is to introduce nonterminal symbols denoting lexical categories, as depicted in Figure 2 (called the &quot;labelled phrase-structure trees induced by a dependency tree&quot; by Gaifman (1965)). Clearly, the &quot;pure&quot; dependency tree can be derived from such a tree in a straightforward manner. The string-rewriting system described in (Gaifman, 1965) generates this kind of tree as its derivation structures.</Paragraph> <Paragraph position="5"> There is, however, a deeper problem when considering dependency trees as derivation structures, following from the fact that in a dependency tree, modifiers3 are direct dependents of the head they modify, and (in certain syntactic contexts) the number of modifiers is unbounded. 
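Gaifman's encoding can be sketched as follows (an illustrative reconstruction, not the paper's code; the function name and input encoding are our own): each word is wrapped in a preterminal labelled with its lexical category, so the dependency tree can be read as a phrase-structure derivation, and the pure dependency tree is trivially recovered by erasing the category nodes.

```python
# Sketch of Gaifman's "labelled phrase-structure tree induced by a
# dependency tree": node i carries word words[i] and category cats[i],
# and heads[i] is the index of its head (None for the root).

def gaifman_tree(words, cats, heads, root):
    """Return a bracketed phrase-structure string for the dependency tree."""
    deps = {i: [] for i in range(len(words))}
    for i, h in enumerate(heads):
        if h is not None:
            deps[h].append(i)

    def build(i):
        left = [build(d) for d in deps[i] if d < i]
        right = [build(d) for d in deps[i] if d > i]
        # The preterminal CAT dominates the word plus its dependents'
        # subtrees, in surface order.
        return "(%s %s)" % (cats[i], " ".join(left + [words[i]] + right))

    return build(root)

# "Pilar saw a man", with 'saw' as the root:
words = ["Pilar", "saw", "a", "man"]
cats = ["N", "V", "D", "N"]
heads = [1, None, 3, 1]
print(gaifman_tree(words, cats, heads, 1))
# (V (N Pilar) saw (N (D a) man))
```

Erasing the category labels and keeping only the head-dependent links yields the original dependency tree, which is why this encoding lets a string-rewriting system generate dependency structures.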
Thus, if we wish to obtain a tree as shown in Figure 2, we need productions whose right-hand side is of unbounded size, which is not possible in a context-free grammar. Indeed, the formalization of dependency grammar proposed by Gaifman (1965) is unsatisfactory in that it does not allow for an unbounded number of modifiers! In this paper, we follow a suggestion made by Abney (1996), worked out in some detail in (Lombardo, 1996)4, to extend Gaifman's notation with regular expressions, similar to the approach used in extended context-free grammars. The result is a simple generative formalism which has the property that the derivation structures are dependency trees, except for the introduction of pre-terminal nodes as shown in Figure 2. We do not mean to imply that our formalism is substantially different from previous formalizations of dependency grammar; the goal of this paper is to present a clean and easy-to-use generative formalism with a straightforward notion of parse forest. In particular, our formalism, Generative Dependency Grammar, allows for an unbounded number of daughter nodes in the derivation tree through the use of regular expressions in its rules. 3We use the term modifier in its linguistic sense as a type of syntactic dependency (another type being argument). We use head (or mother) and dependent (or daughter) to refer to nodes in a tree. Sometimes, in the formal and parsing literature, modifier is used to designate any dependent node, but we consider that usage confusing because of the related but different meaning of the term modifier that is well-established in the linguistic literature.</Paragraph> <Paragraph position="6"> 4In fact, much of our formalism is very similar to (Lombardo, 1996), who however does not discuss parsing (only recognition), nor the representation of the parse forest. The parser uses the 
corresponding finite-state machines, which straightforwardly allow for a binary-branching representation of the derivation structure for the purpose of parsing, and thus for a compact (polynomial rather than exponential) representation of the parse forest.</Paragraph> <Paragraph position="7"> This formalism is based on previous work presented in (Kahane et al., 1998), which has been substantially reformulated in order to simplify it.5 In particular, we do not address non-projectivity here, but acknowledge that for certain languages it is a crucial issue. We will extend our basic approach in the spirit of (Kahane et al., 1998) in future work.</Paragraph> <Paragraph position="8"> The paper is structured as follows. We start out by surveying previous formalizations of dependency grammar in Section 2. In Section 3, we introduce several formalisms, including Generative Dependency Grammar. We present a parsing algorithm in Section 4, and mention empirical results in Section 5. We then conclude.</Paragraph> </Section> </Paper>