<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1503">
  <Title>A Simple String-Rewriting Formalism for Dependency Grammar</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Previous Formalizations of Dependency
Grammar
</SectionTitle>
    <Paragraph position="0"> We start out by observing that &amp;quot;dependency grammar&amp;quot; should be contrasted with &amp;quot;phrase structure grammar&amp;quot;, not &amp;quot;CFG&amp;quot;, which is a particular formalization of phrase structure grammar. Thus, just as it only makes sense to study the formal properties of a particular formalization of phrase structure grammar, the question about the formal properties of dependency grammar in general is not well defined, nor the question of a comparison of a dependency formalism with dependency grammar.</Paragraph>
    <Paragraph position="1"> There have been (at least) four types of formalizations of dependency grammars in the past.6 None of these approaches, to our knowledge, discuss the notion of shared parse forest. The first approach (for example, (Lombardo and Lesmo, 2000)) follows Gaifman (1965) in proposing traditional string rewriting rules, which however do not allow for an unbounded number of adjuncts.</Paragraph>
    <Paragraph position="2"> In the second approach, the dependency structure is constructed in reference to a parallel (&amp;quot;deeper&amp;quot;) structure (Sgall et al., 1986; Mel'Vcuk, 1988). Because the rules make reference to other struc5Kahane et al. (1998) present three different types of rules, for subcategorization, modification, and linear precedence. In the formalism presented in this paper, they have been collapsed into one.</Paragraph>
    <Paragraph position="3"> 6We leave aside here work on tree rewriting systems such as Tree Adjoining Grammar, which, when lexicalized, have derivation structures which are very similar to dependency trees. See (Rambow and Joshi, 1997) for a discussion related to TAG, and see (Rambow et al., 2001) for the definition of a tree-rewriting system which can be used to develop grammars whose derivations faithfully mirror syntactic dependency.</Paragraph>
    <Paragraph position="4">  tures, these approaches cannot be formalized in a straightforward manner as context-free rewriting formalisms.</Paragraph>
    <Paragraph position="5"> In the third approach, which includes formalizations of dependency structure such as Dependency Tree Grammar of Modina (see (Dikovsky and Modina, 2000) for an overview), Link Grammar (Sleator and Temperley, 1993) or the tree-composition approach of Nasr (1996), rules construct the dependency tree incrementally; in these approaches, the grammar licenses dependency relations which, in a derivation, are added to the tree one by one, or in groups. In contrast, we are interested in a string-rewriting system; in such a system, we cannot add dependency relations incrementally: all daughters of a node must be added at once to represent a single rewrite step.</Paragraph>
    <Paragraph position="6"> In the fourth approach, the dependency grammar is converted into a headed context-free grammar (Abney, 1996; Holan et al., 1998), also the Basic Dependency Grammar of Beletskij (1967) as cited in (Dikovsky and Modina, 2000). This approach allows for the recovery of the dependency structure both from the derivation tree and from a parse forest represented in polynomial space. (In fact, our parsing algorithm draws on this work.) However, the approach of course requires the introduction of additional nonterminal nodes. Finally, we observe that Recursive Transition Networks (Woods, 1970) can be used to encode a grammar whose derivation trees are dependency trees. However, they are more a general framework for encoding grammars than a specific type of grammar (for example, we can also use them to encode CFGs). In a somewhat related manner, Alshawi et al. (2000) use cascaded head automata to derive dependency trees, but leave the nature of the cascading under-formalized.</Paragraph>
    <Paragraph position="7"> Eisner (2000) provides a formalization of a system that uses two different automata to generate left and right children of a head. His formalism is very close to the one we present, but it is not a string-rewriting formalism (and not really generative at all). We are looking for a precise formulation of a generative dependency grammar, and the question has remained open whether there is an alternate formalism which allows for an unbounded number of adjuncts, introduces all daughter nodes at once in a string-rewriting step, and avoids the introduction of additional nonterminal nodes.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Formalism
</SectionTitle>
    <Paragraph position="0"> In this section we first review the definition of Extended Context-Free Grammar and then show how we use it to model dependency derivations. An Extended Context-Free Grammar (or ECFG for short) is like a context-free grammar (CFG), except that the right-hand side is a regular expression over the terminal and nonterminal symbols of the grammar.</Paragraph>
    <Paragraph position="1"> At each step in a derivation, we first choose a rewrite rule (as we do in CFG), and then we choose a string which is in the language denoted by the regular expression associated with the rule. This string is then treated like the right-hand side of a CFG rewrite rule.</Paragraph>
    <Paragraph position="2"> In the following, if G is a grammar and R a regular expression, then L(G) denotes the language generated by the grammar and L(R) the language denoted by the regular expression. If F is a class of grammars (such as CFG), then L(F) denote the class of languages generated by the grammars in F. We now give a formal definition, which closely follows that given by Albert et al. (1999).7 A Extended Context-Free Grammar is a 4tuple (VN;VT;P;S), where: VN is a finite set of nonterminal symbols, VT is a finite set of terminal symbols (disjoint from VN), P is a finite set of rules, which are ordered pairs consisting of an element of VN and a regular expression over VN [VT, S, a subset of VN, contains the possible start symbols.</Paragraph>
    <Paragraph position="3"> We will use the traditional arrow notation ( !) to write rules.</Paragraph>
    <Paragraph position="4"> For A 2 VN and u;v 2 (VN [ VT) we say that uAv yields uwv (written uAv =) uwv) if A ! R is in P and w is in L(R). The transitive closure of the yield relation (denoted =) ) is defined in the usual manner.</Paragraph>
    <Paragraph position="5"> The language generated by a Extended Context-Free Grammar is the set fw 2 V T j A =)w;A 2 Sg.</Paragraph>
    <Paragraph position="6"> We now define a restricted version of ECFG which we will use for defining dependency grammars. The only new formal requirement is that the rules be lexicalized in the sense of (Joshi and Schabes, 1991). For our formalism, this means that the regular expression in a production is such that each string in its denoted language contains at least one terminal symbol. Linguistically speaking, this means that each rule is associated with exactly 7ECFG has been around informally since the sixties (e.g., the Backus-Naur form); for a slightly different formalization, see (Madsen and Kristensen, 1976), whose definition allows for an infinite rule set.</Paragraph>
    <Paragraph position="7"> one lexical item (which may be multi-word). We will call this particular type of Extended Context-Free Grammar a lexicalized Extended Context-Free Grammar or, for obvious reasons, a Generative Dependency Grammar (GDG for short). When we use a GDG for linguistic description, its left-hand side nonterminal will be interpreted as the lexical category of the lexical item and will represent its maximal projection.8 A Generative Dependency Grammar is a lexicalized ECFG.</Paragraph>
    <Paragraph position="8"> It is sometimes useful to have dependency representations with labeled arcs (typically labeled with syntactic functions such as SUBJ for subject or ADJ for adjunct). There are different ways of achieving this goal; here, we discuss the use of feature structures in conjunction with the nonterminal symbols, for example N[gf:subj] instead of just N. Feature structures are of course useful for other reasons as well, such as expressing morphological features. In terms of our formalism, the use of bounded feature structures can be seen as a shorthand notation for an increased set of nonterminal symbols. The use of feature structures (rather than simple nonterminal symbols) allows for a more perspicuous representation of linguistic relations through the use of underspecification. Note that the use of underspecified feature structures in rules can potentially lead to an exponential increase (exponential in the size of the set of feature values) of the size of the set of rules if rules contain underspecified feature structures on the right-hand side. However, we note that the feature representing, grammatical function will presumably always be fully specified on the right-hand side of a rule (the head determines the function of its dependents). Underspecification in the left-hand side of a rule only leads to linear compactification of the rule set.</Paragraph>
    <Paragraph position="9"> We now give a toy linguistic example. We let GLing be (VN;VT;P;S) as follows:</Paragraph>
    <Paragraph position="11"/>
    <Paragraph position="13"> A derivation is shown in Figure 3; the corresponding derivation tree is shown in the right part of Figure 2. As can be seen, the derivation structure is a dependency tree, except for the use of preterminals, as we desired.</Paragraph>
    <Paragraph position="14"> The first part of the following theorem follows from the existence of a Greibach Normal Form for ECFG (Albert et al., 1999). The second part follows immediately from the closure of CFG under regular substitution.</Paragraph>
    <Paragraph position="16"> Of course, ECFG, GDG and CFG are not strongly equivalent in the standard sense for string rewriting systems of having the same sets of derivation trees. Clearly, ECFG can generate all sets of derivation trees that GDG can, while CFG cannot (because of the unbounded branching factor of ECFG and of GDG); ECFG can also generate all sets of derivation trees that CFG can, while GDG cannot (because of the lexicalization requirement). ECFG thus has a greater strong generative capacity than CFG and GDG, while those of GDG and CFG are incomparable. null It is interesting to notice the difference between the rewriting operation of a nonterminal symbol as defined for a ECFG or a GDG and the equivalent rewriting steps with a weakly equivalent CFG. A GDG rewriting operation of a symbol X using a rule r is decomposed in two stages, the first stage consists in choosing a string w which belongs to the set denoted by the right-hand side of r. During the second stage, X is replaced by w. These two stages are of a different nature, the first concerns the generation of CFG rules (and hence a CFG) using a GDG while the second concerns the generation of a string using the generated CFG. The equivalent rewriting operation (X ) w) with a CFG does not distinguish the same two stages, both the selection of w and the rewriting of X as w are done at the same time.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Parsing Algorithm
</SectionTitle>
    <Paragraph position="0"> The parsing algorithm given here is a simple extension of the CKY algorithm. The difference is in the use of finite state machines in the items in the chart to represent the right-hand sides of the rules of the ECFG.9 A rule with category C as its left-hand side will give rise to a finite state machine which we call a C-rule FSM; its final states mark the completed recognition of a constituent of label C.</Paragraph>
    <Paragraph position="1"> CKY-Style parsing algorithm for Extended Context-Free Grammars.</Paragraph>
    <Paragraph position="2"> Input. A ECFG G and an input string W = w1 wn.</Paragraph>
    <Paragraph position="3"> Output. The parse table T for W such that ti;j contains (M;q) iff M is a C-rule-FSM, q is one of the final states of M, and we have a derivation C +=)wi wj. If i = j, ti;j also contains the input symbol wi.</Paragraph>
    <Paragraph position="4"> Method.</Paragraph>
    <Paragraph position="5"> Initialization: For each i, 1 i n, add wi to ti;i.</Paragraph>
    <Paragraph position="6"> Completion: If ti;j contains either the input symbol w or an item (M;q) such that q is a final state of M, and M is a C-rule-FSM, then add to ti;j all (M0;q0) such that M0 is a rule-FSM which transitions from a start state to state q0 on input w or C. Add a single back-pointer from (M0;q0) in ti;j to (M;q) or w in ti;j.</Paragraph>
    <Paragraph position="7"> 9Recent work in the context of using ECFG for parsing SGML and XML proposes an LL-type parser for ECFG (Br&amp;quot;uggemann-Klein and Wood, 2003); their approach also exploits the automaton representation of the right-hand side of rules, as is natural for an algorithm dealing with ECFG.  N-rule-FSM which corresponds to rule p2 and m3 is a P-rule-FSM which corresponds to rule p3</Paragraph>
    <Paragraph position="9"> Scanning: If (M1;q1) is in ti;k, and tk+1;j contains either the input symbol w or the item (M2;q2) where q2 is a final state and M2 is a C-rule-FSM, then add (M1;q) to ti;j (if not already present) if M1 transitions from q1 to q on either w or C. Add a double backpointer from (M1;q) in ti;j to (M1;q1) in ti;k (left backpointer) and to either w or (M2;q2) in tk+1;j (right backpointer).</Paragraph>
    <Paragraph position="10"> At the end of the parsing process, a packed parse forest has been built. The packed forest corresponding to the parse of sentence Pilar saw a man with a telescope, using the grammar of Section 3 is represented in Figure 4. The nonterminal nodes are labeled with pairs (M;q) where M is an rule-FSM and q a state of this FSM. Three rule-FSMs corresponding to rules p1, p2 and p3 have been represented in Figure 5.</Paragraph>
    <Paragraph position="11"> Obtaining the dependency trees from the packed parse forest is performed in two stages. In a first stage, a forest of binary syntagmatic trees is obtained from the packed forest and in a second stage, each syntagmatic tree is transformed into a dependency tree. We shall not give the details of these processes. The two trees resulting from de-packing of Figure 4 are represented in Figure 6. The different nodes of the syntagmatic tree that will be grouped in a single node of the dependency trees have been circled.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Empirical Results
</SectionTitle>
    <Paragraph position="0"> While the presentation of empirical results is not the object of this paper, we give an overview of some empirical work using ECFG for natural language processing in this section. For full details, we refer to (Nasr and Rambow, 2004a; Nasr and Rambow, 2004b; Nasr, 2004).</Paragraph>
    <Paragraph position="1"> The parser presented in Section 4 above has been implemented. We have investigated the use the parser in a two-step probabilistic framework. In a first step, we determine which rules of the ECFG should be used for each word in the input sentence.</Paragraph>
    <Paragraph position="2"> (Recall that a grammar rule encodes the active and passive valency, as well as how any arguments are realized, for example, fronted or in canonical position.) This step is called supertagging and has been suggested and studied in the context of Tree Adjoining Grammar by Bangalore and Joshi (1999). In a second step, we use a probabilistic ECFG where the probabilities are non-lexical and are based entirely on the grammar rules. We extract the most probable derivation from the compact parse forest using dynamic programming in the usual manner.</Paragraph>
    <Paragraph position="3"> This non-lexical probability model is used because the supertagging step already takes the words in the sentence into account. The probabilities can be encoded directly as weights on the transitions in the rule-FSMs used by the parser.</Paragraph>
    <Paragraph position="4"> The ECFG grammar we use has been automatically extracted from the Penn Treebank (PTB). In fact, we first extract a Tree Insertion Grammar following the work of (Xia et al., 2000; Chen, 2001; Chiang, 2000), and then directly convert the trees of the obtained TAG into automata for the parser.</Paragraph>
    <Paragraph position="5"> It is clear that one could also derive an explicit ECFG in the same manner. The extracted grammar has about 4.800 rules. The probabilities are estimated from the corpus during extraction. Note that there are many different ways of extracting an ECFG from the PTB, corresponding to different theories of syntactic dependency. We have chosen to directly model predicate-argument relations rather than more surface-oriented syntactic relations such as agreement, so that all function words (determiners, auxiliaries, and so on) depend on the lexical word. Strongly governed prepositions are treated as part of a lexeme rather than as full prepositions.</Paragraph>
    <Paragraph position="6"> We have investigated several different ways of modeling the probability of attaching a sequence of modifiers at a certain point in the derivation (conditioning on the position of the modifier in the sequence or conditioning on the previous modifier used). We found that using position or context improves on using neither.</Paragraph>
    <Paragraph position="7"> We have performed two types of experiments: using the correct ECFG rule for each word, and assigning ECFG rules automatically using supertagging. In the case of using the correct supertag, we obtain unlabeled dependency accuracies of about 98% (i.e., in about 2% of cases a word is assigned a wrong governor). Automatic supertagging (using standard n-gram tagging methods) for a grammar our size has an accuracy of about 80%. This is also approximately the dependency accuracy obtained when parsing the output of a supertagger. We conclude from this performance that if we can increase the performance of the supertagger, we can also directly increase the performance of the parser.</Paragraph>
    <Paragraph position="8"> Current work includes examining which grammatical distinctions the grammar should make in order to optimize both supertagging and parsing (Toussenel, 2004).</Paragraph>
  </Section>
class="xml-element"></Paper>