<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1010">
  <Title>Constraints on Non-Projective Dependency Parsing</Title>
  <Section position="3" start_page="73" end_page="74" type="metho">
    <SectionTitle>
2 Dependency Graphs
</SectionTitle>
    <Paragraph position="0"> A dependency graph is a labeled directed graph, the nodes of which are indices corresponding to the tokens of a sentence. Formally: Definition 1 Given a set R of dependency types (arc labels), a dependency graph for a sentence with n tokens is a labeled directed graph G = (V, E, L), where:</Paragraph>
    <Paragraph position="2"> 1. V = Z_{n+1} 2. E ⊆ V × V 3. L : E → R Definition 2 A dependency graph G is well-formed if and only if: 1. The node 0 is a root (ROOT). 2. G is connected (CONNECTEDNESS).1 The set V of nodes (or vertices) is the set Z_{n+1} = {0, 1, 2, ..., n} (n ∈ Z+), i.e., the set of non-negative integers up to and including n. This means that every token index i of the sentence is a node (1 ≤ i ≤ n) and that there is a special node 0, which does not correspond to any token of the sentence and which will always be a root of the dependency graph (normally the only root). The set E of arcs (or edges) is a set of ordered pairs (i, j), where i and j are nodes. (Footnote 1: To be more exact, we require G to be weakly connected, which entails that the corresponding undirected graph is connected, whereas a strongly connected graph has a directed path between any pair of nodes.) Since arcs are used to represent dependency relations, we will</Paragraph>
    <Paragraph position="3">  say that i is the head and j is the dependent of the arc (i, j). As usual, we will use the notation i → j to mean that there is an arc connecting i and j (i.e., (i, j) ∈ E), and we will use the notation i →* j for the reflexive and transitive closure of the arc relation E (i.e., i →* j if and only if i = j or there is a path of arcs connecting i to j). The function L assigns a dependency type (arc label) r ∈ R to every arc e ∈ E. Figure 1 shows a Czech sentence from the Prague Dependency Treebank with a well-formed dependency graph according to Definitions 1 and 2.</Paragraph>
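As an illustration (not part of the paper), Definitions 1 and 2 can be sketched in Python. The representation and function name are our own assumptions: a graph over nodes 0..n is encoded as a set of (head, dependent) pairs, and labels are omitted since they play no role in well-formedness.

```python
from collections import defaultdict

def is_well_formed(n, arcs):
    """Check ROOT and CONNECTEDNESS (Definition 2) for a graph over
    nodes 0..n, with arcs given as a set of (head, dependent) pairs."""
    nodes = set(range(n + 1))
    # ROOT: no arc may point into the special node 0.
    if any(j == 0 for (_, j) in arcs):
        return False
    # CONNECTEDNESS: weak connectivity, i.e. the underlying
    # undirected graph is connected (cf. footnote 1).
    undirected = defaultdict(set)
    for i, j in arcs:
        undirected[i].add(j)
        undirected[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in undirected[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes
```

Weak connectivity is checked by a depth-first traversal of the undirected version of the graph, which matches the footnote's distinction between weak and strong connectivity.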
  </Section>
  <Section position="4" start_page="74" end_page="74" type="metho">
    <SectionTitle>
3 Constraints
</SectionTitle>
    <Paragraph position="0"> The only conditions so far imposed on dependency graphs are that the special node 0 be a root and that the graph be connected. Here are three further constraints that are common in the literature:  3. Every node has at most one head, i.e., if i → j then there is no node k such that k ≠ i and k → j (SINGLE-HEAD).</Paragraph>
    <Paragraph position="1"> 4. The graph G is acyclic, i.e., if i → j then not j →* i (ACYCLICITY).</Paragraph>
    <Paragraph position="2"> 5. The graph G is projective, i.e., if i → j then  i →* k, for every node k such that i &lt; k &lt; j or j &lt; k &lt; i (PROJECTIVITY).</Paragraph>
    <Paragraph position="3"> Note that these conditions are independent in that none of them is entailed by any (combination) of the others. However, the conditions SINGLE-HEAD and ACYCLICITY together with the basic well-formedness conditions entail that the graph is a tree rooted at the node 0. These constraints are assumed in almost all versions of dependency grammar, especially in computational systems. By contrast, the PROJECTIVITY constraint is much more controversial. Broadly speaking, we can say that whereas most practical systems for dependency parsing do assume projectivity, most dependency-based linguistic theories do not. More precisely, most theoretical formulations of dependency grammar regard projectivity as the norm but also recognize the need for non-projective representations to capture non-local dependencies (Mel'čuk, 1988; Hudson, 1990).</Paragraph>
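The three constraints just listed can be checked directly on an arc set. The following sketch (our own naming and encoding, not from the paper) again represents a graph as a set of (head, dependent) pairs; the domination test in the projectivity check assumes ACYCLICITY so that the recursion terminates.

```python
def single_head(arcs):
    """SINGLE-HEAD: no node has two distinct heads."""
    dependents = [j for (_, j) in arcs]
    return len(dependents) == len(set(dependents))

def acyclic(arcs):
    """ACYCLICITY: no node reaches itself through a directed path."""
    succ = {}
    for i, j in arcs:
        succ.setdefault(i, []).append(j)
    def reaches(start, target, seen):
        for nxt in succ.get(start, []):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False
    return not any(reaches(i, i, set()) for i in succ)

def projective(arcs):
    """PROJECTIVITY: for every arc (i, j), the head i dominates
    every node strictly between i and j. Assumes an acyclic graph."""
    succ = {}
    for i, j in arcs:
        succ.setdefault(i, []).append(j)
    def dominates(i, k):
        # i reaches k in the reflexive-transitive closure of the arcs
        return i == k or any(dominates(d, k) for d in succ.get(i, []))
    return all(dominates(i, k)
               for (i, j) in arcs
               for k in range(min(i, j) + 1, max(i, j)))
```

The `range(min(i, j) + 1, max(i, j))` expression covers both directions of the condition in constraint 5 (i strictly before k strictly before j, or the mirror image).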
    <Paragraph position="4"> In order to distinguish classes of dependency graphs that fall in between arbitrary non-projective and projective, we define a notion of degree of non-projectivity, such that projective graphs have degree 0 while arbitrary non-projective graphs have unbounded degree.</Paragraph>
    <Paragraph position="5"> Definition 3 Let G = (V, E, L) be a well-formed dependency graph, satisfying SINGLE-HEAD and ACYCLICITY, and let Ge be the subgraph of G that only contains nodes between i and j for the arc e = (i, j) (i.e., Ve = {i+1, ..., j-1} if i &lt; j and Ve = {j+1, ..., i-1} if i &gt; j).</Paragraph>
    <Paragraph position="6"> 1. The degree of an arc e ∈ E is the number of connected components c in Ge such that the root of c is not dominated by the head of e.</Paragraph>
    <Paragraph position="7"> 2. The degree of G is the maximum degree of  any arc e ∈ E.</Paragraph>
    <Paragraph position="8"> To exemplify the notion of degree, we note that the dependency graph in Figure 1 (which satisfies SINGLE-HEAD and ACYCLICITY) has degree 1.</Paragraph>
    <Paragraph position="9"> The only non-projective arc in the graph is (5,1) and G(5,1) contains three connected components, each of which consists of a single root node (2, 3 and 4). Since only one of these, 3, is not dominated by 5, the arc (5,1) has degree 1.</Paragraph>
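Definition 3 can be made concrete with a small sketch (our own; we cannot reproduce the Figure 1 graph here, so the test graph below is a toy example with one non-projective arc). SINGLE-HEAD and ACYCLICITY are assumed, so each node's head can be stored in a dict and domination checked by walking up the head chain.

```python
def arc_degree(arcs, e):
    """Degree of arc e = (i, j) (Definition 3): the number of connected
    components of the subgraph over the nodes strictly between i and j
    whose root is not dominated by the head i."""
    i, j = e
    inside = set(range(min(i, j) + 1, max(i, j)))
    heads = {d: h for (h, d) in arcs}  # valid under SINGLE-HEAD
    # Group the inside nodes into weakly connected components of G_e
    # (only arcs with both endpoints inside count), via union-find.
    comp = {v: v for v in inside}
    def find(v):
        while comp[v] != v:
            v = comp[v]
        return v
    for h, d in arcs:
        if h in inside and d in inside:
            comp[find(h)] = find(d)
    groups = {}
    for v in inside:
        groups.setdefault(find(v), set()).add(v)
    def dominated_by_head(k):
        # Does i reach k by walking down, i.e. does the head chain
        # from k lead up to i? Terminates under ACYCLICITY.
        while k in heads:
            if heads[k] == i:
                return True
            k = heads[k]
        return False
    degree = 0
    for nodes in groups.values():
        # Each component of a forest has exactly one root: the node
        # whose head (if any) lies outside the component.
        root = next(v for v in nodes if heads.get(v) not in nodes)
        if not dominated_by_head(root):
            degree += 1
    return degree

def graph_degree(arcs):
    """Degree of the whole graph: the maximum degree over its arcs."""
    return max(arc_degree(arcs, e) for e in arcs)
```

In the toy graph below, the arc (4, 1) spans nodes 2 and 3; they form one component rooted at 2, which is not dominated by 4, so the arc (and the graph) has degree 1.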
  </Section>
  <Section position="5" start_page="74" end_page="75" type="metho">
    <SectionTitle>
4 Parsing Algorithm
</SectionTitle>
    <Paragraph position="0"> Covington (2001) describes a parsing strategy for dependency representations that has been known since the 1960s but not presented in the literature. The left-to-right (or incremental) version of this strategy proceeds through the sentence one token at a time and, for each new token i, attempts a link with every preceding token j (for j = i-1 down to 1) by means of an operation LINK(i, j):</Paragraph>
    <Paragraph position="2"> The LINK(i, j) operation chooses between (i) adding the arc i → j (with some label), (ii) adding the arc j → i (with some label), and (iii) adding no arc at all. In this way, the algorithm builds a graph by systematically trying to link every pair of nodes (i, j) (i &gt; j). This graph will be a well-formed dependency graph, provided that we also add arcs from the root node 0 to every root node in {1, ..., n}. Assuming that the LINK(i, j) operation can be performed in some constant time c, the running time of the algorithm is ∑_{i=1}^{n} c(i-1) = c(n²/2 - n/2), which in terms of asymptotic complexity is O(n²).</Paragraph>
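The loop structure just described can be sketched as follows (our own formulation; the callback interface for LINK is an assumption, not Covington's notation). The final step attaches every remaining root to node 0, as the text requires for well-formedness.

```python
def covington_parse(n, link):
    """Left-to-right Covington strategy: for each new token i, attempt
    to link it with every preceding token j. The caller supplies the
    (nondeterministic) LINK operation as a callback that may add the
    arc (i, j), add (j, i), or add nothing to the growing arc set."""
    arcs = set()
    for i in range(1, n + 1):
        for j in range(i - 1, 0, -1):
            link(i, j, arcs)
    # Attach every remaining root among 1..n to the special node 0,
    # so the result is a well-formed dependency graph.
    dependents = {d for (_, d) in arcs}
    for v in range(1, n + 1):
        if v not in dependents:
            arcs.add((0, v))
    return arcs
```

Counting the calls to the callback confirms the n(n-1)/2 pairs behind the O(n²) bound.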
    <Paragraph position="3"> In the experiments reported in the following sections, we modify this algorithm by making the performance of LINK(i, j) conditional on the arcs (i,j) and (j,i) being permissible under the given graph constraints:</Paragraph>
    <Paragraph position="5"> The function PERMISSIBLE(i, j, C) returns true iff i - j and j - i are permissible arcs relative to the constraint C and the partially built graph G. For example, with the constraint SINGLE-HEAD, LINK(i, j) will not be performed if both i and j already have a head in the dependency graph. We call the pairs (i,j) (i &gt; j) for which LINK(i, j) is performed (for a given sentence and set of constraints) the active pairs, and we use the number of active pairs, as a function of sentence length, as an abstract measure of running time. This is well motivated if the time required to compute PERMISSIBLE(i,j,C) is insignificant compared to the time needed for LINK(i,j), as is typically the case in data-driven systems, where LINK(i,j) requires a call to a trained classifier, while PERMISSIBLE(i,j,C) only needs access to the partially built graph G.</Paragraph>
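Two instances of PERMISSIBLE can be sketched for the constraints discussed so far (our own implementations; the SINGLE-HEAD case follows the example in the text, the ACYCLICITY case is our reading of the condition). Both only inspect the partially built arc set, which is why they are assumed to be cheap relative to LINK.

```python
def permissible_single_head(i, j, arcs):
    """PERMISSIBLE under SINGLE-HEAD: LINK(i, j) is skipped when both
    i and j already have a head in the partially built graph."""
    dependents = {d for (_, d) in arcs}
    return i not in dependents or j not in dependents

def permissible_acyclic(i, j, arcs):
    """PERMISSIBLE under ACYCLICITY: at least one of the candidate
    arcs must not close a directed cycle in the partial graph."""
    def reaches(a, b):
        stack, seen = [a], {a}
        while stack:
            v = stack.pop()
            for (h, d) in arcs:
                if h == v and d not in seen:
                    if d == b:
                        return True
                    seen.add(d)
                    stack.append(d)
        return False
    return (not reaches(j, i)) or (not reaches(i, j))
```

Cumulative conditions, as used in the experiments, would simply conjoin such checks.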
    <Paragraph position="6"> The results obtained in this way will be partially dependent on the particular algorithm used, but they can in principle be generalized to any algorithm that tries to link all possible word pairs and that satisfies the following condition: For any graph G = (V, E, L) derived by the algorithm, if e, e′ ∈ E and e covers e′, then the algorithm adds e′ before e.</Paragraph>
    <Paragraph position="7"> This condition is satisfied not only by Covington's incremental algorithm but also by algorithms that add arcs strictly in order of increasing length, such as the algorithm of Eisner (2000) and other algorithms based on dynamic programming.</Paragraph>
  </Section>
  <Section position="6" start_page="75" end_page="76" type="metho">
    <SectionTitle>
5 Experimental Setup
</SectionTitle>
    <Paragraph position="0"> The experiments are based on data from two treebanks. The Prague Dependency Treebank (PDT) contains 1.5M words of newspaper text, annotated in three layers (Hajič, 1998; Hajič et al., 2001) according to the theoretical framework of Functional Generative Description (Sgall et al., 1986). Our experiments concern only the analytical layer and are based on the dedicated training section of the treebank. The Danish Dependency Treebank (DDT) comprises 100K words of text selected from the Danish PAROLE corpus, with annotation of primary and secondary dependencies based on Discontinuous Grammar (Kromann, 2003). Only primary dependencies are considered in the experiments, which are based on 80% of the data (again the standard training section).</Paragraph>
    <Paragraph position="1"> The experiments are performed by parsing each sentence of the treebanks while using the gold standard dependency graph for that sentence as an oracle to resolve the nondeterministic choice in the LINK(i, j) operation as follows:</Paragraph>
    <Paragraph position="3"> where Eg is the arc relation of the gold standard dependency graph Gg and E is the arc relation of the graph G built by the parsing algorithm.</Paragraph>
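The oracle resolution can be sketched like this (our own naming; `gold` plays the role of the arc relation E_g of the gold-standard graph). The unconstrained variant below adds an arc exactly when it is a gold arc; in the experiments, the same choice is additionally gated by PERMISSIBLE.

```python
def oracle_link(i, j, arcs, gold):
    """Resolve the nondeterministic choice in LINK(i, j) with the
    gold-standard arc set: add whichever of (i, j), (j, i) is a gold
    arc, otherwise add nothing."""
    if (i, j) in gold:
        arcs.add((i, j))
    elif (j, i) in gold:
        arcs.add((j, i))

def oracle_parse(n, gold):
    """Covington's incremental pair enumeration with the oracle LINK.
    Arcs from the special node 0 are not attempted here, since only
    pairs with i and j in 1..n are enumerated."""
    arcs = set()
    for i in range(1, n + 1):
        for j in range(i - 1, 0, -1):
            oracle_link(i, j, arcs, gold)
    return arcs
```

With no constraints active, this recovers every gold arc between token nodes, so any parsing errors observed in the experiments are attributable to the constraints alone.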
    <Paragraph position="4"> Conditions are varied by cumulatively adding constraints in the following order:  1. SINGLE-HEAD 2. ACYCLICITY 3. Degree d ≤ k (k ≥ 1) 4. PROJECTIVITY  The purpose of the experiments is to study how different constraints influence expressivity and running time. The first dimension is investigated by comparing the dependency graphs produced by the parser with the gold standard dependency graphs in the treebank. This gives an indication of the extent to which naturally occurring structures can be parsed correctly under different constraints. The results are reported both as the proportion of individual dependency arcs (per token) and as the proportion of complete dependency graphs (per sentence) recovered correctly by the parser. In order to study the effects on running time, we examine how the number of active pairs varies as a function of sentence length. Whereas the asymptotic worst-case complexity remains O(n²) under all conditions, the average running time will decrease with the number of active pairs if the LINK(i, j) operation is more expensive than the call to PERMISSIBLE(i, j, C). For data-driven dependency parsing, this is relevant not only for parsing efficiency, but also because it may improve training efficiency by reducing the number of pairs that need to be included in the training data.</Paragraph>
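The two expressivity measures described above can be computed as follows (a minimal sketch with our own function name, comparing a parsed arc set against the gold arc set):

```python
def evaluate(parsed, gold):
    """Return (arc-level accuracy, exact-match flag): the proportion
    of gold arcs recovered (per token) and whether the complete
    dependency graph was recovered (per sentence)."""
    arc_accuracy = len(parsed.intersection(gold)) / len(gold)
    exact_match = parsed == gold
    return arc_accuracy, exact_match
```

Averaging these two quantities over all sentences of a treebank yields the per-token and per-sentence figures reported in the results.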
  </Section>
</Paper>