<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1010">
  <Title>Constraints on Non-Projective Dependency Parsing</Title>
  <Section position="7" start_page="76" end_page="77" type="evalu">
    <SectionTitle>
6 Results and Discussion
</SectionTitle>
    <Paragraph position="0"> Table 1 displays the proportion of dependencies (single arcs) and sentences (complete graphs) in the two treebanks that can be parsed exactly with Covington's algorithm under different constraints.</Paragraph>
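The parsing strategy behind these figures considers each new word against every preceding word. A minimal sketch of that pair enumeration, with a hypothetical `permissible` predicate standing in for the constraints (illustrative only, not the paper's implementation):

```python
# Covington-style pair enumeration: each new word j is considered against
# every preceding word i, giving n(n-1)/2 candidate pairs in the worst case.
# `permissible` is a hypothetical hook for constraint filtering.

def count_active_pairs(n, permissible=lambda i, j: True):
    """Count candidate word pairs for a sentence of n words (positions 1..n)."""
    count = 0
    for j in range(2, n + 1):          # each new word j, left to right
        for i in range(j - 1, 0, -1):  # paired with every preceding word i
            if permissible(i, j):
                count += 1             # (i, j) remains an "active pair"
    return count
```

With the default predicate this is n(n-1)/2, the quadratic worst case; a predicate that prunes pairs (for instance, admitting only adjacent words) pushes the count toward linear, which is the effect measured below.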
    <Paragraph position="1"> Starting at the bottom of the table, we see that the unrestricted algorithm (None) of course reproduces all the graphs exactly, but we also see that the constraints SINGLE-HEAD and ACYCLICITY do not put any real restrictions on expressivity with regard to the data at hand. However, this is primarily a reflection of the design of the treebank annotation schemes, which in themselves require dependency graphs to obey these constraints.2 If we go to the other end of the table, we see that PROJECTIVITY, on the other hand, has a very noticeable effect on the parser's ability to capture the structures found in the treebanks. Almost 25% of the sentences in PDT, and more than 15% in DDT, are beyond its reach. At the level of individual dependencies, the effect is less conspicuous, but it is still the case in PDT that one dependency in twenty-five cannot be found by the parser even with a perfect oracle (one in fifty in DDT). It should be noted that the proportion of lost dependencies is about twice as high as the proportion of dependencies that are non-projective in themselves (Nivre and Nilsson, 2005). This is due to error propagation, since some projective arcs are blocked from the parser's view because of missing non-projective arcs.</Paragraph>
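The constraints in Table 1, and the degree measure relaxed in the next paragraph, are all simple to state over an arc list. A hedged sketch, assuming graphs given as (head, dependent) pairs over token positions 1..n with 0 as an artificial root, the crossing-arcs formulation of projectivity, and the component-root formulation of degree (representation and helper names are illustrative, not the paper's code):

```python
# Checks for the constraints discussed above, over arcs as (head, dependent)
# pairs; positions are integers with 0 as the artificial root.

def single_head(arcs):
    """SINGLE-HEAD: no token has more than one incoming arc."""
    dependents = [d for _, d in arcs]
    return len(dependents) == len(set(dependents))

def acyclic(arcs):
    """ACYCLICITY: following head links never revisits a token."""
    head = {d: h for h, d in arcs}
    for start in head:
        seen, node = set(), start
        while node in head:
            if node in seen:
                return False
            seen.add(node)
            node = head[node]
    return True

def projective(arcs):
    """PROJECTIVITY via the crossing-arcs test: no two arc spans interleave."""
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

def degree(arcs):
    """Degree of non-projectivity: max over arcs of the number of component
    roots inside the arc's span that are not dominated by the arc's head."""
    head = {d: h for h, d in arcs}

    def dominated(node, ancestor):
        while node in head:
            node = head[node]
            if node == ancestor:
                return True
        return False

    worst = 0
    for h, d in arcs:
        span = set(range(min(h, d) + 1, max(h, d)))
        roots = [w for w in span if head.get(w) not in span]
        worst = max(worst, sum(1 for r in roots if not dominated(r, h)))
    return worst
```

On this reading, a projective graph has degree 0, and the bounds d ≤ 1, d ≤ 2 discussed below admit exactly the graphs whose `degree` does not exceed 1 or 2.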
    <Paragraph position="2"> Considering different bounds on the degree of non-projectivity, finally, we see that even the tightest possible bound (d ≤ 1) gives a much better approximation than PROJECTIVITY, reducing the proportion of non-parsable sentences by about 90% in both treebanks. At the level of individual arcs, the reduction is even greater, about 95% for both data sets. And if we allow a maximum degree of 2, we can capture more than 99.9% of all dependencies, and more than 99.5% of all sentences, in both PDT and DDT. At the same time, there seems to be no principled upper bound on the degree of non-projectivity, since in PDT not even an upper bound of 10 is sufficient to correctly capture all dependency graphs in the treebank.3 [Footnote 2: It should be remembered that we are only concerned with one layer of each annotation scheme, the analytical layer in PDT and the primary dependencies in DDT. Taking several layers into account simultaneously would have resulted in more complex structures.]</Paragraph>
    <Paragraph position="3"> Let us now see how different constraints affect running time, as measured by the number of active pairs in relation to sentence length. A plot of this relationship for a subset of the conditions can be found in Figure 2. For reasons of space, we only display the data from DDT, but the PDT data exhibit very similar patterns. Both treebanks are represented in Table 2, where we show the result of fitting the quadratic equation y = ax + bx² to the data from each condition (where y is the number of active words and x is the number of words in the sentence). The amount of variance explained is given by the r² value, which shows a very good fit under all conditions, with statistical significance beyond the 0.001 level.4 Both Figure 2 and Table 2 show very clearly that, with no constraints, the relationship between words and active pairs is exactly the one predicted by the worst-case complexity (cf. section 4) and that, with each added constraint, this relationship becomes more and more linear in shape. When we get to PROJECTIVITY, the quadratic coefficient b is so small that the average running time is practically linear for the great majority of sentences. However, the complexity is not much worse for the bounded degrees of non-projectivity (d ≤ 1, d ≤ 2). More precisely, for both data sets, the linear term ax dominates the quadratic term bx² for sentences up to 50 words at d ≤ 1 and up to 30 words at d ≤ 2. Given that sentences of 50 words or less represent 98.9% of all sentences in PDT and 98.3% in DDT (the corresponding percentages for 30 words being 88.9% and 86.0%), it seems that the average-case running time can be regarded as linear also for these models.</Paragraph>
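A fit of the form y = ax + bx² (no intercept) can be reproduced with ordinary least squares over the two columns x and x². A sketch using invented data for illustration (the actual measurements are those summarized in Table 2):

```python
# Least-squares fit of y = a*x + b*x**2 (no intercept) via the 2x2 normal
# equations. The input data below are illustrative, not the treebank counts.

def fit_quadratic_through_origin(xs, ys):
    """Return (a, b) minimizing the squared error of y = a*x + b*x**2."""
    s_x2 = sum(x * x for x in xs)
    s_x3 = sum(x ** 3 for x in xs)
    s_x4 = sum(x ** 4 for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_x2y = sum(x * x * y for x, y in zip(xs, ys))
    det = s_x2 * s_x4 - s_x3 * s_x3
    a = (s_xy * s_x4 - s_x2y * s_x3) / det
    b = (s_x2y * s_x2 - s_xy * s_x3) / det
    return a, b
```

Note that the linear term dominates exactly when x < a/b, which is how per-condition cut-offs such as the 50-word and 30-word figures above can be read off the fitted coefficients.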
  </Section>
</Paper>