File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/c04-1010_intro.xml

Size: 4,858 bytes

Last Modified: 2025-10-06 14:02:05

<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1010">
  <Title>Deterministic Dependency Parsing of English Text</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> There has been a steadily increasing interest in syntactic parsing based on dependency analysis in recent years. One important reason seems to be that dependency parsing offers a good compromise between the conflicting demands of analysis depth, on the one hand, and robustness and efficiency, on the other. Thus, whereas a complete dependency structure provides a fully disambiguated analysis of a sentence, this analysis is typically less complex than in frameworks based on constituent analysis and can therefore often be computed deterministically with reasonable accuracy. Deterministic methods for dependency parsing have now been applied to a variety of languages, including Japanese (Kudo and Matsumoto, 2000), English (Yamada and Matsumoto, 2003), Turkish (Oflazer, 2003), and Swedish (Nivre et al., 2004).</Paragraph>
    <Paragraph position="1"> For English, the interest in dependency parsing has been weaker than for other languages. To some extent, this can probably be explained by the strong tradition of constituent analysis in Anglo-American linguistics, but this trend has been reinforced by the fact that the major treebank of American English, the Penn Treebank (Marcus et al., 1993), is annotated primarily with constituent analysis. On the other hand, the best available parsers trained on the Penn Treebank, those of Collins (1997) and Charniak (2000), use statistical models for disambiguation that make crucial use of dependency relations. Moreover, the deterministic dependency parser of Yamada and Matsumoto (2003), when trained on the Penn Treebank, gives a dependency accuracy that is almost as good as that of Collins (1997) and Charniak (2000).</Paragraph>
    <Paragraph position="2"> The parser described in this paper is similar to that of Yamada and Matsumoto (2003) in that it uses a deterministic parsing algorithm in combination with a classifier induced from a treebank. However, there are also important differences between the two approaches. First of all, whereas Yamada and Matsumoto employs a strict bottom-up algorithm (essentially shift-reduce parsing) with multiple passes over the input, the present parser uses the algorithm proposed in Nivre (2003), which combines bottom-up and top-down processing in a single pass in order to achieve incrementality. This also means that the time complexity of the algorithm used here is linear in the size of the input, while the algorithm of Yamada and Matsumoto is quadratic in the worst case.</Paragraph>
    <Paragraph position="3"> Another difference is that Yamada and Matsumoto use support vector machines (Vapnik, 1995), while we instead rely on memory-based learning (Daelemans, 1999).</Paragraph>
    <Paragraph position="4"> Most importantly, however, the parser presented in this paper constructs labeled dependency graphs, i.e. dependency graphs where arcs are labeled with dependency types. As far as we know, this makes it different from all previous systems for dependency parsing applied to the Penn Treebank (Eisner, 1996; Yamada and Matsumoto, 2003), although there are systems that extract labeled grammatical relations based on shallow parsing, e.g. Buchholz (2002). The fact that we are working with labeled dependency graphs is also one of the motivations for choosing memory-based learning over support vector machines, since we require a multi-class classifier. Even though it is possible to use SVM for multi-class classification, this can get cumbersome when the number of classes is large. (For the</Paragraph>
    <Paragraph position="6"> unlabeled dependency parser of Yamada and Matsumoto (2003) the classification problem only involves three classes.) The parsing methodology investigated here has previously been applied to Swedish, where promising results were obtained with a relatively small treebank (approximately 5000 sentences for training), resulting in an attachment score of 84.7% and a labeled accuracy of 80.6% (Nivre et al., 2004).1 However, since there are no comparable results available for Swedish, it is difficult to assess the significance of these findings, which is one of the reasons why we want to apply the method to a benchmark corpus such as the the Penn Treebank, even though the annotation in this corpus is not ideal for labeled dependency parsing.</Paragraph>
    <Paragraph position="7"> The paper is structured as follows. Section 2 describes the parsing algorithm, while section 3 explains how memory-based learning is used to guide the parser. Experimental results are reported in section 4, and conclusions are stated in section 5.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML