<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2009">
  <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. A Pipeline Framework for Dependency Parsing</Title>
  <Section position="3" start_page="0" end_page="65" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A pipeline process over the decisions of learned classifiers is a common computational strategy in natural language processing. In this model a task is decomposed into several stages that are solved sequentially, where the computation in the ith stage typically depends on the outcome of computations done in previous stages. For example, a semantic role labeling program (Punyakanok et al., 2005) may start by using a part-of-speech tagger, then apply a shallow parser to chunk the sentence into phrases, identify predicates and arguments, and then classify them into types. In fact, any left-to-right processing of an English sentence may be viewed as a pipeline computation, as it processes a token and, potentially, makes use of this result when processing the token to its right.</Paragraph>
    <Paragraph position="1"> The pipeline model is a standard model of computation in natural language processing for good reasons. It is based on the assumption that some decisions might be easier or more reliable than others, and their outcomes, therefore, can be counted on when making further decisions. Nevertheless, it is clear that it results in error accumulation and suffers from its inability to correct mistakes in previous stages. Researchers have recently started to address some of the disadvantages of this model. E.g., (Roth and Yih, 2004) suggests a model in which global constraints are taken into account in a later stage to fix mistakes due to the pipeline. (Punyakanok et al., 2005; Marciniak and Strube, 2005) also address some aspects of this problem. However, these solutions rely on the fact that all decisions are made with respect to the same input; specifically, all classifiers considered use the same examples as their input. In addition, the pipelines they study are shallow.</Paragraph>
    <Paragraph position="2"> This paper develops a general framework for decisions in pipeline models which addresses these difficulties. Specifically, we are interested in deep pipelines, in which a large number of predictions are chained.</Paragraph>
    <Paragraph position="3"> A pipeline process is one in which decisions made in the ith stage (1) depend on earlier decisions and (2) feed on input that depends on earlier decisions. The latter issue is especially important at evaluation time since, at training time, a gold standard data set might be used to avoid this issue.</Paragraph>
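The two properties above can be made concrete with a toy sketch (not from the paper; the tagger and chunker below are hypothetical stand-ins for learned classifiers): stage 2 consumes stage 1's decisions, so an early error propagates forward exactly as described.

```python
# Toy pipeline: a POS-tagging stage feeding a chunking stage.
# Both "classifiers" here are hand-written stand-ins for illustration.

def pos_tag(tokens):
    # toy tagger: capitalized words become NNP, everything else NN
    return [(t, "NNP" if t[0].isupper() else "NN") for t in tokens]

def chunk(tagged):
    # toy chunker: merge consecutive NNP tokens into a single phrase
    phrases, current = [], []
    for tok, tag in tagged:
        if tag == "NNP":
            current.append(tok)
        else:
            if current:
                phrases.append(" ".join(current))
                current = []
            phrases.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases

def pipeline(sentence):
    tokens = sentence.split()
    # stage 2's input depends on stage 1's decisions: a tagging mistake
    # cannot be undone by the chunker
    return chunk(pos_tag(tokens))

print(pipeline("John Smith saw the house"))
```

A mis-tagged token (e.g. a sentence-initial verb tagged NNP) would be absorbed into a phrase by the chunker, illustrating the error accumulation discussed above.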
    <Paragraph position="4"> We develop and study the framework in the context of a bottom-up approach to dependency parsing. We suggest two principles to guide the pipeline algorithm development: (i) Make local decisions as reliable as possible.</Paragraph>
    <Paragraph position="5"> (ii) Reduce the number of decisions made.</Paragraph>
    <Paragraph position="6"> Using these as guidelines we devise an algorithm for dependency parsing, prove that it satisfies these principles, and show experimentally that this improves the accuracy of the resulting tree.</Paragraph>
    <Paragraph position="7"> Specifically, our approach is based on shift-reduce parsing as in (Yamada and Matsumoto, 2003). Our general framework provides insights that allow us to improve their algorithm and to justify some of the algorithmic decisions in a principled way. Specifically, the first principle suggests improving the reliability of the local predictions, which we do by improving the set of actions taken by the parsing algorithm and by using a look-ahead search. The second principle is used to justify the control policy of the parsing algorithm, that is, which edges to consider at any point in time. We prove that our control policy is optimal in some sense, and that the decisions we made, guided by these principles, lead to a significant improvement in the accuracy of the resulting parse tree.</Paragraph>
    <Section position="1" start_page="65" end_page="65" type="sub_section">
      <SectionTitle>
1.1 Dependency Parsing and Pipeline Models
</SectionTitle>
      <Paragraph position="0"> Dependency trees provide a syntactic representation that encodes functional relationships between words; it is relatively independent of the grammar theory and can be used to represent the structure of sentences in different languages. Dependency structures are more efficient to parse (Eisner, 1996) and are believed to be easier to learn, yet they still capture much of the predicate-argument information needed in applications (Haghighi et al., 2005), which is one reason for the recent interest in learning these structures (Eisner, 1996; McDonald et al., 2005; Yamada and Matsumoto, 2003; Nivre and Scholz, 2004).</Paragraph>
      <Paragraph position="1"> Eisner's work, a generative algorithm with O(n^3) parsing time, sparked the interest in this area.</Paragraph>
      <Paragraph position="2"> His model, however, seems to be limited when dealing with complex and long sentences. (McDonald et al., 2005) build on this work, and use a global discriminative training approach to improve the edges' scores, along with Eisner's algorithm, to yield the expected improvement. A different approach was studied by (Yamada and Matsumoto, 2003), who develop a bottom-up approach and learn the parsing decisions between consecutive words in the sentence. Local actions are used to generate a dependency tree using a shift-reduce parsing approach (Aho et al., 1986). This is a true pipeline approach, as was done in other successful parsers, e.g. (Ratnaparkhi, 1997), in that the classifiers are trained on individual decisions rather than on the overall quality of the parser, and chained to yield the global structure. Clearly, it suffers from the limitations of pipeline processing, such as accumulation of errors, but it nevertheless yields very competitive parsing results. A somewhat similar approach was used in (Nivre and Scholz, 2004) to develop a hybrid bottom-up/top-down approach; there, the edges are also labeled with semantic types, yielding lower accuracy than the works mentioned above.</Paragraph>
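The bottom-up shift-reduce scheme can be sketched as follows. This is an illustrative simplification, not the paper's algorithm: in a real parser a trained classifier chooses among Shift/Left/Right actions on consecutive-word pairs, whereas here a toy oracle over gold heads stands in for it, attaching a word to its head only once the word has collected all its own dependents.

```python
# Minimal sketch of bottom-up dependency parsing with Shift/Left/Right
# actions over consecutive words, in the spirit of Yamada and Matsumoto
# (2003). The action decision is a toy oracle, not a learned classifier.

def parse(words, gold_heads):
    """Return a head index per word (-1 = root)."""
    n = len(words)
    pending = list(range(n))        # words not yet attached to a head
    heads = [None] * n
    # number of dependents each word still has to collect
    deps_left = [sum(1 for h in gold_heads if h == i) for i in range(n)]

    while len(pending) > 1:
        progressed = False
        i = 0
        while i < len(pending) - 1:
            l, r = pending[i], pending[i + 1]
            if gold_heads[l] == r and deps_left[l] == 0:
                heads[l] = r            # "Right" action: attach l under r
                deps_left[r] -= 1
                pending.pop(i)
                progressed = True
            elif gold_heads[r] == l and deps_left[r] == 0:
                heads[r] = l            # "Left" action: attach r under l
                deps_left[l] -= 1
                pending.pop(i + 1)
                progressed = True
            else:
                i += 1                  # "Shift" action: move right
        if not progressed:
            break                       # no projective parse found
    if len(pending) == 1:
        heads[pending[0]] = -1          # last remaining word is the root
    return heads

# "saw" heads "John" and "cat"; "the" modifies "cat"
print(parse(["John", "saw", "the", "cat"], [1, -1, 3, 1]))
```

Each pass over the pending list corresponds to one left-to-right sweep of the pipeline: every decision shrinks or shifts the input that later decisions observe.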
      <Paragraph position="3"> The overall goal of dependency parsing (DP) learning is to infer a tree structure. A common way to do that is to predict with respect to each potential edge (i, j) in the tree, and then choose a global structure that (1) is a tree and that (2) maximizes some score. In the context of DPs, this edge based factorization method was proposed by (Eisner, 1996). In other contexts, this is similar to the approach of (Roth and Yih, 2004) in that scoring each edge depends only on the raw data observed and not on the classifications of other edges, and that global considerations can be used to override the local (edge-based) decisions.</Paragraph>
      <Paragraph position="4"> On the other hand, the key in a pipeline model is that making a decision with respect to the edge (i, j) may gain from taking into account decisions already made with respect to neighboring edges. However, given that these decisions are noisy, there is a need to devise policies for reducing the number of predictions in order to make the parser more robust. This is exemplified by the bottom-up approach of (Yamada and Matsumoto, 2003), which is most related to the work presented here. Their model is a traditional pipeline model: a classifier suggests a decision that, once taken, determines the next action to be taken (as well as the input the next action observes).</Paragraph>
      <Paragraph position="5"> In the rest of this paper, we propose and justify a framework for improving pipeline processing based on the principles mentioned above: (i) make local decisions as reliably as possible, and (ii) reduce the number of decisions made. We use the proposed principles to examine the (Yamada and Matsumoto, 2003) parsing algorithm and show that this results in modifying some of the decisions made there and, consequently, better overall dependency trees.</Paragraph>
    </Section>
  </Section>
</Paper>