File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/e06-1011_intro.xml
Size: 2,921 bytes
Last Modified: 2025-10-06 14:03:20
<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1011"> <Title>Online Learning of Approximate Dependency Parsing Algorithms</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Dependency representations of sentences (Hudson, 1984; Me'lVcuk, 1988) model head-dependent syntactic relations as edges in a directed graph.</Paragraph> <Paragraph position="1"> Figure 1 displays a dependency representation for the sentence John hit the ball with the bat. This sentence is an example of a projective (or nested) tree representation, in which all edges can be drawn in the plane with none crossing. Sometimes a non-projective representations are preferred, as in the sentence in Figure 2.1 In particular, for freer-word order languages, non-projectivity is a common phenomenon since the relative positional constraints on dependents is much less rigid. The dependency structures in Figures 1 and 2 satisfy the tree constraint: they are weakly connected graphs with a unique root node, and each non-root node has a exactly one parent. Though trees are 1Examples are drawn from McDonald et al. (2005c).</Paragraph> <Paragraph position="2"> more common, some formalisms allow for words to modify multiple parents (Hudson, 1984).</Paragraph> <Paragraph position="3"> Recently, McDonald et al. (2005c) have shown that treating dependency parsing as the search for the highest scoring maximum spanning tree (MST) in a graph yields efficient algorithms for both projective and non-projective trees. When combined with a discriminative online learning algorithm and a rich feature set, these models provide state-of-the-art performance across multiple languages. However, the parsing algorithms require that the score of a dependency tree factors as a sum of the scores of its edges. This first-order factorization is very restrictive since it only allows for features to be defined over single attachment decisions. Previous work has shown that conditioning on neighboring decisions can lead to significant improvements in accuracy (Yamada and Matsumoto, 2003; Charniak, 2000).</Paragraph> <Paragraph position="4"> In this paper we extend the MST parsing framework to incorporate higher-order feature representations of bounded-size connected subgraphs. We also present an algorithm for acyclic dependency graphs, that is, dependency graphs in which a word maydepend on multiple heads. In both cases parsing is in general intractable and we provide novel approximate algorithms to make these cases tractable. We evaluate these algorithms within an online learning framework, which has been shown to be robust with respect approximate inference, and describe experiments displaying that these new models lead to state-of-the-art accuracy for English and the best accuracy we know of for Czech and Danish.</Paragraph> </Section> class="xml-element"></Paper>