<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1513">
  <Title>Vancouver, October 2005. ©2005 Association for Computational Linguistics. A Classifier-Based Parser with Linear Run-Time Complexity</Title>
  <Section position="4" start_page="125" end_page="128" type="metho">
    <SectionTitle>
2 Parser Description
</SectionTitle>
    <Paragraph position="0"> Our parser employs a basic bottom-up shift-reduce parsing algorithm, requiring only a single pass over the input string. The algorithm considers only trees with unary and binary branching. In order to use trees with arbitrary branching for training, or to generate them with the parser, we employ an instance of the transformation/detransformation process described in (Johnson, 1998). In our case, the transformation step simply converts each production with n children (where n &gt; 2) into n - 1 binary productions. Trees must be lexicalized, so that the newly created internal structure of constituents that previously branched more than two ways contains only subtrees with the same lexical head as the original constituent. Additional non-terminal symbols introduced in this process are clearly marked. The transformed (or &amp;quot;binarized&amp;quot;) trees may then be used for training. Detransformation is applied to trees produced by the parser. This involves the removal of non-terminals introduced in the transformation process, producing trees with arbitrary branching. An example of transformation/detransformation is shown in the figure.</Paragraph>
    <Paragraph position="1"> [Figure caption: a node (NP) with four children. In the transformed tree, internal structure (marked by nodes with asterisks) is added to the subtree rooted by the node with more than two children. The word &amp;quot;dog&amp;quot; is the head of the original NP, and it is kept as the head of the transformed NP, as well as the head of each NP* node.]</Paragraph>
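The transformation/detransformation step can be sketched in Python. This is my own illustration under stated assumptions, not the authors' code: the `Tree` class, the right-factoring order, and the `"*"` suffix for the new internal nodes are my choices; the paper only requires that every production with n &gt; 2 children become n - 1 binary ones and that each new node keep the original constituent's lexical head.

```python
class Tree:
    def __init__(self, label, head, children=None):
        self.label = label          # non-terminal or POS tag
        self.head = head            # lexical head word
        self.children = children or []

def binarize(tree):
    """Fold children pairwise so every node has at most two children."""
    kids = [binarize(c) for c in tree.children]
    while len(kids) > 2:
        # New starred node inherits the head of the original constituent.
        folded = Tree(tree.label + "*", tree.head, kids[-2:])
        kids = kids[:-2] + [folded]
    return Tree(tree.label, tree.head, kids)

def detransform(tree):
    """Remove starred nodes, restoring arbitrary branching."""
    kids = []
    for c in tree.children:
        c = detransform(c)
        if c.label.endswith("*"):
            kids.extend(c.children)   # splice grandchildren back in
        else:
            kids.append(c)
    return Tree(tree.label, tree.head, kids)
```

Round-tripping a four-child NP through `binarize` and `detransform` recovers the original flat structure, with "dog" as the head of every intermediate NP* node.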
    <Section position="1" start_page="126" end_page="126" type="sub_section">
      <SectionTitle>
2.1 Algorithm Outline
</SectionTitle>
      <Paragraph position="0"> The parsing algorithm involves two main data structures: a stack S, and a queue W. Items in S may be terminal nodes (POS-tagged words), or (lexicalized) subtrees of the final parse tree for the input string. Items in W are terminals (words tagged with parts-of-speech) corresponding to the input string. When parsing begins, S is empty and W is initialized by inserting every word from the input string in order, so that the first word is in front of the queue.</Paragraph>
      <Paragraph position="1"> Only two general actions are allowed: shift and reduce. A shift action consists only of removing (shifting) the first item (POS-tagged word) from W (at which point the next word becomes the new first item), and placing it on top of S. Reduce actions are subdivided into unary and binary cases.</Paragraph>
      <Paragraph position="2"> In a unary reduction, the item on top of S is popped, and a new item is pushed onto S. The new item consists of a tree formed by a non-terminal node with the popped item as its single child. The lexical head of the new item is the same as the lexical head of the popped item. In a binary reduction, two items are popped from S in sequence, and a new item is pushed onto S. The new item consists of a tree formed by a non-terminal node with two children: the first item popped from S is the right child, and the second item is the left child.</Paragraph>
      <Paragraph position="3"> The lexical head of the new item is either the lexical head of its left child, or the lexical head of its right child.</Paragraph>
      <Paragraph position="4"> If S is empty, only a shift action is allowed. If W is empty, only a reduce action is allowed. If both S and W are non-empty, either a shift or a reduce action is possible. Parsing terminates when W is empty and S contains only a single item; that item is the parse tree for the input string. Because the parse tree is lexicalized, we also have a dependency structure for the sentence. In fact, the binary reduce actions are very similar to the reduce actions in the dependency parser of Nivre and Scholz (2004), but they are executed in a different order, so that constituents can be built. If W is empty, more than one item remains in S, and no further reduce actions can be taken, the input string is rejected.</Paragraph>
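The algorithm of this section can be condensed into a short loop. This is my own minimal sketch, not the authors' implementation: subtrees are represented as (label, head, children) tuples, and the `decide` callback stands in for the classifier of section 2.2 (an oracle works equally well for tracing).

```python
from collections import deque

def parse(tagged_words, decide):
    """tagged_words: list of (word, pos) pairs.
    decide(S, W) returns one of ("shift",), ("unary", X),
    ("binary-left", X), ("binary-right", X), or ("reject",)."""
    S = []                                            # stack of subtrees
    W = deque((pos, word, []) for word, pos in tagged_words)
    while True:
        if len(W) == 0 and len(S) == 1:
            return S[0]                               # complete parse
        # If S is empty, only a shift action is allowed.
        action = ("shift",) if len(S) == 0 else decide(S, W)
        if action[0] == "shift":
            S.append(W.popleft())
        elif action[0] == "unary":
            child = S.pop()
            S.append((action[1], child[1], [child]))  # head is kept
        elif action[0].startswith("binary"):
            right, left = S.pop(), S.pop()            # right child popped first
            head = left[1] if action[0].endswith("left") else right[1]
            S.append((action[1], head, [left, right]))
        else:
            return None                               # input string rejected
```

Driving the loop with a scripted action sequence for "the dog barks" yields a lexicalized tree rooted in a node headed by "barks", with an NP headed by "dog" as its left child.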
    </Section>
    <Section position="2" start_page="126" end_page="127" type="sub_section">
      <SectionTitle>
2.2 Determining Actions with a Classifier
</SectionTitle>
      <Paragraph position="0"> A parser based on the algorithm described in the previous section faces two types of decisions to be made throughout the parsing process. The first type concerns whether to shift or reduce when both actions are possible, or whether to reduce or reject the input when only reduce actions are possible.</Paragraph>
      <Paragraph position="0"> A parser based on the algorithm described in the previous section must make two types of decisions throughout the parsing process. The first type concerns whether to shift or reduce when both actions are possible, or whether to reduce or reject the input when only reduce actions are possible.</Paragraph>
      <Paragraph position="1"> The second type concerns what syntactic structures are created. Specifically, what new non-terminal is introduced in unary or binary reduce actions, and which of the left or right children is chosen as the source of the lexical head of the new subtree produced by binary reduce actions. Traditionally, these decisions are made with the use of a grammar, and the grammar may allow more than one valid action at any single point in the parsing process. When multiple choices are available, a grammar-driven parser may make a decision based on heuristics or statistical models, or pursue every possible action following a search strategy. In our case, both types of decisions are made by a classifier that chooses a unique action at every point, based on the local context of the parsing action, with no explicit grammar. This type of classifier-based parsing, where only one path is pursued with no backtracking, can be viewed as greedy or deterministic. In order to determine what actions the parser should take given a particular parser configuration, a classifier is given a set of features derived from that configuration. This includes, crucially, the two topmost items in the stack S, and the item in front of the queue W. Additionally, a set of context features is derived from a (fixed) limited number of items below the two topmost items of S, and following the item in front of W. The specific features are shown in figure 2.</Paragraph>
      <Paragraph position="2"> The classifier's target classes are parser actions that specify both types of decisions mentioned above. These classes are: * SHIFT: a shift action is taken; * REDUCE-UNARY-XX: a unary reduce action is taken, and the root of the new subtree pushed onto S is of type XX (where XX is a non-terminal symbol, such as NP, VP, or PP); * REDUCE-LEFT-XX: a binary reduce action is taken, and the root of the new subtree pushed onto S is of non-terminal type XX.</Paragraph>
      <Paragraph position="3">  Additionally, the head of the new subtree is the same as the head of the left child of the root node; * REDUCE-RIGHT-XX: a binary reduce action is taken, and the root of the new subtree pushed onto S is of non-terminal type XX.</Paragraph>
      <Paragraph position="4"> Additionally, the head of the new subtree is the same as the head of the right child of the root node.</Paragraph>
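The feature-derivation and action-class scheme just described can be sketched as follows. This is a hypothetical illustration: the exact feature templates are those of figure 2, and the names, context-window sizes, and `-NONE-` padding value below are my own assumptions. Stack and queue items use the same (label, head, children) tuple shape as in the earlier sketch.

```python
def features(S, W, n_stack=4, n_queue=3):
    """Flat feature dict from the top of stack S and front of queue W.
    The two topmost stack items and the front queue item are the crucial
    ones; the rest are the fixed-size context window."""
    feats = {}
    for i in range(n_stack):
        node = S[-(i + 1)] if len(S) > i else None
        feats["S%d.label" % i] = node[0] if node else "-NONE-"
        feats["S%d.head" % i] = node[1] if node else "-NONE-"
    for i in range(n_queue):
        node = W[i] if len(W) > i else None
        feats["W%d.tag" % i] = node[0] if node else "-NONE-"
        feats["W%d.word" % i] = node[1] if node else "-NONE-"
    return feats

def action_classes(nonterminals):
    """Enumerate the classifier's target classes for a non-terminal set."""
    classes = ["SHIFT"]
    for x in sorted(nonterminals):
        classes += ["REDUCE-UNARY-" + x, "REDUCE-LEFT-" + x, "REDUCE-RIGHT-" + x]
    return classes
```

Note that each XX label expands into three classes (unary, binary-left, binary-right), so the class inventory grows linearly with the number of non-terminal types seen in training.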
    </Section>
    <Section position="3" start_page="127" end_page="128" type="sub_section">
      <SectionTitle>
2.3 A Complete Classifier-Based Parser that
Runs in Linear Time
</SectionTitle>
      <Paragraph position="0"> When the algorithm described in section 2.1 is combined with a trained classifier that determines its parsing actions as described in section 2.2, we have a complete classifier-based parser. Training the parser is accomplished by training its classifier.</Paragraph>
      <Paragraph position="1"> To that end, we need training instances that consist of sets of features paired with their classes corresponding to the correct parsing actions. [Figure 2: Let S(n) denote the nth item from the top of the stack S, and W(n) denote the nth item from the front of the queue W. Several of the features are directly related to the lexicalized constituent trees that are built during parsing, while the features described in items 8 - 13 are more directly related to the dependency structures that are built simultaneously with the constituent structures.]</Paragraph>
      <Paragraph position="2"> These instances can be obtained by running the algorithm on a corpus of sentences for which the correct parse trees are known. Instead of using the classifier to determine the parser's actions, we simply determine the correct action by consulting the correct parse trees. We then record the features and corresponding actions for parsing all sentences in the corpus into their correct trees. This set of features and corresponding actions is then used to train a classifier, resulting in a complete parser.</Paragraph>
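The step of consulting the correct parse trees to determine actions amounts to an oracle: a post-order traversal of a gold binarized tree yields exactly the shift/reduce sequence that rebuilds it, and each action can then be paired with the features of the configuration at that step. This is my own sketch of that idea, not the authors' code; trees are (label, head, children) tuples with at most two children per node.

```python
def oracle_actions(tree):
    """Derive the action sequence that reconstructs a gold binarized tree."""
    if not tree[2]:                          # leaf: a POS-tagged word
        return [("shift",)]
    if len(tree[2]) == 1:                    # unary production
        return oracle_actions(tree[2][0]) + [("unary", tree[0])]
    left, right = tree[2]
    # The head child tells us whether the action is REDUCE-LEFT or -RIGHT.
    side = "left" if tree[1] == left[1] else "right"
    return (oracle_actions(left) + oracle_actions(right)
            + [("binary-" + side, tree[0])])
```

Running the oracle over a treebank and recording (features, action) pairs at every step produces the classifier's training set.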
      <Paragraph position="3"> When parsing a sentence with n words, the parser takes n shift actions (exactly one for each word in the sentence). Because the maximum branching factor of trees built by the parser is two, the total number of binary reduce actions is n - 1, if a complete parse is found. If the input string is rejected, the number of binary reduce actions is less than n - 1. Therefore, the number of shift and binary reduce actions is linear in the number of words in the input string. However, the parser as described so far has no limit on the number of unary reduce actions it may take. Although in practice a parser properly trained on trees reflecting natural language syntax would rarely make more than 2n unary reductions, pathological cases exist in which an infinite number of unary reductions would be taken and the algorithm would not terminate. Such cases can arise when the training data contains sequences of unary productions that cycle through (repeated) non-terminals, such as A-&gt;B-&gt;A-&gt;B; during parsing, such a cycle could be repeated indefinitely.</Paragraph>
      <Paragraph position="4"> This problem can be easily prevented by limiting the number of consecutive unary reductions that may be made to a finite number. This may be the number of non-terminal types seen in the training data, or the length of the longest chain of unary productions seen in the training data. In our experiments (described in section 3), we limited the number of consecutive unary reductions to three, although the parser never took more than two unary reduction actions consecutively in any sentence. When we limit the number of consecutive unary reductions to a finite number m, the parser makes at most (2n - 1)m unary reductions when parsing a sentence of length n. Placing this limit not only guarantees that the algorithm terminates, but also guarantees that the number of actions taken by the parser is O(n), where n is the length of the input string. Thus, the parser runs in linear time, assuming that classifying a parser action is done in constant time.</Paragraph>
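The action-count bound can be checked with a few lines of arithmetic. This is my own worked restatement of the argument above: n shifts, at most n - 1 binary reductions, and at most m consecutive unary reductions after each of those 2n - 1 actions gives the (2n - 1)m unary bound, so the total is O(n).

```python
def max_actions(n, m):
    """Upper bound on parser actions for a sentence of n words with at
    most m consecutive unary reductions."""
    shifts = n                      # exactly one shift per word
    binary = n - 1                  # at most n - 1 binary reductions
    unary = (2 * n - 1) * m         # at most m after each other action
    return shifts + binary + unary  # O(n) for fixed m
```

With the paper's experimental limit of m = 3, a 10-word sentence is bounded at 10 + 9 + 57 = 76 actions, and the bound grows linearly with sentence length.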
    </Section>
  </Section>
  <Section position="5" start_page="128" end_page="129" type="metho">
    <SectionTitle>
3 Similarities to Previous Work
</SectionTitle>
    <Paragraph position="0"> As mentioned before, our parser shares similarities with the dependency parsers of Yamada and Matsumoto (2003) and Nivre and Scholz (2004) in that it uses a classifier to guide the parsing process in deterministic fashion. While Yamada and Matsumoto use a quadratic run-time algorithm with multiple passes over the input string, Nivre and Scholz use a simplified version of the algorithm described here, which handles only (labeled or unlabeled) dependency structures.</Paragraph>
    <Paragraph position="1"> Additionally, our parser is in some ways similar to the maximum-entropy parser of Ratnaparkhi (1997). Ratnaparkhi's parser uses maximum-entropy models to determine the actions of a shift-reduce-like parser, but it is capable of pursuing several paths and returning the top-K highest scoring parses for a sentence. Its observed run-time is linear, but parsing is somewhat slow, with sentences of length 20 or more taking more than one second to parse, and sentences of length 40 or more taking more than three seconds. Our parser only pursues one path per sentence, but it is very fast and of comparable accuracy (see section 4). In addition, Ratnaparkhi's parser uses a more involved algorithm that allows it to work with arbitrary branching trees without the need for the binarization transform employed here. It breaks the usual reduce actions into smaller pieces (CHECK and BUILD), and uses two separate passes (not including the POS tagging pass) for determining chunks and higher syntactic structures separately.</Paragraph>
    <Paragraph position="2"> Finally, there have been other deterministic shift-reduce parsers introduced recently, but their levels of accuracy have been well below the state-of-the-art. The parser in Kalt (2004) uses a similar algorithm to the one described here, but the classification task is framed differently. Using decision trees and fewer features, Kalt's parser has significantly faster training and parsing times, but its accuracy is much lower than that of our parser.</Paragraph>
    <Paragraph position="3"> Kalt's parser achieves precision and recall of about 77% and 76%, respectively (with automatically tagged text), compared to our parser's 86% (see section 4). The parser of Wong and Wu (1999) uses a separate NP-chunking step and, like Ratnaparkhi's parser, does not require a binary transform. It achieves about 81% precision and 82% recall with gold-standard tags (78% and 79% with automatically tagged text). Wong and Wu's parser is further differentiated from the other parsers mentioned here in that it does not use lexical items, working only from part-of-speech tags.</Paragraph>
  </Section>
  <Section position="6" start_page="129" end_page="129" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We conducted experiments with the parser described in section 2 using two different classifiers: TinySVM (a support vector machine implementation by Taku Kudo), and the memory-based learner TiMBL (Daelemans et al., 2004). We trained and tested the parser on the Wall Street Journal corpus of the Penn Treebank (Marcus et al., 1993) using the standard split: sections 2-21 were used for training, section 22 was used for development and tuning of parameters and features, and section 23 was used for testing. Every experiment reported here was performed on a Pentium IV 1.8GHz with 1GB of RAM.</Paragraph>
    <Paragraph position="1"> Each tree in the training set had empty-node and function tag information removed, and the</Paragraph>
  </Section>
</Paper>