<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1011"> <Title>Online Learning of Approximate Dependency Parsing Algorithms</Title> <Section position="6" start_page="85" end_page="86" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"> The score of adjacent edges relies on the definition of a feature representation f(i,k,j). As noted earlier, thisrepresentation subsumes thefirst-order representation of McDonald et al. (2005b), so we can incorporate all of their features as well as the new second-order features we now describe. The old first-order features are built from the parent and child words, their POS tags, and the POS tags of surrounding words and those of words between the child and the parent, as well as the direction and distance from the parent to the child. The second-order features are built from the following conjunctions of word and POS identity predicates xi-pos, xk-pos, xj-pos xk-pos, xj-pos xk-word, xj-word xk-word, xj-pos xk-pos, xj-word where xi-pos is the part-of-speech of the ith word in the sentence. We also include conjunctions between these features and the direction and distance from siblingj tosiblingk. Wedetermined theusefulness of these features on the development set, which also helped us find out that features such as the POS tags of words between the two siblings would not improve accuracy. We also ignored fea- null tures over triples of words since this would explode the size of the feature space.</Paragraph> <Paragraph position="1"> We evaluate dependencies on per word accuracy, which is the percentage of words in the sentence with the correct parent in the tree, and on complete dependency analysis. In our evaluation we exclude punctuation for English and include it for Czech and Danish, which is the standard.</Paragraph> <Section position="1" start_page="85" end_page="85" type="sub_section"> <SectionTitle> 5.1 English Results </SectionTitle> <Paragraph position="0"> To create data sets for English, we used the Yamada and Matsumoto (2003) head rules to extract dependency trees from the WSJ, setting sections 2-21 as training, section 22 for development and section 23 for evaluation. The models rely on part-of-speech tags as input and we used the Ratnaparkhi (1996) tagger to provide these for the development and evaluation set. These data sets are exclusively projective so we only compare the projective parsers using the exact projective parsing algorithms. The purpose of these experiments is to gauge the overall benefit from including second-order features with exact parsing algorithms, which can be attained in the projective setting. Results are shown in Table 1. We can see that there is clearly an advantage in introducing second-order features. In particular, the complete tree metric is improved considerably.</Paragraph> </Section> <Section position="2" start_page="85" end_page="86" type="sub_section"> <SectionTitle> 5.2 Czech Results </SectionTitle> <Paragraph position="0"> For the Czech data, we used the predefined training, development and testing split of the Prague Dependency Treebank (HajiVc et al., 2001), and the automatically generated POS tags supplied with the data, which we reduce to the POS tag set from Collins et al. (1999). On average, 23% of the sentences in the training, development and test sets have at least one non-projective dependency, though, less than 2% of total edges are ac- null tually non-projective. Results are shown in Table 2. McDonald et al. 
<Paragraph position="1"> We evaluate dependencies on per-word accuracy, which is the percentage of words in the sentence with the correct parent in the tree, and on complete dependency analysis. In our evaluation we exclude punctuation for English and include it for Czech and Danish, which is the standard.</Paragraph>
<Section position="1" start_page="85" end_page="85" type="sub_section">
<SectionTitle> 5.1 English Results </SectionTitle>
<Paragraph position="0"> To create data sets for English, we used the Yamada and Matsumoto (2003) head rules to extract dependency trees from the WSJ, setting sections 2-21 as training, section 22 for development, and section 23 for evaluation. The models rely on part-of-speech tags as input, and we used the Ratnaparkhi (1996) tagger to provide these for the development and evaluation sets. These data sets are exclusively projective, so we only compare the projective parsers using the exact projective parsing algorithms. The purpose of these experiments is to gauge the overall benefit from including second-order features with exact parsing algorithms, which can be attained in the projective setting. Results are shown in Table 1. We can see that there is clearly an advantage in introducing second-order features. In particular, the complete tree metric is improved considerably.</Paragraph>
</Section>
<Section position="2" start_page="85" end_page="86" type="sub_section">
<SectionTitle> 5.2 Czech Results </SectionTitle>
<Paragraph position="0"> For the Czech data, we used the predefined training, development and testing split of the Prague Dependency Treebank (Hajič et al., 2001), and the automatically generated POS tags supplied with the data, which we reduce to the POS tag set from Collins et al. (1999). On average, 23% of the sentences in the training, development and test sets have at least one non-projective dependency, though less than 2% of total edges are actually non-projective. Results are shown in Table 2. McDonald et al. (2005c) showed a substantial improvement in accuracy by modeling non-projective edges in Czech, shown by the difference between two first-order models. Table 2 shows that a second-order model provides a comparable accuracy boost, even using an approximate non-projective algorithm. The second-order non-projective model accuracy of 85.2% is the highest reported accuracy for a single parser on these data.</Paragraph>
<Paragraph position="1"> Similar results were obtained by Hall and Novák (2005) (85.1% accuracy), who take the best output of the Charniak parser extended to Czech and rerank slight variations on this output that introduce non-projective edges. However, this system relies on a much slower phrase-structure parser as its base model as well as an auxiliary reranking module. Indeed, our second-order projective parser analyzes the test set in 16m32s, and the non-projective approximate parser needs 17m03s to parse the entire evaluation set, showing that runtime for the approximation is completely dominated by the initial call to the second-order projective algorithm and that the post-process edge transformation loop typically only iterates a few times per sentence.</Paragraph>
</Section>
<Section position="3" start_page="86" end_page="86" type="sub_section">
<SectionTitle> 5.3 Danish Results </SectionTitle>
<Paragraph position="0"> For our experiments we used the Danish Dependency Treebank v1.0. The treebank contains a small number of inter-sentence and cyclic dependencies, and we removed all sentences that contained such structures. The resulting data set contained 5384 sentences. We partitioned the data into contiguous 80/20 training/testing splits. We held out a subset of the training data for development purposes.</Paragraph>
<Paragraph position="1"> We compared three systems: the standard second-order projective and non-projective parsing models, as well as our modified second-order non-projective model that allows for the introduction of multiple parents (Section 3). All systems use gold-standard part-of-speech tags since no trained tagger is readily available for Danish. Results are shown in Figure 3. As might be expected, the non-projective parser does slightly better than the projective parser because around 1% of the edges are non-projective. Since each word may have an arbitrary number of parents, we must use precision and recall rather than accuracy to measure performance. This also means that the correct training loss is no longer the Hamming loss. Instead, we use false positives plus false negatives over edge decisions, which balances precision and recall, as our ultimate performance metric.</Paragraph>
<Paragraph position="2"> As expected, for the basic projective and non-projective parsers, recall is roughly 5% lower than precision since these models can only pick up at most one parent per word. For the parser that can introduce multiple parents, we see an increase in recall of nearly 3% absolute with a slight drop in precision. These results are very promising and further show the robustness of discriminative online learning with approximate parsing algorithms.</Paragraph>
</Section>
</Section>
</Paper>
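The following Python sketch, offered only as an illustration and not as the paper's implementation, shows how the Section 5.3 evaluation could be computed: edge precision, edge recall, and the false-positive-plus-false-negative count used in place of the Hamming loss when words may have multiple parents. The edge representation and the function name are assumptions.

# Illustrative sketch only (not the paper's code): edge precision, recall, and
# the false-positive-plus-false-negative count used as the training loss when a
# word may have several parents (Section 5.3). Edges are (head, dependent)
# pairs; this representation and the function name are assumptions.
def edge_metrics(gold_edges, predicted_edges):
    gold = set(gold_edges)
    pred = set(predicted_edges)
    true_pos = len(gold.intersection(pred))
    false_pos = len(pred.difference(gold))
    false_neg = len(gold.difference(pred))
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    loss = false_pos + false_neg  # replaces the Hamming loss during training
    return precision, recall, loss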