<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1039">
  <Title>What to Do When Lexicalization Fails: Parsing German with Suffix Analysis and Smoothing</Title>
  <Section position="5" start_page="316" end_page="317" type="metho">
    <SectionTitle>
3.4 Results
</SectionTitle>
    <Paragraph position="0"> Table 1 shows the effect of rule type choice, and Table 2 lists the effect of the GF re-annotations. From Table 1, we see that Markov rules achieve the best performance, ahead of both standard rules as well as our formulation of probabilistic LP/ID rules.</Paragraph>
    <Paragraph position="1"> In the first group of experiments, suffix analysis marginally lowers performance. However, a different pattern emerges in the second set of experiments. Suffix analysis consistently does better than the simpler word generation probability model.</Paragraph>
    <Paragraph position="2"> Looking at the treebank transformations with suffix analysis enabled, we find the coordination re-annotation provides the greatest benefit, boosting performance by 2.4 to 71.5. The NP and PP case re-annotations together raise performance by 1.2 to 72.7. While the SBAR annotation slightly lowers performance, removing the GF labels from S nodes increased performance to 73.1.</Paragraph>
    <Section position="1" start_page="316" end_page="317" type="sub_section">
      <SectionTitle>
3.5 Discussion
</SectionTitle>
      <Paragraph position="0"> There are two primary results: first, although LP/ID rules have been suggested as suitable for German's flexible word order, it appears that Markov rules actually perform better. Second, adding suffix analysis provides a clear benefit, but only after the inclusion of the Coord GF transformation.</Paragraph>
      <Paragraph position="1"> While the SBAR transformation slightly reduces performance, recall that we argued the S GF transformation only made sense if the SBAR transforma- null tion is already in place. To test if this was indeed the case, we re-ran the final experiment, but excluded the SBAR transformation. We did indeed find that applying S GF without the SBAR transformation reduced performance.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="317" end_page="318" type="metho">
    <SectionTitle>
4 Smoothing &amp; Search
</SectionTitle>
    <Paragraph position="0"> With the exception of DOP models (Bod, 1995), it is uncommon to smooth unlexicalized grammars. This is in part for the sake of simplicity: unlexicalized grammars are interesting because they are simple to estimate and parse, and adding smoothing makes both estimation and parsing nearly as complex as with fully lexicalized models. However, because lexicalization adds little to the performance of German parsing models, it is therefore interesting to investigate the impact of smoothing on unlexicalized parsing models for German.</Paragraph>
    <Paragraph position="1"> Parsing an unsmoothed unlexicalized grammar is relatively efficient because the grammar constraints the search space. As a smoothed grammar does not have a constrained search space, it is necessary to find other means to make parsing faster. Although it is possible to efficiently compute the Viterbi parse (Klein and Manning, 2002) using a smoothed grammar, the most common approach to increase parsing speed is to use some form of beam search (cf. Goodman (1998)), a strategy we follow here.</Paragraph>
    <Section position="1" start_page="317" end_page="317" type="sub_section">
      <SectionTitle>
4.1 Models
</SectionTitle>
      <Paragraph position="0"> We experiment with three different smoothing models: the modified Witten-Bell algorithm employed by Collins (1999), the modified Kneser-Ney algorithm of Chen and Goodman (1998) the smoothing algorithm used in the POS tagger of Brants (2000). All are variants of linear interpolation, and are used with 2nd order Markovization. Under this regime, the probability of adding the ith child to A a0 B1</Paragraph>
      <Paragraph position="2"> The models differ in how the l's are estimated. For both the Witten-Bell and Kneser-Ney algorithms, the l's are a function of the context Aa7 Bi</Paragraph>
      <Paragraph position="4"> (2000) approach for POS tagging.</Paragraph>
      <Paragraph position="5"> for all possible contexts. As both the Witten-Bell and Kneser-Ney variants are fairly well known, we do not describe them further. However, as Brants' approach (to our knowledge) has not been used elsewhere, and because it needs to be modified for our purposes, we show the version of the algorithm we use in Figure 1.</Paragraph>
    </Section>
    <Section position="2" start_page="317" end_page="317" type="sub_section">
      <SectionTitle>
4.2 Method
</SectionTitle>
      <Paragraph position="0"> The purpose of this is experiment is not only to improve parsing results, but also to investigate the over-all effect of smoothing on parse accuracy. Therefore, we do not simply report results with the best model from Section 3. Rather, we re-do each modification in Section 3 with both search strategies (Viterbi and beam) in the unsmoothed case, and with all three smoothing algorithms with beam search. The beam has a variable width, which means an arbitrary number of edges may be considered, as long as their probability is within 4 a17 10a13 3 of the best edge in a given span.</Paragraph>
    </Section>
    <Section position="3" start_page="317" end_page="318" type="sub_section">
      <SectionTitle>
4.3 Results
</SectionTitle>
      <Paragraph position="0"> Table 3 summarizes the results. The best result in each column is italicized, and the overall best result  in shown in bold. The column titled Viterbi reproduces the second column of Table 2 whereas the column titled Beam shows the result of re-annotation using beam search, but no smoothing. The best result with beam search is 73.3, slightly higher than without beam search.</Paragraph>
      <Paragraph position="1"> Among smoothing algorithms, the Brants approach yields the highest results, of 76.3, with the modified Kneser-Ney algorithm close behind, at 76.2. The modified Witten-Bell algorithm achieved an F-score of 75.7.</Paragraph>
    </Section>
    <Section position="4" start_page="318" end_page="318" type="sub_section">
      <SectionTitle>
4.4 Discussion
</SectionTitle>
      <Paragraph position="0"> Overall, the best-performing model, using Brants smoothing, achieves a labelled bracketing F-score of 76.2, higher than earlier results reported by Dubey and Keller (2003) and Schiehlen (2004).</Paragraph>
      <Paragraph position="1"> It is surprisingly that the Brants algorithm performs favourably compared to the better-known modified Kneser-Ney algorithm. This might be due to the heritage of the two algorithms. Kneser-Ney smoothing was designed for language modelling, where there are tens of thousands or hundreds of thousands of tokens having a Zipfian distribution.</Paragraph>
      <Paragraph position="2"> With all transformations included, the nonterminals of our grammar did have a Zipfian marginal distribution, but there were only several hundred tokens. The Brants algorithm was specifically designed for distributions with fewer tokens.</Paragraph>
      <Paragraph position="3"> Also surprising is the fact that each smoothing algorithm reacted differently to the various treebank transformations. It is obvious that the choice of search and smoothing algorithm add bias to the final result. However, our results indicate that the choice of search and smoothing algorithm also add a degree of variance as improvements are added to the parser.</Paragraph>
      <Paragraph position="4"> This is worrying: at times in the literature, details of search or smoothing are left out (e.g. Charniak (2000)). Given the degree of variance due to search and smoothing, it raises the question if it is in fact possible to reproduce such results without the necessary details.2</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="318" end_page="319" type="metho">
    <SectionTitle>
5 Error Analysis
</SectionTitle>
    <Paragraph position="0"> While it is uncommon to offer an error analysis for probabilistic parsing, Levy and Manning (2003) argue that a careful error classification can reveal possible improvements. Although we leave the implementation of any improvements to future research, we do discuss several common errors. Because the parser with Brants smoothing performed best, we use that as the basis of our error analysis.</Paragraph>
    <Paragraph position="1"> First, we found that POS tagging errors had a strong effect on parsing results. This is surprising, given that the parser is able to assign POS tags with a high degree of accuracy. POS tagging results are comparable to the best stand-alone POS taggers, achieving results of 97.1% on the test set, matching the performance of the POS tagger described by Brants (2000) When GF labels are included (e.g.</Paragraph>
    <Paragraph position="2"> considering ART-SB instead of just ART), tagging accuracy falls to 90.1%. To quantify the effect of POS tagging errors, we re-parsed with correct POS tags (rather than letting the parser guess the tags), and found that labelled bracket F-scores increase from 76.3 to 85.2. A manual inspection of 100 sentences found that GF mislabelling can accounts for at most two-thirds of the mistakes due to POS tags.</Paragraph>
    <Paragraph position="3"> Over one third was due to genuine POS tagging errors. The most common problem was verb mistagging: they are either confused with adjectives (both 2As an anonymous reviewer pointed out, it is not always straightforward to reproduce statistical parsing results even when the implementation details are given (Bikel, 2004).</Paragraph>
    <Paragraph position="4">  take the common -en suffix), or the tense was incorrect. Mistagged verb are a serious problem: it entails an entire clause is parsed incorrectly. Verb mistagging is also a problem for other languages: Levy and Manning (2003) describe a similar problem in Chinese for noun/verb ambiguity. This problem might be alleviated by using a more detailed model of morphology than our suffix analyzer provides.</Paragraph>
    <Paragraph position="5"> To investigate pure parsing errors, we manually examined 100 sentences which were incorrectly parsed, but which nevertheless were assigned the correct POS tags. Incorrect modifier attachment accounted for for 39% of all parsing errors (of which 77% are due to PP attachment alone). Misparsed co-ordination was the second most common problem, accounting for 15% of all mistakes. Another class of error appears to be due to Markovization. The boundaries of VPs are sometimes incorrect, with the parser attaching dependents directly to the S node rather than the VP. In the most extreme cases, the VP had no verb, with the main verb heading a subordinate clause.</Paragraph>
  </Section>
  <Section position="8" start_page="319" end_page="319" type="metho">
    <SectionTitle>
6 Comparison with Previous Work
</SectionTitle>
    <Paragraph position="0"> Table 4 lists the result of the best model presented here against the earlier work on NEGRA parsing described in Dubey and Keller (2003) and Schiehlen (2004). Dubey and Keller use a variant of the lexicalized Collins (1999) model to achieve a labelled bracketing F-score of 74.1%. Schiehlen presents a number of unlexicalized models. The best model on labelled bracketing achieves an F-score of 71.8%.</Paragraph>
    <Paragraph position="1"> The work of Schiehlen is particularly interesting as he also considers a number of transformations to improve the performance of an unlexicalized parser. Unlike the work presented here, Schiehlen does not attempt to perform any suffix or morphological analysis of the input text. However, he does suggest a number of treebank transformations. One such transformation is similar to one we prosed here, the NP case transformation. His implementation is different from ours: he annotates the case of pronouns and common nouns, whereas we focus on articles and pronouns (articles are pronouns are more strongly marked for case than common nouns). The remaining transformations we present are different from those Schiehlen describes; it is possible that an even better parser may result if all the transformations were combined.</Paragraph>
    <Paragraph position="2"> Schiehlen also makes use of a morphological analyzer tool. While this includes more complete information about German morphology, our suffix analysis model allows us to integrate morphological ambiguities into the parsing system by means of lexical generation probabilities.</Paragraph>
    <Paragraph position="3"> Levy and Manning (2004) also present work on the NEGRA treebank, but are primarily interested in long-distance dependencies, and therefore do not report results on local dependencies, as we do here.</Paragraph>
  </Section>
class="xml-element"></Paper>