
<?xml version="1.0" standalone="yes"?>
<Paper uid="J98-2004">
  <Title>New Figures of Merit for Best-First Probabilistic Chart Parsing</Title>
  <Section position="8" start_page="287" end_page="289" type="evalu">
    <SectionTitle>
5.4 Results
</SectionTitle>
    <Paragraph position="0"> The results for the figures of merit introduced in the previous section according to the measurements given in Section 2.2 are shown in Table 3.</Paragraph>
    <Paragraph position="1"> Figure 11 shows a graph of %non-0 E for each sentence length for the boundary models and the trigram and prefix estimates. This graph shows that the contextual information gained from using OL L in the prefix estimate is almost completely included in just the previous tag, as illustrated by the left boundary trigram estimate. Adding right contextual information in the boundary trigram estimate gives us the best performance on this measure of any of our figures of merit.</Paragraph>
    <Paragraph position="2"> We can consider the left boundary trigram estimate to be an approximation of the prefix estimate, where the effect of the left context is approximated by the effect of the single tag to the left. Similarly, the boundary trigram estimate is an approximation to an estimate involving the full context, i.e., an estimate involving the outside probability c~. However, the parser cannot compute the outside probability of a constituent during a parse, and so in order to use context on both sides of the constituent, we need to use something like our boundary statistics. Our results suggest that a single tag before or after the constituent can be used as a reasonable approximation to the full context on  Computational Linguistics Volume 24, Number 2</Paragraph>
    <Paragraph position="4"> Average CPU time for 95% of the probability mass for the boundary estimates.</Paragraph>
    <Paragraph position="5"> that side of the constituent. Figure 12 shows the average CPU time for each sentence length.</Paragraph>
    <Paragraph position="6"> Since the boundary trigram estimate has none of the overhead associated with the prefix estimate, it is the best performer in terms of CPU time as well. We can also see that using just the boundary statistics, which can be precomputed and require no extra processing during parsing, still results in a substantial improvement over the non-best-first &amp;quot;stack&amp;quot; model.</Paragraph>
    <Paragraph position="7"> As another method of comparison between the two best-performing estimates, the context-dependent boundary trigram model and the context-independent trigram model, we compared the number of edges needed to find the first parse for average-length sentences. The average length of a sentence in our test data is about 22 words. Figure 13 shows the percentage of sentences of length 18 through 26 for which a parse could be found within 2,500 edges. For this experiment, we used a separate test set from the Wall Street Journal corpus, containing approximately 570 sentences in the desired length range. This measure also shows a real advantage of the boundary trigram estimate over the trigram estimate.</Paragraph>
  </Section>
  <Section position="9" start_page="289" end_page="292" type="evalu">
    <SectionTitle>
6. Results Summary
</SectionTitle>
    <Paragraph position="0"> 7. Comparing Figures of Merit Using a Treebank Grammar</Paragraph>
    <Section position="1" start_page="289" end_page="291" type="sub_section">
      <SectionTitle>
7.1 Background
</SectionTitle>
      <Paragraph position="0"> To verify that our results are not an artifact of the particular grammar we chose for testing, we also tested using a treebank grammar introduced in Charniak (1996). This  grammar was trained in a straightforward way by reading the grammar directly (with minor modifications) from a portion of the Penn Treebank Wall Street Journal data comprised of about 300,000 words. The boundary statistics were counted directly from the training data as well. The treebank grammar is much larger and more ambiguous than our original grammar, containing about 16,000 rules and 78 terminal and nonterminal symbols, and it was impractical to parse sentences to exhaustion using our existing hardware, so the figures based on 95% of the probability mass could not be computed. We were able to use this grammar to compare the number of edges needed to find the first parse using the trigram and boundary trigram estimates.</Paragraph>
      <Paragraph position="2"> % of the 18- to 26-word sentences finding a parse in a fixed number of edges for a treebank grammar.</Paragraph>
    </Section>
    <Section position="2" start_page="291" end_page="292" type="sub_section">
      <SectionTitle>
7.2 Results
</SectionTitle>
      <Paragraph position="0"> Figure 14 shows the percentage of sentences of length 18 through 26 for which a parse could be found within 20,000 edges. Again, we used a test set of approximately 570 sentences of the appropriate length from the Wall Street Journal corpus. Although the x-axis covers a much wider range than in Figure 13, the relationship between the two estimates is quite similar.</Paragraph>
      <Paragraph position="1"> 8. Previous Work In an earlier version of this paper (Caraballo and Charniak 1996), we presented the results for several of these models using our original grammar. The treebank grammar was introduced in Charniak (1996), and the parser in. that paper is a best-first parser using the boundary trigram figure of merit.</Paragraph>
      <Paragraph position="2"> The literature shows many implementations of best-first parsing, but none of the previous work shares our goal of explicitly comparing figures of merit.</Paragraph>
      <Paragraph position="3"> Bobrow (1990) and Chitrao and Grishman (1990) introduced statistical agenda-based parsing techniques. Chitrao and Grishman implemented a best-first probabilistic parser and noted the parser's tendency to prefer shorter constituents. They proposed a heuristic solution of penalizing shorter constituents by a fixed amount per word. Miller and Fox (1994) compare the performance of parsers using three different types of grammars, and show that a probabilistic context-free grammar using inside probability (unnormalized) as a figure of merit outperforms both a context-free grammar and a context-dependent grammar.</Paragraph>
      <Paragraph position="4"> Kochman and Kupin (1991) propose a figure of merit closely related to our prefix estimate. They do not actually incorporate this figure into a best-first parser.</Paragraph>
      <Paragraph position="5">  Caraballo and Charniak Figures of Merit Magerman and Marcus (1991) use the geometric mean to compute a figure of merit that is independent of constituent length. Magerman and Weir (1992) use a similar model with a different parsing algorithm.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>