<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1115">
  <Title>Edge-Based Best-First Chart Parsing *</Title>
  <Section position="6" start_page="130" end_page="132" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> To better understand the experimental results it first behooves us to compare them to those achieved previously. Goodman's results (1997) are hard to compare against ours because his parser returns more than a singe best parse and because he measures processing time, not edges.</Paragraph>
    <Paragraph position="1"> However he does give edges/second for one of his  parsers and this plus his parsing times suggests that for him edges/sentence will measure in the tens of thousands -- a far cry from our hundreds. Ratnaparki's (1997) beam search parsing procedure produces higher accuracy results than our PCFG model, and achieves this with a beam width of 20. Unfortunately his paper does not give statistics which can be directly compared with ours.</Paragraph>
    <Paragraph position="2"> The work by C&amp;C is easier to compare. In Figure 4 we reproduce C&amp;C's results on the percentage of sentences (length 18-26) parsed as a function of number of edges used. We performed the same experiment, and our results are incliaded there as well. This figure makes dramatic the order of magnitude improvement provided by our new scheme, but it is not too easy to read numbers off of it. Such numbers are provided in Table 1.</Paragraph>
    <Paragraph position="3">  Our figures were obtained using rl = 1.2. As can be seen, our parser requires about one twentieth the number of edges required by C&amp;C. Indeed, the low average number of edges to first parse is probably the most striking thing about our results. Even allowing for the fact that considerably more edges must be pushed than are popped, the total number of edges required to first parse is quite small. Since the average number of edges required to construct just the (left-factored) test corpus trees is 47.5, our parsing system considers as few as 3 times as many edges as are required to actually produce the output tree.</Paragraph>
    <Paragraph position="4"> Almost as interesting, if r I is below 1.4, the precision and recall scores of the first parse are better than those obtained by running the parser to exhaustion, even though the probability of the first parses our algorithm returns cannot be higher than that found by the exhaustive version. Furthermore, as seen in Figure 3, running our parser past the first parse by a small amount (150% of the edges required for the first parse) produces still more accurate parses. At 150% of the minimum number of edges and r I = 1.2 the precision/recall figures are about 2% above those for the maximum likelihood parse.</Paragraph>
    <Paragraph position="5"> We have two (possibly related) theories of these phenomona. It may be that the FOM metric used to select constituents forces our parser to concentrate on edges which are plausible given their surrounding preterminals; information which is ignored by the exhaustive maximum likelihood parser. Alternatively, it may be that because our FOM causes our parser to prefer edges with a high inside times (estimated) outside probability, it is in fact partially mim- null icking Goodman's (Goodman, 1996) 'Labelled Recall' parsing algorithm, which does not return the highest probability parse but attempts to maximize labelled bracket recall with the test set.</Paragraph>
    <Paragraph position="6"> Finally, it is interesting to note that the minimum number of edges per parse is reached when r/~ 1.65, which is considerably larger than the theoretical estimate of 1.3 given earlier. Notice that one effect of increasing rl is to raise the FOM for longer constituents. It may be that on average a partial parse is completed fastest if larger constituents receive more attention since they are more likely to lead quickly to a complete analysis, which would be one consequence of the larger than expected r/.</Paragraph>
    <Paragraph position="7"> This last hypothesis is also consistent with the observation that average precision and recall sharply falls off when r/ is increased beyond its theoretically optimal value, since then the parser is presumably focusing on relatively larger constituents and ignoring other, strictly more plausible, smaller ones.</Paragraph>
  </Section>
class="xml-element"></Paper>