<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1025">
  <Title>A New Statistical Parser Based on Bigram Lexical Dependencies</Title>
  <Section position="6" start_page="189" end_page="190" type="evalu">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> The parser was trained on sections 02 - 21 of the Wall Street Journal portion of the Penn Treebank (Marcus et al. 93) (approximately 40,000 sentences), and tested on section 23 (2,416 sentences). For comparison SPATTER (Magerman 95; Jelinek et al. 94) was also tested on section 23. We use the PARSEVAL measures (Black et al. 91) to compare performance: Labeled Precision -- number of correct constituents in proposed parse number of constituents in proposed parse Labeled Recall = number of correct constituents in proposed parse number of constituents in treebank parse Crossing Brackets = number of constituents which violate constituent boundaries with a constituent in the treebank parse.</Paragraph>
    <Paragraph position="1"> For a constituent to be 'correct' it must span the same set of words (ignoring punctuation, i.e. all tokens tagged as commas, colons or quotes) and have the same label ldeg as a constituent in the treebank 1degSPATTER collapses ADVP and PRT to the same label, for comparison we also removed this distinction when  the model. The results are for all sentences of &lt; 100 words in section 23 using model (3). For 'no lexical information' all estimates are based on POS tags alone. For 'no distance measure' the distance measure is Question 1 alone (i.e. whether zbj precedes or follows ~hj).</Paragraph>
    <Paragraph position="2"> parse. Four configurations of the parser were tested: (1) The basic model; (2) The basic model with the punctuation rule described in section 2.7; (3) Model (2) with tags ignored when lexical information is present, as described in 2.7; and (4) Model (3) also using the full probability distributions for POS tags. We should emphasise that test data outside of section 23 was used for all development of the model, avoiding the danger of implicit training on section 23. Table 3 shows the results of the tests. Table 4 shows results which indicate how different parts of the system contribute to performance.</Paragraph>
    <Section position="1" start_page="189" end_page="190" type="sub_section">
      <SectionTitle>
4.1 Performance Issues
</SectionTitle>
      <Paragraph position="0"> All tests were made on a Sun SPARCServer 1000E, using 100% of a 60Mhz SuperSPARC processor. The parser uses around 180 megabytes of memory, and training on 40,000 sentences (essentially extracting the co-occurrence counts from the corpus) takes under 15 minutes. Loading the hash table of bigram counts into memory takes approximately 8 minutes.</Paragraph>
      <Paragraph position="1"> Two strategies are employed to improve parsing efficiency. First, a constant probability threshold is used while building the chart - any constituents with lower probability than this threshold are discarded.</Paragraph>
      <Paragraph position="2"> If a parse is found, it must be the highest ranked parse by the model (as all constituents discarded have lower probabilities than this parse and could  calculating scores.</Paragraph>
      <Paragraph position="3"> not, therefore, be part of a higher probability parse). If no parse is found, the threshold is lowered and parsing is attempted again. The process continues until a parse is found.</Paragraph>
      <Paragraph position="4"> Second, a beam search strategy is used. For each span of words in the sentence the probability, Ph, of the highest probability constituent is recorded. All other constituents spanning the same words must have probability greater than ~-~ for some constant beam size /3 - constituents which fall out of this beam are discarded. The method risks introducing search-errors, but in practice efficiency can be greatly improved with virtually no loss of accuracy. Table 5 shows the trade-off between speed and accuracy as the beam is narrowed.</Paragraph>
      <Paragraph position="5">  as the beam-size is varied. Model (3) was used for this test on all sentences &lt; 100 words in section 23.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>