<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0720">
  <Title>A Psychologically Plausible and Computationally Effective Approach to Learning Syntax</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> Early results on small simple corpora with a simpler version of the learner were presented in (Watkinson and Manandhar, 1999; Watkinson and Manandhar, 2000). Here, we present experiments performed using two complex corpora, C1 and C2, extracted from the Penn Treebank (Marcus et al., 1993; Marcus et al., 1994). These corpora did not contain sentences with null elements (i.e. movement). C1 contains 5000 sentences of 15 words or less. C2 contains 1000 sentences of 15 words or less. Lexicons were induced from C1 and then used with the parser to parse C2.</Paragraph>
    <Paragraph position="1"> Experiments were performed with a closed-class word initial lexicon of 348 entries (LIL) and a smaller closed-class word initial lexicon of 31 entries (SIL) to determine the bootstrapping effect of this initial lexicon.</Paragraph>
    <Paragraph position="2"> The resulting lexicons are described in Table 1.</Paragraph>
    <Paragraph position="3"> These can be compared with a gold standard CG annotated corpus which has been built (Watkinson and Manandhar, 2001), which has a size of 15,136 lexical entries and an average ambiguity of 1.25 categories per word. This corpus is only loosely a gold standard, as it has been automatically constructed. However, it gives an indication of the effectiveness of the lexical labelling and is currently the best CG tagged resource available to us. The accuracy of the parsed examples both from the training and test corpora are also described in Table 1. Two measures are used to evaluate the parses: lexical accuracy, which is the percentage of correctly tagged words compared to the extracted gold standard corpus (Watkinson and Manandhar, 2001) and average crossing bracket rate (CBR) (Goodman, 1996).</Paragraph>
    <Paragraph position="4"> In general the system performs better with the larger initial lexicon to bootstrap it. The size and ambiguity of the lexicon are close to that of the gold standard, indicating that the right level of compression has occurred. The best crossing bracket rate of 4.70 compares favourably with Osborne and Briscoe (Osborne and Briscoe, 1997) who give crossing bracket rates of around 3 for a variety of systems. Considering that they are solving a much simpler problem, our average crossing bracket rates seem reasonable.</Paragraph>
    <Paragraph position="5"> The lexical accuracy value is fairly low. Joshi and Srinivas (Joshi and Srinivas, 1994) achieve a best of 77.26% accuracy. Two factors explain this. Firstly their system is simply disambiguating which tag to use in a context again using a corpus of tag sequences - a much simpler problem. Secondly, it would appear that the gold standard corpus they use is much more accurate than ours. Despite this, a system that assigned the tags randomly for our problem, would achieve an accuracy of 3.33%, so over 50% is a reasonable achievement.</Paragraph>
  </Section>
</Paper>