<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1006">
  <Title>USING BRACKETED PARSES TO EVALUATE A GRAMMAR CHECKING APPLICATION</Title>
  <Section position="3" start_page="0" end_page="38" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> The recent development of broad-coverage natural language processing systems has stimulated work on the evaluation of the syntactic component of such systems, for purposes of basic evaluation and improvement of system performance.</Paragraph>
    <Paragraph position="1"> Methods utilizing hand-bracketed corpora (such as the University of Pennsylvania Treebank) as a basis for evaluation metrics have been discussed in Black et al. (1991), Harrison et al. (1991), and Black et al. (1992). Three metrics discussed in those works were the Crossing Parenthesis Score (a count of the number of phrases in the machine produced parse which cross with one or more phrases in the hand parse), Recall (the percentage of phrases in the hand parse that are also in the machine parse), and Precision (the percentage of phrases in the machine parse that are in the hand parse).</Paragraph>
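As a rough sketch (not from the cited works), the three metrics above can be computed over parses modeled as sets of (start, end) phrase spans; the function and variable names here are illustrative assumptions:

```python
# Illustrative sketch of the three bracket-based metrics described above.
# A parse is modeled as a set of (start, end) phrase spans; names are assumed.

def crosses(span, spans):
    """True if `span` overlaps some span in `spans` without either nesting the other."""
    s, e = span
    return any(s < s2 < e < e2 or s2 < s < e2 < e for s2, e2 in spans)

def parseval(machine, hand):
    machine, hand = set(machine), set(hand)
    matched = machine & hand
    crossing_score = sum(1 for p in machine if crosses(p, hand))  # machine phrases crossing the hand parse
    recall = len(matched) / len(hand)        # hand phrases also in the machine parse
    precision = len(matched) / len(machine)  # machine phrases also in the hand parse
    return crossing_score, recall, precision
```

For example, with hand spans {(0, 4), (0, 2), (2, 4)} and machine spans {(0, 4), (1, 3)}, the span (1, 3) crosses the hand parse, giving a crossing score of 1, recall of 1/3, and precision of 1/2.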
    <Paragraph position="2"> We have developed a methodology for using hand-bracketed parses to examine both the internal and external performance of a grammar checker. The internal performance refers to the behavior of the underlying system--i.e., the tokenizer, parser, lexicon, and grammar. The external performance refers to the error critiques generated by the system. 1 Our evaluation methodology relies on three separate error reports generated from a corpus of randomly selected sentences: 1) a report based on unbracketed sentences, 2) a report based on optimally bracketed sentences with our current system, and 3) a report based on the optimal bracketings with the system modified to ensure the same coverage as the unbracketed corpus.</Paragraph>
    <Paragraph position="3"> The bracketed report from the unmodified system tells us something about the coverage of our underlying system in its current state. The bracketed report from the modified system tells us something about the external accuracy of the error reports presented to the user.</Paragraph>
    <Paragraph position="4"> Our underlying system uses a bottom-up, full-ambiguity parser. Our error detection method relies on including grammar rules for parsing errorful sentences, with error critiques being generated from the occurrence of an error rule in the parse. Error critiques are based on just one of all the possible parse trees that the system can find for a given sentence. Our major concern about the underlying system is whether the system has a correct parse for the sentence in question. We are also concerned about the accuracy of the selected parse, but our current methodology does not directly address that issue, because correct error reports do not depend on having precisely the correct parse. Consequently, our evaluation of the underlying grammatical coverage is based on a simple metric, namely the parser success rate for satisfying sentence bracketings (i.e., correct parses). Either the parser can produce the optimal parse or it cannot.</Paragraph>
    <Paragraph position="5"> We have a more complex approach to evaluating the performance of the system's ability to detect errors. Here, we need to look at both the overgeneration and undergeneration of individual error critiques. (1. We use the term critique to represent an instance of an error detected. Each sentence may have zero or more critiques reported for it.)</Paragraph>
    <Paragraph position="6"> What is the rate of spurious critiques, or critiques incorrectly reported, and what is the rate of missed critiques, or critiques not reported? We therefore define two additional metrics, which illuminate the spurious and missed critique rates, respectively: Precision: the percentage of correct critiques from the unbracketed corpus.</Paragraph>
    <Paragraph position="7"> Recall: the percentage of critiques generated from an ideal bracketed corpus that are also present among those in the unbracketed corpus.</Paragraph>
    <Paragraph position="8"> Precision tells us what percentage of reported critiques are reliable, and Recall tells us what percentage of correct critiques have been reported (modulo the coverage).</Paragraph>
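A minimal sketch of these two critique-level metrics, under the assumption that a critique can be modeled as a hashable (sentence id, error type) pair so the unbracketed report can be compared against the ideal bracketed report; all names are illustrative:

```python
# Illustrative sketch of critique-level Precision and Recall as defined above.
# A critique is modeled as a (sentence_id, error_type) pair; names are assumed.

def critique_metrics(unbracketed, ideal_bracketed):
    unbracketed, ideal = set(unbracketed), set(ideal_bracketed)
    correct = unbracketed & ideal  # critiques confirmed by the ideal bracketed run
    precision = len(correct) / len(unbracketed) if unbracketed else 1.0
    recall = len(correct) / len(ideal) if ideal else 1.0
    return precision, recall
```

A spurious critique lowers precision (it appears only in the unbracketed report), while a missed critique lowers recall (it appears only in the ideal bracketed report).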
  </Section>
</Paper>