<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1040">
  <Title>Detecting Errors in Discontinuous Structural Annotation</Title>
  <Section position="6" start_page="327" end_page="327" type="evalu">
    <SectionTitle>
5 Results on the TIGER Corpus
</SectionTitle>
    <Paragraph position="0"> We ran the variation n-gram error detection method for discontinuous syntactic constituents on v. 1 of TIGER (Brants et al., 2002), a corpus of 712,332 tokens in 40,020 sentences. The method detected a total of 10,964 variation nuclei. From these we sampled 100 to estimate how many of the detected variations reflect annotation errors. Of these 100, 13 variation nuclei pointed to an error; from this point estimate of .13 we can derive a 95% confidence interval of (0.0641, 0.1959) (see footnote 6), which means we are 95% confident that the true number of variation-based errors lies between 702 and 2148. The effectiveness of a method which uses context to narrow down the set of variation nuclei can be judged by how many of these variation errors it finds.</Paragraph>
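    <!-- The interval above can be reproduced with the standard normal-approximation formula for a proportion, p +/- 1.96 * sqrt(p(1-p)/n), which the paper's footnote describes. The sketch below (the helper name ci95 is ours, not the paper's) checks the reported numbers:

```python
import math

def ci95(p, n):
    """Normal-approximation 95% confidence interval for a proportion.

    p: sample point estimate, n: sample size.
    Mirrors the footnoted formula p +/- 1.96 * sqrt(p*(1-p)/n).
    """
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# 13 of the 100 sampled variation nuclei were errors:
lo, hi = ci95(0.13, 100)
print(round(lo, 4), round(hi, 4))   # 0.0641 0.1959

# Scale to all 10,964 detected variation nuclei:
print(round(10964 * lo), round(10964 * hi))   # roughly 702-703 and 2148
```

Scaling the interval endpoints by the 10,964 detected nuclei recovers the paper's range of roughly 702 to 2148 variation-based errors. -->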
    <Paragraph position="1"> Using the non-fringe heuristic discussed in the previous section, we selected the shortest non-fringe variation n-grams to examine. Occurrences of the same strings within larger n-grams were ignored, so as not to artificially inflate the resulting set of n-grams. When the context is defined as identical words, we obtain 500 variation n-grams. Sampling 100 of these and labeling each position as an error or an ambiguity, we find that 80 of the 100 samples point to at least one token error. The 95% confidence interval for this point estimate of .80 is (0.7216, 0.8784), so we are 95% confident that the true number of error types is between 361 and 439. Note that this precision is comparable to the estimates for continuous syntactic annotation in Dickinson and Meurers (2003b) of 71% (with null elements) and 78.9% (without null elements).</Paragraph>
    <Paragraph position="3"> Footnote 6: The 95% confidence interval is computed as p +/- 1.96 * sqrt(p(1-p)/n), where p is the point estimate and n the sample size.</Paragraph>
    <Paragraph position="5"> When the context is defined as identical parts of speech, as described in section 4.3.1, we obtain 1498 variation n-grams. Again sampling 100 of these, we find that 52 out of the 100 point to an error. And the 95% confidence interval for this point estimate of .52 is (0.4221, 0.6179), giving a larger estimated number of errors, between 632 and 926.</Paragraph>
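    <!-- The two context definitions can be compared with the same interval arithmetic. A minimal sketch (the helper ci95 is our name for the paper's footnoted formula), scaling each interval by the number of n-grams the context yields:

```python
import math

def ci95(p, n):
    # Normal-approximation 95% CI: p +/- 1.96 * sqrt(p*(1-p)/n)
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# Word context: 500 n-grams, 80/100 sampled point to an error.
w_lo, w_hi = ci95(0.80, 100)
print(round(500 * w_lo), round(500 * w_hi))     # 361 439

# POS context: 1498 n-grams, 52/100 sampled point to an error.
p_lo, p_hi = ci95(0.52, 100)
print(round(1498 * p_lo), round(1498 * p_hi))   # 632 926
```

The POS-based context trades precision (.52 vs .80) for a much larger pool of candidates, which is why its estimated error count (632 to 926) exceeds that of the word-based context (361 to 439). -->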
    <Paragraph position="6">  Words convey more information than part-of-speech tags, and so we see a drop in precision when using part-of-speech tags for context, but these results highlight a very practical benefit of using a generalized context. By generalizing the context, we maintain a precision rate of approximately 50%, and we substantially increase the recall of the method.</Paragraph>
    <Paragraph position="7"> In fact, the point estimates suggest roughly twice as many errors when using POS contexts as opposed to word contexts.</Paragraph>
    <Paragraph position="8"> Corpus annotation projects willing to put in some extra effort can thus use this method of finding variation n-grams with a generalized context to detect and correct more errors.</Paragraph>
  </Section>
</Paper>