<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0908">
  <Title>Using the Distribution of Performance for Studying Statistical NLP Systems and Corpora</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Motivation
</SectionTitle>
    <Paragraph position="0"> Words, parts of speech (POS), or any other feature of a text may be regarded as outcomes of a statistical process. Therefore, word counts, count ratios, and other data used in creating statistical NLP models are statistical quantities as well, and as such are prone to sampling noise. Sampling noise results from the finiteness of the data and from the particular choice of training and test data.</Paragraph>
    <Paragraph position="1"> A model is an approximation, or a more abstract representation, of the training data. One may view a model as a collection of estimators, analogous, e.g., to the slope calculated by linear regression. These estimators are statistics whose distribution depends on how they were obtained, and this distribution may be very complicated. The performance figures, being dependent on these estimators, have a distribution function that may be difficult to find theoretically. This distribution gives rise to intrinsic noise.</Paragraph>
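To make the regression analogy concrete, the following Python sketch (standard library only) refits a least-squares slope on bootstrap resamples of a synthetic data set; the spread of the resulting slopes approximates the sampling distribution of that single estimator. The synthetic data, the sample sizes, and the names `ols_slope` and `bootstrap_slopes` are illustrative assumptions, not taken from the paper.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_boot=1000, seed=0):
    """Refit the slope on n_boot resamples of the (x, y) pairs, drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) == 1:  # all x values identical: slope undefined, skip
            continue
        slopes.append(ols_slope(bx, by))
    return slopes


# Hypothetical synthetic data: true slope 2.0 plus Gaussian noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

slopes = bootstrap_slopes(xs, ys)
print(f"point estimate: {ols_slope(xs, ys):.3f}")
print(f"bootstrap mean: {statistics.fmean(slopes):.3f}, std: {statistics.stdev(slopes):.3f}")
```

A single fit reports only the point estimate; the bootstrap standard deviation is a rough measure of how much that estimate would move under a different sample of the same size.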
    <Paragraph position="2"> Performance comparisons based on a single run or a few runs do not take these sources of noise into account. Because the resulting statements cannot be assigned a confidence measure, they are more qualitative than quantitative. The degree to which we can accept such statements depends on the noise level and, more generally, on the distribution of performance.</Paragraph>
    <Paragraph position="3"> In this paper, we use recall as the performance measure (cf. Section 4.4, and Section 3.2 in (Yeh, 2000)). Samples of recall values are obtained by resampling from the training data, training a classifier on each resample, and measuring its recall.</Paragraph>
    <Paragraph position="4"> The resampling methods used here are cross-validation and the bootstrap (Efron and Gong, 1983; Efron and Tibshirani, 1993); see Section 3. Section 4 presents the experimental goals and setup, Section 5 presents and discusses the results, and Section 6 provides a summary.</Paragraph>
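As an illustration of how recall samples can be obtained by resampling, the following Python sketch (standard library only) computes one recall value per held-out fold of k-fold cross-validation and one per bootstrap replicate. It rests on assumptions that are not from the paper: the synthetic two-class data, the toy threshold classifier, and the names `train_threshold`, `evaluate`, `cv_recalls`, and `bootstrap_recalls` are hypothetical. The bootstrap variant shown, which trains on the resample and tests on the items never drawn (out-of-bag), is only one common choice; the schemes actually used are those of Section 3.

```python
import random
import statistics


def recall(gold, pred, positive=1):
    """Recall for the positive class: true positives / gold positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    return tp / sum(1 for g in gold if g == positive)


def train_threshold(data):
    """Hypothetical toy classifier on one real-valued feature:
    predict 1 at or above the midpoint of the two class means."""
    m0 = statistics.fmean(x for x, y in data if y == 0)
    m1 = statistics.fmean(x for x, y in data if y == 1)
    t = (m0 + m1) / 2
    return lambda x: 1 if x >= t else 0


def evaluate(train, test):
    """Train on `train`; return recall on `test`, or None if `test` has no positives."""
    gold = [y for _, y in test]
    if 1 not in gold:
        return None
    clf = train_threshold(train)
    return recall(gold, [clf(x) for x, _ in test])


def cv_recalls(data, k=10, seed=0):
    """k-fold cross-validation: one recall value per held-out fold."""
    items = data[:]
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    out = []
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        r = evaluate(train, test)
        if r is not None:
            out.append(r)
    return out


def bootstrap_recalls(data, n_boot=200, seed=0):
    """Bootstrap: train on n items drawn with replacement, test on the items never drawn."""
    rng = random.Random(seed)
    n = len(data)
    out = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        oob = [data[i] for i in range(n) if i not in drawn]
        r = evaluate([data[i] for i in idx], oob)
        if r is not None:
            out.append(r)
    return out


# Hypothetical two-class data: class 1 is shifted to the right of class 0.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(1.5, 1.0), 1) for _ in range(100)]

cv = cv_recalls(data)
bs = bootstrap_recalls(data)
cuts = statistics.quantiles(bs, n=40)  # 39 cut points; first and last are 2.5% and 97.5%
print(f"10-fold CV recall: mean {statistics.fmean(cv):.3f}, std {statistics.stdev(cv):.3f}")
print(f"bootstrap recall: mean {statistics.fmean(bs):.3f}, 95% interval [{cuts[0]:.3f}, {cuts[-1]:.3f}]")
```

Each list is a sample from the distribution of recall rather than a single figure; the percentile interval on the bootstrap values is one simple way to attach a confidence measure to a recall estimate.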
  </Section>
</Paper>