File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/91/h91-1061_intro.xml

Size: 2,932 bytes

Last Modified: 2025-10-06 14:05:01

<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1061">
  <Title>Evaluating Text Categorization</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Text classification systems, i.e. systems which can make distinctions between meaningful classes of texts, have been widely studied in information retrieval and natural language processing. The majority of information retrieval research has been devoted to a particular form of text classification: text retrieval. Text retrieval systems find or route texts in response to arbitrary user queries or interest profiles. Evaluation has been a focus of research in text retrieval since the beginning, and standard evaluation methods are in wide use.</Paragraph>
    <Paragraph position="1"> A smaller, but significant, body of work has examined a task variously known as machine-aided indexing, automated indexing, authority control, or text categorization. Text categorization is the assignment of texts to one or more of a pre-existing set of categories, rather than classifying them in response to an arbitrary query. Categorization may be performed for a wide range of reasons, either as an end in itself or as a component of a larger system.</Paragraph>
    <Paragraph position="2"> (Author's current address: Center for Information and Language Studies, University of Chicago, Chicago, IL 60637; lewis@tira.uchicago.edu.) The literature on text categorization is widely scattered and shows little agreement on evaluation methods. This makes it very difficult to draw conclusions about the relative effectiveness of techniques, so that, unlike the situation in query-driven retrieval, there is no consensus on a set of basic evaluation methods for text categorization.</Paragraph>
    <Paragraph position="3"> In this paper I discuss measures of effectiveness for text categorization systems and algorithms. Effectiveness refers to the ability of a categorization to supply information to a system or user that wants to access the texts. Measuring effectiveness is just one of several kinds of evaluation that should be considered [Spa81a, CH88, PF90].</Paragraph>
    <Paragraph position="4"> After considering effectiveness evaluation for text categorization we will turn to a related task, text extraction, and consider what role the effectiveness measures discussed for categorization have there. A common theme is the need to consider in an evaluation the purpose for which information is generated from the text.</Paragraph>
    <Paragraph position="5"> In what follows I will refer repeatedly to a chapter by Tague [Tag81] in Sparck Jones' collection on information retrieval experimentation [Spa81a]. This collection discusses a wide range of evaluation issues, and is an important resource for anyone interested in the evaluation of text-based systems.</Paragraph>
  </Section>
</Paper>