<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0502">
  <Title>A Sequential Model for Multi-Class Classification</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A large number of important natural language inferences can be viewed as problems of resolving ambiguity, either semantic or syntactic, based on properties of the surrounding context. These, in turn, can all be viewed as classification problems in which the goal is to select a class label from among a collection of candidates. Examples include part-of-speech tagging, word-sense disambiguation, accent restoration, word choice selection in machine translation, context-sensitive spelling correction, word selection in speech recognition, and identifying discourse markers.</Paragraph>
    <Paragraph position="1"> Machine learning methods have become the most popular technique for a variety of classification problems of this sort, and have shown significant success. A partial list includes Bayesian classifiers (Gale et al., 1993), decision lists (Yarowsky, 1994), Bayesian hybrids (Golding, 1995), HMMs (Charniak, 1993), inductive logic methods (Zelle and Mooney, 1996), memory-based methods (Zavrel et al., 1997), linear classifiers (Roth, 1998; Roth, 1999) and transformation-based learning (Brill, 1995).</Paragraph>
    <Paragraph position="2"> This research is supported by NSF grants IIS-9801638, IIS0085836 and SBR-987345.</Paragraph>
    <Paragraph position="3"> In many of these classification problems a significant source of difficulty is that the number of candidates is very large: all words in word selection problems, all possible tags in tagging problems, and so on. Since general purpose learning algorithms do not handle these multi-class classification problems well (see below), most studies do not address the whole problem; rather, a small set of candidates (typically two) is first selected, and the classifier is trained to choose among these. While this approach is important in that it allows the research community to develop better learning methods and evaluate them in a range of applications, an important stage is missing. This becomes significant when the classification methods are to be embedded as part of higher-level NLP tasks such as machine translation or information extraction, where the small set of candidates the classifier can handle may not be fixed and could be hard to determine.</Paragraph>
    <Paragraph position="4"> In this work we develop a general approach to the study of multi-class classifiers. We suggest a sequential learning model that utilizes (almost) general purpose classifiers to sequentially restrict the number of competing classes while maintaining, with high probability, the presence of the true outcome in the candidate set.</Paragraph>
    <Paragraph position="5"> In our paradigm the sought-after classifier has to choose a single class label (or a small set of labels) from among a large set of labels. It works by sequentially applying simpler classifiers, each of which outputs a probability distribution over the candidate labels. These distributions are multiplied and thresholded, so that each classifier in the sequence needs to deal with a (significantly) smaller number of candidate labels than the previous one. The classifiers in the sequence are selected to be simple in the sense that they typically work only on part of the feature space, where the decomposition of the feature space is done so as to achieve statistical independence. Simple classifiers are used since they are more likely to be accurate; they are chosen so that, with high probability (w.h.p.), they have one-sided error, and therefore the presence of the true label in the candidate set is maintained. The order of the sequence is determined so as to maximize the rate at which the candidate label set shrinks.</Paragraph>
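    The multiply-and-threshold step described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the classifier interface, the label set, and the threshold value are assumptions made for the example.

```python
def sequential_filter(classifiers, x, labels, threshold=0.1):
    """Apply classifiers in sequence to input x, multiplying their
    probability distributions over the surviving candidate labels and
    discarding labels whose normalized combined score falls below the
    threshold. Returns the remaining candidate labels."""
    candidates = list(labels)
    scores = {lab: 1.0 for lab in candidates}
    for clf in classifiers:
        # Each classifier returns a probability distribution (a dict)
        # over only the currently surviving candidates.
        dist = clf(x, candidates)
        for lab in candidates:
            scores[lab] *= dist[lab]
        total = sum(scores[lab] for lab in candidates)
        # Normalize and keep only labels above the threshold; later
        # classifiers thus face a (significantly) smaller label set.
        candidates = [lab for lab in candidates
                      if scores[lab] / total >= threshold]
    return candidates
```

    Multiplying the distributions amounts to treating the classifiers as (approximately) statistically independent, which is why the feature-space decomposition mentioned above matters; the threshold must be low enough that, w.h.p., the true label is never discarded.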
    <Paragraph position="6"> Beyond increased accuracy on multi-class classification problems, our scheme improves the computation time of these problems by several orders of magnitude relative to other standard schemes.</Paragraph>
    <Paragraph position="7"> In this work we describe the approach, discuss an experiment done in the context of part-of-speech (POS) tagging, and provide some theoretical justification for the approach. Sec. 2 provides background on approaches to multi-class classification in machine learning and in NLP. In Sec. 3 we describe the sequential model proposed here, and in Sec. 4 we describe an experiment that exhibits some of its advantages. Some theoretical justifications are outlined in Sec. 5.</Paragraph>
  </Section>
</Paper>