<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2014">
  <Title>Agreement/Disagreement Classification: Exploiting Unlabeled Data using Contrast Classifiers</Title>
  <Section position="2" start_page="0" end_page="53" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In natural language understanding research with data-driven techniques, data labeling is an essential but time-consuming and costly process. To alleviate this effort, various semi-supervised learning algorithms such as self-training (Yarowsky, 1995), co-training (Blum and Mitchell, 1998; Goldman and Zhou, 2000), transductive SVM (Joachims, 1999) and many others have been proposed and successfully applied under different assumptions and settings. They all aim to improve classification accuracy by exploiting more readily available unlabeled data as well as labeled examples. However, these iterative training methods have shortcomings when trained on data with imbalanced class distributions.</Paragraph>
    <Paragraph position="1"> One reason is that most classifiers underlying these methods assume a balanced training set, and thus when one of the classes has a much larger number of examples than the other classes, the trained classifier will be biased toward the majority class. The imbalance will propagate through subsequent iterations, resulting in a more skewed data set upon which a further biased classifier will be trained. To exploit unlabeled data in learning an inherently skewed data distribution, we introduce a semi-supervised classification method using contrast classifiers, first proposed by Peng et al. (2003). It approximates the posterior class probability given an observation using class-specific contrast classifiers that implicitly model the difference between the distribution of labeled data for that class and the unlabeled data.</Paragraph>
    <Paragraph position="2"> In this paper, we will explore the applicability of contrast classifiers to the problem of semi-supervised learning for identifying agreements and disagreements in multi-party conversational speech.</Paragraph>
    <Paragraph position="3"> These labels represent a simple type of speech act that can be important for understanding the interaction between speakers, or for automatically summarizing or browsing the contents of a meeting. This problem was previously studied (Hillard et al., 2003; Galley et al., 2004) using a subset of the ICSI meeting recording corpus (Janin et al., 2003). For semi-supervised learning, this task poses a challenge due to its imbalanced class distribution: over 60% of the data are associated with the default class, and only 5% with disagreements.</Paragraph>
  </Section>
</Paper>