<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1023">
<Title>Weakly Supervised Natural Language Learning Without Redundant Views</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0">Multi-view weakly supervised learning paradigms such as co-training (Blum and Mitchell, 1998) and co-EM (Nigam and Ghani, 2000) learn a classification task from a small set of labeled data and a large pool of unlabeled data using separate, but redundant, views of the data (i.e., using disjoint feature subsets to represent the data). Multi-view learning has been successfully applied to a number of tasks in natural language processing (NLP), including text classification (Blum and Mitchell, 1998; Nigam and Ghani, 2000), named entity classification (Collins and Singer, 1999), base noun phrase bracketing (Pierce and Cardie, 2001), and statistical parsing (Sarkar, 2001; Steedman et al., 2003).</Paragraph>
<Paragraph position="1">The theoretical performance guarantees of multi-view weakly supervised algorithms come with two fairly strong assumptions on the views. First, each view must be sufficient to learn the given concept. Second, the views must be conditionally independent of each other given the class label. When both conditions are met, Blum and Mitchell prove that an initial weak learner can be boosted using unlabeled data.</Paragraph>
<Paragraph position="2">Unfortunately, finding a set of views that satisfies both of these conditions is by no means an easy problem. In addition, recent empirical results by Muslea et al. (2002) and Nigam and Ghani (2000) have shown that multi-view algorithms are quite sensitive to the two underlying assumptions on the views. Effective view factorization in multi-view learning paradigms therefore remains an important issue for their successful application. In practice, views are supplied by users or domain experts, who determine a natural feature split that is expected to be redundant (i.e., each view is expected to be sufficient to learn the target concept) and conditionally independent given the class label.[1] We investigate here the application of weakly supervised learning algorithms to problems for which no obvious natural feature split exists and hypothesize that, in these cases, single-view weakly supervised algorithms will perform better than their multi-view counterparts.</Paragraph>
<Paragraph position="3">Motivated, in part, by the results in Mueller et al. (2002), we use the task of noun phrase coreference resolution for illustration throughout the paper.[2] In our experiments, we compare the performance of the Blum and Mitchell co-training algorithm with that of two commonly used single-view algorithms, namely self-training and Expectation-Maximization (EM). In comparison to co-training, self-training achieves substantially superior performance and is less sensitive to its input parameters.</Paragraph>
<Paragraph position="4">EM, on the other hand, fails to boost performance, and we attribute this to the presence of redundant features in the underlying generative model. Consequently, we propose a wrapper-based feature selection method (John et al., 1994) for EM that results in performance improvements comparable to those observed with self-training.</Paragraph>
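To ground the preceding discussion, a minimal sketch of the two-view co-training loop of Blum and Mitchell (1998) follows, written against a generic scikit-learn-style classifier interface (fit/predict_proba). The round count and per-round growth size are illustrative placeholders, and the sketch omits details of the original algorithm such as the small sampled pool U' and class-balanced example selection.

    import numpy as np
    from sklearn.base import clone

    def co_train(clf, X1, X2, y, U1, U2, rounds=30, grow=5):
        """Co-training sketch: X1/X2 (and U1/U2) are the two disjoint
        feature views of the labeled (and unlabeled) data."""
        X1, X2, y = np.asarray(X1), np.asarray(X2), np.asarray(y)
        U1, U2 = np.asarray(U1), np.asarray(U2)
        for _ in range(rounds):
            if len(U1) == 0:
                break
            h1 = clone(clf).fit(X1, y)  # view-1 classifier
            h2 = clone(clf).fit(X2, y)  # view-2 classifier
            # Each classifier auto-labels the unlabeled examples it is most
            # confident about; both views of those examples then join the
            # labeled set, so each view "teaches" the other.
            picked, labels = [], {}
            for h, U in ((h1, U1), (h2, U2)):
                probs = h.predict_proba(U)
                for i in probs.max(axis=1).argsort()[::-1][:grow]:
                    if i not in labels:
                        picked.append(i)
                        labels[i] = h.classes_[probs[i].argmax()]
            X1 = np.vstack([X1, U1[picked]])
            X2 = np.vstack([X2, U2[picked]])
            y = np.concatenate([y, [labels[i] for i in picked]])
            keep = np.ones(len(U1), dtype=bool)
            keep[picked] = False
            U1, U2 = U1[keep], U2[keep]
        return clone(clf).fit(X1, y), clone(clf).fit(X2, y)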
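By contrast, the single-view self-training loop needs no feature split at all: one classifier trained on the full feature set auto-labels the unlabeled examples it is most confident about and retrains on them. Again a sketch only, under the same interface assumptions; the confidence threshold is an illustrative placeholder, and variants differ in how many auto-labeled instances they admit per iteration.

    import numpy as np
    from sklearn.base import clone

    def self_train(clf, X, y, U, threshold=0.95, max_iter=50):
        """Single-view self-training sketch over the full feature set."""
        X, y, U = np.asarray(X), np.asarray(y), np.asarray(U)
        model = clone(clf).fit(X, y)
        for _ in range(max_iter):
            if len(U) == 0:
                break
            probs = model.predict_proba(U)
            confident = probs.max(axis=1) >= threshold
            if not confident.any():
                break  # no unlabeled example is labeled confidently enough
            # Move the confidently auto-labeled examples into the training set.
            X = np.vstack([X, U[confident]])
            y = np.concatenate(
                [y, model.classes_[probs[confident].argmax(axis=1)]])
            U = U[~confident]
            model = clone(clf).fit(X, y)
        return model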
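Finally, the wrapper-based feature selection idea of John et al. (1994) can be sketched as a greedy forward search in which every candidate feature subset is scored by actually training and evaluating the learner on it. The cross-validated score below is an illustrative stand-in for the evaluation used with weakly supervised EM; Section 5 gives the method we actually apply.

    import numpy as np
    from sklearn.base import clone
    from sklearn.model_selection import cross_val_score

    def wrapper_forward_select(clf, X, y):
        """Greedily add the feature whose inclusion most improves the
        learner's held-out score; stop when no feature helps."""
        X = np.asarray(X)
        selected, remaining = [], list(range(X.shape[1]))
        best = -np.inf
        while remaining:
            score, f = max(
                (cross_val_score(clone(clf), X[:, selected + [f]], y,
                                 cv=5).mean(), f)
                for f in remaining)
            if score <= best:
                break  # no remaining feature improves the score
            best = score
            selected.append(f)
            remaining.remove(f)
        return selected

Because every candidate subset requires retraining the learner, the wrapper approach is costly, but it evaluates features in the context of the actual model, which is what allows it to discard the redundant features that we find hamper EM.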
<Paragraph position="5">Overall, our results suggest that single-view weakly supervised learning algorithms are a viable alternative to multi-view algorithms for data sets where a natural feature split into separate, redundant views is not available.</Paragraph>
<Paragraph position="6">The remainder of the paper is organized as follows. Section 2 presents an overview of the three weakly supervised learning algorithms mentioned previously. In Section 3, we introduce noun phrase coreference resolution and describe the machine learning framework for the problem. In Section 4, we evaluate the weakly supervised learning algorithms on the task of coreference resolution. Section 5 introduces a method for improving the performance of weakly supervised EM via feature selection. We conclude with future work in Section 6.</Paragraph>
[1] Abney (2002) argues that the conditional independence assumption is remarkably strong and is rarely satisfied in real data sets, and shows that a weaker independence assumption suffices.
[2] Mueller et al. (2002) explore a heuristic method for view factorization for the related problem of anaphora resolution, but find that, relative to a baseline classifier trained on a small set of labeled data, co-training yields no performance improvement for any type of German anaphor except pronouns.
</Section>
</Paper>