File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/p03-1042_intro.xml
Size: 2,872 bytes
Last Modified: 2025-10-06 14:01:49
<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1042"> <Title>Uncertainty Reduction in Collaborative Bootstrapping: Measure and Algorithm</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> We consider here the problem of collaborative bootstrapping, which includes co-training (Blum and Mitchell, 1998; Collins and Singer, 1998; Nigam and Ghani, 2000) and bilingual bootstrapping (Li and Li, 2002).</Paragraph> <Paragraph position="1"> Collaborative bootstrapping begins with a small amount of labelled data and a large amount of unlabelled data. It trains two (types of) classifiers from the labelled data, uses the two classifiers to label some of the unlabelled data, trains two new classifiers from all the labelled data, and repeats this process. During the process, the two classifiers help each other by exchanging the labelled data. In co-training, the two classifiers have different feature structures, and in bilingual bootstrapping, the two classifiers have different class structures.</Paragraph> <Paragraph position="2"> Dasgupta et al. (2001) and Abney (2002) conducted theoretical analyses of the performance (generalization error) of co-training. Their analyses, however, cannot be directly applied to the co-training setting of Nigam and Ghani (2000) or to bilingual bootstrapping.</Paragraph> <Paragraph position="3"> In this paper, we propose the use of uncertainty reduction in the study of collaborative bootstrapping (both co-training and bilingual bootstrapping). We point out that uncertainty reduction is an important factor in enhancing the performance of the classifiers in collaborative bootstrapping. Here, the uncertainty of a classifier is defined as the portion of instances on which it cannot make classification decisions. 
Exchanging labelled data in bootstrapping can help reduce the uncertainties of the classifiers.</Paragraph> <Paragraph position="4"> Uncertainty reduction was previously used in active learning. To our knowledge, this paper is the first to use it for bootstrapping.</Paragraph> <Paragraph position="5"> We propose a new measure of the uncertainty correlation between the two classifiers in collaborative bootstrapping and refer to it as the 'uncertainty correlation coefficient' (UCC). We use UCC to analyze collaborative bootstrapping. We also propose a new algorithm to improve the performance of existing collaborative bootstrapping algorithms. In the algorithm, one classifier always asks the other classifier to label the most uncertain instances for it.</Paragraph> <Paragraph position="6"> Experimental results indicate that our theoretical analysis is correct and that our new algorithm outperforms existing algorithms.</Paragraph> </Section> </Paper>