<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1061">
  <Title>Teaching a Weaker Classifier: Named Entity Recognition on Upper Case Text</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>5 Experimental Results</SectionTitle>
    <Paragraph position="0">For manually labeled data (corpus C), we used only the official training data provided by the MUC-6 and MUC-7 conferences, i.e., training on MUC-6 training data and testing on MUC-6 test data, and training on MUC-7 training data and testing on MUC-7 test data.1 The task definitions for MUC-6 and MUC-7 are not exactly identical, so we could not combine the two sets of training data. The original MUC-6 training data has a total of approximately 160,000 tokens, and MUC-7 a total of approximately 180,000 tokens.</Paragraph>
    <Paragraph position="1">The unlabeled text is drawn from the 1992 Wall Street Journal section of the TREC (Text REtrieval Conference) corpus. We used a total of 4,893 articles containing approximately 2,161,000 tokens. Example selection reduces this to approximately 46,000 tokens for MUC-6 and 67,000 tokens for MUC-7.</Paragraph>
    <Paragraph position="2">Figure 3 and Figure 4 show the results obtained for MUC-6 and MUC-7, plotted against the amount of selected unlabeled data used. As expected, recall increases in each domain as more names and their contexts are learned from the unlabeled data. However, as more unlabeled data is used, precision drops due to the noise introduced in the machine-tagged data. Since F-measure is the harmonic mean of precision and recall, these opposing trends give it a peak: for MUC-6, F-measure peaked when 30,000 tokens of machine-labeled data were added to the original 160,000 manually tagged tokens; for MUC-7, performance peaked at 20,000 tokens of machine-labeled data added to the original 180,000 manually tagged tokens.</Paragraph>
    <Paragraph position="3">The improvements achieved are summarized in Table 3. It is clear from the table that this method of using unlabeled data brings considerable improvement for both the MUC-6 and the MUC-7 named entity tasks.</Paragraph>
    <Paragraph position="4">The result of the teaching process for MUC-6 is a lot better than that of MUC-7. We think that this is because the teacher for MUC-6 is itself more accurate: the mixed case NER trained on only the official MUC-6 training data achieved an F-measure of 93.27% on the official MUC-6 test data, while that of MUC-7 (also trained on only the official MUC-7 training data) achieved an F-measure of only 87.24%. As the mixed case NER is used as the teacher, a bad teacher does not help as much.</Paragraph>
    <Paragraph position="5">Domain Shift in MUC-7. Another possible cause is that there is a domain shift in the MUC-7 formal test (the training articles are about aviation disasters, whereas the test articles are about missile/rocket launches). The domain of the MUC-7 test data is also very specific, and hence it might exhibit different properties from both the training data and the unlabeled data. The Source of Unlabeled Data. The unlabeled data comes from the same source as the MUC-6 data but from a different source than the MUC-7 data: the MUC-6 articles and the unlabeled articles are all Wall Street Journal articles, whereas the MUC-7 articles are New York Times articles.</Paragraph>
  </Section>
</Paper>