<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1025">
  <Title>Named Entity Recognition: A Maximum Entropy Approach Using Global Information</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> The baseline system in Table 3 refers to the maximum entropy system that uses only local features.</Paragraph>
    <Paragraph position="1"> As each global feature group is added to the feature list, we see improvements in both MUC-6 and MUC-7 test accuracy. For MUC-6, the reduction in error due to global features is 27%, and for MUC-7, 14%. ICOC and CSPP contributed the greatest improvements. The effect of UNIQ is very small on both data sets.</Paragraph>
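The 27% and 14% figures are relative error reductions rather than absolute F1 gains. A minimal sketch of that arithmetic, using hypothetical F1 scores (not the paper's actual table values), where error is taken as 100 minus F1:

```python
def error_reduction(f1_baseline: float, f1_improved: float) -> float:
    """Relative reduction in error (error = 100 - F1), as a percentage."""
    err_base = 100.0 - f1_baseline
    err_new = 100.0 - f1_improved
    return 100.0 * (err_base - err_new) / err_base

# Hypothetical example: a baseline F1 of 90.0 improving to 92.7
# corresponds to a 27% relative error reduction, the figure
# reported here for MUC-6.
print(round(error_reduction(90.0, 92.7), 1))  # 27.0
```

The same formula with a smaller F1 gain over a similar baseline yields the 14% figure reported for MUC-7.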
    <Paragraph position="2"> All our results are obtained using only the official training data provided by the MUC conferences. We did not train on the MUC-6 and MUC-7 training data together because the task specifications for the two tasks are not identical. As Table 4 shows, our training data is considerably smaller than that used by MENE and IdentiFinder. In this section, we compare our results with those obtained by IdentiFinder '97 (Bikel et al., 1997), IdentiFinder '99 (Bikel et al., 1999), and MENE (Borthwick, 1999). IdentiFinder '99's results are considerably better than IdentiFinder '97's. IdentiFinder's MUC-7 performance is published in (Miller et al., 1998). MENE has only been tested on MUC-7.</Paragraph>
    <Paragraph position="3"> For a fair comparison, we have tabulated all results together with the size of the training data used (Table 5 and Table 6). Besides training data size, the use of dictionaries is another factor that might affect performance. Bikel et al. (1999) did not report using any dictionaries, but mentioned in a footnote that they added list membership features, which helped marginally in certain domains. Borthwick (1999) reported using dictionaries of person first names, corporate names and suffixes, colleges and universities, dates and times, state abbreviations, and world regions.</Paragraph>
    <Paragraph position="4"> In MUC-6, the best result was achieved by SRA (Krupka, 1995). Bikel et al. (1997) and Bikel et al. (1999) plotted performance against training data size to show how performance improves with more data. We estimated the performance of IdentiFinder '99 at 200K words of training data from those graphs.</Paragraph>
    <Paragraph position="5"> For MUC-7, there are no published results for systems trained only on the official training data of 200 aviation disaster articles. In fact, training on only the official data is not ideal, as the articles in that set are entirely about aviation disasters, while the test data is about air vehicle launching. Both BBN and NYU tagged their own data to supplement the official training data. Even with less training data, MENERGI outperforms Borthwick's MENE + reference resolution (Borthwick, 1999).</Paragraph>
    <Paragraph position="6"> Except for our own results and those of MENE + reference resolution, the results in Table 6 are all official MUC-7 results. The effect of a secondary reference resolution classifier is not entirely the same as that of global features. A secondary reference resolution classifier has access to the class assigned by the primary classifier, and such a classification can be seen as a not-always-correct summary of global features.</Paragraph>
    <Paragraph position="7"> The secondary classifier in (Borthwick, 1999) uses information not just from the current article but from the whole test corpus, with an additional feature that indicates whether the information comes from the same document or from another document. We feel that information drawn from a whole corpus might turn out to be noisy if the documents in the corpus are not of the same genre. Moreover, if we want to test on a huge corpus, indexing the whole corpus might prove computationally expensive. Hence we restrict ourselves to information from the same document.</Paragraph>
    <Paragraph position="8"> Mikheev et al. (1998) also used a maximum entropy classifier that exploits already tagged entities to help tag other entities. The overall performance of the LTG system was outstanding, but the system consists of a sequence of many hand-coded rules and machine-learning modules.</Paragraph>
  </Section>
</Paper>