<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0424">
  <Title>Language Independent NER using a Maximum Entropy Tagger</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Named Entity Recognition¹ (NER) can be treated as a tagging problem in which each word in a sentence is assigned a label indicating whether it is part of a named entity and, if so, the entity type. Thus, methods used for part-of-speech (POS) tagging and chunking can also be used for NER. The papers from the CoNLL-2002 shared task that used such methods (e.g. Malouf (2002), Burger et al. (2002)) reported results significantly lower than the best system (Carreras et al., 2002). However, Zhou and Su (2002) have reported state-of-the-art results on the MUC-6 and MUC-7 data using an HMM-based tagger.</Paragraph>
    <Paragraph position="1"> Zhou and Su (2002) used a wide variety of features, which suggests that the relatively poor performance of the taggers used in CoNLL-2002 was largely due to the feature sets used rather than the machine learning method.</Paragraph>
    <Paragraph position="2"> We demonstrate this to be the case by improving on the best Dutch results from CoNLL-2002 using a maximum entropy (ME) tagger. We report reasonable precision and recall (84.9 F-score) for the CoNLL-2003 English test data, and an F-score of 68.4 for the CoNLL-2003 German test data.</Paragraph>
    <Paragraph position="3"> ¹ We assume that NER involves assigning the correct label to an entity as well as identifying its boundaries.</Paragraph>
    <Paragraph position="4"> Incorporating a diverse set of overlapping features in an HMM-based tagger is difficult and complicates the smoothing typically used for such taggers. In contrast, an ME tagger can easily deal with diverse, overlapping features. We also use a Gaussian prior on the parameters for effective smoothing over the large feature space.</Paragraph>
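    <Paragraph position="5"> As a minimal sketch of the idea above, and not the tagger evaluated in this paper, the following Python code trains a small ME (log-linear) model over overlapping binary features with a Gaussian prior on the weights. The feature templates, the one-sentence training set, the tag set, the learning rate, and names such as extract_features, train and greedy_tag are all illustrative assumptions; plain batch gradient ascent and greedy left-to-right decoding are swapped in here for whatever estimation and search procedures a real system would use.

```python
import math
from collections import defaultdict

TAGS = ["B-PER", "I-PER", "B-LOC", "O"]  # assumed toy tag set


def extract_features(words, i, prev_tag):
    """Overlapping binary features for word i (a hypothetical, tiny feature set)."""
    word = words[i]
    feats = [
        "word=" + word.lower(),
        "suffix3=" + word[-3:].lower(),
        "prev_tag=" + prev_tag,
    ]
    if word[0].isupper():
        feats.append("init_cap")
    if i == 0:
        feats.append("sentence_start")
    return feats


def tag_probs(weights, feats):
    """p(tag | features) under a log-linear (maximum entropy) model."""
    scores = {t: sum(weights[(f, t)] for f in feats) for t in TAGS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def train(examples, sigma2=1.0, lr=0.1, epochs=1000):
    """Batch gradient ascent on log-likelihood minus sum(w**2) / (2 * sigma2)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            probs = tag_probs(weights, feats)
            for f in feats:
                for t in TAGS:
                    # observed feature count minus expected count under the model
                    grad[(f, t)] += (1.0 if t == gold else 0.0) - probs[t]
        for key, g in grad.items():
            # the Gaussian prior contributes -w / sigma2 to the gradient
            weights[key] += lr * (g - weights[key] / sigma2)
    return weights


def greedy_tag(weights, words):
    """Tag left to right, feeding each predicted tag into the next decision."""
    tags, prev = [], "START"
    for i in range(len(words)):
        probs = tag_probs(weights, extract_features(words, i, prev))
        prev = max(TAGS, key=lambda t: probs[t])
        tags.append(prev)
    return tags


# Assumed one-sentence training set, with gold previous tags used as features.
words = ["John", "Smith", "visited", "Paris"]
gold = ["B-PER", "I-PER", "O", "B-LOC"]
examples = [
    (extract_features(words, i, gold[i - 1] if i else "START"), gold[i])
    for i in range(len(words))
]
weights = train(examples, sigma2=1.0)
print(greedy_tag(weights, words))
```

On this toy sentence the greedy tagger reproduces the training labels. The sigma2 argument is the variance of the prior: a larger value weakens the prior and lets the weights grow, while a smaller value pulls them toward zero, which is the smoothing effect referred to above.</Paragraph>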
  </Section>
</Paper>