<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0434">
  <Title>A Robust Risk Minimization based Named Entity Recognition System</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Named Entity Recognition is an important research area in information extraction. It was a central theme of the Message Understanding Conferences (MUCs), and it has become even more important with the large amount of electronic text now available, which makes it necessary to build systems that can automatically process text and extract information from it.</Paragraph>
    <Paragraph position="1"> Despite significant work in this area, the problem has not been solved. Although some earlier reports put the accuracy (F1 measure) of machine-learning-based systems in the low 90s with relatively small amounts of labeled data (e.g., Bikel et al., 1999; Mikheev et al., 1998; Sundheim, 1995), these studies were often performed on relatively restricted domains. Our experience indicates that the performance of a statistically based named entity extraction system can vary significantly depending on the underlying domain.</Paragraph>
    <Paragraph position="2"> Making the performance of a statistical system consistent across different types of data sources therefore remains an open challenge.</Paragraph>
    <Paragraph position="3"> In this paper, we present a named entity recognition system based on our earlier work on text chunking (Zhang et al., 2002). One advantage of the proposed system is that it can easily incorporate a large number of linguistic features. Several other approaches share this advantage, such as the maximum entropy method, which has been widely used for NLP problems (see, e.g., Borthwick, 1999; Ratnaparkhi, 1999).</Paragraph>
    <Paragraph position="4"> The performance of our system can depend significantly on the choice of linguistic features. The main focus of this paper is to investigate the impact of some local features. Specifically, we show that system performance can be improved significantly with relatively simple token-based features. More sophisticated linguistic features, although helpful, yield much less improvement than might be expected.</Paragraph>
    <Paragraph position="5"> We believe that this study provides useful insight into the value of various local linguistic features. Since these simple features are readily available for many languages, our results suggest that a language-independent named entity recognition system can be set up quickly, with performance close to that of a system using much more sophisticated, language-dependent features.</Paragraph>
  </Section>
</Paper>