File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/a00-1040_intro.xml

Size: 3,004 bytes

Last Modified: 2025-10-06 14:00:42

<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-1040">
  <Title>Using Corpus-derived Name Lists for Named Entity Recognition</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Named entity (NE) recognition is the process of identifying and categorising names in text. Systems which have attempted the NE task have, in general, made use of lists of common names to provide clues.</Paragraph>
    <Paragraph position="1"> Name lists provide an extremely efficient way of recognising names: the only processing required is to match the name patterns in the list against the text, and no expensive processing such as full text parsing is needed. However, name lists are a naive method for recognising names. McDonald (1996) defines internal and external evidence in the NE task. Internal evidence is found within the name string itself, while external evidence is gathered from its context.</Paragraph>
    <Paragraph position="2"> For example, in the sentence &quot;President Washington chopped the tree&quot; the word &quot;President&quot; is clear external evidence that &quot;Washington&quot; denotes a person. In this case internal evidence from the name cannot conclusively tell us whether &quot;Washington&quot; is a person or a location (&quot;Washington, DC&quot;). An NE system based solely on lists of names makes use of only internal evidence, and examples such as this demonstrate the limitations of this knowledge source.</Paragraph>
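The limitation described above can be illustrated with a minimal sketch of list-only lookup. The list entries, the NAME_LISTS dictionary, and both function names are hypothetical, chosen for illustration rather than taken from any system discussed here, and tokenisation is reduced to whitespace splitting.

```python
# Hypothetical name lists; the entries are illustrative, not taken from the paper.
NAME_LISTS = {
    "person": {"Washington", "Lincoln"},
    "location": {"Washington", "Sheffield"},
}


def lookup_categories(token, name_lists=NAME_LISTS):
    """Return every category whose list contains the token (internal evidence only)."""
    return sorted(cat for cat, names in name_lists.items() if token in names)


def tag_with_lists(sentence, name_lists=NAME_LISTS):
    """Tag whitespace-separated tokens by list lookup alone, ignoring context."""
    tagged = []
    for token in sentence.split():
        # Strip simple surrounding punctuation before the lookup.
        categories = lookup_categories(token.strip('.,"'), name_lists)
        if categories:
            tagged.append((token, categories))
    return tagged


result = tag_with_lists("President Washington chopped the tree")
print(result)
```

Because the lookup never inspects the neighbouring word "President", it returns both the person and the location category for "Washington"; choosing between them would require external evidence.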
    <Paragraph position="3"> Despite these limitations, many NE systems use extensive lists of names. Krupke and Hausman (1998) made extensive use of name lists in their system. They found that reducing the size of the lists by more than 90% had little effect on performance; conversely, adding just 42 entries led to improved results. This implies that the quality of list entries is a more important factor in their effectiveness than the total number of entries. Mikheev et al. (1999) experimented with different types of lists in an NE system entered for MUC7 (MUC, 1998). They concluded that small lists of carefully selected names are as effective as more complete lists, a result consistent with Krupke and Hausman. However, both studies altered name lists within a larger NE system, and it is difficult to tell whether the consistency of performance is due to the changes in the lists or to additional external evidence compensating for the loss of internal evidence.</Paragraph>
    <Paragraph position="4"> In this paper an NE system which uses only the internal evidence contained in lists of names is presented. Section 3 explains how such lists can be automatically generated from annotated text. Sections 4 and 5 describe experiments in which these corpus-generated lists are applied and their performance compared against hand-crafted lists. In the next section the NE task is described in further detail.</Paragraph>
  </Section>
</Paper>