<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1030">
  <Title>Categorizing and standardizing proper nouns for efficient information retrieval</Title>
  <Section position="10" start_page="206" end_page="207" type="evalu">
    <SectionTitle>
8 Evaluation
</SectionTitle>
    <Paragraph position="0"> An evaluation of an earlier version of Nominator was performed on 88 Wall Street Journal documents (NIST 1993) that had been set aside for testing. We chose the Wall Street Journal corpus because it follows standard stylistic conventions, especially capitalization, which is essential for Nominator to work.</Paragraph>
    <Paragraph position="1"> Nominator's performance deteriorates if other conventions are not consistently followed.</Paragraph>
    <Paragraph position="2"> A linguist manually identified 2426 occurrences of proper names, which reduced to 1354 unique tokens. Of these, Nominator correctly identified the boundaries of 91% (1230/1354). The precision rate was 92% for the 1409 names Nominator identified (1230/1409). In terms of semantic disambiguation, Nominator failed to assign an entity type to 21% of the names it identified. This high percentage is due to a decision not to assign a type if the confidence measure is too low. The payoff of this choice is a very high precision rate, 99%, for the assignment of semantic type to those names that were disambiguated. See Ravin and Wacholder (1996) for details.</Paragraph>
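The trade-off described above, leaving a name untyped when confidence is low in exchange for higher typing precision, can be sketched in a few lines of Python. This is only an illustration: the threshold value, the confidence scale, the Candidate record, and the toy data are all assumptions made for the example and are not taken from Nominator.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff; the source does not state the value Nominator uses.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Candidate:
    name: str
    guessed_type: str   # best-scoring entity type for this name
    confidence: float   # assumed to be a score between 0 and 1
    gold_type: str      # type assigned by a human annotator


def assign_type(c: Candidate, threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the guessed type, or None (untyped) when confidence is too low."""
    return c.guessed_type if c.confidence >= threshold else None


def evaluate(candidates, threshold=CONFIDENCE_THRESHOLD):
    """Return (untyped_rate, typing_precision) for a list of candidates."""
    typed = [c for c in candidates if assign_type(c, threshold) is not None]
    correct = [c for c in typed if c.guessed_type == c.gold_type]
    untyped_rate = (len(candidates) - len(typed)) / len(candidates)
    precision = len(correct) / len(typed) if typed else 0.0
    return untyped_rate, precision


# Toy data, invented for illustration only.
CANDIDATES = [
    Candidate("International Business Machines", "ORG", 0.95, "ORG"),
    Candidate("IBM", "ORG", 0.55, "ORG"),
    Candidate("Washington", "PERSON", 0.40, "PLACE"),
    Candidate("George Melloan", "PERSON", 0.90, "PERSON"),
    Candidate("CitiBank Corp.", "ORG", 0.92, "ORG"),
]

for threshold in (CONFIDENCE_THRESHOLD, 0.0):
    untyped, precision = evaluate(CANDIDATES, threshold)
    print(f"threshold={threshold}: untyped={untyped:.0%}, precision={precision:.0%}")
```

On this toy data the 0.8 cutoff leaves 40% of the names untyped and types the rest with 100% precision, while a cutoff of 0.0 types every name but precision drops to 80%. The same pattern, at a different scale, is what the figures of 21% untyped and 99% precision reflect.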
    <Paragraph position="3"> The main reason that names remain untyped is insufficient evidence in the document. If IBM, for example, occurs in a document without International Business Machines, Nominator does not type it; rather, it lets later processes inspect the local context for further clues. These processes form part of the Talent tool set under development at the T.J. Watson Research Center. They take as their input text processed by Nominator and further disambiguate untyped names appearing in certain contexts, such as an appositive, e.g., president of CitiBank Corp.</Paragraph>
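The IBM example can be illustrated with a small Python sketch in which a short name is typed only when a typed full form in the same document accounts for it. The acronym heuristic and the function names here are hypothetical, chosen only to make the idea concrete; the source does not describe how Nominator links the two forms.

```python
def acronym_of(full_name: str) -> str:
    """Build an acronym from the capitalized words of a multi-word name."""
    return "".join(w[0] for w in full_name.split() if w[0].isupper())


def type_short_names(typed_full_names: dict, short_names: list) -> dict:
    """Map each short name to the type of a matching full form, else None.

    typed_full_names: full names found in the document, mapped to their types.
    short_names: short forms (e.g. acronyms) found in the same document.
    """
    by_acronym = {acronym_of(full): etype
                  for full, etype in typed_full_names.items()}
    return {short: by_acronym.get(short) for short in short_names}


# With the full form present, IBM inherits its type.
print(type_short_names({"International Business Machines": "ORG"}, ["IBM"]))
# Without it, IBM stays untyped (None) and is left to later processes.
print(type_short_names({}, ["IBM"]))
```

Returning None rather than guessing mirrors the behavior described above: the name is passed on untyped so that a later process can use local context, such as an appositive, to decide.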
    <Paragraph position="4"> Other untyped names, such as Star Bellied Sneetches or George Melloan's Business World, fit none of our categories: they are neither people, places, organizations, nor any of the other legal or financial entities we recognize.</Paragraph>
    <Paragraph position="5"> Many of these uncategorized names are titles of articles, books and other works of art that we currently do not handle.</Paragraph>
  </Section>
</Paper>