<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1130"> <Title>Fine Grained Classification of Named Entities</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> There has been much interest in the recent past concerning automated categorization of named entities in text. Recent advances have made some systems (such as BBN's IdentiFinder (Bikel, 1999)) very successful when classifying named entities into broad categories, such as person, organization, and location. While the accurate classification of general named entities is useful in many areas of natural language research, more fine-grained categorizations would be of particular value in areas such as Question Answering, information retrieval, and the automated construction of ontologies.</Paragraph> <Paragraph position="1"> The research presented here focuses on the subcategorization of person names, which extends research on the subcategorization of location names (Fleischman, 2001). While locations can often be classified based solely on the words that surround the instance, person names are often more challenging because classification relies on much deeper semantic intuitions gained from the surrounding text. Further, unlike the case with location names, exhaustive lists of person names by category do not exist and cannot be relied upon for training and test set generation. Finally, the domain of person names presents a challenge because the same individual (e.g., &quot;Ronald Reagan&quot;) is often represented differently at different points in the same text (e.g., &quot;Mr. Reagan&quot;, &quot;Reagan&quot;, etc.). The subcategorization of person names is not a trivial task for humans either, as the examples below illustrate. Here, names of persons have been encrypted using a simple substitution cipher. 
The names belong to only three subtypes (politician, businessperson, and entertainer), yet they prove remarkably difficult to classify based upon the context of the sentence.</Paragraph> <Paragraph position="2"> 1. Unfortunately, Mocpm_____ and his immediate family did not cooperate in the making of the film. 2. &quot;The idea that they'd introduce Npn Fuasm______ into that is amazing,&quot; he said.</Paragraph> <Paragraph position="3"> 3. &quot;It's dangerous to be right when government is wrong,&quot; Lrsyomh______ told reporters. Key: 1. Mocpm = Nixon: politician; 2. Npn Fuasm = Bob Dylan: entertainer; 3. Lrsyomh = Keating: businessperson. In this work we examine how different features and learning algorithms can be employed to automatically subcategorize person names in text. In doing so, we address how to inject semantic information into the feature space, how to automatically generate training sets for use with supervised learning algorithms, and how to handle orthographic inconsistencies between instances of the same person.</Paragraph> </Section> </Paper>