<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1127">
  <Title>Location Normalization for Information Extraction*</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The task of location normalization is to identify the correct sense of a possibly ambiguous location Named Entity (NE). Ambiguity is a serious problem for location NEs. For example, there are 23 cities named 'Buffalo', including one in New York State and one in Alabama. Country names such as 'Canada', 'Brazil', and 'China' are also city names in the USA, and almost every city has a Main Street or a Broadway. Such ambiguity must be resolved before location names are converted into a normal form to support entity profile construction, event merging, and visualization of extracted events on a map for an Information Extraction (IE) system.</Paragraph>
    <Paragraph position="1"> *This work was partly supported by a grant from the Air Force Research Laboratory's Information Directorate (AFRL/IF), Rome, NY, under contract F30602-00-C-0090.</Paragraph>
    <Paragraph position="2"> Location normalization is a special application of word sense disambiguation (WSD). There has been considerable research on WSD.</Paragraph>
    <Paragraph position="3"> Knowledge-based work, such as (Hirst, 1987; McRoy, 1992; Ng and Lee, 1996), used hand-coded rules or supervised machine learning based on an annotated corpus to perform WSD.</Paragraph>
    <Paragraph position="4"> Recent work emphasizes corpus-based unsupervised approaches (Dagan and Itai, 1994; Yarowsky, 1992; Yarowsky, 1995) that avoid the need for costly truthed training data. Location normalization differs from general WSD in that the selection restriction often used for WSD is in many cases not sufficient to distinguish the correct sense from the other candidates.</Paragraph>
    <Paragraph position="5"> For example, in the sentence &quot;The White House is located in Washington&quot;, the selection restriction from the collocation 'located in' can only determine that &quot;Washington&quot; is a location name; it is not sufficient to decide which location is meant. Location normalization depends heavily on co-occurrence constraints among geographically related location entities mentioned in the same discourse. For example, if 'Buffalo', 'Albany' and 'Rochester' are mentioned in the same document, the most probable senses of all three names are the cities in New York State. Certain fixed keyword-driven patterns in the local context can also decide the sense of a location NE. These patterns use keywords such as 'city', 'town', 'province', 'on', 'in' or other location names. For example, the pattern &quot;X + city&quot; can determine sense tags for cases like &quot;New York city&quot;, and the pattern &quot;City + comma + State&quot; can disambiguate cases such as &quot;Albany, New York&quot; and &quot;Shanghai, Illinois&quot;. In the absence of these patterns, co-occurring location NEs in the same discourse can be good evidence for predicting the most probable sense of a location name.</Paragraph>
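The two local-context patterns named above can be sketched as regular expressions. This is only a minimal illustration, not the LocNZ implementation: the regexes, the function name match_local_patterns and the hint strings are assumptions introduced here, and the capitalization-based name matching is a deliberate simplification.

```python
import re

# Assumed stand-in for a name recognizer: one or more capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# "City + comma + State", e.g. "Albany, New York".
CITY_COMMA_STATE = re.compile(r"\b(" + NAME + r"), (" + NAME + r")")

# "X + city", e.g. "New York city".
X_CITY = re.compile(r"\b((?:[A-Z][a-z]+ )+)[Cc]ity\b")


def match_local_patterns(text):
    """Return (name, sense_hint) pairs found by the two illustrative patterns."""
    hints = []
    for m in CITY_COMMA_STATE.finditer(text):
        hints.append((m.group(1), "city in " + m.group(2)))
    for m in X_CITY.finditer(text):
        hints.append((m.group(1).strip(), "city"))
    return hints


if __name__ == "__main__":
    print(match_local_patterns("Albany, New York"))
    print(match_local_patterns("She lives in New York city"))
```

A real system would take its candidate names from the NE tagger rather than from capitalization, so a capitalized sentence-initial word would not be mistaken for part of a location name.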
    <Paragraph position="6"> To choose the best matching set of senses within a document, we construct a graph in which each node represents a sense of a location NE and each edge represents the relationship between two location name senses. A graph spanning algorithm can then be used to select the best senses from the graph. If some nodes cannot be resolved in this step, we apply default location senses that were extracted semi-automatically by statistical processing. The location normalization module, or 'LocNZ', is applied after the NE tagging module in our InfoXtract IE system, as shown in Figure 1.</Paragraph>
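The sense graph and the default-sense fallback can be illustrated with a small sketch. It uses a greedy, Kruskal-style edge selection as a stand-in for the graph spanning algorithm, which this section does not specify. The toy gazetteer, the shared-region edge weight, and the names choose_senses and edge_weight are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy gazetteer: name -> list of (sense_id, containing_region).
GAZETTEER = {
    "Buffalo": [("Buffalo/NY", "New York"), ("Buffalo/AL", "Alabama")],
    "Albany": [("Albany/NY", "New York"), ("Albany/GA", "Georgia")],
    "Rochester": [("Rochester/NY", "New York"), ("Rochester/MN", "Minnesota")],
}


def edge_weight(sense_a, sense_b):
    """Assumed scoring: 1 when both senses lie in the same region, else 0."""
    return 1 if sense_a[1] == sense_b[1] else 0


def choose_senses(names, defaults=None):
    """Pick one sense per name by greedily accepting the heaviest edges."""
    defaults = defaults or {}
    edges = []
    # Each node is one sense of one name; edges link senses of different names.
    for a, b in combinations(names, 2):
        for sa in GAZETTEER.get(a, []):
            for sb in GAZETTEER.get(b, []):
                w = edge_weight(sa, sb)
                if w:
                    edges.append((w, a, sa[0], b, sb[0]))
    edges.sort(key=lambda e: -e[0])  # heaviest edges first
    chosen = {}
    for w, a, sa, b, sb in edges:
        # Accept an edge only if it agrees with senses already fixed.
        if chosen.get(a, sa) == sa and chosen.get(b, sb) == sb:
            chosen[a], chosen[b] = sa, sb
    # Names left unresolved by the graph fall back to a default sense.
    for name in names:
        if name not in chosen and name in defaults:
            chosen[name] = defaults[name]
    return chosen


if __name__ == "__main__":
    print(choose_senses(["Buffalo", "Albany", "Rochester"]))
    print(choose_senses(["Buffalo"], defaults={"Buffalo": "Buffalo/NY"}))
```

With 'Buffalo', 'Albany' and 'Rochester' in one document, only the New York senses are connected by edges, so all three resolve to New York State. A lone 'Buffalo' has no edges and takes whatever default sense is supplied.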
    <Paragraph position="7"> This paper focuses on resolving ambiguity in the names of islands, towns, cities, provinces, and countries. Three applications of LocNZ in Information Extraction are illustrated in Section 2. Section 3 presents location sense identification using local context; Section 4 describes the disambiguation process that uses information within a document through graph processing; Section 5 shows how to semi-automatically collect default senses of locations from a corpus; Section 6 presents an algorithm for location normalization with experimental results. The summary and conclusions are given in Section 7. Sample text and the results of location tagging are given in the Appendix.</Paragraph>
  </Section>
</Paper>