<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1127">
  <Title>Location Normalization for Information Extraction*</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Applications of Location Normalization
</SectionTitle>
    <Paragraph position="0"> Several applications are enabled through location normalization.</Paragraph>
    <Paragraph position="1"> * Event extraction and merging Event extraction is an advanced IE task. Extracted events can be merged to provide key content in a document. The merging process consists of several steps including checking information compatibility such as checking synonyms, name aliases and co-reference of anaphors, time and location normalization. Two events cannot be merged if there is a conflicting condition such as time and location. Figure 2 shows an example of event merging where the events occurred in Microsoft at Beijing, not in Seattle.</Paragraph>
    <Paragraph position="2"> * Event visua lization Visualization applications can illustrate where an event occurred with support of location normalization. Figure 3 demonstrates a visualized event on a map based on the normalized location names associated with the events. The input to visualization consists of extracted events from a news story pertaining to Julian Hill's life. The arrow points to the city where the event occurred.</Paragraph>
    <Paragraph position="3"> * Entity profile construction An entity profile is an information object for entities such as person, organization and location. It is defined as an Attribute Value Matrix (AVM) to represent key aspects of information about entities, including their relationships with other entities. Each attribute slot embodies some  information about the entity in one aspect. Each relationship is represented by an attribute slot in the Profile AVM. Sample Profile AVMs involving the reference of locations are illustrated below.</Paragraph>
    <Paragraph position="4">  Hausman, 1998; Srihari et al., 2000) attempt to tag information such as names of people, organizations, locations, time, etc. in running text. In InfoXtract, we combine Maximum Entropy Model (MaxEnt) and Hidden Markov Model for NE tagging (Shrihari et al.,, 2000). The Maximum Entropy Models incorporate local contextual evidence in handling ambiguity of information from a location gazetteer. In the Tipster Location gazetteer used by InfoXtract, there are a lot of common words, such as I, A, June, Friendship , etc. Also, there is large overlap between person names and location names, such as Clinton, Jordan, etc. Using MaxEnt, systems learn under what situation a word is a location name, but it is very difficult to determine the correct sense of an ambiguous location name. If a word can represent a city or state at the same time, such as New York or Washington, it is difficult to decide if it refers to city or state. The NE tagger in InfoXtract only assigns the location super-type tag NeLOC to the identified location words and leaves the task of location sub-type tagging such as NeCITY or NeSTATE and its normalization to the subsequent module LocNZ.</Paragraph>
    <Paragraph position="5"> For representation of LocNZ results, we add an unique zip code and position information that is longitude and latitude for the cities for event visualization.</Paragraph>
    <Paragraph position="6"> The first step of LocNZ is to use local context that is the co-occurring words around a location name. Local context can be a reliable source in deciding the sense of a location. The following are most commonly used patterns for this purpose.</Paragraph>
    <Paragraph position="7">  (1) location+comma+NP(headed by 'city') e.g. Chicago, an old city (2) 'city of' +location1+comma+location2 e.g. city of Albany, New York (3) 'city of' +location (4) 'state of'+location (5) location1+{,}+location2+{,}+location3 e.g. (i) Williamsville, New York, USA (ii) New York, Buffalo,USA (6) {on, in}+location e.g. on Strawberry NeIsland in Key West NeCity</Paragraph>
    <Paragraph position="9"> if the location is a city, a state or an island, while patterns (2) and (5) can be used to determine both the sub-tag and its sense. These patterns are implemented in our finite state transducer formalism.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Maximum Spanning Tree
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Calculation with Global Information
</SectionTitle>
      <Paragraph position="0"> Although local context can be reliable evidence for disambiguating location senses, there are still many cases which cannot be captured by the above patterns. Information in the entire document (i.e. discourse information) should be considered. Since all location names in a document have meaning relationships among them, a way to represent the best sense combination within the document is needed.</Paragraph>
      <Paragraph position="1"> The LocNZ process constructs a weighted graph where each node represents a location sense, and each edge represents similarity weight between location names. Apparently there will be no links among the different senses of a location name, so the graph will be partially complete. We calculate the maximum weight spanning tree (MaxST) using Kruskal's MinST algorithm (Cormen et al, 1990). The nodes on the resulting MaxST are the most promising senses of the location names.</Paragraph>
      <Paragraph position="2"> We define three criteria for similarity weight assignment between two nodes:  (1) More weight will be given to the edge between a city and the province (or the country) to which it belongs.</Paragraph>
      <Paragraph position="3"> (2) Distance between location names mentioned  in the document is taken into consideration. The shorter the distance, the more we assign the weight between the nodes.</Paragraph>
      <Paragraph position="4"> (3) The number of word occurrences affects the weight calculation. For multiple mentions of a location name, only one node will be represented in the graph. We assume that all the same location mentions have the same meaning in a document following one sense per discourse principle (Gale, Church, and Yarowsky, 1992).</Paragraph>
      <Paragraph position="5"> When calculating the weight between two location names, the predefined similarity values shown in Table 1, the number of location name occurrences and the distance between them in a text are taken into consideration. After selecting each edge, the senses that are connected will be chosen, and other senses of the same location name will be discarded so that they will not be considered again in the MaxST calculation. A weight value is calculated with equation (1), where sij indicate the jth sense of wordi, a reflects the number of location name occurrences in a text, and b refers to the distance between the two location names. Figure 4 shows the graph for calculating MaxST. Dots in a circle mean the number of senses of a location name.</Paragraph>
      <Paragraph position="6">  In our experiments, we found that the system performance suffers greatly from the lack of lexical information on default senses. For example, people refer to &amp;quot;Los Angeles&amp;quot; as the city at California more than the city in Philippines, Chile, Puerto Rico, or the city in Texas in the USA. This problem becomes a bottleneck in the system performance. As mentioned before, a location name usually has a dozen senses that need sufficient evidence in a document for selecting one sense among them.</Paragraph>
      <Paragraph position="8"> 171,039 location entries with 237,916 total senses that cover most location names all over the world. Each location in the gazetteer may have several senses. Among them 30,711 location names have more than one sense. Although it has ranking tags on some location entries, a lot of them have no tags attached or the same rank is assigned to the entries of the same name.</Paragraph>
      <Paragraph position="9"> Manually calculating the default senses for over 30,000 location names will be difficult and it is subject to inconsistency due to the different knowledge background of the human taggers. To solve this problem in calculating the default senses of location names, we propose to extract the knowledge from a corpus using statistical processing method.</Paragraph>
      <Paragraph position="10"> With the TREC-8 (Text Retrieval Conference) corpus, we can only extract default senses for 1687 location names, which cannot satisfy our requirement. This result shows that the general corpus is not sufficient to suit our purpose due to the serious 'data sparseness' problem. Through a series of experiments, we found that we could download highly useful information from Web search engines such as Google, Yahoo, and Northern Light by searching ambiguous location names in the Gazetteer. Web search engines can provide the closest content by their built-in ranking mechanisms. Among those engines, we found that the Yahoo search engine is the best one for our purpose. We wrote a script to download web-pages from Yahoo! using each ambiguous location name as a search string.</Paragraph>
      <Paragraph position="11"> In order to derive default senses automatically from the downloaded web-pages, we use the similarity features and scoring values between location-sense pairs described in Section 3. For example, if &amp;quot;Los Angeles&amp;quot; co-occurs with &amp;quot;California&amp;quot; in the same web-page, then its sense will be most probably set to the city in California by the system. Suppose a location word w has several city senses si: Sense(w) indicates the default sense of w; sim(wi,xjk) means the similarity value between two senses of the word w and the jth co-occuring word xj; num(w) is the number of w in the document, and NumAll is the total number of locations. a is a parameter that reflects the importance of the co-occurring location names and is determined empirically.</Paragraph>
      <Paragraph position="12"> The default sense of w is wi that maximizes the similarity value with all co-occurring location names. The maximum similarity should be larger than a threshold to keep meaningful default senses. The threshold can be determined empirically through experimentation.</Paragraph>
      <Paragraph position="13">  For each of 30,282 ambiguous location names, we used the name itself as search term in Yahoo to download its corresponding web-page. The system produced default senses for 18,446 location names. At the same time, it discarded the remaining location names because the corresponding web-pages do not contain sufficient evidence to reach the threshold. We observed that the results reflect the correct senses in most cases, and found that the discarded location names have low references in the search results of other Web search engines. This means they will not appear frequently in text, hence minimal impact on system performance. We manually modified some of the default sense results based on the ranking tags in the Tipster Gazetteer and some additional information on population of the locations in order to consolidate the default senses.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>