<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0101">
  <Title>Experiments with geographic knowledge for information extraction Dimitar Manov,</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Related work
</SectionTitle>
    <Paragraph position="0"> In the context of this paper, the two most relevant areas of work are large-scale gazetteers and location disambiguation. We present the Alexandria Digital Library Gazetteer because we used the ADL Feature Type Thesaurus as the basis of our location ontology. Work on location disambiguation, such as that done in the Perseus Digital Library project, is relevant because we plan to improve the location disambiguation mechanism of our system in future work.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Alexandria Digital Library Gazetteer
</SectionTitle>
      <Paragraph position="0"> The Alexandria Digital Library (ADL), an NSF-funded project at the University of California, Santa Barbara, has included gazetteer development since its beginning in 1994. The gazetteer currently contains approximately 4.4 million entries. The data is taken from various sources, including the gazetteer of the U.S. National Imagery and Mapping Agency (NIMA), a set of countries and U.S. counties, a set of U.S. topographic map quadrangle footprints, a set of volcanoes, and a set of earthquake epicenters.</Paragraph>
      <Paragraph position="1"> The Geographic Names Information System (GNIS) data from the U.S. Geological Survey has been partly added to the collection. The results to date include a thesaurus for feature types, time period data for the historical entries, and spatial data with boundaries. The boundaries are defined as &quot;satisficing&quot; rectangles. The term &quot;satisficing&quot; is described in (Hill, 2000); additional information about the project can be found there and on the ADL gazetteer development page at http://alexandria.sdc.ucsb.edu/~lhill/adlgaz/.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Toponym disambiguation in the Perseus Digital Library project
</SectionTitle>
      <Paragraph position="0"> A disambiguation system for historical place names in the Perseus Digital Library is described in (Smith and Crane, 2001). The library concentrates on representing historical data in the humanities, from ancient Greece to nineteenth-century America. The authors present a procedure for disambiguating such place names, based on internal and external evidence from the text. Internal evidence includes the use of honorifics, generic geographic labels, or linguistic environment. External evidence includes gazetteers, biographical information, and general linguistic knowledge. The system is evaluated using standard precision and recall measures on each of five corpora: Greek, Roman, London, California, and Upper Midwest. It performs best on the Greek corpus and worst on the Upper Midwest corpus, and its overall performance on place names is higher than that of most other applications.</Paragraph>
    </Section>
  </Section>
</Paper>