<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1323">
  <Title>Combining Lexical and Formatting Cues for Named Entity Acquisition from the Web</Title>
  <Section position="4" start_page="0" end_page="181" type="intro">
    <SectionTitle>
2 Focusing on Definitory Contexts
</SectionTitle>
    <Paragraph position="0"> Two issues are addressed in this paper: 1. While traditional electronic corpora can be accessed directly and entirely through large-scale filters such as shallow parsers, access to Web pages is restricted to the narrow and specialized medium of a search engine. In order to spot and retrieve relevant text chunks, we must focus on linguistic cues that can be used to access pages containing typed NEs with high precision.</Paragraph>
    <Paragraph position="1"> 2. While Web pages are full of NEs, only a small proportion of them are relevant for the acquisition of public, fresh and well-known NEs (the name of someone's cat is not relevant to our purpose). So that automatically acquired NEs can be used in an NE recognition task, they are associated with types such as actor (PERSON), lake (LOCATION), or university (ORGANIZATION).</Paragraph>
    <Paragraph position="2"> The need for selective linguistic cues (with respect to the current facilities offered by search engines) and for informative and typifying contexts has led us to focus on collections, a specific type of definitory context (Péry-Woodley, 1998). Because they contain specific linguistic triggers such as "following" or "such as", definitory contexts can be accessed through phrase queries to a search engine. In addition, these contexts use the classical genus/differentia scheme to define NEs, and thus provide, through the genus, a hypernym of the NEs they define.</Paragraph>
    <Paragraph position="3"> Our study extends (Hearst, 1998) to Web-based and spatially formatted corpora.</Paragraph>
  </Section>
</Paper>