<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1323"> <Title>Combining Lexical and Formatting Cues for Named Entity Acquisition from the Web</Title> <Section position="4" start_page="0" end_page="181" type="intro"> <SectionTitle> 2 Focusing on Definitory Contexts </SectionTitle> <Paragraph position="0"> Two issues are addressed in this paper: 1. While traditional electronic corpora can be accessed directly and entirely through large-scale filters such as shallow parsers, access to Web pages is restricted to the narrow and specialized medium of a search engine. To spot and retrieve relevant text chunks, we must focus on linguistic cues that can be used to access pages containing typed NEs with high precision.</Paragraph> <Paragraph position="1"> 2. While Web pages are full of NEs, only a small proportion of them are relevant for the acquisition of public, fresh and well-known NEs (the name of someone's cat is not relevant to our purpose). So that automatically acquired NEs can be used in an NE recognition task, they are associated with types such as actor (PERSON), lake (LOCATION), or university (ORGANIZATION).</Paragraph> <Paragraph position="2"> The need for selective linguistic cues (with respect to the facilities currently offered by search engines) and for informative and typifying contexts has led us to focus on collections, a specific type of definitory context (Péry-Woodley, 1998). Because they contain specific linguistic triggers such as "following" or "such as", definitory contexts can be accessed through phrase queries to a search engine. In addition, these contexts use the classical genus/differentia scheme to define NEs, and thus provide, through the genus, a hypernym of the NEs they define.</Paragraph> <Paragraph position="3"> Our study extends (Hearst, 1998) to Web-based and spatially formatted corpora.</Paragraph> </Section> </Paper>