<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2201">
  <Title>Learning Effective Surface Text Patterns for Information Extraction</Title>
  <Section position="2" start_page="0" end_page="1" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Ravichandran and Hovy (2002) present a method to automatically learn surface text patterns expressing relations between instances of classes using a search engine. Their method, based on a training set, identifies natural language surface text patterns that express some relation between two instances. For example, &quot;was born in&quot; proved to be a precise pattern expressing the relation between instances Mozart (of class 'person') and 1756 (of class 'year').</Paragraph>
    <Paragraph position="1"> We address the issue of learning surface text patterns, since we observed two drawbacks of Ravichandran and Hovy's work with respect to the application of such patterns in a general information extraction setting.</Paragraph>
    <Paragraph position="2"> The first drawback is that Ravichandran and Hovy focus on the use of such surface text patterns to answer so-called factoid questions (Voorhees, 2004). They assume that each instance is related by a relation R to exactly one other instance of some class. In a general information extraction setting, we cannot assume that all relations are functional.</Paragraph>
    <Paragraph position="3"> The second drawback is that precision, the criterion for selecting patterns, is not the only factor that makes a pattern effective. We call a pattern effective if it links many different instance pairs in the excerpts found with a search engine.</Paragraph>
    <Paragraph position="4"> We use an ontology to model the information domain we are interested in. Our goal is to populate this ontology with the extracted information. In an ontology, instances of one class can be related by some relation R to multiple instances of some other class. For example, we can identify the classes 'movie' and 'actor' and the 'acts in' relation, which is a many-to-many relation. In general, multiple actors star in a single movie and a single actor stars in multiple movies.</Paragraph>
    <Paragraph position="5"> In this paper we present a domain-independent method to learn effective surface text patterns representing relations. Since not all patterns found are highly usable, we formulate criteria to select the most effective ones. We show how such patterns can be used to populate an ontology.</Paragraph>
    <Paragraph position="6"> The identification of effective patterns is important, since we want to perform as few queries to a search engine as possible to limit the use of its services.</Paragraph>
    <Paragraph position="7"> This paper is organized as follows. After defining the problem (Section 2) and discussing related work (Section 3), we present an algorithm to learn effective surface text patterns in Section 4. We discuss the application of this method in an ontology population algorithm in Section 5. In Section 6, we present some of our early experiments. Sec-</Paragraph>
  </Section>
</Paper>