<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2208">
  <Title>Expanding the Recall of Relation Extraction by Bootstrapping</Title>
  <Section position="3" start_page="0" end_page="56" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Relation extraction is the task of extracting tuples of entities that satisfy a given relation from textual documents. Examples of relations include CeoOf(Company, Ceo) and Acquisition(Organization, Organization). There has been much work on relation extraction; most of it employs knowledge engineering or supervised machine learning approaches (Feldman et al., 2002; Zhao and Grishman, 2005). Both approaches are labor intensive.</Paragraph>
    <Paragraph position="1"> We begin with a baseline information extraction system, KnowItAll (Etzioni et al., 2005), that does unsupervised information extraction at Web scale.</Paragraph>
    <Paragraph position="2"> KnowItAll uses a set of generic extraction patterns, and automatically instantiates rules by combining these patterns with user-supplied relation labels. For example, KnowItAll has patterns for a generic &quot;of&quot; relation: NP1 's &lt;relation&gt; , NP2 and NP2 , &lt;relation&gt; of NP1, where NP1 and NP2 are simple noun phrases that extract values of argument1 and argument2 of a relation, and &lt;relation&gt; is a user-supplied string associated with the relation. The rules may also constrain NP1 and NP2 to be proper nouns.</Paragraph>
    <Paragraph position="3"> If a user supplies the relation labels &quot;ceo&quot; and &quot;chief executive officer&quot; for the relation CeoOf(Company, Ceo), KnowItAll inserts these labels into the generic patterns shown above, to create 4 extraction rules: NP1 's ceo , NP2; NP2 , ceo of NP1; NP1 's chief executive officer , NP2; NP2 , chief executive officer of NP1.</Paragraph>
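The instantiation step above can be sketched as follows. This is an illustrative simplification, not KnowItAll's actual implementation; the pattern syntax and function names are assumptions.

```python
# Sketch: combining generic "of"-relation patterns with user-supplied
# relation labels to produce extraction rules. NP1/NP2 stand for the
# noun-phrase extraction slots described in the text.
GENERIC_PATTERNS = [
    "NP1 's {label} , NP2",   # argument1 's <relation> , argument2
    "NP2 , {label} of NP1",   # argument2 , <relation> of argument1
]

def instantiate_rules(labels):
    """Insert each relation label into each generic pattern."""
    return [p.format(label=lab) for p in GENERIC_PATTERNS for lab in labels]

# Two labels x two patterns -> 4 rules for CeoOf(Company, Ceo)
rules = instantiate_rules(["ceo", "chief executive officer"])
for r in rules:
    print(r)
```

The same call with labels such as "mayor" would yield rules for a MayorOf relation, which is the point made in the next paragraph.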
    <Paragraph position="5"> The same generic patterns with different labels can also produce extraction rules for a MayorOf relation or an InventorOf relation. These rules have alternating context strings (exact string match) and extraction slots (typically an NP or head of an NP). This can produce rules with high precision, but low recall, due to the wide variety of contexts describing a relation. This paper looks at ways to enhance recall over this baseline system while maintaining high precision.</Paragraph>
    <Paragraph position="6"> To enhance recall, we employ bootstrapping techniques which start with seed tuples, i.e., the tuples most frequently extracted by the baseline system. The first method represents rules with three context strings of tokens immediately adjacent to the extracted arguments: a left context, middle context, and right context. These are induced from context strings found adjacent to seed tuples.</Paragraph>
    <Paragraph position="7"> The second method uses a less restrictive pattern representation such as bag of words, similar to that of Snowball (Agichtein, 2005). Snowball is a semi-supervised relation extraction system. The input of Snowball is a few hand-labeled correct seed tuples for a relation (e.g. &lt;Microsoft, Steve Ballmer&gt; for the CeoOf relation). Snowball clusters the bag-of-words representations generated from the context strings adjacent to each seed tuple, and generates rules from them. It calculates the confidence of candidate tuples and the rules iteratively by using an EM-algorithm. Because it can extract any tuple whose entities co-occur within a window, the recall can be higher than with the string pattern learning method. The main disadvantage of Snowball, or of any method which employs less restrictive patterns, is that it requires a Named Entity Recognizer (NER).</Paragraph>
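The bag-of-words idea can be sketched as follows. The Jaccard-style overlap used here is a deliberate simplification of Snowball's weighted vector match, chosen only to show why such patterns are less restrictive than exact string match.

```python
# Sketch: contexts around seed tuples are reduced to bags of words, and a
# candidate context is scored by its overlap with a learned pattern. Unlike
# exact string match, word order and exact phrasing no longer matter.
from collections import Counter

def bag_of_words(tokens):
    return Counter(t.lower() for t in tokens)

def similarity(bag_a, bag_b):
    """Jaccard-style overlap between two bag-of-words contexts."""
    inter = sum((bag_a & bag_b).values())
    union = sum((bag_a | bag_b).values())
    return inter / union if union else 0.0

seed_ctx = bag_of_words("its chief executive officer ,".split())
cand_ctx = bag_of_words("the chief executive officer of".split())
print(round(similarity(seed_ctx, cand_ctx), 2))  # 3 shared of 7 distinct tokens
```

An exact-match rule would reject the candidate context outright; the bag-of-words score lets it match partially, which is the source of the higher recall noted in the text.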
    <Paragraph position="8"> We introduce a Relation-dependent NER (Relation NER), which trains an off-the-shelf supervised NER based on CRF (Lafferty et al., 2001) with bootstrapping. This learns relation-specific NE tags, and we present a method to use these tags for relation extraction.</Paragraph>
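The self-supervision step behind such a Relation NER can be sketched as generating tagged training data from seed matches. This is a hedged illustration, not the paper's method: the BIO tag names and the exact-match heuristic are assumptions, and the resulting sequences would feed a CRF tagger rather than be used directly.

```python
# Sketch: sentences containing a seed tuple are turned into BIO-tagged
# training sequences for a supervised tagger (a CRF in the paper), giving
# relation-specific NE tags without manual annotation.
def bio_tags(tokens, seed_args):
    """Label occurrences of seed-argument token spans with BIO tags."""
    tags = ["O"] * len(tokens)
    for role, arg in seed_args.items():  # e.g. {"CEO": ["Steve", "Ballmer"]}
        for k in range(len(tokens) - len(arg) + 1):
            if tokens[k:k + len(arg)] == arg:
                tags[k] = "B-" + role
                for m in range(k + 1, k + len(arg)):
                    tags[m] = "I-" + role
    return tags

sent = "Steve Ballmer , CEO of Microsoft , said".split()
print(bio_tags(sent, {"CEO": ["Steve", "Ballmer"], "COMPANY": ["Microsoft"]}))
# -> ['B-CEO', 'I-CEO', 'O', 'O', 'O', 'B-COMPANY', 'O', 'O']
```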
    <Paragraph position="9"> This paper compares the following two bootstrapping strategies.</Paragraph>
    <Paragraph position="10"> SPL: a simple string pattern learning method. It learns string patterns adjacent to a seed tuple.</Paragraph>
    <Paragraph position="11"> LRPL: a less restrictive pattern learning method.</Paragraph>
    <Paragraph position="12"> It learns a variety of bag-of-words patterns, after training a Relation NER.</Paragraph>
    <Paragraph position="13"> Both methods are completely self-supervised extensions to the unsupervised KnowItAll. A user supplies KnowItAll with one or more relation labels to be applied to one or more generic extraction patterns. No further tagging or manual selection of seeds is required. Each of the bootstrapping methods uses seeds that are automatically selected from the output of the baseline KnowItAll system.</Paragraph>
    <Paragraph position="14"> The results show that both bootstrapping methods improve the recall of the baseline system. The two methods have comparable results, with LRPL outperforming SPL for some relations and SPL outperforming LRPL for others.</Paragraph>
    <Paragraph position="15"> The rest of the paper is organized as follows.</Paragraph>
    <Paragraph position="16"> Section 2 and 3 describe SPL and LRPL respectively. Section 4 reports on our experiments, and section 5 and 6 describe related works and conclusions. null</Paragraph>
  </Section>
</Paper>