<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1656">
  <Title>Boosting Unsupervised Relation Extraction by Using NER</Title>
  <Section position="4" start_page="473" end_page="474" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> The IE systems most similar to URES are based on bootstrap learning: Mutual Bootstrapping (Riloff and Jones 1999), the DIPRE system (Brin 1998), and the Snowball system (Agichtein and Gravano 2000 ).</Paragraph>
    <Paragraph position="1"> (Ravichandran and Hovy 2002) also use bootstrapping, and learn simple surface patterns for extracting binary relations from the Web.</Paragraph>
    <Paragraph position="2"> Unlike those unsupervised IE systems, URES patterns allow gaps that can be matched by any sequences of tokens. This makes URES patterns much more general, and allows to recognize instances in sentences inaccessible to the simple surface patterns of systems such as (Brin 1998; Riloff and Jones 1999; Ravichandran and Hovy 2002). The greater power of URES requires different and more complex methods for learning, scoring, and filtering of patterns.</Paragraph>
    <Paragraph position="3"> Another direction for unsupervised relation learning was taken in (Hasegawa, Sekine et al. 2004; Chen, Ji et al. 2005). These systems use a NER system to identify pairs of entities and then cluster them based on the types of the entities and the words appearing between the entities. Only pairs that appear at least 30 times were considered. The main benefit of this approach is that all relations between two entity types can be discovered simultaneously and there is no need for the user to supply the relations definitions. Such a system could have been used as a preliminary step to URES, however its relatively low precision makes it unfeasible. Unlike URES, the evaluations performed in these papers ignored errors that were introduced by the underlying NER component. The precision reported by these systems (77% breakeven for the COM-COM domain) is inferior to that of URES.</Paragraph>
    <Paragraph position="4"> We compared our results directly to two other unsupervised extraction systems, the Snowball (Agichtein and Gravano 2000 ) and KnowItAll. Snowball is an unsupervised system for learning relations from document collections. The system takes as input a set of seed examples for each relation, and uses a clustering technique to learn patterns from the seed examples. It does rely on a full fledged Named Entity Recognition system. Snowball achieved fairly low precision figures (30-50%) on relations such as Merger and Acquisition on the same dataset we used in our experiments.</Paragraph>
    <Paragraph position="5"> KnowItAll is a system developed at University of Washington by Oren Etzioni and colleagues (Etzioni, Cafarella et al. 2005). We shall now briefly describe it and its pattern learning component.</Paragraph>
    <Paragraph position="6"> Brief description of KnowItAll KnowItAll uses a set of generic extraction patterns, and automatically instantiates rules by combining those patterns with user supplied relation labels. For example, KnowItAll has patterns for a generic &amp;quot;of&amp;quot; relation:</Paragraph>
    <Paragraph position="8"> where NP1 and NP2 are simple noun phrases that extract values of attribute1 and attribute2 of a relation, and &lt;relation&gt; is a user-supplied string associated with the relation. The rules may also constrain NP1 and NP2 to be proper nouns.</Paragraph>
    <Paragraph position="9"> The rules have alternating context strings (exact string match) and extraction slots (typically an NP or head of an NP). Each rule has an associated query used to automatically find candidate sentences from a Web search engine.</Paragraph>
    <Paragraph position="10"> KnowItAll also includes mechanisms to control the amount of search, to merge redundant extractions, and to assign a probability to each extraction based on frequency of extraction or on Web statistics (Downey, Etzioni et al. 2004).</Paragraph>
    <Paragraph position="11"> KnowItAll-PL. While those generic rules lead to high precision extraction, they tend to have low recall, due to the wide variety of contexts describing a relation. KnowItAll includes a simple pattern learning scheme (KnowItAll-PL) that builds on the generic extraction mechanism (KnowItAll-baseline).</Paragraph>
    <Paragraph position="12"> Like URES, this is a self-supervised method that bootstraps from seeds that are automatically extracted by the baseline system. KnowItAll-PL creates a set of positive training sentences by downloading sentences that contain both argument values of a seed tuple and also the relation label. Negative training is created by downloading sentences with only one of the seed argument values, and considering a nearby NP as the other argument value. This does not guarantee that the negative example will actually be false, but works well in practice.</Paragraph>
    <Paragraph position="13"> Rule induction tabulates the occurrence of context tokens surrounding the argument values of the positive training sentences. Each candidate extraction pattern has a left context of zero to k tokens immediately to the left of the first argument, a middle context of all tokens between the two arguments, and a right context of zero to k tokens immediately to the right of the second argument. A pattern can be generalized by dropping the furthest terms from the left or right context. KnowItAll-PL retains the most general version of each pattern that has training frequency over a threshold and training precision over a threshold.</Paragraph>
  </Section>
class="xml-element"></Paper>