<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1656">
  <Title>Boosting Unsupervised Relation Extraction by Using NER</Title>
  <Section position="3" start_page="0" end_page="473" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Information Extraction (IE) (Soderland 1999) is the task of extracting factual assertions from text.</Paragraph>
    <Paragraph position="1"> Most IE systems rely on knowledge engineering or on machine learning to generate extraction patterns, the mechanisms that extract entities and relation instances from text. In the machine learning approach, a domain expert labels instances of the target relations in a set of documents. The system then learns extraction patterns, which can be applied to new documents automatically.</Paragraph>
    <Paragraph position="2"> Both approaches require substantial human effort, particularly when applied to the broad range of documents, entities, and relations on the Web. In order to minimize the manual effort necessary to build Web IE systems, we have designed and implemented URES (Unsupervised Relation Extraction System).</Paragraph>
    <Paragraph position="3"> URES takes as input the names of the target relations and the types of their arguments. It then uses a large set of unlabeled documents downloaded from the Web in order to learn the extraction patterns.</Paragraph>
    <Paragraph position="4"> URES is most closely related to the KnowItAll system developed at the University of Washington by Oren Etzioni and colleagues (Etzioni, Cafarella et al. 2005), since both are unsupervised and both leverage relation-independent extraction patterns to automatically generate seeds, which are then fed into a pattern-learning component.</Paragraph>
    <Paragraph position="5"> KnowItAll is based on the observation that the Web corpus is highly redundant. Thus, its selective, high-precision extraction patterns readily ignore most sentences, and focus on sentences that indicate the presence of relation instances with very high probability.</Paragraph>
    <Paragraph position="6"> In contrast, URES is based on the observation that, for many relations, the Web corpus has limited redundancy, particularly when one is concerned with less prominent instances of these relations (e.g., the acquisition of Austria Tabak). Thus, URES utilizes a more expressive extraction pattern language, which enables it to extract information from a broader set of sentences.</Paragraph>
    <Paragraph position="7"> URES relies on a sophisticated mechanism to assess its confidence in each extraction, enabling it to sort extracted instances, thereby improving its recall without sacrificing precision.</Paragraph>
    <Paragraph position="8"> Our main contributions are as follows: * We introduce the first domain-independent system to extract relation instances from the Web with both high precision and high recall.</Paragraph>
    <Paragraph position="9"> * We show how to minimize the human effort necessary to deploy URES for an arbitrary set of relations, including automatically generating and labeling positive and negative examples of the relation.</Paragraph>
    <Paragraph position="10"> * We show how we can integrate a simple NER component into the classification scheme of URES in order to boost recall by 5-15% at similar precision levels.</Paragraph>
    <Paragraph position="11"> * We report on an experimental comparison between URES, URES-NER and the state-of-the-art KnowItAll system, and show that URES can double or even triple the recall achieved by KnowItAll for relatively rare relation instances.</Paragraph>
    <Paragraph position="12"> The rest of the paper is organized as follows: Section 2 describes previous work. Section 3 outlines the general design principles of URES, its architecture, and then describes each URES component in detail. Section 4 presents our experimental evaluation. Section 5 contains conclusions and directions for future work.</Paragraph>
  </Section>
</Paper>