<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1047">
  <Title>A Semantic Approach to IE Pattern Induction</Title>
  <Section position="2" start_page="0" end_page="379" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Developing systems which can be easily adapted to new domains with the minimum of human intervention is a major challenge in Information Extraction (IE). Early IE systems were based on knowledge engineering approaches but suffered from a knowledge acquisition bottleneck. For example, Lehnert et al. (1992) reported that their system required around 1,500 person-hours of expert labour to modify for a new extraction task.</Paragraph>
    <Paragraph position="1"> One approach to this problem is to use machine learning to automatically learn the domain-specific information required to port a system (Riloff, 1996). Yangarber et al. (2000) proposed an algorithm for learning extraction patterns from a small number of examples, which greatly reduced the burden on the application developer and eased the knowledge acquisition bottleneck.</Paragraph>
    <Paragraph position="2"> Weakly supervised algorithms, which bootstrap from a small number of examples, have the advantage of requiring only small amounts of annotated data, which is often difficult and time-consuming to produce. However, this also means that there are fewer examples of the patterns to be learned, making the learning task more challenging. Providing the learning algorithm with access to additional knowledge can compensate for the limited number of annotated examples. This paper presents a novel weakly supervised algorithm for IE pattern induction which makes use of the WordNet ontology (Fellbaum, 1998).</Paragraph>
    <Paragraph position="3"> Extraction patterns are potentially useful for many language processing tasks, including question answering and the identification of lexical relations (such as meronymy and hyponymy). In addition, IE patterns encode the different ways in which a piece of information can be expressed in text. For example, &quot;Acme Inc. fired Jones&quot;, &quot;Acme Inc. let Jones go&quot;, and &quot;Jones was given notice by his employers, Acme Inc.&quot; are all ways of expressing the same fact. Consequently, the generation of extraction patterns is pertinent to paraphrase identification, which is central to many language processing problems.</Paragraph>
    <Paragraph position="4"> We begin by describing the general process of pattern induction and an existing approach, based on the distribution of patterns in a corpus (Section 2).</Paragraph>
    <Paragraph position="5"> We then introduce a new algorithm which makes use of WordNet to generalise extraction patterns (Section 3) and describe an implementation (Section 4).</Paragraph>
    <Paragraph position="6"> Two evaluation regimes are described: one based on the identification of relevant documents and another which aims to identify sentences in a corpus which are relevant to a particular IE task (Section 5). Results on each of these evaluation regimes are then presented (Sections 6 and 7).</Paragraph>
  </Section>
</Paper>