<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-1039">
  <Title>Unsupervised Discovery of Scenario-Level Patterns for Information Extraction</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
0 Introduction
</SectionTitle>
    <Paragraph position="0"> The task of Information Extraction (IE) is the selective extraction of meaning from free natural language text. (For general references on IE, see, e.g., Pazienza, 1997; MUC, 1995; MUC, 1993.) &quot;Meaning&quot; is understood here in terms of a fixed set of semantic objects: entities, relationships among entities, and events in which entities participate. The semantic objects belong to a small number of types, all having fixed regular structure, within a fixed and closely circumscribed subject domain. The extracted objects are then stored in a relational database. In this paper, we use the nomenclature accepted in current IE literature: the term subject domain denotes a class of textual documents to be processed, e.g., &quot;business news,&quot; and scenario denotes the specific topic of interest within the domain, i.e., the set of facts to be extracted. One example of a scenario is &quot;management succession,&quot; the topic of MUC-6 (the Sixth Message Understanding Conference); in this scenario the system seeks to identify events in which corporate managers left their posts or assumed new ones.</Paragraph>
    <Paragraph position="1"> We will consider this scenario in detail in a later section describing experiments.</Paragraph>
    <Paragraph position="2"> IE systems today are commonly based on pattern matching. The patterns are regular expressions, stored in a &quot;pattern base&quot; containing a general-purpose component and a substantial domain- and scenario-specific component.</Paragraph>
    <Paragraph position="3"> Portability and performance are two major problem areas recognized as impeding widespread use of IE. This paper presents a novel approach that addresses both problems by automatically discovering good patterns for a new scenario. We test and evaluate the viability of our approach with an actual IE system.</Paragraph>
    <Paragraph position="4"> In the next section we describe the problem in more detail in the context of our IE system; sections 2 and 3 describe our algorithm for pattern discovery; section 4 describes our experimental results, followed by comparison with prior work and discussion, in section 5.</Paragraph>
  </Section>
</Paper>