<?xml version="1.0" standalone="yes"?> <Paper uid="A00-1039"> <Title>Unsupervised Discovery of Scenario-Level Patterns for Information Extraction</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 0 Introduction </SectionTitle> <Paragraph position="0"> The task of Information Extraction (IE) is the selective extraction of meaning from free natural language text (for general references on IE, see, e.g., Pazienza, 1997; MUC, 1995; MUC, 1993). &quot;Meaning&quot; is understood here in terms of a fixed set of semantic objects: entities, relationships among entities, and events in which entities participate. The semantic objects belong to a small number of types, all having fixed regular structure, within a fixed and closely circumscribed subject domain. The extracted objects are then stored in a relational database. In this paper, we use the nomenclature accepted in current IE literature: the term subject domain denotes a class of textual documents to be processed, e.g., &quot;business news,&quot; and scenario denotes the specific topic of interest within the domain, i.e., the set of facts to be extracted. One example of a scenario is &quot;management succession,&quot; the topic of MUC-6 (the Sixth Message Understanding Conference); in this scenario the system seeks to identify events in which corporate managers left their posts or assumed new ones.</Paragraph> <Paragraph position="1"> We will consider this scenario in detail in a later section describing experiments.</Paragraph> <Paragraph position="2"> IE systems today are commonly based on pattern matching. The patterns are regular expressions, stored in a &quot;pattern base&quot; containing a general-purpose component and a substantial domain- and scenario-specific component.</Paragraph> <Paragraph position="3"> Portability and performance are two major problem areas that are recognized as impeding widespread use of IE.
This paper presents a novel approach that addresses both of these problems by automatically discovering good patterns for a new scenario. The viability of our approach is tested and evaluated with an actual IE system.</Paragraph> <Paragraph position="4"> In the next section we describe the problem in more detail in the context of our IE system; sections 2 and 3 describe our algorithm for pattern discovery; section 4 describes our experimental results; and section 5 compares our approach with prior work and discusses the findings.</Paragraph> </Section> </Paper>