<?xml version="1.0" standalone="yes"?>
<Paper uid="X96-1029">
  <Title>THE ROLE OF SYNTAX IN INFORMATION EXTRACTION</Title>
  <Section position="2" start_page="0" end_page="139" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Our group at New York University has developed a number of information extraction systems over the past decade. In particular, we have been participants in the Message Understanding Conferences (MUCs) since MUC-1. During this time, while experimenting with many aspects of system design, we have retained a basic approach in which information extraction involves a phase of full syntactic analysis, followed by a semantic analysis of the syntactic structure \[2\]. Because we have a good, broad-coverage English grammar and a moderately effective method for recovering from parse failures, this approach held us in fairly good stead.</Paragraph>
    <Paragraph position="1"> However, we have recently found ourselves at a disadvantage with respect to groups which performed more local pattern matching, in three regards: 1. our systems were quite slow In processing the language as a whole, our system is operating with only relatively weak semantic preferences. As a result, the process of building a global syntactic analysis involves a large and relatively unconstrained search space and is consequently quite expensive. In contrast, pattern matching systems assemble structure &amp;quot;bottomup&amp;quot; and only in the face of compelling syntactic or semantic evidence, in a (nearly) deterministic manner.</Paragraph>
    <Paragraph position="2"> Speed was particularly an issue for MUC-6 because of the relatively short time frame (1 month for training). With a slow system, which can analyze only a few sentences per minute, it is possible to perform only one or at best two runs per day over the full training corpus, severely limiting debugging.</Paragraph>
    <Paragraph position="3"> 2. global parsing considerations sometimes led to local errors Our system was designed to generate a full sentence parse if at all possible. If not, it attempted a parse covering the largest substring of the sentence which it could. This global goal sometimes led to incorrect local choices of analyses; an analyzer which trusted local decisions could in many cases have done better.</Paragraph>
    <Paragraph position="4"> . adding syntactic constructs needed for a new scenario was hard Having a broad-coverage, linguistically-principled grammar meant that relatively few additions were needed when moving to a new scenario. However, when specialized constructs did have to be added, the task was relatively difficult, since these constructs had to be integrated into a large and quite complex grammar. null We considered carefully whether these difficulties might be readily overcome using an approach which was still based on a comprehensive syntactic grammar. It appeared plausible, although not certain, that problems (1) and (2) could be overcome within such an approach, by adopting a strategy of conservative parsing. A conservative parser would perform a reduction only if there was strong (usually, local) syntactic evidence or strong semantic support. In particular, chunking parsers, which built up small chunks using syntactic criteria and then assembled larger structures only if they were semantically licensed, might provide a suitable candidate.</Paragraph>
    <Paragraph position="5"> In any case, problem (3) still loomed. Our Holy Grail, like that of many groups, is to eventually get the computational linguist out of the loop in adapting an information extraction system for a new scenario. This will be difficult, however, if the scenario requires the addition of some grammatical construct, albeit minor. It would require us to organize the grammar in such a way that limited additions could be made  by non-specialists without having to understand the entire grammar -- again, not a simple task.</Paragraph>
    <Paragraph position="6"> In order to better understand the proper role of syntax analysis, we decided to participate in the most recent MUC, MUC-6 (held in the fall of 1995), using a quite different approach, often referred to as &amp;quot;pattern matching&amp;quot;, which has become increasingly popular among information extraction groups. In particular, we carefully studied the FASTUS system of Hobbs et al. \[1\], who have clearly and eloquently set forth the advantages of this approach. This approach can be viewed as a form of conservative parsing, although the high-level structures which are created are not explicitly syntactic.</Paragraph>
  </Section>
</Paper>