<?xml version="1.0" standalone="yes"?>
<Paper uid="M95-1014">
  <Title>The NYU System for MUC-6 or Where's the Syntax?</Title>
  <Section position="1" start_page="0" end_page="167" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Over the past five MUCs, New York University has clung faithfully to the idea that information extraction should begin with a phase of full syntactic analysis, followed by a semantic analysis of the syntactic structure.</Paragraph>
    <Paragraph position="1"> Because we have a good, broad-coverage English grammar and a moderately effective method for recovering from parse failures, this approach held us in fairly good stead.</Paragraph>
    <Paragraph position="2"> However, we were at a disadvantage with respect to groups which performed more local pattern recognition, in three regards:

1. Our systems were quite slow.

In processing the language as a whole, our system is operating with only relatively weak semantic preferences. As a result, the process of building a global syntactic analysis involves a large and relatively unconstrained search space and is consequently quite expensive. In contrast, pattern matching systems assemble structure "bottom-up" and only in the face of compelling syntactic or semantic evidence, in a (nearly) deterministic manner.</Paragraph>
    <Paragraph position="3"> Speed was particularly an issue for MUC-6 because of the relatively short time frame (1 month for training). With a slow system, which can analyze only a few sentences per minute, it is possible to perform only one or at best two runs per day over the full training corpus, severely limiting debugging.

2. Global parsing considerations sometimes led to local errors.

Our system was designed to attempt to generate a full sentence parse if at all possible. If not, it attempted a parse covering the largest substring of the sentence which it could. This global goal sometimes led to incorrect local choices of analyses; an analyzer which trusted local decisions could in many cases have done better.</Paragraph>
    <Paragraph position="4"> 3. Adding syntactic constructs needed for a new scenario was hard.

Having a broad-coverage, linguistically-principled grammar meant that relatively few additions were needed when moving to a new scenario. However, when specialized constructs did have to be added, the task was relatively difficult, since these constructs had to be integrated into a large and quite complex grammar.</Paragraph>
    <Paragraph position="5"> We considered carefully whether these difficulties might be readily overcome using an approach which was still based on a comprehensive syntactic grammar. It appeared plausible, although not certain, that problems (1) and (2) could be overcome within such an approach, by adopting a strategy of conservative parsing. A conservative parser would perform a reduction only if there was strong (usually, local) syntactic evidence or strong semantic support. In particular, chunking parsers, which built up small chunks using syntactic criteria and then assembled larger structures only if they were semantically licensed, might provide a suitable candidate. Ideally, a parser might learn which decisions could be safely made based purely on syntactic evidence, but building such a parser would be a substantial research project not to be lightly undertaken in the months leading up to a MUC.</Paragraph>
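The two-pass conservative strategy described above can be sketched roughly as follows. This is a hypothetical illustration, not the NYU or FASTUS implementation; the chunk rules, tag names, and semantic-licensing table are invented for the example:

```python
# Sketch of a conservative "chunking" parse: first build small chunks
# deterministically on strong local syntactic evidence, then combine
# adjacent chunks only when the combination is semantically licensed.
# All rules below are hypothetical illustrations.

CHUNK_RULES = {
    ("DET", "NOUN"): "NP",   # e.g. "the parser"
    ("ADJ", "NOUN"): "NP",
    ("VERB",): "VG",
    ("PREP",): "P",
}

# Semantic licensing: the only chunk pairs we trust enough to merge.
LICENSED = {("NP", "VG"): "S", ("P", "NP"): "PP"}

def chunk(tagged):
    """First pass: greedy, (nearly) deterministic chunking."""
    chunks, i = [], 0
    while i < len(tagged):
        for size in (2, 1):  # prefer the longest matching rule
            tags = tuple(t for _, t in tagged[i:i + size])
            if len(tags) == size and tags in CHUNK_RULES:
                chunks.append((CHUNK_RULES[tags], tagged[i:i + size]))
                i += size
                break
        else:
            chunks.append((tagged[i][1], [tagged[i]]))  # pass through
            i += 1
    return chunks

def combine(chunks):
    """Second pass: merge adjacent chunks only when licensed;
    otherwise leave them unattached rather than guess."""
    out = list(chunks)
    changed = True
    while changed:
        changed = False
        for i in range(len(out) - 1):
            pair = (out[i][0], out[i + 1][0])
            if pair in LICENSED:
                out[i:i + 2] = [(LICENSED[pair], [out[i], out[i + 1]])]
                changed = True
                break
    return out

sent = [("the", "DET"), ("parser", "NOUN"), ("failed", "VERB")]
result = combine(chunk(sent))
print([label for label, _ in result])  # prints ['S']
```

The point of the sketch is the asymmetry: the chunking pass commits only where local syntax is unambiguous, and attachment is deferred unless a (here, trivial) semantic table licenses it, so there is no global search over competing parses.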
    <Paragraph position="6"> In any case, problem (3) still loomed. Our Holy Grail, like that of many groups, is to eventually get the computational linguist out of the loop in adapting an information extraction system for a new scenario. This will be difficult, however, if the scenario requires the addition of some grammatical construct, albeit minor. It would require us to organize the grammar in such a way that limited additions could be made by non-specialists without having to understand the entire grammar -- again, not a simple task.</Paragraph>
    <Paragraph position="7"> All these considerations led us to conclude that we should "do a MUC" ourselves using the pattern matching approach, in order to better appreciate its strengths and weaknesses. In particular, we carefully studied the FASTUS system of Hobbs et al. [1], who have clearly and eloquently set forth the advantages of this approach. This approach can be viewed as a form of conservative parsing, although the high-level structures which are created are not explicitly syntactic. At the end of this paper we return to the question of the relation of pattern matching to approaches which use a comprehensive grammar.</Paragraph>
  </Section>
</Paper>