File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/91/h91-1066_intro.xml

Size: 3,088 bytes

Last Modified: 2025-10-06 14:05:01

<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1066">
  <Title>LEXICO-SEMANTIC PATTERN MATCHING AS A COMPANION TO PARSING IN TEXT UNDERSTANDING</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> The interpretation of large volumes of text poses many control problems, including limiting the complexity of analysis and ensuring the production of valid interpretations without considering too many possibilities. These problems are especially severe in processing news stories, where long sentences, information-rich news-style constructions, and the complex structure of events make normal syntax-first analysis especially impractical.</Paragraph>
    <Paragraph position="1"> Normal left-to-right syntactic parsing, in virtually all its forms, is a disaster for interpreting broad classes of extended texts.</Paragraph>
    <Paragraph position="2"> Multiple-path methods are haunted by attachment problems that can lead to a combinatoric explosion of paths, while simple deterministic methods bring on parser failures and problems in combining preferences. In previous work aimed at word sense coding of news stories [1], we have found that even heavy pruning of a multiple-path chart parsing strategy often leaves hundreds of parses to consider for a single sentence. Even worse, minor irregularities in linguistic structure or word usage bring on parser failures and inadequate interpretations.</Paragraph>
    <Paragraph position="3"> Better parsing strategies, including control using statistical data, flexible partial parsing, and recovery, can certainly help with some of these problems, but some of the easiest improvements in the control of parsing come from the creative use of pre-processing. Our system incorporates a lexico-semantic pattern matcher, which uses much of the same knowledge base as the parser and semantic interpreter but performs a global, superficial analysis of text prior to parsing. The design and implementation of the pattern matcher are simple; instead of concentrating on its details, this paper focuses on the functionality of pre-processing and its impact on parser control.</Paragraph>
    <Paragraph position="4"> Three aspects of pre-processing have particular promise for the quality and efficiency of later processing: tagging, template activation (including topic analysis), and segmentation (or bracketing). Tagging uses lexical data to constrain the part of speech and word senses of important words; template activation determines a set of possible templates, or frames; and segmentation associates portions of text with templates or template fillers. These techniques help the language analyzer to cope with the complexity of real text, both by reducing the combinatorics of parsing and by constraining word senses and attachment decisions. The following is a sample text taken from the development corpus of the MUC-3 message understanding evaluation 1, with the results of pre-processing after segmentation: Original text:</Paragraph>
  </Section>
</Paper>