<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1033">
  <Title>Pattern Matching in a Linguistically-Motivated Text Understanding System</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Virtually all systems that participated in the Fifth Message Understanding Conference, MUC-5 \[1\], used finite-state (FS) pattern matching to some extent. This approach is well suited to two useful tasks: (1) treating simple, application-specific constructions that may not belong in a general grammar of the language, and (2) detecting constructions that, though grammatical, may be found more reliably using domain-specific patterns.</Paragraph>
    <Paragraph position="1"> For example, special-purpose FS subgrammars were widely used to recognize equipment names and company names efficiently and reliably. This illustrates (1) above. An illustration of (2) appears in the complex sentence below: Daio Paper Corp. said it will set up a cardboard factory in Ho Chi Minh City, Vietnam, jointly with state-run</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Cogido Paper Manufacturing Company.
</SectionTitle>
      <Paragraph position="0"> Any parser can easily err by failing to attach the modifier &quot;jointly&quot; to &quot;set up,&quot; and thereby miss the fact that a joint venture is being reported. One might argue that the sentence includes a discontiguous constituent (&quot;set up ... jointly&quot;). Nevertheless, it is easy to write a general pattern that handles the discontiguous constituent correctly for this domain.</Paragraph>
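      As a rough illustration of such a domain-specific pattern, the following minimal sketch uses a regular expression, which is one common way to write a finite-state pattern. It is not the pattern formalism used in PLUM or in any MUC-5 system; the names JOINT_VENTURE and find_joint_venture, and the exact wording the pattern accepts, are assumptions made only for this example.

```python
import re

# Hypothetical pattern (an assumption for illustration, not PLUM's notation):
# a form of "set up", then any material inside the same clause, then
# "jointly with" followed by the description of the partner.
JOINT_VENTURE = re.compile(
    r"\b(set(?:s|ting)? up)\b"   # group 1: the verb of the discontiguous constituent
    r"(?:[^.;]*?)"               # skipped text between the verb and its modifier
    r"\bjointly with\s+"         # the modifier that signals a joint venture
    r"(.+?)\.?$",                # group 2: the partner, without the final period
    re.IGNORECASE,
)


def find_joint_venture(sentence):
    """Return (verb, partner) if the sentence matches the pattern, else None."""
    match = JOINT_VENTURE.search(sentence.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


example = (
    "Daio Paper Corp. said it will set up a cardboard factory in "
    "Ho Chi Minh City, Vietnam, jointly with state-run "
    "Cogido Paper Manufacturing Company."
)
print(find_joint_venture(example))
```

      The pattern skips the intervening noun phrase and location without analyzing them, which mirrors the point that FS matching can ignore text that does not bear on the target relation. A real system would need a broader inventory of verbs and modifiers than this single pattern covers.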
      <Paragraph position="1"> Finite-state parsers perform simple operations, and they are fast.</Paragraph>
      <Paragraph position="2"> In data-extraction applications, where much of the input can be safely ignored, they provide an easy means to skip text without deep analysis. Some of the best-performing systems in MUC-5 relied heavily on the use of finite-state pattern-matching in crucial system components.</Paragraph>
      <Paragraph position="3"> However, there are several advantages in maintaining broad, linguistically-based coverage of a language, even in a data-extraction task. First, it allows well-defined linguistic structures to be recognized and represented in a domain-independent way. This provides a level of linguistic representation that can be used by other general linguistic components, such as a domain-independent discourse processor. In fact, this is a representational level that will probably be evaluated in the next Message Understanding Conference, MUC-6.</Paragraph>
      <Paragraph position="4"> Second, general linguistic coverage provides application independence. Different applications, such as data detection (information retrieval), can use the linguistic representations for various purposes. Achieving a synergistic operation of data-extraction and data-detection systems is one of the key goals of ARPA's TIPSTER Phase II project.</Paragraph>
      <Paragraph position="5"> Another intuitive advantage is portability. When a system is ported to a new application, a base level of understanding can be achieved very quickly, before any domain-specific patterns have to be added. This is possible because the bulk of the processing work is done by the domain-independent rules.</Paragraph>
      <Paragraph position="6"> BBN's data-extraction system, PLUM \[2\], showed consistently high-ranking performance in the MUC-3 \[3\], MUC-4 \[4\], and MUC-5 evaluations. We added two new finite-state pattern-matching modules to PLUM between MUC-4 and MUC-5, expecting a substantial payoff in performance. The surprising result, as measured on TIPSTER test data, was that although domain-specific pattern matching improved performance, in the English domains it yielded only a slight improvement over more general, linguistically-motivated techniques.</Paragraph>
      <Paragraph position="7"> In the next section, we further discuss the movement towards FS approximations in the community. We then describe the role of finite-state pattern-matching in BBN's PLUM system in more detail. Finally, we present experiments used to measure the resulting effect in PLUM.</Paragraph>
    </Section>
  </Section>
</Paper>