<?xml version="1.0" standalone="yes"?>
<Paper uid="A88-1007">
  <Title>IMPROVED PORTABILITY AND PARSING THROUGH INTERACTIVE ACQUISITION OF SEMANTIC INFORMATION†</Title>
  <Section position="3" start_page="0" end_page="50" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> A major concern in designing a natural-language system is portability: It is advantageous to design a system in such a way that it can be ported to new domains with a minimum of effort.</Paragraph>
    <Paragraph position="1"> The level of effort required for such a port is considerably reduced if the system features a high degree of modularity. For example, if the domain-independent and domain-specific components of a system are clearly factored, only the domain-specific knowledge bases need be changed when porting to a new domain. Even if a system demonstrates such separation, however, the problem remains of acquiring this domain-specific knowledge.</Paragraph>
    <Paragraph position="2"> †This work has been supported in part by DARPA under contract N00014-85-C-0012, administered by the Office of Naval Research, and in part by National Science Foundation contract DCR-85-02205, as well as by Independent R&amp;D funding from System Development Corporation, now part of Unisys Corporation.</Paragraph>
    <Paragraph position="3"> One obvious benefit of acquiring domain-specific semantic information is rejecting parses generated by the syntactic component which are semantically anomalous. Using domain knowledge to rule out semantically anomalous parses is especially important when parsing with large, broad-coverage grammars such as ours: Our Prolog implementation of Restriction Grammar \[Hirschman1982, Hirschman1985\] includes about 100 grammar rules and 75 restrictions, and is based on Sager's Linguistic String Grammar \[Sager1981\]. It also includes a full treatment of sentential fragments and telegraphic message style. As a result of this extended coverage, many sentences receive numerous syntactic analyses. A majority of these analyses, however, are incorrect because they violate some semantic constraint.</Paragraph>
    <Paragraph position="4"> Let us take as an example the sentence High lube oil temperature believed contributor to unit failure. Two of the parses for this sentence could be paraphrased as: (1) The high lube oil temperature believed the contributor to the unit failure.</Paragraph>
    <Paragraph position="5"> (2) The high lube oil temperature was believed to be a contributor to the unit failure.</Paragraph>
    <Paragraph position="6"> but our knowledge of the domain (and common sense) tells us that the first parse is wrong, since temperatures cannot hold beliefs.</Paragraph>
    <Paragraph position="7"> It is only because of this semantic information that we know that parse (2) is correct, and that parse (1) is not, since we cannot rule out parse (1) on syntactic grounds alone. In fact, our grammar generates the incorrect parse before the correct one, since it produces full assertion parses before fragment parses. If the syntactic component has access to semantic knowledge,  however, many incorrect parses such as (1) will never be generated.</Paragraph>
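The filtering step described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation: the triple representation, the `DISALLOWED` store, and the function names are all hypothetical, chosen only to show how a disallowed co-occurrence pattern could rule out the anomalous reading (1) while letting reading (2) through.

```python
# Hypothetical store of co-occurrence patterns the system has learned
# to be semantically anomalous, reduced here to (subject, verb, object)
# head-word triples.
DISALLOWED = {
    ("temperature", "believe", "contributor"),  # temperatures hold no beliefs
}

def semantically_acceptable(triple):
    """Accept a parse's head triple unless it matches a disallowed pattern."""
    return triple not in DISALLOWED

# The two competing analyses of the example sentence, reduced to triples.
parse1 = ("temperature", "believe", "contributor")  # full-assertion reading (1)
parse2 = ("temperature", "be", "contributor")       # fragment reading (2)

analyses = [p for p in (parse1, parse2) if semantically_acceptable(p)]
print(analyses)  # only the second reading survives
```

In a real parser this check would be interleaved with rule application, so that analyses like (1) are pruned before being completed rather than filtered afterwards.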
    <Paragraph position="8"> How then can we collect the necessary semantic information about a domain? One traditional approach involves analysing a corpus of texts by hand, or perhaps even simply relying on one's intuitive knowledge of the domain in order to gather information about what relations can hold among domain entities. Several obvious drawbacks to these approaches are that they are time-consuming, error-prone, and incomplete. A more robust approach would be to use (semi-) automated tools designed to collect such information by cataloguing selectional patterns found in correct parses of sentences.</Paragraph>
    <Paragraph position="9"> However, our reasoning appears circular: The desired domain-specific information can only be obtained from analyses of correctly parsed sentences, but our goal is to restrict the parser to these correct analyses precisely by using this domain knowledge. In the example above, we need the semantic knowledge to rule out the first parse; but it is only by knowing that this parse is semantically anomalous that we can obtain the selectional information about the domain.</Paragraph>
    <Paragraph position="10"> One way to avoid this circularity is to bootstrap into a state of increasingly complete domain knowledge. We have implemented such a bootstrapping process in SPQR by incrementally collecting and storing domain-specific data gathered through interaction with the user. The data are in the form of selectional constraints expressed as allowable and unallowable syntactic co-occurrence patterns. All the data collected while parsing a set of sentences can then be used to help guide the parser to correct analyses and to decrease the search space traversed during future parsing. As the system's semantic knowledge becomes increasingly rich, we can expect it to demonstrate some measure of learning, since it will produce fewer incorrect analyses and present fewer queries to the user about the validity of syntactic patterns.</Paragraph>
    <Paragraph position="11"> A number of systems have been developed to assist the user in acquiring domain-specific knowledge, including TELI \[Ballard1986\], TEAM \[Grosz1983\],</Paragraph>
    <Paragraph position="13"> Related work has also been reported in \[Tomita1984\], as well as in \[Grishman1986\] and \[Hirschman1986a\]. The work described here differs from these previous efforts in several ways:1
* Since PUNDIT is not a natural-language interface or a database front end, but rather a full text-processing system, sentences analysed by PUNDIT are taken from corpora of naturally-occurring texts. The semantic information gathered is therefore empirically or statistically based, and not derived from sentences generated by a user.</Paragraph>
    <Paragraph position="14"> * The ellcltation of information from the user follows a highly structured, data-driven approach, yielding results which should be more reproducible and consistent among users.</Paragraph>
    <Paragraph position="15"> * Many systems have a clearly defined knowledge-acquisition phase which must be completed before the system can be effectively used or tested. We have chosen instead to adopt a paradigm of incremental knowledge acquisition.</Paragraph>
    <Paragraph position="16"> Our incremental approach is based on the assumption that gathering complete knowledge about a domain is an unattainable ideal, especially for a system which performs in-depth analysis of texts written in technical sublanguages: Even if one could somehow be assured of acquiring all conceivable knowledge about a domain, the system's omniscience would be transient, since the technical fields themselves are constantly changing, and thus require modifications to one's knowledge base. An incremental acquisition method therefore allows us to start from an essentially empty knowledge base. Each sentence parsed can add information about the domain, and the system thereby effectively bootstraps itself until its knowledge about the selectional patterns in a domain approaches completeness.</Paragraph>
    <Paragraph position="17"> In this paper we present SPQR, the component of the PUNDIT2 text-understanding system which is designed to acquire domain-specific selectional information \[Lang1987\]. We present in Section 2 the methodology we have adopted to collect and use selectional patterns, and then give in Section 3 some examples of the operation of our 1See \[Ballard1986\] for a detailed and informative comparison of TELI, TEAM, IRACQ, TQA, and ASK.</Paragraph>
    <Paragraph position="18"> 2PUNDIT (Prolog UNDerstands Integrated Text) is implemented in Quintus Prolog, and has been described in \[Hirschman1985\] and \[Hirschman1986b\] (syntax), \[Palmer1986\] (semantics), \[Dahl1986\] (discourse), and \[Passonneau1988\] (temporal analysis).</Paragraph>
    <Paragraph position="19">  module. We conclude by presenting some experimental results and discussing some future plans to extend the module.</Paragraph>
    <Paragraph position="20"> SPQR has been used in analysing texts in three domains: casualty reports (CASREPs) dealing with mechanical failures of starting air compressors (SACs are a component of a ship's engine), queries to a Navy ships database, and Navy sighting messages (RAINFORMs).</Paragraph>
  </Section>
</Paper>