<?xml version="1.0" standalone="yes"?>
<Paper uid="W91-0106">
  <Title>REVERSIBLE NLP BY DERIVING THE GRAMMARS FROM THE KNOWLEDGE BASE</Title>
  <Section position="3" start_page="0" end_page="40" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Most natural language processing systems are initially built in a single direction only; most are parsers (understanding systems), a few are generators.</Paragraph>
    <Paragraph position="1"> These systems are often then embedded in full, bi-directional interfaces, whereupon a new, almost non-technical kind of problem arises if differences in the two uni-directional subsystems are not controlled.</Paragraph>
    <Paragraph position="2"> The full system may not understand the same wording or syntactic constructions that it can generate; or generation and parsing development teams may both have to work on extensions and modifications to their grammars, with the likely result that still further differences will be introduced.</Paragraph>
    <Paragraph position="3"> These practical problems bolster an intuition that many have that knowledge of parsing and generation is the same knowledge in a person's mind, or at least that the two faculties draw on a single representation of their language even if it is engaged in different ways. This has led to the goal of reversible NLP systems. The common approach has been to take the computational artifact constructed by one of the single-direction projects, typically its grammar, and to aSapt it for use in the other direction. At ISI, for example, their massive systemic grammar for generation, NIGEL, (Mann &amp; Matthiessen 1985) has since been adapted for use as a parser (Casper 1989). With the conceptual basis of the transformation in place, the development of further extensions and modifications is done on the generation grammar, and then that grammar is retransformed to yield the new parsing grammar.</Paragraph>
    <Paragraph position="4"> The other well-known approach to reversible NLP is of course to use the very same computational artifact in both processing dkections. Thus far this artifact has invariably been a grammar, typically some kind of specification of the text-stream -logical form relation that can be used as a transducer or can supply the data for it.</Paragraph>
    <Paragraph position="5"> Parsers and generators draw on their grammars as their predominant knowledge source. The grammar thus becomes a bottleneck for the processing if it is not designed with efficiency of processing in mind.</Paragraph>
    <Paragraph position="6"> When virtually the same computational representation of the grammar is used in both processes and it is given an active role, e.g. when the grammar is couched in a unification formalism, this bottleneck can be substantial since the &amp;quot;common denominator&amp;quot; processing architecture that must be employed in order for the grammar to be literally usable by both processes will be markedly less efficient than architectures that work from single-direction representations of the grammar.</Paragraph>
    <Paragraph position="7"> By their nature as information processing systems, language understanding and generation are quite different kinds of processes. Understanding proceeds from texts to intentions. The &amp;quot;known&amp;quot; is the wording of the text and its intonation. From these, the understanding process constructs and deduces the propositional content conveyed by the text and the probable intentions of the speaker in producing it. Its primary effort is to scan the words of the text in sequence, during which the form of the text gradually unfolds. This requirement to scan forces the adoption of algorithms based on the management of multiple hypotheses and predictions that feed a representation that must be expanded dynamically. Major problems are caused by  ambiguity and under-specification (i.e. the audience typically receives more information from situationally motivated inferences than is conveyed by the actual text).</Paragraph>
    <Paragraph position="8"> In generation, information flows in the opposite direction from understanding. Generation proceeds from content to iform, from intentions and perspectives to linearly arrayed words and syntactic markers. A generator's &amp;quot;known&amp;quot; is its awareness of its intentions, its plgns, and the text it has already produced. Coupled with a model of the audience, the situation, and the discourse, this provides the basis for making choices among the alternative wordings and constructions that the language provides---the principal activity iff generation. Most generation systems do produce ;texts sequentially from left to right---just like an understanding system would scan it; but they do this only after having made decisions about the content and form of the text as a whole.</Paragraph>
    <Paragraph position="9"> Ambiguity in a generator's knowledge is not possible (indeed one of its problems is to notice that it has inadvertently introduced an ambiguity into the text).</Paragraph>
    <Paragraph position="10"> And rather than under-specification, a generator's problem is to choose from its over-supply of information what to include and what to omit so as to adequately signal its intended inferences to the audience.</Paragraph>
    <Paragraph position="11"> Our concern with efficiency---optimizing the two processes to fit their differing information processing characteristics---has led us to approach reversible NLP by al compilation-style route where the grammar that the processes use is not one artifact but two, each with its own representation that is deliberately tailored to the process that uses it. Like the system at ISI, our reversible knowledge source is grounded in the generation process and then projected, via a compiler, to create the representation used by the parser. The difference is that while ISI projected the grammar that the generator used, i.e. the set of system networks that is the model of the linguistic resources provided by the language and their dependencies, our system is a projection from the underlying application's conceptual model.</Paragraph>
    <Paragraph position="12"> In generation deghe starts with a set of objects representing individuals, relations, propositions, etc. that have been selected from the application program as its representation of the information it wants to communicate. Accordingly, the kind of knowledge that a generator must draw on most frequently is what are the options for realizing those objects linguistically. In Order to make this look-up efficient, one is naturally led to an architecture where Is stored directly with the definmons this knowledge &amp;quot; ~ .... of the objects or their classes, in effect distributing a highly lexicalized grammar over the knowledge base.</Paragraph>
  </Section>
class="xml-element"></Paper>