<?xml version="1.0" standalone="yes"?>
<Paper uid="J92-1004">
  <Title>TINA: A Natural Language System for Spoken Language Applications</Title>
  <Section position="2" start_page="0" end_page="62" type="abstr">
    <SectionTitle>
1. Introduction and Overview
</SectionTitle>
    <Paragraph position="0"> Over the past few years, there has been a gradual paradigm shift in speech recognition research both in the U.S. and in Europe. In addition to continued research on the transcription problem, i.e., the conversion of the speech signal to text, many researchers have also begun to address the problem of speech understanding. 1 This shift is at least partly brought on by the realization that many of the applications involving human/machine interface using speech require an &quot;understanding&quot; of the intended message. In fact, to be truly effective, many potential applications demand that the system carry on a dialog with the user, using its knowledge base and information gleaned from previous sentences to achieve proper response generation. Current advances in research and development of spoken language systems 2 can be found, for example, in the proceedings of the DARPA speech and natural language workshops, as well as in publications from participants of the ESPRIT SUNDIAL project. Representative systems are described in Boisen et al. (1989), De Mattia and Giachin (1989), Niedermair (1989), Niemann (1990), and Young (1989).</Paragraph>
    <Paragraph position="1"> Spoken Language Systems Group, Laboratory for Computer Science, MIT, Cambridge MA 02139. This research was supported by DARPA under Contract N00014-89-J-1332, monitored through the Office of Naval Research. Footnote 1: Speech understanding research flourished in the U.S. in the 1970s under DARPA sponsorship. While &quot;understanding&quot; was one of the original goals, none of the systems really placed any emphasis on this aspect of the problem. Footnote 2: We will use the terms &quot;speech understanding systems&quot; and &quot;spoken language systems&quot; interchangeably. © 1992 Association for Computational Linguistics. Computational Linguistics, Volume 18, Number 1. A spoken language system relies on its natural language component to provide the meaning representation of a given sentence. Ideally, this component should also be useful for providing powerful constraints to the recognizer component in terms of permissible syntactic and semantic structures, given the limited domain. If it is to be useful for constraint, however, it must concern itself not only with coverage but also, and perhaps more importantly, with overgeneralization. In many existing systems, the ability to parse as many sentences as possible is often achieved at the expense of accepting inappropriate word strings as legitimate sentences. This had not been viewed as a major concern in the past, since systems were typically presented only with well-formed text strings, as opposed to errorful recognizer outputs.</Paragraph>
    <Paragraph position="2"> The constraints can be much more effective if they are embedded in a probabilistic framework. The use of probabilities in a language model can lead to a substantially reduced perplexity 3 for the recognizer. If the natural language component's computational and memory requirements are not excessive, and if it is organized in such a way that it can easily predict a set of next-word candidates, then it can be incorporated into the active search process of the recognizer, dynamically predicting possible words to follow a hypothesized word sequence, and pruning away hypotheses that cannot be completed in any way. The natural language component should be able to offer significant additional constraint to the recognizer, beyond what would be available from a local word-pair or bigram 4 language model, because it is able to make use of long-distance constraints in requiring well-formed whole sentences.</Paragraph>
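As a rough illustration of the perplexity measure and the bigram baseline mentioned above, the following Python sketch trains a bigram model on a toy corpus and computes test-set perplexity. The corpus, the function names, the boundary tokens, and the add-one smoothing are all assumptions made for illustration; none of them is taken from TINA.

```python
import math
from collections import Counter

BOS, EOS = "BOS", "EOS"  # hypothetical sentence-boundary tokens


def train_bigram(sentences):
    """Count contexts and bigrams over a list of tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {EOS}  # every token the model may predict
    for words in sentences:
        tokens = [BOS] + list(words) + [EOS]
        vocab.update(words)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(model, prev, cur):
    """Add-one smoothed P(cur | prev); the smoothing is an assumption."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))


def perplexity(model, sentences):
    """exp of the average negative log probability per predicted token."""
    log_sum, n = 0.0, 0
    for words in sentences:
        tokens = [BOS] + list(words) + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            log_sum += math.log(bigram_prob(model, prev, cur))
            n += 1
    return math.exp(-log_sum / n)


# Toy data, invented for this sketch.
train_sentences = [["show", "me", "flights"], ["show", "me", "fares"]]
model = train_bigram(train_sentences)
pp = perplexity(model, [["show", "me", "flights"]])
print(round(pp, 2))
```

A lower perplexity means the model leaves the recognizer fewer equally likely next-word choices on average. A bigram model of this kind sees only the preceding word, which is the limitation the long-distance constraints described above are meant to overcome.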
    <Paragraph position="3"> This paper describes a natural language system, TINA, which attempts to address some of these issues. The mechanisms were designed to support a graceful, seamless interface between syntax and semantics, leading to an efficient mechanism for constraining semantics. Grammar rules are written such that they describe syntactic structures at the high levels of a parse tree and semantic structures at the low levels. All of the meaning-carrying content of the sentence is completely encoded in the names of the categories of the parse tree, thus obviating the need for separate semantic rules. By encoding meaning in the structural entities of the parse tree, it becomes feasible to realize probabilistic semantic restrictions in an efficient manner. This also makes it straightforward to extract a semantic frame representation directly from an unannotated parse tree.</Paragraph>
    <Paragraph position="4"> The context-free rules are automatically converted to a shared network structure, and probability assignments are derived automatically from a set of parsed sentences.</Paragraph>
    <Paragraph position="5"> The probability assignment mechanism was deliberately designed to support an ability to predict a set of next-word candidates with associated word probabilities. Constraint mechanisms exist and are carried out through feature passing among nodes. A unique aspect of the grammar is that unification constraints are expressed one-dimensionally, being associated directly with categories rather than with rules. Syntactic and semantic fields are passed from node to node by default, thus making available by default the second argument to unification operations. This leads to a very efficient implementation of the constraint mechanism. Unifications introduce additional syntactic and semantic constraints such as person and number agreement and subject/verb semantic restrictions.</Paragraph>
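The idea of deriving probabilities from parsed sentences and using them to predict what may come next can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the tree encoding, the category names, the START and END markers, and the choice to condition each sibling on its parent and left sibling are all hypothetical choices made for illustration, since the text above says only that probabilities are derived automatically from parsed sentences.

```python
from collections import Counter, defaultdict

START, END = "START", "END"  # hypothetical markers for the edges of a rule


def count_transitions(tree, counts):
    """Count which sibling follows which, keyed by (parent, left sibling)."""
    label, children = tree[0], tree[1:]
    # A child is either a subtree (tuple) or a word (string).
    names = [c[0] if isinstance(c, tuple) else c for c in children]
    path = [START] + names + [END]
    for left, nxt in zip(path, path[1:]):
        counts[(label, left)][nxt] += 1
    for child in children:
        if isinstance(child, tuple):
            count_transitions(child, counts)


def train(trees):
    """Convert counts to relative frequencies for each (parent, left) context."""
    counts = defaultdict(Counter)
    for tree in trees:
        count_transitions(tree, counts)
    probs = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        probs[context] = {nxt: c / total for nxt, c in nexts.items()}
    return probs


def next_candidates(probs, parent, left):
    """Return the possible next siblings, most probable first."""
    options = probs.get((parent, left), {})
    return sorted(options.items(), key=lambda item: -item[1])


# Toy parsed sentences, invented for this sketch.
trees = [
    ("SENTENCE", ("SUBJECT", "flights"), ("PREDICATE", "leave")),
    ("SENTENCE", ("SUBJECT", "flights"), ("PREDICATE", "arrive")),
    ("SENTENCE", ("SUBJECT", "fares"), ("PREDICATE", "rise")),
]
probs = train(trees)
print(next_candidates(probs, "SUBJECT", START))
```

Because every rule with the same parent shares one table of transitions, contexts seen in different rules are pooled, which loosely mirrors the shared network structure described above. Querying a context returns a ranked candidate list of the kind a recognizer could use to prune its search.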
    <Paragraph position="6"> Stephanie Seneff, TINA: A Natural Language System for Spoken Language Applications. This paper is organized as follows. Section 2 contains a detailed description of the grammar and the control strategy, including syntactic and semantic constraint mechanisms. Section 3 describes a number of domain-dependent versions of the system that have been implemented, and addresses, within the context of particular domains, several evaluation measures, including perplexity, coverage, and portability. Section 4 briefly discusses two application domains involving database access in which the parser provides the link between a speech recognizer and the database queries. The last section provides a summary and a discussion of our future plans. There is also an appendix, which walks through an example grammar for three-digit numbers, showing how to train the probabilities, parse a sentence, and compute perplexity on a test sentence.</Paragraph>
  </Section>
</Paper>