<?xml version="1.0" standalone="yes"?>
<Paper uid="J92-1004">
  <Title>TINA: A Natural Language System for Spoken Language Applications</Title>
  <Section position="4" start_page="73" end_page="79" type="intro">
    <SectionTitle>
3. Evaluation Measures
</SectionTitle>
    <Paragraph position="0"> This section addresses some performance measures for a grammar, including coverage, portability, perplexity, and trainability. Perplexity, roughly defined as the geometric mean of the number of alternative word hypotheses that may follow each word in the sentence, is of particular concern in spoken language tasks. Portability and trainability concern the ease with which an existing grammar can be ported to a new task, as well as the amount of training data necessary before the grammar is able to generalize well to unseen data.</Paragraph>
    <Paragraph position="1">  Stephanie Seneff TINA: A Natural Language System for Spoken Language Applications To date, four distinct domain-specific versions of TINA have been implemented. The first version (TIMIT) was developed for the 450 phonetically rich sentences of the TIMIT database (Lamel et al. 1986). The second version (RM) concerns the Resource Management task (Pallett 1989) that has been popular within the DARPA community in recent years. The third version (VOYAGER) serves as an interface both with a recognizer and with a functioning database back-end (Zue et al. 1990). The VOYAGER system can answer a number of different types of questions concerning navigation within a city, as well as provide certain information about hotels, restaurants, libraries, etc., within the region. A fourth domain-specific version is under development for the ATIS (Air Travel Information System) task, which has recently been designated as the new common task for the DARPA community.</Paragraph>
    <Section position="1" start_page="74" end_page="74" type="sub_section">
      <SectionTitle>
3.1 Portability
</SectionTitle>
      <Paragraph position="0"> We tested ease of portability for TINA by beginning with a grammar built from the 450 TIMIT sentences and then deriving a grammar for the RM task. These two tasks represent very different sentence types. For instance, the overwhelming majority of the TIMIT sentences are statements, whereas the RM task is made up exclusively of questions and requests. The process of conversion to a new grammar involves parsing the new sentences one by one, and adding context-free rules whenever a parse fails.</Paragraph>
      <Paragraph position="1"> The person entering the rules must be very familiar with the grammar structure, but for the most part it is straightforward to identify and incrementally add missing rules.</Paragraph>
      <Paragraph position="2"> The parser identifies where in the sentence it fails, and also maintains a record of the successful partial parses. These pieces of information usually are adequate to pinpoint the problem. Once the grammar has been expanded to accomodate the new set of sentences, a subset grammar can be created automatically that only contains rules needed in the new domain, eliminating any rules that were particular to the original domain. It required less than one person-month to convert the grammar from TIMIT to the RM task.</Paragraph>
    </Section>
    <Section position="2" start_page="74" end_page="75" type="sub_section">
      <SectionTitle>
3.2 Perplexity and Coverage in RM Task
</SectionTitle>
      <Paragraph position="0"> A set of 791 sentences within the RM task have been designated as training sentences, and a separate set of 200 sentences as the test set. We built a subset grammar from the 791 parsed training sentences, and then used this grammar to test coverage and perplexity on the unseen test sentences. The grammar could parse 100% of the training sentences and 84% of the test sentences.</Paragraph>
      <Paragraph position="1"> A formula for the test set perplexity (Lee 1989) is: 13</Paragraph>
      <Paragraph position="3"> where the wi are the sequence of all words in all sentences, N is the total number of words, including an &amp;quot;end&amp;quot; word after each sentence, and P(wi I Wi--I~'''Wl) is the probability of the ith word given all preceding wordsJ 4 If all words are assumed equally likely, then P(wi \] wi-1,.., wl) can be determined by counting all the words that could follow each word in the sentence, along all workable partial theories. If the grammar contains probability estimates, then these can be used in place of the equally  likely assumption. If the grammar's estimates reflect reality, the estimated probabilities will result in a reduction in the total perplexity.</Paragraph>
      <Paragraph position="4"> An average perplexity for the 167 test sentences that were parsable was computed for the two conditions, without (Case 1) and with (Case 2) the estimated probabilities. The result was a perplexity of 368 for Case 1, but only 41.5 for Case 2, as summarized in Table 1. This is with a total vocabulary size of 985 words, and with a grammar that included some semantically restricted classes such as \[ship-name\] and \[readinesscategory\]. The incorporation of arc probabilities reduced the perplexity by a factor of nine, a clear indicator that a proper mechanism for utilizing probabilities in a grammar can help significantly. An even lower perplexity could be realized within this domain by increasing the number of semantic nodes. In fact, this is a trend that we have increasingly adopted as we move to new domains.</Paragraph>
      <Paragraph position="5"> We didn't look at the test sentences while designing the grammar, nor have we yet looked at those sentences that failed to parse. However, we decided to examine the parse trees for those sentences that produced at least one parse to determine the depth of the first reasonable parse. The results were essentially the same for the training and the test sentences, as shown in Table 2. Both gave a reasonable parse as either the first or second proposed parse 96% of the time. Two of the test sentences never gave a correct parse.</Paragraph>
    </Section>
    <Section position="3" start_page="75" end_page="76" type="sub_section">
      <SectionTitle>
3.3 Experiments within the VOYAGER Domain
</SectionTitle>
      <Paragraph position="0"> We have recently developed a subdomain for TINA that has been incorporated into a complete spoken language system called VOYAGER. The system provides directions on how to get from one place to another within an urban region, and also gives information such as phone number or address for places such as restaurants, hotels, libraries, etc. We have made extensive use of semantic filters within this domain, in order to reduce the perplexity of the recognition task as much as possible.</Paragraph>
      <Paragraph position="1"> To obtain training and test data for this task, we had a number of naive subjects use the system as if they were trying to obtain actual information. Their speech was recorded in a simulation mode in which the speech recognition component was  excluded. Instead, an experimenter in a separate room typed in the utterances as spoken by the subject. Subsequent processing by the natural language and response generation components was done automatically by the computer (Zue et al. 1989).</Paragraph>
      <Paragraph position="2"> We were able to'collect a total, of nearly 5000 utterances in this fashion. The speech material was then used to train the recognizer component, and the text material was used to train the natural language and back-end components.</Paragraph>
      <Paragraph position="3"> We designated a subset of 3312 sentences as the training set, and augmented the original rules so as to cover a number of sentences that appeared to stay within the domain of the back-end. We did not try to expand the rules to cover sentences that the back-end could not deal with, because we wanted to keep the natural language component tightly restricted to sentences with a likely overall success. In this way we were able to increase the coverage of an independent test set of 560 utterances from 69% to 76%, with a corresponding increase in perplexity, as shown in Table 3.</Paragraph>
      <Paragraph position="4"> Perplexity was quite low even without probabilities; this is due mainly to an extensive semantic filtering scheme. Probabilities decreased the perplexity by a factor of three, however, which is still quite significant. An encouraging result was that both perplexity and coverage were of comparable values for the training and test sets, as shown in the table.</Paragraph>
    </Section>
    <Section position="4" start_page="76" end_page="79" type="sub_section">
      <SectionTitle>
3.4 Generation Mode
</SectionTitle>
      <Paragraph position="0"> As mentioned previously, generation mode has been a very useful device for detecting overgeneralization problems in a grammar. After the addition of a number of semantically loaded nodes and semantic filters, the VOYAGER version of the grammar is now restricted mainly to sentences that are semantically as well as syntactically legitimate.</Paragraph>
      <Paragraph position="1"> To illustrate this point we show in Table 4 five examples of consecutively generated sentences. Since these were not selectively drawn from a larger set, they accurately reflect the current performance level.</Paragraph>
      <Paragraph position="2"> We also used generation mode to construct a word-pair grammar automatically for the recognizer component of our VOYAGER system. To do this, over 100,000 sentences were generated, and word-pair links were established for all words sharing the same terminal category (such as \[restaurant-name\], for all category-pairs appearing in the generated sentences. We could test completion by continuing until no new pairs were found. The resulting word pair grammar has a perplexity of over 70, in contrast to a perplexity of less than nine for the grammar used to construct it. This difference reflects the additional constraint of both the probabilities and the long-distance dependencies.</Paragraph>
      <Paragraph position="3">  At present, we have available at MIT two systems, VOYAGER and ATIS, involving specific application domains in which a person can carry on a dialog with the computer, either through spoken speech or through text input. In both of these systems, TINA provides the interface between the recognizer and the application back-end. In this section, I will describe our current interfaces between TINA and the recognizer and our future plans in this area. In addition, I will describe briefly how we currently translate the parse tree into a semantic frame that serves as the input to database access and text response generation. This aspect of the system is beyond the scope of this paper, and therefore it will not be covered in detail.</Paragraph>
      <Paragraph position="4"> The recognizer for these systems is the SUMMIT system (Zue et al. 1989), which uses a segmental-based framework and includes an auditory model in the front-end processing. The lexicon is entered as phonetic pronunciations that are then augmented to account for a number of phonological rules. The search algorithm is the standard Viterbi search (Viterbi 1967), except that the match involves a network-to-network alignment problem rather than sequence-to-sequence.</Paragraph>
      <Paragraph position="5"> When we first integrated this recognizer with TINA, we used a &amp;quot;wire&amp;quot; connection, in that the recognizer produced a single best output, which was then passed to TINA for parsing. A simple word-pair grammar constrained the search space. If the parse failed, then the sentence was rejected. We have since improved the interface by incorporating a capability in the recognizer to propose additional solutions in turn once the first one fails to parse (Zue et al. 1991) To produce these &amp;quot;N-best&amp;quot; alternatives, we make use of a standard A* search algorithm (Hart 1968, Jelinek 1976). Both the A* and the Viterbi search are left-to-right search algorithms. However, the A* search is contrasted with the Viterbi search in that the set of active hypotheses take up unequal segments of time. That is, when a hypothesis is scoring well it is allowed to procede forward, whereas poorer scoring hypotheses are kept on hold.</Paragraph>
      <Paragraph position="6"> We have thus far developed two versions of the control strategy, a &amp;quot;loosely coupled&amp;quot; system and a &amp;quot;tightly coupled&amp;quot; system. Both versions begin with a Viterbi search all the way to the end of the sentence, resulting in not only the first candidate solution but also partial scores for a large set of other hypotheses. If this first solution fails to parse, then the best-scoring partial theory is allowed to procede forward incrementally.</Paragraph>
      <Paragraph position="7"> In an A* search, the main issue is how to get an estimate of the score for the unseen portion of the sentence. In our case, we can use the Viterbi path to the end as the estimate of the future score. This path is guaranteed to be the best way to get to the end; however, it may not parse. Hence it is a tight upper bound on the true score for the rest of the sentence. The recognizer can continue to propose hypotheses until one  Stephanie Seneff TINA: A Natural Language System for Spoken Language Applications successfully parses, or until a quitting criterion is reached, such as an upper bound on N.</Paragraph>
      <Paragraph position="8"> Whereas in the loosely coupled system the parser acts as a filter only on completed candidate solutions (Zue et al. 1991), the tightly coupled system allows the parser to discard partial theories that have no way of continuing. Following the Viterbi search, each partial theory is first extended by the parser to specify possible next words, which are then scored by the recognizer. We have not yet made use of TINA'S probabilities in adjusting the recognizer scores on the fly, but we have been able to incorporate linguistic scores to resort N-best outputs, giving a significant improvement in performance (Goodine et al. 1991). Ultimately we want to incorporate TINA'S probabilities directly into the A* search, but it is as yet unclear how to provide an appropriate upper bound for the probability estimate of the unseen portion of the linguistic model.</Paragraph>
      <Paragraph position="9"> Once a parser has produced an analysis of a particular sentence, the next step is to convert it to a meaning representation form that can be used to perform whatever operations the user intended by speaking the sentence. We currently achieve this translation step in a second-pass treewalk through the completed parse tree. Although the generation of semantic frames could be done on the fly as the parse is being proposed, it seems inappropriate to go through all of that extra work for large numbers of incorrect partial theories, due to the uncertainty as to the identity of the terminal word strings inherent in spoken input.</Paragraph>
      <Paragraph position="10"> We have taken the point of view that all syntactic and semantic information can be represented uniformly in strictly hierarchical structures in the parse tree. Thus the parse tree contains nodes such as \[subject\] and \[dir-object\] that represent structural roles, as well as nodes such as \[on-street\] and \[a-school\] representing specific semantic categories. There are no separate semantic rules off to the side; rather, the semantic information is encoded directly as names attached to nodes in the tree.</Paragraph>
      <Paragraph position="11"> Exactly how to get from the parse tree to an appropriate meaning representation is a current research topic in our group. However, the method we are currently using in the ATIS domain (Seneff et al. 1991) represents our most promising approach to this problem. We have decided to limit semantic frame types to a small set of choices such as CLAUSE (for a sentence-level concept, such as request), PREDICATE (for a functional operation), REFERENCE (essentially proper noun), and QSET (for a set of objects). The process of obtaining a completed semantic frame amounts to passing frames along from node to node through the completed parse tree. Each node receives a frame in both a top-down and a bottom-up cycle, and modifies the frame according to specifications based on its broad-class identity (as one of noun, noun-phrase, predicate, quantifier, etc.). For example, a \[subject\] is a noun-phrase node with the label &amp;quot;topic.&amp;quot; During the top-down cycle, it creates a blank frame and inserts it into a &amp;quot;topic&amp;quot; slot in the frame that was handed to it. It passes the blank frame to its children, who will then fill it appropriately, labeling it as a QSET or as a REFERENCE. It then passes along to the right sibling the same frame that was handed to it from above, with the completed topic slot filled with the information delivered by the children.</Paragraph>
      <Paragraph position="12"> The raw frame that is realized through the treewalk is post-processed to simplify some of the structure, as well as to augment or interpret expressions such as relative time. For example, the predicate modifier in &amp;quot;flights leaving at ten a.m.&amp;quot; is simplified from a predicate leave to a modifier slot labeled departure-time. An expression such as &amp;quot;next Tuesday&amp;quot; is interpreted relative to today's date to fill in an actual month, date, and year. Following this post-analysis step, the frame is merged with references contained in a history record, to fold in information from the previous discourse.</Paragraph>
      <Paragraph position="13"> The completed semantic frame is used in ATIS both to generate an SQL (Structured Query Language) command to access the database and to generate a text output to be  Computational Linguistics Volume 18, Number 1 spoken in the interactive dialog. The SQL pattern is controlled through lists of frame patterns to match and query fragments to generate given the match. Text generation is done by assigning appropriate temporal ordering for modifiers on nouns and for the main noun. The modifiers are contained in slots associated with the QSET frame. Certain frames such as clock-time have special print functions that produce the appropriate piece of text associated with the contents.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>