<?xml version="1.0" standalone="yes"?> <Paper uid="J92-1004"> <Title>TINA: A Natural Language System for Spoken Language Applications</Title> <Section position="5" start_page="79" end_page="80" type="metho"> <SectionTitle> 5. Discussion </SectionTitle> <Paragraph position="0"> This paper describes a new natural language system that addresses issues of concern in building a fully integrated spoken language system. The formalism provides an integrated approach to representations for syntax and for semantics, and produces a highly constraining language model for a speech recognizer. The grammar includes arc probabilities reflecting the frequency of occurrence of patterns within the domain.</Paragraph> <Paragraph position="1"> These probabilities are used to control the order in which hypotheses are considered, and are trained automatically from a set of parsed sentences, making it straightforward to tailor the grammar to a particular need. Ultimately, one could imagine the existence of a very large grammar that could parse almost anything, which would be subsetted for a particular task simply by providing it with a set of example sentences within that domain.</Paragraph> <Paragraph position="2"> The grammar makes use of a number of other principles that we felt were important. First of all, it explicitly incorporates semantic categories into the parse tree, intermixed with syntactic ones, rather than having a set of semantic rules provided separately. The semantic nodes are dealt with in the same way as the syntactic nodes; the consequence is that the node names alone carry essentially all of the information necessary to extract a meaning representation from the sentence. The grammar is not a semantic grammar in the usual sense, because it does include high-level nodes of a syntactic nature, such as noun-clause, subject, predicate, etc.</Paragraph> <Paragraph position="3"> A second important feature is that unifications are performed in a one-dimensional framework. That is to say, features delivered to a node by a close relative (sibling/parent/child) are unified with particular feature values associated with that node. The x variable in an x-y relationship is not explicitly mentioned, but rather is assigned to be &quot;whatever was delivered by the relative.&quot; Thus, for example, a node such as [subject] unifies in exactly the same way, regardless of the rule under construction.</Paragraph> <Paragraph position="4"> Another important feature of TINA is that the same grammar can be run in generation mode, making up random sentences by tossing the dice. This has been found to be extremely useful for revealing overgeneralization problems in the grammar, as well as for automatically acquiring a word-pair grammar for a recognizer and for producing sentences to test the back-end capability.</Paragraph> <Paragraph position="5"> We discussed a number of different application domains, and gave some performance statistics in terms of perplexity, coverage, and overgeneralization within some of these domains. The most interesting result was obtained within the VOYAGER domain (see Sections 3.3 and 3.4). The perplexity (average number of words that can follow a given word) decreased from 70 to 28 to 8 as the language model changed from a word-pair grammar (derived from the same grammar) to the parser without probabilities to the parser with probabilities.</Paragraph>
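To make the generation mode described above concrete, here is a minimal sketch, assuming a toy probabilistic grammar: each nonterminal is expanded by sampling one of its rules according to trained arc probabilities, and terminal symbols are emitted as words. The grammar, rule probabilities, and function name are invented for the illustration and are not taken from TINA itself.

```python
import random

# Toy probabilistic grammar: each nonterminal maps to (probability, right-hand side)
# pairs.  All names and numbers here are hypothetical, chosen only to show the idea.
GRAMMAR = {
    "number":         [(0.8, ["hundreds-place", "tens-place"]),
                       (0.2, ["tens-place"])],
    "hundreds-place": [(0.75, ["digit", "hundred"]),
                       (0.25, ["a", "hundred"])],
    "tens-place":     [(0.6, ["tens", "digit"]),
                       (0.2, ["teens"]),
                       (0.2, ["oh", "digit"])],
    "digit":          [(0.5, ["four"]), (0.5, ["one"])],
    "tens":           [(1.0, ["forty"])],
    "teens":          [(1.0, ["fifteen"])],
}

def generate(symbol):
    """Expand a symbol by 'tossing the dice' over its rule probabilities."""
    if symbol not in GRAMMAR:                 # terminal: emit the word itself
        return [symbol]
    r, total = random.random(), 0.0
    for prob, rhs in GRAMMAR[symbol]:
        total += prob
        if r <= total:
            return [w for child in rhs for w in generate(child)]
    return [w for child in GRAMMAR[symbol][-1][1] for w in generate(child)]

for _ in range(3):
    print(" ".join(generate("number")))
```

Running such a sampler over a large grammar quickly exposes overgeneralization: implausible word sequences that the grammar nonetheless accepts show up directly in the output.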
<Paragraph position="6"> We currently have two application domains that can carry on a spoken dialog with a user. One, the VOYAGER domain (Zue et al. 1990), answers questions about places of interest in an urban area, in our case, the vicinity of MIT and Harvard University.</Paragraph> <Paragraph position="7"> The second one, ATIS (Seneff et al. 1991), is a system for accessing data in the Official Airline Guide and booking flights. Work continues on improving all aspects of these domains.</Paragraph> <Paragraph position="8"> Our current research is directed at a number of remaining issues. As of this writing, we have a fully integrated version of the VOYAGER system, using an A* search algorithm (Goodine et al. 1991). The parser produces a set of next-word candidates dynamically for each partial theory. We have not yet incorporated probabilities from TINA into the search, but they are used effectively to re-sort the final output sentence candidates. In order to incorporate the probabilities into the search, we need a tight upper bound on the future linguistic score for the unseen portion of each hypothesis; this is a current research topic in our group. We also plan to experiment with further reductions in perplexity based on discourse state. This should be particularly effective within the ATIS domain, where the system often asks directed questions about as yet unresolved particulars of the flight.</Paragraph> </Section> <Section position="6" start_page="80" end_page="84" type="metho"> <SectionTitle> 6. Appendix: Sample Grammar Illustrating Probability Calculation and Perplexity Computation </SectionTitle> <Paragraph position="0"> This appendix walks through a pedagogical example: parsing spoken digit sequences up to three digits long, as in &quot;three hundred and sixteen.&quot; Included are a set of initial context-free rules, a set of training sentences, an illustration of how to compute the path probabilities from the training sentences, and an illustration of both parsing and perplexity computation for a test sentence.</Paragraph>
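Before the worked numbers below, here is a minimal sketch of the count-and-normalize step the appendix describes: for each parent category, count which child follows which (including start and end markers) across the training parses, then divide each row of counts by its total. The function name and data layout are assumptions made for the illustration; the toy parses mirror the hundreds-place expansions listed in the appendix, so the start-to-digits probability comes out to the same 3/4 quoted in the text.

```python
from collections import defaultdict

def train_arc_probabilities(parses):
    """parses: list of (parent, [child1, child2, ...]) pairs taken from parse trees.
    Returns a nested dict: probs[parent][prev][next] = P(next | prev, parent)."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for parent, children in parses:
        sequence = ["start"] + children + ["end"]
        for prev, nxt in zip(sequence, sequence[1:]):
            counts[parent][prev][nxt] += 1

    probs = {}
    for parent, table in counts.items():
        probs[parent] = {}
        for prev, row in table.items():
            total = sum(row.values())
            probs[parent][prev] = {nxt: c / total for nxt, c in row.items()}
    return probs

# Toy training data: each entry is one observed expansion of "hundreds-place",
# following the sibling sequences gathered from the training sentences.
parses = [
    ("hundreds-place", ["digits", "hundred", "and"]),
    ("hundreds-place", ["digits"]),
    ("hundreds-place", ["digits"]),
    ("hundreds-place", ["a", "hundred"]),
]

probs = train_arc_probabilities(parses)
print(probs["hundreds-place"]["start"])   # {'digits': 0.75, 'a': 0.25}
print(probs["hundreds-place"]["digits"])  # {'hundred': 0.33..., 'end': 0.66...}
```

The start-to-digits arc within &quot;hundreds-place&quot; is 3/4, and the digits-to-end and digits-to-hundred arcs are 2/3 and 1/3, the same values that drive the parsing trace further on.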
<Paragraph position="1"> Since there are only five training sentences, a number of the arcs of the original grammar are lost after training. This is a problem to be aware of in building grammars from example sentences. In the absence of a sufficient amount of training data, some arcs will inevitably be zeroed out. Unless it is desired to intentionally filter these out as being outside of the new domain, one can insert some arbitrarily small probability for these arcs, using, for example, an N-gram back-off model (Katz 1987).</Paragraph> <Paragraph position="2"> The Grammar: (parentheses indicate optional elements)</Paragraph> <Paragraph position="4"> The training sentences: (with spoken form)
1: 144 &quot;one hundred and forty four&quot;
The training pairs for &quot;hundreds-place&quot; (gathering together all rules in (1, 2, 3, 5) above that have &quot;hundreds-place&quot; on the LHS):
digits, digits hundred, hundred and, and end
digits, digits end
digits, digits end
a, a hundred, hundred end
The count array for &quot;hundreds-place&quot;:
          digits  hundred  and  end  a   total
start        3       0      0    0   1     4</Paragraph> <Paragraph position="6"> The probability of a transition from start to digits, within the parent node &quot;hundreds-place,&quot; is just 3/4, the ratio of the number of times &quot;hundreds-place&quot; started with &quot;digits&quot; over the number of times it started with anything.</Paragraph> <Paragraph position="7"> Parsing the phrase &quot;four fifteen&quot; with the trained parser:
The initial stack:
hundreds-place|number, start      4/5
tens-place|number, start          1/5
After &quot;hundreds-place&quot; gets popped and expanded:
digits|hundreds-place, start      4/5 * 3/4
tens-place|number, start          1/5
a|hundreds-place, start           4/5 * 1/4  (this is a tie score with the above)
After &quot;digits|hundreds-place&quot; is popped and a match with &quot;four&quot; is found:
end|hundreds-place, digits        2/3  (given &quot;four&quot; with certainty)
hundred|hundreds-place, digits    1/3  (this is the word &quot;hundred&quot;)
tens-place|number, start          1/5
a|hundreds-place, start           4/5 * 1/4</Paragraph> <Paragraph position="9"> After &quot;end|hundreds-place, digits&quot; is popped, &quot;hundreds-place&quot; has a solution in hand, &quot;four.&quot; It now activates its only right sibling, &quot;tens-place.&quot; This is a different instance of &quot;tens-place&quot; from the one at the third place in the stack. Its left sibling is &quot;hundreds-place&quot; rather than &quot;start.&quot;
tens-place|number, hundreds-place 2/3
hundred|hundreds-place, digits    1/3
tens-place|number, start          1/5
a|hundreds-place, start           4/5 * 1/4
After &quot;tens-place&quot; is expanded, we have:
tens|tens-place, start            2/3 * 3/5
hundred|hundreds-place, digits    1/3
tens-place|number, start          1/5
a|hundreds-place, start           4/5 * 1/4
teens|tens-place, start           2/3 * 1/5
oh|tens-place, start              2/3 * 1/5
&quot;Tens&quot; and &quot;hundred&quot; will both get popped off and rejected, because there is no match with the word &quot;fifteen.&quot; &quot;Tens-place&quot; will also get popped, and eventually rejected, because nothing within &quot;tens-place&quot; matches the digit &quot;four.&quot; A similar fate meets the &quot;a&quot; hypothesis. Finally, &quot;teens&quot; will be popped off and matched, and &quot;end|tens-place, teens&quot; will be inserted at the top with probability 1.0. This answer will be returned to the parent, &quot;tens-place,&quot; and two new hypotheses will be inserted at the top of the stack as follows:
ones-place|number, tens-place     3/5
end|number, tens-place            2/5
[Figure A.1: Paths through the parse tree for the phrase &quot;four fifteen&quot; with associated probabilities derived from the training data.]</Paragraph> <Paragraph position="10"> After the first one is rejected, the second one finds a completed &quot;number&quot; rule and an empty input stream. The correct solution is now in hand. Notice that because &quot;teens&quot; was a relatively rare occurrence, a number of incorrect hypotheses had to be pursued before the correct one was considered.</Paragraph>
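The stack discipline traced above is essentially a best-first search: hypotheses are kept ordered by path probability, the most probable one is popped, and its expansions or word matches are pushed back with updated scores. The following sketch is a simplified illustration of that control loop, not TINA's actual parser: the grammar, its probabilities, and the hypothesis representation are reduced to the bare minimum and are assumptions made for the example.

```python
import heapq

# Toy probabilistic grammar: nonterminal -> list of (probability, right-hand side).
# The rules and numbers echo the flavour of the appendix example only.
RULES = {
    "number":         [(0.8, ["hundreds-place", "tens-place"]),
                       (0.2, ["tens-place"])],
    "hundreds-place": [(0.6, ["digit"]),
                       (0.4, ["digit", "hundred"])],
    "tens-place":     [(0.6, ["tens", "digit"]),
                       (0.2, ["teens"]),
                       (0.2, ["oh", "digit"])],
    "digit":          [(0.5, ["four"]), (0.5, ["one"])],
    "tens":           [(1.0, ["forty"])],
    "teens":          [(1.0, ["fifteen"])],
}

def best_first_parse(words, start="number"):
    """Pop the most probable partial hypothesis, expand or match it, and push the
    results back until one hypothesis consumes the whole input.  Returns the
    probability of the first (hence best) complete parse, or None."""
    # Heap entries: (negative path probability, words matched so far, pending symbols).
    heap = [(-1.0, 0, [start])]
    while heap:
        neg_p, matched, pending = heapq.heappop(heap)
        if not pending:
            if matched == len(words):          # complete parse, all input consumed
                return -neg_p
            continue                            # grammar finished before the input did
        first, rest = pending[0], pending[1:]
        if first in RULES:                      # nonterminal: expand by each rule
            for prob, rhs in RULES[first]:
                heapq.heappush(heap, (neg_p * prob, matched, rhs + rest))
        elif matched < len(words) and words[matched] == first:
            heapq.heappush(heap, (neg_p, matched + 1, rest))   # terminal matched
    return None

print(best_first_parse(["four", "fifteen"]))
```

In TINA the scores are the trained arc probabilities conditioned on parent and left sibling, and each hypothesis carries full parse context, but the pop-expand-push cycle is the same, so rare constructions such as &quot;teens&quot; surface only after more probable alternatives have been tried and rejected.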
<Paragraph position="11"> Computation of perplexity for the phrase &quot;four fifteen&quot;:</Paragraph> <Paragraph position="13"> These are the three transitions with associated probabilities, following the appropriate paths in Figure A.1, for this particular phrase. The resulting perplexity is higher than the norm for numbers given the grammar, again because of the rare occurrence of the &quot;teens&quot; node, as well as the fact that there is no ones-place. This example is a bit too simple; in general there would be multiple ways to get to a particular next word, and there are also constraints that kill certain paths and make it necessary to readjust probabilities on the fly. In practice, one must find all possible ways to extend a word sequence, computing the total path probability for each one, and then renormalize to assure that with probability 1.0 there is an advance to some next word. It is the normalized probability contribution of all paths that can reach the next word that is used to update the log P calculation.</Paragraph> </Section> </Paper>
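As a closing illustration of the log P bookkeeping described above, the sketch below turns a list of per-word transition probabilities into a perplexity figure: sum the log probabilities, average per word, and exponentiate. The probability values in the example are placeholders, not the actual numbers for &quot;four fifteen&quot;, and the function name is invented for the illustration.

```python
import math

def perplexity(word_probs):
    """word_probs: P(word | preceding words), one entry per word transition,
    already renormalized over all reachable next words.
    Perplexity is 2 raised to the average negative log2 probability."""
    log_p = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_p / len(word_probs))

# Hypothetical transition probabilities for a three-transition phrase
# (e.g. start -> w1, w1 -> w2, w2 -> end); the numbers are illustrative only.
print(perplexity([0.6, 0.04, 0.4]))
```

A phrase that forces the parser through a rarely trained node, as &quot;fifteen&quot; does through &quot;teens&quot; here, contributes a small transition probability and therefore pushes the phrase's perplexity above the norm for the grammar.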