<?xml version="1.0" standalone="yes"?> <Paper uid="H90-1025"> <Title>Session 6: ATIS Site Reports and General Discussion</Title> <Section position="2" start_page="0" end_page="123" type="abstr"> <SectionTitle> ATIS Pilot Corpus, and the MIT group monitored progress </SectionTitle> <Paragraph position="0"> in handling the utterances in each successive release, both in terms of parser coverage and agreement of the back-end responses with the canonical answers. These studies led Seneff to express concern &quot;that rules created to deal with utterances in one release don't seem to generalize well to new releases&quot;, a finding that may be related to other observations about the &quot;very high inter-speakers variability that accompanies low intra-speaker variability in linguistic structure&quot; (see, for example, [1]). While noting that working with the back end and generating SQL queries had required an inordinate amount of time, Seneff remarked that &quot;the idea of a common task involving booking flights is a good one&quot;, and that they &quot;look forward to ... integrating the natural language component with a recognizer&quot;. In a refreshing contrast to the other papers in the evening presentation (which focussed largely on diagnostic evaluations), Patti Price reported on studies at SRI (involving the ATIS relational database) which assessed the effects of changes in the simulations on the speech and language of the experimental subjects [4]. The stated goal of these studies is to &quot;design an appropriate human-machine interface&quot;. Price also noted that &quot;the greatest source of variability in the system is that across subjects&quot;. Five experiments were described for several data collection conditions. 
The SRI studies suggest that &quot;the goal of designing an appropriate spoken language system can conflict with the goal of collecting data for evaluation of spoken database queries&quot;, but that they &quot;believe that it is possible to find some ways to coordinate the two endeavors.&quot; In the last of the formal presentations in the session, Lew Norton described the Unisys ATIS domain system [5].</Paragraph> <Paragraph position="1"> The Unisys approach combines a number of elements: (1) the MIT SUMMIT speech recognition system, (2) the Unisys PUNDIT language understanding system, (3) a module termed QTIP (Query Translation and Interface Program), (4) the Intelligent Database Server (IDS), a &quot;general knowledge/database interface&quot; to mediate access to the database, (5) INGRES to access the ATIS relational database, and (6) a Dialogue Manager to integrate overall user-system interaction. [Note that the MIT system described by Seneff also made use of elements of the IDS component (i.e., the IDI portion).] In the Unisys &quot;diagnostic evaluation&quot; as reported by Norton, errors were noted due to several causes: (1) words not being in the lexicon, (2) problems in parsing, (3) problems in obtaining an appropriate semantic/pragmatic analysis, and (4) failure of the QTIP module to generate an appropriate call to the relational database. Like other sites, the Unisys researchers noted great inter-subject variability, with their system's &quot;success rate&quot; for the different subjects in the test set ranging from &quot;30% to 88%&quot;. 
Norton further noted that this finding suggests that there &quot;are a large number of different ways to ask essentially the same questions&quot;, and that &quot;a natural language understanding system will have to be trained on much larger volumes of data.&quot; This observation was further supported by data documenting the rate of incremental growth of the grammar, which appears to be much slower for ATIS than for the MIT Voyager domain.</Paragraph> <Paragraph position="2"> Following the presentations from BBN, CMU, MIT, SRI, and Unisys, some time was devoted to general discussion of issues raised in the afternoon and evening ATIS Sessions.</Paragraph> <Paragraph position="3"> Bob Moore underscored what a number of individuals had noted: that there had been insufficient time between the release(s) of the training data and the test data. [It is important to note that the relational ATIS database used in these studies had not been &quot;frozen&quot; until mid-April, and incremental releases of the training data took place during a six-week period during May and just prior to the release of the test data on June 15th.] Lynnette Hirschman noted that substantially more data should be made available for this domain, of the order of ten times more than to date.</Paragraph> <Paragraph position="4"> Victor Zue noted that proposals to extend the test methodology to accommodate context (such as those outlined by Bates and Hirschman) seemed attractive, but that all evaluations are, to some degree, subjective and that we need to plan on developing procedures for formal subjective evaluations. Victor likened the present approach to ATIS implementations to a &quot;shotgun&quot; approach, and expressed a preference for more focused or constrained scenarios and local implementations that might be regarded as &quot;rifle&quot; approaches. 
Victor also underscored what others had suggested: that a pooling of data from several sites may be the only practical way to gather the amount of data that appear to be needed.</Paragraph> <Paragraph position="5"> John Makhoul noted that the focus of the studies reported at this session should be seen as a vehicle for NL/SLS evaluation, not so much as an effort to develop real air travel information systems.</Paragraph> <Paragraph position="6"> Patti Price noted &quot;TI's Heroic Role&quot; in developing the ATIS relational database used for these studies, collecting spontaneous speech data and providing &quot;canonical answers&quot;. Charles Hemphill and his colleagues at TI worked very hard to provide data for the Pilot Corpus for both training and test purposes, and the ATIS studies at the several sites would not have been possible without the TI group's efforts.</Paragraph> <Paragraph position="7"> Charles Wayne closed the discussion by thanking the participants for the significant Spoken Language Systems progress in the ATIS task domain.</Paragraph> </Section> </Paper>