<?xml version="1.0" standalone="yes"?>
<Paper uid="H90-1021">
  <Title>The ATIS Spoken Language Systems Pilot Corpus</Title>
  <Section position="1" start_page="0" end_page="98" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Speech research has made tremendous progress in the past using the following paradigm:
* define the research problem,
* collect a corpus to objectively measure progress, and
* solve the research problem.</Paragraph>
    <Paragraph position="1"> Natural language research, on the other hand, has typically progressed without the benefit of any corpus of data with which to test research hypotheses. We describe the Air Travel Information System (ATIS) pilot corpus, a corpus designed to measure progress in Spoken Language Systems that include both a speech and natural language component. This pilot marks the first full-scale attempt to collect such a corpus and provides guidelines for future efforts.</Paragraph>
    <Paragraph position="2"> Introduction
The ATIS corpus provides an opportunity to develop and evaluate speech systems that understand spontaneous speech. This corpus differs from its predecessor, the Resource Management corpus (Price et al., 1988), in at least four significant ways.</Paragraph>
    <Paragraph position="3"> 1. Instead of being read, the speech has many of the characteristics of spontaneous spoken language (e.g., dysfluencies, false starts, and colloquial pronunciations).
2. The speech collection occurs in an office environment rather than a sound booth.</Paragraph>
    <Paragraph position="4"> 3. The grammar becomes part of the system under evaluation rather than a given part of the experiment.
4. The reference answer consists of the actual reply for the utterance rather than an orthographic transcription of the speech.</Paragraph>
    <Paragraph position="5"> The evaluation methodology supported by ATIS depends on having a comparable representation of the answer for each utterance. This is accomplished by limiting the utterances to database queries, and the answers to a ground set of tuples from a fixed relational database. The ATIS corpus comprises the acoustic speech data for a query, transcriptions of that query, a set of tuples that constitute the answer, and the SQL expression for the query that produced the answer tuples.</Paragraph>
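    <Paragraph> To make this structure concrete, the following Python sketch shows what one corpus record might look like. It is illustrative only: the field names, file path, SQL expression, and answer tuples are our assumptions, not the actual ATIS release format.
from dataclasses import dataclass, field

@dataclass
class AtisRecord:
    speech_file: str     # acoustic data for the query
    nl_input: str        # checked transcription of the query
    sql: str             # SQL expression that produced the answer
    answer: list = field(default_factory=list)  # tuples from the fixed database

record = AtisRecord(
    speech_file="session01/query03.wav",
    nl_input="Show me all the nonstop flights between Atlanta and Philadelphia.",
    sql="SELECT flight_id FROM flight "
        "WHERE from_airport = 'ATL' AND to_airport = 'PHL' AND stops = 0",
    answer=[("EA123",), ("DL456",)],
)
</Paragraph>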
    <Paragraph position="6"> The ATIS database consists of data obtained from the Official Airline Guide (OAG, 1990), organized under a relational schema. The database remained fixed throughout the pilot phase. It contains information about flights, fares, airlines, cities, airports, and ground services, and includes twenty-five supporting tables. The large majority of the questions posed by subjects can be answered from the database with a single relational query.</Paragraph>
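    <Paragraph> As a rough illustration of such a schema, the toy tables below capture the flavor of the flight, fare, airline, and airport entities. The table and column names are invented for this example and are not the actual twenty-five-table ATIS schema.
import sqlite3

# Build a miniature flight database in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airline (airline_code TEXT PRIMARY KEY, airline_name TEXT);
CREATE TABLE airport (airport_code TEXT PRIMARY KEY, airport_name TEXT, city TEXT);
CREATE TABLE flight (
    flight_id      TEXT PRIMARY KEY,
    airline_code   TEXT REFERENCES airline,
    from_airport   TEXT REFERENCES airport,
    to_airport     TEXT REFERENCES airport,
    departure_time TEXT,
    stops          INTEGER);
CREATE TABLE fare (
    fare_id      TEXT PRIMARY KEY,
    flight_id    TEXT REFERENCES flight,
    fare_class   TEXT,
    one_way_cost REAL);
""")
# Most subject questions map to a single relational query over tables like these.
</Paragraph>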
    <Paragraph position="7"> To collect the kind of English expected in a real working system, we simulate one. The subject, or "travel planner," is in one room, with those running the simulation in another. The subject speaks requests over a microphone and receives both a transcription of the speech and the answer on a computer screen. A session lasts approximately one hour, including detailed preliminary instructions and an exit questionnaire.</Paragraph>
    <Paragraph position="8"> Two "wizards" carry out the simulation: one transcribes the query while the other produces the answer. The transcriber interprets any verbal editing by the subject and removes dysfluencies in order to produce an orthographic transcription of what the subject intended to say. At the same time, the answerer uses a natural language-oriented command language to produce an SQL expression that elicits the correct answer for the subject. On-line utilities maintain a complete log of the session, including time stamps.</Paragraph>
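    <Paragraph> The paper does not specify the log format, but a minimal sketch of such a time-stamped session log, with invented event names and JSON-lines storage, might look like this:
import json
import time

def log_event(log_path, event, payload):
    """Append one time-stamped event (query, transcription, answer...) to the session log."""
    entry = {"time": time.time(), "event": event, "payload": payload}
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_event("session.log", "transcription_sent",
          {"text": "Show me all the nonstop flights between Atlanta and Philadelphia."})
log_event("session.log", "answer_sent", {"tuples": [["EA123"], ["DL456"]]})
</Paragraph>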
    <Paragraph position="9"> At the conclusion of the session, the utterances are sorted into categories to determine those utterances suitable for objective evaluation. Finally, each utterance receives three different transcriptions. First, a checked version of the transcription produced during the session provides an appropriate input string for evaluating text-based natural language systems. Second, a slightly expanded version of this serves as a prompt in collecting a read version of the spontaneously spoken sentences.</Paragraph>
    <Paragraph position="10"> Finally, a more detailed orthographic transcription represents the speech actually uttered by the subject, appropriate for use in acoustic modeling.</Paragraph>
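    <Paragraph> The contrast between the three transcriptions can be shown with an invented utterance; this example is ours and is not drawn from the corpus.
# Three views of one (hypothetical) query:
versions = {
    # checked session transcription: the input string for text-based NL systems
    "nl_input": "Show me fares on AA 123 from Dallas to Boston.",
    # expanded prompt version: abbreviations and numbers written out for reading
    "prompt": "Show me fares on American Airlines one twenty three from Dallas to Boston.",
    # detailed orthographic version: the speech as actually uttered
    "detailed": "uh show me [fares-] fares on A A one twenty three from Dallas to Boston",
}
</Paragraph>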
    <Section position="1" start_page="96" end_page="96" type="sub_section">
      <SectionTitle>
Corpus Collection
</SectionTitle>
      <Paragraph position="0"> About one session a day was conducted, using subjects recruited from within Texas Instruments. A typical session included approximately 20 minutes of introduction, 40 minutes of query time, and 10 minutes of follow-up.</Paragraph>
      <Paragraph position="1"> Each session resulted in two speech files for each query and a complete log of the session. Figure 1 depicts the session procedure.</Paragraph>
    </Section>
    <Section position="2" start_page="96" end_page="97" type="sub_section">
      <SectionTitle>
Session Introduction
</SectionTitle>
      <Paragraph position="0"> The subjects were given the following instructions, both orally and in writing: The Air Travel Information System (ATIS) is a prototype of a voice-input information retrieval system. It has the same information that is contained in the Official Airline Guide (OAG) to help you make air travel plans. We would like you to participate in a trial use of this experimental system.</Paragraph>
      <Paragraph position="1"> Subjects were not told whether the "experimental system" was totally automated or involved human intervention. The hope was that most subjects would believe the system was real, so that they would speak naturally.</Paragraph>
      <Paragraph position="2"> Subjects were informed about the contents of the relational database in a one-page summary. The summary described the major database entities in fairly general terms to avoid influencing the vocabulary used during the session. To dispel some misconceptions in advance, subjects were told that the database did not contain information about hotels or rental cars.</Paragraph>
      <Paragraph position="3"> The subject was next assigned a travel planning scenario, systematically chosen from a set of six scenarios designed to exercise various aspects of the database. For example, some scenarios focused on flight time constraints while others concentrated on fares. The scenarios did not specify particular times or cities, in an effort to make the scenario more personal to the subject. The following example illustrates this:
Plan the travel arrangements for a small family reunion. First pick a city where the get-together will be held. From 3 different cities (of your choice), find travel arrangements that are suitable for the family members who typify the "economy", "high class", and "adventurous" life styles.</Paragraph>
      <Paragraph position="4"> After receiving the scenario, subjects were left with the instructions and given five minutes to plan the details of the scenario. Subjects were given pen and paper on which to write the details and to take notes during the session.</Paragraph>
      <Paragraph position="5"> Finally, subjects were given instructions regarding the operation of the system. The "system", from the subject's perspective, consisted of a 19-inch color monitor running the X Window System and a head-mounted Sennheiser (HMD 410-6) microphone. A desk-mounted Crown (PCC-160 phase coherent cardioid) microphone was also used to record the speech. The "office" contained a SPARCstation CPU and disk to replicate office noise, and a wall map of the United States to help subjects solve their scenarios.</Paragraph>
      <Paragraph position="6"> The monitor screen was divided into two regions: a large, scrollable window for system output and a smaller window for speech interaction. The system used a "push-to-talk" input mechanism, whereby speech collection occurred while a suitably marked mouse button was depressed. Subjects were given the opportunity to cancel an utterance for a period of time equal to the length of the utterance.</Paragraph>
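      <Paragraph> The cancellation rule is simple enough to state in code. A minimal sketch, assuming the window is a plain duration comparison (the actual collection software is not described at this level):
def may_cancel(utterance_duration, seconds_since_release):
    """The cancel window equals the length of the utterance just recorded."""
    return utterance_duration >= seconds_since_release

# a 3.2-second utterance may be cancelled for 3.2 seconds after release
print(may_cancel(3.2, 2.0))   # True: still inside the window
print(may_cancel(3.2, 5.0))   # False: the window has closed
</Paragraph>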
      <Paragraph position="7"> A single sentence was used for all subjects to illustrate the push-to-talk mechanism and interaction with the system: Show me all the nonstop flights between Atlanta and Philadelphia.</Paragraph>
      <Paragraph position="8"> This sentence was processed as if the system actually responded to the utterance, including a transcription of the speech on the subject's display followed by the answer in table format.</Paragraph>
      <Paragraph position="9"> Session Queries
After the introduction, subjects were given approximately 40 minutes to complete the task described in the scenario. If they finished early, subjects were instructed to select another scenario or to explore the capabilities of the system. After the 40 minutes, subjects were given the opportunity to continue, finally ending the session by saying "all done".</Paragraph>
      <Paragraph position="10"> Once the actual session started, subjects cycled through thinking, querying, waiting, and writing. While the thinking portion of the session actually required the most time, the query portion required the most resources. Several things happened at once as a given subject spoke a query. While speech from both the head-mounted and desk-mounted microphones was recorded, one wizard began to transcribe the speech and the other wizard began to answer the query. A playback capability could be used if needed by the transcription wizard. The answer wizard was constrained not to send the answer before the transcription wizard finished the transcription. Typically, the subject received the typed transcription a few seconds after speaking and the answer approximately 20 seconds later.</Paragraph>
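      <Paragraph> The ordering constraint between the two wizards amounts to a simple synchronization rule. The sketch below uses threads and invented function names purely to illustrate it; the real setup involved two humans working in parallel.
import threading

transcription_done = threading.Event()

def send(channel, payload):
    print(channel, payload)   # stand-in for the subject's display

def transcription_wizard():
    send("transcription", "Show me all the nonstop flights between Atlanta and Philadelphia.")
    transcription_done.set()

def answer_wizard():
    transcription_done.wait()   # may not send the answer before the transcription
    send("answer", [("EA123",), ("DL456",)])

a = threading.Thread(target=answer_wizard)        # starts first, but blocks
t = threading.Thread(target=transcription_wizard)
a.start(); t.start(); a.join(); t.join()
</Paragraph>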
      <Paragraph position="11"> Each wizard had their own X Window terminal. The transcription wizard used a gnuemacs-based tool that checked the spelling of the transcription and sent the transcription to both the answer wizard and the subject. Despite the transcription wizard's best efforts, some transcription mistakes did reach the subject: occasionally words were omitted, inserted, or substituted (e.g., "fight" for "flight").</Paragraph>
      <Paragraph position="12"> The answer wizard used a tool called NLParse (Hemphill et al., 1987) to form the answers to the subjects' queries. This tool used a natural language-oriented command language to produce a set of tuples for the answer. NLParse provides a set of menus to help convey the limited coverage to the wizard. In practice, the answer wizard knew the coverage and used typing with escape completion to enter the appropriate NLParse command.</Paragraph>
      <Paragraph position="13"> NLParse provides several advantages as a wizard tool:
* every answerable query (with respect to the database) receives an answer,
* the NLParse query language avoids ambiguity,
* the wizard formulates the answer in terms of database entities, and
* the wizard can easily discern the correctness of the answer.</Paragraph>
      <Paragraph position="14"> However, the NLParse query language was not originally designed for rapid query entry, prompting several small grammar enhancements during the pilot.</Paragraph>
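      <Paragraph> NLParse's actual command syntax is not reproduced in this paper, so the toy translator below only illustrates the general idea of an unambiguous, menu-guided command form compiled deterministically to SQL; every name in it is invented.
def to_sql(verb, constraints):
    """Translate one invented command form into a SQL query string."""
    where = " AND ".join(f"{col} = {val!r}" for col, val in constraints.items())
    return f"SELECT flight_id FROM flight WHERE {where}"

print(to_sql("list flights",
             {"from_airport": "ATL", "to_airport": "PHL", "stops": 0}))
# SELECT flight_id FROM flight WHERE from_airport = 'ATL' AND to_airport = 'PHL' AND stops = 0
</Paragraph>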
      <Paragraph position="15"> The answer wizard's terminal also included a gnuemacs-based utility that created a session log. This included the transcription, the NLParse input, the resulting SQL expression, and the set of tuples constituting the answer. The answer wizard sent only the set of tuples to the subject.</Paragraph>
    </Section>
    <Section position="3" start_page="97" end_page="97" type="sub_section">
      <SectionTitle>
The ATIS Database
</SectionTitle>
      <Paragraph position="0"> The ATIS database was designed to model as much of a real-world resource as possible. In particular, we tried to model the printed OAG in a straightforward manner.</Paragraph>
      <Paragraph position="1"> With this approach, we could rely on travel data expertise from Official Airline Guides, Incorporated. We also used the data directly from the OAG and did not invent any data, since invented data is difficult to make realistic. Additionally, the printed OAG was available to all sites and provided a form of documentation for the database.</Paragraph>
      <Paragraph position="2"> The relational schema was designed to help answer queries in an intuitive manner, with no attempt to maximize the speech collected (e.g., by supplying narrow tables as answers). Toward this end, entities were represented with simple sets or lists in the most direct way.
Session Follow-Up
After the query phase of the session, subjects were given a brief questionnaire to let us know what they thought of the system. This consisted of the following ten questions, with possible answers of "yes", "maybe/sometimes", "no", or "no opinion":
1. Were you able to get the travel information you needed?
2. Were you satisfied with the way the information was presented?
3. Did the responses contain the kinds of information you were seeking?
4. Were the answers provided quickly enough?
5. Would you prefer this method to looking up the information in a book?
6. Did the system understand your requests the first time?
7. If the system did not understand you, could you easily find another way to get the information on a later try?
8. Was the travel planning scenario appropriate for a trial use of the system?
9. Do you think a person unfamiliar with computers could use the system easily?
10. Do you think a human was interpreting your questions?
After the questionnaire, the subjects were given a chance to ask questions, and were informed that the system was a simulation involving human intervention. Finally, we thanked our subjects with their choice of either a mug or a T-shirt.</Paragraph>
    </Section>
    <Section position="4" start_page="97" end_page="98" type="sub_section">
      <SectionTitle>
Corpus Processing
</SectionTitle>
      <Paragraph position="0"> After data collection, a rather elaborate series of processing steps was required before the subject's utterances actually became part of the corpus. A session resulted in a set of speech files and a session log that formed the raw materials for the corpus. Figure 2 illustrates the processing steps.</Paragraph>
      <Paragraph position="1"> Transcriptions
To facilitate use of the corpus, three transcriptions were provided with each query. A more detailed transcription document specifies the details of these; the rationale is explained below.</Paragraph>
      <Paragraph position="2"> * NL-input: This transcription is a corrected version of the on-the-fly session transcription, corrected while reviewing the subject's speech off-line. This transcription reflects the speech as the subject meant to say it, that is, with dysfluencies corrected.
[Figure 2: the corpus processing steps]
* ...punctuation and dysfluencies were removed, resulting in something resembling the NL-input transcription, but with abbreviations and numbers expanded.</Paragraph>
    </Section>
  </Section>
</Paper>