<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1014">
  <Title>Development and Preliminary Evaluation of the MIT ATIS System</Title>
  <Section position="4" start_page="0" end_page="91" type="metho">
    <SectionTitle>
SYSTEM DESCRIPTION
</SectionTitle>
    <Paragraph position="0"> In this section we will describe those aspects of the system that have changed significantly since our report last June [5].</Paragraph>
    <Paragraph position="1"> The most significant change has been the incorporation of the speech recognition component. We begin by describing the recognizer configuration and the interface mechanism we are currently using. In the natural language component, the parser and grammar remain unchanged, except for augmentations to improve coverage. However, we have completely redesigned the component that translates from a parse tree to executable SQL queries, and the component that generates verbal responses. Both of these areas are described here in more detail.</Paragraph>
    <Section position="1" start_page="0" end_page="91" type="sub_section">
      <SectionTitle>
Speech Recognition Component
</SectionTitle>
      <Paragraph position="0"> The speech recognition configuration is similar to the one used in the VOYAGER system and is based on the SUMMIT system [6]. For the ATIS task, we used 76 context-independent phone models trained on speaker-independent data collected at TI and MIT [3]. There were 1284 TI sentences (read and spontaneous versions of 642 sentences) and 1146 spontaneous sentences taken from the MIT training corpus. The lexicon was derived from the vocabulary used by the ATIS natural language component and consisted of 577 words. In order to provide some conservative natural language constraints, the speech recognition component used a generalized word-pair grammar derived from the speech training data, augmented with a large number of additional sentences pooled from all available sources of ATIS-related text material. The word-pair grammar was generated by parsing each sentence and then generalizing each word in a terminal node to all words in the same semantic class. Thus, for example, an instance of the word "Boston" would generalize to all cities. When a sentence did not parse, no additions were made to the word-pair grammar. When evaluated on the TI June-90 data set of 138 sentences, the word-pair grammar had a coverage of 70% and a perplexity of 92. Overall, 3.6% of the sentences that parsed failed to pass the word-pair grammar.</Paragraph>
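As a minimal sketch of the word-pair construction described above (not the recognizer's actual code), the Python below generalizes each word of a parsed training sentence to its semantic class, records every allowed successor, and then measures test-set coverage and perplexity. The semantic classes, the SENT_START/SENT_END boundary tokens, and the parses predicate standing in for the parser are illustrative assumptions, as is the perplexity definition used here (geometric-mean branching factor, assuming a uniform choice among allowed successors).

```python
import math

START, END = "SENT_START", "SENT_END"

# Hypothetical semantic classes for illustration only; in the described
# system the classes come from the natural language grammar.
SEMANTIC_CLASSES = {
    "city": ["boston", "denver", "oakland"],
    "airline": ["delta", "united"],
}
WORD_TO_CLASS = {w: c for c, words in SEMANTIC_CLASSES.items() for w in words}


def generalize(word):
    """Map a terminal word to every word in its semantic class (or itself)."""
    cls = WORD_TO_CLASS.get(word)
    return SEMANTIC_CLASSES[cls] if cls else [word]


def build_word_pair_grammar(sentences, parses):
    """Collect allowed successors from the sentences that parse.

    The parses predicate stands in for the parser; a sentence that fails
    to parse adds nothing to the grammar.
    """
    successors = {}
    for sentence in sentences:
        if not parses(sentence):
            continue
        tokens = [START] + sentence.split() + [END]
        for left, right in zip(tokens, tokens[1:]):
            for left_word in generalize(left):
                successors.setdefault(left_word, set()).update(generalize(right))
    return successors


def passes(successors, sentence):
    """True if every adjacent word pair in the sentence is allowed."""
    tokens = [START] + sentence.split() + [END]
    return all(right in successors.get(left, set())
               for left, right in zip(tokens, tokens[1:]))


def coverage_and_perplexity(successors, test_sentences):
    """Return (coverage, perplexity) on a test set.

    Perplexity is the geometric mean number of allowed successors over the
    covered sentences (an assumed definition for this sketch).
    """
    covered = [s for s in test_sentences if passes(successors, s)]
    log_sum, count = 0.0, 0
    for sentence in covered:
        for left in [START] + sentence.split():
            log_sum += math.log(len(successors[left]))
            count += 1
    coverage = len(covered) / len(test_sentences)
    perplexity = math.exp(log_sum / count) if count else float("inf")
    return coverage, perplexity


train = ["show flights from boston to denver", "show zzz"]
grammar = build_word_pair_grammar(train, parses=lambda s: "zzz" not in s)
print(coverage_and_perplexity(grammar, ["show flights from oakland to boston",
                                        "show fares"]))
```

Here "oakland" passes even though it never occurs in training: it inherits the successors learned for the other city names, which is the effect the generalization step is meant to have.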
      <Paragraph position="1"> The interface to the natural language component was implemented with the N-best mechanism we have described previously for the VOYAGER system [6]. In our original implementation, the back-end used the first N-best output that parsed to generate a response. Since our natural language component (TINA) is able to produce a parse probability derived from training data, we have tried to make use of this probability in selecting among the N-best outputs. In both the VOYAGER and ATIS domains, we have found that a linear combination of the acoustic score produced by SUMMIT and the parse score produced by TINA improved overall system performance [1]. In ATIS, the improvement in recognition accuracy was about 2% on the TI June-90 data set.</Paragraph>
      <Paragraph position="2"> In order to control the number of false alarms produced by the system, we investigated the use of several pruning measures that could be applied to the N-best outputs. To date, we have found the N-best rank and the relative acoustic score (relative to the first-choice output) to be effective parameters.</Paragraph>
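The selection and pruning steps described above can be combined in one routine. The following is a sketch under assumed conventions: scores are log-domain, the N-best list is sorted best-first by acoustic score, a hypothesis that fails to parse carries no parse score, and the weight and both pruning thresholds are placeholder values rather than the ones tuned for ATIS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    words: str
    acoustic_score: float         # assumed log-domain recognizer score
    parse_score: Optional[float]  # assumed log parse probability; None if no parse


def select_hypothesis(nbest, weight=0.5, max_rank=10, max_score_gap=50.0):
    """Return the chosen hypothesis, or None to reject the utterance.

    nbest must be sorted best-first by acoustic score. Entries ranked beyond
    max_rank, or trailing the first choice's acoustic score by more than
    max_score_gap, are pruned; among the parsable survivors, the highest
    acoustic_score + weight * parse_score wins.
    """
    if not nbest:
        return None
    best_acoustic = nbest[0].acoustic_score
    survivors = [
        hyp for rank, hyp in enumerate(nbest, start=1)
        if max_rank >= rank
        and max_score_gap >= best_acoustic - hyp.acoustic_score
        and hyp.parse_score is not None
    ]
    if not survivors:
        return None
    return max(survivors,
               key=lambda hyp: hyp.acoustic_score + weight * hyp.parse_score)


nbest = [
    Hypothesis("show flights to boston", -100.0, None),   # fails to parse
    Hypothesis("show flight to boston", -102.0, -20.0),
    Hypothesis("show flights to austin", -104.0, -2.0),
    Hypothesis("show flights to houston", -200.0, -1.0),  # pruned: gap of 100
]
print(select_hypothesis(nbest).words)              # show flights to austin
print(select_hypothesis(nbest, weight=0.0).words)  # show flight to boston
```

With the weight set to 0, the routine reduces to the original scheme of taking the first N-best entry that parses, because the list is already ordered by acoustic score.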
      <Paragraph position="3"> The ATIS Back-End: After reassessing the status of our ATIS system last June, we were concerned that the design of the back-end component might not be as easy to extend or port to new domains as we would like. We therefore decided to redesign the system, with the goal of emphasizing both system modularity and system portability. In choosing a design for the new system, we had two major goals. One was to design a semantic frame representation that would capture all necessary information from the sentence and serve as a focal point for all components of the back-end. The frame design should be flexible enough to extend to other domains. The second goal was to provide a mechanism that would permit the domain-dependent aspects of the system to be entered completely through table-driven mechanisms, without requiring any explicit programming.</Paragraph>
      <Paragraph position="4"> Processing of a sentence involves several steps. The first step is to provide a parse tree for the input word stream.</Paragraph>
      <Paragraph position="5"> A second-pass tree walk through the parse tree yields a semantic frame, which is then integrated with available frames from the history. Both an SQL query and a generated text response are derived from the completed frame. The verbal response is spoken to the subject, and a table is retrieved from the database through the database management system ORACLE. A table post-processing step converts the table into a much more readable and informative form prior to display.</Paragraph>
      <Paragraph position="6"> Finally, the system examines the goal plan and optionally initiates an additional response, based on its assessment of a likely next step.</Paragraph>
      <Paragraph position="7"> A thorough description of the dialogue and discourse aspects of the system, along with an example flight reservation dialogue, can be found in [4].</Paragraph>
      <Paragraph position="8"> The Semantic Frame: The parse outputs of TINA are first converted to a semantic frame representation which serves three critical roles, as shown in Figure 1: it is translated to SQL through table-driven pattern matching devices, it is delivered to a text-generation program to construct appropriate verbal responses, and it serves as input to the discourse history used to restore implicit information in subsequent queries and resolve explicit anaphoric references.</Paragraph>
      <Paragraph position="9"> Each frame is associated with a name, a type, and a set of (key: value) pairs. The value can be an integer, a string, a symbol, another frame, or a set of frames. There are only a small number of possible frame types, such as clause, predicate, qset (for common noun phrases), reference (for proper nouns), and quantifier. A frame of type reference always has a special key reftype associated with it, identifying the class of proper nouns it belongs to (e.g., city-name would be the reftype for "Boston"). Conversion of Parse Tree to Semantic Frame: The process of producing a semantic frame involves a second-pass tree walk through a completed parse tree. Only the names of the nodes are needed, because of the semantic nature of the grammar. In the tree walk, nodes pass along frames, modifying them if necessary according to the node's semantic significance. A completed semantic frame is ultimately returned to the top-level sentence node and delivered as-is to the back end for further processing.</Paragraph>
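A semantic frame as described above could be represented with a small data class. This is an illustrative sketch, not the system's data structure: the field names are assumptions, strings and symbols are both modeled as Python strings, and the type check accepts only the five frame types the text lists (the text says "such as", so the real inventory may be larger).

```python
from dataclasses import dataclass, field

# Frame types named in the text; the real inventory may include others.
FRAME_TYPES = {"clause", "predicate", "qset", "reference", "quantifier"}


@dataclass
class Frame:
    name: str
    frame_type: str
    # key -> int, str (strings and symbols alike), Frame, or list of Frames
    pairs: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.frame_type not in FRAME_TYPES:
            raise ValueError(f"unknown frame type: {self.frame_type}")
        if self.frame_type == "reference" and "reftype" not in self.pairs:
            raise ValueError("a reference frame needs a reftype key")


# Partial approximation of the frame for "Show me the price of a limousine
# to Oakland", using only keys named in the text (for, car-type, to); the
# string value "limousine" for car-type is an assumed encoding.
oakland = Frame("oakland", "reference", {"reftype": "city-name"})
auto = Frame("auto", "qset", {"car-type": "limousine", "to": oakland})
fare = Frame("fare", "qset", {"for": auto})
print(fare.pairs["for"].pairs["to"].pairs["reftype"])  # city-name
```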
      <Paragraph position="10"> About half of the nodes in the ATIS grammar have no semantic significance, and hence they simply pass whatever was delivered to them along to their children and then to their right sibling. Each of the active nodes is associated by name with a particular semantic name, which is often the same as its "given" name. Each semantic name is in turn associated with a particular function. There are fewer than twenty possible functions, and during the tree walk, the function to call is dictated by this association.</Paragraph>
      <Paragraph position="11"> Each function is called with three arguments: the semantic name, the subparse tree and the current frame.</Paragraph>
      <Paragraph position="12"> A simple example may help to clarify this process. The node named dir-object is associated with the semantic name theme, which calls the function process-noun-phrase. This function, during the top-down cycle, creates an empty frame of type qset and inserts it into the current frame under the key theme, as specified by the argument. It then passes the empty frame along to its children, which will fill it in. Finally, it passes the original frame to its right siblings, with a completed entry under the key theme.</Paragraph>
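The tree walk and the process-noun-phrase example can be sketched as follows. The node names (dir-object is from the text; head-noun, to-place, and dest-city are hypothetical), the two extra semantic functions, and the toy parse of "show flights to boston" are illustrative assumptions; only the control flow follows the text: passive nodes hand the frame to their children and then rightward, and each active node's function receives the semantic name, its subtree, and the current frame.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                     # grammar node name
    children: list = field(default_factory=list)
    word: str = ""                                # terminal word, if any


def walk_children(node, frame):
    """Hand the frame to each child in turn; each returns it for the next."""
    for child in node.children:
        frame = walk(child, frame)
    return frame


def process_noun_phrase(sem_name, node, frame):
    """Insert an empty qset under key sem_name, let the children fill it in,
    then return the original frame for the right sibling."""
    qset = {"type": "qset", "name": None}
    frame[sem_name] = qset
    walk_children(node, qset)
    return frame


def set_head_noun(sem_name, node, frame):
    """Hypothetical function: a head noun names the qset it was handed."""
    frame["name"] = node.word
    return frame


def add_city(sem_name, node, frame):
    """Hypothetical function: insert a city reference under key sem_name."""
    frame[sem_name] = {"type": "reference", "name": node.word,
                       "reftype": "city-name"}
    return frame


# Hypothetical association table: node name -> (semantic name, function).
SEMANTIC_NAMES = {
    "dir-object": ("theme", process_noun_phrase),
    "head-noun": ("head", set_head_noun),
    "dest-city": ("to", add_city),
}


def walk(node, frame):
    """Second-pass tree walk; returns the frame passed to the right sibling."""
    entry = SEMANTIC_NAMES.get(node.name)
    if entry is None:                 # no semantic significance: pass along
        return walk_children(node, frame)
    sem_name, function = entry
    return function(sem_name, node, frame)


tree = Node("statement", [
    Node("verb", word="show"),
    Node("dir-object", [
        Node("head-noun", word="flights"),
        Node("to-place", [Node("prep", word="to"),
                          Node("dest-city", word="boston")]),
    ]),
])
print(walk(tree, {"type": "clause", "name": "statement"}))
```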
      <Paragraph position="13"> Decoding the Frame: A completed semantic frame is passed to the back-end for interpretation. The top-level frame is always of type clause, and its name determines a particular clause-level analysis function to be executed. Options include request, statement, yes-no-question, clarifier, etc. For example, the function for a yes-no question makes two separate calls to the database. The first one determines the set of all objects as specified by the topic, and the second one finds the set defined by the topic restricted by the predicate (or complement). A final step seeks a non-null intersection between the two sets. There are three possible types of response, namely "There is no &lt;topic&gt;," "Yes, &lt;topic&gt; does do &lt;predicate&gt;," and "No, &lt;topic&gt; does not do &lt;predicate&gt;." Thus, to answer the question "Does the earliest flight serve lunch?" the system finds both the earliest flight and the earliest flight that serves lunch, and determines whether they are the same flight.</Paragraph>
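The yes-no decoding strategy can be sketched with an in-memory stand-in for the two database calls. The flight rows and the earliest helper are hypothetical, and as a simplification the topic is passed without its determiner so that all three response templates read naturally.

```python
# Hypothetical rows standing in for the flight table; the described system
# issues two SQL queries against the database instead.
FLIGHTS = [
    {"flight": "AA101", "depart": 700, "meals": {"breakfast"}},
    {"flight": "DL202", "depart": 1130, "meals": {"lunch"}},
    {"flight": "UA303", "depart": 1215, "meals": {"lunch"}},
]


def earliest(rows):
    """Set holding the earliest-departing flight (empty if there are no rows)."""
    if not rows:
        return set()
    return {min(rows, key=lambda row: row["depart"])["flight"]}


def answer_yes_no(topic_set, restricted_set, topic, predicate):
    """Pick one of the three response templates for a yes-no question.

    topic_set: objects matching the topic (first database call).
    restricted_set: objects matching the topic restricted by the predicate
    (second database call).
    """
    if not topic_set:
        return f"There is no {topic}."
    if set(topic_set).intersection(restricted_set):
        return f"Yes, the {topic} does {predicate}."
    return f"No, the {topic} does not {predicate}."


# "Does the earliest flight serve lunch?"
topic_set = earliest(FLIGHTS)
restricted = earliest([row for row in FLIGHTS if "lunch" in row["meals"]])
print(answer_yes_no(topic_set, restricted, "earliest flight", "serve lunch"))
# No, the earliest flight does not serve lunch.
```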
      <Paragraph position="14"> In addition to the high-level interpretation of clauses, some low-level routines serve to reorganize certain information in the frame as delivered by the parser. For example, there are many modifiers that can be attached to either flights or fares. We decided that it would be easier for later processing if all fare modifiers were physically transferred to a flight object, which is created if it did not exist explicitly in the sentence. Thus, if the person says, "Show fares from Boston to Denver," the sentence is converted into "Show fares for flights from Boston to Denver." In addition, phrases about time and date are regularized and turned into absolute references. Thus "the following Wednesday" is decoded as "the date which is on the subsequent Wednesday to the date stored in the history table." After the frame is properly restructured, it is sent off to the discourse module, which augments noun phrases (mainly flights and fares) with appropriate modifiers from the history.</Paragraph>
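Both restructuring steps described above are simple enough to sketch directly. In the following, which is illustrative rather than the system's code, the set of modifiers moved from a fare to its flight is a hypothetical choice, and "the following WEEKDAY" is assumed to mean the first such weekday strictly after the date stored in the history table.

```python
import datetime

# Same order as datetime.date.weekday(): Monday is 0.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Hypothetical set of fare modifiers that really describe the flight.
FLIGHT_MODIFIERS = {"from", "to", "airline"}


def move_fare_modifiers(fare):
    """Transfer flight-describing modifiers from a fare qset to a flight qset
    stored under the key "for", creating that flight object if needed."""
    moved = {key: fare.pop(key) for key in list(fare) if key in FLIGHT_MODIFIERS}
    if moved:
        flight = fare.setdefault("for", {"type": "qset", "name": "flight"})
        flight.update(moved)
    return fare


def following_weekday(history_date, weekday_name):
    """Resolve "the following WEEKDAY" to the first such day after history_date."""
    target = WEEKDAYS.index(weekday_name.lower())
    days_ahead = (target - history_date.weekday()) % 7 or 7
    return history_date + datetime.timedelta(days=days_ahead)


# "Show fares from Boston to Denver" becomes "fares for flights from ... to ..."
print(move_fare_modifiers({"type": "qset", "name": "fare",
                           "from": "boston", "to": "denver"}))
print(following_weekday(datetime.date(2024, 1, 1), "Wednesday"))  # 2024-01-03
```

The "or 7" term makes a request for the same weekday as the history date resolve to the next week rather than to the history date itself.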
      <Paragraph position="15"> SQL Query Generation Mechanism: All of the domain-dependent information needed to map frames into SQL queries is contained in a small set of tables, which are decoded through a simple artificial language involving a small number of special operations. The basic unit of recognition is a pattern of the form (name (key value-type)), where name is the name of the parent frame and value-type is the uniquely defined identifier for the value associated with the key. For example, the value-type of a qset is simply its name, the value-type of a reference is its reftype, and the value-type of a string is STRING.</Paragraph>
      <Paragraph position="16"> We will explain the interface between the semantic frame and the back end by walking through a simple example. The semantic frame derived from the sentence "Show me the price of a limousine to Oakland" is given in Figure 2, and the table entries needed to decode that frame are shown in a separate figure. The top-level display-table defines a set of elements to be displayed and the set of database tables in which to find these elements. For our simple example, the instructions are to display all elements in the ground_service table, given a qset named fare with a key named for whose value is a qset named auto.</Paragraph>
      <Paragraph position="17"> The final set of elements and tables to be displayed is constructed as the union of all sets whose patterns are matched in display-table. In some cases, entries from multiple tables must be displayed, and for these cases there is an additional table that defines how to link the two database tables.</Paragraph>
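The pattern notation and the display-table union can be illustrated as follows, assuming frames are plain dictionaries with type and name entries. The single display-table entry is the one described for the limousine example; representing "all elements" as "*" and encoding the car type as the string "limousine" are assumptions.

```python
# Frames are modeled as plain dicts carrying "type" and "name" (an assumption).


def value_type(value):
    """Value-type rule from the text: a qset contributes its name, a reference
    its reftype, and a string the token STRING."""
    if isinstance(value, str):
        return "STRING"
    if value["type"] == "qset":
        return value["name"]
    if value["type"] == "reference":
        return value["reftype"]
    raise ValueError(f"no value-type rule for frame type {value['type']}")


def patterns(frame):
    """Yield one (name, (key, value-type)) pattern per key of a frame."""
    for key, value in frame.items():
        if key not in ("type", "name", "reftype"):
            yield (frame["name"], (key, value_type(value)))


def display_spec(frame, display_table):
    """Union the element and table sets of every matched display-table entry."""
    elements, tables = set(), set()
    for pattern in patterns(frame):
        if pattern in display_table:
            entry_elements, entry_tables = display_table[pattern]
            elements.update(entry_elements)
            tables.update(entry_tables)
    return elements, tables


# The one display-table entry described in the text.
DISPLAY_TABLE = {("fare", ("for", "auto")): ({"*"}, {"ground_service"})}

fare = {
    "type": "qset", "name": "fare",
    "for": {"type": "qset", "name": "auto",
            "car-type": "limousine",  # assumed encoding
            "to": {"type": "reference", "name": "oakland",
                   "reftype": "city-name"}},
}
print(display_spec(fare, DISPLAY_TABLE))  # ({'*'}, {'ground_service'})
```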
      <Paragraph position="18"> The qset-table contains a set of patterns particular to frames of type qset, which trigger the augmentation of a simple database SQL query with a set of where-clauses. The system processes a top-level qset through recursive processing of possibly nested qsets. In our example, both the top-level fare and the auto entry under the for slot are qsets. The entry under fare that matches this pattern instructs the system to add to the parent query all the where-clauses generated by the auto qset, where the special code $1 stands for the argument.</Paragraph>
      <Paragraph position="19"> There are two entries under auto that are activated by our frame. The one matched by car-type constructs the where-clause for the unit, "where transport_code = 'L'," and the one under the key to constructs the where-clause for the city_code. The decoding of the city "Oakland" is done through the conversion-table, keyed by the special operator cvt (convert). The operator sql in the conversion-table triggers the construction of another SQL query, "select distinct city_code from city where city_name = 'OAKLAND'," which is inserted into the where-clause for city_code in ground_service. What is constructed through this decoding step is not the actual string used to call the database, but rather a hierarchy of structures representing queries and where-clauses, which can be converted to the query string by a printquery function, resulting in the SQL command shown in Figure 4.</Paragraph>
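The following sketch builds the query hierarchy for the limousine example and flattens it with a print_query function in the spirit of the printquery step. The Query and Where classes, the hard-coded stand-ins for the qset-table entries, and the mapping from "limousine" to transport code L are assumptions for illustration; with them, the output reproduces the SQL command given for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    table: str
    columns: str = "*"
    wheres: list = field(default_factory=list)


@dataclass
class Where:
    column: str
    value: object  # a literal string, or a nested Query for "in (...)"


def convert_city(name):
    """Stand-in for the conversion-table cvt/sql step: a city name becomes a
    sub-query that returns its city_code."""
    return Query("city", "city_code", [Where("city_name", name.upper())])


def auto_wheres(auto):
    """Stand-in for the two qset-table entries under auto (car-type and to)."""
    wheres = []
    if auto.get("car-type") == "limousine":  # assumed frame encoding
        wheres.append(Where("transport_code", "L"))
    if "to" in auto:
        wheres.append(Where("city_code", convert_city(auto["to"]["name"])))
    return wheres


def fare_query(fare):
    """Entry under fare: add the where-clauses generated by the auto qset ($1)."""
    query = Query("ground_service")
    query.wheres.extend(auto_wheres(fare["for"]))
    return query


def print_query(query):
    """Flatten the query hierarchy into an SQL string.
    Values are quoted naively; this is not safe against SQL injection."""
    clauses = []
    for where in query.wheres:
        if isinstance(where.value, Query):
            clauses.append(f"{where.column} in ({print_query(where.value)})")
        else:
            clauses.append(f"{where.column} = '{where.value}'")
    sql = f"select distinct {query.columns} from {query.table}"
    if clauses:
        sql += " where " + " and ".join(clauses)
    return sql


frame = {"type": "qset", "name": "fare",
         "for": {"type": "qset", "name": "auto", "car-type": "limousine",
                 "to": {"type": "reference", "name": "oakland",
                        "reftype": "city-name"}}}
print(print_query(fare_query(frame)))
```

Keeping the intermediate structure as objects rather than strings means a where-clause can be added, removed, or nested before anything is printed, which matches the motivation given for the printquery step.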
      Figure 4: SQL command for "Show me the price of a limousine to Oakland": select distinct * from ground_service where transport_code = 'L' and city_code in (select distinct city_code from city where city_name = 'OAKLAND')
      <Paragraph position="20"> The Table Display: We felt that in many cases the raw information from the database would not be readily comprehended without further transformation. Therefore, we wrote a set of conversion routines, each associated with a column heading, that make the table easier to understand. Thus a clock time would be converted from "1426" to "2:26 P.M.", an airline name from "DL" to "Delta", and a fare class from "QX" to "QX: coach class discounted weekday." In some cases, we felt the database column was sufficiently confusing that it was better to leave it out altogether, especially where the text response redundantly carried the information. For instance, we never display the column "flight days," since the verbal response will always say "on Tuesday" when appropriate. Likewise, we omit the flight-code column because it invites the user to refer to flights by their flight code using unpredictable language constructs.</Paragraph>
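The per-column conversions can be sketched as a table of converter functions. Only the three example conversions come from the text; the column names, the excerpted lookup tables, and the hidden-column list are hypothetical.

```python
# Hypothetical column names and lookup-table excerpts; only the three
# example conversions are taken from the text.
AIRLINE_NAMES = {"DL": "Delta"}
FARE_CLASS_NAMES = {"QX": "coach class discounted weekday"}
HIDDEN_COLUMNS = {"flight_days", "flight_code"}


def clock_time(hhmm):
    """Convert a 24-hour "HHMM" string such as "1426" to "2:26 P.M."."""
    hours, minutes = divmod(int(hhmm), 100)
    suffix = "P.M." if hours >= 12 else "A.M."
    return f"{hours % 12 or 12}:{minutes:02d} {suffix}"


def fare_class(code):
    """Expand a fare class code while keeping the code itself visible."""
    if code in FARE_CLASS_NAMES:
        return f"{code}: {FARE_CLASS_NAMES[code]}"
    return code


CONVERTERS = {
    "departure_time": clock_time,
    "airline_code": lambda code: AIRLINE_NAMES.get(code, code),
    "fare_class": fare_class,
}


def prettify(rows):
    """Apply per-column conversions and drop columns deemed confusing."""
    return [
        {column: CONVERTERS.get(column, lambda v: v)(value)
         for column, value in row.items() if column not in HIDDEN_COLUMNS}
        for row in rows
    ]


print(prettify([{"departure_time": "1426", "airline_code": "DL",
                 "fare_class": "QX", "flight_days": "DAILY"}]))
```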
      <Paragraph position="21"> Our paper on database collection [3] discusses the effects of this transformation on solicited speech.</Paragraph>
      <Paragraph position="22"> Verbal Response: A completed semantic frame is sent to a text generation program along with the database table indicating the answer. Text generation is mostly guided through tables, associating keys with both a print function and a positional specification within the parent frame's overall scheme. For example, adjectival modifiers precede the main noun, a flight-number immediately follows the main noun, and a post-modifier such as a relative clause or a gerund occurs at the end. Clause-level generation is done through specialized functions, each associated with a particular clause type, such as yes-no-question. The database table is used both to infer what should be said at the top level and to determine whether the noun phrase is singular or plural.</Paragraph>
      <Paragraph position="23"> Thus, for example, an existential clause would be required to produce one of "There is," "There are," or "There are no" preceding a noun phrase describing the intended flight set.</Paragraph>
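Positional ordering and the choice of existential opening can be sketched together. The positional table, the naive pluralization, and the use of "one" or an explicit count after "There is" or "There are" are assumptions made for illustration.

```python
# Hypothetical positional table: lower numbers are printed first.
POSITIONS = {
    "adjective": 0,        # adjectival modifiers precede the main noun
    "head": 1,             # the main noun itself
    "flight-number": 2,    # immediately follows the main noun
    "relative-clause": 3,  # post-modifiers go at the end
    "gerund": 3,
}


def noun_phrase(modifiers, plural):
    """Render (key, text) pairs in positional order, pluralizing the head.
    sorted() is stable, so modifiers sharing a position keep their order."""
    words = []
    for key, text in sorted(modifiers, key=lambda pair: POSITIONS[pair[0]]):
        if key == "head" and plural:
            text += "s"  # naive pluralization, for illustration only
        words.append(text)
    return " ".join(words)


def existential(modifiers, rows):
    """Choose the clause opening from the size of the answer table."""
    if not rows:
        return f"There are no {noun_phrase(modifiers, plural=True)}."
    if len(rows) == 1:
        return f"There is one {noun_phrase(modifiers, plural=False)}."
    return f"There are {len(rows)} {noun_phrase(modifiers, plural=True)}."


mods = [("gerund", "serving lunch"), ("head", "flight"), ("adjective", "nonstop")]
print(existential(mods, []))         # There are no nonstop flights serving lunch.
print(existential(mods, ["row1"]))   # There is one nonstop flight serving lunch.
```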
      <Paragraph position="24"> When a person asks a wh-query, such as "What meals do these flights serve?" the system detects the trace under the object of the verb "serve" and inserts the canned phrase "the following meals" into the verbal response. The database table is then displayed, providing the answer in a meals column.</Paragraph>
      <Paragraph position="25"> Other Aspects of the System: There are two other major components of the system that have not yet been discussed: the discourse history management system and the dialogue component. Both are described in detail in [4] and therefore will be only briefly mentioned here. Discourse is managed through a history table containing several types of elements derived from the previous sentences, including semantic frames identifying named objects such as flights and dates, display tables from the database, and, in the case of bookings, previous states of the ticket. Most of the history revolves around a flight-event object. A modifier is inherited from the history only if it is not explicitly mentioned in the current frame and none of its "masker" modifiers are present. For each history modifier, a set of maskers is specified in a table. We determined the masking conditions based on experience with real data. For example, if the subject asks about "non-stop" flights, then a connection-place would not be inherited. The most complex history management involves references to "return flights," in which a previously mentioned source and destination must be "swapped," unless the previous sentence also concerned return flights. In addition, only fare restrictions and airline should be inherited, along with source and destination; any previous reference to a date or a flight number is dropped when talking about return flights.</Paragraph>
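The masker rule and the return-flight handling can be sketched as a single merge function. Only the nonstop/connection-place masker comes from the text; the modifier names are assumptions, and return_flight and history_was_return are hypothetical flags that the real system would derive from the current and previous frames.

```python
# Masker table: history modifier -> current-frame modifiers that block it.
# Only the nonstop/connection-place entry comes from the text.
MASKERS = {"connection-place": {"nonstop"}}

# Modifiers carried over to a return-flight request, per the text.
RETURN_FLIGHT_KEEP = {"source", "destination", "fare-restriction", "airline"}
SWAP = {"source": "destination", "destination": "source"}


def inherit(current, history, return_flight=False, history_was_return=False):
    """Fill in a flight-event frame from the history table.

    A history modifier is inherited only if the current frame neither
    mentions it nor contains one of its maskers.
    """
    if return_flight:
        history = {k: v for k, v in history.items() if k in RETURN_FLIGHT_KEEP}
        if not history_was_return:  # swap only when changing direction
            history = {SWAP.get(k, k): v for k, v in history.items()}
    merged = dict(current)
    for key, value in history.items():
        if key in current:
            continue                                   # explicitly mentioned
        if MASKERS.get(key, set()).intersection(current):
            continue                                   # blocked by a masker
        merged[key] = value
    return merged


history = {"source": "boston", "destination": "denver", "date": "tuesday",
           "airline": "delta", "connection-place": "chicago"}
print(inherit({"nonstop": True}, history))      # connection-place is masked
print(inherit({}, history, return_flight=True))  # cities swapped, date dropped
```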
      <Paragraph position="26"> The computer essentially always gives a verbal response to the subject's question identifying the contents of the displayed table. Dialogue is maintained through a dialogue state stack, which is popped and evaluated after each input sentence is fully processed. A clear division is kept in the computer code between the subject's half of the conversation and the computer's half. During the analysis of the subject's contribution, the dialogue state may be modified, but none of the dialogue execution routines are called. Most of the time the dialogue stack is empty, and it rarely contains more than one previous state. Dialogue is used mostly during bookings, which involve a complex interplay between the subject and the computer. For example, if the subject says, "Book the cheapest flight," the system must remember that a booking is underway, but must first ask whether the subject wants a one-way or round-trip fare. Hence the stack becomes two deep at this point.</Paragraph>
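The division between analysis and execution can be sketched with a small stack-based class. The state names, the keyword matching used to detect a booking request or a fare-type answer, and the response strings are hypothetical; the point is that analyze only changes dialogue state, while respond pops and evaluates it.

```python
class DialogueManager:
    """Dialogue-state stack kept separate from the analysis of user input."""

    def __init__(self):
        self.stack = []   # pending dialogue states, most recent last
        self.slots = {}   # hypothetical store for answers such as fare type

    def analyze(self, sentence):
        """Subject's half: may modify dialogue state but executes nothing."""
        text = sentence.lower()
        if text.startswith("book"):
            self.stack.append("complete-booking")   # a booking is underway
            if "fare-type" not in self.slots:
                self.stack.append("ask-fare-type")  # must be resolved first
        elif "round trip" in text or "round-trip" in text:
            self.slots["fare-type"] = "round-trip"
        elif "one way" in text or "one-way" in text:
            self.slots["fare-type"] = "one-way"

    def respond(self):
        """Computer's half: pop and evaluate the top state, if any."""
        if not self.stack:
            return None   # the usual case: just answer the query
        state = self.stack.pop()
        if state == "ask-fare-type":
            return "Do you want a one-way or round-trip fare?"
        return f"Booking the {self.slots.get('fare-type', 'unspecified')} fare."


dialogue = DialogueManager()
dialogue.analyze("Book the cheapest flight.")
print(len(dialogue.stack))   # 2: the stack is two deep here
print(dialogue.respond())    # asks for the fare type
dialogue.analyze("Round trip, please.")
print(dialogue.respond())    # Booking the round-trip fare.
```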
    </Section>
  </Section>
</Paper>