STOCHASTIC REPRESENTATION OF 
CONCEPTUAL STRUCTURE IN THE ATIS TASK 
Roberto Pieraccini, Esther Levin, Chin-Hui Lee 
AT&T Bell Laboratories 
600 Mountain Avenue 
Murray Hill, NJ 07974 
ABSTRACT 
We propose a model for a statistical representation of the conceptual 
structure in a restricted subset of spoken natural language. The model 
is used for segmenting a sentence into phrases and labeling them with 
concept relations (or cases). The model is trained using a corpus of 
annotated transcribed sentences. The performance of the model was 
assessed on two tasks, including DARPA ATIS class A sentences. 
INTRODUCTION 
The goal of a speech understanding system is generally that of 
translating a sequence of acoustic measurements of the speech sig- 
nal into some form that represents the meaning conveyed by the 
sentence. One of the knowledge representation paradigms, known 
as semantic networks [2], establishes relations between conceptual 
entities using a graph structure. These concept relations, or lin- 
guistic cases, can be used to label different parts of a sentence 
in order to obtain its interpretation. The task itself defines the 
set of relevant cases. For instance, for the task of assigning the 
origin, the destination and the departure time of a flight, a con- 
venient representation is in terms of the following set of cases: 
$\mathcal{C} = \{C_1, C_2, C_3, C_4\}$, where $C_1$ = ORIGIN, $C_2$ = DESTINATION, 
$C_3$ = DEPARTURE_TIME, and $C_4$ = DUMMY. The introduction 
of a DUMMY case is useful for covering all the parts of the sentence 
that are not relevant to the task. A sentence like "I would 
like to fly from Boston to Chicago next Saturday night" can be 
analyzed as: 
• DUMMY: I would like to fly 
• ORIGIN: from Boston 
• DESTINATION: to Chicago 
• DEPARTURE_TIME: next Saturday night. 
Note that although the first phrase (I would like to fly) conveys 
important information, it is considered irrelevant to this partic- 
ular task, and therefore assigned to the DUMMY case. The 
segmentation of a sentence into cases (conceptual segmentation) 
can be described by labeling each word in the sentence with the 
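index of the case it expresses. In the example above, the conceptual 
segmentation is represented by the following sequence of 
labels: 
$$C = (c_1, c_2, c_3, \ldots, c_{12}) \tag{1}$$
where: 
$$\begin{aligned}
c_1 = c_2 = \cdots = c_5 &= C_4 \\
c_6 = c_7 &= C_1 \\
c_8 = c_9 &= C_2 \\
c_{10} = c_{11} = c_{12} &= C_3
\end{aligned} \tag{2}$$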
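As a minimal illustration of this word-level labeling, the following Python sketch groups consecutive words that share a case index into labeled phrases. The function name `segments` and the `CASES` dictionary are illustrative choices, not part of the original system; the labels follow the segmentation of the example sentence given above.

```python
from itertools import groupby

# Case inventory of the example: index -> case name.
CASES = {1: "ORIGIN", 2: "DESTINATION", 3: "DEPARTURE_TIME", 4: "DUMMY"}


def segments(words, labels):
    """Group consecutive words sharing a case index into (case, phrase) pairs."""
    if len(words) != len(labels):
        raise ValueError("one label per word is required")
    out = []
    for case_index, group in groupby(zip(labels, words), key=lambda pair: pair[0]):
        phrase = " ".join(word for _, word in group)
        out.append((CASES[case_index], phrase))
    return out


words = "I would like to fly from Boston to Chicago next Saturday night".split()
labels = [4, 4, 4, 4, 4, 1, 1, 2, 2, 3, 3, 3]  # one case index per word
for case, phrase in segments(words, labels):
    print(f"{case}: {phrase}")
```

The label sequence and the phrase-level segmentation carry the same information, so a decoder only needs to output one case index per word.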
In this paper we tackle the problem of decoding the words con- 
stituting the spoken sentence and the corresponding sequence of 
case labels, from the speech signal. 
MAP DECODING OF CASES 
Let us denote by 
$$A = a_1, a_2, \ldots, a_N, \tag{3}$$
the sequence of acoustic observations extracted from a spoken 
sentence, by 
$$W = w_1, w_2, \ldots, w_M, \tag{4}$$
the sequence of words constituting the sentence, and by 
$$C = c_1, c_2, \ldots, c_M, \tag{5}$$
the sequence of case labels, where $c_i$ takes its values from a predefined 
set of conceptual relations $\mathcal{C} = \{C_1, C_2, \ldots, C_K\}$. The 
problem of finding $W$ and $C$ given $A$ can be approached using 
the maximum a posteriori decoding (MAP). Following this criterion 
we want to find the sequence of words $\hat{W}$ and the sequence 
of cases $\hat{C}$ that maximizes the conditional probability 
$$P(\hat{W}, \hat{C} \mid A) = \max_{W \times C} P(W, C \mid A). \tag{6}$$
This conditional probability can be written using the Bayes inversion 
formula as: 
$$P(W, C \mid A) = \frac{P(A \mid W, C)\, P(W \mid C)\, P(C)}{P(A)} \tag{7}$$
In this formula $P(C)$ represents the a-priori probability of the 
sequence of cases, $P(W \mid C)$ is the probability of a sentence 
expressing a given sequence of cases, and $P(A \mid W, C)$ is the 
acoustic model. We can reasonably assume that the acoustic 
representation of a word is independent of the conceptual relation 
it belongs to, hence: 
$$P(A \mid W, C) = P(A \mid W), \tag{8}$$
and this is the criterion that is usually maximized in stochastic 
based speech recognizers, for instance those using hidden Markov 
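modeling [1] for the acoustic/phonetic decoding. In this paper 
we deal with the remaining terms 
$$\begin{aligned}
P(W \mid C)\, P(C) = {} & \prod_{i=2}^{M} P(w_i \mid w_{i-1} \ldots w_1, C)\; P(w_1 \mid C) \\
& \times \prod_{i=2}^{M} P(c_i \mid c_{i-1} \ldots c_1)\; P(c_1)
\end{aligned} \tag{9}$$
We proceed by assuming that: 
$$P(w_i \mid w_{i-1} \ldots w_1, C) = P(w_i \mid w_{i-1} \ldots w_{i-n}, c_i), \tag{10}$$
and 
$$P(c_i \mid c_{i-1} \ldots c_1) = P(c_i \mid c_{i-1} \ldots c_{i-m}). \tag{11}$$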
These are Markov processes of order n and m respectively, and if 
n and m are large enough, no generality is lost by making this assumption. 
For practical purposes n and m should be small enough 
to allow a reliable estimation of the probabilities from a finite set 
of data. An additional assumption in equation (10) is that a given 
word in the sentence, used for expressing a certain case, is inde- 
pendent of the case of the preceding words. Assuming that the 
sequence of words could be directly observed (for instance pro- 
viding a transcription of the uttered sentence), and the sequence 
of cases is unknown, equations (10) and (11) describe a hidden 
Markov process, where the states of the underlying model corre- 
spond to the cases, the observation probabilities of each state are 
represented by equation (10) in the form of state local (n + 1)- 
gram language models, and the transitions between the states are 
described by equation (11). 
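The hidden Markov formulation can be sketched in Python for the case n = m = 1. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `CaseHMM`, the boundary symbol `<s>` (used to condition the first word and the first case), the additive smoothing constant `alpha`, and the toy corpus are all illustrative choices that do not come from the paper. Counting labeled transitions and word pairs gives the estimates, and a Viterbi pass then finds the most likely case sequence for an observed word sequence.

```python
import math
from collections import Counter

START = "<s>"  # assumed boundary symbol preceding the first word and case


class CaseHMM:
    """Cases are hidden states; each state owns a bigram word model (n = m = 1)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha          # additive smoothing constant (an assumption)
        self.trans = Counter()      # (previous case, case) -> count
        self.trans_ctx = Counter()  # previous case -> count
        self.emit = Counter()       # (case, previous word, word) -> count
        self.emit_ctx = Counter()   # (case, previous word) -> count
        self.cases, self.vocab = set(), set()

    def train(self, labeled_sentences):
        """Collect counts from (words, cases) pairs of equal length."""
        for words, cases in labeled_sentences:
            prev_w, prev_c = START, START
            for w, c in zip(words, cases):
                self.trans[prev_c, c] += 1
                self.trans_ctx[prev_c] += 1
                self.emit[c, prev_w, w] += 1
                self.emit_ctx[c, prev_w] += 1
                self.cases.add(c)
                self.vocab.add(w)
                prev_w, prev_c = w, c

    def log_trans(self, prev_c, c):
        """Smoothed log P(c_i | c_{i-1}): equation (11) with m = 1."""
        num = self.trans[prev_c, c] + self.alpha
        return math.log(num / (self.trans_ctx[prev_c] + self.alpha * len(self.cases)))

    def log_emit(self, c, prev_w, w):
        """Smoothed log P(w_i | w_{i-1}, c_i): equation (10) with n = 1."""
        num = self.emit[c, prev_w, w] + self.alpha
        # One extra slot in the denominator reserves mass for unseen words.
        den = self.emit_ctx[c, prev_w] + self.alpha * (len(self.vocab) + 1)
        return math.log(num / den)

    def log_joint(self, words, cases):
        """log P(W, C): equation (9) under assumptions (10) and (11)."""
        total, prev_w, prev_c = 0.0, START, START
        for w, c in zip(words, cases):
            total += self.log_trans(prev_c, c) + self.log_emit(c, prev_w, w)
            prev_w, prev_c = w, c
        return total

    def viterbi(self, words):
        """Return (log score, case sequence) maximizing log P(W, C) for the words."""
        best = {c: (self.log_trans(START, c) + self.log_emit(c, START, words[0]), [c])
                for c in self.cases}
        for prev_w, w in zip(words, words[1:]):
            new_best = {}
            for c in self.cases:
                score, path = max(((s + self.log_trans(pc, c), p)
                                   for pc, (s, p) in best.items()),
                                  key=lambda t: t[0])
                new_best[c] = (score + self.log_emit(c, prev_w, w), path + [c])
            best = new_best
        return max(best.values(), key=lambda t: t[0])


# Toy hand-labeled corpus (hypothetical, for illustration only).
O, D, X = "ORIGIN", "DESTINATION", "DUMMY"
corpus = [
    ("i want to fly from boston to chicago".split(), [X, X, X, X, O, O, D, D]),
    ("show flights from denver to boston".split(), [X, X, O, O, D, D]),
]
model = CaseHMM()
model.train(corpus)
sentence = "flights from chicago to denver".split()
score, cases = model.viterbi(sentence)
print(list(zip(sentence, cases)))
```

With so little toy data the decoded labels are not meaningful in themselves; the point is that `viterbi` returns a case sequence that maximizes `log_joint` for the observed words.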
THE FROM-TO TASK 
A first evaluation of the model was performed based on a set of 
825 sentences artificially generated by a finite state grammar [3] 
using a vocabulary of 41 different words. The sentences express 
different ways of making requests to travel between two cities. A 
typical example is: 
I want to travel into Boston and I am interested in flights 
between Boston and Washington 
The task consisted of identifying the origin and destination cities 
of the flight. The relevant cases of this task are then flight ori- 
gin and flight destination. However the model has three states, 
ORIGIN, DESTINATION and DUMMY. 50 sentences, randomly 
selected out of the 825, were used to estimate the parameters 
of the model, i.e. the transition probabilities (equation 11) and 
the state local language models (equation 10), with n = 1 and 
m = 1 (i.e. the underlying Markov process was a first-order process 
and the state local language models were bigrams). The 
training sentences were hand-labeled with the appropriate cases. 
The remaining 775 sentences were decoded using the Viterbi decoding 
algorithm. The performance was assessed by counting the 
number of sentences that were segmented assigning the correct 
words (i.e. the correct city names) to the DESTINATION and 
ORIGIN states. We observed that 7% of the sentences (55 out of 
775) had a wrong origin/destination assignment. In some of the 
wrong segmentations one of the relevant states was missing, the 
other state containing both the real destination and origin cities. 
In other examples, similar to the sentences shown above, both the 
destination and the origin states were assigned to the same city 
name, that appeared twice in the sentence. To improve the per- 
formance we imposed some additional constraints in the decoding 
procedure. For a given sentence the decoded state sequence was 
searched among those sequences of states where both the origin 
and destination states were visited only once (i.e. when one of 
those states was left, the current partial path was not allowed to 
enter that state again). In addition, the phrases assigned to the 
origin and destination states had to include different city names. 
These constraints, representing a higher level a priori knowledge 
of the task, were imposed in the Viterbi decoding by keeping 
track of the past sequence of states for each partial candidate so- 
lution, and duplicating the partial solutions when two (or more) 
candidates merged at the same state and showed conflicting con- 
straints. This approach resulted in a substantial improvement of 
the performance. Only one error was observed out of the 775 test 
sentences (0.13% error rate). The same level of performance was 
obtained in experiments using a 1-gram language model inside 
each state, but increasing the number of states to five: 
ORIGIN, DESTINATION, DUMMY, FROM, TO. 
The last two states accounted for the expressions that usually 
precede the origin and destination city names respectively. For 
example the FROM state was associated to expressions of the 
kind: from, depart out of, leaving, etc., and the TO state was as- 
sociated to expressions like: to, going to, arriving into, etc. This 
experiment indicates that there is a tradeoff between the number 
of states and the complexity (order) of the state language mod- 
els. Expanding the set of states to reflect the linguistic structure 
of the sentences may result in a reduction of the number of pa- 
rameters to be estimated during training, giving a more robust 
model. 
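The "visited only once" constraint on the decoding can be sketched as follows. This is one possible realization under stated assumptions, not necessarily the procedure used in the experiments: instead of storing the full state history, each partial hypothesis is keyed by its current case together with the set of constrained cases it has already left, so that hypotheses with conflicting histories are kept apart rather than merged. The function names, the boundary symbol `<s>`, and the toy scores are hypothetical, and the additional requirement that origin and destination contain different city names is not implemented here.

```python
import itertools
import math

START = "<s>"  # assumed boundary symbol


def constrained_viterbi(words, cases, log_trans, log_emit, once):
    """Best case sequence in which every case in `once` forms exactly one
    contiguous segment. Hypotheses are keyed by (current case, set of
    `once` cases already left)."""
    once = frozenset(once)
    hyps = {}  # (case, left) -> (score, path)
    for c in cases:
        score = log_trans(START, c) + log_emit(c, START, words[0])
        hyps[c, frozenset()] = (score, [c])
    for prev_w, w in zip(words, words[1:]):
        new_hyps = {}
        for (pc, left), (score, path) in hyps.items():
            for c in cases:
                if c != pc and c in left:
                    continue  # would re-enter a constrained case already left
                new_left = left | {pc} if (c != pc and pc in once) else left
                s = score + log_trans(pc, c) + log_emit(c, prev_w, w)
                key = (c, new_left)
                if key not in new_hyps or s > new_hyps[key][0]:
                    new_hyps[key] = (s, path + [c])
        hyps = new_hyps
    # Keep only hypotheses that visited every constrained case.
    complete = [(s, p) for (c, left), (s, p) in hyps.items() if once <= left | {c}]
    return max(complete, key=lambda t: t[0]) if complete else None


def visits_once(path, once):
    """True if every case in `once` forms exactly one contiguous segment."""
    runs = [case for case, _ in itertools.groupby(path)]
    return all(runs.count(case) == 1 for case in once)


# Toy scores (hypothetical, not estimated from data): uniform transitions and
# word scores that only separate city names from other words.
CASES = ["ORIGIN", "DESTINATION", "DUMMY"]
CITIES = {"boston", "washington"}


def toy_log_trans(prev_case, case):
    return math.log(1.0 / len(CASES))


def toy_log_emit(case, prev_word, word):
    if case == "DUMMY":
        return math.log(0.02 if word in CITIES else 0.2)
    return math.log(0.3 if word in CITIES else 0.05)


words = "flights between boston and washington".split()
score, path = constrained_viterbi(words, CASES, toy_log_trans, toy_log_emit,
                                  once=("ORIGIN", "DESTINATION"))
print(list(zip(words, path)))
print(visits_once(path, ("ORIGIN", "DESTINATION")))
```

Keying hypotheses by the set of cases already left enlarges the search space by at most a factor of 2 to the power of the number of constrained cases, which is small when only origin and destination are constrained.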
THE ATIS TASK 
The technique of case decoding is being applied to the class 
A sentences of the DARPA ATIS task. A sentence of this task 
can be analyzed in terms of 7 general cases, that are QUERY, 
generally associated to the phrases expressing the kind of request, 
OBJECT expressing the object of the query, ATTRIBUTE that 
describes some attributes of the object, RESTRICTION describing 
the restrictions on the values of the answer, Q_ATTR describing 
possible attributes of the query, AND including connectives like 
and, or, also, indicating that the sentence may have more than 
one query. Of course we include a DUMMY state like in the above 
mentioned examples. For example, a sentence like: 
What type of economy fare could I get from San Francisco to 
Dallas on the 25th of April 
is segmented as: 
• QUERY: What type of 
• ATTRIBUTE: economy 
• OBJECT: fare 
• Q_ATTR: could I get 
• RESTRICTION: from San Francisco to Dallas on the 25th 
of April 
We can further analyze some of the cases into more detailed con- 
ceptual relations, giving the following representation: 
• ATTRIBUTE 
  o a_fare: economy 
• RESTRICTION 
  o origin: from San Francisco 
  o destination: to Dallas 
  o date: on the 25th of April 
We defined 44 different cases for describing the whole set of 547 
class A training sentences. The complete list of cases is shown in 
Table 1. The training sentences (covered by a vocabulary of 501 
words) were hand-labeled according to this set of states and the 
transition probabilities and the state local bigram models were 
estimated using the maximum likelihood criterion. Table 2 shows 
examples of the phrases used for estimating the bigram language 
models for some of the defined states. Considering the large number 
of parameters to be estimated (i.e. the transition probabilities 
between the 44 states of the model and the 44 bigram models 
extended to the entire vocabulary of 501 words), and considering 
the small number of training sentences, this estimation poses ro- 
bustness problems. One way to alleviate these problems consists 
of grouping the words in the vocabulary into equivalence classes. 
For example, all the city names can be grouped in the same class, 
as well as the airport names, the numbers, the airline names, etc. 
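To make the size of the estimation problem concrete, the sketch below counts the table entries implied by the figures quoted above and shows how a word-to-class map shrinks the symbol inventory. The function names and the contents of `WORD_CLASS` are hypothetical, since the actual classes are not listed in the text, and the entry counts are upper bounds on the number of free parameters rather than exact figures.

```python
# Sizes quoted in the text for the ATIS model.
NUM_STATES = 44
VOCAB_SIZE = 501


def table_sizes(num_states, num_symbols):
    """Entries in the transition table and in the state-local bigram tables
    (an upper bound on the number of free parameters)."""
    transitions = num_states * num_states
    bigrams = num_states * num_symbols * num_symbols
    return transitions, bigrams


# Hypothetical word-to-class map; the real classes are not given in the text.
WORD_CLASS = {
    "boston": "<CITY>", "dallas": "<CITY>", "atlanta": "<CITY>",
    "american": "<AIRLINE>", "delta": "<AIRLINE>",
}


def to_classes(words):
    """Replace each word by its class symbol when it has one."""
    return [WORD_CLASS.get(w.lower(), w.lower()) for w in words]


print(table_sizes(NUM_STATES, VOCAB_SIZE))
print(to_classes("flights from Boston to Dallas".split()))
```

Because the bigram tables grow with the square of the number of symbols, every group of words merged into a single class reduces the number of entries to estimate from the 547 training sentences.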
The testing of the system was performed on the transcribed 
JUN-90 and FEB-91 class A test sentences. New words were allocated 
to a new-word category that was assigned a small probability 
within each state. Table 3 reports the number of sentences, 
for each test set, that were correctly labeled by the case decoder, 
along with the statistics on the correctly assigned cases. Table 4 
shows examples of correct segmentations from the FEB-91 test 
set. It is interesting to notice the allocation of the connective 
and to different cases in sentences 1), 3), and 4). Although sentences 
1) and 3) contain similar expressions (between ... and ...), 
the system recognizes that in the first case the phrase refers to 
a period of time, while in the second case it refers to origin and 
destination cases. Moreover, sentence 3) shows that the concept 
relations origin and destination are not necessarily referred to the 
origin and destination of a flight, but can be referred to other 
events, like ground transportation in this case. This sensitivity 
to the context (to the value of the OBJECT in the example above) 
shown by certain cases must be taken into account by the module 
that will interpret the conceptual segmentation and generate the 
SQL query. In sentence 4) the word and is clearly interpreted 
as connecting two distinct restrictions on the query. The same 
phenomenon is shown in sentence 5) where the word or connects 
two alternative possible origins of the flight. Table 5 shows ex- 
amples of incorrect segmentations from the FEB-91 test set. In 
sentence 1) the phrase used for Eastern should be assigned to 
the airline case. The error is due to the fact that the word Eastern 
was not observed in the training set. In sentence 2) the phrase 
through Dallas Fort Worth should have been labeled with the con- 
nect case, but this case has very few examples in the training set 
with a consequent poor estimation of the parameters related to 
it. The same problem, i.e. inadequate training, is also the cause 
of the wrong segmentation of sentence 3. 

General case    Detailed cases 
QUERY 
OBJECT 
ATTRIBUTE       attribute, a_date, a_origin, a_destin, a_time, a_airline, 
                a_flcode, a_aircraft, a_class, a_fare, a_stop, a_atplace, 
                a_way, a_restrict, a_table, a_body 
Q_ATTR 
AND 
DUMMY 
RESTRICTION     date, origin, destin, time, airline, flcode, meal, ground, 
                aircraft, class, fare, stop, atplace, dept_time, arvl_time, 
                way, restrict, table, range, speed, body, day, connect 
Table 1: The set of cases in the ATIS task 

Case        Example phrases 
QUERY       I would like 
            can I have a list of 
            it give me a description of 
OBJECT      the flights 
            the fare on 
            a price on a ticket 
origin      arriving from Dallas 
            from Atlanta airport 
            between airport BWI 
            departing Atlanta 
destin      and Boston 
            arriving in San Francisco 
            going to San Francisco 
            returning to Atlanta 
dept_time   leaving after 1:00 pm 
            that depart in the afternoon 
way         round-trip 
            return 
            that are round-trip 
class       a class QW ticket 
            a 1st class ticket 
            which have 1st class service available 
Table 2: Examples of phrases assigned to cases in the training sentences 

Test set   Sentences   Sentences correct   Cases   Cases correct 
JUN-90     98          87 (88.7%)          419     398 (95.0%) 
FEB-91     148         119 (80.4%)         713     671 (94.1%) 
Table 3: Results with two different test sets 

1) Please list all flights between Baltimore and Atlanta 
   on Tuesdays between 4 in the afternoon and 9 in the evening 
   DUMMY: Please 
   QUERY: list all 
   OBJECT: the flights 
   origin: between Baltimore 
   destin: and Atlanta 
   day: on Tuesdays 
   time: between 4 in the afternoon and 9 in the evening 
2) What's the cheapest round-trip airfare on American 
   flight 1074 from Dallas to Philadelphia 
   QUERY: What's 
   a_fare: the cheapest 
   a_way: round-trip 
   OBJECT: airfare 
   airline: on American 
   flcode: flight 1074 
   origin: from Dallas 
   destin: to Philadelphia 
3) What kind of ground transportation is there between the 
   airport and downtown Atlanta 
   QUERY: What kind of 
   OBJECT: ground transportation 
   Q_ATTR: is there 
   origin: between the airport 
   destin: and downtown Atlanta 
4) What are the restrictions on the cheapest fare from 
   Pittsburgh to Denver and from Denver to San Francisco 
   QUERY: What are 
   OBJECT: the restrictions 
   fare: on the cheapest fare 
   origin: from Pittsburgh 
   destin: to Denver 
   AND: and 
   origin: from Denver 
   destin: to San Francisco 
5) Display flights from Oakland or San Francisco to Denver 
   QUERY: Display 
   OBJECT: flights 
   origin: from Oakland 
   AND: or 
   origin: San Francisco 
   destin: to Denver 
Table 4: Examples of correctly decoded test sentences from FEB-91 test set 

1) What kind of aircraft is used for Eastern flight 205 
   QUERY: What kind of 
   OBJECT: aircraft 
   Q_ATTR: is 
   flcode: used for Eastern flight 205 
2) Is there a flight from Denver through Dallas Fort Worth 
   to Philadelphia 
   QUERY: Is there 
   OBJECT: a flight 
   origin: from Denver through Dallas Fort Worth 
   destin: to Philadelphia 
3) Can you please tell me the type of plane that my client 
   would be flying on from Baltimore to Pittsburgh 
   DUMMY: Can you please 
   QUERY: tell me the type of plane that my client would be 
   OBJECT: flying on 
   origin: from Baltimore 
   destin: to Pittsburgh 
Table 5: Examples of incorrectly decoded test sentences from FEB-91 test set 
FUTURE WORK 
The goal of the understanding system is to retrieve the infor- 
mation in the ATIS database. In order to do this we are develop- 
ing a module that translates the conceptual representation of the 
sentence obtained with the described method into an SQL query. 
Since the ambiguity of the sentence is resolved by the conceptual 
segmentation, this module implements a deterministic mapping. 
CONCLUSIONS 
We proposed a very simple semantic grammar for the ATIS 
task. The grammar was designed to be rich enough to handle 
most queries, but limited in certain ways so as to facilitate pars- 
ing by very simple and well-understood HMM methods. The ad- 
vantages of this approach are its straightforward integration with 
an HMM based speech recognizer, and its capability of learning 
from examples. Even with an extremely small training set, the 
system was able to assign the correct analysis to more than 80% 
of the class A sentences in both the JUN-90 and FEB-91 test sets. 
The authors gratefully acknowledge the helpful advice and 
consultation of Ken Church, Alexandra Gertner, Al Gorin, Fernando 
Pereira, and Evelyne Tzoukerman. 
REFERENCES 
[1] Jelinek, F., "Continuous Speech Recognition by Statistical Methods," 
Proceedings of the IEEE, vol. 64, no. 4, pp. 532-556, 1976. 
[2] Sowa, J. F., Conceptual Structures: Information Processing in 
Mind and Machine, Addison-Wesley, Reading, MA, 1984. 
[3] Gertner, A. N., Gorin, A. L., Roe, D. B., "Adaptive Language 
Acquisition from a Subset of the Airline Reservation Task," paper 
in preparation. 
