A Robust and Efficient Three-Layered Dialogue Component 
for a Speech-to-Speech Translation System* 
Jan Alexandersson and Elisabeth Maier and Norbert Reithinger 
DFKI GmbH, Stuhlsatzenhausweg 3, D-66123 Saarbrficken, Germany 
{alexanders son, maier, reithinger}~dfki, uni- sb. de 
Abstract 
We present the dialogue component of 
the speech-to-speech translation system 
VERBMOBIL. In contrast to conventional 
dialogue systems it mediates the dia- 
logue while processing maximally 50% of 
the dialogue in depth. Special require- 
ments (robustness and efficiency) lead 
to a 3-layered hybrid architecture for 
the dialogue module, using statistics, an 
automaton and a planner. A dialogue 
memory is constructed incrementally. 
1 Introduction 
VERBMOBIL combines the two key technologies 
speech processing and machine translation. The 
long-term goal of this project is the development 
of a prototype for the translation of spoken di- 
alogues between two persons who want to find a 
date for a business meeting (for more detail on the 
objectives of VERBMOBIL see (Wahlster, 1993)). A 
special characteristic of VERBMOBIL is that both 
participants are assumed to have at least a pas- 
sive knowledge of English which is used as inter- 
mediate language. Translations are produced on 
demand so that only parts of the dialogue are pro- 
cessed. If VERBMOBIL is inactive, shallow process- 
ing by a keyword spotter takes place which allows 
the system to follow the dialogue at least partially. 
In this paper focus is on the description of the 
dialogue component, which processes the interac- 
tion of the two dialogue partners and builds a rep- 
resentation of the discourse. Dialogue processing 
in VERBMOBIL differs from systems,like SUNDIAL 
(Andry, 1992) in two important points: (1) VERB- 
MOBIL mediates the dialogue between two human 
dialogue participants; the system is not a partic- 
ipant of its own, i.e. it does not control the di- 
alogue as it happens in the flight scheduling sce- 
nario of SUNDIAL; (2) VERBMOBIL processes maxi- 
mally 50% of the dialogue contributions in depth, 
*The research within VERBMOBIL presented here is 
funded by the German Ministry of Research and Tech- 
nology under grant 011V101K/1. 
i.e. when the 'owner' of VERBMOBIL speaks Ger- 
man only. The rest of the dialogue can only be 
followed by a keyword spotter. 
In the remainder of this paper first the require- 
ments of the VERBMOBIL setting with respect to 
functionality and design of the dialogue compo- 
nent section are introduced. Then a hybrid archi- 
tecture for the dialogue component and its embed- 
ding into the VERBMOBIL prototype are discussed. 
Finally, results from our implemented system are 
presented. We conclude with an outline of future 
extensions. 
2 Tasks of the Dialogue 
Component 
The dialogue component within VERBMOBIL has 
four mQor tasks: 
(1) to support speech recognition and linguis- 
tic analysis when processing the speech signal. 
Top-down predictions can be made to restrict the 
search space of other analysis components to get 
better results in shorter time (Young et al., 1989; 
Andry, 1992)). For instance, predictions about a 
speech act can be used to narrow down the set of 
words which are likely to occur in the following 
utterance - a fact exploited by the speech recog- 
nition component which uses adaptive language 
models (Jellinek, 1990). Top-down predictions are 
also used to limit the set of applicable grammar 
rules to a specific subgrammar. They are of par- 
ticular importance since the system has to work 
under real-time constraints. 
(2) to provide contextual information for other 
VERBMOBIL components. In order to get good 
translations, context plays an important role. One 
example is the translation of the German "Geht es 
bei Ihnen?" which can be translated as "Does it 
suit you?" or "How about your place?", depend- 
ing on whether the dialogue partners discussed a 
time or a place before. A discourse history is con- 
structed which can be accessed by other VP.RB- 
MOBIL components(Ripplinger and Caroli, 1994; 
LuperFoy and Rich, 1992). 
(3) to follow the dialogue when V~.RBMOBIL is 
off-line. When both dialogue participants speak 
188 
English (and no automatic translation is neces- 
sary) VERBMOBIL is "passive", i.e. no syntactic or 
semantic analyses are performed. In such cases, 
the dialogue component tries to follow the dia- 
logue by using a keyword spotter. This device 
scans the input for a small set of predetermined 
words which are characteristic for certain stages of 
the dialogue. The dialogue component computes 
the most probable speech act type of the next ut- 
terance in order to selects its typical key words. 
(4) to control clarification dialogues between 
VERBMOBIL and its users. If processing breaks 
down VERBMOBIL has to initiate a clarification di- 
alogue in order to recover. 
3 The Architecture 
Planner 
I 
FSM 
'1 
Statistics 
Dialogue Memory Intel~~ 
Refer~mtlal StnJcture 
.  Key-Word Spotting I 
• -~ SemanUc \] Evaluation \] 
IH,.°.,. i 
~ Generation I 
Figure 1: Architecture of the dialogue module 
The abovementioned requirements cannot be 
met when using a single method of processing: if 
we use structural knowledge sources like plans or 
dialogue-grammars, top-down predictions are dif- 
ficult make, because usually one can infer many 
possible follow-up speech acts from such knowl- 
edge sources that are not scored (Nagata and Mo- 
rimoto, 1993). Also, a planning-only approach is 
inappropriate when the dialogue is processed only 
partially. Therefore we chose a hybrid 3-layered 
approach (see fig. 1) where the layers differ with 
respect to the type of knowledge they use and the 
task they are responsible for. The components are 
A Statistic Module The task of the statistic 
module is the prediction of the following 
speech act, using knowledge about speech act 
frequencies in our training corpus. 
A Finite State Machine (FSM) The 
finite state machine describes the sequence of 
speech acts that are admissible in a standard 
appointment scheduling dialogue and checks 
the ongoing dialogue whether it follows these 
expectations (see fig. 2). 
A Planner The hierarchical planner constructs 
a description of the dialogue's underlying dia- 
logue and thematic structures, making exten- 
sive use of contextual knowledge. This mod- 
ule is sensitive to inconsistencies and there- 
fore robustness and backup-strategies are the 
most important features of this component. 
While the statistical component completely re- 
lies on numerical information and is able to pro- 
vide scored predictions in a fast and efficient way, 
the planner handles time-intensive tasks exploit- 
ing various knowledge sources, in particular lin- 
guistic information. The FSM can be located in 
between these two components: it works like an 
efficient parser for the detection of inconsistent di- 
alogue states. The three modules interact in cases 
of repair, e.g. when the planner needs statistical 
information to resume an incongruent dialogue. 
On the input side the dialogue component is in- 
terfaced with the output from the semantic con- 
struction/evaluation module, which is a Drts-like 
feature-value structure (Bos et al., 1994) contain- 
ing syntactic, semantic, and occasionally prag- 
matic information. The input also includes infor- 
mation from the generation component about the 
utterance produced in the target language and a 
word lattice from the keyword spotter. 
The output of the dialogue module is de- 
livered to any module that needs information 
about the-dialogue pursued so far, as for example 
the transfer module and the semantic construc- 
tion/evaluation module. Additionally, the key- 
word spotter is provided with words expected in 
the next utterance. 
4 Layered Dialogue Processing 
4.1 Knowledge-Based Layers 
4.1.1 The Underlying Knowledge Source 
- The Dialogue Model 
Like previous approaches for modeling task- 
oriented dialogues we base our ideas on the as- 
sumption that a dialogue can be described by 
means of a limited but open set of speech acts 
(e.g. (Bilange, 1991), (Mast, 1993)). As point 
of departure we take speech acts as proposed by 
(Austin, 1962) and (Searle, 1969) and also a num- 
ber of so-called illocutionary acts as employed in a 
model of information-seeking dialogues (Sitter and 
Stein, 1992). We examined the VERBMOBIL cor- 
pus of appointment, scheduling dialogues for their 
occurrence and for the necessity to introduce new 
speech acts 1 . 
At present,, our model contains 17 speech acts 
(see (Maier, 1994) for more details on the char- 
acterization of the various speech acts; the di- 
alogue model describing admissible sequences of 
z The acts we introduce below are mostly of illocu- 
tionary nature. Nevertheless we will refer to them as 
speech acts throughout this paper. 
189 
speech acts is given in fig. 2). Among the domain- 
dependent speech acts there are low-level (primi- 
tive) speech acts like BEC~RUESSUNG for initiating 
and VERABSCHIEDUNG for concluding a dialogue. 
Among the domain-independent speech acts we 
use acts as e.g. AKZEPTANZ and ABLEHNUNG. 
Additionally, we introduced two speech acts nec- 
essary for modeling our appointment scheduling 
dialogues: INIT_TERMINABSPRACHE and BESTAE- 
TIGUNG. While the first is used to describe utter- 
ances which state date s or places to be negotiated, 
the latter corresponds to contributions that con- 
tain a mutual agreement concerning a given topic. 
Mnin dialogue 
Begruessung Auflordecung_Stellung BecufL_Position Init _Telminal~prache Vorsehlag 
~Vorsldlung 
rund .TA ~ Vomchlag 
Aufforderung_Stellung ~ 
• ~$ Inlt_Temamabsprache~ ~ Akzeplanz, Ablehnung/ 
Auforderung Vorschlag /,,~ U at~k NV°mchlag // // uestaetigung \ // 
w forderung_- ~0 ! Stellung 
Bestaetigung Dank Vomchlag. Aufforderung_Stellung Verabsehiedung 
\[ r'\] lnilial State 0 Fu,al Sta. • Non-finalS== \] 
Potential sdditions English Equivalents for German Speech Act Names: 
in any diMogue state Begruessung Greeting 
Berufi_Position Position Vorstellung Introduction 
Grund_TA Reason_for_Appointmmt Begruendung Init Terminabsprache Initialisation 
Deliberation A ultord erung_Stellung Reque6t_for_Statement Abweichung AuffordeRing_Vorsehlag Request_for Suggestion 
~/~ Akzeptanz Accept Ablehnung Reject 
Vorschlag Suggestion Bestaetigung Confirmation 
\[\] Verabschiedung Bye 
Klaerur~gs- I ~ Klaerungs- Dar~ Thanks Deliberation Delibemli0n 
lrage ~//antwoa Abweichung Deviation Klaerungsfrage Clarification_Question 
0 Klaerungsantwoa. Clarifcation_Answer Begruendung. Reason 
Figure 2: A dialogue model for the description of 
appointment scheduling dialogues 
The dialogue consists Of three phases (Maier, 
1994). First, an introductory phase, where the 
discourse participants greet I each other, introduce 
themselves and provide information e.g. about 
their professional status. After • this, the topic of 
the conversation is introduced, usually the fact 
that one or more appointments have to be sched- 
uled. Then negotiation •begins where the discourse 
participants repeatedly offer possible time frames, 
make counter offers, refine the time frames, re- 
ject offers and request other • possibilities. Once 
an item is accepted ~nd mutual agreement exists 
either the dialogue can be terminated, or another 
appointment is negotiated. 
A dialogue model based on speech acts seems 
to be an appropriate approach also from the point 
of view of machine translation and of transfer in 
particular: While in written discourse sentences 
can be considered the basic units of transfer, this 
assumption is not valid for spoken dialogues. In 
many cases only sentence fragments are uttered, 
which often are grammatically incomplete or even 
incorrect. Therefore different descriptive units 
have to be chosen. In the case Of VERBMOBIL these 
units are speech acts. 
The speech acts which in our approach are em- 
bedded in a sequential model of interaction can be 
additionally classified using the taxonomy of dia- 
logue control functions as proposed in e.g. (Bunt, 
1989). Speech acts like BEGRUESSUNG and VE- 
RABSCHIEDUNG, for example, can be classified as 
dialogue flmctions controlling interaction manage- 
ment. More fine-grained taxonomical distinctions 
like CONFIRM and CONFIRM/WEAK as proposed 
in (Bunt, 1994) are captured in our approach by 
pragmatic features like suitability and possibility 
specified in the DRS-description of an utterance, 
which serves as input for the dialogue component. 
4.1.2 Tile Finite State Machine 
The finite state machine provides an efficient 
and robust implementation of the dialogue model. 
It parses the speech acts encountered so far, tests 
their consistency with the dialogue model and 
saves the current state. When an inconsistency 
occurs fall back strategies (using for instance the 
statistical layer) are used to select the most prob- 
able state. The state machine is extended to al- 
low for phenomena that might appear anywhere 
in a dialogue, e.g. human-human clarification di- 
alogues and deliberation. It can also handle re- 
cursively embedded clarification dialogues. 
An important task of this layer is to signal to 
the planner when an inconsistency has occurred, 
i.e. when a speech act is not within the standard 
model so that it can activate repair techniques. 
4.1.3 The Dialogue Planner 
To incorporate constraints in dialogue process- 
ing and to allow decisions to trigger follow-up 
actions a plan-based approach has been chosen. 
This approach is adopted from text generation 
where plan-operators are responsible for choos- 
ing linguistic means in order to create coherent 
stretches of text (see, for instance, (Moore and 
Paris, 1989) and (Hovy, 1988)). The application 
of plan operators depends on the validity of con- 
straints. Planning proceeds in a top-down fash- 
ion, i.e. high-level goals are decomposed into sub- 
goals, each of which has to be achieved individ- 
ually in order to be fulfilled. Our top-level goal 
SCHEDULE-MEETING (see below) is decomposed 
into three subgoals each of which is responsible 
for the treatment of one dialogue segment: the in- 
190 
troductory phase (GREET-INTRODUCE-TOPIC), the 
negotiation phase (NEGOTIATE) and the closing 
phase (FINISH). These goals have to be fulfilled in 
the specified order. The keyword iterate speci- 
fies that negotiation phases can occur repeatedly. 
begin-plan-operator GENERIC-OPERATOR 
goal \[SCHEDULE-MEETING\] 
constraints nil 
actions nil 
subgoals (sequence \[GREET-INTRODUCE-TOPIC\] 
iterate \[NEGOTIATE\] 
\[FINISH\] ) 
end-plan-operator 
begin-plan-operator OFFER-OPERATOR 
goal \[OFFER\] 
constraints nil 
actions (retrieve-theme) 
subgoals primitive 
end-plan-operator 
In our hierarchy of plan operators the leaves, i.e. 
the most specific operators, correspond to the in- 
dividual speech acts of the model as given in fig. 2. 
Their application is mainly controlled by prag- 
matic and contextual constraints. Among these 
constraints are, for example, features related to 
the discourse participants (acquaintance, level of 
expertise) and features related to the dialogue his- 
tory (e.g. the occurrence of a certain speech act in 
the preceding context). 
Additionally, our plan operators contain an ac- 
tions slot, where operations which are triggered 
after a successful fulfillment of the subgoals are 
specified. Actions, therefore, are employed to in- 
teract with other system components. In the sub- 
plan 0FFER-0PERATOR, for example, which is re- 
sponsible for planning a speech act of the type 
VORSCHLAG, the action (retrieve-theme) filters 
the information relevant for the progress of the 
negotiation (e.g. information related to dates, like 
months, weeks, days) and updates the thematic 
structure of the dialogue history. During the plan- 
ning process tree-like structures are built which 
mirror the structure of the dialogue. 
The dialogue memory consists of three layers 
of dialog structure: (1) an intentional structure 
representing dialogue phases and speech acts as 
occurring in the dialogue, (2) a thematic struc- 
ture representing the dates being negotiated, and 
(3) a referential structure keeping track oflexical 
realizations. The planner also augments the input 
sign by pragmatic information, i.e. by information 
concerning its speech act. 
The plan-based and the other two layers - 
statistics and finite state machine - interact in 
a number of ways" in cases where gaps occur 
in the dialogue statistical rating can help to de- 
termine the speech acts which are most likely to 
miss. Also, when the finite state machine detect§ 
an error, the planner must activate plan operators 
which are specialized for recovering the dialogue 
state in order not to fail. For this purpose spe- 
cialized repair-operators have been implemented 
which determine both the type of error occurred 
and the most likely and plausible way to continue 
the dialogue. It is an intrinsic feature of the dia- 
logue planner that it is able to process any input 
- even dialogues which do not the least coincide 
with our expectations of a valid dialogue - and 
that it proceeds properly if the parts processed by 
VERBMOBIL contain gaps. 
4.2 The Statistical Layer- Statistical 
Modeling and Prediction 
Another level of processing is an implementa- 
tion of an information-theoretic model. In speech 
recognition language models are commonly used 
to reduce the search space when determining a 
word that can match a given part of the in° 
put. This approach is also used in the domain 
of discourse modeling to support the recognition 
process in speech-processing systems (Niedermair, 
1992; Nagata and Morimoto, 1993). The units to 
be processed are not words, but the speech acts of 
a text or a dialogue. The basis oLprocessing is a 
training corpus annotated with the speech acts of 
the utterances. This corpus is used to gain sta- 
tistical information about the dialogue structure, 
namely unigram, bigram and trigram frequencies 
of speech acts. They can be used for e.g. the pre- 
diction of following speech acts to support the 
speech processing components (e.g. dialogue de- 
pendent language models), for the disambiguation 
of diflhrent readings of a sentence, or for guiding 
the dialogue planner. Since the statistical model 
always delivers a result and since it can adapt it- 
self to unknown structures, it is very robust. Also, 
if the statistic is updated during normal operation, 
it can adapt itself to the dialogue patterns of the 
VERBMOBIL user, leading to a higher prediction 
accuracy. 
Considering a dialogue to be a source that has 
speech acts as output, we can predict the nth 
speech act s,~ using the maximal conditional prob- 
ability 
s,, := max.. P(sls,,,1, s,,-2, s,_a, ...) 
We approximate P with the standard smooth- 
ing technique known as deleted interpolation 
(Jellinek, 1990), using unigram, bigram and tri- 
gram relative frequencies, where f are relative fre- 
quencies and qi are weights whose sum is 1: 
P(s,,Is.-,, s,,..~) = 
q,f(s..) + q~f(s..Is.._, ) + q.~f(s..ls,-,, s..-2) 
Given tl/is formula and the required N-grams we 
can determine the k best predictions for the next 
speech acts. 
In order to evaluate the statistical model, we 
made various experiments. In the table below the 
results for two experiments are shown. Experi- 
ment TS1 uses 52 hand-annotated dialogues with 
191 
2340 speech acts as training corpus, and 41 dia- 
logues with 2472 speech acts as test data. TS2 
uses another 81 dialogues with 2995 speech acts 
as test data. 
I P~ed" I TSI I TS2 I 
1 40,65 % 44,24 % 
2 60,19 % 66,47 % 
3 73,92 % 81,46 % 
Compared to the data from (Nagata and Mo- 
rimoto, 1993) who report prediction accuracies of 
61.7 %, 77.5 % and 85.1% for one, two or three 
predictions respectively, our predictions are less 
reliable. The main reason is, that the dialogues in 
our corpus frequently do not follow conventional 
dialogue behavior, i.e. the dialogue structure dif- 
fers remarkably from dialogue to dialogue. 
5 An Annotated Example 
To get an impression of the flmctionality of the 
dialogue module, we will show the processing of 
three sentences which are part of an example di- 
alogue which has a total length of 25 turns. This 
dialogue is part of a corpus of 200 dialogues which 
are all fully processed by our dialogue component. 
Prior to sentence DEO04 given below ~.L initialized 
the dialogue requesting a date for a trip s. 
DEO04: #oh ja, gut, nach meinem Termin- 
kalender <Pause>, wie w"ars im 
Oktober?# (VORSCHLAG) 
VMO05: just lookin at my diary, I would 
suggest October. (VORSCHLAG) 
DEO06/I: <Pause> I propose from Tuesday 
the fifth/- 
DEO06/2: <Pause> no, Tuesday the fourth to 
Saturday the eighth <Pause>, 
those five days? (VORSCHLAG) 
ELO07: oh, that's too bad, I'm not free 
right then. (ABLEHNUlVG) <Pause> 
I could fit it into my schedule 
<Smack> the week after, from 
Saturday to Thursday, the 
thirteenth. (VORSCHLAG) 
If we trace the processing with the finite state 
machine and the statistics component, allowing 
two predictions, we get the following results: 
ELO03 : INIT_TERMINABSPRACHE 
Prediction: (VORSCHLAG AUFFORDERUNG_VORSCHLAG) 
DEO04 : VORSCHLAG 
Prediction: (KKZEPTANZ VORSCHLAG) 
DEO06/I : VORSCHLAG 
Prediction: (AKZEPTANZ VORSCHLAG) 
DEO06/2: VORSCHLAG 
Prediction: (AKZEPTANZ VORSCHLAG) **Failed** 
ELO07/I : ABLEHNUNG 
Prediction: (VORSCHLAG AUFFORDERUNG_STELLUNG) 
ELO07/2 : VGRSCHLAG 
2DE indicates the German speaker, VM the trans- 
lation provided by VERBMOBIL and EL the English 
speaker. # indicates pressing or release of the button 
that activates VERBMOBIL. 
Prediction: (AKZEPTANZ ABLEHNUNG) 
While the finite state machine accepts the se- 
quence of speech acts without failure the pre- 
dictions made by the statistical module are not 
correct for DE006/2. The four best predic- 
tions and their scores are AKZEPTANZ (28.09~,), 
VORSCHLAG (26.93~,), ABLEHNUNG (21.67~,) and 
AUFFORDERUNG_STELLUNG (9.7~,). In comparison 
with the fourth prediction, the first three pre- 
dictions have a very similar ranking, so that the 
failure can only be considered a near miss. The 
overall prediction rates for the whole dialogue are 
56.52 %, 82,60%, and 95.65% for one, two, and 
three predictions, respectively. 
Since the dialogue can be processed properly 
by the finite state machine no repair is neces- 
sary. The only task of the planner therefore is 
the construction of the dialogue memory. It adds 
the incoming speech acts to the intentional struc- 
ture, keeps track of the dates being negotiated, 
stores the various linguistic realizations of objects 
(e.g. lexical variations, referring expressions) and 
builds and administrates the links to the instanti- 
ated representation of these objects in the knowl- 
edge representation language BACK (Hoppe et al., 
1993). In fig. 3 we give two snapshots showing 
how the dialogue memory looks like after process- 
ing the turns DE006/2 and EL007. 
6 Conclusion and future extensions 
Dialogue processing in VERBMOBIL poses prob- 
lems that differ from other systems like (Mast, 
1993) and (Bilange, 1991). Not being in a control- 
ling position within a speech-processing system 
but tracking a mediated dialogue calls for an ar- 
chitecture where different approaches to dialogue 
processing cooperate. One important goal of our 
module is to provide top-down information for the 
other modules of VERBMOBIL, e.g. to reduce the 
search space of the word recognizer. This require- 
ment is solved partially by using a statistics-based 
speech act prediction component. Also, we repre- 
sent contextual information that is important for 
other VERBMOBIL components, as e.g. transfer 
and generation. This information is built up by 
the planner during dialogue processing. 
Future extensions of the dialogue component, 
which has been sucessfidly tested with 200 dia- 
logues of our corpus concern the treatment of clar- 
ification dialogues. Robust processing will be an- 
other issue to be tackled: the possibility to process 
gaps in the dialogue will also be integrated. 
References 
Andry, F. 1992. Static and dynamic predictions: a 
method to improve speech understanding in coop- 
erative dialogues. In Proc. of the Int. Conference 
on Spoken Language Processing (ICSLP'9~), pages 
639-642, Bamff, October. 
192 
DE006/2t no, Tuesday the fourth to \[ EL007= oh that's too bad, I'a no~ free right 
Saturday the algth, those flve days? i then. I could fit it Into my schedule the wlak 
I after, from Saturday to Thursday, the thir~aenth, 
Intentional TERMINVEREINBARUNG I TERMINVEREINSARUNG 
Sbucture .,..-"4~VER HAN DLUN "- | EINLEITUNG J'4~ VERHANDLUNG EINLEITUNG -/ ~"~L% nn ~ I LAGS ~ 
DEOO4:VORSCHLAG r ~ \[ DEOO4:VORSCH ELOO7/2:VORSCHLAG 
DEOO6:VORSCHLAG / \[ DEOO6:VORSCHLAG / / 
.................................... i ...... ............ ...... 
Structure 
Month Week Group-of-days Month Week \Group-of-days / 
-_:.::-,,-.,\[ ..... ~;,o-~j.; .............. T-r ......... ~;,~;j:; ..... , .... !--~;,o;j-~ ............ l- 
_~;.."~t','2~,- / I ITu~,., \] | | \] \]Tue 4t'h " I Sat. to | I 
- " - / I " ItoSat ',~ " / I - ItoSat \[ ', I " IThut / I~'h / ~'" ', I 1'3'h / 
/ . I those5 _ | those5 I.,-: I - \[ . | 
J German | English German | English German | English ..................... ~ ................................ ~ .................. ~ .............. 
BACK @4-8 @4-8 @9-13 
Figure 3: A snapshot of the dialogue memory after processing the utterances DE006/2 and EL007 
Austin, 3. 1962. How to do things with words. Oxford: 
Clarendon Press. 
Bilange, E. 1991. A task independent oral dialogue 
model. In Proceedings of EACL-91, pages 83-88, 
Berlin, Germany, April. 
Bos, 3., Mastenbroek, E., McGlashan, S., Millies, S. 
and Pinkal, M. 1994. The VERBMOBIL Semantic 
Formalism. Technical report, Computerlinguistik, 
Universit/it des Saarlandes, Sam'brficken. 
Bunt, H. C. 1989. Information Dialogues as Com- 
municative Action in Relation to Partner Model- 
ing and Information Processing. In M. M. Taylor, 
F. N~el, and D. B. Bouwhuis, editors, Tile Struc- 
ture of Muitimodal Dialogue, pages 47-73. Elsevier 
Sience Publishers, North-Holland. 
Bunt, H. C. 1994. Context and Dialogue Control. 
Think, 3:19-31, May. 
Hoppe, T., Kindermann, C., Quantz, J. a., Schmiedel, 
A., and Fischer, M. 1993. BACK V5 Tutorial & 
Manual. KIT - REPORT 100, TU Berlin, March. 
Hovy, E. H. 1988. Planning coherent multisentential 
text. In Proceedings of the 26th A CL Conference, 
pages 179-186, Buffalo. 
Jeilinek, F. 1990. Self-organized language modeling 
for speech recognition. In A. Waibel and K.-F. Lee, 
editors, Readings in Speech Recognition, pages 450- 
506. Morgan Kaufmann. 
LuperFoy, S. and Rich, E.A. 1992. A three tiered dis- 
course representation framework for computational 
discourse processing. Technical report, MITRE 
Corporation and MCC. 
Maler, E. (ed.) 1994. Dialogmodellierung in VERB- 
MOBIL - Festlegung der Sprechhandlungen ffir den 
Demonstrator. Technical Report Verbmobil-Memo 
31, DFKI Saarbr/icken, July. 
Mast, M. 1993. Ein Dialogmodui /iir ein 
Spracherkennungs- und Dialogsystem. Ph.D. the- 
sis, Universit/it Erlangen. 
Moore, J. D. and PaISs, C. L. 1989. Planning text for 
advisory dialogues. In Proc. of A CL, Vancouver. 
Nagata, M. and Morimoto, T. 1993. An experimental 
statistical dialogue model to predict the Speech Act 
Type of the next utterance. In Proc. of the Int. 
Symposium on Spoken Dialogue (ISSD-93), pages 
83-86, Waseda University, Tokyo, Japan. 
Niedermair, G. Th. 1992. Linguistic Modeling in the 
Context of Oral Dialogue. In Proceedings of IC- 
SLP'92, volume 1, pages 635-638, Banff', Canada. 
Ripplinger, B. and Caroli, F. 1994. Konzept-basierte 
Ubersetzung in Verbmobil. Technical report, IAI 
Sam'br6cken, May. 
SeaHe, J. R. 1969. Speech Acts. Cambridge/GB: 
University Press. 
Sitter, S. and Stein, A. 1992. Modeling the iUocution- 
ary aspects of information-seeking dialogues. In. 
formation Processing and Management, 28(2):165 
- 180. 
Wahlster, W. 1993. Verbmobil -Translation of face- 
to-face dialogues. In Proceedings of the Fourth Ma- 
chine Translation Summit, Kobe, Japan, July. 
Young, S. R., Ward, W. H., and Hauptmann, A. G. 
1989. Layering predictions: flexible use of dialogue 
expectation in speech recognition. Proceedings of 
IJCAI-89, Detroit. 
193 
