Multilingual Summary Generation in a Speech-To-Speech 
Translation System for Multilingual Dialogues* 
Jan Alexandersson, Peter Poller, Michael Kipp, Ralf Engel 
DFKI GmbH 
Stuhlsatzenhausweg 3 
66123 Saarbrficken 
{alexanders son, poller, engel, kipp}@dfki, de 
Abstract 
This paper describes a novel functionality of the 
VERBMOBIL system, a large scale translation sys- 
tem designed for spontaneously spoken multilingual 
negotiation dialogues. The task is the on-demand 
generation of dialogue scripts and result summaries 
of dialogues. We focus on summary generation and 
show how the relevant data are selected from the 
dialogue memory and how they are packed into 
an appropriate abstract representation. Finally, we 
demonstrate how the existing generation module of 
VERBMOBIL was extended to produce multilingual 
and result summaries from these representations. 
1 Introduction 
In the last couple of years different methods for 
summarization have been developed. In this pa- 
per we report on a new system functionality within 
the scope of VERBMOBIL (Bub et al., 1997), a fully 
implemented speech-to-speech translation system, 
that generates German or English dialogue scripts 
(Alexandersson and Poller, 1998) as well as Ger- 
man or English summaries of a multilingual nego- 
tiation dialogue held with assistance of the system. 
By a script we mean a document that reflects the 
domain-specific propositional contents of the indi- 
vidual turns of a dialogue as a whole, while a sum- 
mary gives a compact summarization of all negotia- 
tions the dialogue participants agreed on. 
The key idea behind our approach is to utilize 
as many existing resources as possible. Conceptu- 
ally we have added one module (although techni- 
cally realized in different already existing modules 
of the overall VERBMOBIL system) - the summary 
generator. Besides formatting, our new module gen- 
erates sequences of  specific (i.e., German) 
semantic representations for thegeneration of Sam: 
maries/seripts based on the content of the dialogue 
memory (Kipp et al., 1999). These descriptions are 
• The research within VERBMOBIL presented here is funded 
by the German Ministry of Research and Technology under 
grant 011V101K/1. The authors would like to thank Tilman 
Becker for comments on earlier drafts on this paper, and 
Stephan Lesch for invaluable help with programming. 
realized into text by the existing VERBMOBIL gen- 
erator (Becker et al., 1998). To produce multilingual 
summaries we utilize the transfer module of VERS- 
MOBIL (Dorna and Emele, 1996). 
The next section gives an overview of the VERB- 
MOBIL system focusing on the modules central for 
the production of summaries/scripts. It is followed 
by a section describing the extraction and mainte- 
nance of summary relevant data. We then describe 
the functionality of the summary generator in detail. 
An excerpt of the sample dialogue we refer to in the 
paper is given at the end of the paper. 
2 Prerequisites 
VERBMOBIL is a speech-to-speech translation 
project, which at present is approaching its end and 
in which over 100 researchers 1 at academic and in- 
dustrial sites are developing a translation system 
for multilingual negotiation dialogues (held face to 
face or via telephone) using English, German, and 
Japanese. The main difference between VERBMO- 
BIL and, c.f., man-machine dialogue systems is that 
VERBMOBIL mediates the dialogue instead of con- 
trolling it. Consequently, the complete dialogue 
structure as well as almost the complete macro- 
planning is out of the system's control. 
The running system of today is complex, consist- 
ing of more than 75 separate modules. About one 
third of them concerns linguistic processing and the 
rest serves technical purposes. (For more informa- 
tion see for instance (Bub et al., 1997)). For the sake 
of this paper we concentrate on a small part of the 
system as shown in figure 1. 
A user contribution is called a turn which is di- 
vided into segments. A segment ideally resembles 
a complete sentence as we know it from traditional 
grammars, However; because :of -the. spontaneity of 
the user input and because the turn is chunked by 
a statistical process, the input segments for the lin- 
guistic components are sometimes merely pieces of 
linguistic material. For the dialogue memory and 
one of the shallow translation components the dia- 
lSee http://verbmobil.dfki.de for the list of project 
partners. 
148 
Data - ~ 
Figure 1: Part of the VERBMOBIL system 
logue act (Alexandersson et al., 1998) plays an im- 
portant role. The dialogue act represents the com- 
municative function of an utterance, which is an im- 
portant information for the translation as well as the 
modeling of the dialogue as a whole. Examples of il- 
locutionary acts are REQUEST and GREET. Other 
acts can carry propositional content, like SUGGEST 
and INFORM_FEATURE. 
To obtain a good translation and enhance the 
robustness of the overall system the translation is 
based on several competing translation tracks, each 
based on different paradigms. The deep translation 
track consists of an HPSG based analysis, semantic 
transfer and finally a TAG-based generator (VM- 
GECO). The linguistic information within this track 
is encoded in a so-called VIT 2 (Bos et al., 1996; 
Dorna, 1996) which is a formalism following DRT. 
It consists of a set of semantic conditions (i.e. predi- 
cates, roles, operators and quantifiers) and allows for 
underspecification with respect to scope and subor- 
dination or inherent underspecification. A graphical 
representation of the VIT for the English sentence 
"They will meet at the station" is shown in figure 2. 
Besides the deep translation track several shallow 
tracks have been developed. The main source of 
input for the generation of summaries comes from 
one of these shallow analysis components (described 
in section 3) which produces dialogue acts, topic 
suggestions and expressions in a new knowledge 
representation  called DIREX 3. These ex- 
pressions represent domain related information like 
source and destination-o!ties~ dates;-important hotel 
related data, and meeting points. This input is pro- 
cessed by the dialogue module which computes the 
relevant (accepted) objects of the negotiation (each 
consisting of dialogue act, topic, and a DIREX) 
Figure 3 shows the conceptual architecture, where 
2Verbmobil Interface Term 
aDomaln Represematioa EXpression 
. J.d.=C,.i;,hi3, h2) 
BZI ... II, ' " 
Figure 2: Graphical representation of VIT for "They 
will meet at the station" 
the summary generation process as a whole is indi- 
cated with thicker lines. It consists of the following 
steps: 
o Content Selection: The relevant structures are 
selected from the dialogue memory. 
. ..o .Summary~ Generation: These- Structures are 
converted into sequences of semantic descriptions 
(VITs) of full sentences for German (see section 4). 
o Transfer: Depending on the target , the 
German sentence VITs are sent through the transfer 
module. 
* Sentence Generation: The VITs are generated 
by the existing VERBMOBIL generator (Becker et al., 
149 
Figure 3: Conceptual Architecture of the Summary Generation Process 
2000). .... 
® Presentation: The sentences are incorporated 
into the final, e.g., HTML document. 
Throughout the paper we will refer to a German- 
English dialogue (see appendix for an excerpt). 
The information presented there is the spoken sen- 
tence(s) together with the information extracted as 
described in section 3. To save space we only present 
parts of it, namely those which give rise to the struc- 
tures in figure 4. 
3 Extraction and Maintenance of 
Protocol Relevant Data 
The dialogue memory gets its input from one of 
the shallow translation components, which bases 
its translation on the dialogue act and Dll:tEX- 
expression extracted from the segment. The input 
is a triple consisting of: 
® Dialogue Act representing the intention of the 
segment. 
® Topic is one of the four topics scheduling, travel- 
ing, accommodation and entertainment. 
• Direx representing the propositional content of 
the segment. 
For the extraction of propositional content and in- 
tention we use a combination of knowledge based 
and statistical methods. To compute the propo- 
sitional content finite state transducers (FSTs) 
(Appelt et al., 1993) with built-in functions are 
used (Kipp et al., 1999). The intention (represented 
by a dialogue act) is computed statistically us- 
ing  models (Reithinger and Klesen, 1997). 
Both methods were chosen because of their robust- 
ness - since the speech recognizers have a word error 
rate of about 20%, we cannot expect sound input 
for the analysis. Also the segmentation of turns in 
utterances is stochastic and therefore sometimes de- 
livers suboptimal segments. Consider the input to 
be processed: 
I would so we were to leave Hamburg on the 
first 
where the speech recognizer replaced "good so we 
will" with "I would so we were to". The result of 
the extraction module looks like: 
..... """ "\[ITNFORMTtravel ing, he~s_move : \[move, 
has_source_locat ion : \[city, has_name = 
' hamburg ' \] , has_departure_time : 
\[date, time= \[day : i\] \] \] \] 
The result consists of the dialogue act INFORM, 
the topic suggestion traveling, and and a DIREX. 
The top object is a move with two roles: A source 
location (which is a city - Hanover), and a departure 
time (which is a date - day 1). 
Dialog processing 
For each utterance, and hence each DIREX the di- 
alogue manager (1) estimates its relevance, and (2) 
enriches it with context. For summary generation, 
we are solely interested in the most specific, accepted 
objects. Therefore, we also (3) compute more spe- 
cific~general relations between objects: 
Relevance detection. Depending on the dialogue act 
of the current utterance different courses of action 
are taken. SUGGEST dialogue acts trigger the stor- 
age, completion, focusing and inter-object relation 
(see below) computation for the current structure. 
ACCEPT and REJECT acts let the system mark the 
focused object accepted/rejected. 
Object Completion. Suggestions in negotiation dia- 
logues are incomplete most of the time. E.g., the 
utterance "I would prefer to leave at five" is a sug- 
gestion referring to the departure time for a trip 
from Munich to Hanover on the 19. Jan. 2000 (see 
turn 1005 in the appendix). Most of the complete 
data has been mentioned in the preceding dialogue. 
Our completion algorithm uses the focused object 
(itself a completed suggestion) to complete the cur- 
rent structure. All non-conflicting information of tile 
focused object is copied onto the new object. In our 
example the current temporal information "I would 
prefer to leave at five" would be completed with date 
(i.e., "19. Jan. 2000'" ) and other travel data ("trip 
from-Munich to Hanover"). Afterwards, it Will be 
put to focus. 
Object Relations. The processing results in a number 
of accepted and rejected objects. Normally, a nego- 
tiation produces a series of suggestions that become 
more specific over time. For each new object we cal- 
culate the relation to all other suggestions it\] terms 
of more specific/general or equal. A final inference 
150 
procedure then filters redundant objects and pro- representation onto a semantic description (VIT) for 
duces a list of accepted objects with highest speci ...... each sentence (suitable foz.further processing by the 
ficity. Figure 4 shows two such objects extracted 
from the sample dialogue. Both structures have been 
completed from context data including situational 
data, i.e., current time and place of the negotiation. 
........................................... ........................................... 
Topic SCHEDULING 
........................................... ........................................... 
relations: 
((MDRE_SPECIFIC_THAN.#~APPOINTMENT P2*>)) 
APPOINTMENT (Ph*+0) 
HAS_LOCATION --> CITY (P4*) 
HAS_NAME="hannover" 
HAS_MEETING --> MEETING (P3**) 
HAS_NAME="ges chae ft st re f fen" 
HAS_DATE --> DATE (Ph*) 
TEMPEX= \[year : 2000, 
month: j an, 
day : 20, 
part :am, 
time: ii :0\] 
relations : 
((MOKE_SPECIFIC_THAN . #<APPOINTMENT P26.>) 
(MORE_SPECIFIC_THAN . #<APPOINTMENT P30**+0>)) 
APPOINTMENT (P29.+0) 
HAS_LOCATION --> NONGEO_LOCATION (P30***) 
HAS_NAME="b~hnhof" 
HAS_DATE --> DATE (P29") 
TEMPEX=\[year:2000, 
month:jan, 
day:lg, 
time:9:30\] 
Figure 4: The scheduling part of the thematic struc- 
ture 
4 Generating Summaries 
Our system uses many of tim existing components 
of VERB~'IOBIL. However, we had to develop a new 
component, the summary generator, which is de- 
scribed below. It solves the task of mapping the 
DIREX structures selected in the dialogue nmmory 
into sequences of full fledged semant.ic sentence de- 
scriptions (VITs), thereby performing the following 
steps: 
* Document Planning: Extracting, preparing 
and dividing the content of the dialogue memory into 
a predefined format. - This includes, c.f., time/place 
of negotiation, participants, result of the negotia- 
tion. 
o Sentence Planning: Splitting the input into 
chunks suitable for a sentence. This process in- 
voh'es choosing an appropriate verb and arranging 
the parts of the chunk as arguments and/or a(l- 
.iuncts. The final step is the mapping of this internal 
existing VERBMOBIL components). 
® Generation: Verbalizing the VITs by the exist- 
ing multilingual generator of VERBMOBIL. 
® Presentation: Formatting of the complete doc- 
ument content to an, e.g., HTML-page. Finally, the 
document is displayed by an appropriate browser. 
Our approach has been mostly guided by robust- 
ness: our representation  (DIREX) was co- 
developed during the course of the project. More- 
over, as the extraction component increased its vo: 
cabulary, we wanted to be able to generate new in- 
formation which had not been seen before. Hence 
we needed an approach which is fault tolerant. In- 
stead of failing when the representation changes or 
new type of objects were introduced we degrade in 
precision. Our two step approach has proven its use- 
fulness for this. 
4.1 Document Planning 
The document itself contains two main parts. The 
top of the document includes general informa- 
tion about the dialogue (place, date, participants, 
theme). The body of the document contains the 
summary part which is divided into four paragraphs, 
each of them verbalizing the agreements for one ne- 
gotiation topic: scheduling, accommodation, travel- 
ing and entertainment. Therefore, our document 
planning is very straightforward. The four elements 
of the top document are processed in the following 
manner: 
o Place and Date: For place and date the informa- 
tion is simply retrieved from the dialogue memory. 
• Participants: The participants information are 
transformed into a VIT by the plan processor de- 
scribed below. In the absence of name/title infor- 
mation, a character, e.g., h, B, .. • is used. 
® Theme: By a shallow examination of the result of 
the content extraction, a semantic description corre- 
sponding to a noun phrase mirroring the content of 
the document as a whole is construed. An example 
is Business trip with accommodation. 
• The summary." Finally, the summary relevant D1- 
REX objects are retrieved from the dialogue men> 
ory: First we compute the most specific suggestions 
by using the most specific/general and equal rela- 
tions. The remaining suggestions are partitioned 
into equivalence classes which are filtered by com- 
puting the degree of acceptance. In case of conflict 
the most recent one is taken. The resulting set is par- 
titioned into the above mentioned topics the)' belong 
to. Finally these are processed by the plan processor 
as described below. 
4.2 Sentence Planning 
We now turn into the process of mapping the inter- 
esting part of the dialogue memory onto sequences 
151 
of VITs. An example of the content of one topic - 
scheduling - was shown in figure 4. O.ur two step 
approach consists of: 
* A plan processor whose task it is to split the 
objects selected into chunks suitable for a sentence. 
Possibly it contributes to the selection of verbs. 
o A semantic constructor whose task it is to con- 
vert the output of the plan processor into full fledged 
semantic descriptions (VITs) for the sentences of the 
document. This second step can be viewed as a ro- 
bust fall-back: If the plan processor does not succeed 
in obtaining full Specifications of all sentence parts, 
this step secures a valid and complete specification. 
4.2.1 The plan processor 
Input to the plan processor (Alexandersson and Rei- 
thinger, 1997) is the thematic structure partly shown 
in figure 4. The plan processor interprets (currently 
about 150) plan operators which are expanded in a 
top-down left to right fashion. 
For the overall structure of the text, the imposed 
topic structure of the thematic structure is kept. 
Within a topic we use a set of operators which are ca- 
pable of realizing (parts of) the structures to NPs, 
PPs and possibly verb information forming a high 
level specification of a sentence. 
Plan operators 
A plan operator consists of a goal which is option- 
ally divided into subgoal(s). Its syntax contains the 
keywords :constraints and :actions which can 
be any Lisp expression. Variables are indicated with 
question/exclamation marks (see figures 5 and 6). 
The goal of the operators uses an interface based 
on a triple with the following usage: 
o <description> This is the input position of the 
operator. It describes and binds the object which 
will be processed by this operator. 
o <context> This is the context - input/output. 
The context contains a stack for objects in focus, 
handled as described in (Grosz and Sidner, 1986). 
Additionally we put the generated information on a 
history list (Dale, 1995). The context supports the 
generation of, e.g., pronouns (see below). At present 
the context is only used local to each topic. 
o <output> The result of the operator. Tile possible 
output types are NP, PP and sentence(s). 
We the distinguish two types of operators; complex 
operators, responsible for complex objects, which 
can contain several roles, and simple operators, 
which can process simple objects (carrying only one 
role). The general design of a complex operator -- see 
figure 5 for an operator responsible for appointment 
objects - consists of three subgoals: 
o (find-roles ...) Retrieve tile content of the 
object. "ghe operators responsible for soh'ing the 
find-roles goal optionally allow for an enumera- 
tion of the roles we want to use. 
e (split-roles . . .) These roles (and values) will 
be partitioned,into chunks, (which we, call a split) 
suitable for generating one sentence. 
• (generate-splits ...) Finally the output - a 
sentence description - will be constructed. 
(defplan appointment 
:goal ((class (Vapp scheduling)) 
(?in-context ?out-context) 
?sentence) 
:constraints (appointment-p !app) 
:subgoals (:sequence 
(find-roles ?appZrels) 
(split-roles ?rels 
appointment ?l-of-splits) 
(generate-splits ?l-of-splits 
(Via-context ?out-context) 
appointment ?sentence))) 
Figure 5: An example of an operator for a "complex" 
object 
Behind the functionality of the split-roles goal 
we use pairs of operators (figure 6), where the first is 
a fact describing the roles of the split, and the second 
is a description for how to realize the sentence. In 
this example the selection of an appropriate verb is 
not performed by this operator but by the semantic 
constructor. 
The second type of operators are simple operators 
like the one for the generation of time expressions 
(tempex) or cities (see figure 4). 
Figure 7 shows a simplified plan processor output 
(building block) for one sentence. 
4.2.2 The Semantic Constructor 
The task of the semantic constructor is to map the 
information about sentences computed by the plan 
processor to full semantic representations (VITs). 
The knowledge source for this computational step 
is a declarative set of about 160 different semanti- 
cally oriented sentence patterns which are encoded 
in an easily extendable semantic/syntactic descrip- 
tion . 
To obtain a complete semantic representation for 
a sentence we first select a sentence pattern. This 
pattern is then, together with tile output of the plan 
processor, interpreted to produce the VIT. The se- 
lection criteria for a sentence pattern are: 
All patterns are ordered topic-wise because 
the appropriateness of sentence patterns is topic- 
dependent (e.g., the insertion of topic-specific NPs 
or PPs into a sentence). 
-+ The int.entional state of the inforination to 
be verbalized highly restricts the set of appropriate 
verbs. 
Depending on the propositional content de- 
scribed within a DIat-:x-VIT - i.e., a VIT repre- 
senting one sentence part in a building block of the 
152 
;; - Das <Treffen> finder in <City> 
;; am <tempex> statt 
• ;; - The <Meeting>takes place 
;; in <City> on the <tempex> 
(deffact sentence-split 
:goal (sentence-split 
((has_meeting ?has_name) 
(has_location ?has_location) 
(has_date ?has_date)) 
?_topic)) 
(defplan generate-split 
:goal (generate-split 
((has_meeting ?nmme) ......... ;;:meeting 
(has_location ?location) ;; city 
(has_date ?date)) ;; tempex 
(?in-context ?out-context) 
?topic 
?s) 
:subgoals 
(:seq ((class (?location ?scheduling pp)) 
7topic ?loc-pp) 
((class (?name ?scheduling)) 
?topic ?s-topic) 
(generate-full-tempex ?date ?tempex) 
(((generate-sentence decl) 
(subj ?topic has_topic) 
(obj ?l-pp has_location) 
(obj-add ?tempex has_date)) 
?in-context ?out-context ?s))) 
Figure 6: Example of sentence definition and gener- 
ation 
(ACCOMMODATION 
(ACCEPTED 
(HAS_SIZE VIT: <Einzelzimmer>) 
(HAS_PRICE VIT: <80-Euro-pro-Nacht>) )) 
Figure 7: Exmnple of a plan processor output 
plan processor output - it has to play different se- 
mantic roles in the sentence (e.g., verb-argument vs. 
verb-complement) 
Additionally, the number of DtREx-VITs given 
within a building block for a sentence, influences the 
distribution of them to appropriate semantic roles. 
Figure 8 shows a simplified sentence pattern that 
is selected for the building block in figure 7 to con- 
struct a VIT for, e.g., the German sentence Das 
Einzelzimmer kostet 80 Euro pro Nacht. ("The sin- 
gle room costs 80 euro per night."). According 
(( : verb kosten_v) 
(:subj HAS_SIZE) 
(: obj HAS_PRICE) 
(:rest DIREX_PPS)) 
Figure 8: Example of a sentence pattern 
to the above mentioned selection criteria, this pat- 
tern is selected only for building blocks within 
. ...the.~ accommodation:topi.c~ that-contain, at least ,val- 
ues for the roles HAS.SIZE and HAS.PRIZE, respec- 
tively. The sentence pattern contains the following 
"building instructions": The semantic verb predi- 
cate (:verb) is kosten_v (to cost), its subject ar- 
gument (:subj) is to be filled by the DIREX-VIT 
associated to the DmEx-role HAS.SIZE while :obj 
means a similar instruction for the direct object. 
The robustness fallback (:rest DIREX._PPS) means 
.that.all_other DmEx=VITs are attached to the verb 
as PP complement§. It ispah ~/f a\]l 'Sen~df/6+ pit- 
terns to ensure that even erroneous building blocks 
or erroneously selected sentence patterns produce a 
sentence VIT. 
Finally, the VIT is constructed by interpreting the 
sentence pattern. The interpreter walks through the 
sentence pattern and performs different actions de- 
pending on the keywords, e.g., :verb, :subj and 
their values. 
4.2.3 Utilizing Context 
During'the course of the generation, the plan proces- 
sor incrementally constructs a context (Dale, 1995), 
which allows for the generation of, c.f., anaphora or 
demonstratives for making the text fluent or con- 
trasting purposes. 
• Anaphora If, e.g., a meeting is split into 
more than one sentence, the plan processor uses an 
anaphora to the meeting in the second sentence. 
• Discourse Markers In case of multiple, e.g., 
meetings we introduce the second with a discourse 
marker, e.g., "also". 
o Demonstratives In case of multiple meetings, we 
use a demonstrative to refer to the second meeting. 
In addition to the plan processor, the seman- 
tic constructor also takes care of coherence within 
the paragraphs produced for the individual topics 
hereby focusing on the generation of anaphora and 
adverbial discourse markers. While the local con- 
text of the plan processor is based on the proposi- 
tional content at hand, the semantic constructor uses 
a postprocessing module that is based oil the output 
\qTs of the plan processor (DIREx-VITs) using its 
own semantically oriented local context memory. 
Anaphorization and insertion of discourse mark- 
ers within the semantic constructor are based on a 
comparison of plan processor output VITs occur- 
ring within consecutive sentences of a paragraph. 
Identical verb arguments (NPs) in consecutive sen- 
., tences are replaced by .appropriate anaphoric pro- 
nouns while identical verbs themselves lead to the in- 
sertion of an appropriate adverbial discourse marker. 
5 Multilinguality 
The generation of dialogue scripts and result sum- 
maries is fully implemented in VERB~VIoBIL for Ger- 
man and English. For the English smnmaries we 
153 
extracted, then the transfer module produces equiv- 
alent English VITs which are finally sent to the En- 
glish generation component for producing the En- 
glish text. 
Figure 9 shows the English result summary of the 
dialogue shown in the appendix. 
make use of the transfer component as follows. All o TN A feature was not part of the dialogue, and 
VITs from the German-document representation are . not included in. the..summary. 
The evaluation result is shown in figure 10. It uses 
the standard precision, recall and fallout as defined 
in (Mani et.al., 1998). 
Dialogue 1 2 3 4 aver 
Turns 33 33 31 32 32.25 
Corr 6 13 9 11 9.75 
Miss 6 3 5 4 4.5 
False 3 3 3 0 2.25 I 
TN 32 28 30 32 30.5 I 
Recall 0.5---0- 0.8-'--1- 0.6----4-- 0.7-'--3--~ 
10 I 
Fallout i 0.0__9 0.1___0_ 0.0____9_ __0"00 
Figure 10: Evaluation Results 
Figure 9: Example of an English result summary 
6 Evaluation 
We have performed a small evaluation of the overall 
system as described in this paper. Basis for the eval- 
uation were the transcripts of four German-English 
negotiation dialogues. For each dialogue the result- 
ing features of the negotiation (maximally 47, e.g., 
location, date for a meeting, speakers name and title, 
book agent) were annotated by a lmman, and then 
compared with the result of running the dialogues 
through the system and generating the summaries. 
The features in the summary were compared using 
the following classifications: 
• Corr The feature approximately corresponds to 
the human annotation. This means that the feature 
is either (1) a 100% match; (2) it was not sufficiently 
specified or (2) too specific. An example of (2) is 
when the correct date included a time, which was 
not captured. An example of (3) is when a date 
with time was annotated but the feature contained 
just a (late. 
o Miss A feature is not included in the summary. 
o False A feature was erroneously iimluded in the 
sumlnary, meaning that the feature was not part of 
the dialogue or it received a wrong value. 
Obviously, our approach tries to be on the safe 
side; the summary contains only those features that 
the system thinks both partners agreed on. The 
main reasons for not getting higher numbers is 
twofold. The recognition of dialogue acts, and thus 
the recognition of the intension behind the utter- 
ances reaches a 70% recall (Reithinger and Klesen, 
1997). We also still make errors during the content 
extraction. 
7 Conclusion 
We have presented an extension to existing modules 
allowing for the generation of summaries within the 
VERBMOBIL system. To our knowledge our system 
is the only one that uses semantic representation as 
basis for summarizing. Other approaches use, e.g., 
statistical techniques or rhetorical parsing (Waibel 
et al., 1998; Hovy and Marcu, 1998) to obtain the 
summaries. Moreover, although our module is re- 
stricted to  specific processing, the use of 
semantics and the transfer module allow for the gen- 
eration of multilingual documents in a very straight- 
forward fashion. 
In the near future we will extend the system with 
respect to: 
o Sentence Split At present the first found sen- 
tence split is chosen. This is not necessarily the op- 
timal one. We are currently in the process of devel- 
oping criteria for ranking competing results. 
o Japanese The VERBMOBIL system currently in- 
cludes German, English and Japanese. We intend 
to apply the same technique as for the English sum- 
maries to generate Japanese ones. 

References 
J. Alexandersson and P. Poller. 1998. Towards multilin- 
gual protocol generation for spontaneous speech dia- 
gues. In Probeedings of INLG-98, Niagara-On-The- 
Lake. Ontario. Canada. 
J. Alexandersson and N. Reithinger. 1997. Learning di- 
alogue structures from a corpus. In Proceedings of 
EufoSpeech-97; pages' 2231-2235," Rhodes. 
Jan Alexandersson, Bianka Buschbeck-Wolf, Tsutomu 
Fujinami, Michael Kipp, 'Stephan Koch, Elisa- 
beth Maier, Norbert P~eithinger, Birte Schmitz, 
and Melanie Siegel. 1998. Dialogue Acts in 
VERBMOBIL-2 - Second Edition. Vergmobil-Report 
226, DFKI Saarbrficken, Universitgt Stuttgart, Tech- 
nische Universit/it Berlin, Universit/it des Saarlandes. 
D. Appelt, J. Hobbs, J. Bear, and M. Tyson. 1993. FAS- 
TUS: A finite-state processor for information extrac- 
tion from real-world text. In IJCAL93. 
T. Becker, W. Finkler, A. Kilger, and P. Poller. 1998. An 
efficient kernel for multilingual generation in speech- 
to--speech dialogue -translation-.-In :Proceediiigs of 
COLING/ACL-98, Montreal, Quebec, Canada. 
T. Becket, A. Kilger, P. Lopez, and P. Poller. 2000. Mul- 
tilingual generation for translation in speech-to-speech 
dialoga.les and its realization in verbmobil. In Proceed- 
ings of ECAI-2000, Berlin, Germany. 
J. Bos, B. Gamb/ick, C. Lieske, Y. Mori, M. Pinkal, and 
K. Worm. 1996. Compositional semantics in verbmo- 
bil. In Proceedings of Coling '96, Copenhagen, Den- 
mark. 
T. Bub, W. Wahlster, and A. Waibel. 1997. Verbmo- 
bil The combination of deep and shallow processing 
for spontaneous speech translation. In Proceedings of 
ICASSP-97, pages 71-74, Munich. 
R. Dale. 1995. An introduction to natural lan- 
guage generation. Technical report, Microsoft 
Research Institute (MRI), Macquarie Univer- 
sity. Presented at the 1995 European Summer 
School on Logic, Language and Information, Avail- 
able from http://www.mri.mq.edu.au/-rdale/nlg- 
textbook/ESSLLI95/. 
M. Dorna and M. Emele. 1996. Efficient Implementation 
of a Semantic-Based Transfer Approach. In Proceed- 
ings of ECAI-96, pages 567-571, Budapest, Hungary, 
August. 
M. Dorna. 1996. The ADT-Package for the VERBMOBIL 
Interface Term. Verbmobil Report 104, IMS, Univer- 
sit/it Stuttgart, Germany. 
B. Grosz and C. Sidner. 1986. Attention. Intentions and 
the Structure of Discourse. Journal o~ Computational 
Linguistics, 12(3). 
E. Hovy and D. Marcu. 1998. Coling/acl-98 tu- 
torial on automated text summarization. Avail- 
able from http://w~v.isi.edu/-marcu/coling-ac198- 
tutorial.html. 
M. Kipp, J. Alexandersson, and N. Reithinger. 1999. 
Understanding Spontaneous Negotiation Dialogue. In 
Workshop Proceedings 'Knowledge And Reasoning in 
Practical Dialogue Systems' of TJCAI '99, pages 57- 
64. 
I. Mani, D. House, G. Klein, L. Hirschman, L. 
Obrist, T. Firmin. M. Chrzanowski, and B. 
Sundheim. 1998. The tipster summac text sum- 
marization evaluation - final report. Technical 
reports The Mitre Corp. Available from http://www- 
24.nist.gov/related_projects/tipster_summac/finalxpt- 
.html. 
N. Reithinger and M. Klesen. 1997. Dialogue Act Clas- 
sification Using Language Models. In Proceedings of 
EuroSpeech-97, pages 2235-2238, Rhodes. 
A. Waibel, M. Bett, M. Finke, and R Stiefelhagen. 1998. 
Meeting Browser: Tracking and Summarizing Meet- 
ings. In Proceedings of the DARPA Broadcast News Workshop. 
