<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1420"> <Title>Multilingual Summary Generation in a Speech-To-Speech Translation System for Multilingual Dialogues*</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes a novel functionality of the VERBMOBIL system, a large scale translation system designed for spontaneously spoken multilingual negotiation dialogues. The task is the on-demand generation of dialogue scripts and result summaries of dialogues. We focus on summary generation and show how the relevant data are selected from the dialogue memory and how they are packed into an appropriate abstract representation. Finally, we demonstrate how the existing generation module of VERBMOBIL was extended to produce multilingual and result summaries from these representations.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In the last couple of years different methods for summarization have been developed. In this paper we report on a new system functionality within the scope of VERBMOBIL (Bub et al., 1997), a fully implemented speech-to-speech translation system, that generates German or English dialogue scripts (Alexandersson and Poller, 1998) as well as German or English summaries of a multilingual negotiation dialogue held with assistance of the system. By a script we mean a document that reflects the domain-specific propositional contents of the individual turns of a dialogue as a whole, while a summary gives a compact summarization of all negotiations the dialogue participants agreed on.</Paragraph> <Paragraph position="1"> The key idea behind our approach is to utilize as many existing resources as possible. 
Conceptually we have added one module, the summary generator, although technically it is realized in different, already existing modules of the overall VERBMOBIL system. Besides formatting, our new module generates sequences of language specific (i.e., German) semantic representations for the generation of summaries/scripts based on the content of the dialogue memory (Kipp et al., 1999). These descriptions are realized into text by the existing VERBMOBIL generator (Becker et al., 1998). To produce multilingual summaries we utilize the transfer module of VERBMOBIL (Dorna and Emele, 1996).</Paragraph> <Paragraph position="2"> * The research within VERBMOBIL presented here is funded by the German Ministry of Research and Technology under grant 011V101K/1. The authors would like to thank Tilman Becker for comments on earlier drafts of this paper, and Stephan Lesch for invaluable help with programming.</Paragraph> <Paragraph position="3"> The next section gives an overview of the VERBMOBIL system focusing on the modules central for the production of summaries/scripts. It is followed by a section describing the extraction and maintenance of summary relevant data. We then describe the functionality of the summary generator in detail.</Paragraph> <Paragraph position="4"> An excerpt of the sample dialogue we refer to in the paper is given at the end of the paper.</Paragraph> </Section> <Section position="4" start_page="0" end_page="149" type="metho"> <SectionTitle> 2 Prerequisites </SectionTitle> <Paragraph position="0"> VERBMOBIL is a speech-to-speech translation project, which at present is approaching its end and in which over 100 researchers (footnote 1) at academic and industrial sites are developing a translation system for multilingual negotiation dialogues (held face to face or via telephone) using English, German, and Japanese. The main difference between VERBMOBIL and, e.g., man-machine dialogue systems is that VERBMOBIL mediates the dialogue instead of controlling it. 
Consequently, the complete dialogue structure as well as almost the complete macroplanning is out of the system's control.</Paragraph> <Paragraph position="1"> The running system of today is complex, consisting of more than 75 separate modules. About one third of them concern linguistic processing and the rest serve technical purposes (for more information see, for instance, Bub et al., 1997). For the sake of this paper we concentrate on a small part of the system as shown in figure 1.</Paragraph> <Paragraph position="2"> A user contribution is called a turn, which is divided into segments. A segment ideally resembles a complete sentence as we know it from traditional grammars. However, because of the spontaneity of the user input and because the turn is chunked by a statistical process, the input segments for the linguistic components are sometimes merely pieces of linguistic material. (Footnote 1: See http://verbmobil.dfki.de for the list of project partners.)</Paragraph> <Paragraph position="3"> For the dialogue memory and one of the shallow translation components the dialogue act (Alexandersson et al., 1998) plays an important role. The dialogue act represents the communicative function of an utterance, which is important information for the translation as well as the modeling of the dialogue as a whole. Examples of illocutionary acts are REQUEST and GREET. Other acts can carry propositional content, like SUGGEST and INFORM_FEATURE.</Paragraph> <Paragraph position="4"> To obtain a good translation and enhance the robustness of the overall system the translation is based on several competing translation tracks, each based on different paradigms. The deep translation track consists of an HPSG-based analysis, semantic transfer and finally a TAG-based generator (VM-GECO). 
The linguistic information within this track is encoded in a so-called VIT (Bos et al., 1996; Dorna, 1996), which is a formalism following DRT.</Paragraph> <Paragraph position="5"> It consists of a set of semantic conditions (i.e. predicates, roles, operators and quantifiers) and allows for underspecification with respect to scope and subordination or inherent underspecification. A graphical representation of the VIT for the English sentence &quot;They will meet at the station&quot; is shown in figure 2. Besides the deep translation track several shallow tracks have been developed. The main source of input for the generation of summaries comes from one of these shallow analysis components (described in section 3), which produces dialogue acts, topic suggestions and expressions in a new knowledge representation language called DIREX. These expressions represent domain related information like source and destination cities, dates, important hotel related data, and meeting points. This input is processed by the dialogue module, which computes the relevant (accepted) objects of the negotiation (each consisting of dialogue act, topic, and a DIREX). Figure 3 shows the conceptual architecture, where the summary generation process as a whole is indicated with thicker lines. It consists of the following steps: * Content Selection: The relevant structures are selected from the dialogue memory.</Paragraph> <Paragraph position="6"> * Summary Generation: These structures are converted into sequences of semantic descriptions (VITs) of full sentences for German (see section 4). 
* Transfer: Depending on the target language, the German sentence VITs are sent through the transfer module.</Paragraph> <Paragraph position="7"> * Sentence Generation: The VITs are generated by the existing VERBMOBIL generator (Becker et al., 1998). * Presentation: The sentences are incorporated into the final, e.g., HTML document.</Paragraph> <Paragraph position="8"> Throughout the paper we will refer to a German-English dialogue (see appendix for an excerpt). The information presented there is the spoken sentence(s) together with the information extracted as described in section 3. To save space we only present parts of it, namely those which give rise to the structures in figure 4.</Paragraph> </Section> <Section position="5" start_page="149" end_page="150" type="metho"> <SectionTitle> 3 Extraction and Maintenance of </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="149" end_page="150" type="sub_section"> <SectionTitle> Protocol Relevant Data </SectionTitle> <Paragraph position="0"> The dialogue memory gets its input from one of the shallow translation components, which bases its translation on the dialogue act and DIREX expression extracted from the segment. The input is a triple consisting of: * Dialogue Act representing the intention of the segment.</Paragraph> <Paragraph position="1"> * Topic is one of the four topics scheduling, traveling, accommodation and entertainment. * DIREX representing the propositional content of the segment.</Paragraph> <Paragraph position="2"> For the extraction of propositional content and intention we use a combination of knowledge based and statistical methods. To compute the propositional content finite state transducers (FSTs) (Appelt et al., 1993) with built-in functions are used (Kipp et al., 1999). The intention (represented by a dialogue act) is computed statistically using language models (Reithinger and Klesen, 1997). 
Both methods were chosen because of their robustness: since the speech recognizers have a word error rate of about 20%, we cannot expect sound input for the analysis. Also the segmentation of turns into utterances is stochastic and therefore sometimes delivers suboptimal segments. Consider the input to be processed: &quot;I would so we were to leave Hamburg on the first&quot;, where the speech recognizer replaced &quot;good so we will&quot; with &quot;I would so we were to&quot;. The result of the extraction module looks like: [INFORM, traveling, has_move:[move, has_source_location:[city, has_name='hamburg'], has_departure_time:[date, time=[day:1]]]] The result consists of the dialogue act INFORM, the topic suggestion traveling, and a DIREX. The top object is a move with two roles: a source location (which is a city, Hamburg), and a departure time (which is a date, day 1).</Paragraph> <Paragraph position="3"> Dialog processing: For each utterance, and hence each DIREX, the dialogue manager (1) estimates its relevance, and (2) enriches it with context. For summary generation, we are solely interested in the most specific, accepted objects. Therefore, we also (3) compute more specific/general relations between objects. Relevance detection. Depending on the dialogue act of the current utterance different courses of action are taken. SUGGEST dialogue acts trigger the storage, completion, focusing and inter-object relation (see below) computation for the current structure. ACCEPT and REJECT acts let the system mark the focused object accepted/rejected.</Paragraph> <Paragraph position="4"> Object Completion. Suggestions in negotiation dialogues are incomplete most of the time. E.g., the utterance &quot;I would prefer to leave at five&quot; is a suggestion referring to the departure time for a trip from Munich to Hanover on the 19. Jan. 2000 (see turn 1005 in the appendix). 
Most of the complete data has been mentioned in the preceding dialogue. Our completion algorithm uses the focused object (itself a completed suggestion) to complete the current structure. All non-conflicting information of the focused object is copied onto the new object. In our example the current temporal information &quot;I would prefer to leave at five&quot; would be completed with the date (i.e., &quot;19. Jan. 2000&quot;) and other travel data (&quot;trip from Munich to Hanover&quot;). Afterwards, it is put into focus.</Paragraph> <Paragraph position="5"> Object Relations. The processing results in a number of accepted and rejected objects. Normally, a negotiation produces a series of suggestions that become more specific over time. For each new object we calculate the relation to all other suggestions in terms of more specific/general or equal. A final inference procedure then filters redundant objects and produces a list of accepted objects with highest specificity. Figure 4 shows two such objects extracted from the sample dialogue. Both structures have been completed from context data including situational data, i.e., current time and place of the negotiation.</Paragraph> </Section> <Section position="2" start_page="150" end_page="150" type="sub_section"> <SectionTitle> Topic SCHEDULING </SectionTitle> <Paragraph position="0"> [Figure 4 (excerpt): accepted object for the topic SCHEDULING] 
</Paragraph> <Paragraph position="1"> relations:</Paragraph> <Paragraph position="3"> HAS_NAME=&quot;geschaeftstreffen&quot; HAS_DATE --> DATE (Ph*) TEMPEX=[year:2000, month:jan, day:20, part:am, time:11:0]</Paragraph> <Paragraph position="5"/> </Section> </Section> <Section position="6" start_page="150" end_page="152" type="metho"> <SectionTitle> 4 Generating Summaries </SectionTitle> <Paragraph position="0"> Our system uses many of the existing components of VERBMOBIL. However, we had to develop a new component, the summary generator, which is described below. It solves the task of mapping the DIREX structures selected in the dialogue memory into sequences of full fledged semantic sentence descriptions (VITs), thereby performing the following steps: * Document Planning: Extracting, preparing and dividing the content of the dialogue memory into a predefined format. This includes, e.g., time/place of negotiation, participants, result of the negotiation. * Sentence Planning: Splitting the input into chunks suitable for a sentence. This process involves choosing an appropriate verb and arranging the parts of the chunk as arguments and/or adjuncts. The final step is the mapping of this internal representation onto a semantic description (VIT) for each sentence (suitable for further processing by the existing VERBMOBIL components).</Paragraph> <Paragraph position="1"> * Generation: Verbalizing the VITs by the existing multilingual generator of VERBMOBIL.</Paragraph> <Paragraph position="2"> * Presentation: Formatting of the complete document content to, e.g., an HTML page. Finally, the document is displayed by an appropriate browser. Our approach has been mostly guided by robustness: our representation language (DIREX) was codeveloped during the course of the project. Moreover, as the extraction component increased its vocabulary, we wanted to be able to generate new information which had not been seen before. Hence we needed an approach which is fault tolerant. 
Instead of failing when the representation changes or new types of objects are introduced, we degrade in precision. Our two step approach has proven its usefulness for this.</Paragraph> <Section position="1" start_page="150" end_page="150" type="sub_section"> <SectionTitle> 4.1 Document Planning </SectionTitle> <Paragraph position="0"> The document itself contains two main parts. The top of the document includes general information about the dialogue (place, date, participants, theme). The body of the document contains the summary part, which is divided into four paragraphs, each of them verbalizing the agreements for one negotiation topic: scheduling, accommodation, traveling and entertainment. Therefore, our document planning is very straightforward. The four elements of the top of the document are processed in the following manner: * Place and Date: For place and date the information is simply retrieved from the dialogue memory. * Participants: The participants' information is transformed into a VIT by the plan processor described below. In the absence of name/title information, a character, e.g., A, B, ..., is used. * Theme: By a shallow examination of the result of the content extraction, a semantic description corresponding to a noun phrase mirroring the content of the document as a whole is constructed. An example is &quot;Business trip with accommodation&quot;.</Paragraph> <Paragraph position="1"> * The summary: Finally, the summary relevant DIREX objects are retrieved from the dialogue memory. First we compute the most specific suggestions by using the most specific/general and equal relations. The remaining suggestions are partitioned into equivalence classes which are filtered by computing the degree of acceptance. In case of conflict the most recent one is taken. The resulting set is partitioned into the above mentioned topics they belong to. 
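The selection steps just described can be sketched as follows. This is a minimal illustration in Python, not the VERBMOBIL implementation: the names Suggestion and select_for_summary, the flat set of (role, value) pairs standing in for a DIREX object, and the reading of the degree of acceptance as "the most recent member of an equivalence class was accepted" are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical stand-in for one negotiated object in the dialogue memory."""
    topic: str         # scheduling, traveling, accommodation or entertainment
    roles: frozenset   # (role, value) pairs; a simplification of a DIREX object
    accepted: bool     # True if the object was accepted, False if rejected
    turn: int          # position in the dialogue; a larger value is more recent


def strictly_more_specific(a, b):
    """a refines b: same topic, all of b's roles plus at least one more."""
    return a.topic == b.topic and a.roles != b.roles and b.roles.issubset(a.roles)


def select_for_summary(suggestions):
    # Step 1: keep only the suggestions that no other suggestion refines.
    most_specific = [
        s for s in suggestions
        if not any(strictly_more_specific(other, s) for other in suggestions)
    ]
    # Step 2: equal suggestions form one equivalence class.
    classes = defaultdict(list)
    for s in most_specific:
        classes[(s.topic, s.roles)].append(s)
    # Step 3: on conflict the most recent member decides (assumed reading of
    # the degree of acceptance); survivors are partitioned by topic.
    by_topic = defaultdict(list)
    for members in classes.values():
        latest = max(members, key=lambda s: s.turn)
        if latest.accepted:
            by_topic[latest.topic].append(latest)
    return dict(by_topic)
```

Under these assumptions a general travel suggestion that was later refined is dropped in favour of the refinement, and a refinement that was first rejected and later accepted is represented by its most recent, accepted occurrence.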
Finally these are processed by the plan processor as described below.</Paragraph> </Section> <Section position="2" start_page="150" end_page="152" type="sub_section"> <SectionTitle> 4.2 Sentence Planning </SectionTitle> <Paragraph position="0"> We now turn to the process of mapping the interesting part of the dialogue memory onto sequences of VITs. An example of the content of one topic, scheduling, was shown in figure 4. Our two step approach consists of: * A plan processor whose task it is to split the objects selected into chunks suitable for a sentence. Possibly it contributes to the selection of verbs. * A semantic constructor whose task it is to convert the output of the plan processor into full fledged semantic descriptions (VITs) for the sentences of the document. This second step can be viewed as a robust fall-back: if the plan processor does not succeed in obtaining full specifications of all sentence parts, this step secures a valid and complete specification. Input to the plan processor (Alexandersson and Reithinger, 1997) is the thematic structure partly shown in figure 4. The plan processor interprets plan operators (currently about 150), which are expanded in a top-down, left-to-right fashion.</Paragraph> <Paragraph position="1"> For the overall structure of the text, the imposed topic structure of the thematic structure is kept. Within a topic we use a set of operators which are capable of realizing (parts of) the structures to NPs, PPs and possibly verb information forming a high level specification of a sentence.</Paragraph> <Paragraph position="2"> Plan operators. A plan operator consists of a goal which is optionally divided into subgoal(s). Its syntax contains the keywords :constraints and :actions, which can be any Lisp expression. 
Variables are indicated with question/exclamation marks (see figures 5 and 6).</Paragraph> <Paragraph position="3"> The goal of the operators uses an interface based on a triple with the following usage: * &lt;description&gt; This is the input position of the operator. It describes and binds the object which will be processed by this operator.</Paragraph> <Paragraph position="4"> * &lt;context&gt; This is the context (input/output).</Paragraph> <Paragraph position="5"> The context contains a stack for objects in focus, handled as described in (Grosz and Sidner, 1986).</Paragraph> <Paragraph position="6"> Additionally we put the generated information on a history list (Dale, 1995). The context supports the generation of, e.g., pronouns (see below). At present the context is only used local to each topic.</Paragraph> <Paragraph position="7"> * &lt;output&gt; The result of the operator. The possible output types are NP, PP and sentence(s).</Paragraph> <Paragraph position="8"> We distinguish two types of operators: complex operators, responsible for complex objects, which can contain several roles, and simple operators, which can process simple objects (carrying only one role). The general design of a complex operator (see figure 5 for an operator responsible for appointment objects) consists of three subgoals: * (find-roles ...) Retrieve the content of the object. The operators responsible for solving the find-roles goal optionally allow for an enumeration of the roles we want to use.</Paragraph> <Paragraph position="9"> * (split-roles ...) These roles (and values) will be partitioned into chunks (which we call a split) suitable for generating one sentence.</Paragraph> <Paragraph position="10"> Behind the functionality of the split-roles goal we use pairs of operators (figure 6), where the first is a fact describing the roles of the split, and the second is a description for how to realize the sentence. 
In this example the selection of an appropriate verb is not performed by this operator but by the semantic constructor.</Paragraph> <Paragraph position="11"> The second type of operators are simple operators like the one for the generation of time expressions (tempex) or cities (see figure 4).</Paragraph> <Paragraph position="12"> Figure 7 shows a simplified plan processor output (building block) for one sentence.</Paragraph> <Paragraph position="13"> The task of the semantic constructor is to map the information about sentences computed by the plan processor to full semantic representations (VITs). The knowledge source for this computational step is a declarative set of about 160 different semantically oriented sentence patterns which are encoded in an easily extendable semantic/syntactic description language.</Paragraph> <Paragraph position="14"> To obtain a complete semantic representation for a sentence we first select a sentence pattern. This pattern is then, together with the output of the plan processor, interpreted to produce the VIT. The selection criteria for a sentence pattern are: * All patterns are ordered topic-wise because the appropriateness of sentence patterns is topic-dependent (e.g., the insertion of topic-specific NPs or PPs into a sentence).</Paragraph> <Paragraph position="15"> * The intentional state of the information to be verbalized highly restricts the set of appropriate verbs.</Paragraph> <Paragraph position="16"> * Depending on the propositional content described within a DIREX-VIT (i.e., a VIT representing one sentence part in a building block of the plan processor output) it has to play different semantic roles in the sentence (e.g., verb-argument vs. verb-complement). Additionally, the number of DIREX-VITs given within a building block for a sentence influences the distribution of them to appropriate semantic roles. 
Figure 8 shows a simplified sentence pattern that is selected for the building block in figure 7 to construct a VIT for, e.g., the German sentence &quot;Das Einzelzimmer kostet 80 Euro pro Nacht.&quot; (&quot;The single room costs 80 euro per night.&quot;).</Paragraph> <Paragraph position="18"> According to the above mentioned selection criteria, this pattern is selected only for building blocks within the accommodation topic that contain at least values for the roles HAS_SIZE and HAS_PRIZE, respectively. The sentence pattern contains the following &quot;building instructions&quot;: The semantic verb predicate (:verb) is kosten_v (to cost), its subject argument (:subj) is to be filled by the DIREX-VIT associated to the DIREX-role HAS_SIZE, while :obj means a similar instruction for the direct object.</Paragraph> <Paragraph position="19"> The robustness fallback (:rest DIREX_PPS) means that all other DIREX-VITs are attached to the verb as PP complements. It is part of all sentence patterns to ensure that even erroneous building blocks or erroneously selected sentence patterns produce a sentence VIT.</Paragraph> <Paragraph position="20"> Finally, the VIT is constructed by interpreting the sentence pattern. 
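The building instructions above can be pictured with a small sketch. It is written in Python for illustration only: the original patterns are encoded in the system's own description language (see figure 8), and the dictionary layout, the function name interpret_pattern and the plain strings standing in for DIREX-VITs are assumptions, not part of VERBMOBIL.

```python
# Hypothetical stand-in for the sentence pattern of figure 8.
COST_PATTERN = {
    "verb": "kosten_v",    # semantic verb predicate (:verb)
    "subj": "HAS_SIZE",    # role whose part fills the subject (:subj)
    "obj": "HAS_PRIZE",    # role whose part fills the direct object (:obj)
    "rest": "DIREX_PPS",   # robustness fallback (:rest)
}


def interpret_pattern(pattern, building_block):
    """Fill the pattern's slots from a building block (role name to sentence part)."""
    remaining = dict(building_block)   # work on a copy, leave the input untouched
    sentence = {"verb": pattern["verb"]}
    for slot in ("subj", "obj"):
        role = pattern.get(slot)
        if role in remaining:
            sentence[slot] = remaining.pop(role)
    # Robustness fallback: every part not consumed above becomes a PP complement,
    # so even an incomplete or unexpected building block yields a sentence.
    if pattern.get("rest") == "DIREX_PPS":
        sentence["pp_complements"] = [remaining[role] for role in sorted(remaining)]
    return sentence
```

In this sketch a building block with parts for HAS_SIZE, HAS_PRIZE and one further role places the first two as subject and object and attaches the third as a PP complement; if HAS_PRIZE is missing, the result simply lacks an object instead of raising an error, which mirrors the degrade-in-precision idea.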
The interpreter walks through the sentence pattern and performs different actions depending on the keywords, e.g., :verb, :subj and their values.</Paragraph> <Paragraph position="21"> During the course of the generation, the plan processor incrementally constructs a context (Dale, 1995), which allows for the generation of, e.g., anaphora or demonstratives to make the text fluent or for contrastive purposes.</Paragraph> <Paragraph position="22"> * Anaphora: If, e.g., a meeting is split into more than one sentence, the plan processor uses an anaphor referring to the meeting in the second sentence.</Paragraph> <Paragraph position="23"> * Discourse Markers: In case of multiple, e.g., meetings we introduce the second with a discourse marker, e.g., &quot;also&quot;.</Paragraph> <Paragraph position="24"> * Demonstratives: In case of multiple meetings, we use a demonstrative to refer to the second meeting.</Paragraph> <Paragraph position="25"> In addition to the plan processor, the semantic constructor also takes care of coherence within the paragraphs produced for the individual topics, focusing on the generation of anaphora and adverbial discourse markers. While the local context of the plan processor is based on the propositional content at hand, the semantic constructor uses a postprocessing module that is based on the output VITs of the plan processor (DIREX-VITs) using its own semantically oriented local context memory.</Paragraph> <Paragraph position="26"> Anaphorization and insertion of discourse markers within the semantic constructor are based on a comparison of plan processor output VITs occurring within consecutive sentences of a paragraph. 
Identical verb arguments (NPs) in consecutive sentences are replaced by appropriate anaphoric pronouns while identical verbs themselves lead to the insertion of an appropriate adverbial discourse marker.</Paragraph> </Section> </Section> <Section position="7" start_page="152" end_page="153" type="metho"> <SectionTitle> 5 Multilinguality </SectionTitle> <Paragraph position="0"> The generation of dialogue scripts and result summaries is fully implemented in VERBMOBIL for German and English. For the English summaries we make use of the transfer component as follows. All VITs from the German document representation are extracted, then the transfer module produces equivalent English VITs which are finally sent to the English generation component for producing the English text.</Paragraph> <Paragraph position="1"> Figure 9 shows the English result summary of the dialogue shown in the appendix.</Paragraph> <Paragraph position="2"> * TN: A feature was not part of the dialogue, and not included in the summary. The evaluation result is shown in figure 10. It uses the standard precision, recall and fallout as defined</Paragraph> </Section> </Paper>