<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1007"> <Title>Insights into the Dialogue Processing of VERBMOBIL</Title> <Section position="5" start_page="0" end_page="33" type="metho"> <SectionTitle> 2 Introduction to Dialogue Processing in VERBMOBIL </SectionTitle> <Paragraph position="0"> In contrast to many other NL systems, the VERBMOBIL system mediates a dialogue between two persons. No restrictions are put on the locutors, except for the limitation to the approx. 2500 words VERBMOBIL recognizes. Therefore, VERBMOBIL, and especially its dialogue component, has to follow the dialogue in any direction. In addition, the dialogue module is faced with incomplete and incorrect input, and sometimes even gaps.</Paragraph> <Paragraph position="1"> When designing a component for such a scenario, we chose not to use one big constrained processing tool. Instead, we selected a combination of several simple and efficient approaches, which together form a robust and efficient processing platform. As an effect of the mediating scenario, our module cannot serve as a &quot;dialogue controller&quot; as in man-machine dialogues. The only exception is when</Paragraph> </Section> <Section position="6" start_page="33" end_page="33" type="metho"> <SectionTitle> (GREET, INTRODUCE_NAME, INIT_DATE, </SectionTitle> <Paragraph position="0"> SUGGEST_SUPPORT_DATE) (Hello, Mrs. Klein, we should arrange an appointment, for the team meeting) A03: Ja, // ich würde Ihnen vorschlagen im Januar, // zwischen dem fünfzehnten und neunzehnten.</Paragraph> <Paragraph position="2"> fifteenth and the nineteenth) B04: Oh // das ist ganz schlecht. 
// zwischen dem elften und achtzehnten Januar bin ich in Hamburg.</Paragraph> <Paragraph position="3"> (UPTAKE, REJECT_DATE, SUGGEST_SUPPORT_DATE) (Oh, that is really inconvenient, I'm in Hamburg between the eleventh and the eighteenth of January) clarification dialogues are necessary between VERBMOBIL and a user.</Paragraph> <Paragraph position="4"> Because the dialogue module acts as an information server in the overall VERBMOBIL system, we started early in the project to collect requirements from other components in the system. The result can be divided into three subtasks: * we allow other components to store and retrieve context information.</Paragraph> <Paragraph position="5"> * we draw inferences on the basis of our input. * we predict what is going to happen next. Moreover, within VERBMOBIL there are different processing tracks: parallel to the deep, linguistically based processing, different shallow processing modules also enter information into, and retrieve it from, the dialogue module. The data from these parallel tracks must be consistently stored and made accessible in a uniform manner.</Paragraph> <Paragraph position="6"> Figure 2 shows a screen dump of the graphical user interface of our component while processing the example dialogue. In the upper left corner we see the structures of the dialogue sequence memory, where the middle row represents turns, and the left and right rows represent utterances as segmented by different analysis components. The upper right part shows the intentional structure built by the plan recognizer. Our module contains two instances of a finite state automaton. The one in the lower left corner is used for performing clarification dialogues, and the other for visualization purposes (see section 7). 
The thematic structure representing temporal expressions is displayed in the lower right corner.</Paragraph> </Section> <Section position="7" start_page="33" end_page="35" type="metho"> <SectionTitle> 3 Maintaining Context </SectionTitle> <Paragraph position="0"> As the basis for storing context information we developed the dialogue sequence memory. It is a generic structure which mirrors the sequential order of turns and utterances. A wide range of operations has been defined on this structure. For each turn, we store e.g. the speaker identification, the language of the contribution, the processing track finally selected for translation, and the number of translated utterances. For the utterances we store e.g. the dialogue act, dialogue phase, and predictions. These data are partly provided by other modules of VERBMOBIL and partly computed within the dialogue module itself (see below). Figure 3 shows the dialogue sequence memory after the processing of turn B02. For the deep analysis side (to the right), the turn is segmented into four utterances: Guten Tag // Frau Klein // Wir müssen noch einen Termin ausmachen // für die Mitarbeiterbesprechung, for which the semantic evaluation component has assigned the dialogue acts GREET, INTRODUCE_NAME, INIT_DATE, and SUGGEST_SUPPORT_DATE respectively. To the left we see the results of one of the shallow analysis components. It splits up the input into two utterances, Guten Tag Frau Klein // Wir müssen ... die Mitarbeiterbesprechung, and assigns the dialogue acts GREET and INIT_DATE.</Paragraph> <Paragraph position="1"> The need for and use of this structure is highlighted by the following example. In the domain of appointment scheduling the German phrase Geht es bei Ihnen? is ambiguous: bei Ihnen can either refer to a location, in which case the translation is Would it be okay at your place?, or to a certain time. In the latter case the correct translation is Is that possible for you? 
A simple way of disambiguating this is to look at the preceding dialogue act(s). In our example dialogue, turn A13, the utterance ich würde ähm vierzehn Uhr vorschlagen (I would hmm fourteen o'clock suggest) contains the proposal of a time, which is characterized by the dialogue act SUGGEST_SUPPORT_DATE. With this dialogue act in the immediately preceding context the ambiguity is resolved as referring to a time and the correct translation is determined.</Paragraph> <Paragraph position="2"> In our domain, in addition to the dialogue act, the most important propositional information is the dates as proposed, rejected, and finally accepted by the users of VERBMOBIL. While it is the task of the semantic evaluation module to extract time information from the actual utterances, the dialogue module integrates this information in its thematic memory. This includes resolving relative time expressions, e.g. two weeks ago, into precise time descriptions, like &quot;23rd week of 1996&quot;. The information about the dates is split in a specialization hierarchy. Each date to be negotiated serves as a root, while the nodes represent the information about years, months, weeks, days, days of week, period of day and finally time. Each node also contains information about the attitude of the dialogue participants concerning this item: proposed, rejected, or accepted by one of the participants.</Paragraph> <Paragraph position="3"> Figure 4 shows parts of the thematic structure after the processing of turn B10. The black boxes stand for the date currently under consideration.</Paragraph> <Paragraph position="4"> Thursday, 8., is the current date agreed upon. 
We also see the previously proposed interval from 6.-9.</Paragraph> <Paragraph position="5"> of the same month in the box above (FROM_TO (6,9)).</Paragraph> </Section> <Section position="8" start_page="35" end_page="36" type="metho"> <SectionTitle> 4 Inferences </SectionTitle> <Paragraph position="0"> Besides the mere storage of dialogue-related data, there are also inference mechanisms that integrate the data into representations of different aspects of the dialogue. These data are again stored in the context memories shown above and are accessed by the other VERBMOBIL modules.</Paragraph> <Paragraph position="1"> Inspecting our corpus, we can distinguish three phases in most of the dialogues. In the first, the opening phase, the locutors greet each other and the topic of the dialogue is introduced. The dialogue then proceeds into the negotiation phase, where the actual negotiation takes place. It concludes in the closing phase, where the negotiated topic is confirmed and the locutors say goodbye. This phase information contributes to the correct transfer of an utterance. For example, the German utterance Guten Tag is translated to &quot;Hello&quot; in the greeting phase, and to &quot;Good day&quot; in the closing phase.</Paragraph> <Paragraph position="2"> The task of determining the phase of the dialogue has been given to the plan recognizer (Alexandersson, 1995). It builds a tree-like structure which we call the intentional structure. The current version makes use of plan operators that are both hand-coded and automatically derived from the VERBMOBIL corpus. The method used is transferred from the field of grammar extraction (Stolcke, 1994). To contribute to the robustness of the system, the processing of the recognizer is divided into several processing levels like the &quot;turn level&quot; and the &quot;domain dependent level&quot;. 
The concepts of turn levels and the automatic acquisition of operators are described in (Alexandersson, 1996).</Paragraph> <Paragraph position="3"> In figure 5 we see the structure after processing turns B02 and A03. The leaves of the tree are the dialogue acts. The root node of the left subtree for B02 is a GREE(T)-INIT-... operator which belongs to the greeting phase, while the partly visible one to the right belongs to the negotiation phase.</Paragraph> <Paragraph position="4"> In the example used in this paper we are processing a &quot;well formed&quot; dialogue, so the turn structure can be linked into a structure spanning the whole dialogue. We also see in figure 3 how the phase information has been written into the boxes representing the utterances of turn B02 as segmented by the deep analysis.</Paragraph> <Paragraph position="5"> Thematic Inferences In scheduling dialogues, referring expressions like the German word nächste occur frequently. Depending on the thematic structure it can be translated as next if the date referred to is immediately after the speaking time, or following in the other cases. The thematic structure is mainly used to resolve this type of anaphoric expression if requested by the semantic evaluation or the transfer module. The information about the relation between the date under consideration and the speaking time can be immediately computed from the thematic structure.</Paragraph> <Paragraph position="6"> The thematic structure is also used to check whether the time expressions are correctly recognized. If an implausible date is recognized, e.g. April 31, a clarification can be invoked. 
The system proposes a more plausible date to the speaker, and waits for an acceptance or rejection of the proposal.</Paragraph> <Paragraph position="7"> In the former case, the correct date will be translated; in the latter, the user is asked to repeat the whole turn.</Paragraph> <Paragraph position="8"> Using the current state of the thematic structure and the dialogue act in combination with the time information of an utterance, multiple readings can be inferred (Maier, 1996). For example, if both locutors propose different dates, an implicit rejection of the former date can be assumed.</Paragraph> </Section> <Section position="9" start_page="36" end_page="37" type="metho"> <SectionTitle> 5 Predictions </SectionTitle> <Paragraph position="0"> A different type of inference is used to generate predictions about what comes next. While the plan-based component uses declarative knowledge, albeit acquired automatically, dialogue act predictions are based solely on the annotated VERBMOBIL corpus.</Paragraph> <Paragraph position="1"> The computation uses the conditional frequencies of dialogue act sequences to compute probabilities of the most likely follow-up dialogue acts (Reithinger et al., 1996), a method adapted from language modeling (Jelinek, 1990). As described above, the dialogue sequence memory serves as the central repository for this information.</Paragraph> <Paragraph position="2"> The sequence memory in figure 3 shows, in addition to the actually recognized dialogue act, the predictions for the following utterance. In (Reithinger et al., 1996) it is demonstrated that exploiting the speaker direction significantly enhances the prediction reliability. Therefore, predictions are computed for both speakers. 
The numbers after the predicted dialogue acts show the prediction probabilities times 1000.</Paragraph> <Paragraph position="3"> As can be seen in the figure, the actually recognized dialogue acts are, for this turn, among the two most probable predicted acts. Overall, approx. 74% of all recognized dialogue acts are within the first three predicted ones.</Paragraph> <Paragraph position="4"> Major consumers of the predictions are the semantic evaluation module and the shallow translation module. The former, which mainly uses knowledge-based methods to determine the dialogue act of an utterance, exploits the predictions to narrow down the number of possible acts to consider. The shallow translation module integrates the predictions within a Bayesian classifier to compute dialogue acts directly from the word string.</Paragraph> </Section> <Section position="10" start_page="37" end_page="38" type="metho"> <SectionTitle> 6 Robustness </SectionTitle> <Paragraph position="0"> For the dialogue module there are two major sources of uncertainty during operation. On the one hand, the user's dialogue behaviour cannot be controlled.</Paragraph> <Paragraph position="1"> On the other hand, the segmentation as computed by the syntactic-semantic construction module, and the dialogue acts as computed by the semantic evaluation module, are very often not the ones a linguistic analysis on paper would produce. Our example dialogue is a very good example of the latter problem.</Paragraph> <Paragraph position="2"> Since no module in VERBMOBIL may ever crash, we had to apply various methods to achieve a high degree of robustness. The most knowledge-intensive module is the plan recognizer. The robustness of this sub-component is ensured by dividing the construction of the intentional structure into several processing levels. Additionally, at the turn level the operators are learned from the annotated corpus. 
If the construction of parts of the structure fails, some functionality has been developed to recover. An important ingredient of the processing is the notion of repair: if the plan construction is faced with something unexpected, it uses a set of specialized repair operators to recover. If parts of the structure could not be built, we can estimate on the basis of predictions what the gap consisted of.</Paragraph> <Paragraph position="3"> The statistical knowledge base for the prediction algorithm is trained on the VERBMOBIL corpus, which in its major parts contains well-behaved dialogues.</Paragraph> <Paragraph position="4"> Although prediction quality gets worse if a sequence of dialogue acts has never been seen, the interpolation approach to compute the predictions still delivers useful data.</Paragraph> <Paragraph position="5"> As mentioned above, to contribute to the correctness of the overall system we perform different kinds of clarification dialogues with the user. In addition to inconsistent dates, we also recognize, for example, similar words in the input that are likely to be confused by the speech recognizer. Examples are the German words for thirteenth (dreizehnter) and thirtieth (dreißigster). We resolve these problems within a uniform computer-human interaction.</Paragraph> </Section> </Paper>