<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1005"> <Title>Discourse Processing of Dialogues with Multiple Threads</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In this paper we will present our ongoing work on a plan-based discourse processor developed in the context of the Enthusiast Spanish to English translation system (Suhm et al. 1994) as part of the JANUS multi-lingual speech-to-speech translation system.</Paragraph> <Paragraph position="1"> The focus of the work reported here has been to draw upon techniques developed recently in the computational discourse processing community (Lambert 1994; Lambert 1993; Hinkelman 1990), developing a discourse processor flexible enough to cover a large corpus of spontaneous dialogues in which two speakers attempt to schedule a meeting.</Paragraph> <Paragraph position="2"> There are two main contributions of the work we will discuss in this paper. From a theoretical standpoint, we will demonstrate that theories which postulate a strict tree structure of discourse (henceforth, Tree Structure Theory, or TST) on either the intentional level or the attentional level (Grosz and Sidner 1986) are not totally adequate for covering spontaneous dialogues, particularly negotiation dialogues which are composed of multiple threads, that is, dialogues in which multiple propositions are negotiated in parallel. We will discuss our proposed extension to TST which handles these structures in a perspicuous manner. From a practical standpoint, our second contribution will be a description of our implemented discourse processor which makes use of this extension of TST, taking as input the imperfect result of parsing these spontaneous dialogues.</Paragraph> <Paragraph position="3"> We will also present a comparison of the performance of two versions of our discourse processor, one based on strict TST, and one with our extended version of TST, demonstrating that our extension of TST yields an improvement in performance on spontaneous scheduling dialogues.</Paragraph> <Paragraph position="4"> A strength of our discourse processor is that because it was designed to take a language-independent meaning representation (interlingua) as its input, it runs without modification on either English or Spanish input. Development of our discourse processor was based on a corpus of 20 spontaneous Spanish scheduling dialogues containing a total of 630 sentences. Although development and initial testing of the discourse processor was done with Spanish dialogues, the theoretical work on the model as well as the evaluation presented in this paper was done with spontaneous English dialogues.</Paragraph> <Paragraph position="5"> In section 2, we will argue that our proposed extension to Standard TST is necessary for making correct predictions about patterns of referring expressions found in dialogues where multiple alternatives are argued in parallel. In section 3 we will present our implementation of Extended TST. Finally, in section 4 we will present an evaluation of the performance of our discourse processor with Extended TST compared to its performance using Standard TST.</Paragraph> </Section> <Section position="6" start_page="0" end_page="33" type="metho"> <SectionTitle> 2 Discourse Structure </SectionTitle> <Paragraph position="0"> Our discourse model is based on an analysis of naturally occurring scheduling dialogues.
Figures 1 and 2 contain examples which are adapted from naturally occurring scheduling dialogues. These examples contain the sorts of phenomena we have found in our corpus but have been simplified for the purpose of making our argument easy to follow.</Paragraph> <Paragraph position="1"> [Figure 1: Example of Deliberating Over A Meeting Time. Dialogue: How does your schedule look for next week? / Well, Monday and Tuesday both mornings are good. / Wednesday afternoon is good also. / It looks like it will have to be Thursday then. / Or Friday would also possibly work. / Do you have time between twelve and two on Thursday? / Or do you think sometime Friday afternoon you could meet? / No. / Thursday I have a class. / And Friday is really tight for me. / How is the next week? / If all else fails there is always video conferencing. / S1: Monday, Tuesday, and Wednesday I am out of town. / But Thursday and Friday are both good. / How about Thursday at twelve? / S2: Sounds good. / See you then.] </Paragraph> <Paragraph position="2"> Notice that in both of these examples, the speakers negotiate over multiple alternatives in parallel. We challenge an assumption underlying the best known theories of discourse structure (Grosz and Sidner 1986; Scha and Polanyi 1988; Polanyi 1988; Mann and Thompson 1986), namely that discourse has a recursive, tree-like structure. Webber (1991) points out that Attentional State, used for computing which discourse entities are most salient, is modeled equivalently as a stack, as in Grosz and Sidner's approach, or by constraining the current discourse segment to attach on the rightmost frontier of the discourse structure, as in Polanyi and Scha's approach. This is because attaching a leaf node corresponds to pushing a new element on the stack, while adjoining a node Di to a node Dj corresponds to popping all the stack elements through the one corresponding to Dj and then pushing Di onto the stack.</Paragraph>
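This equivalence can be pictured with a small sketch. The class and method names below (FocusStack, attach_leaf, adjoin) are illustrative only and do not correspond to any implementation described in this paper; the sketch simply restates the push/pop correspondence in code.

    class DiscourseSegment:
        def __init__(self, label, entities=()):
            self.label = label
            self.entities = list(entities)  # discourse entities evoked in this segment

    class FocusStack:
        """Attentional State modeled as a plain stack of focus spaces."""
        def __init__(self):
            self._stack = []

        def attach_leaf(self, segment):
            # Attaching a new leaf on the rightmost frontier == pushing its focus space.
            self._stack.append(segment)

        def adjoin(self, new_segment, target):
            # Adjoining new_segment to node Dj (target) == popping every focus space
            # down through the one for Dj, then pushing new_segment.
            while self._stack and self._stack[-1] is not target:
                self._stack.pop()
            if self._stack:
                self._stack.pop()  # pop Dj itself
            self._stack.append(new_segment)

        def accessible_entities(self):
            # Only entities in segments still on the stack remain available
            # for resolving anaphoric references.
            return [e for seg in reversed(self._stack) for e in seg.entities]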
<Paragraph position="3"> Grosz and Sidner (1986), and more recently Lochbaum (1994), do not formally constrain their intentional structure to a strict tree structure, but they effectively impose this limitation in cases where an anaphoric link must be made between an expression inside of the current discourse segment and an entity evoked in a different segment. If the expression can only refer to an entity on the stack, then the discourse segment purpose of the current discourse segment, that is, the goal which the speaker(s) attempt to accomplish in engaging in the associated segment of talk, must be attached to the rightmost frontier of the intentional structure. Otherwise the entity which the expression refers to would have already been popped from the stack by the time the reference would need to be resolved. We develop our theory of discourse structure in the spirit of (Grosz and Sidner 1986), which has played an influential role in the analysis of discourse entity saliency and in the development of dialogue processing systems.</Paragraph> <Paragraph position="4"> Before we make our argument, we will argue for our approach to discourse segmentation. In a recent extension to Grosz and Sidner's original theory, described in (Lochbaum 1994), each discourse segment purpose corresponds to a partial or full shared plan (Grosz and Kraus 1993). These discourse segment purposes are expressed in terms of the two intention operators described in (Grosz and Kraus 1993), namely Int. To, which represents an agent's intention to perform some action, and Int. That, which represents an agent's intention that some proposition hold. Potential intentions are used to account for an agent's process of weighing different means for accomplishing an action he is committed to performing (Bratman, Israel, & Pollack 1988). These potential intentions, Pot.Int. To and Pot.Int. That, are not discourse segment purposes in Lochbaum's theory, since they have not yet been decided upon and are associated with only one agent, and so cannot form the basis for a shared plan. It is not until they have been decided upon that they become Int. To's and Int. That's, which can then become discourse segment purposes. We argue that potential intentions must be able to be discourse segment purposes.</Paragraph> <Paragraph position="5"> [Figure 2: Two analyses of an example dialogue. S1: 1. When can you meet next week? S2: 2. Tuesday afternoon looks good. 3. I could do it Wednesday morning too. S1: 4. Tuesday I have a class from 12:00-1:30. 5. But the other day sounds good. Left: Simple Stack based Structure, with segments DS 1-DS 4. Right: Proposed Structure, with segments DS A-DS E.] </Paragraph> <Paragraph position="6"> Potential intentions are expressed within portions of dialogues where speakers negotiate over how to accomplish a task which they are committed to completing together. For example, deliberation over how to accomplish a shared plan can be represented as an expression of multiple Pot.Int. To's and Pot.Int. That's, each corresponding to a different alternative. As we understand Lochbaum's theory, for each factor distinguishing these alternatives, the potential intentions are all discussed inside of a single discourse segment whose purpose is to explore the options so that the decision can be made.</Paragraph> <Paragraph position="7"> The stipulation that Int. To's and Int. That's can be discourse segment purposes but Pot.Int. To's and Pot.Int. That's cannot has a major impact on the analysis of scheduling dialogues such as the one in Figure 1, since the majority of the exchanges in scheduling dialogues are devoted to deliberating over which date and at which time to schedule a meeting. This would seem to leave all of the deliberation over meeting times within a single monolithic discourse segment, leaving the vast majority of the dialogue with no segmentation. As a result, we are left with the question of how to account for shifts in focus which seem to occur within the deliberation segment, as evidenced by the types of pronominal references which occur within it. For example, in the dialogue presented in Figure 1, how would it be possible to account for the differences in interpretation of &quot;Monday&quot; and &quot;Tuesday&quot; in (3) with &quot;Monday&quot; and &quot;Tuesday&quot; in (14)? It cannot simply be a matter of immediate focus, since the week is never mentioned in (13). And there are no semantic clues in the sentences themselves to let the hearer know which week is intended.</Paragraph> <Paragraph position="8"> Either there is some sort of structure in this segment more fine-grained than would be obtained if Pot.Int. To's and Pot.Int. That's cannot be discourse segment purposes, or another mechanism must be proposed to account for the shift in focus which occurs within the single segment. We argue that rather than propose an additional mechanism, it is more perspicuous to lift the restriction that Pot.Int. To's and Pot.Int. That's cannot be discourse segment purposes. In our approach a separate discourse segment is allocated for every potential plan discussed in the dialogue, one corresponding to each parallel potential intention expressed.</Paragraph>
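The segmentation policy can be summarized with a brief illustrative sketch. The Intention and DiscourseSegment classes and the allocate_segments helper below are hypothetical names, not part of our implementation; they only make explicit that each parallel potential intention opens its own segment.

    from dataclasses import dataclass, field

    @dataclass
    class Intention:
        kind: str      # "Int.To", "Int.That", "Pot.Int.To", or "Pot.Int.That"
        content: str   # e.g. "meet Tuesday afternoon"

    @dataclass
    class DiscourseSegment:
        purpose: Intention
        utterances: list = field(default_factory=list)

    def allocate_segments(intentions):
        # Lifting the restriction: potential intentions open segments too, so each
        # alternative negotiated in parallel gets its own discourse segment (thread).
        return [DiscourseSegment(purpose=i) for i in intentions]

    # The parallel suggestions in Figure 2 would each open a segment:
    parallel_suggestions = [
        Intention("Pot.Int.That", "meet Tuesday afternoon"),
        Intention("Pot.Int.That", "meet Wednesday morning"),
    ]
    segments = allocate_segments(parallel_suggestions)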
<Paragraph position="9"> Assuming that potential intentions form the basis for discourse segment purposes just as intentions do, we present two alternative analyses for an example dialogue in Figure 2. The one on the left is the one which would be obtained if Attentional State were modeled as a stack. It has two shortcomings. The first is that the suggestion for meeting on Wednesday in DS 2 is treated like an interruption. Its focus space is pushed onto the stack and then popped off when the focus space for the response to the suggestion for Tuesday in DS 3 is pushed (alternatively, DS 2 could not be treated like an interruption, in which case DS 1 would be popped before ...). Clearly, however, this suggestion is not an interruption. Furthermore, since the focus space for DS 2 is popped off when the focus space for DS 4 is pushed on, Wednesday is nowhere on the focus stack when &quot;the other day&quot;, from sentence 5, must be resolved. The only time expression on the focus stack at that point would be &quot;next week&quot;. But clearly this expression refers to Wednesday. The second problem, then, is that a simple stack makes it impossible to resolve anaphoric referring expressions adequately when there are multiple threads, as in the case of parallel suggestions negotiated at once.</Paragraph> <Paragraph position="10"> We approach this problem by modeling Attentional State as a graph structured stack rather than as a simple stack. A graph structured stack is a stack which can have multiple top elements at any point. Because it is possible to maintain more than one top element, it is possible to separate multiple threads in discourse by allowing the stack to branch out, keeping one branch for each thread, with the one most recently referred to more strongly in focus than the others. The analysis on the right hand side of Figure 2 shows the two branches in different patterns. In this case, it is possible to resolve the reference for &quot;the other day&quot; since Wednesday would still be on the stack when the reference would need to be resolved. Implications of this model of Attentional State are explored more fully in (Rosé 1995).</Paragraph>
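The following sketch suggests what such a branching focus stack might look like. It is a simplified illustration with invented names; the actual system realizes the idea through the active path of the plan tree, as described in the next section.

    class GraphStructuredStack:
        """Simplified illustration: a focus stack that may keep several tops,
        one per discourse thread, instead of burying older threads."""

        def __init__(self, base):
            self.base = [base]      # focus spaces shared by every thread
            self.threads = [[]]     # one branch of focus spaces per thread
            self.active = 0         # index of the most recently used thread

        def push(self, focus_space, new_thread=False):
            # A parallel suggestion opens a new branch rather than covering the old one.
            if new_thread:
                self.threads.append([focus_space])
                self.active = len(self.threads) - 1
            else:
                self.threads[self.active].append(focus_space)

        def tops(self):
            # Every non-empty branch contributes a top element.
            return [t[-1] for t in self.threads if t]

        def in_focus(self):
            # The active thread is most salient, but the other branches stay
            # accessible, which is what lets "the other day" resolve to Wednesday.
            ordered = [self.threads[self.active]] + [
                t for i, t in enumerate(self.threads) if i != self.active]
            return [fs for t in ordered for fs in reversed(t)] + list(reversed(self.base))

    gss = GraphStructuredStack(base="meet next week")
    gss.push("Tuesday afternoon")                    # first suggestion
    gss.push("Wednesday morning", new_thread=True)   # parallel suggestion branches
    assert gss.tops() == ["Tuesday afternoon", "Wednesday morning"]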
</Section> <Section position="7" start_page="33" end_page="35" type="metho"> <SectionTitle> 3 Discourse Processing </SectionTitle> <Paragraph position="0"> We evaluated the effectiveness of our theory of discourse structure in the context of our implemented discourse processor, which is part of the Enthusiast speech translation system. Traditionally, machine translation systems have processed sentences in isolation. Recently, however, beginning with work at ATR, there has been an interest in making use of discourse information in machine translation. In (Iida and Arita 1990; Kogura et al. 1990), researchers at ATR advocate an approach to machine translation called illocutionary act based translation, arguing that equivalent sentence forms do not necessarily carry the same illocutionary force between languages.</Paragraph> <Paragraph position="1"> Our implementation is described more fully in (Rosé 1994). See Figure 4 for the discourse representation our discourse processor obtains for the example dialogue in Figure 2. Note that although a complete tripartite structure (Lambert 1993) is computed, only the discourse level is displayed here.</Paragraph> <Paragraph position="2"> Development of our discourse processor was based on a corpus of 20 spontaneous Spanish scheduling dialogues containing a total of 630 sentences. These dialogues were transcribed and then parsed with the GLR* skipping parser (Lavie and Tomita 1993). The resulting interlingua structures (see Figure 3 for an example) were then processed by a set of matching rules, similar to those described in (Hinkelman 1990), which assigned a set of possible speech acts based on the interlingua representation returned by the parser. Notice that the list of possible speech acts resulting from the pattern matching process is inserted in the a-speech-act slot ('a' for ambiguous).</Paragraph> <Paragraph position="3"> It is the structure resulting from this pattern matching process which forms the input to the discourse processor. Our goals for the discourse processor include recognizing speech acts and resolving ellipsis and anaphora. In this paper we focus on the task of selecting the correct speech act.</Paragraph> <Paragraph position="4"> Our discourse processor is an extension of Lambert's implementation (Lambert 1994; Lambert 1993; Lambert and Carberry 1991). We have chosen to pattern our discourse processor after Lambert's recent work because of its relatively broad coverage in comparison with other computational discourse models and because of the way it represents relationships between sentences, making it possible to recognize actions expressed over multiple sentences.</Paragraph> <Paragraph position="5"> We have left out aspects of Lambert's model which are too knowledge intensive to get the kind of coverage we need. We have also extended the set of structures recognized on the discourse level in order to identify speech acts such as Suggest, Accept, and Reject which are common in negotiation discourse.</Paragraph> <Paragraph position="6"> There are a total of thirteen possible speech acts which we identify with our discourse processor. See Figure 5 for a complete list.</Paragraph> <Paragraph position="7"> It is commonly impossible to tell out of context which speech act an utterance performs, since without the disambiguating context it could perform multiple speech acts. For example, &quot;I'm free Tuesday.&quot; could be either a Suggest or an Accept. &quot;Tuesday I have a class.&quot; could be a State-Constraint or a Reject. And &quot;So we can meet Tuesday at 5:00.&quot; could be a Suggest or a Confirm-Appointment. That is why it is important to construct a discourse model which makes it possible to make use of contextual information for the purpose of disambiguating.</Paragraph>
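A rough sketch of what one such matching rule could look like is given below. The frame names and rule bodies are invented for illustration; only the a-speech-act and speech-act slot names come from the representation described above.

    def match_speech_acts(interlingua):
        """Illustrative only: collect the speech acts an utterance could perform,
        based on its (invented) interlingua frame, and record them ambiguously."""
        candidates = []
        frame = interlingua.get("frame")
        has_time = "when" in interlingua
        if frame == "*free" and has_time:
            # "I'm free Tuesday." -- could propose a time or accept a proposed one
            candidates += ["*suggest", "*accept"]
        if frame == "*busy" and has_time:
            # "Tuesday I have a class." -- could state a constraint or reject
            candidates += ["*state-constraint", "*reject"]
        if frame == "*meet" and has_time:
            # "So we can meet Tuesday at 5:00." -- suggestion or confirmation
            candidates += ["*suggest", "*confirm-appointment"]
        interlingua["a-speech-act"] = candidates   # 'a' for ambiguous
        return interlingua

    # The discourse processor later selects one of these in context and writes
    # the result to the speech-act slot.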
<Paragraph position="8"> Some speech acts have weaker forms associated with them in our model. Weaker and stronger forms very roughly correspond to direct and indirect speech acts. Because every suggestion, rejection, acceptance, or appointment confirmation also gives information about the schedule of the speaker, State-Constraint is considered to be a weaker form of Suggest, Reject, Accept, and Confirm-Appointment. Also, since every acceptance expressed as &quot;yes&quot; is also an affirmative answer, Affirm is considered to be a weaker form of Accept. Likewise, Negate is considered a weaker form of Reject. This will come into play in the next section, when we discuss our evaluation.</Paragraph> <Paragraph position="9"> [Figure 5 (partial): example utterances such as &quot;See you then.&quot;, &quot;Are you free on the morning of the eighth?&quot;, &quot;Tuesday I have a class.&quot;, &quot;Thursday I'm free the whole day.&quot;, and &quot;This week looks pretty busy for me.&quot;] </Paragraph> <Paragraph position="10"> When the discourse processor computes a chain of inference for the current input sentence, it attaches it to the current plan tree. Where it attaches determines which speech act is assigned to the input sentence. For example, notice that in Figure 4, because sentences 4 and 5 attach as responses, they are assigned speech acts which are responses (i.e. either Accept or Reject). Since sentence 4 chains up to an instantiation of the Response operator from an instantiation of the Reject operator, it is assigned the speech act Reject. Similarly, since sentence 5 chains up to an instantiation of the Response operator from an instantiation of the Accept operator, it is assigned the speech act Accept. After the discourse processor attaches the current sentence to the plan tree, thereby selecting the correct speech act in context, it inserts that speech act in the speech-act slot in the interlingua structure. Some speech acts, such as Suggest, are not responses to previous speech acts and so are not recognized by attaching them to the previous plan tree; they are recognized in cases where the plan inferencer chooses not to attach the current inference chain to the previous plan tree.</Paragraph> <Paragraph position="11"> When the chain of inference for the current sentence is attached to the plan tree, not only is the speech act selected, but the meaning representation for the current sentence is augmented from context. Currently we have only a limited version of this process implemented, namely one which augments the current time expression using the time expressions of the context it attaches to. For example, consider the case where Tuesday, April eleventh has been suggested, and then the response only makes reference to Tuesday. When the response is attached to the suggestion, the rest of the time expression can be filled in.</Paragraph>
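A minimal sketch of this merging step follows; the slot names are simplified stand-ins for the richer interlingua time representation, and the function is illustrative rather than the implemented rule.

    def augment_time(current, antecedent):
        """Illustrative only: fill unspecified fields of the current time
        expression from the time expression it attaches to, without
        overwriting anything that was actually said."""
        merged = dict(antecedent)
        merged.update({k: v for k, v in current.items() if v is not None})
        return merged

    suggestion = {"day-of-week": "Tuesday", "month": "April", "day": 11}
    response = {"day-of-week": "Tuesday", "month": None, "day": None}

    # Attaching the response to the suggestion fills in the rest of the expression:
    augmented = augment_time(response, suggestion)
    # augmented == {"day-of-week": "Tuesday", "month": "April", "day": 11}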
<Paragraph position="12"> The decision of which chain of inference to select and where to attach the chosen chain, if anywhere, is made by the focusing heuristic, a version of the one described in (Lambert 1993) which has been modified to reflect our theory of discourse structure. In Lambert's model, the focus stack is represented implicitly in the rightmost frontier of the plan tree, called the active path. In order to have a focus stack which can branch out like a graph structured stack in this framework, we have extended Lambert's plan operator formalism to include annotations on the actions in the body of decomposition plan operators which indicate whether that action should appear 0 or 1 times, 0 or more times, 1 or more times, or exactly 1 time. When an attachment to the active path is attempted, a regular expression evaluator checks that it is acceptable to make that attachment according to the annotations in the plan operator of which this new action would become a child. If an action on the active path is a repeating action, then rather than only the rightmost instance being included on the active path, all adjacent instances of this repeating action are included. For example, in Figure 4, after sentence 3, not only is the second, rightmost suggestion in focus, along with its corresponding inference chain, but both suggestions are in focus, with the rightmost one being slightly more accessible than the previous one. So when the first response is processed, it can attach to the first suggestion. And when the second response is processed, it can be attached to the second suggestion. Both suggestions remain in focus as long as the node which immediately dominates the parallel suggestions is on the rightmost frontier of the plan tree. Our version of Lambert's focusing heuristic is described in more detail in (Rosé 1994).</Paragraph> </Section> </Paper>