<?xml version="1.0" standalone="yes"?> <Paper uid="J98-4001"> <Title>A Collaborative Planning Model of Intentional Structure</Title> <Section position="3" start_page="528" end_page="534" type="metho"> <SectionTitle> 2. The SharedPlan Definitions </SectionTitle> <Paragraph position="0"> Grosz and Sidner (1990) originally proposed SharedPlans as a more appropriate model of plans for discourse than the single-agent plans based on AI planning formalisms such as STRIPS (Fikes and Nilsson 1971). SharedPlans differ from these other types of plans in providing a model of collaborative, multiagent plans. Collaborative plans better characterize the nature of discourse. As Grosz and Sidner put it (1990, 418): Discourses are fundamentally examples of collaborative behavior. The participants in a discourse work together to satisfy various of their individual and joint needs. Thus, to be sufficient to underlie discourse theory, a theory of actions, plans, and plan recognition must deal adequately with collaboration.</Paragraph> <Paragraph position="1"> Models of single-agent plans are not sufficient for this purpose. As Grosz and Sidner and others (Searle 1990; Bratman 1992; Grosz and Kraus 1996) have shown, collaboration cannot be modeled by simply combining the plans of individual agents.</Paragraph> <Paragraph position="2"> SharedPlans are also distinguished from other planning formalisms in taking plans to be complex mental attitudes rather than abstract data structures. As Pollack noted (1990, 77): There are plans and there are plans. There are the plans that an agent &quot;knows&quot;: essentially recipes for performing particular actions or for achieving particular goal states. And there are the plans that an agent adopts and that subsequently guide his action.</Paragraph> <Paragraph position="3"> Whereas data-structure approaches to planning and plan recognition are focused on the first type of plan, mental phenomenon approaches are focused on the second. 2 To distinguish these two types of &quot;plans,&quot; we adopt Pollack's terminology and use the term recipe for the first type. Recipes are structures of actions; they represent what agents know when they know a way of doing something. We also follow Pollack in reserving the term plan for the collection of mental attitudes that an agent, or set of agents, must hold to act successfully. Thus, while recipes are comprised primarily of actions, plans are comprised of beliefs and intentions that are directed at those actions. We elaborate on this point in Section 3.</Paragraph> <Paragraph position="4"> For an agent G to have an individual plan for an act o~, it must satisfy the requirements given below (Grosz and Kraus 1996). 3 We will refer to the act oL as the objective of the agent's plan.</Paragraph> <Paragraph position="5"> 1. G has a recipe for o~ 2 The terms data-structure view of plans and mental phenomenon view of plans were coined by Pollack (1986b). 3 These requirements, and those to follow for collaborative plans, omit the case present in Grosz and Kraus's (1996) work of one agent contracting an act to another.</Paragraph> <Paragraph position="6"> Computational Linguistics Volume 24, Number 4 2. For each constituent act fli of the recipe, G intends to perform fli G believes that it can perform fli G has an individual plan for fli SharedPlans are more complex than individual plans in several ways. First, the group of agents involved in a SharedPlan must have mutual belief of a recipe for action. 
Second, they must designate a single agent or subgroup of agents to perform each subact in their recipe. If a single agent is selected, that agent must form an individual plan for the subact; if a subgroup is selected, they must form a SharedPlan. Third, the agents involved in a SharedPlan must have commitments toward their own actions, as well as those of their partners. The requirements for a group of agents GR to have a SharedPlan for o~ are as follows (Grosz and Kraus 1996): O. GR is committed to performing o~ 1. GR has a recipe for o~ 2. For each single-agent constituent act fli of the recipe, there is an agent G,ol c GR, such that (a) G~i intends to perform fli G~, believes that it can perform fli G~ has an individual plan for fli (b) The group GR mutually believe (2a) (c) The group GR is committed to G~/s success 3. For each multiagent constituent act fli of the recipe, there is a subgroup of agents GR~ C GR such that (a) GR,o~ mutually believe that they can perform fli GR~i has a SharedPlan for fli (b) The group GR mutually believe (3a) (c) The group GR is committed to GR~/s success Table 1 summarizes the operators used by Grosz and Kraus (1993, 1996) to formalize the requirements of individual and shared plans. Two of these operators, FIP and PIP, are used to model the plans of individual agents. An agent has a FIP or full individual plan when it has established all of the requirements outlined above. When the agent has satisfied only a subset of them, it is said to have a partial individual plan or PIE 4 For multiagent plans, Grosz and Kraus provide two SharedPlan operators: FSP and PSE A set of agents have a full SharedPlan (FSP) when all of the mental attitudes outlined above have been established. Until then, the agents' plan will only be partial (the PSP case). In what follows, we will use the term SharedPlan when the degree of completion of a collaborative plan is not at issue. The definitions of FIP and FSP are given in Figures 4 and 5 respectively. 5 4 This description of a PIP is only a rough, though useful, approximation to Grosz and Kraus's (1993, 1996) formal definition. 5 These definitions are high-level schematics of Grosz and Kraus's (1993, 1996) formal definitions. They serve to highlight those aspects of individual and SharedPlans that are relevant to our work, but omit various formal details.</Paragraph> <Paragraph position="7"> There is a recipe R& for fli such that i. G believes that it can perform j3i according to the recipe (3R&)\[BCBA(G, f~i, R&, T&, Tp, {pj })A ii. G has a full individual plan for fli using the recipe FIP(G, fl~, Tp, Tfl,, RZ,)\] Figure 4 Full individual plan (FIP) definition.</Paragraph> <Paragraph position="8"> As indicated in Clause (1) of the definitions in Figures 4 and 5, recipes are modeled in Grosz and Kraus's definitions as sets of constituent acts and constraints. To perform an act a, an agent must perform each constituent act (the fli in Clause (1)) in a's recipe according to the constraints of that recipe (the pj). Actions themselves may be further decomposed into act-types and parameters. We will represent an action c~ as a term of the form 6(pl ..... Pn) where 6 represents the act-type of the action and the Pi its parameters. Figure 6 provides a graphical representation of a recipe. The operators Int.To and Int.Th in Grosz and Kraus's definitions are used to represent different types of intentions. Int.To represents an agent's intention to perform an action, while Int.Th represents an agent's intention that a proposition hold. 
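To make these constructs concrete, the following is a minimal sketch in Python. The class names (Action, Recipe, IntTo, IntTh) and their fields are illustrative conventions of ours, not part of Grosz and Kraus's formal language; a real implementation would also carry the time and agent arguments that the formal definitions include.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(frozen=True)
class Action:
    """An action alpha, written delta(p1, ..., pn): an act-type plus its parameters."""
    act_type: str
    params: tuple = ()

@dataclass
class Recipe:
    """A recipe for an act: its constituent acts {beta_i} and constraints {rho_j}."""
    objective: Action
    constituents: List[Action] = field(default_factory=list)
    constraints: List[Callable[..., bool]] = field(default_factory=list)

@dataclass(frozen=True)
class IntTo:
    """Int.To: an agent's intention to perform an action itself."""
    agent: str
    act: Action

@dataclass(frozen=True)
class IntTh:
    """Int.Th: an agent's intention that a proposition hold (e.g., a partner's success)."""
    agent: str
    proposition: str

# Example: a two-step recipe for replacing the pump and belt of an air compressor.
replace = Action("replace", ("pump", "belt", "ac1"))
recipe = Recipe(
    objective=replace,
    constituents=[Action("remove", ("belt", "ac1")), Action("remove", ("pump", "ac1"))],
    constraints=[lambda: True],  # placeholder constraint rho_j
)
```

On this reading, an individual plan pairs Int.To attitudes (and beliefs about ability) with the constituent acts of such a recipe, while a SharedPlan additionally requires mutual belief and Int.Th commitments toward the other agents' success.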
Int.To's occur in both types of plans (Clause (2a) in Figures 4 and 5), while Int.Th's occur only in SharedPlans (Clauses (0), (2c) and (3c) in Figure 5). Int.Th's are used to represent commitment to the joint activity and also engender the type of helpful behavior required of collaborating agents (Bratman 1992; Grosz and Kraus 1993, 1996). The operators CBA, BCBA, CBAG, and MBCBAG in Grosz and Kraus's definitions are ability operators; they encode requirements on an agent's ability to perform an There is a recipe R& for/3i such that i. G& believes that it can perform/31 according to the recipe (3R&)\[BCBA(G&,/3,, R&, T&, Tp, {pj }) A ii. G& has a full individual plan for fli using the recipe FIP(G&, fli, Tp, T&, R& )\] (b) The group GR mutually believe (2a) M B( GR, Int.To( G& , fli, Tp, T& )A (3R~,)\[CBA(G&,/3i, R&, T&, {pj}) A FI P( G&, /3i, Tp, T&, R& )\], Tp) (c) The group GR is committed to G&'s success MB(GR, (VGj * GR) Int.Th( Gj, (3R&)CBA(G&, Z,, R& , T& , {pj }), Tp, T& ), Tp) 3. For each multi-agent constituent act/3i of the recipe, there is a subgroup of agents GR& C GR such that (a) There is a recipe R& for/31 such that i. GR& mutually believe that they can perform/31 according to the recipe (3R&)\[MBCBAG(GR&,/3i, R&, T&, Tp, {pj }) A ii. GR& has a full ShaxedPlan for/3i using the recipe FSP( GR&, /3i, Tp, T&, R& )\] (b) The group GR mutually believe (3a) MB(GR, (3R~,)\[CBAG(GR&,/3i, R&, T&, {pj }) A FSP(GR&,/3i, Tp, T&, R& )\], Tp) (c) The group GR is committed to GR~'s success Lochbaum A Collaborative Planning Model has.recipe(G, c~, R, T) 4-~ (1) \[basic.level(a) A BEn(G, basic.level(a), T) A R = REmpt~\] V (2) \[-~basic.level(a) A (2a) R = {~,pj} ^ (2al) {\[IGI = 1 ^ BEn(G, R e Recipes(a),T)\] V (2a2) \[IGI > 1 A MB(G, R E Recipes(a), T)\]}\] Definition of has.recipe.</Paragraph> <Paragraph position="9"> action. CBA (read &quot;can bring about&quot;) and BCBA (&quot;believes can bring about&quot;) are single-agent operators, while CBAG (&quot;can bring about group&quot;) and MBCBAG(&quot;mutually believe can bring about group&quot;) are the corresponding group operators. An agent's ability to perform an action depends upon its ability to satisfy both the physical and knowledge preconditions of that action (McCarthy and Hayes 1969; Moore 1985; Morgenstern 1987). For example, for an agent to pick up a particular tower of blocks, it must (i) know how to pick up towers in general, (ii) be able to identify the tower in question, and (iii) have satisfied the physical preconditions or constraints associated with picking up towers (e.g., it must have a free hand). In Grosz and Kraus's (1996) definitions, conditions of the form in (iii) depend upon the constraints under which an act fli is to be performed. These constraints derive from the recipe in which fli is a constituent and are represented by {pj} in the plan definitions in Figures 4 and 5. Conditions of the form in (i) and (ii) above are knowledge preconditions. Knowledge preconditions were not represented in Grosz and Kraus's (1993) original definitions, but were subsequently formalized by the author. We now present definitions of the relations necessary to model these conditions.</Paragraph> <Section position="1" start_page="532" end_page="534" type="sub_section"> <SectionTitle> 2.1 Knowledge Preconditions </SectionTitle> <Paragraph position="0"> 2.1.1 Determining Recipes. For an agent G to be able to perform an act a, it must know how to perform on; i.e., it must have a recipe for the act. 
The relation has.recipe(G, o~, R, T) is used to represent that agent G has a recipe R for an act o~ at time T. Its formalization is as shown in Figure 7.</Paragraph> <Paragraph position="1"> Clause (1) of the definition indicates that an agent does not need a recipe to perform a basic-level action, i.e., one executable at will (Pollack 1986a). 6 For nonbasic-level actions (Clause (2)), the agent of o~ (either a single agent (2al) or a group of agents (2a2)) must believe that some set of acts, fli, and constraints, pj, constitute a recipe for a.</Paragraph> <Paragraph position="2"> The has.recipe relation can be used to represent one of the knowledge precondition requirements of the ability operators, as well as the recipe requirement in Clause (1) of the plan definitions in Figures 4 and 5.</Paragraph> <Paragraph position="3"> of an act o~ to be able to perform it. For example, if an agent is told, &quot;Now remove the pump \[off the air compressor\],&quot; as in the dialogue of Figure 1, the agent must be able to identify the pump in question. The ability to identify an object is highly context dependent. For example, as Appelt points out (1985, 200), &quot;the description that one 6 Basic-level actions are by their nature single-agent actions.</Paragraph> <Paragraph position="4"> Computational Linguistics Volume 24, Number 4 must know to carry out a plan requiring the identification of 'John's residence' may be quite different depending on whether one is going to visit him, or mail him a letter.&quot; The relation id.params(G, o~, T) is used to represent that agent G can identify the parameters of act o~ at time T. If o~ is of the form 6(pl ..... pn), then id.params(G, o~, T) is true if G can identify each of the Pi. To do so, G must have a description of each pi that is suitable for 6. The relation id.params is thus defined as follows: id.params(G, 6(pl ..... p,), T) 4-~ (Vi, 1 < i < n) has.sat.descr(G, pi, ~(6, pi), T) The function Y in the above definition is a kind of &quot;oracle&quot; intended to model the context-dependent nature of parameter identification. This function returns a suitable identification constraint (Appelt and Kronfeld 1987) for a parameter pi in the context of an act-type 6. For example, in the case of sending a letter to John's residence, the constraint produced by the oracle function would be that John's residence be described by a postal address.</Paragraph> <Paragraph position="5"> The relation has.sat.descr( G, P, C, T) holds of an agent G, a parameter description P, an identification constraint C, and a time T, if G has a suitable description, as determined by C, of the object described as P at time T. To formalize this relation, we rely on Kronfeld's (1986, 1990) notion of an individuating set. An agent's individuating set for an object is a maximal set of terms such that each term is believed by the agent to denote that object. For example, an agent's individuating set for John's residence might include its postal address as well as an identifying physical description such as &quot;the only yellow house on Cherry Street.&quot; To model individuating sets we introduce a function IS(G,P, T); the function returns an agent G's individuating set at time T for the object that G believes can be described as P. This function is based on similar elements of the formal language that Appelt and Kronfeld (1987) introduce as part of their theory of referring. 
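A rough computational reading of these identification relations can be sketched as follows, assuming a Python encoding in which an individuating set is a set of description strings and the oracle is a function from act-type and parameter to an identification constraint. All names here are illustrative; the real relations are defined over beliefs and times, which this sketch collapses.

```python
from typing import Callable, Dict, Set

Description = str
Constraint = Callable[[Description], bool]

# IS(G, P, T): the agent's individuating set for the object it believes P denotes,
# modelled here (for a fixed time) as a lookup keyed by agent and description.
def individuating_set(beliefs: Dict[str, Dict[Description, Set[Description]]],
                      agent: str, p: Description) -> Set[Description]:
    return beliefs.get(agent, {}).get(p, {p})  # always contains P itself

def has_sat_descr(beliefs, agent: str, p: Description, constraint: Constraint) -> bool:
    """has.sat.descr: some description P' in IS(G, P, T) satisfies the constraint C."""
    return any(constraint(p_prime) for p_prime in individuating_set(beliefs, agent, p))

def id_params(beliefs, agent: str, act_type: str, params, oracle) -> bool:
    """id.params: the agent can suitably identify every parameter p_i of delta(p1..pn),
    where oracle(delta, p_i) plays the role of the identification-constraint function F."""
    return all(has_sat_descr(beliefs, agent, p, oracle(act_type, p)) for p in params)

# Example: visiting John's residence calls for a physical description, not just a name.
beliefs = {"G": {"johns_residence": {"johns_residence",
                                     "the only yellow house on Cherry Street"}}}
oracle = lambda delta, p: (lambda d: "house" in d) if delta == "visit" else (lambda d: True)
print(id_params(beliefs, "G", "visit", ["johns_residence"], oracle))  # True
```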
The function returns a set that contains P as well as the other descriptions that G has for the object that it believes P denotes.</Paragraph> <Paragraph position="6"> The relation has.sat.descr is used to represent that an agent can identify a parameter for some purpose. For that to be the case, the agent must have a description, P~, of the parameter such that P' is of the appropriate sort. For example, for an agent to visit John's residence, it is not sufficient for the agent to believe that the description &quot;John's residence&quot; refers to the place where John lives. Rather, the agent needs another description of John's residence, one such as &quot;the only yellow house on Cherry Street,&quot; that is appropriate for the purpose of visiting him. To model an agent's ability to identify a parameter for some purpose, we thus require that the agent have an individuating set for the parameter that contains a description p/such that P/satisfies the identification constraint that derives from the purpose. The definition of has.sat.descr is thus as shown in Figure 8. 7 The predicate suff.for.id(C, P~) is true if the identification constraint C applies to the parameter description PL The oracle function .~'(6,pi) in id.params is used to produce the appropriate identification constraint on Pi given 6.</Paragraph> <Paragraph position="7"> Identification constraints can derive from syntactic, semantic, discourse, and world ~mowledge (Appelt and Kronfeld 1987).</Paragraph> <Paragraph position="8"> In Figures 7 and 8, we have separated the requirements of recipe identification from those of parameter identification. That is, we have defined has.recipe and id.params as independent relations, and do not require an agent to know the parameters of an act to be said to know a recipe for that act. The separation of these two requirements 7 A more precise account of what it means to be able to identify an object is beyond the scope of this paper; for further details, see the discussions by Hobbs (1985), Appelt (1985), Kronfeld (1986, 1990), and Morgenstern (1988).</Paragraph> <Paragraph position="9"> Lochbaum A Collaborative Planning Model has.sat.descr( G, P, C, T) C/-~ {\[\[G\] = 1 A (3P')BEL(G, \[P' * IS(G,P,T) A suff.for.id(C, P')\], T)\] V \[IG\] > 1 A (3P')MB(G,(VGj * G)\[P' * IS(Gj,P,T) A suff.for.id(C, P')\], T)\]} Definition of has.sat.descr.</Paragraph> <Paragraph position="10"> derives from the distinction between recipes and plans. Whereas an agent may know many recipes for performing an act, he will have a plan for that act only if he is conunitted to its performance using a particular recipe. For example, an agent may know that one way to hijack a plane involves smuggling a gun on that plane, without actually intending to hijack a plane at all. Similarly, an agent can know a way of hijacking a plane without actually having a particular plane, or gun, in mind. For this reason, we do not make id.params a requirement of has.recipe. The separation of these two requirements has particular consequences for our model of discourse processing, as will be discussed in Section 7.</Paragraph> <Paragraph position="11"> With the addition of the knowledge precondition relations defined above, the definitions of the SharedPlan ability operators (CBA, BCBA, CBAG, and MBCBAG) include three components. 
The definitions of these operators now state that for an agent to be able to perform an act c~, it must (i) have a recipe for o~ (has.recipe), (ii) be able to identify the parameters of o~ (id.params), and (iii) be able to satisfy the constraints of its recipe for o~ (the {pj} in the plan definitions).</Paragraph> </Section> </Section> <Section position="4" start_page="534" end_page="539" type="metho"> <SectionTitle> 3. Reasoning with SharedPlans </SectionTitle> <Paragraph position="0"> In more traditional plan-based approaches to natural language processing (e.g., the work of Cohen and Perrault \[1979\], Allen and Perrault \[1980\], Sidner \[1985\], Carberry \[1987\], Litman and Allen \[1987\], Lambert and Carberry \[1991\]), reasoning about plans is focused on reasoning about actions. In these models, actions are represented using operators derived from STRIPS (Fikes and Nilsson 1971) and NOAH (Sacerdoti 1977).</Paragraph> <Paragraph position="1"> Such operators include: a header, specifying the action and its parameters; a precondition list, specifying the conditions that must be true for the action to be performed; 8 a body, specifying how the action is to be performed; and an effects list, specifying the conditions that will hold after the action is performed. Under these models, reasoning about plans involves reasoning according to rules that derive from the components of the plan operators. For example, Allen (1983) introduces a precondition-action rule stating that if agent G wants to achieve proposition P and P is a precondition of an act ACT, then G may want to perform ACT. This rule is used for plan recognition. The corresponding rule for plan construction states that if agent G wants to execute ACT, then G may want to ensure that precondition P is satisfied. Heuristics, derived from both planning and natural language principles, are used to guide the application of the rules to recognize (or construct) the best possible plan accounting for an agent's observations (or desired effects).</Paragraph> <Paragraph position="2"> 8 The preconditions may also be supplemented by a list of applicability conditions, specifying the conditions under which it is reasonable to pursue the action, and a list of constraints specifying restrictions on instantiations of the operator's parameters (Litman and Allen 1987; Carberry 1987). Lambert and Carberry's (1991, 49) Build-Plan operator.</Paragraph> <Paragraph position="3"> In these more traditional approaches, the inference rules are typically expressed in terms of the beliefs and goals of the speaker and hearer. Allen's (1983, 126) precondition-action rule, for example, is represented as: SBAW(P) Di SBAW(ACT) -- if P is a precondition of action ACT where SBAW(P) represents that the inferring agent S believes that agent A wants P. As Pollack (1986b, 1990) has noted, however, these mental attitudes are typically transparent to the reasoning process. The system reasons about the action operators themselves--in this case whether P is a precondition of ACT--and essentially ignores tile mental attitudes in the rules; they are simply carried forward from antecedent to consequent. Pollack has dubbed these approaches data-structure approaches because of their focus on the action operators themselves, rather than on the mental attitudes ti'lat are required for planning and plan recognition.</Paragraph> <Paragraph position="4"> In more recent work, the action operators have come to incorporate many more requirements on the agents' mental states. 
For example, Lambert and Carberry's operator for the action of building a plan is given in Figure 9. The Build-Plan operator is used to represent the process by which two agents build a plan for one of them to perform an action. The preconditions of the operator specify requirements on the agents' mental states, e.g., that the agents know the referents of the subactions that one of them needs to perform to accomplish the overall action. The main difference between this type of approach and the SharedPlan approach discussed in this paper is in the focus of the representation.' The representation in Figure 9 specifies requirements on performing an action, some of which are requirements on mental states.</Paragraph> <Paragraph position="5"> The SharedPlan definition in Figure 5 specifies requirements on mental states, some of which refer to actions and their decompositions. One might thus think of the representation in Figure 9 as being &quot;inside-out&quot; from that in Figure 5. Because the focus of the representation in Figure 9 remains on the action and its decomposition, we continue to refer to these types of approaches as data-structure approaches. We reserve tlhe term mental phenomenon approach for those approaches, such as SharedPlans Lochbaum A Collaborative Planning Model and Pollack's individual plans (Pollack 1986b, 1990; Grosz and Kraus 1996), that take mental states to be primary. We will return to the implications of representations such as those in Figure 9 in Section 8.2.</Paragraph> <Paragraph position="6"> The process of reasoning with SharedPlans differs significantly from the process of reasoning with plan operators. Under the SharedPlan approach, agents engaged in discourse are taken to be collaborating on performing some action or on achieving some state of affairs. Each agent brings to their collaboration different beliefs about ways in which to achieve their goal and the actions necessary for doing so. Each agent may have incomplete or incorrect beliefs. In addition, their beliefs about each other's beliefs and capabilities to act may be incorrect. The participants use the discourse to communicate their individual beliefs and to establish mutual ones. Under the SharedPlan approach, the utterances of a discourse are thus understood as contributing information toward establishing the beliefs and intentions that are required for successful collaboration. These beliefs and intentions are summarized by the full SharedPlan definition in Figure 5 and form the basis for the discourse participants' utterances.</Paragraph> <Paragraph position="7"> Until the agents have established all of the requirements of a full SharedPlan, they will have a partial SharedPlan. The agents' partial SharedPlan evolves over the course of the agents' discourse as they communicate about the actions they will perform, the effects of those actions as they perform them, and the need to revise their plans when things do not proceed as expected. The agents' partial SharedPlan is thus always in a state of flux. At any given point in the agents' discourse, however, it represents the current state of the agents' collaboration. It thus indicates those beliefs and intentions that have been established at that point in the discourse, as well as those that remain to be established over the course of the remaining discourse. 
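The role of the partial plan as a record of what has and has not yet been established can be pictured as simple bookkeeping. The toy Python illustration below is ours and assumes a coarse, four-way labelling of the FSP requirements; Grosz and Kraus's definitions are considerably finer grained.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class PartialSharedPlan:
    """A rough bookkeeping view of a PSP: which FSP requirements have been
    established so far, and which remain 'missing'. The requirement labels
    loosely follow Clauses (0)-(3) of the FSP definition."""
    agents: frozenset
    objective: str
    required: Set[str] = field(default_factory=lambda: {
        "mutual-belief-of-recipe",          # Clause (1)
        "intentions-to-perform-subacts",    # Clauses (2a)/(3a)
        "mutual-belief-of-abilities",       # Clauses (2b)/(3b)
        "commitment-to-others-success",     # Clauses (0), (2c)/(3c)
    })
    established: Set[str] = field(default_factory=set)

    def establish(self, requirement: str) -> None:
        if requirement in self.required:
            self.established.add(requirement)

    @property
    def missing(self) -> Set[str]:
        return self.required - self.established

    def is_full(self) -> bool:
        return not self.missing

psp = PartialSharedPlan(frozenset({"G1", "G2"}), "replace(pump, belt, ac1)")
psp.establish("mutual-belief-of-recipe")
print(psp.missing)  # the requirements still to be established over the discourse
```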
The agents' partial SharedPlan thus serves to delineate the information that the agents must consider in interpreting each other's utterances and in determining what they themselves should do or say next. For the agents' utterances to be coherent, they must advance the agents' partial SharedPlan towards completion by helping to establish the &quot;missing&quot; beliefs and intentions.</Paragraph> <Paragraph position="8"> The concept of plan augmentation thus provides the basis for our model of discourse processing. Under this approach, discourse participants' utterances are understood as augmenting the partial SharedPlan that represents the state of their collaboration. Figure 10 provides a high-level specification of this process? It is based on the assumption that agents G1 and G2 are collaborating on an act o~ and models Gl's reasoning in that regard. It thus stipulates how Gl's beliefs about the agents' partial SharedPlan are augmented over the course of the agents' discourse. ~deg It is important to emphasize here that SharedPlans are complex structures that are distributed in nature. The full SharedPlan for a group activity does not, typically, reside in any single agent's mind, nor is there any notion of a group mind in which the SharedPlan resides. Rather, the beliefs and intentions that form a SharedPlan are distributed among the individual minds of the participating agents. Each agent has individual beliefs about its capabilities to act, as well as individual intentions to do so. In addition, agents have commitments towards other agents' abilities to act (represented by intentions that, Int.Th) and mutual beliefs about others' capabilities and 9 The details of this process differ significantly from that described in a previous paper (Lochbaum, Grosz, and Sidner 1990). 10 For expository purposes, we will take G1 to be male and G2 to be female. We have omitted the time and recipe arguments from the PSP specification for simplicity of exposition and will continue to do so subsequently when they are not at issue.</Paragraph> <Paragraph position="9"> Computational Linguistics Volume 24, Number 4 Assume: PSP({G1, G2}, o~), G1 is the agent being modeled.</Paragraph> <Paragraph position="10"> Let Prop be the proposition communicated by G2's utterance/4.</Paragraph> <Paragraph position="11"> 1. As a result of the communication, G1 assumes MB({G1, G2}, BEL(G2, Prop)).</Paragraph> <Paragraph position="12"> 2. G1 must then determine the relationship of Prop to the current SharedPlan context: (a) If G1 believes that/d or Prop indicates the initiation of a subsidiary SharedPlan for an act fl, then G1 will i. ascribe Int.Th(G2, FSP({G1, G2}, fl)), ii. determine if he is also willing to adopt such an intention.</Paragraph> <Paragraph position="13"> (b) If G1 believes that U or Prop indicates the completion of the current SharedPlan, then G1 will i. ascribe BEL(G2, FSP({G1, G2},a)), ii. determine if he also believes the agents' current SharedPlan to be complete.</Paragraph> <Paragraph position="14"> (c) Otherwise, G1 will i. ascribe to G2 a belief that Prop is relevant to the agents' current SharedPlan, ii. determine if he also believes that to be the case.</Paragraph> <Paragraph position="15"> 3. 
(a)If Step (2) is successful, then G1 will signal his agreement (possibly implicitly) an/:l assume mutual belief of the inferred relationship in (2a), (2b), or (2c) as appropriate, updating his view of the agents' PSPs in theprocess.</Paragraph> <Paragraph position="16"> (b)Otherwise, G1 will query G2 or communicate his dissent.</Paragraph> <Paragraph position="17"> Figure 10 The SharedPlan augmentation process.</Paragraph> <Paragraph position="18"> commitments. The combination of mutual belief and intention is sufficient to model collaboration. No notion of irreducible joint intention (as in Searle's \[1990\] work), or any other attitude that would refer to a group mind is necessary (Grosz and Kraus 1996).</Paragraph> <Paragraph position="19"> The processing outlined in Figure 10 assumes that agent G2 has just communicated an utterance/d with propositional content Prop. u To make sense of this utterance, G1 must determine how Prop contributes to the agents' PSP for o~. In some cases, the linguistic signal/,/ may aid in this process. As indicated in Figure 10, Prop may be interpreted in one of three basic ways. It may indicate the initiation of a subsidiary SharedPlan (Case (a) of Step (2)), signal the completion of the current SharedPlan (Case (b)), or contribute to it (Case (c)). In each of these cases, G1 first ascribes a particular mental attitude to G2 on the basis of her utterance (Step (i) in each case) and then reasons about the relevance of that mental attitude to the agents' PSP (Step (ii)). If G1 is able to make sense of the utterance in this way, he then updates his beliefs about the agents' PSP to reflect their mutual belief of the inferred contribution of Prop (Step (3a)). Otherwise, if G1 does not understand the relevance of G2&quot;s utterance, or 11 The recognition of propositional content from surface form has been studied by other researchers (e.g., Allen and Perrault \[1980\], Litman and Allen \[1987\], Lambert and Carberry \[1991\]) and is not discussed in this paper.</Paragraph> <Paragraph position="20"> Lochbaum A Collaborative Planning Model disagrees with it, he may simply communicate his dissent to G2 or query her further (Step (3b)).</Paragraph> <Paragraph position="21"> In Case (a) of Step (2), Prop indicates G2&quot;s intention that the agents collaborate on an act ft. G1 first ascribes this intention to G2 and then tries to explain it in the context of the agents' PSP for a. If G1 believes that the performance of fl will contribute to the agents' performance of a, and is willing to collaborate with G2 in this regard, then G1 will adopt an intention similar to that of G2&quot;s and agree to the collaboration. This process is modeled by Step (2aii). On the basis of his reasoning, G1 will also update his view of the agents' PSP to reflect that fl is an act in the agents' recipe for a for which the agents will form a SharedPlan. This behavior is modeled by Step (3a) of the augmentation process. In this step, agent G1 updates his view of the agents' partial plan to reflect their mutual belief of the communicated information.</Paragraph> <Paragraph position="22"> In Case (b) of Step (2), Prop indicates G2&quot;s belief that the SharedPlan on which the agents are currently focused is complete. This SharedPlan may represent the agents' primary collaboration or a subsidiary one. In either case, G1 must determine if he also believes the agents to have established all of the beliefs and intentions required for them to have a full SharedPlan for a. 
If he does, then he will agree with G2 and update his view of the agents' PSP for a to reflect that it is complete.</Paragraph> <Paragraph position="23"> Case (c) of Step (2) is the default case. If G1 does not believe that Prop indicates the initiation or completion of a SharedPlan, then he will take it to contribute to the agents' current SharedPlan in some way. G1 will first ascribe this belief to G2 and then reason about the specific way in which Prop contributes to the agents' PSP for a. If he is successful in this regard, he will indicate his agreement with G2 and then update his view of the agents' PSP to reflect this more specific relationship.</Paragraph> <Paragraph position="24"> Figure 10 provides a high-level specification of the use of SharedPlans in interpretation. In Section 6 we will provide algorithms for further modeling two of the steps in this process, while in Section 10, we will discuss the use of SharedPlans in generation. The main focus of this paper, however, is on modeling the intentional structure of discourse. In the next section, we thus provide a model of that structure. We then show how the model of utterance interpretation presented in Figure 10 can be mapped to the problem of recognizing intentional structure and utilizing it in discourse processing. 4. Grosz and Sidner's Theory of Discourse Structure According to Grosz and Sidner's (1986) theory, discourse structure is comprised of three interrelated components: a linguistic structure, an intentional structure, and an attentional state. The linguistic structure is a structure that is imposed on the utterances themselves; it consists of discourse segments and embedding relationships among them. The linguistic structure of the sample dialogues in Section 1 is indicated by the bold rule grouping utterances into segments.</Paragraph> <Paragraph position="25"> The intentional structure of discourse consists of the purposes of the discourse segments and their interrelationships. Discourse segment purposes or DSPs are intentions that lead to the initiation of a discourse segment. DSPs are distinguished from other intentions by the fact that they, like certain utterance-level intentions described by Grice (1969), are intended to be recognized. There are two types of relationships that can hold between DSPs, dominance and satisfaction-precedence. One DSP dominates another if the second provides part of the satisfaction of the first. That is, the establishment of the state of affairs represented by the second DSP contributes to the establishment of the state of affairs represented by the first. This relationship is reflected by a corresponding embedding relationship in the linguistic structure. One DSP satisfaction-precedes another if the first must be satisfied before the second. This Computational Linguistics Volume 24, Number 4 (x) (2) V</Paragraph> <Paragraph position="27"> Modeling intentional structure.</Paragraph> <Paragraph position="28"> relationship is reflected by a corresponding sibling relationship in the linguistic structure. null The attentional state component of discourse structure serves as a record of those entities that are salient at any point in the discourse; it is modeled by a stack of focus spaces. With each new discourse segment, a new focus space is pushed onto the stack (possibly after other focus spaces are first popped off), and the objects, properties, and :relations that become salient during the segment are entered into it, as is the segment's DSP. 
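The stack behavior of attentional state can be rendered schematically in Python. This is a sketch of Grosz and Sidner's focus-space stack as we read it, not an implementation of theirs; the DSP strings and method names are ours.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class FocusSpace:
    """A focus space: the segment's DSP plus the entities salient in the segment."""
    dsp: str
    salient: Set[str] = field(default_factory=set)

class AttentionalState:
    """Focus-space stack: pushed on segment initiation, popped when the DSP is satisfied."""
    def __init__(self) -> None:
        self.stack: List[FocusSpace] = []

    def push(self, dsp: str) -> None:
        self.stack.append(FocusSpace(dsp))

    def note(self, entity: str) -> None:
        self.stack[-1].salient.add(entity)  # record a newly salient object/property/relation

    def pop(self) -> FocusSpace:
        return self.stack.pop()             # the current segment's DSP has been satisfied

    def candidate_dominators(self) -> List[str]:
        # A new DSP can only be dominated by a DSP in some space still on the stack.
        return [space.dsp for space in self.stack]

state = AttentionalState()
state.push("DSP1: Int.Th(e, FSP({e,a}, replace(pump, belt, ac1)))")
state.push("DSP2: Int.Th(a, FSP({e,a}, remove(belt(ac1))))")
print(state.candidate_dominators())
```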
One of the primary roles of the focus space stack is to constrain the range of DSPs to which a new DSP can be related; a new DSP can only be dominated by a DSP in some space on the stack. Once a segment's DSP is satisfied, the segment's focus space :is popped from the stack.</Paragraph> </Section> <Section position="5" start_page="539" end_page="542" type="metho"> <SectionTitle> 5. A SharedPlan Model of Intentional Structure </SectionTitle> <Paragraph position="0"> Figure 11 illustrates the role of SharedPlans in modeling intentional structure. As indicated in the figure, we take each segment of a discourse to have an associated SharedPlan. The purpose of the segment is taken to be an intention that (Int.Th) the discourse participants form that plan. This intention is held by the agent who initiates the segment. Following Grosz and Sidner (1986), we will refer to that agent as the ICP for initiating conversational participant; the other participant is the OCP. DSPs are thus represented as intentions of the form Int.Th(ICP, FSP({ICP, OCP}, fl)) in our model.</Paragraph> <Paragraph position="1"> Relationships between DSPs derive from relationships between the corresponding SharedPlans. For example, a satisfaction-precedence relationship between DSPs corresponds to a temporal dependency between SharedPlans) 2 When one DSP satisfaction- null Lochbaum A Collaborative Planning Model precedes another, the SharedPlan used to model the first must be completed before the SharedPlan used to model the second. Dominance relationships between DSPs depend upon subsidiary relationships between the corresponding SharedPlans. In Section 3, we used the term subsidiary SharedPlan to indicate a subordinate relationship between SharedPlans. More generally, one plan is subsidiary to another if the completion of the first plan establishes one of the beliefs or intentions required for the agents to have the second plan. One plan is thus subsidiary to another if the completion of the first plan contributes to the completion of the second.</Paragraph> <Paragraph position="2"> The utterances of a discourse are understood in terms of their contribution to the SharedPlans associated with the segments of the discourse. Those segments that have been completed at the time of processing an utterance have a full SharedPlan associated with them (e.g., segment (2) in Figure 11), while those that have not have a partial SharedPlan (e.g., segments (1) and (3) in Figure 11).</Paragraph> <Section position="1" start_page="540" end_page="542" type="sub_section"> <SectionTitle> 5.1 Dialogue Analyses </SectionTitle> <Paragraph position="0"> We now return to the dialogues in Section 1 to illustrate the use of SharedPlans in modeling intentional structure. In this section of the paper, we simply describe the intentional structure representations for these examples. In Section 6, we describe the process by which these structures may be recognized and reasoned with.</Paragraph> <Paragraph position="1"> 5.1.1 Example 1: Subtask Subdialogues. The overall purpose of the dialogue in Figure 1 may be represented as: 13</Paragraph> <Paragraph position="3"> &quot;E intends that the agents collaborate to replace the pump and belt of the air compressor, acl.&quot; The circumstances surrounding this dialogue are such that only the Apprentice is physically capable of performing actions; the Expert is in another room and can only instruct the Apprentice as to which actions to perform. 
Both of the agents participate in the act of replacing the pump and belt of the air compressor, though each agent brings different skills to the task. The Expert provides the expertise, while the Apprentice provides the manual dexterity. Thus, the agent specification of the FSP in DSP1 includes both the Expert and the Apprentice, while only the Apprentice is the agent of the replace act itself.</Paragraph> <Paragraph position="4"> The purpose of the first subdialogue in Figure 1 may be represented as: DSP2 = Int. Th (a, FSP( {e, a }, remove(belt(acl ), {a }))) &quot;A intends that the agents collaborate to remove the belt of the air compressor.&quot; while the purpose of the second subdialogue may be represented as</Paragraph> <Paragraph position="6"> &quot;E intends that the agents collaborate to remove the pump of the air compressor.&quot; The SharedPlans used to model DSP2 and DSP3 are subsidiary to that used to model DSP1 by virtue of the subsidiary plan requirement of the SharedPlan definition.</Paragraph> <Paragraph position="7"> As shown in Clauses (2aii) and (3aii) of the definition in Figure 5, an FSP for an act oz includes as components full plans for each subact in oz's recipe. A plan for one of the subacts fli thus contributes to the FSP for o~, and is therefore subsidiary to it. Because the tasks of removing the air compressor's belt and pump are subtasks of the act of replacing the belt and pump, the SharedPlans to perform those subtasks are subsidiary to the SharedPlan of the main task. DSP1 thus dominates both DSP2 and DSP3.</Paragraph> <Paragraph position="8"> 13 We follow the Prolog convention of specifying variables using initial uppercase letters and constants using initial lowercase letters. Computational Linguistics Volume 24, Number 4</Paragraph> <Paragraph position="10"> A recipe for modifying a network.</Paragraph> <Paragraph position="11"> 5.1.2 Example 2: Correction Subdialogues. The overall purpose of the dialogue in Figure 2 may be represented as: DSP4 = Int.Th(u, FSP( {u, s}, modify_network(NetPiece, Loc, {u, s}))) &quot;U intends that the agents collaborate to modify the piece of network displayed at some screen location.&quot; Figure 12 contains one possible recipe for the act modify_network(NetPiece, Loc, G, T). 14 The recipe requires that an agent display a piece of a network and then put some new data at some screen location. The constraints of the recipe require that the screen location be empty and that there be enough free space for the data at that location. The purpose of the subdialogue in Figure 2 may be represented as: 15 DSP5 =In t. Th ( u, F S P ( ( u, s }, Achieve (freespace.for( Data, below(gel ) ), { u, s }))) &quot;U intends that the agents collaborate to free up some space below the employee concept.&quot; The SharedPlan used to model DSP5 is subsidiary to that used to model DSP4 by virtue of the ability operator BCBA. As discussed in Section 2, an agent G's ability to perform an act fl depends in part on its ability to satisfy the constraints of its recipe for ft. A plan to satisfy one of the constraints thus contributes to the plan for fl and is therefore subsidiary to it. 
Because the condition freespace_for(Data, Loc) is a constraint in the recipe for modify_network(NetPiece, Loc, G, T), the SharedPlan in DSP5 to free up space on the screen is subsidiary to the SharedPlan in DSP4 to modify the network.</Paragraph> <Paragraph position="12"> DSP4 thus dominates DSPs.</Paragraph> <Paragraph position="13"> 5.1.3 Example 3: Knowledge Precondition Subdialogues. The overall purpose of the dialogue in Figure 3 may be represented as: DSP6 = Int.Th(nm, FSP( {nm, np}, maintain(node39, {nm, np}))) &quot;NM intends that the agents collaborate to maintain node39 of the local computer network.&quot; The purpose of the first subdialogue in Figure 3 may be represented as:</Paragraph> <Paragraph position="15"> &quot;NP intends that the agents collaborate to obtain a recipe for maintaining node39.&quot; The SharedPlan used to model DSP7 is subsidiary to the SharedPlan used to model DSP6 by virtue of the recipe requirement of the SharedPlan definition. As shown in Clause (1) of the definition in Figure 5, for a group of agents to have an FSP for an act o~, they must have mutual belief of a recipe for o~. The SharedPlan in DSP7 to obtain Lochbaum A Collaborative Planning Model a recipe for maintaining node39 thus contributes to the SharedPlan in DSP6 to do the maintenance and is therefore subsidiary to it. As a result, DSP6 dominates DSP7. The second subdialogue in Figure 3 is concerned with identifying a parameter of an act. The purpose of this subdialogue may be represented as:</Paragraph> <Paragraph position="17"> &quot;NM intends that the agents collaborate to obtain a suitable description of the ToNode parameter of the divert_traffic act.&quot; The SharedPlan used to model DSPs is subsidiary to that used to model DSP6 by virtue of the ability operator BCBA. As discussed in Section 2, an agent G's ability to perform an act fl depends in part on its ability to identify the parameters of ft. At this point in the agents' discourse, NM and NP have agreed that the acts divert_traFfic(node39, ToNode, G1) and replace_switch(node39, Switch Type, G2) will be part of their recipe for maintaining node39. Because the agents' recipe for maintaining node39 includes the act of diverting network traffic, the SharedPlan in DSP8 to identify the ToNode parameter of the divert_traffic act contributes to the SharedPlan in DSP6 to maintain node39. DSP6 thus dominates DSP8.</Paragraph> <Paragraph position="18"> The SharedPlan in DSP8 is not subsidiary to the SharedPlan in DSP7, because, id.params is not a requirement of has.recipe. As we argued in Section 2.1, knowing a recipe for an act should not require identifying the parameters of the act or the acts in its recipe. However, because an agent must have a recipe in mind before it can be concerned with identifying the parameters of the acts in that recipe, the SharedPlan in DSP7 must be completed before the SharedPlan in DSPs. 16 DSP7 thus satisfaction-precedes DSP8.</Paragraph> </Section> </Section> <Section position="6" start_page="542" end_page="565" type="metho"> <SectionTitle> 6. Reasoning with Intentional Structure </SectionTitle> <Paragraph position="0"> Intentional structure plays a central role in discourse processing. For each utterance of a discourse, an agent must determine whether the utterance begins a new segment of the discourse, completes the current segment, or contributes to it (Grosz and Sidner 1986). 
If the utterance begins a new segment of the discourse, the agent must recognize the DSP of that segment, as well as its relationship to the other DSPs underlying the discourse and currently in focus. If the utterance completes the current segment, the agent must come to believe that the DSP of that segment has been satisfied. If the utterance contributes to the current segment, the agent must determine the effect of the utterance on the segment's DSP.</Paragraph> <Paragraph position="1"> We now show how the SharedPlan reasoning presented in Section 3 may be mapped to the problem of recognizing and reasoning with intentional structure. Step (2) of the augmentation process in Figure 10 is divided into three cases based upon the way in which an utterance affects the SharedPlans underlying a discourse. An utterance may indicate the initiation of a subsidiary SharedPlan (Case (2a)), the completion 16 There are several means by which an agent can determine a recipe for an act c~. If an agent chooses a recipe for c~ from some type of manual (e.g., a cookbook), then the agent will have a complete recipe for c~ before identifying the parameters of c~'s constituent acts. On the other hand, when being told a recipe for c~ by another agent, the ignorant agent may interrupt and ask about a parameter of a constituent act before knowing all of the constituent acts. In this case, the agent may have only a partial recipe for c~ before identifying the parameters of the acts in that partial recipe. Thus, if fli is an act in c~'s recipe, a discourse segment concerned with identifying a parameter of fli could be linguistically embedded within a segment concerned with obtaining a recipe for c~. This case poses interesting questions for future research regarding the relationship between the two segments' DSPs. intentional structure is currently in focus.</Paragraph> <Paragraph position="2"> Let Prop be the proposition communicated by G2's utterance/d.</Paragraph> <Paragraph position="3"> 2. G1 must then determine the relationship of Prop to S: (a)Does ld or Prop indicate the initiation of a new discourse segment? If G1 believes that/d or Prop indicates the initiation of a subsidiary SharedPlan for an act fl, then i. G1 believes that the DSP of the new segment is Int.Th(G2, FSP({Gi, G~ }, fl)).</Paragraph> <Paragraph position="4"> ii. G1 explains the new segment by determining the relationship of the SharedPlan in (i) to the SharedPlans maintained in S.</Paragraph> <Paragraph position="5"> (b)Does bl or Prop indicate the completion of the current discourse segment? If G1 believes that/d or Prop indicates the satisfaction of DSP~, then i. G~ believes that G2 believes DSc is complete.</Paragraph> <Paragraph position="6"> ii. If G1 believes that the agent s' PSP for a is complet% then G1 will also believe that DSPc has been satisfied and thus DSC/ is complete. DSPC/ is thus popped from S.</Paragraph> <Paragraph position="7"> (c) Does Prop contribute to the current discourse segment? Otherwise, G1 will i. ascribe to G~ a belief that Prop contributes to the agents' PSP for ii. determine if he also believes that to be the case.</Paragraph> <Paragraph position="8"> Figure 13 Step (2) of the augmentation process.</Paragraph> <Paragraph position="9"> of the current SharedPlan (Case (2b)), or its continuation (Case (2c)). These three cases :may be mapped to the problem of determining whether an utterance begins a new segment of the discourse, completes the current segment, or contributes to it. 
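Read procedurally, Step (2) amounts to a three-way dispatch. The short Python sketch below is ours; it compresses the recognition of each case into boolean cues that, in the model itself, would be the product of default reasoning (e.g., CDRA) and plan recognition against the stack of SharedPlans.

```python
from enum import Enum, auto

class UtteranceRole(Enum):
    INITIATES_SEGMENT = auto()   # Case (2a): indicates a subsidiary SharedPlan for some beta
    COMPLETES_SEGMENT = auto()   # Case (2b): signals the current SharedPlan is complete
    CONTRIBUTES = auto()         # Case (2c): default, contributes to the current plan

def classify_utterance(indicates_new_plan: bool, indicates_completion: bool) -> UtteranceRole:
    """Schematic dispatch for Step (2): in each case G1 first ascribes an attitude to G2
    and then evaluates it against his own view of the agents' plans."""
    if indicates_new_plan:
        return UtteranceRole.INITIATES_SEGMENT
    if indicates_completion:
        return UtteranceRole.COMPLETES_SEGMENT
    return UtteranceRole.CONTRIBUTES
```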
In Figure 13, we have recast Step (2) of the augmentation process to reflect this use.</Paragraph> <Paragraph position="10"> The augmentation process in Figure 13 specifies the process by which agent G1 makes sense of agent G2's utterances given the current discourse context. We use a stack of SharedPlans S to model this context. The stack corresponds to that portion of the intentional structure that is currently in focus. It thus mirrors the attentional state component of discourse structure and contains PSPs corresponding to discourse segments that have not yet been completed. Because the augmentation process depends most heavily upon the SharedPlans that are used to represent DSPs, it simply makes use of the SharedPlans themselves, rather than the full intentions. The full intentions are easily recoverable from the stack representation.</Paragraph> <Paragraph position="11"> Case (2a) in Figure 13 models the recognition of new discourse segments and their purposes. If G1 believes that G2's utterance indicates the initiation of a new SharedPlan, then G1 will take G2 to be initiating a new discourse segment with her utteranceJ 7 Gt first ascribes this intention to G2 (Step (2ai)) and then tries to explain it given the 17 As discussed in Section 7.3, the DSP of the new segment may be only abstractly specified at this point. Lochbaum A Collaborative Planning Model current discourse context (Step (2aii)). Whereas at the utterance level, a hearer must explain why a speaker said what he did (Sidner and Israel 1981), at the discourse level, an OCP must explain why an ICP engages in a new discourse segment at a particular juncture in the discourse. The latter explanation depends upon the relationship of the new segment's DSP to the other DSPs underlying the discourse. In Step (2aii) of the augmentation process, G1 must thus determine whether the new SharedPlan would contribute to the agents' SharedPlan for o~ or to some other plan on the stack S. If the new SharedPlan does not contribute to any of the plans on the stack, then it is taken as an interruption. If it does not contribute to the agents' SharedPlan for o~, but to another plan on the stack, one for 7 say, then G1 must also determine whether the plans that are above 7 on the stack have been completed.</Paragraph> <Paragraph position="12"> Case (2b) in Figure 13 models the recognition of a segment's completion. If G1 believes that G2's utterance signals the completion of the current segment, then G1 must reason whether he too believes the segment to be complete. For that to be the case, G1 must believe that all of the beliefs and intentions required of an FSP have been established over the course of the segment. The completion of a segment may be signaled in either the linguistic structure or the intentional structure. For example, in the linguistic structure, cue phrases such as &quot;but anyway&quot; may indicate the satisfaction of a DSP (as well as a pop of the focus space stack). In the intentional structure, the completion of a segment may be signaled by the initiation of a new SharedPlan, as described above.</Paragraph> <Paragraph position="13"> Case (2c) models the recognition of an utterance's contribution to the current discourse segment. When a speaker produces an utterance within a segment, a hearer must determine why the speaker said what he did. Step (2c) models the hearer's reasoning by trying to ascribe appropriate beliefs to the speaker. 
These beliefs are ascribed based on the hearer's beliefs about the state of the agents' SharedPlans and the steps necessary to complete them.</Paragraph> <Section position="1" start_page="544" end_page="548" type="sub_section"> <SectionTitle> 6.1 Modeling the Plan Augmentation Process </SectionTitle> <Paragraph position="0"> Figure 13 contains a high-level specification of the process of reasoning with intentional structure. It provides a framework in which to develop further mechanisms for modeling the various steps of this process. In this section, we present two such mechanisms. The first mechanism presents a method for recognizing the initiation of a new discourse segment (Step (2a) in Figure 13); the second describes an algorithm for reasoning about the contribution of an utterance to the current segment (Step (2c)).</Paragraph> <Paragraph position="1"> These two mechanisms are central to the augmentation process, but are not complete; they each model just one aspect of their respective steps of the process. The complete specification of these steps, as well as that of the augmentation process in general, requires further research, as is discussed in Section 10.</Paragraph> <Paragraph position="2"> 6.1.1 Case (2a): Initiating a New Discourse Segment. Step (2ai) of the augmentation process involves recognizing agent G2's intention that G1 and G2 form a full SharedPlan for an act ft. This intention may be recognized using a conversational default rule, CDRA, shown in Figure 14. TM The antecedent of this rule consists of two parts: (la) G1 must believe that G2 communicated her desire for the performance of act fl to G1, and (lb) G1 must believe that G2 believes they can together perform ft. The second condition precludes the case where G2 is stating her desire to perform the act herself 18 This rule extends Grosz and Sidner's (1990) original conversational default rule, CDR1.</Paragraph> <Paragraph position="3"> or for G1 to perform the act. If conditions (la) and (lb) are satisfied, then in the absence of evidence to the contrary, G1 will believe that G2 intends that they form a full SharedPlan for ft.</Paragraph> <Paragraph position="4"> As given in Figure 14, CDRA is used to recognize an agent's intention based upon its desire for the performance of a particular act ft. The rule may also be used when an agent expresses its desire for a particular state of affairs P. In this case, the expressions OCCUrS(fl) 19 and fl are replaced in Figure 14 by P and Achieve(P, {G1, G2}, T) respectively.</Paragraph> <Paragraph position="5"> augmentation process involves recognizing an utterance's contribution to the current SharedPlan. The SharedPlan definitions place requirements on recipes, abilities, plans, and commitments. A SharedPlan may thus be affected by utterances containing a variety of information. We will focus here, however, on utterances that communicate information about a single action fl that can be taken to play a role in the recipe of the agents' plan for o~. 
We thus do not deal with utterances concerning warnings (e.g., &quot;Do not clog or close the stem vent under any circumstances&quot; \[Ansari 1995\]) or utterances involving multiple actions that are related in particular ways (e.g., &quot;To reset the printer, flip the switch.&quot; \[Balkanski 1993\]).</Paragraph> <Paragraph position="6"> As with the other cases of Step (2) of the augmentation process, Step (i) of Case (c) involves ascribing a particular belief to agent G2 regarding the relationship of her utterance to the agents' plans. For the types of utterances we are considering here, this belief is concerned with the relationship of the act fl to the objective of the agents' current plan, i.e., o~. In particular, G2's reference to fl is understood as indicating belief of a Contributes relation between fl and oL. Contributes holds of two actions if the performance of the first action plays a role in the performance of the second action (Lochbaum, Grosz, and Sidner 1990; Lochbaum 1994). It is defined as the transitive closure of the D(irectly)-Contributes relation. One act D-Contributes to another if the first act is an element of the second act's recipe (Lochbaum, Grosz, and Sidner 1990; Lochbaum 1994). 2o An agent ascribes belief in a Contributes relation irrespective of his own beliefs about this relationship. Once he has ascribed this belief, he then reasons about whether ihe also believes fl to contribute to o~ and in what way. Step (2cii) of the augmentation process corresponds to this reasoning. To model this step, we introduce an algorithm based on the construction of a dynamic recipe representation called a recipe graph 19 The predicate occurs(fl) is true if fl was, is, or will be performed at the time associated with fl as one of its parameters (Balkanski 1993).</Paragraph> <Paragraph position="7"> 20 The term &quot;contributes&quot; is overloaded in this paper. The use of Contributes here refers to a relation between actions. Grosz and Sidner (1986) also describe a contributes relation between DSPs that is the inverse of the dominates relation. In addition, we have been using contributes informally to refer to the inverse of a subsidiary relationship between plans.</Paragraph> <Paragraph position="8"> Rgraphs result from composing recipes. Whereas a recipe includes only one level of action decomposition, an rgraph may include multiple levels. On analogy with parsing constructs, one can think of a recipe as being like a grammar rule, while an rgraph is like a (partial) parse tree. 2a Whereas a recipe represents information about the abstract performance of an action, an rgraph represents more specialized information by including instantiations of parameters, agents, and times, as well as multiple levels of decomposition. The graphical representations in Figure 15 contrast the structure of these two constructs.</Paragraph> <Paragraph position="9"> The construction of an rgraph corresponds to the reasoning that an agent performs in determining whether or not the performance of a particular act fl makes sense given the agent's beliefs about recipes and the state of its individual and shared plans. The process of rgraph construction can thus be used to model the process by which agent G1 explains G2's presumed belief in a Contributes relation. In explaining this belief, however, G1 must reason about more than just the agents' immediate SharedPlan. In 2l This terminology was chosen to parallel Kautz's. 
He uses the term explanation graph or egraph for his representation relating event occurrences (Kautz 1990). A comparison of our representation and algorithms with Kautz's can be found elsewhere (Lochbaum 1991, 1994). In short, Kautz's work is based on assumptions that are inappropriate for collaborative discourse. In particular, Kautz assumes a model of keyhole recognition (Cohen, Perrault, and Allen 1982) in which one agent is observing another agent without that second agent's knowledge. In such a situation, only actual event occurrences performed by a single agent are reasoned about; Kautz's representation and algorithms include no means for reasoning about hypothetical, partially specified, or multiagent actions. In addition, in keyhole recognition, no assumptions can be made about the interdependence of observed actions. Because the agent is not aware that it is being observed, it does not structure its actions to facilitate the recognition of its motives. A separate egraph must thus be created for each observation. 22 Barrett and Weld (1994) and Vilain (1990) provide further discussion of the use of parsing in planning and plan recognition.</Paragraph> <Paragraph position="10"> G1 is the agent being modeled, R~ is the set of recipes that G1 knows for a, H is an rgraph explaining the acts underlying the discourse up to this point, /3 is the act referred to by G2.</Paragraph> <Paragraph position="11"> 0. Initialize Hypothesis: If fl is the first act to be explained in the context of PSP({G1, G2}, a), expand H by choosing a recipe from R~ and adding it to the rgraph.</Paragraph> <Paragraph position="12"> 1. Isolate Recipe: Let r be the subtree rooted at a in H.</Paragraph> <Paragraph position="13"> 2. Select Act: Choose an act fll in r such that fli can be identified with fl and has not previously been used to explain another act. If no such act exists, then fail. Otherwise, let r I be the result of identifying fl with fll in r.</Paragraph> <Paragraph position="14"> 3. Update Hypothesis: Let e = constraints(r') U constraints(H). If e is satisfiable, replace the subtree r in H by r', otherwise, fail.</Paragraph> <Paragraph position="15"> The rgraph construction algorithm.</Paragraph> <Paragraph position="16"> particular, he must also take into account any other collaborations of the agents, as well as any individual plans of his own. In so doing, G1 verifies that fl is compatible with the rest of the acts the agents have agreed upon, as well as those G1 intends to perform himself. 23 The rgraph construction algorithm is given in Figure 16. It is based on the assumption that agents G1 and G2 are collaborating on an act a and models Gl's reasoning concerning G2&quot;s reference to an act ft. While PSP({G1, G2}, oL) provides the immediate context for interpreting G2&quot;s utterance, an rgraph H models the remaining context established by the agents' dialogue. H represents Gl'S hypothesis as to how all of the acts underlying the agents' discourse are related. To make sense of G2's utterance concerning t, G1 must determine whether fl directly contributes to a while being consistent with H. Steps (1) and (2) of the algorithm model the immediate explanation of t, while Step (3) ensures that this explanation is consistent with the rest of the rgraph. The algorithm in Figure 16 is nondeterministic. Step (0) involves choosing a recipe from G(s recipe library, while Step (2) involves choosing an act from that recipe. 
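To make the control structure concrete, the following is a minimal, illustrative sketch of the rgraph construction algorithm in Figure 16. It is not the paper's formalism: the data structures (Act, Recipe, RGraph), the satisfiable parameter, and the act-matching test are simplifications introduced here for exposition, and identification of an act with a recipe constituent is approximated by matching act names.

```python
# Illustrative sketch of the rgraph construction algorithm (Figure 16).
# Names and representations are our own simplifications, not the paper's notation.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

@dataclass
class Act:
    name: str
    params: Dict[str, object] = field(default_factory=dict)

@dataclass
class Recipe:
    objective: str                  # the act type this recipe performs
    steps: List[Act]                # one level of decomposition
    constraints: Set[str] = field(default_factory=set)

@dataclass
class RGraph:
    root: str
    steps: List[Act] = field(default_factory=list)       # expanded acts
    constraints: Set[str] = field(default_factory=set)
    explained: Set[int] = field(default_factory=set)      # step indices already used

def explain(beta: Act,
            rgraph: RGraph,
            recipe_library: Dict[str, List[Recipe]],
            satisfiable: Callable[[Set[str]], bool]) -> Optional[RGraph]:
    """Try to explain G2's reference to act beta against G1's hypothesis rgraph.
    Returns an updated rgraph, or None if every nondeterministic choice fails."""
    # Step (0): initialize the hypothesis with some recipe for the objective.
    # (For simplicity this sketch does not re-select an alternative recipe once
    # a hypothesis has been initialized, as the full algorithm does on failure.)
    candidates = ([Recipe(rgraph.root, rgraph.steps, rgraph.constraints)]
                  if rgraph.steps else recipe_library.get(rgraph.root, []))
    for recipe in candidates:                              # backtrack point 1
        # Step (1): isolate the recipe rooted at the objective.
        # Step (2): choose an unexplained act identifiable with beta.
        for i, step in enumerate(recipe.steps):            # backtrack point 2
            if i in rgraph.explained or step.name != beta.name:
                continue
            # Step (3): merge constraints and check satisfiability.
            merged = rgraph.constraints | recipe.constraints
            if not satisfiable(merged):
                continue                                   # fail this branch only
            return RGraph(rgraph.root, recipe.steps, merged,
                          rgraph.explained | {i})
    return None   # all nondeterministic executions failed

# Illustrative use: explaining "remove the pump" against a hypothetical recipe.
lib = {"replace_pump_and_belt": [Recipe("replace_pump_and_belt",
                                        [Act("remove_belt"), Act("remove_pump")],
                                        {"have_tools"})]}
h = RGraph("replace_pump_and_belt")
h = explain(Act("remove_pump"), h, lib, satisfiable=lambda cs: True)
```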
The failures in Steps (2) and (3) do not imply failure of the entire algorithm, but rather failure of a single nondeterministic execution.</Paragraph> <Paragraph position="17"> In Step (0) of the algorithm, Gl'S hypothesis rgraph is initialized to some recipe that he knows for a. As will be discussed in Section 6.2.3, this recipe may involve physical actions, such as those involved in lifting a piano, as well as information-gathering actions, such as those involved in satisfying a knowledge precondition. At the start of the agents' collaboration, G1 may or may not have any beliefs as to how the agents will perform a. If he believes that the agents will use a particular recipe, the ihypothesis rgraph is initialized to that recipe. Otherwise, a recipe is selected arbitrarily from Gl&quot;s recipe library. The initial hypothesis will be refined, and possibly replaced, on the basis of G2's utterances.</Paragraph> <Paragraph position="18"> In Step (1) of the algorithm, the recipe for a is first isolated from the remainder of the rgraph. This recipe, r, represents Gl'S current beliefs as to how the agents are going to perform a. Step (2) of the algorithm involves identifying fl with a particular Lochbaum A Collaborative Planning Model act fli in r resulting in a new recipe r'. If an appropriate fli can be found, it provides an explanation for G2's reference to the act ft. If an appropriate fli cannot be found, then r cannot be the recipe that G2 has in mind for performing oz. The algorithm thus fails in this case and backtracks to select a different recipe for 0~. The new recipe must account for fl as well as all of the other acts previously accounted for by r.</Paragraph> <Paragraph position="19"> Step (3) of the algorithm ensures that the recipe and act chosen to account for a and fl are compatible with the other acts the agents have already discussed in support of a or the objectives of their other plans. This is done by adding the constraints of the recipe r t to the constraints of the rgraph H and checking that the resulting set is satisfiable. For G1 to agree to the performance of t, the recipe r t must be both internally and externally consistent. That is, the constraints of the recipe must be consistent themselves, as well as being consistent with the constraints of the recipes that G1 believes the agents will use to accomplish their other objectives. 24 The rgraph construction algorithm fails to produce an explanation for an act fl in the context of a PSP for a if the algorithm fails for all of the nondeterministic possibilities. This failure corresponds to a discrepancy between agent Gl's beliefs and those G1 has attributed to agent G2. The failure thus indicates that further communication and replanning are necessary.</Paragraph> </Section> <Section position="2" start_page="548" end_page="559" type="sub_section"> <SectionTitle> 6.2 Dialogue Analyses </SectionTitle> <Paragraph position="0"> To further elucidate the augmentation process, we now return to the dialogues given in Section 1 and show that the processes presented in this section capture the properties highlighted by the informal analyses given in the Introduction. We present each analysis from the perspective of one of the two discourse participants. Each analysis thus indicates the type of reasoning that is required for a system to assume the role of that participant in the dialogue.</Paragraph> <Paragraph position="1"> 6.2.1 Example 1: Subtask Subdialogues. 
The dialogue in Figure 17 (repeated from Figure 1) contains two subtask subdialogues. In Section 1 we noted that an OCP must recognize the purpose underlying each subdialogue, as well as the relationship of each purpose to the preceding discourse, in order to respond appropriately to the ICP. The OCP's recognition of DSPs and their interrelationships is modeled by Case (2a) of the augmentation process in Figure 13. We illustrate its use by modeling the Apprentice's reasoning concerning the Expert's first utterance in the second segment in Figure 17, i.e., (2a) E: Now remove the pump.</Paragraph> <Paragraph position="2"> At this point in the agents' discourse, the stack S consists only of a PSP to replace the air compressor's pump and belt. This PSP corresponds to the overall discourse in Figure 17. The SharedPlan corresponding to the first embedded segment has been completed at this point in the discourse and is thus no longer in focus.</Paragraph> <Paragraph position="3"> 24 Another distinction between our work and Kautz's (1990) relates to Step (3) of the algorithm in Figure 16 and the use of constraints. Whereas rgraphs include an explicit representation of constraints, Kautz's egraphs do not. Constraints are used to guide egraph construction, but are not part of the representation itself. As a result, Kautz's algorithms can only check for constraint satisfaction locally. In our algorithm, that would correspond to checking the satisfiability of a recipe's constraints before adding it to an rgraph, but not afterwards. By checking the satisfiability of the constraint set that results from combining the recipe's constraints with the rgraph's constraints, the rgraph construction algorithm is able to detect unsatisfiability earlier than an algorithm that checks constraints only locally. In utterance (2a), the Expert expresses her desire that the action remove(pump(acl), {a}) be performed, where acl represents the air compressor the agents are working on.</Paragraph> <Paragraph position="4"> The Apprentice's reasoning concerning this utterance may be modeled using CDRA.</Paragraph> <Paragraph position="5"> Condition (la) of CDRA is satisfied by the communication of this utterance to the Apprentice. Condition (lb) is satisfied by the context surrounding the agents' collaboration. Because the Expert is in another room and can only instruct the Apprentice as to which actions to perform, the Expert's utterance cannot be expressing her intention to perform the desired action herself* In addition, because the Apprentice and Expert are both aware that the Apprentice does not have the necessary expertise to perform the action himself, the Apprentice can assume that the Expert must believe the agents can perform the act together, thus satisfying Condition (lb) and sanctioning the default conclusion, Bel(a, Int.Th(e, FSP({a, e}, remove(pump(acl), {a})))). Thus, on the basis of the Expert's utterance and her presumed beliefs concerning the agents' capabilities to act, the Apprentice may reason that the Expert is initiating a new discourse segment with this utterance. The purpose of this segment is recognized as: DSP3 =Int. 
Th (e, FSP({a, e}, remove(pump (acl), {a } ) ) ).</Paragraph> <Paragraph position="6"> Once the Apprentice recognizes the DSP of this new discourse segment, he must determine its relationship to the other DSPs underlying the discourse* Subsidiary relationships between plans provide the basis for modeling the Apprentice's reasoning* In particular, if the Apprentice believes that a plan for removing the pump would further some other plan of the agents', then he will believe that DSP3 is dominated by the DSP involving that other plan.</Paragraph> <Paragraph position="7"> As discussed in Section 5*1.1, the subsidiary relation in question in this example derives from the constituent plan requirement of the SharedPlan definition. The Apprentice will succeed in recognizing the relationship of the second subdialogue to the remainder of the discourse, if he believes that removing the pump of the air compressor could be an act in the agents' recipe for replacing its pump and belt. If the Apprentice does not have any beliefs about the relationship between these two acts, he may choose to assume the necessary D-Contributes relation on the basis of the Expert's utterance and the current discourse context, or he may choose to query the Expert further.</Paragraph> <Paragraph position="8"> The rgraph construction algorithm may be used to model the Apprentice's reasoning. In particular, Steps (1) and (2) of the algorithm in Figure 16 model the reasoning necessary for determining that a D-Contributes relation holds between two actions. If the OCP is able to infer such a D-Contributes relation, he will thus succeed in determining the subsidiary relationship necessary for explaining a subtask subdialogue. If the OCP is unable to infer such a relationship, then the algorithm will fail. This failure indicates that the OCP may need to query the ICP further about the appropriateness of her utterance. For example, as we noted in Section 1, if the OCP has reason to believe that the proposed subtask will not in fact play a role in the agents' overall task, then the OCP should communicate that information to the ICP. In addition, if the OCP has reason to believe that the performance of the subtask will conflict with the agents' other plans and intentions, then the OCP should communicate that information as well. The latter reasoning is modeled by Step (3) of the rgraph construction algorithm. Step (3) ensures that the subtask is consistent with the objectives of the agents' other plans.</Paragraph> <Paragraph position="9"> Figure 18 contains a graphical representation of the SharedPlans underlying the discourse in Figure 17. It is a snapshot representing the Apprentice's view of the agents' plans just after he explains the initiation of segment (3). Each box in the figure corresponds to a discourse segment and contains the SharedPlan used to model the segment's purpose. The plan used to model DSP3 is marked P3 in this figure, while the plans used to model DSP1 and DSP2 are labeled P1 and P2, respectively. We will continue to follow the convention of co-indexing DSPs with the SharedPlans used to model them in the remainder of this paper.</Paragraph> <Paragraph position="10"> The information represented within each SharedPlan in Figure 18 is separated into two parts. Those beliefs and intentions that have been established at the time of the snapshot are shown above the dotted line, while those that remain to be established, but are used in determining subsidiary relationships, are shown below the line. 
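The reasoning traced above can be given a rough procedural rendering: ascribing DSP3 by CDRA, and locating the plan it is subsidiary to. The predicates below (communicated_desire, believes_can_do_together, d_contributes) and the snapshot representation are illustrative stand-ins for the belief ascription, the D-Contributes test, and the established/unestablished division depicted in Figure 18; they are not part of the SharedPlan formalism.

```python
# Illustrative sketch only; predicate names and representations are invented here.

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

def recognize_dsp(act: str, communicated_desire: bool,
                  believes_can_do_together: bool) -> Optional[str]:
    """CDRA, procedurally: if (1a) and (1b) hold and nothing defeats the
    default, ascribe Int.Th(G2, FSP({G1, G2}, act))."""
    if communicated_desire and believes_can_do_together:
        return f"Int.Th(G2, FSP({{G1,G2}}, {act}))"
    return None

@dataclass
class PlanSnapshot:
    plan_id: str
    objective: str
    established: Set[str] = field(default_factory=set)   # above the dotted line
    pending: Set[str] = field(default_factory=set)        # below the dotted line

def find_dominating_plan(beta: str, in_focus: List[PlanSnapshot],
                         d_contributes: Callable[[str, str], bool]
                         ) -> Optional[PlanSnapshot]:
    """The new DSP is dominated by the DSP of a plan whose objective beta
    could directly contribute to (innermost plan in focus first)."""
    for plan in reversed(in_focus):
        if d_contributes(beta, plan.objective):
            return plan
    return None

# Illustrative use, following the air-compressor example:
dsp3 = recognize_dsp("remove(pump(ac1), {a})", True, True)
p1 = PlanSnapshot("P1", "replace(pump(ac1), belt(ac1), {a,e})",
                  established={"FSP({a,e}, remove(belt(ac1), {a}))"},
                  pending={"constituent plans for remaining recipe acts"})
dominator = find_dominating_plan("remove(pump(ac1), {a})", [p1],
                                 d_contributes=lambda b, o: True)  # assumed here
```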
Because the last utterance of segment (2) signals the end of the agents' SharedPlan for removing the belt, the FSP for that act occurs above the dotted line. The agents' plan for removing the belt is complete and thus no longer in focus at the start of segment (3). We have included it in the figure for illustrative purposes. The index in square brackets to the right of each constituent indicates the clause of the FSP definition from which the constituent arose.</Paragraph> <Paragraph position="11"> Figure 19 (a sample correction subdialogue; Sidner 1983; Litman 1985): (1) User: Show me the generic concept called &quot;employee&quot;. (2) System: OK. <system displays network> (3) User: I can't fit a new ic below it. (4) Can you move it up? (5) System: Yes. <system displays network> (6) User: OK, now make an individual employee concept ...</Paragraph> <Paragraph position="14"> Subsidiary relationships between plans are represented by arrows in the figure and are explained by the text that adjoins them. Plans P2 and P3 are thus subsidiary to plan P1 because of the constituent plan requirement (Clause (3aii)) of the FSP definition. These subsidiary relationships indicate that DSP2 and DSP3 are both dominated by DSP1.</Paragraph> <Paragraph position="15"> 6.2.2 Example 2: Correction Subdialogues. For the dialogue in Figure 19 (repeated from Figure 2), we take the purpose underlying the entire dialogue to be modeled using a SharedPlan to modify a KL-ONE network, (P4) PSP({u, s}, modify_network(NetPiece, Data, Loc, {u, s})).</Paragraph> <Paragraph position="16"> We will assume the role of the System in analyzing this example.</Paragraph> <Paragraph position="17"> The System may have many recipes for modifying a network. One may involve deleting a concept from the network, one may involve changing the data in part of the network, and one may involve adding new data to the network. These three possibilities are depicted in Figure 20. At the beginning of the dialogue, the System may have no prior beliefs as to which of these recipes, if any, he and the User will follow to modify the network. The rgraph construction algorithm is used to model the System's reasoning and, as indicated in Step (0), will select one of these recipes nondeterministically. If the chosen recipe fails to account for the User's utterances, then it cannot be the recipe that the User has in mind for modifying the network. The algorithm will then backtrack at that point to select a different recipe for modifying the network. For illustrative purposes, we will assume that the System initially believes that he and the User are following the first recipe in Figure 20; this recipe involves deleting data from the network. The rgraph that results after the System has explained utterance (1) is shown in Figure 21.</Paragraph> <Paragraph position="18"> The User's utterance in (3) indicates that she has encountered a problem with the normal execution of the subtasks involved in modifying a network. The System's reasoning regarding this utterance may be modeled using CDRA. On the basis of the User's utterance and her presumed beliefs concerning the agents' capabilities regarding freeing up space on the screen, the System may reason that the User is initiating a new discourse segment with this utterance.
The purpose of this segment is recognized as: DSPs=Int.Th (u, FSP ({ u, s}, Achieve(freespace~for (Data, below(gel)), {u, s })))</Paragraph> <Paragraph position="20"> Initial rgraph explaining utterances (1)-(2) of the dialogue in Figure 19.</Paragraph> <Paragraph position="21"> where gel represents &quot;the generic concept called 'employee'.&quot; To explain the User's initiation of the subdialogue, the System must determine how the SharedPlan in DSP5 will further the agents' plan in (P4).</Paragraph> <Paragraph position="22"> The System's current beliefs as to how the agents will modify the network, as represented by the rgraph in Figure 21, do not provide an explanation for the User's utterance in (3). The System's recipe does not include any type of &quot;fit&quot; act. The rgraph construction algorithm thus fails at this point and backtracks to nondeterministically select a different recipe for modifying the network. Suppose that this time the third recipe in Figure 20 is selected; this is the recipe that includes adding data to the network. The rgraph that results from using this recipe to explain the User's first utterance is shown in Figure 22. This new rgraph also provide an explanation for the User's utterance in (3); the &quot;fit&quot; act referred to by the User corresponds to the &quot;put&quot; act in the rgraph. In addition, the constraints of the recipe, along with the requirements of the ability operators, provide the explanation for the new discourse segment.</Paragraph> <Paragraph position="23"> As discussed in Section 5.1.2, an agent G's ability to perform an act fl depends in part on its ability to satisfy the constraints of the recipe in which fl is a constituent.</Paragraph> <Paragraph position="24"> Thus, to perform the act put(Data, below(gel), {u}), the User must be able to satisfy the constraints empty(below(gel)) and freespace_for( Data, below(gel)). The need to satisfy the latter constraint provides the System with an explanation for DSPs. In particular, the System can reason that the User initiated the new discourse segment in order to satisfy one of the ability requirements of the agents' SharedPlan to modify the network.</Paragraph> <Paragraph position="25"> The SharedPlan in DSP5 is thus subsidiary to that in (P4) by virtue of the BCBA requirements of the latter plan. Figure 23 summarizes our analysis of the dialogue.</Paragraph> <Paragraph position="26"> Whereas subtask subdialogues are explained in terms of constituent plan requirements of SharedPlans (Clause (3aii)), correction subdialogues are explained in terms of ability requirements (Clause (2ai)).</Paragraph> <Paragraph position="27"> Once the System recognizes, and explains, the initiation of the new segment, it will interpret the User's subsequent utterances in the context of its DSP, rather than the previous one. It will thus understand utterance (4) to contribute to freeing up space on the screen, rather than to modifying the network. This reasoning is modeled by Case (3a) of the augmentation process as follows: First, on the basis of its explanation of DSP5, the System will take the agents to have a PSP for the act Achieve(freespace.for(Data, below(gel)), {u, s}). This plan is marked (P5) in Figure 23 and is pushed onto the stack S above the plan in (P4). 
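The stack manipulation described here can be sketched as follows; the class and method names are illustrative only, and the plans are reduced to labeled objectives. This is a minimal sketch of the bookkeeping, not the paper's formal treatment of attentional state.

```python
# Minimal sketch of maintaining the SharedPlan stack S when a subsidiary
# segment such as DSP5 is recognized and later completed. Names are illustrative.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SharedPlan:
    plan_id: str
    objective: str
    complete: bool = False

@dataclass
class PlanStack:
    spaces: List[SharedPlan] = field(default_factory=list)

    def push_subsidiary(self, plan: SharedPlan) -> None:
        # A newly recognized subsidiary plan becomes the focus of attention.
        self.spaces.append(plan)

    def current_focus(self) -> Optional[SharedPlan]:
        # Utterances are interpreted against the plan on top of the stack.
        return self.spaces[-1] if self.spaces else None

    def pop_completed(self) -> None:
        # When the plan in focus is complete, its segment's space is popped.
        while self.spaces and self.spaces[-1].complete:
            self.spaces.pop()

# Illustrative use, following the KL-ONE example:
S = PlanStack()
S.push_subsidiary(SharedPlan("P4", "modify_network(NetPiece, Data, Loc, {u,s})"))
S.push_subsidiary(SharedPlan("P5", "Achieve(freespace_for(Data, below(ge1)), {u,s})"))
assert S.current_focus().plan_id == "P5"   # utterance (4) is interpreted here
```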
As a result, the System will now take the agents to be focused on the plan in (P5), rather than that in (P4), and thus will interpret the User's subsequent utterances in terms of the information they contribute towards completing the plan in (P5), rather than that in (P4).</Paragraph> <Paragraph position="28"> The User's utterance in (4) makes reference to an act move(gel, up, {s}). Using the rgraph construction algorithm, this act is understood to directly contribute to the objective of the plan in (P5), i.e., Achieve~freespace~or(Data, below(gel)), {u, s}). The resulting rgraph is shown in Figure 24. This rgraph provides an explanation for utterance (4) in the context of all of the acts involved in the agents' plans.</Paragraph> <Paragraph position="30"> Rgraph explaining utterances (1)-(4) of the dialogue in Figure 19.</Paragraph> <Paragraph position="31"> As noted in Section 1, the System's response to the User's request in (4) should take the context of the agents' entire discourse into account and not simply the context of freeing up space on the screen. In particular, the System should not clear the currently displayed network from the screen to help the User perform the task of putting up some new data, but rather should leave the displayed network visible. The discourse context modeled by the SharedPlans in (P4) and (P5), as well as the rgraph in Figure 24, enables the System to respond correctly. In particular, by examining the plans currently in focus and determining what needs to be done to complete them, the System can reason that it should perform an act in support of Achieve(freespaceqCor(Data, below(gel)), {u, s}). The System will most likely select the requested act of moving gel up, but if it decides to modify that act in some way or to select a different act, the new act must be compatible with the other acts the agents have agreed upon. By inserting the new act into the rgraph and determining that the resulting rgraph constraints will not be violated by this addition, the System can ensure that its response is in accord with the larger discourse context.</Paragraph> <Paragraph position="32"> 6.2.3 Example 3: Knowledge Precondition Subdialogues. The dialogue in Figure 25 (repeated from Figure 3) contains two embedded knowledge precondition subdialogues. We will assume the role of the Network Presenter, NP, in analyzing this example. null The overall purpose of the dialogue may be represented as: DSP6 = Int.Th(nm, FSP( {nm, np}, maintain(node39, {nm, np}))) and can be recognized on the basis of NM's utterance in (1) and CDRA. The purpose of the first subdialogue in Figure 25 can be represented as:</Paragraph> <Paragraph position="34"> Achieve( has.recipe( {nm, np } , maintain(node39, { nm, np } ) , R ) , { nm, np}))).</Paragraph> <Paragraph position="35"> This first subdialogue is initiated by agent NP, the agent whose reasoning we are modeling. We must thus account for NP's generation of an utterance in this example, rather than his interpretation of another agent's utterance. As will be discussed in Section 10, the use of SharedPlans in generation is an area for future research; however, the basic principles used in interpretation apply here as well. The current state of the agents' plans provides the basis for an agent's communication.</Paragraph> <Paragraph position="36"> DSP7 represents NP's intention that the agents determine a means of diverting network traffic. 
As discussed in Section 5.1.3, for a group of agents G to have a collaborative plan for an act α, the group must have mutual belief of a recipe for α.</Paragraph> <Paragraph position="37"> Figure 25 (sample knowledge precondition subdialogues; adapted from Lochbaum, Grosz, and Sidner \[1990\]): (1) NM: It looks like we need to do some maintenance on node39. (2) NP: Right. (3) How shall we proceed? (4) NM: Well, first we need to divert the traffic to another node. (5) NP: Okay. (6) Then we can replace node39 with a higher capacity switch. (7) NM: Right. (8) NP: Okay good. (9) NM: Which nodes could we divert the traffic to? (10) NP: \[puts up diagram\] (11) node41 looks like it could temporarily handle the extra load. (12) NM: I agree. (13) Why don't you go ahead and divert the traffic to node41 and then we can do the replacement. (14) NP: Okay. (15) \[NP changes network traffic patterns\] (16) That's done.</Paragraph> <Paragraph position="47"> It is this requirement that leads NP to initiate the first subdialogue; deciding upon a means of performing the objective of the agents' collaboration is a necessary first step to furthering that collaboration. The plan in DSP7 to agree on a recipe for maintaining node39 thus contributes to the plan in DSP6 to do the maintenance, and is therefore subsidiary to it. Figure 26 provides a graphical representation of this relationship. Once NM agrees to the subsidiary collaboration, either explicitly or implicitly as in utterance (4), NP will assume that the agents have a partial SharedPlan to obtain the recipe: (P7) PSP({nm, np}, Achieve(has.recipe({nm, np}, maintain(node39, {nm, np}), R), {nm, np})). NP will thus produce his next utterances in the context of the SharedPlan in (P7), rather than that in DSP6, and will assume that NM will do the same. To make sense of NM's utterance in (4), NP must provide an explanation for it in the context of the agents' SharedPlan in (P7). The rgraph construction algorithm is used in modeling NP's reasoning. Whereas in the case of a subtask subdialogue, the algorithm makes use of recipes for performing a subtask, in the case of a knowledge precondition subdialogue, it makes use of recipes for satisfying a knowledge precondition. Figure 27 contains two recipes an agent might know to obtain a recipe for an act α. The first is a single-agent recipe that involves looking up a procedure for α in a manual. The second recipe is a multiagent recipe that involves the agents communicating to come to agreement about the acts and constraints that will comprise their recipe for α.</Paragraph> <Paragraph position="48"> We use these recipes to model NP's reasoning concerning utterance (4) as follows: In Step (0) of the rgraph construction algorithm, a recipe for the act Achieve(has.recipe({nm, np}, maintain(node39, {nm, np}), R)) is first selected from NP's recipe library. For illustrative purposes, we will assume that the second recipe in Figure 27 is selected.
Recipes for obtaining recipes.</Paragraph> <Paragraph position="49"> Next, we try to identify NM's communicative act in utterance (4) with some act in that recipe, and succeed by appropriately instantiating the communicate act. NP is thus able to make sense of NM's utterance based on his beliefs about ways of obtaining recipes. Now, however, he must decide whether the act that NM is proposing to include as part of their recipe for maintaining node39 is compatible with his beliefs about ways of performing that act. This reasoning is modeled by Step (3) of the augmentation process in which the constraints of the rgraph are checked for satisfiability. The recipe for obtaining recipes that was selected in Step (0) of the algorithm indicates that to have a recipe for maintaining node39, the agents must have mutual belief that some set of acts and constraints constitute a recipe for that act. If NP does not believe that the act divert_traffic(Nodel, Node2, G) should play a role in maintaining node39, then the constraint will not hold and the algorithm will fail. NP will then communicate his dissent to NM and possibly propose an alternative act. In this instance, however, NP is in agreement with NM, as evidence by his &quot;Okay&quot; in utterance,(5). The rgraph that results from his reasoning is shown in Figure 28.</Paragraph> <Paragraph position="50"> To produce utterance (6), NP must reason about the state of the agents' SharedPlans and determine what needs to be done to complete them. At this point in the discourse, the agents are focused on obtaining a recipe for maintaining node39 and have agreed that the act of diverting network traffic will be included in that recipe. NP might thus propose the performance of another act as part of their recipe. He does this in utterance (6). In utterance (7), NM agrees to the inclusion of that act. To produce utterance (8), NP must once again reason about the state of the agents' plans. If he believes that diverting network traffic from node39 and then replacing that node with a higher capacity switch will result in maintaining node39, then he will believe that the agents have completed their SharedPlan in (P7) to obtain a recipe for maintaining the node. His utterance in (8) indicates that this is the case. Unless agent NM indicates her disagreement, NP will thus assume that the agents have completed their SharedPlan in (P7) and will update his beliefs accordingly. First, he will remove the SharedPlan in (P7) from further consideration; the agents have completed that plan and have thus satisfied the corresponding discourse purpose. The plan in (P7) is thus popped from NP's representation of the intentional structure. Second, NP will update his beliefs about the dominating plan in DSP6 based on the knowledge gained during the subdialogue. In particular, the recipe that was decided upon to maintain node39 will be added to the plan and the rgraph will be updated accordingly. Figure 29 contains the rgraph representing the new discourse context after utterance (8). Utterance (9) indicates the initiation of a new discourse segment, the purpose of which can be recognized as:</Paragraph> <Paragraph position="52"> FSP({nm, np}, Achieve( has.sat.descr( { nm, np } , ToNode, .T ( divert_traffic, ToNode ) ) , (nm, np}))).</Paragraph> <Paragraph position="53"> using CDRA. As with the other types of subdialogues discussed above, once agent NP recognizes this DSP, he must determine its relationship to the other DSPs underlying the discourse. 
In this instance, the only other DSP is that underlying the entire discourse. To model agent NP's reasoning, we must thus determine the relationship of the SharedPlan in DSP8 to that in DSP6. The knowledge precondition requirements of the latter plan provide that explanation.</Paragraph> <Paragraph position="54"> A recipe for obtaining a parameter description.</Paragraph> <Paragraph position="55"> As discussed in Section 5.1.3, an agent G's ability to perform an act fl depends in part on its ability to identify the parameters of ft. Thus to perform the act divert_traffic (node39, ToNode, G!) as part of the agents' Shared.Plan to maintain node39, the agents must be able to identify the ToNode parameter of the act. The need to identify this parameter thus provides NP with an explanation for DSP8. In particular, NP can reason that NM initiated the new discourse segment in order to satisfy one of the ability requirements of the agents' SharedPlan to maintain node39. The SharedPlan in DSPs is thus subsidiary to that in DSP6 by virtue of the BCBA requirements of the latter plan. Figure 30 summarizes our analysis of the subdialogue.</Paragraph> <Paragraph position="56"> Once NP recognizes, and explains, the initiation of the new segment, he will produce his subsequent utterances in the context of its DSP, rather than the previous one, and will expect NM to do the same. The rgraph construction algorithm is used in modeling NP's reasoning. Whereas in the previous example, the algorithm makes use of recipes for obtaining recipes, in this case it makes use of recipes for obtaining parameter descriptions. Figure 31 contains an example of such a recipe. The recipe is derived from the definition of has.sat.descr in Figure 8 and represents that an agent G can bring about has.sat.descr of a parameter Pi by getting another agent G2 to give it a description D of pi. The recipe's constraints, however, require that D be of the appropriate sort, according to the constraint .T(6, Pi), for the identification of the parameter to be successful (Appelt 1985; Kronfeld 1986, 1990; Hintikka 1978).</Paragraph> <Paragraph position="57"> Given the discourse context represented by Figure 30 then, NP should respond to NM's utterance in (9) on the basis of his beliefs about ways in which to identify parameters. For example, if NP knows the recipe in Figure 31, then he might respond to NM by communicating some node description to her. As we noted in Section 1, however, the description that NP uses must be one that is appropriate for the current circumstances. In particular, NP should respond to NM with a description that will Computational Linguistics Volume 24, Number 4 enable both of the agents to identify the node for the purposes of diverting network traffic. The rgraph in Figure 29 and the constraints of the recipe in Figure 31 provide the necessary context for modeling NP's behavior. Because NP knows that the agents are trying to divert network traffic as part of maintaining node39, as represented by the rgraph in Figure 29, he should first choose a node that is appropriate for that circumstance. For example, he might choose a node that is spatially close to node39, rather than one that, while lightly loaded, is more distant. After selecting the node, NP should then choose a means of identifying it for NM. For example, he might present her with a diagram of the network and then tell her how to identify the particular node on the diagram; NP's response in utterances (10) and (11) takes this form. 
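The parameter-identification reasoning just illustrated can be sketched, again only as a simplified illustration, as a search for a referent that suits the task and a description of it that the other agent can be expected to resolve. The two selection predicates below stand in for the recipe constraint requiring the description to be of the appropriate sort for the act to be performed; they are not the paper's formal notation.

```python
# Illustrative sketch of choosing a referent and a description for it.
# Selection predicates are stand-ins for the Figure 31 recipe constraints.

from typing import Callable, Iterable, List, Optional, Tuple

def identify_parameter(candidates: Iterable[str],
                       descriptions: Callable[[str], List[str]],
                       suits_task: Callable[[str], bool],
                       identifiable_by_other: Callable[[str], bool]
                       ) -> Optional[Tuple[str, str]]:
    """Return (referent, description) satisfying both the task constraints and
    the other agent's ability to identify the referent from the description."""
    for referent in candidates:
        if not suits_task(referent):
            continue
        for d in descriptions(referent):
            if identifiable_by_other(d):
                return referent, d
    return None

# Illustrative use, following the network-maintenance example:
choice = identify_parameter(
    candidates=["node41", "node77"],
    descriptions=lambda n: [f"the node next to node39 on the diagram ({n})",
                            f"internal name {n}"],
    suits_task=lambda n: n == "node41",          # e.g., near node39, spare capacity
    identifiable_by_other=lambda d: "diagram" in d)
```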
It would not be appropriate, however, for NP to respond to NM with some internal node name, or with a description like &quot;the node with the lightest traffic,&quot; unless he believed that NM could identify the node on the basis of that description. The constraints of the recipe in Figure 31 model this requirement. They represent that the description communicated by an agent should be one that will allow the other agent to identify the object in question for the purpose of the act to be performed.</Paragraph> <Paragraph position="58"> 7. Comparison with Grosz and Sidner's Theory Grosz and Sidner (1990) have argued that a theory of DSP recognition depends upon an underlying theory of collaborative plans. Although SharedPlans provide that latter theory, the connection between SharedPlans and DSPs was never specified. In this paper, we have presented a SharedPlan model for recognizing DSPs and their interrelationships. We now show that this model satisfies the requirements set out by Grosz and Sidner's (1986) theory of discourse structure. We first discuss the process by which intentional structure is recognized. Next, we discuss the way in which intentional structure interacts with the attentional state component of discourse structure. And finally, we discuss the contextual use of intentional structure in interpretation.</Paragraph> </Section> <Section position="3" start_page="559" end_page="560" type="sub_section"> <SectionTitle> 7.1 Recognizing Intentional Structure </SectionTitle> <Paragraph position="0"> In describing their theory of discourse structure, Grosz and Sidner give several examples of the types of intentions that could serve as DSPs (Grosz and Sidner 1986, 179): 1. Intend that some agent intend to perform some physical task. 2. Intend that some agent believe some fact. 3. Intend that some agent believe that one fact supports another. 4. Intend that some agent intend to identify an object. 5. Intend that some agent know some property of an object.</Paragraph> <Paragraph position="1"> Intentions such as these, as well as segment beginnings and endings, might be recognized on the basis of linguistic markers, utterance-level intentions, or knowledge about actions and objects in the domain of discourse (Grosz and Sidner 1986). In our model, DSPs take the form Int.Th(ICP, FSP({ICP, OCP}, β)). This type of DSP addresses several problems with the above examples--problems that motivated Grosz and Sidner's (1990) subsequent work on SharedPlans--namely the case of one agent intending another to do something and the so-called master/slave assumption. We recognize DSPs using the conversational default rule, CDRA. This rule provides a means of recognizing the initiation of new segments and their purposes based on the propositional content of utterances. Although this use of CDRA is admittedly limited--it requires an ICP to communicate the act that it desires to collaborate on at the outset of a segment--other sources of information, such as those cited above, could also be incorporated into the model to aid in the recognition of new segments and their corresponding SharedPlans.</Paragraph> <Paragraph position="2"> SharedPlans can also be used in recognizing the completion of discourse segments. Case (2b) of the augmentation process in Figure 13 outlines the required reasoning. A discourse segment is complete when all of the beliefs and intentions required to complete its corresponding SharedPlan have been established.
This use of SharedPlans also appears at first glance to be of limited use--the mental attitudes required of a full SharedPlan may not all be explicitly established over the course of a dialogue or subdialogue. However, the OCP may be able to infer the completion of a SharedPlan, and thus the corresponding segment, in combination with information from other sources. For example, suppose an OCP has some reason to expect the end of a segment based on a linguistic signal such as an intonational feature (e.g., as described by Grosz and Hirschberg \[1992\]). If additionally the OCP is able to ascribe the various mental attitudes &quot;missing&quot; from the SharedPlan that corresponds to that segment, then the OCP has further evidence for the segment boundary. These mental attitudes may be ascribed on the basis of those of the OCP's beliefs that are in accord with the mental attitudes comprising the SharedPlan (Pollack 1986a; Grosz and Sidner 1990). Once an OCP recognizes the initiation of a new discourse segment, it must determine the relationship of that segment's DSP to the other DSPs underlying the discourse (Grosz and Sidner 1986). In our model, relationships between SharedPlans provide the basis for determining the corresponding relationships between DSPs. An OCP must determine how the SharedPlan used to model a segment's DSP is related to the other SharedPlans underlying the discourse. The information that an OCP considers in determining this relationship is delineated by the beliefs and intentions that are required to complete each of the other plans. In this way, our model provides a more detailed account of the relationships that can hold between DSPs than did Grosz and Sidner's original formulation.</Paragraph> <Paragraph position="3"> One DSP dominates another if the second provides part of the satisfaction of the first. In our model, subsidiary relationships between SharedPlans provide a means of determining dominance relationships between DSPs. If one plan is subsidiary to another, then the DSP that is modeled using the first plan is dominated by that modeled using the second. One DSP satisfaction-precedes another if the first must be satisfied before the second. This relationship corresponds to a temporal dependency between SharedPlans. When one SharedPlan must be completed before another, the DSP that is modeled using the first satisfaction-precedes that modeled using the second.</Paragraph> </Section> <Section position="4" start_page="560" end_page="561" type="sub_section"> <SectionTitle> 7.2 Relationship to Attentional State </SectionTitle> <Paragraph position="0"> The attentional state component of discourse structure is an abstraction of the discourse participants' focus of attention; it is modeled using a stack of focus spaces, one for each segment. Each focus space contains its segment's DSP, as well as those objects, properties, and relations that become salient over the course of the segment. One of the primary roles of the focus space stack is to constrain the range of DSPs to which a new DSP can be related; a new DSP can only be dominated by a DSP in some space on the stack.</Paragraph> <Paragraph position="1"> In our model, a segment's focus space contains a DSP of the form Int.Th(ICP, FSP({ICP, OCP}, β)). The operations on the focus space stack depend upon subsidiary relationships between SharedPlans in the same way that Grosz and Sidner (1986) describe the operations as depending upon DSP relationships.
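The completion test of Case (2b), referred to in Section 7.1 above, can also be given a simplified sketch; the attitude labels are invented for the example and the representation is our own, not the FSP definition itself.

```python
# Rough sketch of Case (2b): a segment is taken to be complete when every
# attitude its SharedPlan requires is established or plausibly ascribable,
# optionally reinforced by an independent boundary cue such as intonation.

from dataclasses import dataclass, field
from typing import Set

@dataclass
class SharedPlanState:
    required: Set[str]                       # attitudes the FSP definition demands
    established: Set[str] = field(default_factory=set)

def segment_complete(plan: SharedPlanState,
                     ascribable: Set[str],
                     boundary_cue: bool = False) -> bool:
    missing = plan.required - plan.established
    if not missing:
        return True
    # With an independent boundary cue, the OCP may ascribe the remaining
    # attitudes when they are consistent with its own beliefs.
    return boundary_cue and missing <= ascribable

# Illustrative use:
p = SharedPlanState(required={"mutual_belief(recipe)", "Int.To(a, remove_pump)"},
                    established={"mutual_belief(recipe)"})
print(segment_complete(p, ascribable={"Int.To(a, remove_pump)"}, boundary_cue=True))
```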
As each SharedPlan corresponding to a discourse segment is completed, the segment's focus space is popped from the stack. Only those SharedPlans in some space on the stack are candidates for subsidiary relationships. The use of the SharedPlan stack S in the augmentation process of Figure 13 reflects the operations of the focus space stack.</Paragraph> </Section> <Section position="5" start_page="561" end_page="562" type="sub_section"> <SectionTitle> 7.3 The Contextual Role of Intentional Structure </SectionTitle> <Paragraph position="0"> An utterance of a discourse can either begin a new segment of the discourse, complete the current segment, or contribute to it (Grosz and Sidner 1986). Each of these possibilities is modeled by a separate case within the augmentation process given in Figure 13. The initiation and completion of discourse segments was discussed in Section 7.1. Our discussion here is thus restricted to the case of an utterance's contributing to a discourse segment.</Paragraph> <Paragraph position="1"> Under Grosz and Sidner's theory, each utterance of a discourse segment contributes some information towards achieving the purpose of that segment. In our model, each utterance is understood in terms of the information it contributes towards completing the corresponding SharedPlan. The FSP definition in Figure 5 constrains the range of information that an utterance of a segment can contribute towards the segment's SharedPlan. Hence, if an utterance cannot be understood as contributing information to the current SharedPlan, then it cannot be part of the current discourse segment. That is, the utterance must begin a new segment of the discourse or complete the current segment, but it cannot contribute to it. In this way, our model provides a more detailed account of the role that intentional structure plays as context in interpreting utterances than did Grosz and Sidner's original formulation.</Paragraph> <Paragraph position="2"> Because each utterance of a discourse segment contributes some information towards the purpose of that segment, the segment's DSP may not be completely determined until the last utterance of the segment. However, as Grosz and Sidner (1986) have argued, the OCP must be able to recognize initially at least a generalization of the DSP so that the proper moves of attentional state can be made. Although CDRA provides a limited method of recognizing new segments and their purposes, it does conform to this aspect of Grosz and Sidner's theory. In particular, the initial purpose of a segment, as recognized by CDRA, is quite generally specified; it consists only of the intention that the agents form a SharedPlan. However, as the utterances of a discourse segment provide information about the details of that plan, the segment's purpose becomes more completely determined. In particular, the purpose comes to include the mental attitudes required of a full SharedPlan and established by the dialogue. Additionally, although the objective of the agents' plan may only be abstractly specified when it is initially recognized, it too may be further refined by the utterances of the segment.</Paragraph> <Paragraph position="3"> 8. Comparison with Previous Plan-Based Approaches Early work on plan recognition in discourse (Allen and Perrault 1980; Cohen, Perrault, and Allen 1982) focused on the problem of reasoning about single utterances. 
25 Subsequent work (Sidner and Israel 1981; Sidner 1983, 1985; Carberry 1987) extended the earlier approaches to recognize speaker's intentions across multiple utterances. All of 25 More recent work in the area of single utterance reasoning includes that of Cohen and Levesque (1990) and Perrault (1990). Their work provides a detailed mental state model of speech act processing and is thus focused at a different level of granularity than the work discussed in this paper.</Paragraph> <Paragraph position="4"> Lochbaum A Collaborative Planning Model (1) User: Show me the generic concept called &quot;employee&quot;.</Paragraph> <Paragraph position="5"> (2) System: OK. <system displays network> (3) User: I can't fit a new ic below it.</Paragraph> <Paragraph position="6"> (4) Can you move it up? (~System: Yes. <system displays network> (6) User: OK, now make an individual employee concept whose first name is ...</Paragraph> <Paragraph position="7"> Figure 32 A sample correction subdialogue (Sidner 1983; Litman 1985).</Paragraph> <Paragraph position="8"> these approaches were based on a data-structure view of plans and were designed to recognize utterance-level intentions. More recent work (Litman 1985; Litman and Allen 1987; Lambert and Carberry 1991; Ramshaw 1991) has been concerned with the problems introduced by discourses containing subdialogues. However, the more recent work has followed in the tradition of the previous work and as a result continues to produce an utterance-to-utterance-based analysis of discourse, rather than one based on discourse structure. We now review these approaches and show that they are aimed at recognizing a different type of intention than that discussed in this paper.</Paragraph> </Section> <Section position="6" start_page="562" end_page="564" type="sub_section"> <SectionTitle> 8.1 The Approach of Litman and Allen </SectionTitle> <Paragraph position="0"> To model clarification and correction subdialogues, Litman and Allen propose the use of two types of plans: discourse plans and domain plans (Litman 1985; Litman and Allen 1987). Domain plans represent knowledge about a task, while discourse plans represent conversational relationships between utterances and plans. For example, an agent may use an utterance to introduce, continue, or clarify a plan.</Paragraph> <Paragraph position="1"> In Litman and Allen's model, the process of understanding an utterance entails recognizing a discourse plan from the utterance and then relating that discourse plan to some domain plan; the link between plans is captured by the constraints of the discourse plan. For example, under Litman and Allen's analysis, utterance (3) of the dialogue in Figure 32 (repeated from Figures 2 and 19) is recognized as an instance of the CORRECT-PLAN discourse plan; with the utterance, the User is correcting a domain plan to add data to a network.</Paragraph> <Paragraph position="2"> Litman and Allen use a stack of plans to model attentional aspects of discourse.</Paragraph> <Paragraph position="3"> The plan stack after processing utterance (3) is shown in Figure 33. The CORRECT-PLAN discourse plan on top of the stack indicates that the user and system are correcting a problem with the step labeled D1 in PLAN2 (the DISPLAY act of the ADD-DATA domain plan) by inserting a new step into PLAN2 (?newstep) before the step labeled F1 (the FIT step).</Paragraph> <Paragraph position="4"> The plan stack after processing the User's subsequent utterance in (4) is shown in Figure 34. 
The IDENTIFY-PARAMETER discourse plan indicates that utterance (4) is being used to identify the ?newstep parameter of the CORRECT-PLAN discourse plan.</Paragraph> <Paragraph position="5"> The boxes in Figures 33 and 34 do not correspond to discourse segments, but rather to individual utterances. PLAN5 in Figure 34 was introduced by utterance (4) in the dialogue, PLAN4 by utterance (3). The two utterances are linked together by the parameter M1, corresponding to the MOVE act in PLAN2. Although this analysis serves as a method of relating the two utterances, it provides only an utterance-to-utterance-based model of discourse processing. Intuitively, utterances (3)-(5) as a</Paragraph> <Paragraph position="7"> Plan stack after processing utterance (4) of the dialogue in Figure 32 (Litman 1985).</Paragraph> <Paragraph position="8"> Lochbaum A Collaborative Planning Model whole are concerned with correcting a problem; utterance (3) identifies the problem, while utterance (4) suggests a method of correcting it. Under Litman and Allen's analysis, however, utterance (3) is used to correct a problem and utterance (4) is used to identify a parameter in a discourse plan. This type of analysis cannot capture the contribution of a subdialogue to the overall discourse in which it is embedded. Each utterance is simply linked to one that precedes it, irrespective of how the utterances aggregate into segments.</Paragraph> <Paragraph position="9"> In contrast to Litman and Allen's approach, our approach accurately reflects the compositional structure of discourse; utterances are understood in the context of discourse segments, and segments in the context of the discourse as a whole. 26 Our analysis of the dialogue in Figure 32 was discussed in Section 6.2.2 and is summarized by Figure 23. Under our analysis, utterance (3) introduces a new discourse segment, the purpose of which is to satisfy a constraint that there be enough free space on the screen to add a new concept. This new segment is recognized and explained based on the ability requirements of SharedPlans. Utterance (4) of the dialogue is understood in the context of this new discourse segment. In particular, the act of moving the generic concept up is understood as a means of satisfying the constraint.</Paragraph> <Paragraph position="10"> In more recent work, Litman.and Allen have augmented their model with a notion of &quot;discourse intentions.&quot; &quot;Discourse intentions are purposes of the speaker, expressed in terms of both the task plans of the speaker (the domain plans) and the plans recursively generated by these plans (the discourse plans)&quot; (Litman and Allen, 1990, 376). For example, the discourse intention underlying utterance (4) can be glossed as: User intends that System intends that System identify the ?newstep parameter of the CORRECT-PLAN discourse plan.</Paragraph> <Paragraph position="11"> Because Litman and Allen's discourse and domain plans are recognized on the basis of a single utterance, their discourse intentions are actually utterance-level intentions, and not the type of discourse-level intentions discussed in this paper.</Paragraph> </Section> <Section position="7" start_page="564" end_page="565" type="sub_section"> <SectionTitle> 8.2 Other Approaches </SectionTitle> <Paragraph position="0"> Lambert and Carberry (1991) have revised Litman and Allen's dichotomy of plans into a trichotomy of discourse, problem-solving, and domain plans. 
Their discourse plans represent means of achieving communicative goals, while their problem-solving plans represent means of constructing domain plans. The Build-Plan operator in Figure 9 is an example of a problem-solving plan; it is used to represent the process by which two agents build a plan for one of them to do an action. The body of the operator requires that the agents (i) Build-Plans for the subacts of that action and (ii) Instantiate-Var(iable)s of those subacts.</Paragraph> <Paragraph position="1"> In Lambert and Carberry's model, the process of understanding an utterance entails recognizing a tripartite structure of plans. Beginning from the surface-level form of an utterance, their system recognizes plans on the discourse level until a plan at that level can be linked to one on the problem-solving level; plans on the problem-solving level are then recognized until one can be linked to a plan on the domain level; further plans may then be recognized on that level.</Paragraph> <Paragraph position="2"> 26 Although there may be several possible segmentations of a discourse, just as there may be several possible parses of a sentence, there is general agreement that utterances do cluster into segments. The point here is that our analysis reflects this segmentation, whereas Litman and Allen's is utterance-to-utterance based and thus does not.</Paragraph> <Paragraph position="3"> Computational Linguistics Volume 24, Number 4 As a model of subdialogue understanding, Lambert and Carberry's approach suffers from problems similar to that of Litman and Allen's. In particular, Lambert and Carberry's analysis is still utterance-to-utterance based; subdialogues are not recognized as separate units, nor is a subdialogue's contribution to the discourse in which it is embedded recognized. This is also true of Lambert and Carberry's (1992) more recent work on modeling negotiation subdialogues. Although Lambert and Carberry emphasize the importance of recognizing the initiation of negotiation subdialogues, and work through an example involving an embedded negotiation subdialogue, they do not indicate how these subdialogues are actually recognized as such. The only possibility hinted at in the text (i.e., that the discourse act Address-Believability accounts for them) results in a discourse segmentation that does not accurately reflect the purposes underlying their sample dialogue.</Paragraph> <Paragraph position="4"> Figures 35 and 36 contain the sample dialogue used by Lambert and Carberry (1992). In Figure 35, the dialogue is segmented as suggested by Lambert and Carberry's analysis, while in Figure 36 it is segmented to more accurately reflect the purposes underlying the discourse. The subdialogues marked (b) and (d) in Figure 36 are both initiated by $1 and are each concerned with a different aspect of the accuracy of S2's utterance in (6). Segments (b) and (d) are thus siblings both dominated by segment (a) in Figure 36. Under Lambert and Carberry's analysis, however, these two subdialogues are not recognized as separate units. That they should be can be seen by the coherent discourses that remain if either is removed from the dialogue.</Paragraph> <Paragraph position="5"> In addition, although the process of plan construction provides an important context for interpreting utterances, trying to formalize this mental activity under a data-structure approach results in a model that conflates recipes and plans (Pollack 1990). 
For example, each of Lambert and Carberry's domain act operators requires as a precondition that the agent have a plan to use that operator to perform the act. That requirement, however, results in the paradoxical situation whereby a recipe for an act α requires having a plan for α that uses that recipe. As another example, the Build-Plan operator in Figure 9 requires as a precondition that each agent know the referents of the subactions that one of the agents needs to perform to accomplish α. However, considering that determining how to perform an act is part of constructing a plan to perform that act, it is odd that a recipe for building a plan for α requires knowing the subactions of α as a precondition of its use. The fact that these inconsistencies do not seem to pose a problem for Lambert and Carberry's model is testament to its data-structure nature; the plan chaining behavior of their reasoner on the various types of operators is such that no circularities arise.</Paragraph> <Paragraph position="6"> Ramshaw (1991) has augmented Litman and Allen's two types of plans with a different third type, exploration plans. This type of plan is added to distinguish those domain plans an agent has adopted from those it is simply considering adopting. In this model, understanding an utterance entails recognizing a discourse plan from the utterance and then relating that plan to a plan on either the exploration level or the domain level, as determined by the form of the utterance and the plan structures built from previous utterances. Like the previous approaches, however, Ramshaw's model is still utterance-to-utterance based. The three-level structure he manipulates on the basis of each user query does not account in any way for the structure of discourse.</Paragraph> <Paragraph position="7"> Figures 35 and 36 (the sample dialogue from Lambert and Carberry \[1992\], segmented as suggested by their analysis and as more accurately reflecting the purposes underlying the discourse, respectively): (5) S1: What is Dr. Smith teaching? (6) S2: Dr. Smith is teaching Architecture. (7) S1: Isn't Dr. Brown teaching Architecture? (8) S2: No. (9) Dr. Brown is on sabbatical. (10) S1: But didn't I see him on campus yesterday? (11) S2: Yes. (12) He was giving a University colloquium. (13) S1: OK. (14) But isn't Dr. Smith a theory person?</Paragraph> </Section> </Section> <Section position="7" start_page="565" end_page="566" type="metho"> <SectionTitle> 9. Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we have developed a computational model for recognizing the intentional structure of a discourse and using that structure in discourse processing. SharedPlans are used both to represent the components of intentional structure, i.e., discourse segment purposes and their interrelationships, and to reason about the use of intentional structure in utterance interpretation.
We have also shown that our work differs from previous plan-based approaches to discourse processing by providing a model for recognizing and reasoning with discourse-level intentions, rather than utterance-level intentions. The previous approaches address the problem of recognizing the propositional content of an utterance from its surface form, but provide only an utterance-to-utterance-based analysis of discourse. In contrast, we begin from propositional content and present a model of discourse processing that derives from discourse structure.</Paragraph> </Section> class="xml-element"></Paper>