<?xml version="1.0" standalone="yes"?> <Paper uid="J95-3003"> <Title>Collaborating on Referring Expressions</Title> <Section position="3" start_page="353" end_page="357" type="metho"> <SectionTitle> 3. Referring Expressions </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="353" end_page="354" type="sub_section"> <SectionTitle> 3.1 Planning and Referring </SectionTitle> <Paragraph position="0"> By viewing language as action, the planning paradigm can be applied to natural language processing. The actions in this case are speech acts (Austin 1962; Searle 1969), and include such things as promising, informing, and requesting. Cohen and Perrault (1979) developed a system that uses plan construction to map an agent's goals to speech acts, and Allen and Perrault (1980) use plan inference to understand an agent's plan from its speech acts. By viewing it as action (Searle 1969), referring can be incorporated into a planning model. Cohen's model (1981) planned requests that the hearer identify a referent, whereas Appelt (1985) planned concept activations, a generalization of referring actions.</Paragraph> <Paragraph position="1"> Although acts of reference have been incorporated into plan-based models, determining the content of referring expressions hasn't been. For instance, in Appelt's model, concept activations can be achieved by the action describe, which is a primitive, not further decomposed. Rather, this action has an associated procedure that determines a description that satisfies the preconditions of describe. Such special procedures have been the mainstay for accounting for the content of referring expressions, both in constructing and in understanding them, as exemplified by Dale (1989), who chose descriptors on the basis of their discriminatory power; Ehud Reiter (1990), who focused on avoiding misleading conversational implicatures when generating descriptions; and Mellish (1985), who used a constraint satisfaction algorithm to identify referents.</Paragraph> <Paragraph position="2"> Our work follows the plan-based approach to language generation and understanding. We extend the earlier approaches of Cohen and Appelt by accounting for the content of the description at the planning level. This is done by having surface speech actions for each component of a description, plus a surface speech action that expresses a speaker's intention to refer. A referring action is composed of these primitive actions, and the speaker utters them in her attempt to refer to an object.</Paragraph> <Paragraph position="3"> These speech actions are the building blocks that referring expressions are made from. Acting as the mortar are intermediate actions, which have constraints that the plan construction and plan inference processes can reason about. These constraints encode the knowledge of how a description can allow a hearer to identify an object.</Paragraph> <Paragraph position="4"> First, the constraints express the conditions under which an attribute can be used to refer to an object; for instance, that it be mutually believed that the object has a certain property (Clark and Marshall 1981; Perrault and Cohen 1981; Nadathur and Joshi 1983). Second, the constraints keep track of which objects could be believed to be the referent of the referring expression.
Third, the constraints ensure that a sufficient number of surface speech actions are added so that the set of candidates associated with the entire referring expression consists of only a single object, the referent. These constraints enable the speaker to construct a referring expression that she believes will allow the hearer to identify the referent. As for the hearer, the explicit encoding of the adequacy of referring expressions allows referent identification to fall out of the plan inference process.</Paragraph> <Paragraph position="5"> Our approach to treating referring as a plan in which surface speech actions correspond to the components of the description allows us to capture how participants collaborate in building a referring expression. Plan repair techniques can be used to refashion an expression if it is not adequate, and clarifications can refer to the part of the plan derivation that is in question or is being repaired. Thus we can model a collaborative dialog in terms of the changes that are being made to the plan derivation.</Paragraph> <Paragraph position="6"> The referring expression plans that we propose are not simply data structures, but are mental objects that agents have beliefs about (Pollack 1990). The plan derivation expresses beliefs of the speaker: how actions contribute to the achievement of the goal, and what constraints hold that will allow successful identification. 2 So plan construction reasons about the beliefs of the agent in constructing a referring plan; likewise, plan inference, after hypothesizing a plan that is consistent with the observed actions, reasons about the other participant's (believed) beliefs in satisfying the constraints of the plan. If the hearer is able to satisfy the constraints, then he will have understood the plan and be able to identify the referent, since a term corresponding to it would have been instantiated in the inferred plan. Otherwise, there will be an action that includes a constraint that is unsatisfiable, and the hearer construes the action as being in error. (We do not reason about how the error affects the satisfiability of the goal of the plan nor use the error to revise the beliefs of the hearer.)</Paragraph> </Section> <Section position="2" start_page="354" end_page="355" type="sub_section"> <SectionTitle> 3.2 Vocabulary and Notation </SectionTitle> <Paragraph position="0"> Before we present the action schemas for referring expressions, we need to introduce the notation that we use. Our terminology for planning follows the general literature. 3</Paragraph> <Paragraph> 2 Since we assume that the agents have mutual knowledge of the action schemas and that agents can execute surface speech actions, we do not consider beliefs about generation or about the executability of primitive actions.</Paragraph> <Paragraph position="1"> We use the terms action schema, plan derivation, plan construction, and plan inference. An action schema consists of a header, constraints, a decomposition, and an effect; and it encodes the constraints under which an effect can be achieved by performing the steps in the decomposition. A plan derivation is an instance of an action that has been recursively expanded into primitive actions--its yield. Each component in the plan--the action headers, constraints, steps, and effects--is referred to as a node of the plan, and is given a name so as to distinguish two nodes that have the same content. Finally, plan construction is the process of finding a plan derivation whose yield will achieve a given effect, and plan inference is the process of finding a plan derivation whose yield is a set of observed primitive actions.</Paragraph>
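<Paragraph> As a concrete illustration of this terminology, the following is a minimal Prolog sketch (our own, not the paper's implementation; the functors primitive/1 and node/3 are invented) of a derivation tree and the computation of its yield:
    % Sketch only: a derivation is either a primitive action or a named
    % node expanded into subderivations; its yield is its list of leaves.
    yield_of(primitive(Act), [Act]).
    yield_of(node(_Name, _Header, Subs), Acts) :-
        maplist(yield_of, Subs, Lists),
        append(Lists, Acts).

    % Example query (foreshadowing the refer plan of Section 3.3):
    % ?- yield_of(node(p1, refer(e1),
    %                  [primitive(s_refer(e1)),
    %                   node(p2, describe(e1),
    %                        [primitive(s_attrib(e1, bird))])]),
    %             Acts).
    % Acts = [s_refer(e1), s_attrib(e1, bird)].
</Paragraph>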
<Paragraph position="1"> The action schemas make use of a number of predicates, and these are defined in Table 1. We adopt the Prolog convention that variables begin with an uppercase letter, and all predicates and constants begin with a lowercase letter. Two constants that need to be mentioned are system and user. The first denotes the agent that we are modeling, and the latter, her conversational partner. Since the action schemas are used for both constructing the plans of the system, and inferring the plans of the user, it is sometimes necessary to refer to the speaker or hearer in a general way. For this we use the propositions speaker(Speaker) and hearer(Hearer). These instantiate the variables Speaker and Hearer to system or user; which is which depends on whether the rule is being used for plan construction or plan inference. These propositions are included as constraints in the action schemas as needed.</Paragraph> </Section> <Section position="3" start_page="355" end_page="356" type="sub_section"> <SectionTitle> 3.3 Action Schemas </SectionTitle> <Paragraph position="0"> This section presents action schemas for referring expressions. (We omit discussion of actions that account for superlative adjectives, such as &quot;largest,&quot; that describe an object relative to the set of objects that match the rest of the description. A full presentation is given by Heeman [1991].) As we mentioned, the action for referring, called refer, is mapped to the surface speech actions through the use of intermediate actions and plan decomposition. All of the reasoning is done in the refer action and the intermediate actions, so no constraints or effects are included in the surface speech actions.</Paragraph> <Paragraph position="1"> We use three surface speech actions. The first is s-refer(Entity), which is used to express the speaker's intention to refer. The second is s-attrib(Entity, Predicate), and is used for describing an object in terms of an attribute; Entity is the discourse entity of the object, and Predicate is a lambda expression, such as λX. category(X, bird), that encodes the attribute. The third is s-attrib-rel(Entity, OtherEntity, Predicate), and is used for describing an object in terms of some other object. In this case Predicate is a lambda expression of two variables, one corresponding to Entity, and the other to OtherEntity; for instance, λX. λY. in(X, Y).</Paragraph> <Paragraph position="2"> Refer Action. The schema for refer is shown in Figure 1. The refer action decomposes</Paragraph> </Section> <Section position="4" start_page="356" end_page="356" type="sub_section"> <SectionTitle> Table 1: Belief </SectionTitle> <Paragraph position="0"> bel(Agt, Prop): Agt believes that Prop is true.</Paragraph> <Paragraph position="1"> bmb(Agt1,Agt2,Prop): Agt1 believes that it is mutually believed between himself and Agt2 that Prop is true.</Paragraph> <Paragraph position="2"> knowref(Agt1,Agt2,Ent,Obj): Agt1 knows the referent that Agt2 associates with the discourse entity Ent (Webber 1983), which Agt1 believes to be Obj.
(Proving this proposition with Ent unbound will cause a unique identifier to be created for Ent.)</Paragraph> </Section> <Section position="5" start_page="356" end_page="357" type="sub_section"> <SectionTitle> Goals and Plans </SectionTitle> <Paragraph position="0"> goal(Agt, Goal): Agt has the goal Goal. Agents act to make their goals true.</Paragraph> <Paragraph position="1"> plan(Agt, Plan, Goal): Agt has the goal of Goal and has adopted the plan derivation Plan as a means to achieve it. The agent believes that each action of Plan contributes to the goal, but not necessarily that all of the constraints hold; in other words, the plan must be coherent but not necessarily valid (Pollack 1990, p. 94).</Paragraph> <Paragraph position="2"> content(Plan,Node,Content): The node named by Node in Plan has content Content. yield(Plan,Node,Actions): The subplan rooted at Node in Plan has a yield of the primitive actions Actions.</Paragraph> <Paragraph position="3"> achieve(Plan, Goal): Executing Plan will cause Goal to be true.</Paragraph> <Paragraph position="4"> error(Plan,Node): Plan has an error at the action labeled Node. Errors are attributed to the action that contains the failed constraint. This predicate is used to encode an agent's belief about an invalidity in a plan.</Paragraph> <Paragraph position="5"> Plan Repair substitute(Plan,Node,NewAction,NewPlan): Undo all variable bindings in Plan (except those in primitive actions), and then substitute the action header NewAction into Plan at Node, resulting in the partial plan NewPlan.</Paragraph> <Paragraph position="6"> replan(Plan,Actions): Complete the partial plan Plan. Actions are the primitive actions that are added to the plan.</Paragraph> <Paragraph position="7"> replace(Plan,NewPlan): The plan NewPlan replaces Plan.</Paragraph> <Paragraph position="8"> Miscellaneous subset(Set, Lambda, Subset): Compute the subset, Subset, of Set that satisfies the lambda expression Lambda.</Paragraph> <Paragraph position="9"> modifier-absolute-pred(Pred): Pred is a predicate that an object can be described in terms of. Used by the modifier-absolute schema given in Figure 6.</Paragraph> <Paragraph position="10"> modifier-relative-pred(Pred): Pred is a predicate that describes the relationship between two objects. Used by the modifier-relative schema given in Figure 7.</Paragraph> <Paragraph position="11"> pick-one(Object, Set): Pick one object, Object, of the members of Set. speaker(Agt): Agt is the current speaker.</Paragraph> <Paragraph position="12"> hearer(Agt): Agt is the current hearer.</Paragraph> <Paragraph position="13"> into two steps: s-refer, which expresses the speaker's intention to refer, and describe, which accounts for the content of the referring expression (given next). The effect of refer is that the hearer should believe that the speaker has a goal of the hearer knowing the referent of the referring expression. The effect has been formulated in this way because we are assuming that when a speaker has a communicative goal she plans to achieve the goal by making the hearer recognize it; the effect will be achieved by the hearer inferring the speaker's plan, regardless of whether or not the hearer is able to determine the actual referent. To simplify our implementation, this is the only effect that is stated for the action schemas for referring expressions. It corresponds to the literal goal that Appelt and Kronfeld (1987) propose (whereas the actual identification is their condition of satisfaction).</Paragraph>
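<Paragraph> Figure 1 itself is not reproduced in this extraction, but the prose suggests a rendering along the following lines. This is a hedged sketch in an invented schema/4 notation (header, constraints, decomposition, effect); the exact parameters are our assumptions:
    % Hedged sketch of the refer schema as described above.
    schema(refer(Entity),                                   % header
           [speaker(Speaker), hearer(Hearer)],              % constraints
           [s_refer(Entity), describe(Entity)],             % decomposition
           bel(Hearer,                                      % effect
               goal(Speaker,
                    knowref(Hearer, Speaker, Entity, _Object)))).
</Paragraph>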
<Paragraph position="14"> Intermediate Actions. The describe action, shown in Figure 2, is used to construct a description of the object through its decomposition into headnoun and modifiers. The variable Cand is the candidate set, the set of potential referents associated with the head noun that is chosen, and it is passed to the modifiers action so that it can ensure that the rest of the description rules out all of the alternatives.</Paragraph> <Paragraph position="15"> The action headnoun, shown in Figure 3, has a single step, s-attrib, which is the surface speech action used to describe an object in terms of some predicate, which for the headnoun schema, is restricted to the category of the object. 4 The schema also has two constraints. The first ensures that the referent is of the chosen category and the second determines the candidate set, Cand, associated with the head noun that is chosen. The candidate set is computed by finding the subset of the objects in the world that the speaker believes could be referred to by the head noun--the objects that the speaker and hearer have an appropriate mutual belief about.</Paragraph> <Paragraph position="16"> The modifiers action attempts to ensure that the referring expression that is being constructed is believed by the speaker to allow the hearer to uniquely identify the referent. We have defined modifiers as a recursive action, with two schemas. 5 The first schema, shown in Figure 4, is used to terminate the recursion, and its constraint specifies that only one object can be in the candidate set. 6 The second schema, shown in Figure 5, embodies the recursion. It uses the modifier plan, which adds a component to the description and updates the candidate set by computing the subset of it that satisfies the new component. The modifier plan thus accounts for individual components of the description.</Paragraph> <Paragraph position="17"> There are two different action schemas for modifier; one is for absolute modifiers, such as &quot;black,&quot; and the other is for relative modifiers, such as &quot;larger.&quot; The former is shown in Figure 6; it decomposes into the surface speech action s-attrib and has a constraint that determines the new candidate set, NewCand, by including only the objects from the old candidate set, Cand, for which the predicate could be believed to be true. The other schema is shown in Figure 7 and is used for describing objects in terms of some other object. It uses the surface speech action s-attrib-rel and also includes a step to refer to the object of comparison.</Paragraph> </Section> </Section> <Section position="4" start_page="357" end_page="360" type="metho"> <Paragraph position="0"> 4 Note that several category predications might be true of an object, and we do not explore which would be best to use, but see Edmonds (1994) for how preferences can be encoded. 5 We use specialization axioms (Kautz and Allen 1986) to map the modifiers action to the two schemas: modifiers-terminate and modifiers-recurse. 6 In order to distinguish this action from the primitive actions, it has a step that is marked null.</Paragraph>
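<Paragraph> Under the same assumptions, and in the same invented schema/4 notation, the prose descriptions of Figures 3 through 5 suggest sketches such as the following (candidate_set/4 is our stand-in for the subset constraint over mutual beliefs, and effects are omitted):
    % headnoun: describe Object by its category; Cand is the candidate set.
    schema(headnoun(Entity, Object, Cand),
           [speaker(S), hearer(H),
            bmb(S, H, category(Object, Category)),
            candidate_set(S, H, Category, Cand)],  % stand-in for subset/3
           [s_attrib(Entity, category(Category))],
           true).

    % modifiers, terminating case: exactly one candidate, the referent.
    schema(modifiers(_Entity, Object, [Object]), [], [null], true).

    % modifiers, recursive case: a modifier narrows Cand to NewCand.
    schema(modifiers(Entity, Object, Cand),
           [],
           [modifier(Entity, Object, Cand, NewCand),
            modifiers(Entity, Object, NewCand)],
           true).
</Paragraph>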
<Section position="1" start_page="358" end_page="360" type="sub_section"> <SectionTitle> 3.4 Plan Construction and Plan Inference </SectionTitle> <Paragraph position="0"> The goals that we are interested in achieving are communicative goals. Since these goals cannot be directly achieved by a plan of action, the speaker must instead plan actions that will achieve them indirectly, for instance by planning an utterance that results in the hearer recognizing her goal. So, if the speaker wants to achieve Goal, she will attempt to construct a plan whose effect is bel(Hearer, goal(Speaker, Goal)).</Paragraph> <Paragraph position="1"> Plan Construction. Given an effect, the plan constructor finds a plan derivation that has a minimal number of primitive actions, that is valid (with respect to the planning agent's beliefs), and whose root action achieves the effect. The plan constructor uses a best-first search strategy, expanding the derivation with the fewest number of surface speech actions. The yield of this plan derivation can then be given as input to a module that generates the surface form of the utterance.</Paragraph> <Paragraph position="2"> Plan Inference. Following Pollack (1990), our plan inference process can infer plans in which, in the hearer's view, a constraint does not hold. In inferring a plan derivation, we first find the set of plan derivations that account for the primitive actions that were observed, without regard to whether the constraints hold. This is done by using a chart parser that parses actions rather than words (Sidner 1994; Vilain 1990). For referring plans that contain more than one modifier, there will be multiple derivations corresponding to the order of the modifiers. We avoid this ambiguity by choosing an arbitrary ordering of the modifiers for each such plan.</Paragraph> <Paragraph position="3"> In the second part of the plan inference process, we evaluate each derivation by attempting to find an instantiation for the variables such that all of the constraints hold with respect to the hearer's beliefs about the speaker's beliefs. It could, however, be the case that there is no instantiation, either because this is not the right derivation or because the plan is based on beliefs not shared by the speaker and the hearer. In the latter case, we need to determine which action in the plan is to blame, so that this knowledge can be shared with the other participant.</Paragraph> <Paragraph position="4"> After each derivation has been evaluated, if there is just one valid derivation, an instantiated derivation whose constraints all hold, then the hearer will believe that he has understood. If there is just one derivation and it is invalid, the action containing the constraint that is the source of the invalidity is noted. (We have not explored ambiguous situations, those in which more than one valid derivation remains, or, in the absence of validity, more than one invalid derivation.) We now need to address how we evaluate a derivation. In the case where the plan is invalid, we need to partially evaluate the plan in order to determine which action contains a constraint that cannot be satisfied. However, any instantiation will lead to some constraint being found not to hold. Care must therefore be taken in finding the right instantiation so that blame is attributed to the action at fault. So, we evaluate the constraints in order of mention in the derivation, but postpone any constraints that have multiple solutions until the end. We have found that this simple approach can find the instantiation for valid plans and can find the action that is in error for the others.</Paragraph>
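<Paragraph> A runnable Prolog sketch of this evaluation strategy, our own reconstruction rather than the authors' code, might look as follows:
    % evaluate(+Constraints, -Error): try constraints in order of mention;
    % commit when a constraint has exactly one solution, postpone it when
    % it has several, and blame it when it has none.
    evaluate(Constraints, Error) :-
        eval(Constraints, [], Error).

    eval([], Postponed, Error) :-
        reverse(Postponed, Ordered),        % retry deferred constraints,
        check_all(Ordered, Error).          % again in order of mention
    eval([C|Cs], Postponed, Error) :-
        findall(C, call(C), Solutions),
        (   Solutions == []                 % no solution: C is to blame
        ->  Error = error(C)
        ;   Solutions = [Unique]            % unique solution: commit to it
        ->  C = Unique,
            eval(Cs, Postponed, Error)
        ;   eval(Cs, [C|Postponed], Error)  % several solutions: postpone
        ).

    check_all([], none).
    check_all([C|Cs], Error) :-
        (   call(C)
        ->  check_all(Cs, Error)
        ;   Error = error(C)
        ).
</Paragraph>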
<Paragraph position="5"> To illustrate this, consider the headnoun action, which has the following constraints.</Paragraph> <Paragraph position="7"> bmb(Speaker, Hearer, category(Object, Category)) subset(World, λX. bmb(Speaker, Hearer, category(X, Category)), Cand) During the first step, finding the derivation, all co-referential variables will be unified. In particular, the variable Category will be instantiated from the co-referential variable in the surface speech action. The first three constraints have only a single solution, so they are instantiated. The fourth constraint contains Object. If there is exactly one object that the system believes to be mutually believed to be of Category, then Object is instantiated to it. If there is none, then this constraint is unsatisfiable, and so the evaluation of this plan stops with this action marked as being in error, since no object matches this part of the description. If there is more than one, then this constraint is postponed and the evaluator moves on to the subset constraint. This constraint has one uninstantiated variable, Cand, which has a unique (non-null) solution, namely the candidate set associated with the head noun. So, this constraint is evaluated.</Paragraph> <Paragraph position="8"> The evaluation then proceeds through the actions in the rest of the plan. Assuming that no intervening errors are encountered, the evaluator will eventually reach the constraint on the terminating instance of modifiers, Cand = [Object], with Cand instantiated to a non-null set. If Cand contains more than one object, then this constraint will fail, pinning the blame on the terminating instance of modifiers for there not being enough descriptors to allow the referent to be identified. Otherwise, the terminating constraint will be satisfiable, and so Object will be instantiated to the single object in the candidate set. This will then allow all of the mutual belief constraints that were postponed to be evaluated, since they will now have only a single solution.</Paragraph> </Section> </Section> <Section position="5" start_page="360" end_page="365" type="metho"> <SectionTitle> 4. Clarifications </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="360" end_page="362" type="sub_section"> <SectionTitle> 4.1 Planning and Clarifying </SectionTitle> <Paragraph position="0"> Clark and Wilkes-Gibbs (1986) have presented a model of how conversational participants collaborate in making a referring action successful (see Section 2 above). Their model consists of conversational moves that express a judgment of a referring expression and conversational moves that refashion an expression. However, their model is not computational. They do not account for how the judgment is made, how the judgment affects the refashioning, or the content of the moves.</Paragraph> <Paragraph position="1"> Following the work of Litman and Allen (1987) in understanding clarification subdialogs, we formalize the conversational moves of Clark and Wilkes-Gibbs as discourse actions. These discourse actions are meta-actions that take as a parameter a referring expression plan.
The constraints and decompositions of the discourse actions encode the conditions under which they can be applied, how the referring expression derivations can be refashioned, and how the speaker's beliefs can be communicated to the hearer. So, the conversational moves, or clarifications, can be generated and understood within the planning paradigm. 7 Surface Speech Actions. An important part of our model is the surface speech actions. These actions serve as the basis for communication between the two agents, and so they must convey the information that is dictated by Clark and Wilkes-Gibbs's model. For the judgment plans, we have the surface speech actions s-accept, s-reject, and s-postpone, corresponding to the three possibilities in their model. These take as a parameter the plan that is being judged, and for s-reject, also a subset of the speech actions of the referring expression plan. The purpose of this subset is to inform the hearer of the surface speech actions that the speaker found problematic. So, if the referring expression was &quot;the weird creature,&quot; and the hearer couldn't identify anything that he thought &quot;weird,&quot; he might say &quot;what weird thing,&quot; thus indicating he had problems with the surface speech action corresponding to &quot;weird.&quot; For the refashioning plans, we propose that there is a single surface speech action, s-actions, that is used for both replacing a part of a plan, and expanding it. This action takes as a parameter the plan that is being refashioned and a set of surface speech actions that the speaker wants to incorporate into the referring expression plan. Since there is only one action, if it is uttered in isolation, it will be ambiguous between a replacement and an expansion; however, the speech action resulting from the judgment will provide the proper context to disambiguate its meaning. In fact, during linguistic realization, if the two actions are being uttered by the same person, they could be combined into a single utterance. For instance, the utterance &quot;no, the red one&quot; could be interpreted as an s-reject of the color that was previously used to describe something and an s-actions for the color &quot;red.&quot; So, as we can see, the surface speech actions for clarifications operate on components of the plan that is being built, namely the surface speech actions of referring expression plans. This is consistent with our use of plan derivations to represent utterances. Although we could have viewed the clarification speech actions as acts of informing (Litman and Allen 1987), this would have shifted the complexity into the parameter of the inform, and it is unclear whether anything would have been gained.</Paragraph> <Paragraph> 7 We use the term clarification, since the conversational moves of judging and refashioning a referring expression can be viewed as clarifying it.</Paragraph> <Paragraph position="3"> Instead, we feel that a parser with a model of the discourse and the context can determine the surface speech actions. 8 Additionally, it should be easier for the generator to determine an appropriate surface form.</Paragraph> <Paragraph position="4"> Judgment Plans. The evaluation of the referring expression plan indicates whether the referring action was successful or not. If it was successful, then the referent has been identified, and so a goal to communicate this is input to the plan constructor.</Paragraph> <Paragraph position="5"> This goal would be achieved by an instance of accept-plan, which decomposes into the surface speech action s-accept.</Paragraph> <Paragraph position="6"> If the evaluation wasn't successful, then the goal of communicating the error is given to the plan constructor, where the error is simply represented by the node in the derivation that the evaluation failed at. There are two reasons why the evaluation could have failed: either no objects match or more than one matches. In the first case, the referring expression is overconstrained, and the evaluation would have failed on an action that decomposes into surface speech actions. In the second case, the referring expression is underconstrained, and so the evaluation would have failed on the constraint that specifies the termination of the addition of modifiers. In our formalization of the conversational moves, we have equated the first case to reject-plan and the second case to postpone-plan, and their constraints test for the abovementioned conditions. The actions reject-plan and postpone-plan decompose into the surface speech actions s-reject and s-postpone, respectively.</Paragraph>
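<Paragraph> Read as Prolog glue over the Table 1 predicates achieve/2, error/2, and yield/3, the choice of judgment can be sketched like this (our paraphrase, not a schema from the paper):
    % Sketch: pick the judgment move from the evaluation of Plan.
    judgment(Plan, Goal, accept_plan(Plan)) :-
        achieve(Plan, Goal).                  % evaluation succeeded
    judgment(Plan, _Goal, reject_plan(Plan, Acts)) :-
        error(Plan, Node),
        yield(Plan, Node, Acts),
        Acts \== [].                          % overconstrained: reject the
                                              % offending surface actions
    judgment(Plan, _Goal, postpone_plan(Plan)) :-
        error(Plan, Node),
        yield(Plan, Node, []).                % underconstrained: the failed
                                              % node yields no primitives
</Paragraph>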
<Paragraph position="7"> By observing the surface speech action corresponding to the judgment, the hearer, using plan inference, should be able to derive the speaker's judgment plan. If the judgment was reject-plan or postpone-plan, then the evaluation of the judgment plan should enable the hearer to determine the action in the referring plan that the speaker found problematic due to the constraints specified in the action schemas. The identity of the action in error will provide context for the subsequent refashioning of the referring expression. 9 Refashioning Plans. If a conversant rejects a referring expression or postpones judgment on it, then either the speaker or the hearer will refashion the expression in the context of the rejection or postponement. In keeping with Clark and Wilkes-Gibbs, we use two discourse plans for refashioning: replace-plan and expand-plan. The first is used to replace some of the actions in the referring expression plan with new ones, and the second is to add new actions. Replacements can be used if the referring expression either overconstrains or underconstrains the choice of referent, while the expansion can be used only if it underconstrains the choice. So, these plans can check for these conditions.</Paragraph> <Paragraph position="8"> 8 See Levelt (1989, Chapter 12) for how prosody and clue words can be used in determining the type of clarification. 9 Another approach would be to use the identity of the action in error to revise the beliefs that the agent has attributed to the other conversant and to use the revised beliefs in refashioning the plan. However, such reasoning is beyond the scope of this work.</Paragraph> <Paragraph position="10"> The decomposition of the refashioning plans encodes how a new referring expression can be constructed from the old one. This involves three tasks: first, a single candidate referent is chosen; second, the referring expression is refashioned; and third, this is communicated to the hearer by way of the action s-actions, which was already discussed. 10 The first step involves choosing a candidate.
If the speaker of the refashioning is the agent who initiated the referring expression, then this choice is obviously pre-determined. Otherwise, the speaker must choose the candidate. Goodman (1985) has addressed this problem for the case of when the referring expression overconstrains the choice of referent. He uses heuristics to relax the constraints of the description and to pick one that nearly fits it. This problem is beyond the scope of this paper, and so we choose one of the referents arbitrarily (but see Heeman [1991] for how a simplified version of Goodman's algorithm that relaxes only a single constraint can be incorporated into the planning paradigm).</Paragraph> <Paragraph position="11"> The second step is to refashion the referring expression so that it identifies the candidate chosen in the first step. This is done by using plan repair techniques (Hayes 1975; Wilensky 1981; Wilkins 1985). Our technique is to remove the subplan rooted at the action in error and replan with another action schema inserted in its place. This technique has been encoded into our refashioning plans, and so can be used for both constructing repairs and inferring how another agent has repaired a plan.</Paragraph> <Paragraph position="12"> Now we consider the effect of these refashioning plans. As we mentioned in Section 2, once the refashioning plan is accepted, the common ground of the participants is updated with the new referring expression. So, the effect of the refashioning plans is that the hearer will believe that the speaker wants the new referring expression plan to replace the current one. Note that this effect does not make any claims about whether the new expression will in fact enable the successful identification of the referent. For if it did, and if the new referring expression were invalid, this would imply that the refashioning plan was also invalid, which is contrary to Clark and Wilkes-Gibbs's model of the acceptance process. So, the understanding of a refashioning does not depend on the understanding of the new proposed referring expression, but only on its derivation.</Paragraph> </Section> <Section position="2" start_page="362" end_page="364" type="sub_section"> <SectionTitle> 4.2 Action Schemas </SectionTitle> <Paragraph position="0"> This section presents action schemas for clarifications. Each clarification action includes a surface speech action in its decomposition. However, all reasoning is done at the level of the clarification actions, and so the surface actions do not include any constraints or effects. The notation used in the action schemas was given in Table 1 above.</Paragraph> <Paragraph position="1"> accept-plan. The discourse action accept-plan, shown in Figure 8, is used by the speaker to establish the mutual belief that a plan will achieve its goal. The constraints of the schema specify that the plan being accepted achieves its goal and the decomposition is the surface speech action s-accept. The effect of the schema is that the hearer will believe that the speaker has the goal that it be mutually believed that the plan achieves its goal.</Paragraph> <Paragraph> 10 Another approach would have been to separate the communicative task from the first two (Lambert and Carberry 1991).</Paragraph> <Paragraph position="3"> reject-plan. The discourse action reject-plan, shown in Figure 9, is used by the speaker if the referring expression plan overconstrains the choice of referent.
The speaker uses this schema in order to tell the hearer that the plan is invalid and which action instance the evaluation failed in. The constraints require that the error occurred in an action instance whose yield includes at least one primitive action. The decomposition consists of s-reject, which takes as its parameter the surface speech actions that are in the yield of the problematic action.</Paragraph> <Paragraph position="4"> postpone-plan. The schema for postpone-plan, shown in Figure 10, is similar to reject-plan. However, it requires that the error in the evaluation occurred in an action that does not decompose into any primitive actions, which for referring expressions will be the instance of modifiers that terminates the addition of modifiers.</Paragraph> <Paragraph position="5"> replace-plan. The replace-plan schema is used by the speaker to replace some of the primitive actions in a plan with new actions. Because we need knowledge of the type of action where the error occurred in order to refashion the invalid plan, the constraints of this schema are more specific than those of the judgment plans. The schema that we give in Figure 11, for instance, is used to refashion a referring expression plan in which the error occurred in an instance of a modifier action. 11 The decomposition of the schema specifies how a new referring expression plan can be built. 12 The first step, pick-one(Object, Cand), chooses one of the objects that matched the part of the description that preceded the error; if the speaker is not the initiator of the referring expression, then this is an arbitrary choice. The second step specifies the header of the action schema that will be used to replace the subplan that contained the error. The third step substitutes the replacement into the referring expression plan, undoing all variable instantiations in the old plan. This results in the partial plan NewPlan. The fourth step calls the plan constructor to complete the partial plan. Finally, the fifth step is the surface speech action s-actions, which is used to inform the hearer of the surface speech actions that are being added to the referring expression plan.</Paragraph> <Paragraph position="6"> expand-plan. The expand-plan schema, shown in Figure 12, is similar to the replace-plan schema shown in Figure 11. The difference is that instead of replacing one of the instances of modifier, it replaces the terminal instance of modifiers by a modifiers subplan that distinguishes one of the objects from the others that match, thus effecting an expansion of the surface speech actions. Even if the speaker thought that the referring expression as it stands were adequate (since the candidate set Cand contains only one member), she will construct a non-null expansion since the replacement is the recursive version of modifiers.</Paragraph> <Paragraph> 11 If the error occurred in an instance of headnoun, a different replace-plan schema would need to be used, one that for instance relaxed the category that was used in describing the object (Goodman 1985; Heeman 1991). 12 We refer to the steps in the decomposition that are not action headers as mental actions. They need to be proved, just like constraints.</Paragraph>
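<Paragraph> The five steps of replace-plan translate almost directly into the Table 1 repair predicates. In this sketch of ours, replacement_header/3 is a hypothetical stand-in for the second step:
    % Sketch of the replace-plan decomposition (steps 1-4; step 5 would
    % utter s-actions with NewActs).  pick-one/2, substitute/4, and
    % replan/2 are the Table 1 predicates.
    refashion_by_replacement(Plan, ErrNode, Cand, NewPlan, NewActs) :-
        pick_one(Object, Cand),                       % step 1
        replacement_header(ErrNode, Object, Header),  % step 2
        substitute(Plan, ErrNode, Header, NewPlan),   % step 3
        replan(NewPlan, NewActs).                     % step 4
</Paragraph>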
</Section> <Section position="3" start_page="364" end_page="365" type="sub_section"> <SectionTitle> 4.3 Plan Construction and Plan Inference </SectionTitle> <Paragraph position="0"> The general plan construction and plan inference processes are essentially the same as those for referring expressions. However, the plan inference process has been augmented so as to embody the criteria for understanding that were outlined in Section 4.1.</Paragraph> <Paragraph position="1"> The inference of judgment plans must be sensitive to the fact that such a plan includes the constraint that the speaker found the judged plan to be in error even though the hearer might not believe it to be. So, the inference process is allowed to assume that the speaker believes any constraint that the goal of the plan implies. In the case of a refashioning, the hearer might not view the proposed referring expression plan as being sufficient for identifying the referent, but would nonetheless understand the refashioning. So, the inference process requires only that the proposed referring expression be derived--so that it can serve to replace the current plan--but not that it be acceptable. So, when a replan action is part of a plan that is being evaluated, the success of this action depends only on whether the plan that is its parameter can be derived, but not whether the derived plan is valid. 13</Paragraph> <Paragraph> 13 Another approach would be to have the plan inference process reason about the intended effects of the plan that it is inferring in order to decide whether it should evaluate embedded plans and whether this evaluation should affect the evaluation of the parent plan.</Paragraph> </Section> </Section> <Section position="6" start_page="365" end_page="370" type="metho"> <SectionTitle> 5. Modeling Collaboration </SectionTitle> <Paragraph position="0"> In the last two sections, we discussed how initial referring expressions, judgments, and refashionings can be generated and understood in our plan-based model. In this section, we show how plan construction and plan inference fit into a complete model of how an agent collaborates in making a referring action successful. Previous natural language systems that use plans to account for the surface speech acts underlying an utterance (such as Cohen and Perrault 1979; Allen and Perrault 1980; Appelt 1985; Litman and Allen 1987) model only the recognition or only the construction of an agent's plans, and so do not address this issue.</Paragraph> <Paragraph position="1"> In order to model an agent's participation in a dialog, we need to model how the mental state of the agent changes as a result of the contributions that are made to the dialog. The change in mental state can be modeled by the beliefs and goals that a participant adopts. When a speaker produces an utterance, as long as the hearer finds it coherent, he can add a belief that the speaker has made the utterance to accomplish some communicative goal. The hearer might then adopt some goal of his own in response to this, and make an utterance that he believes will achieve this goal. Participants expect each other to act in this way. These social norms allow participants to add to their common ground by adopting the inferences about an utterance as mutual beliefs.</Paragraph> <Paragraph position="2"> To account for how conversants collaborate in dialog, however, this cooperation is not strong enough.
Not only must participants form mutual beliefs about what was said, they must also form mutual beliefs about the adequacy of the plan for the task they are collaborating upon. If the plan is not adequate, then they must work together to refashion it. This level of cooperation is due to what Clark and Wilkes-Gibbs refer to as a mutual responsibility, or what Searle (1990) refers to as a we-intention. This allows the agents to interact so that neither assumes control of the dialog, thus allowing both to contribute to the best of their ability without being controlled or impeded by the other. This is different from what Grosz and Sidner (1990) have called master-servant dialogs, which occur in teacher-apprentice or information-seeking dialogs, in which one of the participants is controlling the conversation (cf. Walker and Whittaker 1990).</Paragraph> <Paragraph position="4"> Note that the noncontrolling agent may be helpful by anticipating obstacles in the plan (Allen and Perrault 1980), but this is not the same as collaborating.</Paragraph> <Paragraph position="5"> The mutual responsibility that the agents share not only concerns the goal they are trying to achieve, but also the plan that they are currently considering. This plan serves to coordinate their activity and so agents will have intentions to keep this plan in their common ground. The plan might not be valid (unlike the shared plan of Grosz and Sidner [1990]), so the agents might not mutually believe that each action contributes to the goal of the plan. Because of this, agents will have a belief regarding the validity of the plan, and an intention that this belief be mutually believed.</Paragraph> <Paragraph position="6"> The discourse plans that we described in the previous section can now be seen as plans that can be used to further the collaborative activity. Judgment plans express beliefs about the success of the current plan, and refashioning plans update it. So, the mental state of an agent sanctions the adoption both of goals to express judgment and of goals to refashion. It also sanctions the adoption of beliefs about the current plan. 14 If it is mutually believed that one of the conversants believes there is an error with the current plan, the other also adopts this belief. Likewise, if one of the conversants proposes a replacement, the other accepts it. Since both conversants expect the other to behave in this way, each judgment and refashioning, so long as they are understood, results in the judgment or refashioning being mutually believed. Thus the current plan, through all of its refashionings, remains in the common ground of the participants.</Paragraph> <Paragraph position="7"> Below, we discuss the rules for updating the mental state after a contribution is made. We then give rules that account for the collaborative process.
15</Paragraph> <Section position="1" start_page="366" end_page="367" type="sub_section"> <SectionTitle> 5.1 Rules for Updating the Mental State </SectionTitle> <Paragraph position="0"> After a plan has been contributed to the conversation, by way of its surface speech actions, the speaker and hearer update their beliefs to reflect the contribution that has been made. Both assume that the hearer is observant, can derive a coherent plan (not necessarily valid), and can infer the communicative goal, which is expressed by the effect of the top-level action in the plan. We capture this by having the agent that we are modeling, the system, adopt the belief that it is mutually believed that the speaker intends to achieve the goal by means of the plan. 16 bmb(system, user, plan(Speaker, Plan, Goal)) The system will also add a belief about whether she believes the plan will achieve the goal, and if not, the action that she believes to be in error. So, one of the following propositions will be adopted.</Paragraph> <Paragraph position="1"> After the above beliefs have been added, there are a number of inferences that the agents can make and, in fact, can believe will be made by the other participant as well, and so these inferences can be mutually believed. The first rule is that if it is mutually believed that the speaker intends to achieve Goal by means of Plan, then it will be mutually believed that the speaker has Goal as one of her goals. 17 Rule 1: bmb(system, user, goal(Agt1, Goal)) if bmb(system, user, plan(Agt1, Plan, Goal)) & Agt1 ∈ {system, user} The next rule concerns the adoption by the hearer of the intended goals of communicative acts. The communicative goal that we are concerned with is where the speaker wants the hearer to believe that the speaker believes some proposition. This only requires that the hearer believe the speaker to be sincere. We assume that both conversants are sincere, and so when such a communicative goal arises, both participants will assume that the hearer has adopted the goal. This is captured by Rule 2.</Paragraph> <Paragraph> 17 All variables mentioned in the rules are existentially quantified.</Paragraph> <Paragraph position="2"/> <Paragraph position="3"> The last rule involves an inference that is not shared. When the user makes a contribution to a conversation, the system assumes that the user believes that the plan will achieve its intended goal.</Paragraph> <Paragraph position="4"> Rule 3: bel(system, bel(user, achieve(Plan, Goal))) if bmb(system, user, plan(user, Plan, Goal))</Paragraph>
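<Paragraph> Rules 1 and 3 can be read almost verbatim as Prolog clauses; the rendering below is ours (Rule 2 is not rendered here):
    % Rule 1: a mutually believed plan implies a mutually believed goal.
    bmb(system, user, goal(Agt, Goal)) :-
        bmb(system, user, plan(Agt, _Plan, Goal)),
        member(Agt, [system, user]).

    % Rule 3: a contribution by the user is taken to show that the user
    % believes the contributed plan achieves its goal (an inference that
    % is not shared).
    bel(system, bel(user, achieve(Plan, Goal))) :-
        bmb(system, user, plan(user, Plan, Goal)).
</Paragraph>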
</Section> <Section position="2" start_page="367" end_page="368" type="sub_section"> <SectionTitle> 5.2 Rules for Updating the Collaborative State </SectionTitle> <Paragraph position="0"> The second set of rules that we give concern how the agents update the collaborative state. These rules have been revised from an earlier version (Heeman 1991) so as to better model the acceptance process.</Paragraph> <Paragraph position="1"> 5.2.1 Entering into a Collaborative Activity. We need a rule that permits an agent to enter into a collaborative activity. We use the predicate cstate to represent that an agent is in such a state, and this predicate takes as its parameters the agents involved, the goal they are trying to achieve, and their current plan. Our view of when such a collaborative activity can be entered is very simple: the system believes it is mutually believed that one of them has a goal to refer and has a plan for doing so, but one of them believes this plan to be in error. The last part of the condition states that if the speaker's referring expression was successful from the beginning, no collaboration is necessary. It is not required that both participants mutually believe there is an error. Rather, if either detects an error, then that conversant can presuppose that they are collaborating, and make a judgment. Once the other recognizes the judgment that the plan is in error, the criteria for him entering will be fulfilled for him as well.</Paragraph> <Paragraph position="3"/> <Section position="3" start_page="368" end_page="370" type="sub_section"> <SectionTitle> 5.2.2 Adoption of Mutual Beliefs </SectionTitle> <Paragraph position="0"> In order to model how the state of the collaborative activity progresses, we need to account for the mutual beliefs that the agents adopt as a result of the utterances that are made.</Paragraph> <Paragraph position="1"> The first rule is for judgment moves in which the speaker finds the current plan in error. Given that the move is understood, both conversants, by way of the rules given in Section 5.1, will believe that it is mutually believed that the speaker believes the current plan to be in error. In this case, the hearer, in the spirit of collaboration, must accept the judgment and so also adopt the belief that the plan is in error, even if he initially found the plan adequate. Since both conversants expect the hearer to behave in this way, the belief that there is an error can be mutually believed. Rule 5, below, captures this. (The adoption of this belief will cause the retraction of any beliefs that the plan is adequate.)</Paragraph> <Paragraph position="2"/> <Paragraph position="3"> The second rule is for refashioning moves. After such a move, the conversants will believe it mutually believed that the speaker has a replacement, NewPlan, for the current plan, Plan. Again, in the spirit of collaboration, the hearer must accept this replacement, and since both expect each other to behave this way, both adopt the belief that it is mutually believed that the new referring expression plan replaces the old one.</Paragraph> <Paragraph position="4"/> <Paragraph position="5"> In adopting this belief, the system updates the cstate by replacing the current plan with the new plan, and adding beliefs that capture the utterance of NewPlan as outlined in Section 5.1 above.</Paragraph> <Paragraph position="6"> The third rule is for judgment moves in which the speaker finds the current plan acceptable. Given that the move has been understood, each conversant will believe it is mutually believed that the speaker believes that the current plan will achieve the goal (second condition of the rule). However, in order to accept this move, each participant also needs to believe that the hearer also finds the plan acceptable (third condition). This belief would have been inferred if it were the hearer who had proposed the current plan, or the last refashioning. In this case, the speaker (of the acceptance) would have inferred by way of Rule 3 that the hearer believes the plan to be valid; as for the hearer, given that he contributed the current plan, he undoubtedly also believes it to be acceptable.</Paragraph> <Paragraph position="7"/> <Paragraph position="8"> 5.2.3 Adoption of Goals. The remaining rules account for how agents adopt goals to further the collaborative activity.
These goals lead to judgment and refashioning moves, and so correspond to the rules that we just gave for adopting mutual beliefs.</Paragraph> <Paragraph position="9"> The first goal adoption rule is for informing the hearer that there is an error in the current plan. The conditions specify that Plan is the current plan of a collaborative activity and that the speaker believes that there is an error in it.</Paragraph> <Paragraph position="11"> The second rule is used to adopt the goal of replacing the current plan, Plan, if it has an error. The rule requires that the agent believe that it is mutually believed that there is an error in the current plan. So, this goal cannot be adopted before the goal of expressing judgment has been planned. Note that the consequent has an unbound variable, NewPlan. This variable will become bound when the system develops a plan to achieve this goal, by using the action schema replace-plan (see Figure 11 above).</Paragraph> <Paragraph position="13"> The third rule is used to adopt the goal of communicating the system's acceptance of the current plan. Not only must the system believe that the plan achieves the goal, but it must also believe that the user also believes this. As mentioned above for Rule 7, this last condition prevents the system from trying to accept a plan that it has itself just proposed. Rather, it can only try to accept a plan that the other agent contributed, for it is just such plans for which it will have the belief, by way of Rule 3, that the user believes the plan achieves the goal.</Paragraph> <Paragraph position="15"/> <Section position="4" start_page="370" end_page="370" type="sub_section"> <SectionTitle> 5.3 Applying the Rules </SectionTitle> <Paragraph position="0"> The rules that we have given are used to update the mental state of the agent and to guide its activity. Acting as the hearer, the system performs plan inference on each set of actions that it observes, and then applies any rules that it can. When all of the observed actions are processed, the system switches from the role of hearer to speaker.</Paragraph> <Paragraph position="1"> As the speaker, the system checks whether there is a goal that it can try to achieve, and if so, constructs a plan to achieve it. Next, presupposing its partner's acceptance of the plan, it applies any rules that it can. It repeats this until there are no more goals.</Paragraph> <Paragraph position="2"> The actions of the constructed plans form the response of the system; in a complete natural language system, they would be converted to a surface utterance. The system then switches to the role of hearer.</Paragraph> </Section> </Section> <Section position="7" start_page="370" end_page="378" type="metho"> <SectionTitle> 6. An Example </SectionTitle> <Paragraph position="0"> We are now ready to illustrate our system in action. 18 For this example, we use a simplified version of a subdialog from the London-Lund corpus (Svartvik and Quirk 1980, S.2.4a:1-8): (6.1) A: 1 See the weird creature.</Paragraph> <Paragraph position="1"> B: 2 In the corner?</Paragraph> <Paragraph position="2"> A: 3 No, on the television.</Paragraph> <Paragraph position="3"> B: 4 Okay.</Paragraph> <Paragraph position="4"> The system will take the role of person B and we will give it the belief that there are two objects that are &quot;weird&quot;--a television antenna, which is on the television, and a fern plant, which is in the corner.</Paragraph>
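<Paragraph> These beliefs can be loaded as Prolog facts; fern1 is the constant used later in the text, while antenna1, television1, and corner1 are names we have invented for the remaining objects (Section 6.1 shows that both candidates survive the head noun &quot;creature&quot;):
    % Person B's (the system's) beliefs about the scene, as mutual beliefs.
    bmb(system, user, category(antenna1, creature)).
    bmb(system, user, category(fern1, creature)).
    bmb(system, user, assessment(antenna1, weird)).
    bmb(system, user, assessment(fern1, weird)).
    bmb(system, user, on(antenna1, television1)).
    bmb(system, user, in(fern1, corner1)).
</Paragraph>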
</Section> <Section position="1" start_page="370" end_page="372" type="sub_section"> <SectionTitle> 6.1 Understanding &quot;The Weird Creature&quot; </SectionTitle> <Paragraph position="0"> For the first sentence, the system is given as input the surface speech actions underlying &quot;the weird creature,&quot; as shown below: s-refer(entity1) s-attrib(entity1, λX. assessment(X, weird)) s-attrib(entity1, λX. category(X, creature)) The system invokes the plan inference process, which finds the plan derivations whose yield is the above set of surface speech actions. In this case, there is only one, and the system labels it p1. Figure 13 shows the derivation; arrows represent decomposition, and for brevity, constraints and mental actions have been omitted and the parameters only of the surface speech actions are shown.</Paragraph> <Paragraph> 18 The system is implemented in C-Prolog under UNIX.</Paragraph> <Paragraph> Figure 13: Plan derivation (p1) for &quot;The weird creature.&quot;</Paragraph> <Paragraph position="1"> Next, the plan derivation is evaluated. The subset constraint in the headnoun action is evaluated, which narrows the candidate set to the antenna and the fern plant. The subset constraint in the modifier action is then evaluated, which does not eliminate either of the candidates, since the system finds both of them &quot;weird.&quot; The constraint on the modifiers action that terminates the addition of modifiers is then evaluated. However, this constraint fails, since there are two objects that match the description rather than one, as required.</Paragraph> <Paragraph position="5"> The system then updates its beliefs. As described in Section 5.1, the system adds the following beliefs to capture the results of the plan inference process: that it is mutually believed that the user has the goal of knowref and has adopted p1 as a means to achieve it, and that p1 has an error on the terminating instance of modifiers, node p22.</Paragraph> <Paragraph position="6"> bmb(system, user, plan(user, p1, knowref(system, user, entity1, Object))) (11) bel(system, error(p1, p22)) (12)</Paragraph> <Paragraph position="7"> The system next tries to apply the belief and goal adoption rules. From Rule 1 and belief (11), the system adds the belief that it is mutually believed that the user has the goal that the system knowref, and from Rule 3 and belief (11), it adds the belief that the user believes that the plan achieves its goal.</Paragraph> <Paragraph position="8"> bmb(system, user, goal(user, knowref(system, user, entity1, Object))) (13) bel(system, bel(user, achieve(p1, knowref(system, user, entity1, Object)))) (14)</Paragraph> <Paragraph position="9"> Belief (13), along with (11) and (12), allows the system to apply Rule 4, and so the system enters into a collaborative activity, in which the goal is for it to know the referent and in which the current plan is p1.</Paragraph> <Paragraph position="10"> cstate(system, user, p1, knowref(system, user, entity1, Object)) (15) Since the system believes there is an error in the current plan, it applies Rule 8, and so gives itself the communicative goal of informing the user of the error in the current plan.</Paragraph> <Paragraph position="11"> goal(system, bel(user, bel(system, error(p1, p22)))) (16)</Paragraph>
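<Paragraph> Using the evaluator sketched in Section 3.4 (again ours, with member/2 standing in for the postponed mutual-belief constraint on Object), the failure can be replayed in miniature:
    % Miniature rerun of the evaluation of p1: the first constraint has
    % two solutions and is postponed; the termination constraint then
    % has none, so the error lands on it, as with node p22 above.
    % ?- evaluate([member(Object, [antenna1, fern1]),
    %              [antenna1, fern1] = [Object]],
    %             Error).
    % Error = error([antenna1, fern1]=[Object]).
</Paragraph>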
</Section> <Section position="2" start_page="372" end_page="374" type="sub_section"> <SectionTitle> 6.2 Constructing &quot;In the Corner?&quot; </SectionTitle> <Paragraph position="0"> Since there are no further belief or goal adoption rules that can be applied, the system next checks for any goals that it can try to achieve. The only goal is (16), which is to inform the user of the error in the plan. Since the error in the referring plan is in the terminating instance of modifiers, the plan constructor builds an instance of postpone-plan, which it names p26. (The schema was given in Figure 10.) Rather than realizing the surface speech action immediately, the system plans ahead. (This would allow an opportunistic process to combine surface speech actions into a single utterance [Appelt 1985].) So, the system, presupposing that the user understands the system's plan, adds the following belief.</Paragraph> <Paragraph position="1">
bmb(user, system, plan(system, p26, bel(user, bel(system, error(p1, p22))))) (17)
It also adds the belief that this plan will achieve its goal.</Paragraph> <Paragraph position="2">
bel(system, achieve(p26, bel(user, bel(system, error(p1, p22))))) (18)
Then by Rule 1, the system adds the belief that it is mutually believed that it has the goal.</Paragraph> <Paragraph position="3">
bmb(system, user, goal(system, bel(user, bel(system, error(p1, p22))))) (19)
Then by Rule 2, which captures the cooperativity of the agents in communicative goals, it adds the belief that it is mutually believed that the system believes there is an error.</Paragraph> <Paragraph position="4">
bmb(system, user, bel(system, error(p1, p22))) (20)
Then, on the basis of (15) and (20), the system applies Rule 5, thus adopting the belief that it is mutually believed that there is an error in the plan. This presupposes the user's acceptance of the judgment plan.</Paragraph> <Paragraph position="5">
bmb(system, user, error(p1, p22)) (21)
The system is now able to apply Rule 9, on the basis of (15) and (21), and so adopts the goal of refashioning the invalid referring expression plan and of informing the user of the new plan.</Paragraph> <Paragraph position="6">
goal(system, bel(user, bel(system, replace(p1, RPlan)))) (22)
Since no further rules can be applied, the system checks for goals that it can try to fulfill, which will result in choosing (22). To achieve this goal, the plan constructor builds an instance of expand-plan (previously shown in Figure 12). In doing this, the system chooses one of the objects that matched the original description as the likely referent; in this case it happens to choose the object in the corner, the fern plant, which the system represents as fern1. It then substitutes the modifiers subplan that terminates the addition of modifiers with the header of the modifiers-recurse action (with the chosen object instantiated in). The plan constructor is then called to fill in the details, thereby creating the expansion. The expansion it chooses includes a relative modifier (see Figure 7) that describes the object as being in the corner. The new referring plan (labeled p34) is shown in Figure 14, with the expansion circled (we have abbreviated the derivation of &quot;the corner&quot;).</Paragraph>
Figure 14. Plan derivation (p34) for &quot;The weird creature in the corner.&quot;
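The substitution at the heart of expand-plan can be pictured as splicing one subtree for another. The following sketch assumes plan derivations are terms node(Label, Header, Children), with invented labels other than p22; it is one reading of the step just described, not the paper's implementation.
% Replace the terminating modifiers node by the given subtree (here,
% the header of modifiers-recurse for the chosen referent), leaving
% the rest of the derivation intact.
splice(node(_, modifiers_terminate, []), Replacement, Replacement) :- !.
splice(node(Label, Header, Children0), Replacement,
       node(Label, Header, Children)) :-
    splice_list(Children0, Replacement, Children).
splice_list([], _, []).
splice_list([C0|Cs0], R, [C|Cs]) :-
    splice(C0, R, C),
    splice_list(Cs0, R, Cs).
% For instance, with an invented label p23 for the new node:
% ?- splice(node(p20, modifiers, [node(p22, modifiers_terminate, [])]),
%           node(p23, modifiers_recurse(fern1), []),
%           Expanded).
% Expanded = node(p20, modifiers, [node(p23, modifiers_recurse(fern1), [])]).
% The plan constructor would then fill in the details below the new
% node, yielding p34.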
The surface speech action of expand-plan is s-actions, which takes the surface speech actions of the expansion, listed below, as its parameter.
s-attrib-rel(entity1, entity2, λX.λY. in(X,Y))
s-refer(entity2)
s-attrib(entity2, λX. category(X, corner))
Next, the system assumes the user will understand the refashioning, and, by way of Rules 1 and 2, will be cooperative and adopt the communicative goal that the system believes that the new expanded plan replaces the old referring expression plan. The end result is given below as (23).</Paragraph>
bmb(system, user, bel(system, replace(p1, p34))) (23)
The system, on the basis of (15) and (23), applies Rule 6, and so assumes that the user will accept the refashioning. So, the system adds the belief that it is mutually believed that the new expanded plan replaces the old referring expression.</Paragraph>
bmb(system, user, replace(p1, p34)) (24)
This causes the belief module to update the current plan of the collaborative activity (25). Also, it adds the beliefs that capture the utterance of the refashioned plan: that the system intends it as a means to achieve the referring action and that it does achieve this goal. 19
cstate(system, user, p34, knowref(system, user, entity1, Object)) (25)
bmb(system, user, plan(system, p34, knowref(system, user, entity1, fern1))) (26)
bel(system, achieve(p34, knowref(system, user, entity1, fern1))) (27)
19 Even though the system has the referent incorrectly identified in the goal of knowref, the goal itself is still valid: for it to identify the referent corresponding to entity1.
The two plans that were constructed, postpone-plan and expand-plan, give rise to the output of the surface speech actions s-postpone and s-expand, which would be realized as &quot;in the corner?&quot;. 20
20 Although our model does not account for the questioning intonation, it could be a manifestation of the s-postpone.</Paragraph>
</Section> <Section position="3" start_page="374" end_page="375" type="sub_section"> <SectionTitle> 6.3 Understanding &quot;No, on the Television&quot; </SectionTitle> <Paragraph position="0"> The user next utters &quot;No, on the television.&quot; This would get parsed into two separate surface speech actions, an s-reject corresponding to &quot;no,&quot; and an s-actions corresponding to &quot;on the television.&quot; For simplicity, the plan inference process is invoked separately on each.</Paragraph> <Paragraph position="1"> The system starts with the s-reject action. We assume that the parser can determine from context that the &quot;no&quot; is rejecting the surface speech actions that were previously added, and so the parameter of s-reject is a list of these actions. From this, it derives a plan whose yield is the s-reject action, and this plan is an instance of reject-plan (previously shown in Figure 9). The system then evaluates the constraints of the plan, which results in it determining which action in the plan the user found to be in error.</Paragraph> <Paragraph position="2"> This is done by evaluating the constraints of reject-plan, and so finding the action whose yield is the surface speech actions that were rejected. This will be p56, the modifiers-relative action that described the object as being in the corner.</Paragraph>
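Finding the rejected action reduces to locating the plan node whose yield is exactly the rejected surface speech actions. Here is a sketch, reusing the hypothetical node(Label, Header, Children) representation from above, with leaves taken to be surface speech actions; the predicates are illustrative, not the paper's code.
% The yield of a leaf is its action; the yield of an internal node
% concatenates its children's yields.
yield(node(_, Action, []), [Action]).
yield(node(_, _, [C|Cs]), Yield) :-
    maplist(yield, [C|Cs], Yields),
    append(Yields, Yield).
% A subnode is the node itself or a subnode of one of its children.
subnode(Node, Node).
subnode(node(_, _, Children), Sub) :-
    member(Child, Children),
    subnode(Child, Sub).
% in_error(+Plan, +Rejected, -Label): the node the user objected to.
in_error(Plan, Rejected, Label) :-
    subnode(Plan, node(Label, Header, Children)),
    yield(node(Label, Header, Children), Rejected).
% For p34 and the three actions realizing 'in the corner', the only
% node whose yield matches is p56, the modifiers-relative action.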
The resulting belief, after applying Rules 1 and 2, is the following.</Paragraph> <Paragraph position="3">
bmb(system, user, bel(user, error(p34, p56))) (28)
The system then applies the appropriate acceptance rule, Rule 5, and so adopts the belief that the error is mutually believed.</Paragraph> <Paragraph position="4">
bmb(system, user, error(p34, p56)) (29)
With this belief, the system will have the context that it needs to understand the user's refashioning plan.</Paragraph> <Paragraph position="5"> The system next performs plan recognition starting with the second surface speech action, s-actions, which corresponds to the refashioning &quot;on the television.&quot; It takes as a parameter the following list of actions: 21
s-attrib-rel(entity1, entity3, λX.λY. on(X,Y))
s-refer(entity3)
s-attrib(entity3, λX. category(X, television))
21 We assume that the parser determines the appropriate discourse entities in these actions: entity1 is the discourse entity for the object being referred to, and entity3 is another discourse entity.
The system finds two plan derivations that account for the primitive action, one an instance of replace-plan (see Figure 11) and the other an instance of expand-plan (see Figure 12). Next it evaluates the constraints of each derivation. The constraints of expand-plan do not hold, since the action in error, p56, is not an instance of modifiers-terminate, so this plan is eliminated. The constraints (and mental actions) of replace-plan do hold, and so the system is able to derive the refashioned referring plan, which it labels p104.</Paragraph> <Paragraph position="8"> Since this instance of replace-plan is the only valid derivation corresponding to the surface speech actions observed, the system takes it as the plan behind the user's utterance. As a result, the system adds the following belief (after applying Rules 1 and 2).</Paragraph> <Paragraph position="9">
bmb(system, user, bel(user, replace(p34, p104))) (30)
The system then applies the acceptance rule for refashioning plans, Rule 6, and so adopts the refashioning as mutually believed.</Paragraph> <Paragraph position="10">
bmb(system, user, replace(p34, p104)) (31)
This causes the belief module to update the current plan of the collaborative activity and to add the belief that the user contributed the new referring expression plan.
cstate(system, user, p104, knowref(system, user, entity1, Object)) (32)
bmb(system, user, plan(user, p104, knowref(system, user, entity1, antenna1))) (33)
The new referring plan will already have been evaluated. The subplan corresponding to &quot;the television&quot; would have been understood without problem, 22 and the modifier corresponding to &quot;on the television&quot; would have narrowed down the candidates that matched &quot;weird creature&quot; to a single object, antenna1. So, the belief module adds the belief that the system finds the new referring plan to be valid. Also, by way of Rule 3, the system adds the belief that the user also does, since the user had proposed it.
bel(system, achieve(p104, knowref(system, user, entity1, antenna1))) (34)
bel(system, bel(user, achieve(p104, knowref(system, user, entity1, antenna1)))) (35)
22 If &quot;the television&quot; is not understood, then since it is a referring expression in its own right, the conversants could collaborate on identifying its referent independently of the referent of &quot;the weird creature&quot;; that is, the participants could enter into an embedded collaborative activity by focusing on one part of the current plan.</Paragraph>
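This elimination step can be read as filtering the candidate derivations by whether their constraints hold. Another small sketch, again with invented names (instance_of/2 and the derivation identifiers) rather than the paper's code:
:- use_module(library(apply)).   % include/3
:- use_module(library(yall)).    % [D]>>Goal lambdas
:- dynamic instance_of/2.
% expand-plan requires the erroneous action to be a terminating
% instance of modifiers; replace-plan (simplified here) places no
% such restriction on it.
constraints_hold(expand_plan, ErrorNode) :-
    instance_of(ErrorNode, modifiers_terminate).
constraints_hold(replace_plan, _ErrorNode).
% The plan behind the utterance is the sole surviving derivation.
plan_behind(Derivations, ErrorNode, Chosen) :-
    include([D]>>constraints_hold(D, ErrorNode), Derivations, [Chosen]).
% p56 is a modifiers-relative action, not modifiers-terminate:
% ?- assertz(instance_of(p56, modifiers_relative)),
%    plan_behind([replace_plan, expand_plan], p56, Chosen).
% Chosen = replace_plan.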
</Section> <Section position="4" start_page="375" end_page="376" type="sub_section"> <SectionTitle> 6.4 Constructing &quot;Okay&quot; </SectionTitle> <Paragraph position="0"> On the basis of (32), (34), and (35), the system is able to apply Rule 10, and so adopts the goal of accepting the plan.</Paragraph> <Paragraph position="1">
goal(system, bel(user, bel(system, achieve(p104, knowref(system, user, entity1, antenna1))))) (36)
The plan constructor achieves this by planning an instance of accept-plan, which results in the surface speech action s-accept, which would be realized as &quot;Okay.&quot; Then, after the application of Rules 1, 2, and, most importantly, 7, the system adopts the belief that it is mutually believed that the plan achieves the goal of referring.
bmb(system, user, achieve(p104, knowref(system, user, entity1, antenna1))) (37)</Paragraph> </Section>
7. Comparisons to Related Work
In providing a computational model of how agents collaborate upon referring expressions, we have touched on several different areas of research. First, our work has built on previous work in referring expressions, especially their incorporation into a model based on the planning paradigm. Second, our work has built on the research done in modeling clarifications in the planning paradigm and on plan repair. Third, our work is related to the research being done on modeling collaborative and joint activity.</Paragraph> </Section> <Section position="5" start_page="376" end_page="376" type="sub_section"> <SectionTitle> 7.1 Referring Expressions </SectionTitle> <Paragraph position="0"> Cohen (1981) and Appelt (1985) have also addressed the generation of referring expressions in the planning paradigm. They have integrated this into a model of generating utterances, a step that we haven't taken. However, we have extended their model by incorporating even the generation of the components of the description into our planning model. One result of this is that our surface speech actions are much more fine-grained.</Paragraph> </Section> <Section position="6" start_page="376" end_page="377" type="sub_section"> <SectionTitle> 7.2 Clarifications and Plan Repair </SectionTitle> <Paragraph position="0"> An important part of our work involves accounting for clarifications of referring expressions by using meta-actions that incorporate plan repair techniques. This approach is based on Litman and Allen's work (1987) on understanding clarification subdialogs, in which meta-actions were used to model discourse relations, such as clarifications.</Paragraph> <Paragraph position="1"> There are several major differences between our work and theirs. First, our work addresses not only understanding, but also generation, and how these two tasks fit into a model of how agents collaborate in discourse. Second, Litman and Allen use a stack of unchanging plans to represent the state of the discourse.
We, however, use a single current plan, modifying it as clarifications are made. This difference has an important ramification, for it results in different interpretations of the discourse structure.</Paragraph> <Paragraph position="2"> Consider dialog (7.1), which was collected at an information booth in a Toronto train station (Horrigan 1977). (Although the participants are not collaborating in making a referring expression, the dialog will serve to illustrate our point.)
(7.1) P: 1 The 8:50 to Montreal?
C: 2 8:50 to Montreal. Gate 7.
P: 3 Where is it?
C: 4 Down this way to your left. Second one on the left.
P: 5 OK. Thank you.</Paragraph>
<Paragraph position="3"> Litman and Allen represent the state of the discourse after the second utterance as a clarification of the passenger's take-train-trip plan. The information that the train boards at gate 7 is represented only in the clarification plan. So, when the passenger asks &quot;Where is it?,&quot; their system, acting as the clerk, cannot interpret this as a clarification of the take-train-trip plan, since the utterance &quot;cannot be seen as a step of [that] plan&quot; (p. 188). It is interpreted instead as a request for a clarification of the clerk's &quot;Gate 7&quot; response, implicitly assuming that &quot;Gate 7&quot; was not accepted. In our model, the acceptance of &quot;Gate 7&quot; would be presupposed, and so it would be incorporated into the take-train-trip plan. The passenger's question &quot;Where is it?&quot; would thus be viewed as a request for the clerk to clarify that plan.</Paragraph> <Paragraph position="4"> The work of Moore and Swartout (1991), Cawsey (1991), and Carletta (1991) on interactive explanations also addresses clarifications using plan repair techniques. This body of work uses plan construction techniques to generate explanations, and uses the constructed plan as a basis for recovery strategies if the user doesn't understand the explanation. Cawsey and Carletta both use meta-actions to encode the plan repair techniques. However, none of these approaches is within a collaborative framework, in which either agent can contribute to the development of the plan.</Paragraph> <Paragraph position="5"> Other relevant work is that of Lambert and Carberry (1991). In their model of understanding information-seeking dialogs, they propose a distinction between problem-solving activities and discourse activities. In contrast, our clarifications embody both functions in the same actions, thus allowing for a simpler approach to inferring the refashioned referring expressions, since we need not chain to a meta-operator. In later work, Chu-Carroll and Carberry (1994) extended this model to generate responses to proposals that are viewed as sub-optimal or invalid. Like Litman and Allen (1987), they adopt the view that subsequent modifications apply to the preceding modification, rather than to the underlying plan.</Paragraph> </Section> <Section position="7" start_page="377" end_page="378" type="sub_section"> <SectionTitle> 7.3 Collaboration </SectionTitle> <Paragraph position="0"> Grosz, Sidner, and Lochbaum (Grosz and Sidner 1990; Lochbaum, Grosz, and Sidner 1990) are interested in the type of plans that underlie discourse in which the agents are collaborating in order to achieve some goal.
They propose that agents build a shared plan, in which the participants have a collection of beliefs and intentions about the actions in the plan. Our model differs from theirs in two important respects. First, not only do agents have a collection of beliefs and intentions regarding the actions of a shared plan, but, we feel, they also have an intention about the goal (Searle 1990; Cohen and Levesque 1991). It is this intention, in conjunction with the current plan, rather than just the shared plan, that sanctions the adoption of beliefs and intentions about potential actions that will contribute to the goal.</Paragraph> <Paragraph position="2"> Second, we feel that their definition of a partial shared plan is too restrictive. Although they address partial plans, they require, in order for an action to be part of a partial shared plan, that both agents believe that the action contributes to the goal. However, this is too strong. In collaborating to achieve a mutual goal, participants sometimes propose an action that is not believed by the other participant, or even by the participant proposing it. In failing to represent such states, their model is unable to represent the intermediate states in which a hearer might have understood how the speaker's utterance contributes to a plan, but doesn't agree with it. This is important, since if the refashioned plan is invalid, only the referring expression should be refashioned, not the refashioning itself.</Paragraph> <Paragraph position="4"> Traum (1991; Traum and Hinkelman 1992) is concerned with providing a computational model of grounding, the process by which conversational participants add to the common ground of a conversation (Clark and Schaefer 1989; Clark and Brennan 1990). Traum models the grounding process by proposing that utterances move through a number of states, 'pushed' by grounding acts, which include initiate, continue, repair, request repair, acknowledge, and request acknowledge. Once an utterance has been acknowledged, it will reside in mutual belief as a proposal of the person who initiated it. The proposal state is a subspace of the mutual belief space of the conversants. Only once it has been accepted will it be moved into the shared space (also in mutual belief).</Paragraph> <Paragraph position="5"> Unlike Traum's, our work does not differentiate the proposal state from the shared state. If a proposal is understood, it is incorporated into the current plan. Judgments of acceptability are made not on proposals but on the current plan, or a part of it.</Paragraph> <Paragraph position="6"> Sidner (1994) addressed the issue of how conversational participants collaborate in building a shared plan. In this work, Sidner presents a number of speech actions for use in collaborative tasks. These actions are those that an artificial agent could use in negotiating which actions or beliefs to accept into the shared plan of the agents. As with Traum, it is the proposals that are refashioned before they are integrated into the shared plan, rather than the shared plan itself.</Paragraph> <Paragraph position="7"> Cohen and Levesque (1991) focus on formalizing joint intention in a logic. They use this formalism to explain how such elements of communication as confirmations arise when agents are engaging in a joint action.
However, they have not addressed how agents collaborate in building a plan, only how they collaborate while executing one. Once this limitation is overcome, their approach could offer us a route for formalizing the mental states of the collaborating agents in our model, and for proving that our acceptance and goal adoption rules follow from such states.</Paragraph> </Section> </Section> </Paper>