<?xml version="1.0" standalone="yes"?>
<Paper uid="J98-3002">
  <Title>Collaborative Response Generation in Planning Dialogues</Title>
  <Section position="4" start_page="359" end_page="360" type="metho">
    <SectionTitle>
3. Modeling Collaborative Planning Dialogues
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="359" end_page="360" type="sub_section">
      <SectionTitle>
3.1 Corpus Analysis
</SectionTitle>
      <Paragraph position="0"> In order to develop a response generation model that is capable of generating natural and appropriate responses when (potential) conflicts arise, the first author analyzed sample dialogues from three corpora of collaborative planning dialogues to examine human behavior in such situations. These dialogues are: the TRAINS 91 dialogues (Gross, Allen, and Traum 1993), a set of air travel reservation dialogues (SRI Transcripts 1992), and a set of collaborative negotiation dialogues on movie selections (Udel Transcripts 1995).</Paragraph>
      <Paragraph position="1"> The dialogues were analyzed based on Sidner's model, which captures collaborative planning dialogues as proposal/acceptance and proposal/rejection sequences (Sidner 1992, 1994). Emphasis was given to situations where a proposal was not immediately accepted, indicating a potential conflict between the agents. In our analysis, all cases involving lack of acceptance fall into one of two categories: 1) rejection, where one agent rejects a proposal made by the other agent, and 2) uncertainty in acceptance, where one agent cannot decide whether or not to accept the other agent's proposal. The former is indicated when an agent explicitly conveys rejection of a proposal and/or provides evidence that implies such rejection, while the latter is indicated when an agent solicits further information (usually in the form of a question) to help her decide whether to accept the proposal.2 Walker (1996a) analyzed a corpus of financial planning dialogues for utterances that conveyed acceptance or rejection. While our rejection category is subsumed by her rejections, some of what she classifies as rejections would fall into our uncertainty in acceptance category since the speaker's utterance indicates doubt but not complete rejection. For example, one of the utterances that Walker treats as a rejection is "A: Well I thought they just started this year," in response to B's proposal that A should have been eligible for an IRA last year.</Paragraph>
      <Paragraph position="2"> Since A's utterance conveys uncertainty about whether IRA's were started this year, it indirectly conveys uncertainty about whether A was eligible for an IRA last year.</Paragraph>
      <Paragraph position="3"> Thus, we classify this utterance as uncertainty in acceptance.</Paragraph>
      <Paragraph position="4"> Our analysis confirmed both Sidner's and Walker's observations that collaborative planning dialogues can be modeled as proposal/acceptance and proposal/rejection sequences. However, we further observed that in the vast majority of cases where a proposal is rejected, the proposal is not discarded in its entirety, but is modified to a form that will potentially be accepted by both agents. This tendency toward modification is summarized in Table 1 and is illustrated by the following example (the utterance that suggests modification of the original proposal is in boldface):3 [Footnote 2: In the vast majority of cases where there is lack of acceptance of a proposal, the agent's response to the proposal clearly indicates either a rejection or an uncertainty in acceptance. In cases where there is no explicit indication, the perceived strength of belief conveyed by the agent's response, as well as the subsequent dialogue, were used to decide between rejection and uncertainty in acceptance.]</Paragraph>
      <Paragraph position="5"> 3 We consider a proposal modified if subsequent dialogue pursues the same subgoal that the rejected proposal is intended to address and takes into account the constraints previously discussed (such as the source and destination cities and approximate departure time, in the sample dialogue).</Paragraph>
      <Paragraph position="6">  C: Delta has a four thirty arriving eight fifty five.</Paragraph>
      <Paragraph position="7"> T: That one's sold out.</Paragraph>
      <Paragraph position="8"> C: That's sold out? T: Completely sold out. Now there's a Delta four ten connects with Dallas arrives eight forty.</Paragraph>
      <Paragraph position="9"> We will use the term collaborative negotiation (Sidner 1994) to refer to the kinds of negotiation reflected in our transcripts, in which each agent is driven by the goal of devising a plan that satisfies the interests of the agents as a group, instead of one that maximizes their own individual interests. Further analysis shows that a couple of features distinguish collaborative negotiation from argumentation and noncollaborative negotiation (Chu-Carroll and Carberry 1995c). First, an agent engaging in collaborative negotiation does not insist on winning an argument, and will not argue for the sake of arguing; thus she may change her beliefs if another agent presents convincing justification for an opposing belief. This feature differentiates collaborative negotiation from argumentation (Birnbaum, Flowers, and McGuire 1980; Reichman 1981; Flowers and Dyer 1984; Cohen 1987; Quilici 1992). Second, agents involved in collaborative negotiation are open and honest with one another; they will not deliberately present false information to the other agents, present information in such a way as to mislead the other agents, or strategically hold back information from other agents for later use. This feature distinguishes collaborative negotiation from noncollaborative negotiation such as labor negotiation (Sycara 1989).</Paragraph>
      <Paragraph position="10"> As shown in Table 1, our corpus analysis also found 29 cases in which an agent either explicitly or implicitly indicated uncertainty about whether to accept or reject the other agent's proposal and solicited further information to help in her decision making. 4 These cases can be grouped into four classes based on the strategy that the agent adopted. In the first strategy, Invite-Attack, the agent presents evidence (usually in the form of a question) that caused her to be uncertain about whether to accept the proposal. For example, in the following excerpt from the corpus, A inquired about a piece of evidence that would conflict with Crimson Tide not being B's type of movie:</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="360" end_page="363" type="metho">
    <SectionTitle>
Footnote 4
</SectionTitle>
    <Paragraph position="0"> About two-thirds of these examples were found in the Udel movie selection dialogues. We believe this is because in that corpus, the dialogue participants are peers and the criteria for accepting/rejecting a proposal are less clear-cut than in the other two domains.</Paragraph>
    <Paragraph position="1">  B: It's supposed to be violent. It doesn't seem like my type of movie.</Paragraph>
    <Paragraph position="2"> A: Didn't you like Red October? In the second strategy, Ask-Why, the agent requests further evidence from the other agent that will help her make a decision about whether to accept the proposal, as in the following example: Ask-Why Example (SRI Transcripts 1992) T: Does carrier matter to them do you know? C: No.</Paragraph>
    <Paragraph position="3"> T: Can we put them on American? C: Why? The third strategy, Invite-Attack-and-Ask-Why, is a combination of the first and second strategies where the agent presents evidence that caused her to be uncertain about whether to accept the proposal and also requests that the other agent provide further evidence to support the original proposal, as in the following example:  I'd like to know some inkling of information about the movie. P told you what was happening.</Paragraph>
    <Paragraph position="4"> Other than P's reviews.</Paragraph>
    <Paragraph position="5"> Why? He's a good kid. He could tell you.</Paragraph>
    <Paragraph position="6"> Our last strategy includes all other cases in which an agent is clearly uncertain about whether to accept a proposal, but does not directly employ one of the above three strategies to resolve the uncertainty. In our analysis, the cases that fall into this category share a common feature in that the agent explicitly indicates her uncertainty about whether to accept the proposal, without suggesting what type of information will help resolve her uncertainty, as in the following example: Express-Uncertainty Example (Udel Transcripts 1995) A: I don't like violence.</Paragraph>
    <Paragraph position="7"> B: You don't like violence? In our corpus analysis, most responses to these questions provided information that led the agent to eventually accept or reject the original proposal. We argue that this interest in sharing beliefs and supporting information is another feature that distinguishes collaborative negotiation from argumentation and noncollaborative negotiation. Although agents involved in the latter kinds of interaction take other agents' beliefs into account, they do so mainly to find weak points in their opponents' beliefs and to attack them in an attempt to win the argument.</Paragraph>
    <Section position="1" start_page="362" end_page="363" type="sub_section">
      <SectionTitle>
3.2 The Overall Processing Model
</SectionTitle>
      <Paragraph position="0"> The results of our corpus analysis suggest that when developing a computational agent that participates in collaborative planning, the behavior described below should be modeled. When presented with a proposal, the agent should evaluate the proposal based on its private beliefs to determine whether to accept or reject the proposal. If the agent does not have sufficient information to make a rational decision about acceptance or rejection, it should initiate an information-sharing subdialogue to exchange information with the other agent so that each agent can knowledgeably re-evaluate the proposal. However, if the agent rejects the proposal, instead of discarding the proposal entirely, it should attempt to modify the proposal by initiating a collaborative negotiation subdialogue to resolve the agents' conflict about the proposal. Thus, we capture collaborative planning in a Propose-Evaluate-Modify cycle of actions (Chu-Carroll and Carberry 1994, 1995a). In other words, we view collaborative planning as agent A proposing a set of actions and beliefs to be added to the shared plan being developed, agent B evaluating the proposal based on his private beliefs to determine whether or not to accept the proposal, and, if not, agent B proposing a set of modifications to the original proposal. Notice that this model is a recursive one in that the modification process itself contains a full collaboration cycle---agent B's proposed modifications will again be evaluated by A, and if conflicts arise, A may propose modifications to the previously proposed modifications.</Paragraph>
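The Propose-Evaluate-Modify cycle can be summarized as a small control loop. The Python sketch below is illustrative only: the paper gives no pseudocode for the cycle, and the evaluate, share_information_with, and propose_modification methods are assumed stand-ins for the evaluation, information-sharing, and modification processes described in the following sections.
```python
def collaborate(proposal, proposer, evaluator):
    """One pass of the Propose-Evaluate-Modify cycle (illustrative sketch).

    `proposer` has proposed additions to the shared plan; `evaluator` checks
    them against its private beliefs and either accepts them, gathers more
    information first, or proposes a modification, which the original
    proposer then evaluates in turn, making the model recursive.
    """
    verdict = evaluator.evaluate(proposal)        # "accept", "reject", or "uncertain"
    if verdict == "uncertain":
        # initiate an information-sharing subdialogue, then re-evaluate
        evaluator.share_information_with(proposer, proposal)
        verdict = evaluator.evaluate(proposal)
    if verdict == "accept":
        return proposal                           # becomes part of the shared plan
    # rejection: the proposal is not discarded but modified, and the roles swap
    modification = evaluator.propose_modification(proposal)
    return collaborate(modification, proposer=evaluator, evaluator=proposer)
```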
      <Paragraph position="1"> To illustrate how the Propose-Evaluate-Modify framework models collaborative planning dialogues, consider the following dialogue segment, taken from the TRAINS  91 corpus (Gross, Allen, and Traum 1993): (11) M: Load the tanker car with the oranges, and as soon as engine E2 gets there, couple the cars and take it to uh (12) S: Well we need a boxcar to take the oranges.</Paragraph>
      <Paragraph position="2"> (13) M: No we need a tanker car.</Paragraph>
      <Paragraph position="3"> (14) S: No we need a tanker car to take the orange juice, we have to make the orange juice first.</Paragraph>
      <Paragraph position="4"> (15) M: Oh we don't have the orange juice yet. Where are there oranges?  In utterance (11), M proposes a partial plan of loading the tanker car with oranges and coupling it with engine E2. S evaluates and rejects the proposal and in utterance (12) conveys to M the invalidity of the proposal as a means of implicitly conveying his intention to modify the proposal. In utterance (13), M rejects the belief proposed by S in utterance (12), and addresses the conflict by restating his belief as a means of modifying S's proposal. This proposed belief is again evaluated and rejected by S who, in utterance (14), again attempts to modify M's proposal by providing a piece of supporting evidence different from that already presented in utterance (12). Finally in utterance (15), M accepts these proposed beliefs and thus S's original proposal that the partial plan proposed in utterance (11) is invalid.</Paragraph>
      <Paragraph position="5"> The empirical studies and models of collaboration proposed in Clark and Wilkes-Gibbs (1990) and Clark and Schaefer (1989) provide further support for our Propose-Evaluate-Modify framework. They show that participants collaborate in maintaining a coherent discourse and that contributions in conversation involve a presentation phase and an acceptance phase. In the case of referring expressions, S1 presents a referring expression as part of an utterance; S2 then evaluates the referring expression. In the acceptance phase, S2 provides evidence that he has identified the intended entity and that it is now part of their common ground. If there are deficits in understanding, the agents enter a phase in which the referring expression is refashioned. Clark and Wilkes-Gibbs note several kinds of refashioning actions, including S2 conveying his uncertainty about the intended referent (and thereby requesting an elaboration of it) and S1 replacing the referring expression with a new one of her own (still with the intention of identifying the entity intended by S1's original expression). This notion of presentation-(evaluation)-acceptance for understanding is similar to our Propose-Evaluate-Modify framework for addition of actions and beliefs to the shared plan. Expressions of uncertainty and substitution actions in the repair phase correlate respectively with information-sharing and modification for conflict resolution in our framework.</Paragraph>
      <Paragraph position="6"> The rest of this paper discusses our plan-based model for response generation in collaborative planning dialogues. Our model focuses on communication and negotiation between a computational agent and a human agent who are collaborating on constructing a plan to be executed by the human agent at a later point in time. Throughout this paper, the user or executing agent (EA) will be used to refer to the agent who will eventually be executing the plan, and the system (CORE) or consulting agent (CA) will be used to refer to the computational agent who is collaborating on constructing the plan. Figure 1 shows a schematic diagram of the design of our response generation model, where the algorithm used in each subprocess is shown in boldface. However, before discussing the details of our response generation model, we first address the modeling of agent intentions, which forms the basis of our representation of agent proposals.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="363" end_page="378" type="metho">
    <SectionTitle>
4. Modeling the Dialogue
</SectionTitle>
    <Paragraph position="0"> In task-oriented collaborative planning, the agents clearly collaborate on constructing their domain plan. In the university course advisement domain, a domain action may be agent A getting a Master's degree in CS (Get-Masters(A, CS)). The agents may also collaborate on the strategies used to construct the domain plan, such as determining whether to investigate in parallel the different plans for an action or whether to first consider one plan in depth (Ramshaw 1991). Furthermore, the agents may collaborate on establishing certain mutual beliefs that indirectly contribute to the construction of their domain plan. For example, they may collaborate on a mutual belief about whether a particular course is offered next semester as a means of determining whether taking the course is feasible. Finally, the agents engage in communicative actions in order to exchange the above desired information.</Paragraph>
    <Paragraph position="1"> To represent the different types of knowledge necessary for modeling a collaborative dialogue, we use an enhanced version of the tripartite model presented in (Lambert and Carberry 1991) to capture the intentions of the dialogue participants.</Paragraph>
    <Paragraph position="2"> The enhanced dialogue model (Chu-Carroll and Carberry 1994) has four levels: the domain level, which consists of the domain plan being constructed to achieve the agents' shared domain goal(s); the problem-solving level which contains the actions being performed to construct the domain plan; the belief level, which consists of the mutual beliefs pursued to further the problem-solving intentions; and the discourse level which contains the communicative actions initiated to achieve the mutual beliefs.</Paragraph>
    <Paragraph position="3"> Actions at the discourse level can contribute to other discourse actions and also establish mutual beliefs. Mutual beliefs can support other beliefs and also enable problem-solving actions. Problem-solving actions can be part of other problem-solving actions and also enable domain actions.</Paragraph>
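A minimal sketch of the four-level model as a data structure, assuming a simple Node type of our own; the paper's actual dialogue model is a tree of typed actions and beliefs with the inter-level links described above.
```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A node at one level of the dialogue model (hypothetical representation)."""
    label: str                              # e.g. "Get-Masters(A, CS)"
    children: List["Node"] = field(default_factory=list)

@dataclass
class DialogueModel:
    """Four-level dialogue model (Chu-Carroll and Carberry 1994): discourse
    actions establish mutual beliefs, mutual beliefs enable problem-solving
    actions, and problem-solving actions enable domain actions."""
    domain: List[Node]            # the domain plan being constructed
    problem_solving: List[Node]   # actions that construct the domain plan
    belief: List[Node]            # mutual beliefs pursued for problem solving
    discourse: List[Node]         # communicative actions achieving the beliefs
```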
    <Paragraph position="4"> [Figure 1: schematic diagram of the response generation model. Input: proposal by user; output: acceptance of user proposal.] Each utterance by an agent constitutes a proposal that is intended to affect the agents' shared model of domain and problem-solving intentions, as well as their mutual beliefs. These proposals may be explicitly or implicitly conveyed by an agent's utterances. For example, consider the following utterances by EA:</Paragraph>
    <Paragraph position="6"> EA: I want to satisfy my seminar course requirement.</Paragraph>
    <Paragraph position="7"> Who is teaching CS689? The dialogue model that represents utterances (16) and (17) is shown in Figure 2. It shows the domain actions, problem-solving actions, mutual beliefs, and discourse actions inferred from these utterances, as well as the relationships among them. The actions and beliefs represented at the domain, problem-solving, and belief levels are treated as proposals, and are not considered shared actions or beliefs until the other agent accepts them. The beliefs captured by the nodes in the tree may be of three forms: 1) MB(_agent1,_agent2,_prop), representing that _agent1 and _agent2 come to mutually believe _prop, 2) Mknowref(_agent1,_agent2,_var,_prop), meaning that _agent1 and _agent2 come to mutually know the referent of _var which will satisfy _prop, where _var is a variable in _prop, and 3) Mknowif(_agent1,_agent2,_prop), representing that _agent1 and _agent2 come to mutually know whether or not _prop is true. Inform actions produce proposals for beliefs of the first type, while wh-questions and yes-no questions produce proposals for the second and third types of beliefs, respectively.5</Paragraph>
    <Paragraph position="9"> Figure 2: Dialogue model for utterances (16) and (17).</Paragraph>
    <Paragraph position="10"> In order to provide the necessary information for performing proposal evaluation and response generation, we hypothesize a recognition algorithm, based on Lambert and Carberry (1991), that infers agents' intentions from their utterances. This algorithm makes use of linguistic knowledge, contextual knowledge, and world knowledge, and utilizes a library of generic recipes for performing domain, problem-solving, and discourse actions. The library of generic recipes (Pollack 1986) contains templates for performing actions. The recipes are also used by our response generation system in planning its responses to user utterances, and will be discussed in further detail in Section 5.2.</Paragraph>
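The three belief forms and their link to utterance types can be rendered roughly as follows. The class and function names are ours, the predicate in the final comment is invented for illustration, and the recognition algorithm itself is not sketched.
```python
from dataclasses import dataclass

@dataclass
class MB:
    """_agent1 and _agent2 come to mutually believe _prop."""
    agent1: str
    agent2: str
    prop: str

@dataclass
class Mknowref:
    """The agents come to mutually know the referent of _var satisfying _prop."""
    agent1: str
    agent2: str
    var: str
    prop: str

@dataclass
class Mknowif:
    """The agents come to mutually know whether or not _prop is true."""
    agent1: str
    agent2: str
    prop: str

def proposed_belief(utterance_type, agent1, agent2, prop, var=None):
    """Inform actions propose MB beliefs; wh-questions propose Mknowref
    beliefs; yes-no questions propose Mknowif beliefs (Section 4)."""
    if utterance_type == "inform":
        return MB(agent1, agent2, prop)
    if utterance_type == "wh-question":
        return Mknowref(agent1, agent2, var, prop)
    if utterance_type == "yes-no-question":
        return Mknowif(agent1, agent2, prop)
    raise ValueError(utterance_type)

# e.g. utterance (17), "Who is teaching CS689?", proposes something like
# proposed_belief("wh-question", "EA", "CA", "Teaches(_x, CS689)", var="_x")
```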
    <Paragraph position="11"> Our system is presented with a dialogue model capturing a new user proposal and its relation to the preceding dialogue. Based on our Propose-Evaluate-Modify framework, the system will evaluate the proposed domain and problem-solving actions, as well as the proposed mutual beliefs, to determine whether to accept the proposal. [Footnote 5: Note that wh-questions propose that the agents come to mutually know the referent of a variable. Once the proposal is accepted, the agents will work toward achieving this. Mutual knowledge is established when the other agent responds to the question by providing the referent of the variable and the response is accepted by the first agent. Similarly for the case of yes-no questions.]</Paragraph>
    <Paragraph position="12"> In this paper, we focus on proposal evaluation and modification at the belief level. Readers interested in issues regarding proposal evaluation and modification with respect to proposed actions should refer to Chu-Carroll and Carberry (1994, in press) and Chu-Carroll (1996).</Paragraph>
    <Paragraph position="13">  5. Determining Acceptance or Rejection of Proposed Beliefs</Paragraph>
    <Section position="1" start_page="366" end_page="370" type="sub_section">
      <SectionTitle>
5.1 Evaluating Proposed Beliefs
</SectionTitle>
      <Paragraph position="0"> Previous research has noted that agents do not merely believe or disbelieve a proposition; instead, they often consider some beliefs to be stronger (less defeasible) than others (Lambert and Carberry 1992; Walker 1992; Cawsey et al. 1993). Thus, we associate a strength with each belief by an agent; this strength indicates the agent's confidence in the belief being an accurate description of situations in the real world.</Paragraph>
      <Paragraph position="1"> The strength of a belief is modeled with endorsements, which are explicit records of factors that affect one's certainty in a hypothesis (Cohen 1985), following Cawsey et al.</Paragraph>
      <Paragraph position="2"> (1993) and Logan et al. (1994). We adopt the endorsements proposed by Galliers (1992), based primarily on the source of the information, modified to include the strength of the informing agent's belief as conveyed by the surface form of the utterance used to express the belief. These endorsements are grouped into five classes: warranted, very strong, strong, weak, and very weak, based on the strength that each endorsement represents, in order for the strengths of multiple pieces of evidence for a belief to combine and contribute to determining the overall strength of the belief.</Paragraph>
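A rough rendering of the strength classes. The ordering is given in the text; of the endorsement entries below, only {non-expert, direct-statement}, which the example in Section 5.1.1 maps to strong, is attested in the paper, and the rest are assumptions for illustration.
```python
# The five strength classes, from weakest to strongest (Section 5.1).
STRENGTH_CLASSES = ["very-weak", "weak", "strong", "very-strong", "warranted"]

# Endorsements record the source of a belief and the surface form used to
# express it. Only the first entry is attested (Section 5.1.1); the others
# are hypothetical placeholders.
ENDORSEMENT_STRENGTH = {
    ("non-expert", "direct-statement"): "strong",
    ("expert", "direct-statement"): "very-strong",   # assumed
    ("non-expert", "hedged-statement"): "weak",      # assumed
}

def stronger_than(a, b):
    """True if strength class a outranks strength class b."""
    return STRENGTH_CLASSES.index(a) > STRENGTH_CLASSES.index(b)
```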
      <Paragraph position="3"> The belief level of a dialogue model consists of one or more belief trees. Each belief tree includes a main belief, represented by the root node of the tree, and a set of evidence proposed to support it, represented by the descendents of the tree. 6 Given a proposed belief tree, the system must determine whether to accept or reject the belief represented by the root node of the tree (henceforth referred to as the top-level proposed belief). This is because the top-level proposed belief is the main belief that EA (the executing agent) is attempting to establish between the agents, while its descendents are only intended to provide support for establishing that belief (Young, Moore, and Pollack 1994). The result of the system's evaluation may lead to acceptance of the top-level proposed belief, rejection of it, or a decision that insufficient information is available to determine whether to accept or reject it.</Paragraph>
      <Paragraph position="4"> In evaluating a top-level proposed belief (_bel), the system first gathers its evidence for and against _bel. The evidence may be obtained from three sources: 1) EA's proposal of _bel, 2) the system's own private evidence pertaining to _bel, and 3) evidence proposed by EA as support for _bel. However, the proposed evidence will only affect the system's acceptance of _bel if the system accepts the proposed evidence itself; thus, as part of evaluating _bel, the system evaluates the evidence proposed to support _bel, resulting in a recursive process. A piece of evidence (for _bel) consists of an antecedent belief and an evidential relationship between the antecedent belief and _bel.</Paragraph>
      <Paragraph position="5"> For example, one might support the claim that Dr. Lewis will not be teaching CS682 by stating that Dr. Lewis will be going on sabbatical. This piece of evidence consists of the belief that Dr. Lewis will be going on sabbatical and the evidential relationship that Dr. Lewis being on sabbatical generally implies that he is not teaching courses.7 A piece of evidence is accepted if both the belief and the relationship are accepted, rejected if either the belief or the relationship is rejected, and uncertain otherwise. [Footnote 6: In this paper, we only consider situations in which an agent's proposed pieces of evidence all uniformly support or attack a belief, but not situations where some of the proposed pieces of evidence support a belief and some of them attack the belief. In cases where an agent proposes evidence to attack a belief, the proposed belief tree will be represented as the pieces of evidence supporting the negation of the belief being attacked.]</Paragraph>
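The acceptance rule for a single piece of evidence translates directly into code; the sketch below is a literal restatement of the rule just given.
```python
def evidence_status(belief_status, relationship_status):
    """A piece of evidence pairs an antecedent belief with an evidential
    relationship: it is accepted if both parts are accepted, rejected if
    either part is rejected, and uncertain otherwise."""
    if belief_status == "accept" and relationship_status == "accept":
        return "accept"
    if belief_status == "reject" or relationship_status == "reject":
        return "reject"
    return "uncertain"
```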
      <Paragraph position="6"> The system's ability to decide whether to accept or reject a belief _bel may be affected by its uncertainty about whether to accept or reject evidence that EA proposed as support for _bel. For instance, the system's private evidence pertaining to _bel may be such that it will accept _bel only if it accepts the entire set of evidence proposed by EA. In this case, if the system is uncertain about whether to accept some of the proposed evidence, then this uncertainty would prevent it from accepting _bel. On the other hand, the system's own evidence against _bel may be strong enough to lead to its rejection of _bel regardless of its acceptance of the evidence proposed to support _bel.</Paragraph>
      <Paragraph position="7"> In this case, if the system is uncertain about whether to accept some of the proposed evidence, this uncertainty will have no effect on its decision to accept or reject _bel itself. Thus, when the system is uncertain about whether to accept some of the proposed evidence, it must first determine whether resolving its uncertainty in these pieces of evidence has the potential to affect its decision about the acceptance of _bel. To do this, the system must determine the range of its decision about _bel, where the range is identified by two endpoints: the upperbound, which represents the system's decision about _bel in the best-case scenario where it has accepted all the uncertain pieces of evidence proposed to support _bel, and the lowerbound, which represents the system's decision about _bel in the worst-case scenario where it has rejected all the uncertain pieces of evidence. The actual decision about _bel then falls somewhere in between the upperbound and lowerbound, depending on which pieces of evidence are eventually accepted or rejected. If the upperbound and the lowerbound are both accept, then the system will accept _bel and the uncertainty about the proposed evidence will not be resolved since its acceptance or rejection will not affect the acceptance of _bel.8 Similarly, if the upperbound and the lowerbound are both reject, the system will reject _bel and the uncertainty about the proposed evidence will again not be resolved. In other cases, the system will pursue information-sharing in order to obtain further information that will help resolve the uncertainty about these beliefs and then re-evaluate _bel.</Paragraph>
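The role of the two endpoints can likewise be stated as a small decision rule. This is a sketch; how each bound is itself computed is the belief-revision step discussed in the following paragraphs.
```python
def decide_with_bounds(upper, lower):
    """upper = the decision about _bel if all uncertain proposed evidence were
    accepted (best case); lower = the decision if it were all rejected (worst
    case). If the bounds agree on accept or reject, the uncertain evidence is
    moot; otherwise the system remains uncertain and pursues an
    information-sharing subdialogue before re-evaluating _bel."""
    if upper == lower and upper in ("accept", "reject"):
        return upper
    return "uncertain"
```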
      <Paragraph position="8"> We developed an algorithm, Evaluate-Belief (Figure 3), for evaluating a proposal of beliefs based on the aforementioned principles. Evaluate-Belief is invoked with _bel instantiated as the top-level belief of a proposed belief tree. During the evaluation process, two sets of evidence are constructed: the evidence set, which contains the pieces of evidence pertaining to _bel that the system has accepted, and the potential evidence set, which contains the pieces of evidence proposed by the user that the system cannot determine whether to accept or reject. [Footnote 7: In our model, we associate two measures with an evidential relationship: 1) degree, which represents the amount of support the antecedent, _beli, provides for the consequent, _bel, and 2) strength, which represents an agent's strength of belief in the evidential relationship (Chu-Carroll 1996). For instance, the system may have a very strong (strength) belief that a professor going on sabbatical provides very strong (degree) support for him not teaching any courses. In some sense, degree can be viewed as capturing the relevance (Grice 1975) of a piece of evidence: the more support an antecedent provides for _bel, the more relevant it is to _bel. Because of space reasons, we will not make the distinction between degree and strength in the rest of this paper. We will use an agent's strength of belief in an evidential relationship to refer to the amount of support that the agent believes the antecedent provides for the consequent. This strength of belief is obtained by taking the weaker of the degree and strength associated with the evidential relationship in the actual representation in our system.] [Footnote 8: Young, Moore, and Pollack (1994) argued that if a parent belief is accepted even though a child belief that is intended to support it is rejected, the rejection of the child belief need not be addressed since it is no longer relevant to the agents' overall goal. Our strategy extends this concept to uncertain information.]</Paragraph>
      <Paragraph position="9"> Figure 3: Algorithm for evaluating a proposed belief. Evaluate-Belief(_bel): 1. evidence set ← _bel (appropriately endorsed as conveyed by EA)9 and the system's evidence pertaining to _bel.10 2. If _bel is a leaf node in the belief tree, return Determine-Acceptance(_bel, evidence set). 3. Evaluate each of _bel's children, _bel1, ..., _beln.</Paragraph>
      <Paragraph position="10"> These two sets of evidence are then used to calculate the upperbound and the lowerbound, which in turn determine the system's acceptance of _bel.</Paragraph>
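Putting these pieces together, the following sketch reconstructs the overall shape of Evaluate-Belief from the prose (the extracted Figure 3 is incomplete). It assumes that bel.children holds the (child belief, evidential relationship) pairs EA proposed as support, that system.own_evidence looks up the system's private evidence, that determine_acceptance is the belief-revision step described next, and that evidence_status and decide_with_bounds are the helpers sketched above.
```python
def evaluate_belief(bel, system):
    """Sketch of Evaluate-Belief. Returns (decision, evidence_set,
    potential_evidence_set), where decision is "accept", "reject", or
    "uncertain"."""
    evidence = [("EA-proposal", bel)] + system.own_evidence(bel)
    potential = []                           # proposed evidence we cannot yet judge
    for child, relation in bel.children:     # evidence EA proposed to support bel
        child_decision, _, _ = evaluate_belief(child, system)          # recursive
        relation_decision = determine_acceptance(relation,
                                                 system.own_evidence(relation))
        status = evidence_status(child_decision, relation_decision)
        if status == "accept":
            evidence.append((child, relation))
        elif status == "uncertain":
            potential.append((child, relation))
        # rejected evidence contributes nothing toward accepting bel
    upper = determine_acceptance(bel, evidence + potential)   # best case
    lower = determine_acceptance(bel, evidence)               # worst case
    return decide_with_bounds(upper, lower), evidence, potential
```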
      <Paragraph position="11"> In calculating whether to accept a belief, Evaluate-Belief invokes Determine-Acceptance, which performs the following functions (Chu-Carroll 1996): 1) it utilizes a simplified version of Galliers' belief revision mechanism (Galliers 1992; Logan et al.</Paragraph>
      <Paragraph position="12"> 1994) to determine the system's strength of belief in _bel (or its negation) given a set of evidence, by comparing the strengths of the pieces of evidence supporting and attacking _bel,11 and 2) it determines whether to accept, reject, or remain uncertain about the acceptance of _bel based on the resulting strength. [Footnote 9: EA's proposal of _bel is endorsed according to EA's level of expertise in the subarea of _bel as well as her confidence in _bel as conveyed by the surface form of her utterance.]</Paragraph>
      <Paragraph position="13"> 10 In our implementation, CORE's knowledge base contains a set of evidential relationships. Its evidence pertaining to _bel consists of its beliefs about _bel as well as those {_evid-rel,_evid-bel} pairs where 1) the consequent of _evid-rel is _bel, 2) the antecedent of _evid-rel is _evid-bel, and 3) _evid-bel is held by CORE. Future work will investigate how evidence might be inferred and how resource limitations (Walker 1996b) affect the appropriate depth of inferencing.</Paragraph>
      <Paragraph position="14"> 11 To implement our system, we needed a means of estimating the strength of a belief, and we have based this estimation on endorsements such as those used in Galliers' belief revision system. However, the focus of our work is not on a logic of belief, and the mechanisms that we have developed for evaluating proposed beliefs and for effectively resolving detected conflicts (Section 6) are independent of any particular belief logic. Therefore we will not discuss further the details of how strength of belief is determined. Readers are welcome to substitute their favorite means for combining beliefs of various strengths.</Paragraph>
      <Paragraph position="17"> Figure 4: Beliefs proposed in utterances (18) and (19).</Paragraph>
      <Paragraph position="18"> In determining the strength of a piece of evidence consisting of an antecedent belief and an evidential relationship, Determine-Acceptance follows Walker's weakest link assumption (Walker 1992) and computes the strength of the evidence as the weaker of the strengths of the antecedent belief and the evidential relationship.</Paragraph>
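The two ingredients the text pins down precisely, the weakest-link strength of a piece of evidence and the acceptance threshold, might be sketched as follows. How multiple pieces of evidence are combined (the simplified Galliers belief-revision mechanism) is deliberately left abstract here, as in footnote 11.
```python
RANK = {"very-weak": 1, "weak": 2, "strong": 3, "very-strong": 4, "warranted": 5}

def evidence_strength(antecedent_strength, relationship_strength):
    """Weakest link assumption (Walker 1992): a piece of evidence is only as
    strong as the weaker of its antecedent belief and evidential relationship."""
    if RANK[antecedent_strength] > RANK[relationship_strength]:
        return relationship_strength
    return antecedent_strength

def accept_given_strength(direction, strength, threshold="strong"):
    """Threshold test used by Determine-Acceptance: CORE accepts (or rejects)
    _bel only if the combined evidence favours believing (or disbelieving) it
    at least at the threshold strength, which is strong in the implementation;
    otherwise it reserves judgment."""
    if RANK[strength] >= RANK[threshold]:
        return "accept" if direction == "believe" else "reject"
    return "uncertain"

# In Section 5.1.1 the evidence "weakly favors believing" On-Sabbatical(Lewis,1998):
# accept_given_strength("believe", "weak")  ->  "uncertain"
```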
      <Paragraph position="19"> 5.1.1 Example of Evaluating Proposed Beliefs. To illustrate the evaluation of proposed beliefs, consider the following utterances by EA, in response to CORE's proposal that the professor of CS682 may be Dr. Lewis:  (18) EA: The professor of CS682 is not Dr. Lewis.</Paragraph>
      <Paragraph position="20"> (19) Dr. Lewis is going on sabbatical in 1998.</Paragraph>
      <Paragraph position="21">  Figure 4 shows the beliefs proposed by utterances (18) and (19) as follows: 1) the professor of CS682 is not Dr. Lewis, 2) Dr. Lewis is going on sabbatical in 1998, and 3) Dr. Lewis being on sabbatical provides support for him not being the professor of CS682. Note that the second and third beliefs constitute a piece of evidence proposed as support for the first belief. Given these proposed beliefs, CORE evaluates the proposal by invoking the Evaluate-Belief algorithm on the top-level proposed belief, ¬Professor(CS682,Lewis). As part of evaluating this belief, CORE evaluates the evidence proposed by EA (step 3 in Figure 3), thus recursively invoking Evaluate-Belief on both the proposed child belief, On-Sabbatical(Lewis,1998), in step 3.1 and the proposed evidential relationship, supports(On-Sabbatical(Lewis,1998), ¬Professor(CS682,Lewis)), in step 3.2. When evaluating On-Sabbatical(Lewis,1998), CORE first searches in its private beliefs for evidence relevant to it, which includes: 1) a weak piece of evidence for Dr. Lewis going on sabbatical in 1998, consisting of the belief that Dr. Lewis has been at the university for 6 years and the evidential relationship that being at the university for 6 years provides support for a professor going on sabbatical next year (1998), and 2) a strong piece of evidence against Dr. Lewis going on sabbatical, consisting of the belief that Dr. Lewis has not been given tenure and the evidential relationship that not having been given tenure provides support for a professor not going on sabbatical. These two pieces of evidence are incorporated into the evidence set, along with EA's proposal of the belief, endorsed {non-expert, direct-statement}, which has a corresponding strength of strong. CORE then invokes Determine-Acceptance to evaluate how strongly the evidence favors believing or disbelieving On-Sabbatical(Lewis,1998) (step 2). Determine-Acceptance finds that the evidence weakly favors believing On-Sabbatical(Lewis,1998); since this strength does not exceed the predetermined threshold for acceptance (which in our implementation of CORE is strong), CORE reserves judgment about the acceptance of On-Sabbatical(Lewis,1998). Since CORE has a very strong private belief that being on sabbatical provides support for a professor not teaching a course, CORE accepts the proposed evidential relationship. Since CORE accepts the proposed evidential relationship but is uncertain about the acceptance of the proposed child belief, the acceptance of this piece of evidence is undetermined; thus it is added to the potential evidence set (step 3.5).</Paragraph>
      <Paragraph position="22"> CORE then evaluates the top-level proposed belief, ¬Professor(CS682,Lewis). The evidence set consists of EA's proposal of the belief, endorsed {non-expert, direct-statement}, whose corresponding strength is strong, and CORE's private weak belief that the professor of CS682 is Dr. Lewis. CORE then computes the upperbound on its decision about accepting ¬Professor(CS682,Lewis) by considering evidence from both the evidence set and the potential evidence set (step 4.1), resulting in the upperbound being accept. It then computes the lowerbound by considering only evidence from the evidence set, resulting in the lowerbound being uncertain. Since the upperbound is accept and the lowerbound uncertain, CORE again reserves judgment about whether to accept ¬Professor(CS682,Lewis), leading it to defer its decision about its acceptance of EA's proposal in (18)-(19).</Paragraph>
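In terms of the decide_with_bounds sketch given in Section 5.1 above, the outcome of this evaluation is simply:
```python
# Bounds computed for the top-level belief ¬Professor(CS682, Lewis):
decide_with_bounds("accept", "uncertain")   # -> "uncertain": defer and share information
```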
    </Section>
    <Section position="2" start_page="370" end_page="373" type="sub_section">
      <SectionTitle>
5.2 Initiating Information-Sharing Subdialogues
</SectionTitle>
      <Paragraph position="0"> A collaborative agent, when facing a situation in which she is uncertain about whether to accept or reject a proposal, should attempt to share information with the other agent so that the agents can knowledgeably re-evaluate the proposal and perhaps come to agreement. We call this type of subdialogue an information-sharing subdialogue (Chu-Carroll and Carberry 1995b). Information-sharing subdialogues differ from information-seeking or clarification subdialogues (van Beek, Cohen, and Schmidt 1993; Raskutti and Zukerman 1993; Logan et al. 1994; Heeman and Hirst 1995). The latter focus strictly on how an agent should go about gathering information from another agent to resolve an ambiguous proposal. In contrast, in an information-sharing subdialogue, an agent may gather information from another agent, present her own relevant information (and invite the other agent to address it), or do both in an attempt to resolve her uncertainty about whether to accept or reject a proposal that has been unambiguously interpreted. Since a collaborative agent should engage in effective and efficient dialogues, she should pursue the information-sharing subdialogue that she believes will most likely result in the agents coming to a rational decision about the proposal. The process for initiating information-sharing subdialogues involves two steps: selecting a subset of the uncertain beliefs that the agent will explicitly address during the information-sharing process (called the focus of information-sharing), and selecting an effective information-sharing strategy based on the agent's beliefs about the selected focus. This process is captured by the recipe for the Share-Info-Reevaluate-Beliefs problem-solving action that is part of a recipe library used by CORE's mechanism for planning responses.</Paragraph>
      <Paragraph position="1"> A recipe includes a header specifying the action defined by the recipe, the recipe type, the applicability conditions and preconditions of the action, the subactions comprising the body of the recipe, and the goal of performing the action. The applicability conditions and preconditions are both conditions that must be satisfied before an action can be performed; however, while it is anomalous for an agent to attempt to satisfy an unsatisfied applicability condition, she may construct a plan to satisfy a failed precondition. A recipe may be of two types: specialization or decomposition. In a specialization recipe, the body of the recipe contains a set of alternative actions that will each accomplish the header action. In a decomposition recipe, the body consists of a set of simpler subactions for performing the action encoded by the recipe.12 Finally, the goal of an action is what the agent performing the action intends to achieve.</Paragraph>
      <Paragraph position="2"> Figure 5: The Share-Info-Reevaluate-Beliefs recipe.</Paragraph>
      <Paragraph position="3"> As shown in Figure 5, Share-Info-Reevaluate-Beliefs is applicable only if _agent1 is uncertain about the acceptance of a belief tree proposed by _agent2. The precondition of the action specifies that the focus of information-sharing be identified. The recipe for Share-Info-Reevaluate-Beliefs is of type specialization and its body consists of four subactions that correspond to four alternative information-sharing strategies that _agent1 may adopt in attempting to resolve its uncertainty in the acceptance of the selected focus. The selected subaction will be the one whose applicability conditions (as specified in its recipe) are satisfied; since the applicability conditions for the four subactions are mutually exclusive, only one will be selected. This subaction will initiate an information-sharing subdialogue and lead to _agent1's re-evaluation of _agent2's original proposal, taking into account the newly obtained information. Next we describe how the focus of information-sharing is identified and how an information-sharing strategy is selected.</Paragraph>
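A sketch of the recipe representation and of the Share-Info-Reevaluate-Beliefs recipe reconstructed from this description. Figure 5 itself did not survive extraction, so the wording of the conditions and goal is approximate, and three of the four subaction names (all except Reevaluate-After-Invite-Attack, which appears in Figure 7) are placeholders.
```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Recipe:
    """Generic recipe template (Section 5.2); field names are ours, the
    components are the ones listed in the text."""
    header: str
    recipe_type: str                                   # "specialization" or "decomposition"
    applicability_conditions: List[str] = field(default_factory=list)
    preconditions: List[str] = field(default_factory=list)
    body: List[str] = field(default_factory=list)      # alternatives or subactions
    goal: str = ""

share_info_reevaluate_beliefs = Recipe(
    header="Share-Info-Reevaluate-Beliefs",
    recipe_type="specialization",
    applicability_conditions=[
        "_agent1 is uncertain about accepting the belief tree proposed by _agent2",
    ],
    preconditions=["the focus of information-sharing has been identified"],
    body=[
        "Reevaluate-After-Invite-Attack",              # named in Figure 7
        "Reevaluate-After-Ask-Why",                    # placeholder name
        "Reevaluate-After-Ask-Why-and-Invite-Attack",  # placeholder name
        "Reevaluate-After-Express-Uncertainty",        # placeholder name
    ],
    goal="_agent1 knowledgeably re-evaluates _agent2's original proposal",  # wording assumed
)
```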
      <Paragraph position="4"> When the system is uncertain about the acceptance of a top-level proposed belief, _bel, it may also have been uncertain about the acceptance of some of the evidence proposed to support it.</Paragraph>
      <Paragraph position="5"> Thus, when the system initiates an information-sharing subdialogue to resolve its uncertainty about _bel, it could either directly resolve the uncertainty about _bel itself, or resolve a subset of the uncertain pieces of evidence proposed to support _bel, thereby perhaps resolving its uncertainty about _bel. We refer to the subset of uncertain beliefs that will be addressed during information-sharing as the focus of information-sharing.</Paragraph>
      <Paragraph position="6"> Selection of the focus of information-sharing partly depends on the upperbound and the lowerbound on the system's decision about accepting _bel. The possible combinations of these values produced by the Evaluate-Belief algorithm are shown in Table 2.13 [Footnote 12: In Allen's formalism (Allen 1979), the body of a recipe could contain a set of goals to be achieved or a set of actions to be performed. In our current system, the preconditions are goals that are matched against the goals of recipes, and the body contains actions that are matched against the header action in recipes.]</Paragraph>
      <Paragraph position="7"> [Footnote 13: In our model, a child belief in a proposed belief tree is always intended to provide support for its parent belief; thus the evidence in the potential evidence set contributes positively toward the system's acceptance of _bel. Since the upperbound is computed by taking into account evidence from both the evidence and potential evidence sets while the lowerbound is computed by considering evidence from the evidence set alone, the upperbound will always be greater than or equal to the lowerbound (on the scale of reject, uncertain, and accept). Thus only six out of the nine theoretically possible combinations can occur.]</Paragraph>
      <Paragraph position="8"> Table 2: Combinations of the upperbound and lowerbound on the system's decision about _bel and the corresponding actions. Case 1 (accept/accept): accept _bel. Case 2 (reject/reject): reject _bel. Case 3 (uncertain/uncertain): resolve uncertainty regarding _bel itself. Case 4 (accept/uncertain): attempt to accept uncertain evidence. Case 5 (uncertain/reject): attempt to reject uncertain evidence. Case 6 (accept/reject): action in cases 4 and/or 5. In cases 1 and 2, the system accepts/rejects _bel regardless of whether the pieces of evidence in the potential evidence set, if any, are accepted or rejected. In these cases, the uncertainty about the proposed evidence does not affect the system's acceptance of _bel, and therefore need not be resolved. In case 3, the system remains uncertain about the acceptance of _bel regardless of whether the uncertain pieces of evidence, if any, are accepted or rejected, i.e., resolving the uncertainty about the evidence will not help resolve the uncertainty about _bel. Thus, the system should focus on sharing information about _bel itself.14 In cases 4 and 6, where the upperbound is accept, acceptance of a large-enough subset of the uncertain evidence will result in the system accepting _bel, and in cases 5 and 6, where the lowerbound is reject, rejection of a large-enough subset of the uncertain evidence can lead the system to reject _bel. Thus in all three cases, the system should initiate information-sharing to resolve the uncertainty about the proposed evidence in an attempt to resolve the uncertainty about _bel.15 However, when there is more than one piece of evidence in the potential evidence set, the system should select a minimum subset of these pieces of evidence to address based on the likelihood of each piece of evidence affecting the system's resolution of the uncertainty about _bel.</Paragraph>
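Table 2 amounts to the following dispatch on the two bounds. This is only a sketch of the case analysis; choosing which pieces of uncertain evidence to address is the job of the Select-Focus-Info-Sharing algorithm (Figure 6), guided by the factors discussed next.
```python
def focus_action(upper, lower):
    """Map the (upperbound, lowerbound) pair for _bel onto the action in Table 2."""
    if upper == lower and upper in ("accept", "reject"):
        return "no information-sharing needed"            # cases 1 and 2
    if upper == "uncertain" and lower == "uncertain":
        return "share information about _bel itself"      # case 3
    if upper == "accept" and lower == "uncertain":
        return "try to accept uncertain evidence"         # case 4
    if upper == "uncertain" and lower == "reject":
        return "try to reject uncertain evidence"         # case 5
    if upper == "accept" and lower == "reject":
        return "try both of the above (cases 4 and 5)"    # case 6
    raise ValueError("the upperbound can never be weaker than the lowerbound")
```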
      <Paragraph position="9"> In selecting the focus of information-sharing, we take into account the following three factors: 1) the number factor: the number of pieces of uncertain evidence that will be addressed during information-sharing, since one would prefer to address as few pieces of evidence as possible, 2) the effort factor: the effort involved in resolving the uncertainty in a piece of evidence, since one would prefer to address the pieces of evidence that require the least amount of effort to resolve, and 3) the contribution factor: the contribution of each uncertain piece of evidence toward resolving the uncertainty about _bel, since one would prefer to address the uncertain pieces of evidence predicted to have the most impact on resolving the uncertainty about _bel. In cases 4 and 6 in Table 2, where the system will accept _bel if it accepts a sufficient subset of the uncertain evidence, the goal is to select as focus a minimum subset of the uncertain pieces of evidence 1) whose uncertainty requires the least effort to resolve, and 2) which, if accepted, are predicted to lead the system to accept _bel. Similarly, in cases 5 and 6, where the system will reject _bel if it can reject a sufficient subset of the uncertain evidence, the system's goal is to select as focus a minimum subset of the uncertain pieces of evidence 1) whose uncertainty requires the least effort to resolve, and 2) which, if rejected, are predicted to cause the system to reject _bel. Once the system has identified this subset of uncertain evidence, it has to determine the focus of information-sharing for resolving the uncertainty regarding these pieces of evidence, leading to a recursive process. [Footnote 14: It might be the case that the system gathers further information about _bel, re-evaluates _bel taking into account the newly-obtained information, and is still uncertain about whether to accept or reject _bel. If this reevaluation of _bel with additional evidence falls into case 4, 5, or 6, then the uncertainty about the proposed evidence becomes relevant and will be pursued.]</Paragraph>
      <Paragraph position="10"> 15 Based on our algorithm (to be shown in Figure 6), in case 6, the system will perform the actions in both cases 4 and 5, i.e., try and gather both information that may lead to the acceptance of _bel and information that may lead to the rejection of _bel, and leave it up to the user to determine which one to address. Alternatively, the system could be designed to select between the actions in cases 4 and 5, i.e., determine whether attempting to accept _bel or attempting to reject _bel is more efficient, and pursue the more promising path. We leave this for future work.</Paragraph>
      <Paragraph position="11"> /* _bel has been previously annotated with two features by Evaluate-Belief: _bel.evidence: evidence pertaining to _bel which the system accepts; _bel.potential: evidence proposed by the user for _bel and about which the system is uncertain */ 1. /* Cases 1 &amp; 2 */ If _bel.upper = _bel.lower = accept or if _bel.upper = _bel.lower = reject, focus ← {}; return focus.</Paragraph>
      <Paragraph position="12"> 2. /* Case 3 */ If _bel.upper = _bel.lower = uncertain, focus ← {_bel}; return focus.</Paragraph>
      <Paragraph position="13"> 3. If _bel has no uncertain children, focus ← {_bel}; return focus.</Paragraph>
      <Paragraph position="14"> 4. /* Cases 4 &amp; 6 */ If _bel.upper = accept,</Paragraph>
    </Section>
    <Section position="3" start_page="373" end_page="373" type="sub_section">
      <SectionTitle>
4.1 /* The effort factor */
</SectionTitle>
      <Paragraph position="0"> Assign each piece of uncertain evidence in _bel.potential to a set, and order the sets according to how close the evidence in each set was to being accepted. Call them _set1, ..., _setm.</Paragraph>
      <Paragraph position="1">  form new sets of evidence of size _set-size from _bel.potential, rank new sets according to how close the evidence in each set was to being rejected, goto 5.2.</Paragraph>
    </Section>
    <Section position="4" start_page="373" end_page="378" type="sub_section">
      <SectionTitle>
5.4 Else, focus ← ∪_belj∈_seti Select-Focus-Info-Sharing(_belj); return focus.
</SectionTitle>
      <Paragraph position="0"> Figure 6: Algorithm for selecting the focus of information-sharing.</Paragraph>
      <Paragraph position="1"> Our algorithm Select-Focus-Info-Sharing, shown in Figure 6, carries out this process. It is invoked with _bel instantiated as the uncertain top-level proposed belief. Steps 4 and 5 of the algorithm capture the above principles for identifying a set of uncertain beliefs as the focus of information-sharing. Our algorithm guarantees that the fewest pieces of uncertain evidence for _bel will be addressed, and that the belief(s) selected as focus are those that require the least effort to achieve among those that are strong enough to affect the acceptance of _bel, thus satisfying the above criteria. The focus of information-sharing, produced by the Select-Focus-Info-Sharing algorithm, is a set of one or more proposed beliefs that the system cannot decide whether to accept and whose acceptance (or rejection) will affect the system's acceptance of the top-level proposed belief. For each of these uncertain beliefs, the system must select an information-sharing strategy that specifies how it will go about sharing information about the belief to resolve its uncertainty. Let _focus be one of the beliefs identified as the focus of information-sharing. The selection of a particular information-sharing strategy should be based on the system's existing beliefs about _focus as well as its beliefs about EA's beliefs about _focus. As discussed in Section 3.1, our analysis of naturally occurring collaborative dialogues shows that human agents may adopt one of four information-sharing strategies. The information-sharing strategies and the criteria under which we believe each strategy should be adopted are as follows: 1. Invite-Attack, in which agent A presents a piece of evidence against _focus and (implicitly) invites the other agent (agent B) to attack it. This strategy focuses B's attention on the counterevidence and suggests that it is what keeps A from accepting _focus. This strategy is appropriate when A's counterevidence for _focus is critical, i.e., if convincing A that the counterevidence is invalid will cause A to accept _focus. This strategy also allows for the possibility of B accepting the counterevidence and both agents possibly adopting ¬_focus instead of _focus.</Paragraph>
      <Paragraph position="2"> 2. Ask-Why, in which A queries B about his reasons for believing in _focus.</Paragraph>
      <Paragraph position="3"> This strategy is appropriate when A does not know B's support for _focus, and intends to find out this information. This will result either in A gathering evidence that contributes toward her accepting _focus, or in A discovering B's invalid justification for holding _focus and attempting to convince B of its invalidity.</Paragraph>
      <Paragraph position="4">  3. Ask-Why-and-Invite-Attack, in which A queries B for his evidence for _focus and also presents her evidence against it. This strategy is appropriate when A does not know B's support for _focus, but does have (noncritical) evidence against it. In this case B may provide his support for _focus, attack A's evidence against _focus, or accept A's counterevidence and perhaps subsequently adopt ¬_focus.</Paragraph>
      <Paragraph position="5"> 4. Express-Uncertainty, in which A indicates her uncertainty about accepting _focus and presents her evidence against _focus, if any. This strategy is appropriate when none of the previous three strategies apply.</Paragraph>
      <Paragraph position="6"> 16 In the worst-case scenario, the algorithm will examine every superset of the elements in _bel.potential.  However, _bel.potential contains only those proposed pieces of evidence whose acceptance is uncertain, which depends only on the number of utterances provided in a single turn, but not on the size of CORE's or EA's knowledge base. Thus, we believe that this combinatorial aspect of the algorithm should not affect the scalability of our system.</Paragraph>
      <Paragraph position="7"> Figure 7: The Reevaluate-After-Invite-Attack recipe.</Paragraph>
      <Paragraph position="8"> In collaborative dialogues, A's indication of her uncertainty should lead B to provide information that he believes will help A re-evaluate the proposal.</Paragraph>
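One plausible reading of these criteria as a selection function is sketched below. The actual applicability conditions live in the four problem-solving recipes, which are stated to be mutually exclusive; only the Invite-Attack recipe (Figure 7) is reproduced in the paper, so the exact tests here are assumptions.
```python
def choose_strategy(has_critical_counterevidence, knows_b_support, has_counterevidence):
    """Select an information-sharing strategy for one belief in the focus.
    A's counterevidence is 'critical' if discrediting it would by itself lead
    A to accept _focus (the Invite-Attack condition)."""
    if has_critical_counterevidence:
        return "Invite-Attack"
    if not knows_b_support and has_counterevidence:
        return "Ask-Why-and-Invite-Attack"   # noncritical counterevidence
    if not knows_b_support:
        return "Ask-Why"
    return "Express-Uncertainty"
```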
      <Paragraph position="9"> We have realized these four information-sharing strategies as problem-solving recipes in our system. Figure 7 shows the recipe for the Reevaluate-After-Invite-Attack action, which corresponds to the Invite-Attack strategy. Reevaluate-After-Invite-Attack takes four parameters: _agent1, the agent initiating information-sharing; _agent2, the agent who proposed the beliefs under consideration; _focus, a belief selected as the focus of information-sharing; and _proposed-belief-tree, which is the belief tree from _agent2's original proposal. The Reevaluate-After-Invite-Attack action is applicable when _agent1 is uncertain about the acceptance of _focus (captured by the first two applicability conditions). Furthermore, _agent1 must hold another belief _bel that satisfies the following two conditions: 1) _agent1 believes that _bel provides support for ¬_focus, and 2) _agent1 disbelieving _bel will result in her accepting _focus, i.e., _bel is the only reason that prevents _agent1 from accepting _focus.</Paragraph>
      <Paragraph position="10"> In the body of Reevaluate-After-Invite-Attack, _agent1 re-evaluates _proposed-belief-tree, _agent2's original proposal, by taking into account the information that she has obtained since it was last evaluated. This new information is obtained through an information-sharing subdialogue using the Invite-Attack strategy, and the dialogue is initiated in an attempt to satisfy the preconditions of Reevaluate-After-Invite-Attack. Before performing the body of Reevaluate-After-Invite-Attack, one of three alternative preconditions must hold: 1) the agents mutually believe _bel and that _bel provides support for ¬_focus, i.e., _agent2 has accepted _agent1's counterevidence for _focus, 2) the agents mutually believe ¬_bel, i.e., _agent1 has given up on her belief about _bel and thus the counterevidence, or 3) the agents mutually believe that _bel does not provide support for ¬_focus, i.e., _agent1 has changed her belief about the supports relationship and thus the counterevidence. Since _agent1 believes in both _bel and supports(_bel, ¬_focus) when the action is initially invoked, she will attempt to satisfy the first precondition by adopting discourse actions to convey these beliefs to _agent2. This results in _agent1 initiating an information-sharing subdialogue to convey to _agent2 her critical evidence against _focus and (implicitly) inviting _agent2 to attack this evidence.</Paragraph>
      <Paragraph position="11"> Figure 8: Dialogue model for utterance (20). [The figure incorporates the dialogue model for utterances (18) and (19) and the generated utterance "Isn't it true that Dr. Lewis hasn't been given tenure?"]</Paragraph>
      <Paragraph position="15"> proposed by EA, namely ~Professor(CS682,Lewis) and On-Sabbatical(Lewis,1998). Since the upperbound and lowerbound on the decision about whether to accept or reject ~Professor(CS682, Lewis) were accept and uncertain, CORE pursues information-sharing by invoking the Share-Info-Reevaluate-Beliefs action (Figure 5), which in turn invokes Select-Focus-Info-Sharing (Figure 6) on the top-level proposed belief -~Professor(CS682,Lewis). Since the potential evidence set contains only one piece of evidence (the only piece of evidence proposed by EA), and CORE's acceptance of this piece of evidence will result in its acceptance of the topqevel proposed belief, the algorithm is applied recursively to the uncertain child belief On-Sabbatical(Lewis,1998). Since the child belief has no children in the proposed belief tree, On-Sabbatical(Lewis,1998) is selected as the focus of information-sharing.</Paragraph>
      <Paragraph position="16"> CORE now performs the body of Share-Info-Reevaluate-Beliefs on the identified focus, On-Sabbatical(Lewis,1998), by selecting an appropriate information-sharing strategy. Since CORE's belief that Dr. Lewis not having been given tenure and its belief in the evidential relationship that Dr. Lewis not having been given tenure implies that he is not going on sabbatical constitute the only obstacle against its acceptance of On-Sabbatical(Lewis,1998), Reevaluate-After-Invite-Attack (Figure 7) is selected as the subaction for Share-Info-Reevaluate-Beliefs.</Paragraph>
      <Paragraph position="17"> Figure 8 shows the dialogue model that will be constructed for this information-sharing process. In order to satisfy the first precondition of Reevaluate-After-Invite-Attack, CORE posts MB(CA, EA, ~Tenured(Lewis)) and MB(CA, EA, supports(~Tenured(Lewis),~On-Sabbatical(Lewis,1998))) as mutual beliefs to be achieved. CORE applies the Express-Doubt discourse action (based on Lambert and Carberry \[1992\]) to simultaneously achieve these two goals, leading to the generation of the  semantic form of the following utterance: (20) CA: Isn't it true that Dr. Lewis hasn't been given tenure? 5.2.4 Possible Follow-ups to Utterance (20). Now consider how the alternative disjuncts of the precondition for Reevaluate-After-Invite-Attack might be satisfied to enable the execution of the body of the action. Figure 9 shows the recipe for Reevaluate-After-Invite-Attack as instantiated in this example. Consider the following alternative responses to utterance (20): 17 (21) a. EA: Oh, you're right. I guess that means he's not going on sabbatical.</Paragraph>
      <Paragraph position="18"> b. EA: He told me that his tenure was approved yesterday.</Paragraph>
      <Paragraph position="19"> c. EA: Yes, but he got special permission to take an early sabbatical.</Paragraph>
      <Paragraph position="20"> d. EA: Really? Are you sure of that?  Utterance (21a) would be interpreted as EA accepting the beliefs proposed in (20). This indicates that EA now believes both ~Tenured(Lewis) and supports(-~Tenured(Lewis), ~On-Sabbatical(Lewis,1998)), thus satisfying the first precondition of Reevaluate-After-Invite-Attack. CORE will then reevaluate EA's original proposal, taking into account the new information obtained from utterance (21a).</Paragraph>
      <Paragraph position="21"> In utterance (21b), EA conveys rejection of CORE's proposed belief, ~Tenured(Lewis). If CORE accepts EA's proposal in (21b), then the mutual belief Tenured(Lewis) is established between the agents. This satisfies the second precondition in Figure 9 and leads CORE to reevaluate EA's original proposal. In utterance (21c), on the other hand, EA conveys rejection of CORE's proposed evidential relationship, supports(~Tenured( Lewis),~On-Sabbatical(Lewis,1998)). If CORE accepts EA's proposal in (21c), then the mutual belief ~supports(~Tenured(Lewis), ~On-Sabbatical(Lewis,1998)) is established between the agents. This satisfies the third precondition in Figure 9 and leads CORE to re-evaluate EA's original proposal. Although in utterance (20), CORE attempted to satisfy the precondition that both agents believe ~Tenured(Lewis) and supports(~Tenured(Lewis), 17 A reviewer suggested a fifth possible response of &amp;quot;So what?&amp;quot; In such a case, the system would need to recognize that EA failed to comprehend the implied evidential relationship between not being tenured and not going on sabbatical. Our current system cannot handle misunderstandings such as this.  Chu-Carroll and Carberry Response Generation in Planning Dialogues ~On-Sabbatical(Lewis,1998)), the precondition that is actually satisfied in (21b) and (21c) is different. This illustrates how the preconditions of Reevaluate-After-Invite-Attack capture situations in which EA presents counterevidence to CORE's critical evidence and changes its beliefs.</Paragraph>
      <Paragraph position="22"> Utterance (21d), on the other hand, would be interpreted as EA being uncertain about whether to accept or reject CORE's proposal in (20), and initiating an information-sharing subdialogue to resolve this uncertainty. This example illustrates how an extended information-sharing process can be captured in our model as a recursive sequence of Propose and Evaluate actions. CORE's first Evaluate action results in uncertainty about the acceptance of EA's proposal in (18) and (19), and leads to the information-sharing subdialogue initiated by (20). CORE's proposal in (20) is evaluated by EA, whose uncertainty about whether to accept it leads her to initiate an embedded information-sharing subdialogue in utterance (21d).</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="378" end_page="391" type="metho">
    <SectionTitle>
6. Resolving Conflicts in Proposed Beliefs
</SectionTitle>
    <Paragraph position="0"> The previous section described our processes for evaluating proposed beliefs and initiating information-sharing to resolve the system's uncertainty in its acceptance of the proposal. The final outcome of the evaluation process is an informed decision about whether the system should accept or reject EA's proposal. When the system rejects EA's proposal, it will attempt to modify the proposal instead of simply discarding it.</Paragraph>
    <Paragraph position="1"> This section describes algorithms for producing responses in negotiation subdialogues initiated as part of the modification process.</Paragraph>
    <Paragraph position="2"> The collaborative planning principle in Whittaker and Stenton (1988); Walker and Whittaker (1990); and Walker (1992) suggests that &amp;quot;conversants must provide evidence of a detected discrepancy in belief as soon as possible'(Walker 1992, 349). Thus, once an agent detects a relevant conflict, she must notify the other agent of the conflict and attempt to resolve it--to do otherwise is to fail in her responsibilities as a collaborative participant. A conflict is &amp;quot;relevant&amp;quot; to the task at hand if it affects the domain plan being constructed. In terms of proposed beliefs, detected conflicts are relevant only if they contribute to resolving the agents' disagreement about a top-level proposed belief.</Paragraph>
    <Paragraph position="3"> This is because the top-level proposed belief contributes to problem-solving actions that in turn contribute to domain actions, while the other beliefs are proposed only as support for it. If the agents agree on the top-level proposed belief, then whether or not they agree on the evidence proposed to support it is no longer relevant (Young, Moore, and Pollack 1994).</Paragraph>
    <Paragraph position="4"> The negotiation process for conflict resolution is carried out by the Modify component of our Propose-Evaluate-Modify cycle. The goal of the modification process is for the agents to reach an agreement on accepting perhaps a variation of EA's original proposal. However, a collaborative agent should not modify a proposal without the other agent's consent. This is captured by our Modify-Proposal action and its two specializations: 1) Correct-Node (Figure 10), which is invoked when the agents attempt to resolve their conflict about a proposed belief, and 2) Correct-Relation, which is invoked when the agents attempt to resolve their conflict about the proposed evidential relationship between two beliefs. The recipes for the first subaction of each of these actions, Modify-Node (Figure 10) and Modify-Relation, share a common precondition that both agents agree that the original proposal is faulty before any modification can take place. It is the attempt to satisfy this precondition that leads to the initiation of a negotiation subdialogue and the generation of natural language utterances to resolve the agents' conflict.</Paragraph>
    <Paragraph position="5"> Communication for conflict resolution involves an agent (agent A) conveying to  the other agent (agent B) the detected conflict and perhaps providing evidence to support her point of view. If B accepts A's proposal for modification, the actual modification process will be carried out. On the other hand, if B does not immediately accept A's claims, he may provide evidence to justify his point of view, leading to an extended negotiation subdialogue to resolve the detected conflict. This negotiation subdialogue may lead to 1) A accepting B's beliefs, thereby accepting B's original proposal and abandoning her proposal to modify it, 2) B accepting A's beliefs, allowing A to carry out the modification of the proposal, 3) the agents accepting a further modification of the proposal, TM or 4) a disagreement between A and B that cannot be resolved. The last case is beyond the scope of this work.</Paragraph>
    <Paragraph position="6"> As in the case of information-sharing, when a top-level proposed belief is rejected by the system, the system may have also rejected some of the evidence proposed to support the top-level belief. Thus, the system must first identify the subset of detected conflicts it will explicitly address in its pursuit of conflict resolution. Furthermore, it must determine what evidence it will present to EA in an attempt to resolve the agents' conflict about these beliefs. The following sections address these two issues.</Paragraph>
    <Section position="1" start_page="379" end_page="382" type="sub_section">
      <SectionTitle>
6.1 Selecting the Focus of Modification
</SectionTitle>
      <Paragraph position="0"> Since collaborative agents are expected to engage in effective and efficient dialogues and not to argue for the sake of arguing, the system should address the rejected belief(s) that it predicts will most efficiently resolve the agents' conflict about the top-level proposed belief. This subset of rejected beliefs will be referred to as the focus of modification.</Paragraph>
      <Paragraph position="1"> The process for selecting the focus of modification operates on a proposed belief tree evaluated using the Evaluate-Belief algorithm in Figure 3 and involves two steps. First, the system constructs a candidate foci tree consisting of the top-level proposed belief along with the pieces of evidence that, if refuted, might resolve the agents' conflict about the top-level proposed belief. These pieces of evidence satisfy the following two criteria: First, the evidence must have been rejected by the system, since a collaborative agent should only refute those beliefs about which the agents disagree. Second, the evidence must be intended to support a rejected belief or evidential relationship in the candidate foci tree. This is because successful refutation of such evidence will 18 This possibility is captured by the recursive nature of our Propose-Evaluate-Modify framework as noted in Section 3.2, but will not be discussed further in this paper.  An evaluated belief tree (a) and its corresponding candidate foci tree (b).</Paragraph>
      <Paragraph position="2"> lessen the support for the rejected belief or relationship it was intended to support and thus indirectly further refutation of the piece of evidence that it is part of; by transitivity, this refutation indirectly furthers refutation of the top-level proposed belief. Our algorithm for constructing the candidate foci tree first enters the top-level belief from the proposed belief tree into the candidate foci tree, since successful refutation of this belief will resolve the agents' conflict about the belief. It then performs a depth-first search on the proposed belief tree to determine the nodes and links that should be included in the candidate foci tree. When a node in the belief tree is visited, both the belief and the evidential relationship between the belief and its parent are examined. If either the belief or the relationship was rejected by the system during the evaluation process, this piece of evidence satisfies the two criteria noted above and is included in the candidate foci tree. The system then continues to search through the evidence proposed to support the rejected belief and/or evidential relationship. On the other hand, if neither the belief nor the relationship was rejected, the search on the current branch terminates, since the evidence itself does not satisfy the first criterion, and none of its descendents would satisfy the second criterion.</Paragraph>
      <Paragraph position="3"> Given the evaluated belief tree in Figure 11(a), Figure 11(b) shows its corresponding candidate foci tree. The parenthesized letters indicate whether a belief or evidential relationship was accepted (a) or rejected (r) during the evaluation process. Notice that the evidence {c, supports(c,a)} is not included in the candidate foci tree because the first criterion is not satisfied. In addition, {Lsupports(f,e)} is not incorporated into the candidate foci tree because the evidence does not satisfy the second criterion.</Paragraph>
      <Paragraph position="4"> The second step in selecting the focus of modification is to select from the candidate foci tree a subset of the rejected beliefs and/or evidential relationships that the system will explicitly refute. The system could attempt to change EA's belief about the top-level belief &amp;el by 1) explicitly refuting _bel, 2) explicitly refuting the proposed evidence for _beL thereby causing him to accept ~_bel, or 3) refuting both _bel and its rejected evidence. A collaborative agent's first preference should be to address the rejected evidence, since McKeown's focusing rules suggest that continuing a newly introduced topic is preferable to returning to a previous topic (McKeown 1985). When a piece of evidence for _bel is refuted, both the evidence and _bel are considered open beliefs and can be addressed naturally in subsequent dialogues. On the other hand, if the agent addresses _bel directly, thus implicitly closing the pieces of evidence proposed to support _beL then it will be less coherent to return to these rejected pieces  Computational Linguistics Volume 24, Number 3 of evidence later on in the dialogue. Furthermore, in addressing a piece of rejected evidence to refute _bel, an agent conveys disagreement regarding both the evidence and _bel. If this refutation succeeds, then the agents not only have resolved their conflict about _bel, but have also eliminated a piece of invalid support for _bel. Although the agents' goal is only to resolve their conflict about _bel, removing support for _bel has the beneficial side effect of strengthening acceptance of -~_bel, i.e., removing any lingering doubts that EA might have about accepting -~_bel. If the system chooses to refute the rejected evidence, then it must identify a minimally sufficient subset that it will actually address, and subsequently identify how it will go about refuting each piece of evidence in this subset. This potentially recursive process produces a set of beliefs, called the focus of modification, that the system will explicitly refute.</Paragraph>
      <Paragraph position="5"> In deciding whether to refute the rejected evidence proposed as support for _bel, to refute _bel directly, or to refute both the rejected evidence and _bel, the system must consider which strategy will be successful in changing EA's beliefs about _bel. The system should first predict whether refuting the rejected evidence alone will produce the desired belief revision. This prediction process involves the system first selecting a subset of the rejected evidence that it predicts it can successfully refute, and then predicting whether eliminating this subset of the rejected evidence is sufficient to cause EA to accept ~_bel. If refuting the rejected evidence is predicted to fail to resolve the agents' conflict about _bel, the system should predict whether directly attacking _bel will resolve the conflict. If this is again predicted to fail, the system should consider whether attacking both _bel and its rejected evidence will cause EA to reject _bel. If none of these is predicted to succeed, then the system does not have sufficient evidence to convince EA of -~_bel.</Paragraph>
      <Paragraph position="6"> Our algorithm, Select-Focus-Modification (Figure 12), is based on the above principles and is invoked with _bel instantiated as the root node of the candidate foci tree. To select the focus of modification, the system must be able to predict the effect that presenting a set of evidence will have on EA's acceptance of a belief. Logan et al.</Paragraph>
      <Paragraph position="7"> (1994) proposed a mechanism for predicting how a hearer's beliefs will be altered by some communicated beliefs. They utilize Galliers' belief revision mechanism (Galliers 11992) to predict the hearer's belief in _bel based on: 1) the speaker's beliefs about the hearer's evidence pertaining to _bel, which can include beliefs previously conveyed by the hearer and stereotypical beliefs that the hearer is thought to hold, and 2) the evidence that the speaker is planning on presenting to the hearer. Thus the prediction is based on the speaker's beliefs about what the hearer's evidence for and against _bel will be after the speaker's evidence has been presented to the hearer. Our Predict function in Figure 12 utilizes this strategy to predict whether the hearer will accept, reject, or remain uncertain about his acceptance of _bel after evidence is presented to him.</Paragraph>
      <Paragraph position="8"> In our algorithm, if resolving the conflict about _bel involves refuting its rejected evidence (steps 4.2 and 4.4), Select-Min-Set is invoked to select a minimally sufficient set to actually address. Select-Min-Set first ranks the pieces of evidence in _cand-set in decreasing order of the impact that each piece of evidence is believed to have on EA's belief in _bel. The system then predicts whether changing EA's belief about the first piece of evidence (_evidl) is sufficient. If not, then merely addressing one piece of evidence will not be sufficient to change EA's belief about _bel (since the other pieces of evidence contribute less to EA's belief in _bel); thus the system predicts whether addressing the first two pieces of evidence in the ordered set is sufficient. This process continues until the system finds the first n pieces of evidence which it predicts, when disbelieved by EA, will cause him to accept -~_bel. The rejected components of these n pieces of evidence are then returned by Select-Min-Set. This process guarantees that  Algorithm for selecting the focus of modification.</Paragraph>
      <Paragraph position="9"> _rain-set is the minimum subset of evidence proposed to support _bel that the system believes it must address in order to change EA's belief in _bel.</Paragraph>
      <Paragraph position="10"> After the Select-Focus-Modification process is completed, each rejected top-level proposed belief (_bel) will be annotated with a set of beliefs that the system should refute (_bel.focus) when attempting to change EA's view of _bel. The negations of these beliefs are then posted by the system as mutual beliefs to be achieved in order to carry out the modification process. The next step is for the system to select an appropriate set of evidence to provide as justification for these proposed mutual beliefs.</Paragraph>
    </Section>
    <Section position="2" start_page="382" end_page="387" type="sub_section">
      <SectionTitle>
6.2 Selecting the Justification for a Claim
</SectionTitle>
      <Paragraph position="0"> Studies in communication and social psychology have shown that evidence improves the persuasiveness of a message (Luchok and McCroskey 1978; Reynolds and Burgoon 1983; Petty and Cacioppo 1984; Hample 1985). Research on the quantity of evidence indicates that there is no optimal amount of evidence, but that the use of high-quality evidence is consistent with persuasive effects (Reinard 1988). On the other hand, Grice's  Computational Linguistics Volume 24, Number 3 maxim of quantity (Grice 1975) argues that one should not contribute more information than is required. Thus, it is important that a collaborative agent select sufficient and effective, but not excessive, evidence to justify an intended mutual belief. The first step in selecting the justification for a claim is to identify the alternative pieces of evidence that the system can present to EA. Since the components of these pieces of evidence may again need to be justified (Cohen and Perrault 1979), these alternative choices will be referred to as the candidate justification chains. The system will then select a subset of these justification chains to present to EA.</Paragraph>
      <Paragraph position="1"> The most important aspect in selecting among these justification chains is that the system believes that the selected justification chains will achieve the goal of convincing EA of the claim. Thus our system first selects the minimum subsets of the candidate justification chains that are predicted to be sufficient to convince EA of the claim. If more than one such subset exists, selection heuristics will be applied. Luchok and McCroskey (1978) argued that high-quality evidence produces more attitude change than any other evidence form, suggesting that justification chains for which the system has the greatest confidence should be preferred. This also allows the system to better justify the evidence should questions about its validity arise. Wyer (1970) and Morley (1987) argued that evidence is most persuasive if it is previously unknown to the hearer, suggesting that the system should select evidence that it believes is novel to EA. 19 Finally, Grice's maxim of quantity (Grice 1975) states that one should not make a contribution more informative than is needed; thus the system should select evidence chains that contain the fewest beliefs.</Paragraph>
      <Paragraph position="2"> Our algorithm Select-Justification (Figure 13) is based on these principles and is invoked on a claim _rob that the system intends to make. When justification chains have been constructed for an antecedent belief _beli and the evidential relationship between _beli and _rob, the algorithm uses a function Make-Evidence to construct a justification chain with _rob as its root node, the root node of _beli-chain as its child node, and the root node of _reli-chain as the relationship between _beli and _mb (step 2.3). Thus, Make-Evidence returns a justification chain for _rnb, which includes a piece of evidence that provides direct support for _mb, namely {_beli,_reli}, as well as specifying how _beli and _reli should be justified. 2deg This justification chain is then added to _evid-set, which contains alternative justification chains that the system can present to EA as support for _rob. The selection criteria discussed earlier are applied to the elements in _evid-set to produce _selected-set. If _selected-set has only one element, then this justification chain will be selected as support for _rob; if _selected-set has more than one element, then a random justification chain will be selected as support for _rob; if _selected-set is empty, then no justification chain will be returned, thus indicating that the system does not have sufficient evidence to convince EA of _rob. 21 Thus the Select-Justification algorithm returns a justification chain needed to support an intended mutual belief, whenever possible, based on both the system's prediction of the strength of each candidate justification chain as well as a set of heuristics motivated by research in communication and social psychology.</Paragraph>
      <Paragraph position="3"> 19 Walker (1996b) has shown the importance of IRU's (Inforinationally Redundant Utterances) in efficient discourse. We leave including appropriate IRU's for future work.</Paragraph>
      <Paragraph position="4"> 20 As can be seen from this construction process, a justification chain can be more than simple chains such as A --~ B ~ C. In fact, it can be a complex tree-like structure in which both nodes and links are further justified. In our current system, a fact appears multiple times in a justification chain if it is used to justify more than one claim.</Paragraph>
      <Paragraph position="5"> 21 In practice this should never be the case, because the Select-Focus-Modification algorithm only selects as focus a set of beliefs that it believes the system can successfully refute.</Paragraph>
      <Paragraph position="6">  EA: Dr. Smith isn't the professor of CS821, is he? Isn't Dr. Jones the professor of CS821? In utterances (22)~(23), EA proposes three mutual beliefs: 1) the professor of CS821 is not Dr. Smith, 2) the professor of CS821 is Dr. Jones, and 3) Dr. Jones being the professor of CS821 provides support for Dr. Smith not being the professor of CS821. 22 CORE's 22 Utterances (22) and (23) are both expressions of doubt. In the former case, the speaker conveys a strong but uncertain belief that the professor of CS821 is not Dr. Smith, while in the latter, the speaker conveys a strong but uncertain belief that the professor of CS821 is Dr. Jones (Lambert and Carberry 1992).  Computational Linguistics Volume 24, Number 3 evaluation of this proposal is very similar to that discussed in Section 5.1.1, and will not be repeated here. The result is that CORE rejects both -~Professor(CS821,Smith) and Professor(CS821,Jones), but accepts the evidential relationship between them.</Paragraph>
      <Paragraph position="7"> Since the top-level proposed belief, -~Professor(CS821,Smith) is rejected by CORE, the modification process is invoked. The Modify-Proposal action specifies that the focus of modification first be identified. Thus CORE constructs the candidate foci tree and applies the Select-Focus-Modification algorithm to its root node. In this example, the candidate foci tree is identical to the proposed belief tree since both the top-level proposed belief and the evidence proposed to support it were rejected. The Select-Focus-Modification algorithm (Figure 12) is then invoked on ~Professor(CS821,Smith). The algorithm specifies that the focus of modification for the rejected evidence first be determined; thus the algorithm is recursively applied to the rejected child belief, Professor(CS821,Jones) (step 3.1).</Paragraph>
      <Paragraph position="8"> CORE has two pieces of evidence against Dr. Jones being the professor of CS821: 1) a very strong piece of evidence consisting of the beliefs that Dr. Jones is going on sabbatical in 1998 and that professors on sabbatical do not teach courses, and 2) a strong piece of evidence consisting of the beliefs that Dr. Jones' expertise is compilers, that CS821 is a database course, and that professors generally do not teach courses outside of their areas of expertise. CORE predicts that its two pieces of evidence, when presented to EA, will lead EA to accept ~Professor(CS821,Jones); thus the focus of modification for Professor(CS821,Jones) is the belief itself.</Paragraph>
      <Paragraph position="9"> Having selected the focus of modification for the rejected child belief, CORE selects the focus of modification for the top-level proposed belief -~Professor(CS821,Smith). Since the only reason that CORE knows of for EA believing ~Professor(CS821,Smith) is the proposed piece of evidence, it predicts that eliminating EA's belief in the evidence would result in EA rejecting ~Professor(CS821,Smith). Therefore, the focus of modification for ~Professor(CS821,Smith) is Professor(CS821,Jones).</Paragraph>
      <Paragraph position="10"> Once the focus of modification is identified, the subactions of Modify-Proposal are invoked on the selected focus. The dialogue model constructed for this modification process is shown in Figure 14. Since the selected focus is represented by a belief node, Correct-Node is selected as the subaction of Modify-Proposal. To satisfy the precondition of Modify-Node, CORE posts MB(CA, EA,-~Professor(CS821,Jones)) as a mutual belief to be achieved. CORE then adopts the Inform discourse action to achieve the mutual belief. Inform has two subactions: Tell which conveys a belief to EA, and Address-Acceptance, which invokes the Select-Justification algorithm (Figure 13) to select justification for the intended mutual belief.</Paragraph>
      <Paragraph position="11"> Since the surface form of EA's utterance in (23) conveyed a strong belief in Professor(CS821,Jones), CORE predicts that merely informing EA of the negation of this proposition is not sufficient to change his belief; therefore CORE constructs justification chains from the available pieces of evidence. Figure 15 shows the candidate justification chains constructed from CORE's two pieces of evidence for ~Professor(CS821,Jones).</Paragraph>
      <Paragraph position="12"> When constructing the justification chain in Figure 15(a), CORE predicts that merely informing EA of On-Sabbatical(Jones,1998) is not sufficient to convince him to accept this belief because of EA's previously conveyed strong belief that Dr. Jones will be on campus in 1998 and the stereotypical belief that being on campus generally implies not being on sabbatical. Thus further evidence is given to support On-Sabbatical(Jones, 1998).</Paragraph>
      <Paragraph position="13"> Given the two alternative justification chains, CORE first selects those that are strong enough to convince EA to accept ~Professor(CS821,Jones). If the justification chain in Figure 15(a) is presented to EA, CORE predicts that EA will have the following pieces of evidence pertaining to Professor(CS821,Jones): 1) a strong belief in  Chu-Carroll and Carberry Response Generation in Planning Dialogues i Dialogue Model for Utterances (22) - (23) ,</Paragraph>
      <Paragraph position="15"> Alternative justification chains for -~Professor(CS821,Jones).</Paragraph>
      <Paragraph position="16"> Professor(CS821,Jones), conveyed by utterance (23), 2) a very strong piece of evidence against Professor(CS821,Jones), provided by CORE's proposal of the belief, 23 and 3) a 23 EA should treat this as a very strong piece of evidence since CORE will convey it in a direct statement and is presumed to have very good (although imperfect) knowledge about prospective teaching assigmnents.  Computational Linguistics Volume 24, Number 3 very strong piece of evidence against Professor(CS821,Jones), provided by CORE's proposed evidence in Figure 15(a). CORE then predicts that EA will have an overall belief in -~Professor(CS821,Jones) of strength (very strong, strong). 24 Similarly, CORE predicts that when the evidence in Figure 15(b) is presented to EA, EA will have a very strong belief in -~Professor(CS821,Jones). Hence, both candidate justification chains are predicted to be strong enough to change EA's belief about Professor(CS821,Jones). Since more than one justification chain is produced, the selection heuristics are applied. The first heuristic prefers justification chains in which CORE is most confident; thus the justification chain in Figure 15(a) is selected as the evidence that CORE will present to EA, leading to the generation of the semantic forms of the following utterances:</Paragraph>
    </Section>
    <Section position="3" start_page="387" end_page="388" type="sub_section">
      <SectionTitle>
7.1 System Implementation
</SectionTitle>
      <Paragraph position="0"> We have implemented a prototype of our conflict resolution system, CORE, for a university course advisement domain; the implementation was done in Common Lisp with the Common Lisp Object System under SunOS. CORE realizes the response generation process for conflict resolution by utilizing the response generation strategies detailed in this paper. Given the dialogue model constructed from EA's proposal, it performs the evaluation and modification processes in our Propose-Evaluate-Modify framework. Domain knowledge used by CORE includes 1) knowledge about objects in the domain, their attributes and corresponding values, such as the professor of CS681 being Dr. Rogers, 2) knowledge about a hierarchy of concepts in the domain; for instance, computer science can be divided into hardware, software, and theory, and 3) knowledge about evidential inference rules in the domain, such as a professor being on sabbatical normally implies that he is not teaching courses. CORE also makes use of a model of its beliefs about EA's beliefs. This knowledge helps CORE tailor its responses to the particular EA by taking into account CORE's beliefs about what EA already believes. In addition, CORE maintains a library of generic recipes in order to plan its actions. In our implementation, CORE has knowledge about 29 distinct objects, 14 evidential rules, and 43 domain, problem-solving, and discourse recipes. Since the focus of this work is on the evaluation and modification processes that are captured as problem-solving actions, 25 of the 43 recipes are domain-independent problem-solving recipes.</Paragraph>
      <Paragraph position="1"> CORE takes as input a four-level dialogue model that represents intentions inferred from EA's utterances, such as that in Figure 2. It then evaluates the proposal to determine whether to accept the proposal to reject the proposal and attempt to modify it, or to pursue information-sharing. As part of the information-sharing and conflict resolution processes, CORE determines the discourse acts that should be adopted to respond to EA's utterances, and generates the semantic forms of the utterances that 24 When the strength of a belief is represented as a list of values, it indicates that the net result of combining the strengths of all pieces of evidence pertaining to the belief is equivalent to having one piece of positive evidence of each of the strengths listed.</Paragraph>
      <Paragraph position="2">  Chu-Carroll and Carberry Response Generation in Planning Dialogues realize these discourse acts. Realization of these logical forms as natural language utterances is discussed in the section on future work.</Paragraph>
    </Section>
    <Section position="4" start_page="388" end_page="391" type="sub_section">
      <SectionTitle>
7.2 Evaluation of CORE
</SectionTitle>
      <Paragraph position="0"> 7.2.1 Methodology. In order to obtain an initial assessment of the quality of CORE's responses, we performed an evaluation to determine whether or not the strategies adopted by CORE are reasonable strategies that a system should employ when participating in collaborative planning dialogues and whether other options should be considered. The evaluation, however, was not intended to address the completeness of the types of responses generated by CORE, nor was it intended to be a full scale evaluation such as would be provided by integrating CORE's strategies into an actual interactive advisement system.</Paragraph>
      <Paragraph position="1"> The evaluation was conducted via a questionnaire in which human judges ranked CORE's responses to EA's utterances among a set of alternative responses, and also rated their level of satisfaction with each individual response. The questionnaire contained a total of five dialogue segments that demonstrated CORE's ability to pursue information-sharing and to resolve detected conflicts in the agents' beliefs; other dialogue segments included in the questionnaire addressed aspects of CORE's performance that are not the topic of this paper. Each dialogue segment was selected to evaluate a particular algorithm used in the response generation process. For each dialogue segment, the judges were given the following information: Input to CORE: this included EA's utterances (for illustrative purposes), the beliefs that would be inferred from each of these utterances and the relationships among them. In effect, this is a textual description of the belief level of the dialogue model that would be inferred from EA's utterances.</Paragraph>
      <Paragraph position="2"> CORE's relevant knowledge: CORE's knowledge relevant to its evaluation of each belief given in the input, along with CORE's strength of belief in each piece of knowledge.</Paragraph>
      <Paragraph position="3"> Responses: for each dialogue segment, five alternative responses were given, one of which was the actual response generated by CORE (the responses were presented in random order so that the judges were not aware of which response was actually generated by the system). The other four responses were obtained by altering CORE's response generation strategies. For instance, instead of invoking our Select-Justification algorithm, an alternative response can be generated by including every piece of evidence that CORE believes will provide support for its claim. Alternatively, the preference for addressing rejected evidence in Select-Focus-Modification can be altered to allow CORE to consider directly refuting a parent belief before considering refuting its rejected child beliefs.</Paragraph>
      <Paragraph position="4"> Appendix A shows a sample dialogue segment in the questionnaire, annotated based on how CORE's response generation mechanism was altered to produce each of the four alternative responses. In evaluating alternative responses, the judges were explicitly instructed not to pay attention to the phrasing of CORE's responses, but to evaluate the responses based on their conciseness, coherence, and effectiveness, since it was the quality of the content of CORE's responses that was of interest in this  evaluation. Based on this principle, the judges were asked to rate the five responses in the following two ways: .</Paragraph>
      <Paragraph position="5"> .</Paragraph>
      <Paragraph position="6"> Level of Satisfaction: the goal of this portion of the evaluation was to assess the level of satisfaction that a user interacting with CORE is likely to have based on CORE's responses. Each alternative response was rated on a scale of very good, good, fair, poor, and terrible.</Paragraph>
      <Paragraph position="7"> Ranking: the goal of this ranking was to compare our response generation strategies with other alternative strategies that might be adopted in designing a response generation system. The judges were asked to rank in numerical order the five responses based on their order of preference.</Paragraph>
      <Paragraph position="8"> Twelve judges, all of whom were undergraduate or graduate students in computer science or linguistics, were asked to participate in this evaluation; evaluation forms were returned anonymously by 10 judges by the established deadline date. Note that the judges had not been taught about the CORE system and its processing mechanisms prior to the evaluation.</Paragraph>
      <Paragraph position="9"> 7.2.2 Results. Two sets of results were computed for the judges' level of satisfaction with CORE's responses, and for the ranking of CORE's responses as compared with the alternative responses. The results of our evaluation are shown in Tables 3(a) and 3(b). In order to assess the judges' level of satisfaction with CORE's responses, we assigned a value of I to 5 to each of the satisfaction ratings where I is terrible and 5 is very good. The mean and median of CORE's actual response in each dialogue segment were then computed, as well as the mean of all alternative responses provided for each dialogue segment, which was used as a basis for comparison. Table 3(a) shows that in the two dialogue segments in which CORE initiated information-sharing (IS1 and IS2), the means of CORE's responses are both approximately one level of satisfaction higher  than the average score given to all other responses (columns 1 and 3 in Table 3(a)). Furthermore, in both cases the median of the score is 4, indicating that at least half of the judges considered CORE's responses to be good or very good. The three dialogue segments in which CORE initiated collaborative negotiation (CN1, CN2, and CN3), however, yielded less uniform results. The means of CORE's responses range from being slightly above the average score for other responses to being one satisfaction level higher. However, in two out of the three responses, at least half of the judges considered CORE's responses to be either good or very good.</Paragraph>
      <Paragraph position="10"> To assess the ranking of CORE's responses as compared with alternative responses, we again computed the means and medians of the rankings given to CORE's responses, as well as the mean of the rankings given to each alternative response. The first column in Table 3(b) shows the mean rankings of CORE's responses. This set of results is consistent with that in Table 3(a) in that the dialogue segments where CORE's responses received a higher mean satisfaction rating also received a lower mean ranking (thus indicating a higher preference). The last column in Table 3(b) shows how the mean of CORE's response in a dialogue segment ranks when compared to the means of the alternative responses in the same dialogue segment. The second column, on the other hand, shows the medians of the rankings for CORE's responses. A comparison of these two columns indicates that they agree in all but one case. The disagreement occurs in dialogue IS2; although more than half of the judges consider an alternative response better than CORE's actual response (because the median of CORE's response is 2), the judges do not agree on what this better response is (because the mean of CORE's response ranks highest among all alternatives). Thus, CORE's response in IS2 can be considered the most preferred response among all judges.</Paragraph>
      <Paragraph position="11"> Next, we examine the alternative responses that are consistently ranked higher than CORE's responses in the dialogue segments. In dialogue IS1, EA proposed a main belief and provided supporting evidence for it. CORE initiated information sharing using the Ask-Why strategy, focusing on an uncertain child belief. The preferred alternative response also adopted the Ask-Why strateg.~ but focused on the main belief. We tentatively assumed that this was because of the judges' preference for addressing the main belief directly instead of being less direct by addressing the uncertain evidence. However, this assumption was shown to be invalid by the result in IS2 where the most preferred response (which is CORE's actual response) addresses an uncertain child belief. A factor that further complicates the problem is the fact that EA has already proposed evidence to support the main belief in IS1; thus applying Ask-Why to the main belief would seem to be ineffective.</Paragraph>
      <Paragraph position="12"> To evaluate our collaborative negotiation strategies, we analyzed the responses in dialogues CN1, CN2, and CN3 that were ranked higher than CORE's actual responses. We compared these preferred responses to CORE's responses based on their agreement on the outcome of the Evaluate-Belief, Select-Focus-Modification, and  Computational Linguistics Volume 24, Number 3 Select-Justification processes, as shown in Table 4. For instance, the second row in the table shows that the second preferred response in dialogue CN1 (listed as CN1.2) was produced as a result of Evaluate-Belief having rejected the proposal (which is in agreement with CORE), of Select-Focus-Modification having selected a child belief as its focus (again in agreement with CORE), and of Select-Justification having selected all available evidence to present as justification (as opposed to CORE, which selected a subset of such evidence). These results indicate that, in the examples we tested, all judges agreed with the &amp;quot;outcome of CORE's proposal evaluation mechanism, and in all but one case, the judges agreed with the belief(s) CORE chose to refute. However, disagreements arose with respect to CORE's process for selecting justification. In dialogue CN1.2, the judges preferred providing all available evidence, which may be the result of one of two assumptions. First, the judges may believe that providing all available evidence is a better strategy in general, or second, they may have reasoned about the impact that potential pieces of evidence have on EA's beliefs and concluded that the subset of evidence that CORE selected is insufficient to convince EA of its claims. In dialogue CN2, the judges preferred a response of the form B ~ A, while CORE generated a response of the form C --~ B ~ A, even though the judges were explicitly given CORE's belief that EA believes -~B. This result invalidates the second assumption above, since if that assumption were true, it is very unlikely that the judges would have concluded that no further evidence for B is needed in this case. However, the first assumption above is also invalidated because an alternative response in dialogue CN2, which enumerated all available pieces of evidence, was ranked second last. This, along with the fact that in dialogue CN3, the judges preferred a response that includes a subset of the evidence selected by CORE, leads us to conclude that further research is needed to determine the reasons that led the judges to make seemingly contradictory judgments, and how these factors can be incorporated into CORE's algorithms to improve its performance. Although the best measure of performance would be to evaluate how our response generation strategies contribute to task success within a robust natural language advisement system, which is beyond our current capability, note that CORE's current collaborative negotiation and information-sharing strategies result in responses that most of our judges consider concise, coherent, and effective, and thus provide an excellent basis for future work.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>