<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1034">
  <Title>Tracking Initiative in Collaborative Dialogue Interactions</Title>
  <Section position="3" start_page="0" end_page="263" type="metho">
    <SectionTitle>
2 Task Initiative vs. Dialogue Initiative
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="262" type="sub_section">
      <SectionTitle>
2.1 Motivation
</SectionTitle>
      <Paragraph position="0"> Previous work on mixed-initiative dialogues focused on tracking and allocating a single thread of control, the conversational lead, among participants. Novick (1988) developed a computational model that utilizes metalocutionary acts, such as repeat and give-turn, to capture mixed-initiative behavior in dialogues. Whittaker and Stenton (1988) devised rules for allocating dialogue control based on utterance types, and Walker and Whittaker (1990) utilized these rules for an analytical study on discourse segmentation. Kitano and Van Ess-Dykema (1991) developed a plan-based dialogue understanding model that tracks the conversational initiative based on the domain and discourse plans behind the utterances.</Paragraph>
      <Paragraph position="1"> Smith and Hipp (1994) developed a dialogue system that varies its responses to user utterances based on four dialogue modes which model different levels of initiative exhibited by dialogue participants. However, the dialogue mode is determined at the outset and cannot be changed during the dialogue. Guinn (1996) subsequently developed a system that allows change in the level of initiative based on initiative-changing utterances and each agent's competency in completing the current subtask.</Paragraph>
      <Paragraph position="2"> However, we contend that merely maintaining the conversational lead is insufficient for modeling complex behavior commonly found in naturally-occurring collaborative dialogues (SRI Transcripts, 1992; Gross, Allen, and Traum, 1993; Heeman and Allen, 1995). For instance, consider the alternative responses in utterances (3a)-(3c), given by an advisor to a student's question:  (1) S: I want to take NLP to satisfy my seminar course requirement.</Paragraph>
      <Paragraph position="3"> (2) Who is teaching NLP? (3a) A: Dr. Smith is teaching NLP.</Paragraph>
      <Paragraph position="4"> (3b) A: You can't take NLP because you haven't taken AI, which is a prerequisite for NLP. (3c) A: You can't take NLP because you haven't taken AI, which is a prerequisite for NLP. You should take distributed programming to satisfy your requirement, and sign up as a listener for NLP.</Paragraph>
      <Paragraph position="5"> Suppose we adopt a model that maintains a single thread of control, such as that of (Whittaker and Stenton, 1988). In utterance (3a), A directly responds to S's question; thus the conversational lead remains with S. On the other hand, in (3b) and (3c), A takes the lead by initiating a subdialogue to correct S's invalid proposal. However, existing models cannot explain the difference in the two responses, namely that in (3c), A actively participates in the planning process by explicitly proposing domain actions, whereas in (3b), she merely conveys the invalidity of S's proposal. Based on this observation, we argue that it is necessary to distinguish between task initiative, which tracks the lead in the development of the agents' plan, and dialogue initiative, which tracks the lead in determining the current discourse focus (Chu-Carroll and Brown, 1997). 1 This distinction then allows us to explain A's behavior from a response generation point of view: in (3b), A responds to S's proposal by merely taking over the dialogue initiative, i.e., informing S of the invalidity of the proposal, while in (3c), A responds by taking over both the task and dialogue initiatives, i.e., informing S of the invalidity and suggesting a possible remedy.</Paragraph>
      <Paragraph position="6"> An agent is said to have the task initiative if she is directing how the agents' task should be accomplished, i.e., if her utterances directly propose actions that the agents should perform. The utterances may propose domain actions (Litman and Allen, 1987) that directly contribute to achieving the agents' goal, such as &amp;quot;Let's send engine E2 to Corning.&amp;quot; On the other hand, they may propose problem-solving actions (Allen, 1991; Lambert and Carberry, 1991; Ramshaw, 1991) that contribute not directly to the agents' domain goal, but to how they would go about achieving this goal, such as &amp;quot;Let's look at the first [problem] first.&amp;quot; An agent is said to have the dialogue initiative if she takes the conversational lead in order to establish mutual beliefs, such as mutual beliefs about a piece of domain knowledge or about the validity of a proposal, between the agents. For instance, in responding to agent A's proposal of sending a boxcar to Corning via Dansville, agent B may take over the dialogue initiative (but not the task initiative) by saying &amp;quot;We can't go by Dansville because we've got Engine 1 going on that track.&amp;quot; Thus, when an agent takes over the task initiative, she also takes over the dialogue initiative, since a proposal of actions can be viewed as an attempt to establish the mutual belief that a set of actions be adopted. On the other hand, an agent may take over the dialogue initiative but not the task initiative, as in (3b) above.</Paragraph>
      <Paragraph position="7"> 1 Although independently conceived, this distinction between task and dialogue initiatives is similar to the notion of choice of task and choice of speaker in initiative in (Novick and Sutton, 1997), and the distinction between control and initiative in (Jordan and Di Eugenio, 1997).</Paragraph>
    </Section>
    <Section position="2" start_page="262" end_page="263" type="sub_section">
      <SectionTitle>
2.2 An Analysis of the TRAINS91 Dialogues
</SectionTitle>
      <Paragraph position="0"> To analyze the distribution of task/dialogue initiatives in collaborative planning dialogues, we annotated the TRAINS91 dialogues (Gross, Allen, and Traum, 1993) as follows: each dialogue turn is given two labels, task initiative (TI) and dialogue initiative (DI), each of which can be assigned one of two values, system or manager, depending on which agent holds the task/dialogue initiative during that turn. 2 Table 1 shows the distribution of task and dialogue initiatives in the TRAINS91 dialogues. It shows that while in the majority of turns, the task and dialogue initiatives are held by the same agent, in approximately 1/4 of the turns, the agents' behavior can be better accounted for by tracking the two types of initiatives separately.</Paragraph>
      <Paragraph position="1"> To assess the reliability of our annotations, approximately 10% of the dialogues were annotated by two additional coders. We then used the kappa statistic (Siegel and Castellan, 1988; Carletta, 1996) to assess the level of agreement between the three coders with respect to the task and dialogue initiative holders. In this experiment, K is 0.57 for the task initiative holder agreement and K is 0.69 for the dialogue initiative holder agreement.</Paragraph>
      <Paragraph position="2"> 2 An agent holds the task initiative during a turn as long as some utterance during the turn directly proposes how the agents should accomplish their goal, as in utterance (3c).</Paragraph>
      <Paragraph position="3"> Carletta suggests that content analysis researchers consider K &gt; .8 as good reliability, with .67 &lt; K &lt; .8 allowing tentative conclusions to be drawn (Carletta, 1996). Strictly based on this metric, our results indicate that the three coders have a reasonable level of agreement with respect to the dialogue initiative holders, but do not have reliable agreement with respect to the task initiative holders. However, the kappa statistic is known to be highly problematic in measuring inter-coder reliability when the likelihood of one category being chosen overwhelms that of the other (Grove et al., 1981), which is the case for the task initiative distribution in the TRAINS91 corpus, as shown in Table 1. Furthermore, as will be shown in Table 4, Section 4, the task and dialogue initiative distributions in TRAINS91 are not at all representative of collaborative dialogues. We expect that by taking a sample of dialogues whose task/dialogue initiative distributions are more representative of all dialogues, we will lower the value of P(E), the probability of chance agreement, and thus obtain a higher kappa coefficient of agreement. However, we leave selecting and annotating such a subset of representative dialogues for future work.</Paragraph>
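      <Paragraph> The effect of P(E) on kappa can be illustrated with a small sketch. It assumes the usual form K = (P(A) - P(E)) / (1 - P(E)) and estimates P(E) as the sum of squared category proportions. The agreement level and the two label distributions are hypothetical values chosen for illustration; they are not figures from the annotation experiment.

```python
def kappa(p_agree, p_chance):
    """Kappa coefficient: agreement beyond chance, K = (P(A) - P(E)) / (1 - P(E))."""
    return (p_agree - p_chance) / (1.0 - p_chance)


def chance_agreement(category_proportions):
    """P(E) estimated as the sum of squared category proportions."""
    return sum(p * p for p in category_proportions)


# Hypothetical numbers: the same observed agreement under two label distributions.
observed = 0.90
skewed = chance_agreement([0.9, 0.1])    # one category dominates
balanced = chance_agreement([0.5, 0.5])  # categories evenly split

kappa_skewed = kappa(observed, skewed)
kappa_balanced = kappa(observed, balanced)
```

With identical observed agreement, the skewed distribution has the larger chance agreement and therefore the smaller kappa, which is the effect described above.</Paragraph>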
    </Section>
  </Section>
  <Section position="4" start_page="263" end_page="264" type="metho">
    <SectionTitle>
3 A Model for Tracking Initiative
</SectionTitle>
    <Paragraph position="0"> Our analysis shows that the task and dialogue initiatives shift between the participants during the course of a dialogue. We contend that it is important for the agents to take into account signals for such initiative shifts for two reasons. First, recognizing and providing signals for initiative shifts allow the agents to better coordinate their actions, thus leading to more coherent and cooperative dialogues. Second, by determining whether or not it should hold the task and/or dialogue initiatives when responding to user utterances, a dialogue system is able to tailor its responses based on the distribution of initiatives, as illustrated by the previous dialogue (Chu-Carroll and Brown, 1997). This section describes our model for tracking initiative using cues identified from the user's utterances.</Paragraph>
    <Paragraph position="1"> Our model maintains, for each agent, a task initiative index and a dialogue initiative index which measure the amount of evidence available to support the agent holding the task and dialogue initiatives, respectively. After each turn, new initiative indices are calculated based on the current indices and the effects of the cues observed during the turn. These cues may be explicit requests by the speaker to give up his initiative, or implicit cues such as ambiguous proposals. The new initiative indices then determine the initiative holders for the next turn.</Paragraph>
    <Paragraph position="2"> We adopt the Dempster-Shafer theory of evidence (Shafer, 1976; Gordon and Shortliffe, 1984) as our underlying model for inferring the accumulated effect of multiple cues on determining the initiative indices. The Dempster-Shafer theory is a mathematical theory for reasoning under uncertainty which operates over a set of possible outcomes, Θ. Associated with each piece of evidence that may provide support for the possible outcomes is a basic probability assignment (bpa), a function that represents the impact of the piece of evidence on the subsets of Θ. A bpa assigns a number in the range [0,1] to each subset of Θ such that the numbers sum to 1.</Paragraph>
    <Paragraph position="3"> The number assigned to the subset Θ_1 then denotes the amount of support the evidence directly provides for the conclusions represented by Θ_1. When multiple pieces of evidence are present, Dempster's combination rule is used to compute a new bpa from the individual bpa's to represent their cumulative effect.</Paragraph>
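    <Paragraph> For the two-outcome case used in this paper, Dempster's combination rule can be sketched as follows. The dictionary representation, the function name combine, and all numeric bpa's are assumptions made for illustration; they are not the trained values.

```python
from itertools import product

THETA = frozenset({"speaker", "hearer"})


def combine(m1, m2):
    """Dempster's rule: multiply masses, pool by set intersection, renormalize."""
    pooled = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a.intersection(b)
        if common:
            pooled[common] = pooled.get(common, 0.0) + x * y
        else:
            conflict += x * y  # mass that fell on the empty set
    if conflict >= 1.0:
        raise ValueError("bpa's are in total conflict and cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in pooled.items()}


speaker = frozenset({"speaker"})
hearer = frozenset({"hearer"})

# Hypothetical current index and cue bpa, chosen only for illustration.
current = {speaker: 0.6, hearer: 0.4}
cue = {hearer: 0.5, THETA: 0.5}
vacuous = {THETA: 1.0}

updated = combine(current, cue)        # the cue shifts support toward the hearer
unchanged = combine(current, vacuous)  # all mass on the full set changes nothing
```

A bpa that places all of its mass on the full outcome set leaves the other bpa untouched, which is how the rule represents a piece of evidence that supports no particular conclusion.</Paragraph>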
    <Paragraph position="4"> The reasons for selecting the Dempster-Shafer theory as the basis for our model are twofold. First, unlike the Bayesian model, it does not require a complete set of a priori and conditional probabilities, which is difficult to obtain for sparse pieces of evidence. Second, the Dempster-Shafer theory distinguishes between situations in which no evidence is available to support any conclusion and those in which equal evidence is available to support each conclusion. Thus the outcome of the model more accurately represents the amount of evidence available to support a particular conclusion, i.e., the provability of the conclusion (Pearl, 1990).</Paragraph>
    <Section position="1" start_page="263" end_page="264" type="sub_section">
      <SectionTitle>
3.1 Cues for Tracking Initiative
</SectionTitle>
      <Paragraph position="0"> In order to utilize the Dempster-Shafer theory for modeling initiative, we must first identify the cues that provide evidence for initiative shifts. Whittaker, Stenton, and Walker (Whittaker and Stenton, 1988; Walker and Whittaker, 1990) have previously identified a set of utterance intentions that serve as cues to indicate shifts or lack of shifts in initiative, such as prompts and questions.</Paragraph>
      <Paragraph position="1"> We analyzed our annotated TRAINS91 corpus and identified additional cues that may have contributed to the shift or lack of shift in task/dialogue initiatives during the interactions. This results in eight cue types, which are grouped into three classes, based on the kind of knowledge needed to recognize them. Table 2 shows the three classes, the eight cue types, their subtypes if any, whether a cue may affect merely the dialogue initiative or both the task and dialogue initiatives, and the agent expected to hold the initiative in the next turn.</Paragraph>
      <Paragraph position="2"> The first cue class, explicit cues, includes explicit requests by the speaker to give up or take over the initiative. For instance, the utterance &amp;quot;Any suggestions?&amp;quot; indicates the speaker's intention for the hearer to take over both the task and dialogue initiatives. Such explicit cues can be recognized by inferring the discourse and/or problem-solving intentions conveyed by the speaker's utterances.
[Table 2, example utterances: A: &amp;quot;Grab the tanker, pick up oranges, go to Elmira, make them into orange juice.&amp;quot; B: &amp;quot;We go to Elmira, we make orange juice, okay.&amp;quot;; &amp;quot;Yeah&amp;quot;, &amp;quot;Ok&amp;quot;, &amp;quot;Right&amp;quot;; &amp;quot;How far is it from Bath to Corning?&amp;quot;; &amp;quot;Can we do the route the banana guy isn't doing?&amp;quot;; A: &amp;quot;Any suggestions?&amp;quot; B: &amp;quot;Well, there's a boxcar at Dansville.&amp;quot; &amp;quot;But you have to change your banana plan.&amp;quot;; &amp;quot;How long is it from Dansville to Corning?&amp;quot;; &amp;quot;Go ahead and fill up E1 with bananas.&amp;quot;; &amp;quot;Well, we have to get a boxcar.&amp;quot;; &amp;quot;Right. okay. It's shorter to Bath from Avon.&amp;quot;]
The second cue class, discourse cues, includes cues that can be recognized using linguistic and discourse information, such as from the surface form of an utterance, or from the discourse relationship between the current and prior utterances. It consists of four cue types. The first type is perceptible silence at the end of an utterance, which suggests that the speaker has nothing more to say and may intend to give up her initiative. The second type includes utterances that do not contribute information that has not been conveyed earlier in the dialogue. It can be further classified into two groups: repetitions, a subset of the informationally redundant utterances (Walker, 1992), in which the speaker paraphrases an utterance by the hearer or repeats the utterance verbatim, and prompts, in which the speaker merely acknowledges the hearer's previous utterance(s). Repetitions and prompts also suggest that the speaker has nothing more to say and indicate that the hearer should take over the initiative (Whittaker and Stenton, 1988). The third type includes questions which, based on anticipated responses, are divided into domain and evaluation questions. Domain questions are questions in which the speaker intends to obtain or verify a piece of domain knowledge.</Paragraph>
      <Paragraph position="3"> They usually merely require a direct response and thus typically do not result in an initiative shift. Evaluation questions, on the other hand, are questions in which the speaker intends to assess the quality of a proposed plan.</Paragraph>
      <Paragraph position="4"> They often require an analysis of the proposal, and thus frequently result in a shift in dialogue initiative. The final type includes utterances that satisfy an outstanding task or discourse obligation. Such obligations may have resulted from a prior request by the hearer, or from an interruption initiated by the speaker himself. In either case, when the task/dialogue obligation is fulfilled, the initiative may be reverted back to the hearer who held the initiative prior to the request or interruption.</Paragraph>
      <Paragraph position="5"> The third cue class, analytical cues, includes cues that cannot be recognized without the hearer performing an evaluation on the speaker's proposal using the hearer's private knowledge (Chu-Carroll and Carberry, 1994; Chu-Carroll and Carberry, 1995). After the evaluation, the hearer may find the proposal invalid, suboptimal, or ambiguous. As a result, he may initiate a sub-dialogue to resolve the problem, resulting in a shift in task/dialogue initiatives. 3</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="264" end_page="267" type="metho">
    <SectionTitle/>
    <Paragraph position="0"> 3 Whittaker, Stenton, and Walker treat subdialogues initiated as a result of these cues as interruptions, motivated by their collaborative planning principles (Whittaker and Stenton, 1988; Walker and Whittaker, 1990).</Paragraph>
    <Section position="1" start_page="265" end_page="266" type="sub_section">
      <SectionTitle>
3.2 Utilizing the Dempster-Shafer Theory
</SectionTitle>
      <Paragraph position="0"> As discussed earlier, at the end of each turn, new task/dialogue initiative indices are computed based on the current indices and the effect of the observed cues to determine the next task/dialogue initiative holders. In terms of the Dempster-Shafer theory, new task/dialogue bpa's (m_{t-new}/m_{d-new}) 4 are computed by applying Dempster's combination rule to the bpa's representing the current initiative indices 5 and the bpa of each observed cue.</Paragraph>
      <Paragraph position="1"> Evidently, some cues provide stronger evidence for an initiative shift than others. Furthermore, a cue may provide stronger support for a shift in dialogue initiative than in task initiative. Thus, we associate with each cue two bpa's to represent its effect on changing the current task and dialogue initiative indices, respectively. We extended our annotations of the TRAINS91 dialogues to include, in addition to the agent(s) holding the task and dialogue initiatives for each turn, a list of cues observed during that turn. Initially, each cue_i is assigned the following bpa's: m_{t-i}(Θ) = 1 and m_{d-i}(Θ) = 1, where Θ = {speaker, hearer}. In other words, we assume that the cue has no effect on changing the current initiative indices. We then developed a training algorithm (Train-bpa, Figure 1) and applied it to the annotated data to obtain the final bpa's.</Paragraph>
      <Paragraph position="2"> For each turn, the task and dialogue bpa's for each observed cue are used, along with the current initiative indices, to determine the new initiative indices (step 2). The combine function utilizes Dempster's combination rule to combine pairs of bpa's until a final bpa is obtained to represent the cumulative effect of the given bpa's. The resulting bpa's are then used to predict the task/dialogue initiative holders for the next turn (step 3). If this prediction disagrees with the actual value in the annotated data, Adjust-bpa is invoked to alter the bpa's for the observed cues, and Reset-current-bpa is invoked to adjust the current bpa's to reflect the actual initiative holder (step 4).</Paragraph>
      <Paragraph position="3"> Adjust-bpa adjusts the bpa's for the observed cues in favor of the actual initiative holder. We developed three adjustment methods by varying the effect that a disagreement between the actual and predicted initiative holders will have on changing the bpa's for the observed cues. The first is constant-increment where each time a disagreement occurs, the value for the actual initiative holder in the bpa is incremented by a constant (Δ), while that for Θ, the full outcome set {speaker, hearer}, is decremented by Δ. The second method, constant-increment-with-counter, associates with each bpa for each cue a counter which is incremented when a correct prediction is made, and decremented when an incorrect prediction is made. If the counter is negative, the constant-increment method is invoked, and the counter is reset to 0. This method ensures that a bpa will only be adjusted if it has no &amp;quot;credit&amp;quot; for correct predictions in the past. The third method, variable-increment-with-counter, is a variation of constant-increment-with-counter. However, instead of determining whether an adjustment is needed, the counter determines the amount to be adjusted. Each time the system makes an incorrect prediction, the value for the actual initiative holder is incremented by Δ/2^{count+1}, and that for Θ decremented by the same amount.</Paragraph>
      <Paragraph position="4"> 4 Bpa's are represented by functions whose names take the form of m_{sub}. The subscript sub may be t-X or d-X, indicating that the function represents the task or dialogue bpa under scenario X. 5 The initiative indices are represented as bpa's. For instance, the current task initiative indices take the following form: m_{t-cur}(speaker) = z and m_{t-cur}(hearer) = 1 - z.</Paragraph>
      <Paragraph position="5"> In addition to experimenting with different adjustment methods, we also varied the increment constant, Δ. For each adjustment method, we ran 19 training sessions with Δ ranging from 0.025 to 0.475, incrementing by 0.025 between each session, and evaluated the system based on its accuracy in predicting the initiative holders for each turn. We divided the TRAINS91 corpus into eight sets based on speaker/hearer pairs. For each value of Δ, we cross-validated the results by applying the training algorithm to seven dialogue sets and testing the resulting bpa's on the remaining set. Figures 2(a) and 2(b) show our system's performance in predicting the task and dialogue initiative holders, respectively, using the three adjustment methods. 6</Paragraph>
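      <Paragraph> The experimental setup described above, 19 increment constants crossed with eight held-out dialogue sets, can be sketched as follows. The function names are hypothetical, train and accuracy stand for whatever training and scoring routines are used, and averaging the held-out scores is an assumption about how the per-set results are summarized.

```python
def delta_grid(start=0.025, stop=0.475, step=0.025):
    """The 19 increment constants, rounded to avoid floating-point drift."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 3) for i in range(n)]


def leave_one_set_out(sets):
    """Yield (training sets, held-out set) once for each dialogue set."""
    for i, held_out in enumerate(sets):
        yield sets[:i] + sets[i + 1:], held_out


def cross_validate(sets, train, accuracy):
    """Average held-out accuracy for each increment constant."""
    results = {}
    for delta in delta_grid():
        scores = [accuracy(train(training, delta), held_out)
                  for training, held_out in leave_one_set_out(sets)]
        results[delta] = sum(scores) / len(scores)
    return results


grid = delta_grid()
folds = list(leave_one_set_out(list(range(8))))  # eight placeholder dialogue sets
```

Each fold trains on seven sets and tests on the eighth, so every dialogue set serves as test data exactly once for each increment constant.</Paragraph>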
    </Section>
    <Section position="2" start_page="266" end_page="267" type="sub_section">
      <SectionTitle>
3.3 Discussion
</SectionTitle>
      <Paragraph position="0"> Figure 2 shows that in the vast majority of cases, our prediction methods yield better results than making predictions without cues. Furthermore, substantial improvement is gained by the use of counters since they prevent the effect of the &amp;quot;exceptions of the rules&amp;quot; from accumulating and resulting in erroneous predictions. By restricting the increment to be inversely exponentially related to the &amp;quot;credit&amp;quot; the bpa had in making correct predictions, variable-increment-with-counter obtains better and more consistent results than constant-increment.</Paragraph>
      <Paragraph position="1"> However, the exceptions of the rules still resulted in undesirable effects, thus the further improved performance by constant-increment-with-counter.</Paragraph>
      <Paragraph position="2"> 6 For comparison purposes, the straight lines show the system's performance without the use of cues, i.e., always predicting that the initiative remains with the current holder.</Paragraph>
      <Paragraph position="3"> We analyzed the cases in which the system, using constant-increment-with-counter with the increment constant Δ = .35, 7 made erroneous predictions. Tables 3(a) and 3(b) summarize the results of our analysis with respect to task and dialogue initiatives, respectively. For each cue type, we grouped the errors based on whether or not a shift occurred in the actual dialogue. For instance, the first row in Table 3(a) shows that when the cue invalid action is detected, the system failed to predict a task initiative shift in 2 out of 3 cases. On the other hand, it correctly predicted all 11 cases where no shift in task initiative occurred. Table 3(a) also shows that when an analytical cue is detected, the system correctly predicted all but one case in which there was no shift in task initiative. However, 55% of the time, the system failed to predict a shift in task initiative. 8 This suggests that other features need to be taken into account when evaluating user proposals in order to more accurately model initiative shifts resulting from such cues. Similar observations can be made about the errors in predicting dialogue initiative shifts when analytical cues are observed (Table 3(b)).</Paragraph>
      <Paragraph position="4"> 7 This is the value that yields the optimal results (Figure 2).</Paragraph>
      <Paragraph position="5"> 8 In the case of suboptimal actions, we encounter the sparse data problem. Since there is only one instance of the cue in the set of dialogues, when the cue is present in the testing set, it is absent from the training set.</Paragraph>
      <Paragraph position="6"> Table 3(b) shows that when a perceptible silence is detected at the end of an utterance, when the speaker utters a prompt, or when an outstanding discourse obligation is fulfilled (first three rows in table), the system correctly predicted the dialogue initiative holder in the vast majority of cases. However, for the cue class questions, when the actual initiative shift differs from the norm, i.e., speaker retaining initiative for evaluation questions and hearer taking over initiative for domain questions, the system's performance worsens. In the case of domain questions, errors occur when 1) the response requires more reasoning than do typical domain questions, causing the hearer to take over the dialogue initiative, or 2) the hearer, instead of merely responding to the question, offers additional helpful information.</Paragraph>
      <Paragraph position="7"> In the case of evaluation questions, errors occur when 1) the result of the evaluation is readily available to the hearer, thus eliminating the need for an initiative shift, or 2) the hearer provides extra information. We believe that although it is difficult to predict when an agent may include extra information in response to a question, taking into account the cognitive load that a question places on the hearer may allow us to more accurately predict dialogue initiative shifts.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="267" end_page="268" type="metho">
    <SectionTitle>
4 Applications in Other Environments
</SectionTitle>
    <Paragraph position="0"> To investigate the generality of our system, we applied our training algorithm, using the constant-increment-with-counter adjustment method with the increment constant Δ = 0.35, on the TRAINS91 corpus to obtain a set of bpa's. We then evaluated the system on subsets of dialogues from four other corpora: the TRAINS93 dialogues (Heeman and Allen, 1995), airline reservation dialogues (SRI Transcripts, 1992), instruction-giving dialogues (Map Task Dialogues, 1996), and non-task-oriented dialogues (Switchboard Credit Card Corpus, 1992). In addition, we applied our baseline strategy which makes predictions without the use of cues to each corpus.</Paragraph>
    <Paragraph position="1"> Table 4 shows a comparison between the dialogues from the five corpora and the results of this evaluation.</Paragraph>
    <Paragraph position="2"> Row 1 in the table shows the number of turns where the expert 9 holds the task/dialogue initiative, with percentages shown in parentheses. This analysis shows that the distribution of initiatives varies quite significantly across corpora, with the distribution biased toward one agent in the TRAINS and maptask corpora, and split fairly evenly in the airline and switchboard dialogues. Row 2 shows the results of applying our baseline prediction method to the various corpora. The numbers shown are correct predictions in each instance, with the corresponding percentages shown in parentheses. These results indicate the difficulty of the prediction problem in each corpus that the task/dialogue initiative distribution (row 1) fails to convey. For instance, although the dialogue initiative is distributed approximately 30/70% between the two agents in the TRAINS91 corpus and 40/60% in the airline dialogues, the prediction rates in row 2 show that in both cases, the distribution is the result of shifts in dialogue initiative in approximately 25% of the dialogue turns. Row 3 in the table shows the prediction results when applying our training algorithm using the constant-increment-with-counter method. Finally, the last row shows the improvement in percentage points between our prediction method and the baseline prediction method. To test the statistical significance of the differences between the results obtained by the two prediction algorithms, for each corpus, we applied Cochran's Q test (Cochran, 1950) to the results in rows 2 and 3. The tests show that for all corpora, the differences between the two algorithms when predicting the task and dialogue initiative holders are statistically significant at the levels of p &lt; 0.05 and p &lt; 10^-5, respectively.</Paragraph>
    <Paragraph position="3"> 9 The expert is assigned as follows: in the TRAINS domain, the system; in the airline domain, the travel agent; in the maptask domain, the instruction giver; and in the switchboard dialogues, the agent who holds the dialogue initiative the majority of the time.</Paragraph>
    <Paragraph position="4"> Based on the results of our evaluation, we make the following observations. First, Table 4 illustrates the generality of our prediction mechanism. Although the system's performance varies across environments, the use of cues consistently improves the system's accuracies in predicting the task and dialogue initiative holders by 2-4 percentage points (with the exception of the maptask corpus in which there is no room for improvement) and 8-13 percentage points, respectively. Second, Table 4 shows the specificity of the trained bpa's with respect to application environments. Using our prediction mechanism, the system's performances on the collaborative planning dialogues (TRAINS91, TRAINS93, and airline reservation) most closely resemble one another (last row in table). This suggests that the bpa's may be somewhat sensitive to application environments since they may affect how agents interpret cues. Third, our prediction mechanism yields better results on task-oriented dialogues. This is because such dialogues are constrained by the goals; therefore, there are fewer digressions and offers of unsolicited opinion as compared to the switchboard corpus.</Paragraph>
  </Section>
</Paper>