<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1611">
  <Title>Exploiting Discourse Structure for Spoken Dialogue Performance Analysis</Title>
  <Section position="5" start_page="87" end_page="87" type="metho">
    <SectionTitle>
3 Interaction parameters
</SectionTitle>
    <Paragraph position="0"> For each user, interaction parameters measure specific aspects of the dialogue with the system.</Paragraph>
    <Paragraph position="1"> We use our transition and student state annotation to create two types of interaction parame- null The agreement between the manual correctness annotation and the correctness assigned by ITSPOKE is 90% (kappa of 0.79). In a preliminary agreement study, a second annotator labeled our corpus for a binary version of certainty (uncertainty versus other), resulting in a 90% inter-annotator agreement and a kappa of 0.68.</Paragraph>
    <Paragraph position="2"> ters: unigrams and bigrams. The difference between the two types of parameters is whether the discourse structure context is used or not. For each of our 12 labels (4 for correctness, 4 for certainty and 6 for discourse structure), we derive two unigram parameters per student over the 5 dialogues for that student: a total parameter and a percentage parameter. For example, for the 'Incorrect' unigram we compute, for each student, the total number of student turns labeled with 'Incorrect' (parameter Incorrect) and the percentage of such student turns out of all student turns (parameter Incorrect%). For example, if we consider only the dialogue in Figure 1, In-</Paragraph>
    <Paragraph position="4"> Bigram parameters exploit the discourse structure context. We create two classes of bigram parameters by looking at transition-student state bigrams and transition-transition bigrams. The transition-student state bigrams combine the information about the student state with the transition information of the previous system turn. Going back to Figure 1, the three incorrect answers will be distributed to three bigrams: Advance- null be counted as an Advance-PopUpAdv bigram.</Paragraph>
    <Paragraph position="5"> Similar to the unigrams, we compute a total parameter and a percentage parameter for each bigram. The percentage denominator is number of student turns for the transition-student state bigrams and the number of system turns minus one for the transition-transition bigram. In addition, for each bigram we compute a relative percentage parameter (bigram followed by %rel) by computing the percentage relative to the total number of times the transition unigram appears for that student. For example, we will compute the Advance-Incorrect %rel parameter by dividing the number of Advance-Incorrect bigrams with the number of Advance unigrams (1 divided by 2 in Figure 1); this value will capture the percentage of times an Advance transition is followed by an incorrect student answer.</Paragraph>
  </Section>
  <Section position="6" start_page="87" end_page="90" type="metho">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> We use student learning as our evaluation metric because it is the primary metric for evaluating the performance of tutoring systems. Previous work (Forbes-Riley and Litman, 2006) has suc- null cessfully used student learning as the performance metric in the PARADISE framework. Two quantities are used to measure student learning: the pretest score and the posttest score. Both tests consist of 40 multiple-choice questions; the test's score is computed as the percentage of correctly answered questions. The average score and standard deviation for each test are: pretest 0.47 (0.17) and posttest 0.68 (0.17).</Paragraph>
    <Paragraph position="1"> We focus primarily on correlations between our interaction parameters and student learning.</Paragraph>
    <Paragraph position="2"> Because in our data the pretest score is significantly correlated with the posttest score, we study partial Pearson's correlations between our parameters and the posttest score that account for the pretest score. This correlation methodology is commonly used in the tutoring research (Chi et al., 2001). For each trend or significant correlation we report the unigram/bigram, its average and standard deviation over all students, the  statistical significance of R (p).</Paragraph>
    <Paragraph position="3"> First we report significant correlations for unigrams to test our first hypothesis. Next, for our second and third experiment, we report correlations for transition-student state and transition-transition parameters. Finally, we report our preliminary results on PARADISE modeling.</Paragraph>
    <Section position="1" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
4.1 Unigram correlations
</SectionTitle>
      <Paragraph position="0"> In our first proposed experiment, we want to test the predictive utility of discourse structure in isolation. We compute correlations between our transition unigram parameters and learning. We find no trends or significant correlations. This result suggests that discourse structure in isolation has no predictive utility.</Paragraph>
      <Paragraph position="1"> Here we also report all trends and significant correlations for student state unigrams as the baseline for contextual correlations to be presented in Section 4.2. We find only one significant correlation (Table 2): students with a higher percentage of neutral turns (in terms of certainty) are negatively correlated with learning. We hypothesize that this correlation captures the student involvement in the tutoring process: more involved students will try harder thus expressing more certainty or uncertainty. In contrast, less involved students will have fewer certain/uncertain/mixed turns and, in consequence, more neutral turns. Surprisingly, student correctness does not significantly correlate with learning.</Paragraph>
    </Section>
    <Section position="2" start_page="88" end_page="89" type="sub_section">
      <SectionTitle>
4.2 Transition-student state correlations
</SectionTitle>
      <Paragraph position="0"> For our second experiment, we need to determine the predictive utility of transition-student state bigram parameters. We find a large number of correlations for both transition-correctness bi-grams and transition-certainty bigrams.</Paragraph>
      <Paragraph position="1"> Transition-correctness bigrams This type of bigram informs us whether accounting for the discourse structure transition when looking at student correctness has any predictive value. We find several interesting trends and significant correlations (Table 3).</Paragraph>
      <Paragraph position="2"> The student behavior, in terms of correctness, after a PopUp or a PopUpAdv transition is very informative about the student learning process.</Paragraph>
      <Paragraph position="3"> In both situations, the student has just finished a remediation subdialogue and the system is popping up either by reasking the original question again (PopUp) or by moving on to the next question (PopUpAdv). We find that after PopUp, the number of correct student answers is positively correlated with learning. In contrast, the number, the percentage and the relative percentage of incorrect student answers are negatively correlated with learning. We hypothesize that this correlation indicates whether the student took advantage of the additional learning opportunities offered by the remediation subdialogue. By answering correctly the original system question (PopUp-Correct), the student demonstrates that she has absorbed the information from the remediation dialogue. This bigram is an indication of a successful learning event. In contrast, answering the original system question incorrectly (PopUp-Incorrect) is an indication of a missed learning opportunity; the more events like this happen the less the student learns.</Paragraph>
      <Paragraph position="4">  Similarly, being able to correctly answer the tutor question after popping up from a remediation subdialogue (PopUpAdv-Correct) is positively correlated with learning. Since in many cases, these system questions will make use of  the knowledge taught in the remediation subdialogues, we hypothesize that this correlation also captures successful learning opportunities.</Paragraph>
      <Paragraph position="5"> Another set of interesting correlations is produced by the NewTopLevel-Incorrect bigram.</Paragraph>
      <Paragraph position="6"> We find that the number, the percentage and the relative percentage of times ITSPOKE starts a new essay revision dialogue that results in an incorrect student answer is positively correlated with learning. The content of the essay revision dialogue is determined based on ITSPOKE's analysis of the student essay. We hypothesize that an incorrect answer to the first tutor question is indicative of the system's picking of a topic that is problematic for the student. Thus, we see more learning in students for which more knowledge gaps are discovered and addressed by ITSPOKE.</Paragraph>
      <Paragraph position="7"> Finally, we find the number of times the student answers correctly after an advance transition is positively correlated with learning (the Advance-Correct bigram). We hypothesize that this correlation captures the relationship between students that advance without having major problems and a higher learning gains.</Paragraph>
      <Paragraph position="8"> Transition-certainty bigrams Next we look at the combination between the transition in the dialogue structure and the student certainty (Table 4). These correlations offer more insight on the negative correlation between the Neutral % unigram parameter and student learning. We find that out of all neutral student answers, those that follow an Advance transitions are negatively correlated with learning. Similar to the Neutral % correlation, we hypothesize that Advance-Neutral correlations capture the lack of involvement of the student in the tutoring process. This might be also due to ITSPOKE engaging in teaching concepts that the student is already familiar with.</Paragraph>
      <Paragraph position="9">  In contrast, staying neutral in terms of certainty after a system rejection is positively correlated with learning. These correlations show that based on their position in the discourse structure, neutral student answers will be correlated either negatively or positively with learning.</Paragraph>
      <Paragraph position="10"> Unlike student state unigram parameters which produce only one significant correlation, transition-student state bigram parameters produce a large number of trend and significant correlations (14). This result suggests that exploiting the discourse structure as a contextual information source can be beneficial for performance modeling.</Paragraph>
    </Section>
    <Section position="3" start_page="89" end_page="90" type="sub_section">
      <SectionTitle>
4.3 Transition-transition bigrams
</SectionTitle>
      <Paragraph position="0"> For our third experiment, we are looking at the transition-transition bigram correlations (Table 5). These bigrams help us find trajectories of length two in the discourse structure that are associated with better student learning. Because our student state is domain dependent, translating the transition-student state bigrams to a new domain will require finding a new set of relevant factors to replace the student state. In contrast, because our transition information is domain independent, transition-transition bigrams can be easily implemented in a new domain.</Paragraph>
      <Paragraph position="1"> The Advance-Advance bigram covers situations where the student is covering tutoring material without major knowledge gaps. This is because an Advance transition happens when the student either answers correctly or his incorrect answer can be corrected without going into a remediation subdialogue. Just like with the Advance-Correct correlation (recall Table 3), we hypothesize that these correlations links higher learning gains to students that cover a lot of material without many knowledge gap.</Paragraph>
      <Paragraph position="2">  The Push-Push bigrams capture another interesting behavior. In these cases, the student incorrectly answers a question, entering a remediation subdialogue; she also incorrectly answers the first question in the remediation dialogue entering an even deeper remediation subdialogue. We hypothesize that these situations are indicative of big student knowledge gaps. In our corpus, we find that the more such big knowledge gaps are discovered and addressed by the system the higher the learning gain.</Paragraph>
      <Paragraph position="3"> The SameGoal-Push bigram captures another type of behavior after system rejections that is positively correlated with learning (recall the SameGoal-Neutral bigram, Table 4). In our previous work (Rotaru and Litman, 2006), we per- null formed an analysis of the rejected student turns and studied how rejections affect the student state. The results of our analysis suggested a new strategy for handling rejections in the tutoring domain: instead of rejecting student answers, a tutoring SDS should make use of the available information. Since the recognition hypothesis for a rejected student turn would be interpreted most likely as an incorrect answer thus activating a remediation subdialogue, the positive correlation between SameGoal-Push and learning suggests that the new strategy will not impact learning.</Paragraph>
      <Paragraph position="4"> Similar to the second experiment, the results of our third experiment are also positive: in contrast to transition unigrams, our domain independent trajectories can produce parameters with a high predictive utility.</Paragraph>
    </Section>
    <Section position="4" start_page="90" end_page="90" type="sub_section">
      <SectionTitle>
4.4 PARADISE modeling
</SectionTitle>
      <Paragraph position="0"> Here we present our preliminary results on applying the PARADISE framework to model ITSPOKE performance. A stepwise multivariate linear regression procedure (Walker et al., 2000) is used to automatically select the parameters to be included in the model. Similar to (Forbes-Riley and Litman, 2006), in order to model the learning gain, we use posttest as the dependent variable and force the inclusion of the pretest score as the first variable in the model.</Paragraph>
      <Paragraph position="1"> For the first experiment, we feed the model all transition unigrams. As expected due to lack of correlations, the stepwise procedure does not select any transition unigram parameter. The only variable in the model is pretest resulting in a model with a R  of .22.</Paragraph>
      <Paragraph position="2"> For the second and third experiment, we first build a baseline model using only unigram parameters. The resulting model achieves an R  of .39 by including the only significantly correlated unigram parameter: Neutral %. Next, we build a model using all unigram parameters and all significantly correlated bigram parameters. The new model almost doubles the R  to 0.75. Besides the pretest, the parameters included in the resulting model are (ordered by the degree of contribution from highest to lowest): Advance-Neutral %rel, and PopUp-Incorrect %. These results strengthen our correlation conclusions: discourse structure used as context information or as trajectories information is useful for performance modeling. Also, note that the inclusion of student certainty in the final PARADISE model provides additional support to a hypothesis that has gained a lot of attention lately: detecting and responding to student emotions has the potential to improve learning (Craig et al., 2004; Forbes-Riley and Litman, 2005; Pon-Barry et al., 2006).</Paragraph>
      <Paragraph position="3"> The performance of our best model is comparable or higher than training performances reported in previous work (Forbes-Riley and Litman, 2006; Moller, 2005b; Walker et al., 2001). Since our training data is relatively small (20 data points) and overfitting might be involved here, in the future we plan to do a more in-depth evaluation by testing if our model generalizes on a larger ITSPOKE corpus we are currently annotating. null</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="90" end_page="91" type="metho">
    <SectionTitle>
5 Related work
</SectionTitle>
    <Paragraph position="0"> Previous work has proposed a large number of interaction parameters for SDS performance modeling (Moller, 2005a; Walker et al., 2000; Walker et al., 2001). Several information sources are being tapped to devise parameters classified by (Moller, 2005a) in several categories: dialogue and communication parameters (e.g. dialogue duration, number of system/user turns), speech input parameters (e.g. word error rate, recognition/concept accuracy) and meta-communication parameters (e.g. number of help request, cancel requests, corrections).</Paragraph>
    <Paragraph position="1"> But most of these parameters do not take into account the discourse structure information. A notable exception is the DATE dialogue act annotation from (Walker et al., 2001). The DATE annotation captures information on three dimensions: speech acts (e.g. acknowledge, confirm), conversation domain (e.g. conversation- versus task-related) and the task model (e.g. subtasks like getting the date, time, origin, and destination). All these parameters can be linked to the discourse structure but flatten the discourse structure. Moreover, the most informative of these parameters (the task model parameters) are domain dependent. Similar approximations of the discourse structure are also common for other SDS tasks like predictive models of speech recognition problems (Gabsdil and Lemon, 2004).</Paragraph>
    <Paragraph position="2"> We extend over previous work in several areas. First, we exploit in more detail the hierarchical information in the discourse structure. We quantify this information by recording the discourse structure transitions. Second, in contrast to previous work, our usage of discourse structure is domain independent (the transitions).</Paragraph>
    <Paragraph position="3"> Third, we exploit the discourse structure as a contextual information source. To our knowledge, previous work has not employed parameters similar with our transition-student state bi- null gram parameters. Forth, via the transition-transition bigram parameters, we exploit trajectories in the discourse structure as another domain independent source of information for performance modeling. Finally, similar to (Forbes-Riley and Litman, 2006), we are tackling a more problematic performance metric: the student learning gain. While the requirements for a successful information access SDS are easier to spell out, the same can not be said about tutoring SDS due to the current limited understanding of the human learning process.</Paragraph>
  </Section>
class="xml-element"></Paper>