<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0204">
  <Title>Predicting Learning in Tutoring with the Landscape Model of Memory</Title>
  <Section position="4" start_page="21" end_page="21" type="metho">
    <SectionTitle>
3 Corpus of Tutoring Transcripts
</SectionTitle>
    <Paragraph position="0"> Our corpus was taken from transcripts collected for the ITSPOKE intelligent tutoring system project (Litman and Silliman, 2004). This project has collected tutoring dialogs with both human and computer tutors. In this paper, we describe results using the human tutor corpus.</Paragraph>
    <Paragraph position="1"> Students being tutored are first given a pre-test to gauge their physics knowledge. After reading instructional materials about physics, they are given a qualitative physics problem and asked to write an essay describing its solution. The tutor (in our case, a human tutor) examines this essay, identifies points of the argument that are missing or wrong, and engages the student in a dialog to remediate those flaws. When the tutor is satisfied that the student has produced the correct argument, the student is allowed to read an &quot;ideal&quot; essay that demonstrates the correct physics argument. After all problems have been completed, the student is given a post-test to measure overall learning gains. Fourteen students did up to ten problems each. The final data set contained 101,181 student and tutor turns, taken from 128 dialogs.</Paragraph>
  </Section>
  <Section position="5" start_page="21" end_page="21" type="metho">
    <SectionTitle>
4 Landscape Model &amp; Tutoring Corpus
</SectionTitle>
    <Paragraph position="0"> Next we generated a list of the physics concepts necessary to represent the main ideas in the target solutions. Relevant concepts were chosen by examining the &quot;ideal&quot; essays, which represent the complete argument for each problem. One hundred and twelve such concepts were identified among the 10 physics problems. Simple keyword matching was used to identify these concepts as they appeared in each line of the dialog. A small sample of these concepts and their keywords is shown in Table 1.

Table 1: Sample concepts and their keywords
Concept Name | Keywords
above | above, over
acceleration | acceleration, accelerating
action | action, reaction
affect | experience, experienced
after | after, subsequent
air friction | air resistance, wind resistance
average | mean
ball | balls, sphere
before | before, previous
beside | beside, next to
</Paragraph>
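The keyword matching step can be illustrated with a minimal sketch. It assumes case-insensitive, whole-word matching, which the paper does not specify, and the function name find_concepts is hypothetical. The keyword map holds only a few of the Table 1 entries.

```python
import re

# Concept-to-keyword map copied from part of the Table 1 sample; the full
# list of 112 concepts is not reproduced here.
CONCEPT_KEYWORDS = {
    "above": ["above", "over"],
    "acceleration": ["acceleration", "accelerating"],
    "air friction": ["air resistance", "wind resistance"],
    "ball": ["balls", "sphere"],
}


def find_concepts(line, concept_keywords=CONCEPT_KEYWORDS):
    """Return the sorted names of concepts whose keywords occur in `line`."""
    text = line.lower()
    found = set()
    for concept, keywords in concept_keywords.items():
        for keyword in keywords:
            # Whole-word match (an assumption): "over" should not match "overall".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                found.add(concept)
                break
    return sorted(found)


print(find_concepts("The balls meet air resistance as they roll over the hill."))
# prints ['above', 'air friction', 'ball']
```

Whole-word matching is one plausible design choice; a stemmer or substring match would change which lines trigger a concept.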
    <Paragraph position="1"> Each concept found was entered into the working memory model with an initial activation level, which was made to decay on subsequent turns using a formula modeled on van den Broek (1996). Concept strengths are assumed to decay by 50% every turn for three turns, after which they go to zero. A sample portion of a transcript showing concepts being identified, entering and decaying is shown in Table 2. Connections between concepts were then calculated as described in section two. A portion of a resulting concept link matrix is shown in Table 3. It should be noted that the Landscape model has some disadvantages in common with other bag-of-words methods. For example, it loses information about word order, and does not handle negation well. As mentioned in section two, van den Broek et al. created a measure that predicted the order in which individual concepts would be recalled. For our task, however, such a measure is less appropriate. We are less interested, for example, in the specific order in which a student remembers the concepts &quot;car&quot; and &quot;heavier,&quot; than we are in whether the student remembers the whole idea that a heavier car accelerates less. To measure these constellations of concepts, we created a new measure of idea strength.</Paragraph>
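The decay rule can be sketched as follows. The initial activation of 5 is the value shown for a fresh mention in Table 2. Two points are assumptions of this sketch rather than statements from the paper: that the activation is halved on each of the three turns after a mention and dropped to zero on the fourth, and that a repeated mention resets the concept to its initial activation. All names are hypothetical.

```python
INITIAL_ACTIVATION = 5.0  # value shown for a fresh mention in Table 2
DECAY_FACTOR = 0.5        # 50% decay per turn
MAX_DECAY_TURNS = 3       # assumed: zero from the fourth turn after a mention


def activation(turns_since_mention):
    """Activation of a concept a given number of turns after its last mention."""
    if turns_since_mention > MAX_DECAY_TURNS:
        return 0.0
    return INITIAL_ACTIVATION * DECAY_FACTOR ** turns_since_mention


def activation_trace(mention_turns, n_turns):
    """Per-turn activation of one concept, given the turns where it is mentioned."""
    trace = []
    last_mention = None
    for turn in range(n_turns):
        if turn in mention_turns:
            last_mention = turn  # assumed: a new mention resets the activation
        if last_mention is None:
            trace.append(0.0)
        else:
            trace.append(activation(turn - last_mention))
    return trace


print(activation_trace({0}, 6))
# prints [5.0, 2.5, 1.25, 0.625, 0.0, 0.0]
```

The first three values match the 5, 2.5, 1.25 sequence visible in Table 2; the fourth value depends on the assumed reading of "three turns".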
  </Section>
  <Section position="6" start_page="21" end_page="23" type="metho">
    <SectionTitle>
5 Measuring Idea Strength
</SectionTitle>
    <Paragraph position="0"> The connection strength matrices described above encode data about which concepts are present in each dialog, and how they are connected. To extract useful information from these matrices, we used the idea of a &quot;point.&quot; Working from the ideal essays, we identified a set of key points important for the solution of each physics problem. These key points are modeled after the points the tutor looks for in the student's essay and dialog. For example, in the &quot;accelerating car&quot; problem, one key point might be that the car's acceleration would decrease as the car got heavier. The component concepts of this point would be &quot;car,&quot; &quot;acceleration,&quot; &quot;decrease,&quot; and &quot;heavier.&quot; If this point were expressed in the dialog or essay, we would expect these concepts to have higher-than-average connection strengths between them. If this point were not expressed, or only partially expressed, we would expect lower connection strengths among its constituent concepts. The strength of a point, then, was defined as the sum of strengths of all the links between its component concepts.

Call the point in the example above $p$. Point $p$ has n = 4 constituent concepts, and to find its strength we would sum the link strengths between their pairs: &quot;car-acceleration,&quot; &quot;car-decrease,&quot; &quot;car-heavier,&quot; &quot;acceleration-decrease,&quot; &quot;acceleration-heavier,&quot; and &quot;decrease-heavier.&quot; Using values from Table 3, the total strength for the point would therefore be the sum of these six entries:

$$\mathrm{strength}(p) = s(\text{car}, \text{acceleration}) + s(\text{car}, \text{decrease}) + s(\text{car}, \text{heavier}) + s(\text{acceleration}, \text{decrease}) + s(\text{acceleration}, \text{heavier}) + s(\text{decrease}, \text{heavier})$$

where $s(a, b)$ is the Table 3 link strength between concepts $a$ and $b$.

For each point, we determined if its connections were significantly stronger than the average. We generated a reference average by taking 500 random sets of n concepts from the same dialog and averaging their link weights, where n is the number of concepts in the target point. If the target point was found to have a significantly (p &lt; .05 in a t-test) larger value than the mean of this random sample, that point was above threshold, and considered to be present in the dialog.

Table 2: Sample portion of a transcript showing concepts being identified, entering and decaying
Speaker | Turn | car | heavier | acceleration | cause
Student | I don't know how to answer this it's got to be slower, cause, it's the car is heavier but | 5 | 5 | 0 | 0
Tutor | yeah, just write whatever you think is appropriate | 2.5 | 2.5 | 0 | 0
Student | ok, | 1.25 | 1.25 | 0 | 0
Essay | The rate of acceleration will decrease if the first car is towing a second, because even though the force of the car's engine is the same, the weight of the car is double | | | |
 | recognized that the force, uh, exerted will be the same in both cases, uh, now, uh, how is force related to acceleration? | | | |
</Paragraph>
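The point-strength computation can be sketched in a few lines. Everything here is an illustrative assumption: the link values are made up and are not the Table 3 values, missing links are treated as 0, the random reference sets are scored with the same pairwise sum as the target point, and the function names are hypothetical. The significance test itself is not shown.

```python
import random
from itertools import combinations


def link(strengths, a, b):
    """Look up a symmetric link strength; missing links count as 0 (an assumption)."""
    return strengths.get(frozenset((a, b)), 0.0)


def point_strength(strengths, concepts):
    """Sum the link strengths over every unordered pair of the point's concepts."""
    return sum(link(strengths, a, b) for a, b in combinations(concepts, 2))


def reference_strengths(strengths, all_concepts, n, samples=500, seed=0):
    """Strengths of random n-concept sets drawn from the same dialog."""
    rng = random.Random(seed)
    return [
        point_strength(strengths, rng.sample(all_concepts, n))
        for _ in range(samples)
    ]


# Hypothetical link strengths for illustration only; not the Table 3 values.
links = {
    frozenset(("car", "acceleration")): 1.0,
    frozenset(("car", "decrease")): 2.0,
    frozenset(("car", "heavier")): 3.0,
    frozenset(("acceleration", "decrease")): 4.0,
    frozenset(("acceleration", "heavier")): 5.0,
    frozenset(("decrease", "heavier")): 6.0,
}
point = ["car", "acceleration", "decrease", "heavier"]
print(point_strength(links, point))  # prints 21.0
```

A point with n concepts contributes n(n-1)/2 links, so the four-concept example sums six values. Comparing the target strength with the distribution returned by reference_strengths would be the next step.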
    <Paragraph position="1"> The number of above-threshold points was added up over all dialogs for each student. The total point-count for student S is therefore: $$\text{point-count}(S) = \sum_{i=1}^{P} T(p_i)$$ where P is the total number of points in all dialogs, $p_i$ is the i-th of those points, and T is a threshold function which returns 1 if the point is above threshold and 0 otherwise.</Paragraph>
    <Paragraph position="3"> Fifty-seven key points were identified among the ten problems, with each point containing between two and five concepts. The next section describes how well this point-count relates to learning.</Paragraph>
  </Section>
  <Section position="7" start_page="23" end_page="23" type="metho">
    <SectionTitle>
6 Results: Point Counts &amp; Learning
</SectionTitle>
    <Paragraph position="0"> We first define &quot;concept-count&quot; to be the number of times physics concepts were added to the activation strength matrix. Each such addition corresponds to a &quot;5&quot; in Table 2. Now we look at a linear model with post-test score as the dependent variable, and pre-test score and concept-count as independent variables. In this model pre-test score is significant, with a p-value of .029, but concept-count is not, with a p-value of .270. The adjusted R-squared for the model is .396. Similarly, in a linear model with pre-test score and point-count as independent variables, pre-test score is significant with a p-value of .010 and point-count is not, having a p-value of .300. The adjusted R-squared for this model is .387.</Paragraph>
    <Paragraph position="1"> However, the situation changes in a linear model with pre-test score, concept-count and point-count as independent variables, and post-test score as the dependent variable. Pre-test is again significant with a p-value of .002. Concept-count and point-count are now both significant with p-values of .016 and .017, respectively. The adjusted R-squared for this model rises to .631.</Paragraph>
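For readers comparing the models above, the standard adjusted R-squared formula is sketched below. The paper does not say how its statistics were computed, so this is only the textbook definition. The raw R-squared of 0.5 in the example is hypothetical, while the 14 observations correspond to the fourteen students and the 3 predictors to pre-test score, concept-count and point-count.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjust R-squared for the number of predictors in a linear model."""
    residual_df = n_observations - n_predictors - 1
    if residual_df > 0:
        return 1.0 - (1.0 - r_squared) * (n_observations - 1) / residual_df
    raise ValueError("need more observations than predictors plus one")


# Hypothetical raw R-squared of 0.5 with 14 students and 3 predictors.
print(adjusted_r_squared(0.5, 14, 3))
```

With so few observations the adjustment is substantial: each added predictor lowers the adjusted value unless it raises the raw R-squared enough to compensate. This is why the rise reported for the three-variable model is informative.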
    <Paragraph position="2"> These results indicate that our measure of points, as highly associated constellations of concepts, adds predictive power over simply counting the occurrence of concepts alone. The number of concept mentions does not predict learning, but the extent to which these concepts are linked into relevant points in the Landscape memory model is correlated with learning.</Paragraph>
  </Section>
  <Section position="8" start_page="23" end_page="23" type="metho">
    <SectionTitle>
7 Discussion
</SectionTitle>
    <Paragraph position="0"> Several features of the resulting model are worth mentioning. First, the Landscape Model is a model of memory, and our measurements can be interpreted as a measure of what the student is remembering from the tutoring session taken as a whole.</Paragraph>
    <Paragraph position="1"> Second, the point-counts are taken from the entire dialog, rather than from the tutor's or the student's contributions alone. Other results suggest that it would be interesting to investigate the extent to which these points are produced by the student, the tutor, or both, and what effect their origin might have on their correlation with learning. For example, Chi et al. (2001) investigated student-centered, tutor-centered and interactive hypotheses of tutoring and found that students learned just as effectively when tutor feedback was suppressed. They suggest, among other things, that students' self-construction of knowledge was encouraging deep learning.</Paragraph>
  </Section>
</Paper>