<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0211">
  <Title>Discourse Processing for Explanatory Essays in Tutorial Applications</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Overview of the Why-Atlas Tutoring
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
System
</SectionTitle>
      <Paragraph position="0"> The architecture for the Why-Atlas qualitative physics tutoring system is shown in Figure 1. The user interface for the system is a screen area in which the physics question is displayed along with an essay entry window and a dialogue window. As the student enters an answer and explanation for a qualitative physics question, the sentence-level understanding module builds sets of propositions and passes them, via the discourse manager, to the discourse-level understanding module. Each set of propositions represents one interpretation of a sentence. The user interface and the sentence-level understanding components are described in detail in (Rosé, 2000; Freedman et al., 2000).</Paragraph>
      <Paragraph position="1"> The discourse-level understanding module uses language and domain reasoning axioms and the Tacitus-lite+ abductive inference engine to create a set of proofs that offer an explanation for the student's essay and give some insight into what the student may believe about physics and how to apply that knowledge. The discourse-level understanding module updates the propositions and the search queue for proofs in the history with the results from Tacitus-lite+. This part of the history supports anaphora resolution and processing of revisions a student may make to his essay. The discourse manager module selects and sends the best proofs to the tutorial strategist.</Paragraph>
      <Paragraph position="2"> The tutorial strategist identifies relevant communicative goals. Currently there are four categories of communicative goals. Two of these, disambiguating terminology and clarifying the essay, are addressed via directives to modify the essay. The other two, remediating misconceptions and eliciting more complete explanations, are addressed via dialogue. Misconceptions are detected when the proof includes an axiom that is incorrect or inapplicable. Incompleteness is detected under two conditions. First, there may be multiple proofs that are equally good.</Paragraph>
      <Paragraph position="3"> This condition indicates that the student did not say enough in his explanation for the system to decide which proof best represents what the student's reasoning may be. Each possible line of reasoning could point to different underlying problems with the student's physics knowledge. The second condition occurs when the student fails to explicitly state a mandatory point, which is a proposition that domain instructors require of any acceptably complete essay. Once the tutorial strategist has identified communicative goals it prioritizes them according to curriculum constraints and sends them to the discourse manager, which selects the highest priority goal after taking dialogue coherency into account and sends the goal to either the dialogue engine or the sentence-level realization module.</Paragraph>
      <Paragraph position="4"> The dialogue engine initiates and carries out a dialogue plan that will either help the student recognize and repair a misconception or elicit a more complete explanation from the student. The main mechanism for addressing these goals is what we call a knowledge construction dialogue (KCD) specification. A KCD specification is a hand-authored push-down network. Nodes in the network are either the system's questions to students or pushes and pops to other networks. The links exiting a node correspond to anticipated responses to the question. Each question is a canned string, ready for presentation to a student. The last state of the network is saved in the history, and the sentence-level understanding module accesses it to get information for analyzing student responses. The sentence-level understanding module uses a classification approach for dialogue responses from the student, since currently the dialogue plans are limited to ones that expect short, direct responses. During a dialogue, response class information is delivered directly to the dialogue engine via the discourse manager. The dialogue engine is described further in (Rosé et al., 2001).</Paragraph>
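      <Paragraph> The KCD traversal described above can be sketched as a small interpreter over hand-authored networks. The sketch below is purely illustrative: the node encoding, the classify callback, and the demo network are invented for this example and are not the Why-Atlas implementation.</Paragraph>

```python
QUESTION, PUSH = "question", "push"

class KCDNetwork:
    """A hand-authored push-down network: each node is either a canned
    question with links keyed by anticipated response classes, or a push
    into a sub-network (resuming at a given node after the implicit pop)."""
    def __init__(self, nodes, start):
        self.nodes = nodes          # node id -> (kind, payload, links)
        self.start = start

def run_kcd(networks, net_name, classify):
    """Traverse networks starting at networks[net_name], using `classify`
    to map the student's reply to an anticipated response class."""
    stack = [(net_name, networks[net_name].start)]
    transcript = []
    while stack:
        net, node_id = stack.pop()
        kind, payload, links = networks[net].nodes[node_id]
        if kind == PUSH:
            sub, resume = payload   # sub-network name, node to resume at
            if resume is not None:
                stack.append((net, resume))   # where the pop returns to
            stack.append((sub, networks[sub].start))
            continue
        transcript.append(payload)            # present the canned question
        next_node = links.get(classify(payload))
        if next_node is not None:
            stack.append((net, next_node))
    return transcript

# Demo: a two-network KCD; the classifier stub treats every reply as
# "incomplete", so the main network pushes into a remediation sub-network.
networks = {
    "main": KCDNetwork({
        0: (QUESTION, "What forces act on the pumpkin?", {"incomplete": 1}),
        1: (PUSH, ("gravity", 2), {}),
        2: (QUESTION, "So what is the total force on the pumpkin?", {}),
    }, start=0),
    "gravity": KCDNetwork({
        0: (QUESTION, "Does gravity act on the pumpkin?", {}),
    }, start=0),
}
transcript = run_kcd(networks, "main", lambda q: "incomplete")
```

      <Paragraph> A node with no link for the classified response simply falls through, which plays the role of a pop back to the pushing network.</Paragraph>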
      <Paragraph position="5"> The other communicative goals, disambiguating terminology and clarifying the essay, are addressed by the discourse manager as directives for the student to modify the essay. It passes propositions and a goal to the sentence-level realization module which uses templates to build the deep syntactic structures required by the RealPro realizer (Lavoie and Rambow, 1997) for generating a string that communicates the goal.</Paragraph>
      <Paragraph position="6"> When the discourse manager is ready to end its turn in the dialogue, it passes the accumulated natural language strings to the user interface. This output may also include transitions between the goals selected for the turn.</Paragraph>
      <Paragraph position="7"> While a dialogue is in progress, the discourse-level understanding and tutorial strategist modules are bypassed until the essay is revised. Once the student revises his essay, it is reanalyzed and the cycle repeats until no additional communicative goals arise from the system's analysis of the essay.</Paragraph>
      <Paragraph position="8"> Although the overall architecture of the system is a pipeline, there is feedback to earlier modules via the history. Only the discourse-level understanding and discourse manager modules are internally pipelines, the rest are rule-based.</Paragraph>
      <Paragraph position="9"> 3 Background on Weighted Abduction and Tacitus-lite+
Abduction is a process of reasoning from an observation to possible explanations for that observation. In the case of the Why-Atlas system, the observations are what the student said, and the possible explanations for why the student said this are the qualitative physics axioms (both good and bad) and the orderings of those axioms that support what the student said. To arrive at an explanation, some assumptions have to be made along the way, since not all of the inferences that underlie an explanation will be expressed.</Paragraph>
      <Paragraph position="10"> Weighted abduction is one of several possible formalisms for realizing abductive reasoning. With weighted abduction there is a cost associated with making an assumption during the inference process.</Paragraph>
      <Paragraph position="11"> Following the weighted abductive inference algorithm described in (Stickel, 1988), Tacitus-lite operates over a collection of axioms in which each axiom is expressed as a Horn clause. Further, each conjunct p_i has a weight w_i associated with it, as in (2). The weight is used to calculate the cost of assuming p_i instead of proving it, where cost(p_i) = cost(r) × w_i.</Paragraph>
      <Paragraph position="12"> (2) p_1^w_1 ∧ ... ∧ p_n^w_n ⊃ r
Given a goal or observation to be proven, Tacitus-lite takes one of four actions: 1) it assumes the observation at the cost associated with it; 2) it unifies the observation with a fact at zero cost; 3) it unifies the observation with a literal that has already been assumed or proven, at no additional cost; or 4) it attempts to prove the observation with an axiom.</Paragraph>
      <Paragraph position="13"> All possible proofs could be generated. However, Tacitus-lite allows the application builder to set depth bounds on the number of axioms applied in proving an observation and on the global number of proofs generated during search. Tacitus-lite maintains a queue of proofs, where the initial proof reflects assuming all the observations and each of the four actions above adds a new proof to the queue.</Paragraph>
      <Paragraph position="14"> The proof generation can be stopped at any point and the proofs with the lowest cost can be selected as the most plausible proofs for the observations.</Paragraph>
      <Paragraph position="15"> Tacitus-lite uses a best-first search guided by heuristics that select which proof to expand, which observation or goal in that proof to act upon, which action to apply and which axiom to use when that is the selected action. Most of the heuristics in Why-Atlas are specific to the domain and application.</Paragraph>
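      <Paragraph> As a concrete illustration of weighted abduction with a best-first proof queue, here is a toy propositional version. It is greatly simplified relative to Tacitus-lite+ (no unification, no variable typing, no domain heuristics), and the function name and encodings are invented for this sketch.</Paragraph>

```python
import heapq
import itertools

def abduce(goals, axioms, facts, max_proofs=50, cost_limit=100.0):
    """Toy weighted abduction: return (cost, assumptions) of the cheapest
    proof found, or None. `axioms` maps a head literal to a list of bodies,
    each body a list of (conjunct, weight) pairs; every top-level goal
    starts with an assumability cost of 1.0."""
    counter = itertools.count()               # heap tie-breaker
    start = tuple((g, 1.0) for g in goals)
    # queue entries: (cost so far, tie-break, remaining goals, assumed set)
    queue = [(0.0, next(counter), start, frozenset())]
    expanded = 0
    while queue and max_proofs > expanded:    # global bound on the search
        cost, _, remaining, assumed = heapq.heappop(queue)
        expanded += 1
        if not remaining:
            return cost, assumed              # cheapest complete proof
        (goal, gcost), rest = remaining[0], remaining[1:]
        if goal in facts or goal in assumed:
            # unify with a fact, or with an already-assumed literal: free
            heapq.heappush(queue, (cost, next(counter), rest, assumed))
            continue
        # action: assume the goal at its associated cost
        if cost_limit >= cost + gcost:
            heapq.heappush(queue, (cost + gcost, next(counter),
                                   rest, assumed | {goal}))
        # action: backward-chain on each axiom whose head matches the goal
        for body in axioms.get(goal, []):
            subgoals = tuple((p, gcost * w) for p, w in body) + rest
            heapq.heappush(queue, (cost, next(counter), subgoals, assumed))
    return None
```

      <Paragraph> Because the queue is ordered by accumulated assumption cost, backward-chaining into cheap-to-assume antecedents (weight 0.6 here) is preferred over assuming the observation outright (cost 1.0), which is the core of Stickel's scheme.</Paragraph>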
      <Paragraph position="16"> SRI's release of Tacitus-lite was subsequently extended by the first author of this paper for the research project described in (Thomason et al., 1996).</Paragraph>
      <Paragraph position="17"> It was named Tacitus-lite+ at that time. Two main extensions from that work that we are making use of are: 1) proofs falling below a user defined cost threshold halt the search 2) a simple variable typing system reduces the number of axioms written and the size of the search space (Hobbs et al., 1988, pg 102).</Paragraph>
      <Paragraph position="18"> Unlike the earlier applications of Tacitus-lite+, Why-Atlas uses it for both shallow qualitative physics reasoning and discourse-level language reasoning. To support qualitative physics reasoning we've made a number of general inference engine extensions, such as improved consistency checking, detecting and avoiding reasoning loops, and allowing the axiom author to express both good and bad axioms in the same axiom set. These recent extensions are described further in (Jordan et al., 2002).</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Building an Abductive Proof
</SectionTitle>
    <Paragraph position="0"> The discourse-level understanding module uses language axioms and the Tacitus-lite+ abductive inference engine to resolve pronominal and temporal anaphora and make other discourse-level language related inferences. It transforms the sentence-level propositions into more complete propositions given the context of the problem the student is solving (represented as facts) and the context of the preceding sentences of the essay.</Paragraph>
    <Paragraph position="1"> From these discourse-level propositions, proofs are built and analyzed to determine appropriate communicative actions. To build these proofs, the discourse-level understanding module uses domain axioms, the above resulting propositions and again the Tacitus-lite+ abductive inference engine.</Paragraph>
    <Paragraph position="2"> We've separated the discourse-level language axioms from the domain axioms both for efficiency and for modularity, because there is generally only a small amount of interaction between the language and domain axioms. Separating them reduces the search space. In cases where interaction within a single axiom is necessary, we've placed these axioms in the set of language axioms. The system currently has 90 language axioms and 95 domain axioms. The domain axioms fully cover 5 problems as well as parts of many other problems.</Paragraph>
    <Paragraph position="3"> We describe each of these stages of building the proof in more detail in the sections that follow.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Applying Discourse-level Language Axioms
to Sentence-level Propositions
</SectionTitle>
      <Paragraph position="0"> The discourse-level language axioms currently address the local resolution of pronominal and temporal anaphora, the flattening of embedded relationships, and the canonicalization of some lexical choices that can only be resolved given the context of the problem. We are still developing and testing axioms that will better address pronominal and temporal anaphora inter-sententially, and axioms that will generate additional propositions for quantifiers and plurals.</Paragraph>
      <Paragraph position="1"> Pronominal Anaphora. It is generally easy to resolve pronominal anaphora in the context of a qualitative physics problem because there are only a small number of candidates to consider. For example, in the case of the pumpkin problem in (1), there are only four physics bodies that are likely to be discussed in a student essay: the pumpkin, the runner, the earth, and the air.</Paragraph>
      <Paragraph position="2"> The system is able to resolve simple intra-sentential pronominal references using language axioms. The objects described in a single sentence are the candidate set, and argument restrictions rule out many of these candidates. But to resolve inter-sentential anaphora, as in (3), the system currently relies on the domain axioms. The domain axioms will bind the body variables to their most likely referents during unification with facts and with previously assumed and proven propositions, similarly to (Hobbs et al., 1988).</Paragraph>
      <Paragraph position="3"> (3) The man is exerting a force on it.</Paragraph>
      <Paragraph position="4"> But in the case of anaphoric references to physical quantities such as velocity, acceleration, and force, as in (4), we need to extend the language axioms to handle these cases, because resolving them with the domain axioms involves too much unconstrained search. This is because the physical quantities are the predicates that most strongly influence the domain reasoning.</Paragraph>
      <Paragraph position="5"> (4) The velocity is constant before the pumpkin is thrown. But after the release, it will decrease because there is no force.</Paragraph>
      <Paragraph position="6"> To extend the language axioms to address inter-sentential anaphora we need to implement and test a recency ordering of the physics bodies and quantities that have already been discussed in the essay. But we expect this to be simple to do since the essays generally only involve one discourse segment.</Paragraph>
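      <Paragraph> The recency ordering proposed above could be sketched as follows. The pronoun-to-type table and the entity types are invented stand-ins for the argument restrictions discussed in the text.</Paragraph>

```python
def resolve_pronoun(pronoun, mentioned, type_of):
    """Resolve a pronoun against entities listed in order of mention
    (oldest first), taking the most recently mentioned one that satisfies
    a coarse type restriction on the pronoun."""
    wanted = {"it": "body", "he": "person", "him": "person"}.get(pronoun)
    for entity in reversed(mentioned):        # most recent mention first
        if wanted is None or type_of[entity] == wanted:
            return entity
    return None

# "The man is exerting a force on it." -> "it" binds to the pumpkin,
# the most recent non-person body on the discourse stack.
types = {"man": "person", "pumpkin": "body"}
antecedent = resolve_pronoun("it", ["man", "pumpkin"], types)
```

      <Paragraph> A single recency stack suffices here precisely because, as noted above, the essays generally involve only one discourse segment.</Paragraph>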
      <Paragraph position="7"> Temporal Anaphora. As with pronominal anaphora, temporal anaphora is usually clear because the student often explicitly indicates when an event or state occurs relative to another event or state as with the first sentence of the explanation presented in (1). In these cases, the domain-level reasoning will be able to unify the anchor event or state with an already known event or state in the proof it is constructing.</Paragraph>
      <Paragraph position="8"> When there is no temporal anchor the domain-level search is too under-constrained so the language axioms resolve the temporal orderings. In some cases world knowledge is used to infer the temporal relationships as in (5). Here we know that to catch an object it must have been thrown or dropped beforehand and so the event in (5a) must occur after the event in (5b).</Paragraph>
      <Paragraph position="9"> (5) a. The man catches the pumpkin.</Paragraph>
      <Paragraph position="10"> b. This is because they had the same velocity when he threw it.</Paragraph>
      <Paragraph position="11"> Otherwise, the language axioms use information about tense and aspect and default orderings relative to these to guide inferences about temporal relationships ((Kamp, 1993; Dowty, 1986; Partee, 1984; Webber, 1988) inter alia).</Paragraph>
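      <Paragraph> A toy reading of these tense/aspect defaults: eventive clauses advance the narrative and are ordered after the previous event, while stative clauses overlap the current reference event. The two-way aspect split and the relation names are illustrative assumptions, not the actual axioms.</Paragraph>

```python
def order_events(clauses):
    """clauses: list of (name, aspect) with aspect in {"event", "state"}.
    Returns (precedes, overlaps) relations under the default ordering:
    events advance the reference time; states overlap the current event."""
    precedes, overlaps = [], []
    last_event = None
    for name, aspect in clauses:
        if aspect == "event":
            if last_event is not None:
                precedes.append((last_event, name))
            last_event = name
        elif last_event is not None:          # a state overlaps the anchor
            overlaps.append((name, last_event))
    return precedes, overlaps

# Roughly (5): he threw it; they had the same velocity; he catches it.
prec, over = order_events([("throw", "event"),
                           ("same_velocity", "state"),
                           ("catch", "event")])
```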
      <Paragraph position="12"> Embedded Relationships. In the physics essays we are addressing, there is a tendency to express multiple relations within a single sentence as in (6). Here the &amp;quot;equal&amp;quot; and &amp;quot;opposite&amp;quot; relations are embedded in a temporal &amp;quot;when&amp;quot; relation. In this case the sentence-level understanding module is not in the best position to indicate the specific constraints that each of these relations imposes so this is handled by discourse-level understanding. It would also impose a greater burden on the domain-level proof building if these relationships were not resolved beforehand. For example, in the case of the last clause in (6) there is an elliptical reference that could cause the domain-level a great deal of unconstrained search.</Paragraph>
      <Paragraph position="13"> (6) When the magnitude of the pumpkin's velocity equals the man's, the pumpkin's velocity is in the opposite direction.</Paragraph>
      <Paragraph position="14"> Canonicalizing Lexical Usage. One simple case in which the language axioms canonicalize lexical items has to do with direction. For example, saying &amp;quot;move up the inclined plane&amp;quot; should be interpreted as a positive direction for the horizontal component even though the phrase contains &amp;quot;up&amp;quot;. The axioms are able to canonicalize references such as up, down, left, right, north, south into a positive or negative direction relative to an axis in a coordinate system that may be tilted slightly to align with planes. This is an example of the kinds of axioms in which language and domain knowledge are interacting within a single axiom.</Paragraph>
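      <Paragraph> The direction canonicalization might be sketched as mapping each lexical direction to an angle and projecting it onto a possibly tilted axis. The word-to-angle table and axis handling below are illustrative assumptions, not the actual Why-Atlas axioms.</Paragraph>

```python
import math

# Lexical directions mapped to angles in a standard coordinate frame.
WORD_ANGLE = {"right": 0.0, "east": 0.0, "up": 90.0, "north": 90.0,
              "left": 180.0, "west": 180.0, "down": 270.0, "south": 270.0}

def canonical_sign(word, axis="x", tilt_degrees=0.0):
    """Sign (+1/-1) of the named direction's component along an axis of a
    coordinate system tilted by `tilt_degrees`, or 0 if perpendicular."""
    angle = math.radians(WORD_ANGLE[word] - tilt_degrees)
    component = math.cos(angle) if axis == "x" else math.sin(angle)
    if 1e-9 > abs(component):
        return 0
    return 1 if component > 0 else -1
```

      <Paragraph> With the x-axis tilted 30 degrees to lie along an incline, moving "up" the plane comes out as a positive horizontal component, matching the interpretation described above.</Paragraph>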
      <Paragraph position="15"> Quantifiers and Plurals. In our target essays, there is frequent usage of quantifiers and plurals with respect to physics bodies and frequent use of quantifiers with respect to parameters of physical quantities (e.g. &amp;quot;at all times&amp;quot;, &amp;quot;all the magnitudes of the velocities&amp;quot;).</Paragraph>
      <Paragraph position="16"> We have recently completed our specification for a sentence-level representation of quantifiers and plurals. From this representation the language axioms will generate an appropriate number of new propositions to use in the proof building stage, given the context of the problem and the expression recognized from sentence-level processing.</Paragraph>
      <Paragraph position="17"> Although we have not yet implemented and tested this set of language axioms, we have successfully hand-encoded sentences such as (7) into both their sentence-level and discourse-level representations and have used the latter successfully in the final proof building process. For example, for (7), the system creates two equivalent propositions about acceleration, each referring to different balls. In addition, both of these propositions are related to two additional propositions about the force of gravity applying to the same ball as in its related acceleration proposition.</Paragraph>
      <Paragraph position="18"> [Figure 2 appears here: a simplified abductive proof tree for the sentence &amp;quot;The pumpkin moves slower because the man is not exerting a force on it.&amp;quot;]</Paragraph>
      <Paragraph position="20"> (7) The acceleration of both balls is increasing due to the force of earth's gravity.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Applying Domain-level Axioms to Build an
Explanatory Proof
</SectionTitle>
      <Paragraph position="0"> The propositions produced by applying the language axioms are the goals that are to be proven using domain-level axioms. Figure 2 is an example of a simplified abductive proof for sentence (8).</Paragraph>
      <Paragraph position="1"> (8) The pumpkin moves slower because the man is not exerting a force on it.</Paragraph>
      <Paragraph position="2"> Each level of downward arrows from the gloss of a proposition in Figure 2 represents a domain axiom that can be used to prove that proposition. One way to prove that the velocity of the pumpkin is decreasing is to prove that just the horizontal component of the velocity vector is the one that is decreasing since the context of the question (see (1)) makes this a likely interpretation. Alternatively, the system could request that the student be more precise by asking which components of the velocity vector are decreasing.</Paragraph>
      <Paragraph position="3"> In the case of trying to prove that the horizontal component is decreasing, Tacitus-lite+ is applying a bad physics axiom that is one manifestation of the impetus misconception; the student thinks that a force is necessary to maintain a constant velocity. In this case it assumes the student has this misconception but alternatively the system could try to gather more evidence that this is true by asking the student diagnostic questions.</Paragraph>
      <Paragraph position="4"> Next Tacitus-lite+ proves that the total force on the pumpkin is zero by proving that the possible addend forces are zero. In the context of this problem, it is a given that air resistance is negligible and so it unifies with a fact for zero cost. Next it assumes that the student believes the man is applying a horizontal force of 0 to the pumpkin.</Paragraph>
      <Paragraph position="5"> Finally, it still needs to prove another proposition that was explicitly asserted by the student; that the force of the man on the pumpkin is 0. As with the velocity, it will try to prove this by proving that the horizontal component of that force is zero. Since it has already assumed that this is true, the abductive proof is finished and ready to be further analyzed by the tutorial strategist module to give additional feedback to the student.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Incrementally Processing an Essay
</SectionTitle>
      <Paragraph position="0"> We have also extended Tacitus-lite+ to run incrementally so that it can start processing before the student completes his essay. In this way it can take advantage of the processing lull as the student composes his essay. In simulations of various typing speeds, (Rosé et al., 2002) estimated that there is a 60-second processing lull during the completion of a sentence, after subtracting out a 5-second average incremental parsing cost. During this lull it can build proofs using the previous sentences in the essay.</Paragraph>
      <Paragraph position="1"> To run Tacitus-lite+ incrementally, we added a function that takes as input a proof queue and the new goals that are to be proven and returns a new proof queue. The discourse-level understanding module builds the input proof queue by finding the proofs in the most recent queue with which the new goals are consistent and adding the new goals to a copy of each of those proofs. We then modified Tacitus-lite+ to take an arbitrary proof queue as input. The discourse-level understanding module stores and selects proof queues, which are returned by Tacitus-lite+ after it attempts to prove a sentence.</Paragraph>
      <Paragraph position="2"> Suppose for example that each sentential input is treated as a separate input to Tacitus-lite+ and that sentence S_k has already been processed and yielded proof queue Q_k. As the next sentence S_k+1 arrives, a copy of Q_k is updated with proofs that include S_k+1 as new information to be proven. But if S_k+1 conflicts with every proof in the copy of Q_k, then an earlier proof queue is tried. Similarly, if a student modifies a previously processed sentence, the original sentence is regarded as having been deleted. The inference process backs up to the point just before the deleted sentence was processed and reprocesses the substituted sentence and all that follows it. This back-up mechanism is what allows the inference process to run incrementally.</Paragraph>
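      <Paragraph> The queue-extension function and per-sentence checkpointing described above might look roughly like this. The proof representation and the consistency callback are invented stand-ins for the Tacitus-lite+ data structures.</Paragraph>

```python
def extend_queue(proof_queue, new_goals, consistent):
    """Copy each proof the new goals are consistent with, adding the goals
    as new obligations; proofs that conflict are dropped."""
    return [{"assumed": set(p["assumed"]),
             "open": list(p["open"]) + list(new_goals)}
            for p in proof_queue if consistent(p, new_goals)]

class IncrementalProver:
    """Keeps one saved proof queue per processed sentence so that editing
    sentence k backs the engine up to the queue saved just before k."""
    def __init__(self, consistent):
        self.consistent = consistent
        self.checkpoints = [[{"assumed": set(), "open": []}]]  # Q_0

    def add_sentence(self, index, goals):
        # Editing an earlier sentence discards its checkpoint and all
        # later ones, then reprocesses from that point (the back-up step).
        del self.checkpoints[index + 1:]
        queue = extend_queue(self.checkpoints[index], goals, self.consistent)
        if not queue and index > 0:
            # the sentence conflicts with every proof: try an earlier queue
            queue = extend_queue(self.checkpoints[index - 1], goals,
                                 self.consistent)
        self.checkpoints.append(queue)
        return queue

# Process two sentences, then revise the first one.
prover = IncrementalProver(lambda proof, goals: True)
prover.add_sentence(0, ["g1"])
after_two = prover.add_sentence(1, ["g2"])
after_edit = prover.add_sentence(0, ["g3"])   # back up and reprocess
```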
      <Paragraph position="3"> At the end of composing an essay, the student will in the best case have to wait the length of time that it takes to finish parsing the last sentence of the essay plus the length of time that it takes to extend the proof by one sentence. In the worst case, which is when he modifies the first sentence or inserts a new first sentence, he will have to wait the same amount of time as he would for non-incremental discourse-level understanding.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Deriving Feedback for Students From
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Plausible Proofs
</SectionTitle>
      <Paragraph position="0"> To identify communicative goals, the tutorial strategist next analyzes the best proofs. Currently it examines just one of the best proofs by applying a set of test patterns to parts of the proof. It can test for combinations of patterns for givens (mainly to get bindings for variables in a pattern), for assumed propositions, for propositions asserted in the student's essay, and for inferred propositions. In addition, it can also test for missing patterns in the proof and for whether particular domain axioms have been used. Each goal that the system is capable of addressing is linked to sets of patterns that are expected to be indicative of it.</Paragraph>
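      <Paragraph> The pattern testing described above might be sketched as follows. The proof field names, the pattern format, and the goal names are invented for illustration.</Paragraph>

```python
def matching_goals(proof, goal_patterns):
    """`proof` holds sets under keys such as "asserted", "assumed",
    "inferred", and "axioms_used"; each pattern may require propositions
    or axioms to be present in a field, or to be absent from it."""
    goals = []
    for goal, pattern in goal_patterns.items():
        ok = True
        for field, required in pattern.get("present", {}).items():
            ok = ok and proof[field] >= required       # superset test
        for field, banned in pattern.get("absent", {}).items():
            ok = ok and banned.isdisjoint(proof[field])
        if ok:
            goals.append(goal)
    return goals

# A proof like the one for (8): the impetus axiom was used, so the
# impetus-remediation goal fires and the gravity-elicitation goal does not.
proof = {"asserted": {"velocity_decreasing"},
         "assumed": {"man_force_zero"},
         "inferred": set(),
         "axioms_used": {"impetus"}}
patterns = {"remediate_impetus": {"present": {"axioms_used": {"impetus"}}},
            "elicit_gravity": {"present": {"asserted": {"gravity_acts"}}}}
goals = matching_goals(proof, patterns)
```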
      <Paragraph position="1"> In the case of the proof for (8), the tutorial strategist identifies a dialogue goal that addresses the impetus misconception as being relevant since an impetus axiom is part of the proof.</Paragraph>
      <Paragraph position="2"> In addition to engaging students in a dialogue, the system can also give direct, constructive feedback on the essays they are composing. When there are multiple interpretations, it is better to ask the student to make certain things in the essay clearer. The tutorial strategist includes test patterns that target important details that students often leave out. For example, suppose the student says that the velocity is increasing but this is only true for the vertical component of the velocity vector. It may then be important to clarify which component of the velocity the student has in mind since thinking that the horizontal component is increasing indicates a misconception.</Paragraph>
      <Paragraph position="3"> It is also possible that two propositions in an essay will be contradictory. In this case the system points out that there is a conflict, describes the conflict and directs the student to repair it.</Paragraph>
      <Paragraph position="4"> We expect to extend the tutorial strategist module so that if there are multiple best proofs, it will ask the student questions that will help it disambiguate which proof is most representative of the student's intended meaning for the essay.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Preliminary Results and Future Plans
</SectionTitle>
    <Paragraph position="0"> Although we've found that incremental understanding is successful at taking advantage of the processing lull while the student composes his essay, we still need to fine-tune it so as to minimize both the need to back up and the amount of under-constrained searching it does (i.e., the more of the student's explanation Tacitus-lite+ has, the more constrained the search is). Currently, Tacitus-lite+ runs after every new sentence that is recognized by the sentence-level understanding module. During each of these runs Tacitus-lite+ continues until one of its run-time thresholds is exceeded.</Paragraph>
    <Paragraph position="1"> We also plan to experiment with other ways of bounding the run-time for Tacitus-lite+ during incremental processing. For example, we might impose a specific time limit based on the expected 60-second processing lull while the student composes his next sentence.</Paragraph>
    <Paragraph position="2"> In initial timing tests, using a set of 5 correct essays that involved no backing up, the average incremental processing time per sentence, with the search bound set to 50 proofs and the assumption cost threshold set to .056, is 21.22 seconds. The worst-case time for extending a proof by one sentence was 98 seconds and the best was 1 second. So in the best case, which is when no previous sentences have been modified, the student will wait on average 21.22 seconds after he completes the last sentence in his essay for a response from Why-Atlas.</Paragraph>
    <Paragraph position="3"> In human-human computer-mediated tutoring, we found that in the worst case the student waits 2 minutes for a reply from the tutor after completing the essay. The wait time in the case of the human tutor is a combination of the time it takes to read and analyze the student's response and then compose a reply. Although these timings are inconclusive and not directly comparable, they give us an order of magnitude for tolerable wait times.</Paragraph>
    <Paragraph position="4"> We will complete a 5-week formative evaluation of the Why-Atlas system in which we will compare the learning gains of 24 students to those of students in three other conditions: 1) a text control; 2) human tutoring; and 3) another tutoring system that uses statistical classification only. During these trials, we will log decisions and processing times for each module of the system. From these detailed logs we will be able to better evaluate the speed and correctness of each system module.</Paragraph>
  </Section>
</Paper>