<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0210">
  <Title>A Hybrid Text Classification Approach for Analysis of Student Essays</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper we describe CarmelTC, a novel hybrid text classification approach for analyzing essay answers to qualitative physics questions. In our evaluation we demonstrate that CarmelTC outperforms both Latent Semantic Analysis (LSA) (Landauer et al., 1998; Laham, 1997) and Rainbow (McCallum, 1996; McCallum and Nigam, 1998), which is a Naive Bayes approach, as well as a purely symbolic approach similar to (Fürnkranz et al., 1998). Whereas LSA and Rainbow are pure bag of words approaches, CarmelTC is a rule learning approach in which rules for classifying units of text rely on features extracted from a syntactic analysis of that text as well as on a bag of words classification of that text. Thus, our evaluation demonstrates the advantage of combining predictions from symbolic and bag of words approaches for text classification. Similar to (Fürnkranz et al., 1998), neither CarmelTC nor the purely symbolic approach requires any domain-specific knowledge engineering or text annotation beyond providing a training corpus of texts matched with appropriate classifications, which is also necessary for Rainbow, and to a much lesser extent for LSA.</Paragraph>
    <Paragraph position="1"> CarmelTC was developed for use inside the Why2-Atlas conceptual physics tutoring system (VanLehn et al., 2002; Graesser et al., 2002) for the purpose of grading short essays written in response to questions such as "Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land? Explain." This is an appropriate task domain for pursuing questions about the benefits of tutorial dialogue for learning because questions like this one are known to elicit robust, persistent misconceptions from students, such as "heavier objects exert more force" (Hake, 1998; Halloun and Hestenes, 1985). In Why2-Atlas, a student first types an essay answering a qualitative physics problem. A computer tutor then engages the student in a natural language dialogue to provide feedback, correct misconceptions, and elicit more complete explanations. The first version of Why2-Atlas was deployed and evaluated with undergraduate students in the spring of 2002; the system continues to be actively developed (Graesser et al., 2002).</Paragraph>
    <Paragraph position="2"> In contrast to many previous approaches to automated essay grading (Burstein et al., 1998; Foltz et al., 1998; Larkey, 1998), our goal is not to assign a letter grade to student essays. Instead, our purpose is to tally which correct answer aspects are present in student essays. For example, we expect satisfactory answers to the example question above to include a detailed explanation of how Newton's first law applies to this scenario. From Newton's first law, the student should infer that the pumpkin and the man will continue at the same constant horizontal velocity that they both had before the release.</Paragraph>
    <Paragraph position="3"> Thus, they will always have the same displacement from the point of release. Therefore, after the pumpkin rises and falls, it will land back in the man's hands. Our goal is to coach students through the process of constructing good physics explanations. Thus, our focus is on the physics content and not the quality of the student's writing, in contrast to (Burstein et al., 2001).</Paragraph>
  </Section>
</Paper>