<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2030"> <Title>A Hybrid Approach to Content Analysis for Automatic Essay Grading</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Student Essay Analysis </SectionTitle> <Paragraph position="0"> We cast the Student Essay Analysis problem as a text classification problem in which we classify each sentence in the student's essay as an expression of one of a set of &quot;correct answer aspects&quot;, or as &quot;nothing&quot; when no &quot;correct answer aspect&quot; is expressed. Essays are first segmented into individual sentence units. Next, each segment is classified as corresponding to one of the set of key points, or as &quot;nothing&quot; if it does not include any key point.</Paragraph> <Paragraph position="1"> We then take an inventory of the classifications other than &quot;nothing&quot; that were assigned to at least one segment. We performed our evaluation over essays collected from students interacting with our tutoring system in response to the question &quot;Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land? Explain.&quot;, which we refer to as the Pumpkin Problem. 
Thus, there are a total of six alternative classifications for each segment: Class 1 After the release, the only force acting on the pumpkin is the downward force of gravity.</Paragraph> <Paragraph position="2"> Class 2 The pumpkin continues to have a constant horizontal velocity after it is released.</Paragraph> <Paragraph position="3"> Class 3 The horizontal velocity of the pumpkin continues to be equal to the horizontal velocity of the man.</Paragraph> <Paragraph position="4"> Class 4 The pumpkin and runner cover the same distance over the same time.</Paragraph> <Paragraph position="5"> Class 5 The pumpkin will land on the runner.</Paragraph> <Paragraph position="6"> Class 6 Sentence does not adequately express any of the above specified key points.</Paragraph> <Paragraph position="7"> Often what distinguishes sentences of one class from those of another is subtle. For example, &quot;The pumpkin's horizontal velocity, which is equal to that of the man when he released it, will remain constant.&quot; belongs to Class 2. However, it could easily be mistaken for Class 3 based on the set of words it includes, although it does not express that idea, since it does not address the relationship between the pumpkin's and the man's velocity after the release. Similarly, &quot;So long as no other horizontal force acts upon the pumpkin while it is in the air, this velocity will stay the same.&quot; belongs to Class 2, although it looks similar on the surface to either Class 1 or Class 3. Nevertheless, it does not express the required propositional content for either of those classes.</Paragraph> <Paragraph position="8"> The most frequent problem is that sentences that express most but not all of the content associated with a required point should be classified as &quot;nothing&quot;, although they have many words in common with sentences from the class to which they are most similar. 
Similarly, sentences like &quot;It will land on the ground where the runner threw it up.&quot; contain all of the words required to correctly express the idea corresponding to Class 5, although they do not express that idea and in fact express a wrong idea. These very subtle distinctions pose problems for &quot;bag of words&quot; approaches, since they base their decisions only on which words are present, regardless of their order or the functional relationships between them.</Paragraph> <Paragraph position="9"> The hybrid CarmelTC approach induces decision trees using features from a deep syntactic functional analysis of an input text as well as a prediction from the Rainbow Naive Bayes text classifier (McCallum and Nigam, 1998).</Paragraph> <Paragraph position="10"> Additionally, it uses features that indicate the presence or absence of words found in the training examples. From these features CarmelTC builds a vector representation for each sentence. It then uses the ID3 decision tree learning algorithm (Quinlan, 1993) to induce rules for identifying sentence classes based on these feature vectors.</Paragraph> <Paragraph position="11"> From CARMEL's deep syntactic analysis of a sentence, we extract individual features that encode functional relationships between syntactic heads (e.g., (subj-throw man)), tense information (e.g., (tense-throw past)), and information about passivization and negation (e.g., (negation-throw +) or (passive-throw -)). Syntactic feature structures produced by the grammar factor out those aspects of syntax that modify the surface realization of a sentence but do not change its deep functional analysis, including syntactic transformations such as passivization and extraction. These deep functional relationships give CarmelTC the information, lacking in &quot;bag of words&quot; approaches, that is needed for effective content analysis in highly causal domains, such as research methods or physics.</Paragraph> </Section> </Paper>
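
The combination of feature types described in Section 2 (syntactic functional features, word-presence features, and a Naive Bayes prediction, fed to ID3) can be illustrated with a minimal Python sketch. It is not the CarmelTC implementation. The toy sentences, the feature strings such as (loc-land runner), the Naive Bayes guesses, and all function names are assumptions invented for illustration, and the hand-written feature sets merely stand in for the output of the CARMEL parser and the Rainbow classifier.

```python
import math
from collections import Counter

def build_features(syntactic, words, nb_prediction, syn_inventory, vocabulary):
    """Merge the three feature types into one categorical feature dict."""
    features = {"syn:" + s: s in syntactic for s in syn_inventory}
    features.update({"word:" + w: w in words for w in vocabulary})
    features["nb_prediction"] = nb_prediction
    return features

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from partitioning the rows on one feature."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def id3(rows, labels, features):
    """Return a class label (leaf) or a dict node splitting on the best feature."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    rest = [f for f in features if f != best]
    node = {"feature": best, "default": majority, "branches": {}}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = id3([rows[i] for i in keep],
                                      [labels[i] for i in keep], rest)
    return node

def classify(node, row):
    """Walk the tree; fall back to the node's majority class on unseen values."""
    while isinstance(node, dict):
        node = node["branches"].get(row[node["feature"]], node["default"])
    return node

# Invented toy data: (syntactic features, words, Naive Bayes guess, gold class).
examples = [
    ({"(subj-remain velocity)"}, {"velocity", "remain", "constant"},
     "Class 2", "Class 2"),
    ({"(subj-equal velocity)"}, {"velocity", "equal", "man"},
     "Class 3", "Class 3"),
    ({"(subj-land pumpkin)", "(loc-land runner)"}, {"pumpkin", "land", "runner"},
     "Class 5", "Class 5"),
    ({"(subj-land pumpkin)", "(loc-land ground)"},
     {"pumpkin", "land", "runner", "ground"}, "Class 5", "Class 6"),
]
syn_inventory = sorted(set().union(*(e[0] for e in examples)))
vocabulary = sorted(set().union(*(e[1] for e in examples)))
rows = [build_features(s, w, nb, syn_inventory, vocabulary)
        for s, w, nb, _ in examples]
labels = [gold for _, _, _, gold in examples]
tree = id3(rows, labels, list(rows[0]))
predictions = [classify(tree, row) for row in rows]
```

In this toy set the assumed Naive Bayes guess is the same for the last two sentences, so it cannot separate them. The induced tree separates them on one of the features in which they differ, which mirrors the Class 5 example discussed above.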