<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1032"> <Title>Automated Scoring Using A Hybrid Feature Identification Technique</Title> <Section position="2" start_page="0" end_page="206" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> This paper describes the development and evaluation of a prototype system designed for the purpose of automatically scoring essay responses. The paper reports on evaluation results from scoring 13 sets of essay data from</Paragraph> <Section position="1" start_page="0" end_page="206" type="sub_section"> <SectionTitle> the Analytical Writing Assessments of the Graduate Management Admissions Test </SectionTitle> <Paragraph position="0"> (GMAT) (see the GMAT Web site at http://www.gmat.org/ for sample questions) and 2 sets of essay data from the Test of Written English (TWE) (see http://www.toefl.org/ tstprpmt.html for sample TWE questions).</Paragraph> <Paragraph position="1"> Electronic Essay Rater (e-rater) was designed to automatically analyze essay features based on writing characteristics specified at each of six score points in the scoring guide used by human raters for manual scoring (also available at http://www.gmat.orff). The scoring guide indicates that an essay that stays on the topic of the question has a strong, coherent and well-organized argument structure, and displays a variety of word use and syntactic structure will receive a score at the higher end of the six-point scale (5 or 6). Lower scores are assigned to essays as these characteristics diminish.</Paragraph> <Paragraph position="2"> One of our main goals was to design a system that could score an essay based on features specified in the scoring guide for manual scoring. E-rater features include rhetorical structure, syntactic structure, and topical analysis. For each essay question, a stepwise linear regression analysis is run on a set of training data (human-scored essay responses) to extract a weighted set of predictive features for each test question.</Paragraph> <Paragraph position="3"> Final score prediction for cross-validation uses the weighted predictive feature set identified during training. Score prediction accuracy is determined by measuring agreement between human rater scores and e-rater score predictions. In accordance with human interrater &quot;agreement&quot; standards, human and e-rater scores also &quot;agree&quot; if there is an exact match or if the scores differ by no more than one point (adjacent agreement).</Paragraph> </Section> </Section> class="xml-element"></Paper>