File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/p05-1065_intro.xml
Size: 3,445 bytes
Last Modified: 2025-10-06 14:03:08
<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1065"> <Title>Reading Level Assessment Using Support Vector Machines and Statistical Language Models</Title> <Section position="2" start_page="0" end_page="523" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The U.S. educational system is faced with the challenging task of educating growing numbers of students for whom English is a second language (U.S. Dept. of Education, 2003).</Paragraph> <Paragraph position="1"> In the 2001-2002 school year, Washington state had 72,215 students (7.2% of all students) in state programs for Limited English Proficient (LEP) students (Bylsma et al., 2003). In the same year, one quarter of all public school students in California and one in seven students in Texas were classified as LEP (U.S. Dept. of Education, 2004). Reading is a critical part of language and educational development, but finding appropriate reading material for LEP students is often difficult. To meet the needs of their students, bilingual education instructors seek out high interest level texts at low reading levels, e.g. texts at a first or second grade reading level that support the fifth grade science curriculum. Teachers need to find material at a variety of levels, since students need different texts to read independently and with help from the teacher. Finding reading materials that fulfill these requirements is difficult and time-consuming, and teachers are often forced to rewrite texts themselves to suit the varied needs of their students.</Paragraph> <Paragraph position="2"> Natural language processing (NLP) technology is an ideal resource for automating the task of selecting appropriate reading material for bilingual students.</Paragraph> <Paragraph position="3"> Information retrieval systems successfully find topical materials and even answer complex queries in text databases and on the World Wide Web.
However, an effective automated way to assess the reading level of the retrieved text is still needed. In this work, we develop a method of reading level assessment that uses support vector machines (SVMs) to combine features from statistical language models (LMs), parse trees, and other traditional features used in reading level assessment.</Paragraph> <Paragraph position="4"> The results presented here on reading level assessment are part of a larger project to develop teacher-support tools for bilingual education instructors. The larger project will include a text simplification system, adapting paraphrasing and summarization techniques. Coupled with an information retrieval system, these tools will be used to select and simplify reading material in multiple languages for use by language learners. In addition to students in bilingual education, these tools will also be useful for those with reading-related learning disabilities and adult literacy students. In both of these situations, as in the bilingual education case, the student's reading level does not match his/her intellectual level and interests.</Paragraph> <Paragraph position="5"> The remainder of the paper is organized as follows. Section 2 describes related work on reading level assessment. Section 3 describes the corpora used in our work. In Section 4 we present our approach to the task, and Section 5 contains experimental results. Section 6 provides a summary and description of future work.</Paragraph> </Section> </Paper>