File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/w03-1204_intro.xml
Size: 2,058 bytes
Last Modified: 2025-10-06 14:02:02
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1204"> <Title>Evaluation of Features for Sentence Extraction on Different Types of Corpora</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Our ultimate goal is to create a robust summarization system that can handle different types of documents in a uniform way. To achieve this goal, we have developed a summarization system based on sentence extraction. We have participated in evaluation workshops on automatic summarization for both Japanese and English written corpora. We have also evaluated the performance of the sentence extraction system on Japanese lectures. At both workshops we obtained some of the top results, and for the speech corpus we obtained results comparable with those for the written corpora. These results suggest that the features we use are worth analyzing.</Paragraph> <Paragraph position="1"> Sentence extraction is one of the main methods a summarization system uses to reduce the size of a document. Edmundson (1969) proposed a method of integrating several features, such as the positions of sentences and the frequencies of words in an article, in order to extract sentences. He manually assigned parameter values to integrate the features when estimating the significance scores of sentences.</Paragraph> <Paragraph position="2"> On the other hand, machine learning methods can also be applied to integrate features. To learn sentence extraction from training data, Kupiec et al. (1995) and Aone et al. (1998) used Bayes' rule, Lin (1999) and Nomoto and Matsumoto (1997) generated decision trees, and Hirao et al. (2002) trained an SVM.</Paragraph> <Paragraph position="3"> In this paper, we not only show evaluation results for our sentence extraction system using combinations of features but also analyze the features for different types of corpora. 
The analysis gives us some indication about how to use these features and how to combine them.</Paragraph> </Section> </Paper>