<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1101">
<Title>Improving Summarization Performance by Sentence Compression - A Pilot Study</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>
1 Introduction
</SectionTitle>
<Paragraph position="0"> The majority of systems participating in the past Document Understanding Conference (DUC, 2002), a large-scale summarization evaluation effort sponsored by the United States government, and the Text Summarization Challenge (Fukusima and Okumura, 2001), sponsored by the Japanese government, are extraction-based. Extraction-based automatic text summarization systems extract parts of original documents and output the results as summaries (Chen et al., 2003; Edmundson, 1969; Goldstein et al., 1999; Hovy and Lin, 1999; Kupiec et al., 1995; Luhn, 1969). Other systems based on information extraction (McKeown et al., 2002; Radev and McKeown, 1998; White et al., 2001) and discourse analysis (Marcu, 1999; Strzalkowski et al., 1999) also exist, but they are not yet usable for general-domain summarization. Our study focuses on the effectiveness of applying sentence compression techniques to improve the performance of extraction-based automatic text summarization systems.</Paragraph>
<Paragraph position="1"> Sentence compression aims to retain the most salient information of a sentence, rewritten in a shorter form (Knight and Marcu, 2000). It can be used to deliver compressed content to portable devices (Buyukkokten et al., 2001; Corston-Oliver, 2001) or as a reading aid for aphasic readers (Carroll et al., 1998) or the blind (Grefenstette, 1998). Earlier research in sentence compression focused on compressing single sentences and was evaluated on a sentence-by-sentence basis. For example, Jing (2000) trained her system on a set of 500 sentences from the Benton Foundation (http://www.benton.org) and their reduced forms written by humans, and evaluated the results at the parse tree level against the reduced trees, while Knight and Marcu (2000) trained their system on a set of 1,067 sentences from Ziff-Davis magazine articles and evaluated their results on grammaticality and importance as rated by humans. Both reported success by their respective evaluation criteria. However, neither reported the effectiveness of their techniques in improving the overall performance of automatic text summarization systems. The goal of this pilot study is to answer this question and provide a guideline for future research.</Paragraph>
<Paragraph position="2"> Section 2 gives an overview of Knight and Marcu's sentence compression algorithm, which we used to compress summary sentences. Section 3 describes the multi-document summarization system, NeATS, which was used as our testbed. Section 4 introduces a recall-based unigram co-occurrence automatic evaluation metric. Section 5 presents the experimental design. Section 6 shows the empirical results. Section 7 concludes this paper and discusses future directions.</Paragraph>
</Section>
</Paper>
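<!--
Editor's sketch, kept outside the paper markup: a minimal illustration of a
recall-based unigram co-occurrence score of the kind referred to in Section 4
of the outline above. It measures the fraction of reference-summary unigrams
that also appear in a system summary. The function name, whitespace
tokenization, and lowercasing are illustrative assumptions, not details taken
from the paper.

    from collections import Counter

    def unigram_recall(candidate: str, reference: str) -> float:
        """Fraction of reference unigrams also found in the candidate,
        with per-token counts clipped to the reference counts."""
        cand = Counter(candidate.lower().split())
        ref = Counter(reference.lower().split())
        overlap = sum(min(n, cand[tok]) for tok, n in ref.items())
        return overlap / sum(ref.values()) if ref else 0.0

    # Example: score a compressed extract against a human reference summary.
    print(unigram_recall(
        "the senate passed the budget bill",
        "the senate passed a new budget bill on friday"))

Because the score is recall-oriented, shortening sentences only helps if the
compressed output still covers the reference content, which is the trade-off
the pilot study examines.
-->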