<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1009"> <Title>Hybrid Text Summarization: Combining External Relevance Measures with Structural Analysis</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In this paper, we present algorithms to address the shortcomings of both purely structural and purely statistical methods of sentence extraction summarization. We present the PALSUMM hybrid summarization algorithms, which use structural methods based on discourse parsing to construct a representation of the text, apply conventional statistical methods to identify salient information (see discussion and references in Marcu 2003), and then construct a partial discourse tree that includes the information identified as most salient along with the text at all nodes dominating that salient information. Optionally, sentence compression techniques are applied to the resulting summary to further reduce its length (Grefenstette, 1998; Knight and Marcu, 2002).</Paragraph> <Paragraph position="1"> The novelty of our approach lies in combining text structural methods with sentence extraction methods that evaluate relevance on the basis of external factors, such as lexical frequency or lexical field information in the specific document, in related documents, or in documents in general, or, alternatively, by matching lexical items in a query against lexical items in a document. The sentences selected by the external oracle are then provided with context for anaphora resolution and reference interpretation through the inclusion of hierarchically superordinate information from the structural tree.</Paragraph> </Section> </Paper>