File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/96/c96-2166_abstr.xml
Size: 1,539 bytes
Last Modified: 2025-10-06 13:48:40
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2166"> <Title>Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes a system for generating text abstracts which relies on a general, purely statistical principle, i.e., on the notion of &quot;relevance&quot;, as it is defined in terms of the combination of tf*idf weights of words in a sentence. The system generates abstracts from newspaper articles by selecting the &quot;most relevant&quot; sentences and combining them in text order. Since neither domain knowledge nor text-sort-specific heuristics are involved, this system provides maximal generality and flexibility.</Paragraph> <Paragraph position="1"> Also, it is fast and can be efficiently implemented for both on-line and off-line purposes. An experiment shows that recall and precision for the extracted sentences (taking the sentences extracted by human subjects as a baseline) are within the same range as recall/precision when the human subjects are compared amongst each other: this means in fact that the performance of the system is indistinguishable from the performance of a human abstractor. Finally, the system yields significantly better results than a default &quot;lead&quot; algorithm, which just chooses some initial sentences from the text.</Paragraph> </Section></Paper>
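The extraction principle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract only says that relevance is a "combination" of the tf*idf weights of the words in a sentence, so the plain sum used here is an assumption. The smoothed idf formula, the tokenizer, and the function names `sentence_scores`, `extract_abstract`, and `lead_baseline` are likewise hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def sentence_scores(sentences, corpus_docs):
    """Score each sentence by the sum of the tf*idf weights of its words.

    tf is counted over the whole article (all sentences together); idf comes
    from a background corpus. Summation and smoothing are assumptions.
    """
    n_docs = len(corpus_docs)
    df = Counter()
    for doc in corpus_docs:
        df.update(set(tokenize(doc)))  # document frequency: one count per doc

    tf = Counter(tok for s in sentences for tok in tokenize(s))

    def weight(word):
        # Smoothed idf: never negative, and defined for unseen words.
        return tf[word] * math.log((n_docs + 1) / (df[word] + 1))

    return [sum(weight(w) for w in tokenize(s)) for s in sentences]


def extract_abstract(sentences, corpus_docs, k):
    """Pick the k highest-scoring sentences and return them in text order."""
    scores = sentence_scores(sentences, corpus_docs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def lead_baseline(sentences, k):
    """The default "lead" method: take the first k sentences of the text."""
    return sentences[:k]


corpus = ["the cat sat on the mat", "the dog ate the bone", "the bird sang"]
article = ["The weather was fine.", "The reactor reactor failed.", "The cat sat."]
print(extract_abstract(article, corpus, 1))
print(lead_baseline(article, 1))
```

With this toy input the two methods pick different sentences, which is the contrast the abstract draws between relevance-based extraction and the lead baseline. Note that an unnormalized sum favors long sentences; whether and how the original system normalizes for length is not stated in the abstract.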