<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0510"> <Title>Summarization: an Application for NL Generation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In this paper, I explore techniques for automatically summarizing texts, concentrating on selecting the content of the summary from a parsed (semantic) representation of the original text. Summarization is a particularly nice application for natural language generation because the original text can serve as the knowledge base for generating the summary.</Paragraph> <Paragraph position="1"> In addition, we only need to develop a lexicon limited to the words and senses in the original text (as long as we use the same words in the same context as the original text). This simplifies the generation task somewhat.</Paragraph> <Paragraph position="2"> However, summarization is not a trivial task. We must first analyze the original text using a robust grammar that can produce a reliable semantic interpretation of the text. To simplify this investigation, I will not tackle the many problems of NL analysis, but will use already parsed texts from the TAG Tree Bank (UPenn, 1995). I use a Perl script to convert the syntactic structures in this parsed corpus into a list of logical forms (LFs) that roughly indicate the predicate-argument structure of each clause in the text. 1 We can generate a summary by choosing a subset of this list of LFs.</Paragraph> <Paragraph position="3"> However, choosing the right subset is not easy.</Paragraph> <Paragraph position="4"> The problem is how to judge which clauses are important: sophisticated discourse analysis is needed in order to interpret the intentional and rhetorical structure of the original text and then prune it in the appropriate ways.</Paragraph> <Paragraph position="5"> 1 A parser that directly produces the predicate-argument structure is probably preferable to this method.
Note that the parser probably would not have to resolve all syntactic ambiguities in the summarization task, because we can preserve the same ambiguities in the summary, or simply delete some of the problem phrases, such as PPs, from the summary.</Paragraph> <Paragraph position="6"> However, discourse analysis is a hard task that requires an immense amount of world knowledge (Sparck-Jones, 1993). I therefore investigate ways to generate a summary without full interpretation of the original text. I use Centering Theory to roughly segment the text, as described in the next section. Then, as described in section 3, a set of pruning rules based on centers and discourse relations is used to select the content of the summary. First, those segments that are about the most frequent centers of attention are selected, and then these segments are pruned by recognizing non-critical elaborations among the propositions. Another heuristic is to select restatements among the propositions for the summary, since restatement is a good indicator of important information. The proposed summarization heuristics are tested on a sample text in section 4; an implementation for testing these heuristics is in progress.</Paragraph> </Section> </Paper>