<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0813">
  <Title>Applying Natural Language Generation to Indicative Summarization</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Automatic summarization techniques have mostly neglected the indicative summary, which characterizes what the documents are about. This is in contrast to the informative summary, which serves as a surrogate for the document. Indicative multidocument summaries are an important way of helping a user discriminate between several documents returned by a search engine.</Paragraph>
    <Paragraph position="1"> Traditional summarization systems are primarily based on text extraction techniques. For an indicative summary, which typically describes the topics and structural features of the summarized documents, these approaches can produce summaries that are too specific. In this paper, we propose a natural language generation (NLG) model for the automatic creation of indicative multidocument summaries. Our model is based on the values of high-level document features, such as each document's distribution of topics and media types.</Paragraph>
    <Paragraph position="2"> We found 4 documents on Angina: Summary of the Disease: Angina. Get information on: [ variant angina | treatment? | diag ... ]</Paragraph>
    <Paragraph position="3"> Highlighted differences between the documents: This file (5 minute emergency medicine consult) is close in content to the extract.</Paragraph>
    <Paragraph position="4"> More information on additional topics which are not included in the extract is available in these files (The American Medical Association family medical guide and The Columbia University College of Physicians and Surgeons complete home medical guide). The topics include &quot;definition&quot; and &quot;what are the risks?&quot;</Paragraph>
    <Paragraph position="5"> The Merck manual of medical information contains extensive information on the topic.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Navigational Aids
</SectionTitle>
      <Paragraph position="0"> Extract: Treatment is designed to prevent or reduce ischemia and minimize symptoms. Angina that cannot be controlled by drugs ...</Paragraph>
      <Paragraph position="1"> [Figure 1: ... healthcare topic of &quot;Angina&quot;. The generated indicative summary in the bottom half categorizes documents by their difference in topic distribution.] Specifically, we focus on the problem of content planning in indicative multidocument summary generation. We address the problem of &quot;what to say&quot; in Section 2 by examining which document features are important for indicative summaries, starting from a single-document context and generalizing to a multidocument, query-based context. This yields two rules of thumb for guiding content calculation: 1) reporting differences from the norm and 2) reporting information relevant to the query.</Paragraph>
      <Paragraph position="2"> We have implemented these rules as part of the content planning module of our CENTRIFUSER summarization system. The summarizer's architecture follows the consensus NLG architecture (Reiter, 1994), including the stages of content calculation and content planning. We follow the generation of a sample indicative multidocument query-based summary, shown in the bottom half of Figure 1, focusing on these two stages in the remainder of the paper.</Paragraph>
    </Section>
  </Section>
</Paper>