<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0404">
  <Title>Revisions that Improve Cohesion in Multi-document Summaries: A Preliminary Study</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Background and previous work
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Theories on discourse structure
</SectionTitle>
      <Paragraph position="0"> Rhetorical Structure Theory (RST) [Mann &amp; Thompson, 1988] has contributed a great deal to the understanding of the discourse of written documents. RST characterizes the coherence of a text and is based on the assumption that the elementary textual units are non-overlapping text spans. The central concept of RST is the rhetorical relation, which indicates the relationship between two spans.</Paragraph>
      <Paragraph position="1"> RST can be used in sentence selection for single-document summarization [Marcu, 1997].</Paragraph>
      <Paragraph position="2"> However, it cannot be applied to multi-document summarization (MDS). In RST, text coherence is achieved because the writer intentionally establishes relationships between the phrases in the text. This is not the case in MDS, where sentences are extracted from different source articles written by various authors.</Paragraph>
      <Paragraph position="3"> Inspired by RST, [Radev, 2000] endeavored to establish a Cross-document Structure Theory (CST) that is more appropriate for MDS. CST focuses on the relationships between sentences that come from multiple documents, which differ substantially from those between sentences in the same text. Such relationships include identity, paraphrase and subsumption (one sentence contains more information than the other).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Computational models of text coherence
</SectionTitle>
      <Paragraph position="0"> Based on RST, [Marcu, 2000] established a Rhetorical Parser. The parser exploits cue phrases in an algorithm that discovers discourse relationships between phrases in a text. This parser can be used to extract sentences in single-document summarization. In contrast, [Harabagiu, 1999] concentrated on the derivation of a model that can establish coherence relations in a text without relying on cue phrases. She made use of large lexical databases, such as WordNet, and of path-finding algorithms that generate the cohesion structure of texts, represented by a lexical path.</Paragraph>
      <Paragraph position="1"> [Hovy, 1993] summarized previous work that focused on the automated planning and generation of multi-sentence texts using discourse relationships. Text generation is relevant to MDS, as we can view MDS as an attempt to generate a new text by reusing sentences from different sources. The systems discussed in [Hovy, 1993] relied on a knowledge base and a representation of discourse structure. The dependency of text generation on knowledge of discourse structure was emphasized.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Revision of single-document summaries
</SectionTitle>
      <Paragraph position="0"> [Mani et al, 1999] focused on the revision of single-document summaries in order to improve their informativeness. They noted that such revision might also fix 'coherence errors.' Three types of revision operators were identified: sentence compaction, sentence aggregation and sentence smoothing. In contrast, [Jing &amp; McKeown, 2000] concentrated on analyzing human-written summaries in order to determine how professionals construct summaries. They found that most sentences could be traced back to specific cut-and-paste operations applied to the source document. They identified six operations and used them to implement an automatic revision module.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Temporal ordering of events
</SectionTitle>
      <Paragraph position="0"> [Filatova &amp; Hovy, 2001] addressed the issue of resolving temporal references in news stories. Although events in articles are not always presented in chronological order, readers must be able to reconstruct the timeline of events in order to comprehend the story. They endeavored to develop a module that could automatically assign a time stamp to each clause in a document. Using a syntactic parser, they identified patterns indicating which syntactic phrases tend to signal the occurrence of a new event. In MDS, the correct temporal relationships between events described in the extracted sentences often need to be reestablished, since the relationships implied by the extracted sentences may be incorrect or unclear.</Paragraph>
      <Paragraph position="1"> [Barzilay et al, 2001] evaluated three algorithms for sentence ordering in multi-document summaries. One algorithm implemented was the Chronological Ordering algorithm. However, the resulting summaries often suffered from abrupt changes in topic. After conducting an experiment in which they studied how humans manually ordered sentences in a summary, they concluded that topically related sentences should be grouped together. The Chronological Ordering algorithm was augmented by introducing a cohesion constraint.</Paragraph>
      <Paragraph position="2"> The evaluation of the output summaries demonstrated a significant improvement in quality.</Paragraph>
    </Section>
  </Section>
</Paper>