<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1205">
  <Title>An evolutionary approach for improving the quality of automatic summaries</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> It is generally accepted that there are two main approaches for producing automatic summaries.</Paragraph>
    <Paragraph position="1"> The first one is called extract and rearrange because it extracts the most important sentences from a text and tries to arrange them in a coherent way. These methods were introduced in the late 50s (Luhn, 1958) and similar methods are still widely used.</Paragraph>
    <Paragraph position="2"> The second approach attempts to understand the text and, then, generates its abstract, for this reason it is referred to as understand and generate. The best-known method that uses such an approach is described in (DeJong, 1982). Given that the methods which &amp;quot;understand&amp;quot; a text are domain dependent, whenever robust methods are required, extraction methods are preferred.</Paragraph>
    <Paragraph position="3"> Even though the extraction methods currently used are more advanced than the one proposed in (Luhn, 1958), many still produce summaries which are not very coherent, making their reading difficult. This paper presents a novel summarisation approach which tries to improve the quality of the produced summaries by ameliorating their local cohesion.</Paragraph>
    <Paragraph position="4"> This paper is structured as follows: In Section 2 we present our hypothesis: it is possible to produce better summaries by enforcing the continuity principle (see next section for a definition of this principle) . A corpus of scientific abstracts is analysed in Section 3 to learn whether this principle holds in human produced summaries.</Paragraph>
    <Paragraph position="5"> In Section 4, we present two algorithms which combine traditional techniques with information provided by the continuity principle. Several criteria are used to evaluate these algorithms on scientific articles in Section 5. We finish with concluding remarks, which also indicate possible future research avenues.</Paragraph>
    <Paragraph position="6"> 2 How to ensure local cohesion In the previous section we already mentioned that we are trying to improve the automatic summaries by using the continuity principle defined in Centering Theory (CT) (Grosz et al., 1995). This principle, requires that two consecutive utterances have at least one entity in common. Even though it sounds very simple, this principle is important for the rest of the principles defined in the CT because if it does not hold, none of the other principles can be satisfied. Given that only the continuity principle will be used in this paper and due to space limits, the rest of these principles are not discussed here. Their description can be found in (Kibble and Power, 2000). For the same reason we will not go into details about the CT.</Paragraph>
    <Paragraph position="7"> In this paper, we take an approach similar to (Karamanis and Manurung, 2002) and try to produce summaries which do not violate the continuity principle. In this way, we hope to produce summaries which contain sequences of sentences that refer the same entity, and therefore will be more coherent. Before we can test if the principle is satisfied, it is necessary to define certain parameters on which the principle relies. As aforementioned, the principle is tested on pairs of consecutive utterances. In general utterances are clauses or sentences. Given that the automatic identification of clauses is not very accurate, we consider sentences as utterances. An advantage of using sentences is that most summarisation methods extract sentences, which makes it easier to integrate them with our method.</Paragraph>
    <Paragraph position="8"> In this paper, we consider that two utterances have an entity in common if the same head noun phrase appears in both utterances. In order to determine the head of noun phrases we use the FDG tagger (Tapanainen and J&amp;quot;arvinen, 1997) which also provides partial dependency relations between the constituents of a sentence. At this stage we do not employ any other method to determine whether two noun phrases are semantically related.</Paragraph>
  </Section>
</Paper>