<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-0806">
  <Title>An algorithm for efficiently generating summary paragraphs using tree-adjoining grammar</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Summarisation of simply structured data as short natural language paragraphs has recently been a focus of interest. Shaw (1998) and Bental et al. (1999) looked at generating text from database records, and Robin and McKeown (1996) summarised quantitative data. Shaw's examples were drawn from patient medical records; Bental et al.'s from online resource cataloguing information. A requirement common to all these studies has been to produce aggregated text (Reape and Mellish, 1999).</Paragraph>
    <Paragraph position="1"> In all these studies, moreover, the input data had a fairly flat structure. In particular, in (Shaw, 1998) and (Bental et al., 1999) each record is associated with a particular entity (e.g. a patient or an online resource) and is essentially a list of attribute-value pairs. We refer to these pairs as fields, and to their attributes as field names. The field name specifies the relationship between a value and the entity with which it is associated. Most field names represent &quot;is a&quot; or &quot;has a&quot; relationships, and hence most values represent facts about the entity. Slightly more complex structures may also be coerced into this form, but we focus on this simple case. [Footnote: This work is supported by EPSRC grant GR/M23106.]</Paragraph>
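As a minimal sketch of the flat record structure described above, the Python below models a record as an entity plus a list of fields (attribute-value pairs). The class name Record, the helper naive_sentences, and the example field names are illustrative assumptions, not part of the original system. Rendering one sentence per field, with no aggregation, shows the repetitive output that aggregation is meant to avoid.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A flat record: one entity plus a list of (field name, value) pairs.

    Hypothetical illustration only; the names are not from the original system.
    """
    entity: str
    fields: list = field(default_factory=list)

    def value_of(self, name):
        # The field name specifies how the value relates to the entity.
        for field_name, value in self.fields:
            if field_name == name:
                return value
        raise KeyError(name)


def naive_sentences(record):
    # One unaggregated sentence per field: correct, but repetitive.
    return [f"The {record.entity}'s {name} is {value}." for name, value in record.fields]


resource = Record("resource", [("subject", "mathematics"), ("language", "English"), ("format", "HTML")])
for sentence in naive_sentences(resource):
    print(sentence)
```

An aggregated rendering would instead combine these facts into a single sentence about the resource.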
    <Paragraph position="2"> For our application (summarising data about educational resources), we additionally require that any subset of fields from a given record can be summarised, and that the summary includes every member of that subset. The challenge posed by this sort of summarisation is to devise a system that satisfies two potentially incompatible constraints. First, it must be flexible enough to model, for any combination of fields, the optimally aggregated paragraph that expresses them. Second, despite the very large search space that such flexibility likely implies, it must be capable of finding that paragraph in a reasonable time.</Paragraph>
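To give a rough sense of how quickly this flexibility grows, the sketch below counts, for a record with n fields, the non-empty subsets that might be requested (2^n - 1) and, as a crude proxy for linearisation choices, every such subset together with every ordering of its fields. This counting is an illustrative assumption only; the search space considered in this paper is defined over compositions of TAG elementary trees and is not computed here.

```python
from math import comb, factorial


def subset_count(n):
    # Number of non-empty subsets of n fields.
    return 2 ** n - 1


def ordered_selections(n):
    # Sum over subset sizes k of C(n, k) * k!: every non-empty subset
    # together with every ordering of its fields (a crude proxy only).
    return sum(comb(n, k) * factorial(k) for k in range(1, n + 1))


for n in (5, 10, 15):
    print(n, subset_count(n), ordered_selections(n))
```

Even for ten fields this proxy already exceeds nine million combinations, before any choice of syntactic structure is considered.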
    <Paragraph position="3"> The contribution of the present work is a set of algorithms that prune this search space.</Paragraph>
    <Paragraph position="4"> This space is defined in terms of compositions of elementary trees of a tree-adjoining grammar (TAG) (Joshi, 1986). The first algorithm transforms a TAG into a lexicalised version of itself that has better computational properties for summarising a record. The second removes those parts of a TAG that are redundant for summarising a particular record. The third identifies an explicit mapping from each field to its possible surface realisations, and hence allows a desirable surface form to be chosen for each field. Our partial implementation of these algorithms has produced some promising results.</Paragraph>
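The pruning and mapping ideas described above can be illustrated, very loosely, on a toy "grammar" in which each elementary tree is reduced to a name, a surface template, and the set of field names it expresses. Everything here (the tree names, the templates, the functions prune and realisations_by_field, and the rule that a tree is redundant when it mentions a field absent from the record) is a hypothetical simplification for exposition; it is not the TAG formalism or the algorithms of this paper.

```python
from collections import defaultdict

# Hypothetical toy grammar: tree name -> (surface template, fields it expresses).
TOY_TREES = {
    "subject_np": ("a {subject} resource", {"subject"}),
    "language_adj": ("{language}-language", {"language"}),
    "subject_language_s": ("It covers {subject} in {language}.", {"subject", "language"}),
    "price_s": ("It costs {price}.", {"price"}),
}


def prune(trees, record_fields):
    # Keep only trees whose fields all occur in the record; the others
    # can never be used when summarising this record.
    return {
        name: (template, needed)
        for name, (template, needed) in trees.items()
        if needed.issubset(record_fields)
    }


def realisations_by_field(trees):
    # Invert the grammar: field name -> names of trees that can express it.
    mapping = defaultdict(list)
    for name, (_template, needed) in trees.items():
        for field_name in sorted(needed):
            mapping[field_name].append(name)
    return dict(mapping)


record = {"subject": "mathematics", "language": "English"}
kept = prune(TOY_TREES, set(record))
print(sorted(kept))
print(realisations_by_field(kept))
# Fill one surviving template with the record's values.
print(kept["subject_language_s"][0].format(**record))
```

In this toy, the tree for price is discarded because the record has no price field, and each remaining field maps to the candidate trees from which a surface form could be chosen.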
    <Paragraph position="5"> The rest of the paper is organised as follows. In section 2 we discuss the problem, our approach to modelling it, and characteristics of the search space implied by our approach. In section 3 we present our algorithms for searching this space.</Paragraph>
    <Paragraph position="6"> In section 4 we summarise and discuss.</Paragraph>
  </Section>
</Paper>