<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0501">
  <Title>Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> Other researchers have investigated the automatic generation of abstracts, but with a different focus, e.g., sentence extraction (Edmundson, 1969; Johnson et al., 1993; Kupiec et al., 1995; Mann et al., 1992; Teufel and Moens, 1997; Zechner, 1995), processing of structured templates (Paice and Jones, 1993), sentence compression (Hori et al., 2002; Knight and Marcu, 2001; Grefenstette, 1998; Luhn, 1958), and generation of abstracts from multiple sources (Radev and McKeown, 1998). We focus instead on the construction of headline-style abstracts from a single story.</Paragraph>
    <Paragraph position="1"> Headline generation can be viewed as analogous to statistical machine translation, where a concise document is generated from a verbose one using a Noisy Channel Model and Viterbi search to select the most likely summary. This approach has been explored by Zajic et al. (2002) and Banko et al. (2000).</Paragraph>
    <Paragraph position="2"> The approach we use in Hedge is most similar to that of Knight and Marcu (2001), where a single sentence is shortened using statistical compression. As in their work, we select headline words from story words in the order in which they appear in the story, in particular from the first sentence of the story. However, we use linguistically motivated heuristics to shorten the sentence; there is no statistical model, so we do not require prior training on a large corpus of story/headline pairs.</Paragraph>
    <Paragraph position="3"> Linguistically motivated heuristics have been used by McKeown et al. (2002) to identify constituents of parse trees that can be removed without affecting grammaticality or correctness. GLEANS (Daumé et al., 2002) uses parsing and named entity tagging to fill values in headline templates.</Paragraph>
    <Paragraph position="4"> Consider the following excerpt from a news story: (1) Story Words: Kurdish guerilla forces moving with lightning speed poured into Kirkuk today immediately after Iraqi troops, fleeing relentless U.S. airstrikes, abandoned the hub of Iraq's rich northern oil fields.</Paragraph>
    <Paragraph position="5"> Generated Headline: Kurdish guerilla forces poured into Kirkuk after Iraqi troops abandoned oil fields.</Paragraph>
    <Paragraph position="6"> In this case, the words in bold form a fluent and accurate headline for the story. Italicized words are deleted based on information provided in a parse-tree representation of the sentence.</Paragraph>
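    <Paragraph position="7"> As an illustration only, the selection constraint in example (1) can be checked mechanically: the generated headline should consist of story words taken in story order. The sketch below is not part of the Hedge system. The function names tokenize, is_ordered_selection, and deleted_words, and the whitespace-and-punctuation tokenizer, are assumptions made for this example. The system itself decides which words to delete from a parse tree, which this sketch does not model.

```python
PUNCTUATION = ".,;:!?"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (a simplification)."""
    stripped = (word.strip(PUNCTUATION) for word in text.split())
    return [word for word in stripped if word]


def is_ordered_selection(story, headline):
    """Return True if the headline words occur in the story in the same order."""
    remaining = iter(tokenize(story))
    # `word in remaining` consumes the iterator up to the first match, so each
    # headline word must be found after the previous one.
    return all(word in remaining for word in tokenize(headline))


def deleted_words(story, headline):
    """List the story words left out of the headline, using leftmost matching."""
    headline_tokens = tokenize(headline)
    matched = 0
    deleted = []
    for word in tokenize(story):
        if matched != len(headline_tokens) and word == headline_tokens[matched]:
            matched += 1
        else:
            deleted.append(word)
    return deleted


# Story words and generated headline copied from example (1).
STORY = (
    "Kurdish guerilla forces moving with lightning speed poured into Kirkuk "
    "today immediately after Iraqi troops, fleeing relentless U.S. airstrikes, "
    "abandoned the hub of Iraq's rich northern oil fields."
)
HEADLINE = (
    "Kurdish guerilla forces poured into Kirkuk after Iraqi troops "
    "abandoned oil fields."
)

if __name__ == "__main__":
    print(is_ordered_selection(STORY, HEADLINE))
    print(deleted_words(STORY, HEADLINE))
```

    Applied to example (1), the check succeeds, and the deleted list begins with "moving with lightning speed". Because the tokenizer strips only surrounding punctuation, "U.S." is reported as "U.S"; a real tokenizer would be needed for anything beyond this example.</Paragraph>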
  </Section>
</Paper>