File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/00/p00-1041_relat.xml
Size: 2,909 bytes
Last Modified: 2025-10-06 14:15:34
<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1041">
<Title>Headline Generation Based on Statistical Translation</Title>
<Section position="3" start_page="0" end_page="0" type="relat">
<SectionTitle>2 Related Work</SectionTitle>
<Paragraph position="0">Most previous work on summarization has focused on extractive methods, investigating issues such as cue phrases (Luhn, 1958), positional indicators (Edmundson, 1964), lexical occurrence statistics (Mathis et al., 1973), probabilistic measures for token salience (Salton et al., 1997), and the use of implicit discourse structure (Marcu, 1997). Work combining an information extraction phase with a subsequent generation phase has also been reported: for instance, the FRUMP system (DeJong, 1982) used templates for both information extraction and presentation. More recently, summarizers using sophisticated post-extraction strategies, such as revision (McKeown et al., 1999; Jing and McKeown, 1999; Mani et al., 1999), and sophisticated grammar-based generation (Radev and McKeown, 1998) have also been presented.</Paragraph>
<!-- Figure: candidate headlines of lengths 1 through 9 for a sample story, each with a log-probability score and beam position. -->
<Paragraph position="1">The work reported in this paper is most closely related to work on statistical machine translation, particularly the 'IBM-style' work on CANDIDE (Brown et al., 1993). This approach was based on a statistical translation model that mapped between sets of words in a source language and sets of words in a target language, while using an ordering model to constrain the possible token sequences in the target language based on their likelihood. In a similar vein, a summarizer can be considered to be 'translating' between two languages: one verbose and the other succinct (Berger and Lafferty, 1999; Witbrock and Mittal, 1999). However, by definition, the translation performed during summarization is lossy, and consequently somewhat easier to design and experiment with. As we will discuss in this paper, we built several models of varying complexity;1 even the simplest one did reasonably well at summarization, whereas it would have been severely deficient at (traditional) translation.</Paragraph>
<Paragraph position="2">1 We have very recently become aware of related work that builds upon more complex, structured models - syntax trees - to compress single sentences (Knight and Marcu, 2000); our work differs from that work in (i) the level of compression possible (much more) and (ii) the accuracy possible (less).</Paragraph>
</Section>
</Paper>