<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-3007">
<Title>Investigations on Event-Based Summarization</Title>
<Section position="4" start_page="37" end_page="37" type="relat">
<SectionTitle>2 Related Work</SectionTitle>
<Paragraph position="0">Term-based extractive summarization dates back to Luhn (1958) and Edmundson (1969).</Paragraph>
<Paragraph position="1">This approach is simple yet widely applicable. It represents the content of documents mainly as a bag of words. Luhn (1958) establishes a set of &quot;significant&quot; words whose frequencies fall between an upper bound and a lower bound. Edmundson (1969) collects common words, cue words, and title/heading words from documents. Weight scores of sentences are computed based on the type and frequency of terms, and sentences with higher scores are included in the summary. Later researchers adopt the tf*idf score to discriminate words (Brandow et al., 1995; Radev et al., 2004). Other surface features, such as sentence position and sentence length, are also exploited to extract important sentences (Teufel and Moens, 1999; Radev et al., 2004). To make the extraction model suitable for documents in different domains, machine learning approaches have recently been widely employed (Kupiec et al., 1995; Conroy and Schlesinger, 2004).</Paragraph>
<Paragraph position="2">To represent the deeper meaning of documents, other researchers have investigated different structures. Barzilay and Elhadad (1997) segment the original text and construct lexical chains.</Paragraph>
<Paragraph position="3">They employ strong chains to represent the important parts of documents. Marcu (1997) describes a rhetorical parsing approach that takes unrestricted text as input and derives a rhetorical structure tree; documents are thus expressed as structure trees. DeJong (1978) adopts predefined templates to express documents: for each topic, the user predefines frames of expected information types, together with recognition criteria. However, these approaches achieve only moderate results.</Paragraph>
<Paragraph position="4">Recently, events have received attention as a means of representing documents. Filatova and Hatzivassiloglou (2004) define an event as an action (verbs/action nouns) together with named entities. After identifying actions and event entities, they adopt a frequency weighting scheme to identify important sentences.</Paragraph>
<Paragraph position="5">Vanderwende et al. (2004) represent events by dependency triples. After analyzing the triples, they connect nodes (words or phrases) by semantic relationships. Yoshioka and Haraguchi (2004) adopt a similar approach to build a map, but they regard sentences as the nodes of the map.</Paragraph>
<Paragraph position="6">After constructing a map representation of documents, Vanderwende et al. (2004) and Yoshioka and Haraguchi (2004) both employ the PageRank algorithm to select important sentences. Although these approaches employ event representations and the PageRank algorithm, it should be noted that our event representation differs from theirs: it is based on named entities and event terms, without the help of dependency parsing. These previous event-based approaches achieved promising results.</Paragraph>
</Section>
</Paper>