<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1011"> <Title>GIST-IT: Summarizing Email Using Linguistic Knowledge and Machine Learning Evelyne Tzoukermann</Title> <Section position="2" start_page="0" end_page="0" type="relat"> <SectionTitle> 1 Related work </SectionTitle> <Paragraph position="0"> Traditionally, a document summary is seen as a short, coherent piece of prose that conveys the important meaning of the text to the user. Within this framework, most research has focused on extractive summaries at the sentence level. However, as discussed in [Boguraev and Kennedy (1999)], the meaning of 'summary' should be adjusted depending on the information management task for which it is used. Key phrases, for example, can be seen as semantic metadata that summarize and characterize documents [Witten et al (1999), Turney (1999)]. These approaches select a set of candidate phrases (sequences of one, two or three consecutive stemmed, non-stop words) and then apply machine learning techniques to classify them as key phrases or not. But dealing only with n-grams does not always provide good output in terms of a summary (see discussion in Section 5.4).</Paragraph> <Paragraph position="1"> Wacholder (1998) proposes a linguistically motivated method for representing document aboutness: 'head clustering'. A list of simple noun phrases is first extracted, clustered by head and then ranked by the frequency of the head. Klavans et al (2000) report on the evaluation of the 'usefulness' of head clustering in the context of browsing applications, in terms of quality and coverage.</Paragraph> <Paragraph position="2"> Other researchers have used noun phrases quite successfully for information retrieval tasks [Strzalkowski et al (1999), Sparck-Jones (1999)]. Strzalkowski et al (1999) use head + modifier pairs as part of a larger system which constitutes the &quot;stream model&quot; that is used for information retrieval. 
They treat the head-modifier relationship as an &quot;ordered relation between otherwise equal elements&quot;, emphasizing that for some tasks the syntactic head of the NP is not necessarily a semantic head and the modifier is not necessarily a semantic modifier, and that the opposite is often true. Using a machine learning approach, we proved this hypothesis for the task of gisting.</Paragraph> <Paragraph position="3"> Berger and Mittal (2000) present a summarization system named OCELOT, based on probabilistic models, which provides the gist of web documents. Like email messages, web documents are very heterogeneous, and their unstructured nature poses comparable difficulties.</Paragraph> <Paragraph position="4"> In this paper, we propose a novel technique for summarization that combines the linguistic approach of extracting simple noun phrases as candidates for document extracts with the use of machine learning algorithms to automatically select the most salient ones.</Paragraph> </Section></Paper>