<?xml version="1.0" standalone="yes"?> <Paper uid="W01-0719"> <Title>Combining Linguistic and Machine Learning Techniques for Email Summarization</Title> <Section position="9" start_page="0" end_page="0" type="relat"> <SectionTitle> 5 Related Work </SectionTitle> <Paragraph position="0"> Machine learning has been successfully applied to different natural language tasks, including text summarization. A document summary is seen as succinct and coherent prose that captures the meaning of the text. Prior work in document summarization has been mostly based on sentence extraction. Kupiec et al. (1995) use machine learning to extract the most important sentences of a document. But extractive summarization relies on properties of the source text that emails typically do not have: coherence, grammaticality, and well-defined structure. Berger and Mittal (2000) present a summarization system named OCELOT that provides the gist of web documents based on probabilistic models. Their approach is closely related to statistical machine translation.</Paragraph> <Paragraph position="1"> As discussed in (Boguraev and Kennedy, 1999), the meaning of &quot;summary&quot; should be adjusted depending on the information management task for which it is used. Key phrases, for example, can be seen as semantic metadata that summarize and characterize documents (Witten et al., 1999; Turney, 2000). These approaches select a set of candidate phrases (bigrams or trigrams) and then apply Naïve Bayes learning to classify them as key phrases or not. But dealing only with n-grams does not always provide good output in terms of a summary. In (Boguraev and Kennedy, 1999) the &quot;gist&quot; of a document is seen as a sequence of salient objects, usually topical noun phrases, presented in a highlighted context. Their approach is similar to extracting technical terms (Justeson and Katz, 1995).
Noun phrases are also used in IR tasks (Strzalkowski et al., 1999; Smeaton, 1999; Sparck Jones, 1999).</Paragraph> <Paragraph position="2"> The work of Strzalkowski et al. (1999) supports our hypothesis that for some NLP tasks (gisting, IR) the head+modifier relation of a noun phrase is in fact an ordered relation between semantically equally important elements.</Paragraph> </Section> </Paper>