<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0506"> <Title>A Study for Documents Summarization based on Personal Annotation</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Related work </SectionTitle> <Paragraph position="0"> To summarize is to reduce the complexity, and hence the length, of documents while retaining some of the important information in the originals. Titles, keywords, tables of contents, and abstracts might all be considered forms of summary; here, we consider a summary to be a set of sentences containing some of the essential information from the original document.</Paragraph> <Paragraph position="1"> Many approaches to text summarization have been proposed, such as the word-frequency-based method (Luhn, 1958), the cue phrase method (Edmundson, 1969), and position-based methods (Edmundson, 1969; Hovy and Lin, 1997; Teufel and Moens, 1997).</Paragraph> <Paragraph position="2"> At the same time, some machine learning methods have been used to integrate different clues in documents.</Paragraph> <Paragraph position="3"> Given a corpus and its predefined summaries as a training set, such a method identifies the relationships between documents and their summaries; the sentences that satisfy the learned rules are the ones to be extracted (Kupiec et al., 1995). 
Other machine learning methods cluster sentences based on a set of extracted sentence features, choose a representative sentence from each cluster, and combine these sentences into a summary according to their original order in the text (Nomoto and Matsumoto, 2001).</Paragraph> <Paragraph position="4"> Most of the above techniques have the limitations mentioned at the beginning: they fail to supply a personalized summary that reflects the interests and preferences of different users.</Paragraph> <Paragraph position="5"> There has been some work based on annotation (Golovchinsky et al., 1999; Price et al., 1998), but it mainly focuses on supplying an authoring tool that gives instructions on how to annotate; on annotation identification and extraction, which is difficult since annotations may be made freely and randomly; or on annotation-based querying, which aims at query expansion based on annotations. Annotation has rarely been considered for summarization. In fact, we found only one work on summarization based on annotation (Nagao and Hasida, 1998). There, annotations are defined on a complex set of GDA (Global Document Annotation) tags, an XML-based tag set that allows machines to automatically infer the underlying structure of documents by parsing. Authors of WWW files can annotate their documents with those tags, but how the annotations affect summarization is not studied further.</Paragraph> <Paragraph position="6"> Since annotations reflect users' opinions, different users may produce different annotations; thus summaries based on annotations are tailored to users' interests to some extent. We therefore integrate annotations into our summarization framework, which is expected to supply personalized summaries for given users, in contrast to a traditional uniform summary. 
Here we assume that what is annotated is interesting or important compared with other parts of the document, which is reasonable since this is a common view of why users make annotations.</Paragraph> <Paragraph position="7"> In this paper, we mainly focus on annotations that are parts of the text, in order to avoid complex manuscript recognition. Since we mainly consider the effect of annotations on summarization performance, annotations make sense only when they are treated as keywords. However, past experiments show that the keyword method performs worse when combined with other methods (Edmundson, 1969; Teufel and Moens, 1997), so the main approach we use here is based on keyword frequency.</Paragraph> </Section> </Paper>