<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1018">
  <Title>Chinese Text Summarization Based on Thematic Area Detection</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> With the approaching information explosion, people are beginning to feel overwhelmed by the mass of information. Because the effectiveness of existing information retrieval technology is still unsatisfactory, efficiently finding the information most relevant to customers' needs has become a problem. Summarizing the retrieval results lets customers easily accept or reject the retrieved information without needing to look at the original retrieval results. This paper proposes a new summarization method, in which the K-medoid clustering method is applied to detect all possible partitions of thematic areas, and a novel clustering analysis method, based on a self-defined objective function, is applied to automatically determine K, the number of latent thematic areas in a document. This method consists of three main stages: 1) Find the thematic areas in the document by adopting the K-medoid clustering method (Kaufmann and Rousseeuw, 1987) as well as a novel clustering analysis method. 2) From each thematic area, select the sentence that has the maximum semantic similarity value with this area as its representative. 3) Output the selected sentences to form the final summary according to their positions in the original document. To validate the effectiveness of the proposed method, we apply both this method and the traditional non-thematic-areas-detection method to our experimental samples to generate two groups of summaries, and then compare them.</Paragraph>
    <Paragraph position="1"> The final results show a clear superiority of our method over the traditional one in the scores of the evaluation parameters.</Paragraph>
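The three stages described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: sentences are represented as bag-of-words counts over whitespace tokens (Chinese text would first need word segmentation), cosine similarity stands in for the semantic similarity measure, the number of thematic areas k is supplied by the caller rather than determined by the self-defined objective function, and the helper names cosine, k_medoids and summarize are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign(sim, medoids):
    """Label each sentence with the index of its most similar medoid."""
    return [max(range(len(medoids)), key=lambda c: sim[i][medoids[c]])
            for i in range(len(sim))]


def k_medoids(sim, k, max_iter=100):
    """Alternating k-medoids on a similarity matrix (assumed variant)."""
    n = len(sim)
    medoids = [i * n // k for i in range(k)]  # deterministic, evenly spaced seeds
    for _ in range(max_iter):
        labels = assign(sim, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty area
                continue
            # The medoid is the member with the highest total similarity
            # to the other sentences of its thematic area.
            new_medoids.append(
                max(members, key=lambda m: sum(sim[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(sim, medoids)


def summarize(sentences, k):
    """Stage 1: cluster; stage 2: pick representatives; stage 3: restore order."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    medoids, _ = k_medoids(sim, k)
    # Each medoid serves as the representative of its area; sorting the
    # indices outputs the sentences in their original document order.
    return [sentences[i] for i in sorted(set(medoids))]


if __name__ == "__main__":
    doc = [
        "cats purr and cats sleep",
        "cats sleep all day",
        "stocks fell as markets dropped",
        "markets dropped and stocks fell sharply",
    ]
    for sentence in summarize(doc, k=2):
        print(sentence)
```

In this sketch the medoid of each cluster doubles as the representative sentence of stage 2, since it is by construction the member most similar to the rest of its area. The paper's own similarity measure and its automatic choice of K may differ from what is shown here.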
    <Paragraph position="2"> The remainder of this paper is organized as follows. In the next section, we review related methods that are commonly discussed in the automatic summarization literature. Section 3 describes our method in detail. The evaluation methodology and experimental results are presented in Section 4. Finally, we conclude with a discussion and future work.</Paragraph>
  </Section>
</Paper>