<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1018"> <Title>Chinese Text Summarization Based on Thematic Area Detection</Title> <Section position="6" start_page="4" end_page="4" type="evalu"> <SectionTitle> 4 Experimental Results and Performance </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.1 Evaluation Methodology </SectionTitle> <Paragraph position="0"> Objectively evaluating the quality of different automatic summarization methods is challenging. Evaluation methods can be broadly classified into two categories: intrinsic and extrinsic (Mani, 2001). We adopt intrinsic evaluation and define the following parameters to assess summary quality.</Paragraph> <Paragraph position="1"> 1) Theme coverage (TC): TC is the percentage of the thematic content covered by the sentences selected for the summary. Its value is determined manually by human experts.</Paragraph> <Paragraph position="2"> 2) Representation entropy (RE): To evaluate the redundancy of the produced summary effectively and objectively, we adapt the representation entropy that Mitra et al. (2002) originally proposed for measuring feature redundancy in feature selection, turning it into a new parameter that measures summary redundancy.</Paragraph> <Paragraph position="3"> For this purpose, the following notation is defined: N: number of terms in the original document; The new evaluation parameter we propose combines the two parameters above to evaluate the quality of the produced summary objectively. 
The larger the value of F, the better the quality of the produced summary.</Paragraph> </Section> <Section position="2" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.2 Experimental Results </SectionTitle> <Paragraph position="0"> We randomly extracted 200 documents of different genres from the Modern Chinese Corpus of the State Language Commission to form the experimental corpus. Because summarizing short documents makes little sense in real applications (Gong and Liu, 2001), we selected from this corpus 30 documents longer than 400 characters as samples. Each sample was summarized both by the proposed method (Method 1) and by the traditional method without thematic area detection (Method 2). Method 2 assigns a weight to each sentence of a document, sorts the sentences in decreasing order of weight, and selects the top-ranked sentences. The detailed experimental data and the evaluation results for each parameter are given in Table 3 and Table 4.</Paragraph> <Paragraph position="1"> The overall evaluation of the 30 samples shows that, under the evaluation parameters above, our method outperforms the traditional summarization method without thematic area detection on text documents of different genres with free style and flexible theme distribution. The results achieved are encouraging.</Paragraph> </Section> </Section> </Paper>