<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1203"> <Title>Combining Optimal Clustering and Hidden Markov Models for Extractive Summarization</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 6 Experiments </SectionTitle>
<Paragraph position="0"> Many researchers have proposed various evaluation methods for summarization. We find the extrinsic, task-oriented evaluation method to be the most easily automated and quantifiable (Radev 2000).</Paragraph>
<Paragraph position="1"> Among other possible tasks, we choose to evaluate our stochastic theme classification system (STCS) on a multi-document summarization task. We use a content-based method to evaluate the summaries extracted by our system, comparing them to those produced by another extraction-based system, MEAD (Radev 2002), and against a baseline system that chooses the top N sentences in each document as salient sentences. All three systems are unsupervised.</Paragraph>
<Paragraph position="2"> The evaluation corpus we use is a segment of the English portion of the HKSAR news corpus from the LDC, consisting of 215 articles. We first use MEAD to extract summaries at compression ratios from 1% to 20%.</Paragraph>
<Paragraph position="3"> We then use our system to extract the same number of salient sentences as MEAD, according to the sentence weights. The baseline system also extracts the same amount of data as the other two systems. We plot the cosine similarities of the original 215 documents with each individual extracted summary from these three systems. The following figure shows a plot of cosine similarity scores against compression ratio for each extracted summary. In terms of relative similarity score, our system is 22.8% higher on average than MEAD, and 46.3% higher on average than the baseline.
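The content-based evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it uses plain term-frequency vectors over whitespace tokens and the standard cosine-similarity formula, and the document and summary strings are hypothetical stand-ins for the HKSAR articles and extracted summaries.

```python
# Minimal sketch of content-based summary evaluation: cosine similarity
# between a document's term-frequency vector and that of its extract.
# Tokenization and weighting are deliberately simplified for illustration.
from collections import Counter
import math

def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased whitespace tokens."""
    return Counter(text.lower().split())

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical document and two candidate extracts.
document = "the market rose sharply as investors reacted to the new policy"
summary_a = "the market rose sharply"   # content-bearing extract
summary_b = "as to the"                 # low-content extract

score_a = cosine_similarity(term_vector(document), term_vector(summary_a))
score_b = cosine_similarity(term_vector(document), term_vector(summary_b))
```

Under this measure, an extract that retains more of the document's content terms receives a higher score, which is the basis of the comparison among the three systems.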
[Figure: cosine similarity scores against compression ratio for the three systems. Our system outperforms the summarizer (MEAD) by 22.8% on average, and outperforms the baseline top-N sentence selection system by 46.3% on average.]</Paragraph>
<Paragraph position="4"> We note that in our comparative evaluation, it is necessary to normalize all variable factors that might affect system performance, other than the intrinsic algorithm of each system. For example, we ensure that the sentence segmentation function is identical in all three systems. In addition, index term weights need to be properly trained within their own document clusters. Since MEAD discards all sentences below length 9, the other two systems discard such sentences as well. The feature weights in both our system and MEAD are all set to the default value of one. Since all other factors are identical between our system and MEAD, the difference in performance can be attributed to the core clustering and centroid computation algorithms of the two systems.</Paragraph> </Section> </Paper>