<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1107">
  <Title>Probabilistic Sentence Reduction Using Support Vector Machines</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments and Discussion
</SectionTitle>
    <Paragraph position="0"> We used the same corpus as described in (Knight and Marcu 02), which includes 1,067 pairs of sentences and their reductions. To evaluate sentence reduction algorithms, we randomly selected 32 pairs of sentences from our parallel corpus, which is refered to as the test corpus. The training corpus of 1,035 sentences extracted 44,352 examples, in which each training example corresponds to an action. The SVM tool, LibSVM (Chang 01) is applied to train our model. The training examples were  17: end if18: end for19: end for20:</Paragraph>
    <Paragraph position="2"> 21: end while divided into SHIFT, REDUCE, DROP, RE-STORE, and ASSIGN groups. To train our support vector model in each group, we used the pairwise method with the polynomial kernel function, in which the parameter p in (3) and the constant C0 in equation (1) are 2 and 0:0001, respectively.</Paragraph>
    <Paragraph position="3"> The algorithms (Knight and Marcu 02) and (Nguyen and Horiguchi 03) served as the baseline1 and the baseline2 for comparison with the proposed algorithms. Deterministic sentence reduction using SVM and probabilistic sentence  reductionwerenamedasSVM-DandSVMP,respectively. For convenience, the ten top reduced outputs using SVMP were called SVMP-10. We used the same evaluation method as described in (Knight and Marcu 02) to compare the proposed methods with previous methods. For this experiment, we presented each original sentence in the test corpus to three judges who are specialists in English, together with three sentence reductions: the human generated reduction sentence, the outputs of the proposed algorithms, and the output of the baseline algorithms.</Paragraph>
    <Paragraph position="4"> The judges were told that all outputs were generated automatically. The order of the outputs was scrambled randomly across test cases. The judges participated in two experiments. In the flrst, they were asked to determine on a scale from 1 to 10 how well the systems did with respect to selecting the most important words in the original sentence. In the second, they were asked to determine the grammatical criteria of reduced sentences.</Paragraph>
    <Paragraph position="5"> Table 2 shows the results of English language sentence reduction using a support vector machine compared with the baseline methods and with human reduction. Table 2 shows compression rates, and mean and standard deviation results across all judges, for each algorithm. The results show that the length of the reduced sentences using decision trees is shorter than using SVMs, and indicate that our new methods out-perform the baseline algorithms in grammatical and importance criteria. Table 2 shows that the  highest performances. We also compared the computation time of sentence reduction using support vector machine with that in previous works. Table 3 shows that the computational times for SVM-D and SVMP-10 are slower than baseline, but it is acceptable for SVM-D.</Paragraph>
    <Paragraph position="6">  We also investigated how sensitive the proposed algorithms are with respect to the training data by carrying out the same experiment on sentences of difierent genres. We created the test corpus by selecting sentences from the web-site of the Benton Foundation (http://www.benton.org). The leading sentences in each news article were selected as the most relevant sentences to the summary of the news. We obtained 32 leading long sentences and 32 headlines for each item. The 32 sentences are used as a second test for our methods. We use a simple ranking criterion: the more the words in the reduced sentence overlap with the words in the headline, the more important the sentence is. A sentence satisfying this criterion is called a relevant candidate.</Paragraph>
    <Paragraph position="7"> For a given sentence, we used a simple method, namely SVMP-R to obtain a reduced sentence by selecting a relevant candidate among the ten top reduced outputs using SVMP-10.</Paragraph>
    <Paragraph position="8"> Table 4 depicts the experiment results for the baseline methods, SVM-D, SVMP-R, and SVMP-10. The results shows that, when applied to sentence of a difierent genre, the performance of SVMP-10 degrades smoothly, while the performance of the deterministic sentence reductions (the baselines and SVM deterministic) drops sharply. This indicates that the probabilistic sentence reduction using support vector machine is more stable.</Paragraph>
    <Paragraph position="9"> Table 4 shows that the performance of SVMP-10 is also close to the human reduction outputs and is better than previous works. In addition, SVMP-R outperforms the deterministic sentence reduction algorithms and the differences between SVMP-R's results and SVMP-10 are small. This indicates that we can obtain reduced sentences which are relevant to the headline, while ensuring the grammatical and the importance criteria compared to the original sentences.</Paragraph>
  </Section>
</Paper>