<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1129">
  <Title>Syntactic Simplification for Improving Content Selection in Multi-Document Summarization</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Summarizer
</SectionTitle>
    <Paragraph position="0"> We use a sentence-clustering approach to multi-document summarization (similar to multigen (Barzilay, 2003)), where sentences in the input documents are clustered according to their similarity.</Paragraph>
    <Paragraph position="1"> Larger clusters represent information that is repeated more often across input documents; hence the size of a cluster is indicative of the importance of that information. For our current implementation, a representative (simpli ed) sentence is selected from each cluster and these are incorporated into the summary in the order of decreasing cluster size.</Paragraph>
    <Paragraph position="2"> A problem with this approach is that the clustering is not always accurate. Clusters can contain spurious sentences, and a cluster's size might then exaggerate its importance. Improving the quality of the clustering can thus be expected to improve the content of the summary. We now describe our experiments on syntactic simpli cation and sentence clustering. Our hypothesis is that simplifying parenthetical units (relative clauses and appositives) will improve the performance of our clustering algorithm, by preventing it from clustering on the basis of background information.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Simplification and Clustering
</SectionTitle>
      <Paragraph position="0"> We use SimFinder (Hatzivassiloglou et al., 1999) for sentence clustering and its similarity metric to evaluate cluster quality; SimFinder outputs similarity values (simvals) between 0 and 1 for pairs of sentences, based on word overlap, synonymy and n-gram matches. We use the average of the simvals for each pair of sentences in a cluster to evaluate a quality-score for the cluster. Table 1 below shows the quality-scores averaged over all clusters when the original document set is and is not preprocessed using our syntactic simpli cation software (described in a36 2.2). We use 30 document sets from the 2003 Document Understanding Conference (see a36 3.1 for description). For each of the experiments in table 1, SimFinder produced around 1500 clusters, with an average cluster size beween 3.6 and 3.8.</Paragraph>
      <Paragraph position="1">  sults in a 5% relative improvement in clustering.</Paragraph>
      <Paragraph position="2"> This improvement is signi cant at con dence a38a40a39 a41a43a42a11a44 as determined by the difference in proportions test (Snedecor and Cochran, 1989). Further, the standard deviation for the performance of the clustering decreases by around 2%. This suggests that removing parentheticals results in better and more robust clustering. As an example of how clustering improves, our simpli cation routine simpli es: PAL, which has been unable to make payments on dlrs 2.1 billion in debt, was devastated by a pilots' strike in June and by the region's currency crisis, which reduced passenger numbers and in ated costs.</Paragraph>
      <Paragraph position="3"> to: PAL was devastated by a pilots' strike in June and by the region's currency crisis.</Paragraph>
      <Paragraph position="4"> Three other sentences also simplify to the extent that  they represent PAL being hit by the June strike. The resulting cluster (with quality score=0.94) is: 1. PAL was devastated by a pilots' strike in June and by the region's currency crisis.</Paragraph>
      <Paragraph position="5"> 2. In June, PAL was embroiled in a crippling threeweek pilots' strike.</Paragraph>
      <Paragraph position="6"> 3. Tan wants to retain the 200 pilots because they stood by him when the majority of PAL's pilots staged a devastating strike in June.</Paragraph>
      <Paragraph position="7"> 4. In June, PAL was embroiled in a crippling threeweek pilots' strike.</Paragraph>
      <Paragraph position="8">  On the other hand, splitting conjoined clauses does not appear to aid clustering1. This indicates that the improvement from removing parentheticals is not because shorter sentences might cluster better (as SimFinder controls for sentence length, this is anyway unlikely). For con rmation, we performed one more experiment we deleted words at random, so that the average sentence length for the modi ed input documents was the same as for the inputs with parentheticals removed. This actually made the clustering worse (av. quality score of 0.637), con rming that the improvement from removing parentheticals was not due to reduced sentence length. These results demonstrate that the parenthetical nature of relative clauses and appositives makes their removal useful.</Paragraph>
      <Paragraph position="9"> Improved clustering, however, need not necessarily translate to improved content selection in summaries. We therefore also need to evaluate our summarizer. We do this in a36 3, but rst we describe the summarizer in more detail.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Description of our Summarizer
</SectionTitle>
      <Paragraph position="0"> Our summarizer has four stages preprocessing of original documents to remove parentheticals, clustering of the simpli ed sentences, selecting of one representative sentence from each cluster and deciding which of these selected sentences to incorporate in the summary.</Paragraph>
      <Paragraph position="1"> We use our syntactic simpli cation software (Siddharthan, 2002; Siddharthan, 2003) to remove parentheticals. It uses the LT TTT (Grover et al., 2000) for POS-tagging and simple noun-chunking. It then performs apposition and relative clause identi cation and attachment using shallow techniques based on local context and animacy information obtained from WordNet (Miller et al., 1993).</Paragraph>
      <Paragraph position="2"> We then cluster the simpli ed sentences with SimFinder (Hatzivassiloglou et al., 1999). To further tighten the clusters and ensure that their size is representative of their importance, we post-process them as follows. SimFinder implements an incremental approach to clustering. At each incremental step, the similarity of a new sentence to an existing cluster is computed. If this is higher than a threshold, the sentence is added to the cluster. There is no backtracking; once a sentence is added to a cluster, it cannot be removed, even if it is dissimilar to all the  in June. However, averaged over the entire DUC'03 data set, there is no net improvement from splitting conjunction. sentences added to the cluster in the future. Hence, there are often one or two sentences that have low similarity with the nal cluster. We remove these with a post-process that can be considered equivalent to a back-tracking step. We rede ne the criteria for a sentence to be part of the nal cluster such that it has to be similar (simval above the threshold) to all other sentences in the nal cluster. We prune the cluster to remove sentences that do not satisfy this criterion. Consider the following cluster and a threshold of 0.65. Each line consists of two sentence ids (P[sent id]) and their simval.</Paragraph>
      <Paragraph position="3">  We mark all the lines with similarity values below the threshold (in bold font). We then remove as few sentences as possible such that these lines are excluded. In this example, it is suf cient to remove  The result is a much tighter cluster with one sentence less than the original. This pruning operation leads to even higher similarity scores than those presented in table 1.</Paragraph>
      <Paragraph position="4"> Having pruned the clusters, we select a representative sentence from each cluster based on tf*idf. We then incorporate these representative sentences into the summary in decreasing order of their cluster size. For clusters with the same size, we incorporate sentences in decreasing order of tf*idf. Unlike multigen (Barzilay, 2003), which is generative and constructs a sentence from each cluster using information fusion, we implement extractive summarization and select one (simpli ed) sentence from each cluster. We discuss the scope for generation in our summarizer in a36 4 and a36 6.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Evaluation
</SectionTitle>
    <Paragraph position="0"> We present two evaluations in this section. Our system, as described in the previous section, was entered for the DUC'04 competition. We describe how it fared in a36 3.3. We also present an evaluation over a larger data set to show that syntactic simpli cation of parenthetical units signi cantly improves content selection (a36 3.4). But rst, we describe our data (a36 3.1) and the evaluation metric Rouge (a36 3.2).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Data
The Document Understanding Conference (DUC)
</SectionTitle>
      <Paragraph position="0"> has been run annually since 2001 and is the biggest summarization evaluation effort, with participants from all over the world. In 2003, DUC put special emphasis on the development of automatic evaluation methods and also started providing participants with multiple human-written models needed for reliable evaluation. Participating generic multi-document summarizers were tested on 30 event-based sets in 2003 and 50 sets in 2004, all 80 containing roughly 10 newswire articles each. There were four human-written summaries for each set, created for evaluation purposes. In DUC'03, the task was to generate 100 word summaries, while in DUC'04, the limit was changed to 665 bytes.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Evaluation Metric
</SectionTitle>
      <Paragraph position="0"> We evaluated our summarizer on the DUC test sets using the Rouge automatic scoring metric (Lin and Hovy, 2003). The experiments in Lin and Hovy (2003) show that among n-gram approaches to scoring, Rouge-1 (based on unigrams) has the highest correlation with human scores. In 2004, an additional automatic metric based on longest common subsequence was included (Rouge-L), that aims to overcome some de ciencies of Rouge-1, such as its susceptibility to ungrammatical keyword packing by dishonest summarizers2. For our evaluations, we use the Rouge settings from DUC'04: stop words are included, words are Porter-stemmed, and all four human model summaries are used.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 DUC'04 Evaluation
</SectionTitle>
      <Paragraph position="0"> We entered our system as described above for the DUC'04 competition. There were 35 entries for the generic summary task, including ours. At 95% condence levels, our system was signi cantly superior to 23 systems and indistinguishable from the other 11 (using Rouge-L). Using Rouge-1, there was one system that was signi cantly superior to ours, 10 that were indistinguishable and 23 that were significantly inferior. We give a few Rouge scores from DUC'04 in gure 2 below for comparison purposes.</Paragraph>
      <Paragraph position="1"> The 95% con dence intervals for our summarizer are +-0.0123 (Rouge-1) and +-0.0130 (Rouge-L).</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Benefits from Syntactic Simplification
</SectionTitle>
      <Paragraph position="0"> Table 3 below shows the Rouge-1 and Rouge-L scores for our summarizer when the text is and is not simpli ed to remove parentheticals. The data  for this evaluation consists of the 80 document sets from DUC'03 and DUC'04. We did not use data from previous years as these included only one human model-summary and Rouge requires multiple models to be reliable.</Paragraph>
      <Paragraph position="1">  The improvement in performance when the text is preprocessed to remove parenthetical units is signi cant at 95% con dence limits. When compared to the 34 other participants of DUC'04, the simplication step raises our clustering-based summarizer from languishing in the bottom half to being in the top third and statistically indistinguishable from the top system at 95% con dence (using Rouge-L).</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Reference Regeneration
</SectionTitle>
    <Paragraph position="0"> As the evaluations above show, preprocessing text with syntactic simpli cation signi cantly improves content selection for our summarizer. This is encouraging; however, our summarizer, as describe so far, generates summaries that contain no parentheticals (appositives or relative clauses), as these are removed from the original texts prior to summarization. We believe that the inclusion of parenthetical information about entities should be treated as a reference generation task, rather than a content selection one. Our analysis of human summaries suggests that people select parentheticals to improve coherence and to aid the hearer in identifying referents and relating them to the discourse. A complete treatment of parentheticals in reference regeneration in summaries is beyond the scope of this paper, the emphasis of which is content-selection, rather than coherence. We plan to address this issue elsewhere; in this paper, we restrict ourselves to describing a baseline approach to incorporating parentheticals in regenerated references to people in summaries.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Including Parentheticals
</SectionTitle>
      <Paragraph position="0"> Our text-simpli cation system (Siddharthan, 2003) provides us with with a list of all relative clauses, appositives and pronouns that attach to/co-refer with every entity. We used a named entity tagger (Wacholder et al., 1997) to collect all such information for every person. The processed references to the same people across documents were aligned using the named entity tagger canonic name, resulting in tables similar to those shown in gure 1.</Paragraph>
      <Paragraph position="1"> Abdullah Ocalan APW19981106.1119: [IR] Abdullah Ocalan; [AP] leader of the outlawed Kurdistan Worker 's Party; [CO] Ocalan; APW19981104.0265: [IR] Kurdish rebel leader Abdullah Ocalan; [RC] who is wanted in Turkey on charges of heading a terrorist organization; [CO] Ocalan; [RC] who leads the banned Kurdish Workers Party , or PKK , which has been ghting for Kurdish autonomy in Turkey since 1984; [CO] Ocalan; [CO] Ocalan; [CO] Ocalan; APW19981113.0541: [IR] Abdullah Ocalan; [AP] leader of Kurdish insurgents; [RC ] who has been sought for years by Turkey; [CO] Ocalan; [CO] Ocalan; [CO] Ocalan; [PR] He; [CO] Ocalan; [CO] Ocalan; [PR] his; [CO] Ocalan; [CO] Ocalan; [CO] Ocalan; [PR] his; [CO] Ocalan; [CO] Ocalan; [AP] a political science dropout from Ankara university in 1978; APW19981021.0554: [IR] rebel leader Abdullah Ocalan; [PR] he; [CO] Ocalan;  in the input. The canonic form of the named entity is shown in bold and the input article id in italic. IR stands for initial reference , CO for subsequent noun co-reference, PR for pronoun reference, AP for apposition and RC for relative clause.</Paragraph>
      <Paragraph position="2"> We automatically post-edited our summaries using a modi ed version of the module described in Nenkova and McKeown (2003). This module normalizes references to people in the summary, by introducing them in detail when they are rst mentioned and using a short reference for subsequent mentions; these operations were shown to improve the readability of the resulting summaries.</Paragraph>
      <Paragraph position="3"> Nenkova and McKeown (2003) avoided including parentheticals due to both the unavailability of fast and reliable identi cation and attachment of appositives and relative clauses, and theoretical issues relating to the selection of the most suitable parenthetical unit in the new summary context. In order to ensure a balanced inclusion of parenthetical information in our summaries, we modi ed their initial approach to allow for including relative clauses and appositives in initial references.</Paragraph>
      <Paragraph position="4"> We made use of two empirical observations made by Nenkova and McKeown (2003) based on human summaries: a rst mention is very likely to be modi ed in some way (probability of 0.76), and subsequent mentions are very unlikely to be postmodi ed (probability of 0.01 0.04). We therefore only considered incorporating parentheticals in rst mentions. We constructed a set consisting of appositives and relative clauses from initial references in the input documents and an empty string option (for the example in gure 1, the set would be a53 leader of the outlawed Kurdistan Worker's Party , who is wanted in Turkey on charges of heading a terrorist organization , leader of Kurdish insurgents , who has been sought for years by Turkey , a54a56a55 ). We then selected one member of the set randomly for inclusion in the initial reference. A more sophisticated approach to the treatment of parentheticals in reference regeneration, based on lexical cohesion constraints, is currently underway.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Evaluation
</SectionTitle>
      <Paragraph position="0"> We repeated the evaluations on the 80 document sets from DUC'03 and DUC'04, using our simpli cation+clustering based summarizer with the reference regeneration component included. The results are shown in the table below. At 95% con dence, the difference in performance is not signi cant.</Paragraph>
      <Paragraph position="1">  This is an interesting result because it suggests that rewriting references does not adversely affect content selection. This might be because the extra words added to initial references are partly compensated for by words removed from subsequent references. In any case, the reference rewriting can signi cantly improve readability, as shown in the examples in gures 2 and 3. We are also optimistic that a more focused reference rewriting process based on lexical-cohesive constraints and information-theoretic measures can improve Rouge content-evaluation scores as well as summary readability. null</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Surface Analysis of Summaries
</SectionTitle>
    <Paragraph position="0"> Table 5 compares the average sentence lengths of our summaries (after reference rewriting) with those of the original news reports, human (model) summaries and machine summaries generated by the participating summarizers at DUC'03 and '04.</Paragraph>
    <Paragraph position="1"> These gures con rm various intuitions about human vs machine-generated summaries machine summaries tend to be based on sentence extraction; Before: Pinochet was placed under arrest in London Friday by British police acting on a warrant issued by a Spanish judge. Pinochet has immunity from prosecution in Chile as a senator-for-life under a new constitution that his government crafted. Pinochet was detained in the London clinic while recovering from back surgery.</Paragraph>
    <Paragraph position="2"> After: Gen. Augusto Pinochet, the former Chilean dictator, was placed under arrest in London Friday by British police acting on a warrant issued by a Spanish judge. Pinochet has immunity from prosecution in Chile as a senator-for-life under a new constitution that his government crafted. Pinochet was detained in the London clinic while recovering from back surgery.</Paragraph>
    <Paragraph position="3">  erated summary before/after reference regeneration. many have an explicitly encoded preference for long sentences (assumed to be more informative); humans tend to select information at a sub-sentential level. As a result, human summaries contain on average shorter sentences than the original, while machine summaries contain on average longer sentences than the original. Interestingly, our summarizer, like human summarizers, generates shorter sentences than the original news text.</Paragraph>
    <Paragraph position="4">  from DUC'03 and '04.</Paragraph>
    <Paragraph position="5"> Equally interesting is the distribution of parentheticals. The original news reports contain on average one parenthetical unit (appositive or relative clause) every 3.9 sentences. The machine summaries contain on average one parenthetical every 3.3 sentences. On the other hand, human summaries contain only one parenthetical unit per 8.9 sentences on average.</Paragraph>
    <Paragraph position="6"> In other words, human summaries contain fewer parenthetical units per sentence than the original reports; this appears to be a deliberate attempt at including more events and less background information in a summary. Machine summaries tend to contain on average more parentheticals than the original reports. This is possibly an artifact of the preference for longer sentences, but the data suggests that 100 word machine summaries use up valuable space by presenting unnecessary background information.</Paragraph>
    <Paragraph position="7"> Our summaries contain one parenthetical unit every 10.0 sentences. This is closer to human summaries than to the average machine summary, again suggesting that our approach of treating the incluBefore: null Turkey has been trying to form a new government since a coalition government led by Yilmaz collapsed last month over allegations that he rigged the sale of a bank. Ecevit refused even to consult with the leader of the Virtue Party during his efforts to form a government. Ecevit must now try to build a government. Demirel consulted Turkey's party leaders immediately after Ecevit gave up.</Paragraph>
    <Paragraph position="8"> After: Turkey has been trying to form a new government since a coalition government led by Prime Minister Mesut Yilmaz collapsed last month over allegations that he rigged the sale of a bank. Premier-designate Bulent Ecevit refused even to consult with the leader of the Virtue Party during his efforts to form a government. Ecevit must now try to build a government.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
President Suleyman Demirel consulted Turkey's party
</SectionTitle>
      <Paragraph position="0"/>
    </Section>
  </Section>
class="xml-element"></Paper>