<?xml version="1.0" standalone="yes"?>
<Paper uid="X98-1025">
  <Title>SUMMARIZATION: (1) USING MMR FOR DIVERSITY- BASED RERANKING AND (2) EVALUATING SUMMARIES</Title>
  <Section position="3" start_page="181" end_page="182" type="metho">
    <SectionTitle>
2. MAXIMAL MARGINAL RELEVANCE
</SectionTitle>
    <Paragraph position="0"> Most modern IR search engines produce a ranked list of retrieved documents ordered by declining relevance to the user's query \[1, 18, 21, 26\]. In contrast, we motivated the need for &amp;quot;relevant novelty&amp;quot; as a potentially superior criterion. However, there is no known way to directly measure new-and-relevant information, especially given traditional bag-of-words methods such as the vector-space model \[19, 21\]. A first approximation to measuring relevant novelty is to measure relevance and novelty independently and provide a linear combination as the metric. We call the linear combination &amp;quot;marginal relevance&amp;quot; -- i.e., a document has high marginal relevance if it is both relevant to the query and contains minimal similarity to previously selected documents. We strive to maximize marginal relevance in retrieval and summarization, hence we label our method &amp;quot;maximal marginal relevance&amp;quot; (MMR).</Paragraph>
    <Paragraph position="1"> The Maximal Marginal Relevance (MMR) metric is defined as follows: Let C = document collection (or document stream). Let Q = ad-hoc query (or analyst profile or topic/category specification). Let R = IR(C, Q, θ) -- i.e., the ranked list of documents retrieved by an IR system, given C and Q and a relevance threshold θ, below which it will not retrieve documents (θ can be a degree of match, or a number of documents).</Paragraph>
    <Paragraph position="2"> Let S = subset of documents in R already provided to the user. (Note that in an IR system without MMR and dynamic reranking, S is typically a proper prefix of list R.) R\S is the set difference, i.e., the set of documents in R not yet offered to the user.</Paragraph>
    <Paragraph position="3"> MMR(C, Q, R, S) = Argmax over Di in R\S of \[ λ Sim1(Di, Q) - (1-λ) max over Dj in S of Sim2(Di, Dj) \]. Given the above definition, MMR computes incrementally the standard relevance-ranked list when the parameter λ = 1, and computes a maximal diversity ranking among the documents in R when λ = 0. For intermediate values of λ in the interval \[0,1\], a linear combination of both criteria is optimized. Users wishing to sample the information space around the query should set λ to a smaller value, and those wishing to focus in on multiple potentially overlapping or reinforcing relevant documents should set λ to a value closer to 1. For document retrieval, we found that a particularly effective search strategy (reinforced by the user study discussed below) is to start with a small λ (e.g., λ = .3) in order to understand the information space in the region of the query, and then to focus on the most important parts using a reformulated query (possibly via relevance feedback) and a larger value of λ (e.g., λ = .7). Note that the similarity metric Sim1 used in document retrieval and relevance ranking between documents and query could be the same as Sim2 between documents (e.g., both could be cosine similarity), but this need not be the case. A more accurate, but computationally more costly, metric could be used when applied only to the elements of the retrieved document set R, given that |R| &lt;&lt; |C|, if MMR is applied for re-ranking the top portion of the ranked list produced by a standard IR system.</Paragraph>
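The selection step defined above can be sketched in a few lines. The following is an illustrative implementation, not the paper's own code: both Sim1 and Sim2 are taken to be cosine similarity over sparse term-weight vectors, and the names (`mmr_rerank`, `lam` for λ) are our own.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_rerank(query, ranked, lam=0.3, k=10):
    """Incrementally pick k documents maximizing
    lam*Sim1(Di, Q) - (1-lam) * max over Dj in S of Sim2(Di, Dj)."""
    selected = []             # S: documents already offered to the user
    remaining = list(ranked)  # R \ S
    while remaining and len(selected) < k:
        def score(d):
            redundancy = max((cosine(d, s) for s in selected), default=0.0)
            return lam * cosine(d, query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 1 this reduces to the standard relevance ranking; with a small λ, an exact duplicate of an already-selected document falls to the bottom, mirroring the behavior reported for the Reuters duplicate articles.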
    <Paragraph position="4">  query: Brazil external debt figure</Paragraph>
  </Section>
  <Section position="4" start_page="182" end_page="182" type="metho">
    <SectionTitle>
Article Title
BRAZIL SEEN AS VANGUARD FOR CHANGING DEBT STRATEGY
FUNARO REJECTS UK SUGGESTION OF IMF BRAZIL PLAN
ECONOMIC SPOTLIGHT - BRAZIL DEBT DEADLINES LOOM
U.S. URGED TO STRENGTHEN DEBT STRATEGY
U.S. URGES BANKS TO DEVELOP NEW 3RD WLD FINANCE
FUNARO'S DEPARTURE COULD LEAD TO BRAZIL DEBT DEAL
U.S. OFFICIALS SAY BRAZIL SHOULD DEAL WITH BANKS
BRAZIL SEEKS TO REASSURE BANKS ON DEBT SUSPENSION
BRAZIL SEEKS TO REASSURE BANKS ON DEBT SUSPENSION
BRAZIL CRITICISES ADVISORY COMMITTEE STRUCTURE
LATIN DEBTORS MAKE NEW PUSH FOR DEBT RELIEF
BRAZIL DEBT SEEN PARTNER TO HARD SELL TACTICS
BRAZIL DEBT POSES THORNY ISSUE FOR U.S. BANKS
U.S. URGES BANKS TO WEIGH PHILIPPINE DEBT PLAN
U.K. SAYS HAS NO ROLE IN BRAZIL MORATORIUM TALKS
TALKING POINT/BANK STOCKS
CANADA BANKS COULD SEE PRESSURE ON BRAZIL LOANS
TREASURY'S BAKER SAYS BRAZIL NOT IN CRISIS
BRAZIL'S DEBT CRISIS BECOMING POLITICAL CRISIS
BAKER AND VOLCKER SAY DEBT STRATEGY WILL WORK
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="5" start_page="182" end_page="183" type="metho">
    <SectionTitle>
3. DOCUMENT REORDERING
</SectionTitle>
    <Paragraph position="0"> We implemented MMR in two retrieval engines, PURSUIT (an upgraded version of the original retrieval engine inside the Lycos search engine) \[9\] and SMART (the publicly available version of the Cornell IR engine) \[1\]. Using the scoring functions available in each system for both Sim1 and Sim2, we obtained consistent and expected results in the behavior of the two systems.</Paragraph>
    <Paragraph position="1"> The results of MMR reranking are shown in Table 1. In this Reuters document collection, article 1403 is a duplicate of 1388. MMR reranking performs as expected: for decreasing values of λ, the ranking of 1403 drops. Also as predicted, novel but still relevant information, as evidenced by document 69, starts to increase in ranking. Documents that are relevant but similar to the highest-ranked ones, such as document 1713, drop in ranked ordering. Document 2149's position varies depending on its similarity to previously seen information.</Paragraph>
    <Paragraph position="2"> We also performed a pilot experiment with five users who were undergraduates from various disciplines. The purpose of the study was to find out whether they could tell the difference between the standard ranked document order retrieved by SMART and an MMR-reranked order with λ = .5.</Paragraph>
    <Paragraph position="3"> They were asked to perform nine different search tasks to find information and were asked various questions about the tasks. They used two methods to retrieve documents, known only as R and S. Parallel tasks were constructed so that one set of users would perform method R on one task and method S on a similar task. Users were not told how the documents were presented, only that either &amp;quot;method R&amp;quot; or &amp;quot;method S&amp;quot; was used and that they needed to try to distinguish the differences between the methods. After each task we asked them to record the information found. We also asked them to look at the rankings for method R and method S and see if they could tell any difference between the two. The majority of people said they preferred the method which, in their opinion, gave the broadest and most interesting topics. In the final section they were asked to select a search method and use it for a search task. 80% (4 out of 5) chose MMR. The person who chose SMART stated it was because &amp;quot;it tends to group more like stories together.&amp;quot; The users indicated a differential preference for MMR in navigation and for locating the relevant candidate documents more quickly, and for pure-relevance ranking when looking at related documents within that band. Three of the five users clearly discovered the differential utility of diversity search and relevance-only search. One user explicitly stated his strategy: &amp;quot;Method R \[relevance only\] groups items together based on similarity and Method S \[MMR re-ranking\] gives a wider array. I would use Method S \[MMR re-ranking\] to find a topic ... 
and then use Method R \[relevance-only\] with a specific search from Method S \[MMR re-ranking\] to yield a lot of closely related items.&amp;quot; The initial study was too small to yield statistically significant trends with respect to speed of known-item retrieval, or recall improvements for broader query tasks. However, based on our own experience and questionnaire responses from the five users, we expect that task demands play a large role with respect to which method yields better performance.</Paragraph>
  </Section>
  <Section position="6" start_page="183" end_page="184" type="metho">
    <SectionTitle>
4. SINGLE DOCUMENT SUMMARIES
</SectionTitle>
    <Paragraph position="0"> Human summarization of documents, sometimes called &amp;quot;abstraction,&amp;quot; produces a fixed-length generic summary reflecting the key points that the abstractor -- rather than the user -- deems important. Consider a physician evaluating a particular chemotherapy regimen who wants to know about its adverse effects on elderly female patients. The retrieval engine produces several lengthy reports (e.g. a 300-page clinical study), whose abstracts do not contain any hint of whether there is information regarding effects on elderly patients. A useful summary for this physician would contain query-relevant passages (e.g.</Paragraph>
    <Paragraph position="1"> differential adverse effects on elderly males and females, buried in page 211-212 of the clinical study) assembled into a summary. A different user with different information needs may require a totally different summary of the same document.</Paragraph>
    <Paragraph position="2"> We developed a minimal-redundancy query-relevant summarizer-by-extraction method, which differs from previous work in summarization \[10, 12, 15, 18, 24\] in several dimensions.</Paragraph>
    <Paragraph position="3"> * Optional query relevance: as discussed above, a query or a user interest profile (or the vector sum of both, appropriately weighted) is used to select relevant passages. If a generic query-free summary is desired, the centroid vector of the document is calculated and passages are selected with the principal components of the centroid as the query.</Paragraph>
    <Paragraph position="4"> * Variable granularity summarization: The length of the summary is under user control. Brief summaries are useful for indicative purposes (e.g.</Paragraph>
    <Paragraph position="5"> whether to read further), and longer ones for drilling and extracting detailed information.</Paragraph>
    <Paragraph position="6"> * Non-redundancy: Information density is enhanced by ensuring a degree of dissimilarity between passages contained in the summary. The degree of query-focus vs. diversity sampling is under user control (the λ parameter in the MMR formula).</Paragraph>
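The generic, query-free case mentioned in the first bullet can be approximated as follows. This sketch simply averages passage term-weight vectors and keeps the heaviest-weighted centroid terms as a surrogate query; taking the top-weighted terms is a simplification of the principal-components step the text describes, and the function name is our own.

```python
from collections import defaultdict

def centroid_query(passages, top_k=10):
    """Average the passage term-weight vectors of a document and
    keep the top_k heaviest terms as a surrogate generic query.
    (Top-weighted terms stand in for the paper's 'principal
    components of the centroid'.)"""
    centroid = defaultdict(float)
    for p in passages:
        for term, w in p.items():
            centroid[term] += w / len(passages)
    top = sorted(centroid.items(), key=lambda tw: tw[1], reverse=True)[:top_k]
    return dict(top)
```

The returned term-weight dict can then be passed as the query to the MMR passage-selection step in place of a user-supplied query.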
    <Paragraph position="7"> Our process for creating single document summaries is as follows:  1. Segment a document into passages and index the  passages using the inverted indexing method used by the IR engine for full documents. Passages may be phrases, sentences, n-sentence chunks, or paragraphs. For the TIPSTER III evaluation, we used sentences as passages.</Paragraph>
    <Paragraph position="8"> 2. Within a document, identify the passages relevant to the query. Use a threshold below which the passages are discarded. We used a similarity metric based on cosine similarity using the traditional TF-IDF weights.</Paragraph>
    <Paragraph position="9"> 3. Apply the MMR metric as defined in Section 2 to the passages (rather than full documents).</Paragraph>
    <Paragraph position="10"> Depending on the desired length of the summary, select a smaller or larger number of passages. If the λ parameter is not very close to 1, redundant query-relevant passages will tend to be eliminated and other, slightly less query-relevant passages will be included. We allow the user to select the number of passages or the percentage of the document size (also known as the &amp;quot;compression ratio&amp;quot;).</Paragraph>
    <Paragraph position="11"> 4. Reassemble the selected passages into a summary document using one of the following summarycohesion criteria: * Document appearance order: Present the  segments according to their order of presentation in the original document. If the first sentence is longer than a threshold, we automatically include this sentence in the summary as it tends to set the context for the article. If the user only wants to view a few segments, the first sentence must also meet a threshold for sentence rank to be included.</Paragraph>
    <Paragraph position="12"> * News-story principle: Present the information in MMR-ranked order, i.e., the most relevant and most diverse information first. In this manner, the reader gets the maximal information even if they stop reading the summary. This allows the diversity of relevant information to be presented early; a topic introduced may be revisited after other relevant topics have been introduced. * Topic-cohesion principle: First group together the document segments by topic clustering (using sub-document similarity criteria). Then rank the centroids of each cluster by MMR (most important first) and present the information a topic-coherent cluster at a time, starting with the cluster whose centroid ranks highest.</Paragraph>
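Steps 1-4 above, using document-appearance ordering, can be sketched end to end. This is an illustrative simplification, not the implemented system: passages are sentences split on punctuation, term weights are raw term frequencies rather than the TF-IDF weights the text specifies, and all names are our own.

```python
import math
import re

def cosine(u, v):
    """Cosine similarity between sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tf(text):
    """Raw term-frequency vector (TF-IDF omitted for brevity)."""
    vec = {}
    for t in re.findall(r"[a-z]+", text.lower()):
        vec[t] = vec.get(t, 0) + 1
    return vec

def summarize(document, query, lam=0.7, n_sentences=3, threshold=0.0):
    # 1. Segment the document into sentence passages.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    q = tf(query)
    # 2. Keep passages whose query similarity clears the threshold.
    scored = [(s, tf(s)) for s in sentences]
    scored = [(s, v) for s, v in scored if cosine(v, q) > threshold]
    # 3. MMR selection over passages rather than full documents.
    selected = []
    while scored and len(selected) < n_sentences:
        def mmr(item):
            _, v = item
            red = max((cosine(v, sv) for _, sv in selected), default=0.0)
            return lam * cosine(v, q) - (1 - lam) * red
        best = max(scored, key=mmr)
        selected.append(best)
        scored.remove(best)
    # 4. Reassemble in document-appearance order.
    chosen = {s for s, _ in selected}
    return " ".join(s for s in sentences if s in chosen)
```

On a toy document with a repeated sentence, the duplicate is passed over in favor of a less query-similar but novel sentence, as in the dissertation summaries shown below.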
    <Paragraph position="13"> We implemented query-relevant, document-appearance-based sequencing of information. Our method of summarization does not require the more elaborate language regeneration needed by Kathy McKeown and her group at Columbia in their summarization work \[15\]. As such our method is simpler, faster and more widely applicable, but yields potentially less cohesive summaries. All summary results in this paper use the SMART search engine, with stopwords eliminated from the indexed data and stemming applied.</Paragraph>
    <Paragraph position="14"> Query: Delaunay refinement mesh generation finite element method foundations three dimension analysis; λ = .3 \[1\] Delaunay refinement is a technique for generating unstructured meshes of triangles or tetrahedra suitable for use in the finite element method or other numerical methods for solving partial differential equations.</Paragraph>
    <Paragraph position="15"> \[5\] The purpose of this thesis is to further this progress by cementing the foundations of two-dimensional Delaunay refinement, and by extending the technique and its analysis to three dimensions.</Paragraph>
    <Paragraph position="16"> \[15\] Nevertheless, Delaunay refinement methods for tetrahedral mesh generation have the rare distinction that they offer strong theoretical bounds and frequently perform well in practice.</Paragraph>
    <Paragraph position="17"> \[39\] If one can generate meshes that are completely satisfying for numerical techniques like the finite element method, the other applications fall easily in line. \[131\] Our understanding of the relative merit of different metrics for measuring element quality, or the effects of small numbers of poor quality elements on numerical solutions, is based as much on engineering experience and rumor as it is on mathematical foundations.</Paragraph>
    <Paragraph position="18"> \[158\] Delaunay refinement methods are based upon a well-known geometric construction called the Delaunay triangulation, which is discussed extensively in the mesh generation chapter.</Paragraph>
    <Paragraph position="19"> \[201\] I first extend Ruppert's algorithm to three dimensions, and show that the extension generates nicely graded tetrahedral meshes whose circumradius-to-shortest edge ratios are nearly bounded below two.</Paragraph>
    <Paragraph position="20"> \[2250\] Refinement Algorithms for Quality Mesh Generation: Delaunay refinement algorithms for mesh generation operate by maintaining a Delaunay or constrained Delaunay triangulation, which is refined by inserting carefully placed vertices until the mesh meets constraints on element quality and size.</Paragraph>
    <Paragraph position="21"> \[3648\] I do not know to what difference between the algorithms one should attribute the slightly better bound for Delaunay refinement, nor whether it marks a real difference between the algorithms or is an artifact of the different methods of analysis.</Paragraph>
    <Paragraph position="22">  Query: sliver mesh boundary removal small angles; λ = .7 \[1\] Delaunay refinement is a technique for generating unstructured meshes of triangles or tetrahedra suitable for use in the finite element method or other numerical methods for solving partial differential equations.</Paragraph>
    <Paragraph position="23"> \[129\] Hence, many mesh generation algorithms take the approach of attempting to bound the smallest angle. \[2621\] Because s is locked, inserting a vertex at c will not remove t from the mesh.</Paragraph>
    <Paragraph position="24"> \[2860\] Of course, one must respect the PSLG; small input angles cannot be removed.</Paragraph>
    <Paragraph position="25"> \[3046\] The worst slivers can often be removed by Delaunay refinement, even if there is no theoretical guarantee. \[3047\] Meshes with bounds on the circumradius-to-shortest edge ratios of their tetrahedra are an excellent starting point for mesh smoothing and optimization methods designed to remove slivers and improve the quality of an existing mesh (see smoothing section).</Paragraph>
    <Paragraph position="26"> \[3686\] If one inserts a vertex at the circumcenter of each sliver tetrahedron, will the algorithm fail to terminate? \[3702\] A sliver can always be eliminated by splitting it, but how can one avoid creating new slivers in the process? \[3723\] Unfortunately, my practical success in removing slivers is probably due in part to the severe restrictions on input angle I have imposed upon Delaunay refinement. \[3724\] Practitioners report that they have the most difficulty removing slivers at the boundary of a mesh, especially near small angles.</Paragraph>
  </Section>
  <Section position="7" start_page="184" end_page="185" type="metho">
    <SectionTitle>
5. SUMMARIZING LONGER DOCUMENTS
</SectionTitle>
    <Paragraph position="0"> The MMR passage-selection method for summarization works better for longer documents (which typically contain more inherent passage redundancy across document sections such as abstract, introduction, conclusion, results, etc.). To demonstrate the quality of summaries that can be obtained for long documents, we summarized an entire dissertation containing 3,772 sentences with a generic topic query constructed by expanding the thesis title (Figure 1). In contrast, Figure 2 shows the results of a more specialized query with a larger λ value to focus summarization less on diversity and more on topic.</Paragraph>
    <Paragraph position="1"> The above example demonstrates the utility of query relevance in summarization and the incremental utility of controlling summary focus via the lambda parameter. It also highlights a shortcoming of summarization by extraction, namely coping with antecedent references. Sentence \[2621\] refers to coefficients &amp;quot;s&amp;quot;, &amp;quot;c&amp;quot;, and &amp;quot;t,&amp;quot; which do not make sense outside the framework that defines them. Such referential problems are ameliorated with increased passage length, for instance using paragraphs rather than sentences. However, longer-passage selection also implies longer summaries. Another solution is co-reference resolution \[25\].</Paragraph>
  </Section>
  <Section position="8" start_page="185" end_page="187" type="metho">
    <SectionTitle>
6. MULTI-DOCUMENT SUMMARIES
</SectionTitle>
    <Paragraph position="0"> As discussed earlier, MMR passage selection works equally well for summarizing single documents or clusters of topically related documents. Our method for multi-document summarization follows the same basic procedure as that of single-document summarization (see Section 4). In step 2 (Section 4), we identify the N most relevant passages from each of the documents in the collection and use them to form the passage set to be MMR re-ranked. N is dependent on the desired resultant length of the summary. We used the N most relevant passages from each document in the collection, rather than the top relevant passages in the entire collection, so that each article had a chance to provide a query-relevant contribution. In the future we intend to compare this to using MMR ranking where the entire document set is treated as a single document. The remaining steps are primarily the same.</Paragraph>
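The per-document passage pooling described in the modified step 2 might look like the following sketch. Names are illustrative, and raw term frequencies stand in for the engine's actual weighting; the pooled passages would then be fed to the MMR re-ranking step of Section 2.

```python
import math
import re

def cosine(u, v):
    """Cosine similarity between sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tf(text):
    """Raw term-frequency vector."""
    vec = {}
    for t in re.findall(r"[a-z]+", text.lower()):
        vec[t] = vec.get(t, 0) + 1
    return vec

def pool_passages(documents, query, n=4):
    """Take the n passages most similar to the query from EACH
    document, so every article can contribute a query-relevant
    passage to the pooled set that is then MMR re-ranked."""
    q = tf(query)
    pool = []
    for doc in documents:
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
        sentences.sort(key=lambda s: cosine(tf(s), q), reverse=True)
        pool.extend(sentences[:n])
    return pool
```

Pooling per document rather than globally is what guarantees each article a chance to contribute, at the cost of possibly admitting weakly relevant passages from off-topic articles.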
    <Paragraph position="1"> The TIPSTER evaluation corpus provided several sets of topical clusters to which we applied MMR summarization. In one such example on a cluster of apartheid-related documents, we used the topic description as the query (see Figure 3) and N was set to 4 (4 sentences per article were reranked). The top 10 sentences for λ = 1 (effectively query relevance, but no MMR) and λ = .3 (both query relevance and MMR anti-redundancy) are shown in Figures 4 and 5 respectively.</Paragraph>
    <Paragraph position="2"> The summaries clearly demonstrate the need for MMR in passage selection. The λ = 1 case exhibits considerable redundancy, ranging from near-replication in passages \[4\] and \[5\] to redundant content in passages \[7\] and \[9\], whereas the λ = .3 case exhibits no such redundancy. Counting clearly distinct propositions in both cases yields a 20% greater information content for the MMR case, though both summaries are equivalent in length.</Paragraph>
    <Paragraph position="3"> Topic:  &lt;head&gt; Tipster Topic Description &lt;num&gt; Number: 110 &lt;dom&gt; Domain: International Politics &lt;title&gt; Topic: Black Resistance Against the South African Government &lt;desc&gt; Description: Document will discuss efforts by the black majority in South Africa to overthrow domination by the white minority government.</Paragraph>
    <Paragraph position="4"> &lt;smry&gt; Summary: Document will discuss efforts by the black majority in South Africa to overthrow domination by the white minority government.</Paragraph>
    <Paragraph position="5"> &lt;narr&gt; Narrative:  A relevant document will discuss any effort by blacks to force political change in South Africa. The reported black challenge to apartheid may take any form -- military, political, or economic -- but of greatest interest would be information on reported activities by armed personnel linked to the African National Congress (ANC), either in South Africa or in bordering states.</Paragraph>
    <Paragraph position="6">  &lt;con&gt; Concept(s): 1. African National Congress, ANC, Nelson Mandela, Oliver Tambo 2. Chief Buthelezi, Inkatha, Zulu 3. terrorist, detainee, subversive, communist 4. Limpopo River, Angola, Botswana, Mozambique, Zambia 5. apartheid, black township, homelands, group areas act,  \[1\] \[761\] AP880212-0060 \[15\] ANGOP quoted the Angolan statement as saying the main causes of conflict in the region are South Africa's &amp;quot;'illegal occupation&amp;quot; of Namibia, South African attacks against its black-ruled neighbors and its alleged creation of armed groups to carry out &amp;quot;'terrorist activities&amp;quot; in those countries, and the denial of political rights to the black majodty in South Africa.</Paragraph>
    <Paragraph position="7"> \[2\] \[758\] AP880803-0080 \[25\] Three Canadian anti-apartheid groups issued a statement urging the government to sever diplomatic and economic links with South Africa and aid the African National Congress, the banned group fighting the white-dominated government in South Africa.</Paragraph>
    <Paragraph position="8"> \[3\] \[756\] AP880803-0082 \[25\] Three Canadian anti-apartheid groups issued a statement urging the government to sever diplomatic and economic links with South Africa and aid the African National Congress, the banned group fighting the white-dominated government in South Africa.</Paragraph>
    <Paragraph position="9"> \[4\] \[790\] AP880802-0165 \[27\] South Africa says the ANC, the main black group fighting to overthrow South Africa's white government, has seven major military bases in Angola, and the Pretoria government wants those bases closed down.</Paragraph>
    <Paragraph position="10"> \[5\] \[654\] AP880803-0158 \[27\] South Africa says the ANC, the main black group fighting to overthrow South Africa's white-led government, has seven major military bases in Angola, and it wants those bases closed down.</Paragraph>
    <Paragraph position="11"> \[6\] \[92\] WSJ910204-0176 \[2\] de Klerk's proposal to repeal the major pillars of apartheid drew a generally positive response from black leaders, but African National Congress leader Nelson Mandela called on the international community to continue economic sanctions against South Africa until the government takes further steps.</Paragraph>
    <Paragraph position="12"> \[7\] \[781\] AP880823-0069 \[18\] The ANC is the main guerrilla group fighting to overthrow the South African government and end apartheid, the system of racial segregation in which South Africa's black majority has no vote in national affairs. \[8\] \[375\] WSJ890908-0159 \[24\] For everywhere he turns, he hears the same mantra of demands -- release, lift bans, dismantle, negotiate -- be it from local anti-apartheid activists or from foreign governments: release political prisoners, like African National Congress leader Nelson Mandela; lift bans on all political organizations, such as the ANC, the Pan Africanist Congress and the United Democratic Front; dismantle all apartheid legislation; and finally, begin negotiations with leaders of all races. \[9\] \[762\] AP880212-0060 \[14\] The African National Congress is the main rebel movement fighting South Africa's white-led government and SWAPO is a black guerrilla group fighting for independence for Namibia, which is administered by South Africa.</Paragraph>
    <Paragraph position="13"> \[10\] \[91\] WSJ910404-0007 \[8\] Under an agreement between the South African government and the African National Congress, the major anti-apartheid organization, South Africa's remaining political prisoners are scheduled for release by April 30.</Paragraph>
    <Paragraph position="14">  \[1\] \[1\] \[761\] AP880212-0060 \[15\] ANGOP quoted the Angolan statement as saying the main causes of conflict in the region are South Africa's &amp;quot;illegal occupation&amp;quot; of Namibia, South African attacks against its black-ruled neighbors and its alleged creation of armed groups to carry out &amp;quot;terrorist activities&amp;quot; in those countries, and the denial of political rights to the black majority in South Africa. \[2\] \[2\] \[758\] AP880803-0080 \[25\] Three Canadian anti-apartheid groups issued a statement urging the government to sever diplomatic and economic links with South Africa and aid the African National Congress, the banned group fighting the white-dominated government in South Africa. \[3\] \[6\] \[92\] WSJ910204-0176 \[2\] de Klerk's proposal to repeal the major pillars of apartheid drew a generally positive response from black leaders, but African National Congress leader Nelson Mandela called on the international community to continue economic sanctions against South Africa until the government takes further steps.</Paragraph>
    <Paragraph position="15"> \[4\] \[8\] \[375\] WSJ890908-0159 \[24\] For everywhere he turns, he hears the same mantra of demands -- release, lift bans, dismantle, negotiate -- be it from local anti-apartheid activists or from foreign governments: release political prisoners, like African National Congress leader Nelson Mandela; lift bans on all political organizations, such as the ANC, the Pan Africanist Congress and the United Democratic Front; dismantle all apartheid legislation; and finally, begin negotiations with leaders of all races. \[5\] \[4\] \[790\] AP880802-0165 \[27\] South Africa says the ANC, the main black group fighting to overthrow South Africa's white government, has seven major military bases in Angola, and the Pretoria government wants those bases closed down.</Paragraph>
    <Paragraph position="16"> \[6\] \[11\] \[334\] AP890703-0114 \[14\] The white delegation chief, Mike Olivier, said the ANC members, including President Oliver Tambo and South African Communist Party leader Joe Slovo, said some white anti-apartheid members of Parliament could make a difference, although the organization believes Parliament as a whole is not representative of South Africans.</Paragraph>
    <Paragraph position="17"> \[7\] \[14\] \[788\] WSJ880323-0129 \[11\] These included a picture of Oliver Tambo, the exiled leader of the banned African National Congress; a story about 250 women attending an ANC conference in southern Africa; a report on the crisis in black education; and an advertisement sponsored by a Catholic group in West Germany that quoted a Psalm and called for the abolition of torture in South Africa.</Paragraph>
    <Paragraph position="18"> \[8\] \[12\] \[303\] AP880621-0089 \[8\] There was no immediate comment from South Africa, which in the past has staged cross-border raids on Botswana and other neighboring countries to attack suspected facilities of the African National Congress, which seeks to overthrow South Africa's white-led government.</Paragraph>
    <Paragraph position="19"> \[9\] \[24\] \[502\] WSJ900510-0088 \[24\] While the membership of Inkatha, the religiously and politically conservative group that is the ANC's chief rival for power in black South Africa, is overwhelmingly Zulu, Inkatha's leader, Mangosuthu Buthelezi, has very seldom appealed to sectional tribal loyalties.</Paragraph>
    <Paragraph position="20"> \[10\] \[16\] \[593\] AP890821-0092 \[11\] Besides ending the emergency and lifting bans on anti-apartheid groups and individual activists, the Harare summit's conditions included the removal of all troops from South Africa's black townships, releasing all political prisoners and ending political trials and executions, and a government commitment to free political discussion.</Paragraph>
    <Paragraph position="21">  As can be seen from the above summaries, multi-document synthetic summaries require support in the user interface. In particular, the following issues need to be addressed: * Attributability: The user needs to be able to access easily the source of a given passage.</Paragraph>
    <Paragraph position="22"> This could be the single document summary (see Figure 6).</Paragraph>
    <Paragraph position="23">  * Contextuality: The user needs to be able to zoom in on the context surrounding the chosen passages.</Paragraph>
    <Paragraph position="24"> * Redirection: The user should be able to highlight certain parts of the synthetic summary and give a command to the system indicating that these parts are to be weighted heavily and that other parts are to be given a lesser weight.</Paragraph>
  </Section>
  <Section position="9" start_page="187" end_page="188" type="metho">
    <SectionTitle>
7. EVALUATION OF SINGLE
DOCUMENT SUMMARIZATION
</SectionTitle>
    <Paragraph position="0"> An ideal text summary contains the relevant information for which the user is looking, excludes extraneous information, provides background to suit the user's profile, eliminates redundant information and filters out relevant information that the user already knows or has seen. The first step in building such summaries is extracting the pieces of articles relevant to a user query. We performed a pilot evaluation in which we used a database of assessor-marked relevant sentences to examine how well a summarization system could extract the relevant sections of documents.</Paragraph>
    <Paragraph position="1"> Automatically generating text extraction summaries based on a query or high-frequency words from the text can produce a reasonable-looking summary, yet this summary can be far from the optimal goal of quality summaries: readable, useful, intelligible, appropriate-length summaries from which the information that the user is seeking can be extracted. Jones &amp; Galliers define this type of evaluation as intrinsic (measuring a system's quality) as opposed to extrinsic (measuring a system's performance in a given task) \[7\].</Paragraph>
    <Paragraph position="2"> In the past year, there has been a focus in TIPSTER on both the intrinsic and extrinsic aspects of summarization evaluation \[4\]. The evaluation consisted of three tasks: (1) determining document relevance to a topic from query-relevant summaries (an indicative summary), (2) determining categorization from generic summaries (an indicative summary), and (3) establishing whether summaries can answer a specified set of questions (an informative summary) by comparison to an ideal summary. In each task, the summaries are rated in terms of confidence in decision, intelligibility, and length. Jing, Barzilay, McKeown and Elhadad \[6\] performed a pilot experiment (40 sentences) in which they examined the performance (precision-recall) of three summarization systems (one measuring summary length in sentences, the other two in words or clauses). They compared these systems against human ideal summaries and found that different systems achieved their best performance at different lengths (compression ratios). They found the same result for determining document relevance to a topic (one of the TIPSTER tasks) from query-relevant summaries.</Paragraph>
    <Paragraph position="3"> Our approach to summarization differs from the Columbia and TIPSTER approaches in that the focus is not on an &amp;quot;ideal human summary&amp;quot; at any particular document cutoff size. An ideal summarization system must first be able to recognize the sentences (or parts of a document) relevant to a topic or query, and then be able to create a summary from these relevant segments.</Paragraph>
    <Paragraph position="4"> Although a list of words, such as an index or table of contents, is an appropriate label summary and can indicate relevance, informative summaries need at least noun-verb phrases. We chose the sentence as our underlying unit and evaluated summarization systems on the first stage of summary creation: coverage of relevant sentences. Other systems \[16, 23\] use the paragraph as a summary unit. Since a paragraph consists of more than one sentence and often more than one information unit, it is not as suitable for this type of evaluation, although it may be more suitable as a construction unit in summaries due to the additional context that it provides. For example, paragraphs will often resolve co-reference issues, yet provide additional non-relevant information. One of the issues in summarization evaluation is how to score (penalize) extraneous, non-useful information contained in a summary.</Paragraph>
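The coverage evaluation described above reduces, at its core, to sentence-level precision and recall against the assessor-marked relevant sentences. A minimal sketch in Python (function name and sentence identifiers are ours, not the paper's):

```python
# Sketch (our own, not the paper's code) of sentence-level coverage
# scoring: compare the sentences a summarizer extracts against the
# assessor-marked relevant sentences for the same document.

def sentence_coverage(extracted_ids, relevant_ids):
    """Return (precision, recall) over sentence identifiers."""
    extracted, relevant = set(extracted_ids), set(relevant_ids)
    hits = extracted & relevant
    precision = len(hits) / len(extracted) if extracted else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

For a document whose relevant sentences are {2, 5, 9}, a summary extracting sentences {0, 2, 5, 7} scores precision 0.5 and recall 2/3.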
    <Paragraph position="5"> Unlike document information retrieval, text summarization evaluation has not extensively addressed the performance of different methodologies by evaluating the effects of different components.</Paragraph>
    <Paragraph position="6"> Most summarization systems use linguistic knowledge as well as a statistical component \[3, 5, 16, 23\]. We applied the monolingual information retrieval method of query expansion \[20, 27, 28\] to summarization, using parts of the document to expand our queries. We also performed compression experiments. We used a modified version of the 11-pt average recall/precision (Section 9.2) to evaluate our results.</Paragraph>
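As a rough illustration of the query-expansion idea mentioned above, one can augment a query with high-frequency content words drawn from part of the document. The stopword list, the cutoff k, and the function name below are assumptions for this sketch, not the paper's actual configuration:

```python
# Illustrative query expansion for summarization: add the most frequent
# non-stopword terms from a portion of the document to the query terms.
from collections import Counter

# Tiny illustrative stopword list (a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for"}

def expand_query(query_terms, document_part, k=5):
    """Append up to k frequent content words from document_part."""
    words = [w.lower() for w in document_part.split()
             if w.lower() not in STOPWORDS]
    top = [w for w, _ in Counter(words).most_common(k)]
    # Keep the original query order; append only new terms.
    return list(query_terms) + [w for w in top if w not in query_terms]
```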
  </Section>
  <Section position="10" start_page="188" end_page="188" type="metho">
    <SectionTitle>
8. EXPERIMENT DESIGN
</SectionTitle>
    <Paragraph position="0"> For our pilot experiment, we created two data sets, one based on relevant sentence judgments and the other based on model summaries (Section 8.1). We defined a modified version of the 11-point average recall-precision (Section 8.2) to use as our evaluation measure. Finally, we performed the experiments described in Section 9 to evaluate the effects of MMR, query expansion, and compression.</Paragraph>
    <Section position="1" start_page="188" end_page="188" type="sub_section">
      <SectionTitle>
8.1 Data Sets
</SectionTitle>
      <Paragraph position="0"> We created two data sets for our pilot experiments.</Paragraph>
      <Paragraph position="1"> For the first {110 Set} we took 50 documents from the TIPSTER-provided evaluation set of 200 news articles spanning 1988-1991. All of these documents were on the same topic (see Figure 3). Three evaluators ranked each sentence in a document as relevant, somewhat relevant, or not relevant. For the purpose of this experiment, somewhat relevant was treated as relevant, and the final score for each sentence was determined by majority vote; sentences that received a majority vote were tabulated as relevant (to the topic). Each document was ranked as relevant or not relevant. The three assessors had 68% agreement in their relevance judgments. The query was extracted from the topic (see Figure 3).</Paragraph>
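The majority-vote tabulation above can be sketched as follows: three assessor judgments per sentence, with "somewhat relevant" collapsed into "relevant", and a sentence counted as relevant when at least two of the three agree. Label strings and names are illustrative, not the dataset's actual encoding:

```python
# Sketch of the three-assessor majority vote described in the text.
# "Somewhat relevant" is treated as relevant for this experiment.

RELEVANT_LABELS = {"relevant", "somewhat relevant"}

def majority_relevant(judgments):
    """judgments: the three assessors' labels for one sentence."""
    votes = sum(1 for j in judgments if j in RELEVANT_LABELS)
    return votes >= 2
```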
      <Paragraph position="2"> The second data set {Model Summaries} was provided as a training set for the Question and Answer portion of the TIPSTER evaluation. It consisted of &amp;quot;model summaries&amp;quot; containing the sentences of an article that answered a list of questions. These model sentences were used to score the summarizer. The query was extracted from the questions.</Paragraph>
    </Section>
    <Section position="2" start_page="188" end_page="188" type="sub_section">
      <SectionTitle>
8.2 Evaluation Code
</SectionTitle>
      <Paragraph position="0"> We modified the 11-pt recall-precision curves \[21\] commonly used for document information retrieval.</Paragraph>
      <Paragraph position="1"> Since many documents have only a few relevant sentences, the corresponding curves for summarization have many intervals with missing data points. To remedy this situation, we implemented a step function for the precision values. This allowed recall intervals that would not naturally be filled to be assigned an actual precision value. For example, in the case of two relevant sentences in a document, points 0-5 would all take the first precision value (naturally occurring at point 5), and points 6-10 the second value (naturally occurring at point 10). We interpolated the results of each query for the composite graph to form modified interpolated recall-precision curves.</Paragraph>
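The step-function fill described above can be sketched as follows: with R relevant sentences there are only R naturally occurring recall points, so each of the 11 standard recall levels (0.0, 0.1, ..., 1.0) takes the precision of the first natural point at or beyond it. This is our reconstruction of the described procedure; variable names are ours:

```python
# Step-function fill for 11-point recall-precision when a document has
# only R natural recall points (one per relevant sentence retrieved).
import math

def step_fill(natural_precisions, levels=11):
    """natural_precisions[i] = precision when the (i+1)-th relevant
    sentence is retrieved, i.e. at recall (i+1)/R."""
    R = len(natural_precisions)
    filled = []
    for j in range(levels):
        recall_level = j / (levels - 1)          # 0.0, 0.1, ..., 1.0
        i = max(1, math.ceil(recall_level * R))  # first covering point
        filled.append(natural_precisions[i - 1])
    return filled
```

With two relevant sentences and natural precisions [1.0, 0.5], points 0-5 receive 1.0 and points 6-10 receive 0.5, matching the worked example in the text.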
      <Paragraph position="2"> In order to account for the fact that a compressed summary does not have the opportunity to return the full set of relevant sentences, we use a normalized version of recall and a normalized version of F1 as defined below.</Paragraph>
      <Paragraph position="3"> Given:</Paragraph>
      <Paragraph position="5"/>
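The formula bodies in this paragraph were lost when the XML was extracted. One plausible form consistent with the surrounding description (a sketch under our own assumptions, not the paper's verbatim definitions): let $R$ be the number of relevant sentences in the document, $S$ the number of sentences the compressed summary may return, and $r$ the number of relevant sentences it does return. Then:

```latex
\begin{align*}
P &= \frac{r}{S} \\[4pt]
R_{\mathrm{norm}} &= \frac{r}{\min(S, R)} \\[4pt]
F1_{\mathrm{norm}} &= \frac{2 \, P \, R_{\mathrm{norm}}}{P + R_{\mathrm{norm}}}
\end{align*}
```

Dividing by $\min(S, R)$ rather than $R$ captures the stated intent: a summary capped at $S$ sentences is not penalized for relevant sentences it had no opportunity to return.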
    </Section>
  </Section>
</Paper>