<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1017">
  <Title>Event-Based Extractive Summarization</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 General Summarization Model
</SectionTitle>
    <Paragraph position="0"> Many summarization systems (e.g., (Teufel and Moens, 1997; McKeown et al., 1999; Lin and Hovy, 2000)) include two levels of analysis: the sentence level, where every textual unit is scored according to the concepts or features it covers, and the text level, where, before being added to the final output, textual units are compared to each other on the basis of those features.</Paragraph>
    <Paragraph position="3"> In Section 1 we presented a four-step pipeline for extractive summarization; existing summarization systems largely follow this pipeline, although they introduce different approaches for every step in it. We suggest a model that describes the extractive summarization task in general terms. Consider the matrix in Table 1.</Paragraph>
    <Paragraph position="4"> Rows of this matrix represent all textual units into which the input text is divided. Columns represent the concepts discovered for the input text. Every concept is either absent or present in a given textual unit. Each concept ci has also an associated weight wi indicating the importance of this concept. These weights can be used for scoring the textual units.</Paragraph>
    <Paragraph position="5"> Thus, the input text and the important information in it are mapped onto an m × n matrix. Using this matrix it is possible to formulate the extractive summarization problem as extracting the minimal number of textual units that cover all the concepts that are interesting or important. To account for the cost of long summaries, we can constrain the total length of the summary, or balance it against the total weight of covered concepts.</Paragraph>
    <Paragraph position="6"> The presented model can also be used for comparing summaries consisting of different textual units. For example, a summary consisting only of textual unit t1 renders the same information as the summary consisting of textual units t2 and t3. Both these summaries cover the same set of concepts, namely c1, c2 and c3. We explore properties of this model in more detail in (Filatova and Hatzivassiloglou, 2004).</Paragraph>
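The matrix model above maps directly onto a small data structure. The following sketch (illustrative only; the unit and concept names follow the t1/t2/t3 example in the text, and the weights are invented) checks that the two summaries discussed cover the same concept set:

```python
# Toy instance of the m x n textual-unit / concept matrix:
# each row (unit) is represented as the set of concepts it covers.
units = {
    "t1": {"c1", "c2", "c3"},
    "t2": {"c1", "c2"},
    "t3": {"c3"},
}
# Hypothetical importance weights w_i for each concept c_i.
weights = {"c1": 0.5, "c2": 0.3, "c3": 0.2}

def covered(selection):
    """Union of the concepts covered by a set of textual units."""
    return set().union(*(units[t] for t in selection))

# The summary {t1} renders the same information as the summary {t2, t3}:
assert covered({"t1"}) == covered({"t2", "t3"}) == {"c1", "c2", "c3"}
```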
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Associating Concepts with Features
</SectionTitle>
    <Paragraph position="0"> Before extracting a summary, it is necessary to define what concepts in the input text are important and should be covered by the output text. There is no exact definition or even agreement between different approaches on what an important concept is.</Paragraph>
    <Paragraph position="1"> In order to use the model of Section 2 one has to approximate the notion of &amp;quot;concept&amp;quot; with some textual features.</Paragraph>
    <Paragraph position="2"> Current summarization approaches use text features that assign high scores to textual units containing important information, and low scores to textual units unlikely to contain information worth including in the final output.</Paragraph>
    <Paragraph position="3"> There exist approaches that deal mainly with lexical features, like tf*idf weighting of words in the input text(s), words used in the titles and section headings (Luhn, 1958; Edmundson, 1968), or the presence or absence of certain cue phrases like significant, important, and in conclusion (Kupiec et al., 1995; Teufel and Moens, 1997). Other systems exploit the co-occurrence of particular concepts (Barzilay and Elhadad, 1997; Lin and Hovy, 2000) or syntactic constraints between concepts (McKeown et al., 1999). Concepts do not have to be directly observable as text snippets--they can represent abstract properties that particular text units may or may not satisfy, for example, status as a first sentence in a paragraph or, more generally, position in the source text (Baxendale, 1958; Lin and Hovy, 1997).</Paragraph>
    <Paragraph position="4"> Some summarization systems assume that the importance of a sentence is derivable from a rhetorical representation of the source text (Marcu, 1997).</Paragraph>
    <Paragraph position="5"> The matrix representation of the previous section offers a way to formalize the sharing of information between textual units at the individual feature level. Thus, this representation is most useful for contentrelated concepts that should not be repeated in the summary. The representation can however handle independent features such as sentence position by encoding them separately for each textual unit.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Atomic Events
</SectionTitle>
    <Paragraph position="0"> Atomic events link major constituent parts of the actions described in a text or collection of texts through the verbs or action nouns labeling the event itself. The idea behind this technique is that the major constituent parts of events (participants, locations, times) are usually realized in text as named entities. The more important the constituent part, the more often the corresponding named entity is mentioned.</Paragraph>
    <Paragraph position="1"> Not all the constituent parts of events need to be represented by named entities. For example, in an airline crash it is important to report information about the passengers and the crew. These are not marked by named entities but are highly likely to be among the most frequently used nouns. Thus, we add the top ten most frequent nouns to the list of named entities.</Paragraph>
    <Paragraph position="2"> We use the algorithm for atomic event extraction proposed in (Filatova and Hatzivassiloglou, 2003).</Paragraph>
    <Paragraph position="3"> It involves the following steps:  1. Analyze each input sentence one at a time; ignore sentences that do not contain at least two named entities or frequent nouns.</Paragraph>
    <Paragraph position="4"> 2. Extract all the possible pairs of named entities/frequent nouns in the sentence, preserving their order and all the words in between. We call such pairs of named entities relations, and the words in-between the named entities in a relation connectors.</Paragraph>
    <Paragraph position="5"> 3. For each relation, count how many times this relation is used in the input text(s).</Paragraph>
    <Paragraph position="6"> 4. Keep only connectors that are content verbs  or action nouns, according to WordNet's (Fellbaum, 1998) noun hierarchy. For each connector calculate how many times it is used for the extracted relation.</Paragraph>
    <Paragraph position="7"> After calculating the scores for all relations and all connectors within each relation, we calculate their normalized scores. The normalized relation score is the ratio of the count for the current relation (how many times we see the relation within a sentence in the input) over the overall count of all relations. The normalized connector score is the ratio of the count for the current connector (how many times we see this connector for the current relation) over the overall count of all connectors for this relation. Thus, out of the above procedural definition, an atomic event is a triplet of two named entities (or frequent nouns) connected by a verb or an action-denoting noun. To get a score for the atomic event we multiply the normalized score for the relation by the normalized score for the connector. The score indicates how important the triplet is overall.</Paragraph>
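The scoring step can be sketched as follows, assuming relations and connectors have already been extracted (steps 1-4 above). All counts and entity names here are invented for illustration:

```python
from collections import Counter

# Hypothetical counts from step 3: how often each named-entity pair occurs.
relation_counts = Counter({("airline", "airport"): 6, ("airline", "crew"): 2})
# Hypothetical counts from step 4: per relation, how often each
# content verb / action noun connects the pair.
connector_counts = {
    ("airline", "airport"): Counter({"crashed": 4, "left": 2}),
    ("airline", "crew"): Counter({"carried": 2}),
}

total_relations = sum(relation_counts.values())

def event_score(relation, connector):
    """Normalized relation score times normalized connector score."""
    rel_score = relation_counts[relation] / total_relations
    conn_total = sum(connector_counts[relation].values())
    conn_score = connector_counts[relation][connector] / conn_total
    return rel_score * conn_score

# Atomic event ("airline", crashed, "airport"): (6/8) * (4/6) = 0.5
print(event_score(("airline", "airport"), "crashed"))
```

The multiplication rewards triplets whose relation is frequent overall and whose connector dominates that relation.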
    <Paragraph position="8"> In the above approach to event detection we do not address co-reference, nor do we merge triplets that describe the same event through paraphrases, inflected forms, or syntactic variants (e.g., active/passive voice). Our method uses relatively simple extraction techniques and shallow statistics, but it is fully automatic and can serve as a first approximation of the events in the input text(s). (We earlier showed empirically (Filatova and Hatzivassiloglou, 2003) that a description of a single event is usually bound within one sentence, which motivates the sentence-by-sentence analysis in step 1 above.) Our approach to defining events is not the only one proposed--this is a subject with substantial work in linguistics, information retrieval, and information extraction. In linguistics, events are often defined at a fine-grained level as a matrix verb or a single action noun like &amp;quot;war&amp;quot; (Pustejovsky, 2000). In contrast, recent work in information retrieval</Paragraph>
    <Paragraph position="9"> within the TDT framework has taken event to mean essentially &amp;quot;narrowly defined topic for search&amp;quot; (Allan et al., 1998). Finally, for the information extraction community an event represents a template of relationships between participants, times, and places (Marsh and Perzanowski, 1997). It may be possible to use these alternative models of events as a source of content features.</Paragraph>
    <Paragraph position="10"> We earlier established empirically (Filatova and Hatzivassiloglou, 2003) that this technique for atomic event extraction is useful for delineating the major participants and their relationships from a set of topically related input texts. For example, from a collection of documents about an airplane crash the algorithm assigns the highest score to atomic events that link together the name of the airline, the source and destination airports and the day when the crash happened through the verb crashed or its synonyms.</Paragraph>
    <Paragraph position="11"> It is thus plausible to explore the usefulness of these event triplets as the concepts used in the model of Section 2.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Textual Unit Selection
</SectionTitle>
    <Paragraph position="0"> We have formulated the problem of extractive summarization in terms of the matrix model, stating that mapping concepts present in the input text onto the textual units out of which the output is constructed can be accomplished by extracting the minimal number of textual units that cover all (or most of) the important concepts. Every time we add a new textual unit to the output it is possible to judge which concepts in it are already covered in the final summary. This observation can be used to avoid redundancy: before adding a candidate textual unit to the output summary, we check whether it contains enough new important concepts.</Paragraph>
    <Paragraph position="1"> We describe in this section several algorithms for selecting appropriate textual units for the output summary. These algorithms differ on whether they take advantage of the redundancy reduction prop-erty of our model, and on whether they prioritize important concepts individually or collectively. They share, however, a common property: all of them operate independently of the features chosen to represent important concepts, and thus can be used with both our event-based features and other feature sets.</Paragraph>
    <Paragraph position="2"> The comparison of the results allows us to empirically determine whether event-based features can help in summarization.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Static Greedy Algorithm
</SectionTitle>
      <Paragraph position="0"> Our first text unit selection algorithm does not include any mechanism for avoiding redundant information in the summary. Instead, it rates each textual unit independently, so that units covering a large total weight of concepts are included in the summary. More specifically,  1. For every textual unit, calculate the weight of this textual unit as the sum of the weights of all the concepts covered by this textual unit.</Paragraph>
      <Paragraph position="1"> 2. Choose the textual unit with the maximum weight and add it to the final output.</Paragraph>
      <Paragraph position="2"> 3. Continue extracting other textual units in order of total weight until the summary reaches the desired length.</Paragraph>
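The three steps above amount to a single ranking pass. A minimal sketch (illustrative unit/concept names and weights, with a unit-count cutoff standing in for the length limit):

```python
def static_greedy(units, weights, max_units):
    """Score each unit once by the total weight of concepts it covers,
    then take the top-scoring units. No redundancy handling: two units
    covering the same concepts can both be selected."""
    ranked = sorted(units,
                    key=lambda t: sum(weights[c] for c in units[t]),
                    reverse=True)
    return ranked[:max_units]

# Invented data: t1 and t2 cover exactly the same concepts.
units = {"t1": {"c1", "c2"}, "t2": {"c1", "c2"}, "t3": {"c3"}}
weights = {"c1": 3, "c2": 2, "c3": 1}
print(static_greedy(units, weights, 2))  # ['t1', 't2'] -- fully redundant
```

Note how the redundant pair t1/t2 is selected ahead of t3, which is exactly the weakness the next subsection addresses.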
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Avoiding Redundancy in the Summary
</SectionTitle>
      <Paragraph position="0"> Two popular techniques for avoiding redundancy in summarization are Maximal Marginal Relevance (MMR) (Goldstein et al., 2000) and clustering (McKeown et al., 1999). In MMR the determination of redundancy is based mainly on the textual overlap between the sentence that is about to be added to the output and the sentences that are already in the output. Clustering offers an alternative: before starting the selection process, the summarization system clusters the input textual units. This step allows analyzing one representative unit from each cluster instead of all textual units.</Paragraph>
      <Paragraph position="1"> We take advantage of the model matrix of Section 2 to explore another way to avoid redundancy.</Paragraph>
      <Paragraph position="2"> Rather than making decisions for each textual unit independently, as in our Static Greedy Algorithm, we globally select the subset of textual units that covers the most concepts (i.e., information) present in the input. Our task then becomes very similar to a classic problem in algorithm theory, Maximum Coverage.</Paragraph>
      <Paragraph position="3"> Given C, a finite set of weighted elements, a collection T of subsets of C, and a parameter k, the maximum coverage problem is to find k members of T such that the total weight of the elements covered (i.e., belonging to the k members of the solution) is maximized. This problem is NP-hard, since the well-known set cover problem can be reduced to it (Hochbaum, 1997). Thus, only approximation algorithms are known that solve this problem in polynomial time.</Paragraph>
      <Paragraph position="4"> Hochbaum (1997) reports that a greedy algorithm is the best possible polynomial approximation algorithm for this problem. This algorithm iteratively adds to the solution S the set ti ∈ T that locally maximizes the increase in the total weight of elements covered by S ∪ ti. The algorithm gives a solution with weight at least (1 − 1/e) of the optimal solution's total weight.</Paragraph>
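This standard greedy approximation can be written compactly; the sketch below uses invented subsets and unit element weights:

```python
def greedy_max_coverage(subsets, weights, k):
    """Greedy (1 - 1/e)-approximation for weighted maximum coverage:
    repeatedly add the subset with the largest marginal covered weight."""
    covered, solution = set(), []
    for _ in range(k):
        def gain(t):
            return sum(weights[c] for c in subsets[t] - covered)
        best = max(subsets, key=gain)
        if gain(best) == 0:
            break  # nothing new left to cover
        solution.append(best)
        covered |= subsets[best]
    return solution, covered

# Invented instance: t2 overlaps t1, so t3 is the better second pick.
subsets = {"t1": {"c1", "c2", "c3"}, "t2": {"c3", "c4"}, "t3": {"c4", "c5"}}
unit_weights = {c: 1 for c in ("c1", "c2", "c3", "c4", "c5")}
print(greedy_max_coverage(subsets, unit_weights, 2)[0])  # ['t1', 't3']
```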
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Adaptive Greedy Algorithm
</SectionTitle>
      <Paragraph position="0"> The greedy algorithm for the maximum coverage problem is not directly applicable to summarization, because the formulation of maximum coverage assumes that any combination of k sets ti (i.e., k sentences) is equally good as long as they cover the same total weight of concepts. A more realistic limitation for the summarization task is to aim for a fixed total length of the summary, rather than a fixed total number of sentences; this approach has been adopted in several evaluation efforts, including the Document Understanding Conferences (DUC). We consequently modify the greedy algorithm for the maximum coverage problem to obtain the following adaptive greedy algorithm for summarization:  1. For each textual unit calculate its weight as the sum of weights of all concepts it covers.</Paragraph>
      <Paragraph position="1"> 2. Choose the textual unit with the maximum weight and add it to the output. Add the concepts covered by this textual unit to the list of concepts covered in the final output.</Paragraph>
      <Paragraph position="2"> 3. Recalculate the weights of the textual units: subtract from each unit's weight the weight of all concepts in it that are already covered in the output.</Paragraph>
      <Paragraph position="3"> 4. Continue extracting text units in order of their  total weight (going back to step 2) until the summary is of the desired length.</Paragraph>
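The four steps above can be sketched as follows. Per-unit lengths and a total length budget stand in for the summary length limit; all names and numbers are invented:

```python
def adaptive_greedy(units, weights, lengths, budget):
    """Adaptive greedy selection: after each pick, concepts already
    covered contribute nothing, so unit weights are recalculated."""
    covered, summary, used = set(), [], 0
    while True:
        # Only units that still fit the length budget are candidates.
        candidates = [t for t in units
                      if t not in summary and used + lengths[t] <= budget]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda t: sum(weights[c] for c in units[t] - covered))
        if sum(weights[c] for c in units[best] - covered) == 0:
            break  # remaining units add no new concepts
        summary.append(best)
        covered |= units[best]
        used += lengths[best]
    return summary

# Same invented data as before: t2 duplicates t1's concepts.
units = {"t1": {"c1", "c2"}, "t2": {"c1", "c2"}, "t3": {"c3"}}
weights = {"c1": 3, "c2": 2, "c3": 1}
lengths = {"t1": 1, "t2": 1, "t3": 1}
print(adaptive_greedy(units, weights, lengths, budget=2))  # ['t1', 't3']
```

Unlike the static greedy algorithm on the same data, the redundant unit t2 is skipped in favor of t3.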
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.4 Modified Adaptive Greedy Algorithm
</SectionTitle>
      <Paragraph position="0"> The adaptive greedy algorithm described above prioritizes sentences according to the total weight of concepts they cover. While this is a reasonable approach, an alternative is to give increased priority to concepts that are individually important, so that sentences mentioning them have a chance of being included in the output even if they don't contain other important concepts. We have developed the following variation of our adaptive greedy algorithm, termed the modified greedy algorithm:  1. For every textual unit calculate its weight as the sum of weights of all concepts it covers.</Paragraph>
      <Paragraph position="1"> 2. Consider only those textual units that contain the concept with the highest weight that has not yet been covered. Out of these, choose the one with highest total weight and add it to the final output. Add the concepts which are covered by this textual unit to the list of concepts covered in the final output.</Paragraph>
      <Paragraph position="2"> 3. Recalculate the weights of the textual units: subtract from each unit's weight the weight of all concepts in it that are already covered in the output.</Paragraph>
      <Paragraph position="3"> 4. Continue extracting textual units, going back  to step 2 each time, until we get a summary of the desired length.</Paragraph>
      <Paragraph position="4"> The modified greedy algorithm has the same mechanism for avoiding redundancy as the adaptive greedy one, while according a somewhat different priority to individual sentences (weight of most important concepts versus just total weight).</Paragraph>
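The change in step 2 can be isolated in a short variant (a simplified sketch with invented data; for brevity it stops when no unit containing the top concept fits the budget, rather than falling back to the next concept):

```python
def modified_greedy(units, weights, lengths, budget):
    """Modified adaptive greedy: each pick is restricted to units that
    contain the heaviest not-yet-covered concept; among those, the unit
    with the largest total uncovered weight wins."""
    covered, summary, used = set(), [], 0
    while True:
        uncovered = {c for ts in units.values() for c in ts} - covered
        if not uncovered:
            break
        top = max(uncovered, key=lambda c: weights[c])
        candidates = [t for t in units
                      if t not in summary and top in units[t]
                      and used + lengths[t] <= budget]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda t: sum(weights[c] for c in units[t] - covered))
        summary.append(best)
        covered |= units[best]
        used += lengths[best]
    return summary

# Invented data: t2 has the larger total weight (4 vs. 3), but only t1
# contains the single heaviest concept c1, so t1 is picked first.
units = {"t1": {"c1"}, "t2": {"c2", "c3"}}
weights = {"c1": 3, "c2": 2, "c3": 2}
lengths = {"t1": 1, "t2": 1}
print(modified_greedy(units, weights, lengths, budget=2))  # ['t1', 't2']
```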
    </Section>
  </Section>
</Paper>