<?xml version="1.0" standalone="yes"?> <Paper uid="J98-3005"> <Title>Generating Natural Language Summaries from Multiple On-Line Sources</Title> <Section position="5" start_page="475" end_page="475" type="metho"> <SectionTitle> 3. System Overview </SectionTitle> <Paragraph position="0"> The overall architecture of our summarization system given earlier in Figure 1 draws on research in software agents (Genesereth and Ketchpel 1994) to allow connections to a variety of different types of data sources. Facilitators are used to provide a transparent interface to heterogeneous data sources that run on several machines and may be written in different programming languages. Currently, we have incorporated facilitators for various live news streams, the CIA World Factbook, and past newspaper archives.</Paragraph> <Paragraph position="1"> The architecture allows for the incorporation of additional facilitators and data sources as our work progresses.</Paragraph> <Paragraph position="2"> The system extracts data from the different sources and then combines it into a conceptual representation of the summary. The summarization component, shown on the left side of the figure, consists of a base summary generator, which combines information from multiple input articles and organizes that information using a paragraph planner. The structured conceptual representation of the summary is passed to the lexical chooser, shown at the bottom of the diagram. The lexical chooser also receives input from the World Factbook and possible descriptions of people or organizations to augment the base summary. The full content is then passed through a sentence generator, implemented using the FUF/SURGE language generation system (Elhadad 1993; Robin 1994). 
FUF is a functional unification formalism that uses a large systemic grammar of English, called SURGE, to fill in syntactic constraints, build a syntactic tree, choose closed class words, and eventually linearize the tree as a sentence.</Paragraph> <Paragraph position="3"> The right side of the figure shows how proper nouns and their descriptions are extracted from past news. An entity extractor identifies proper nouns in the past newswire archives, along with descriptions. Descriptions are then categorized using the WordNet hierarchy. Finally, an FD or functional description (Elhadad 1993) for the description is generated so that it can be reused in fluent ways in the final summary. FDs mix functional, semantic, syntactic, and lexical information in a recursive attribute-value format that serves as the basic data structure for all information within FUF/SURGE.</Paragraph> </Section> <Section position="6" start_page="475" end_page="476" type="metho"> <SectionTitle> 4. Generating the Summary </SectionTitle> <Paragraph position="0"> SUMMONS produces a summary from sets of templates that contain the salient facts reported in the input articles and that are produced by the message understanding systems. These systems extract specific pieces of information from a given news article. An example of a template produced by MUC systems and used in our system is shown in Figures 2 and 3. To test our system, we used the templates produced by systems participating in MUC-4 (MUC 1992) as input. MUC-4 systems operate on the terrorist domain and extract information by filling fields such as perpetrator, victim, and type of event, for a total of 25 fields per template. In addition, we filled the same template forms by hand from current news articles for further testing. 
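As a concrete illustration, a parsed MUC-4 template of the kind just described can be represented as an attribute/value dictionary. The sketch below is ours, not part of SUMMONS; it abbreviates the 25-slot MUC-4 field set to a handful of slots and uses invented values.

```python
# A toy representation of a parsed MUC-4 template as a Python dict.
# Only a subset of the 25 MUC-4 slots is sketched; values are invented.

def make_template(**slots):
    """Build a template with every sketched slot present, defaulting to None."""
    fields = [
        "MESSAGE:ID", "INCIDENT:DATE", "INCIDENT:LOCATION",
        "INCIDENT:TYPE", "PERP:INDIVIDUAL ID", "PERP:ORGANIZATION ID",
        "HUM TGT:NUMBER", "HUM TGT:EFFECT OF INCIDENT", "PHYS TGT:ID",
    ]
    template = dict.fromkeys(fields)
    template.update(slots)
    return template

t = make_template(**{
    "INCIDENT:DATE": "04 MAR 96",
    "INCIDENT:LOCATION": "ISRAEL: TEL AVIV",
    "INCIDENT:TYPE": "BOMBING",
    "HUM TGT:NUMBER": 10,
})

# Count how many slots a message understanding system managed to fill.
filled = sum(v is not None for v in t.values())
print(filled)  # 4
```

A real MUC-4 template distinguishes set fills, string fills, and cross-references between objects; a flat dictionary is only the simplest approximation, but it is adequate for illustrating the slot-by-slot comparisons that the combination operators perform.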
Currently, work is under way in our group on the building of an information extraction module similar to the ones used in the MUC conferences, which we will later use as an input to SUMMONS.</Paragraph> </Section> <Section position="7" start_page="476" end_page="487" type="metho"> <SectionTitle> MESSAGE: ID MESSAGE: TEMPLATE INCIDENT: DATE INCIDENT: LOCATION INCIDENT: TYPE INCIDENT: STAGE OF EXECUTION INCIDENT: INSTRUMENT ID INCIDENT: INSTRUMENT TYPE PERP: INCIDENT CATEGORY PERP: INDIVIDUAL ID PERP: ORGANIZATION ID PERP: ORG. CONFIDENCE PHYS TGT: ID PHYS TGT: TYPE PHYS TGT: NUMBER PHYS TGT: FOREIGN NATION * PHYS TGT: EFFECT OF INCIDENT PHYS TGT: TOTAL NUMBER HUM TGT: NAME HUM TGT: DESCRIPTION HUM TGT: TYPE HUM TGT: NUMBER HUM TGT: FOREIGN NATION HUM TGT: EFFECT OF INCIDENT HUM TGT: TOTAL NUMBER </SectionTitle> <Paragraph position="0"> Parsed MUC-4 template.</Paragraph> <Paragraph position="1"> We are basing our implementation on the tools developed at the University of Massachusetts (Fisher et al. 1995). The resulting system will not only be able to generate summaries from preparsed templates but will also produce summaries directly from raw text by merging the message understanding component with the current version of SUMMONS.</Paragraph> <Paragraph position="2"> Our work provides a methodology for developing summarization systems, identifies planning operators for combining information in a concise summary, and uses empirically collected phrases to mark summarized material. We have collected a corpus of newswire summaries that we used as data for developing the planning operators and for gathering a large set of lexical constructions used in summarization. Reuters reported that 18 people were killed in a Jerusalem bombing Sunday. The next day, a bomb in Tel Aviv killed at least 10 people and wounded 30 according to Israel radio. Reuters reported that at least 12 people were killed and 105 wounded. Later the same day, Reuters reported that the radical Muslim group Hamas had claimed responsibility for the act.</Paragraph> <Paragraph position="3"> Figure 4 Sample output from SUMMONS.</Paragraph> <Paragraph position="4"> This corpus will eventually aid in a full system evaluation. Since news articles often summarize previous reports of the same event, our corpus also includes short summaries of previous articles.</Paragraph> <Paragraph position="5"> We used this corpus to develop both the content planner (i.e., the module that determines what information to include in the summary) and the linguistic component (i.e., the module that determines the words and surface syntactic form of the summary) of our system. We used the corpus to identify planning operators that are used to combine information; this includes techniques for linking information together in a related way (e.g., identifying changes, similarities, trends) as well as making generalizations. We also identified phrases that are used to mark summaries and used these to build the system lexicon. An example summary produced by the system is shown in Figure 4. This paragraph summarizes four articles about two separate terrorist acts that took place in Israel in March of 1996 using two different planning operators. While the system we report on is fully implemented, our work is undergoing continuous development. Currently, the system includes eight different planning operators and a testbed of 200 input templates grouped into sets on the same event, and can produce fully lexicalized summaries for approximately half of the cases (the rest of the templates were either not complete or the information extracted in them was irrelevant to the task). We have not performed an evaluation beyond the testbed. 
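The operator-application cycle sketched in this section can be pictured as follows. The operator shown is a simplified stand-in for the addition operator discussed later in Section 4.3; the slot names and control flow are ours, not the SUMMONS implementation.

```python
# Hypothetical sketch of the operator-application cycle: operators are
# tried against chronologically adjacent templates grouped on the same
# event; when a predicate fires, a combined template is synthesized.

def addition(t1, t2):
    """Fires when a later report fills a slot the earlier one left empty."""
    if t1.get("PERP") is None and t2.get("PERP") is not None:
        merged = dict(t1)
        merged["PERP"] = t2["PERP"]
        merged["OPERATOR"] = "addition"
        return merged
    return None

OPERATORS = [addition]  # SUMMONS has eight operators; one is shown here

def combine(templates):
    """One pass: apply the first operator that fires on adjacent templates."""
    for t1, t2 in zip(templates, templates[1:]):
        for op in OPERATORS:
            merged = op(t1, t2)
            if merged is not None:
                return merged
    return None

reports = [
    {"LOCATION": "TEL AVIV", "PERP": None},
    {"LOCATION": "TEL AVIV", "PERP": "HAMAS"},
]
print(combine(reports)["OPERATOR"])  # addition
```

In the actual system each operator both tests its preconditions and adjusts importance values; this sketch shows only the test-and-merge skeleton.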
Our work provides a methodology for increasing the vocabulary size and the robustness of the system using a collected corpus, and moreover, it shows how summarization can be used to evaluate the message understanding systems, identifying future research directions that would not be pursued under the current MUC evaluation cycle. (Participating systems in the DARPA message understanding program are evaluated on a regular basis: participants are given a set of training text to tune their systems over a period of time, and their systems are tested on unseen text at follow-up conferences.) Due to inherent difficulties in the summarization task, our work is a substantial first step and provides the framework for a number of different research directions.</Paragraph> <Paragraph position="6"> The rest of this section describes the summarizer, specifying the planning operators used for summarization and discussing the summarization algorithm in detail, showing how summaries of different lengths are generated. We provide examples of the summarization markers we collected for the lexicon and show the demands that summarization creates for interpretation.</Paragraph> <Section position="1" start_page="477" end_page="479" type="sub_section"> <SectionTitle> 4.1 Overview of the Summarization Component </SectionTitle> <Paragraph position="0"> The summarization component of SUMMONS is based on the traditional language generation system architecture (McKeown 1985; McDonald and Pustejovsky 1986; Hovy 1988).</Paragraph> <Paragraph position="1"> A typical language generator is divided into two main components, a content planner, which selects information from an underlying knowledge base to include in a text, and a linguistic component, which selects words to refer to concepts contained in the selected information and arranges those words, appropriately inflecting them, to form an English sentence. 
The content planner produces a conceptual representation of text meaning (e.g., a frame, a logical form, or an internal representation of text) and typically does not include any linguistic information. The linguistic component uses a lexicon and a grammar of English to realize the conceptual representation into a sentence. The lexicon contains the vocabulary for the system and encodes constraints about when each word can be used. As shown in Figure 1, the content planner used by SUMMONS determines what information from the input MUC templates should be included in the summary using a set of planning operators that are specific to summarization and, to some extent, to the terrorist domain. Its linguistic component determines the phrases and surface syntactic form of the summary. The linguistic component consists of a lexical chooser, which determines the high-level sentence structure of each sentence and the words that realize each semantic role, and the FUF/SURGE (Elhadad 1991; Elhadad 1993) sentence generator.</Paragraph> <Paragraph position="2"> Input to SUMMONS is a set of templates, where each template represents the information extracted from one or more articles by a message understanding system.</Paragraph> <Paragraph position="3"> However, we constructed by hand an additional set of templates that also include terrorist events that have taken place after the period of time covered in MUC-4, such as the World Trade Center bombing, the Hebron Mosque massacre and more recent incidents in Israel, as well as the disaster in Oklahoma City. These incidents were not handled by the original message understanding systems. We also created by hand a set of templates unrelated to real newswire articles, which we used for testing some techniques of our system. We enriched the templates for all these cases by adding four slots: the primary source, the secondary source, and the times at which both sources made their reports. 
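The enrichment with the four source slots might look as follows. The flat PRIMSOURCE/SECSOURCE slot names echo those shown in the paper's figures, but the exact layout and the helper function are our assumption.

```python
# Sketch of the four slots added to each template: primary source,
# secondary source, and the times at which both made their reports.

def add_sources(template, prim, sec, prim_time, sec_time):
    """Return a copy of the template enriched with source information."""
    enriched = dict(template)
    enriched["PRIMSOURCE:SOURCE"] = prim     # e.g., an eyewitness or local radio
    enriched["PRIMSOURCE:TIME"] = prim_time
    enriched["SECSOURCE:SOURCE"] = sec       # e.g., the reporting newswire
    enriched["SECSOURCE:TIME"] = sec_time
    return enriched

t = add_sources({"INCIDENT:TYPE": "BOMBING"},
                prim="Israel radio", sec="Reuters",
                prim_time="04 MAR 96 / 13:00", sec_time="04 MAR 96 / 14:20")
print(t["SECSOURCE:SOURCE"])  # Reuters
```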
We found having the source of the report immensely useful for discovering and reporting contradictions and generalizations, because often different reports of an event are in conflict. Also, source information can indicate the level of confidence of the report, particularly when reported information changes over time.</Paragraph> <Paragraph position="4"> For example, if several secondary sources all report the same facts for a single event, citing multiple primary sources, it is more likely that this is the way the event really happened, while if there are many contradictions between reports, it is likely that the facts are not yet fully known.</Paragraph> <Paragraph position="5"> Members of our research group are currently working on event tracking (Aho et al. 1997). Their prototype uses pattern-matching techniques to track changes to on-line news sources and provide a live feed of articles that relate to a changing event.</Paragraph> <Paragraph position="6"> SUMMONS's summarization component generates a base summary, which contains facts extracted from the input set of articles. The base summary is later enhanced with additional facts from on-line structured databases and with descriptions of individuals extracted from previous news to produce the extended summary. The base summary is a paragraph consisting of one or more sentences, where the length of the summary is controlled by a variable input parameter. In the absence of a specific user model, the base summary is produced. Otherwise, the extended summary (base summary with added descriptions of entities) is generated instead. Similarly, the default is that the summary contains references to contradictory and updated information. 
However, if the user profile makes it explicit, only the latest and the most trusted (as per the user's preference of sources) facts are included.</Paragraph> <Paragraph position="7"> SUMMONS rates information in terms of importance, where information that appears in only one article is given a lower rating and information that is synthesized from multiple articles is rated more highly.</Paragraph> <Paragraph position="8"> Development of the text generation component of SUMMONS was made easier because of the language generation tools and framework available at Columbia University. No changes in the FUF sentence generator were needed. In addition, the lexical chooser and content planner were based on the design used in the PLANDoc automated documentation system described in Section 2.3.</Paragraph> <Paragraph position="9"> In particular, we used FUF to implement the lexical chooser, representing the lexicon as a grammar as we have done in many previous systems (Elhadad 1993; Robin 1994; McKeown, Robin, and Tanenblatt 1993; Feiner and McKeown 1991). The main effort in porting the approach to SUMMONS was in identifying the words and phrases needed for the domain. The content planner features several stages. It first groups news articles together, identifies commonalities between them, and notes how the discourse influences wording by setting realization flags, which denote such discourse features as &quot;similarity&quot; and &quot;contradiction.&quot; Realization flags (McKeown, Kukich, and Shaw 1994b) guide the choice of connectives in the generation stage.</Paragraph> <Paragraph position="10"> Before lexical choice, SUMMONS maps the templates into FDs that are expected as input to FUF and uses a domain ontology (derived from the ontologies represented in the message understanding systems) to enrich the input. 
For example, grenades and bombs are both explosives, while diplomats and civilians are both considered to be human targets.</Paragraph> </Section> <Section position="2" start_page="479" end_page="479" type="sub_section"> <SectionTitle> 4.2 Methodology: Collecting and Using a Summary Corpus </SectionTitle> <Paragraph position="0"> In order to produce plausible and understandable summaries, we used available on-line corpora as models, including the Wall Street Journal and current newswire from Reuters and the Associated Press. The corpus of summaries is 2.5 MB in size. We have manually grouped 300 articles in threads related to single events or series of similar events.</Paragraph> <Paragraph position="1"> From the corpora collected in this way, we extracted manually, and after careful investigation, several hundred language constructions that we found relevant to the types of summaries we want to produce. In addition to the summary cue phrases collected from the corpus, we also tried to incorporate as many phrases as possible that have relevance to the message understanding conference domain. 
Due to domain variety, such phrases were scarce in the newswire corpora and we needed to collect them from other sources (e.g., modifying templates that we acquired from the summary corpora to provide wider coverage).</Paragraph> <Paragraph position="2"> Since one of the features of a briefing is conciseness, we have tried to assemble small paragraph summaries that, in essence, describe a single event and the change of perception of the event over time, or a series of related events, in no more than a few sentences.</Paragraph> </Section> <Section position="3" start_page="479" end_page="482" type="sub_section"> <SectionTitle> 4.3 Summary Operators for Content Planning </SectionTitle> <Paragraph position="0"> The main point of departure for SUMMONS from previous work is in the stage of identifying what information to include and how to group it together, as well as the use of a corpus to guide this and later processes. In PLANDoc, successive items to summarize are very similar and the problem is to form a grouping that puts the most similar items together, allowing the use of conjunction and ellipsis to delete repetitive material. For summarizing multiple news articles, the task is almost the opposite; we need to find the differences from one article to the next, identifying how the reported facts have changed. ((#TEMPLATES == 2) && (T[1].INCIDENT.LOCATION == T[2].INCIDENT.LOCATION) && (T[1].INCIDENT.TIME < T[2].INCIDENT.TIME) && ... (T[1].SECSOURCE.SOURCE != T[2].SECSOURCE.SOURCE)) ==> (apply(&quot;contradiction&quot;, &quot;with-new-account&quot;, T[1], T[2])) Figure 5 Rules for the contradiction operator.</Paragraph> <Paragraph position="1"> Thus, the main problem was the identification of summarization strategies, which indicate how information is linked together to form a concise and cohesive summary. 
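The rule in Figure 5 can also be read as a runnable predicate over two templates. The flat dictionary slot layout below is our assumption, and only the slots the rule mentions are modeled.

```python
# A runnable paraphrase of the Figure 5 rule: two templates with the
# same incident location, the first reported earlier, different
# secondary sources, and at least one other slot in conflict trigger
# the contradiction operator.

def contradiction_applies(t1, t2):
    same_location = t1["INCIDENT:LOCATION"] == t2["INCIDENT:LOCATION"]
    t1_earlier = t1["INCIDENT:TIME"] < t2["INCIDENT:TIME"]
    different_sources = t1["SECSOURCE:SOURCE"] != t2["SECSOURCE:SOURCE"]
    # At least one slot outside the rule's own preconditions must differ.
    other_keys = set(t1) - {"INCIDENT:LOCATION", "INCIDENT:TIME",
                            "SECSOURCE:SOURCE"}
    some_conflict = any(t1[k] != t2[k] for k in other_keys if k in t2)
    return same_location and t1_earlier and different_sources and some_conflict

t1 = {"INCIDENT:LOCATION": "WORLD TRADE CENTER", "INCIDENT:TIME": 1,
      "SECSOURCE:SOURCE": "Reuters", "HUM TGT:NUMBER": 6}
t2 = {"INCIDENT:LOCATION": "WORLD TRADE CENTER", "INCIDENT:TIME": 2,
      "SECSOURCE:SOURCE": "AP", "HUM TGT:NUMBER": 5}
print(contradiction_applies(t1, t2))  # True
```

If the two secondary sources were identical, the same slot conflict would instead signal a change of perspective, which is exactly the source-based distinction the text draws between the two operators.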
As we have found in other work (Robin 1994), what information is included is often dependent on the language available to make concise additions. Thus, using a summary corpus was critical to identifying the different summaries possible.</Paragraph> <Paragraph position="3"> We have developed a set of heuristics derived from the corpora that decide what types of simple sentences constitute a summary, in what order they need to be listed, as well as the ways in which simple sentences are combined into more complex ones. In addition, we have specified which summarization-specific phrases are to be included in different types of summaries.</Paragraph> <Paragraph position="4"> The system identifies a preeminent set of templates from the input to the MUC system. This set needs to contain a large number of similar fields. If this holds, we can merge the set into a simpler structure, keeping the common features and marking the distinct features as Elhadad (1993) and McKeown, Kukich, and Shaw (1994b) suggest. At each step, a summary operator is selected based on existing similarities between articles in the database. This operator is then applied to the input templates, resulting in a new template that combines, or synthesizes, information from the old. Each operator is independent of the others and several can be applied in succession to the input templates. Each of the seven major operators is further subdivided to cover various modifications to its input. Figure 5 shows part of the rules for the Contradiction operator. Given two templates, if INCIDENT.LOCATION is the same, the time of the first report is before the time of the second report, the report sources are different, and at least one other slot differs in value, apply the contradiction operator to combine the templates.</Paragraph> <Paragraph position="5"> A summary operator encodes a means for linking information in two different templates. Often it results in the synthesis of new information. 
For example, a generalization may be formed from two independent facts. Alternatively, since we are summarizing reports written over time, highlighting how knowledge of the event changed is important and, therefore, summaries sometimes must identify differences between reports. A description of the operators we identified in our corpus follows, accompanied by an example of system output for each operator. Each example primarily summarizes two or three input templates, as this is the result of applying a single operator once. More complex summaries can be produced by applying multiple operators on the same input, as shown in the examples; see Figures 6 to 11 in Section 4.5.</Paragraph> <Paragraph position="6"> 4.3.1 Change of Perspective. When a source later provides more complete information, the change is usually included in the summary. In order for the &quot;change of perspective&quot; operator to apply, the SOURCE field must be the same, while the value of another field changes so that it is not compatible with the original value. For example, if the number of victims changes, we know that the first report was wrong if the number goes down, while the source had incomplete information (or additional people died) if the number goes up. The first two sentences from the following example were generated using the change of perspective operator. The initial estimate of &quot;at least 10 people&quot; killed in the incident becomes &quot;at least 12 people.&quot; Similarly, the change in the number of wounded people is also reported.</Paragraph> <Paragraph position="7"> Example 1 March 4th, Reuters reported that a bomb in Tel Aviv killed at least 10 people and wounded 30. Later the same day, Reuters reported that at least 12 people were killed and 105 wounded.</Paragraph> <Paragraph position="8"> 4.3.2 Contradiction. When two sources report conflicting information about the same event, a contradiction arises. 
In the absence of values indicating the reliability of the sources, a summary cannot report either of them as true, but can indicate that the facts are not clear. The number of sources that contradict each other can indicate the level of confusion about the event. Note that the current output of the message understanding systems does not include sources. However, SUMMONS uses this feature to report disagreement between output by different systems. A summary might indicate that one of the sources determined that 20 people were killed, while the other source determined that only 5 were indeed killed. The difference between this example and the previous one on change of perspective is the source of the update. If the same source announces a change, then we know that it is reporting a change in the facts. Otherwise, an additional source presents information that is not necessarily more correct than the information presented by the earlier source and we can therefore conclude that we have a contradiction.</Paragraph> <Paragraph position="9"> Example 2 The afternoon of February 26, 1993, Reuters reported that a suspected bomb killed at least six people in the World Trade Center. However, Associated Press announced that exactly five people were killed in the blast.</Paragraph> <Paragraph position="10"> 4.3.3 Addition. When a subsequent report indicates that additional facts are known, this is reported in a summary. Additional results of the event may occur after the initial report or additional information may become known. The operator determines this by the way the value of a template slot changes. Since the former template doesn't contain a value for the perpetrator slot and the latter contains information about claimed responsibility, we can apply the addition operator.</Paragraph> <Paragraph position="11"> Example 3 On Monday, a bomb in Tel Aviv killed at least 10 people and wounded 30 according to Israel radio. 
Later the same day, Reuters reported that the radical Muslim group Hamas had claimed responsibility for the act.</Paragraph> <Paragraph position="12"> 4.3.4 Refinement. In subsequent reports a more general piece of information may be refined. Thus, if an event is originally reported to have occurred in New York City, the location might later be specified as a particular borough of the city. Similarly, if a terrorist group is identified as Palestinian, later the exact name of the terrorist group may be determined. Since the update is assigned a higher value of &quot;importance,&quot; it will be favored over the original article in a shorter summary. Unlike the previous example, there was a value for the perpetrator slot in the first template, while the second one further elaborates on it, identifying the perpetrator more specifically. Example 4 On Monday, Reuters announced that a suicide bomber killed at least 10 people in Tel Aviv. Later the same day, Reuters reported that the Islamic fundamentalist group Hamas claimed responsibility.</Paragraph> <Paragraph position="13"> 4.3.5 Agreement. Multiple reports of the same facts from different sources heighten the reader's confidence in their veracity and thus, agreement between sources is usually reported.</Paragraph> <Paragraph position="14"> Example 5 The morning of March 1st 1994, UPI reported that a man was kidnapped in the Bronx. Later, this was confirmed by Reuters.</Paragraph> <Paragraph position="15"> 4.3.6 Superset. When several reports describe the same event and all of them have incomplete information, it is possible to combine information from them to produce a more complete summary. This operator is also used to aggregate multiple events, as shown in the example.</Paragraph> <Paragraph position="16"> Example 6 Reuters reported that 18 people were killed in a Jerusalem bombing Sunday. The next day, a bomb in Tel Aviv killed at least 10 people and wounded 30 according to Israel radio. 
A total of at least 28 people were killed in the two terrorist acts in Israel over the last two days.</Paragraph> <Paragraph position="17"> It should be noted that in this example, the third sentence will not be generated if there is a restriction on the length of the summary.</Paragraph> <Paragraph position="18"> 4.3.7 Trend. There is a trend if two or more articles reflect similar patterns over time. Thus, we might notice that three consecutive bombings occurred at the same location and summarize them into a single sentence.</Paragraph> </Section> <Section position="4" start_page="482" end_page="483" type="sub_section"> <SectionTitle> Example 7 </SectionTitle> <Paragraph position="0"> This is the third terrorist act committed by Hamas in four weeks.</Paragraph> <Paragraph position="1"> 4.3.8 No Information. Since we keep track of the primary and secondary sources of a certain piece of news, and since these are generally trusted sources of information, we ought also to pay attention to the lack of information from a certain source when such is expected to be present. For example, it might be the case that a certain news agency reports a terrorist act in a given country, but the authorities of that country don't give out any information. Since there is an infinite number of sources that might not confirm a given fact (or the system will not have access to the appropriate templates), we have included this operator only as an illustration of a concept that further highlights the domain-specificity of the system. Example 8 Two bombs exploded in Baghdad, Iraqi dissidents reported Friday. There was no confirmation of the incidents by the Iraqi National Congress.</Paragraph> </Section> <Section position="5" start_page="483" end_page="484" type="sub_section"> <SectionTitle> 4.4 Algorithm </SectionTitle> <Paragraph position="0"> The algorithm used in the system to sort, combine, and generalize the input templates is described in the following subsections.</Paragraph> <Paragraph position="1"> 4.4.1 Input. 
At this stage, the system receives a set of templates from the message understanding conferences or a similar set of templates from a related domain. All templates are described as lists of attribute/value pairs (as shown later in Figure 7). These pairs (with the exception of the source information) are defined in the MUC-4 guidelines.</Paragraph> <Paragraph position="2"> 4.4.2 Preprocessing. This stage includes the following substages: * The templates are sorted in chronological order.</Paragraph> <Paragraph position="3"> * Templates that have obviously been incorrectly generated by a MUC system are identified and filtered manually. This includes templates left blank or mostly unfilled by the MUC system.</Paragraph> <Paragraph position="4"> * A database of all fields and templates is created. This database is used later as a basis for grouping and collapsing templates.</Paragraph> <Paragraph position="5"> * All irrelevant fields or fields containing bad values are manually marked as such and don't participate in further analyses.</Paragraph> <Paragraph position="6"> * Templates related to the same event are manually grouped into sets for combination using SUMMONS.</Paragraph> <Paragraph position="7"> If the source of the information is not present in the input template, it is marked as the specific message understanding system of the site submitting the template. Note that since the current message understanding systems do not extract the source, this is the most specific we can be for such cases.</Paragraph> <Paragraph position="8"> We are experimenting with some techniques to automate the preprocessing stage. Our preliminary impressions show that by restricting SUMMONS to templates in which at least five or six slots are filled, we can eliminate most of the irrelevant templates.</Paragraph> <Paragraph position="9"> 4.4.3 Heuristic Combination. At this stage, the system searches for patterns of similarities and differences between templates, which will trigger certain operators. 
Since slots are matched among templates in chronological order, there is only one sequence in which they can be applied.</Paragraph> <Paragraph position="10"> Such patterns trigger reordering of the templates and modification of their individual importance values. As an example, if two templates are combined with the refinement operator, the importance value of the combined template will be greater than the sum of the individual importance values of the constituent templates. At the same time, the values of these two templates are lowered (still keeping a higher value on the later one, which is assumed to be the more correct of the two). All templates directly extracted from the MUC output are assigned an initial importance value of 100. Currently, with each application of an operator, we lower the value of a contributing individual template by 20 points and give any newly produced template that combines information from already existing contributing templates a value greater than the sum of the values of the contributing templates after those values have been updated. Furthermore, some operators reduce the importance values of existing templates even further (e.g., the refinement operator reduces the importance of chronologically earlier templates by additional increments of 20 points because they contain outdated information). Thus, the final summary will contain only the combined template if there are restrictions on length. Otherwise, text corresponding to the constituent templates will also be generated.</Paragraph> <Paragraph position="11"> The importance value of a template also corresponds to its position in the summary paragraph, as more important templates will be generated first. Each new template contains information indicating whether its constituent templates are obsolete and thus no longer needed. 
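The importance bookkeeping just described can be sketched as follows. The paper only says the combined template is rated "greater than the sum" of the updated constituent values, so the +1 margin below is an assumption.

```python
# Sketch of importance scoring: constituents start at 100, lose 20
# points per operator application (refinement docks the earlier
# template a further 20), and the combined template is rated above
# the sum of the updated constituent values.

def apply_importance(t1, t2, operator):
    """Update constituent scores in place; return the combined template."""
    t1["IMPORTANCE"] -= 20
    t2["IMPORTANCE"] -= 20
    if operator == "refinement":
        t1["IMPORTANCE"] -= 20  # the earlier template is now outdated
    combined = {"IMPORTANCE": t1["IMPORTANCE"] + t2["IMPORTANCE"] + 1}
    return combined

t1 = {"IMPORTANCE": 100}
t2 = {"IMPORTANCE": 100}
merged = apply_importance(t1, t2, "refinement")
print(t1["IMPORTANCE"], t2["IMPORTANCE"], merged["IMPORTANCE"])  # 60 80 141
```

Under a length restriction, sorting by these scores keeps only the merged template, which is exactly the behavior the text describes for short summaries.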
Also, at this stage the coverage vector (a data structure that keeps track of which templates have already been combined and which ones are still to be considered in applying operators) is updated to point to the templates that are still active and can be further combined. This way we make sure that all templates still have a chance of participating in the actual summary. The resulting templates are combined into small paragraphs according to the event or series of events that they describe. Each paragraph is then realized by the linguistic component. Each set of templates produces a single paragraph.</Paragraph> <Paragraph position="12"> Using the templates remaining in the database after the heuristic combination stage, the content planner organizes the presentation of information within a paragraph.</Paragraph> <Paragraph position="13"> It looks at consecutive templates in the database, marked as separate paragraphs from the previous stage, and assigns values to &quot;realization switches&quot; that control local choices such as tense and voice. They also govern the presence or absence of certain constituents to avoid repetition of constituents and to satisfy anaphora constraints. Once the templates have been converted into FDs, SUMMONS carries out the following steps to produce the base summary: * Templates are sorted according to the value of the importance slot. Only the top templates are realized. Templates with higher importance values appear with priority in the summary if a restriction on length is specified.</Paragraph> <Paragraph position="14"> * An intermediate module, the ontologizer (part of the Base Summary Generator shown in Figure 1), converts factual information from the template database into data structures compatible with the ontology of the MUC domain. 
This is used, for example, to make generalizations (e.g., that Medellín and Bogotá are in Colombia).</Paragraph> <Paragraph position="15"> * The lexical chooser component of SUMMONS is a functional (systemic) grammar that emphasizes the use of summarization phrases originating from the summary corpora. For example, it can generate verbs or nominal constructs for nodes in the MUC hierarchy (e.g., &quot;kidnapping&quot; vs. &quot;X kidnapped Y&quot;).</Paragraph> <Paragraph position="16"> * Surface generation from the augmented template FDs is performed using FUF and SURGE. We have written additional generation code to handle paragraph-level constructions related to the summarization operators.</Paragraph> </Section> <Section position="6" start_page="484" end_page="487" type="sub_section"> <SectionTitle> 4.5 An Example of System Operation </SectionTitle> <Paragraph position="0"> This subsection describes how the algorithm is applied to a set of four templates by tracing the computational process that transforms the raw source into a final natural language summary. Article 3: ... Monday, killing at least 13 people and wounding more than 100. Israeli police say an Islamic suicide bomber blew himself up outside a crowded shopping mall. It was the fourth deadly bombing in Israel in nine days. The Islamic fundamentalist group Hamas claimed responsibility for the attacks, which have killed at least 54 people. Hamas is intent on stopping the Middle East peace process. President Clinton joined the voices of international condemnation after the latest attack. He said the &quot;forces of terror shall not triumph&quot; over peacemaking efforts. Article 4: TEL AVIV (Reuters) - A Muslim suicide bomber killed at least 12 people and wounded 105, including children, outside a crowded Tel Aviv shopping mall Monday, police said. Sunday, a Hamas suicide bomber killed 18 people on a Jerusalem bus. Hamas has now killed at least 56 people in four attacks in nine days.
The windows of stores lining both sides of Dizengoff Street were shattered, the charred skeletons of cars lay in the street, the sidewalks were strewn with blood. The last attack on Dizengoff was in October 1994 when a Hamas suicide bomber killed 22 people on a bus.</Paragraph> <Paragraph position="1"> Figure 6 Fragments of input articles 1--4.</Paragraph> <Paragraph position="2"> Excerpts from the four input news articles are shown in Figure 6. The four news articles are transformed into four templates that correspond to four separate accounts of two related events and will be included in the set of templates from which the template combiner will work. Only the relevant fields are shown. Let's now consider the four templates in the order that they appear in the list of templates. These templates are shown in Figures 7 to 10. They are generated manually from the input newswire texts. Information about the primary and secondary sources of information is added (PRIMSOURCE and SECSOURCE). The differences in the templates (which will trigger certain operators) are shown in boldface. The summary generated by the system was shown earlier in Figure 4 and is repeated here in Figure 11.</Paragraph> <Paragraph position="3"> The first two sentences are generated from template one. The subsequent sentences are generated using different operators that are triggered according to changing values for certain attributes in the three remaining templates.</Paragraph> <Paragraph position="4"> As previous templates didn't contain information about the perpetrator, SUMMONS applies the refinement operator to generate the fourth sentence. Sentence three is generated using the change of perspective operator, as the number of victims reported in articles two and three is different.</Paragraph> <Paragraph position="5"> The description for Hamas (&quot;radical Muslim group&quot;) was added by the extraction generator (see Section 5).
Typically, a description is included in the source text and should be extracted by the message understanding system. In cases in which a description doesn't appear or is not extracted, SUMMONS generates a description from the database of extracted descriptions. We are currently working on an algorithm that will select the best description based on such parameters as the user model (what information has already been presented to the user?), the attitude towards the entity (is it favorable?), or a historical model that describes the changes in the profile of a person over a period of time (what was the previous occupation of the person who is being described?).</Paragraph> <Paragraph position="6"> Template for article four.</Paragraph> <Paragraph position="7"> Reuters reported that 18 people were killed in a Jerusalem bombing Sunday. The next day, a bomb in Tel Aviv killed at least 10 people and wounded 30 according to Israel radio. Reuters reported that at least 12 people were killed and 105 wounded. Later the same day, Reuters reported that the radical Muslim group Hamas had claimed responsibility for the act.</Paragraph> <Paragraph position="8"> Figure 11 SUMMONS output based on the four articles.</Paragraph>
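The way differing slot values trigger operators in this example can be sketched as follows; the slot comparison and operator names are a simplified illustration, not the actual SUMMONS rule base:

```python
# Illustrative sketch of operator triggering: refinement fires when a
# later template supplies a slot (here, the perpetrator) that earlier
# templates lacked; change of perspective fires when the casualty
# figures for the same incident are revised.

def triggered_operators(earlier, later):
    """Compare two templates for the same incident and name the
    summarization operators their differences would trigger."""
    ops = []
    # Refinement: the later report supplies a slot the earlier one lacked.
    if later.get("perpetrator") and not earlier.get("perpetrator"):
        ops.append("refinement")
    # Change of perspective: the casualty figures are revised.
    if (earlier.get("killed") != later.get("killed")
            or earlier.get("wounded") != later.get("wounded")):
        ops.append("change of perspective")
    return ops

t2 = {"killed": "at least 10", "wounded": "30", "perpetrator": None}
t3 = {"killed": "at least 12", "wounded": "105", "perpetrator": None}
t4 = {"killed": "at least 12", "wounded": "105", "perpetrator": "Hamas"}
print(triggered_operators(t2, t3))  # change of perspective
print(triggered_operators(t3, t4))  # refinement
```

In the worked example, the revised victim counts yield sentence three and the newly reported perpetrator yields sentence four.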
If the summarization system can find the needed information in other on-line sources, then it can produce an improved summary by merging information extracted from the input articles with information from the other sources (Radev and McKeown 1997).</Paragraph> <Paragraph position="1"> In the news domain, a summary needs to refer to people, places, and organizations and provide descriptions that clearly identify the entity for the reader. Such descriptions may not be present in the original text that is being summarized. For example, the American pilot Scott O'Grady, downed in Bosnia in June of 1995, was unknown to the American public prior to the incident. To a reader who tuned into news on this event days later, descriptions from the initial articles might be more useful. A summarizer that has access to different descriptions will be able to select the description that best suits both the reader and the series of articles being summarized. Similarly, in the example in Section 4, if the user hasn't been informed about what Hamas is and no description is available in the source template, older descriptions in the FD format can be retrieved and used.</Paragraph> <Paragraph position="2"> In this section, we describe an enhancement to the base summarization system, called the profile manager, which tracks prior references to a given entity by extracting descriptions for later use in summarization. 
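The fallback behavior described here, preferring a description extracted from the current input and otherwise retrieving a stored one, can be sketched as follows; the function name and the in-memory database are hypothetical:

```python
# Sketch of the description fallback: use the description from the
# source template when present; otherwise consult the database of
# previously extracted descriptions (a stand-in for the profile
# manager's store).

profile_db = {
    "Hamas": ["radical Muslim group", "Islamic fundamentalist group"],
}

def describe(entity, template_description=None):
    """Return a description for an entity, preferring the source template."""
    if template_description:
        return template_description
    stored = profile_db.get(entity)
    if stored:
        return stored[0]  # e.g., the most frequent stored description
    return None  # no description available; omit it from the summary

print(describe("Hamas"))                    # falls back to the database
print(describe("Hamas", "militant group"))  # uses the source template
```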
The component includes the entity extractor and description extractor modules shown in Figure 1 and has the following features: * It builds a database of profiles for entities by storing descriptions from a collected corpus of past news.</Paragraph> <Paragraph position="3"> * It operates in real time, allowing for connections with the latest breaking on-line news to extract information about the most recently mentioned individuals and organizations.</Paragraph> <Paragraph position="4"> * It collects and merges information from sources, thus allowing for a more complete record and reuse of information.</Paragraph> <Paragraph position="5"> * As it parses and identifies descriptions, it builds a lexicalized, syntactic representation of the description in a form suitable for input to the FUF/SURGE language generation system.</Paragraph> <Paragraph position="6"> As a result, SUMMONS will be able to combine descriptions from articles appearing only a few minutes before the ones being summarized with descriptions from past news in permanent storage for future use.</Paragraph> <Paragraph position="7"> Since the profile manager constructs a lexicalized, syntactic FD from the extracted description, the generator can reuse the description in new contexts, merging it with other descriptions into a new grammatical sentence. This would not be possible if only canned strings were used, with no information about their internal structure.
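The difference between a canned string and a lexicalized, syntactic FD can be illustrated with a toy nested attribute-value structure; the slot names below are loose stand-ins for FUF/SURGE features, not the paper's actual representation:

```python
import copy

# Toy illustration of why a structured FD beats a canned string. The
# nested attribute-value form is loosely modeled on FUF/SURGE FDs;
# the slot names are illustrative, not the actual SUMMONS encoding.

canned = "Italian Prime Minister Silvio Berlusconi"  # opaque string

fd = {  # recursive attribute-value structure
    "cat": "np",
    "first-name": {"lex": "Silvio"},
    "head": {"lex": "Berlusconi"},
    "describer": {
        "cat": "np",
        "classifier": {"lex": "Italian"},
        "head": {"lex": "prime minister"},
    },
}

def make_former(description_fd):
    """'prime minister' -> 'former prime minister': the kind of update
    discussed in Section 5.2, impossible with the opaque canned string."""
    new_fd = copy.deepcopy(description_fd)
    new_fd["describer"]["modifier"] = {"lex": "former"}
    return new_fd

former = make_former(fd)
```

Because the internal structure is explicit, the generator can modify, merge, or aggregate such descriptions rather than only repeating them verbatim.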
Thus, in addition to collecting a knowledge source that provides identifying features of individuals, the profile manager also provides a lexicon of domain-appropriate phrases that can be integrated with individual words from a generator's lexicon to produce summary wording in a flexible fashion.</Paragraph> <Paragraph position="8"> We have extended the profile manager by semantically categorizing descriptions using WordNet, so that a generator can more easily determine which description is relevant in different contexts.</Paragraph> <Paragraph position="9"> The profile manager can also be used in a real-time fashion to monitor entities and the changes in the descriptions associated with them over the course of time.</Paragraph> <Paragraph position="10"> The rest of this section discusses the stages involved in the collection and reuse of descriptions.</Paragraph> </Section> <Section position="1" start_page="488" end_page="490" type="sub_section"> <SectionTitle> 5.1 Creation of a Database of Profiles </SectionTitle> <Paragraph position="0"> In this subsection, we describe the description management module of SUMMONS shown in Figure 1. We explain how entity names and descriptions for them are extracted from old newswire and how these descriptions are converted to FDs for surface generation.</Paragraph> <Paragraph position="1"> To collect an initial set of descriptions, we used a 1.7 MB corpus containing Reuters newswire from February to June of 1995. Later, we used a Web-based interface that allowed anyone on the Internet to type in an entity name and force a robot to search for documents containing mentions of the entity and extract the relevant descriptions. These descriptions are then also added to the database.</Paragraph> <Paragraph position="2"> At this stage, search is limited to the database of retrieved descriptions only, thus reducing search time, as no connections will be made to external news sources at the time of the query.
Only when a suitable stored description cannot be found will the system initiate search of additional text.</Paragraph> <Paragraph position="3"> After tagging the corpus using the POS part-of-speech tagger (Church 1988), we used a CREP (Duford 1993) regular grammar to first extract all possible candidates for entities. These consist of all sequences of words that were tagged as proper nouns (NP) by POS. Our manual analysis showed that out of a total of 2150 entities recovered in this way, 1139 (52.9%) are not names of entities. Among these are bigrams such as Prime Minister or Egyptian President that were tagged as NP by POS. Table 1 shows how many entities we retrieve at this stage, and of them, how many pass the semantic filtering test.</Paragraph> <Paragraph position="4"> * Weeding out of false candidates. Our system analyzed all candidates for entity names using WordNet (Miller et al. 1990) and removed from consideration those that contain words appearing in WordNet's dictionary. This resulted in a list of 421 unique entity names that we used for the automatic description extraction stage. All 421 entity names retrieved by the system are indeed proper nouns.</Paragraph> <Paragraph position="5"> There are two cases in which we extract descriptions using finite-state techniques. The first case is when the entity that we want to describe was already extracted automatically (see Section 5.1.1) and exists in the database of descriptions. The second case is when we want a description to be retrieved in real time based on a request from the generation component.</Paragraph> <Paragraph position="6"> In the first stage, the profile manager generates finite-state representations of the entities that need to be described. These full expressions are used as input to the description extraction module, which uses them to find candidate sentences in the corpus for extracting descriptions.
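The two-step entity-name extraction described above, collecting maximal runs of NP-tagged tokens and then discarding candidates that contain common-dictionary words, might look like the following sketch; the tiny word set stands in for WordNet's dictionary, and the real system uses CREP patterns rather than plain Python:

```python
# Sketch of the two-step entity-name extraction. A small common-word
# set stands in for WordNet's dictionary lookup; the token/tag pairs
# stand in for POS tagger output.

COMMON_WORDS = {"prime", "minister", "egyptian", "president"}

def candidate_entities(tagged):
    """Collect maximal runs of tokens tagged as proper nouns (NP)."""
    runs, current = [], []
    for word, tag in tagged:
        if tag == "NP":
            current.append(word)
        elif current:
            runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs

def filter_candidates(candidates):
    """Drop candidates containing words found in the dictionary."""
    return [c for c in candidates
            if not any(w.lower() in COMMON_WORDS for w in c.split())]

tagged = [("Prime", "NP"), ("Minister", "NP"), ("visited", "VBD"),
          ("Addis", "NP"), ("Ababa", "NP")]
cands = candidate_entities(tagged)   # ['Prime Minister', 'Addis Ababa']
print(filter_candidates(cands))      # ['Addis Ababa']
```

This mirrors the paper's figures: bigrams like Prime Minister are tagged NP but rejected because their words appear in the dictionary, while true names like Addis Ababa survive.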
Since the need for a description may arise at a later time than when the entity was found and may require searching new text, the description finder must first locate these expressions in the text.</Paragraph> <Paragraph position="7"> These representations are fed to CREP, which extracts noun phrases on either side of the entity (either pre-modifiers or appositions) from the news corpus. The finite-state grammar for noun phrases that we use represents a variety of different syntactic structures for both pre-modifiers and appositions. Thus, descriptions may range from a simple noun (e.g., &quot;president Bill Clinton&quot;) to a much longer expression (e.g., &quot;Gilberto Rodriguez Orejuela, the head of the Cali cocaine cartel&quot;). Other forms of descriptions, such as relative clauses, are the focus of ongoing implementation.</Paragraph> <Paragraph position="8"> Table 2 shows some of the different patterns retrieved. For example, when the profile manager has retrieved the description the political arm of the Irish Republican Army for Sinn Fein, it looks at the head noun in the description NP (arm), which we manually added to the list of trigger words to be categorized as an organization (see next subsection). It is important to notice that even though WordNet typically presents problems with disambiguation of words retrieved from arbitrary text, we don't have any trouble disambiguating arm in this case due to the constraints on the context in which it appears (as an apposition describing an entity).</Paragraph> <Paragraph position="9"> 5.1.3 Categorization of Descriptions. We use WordNet to group extracted descriptions into categories. For the head noun of the description NP, we try to find a WordNet hypernym that can restrict the semantics of the description. Currently, we identify concepts such as &quot;profession,&quot; &quot;nationality,&quot; and &quot;organization.&quot; Each of these concepts is triggered by one or more words (which we call trigger terms) in the description. Table 2 shows some examples of descriptions and the concepts under which they are classified based on the WordNet hypernyms for some trigger words. For example, all of the following triggers in the list (minister, head, administrator, and commissioner) can be traced up to leader in the WordNet hierarchy. We currently have a list of 75 such trigger words that we have compiled manually.</Paragraph> <Paragraph position="10"> 5.1.4 Organization of Descriptions in a Database of Profiles. For each retrieved entity we create a new profile in a database of profiles. We keep information about the surface string that is used to describe the entity in newswire (e.g., Addis Ababa), the source of the description, and the date that the entry has been made in the database (e.g., &quot;reuters95_06_25&quot;). In addition to these pieces of meta-information, all retrieved descriptions and their frequencies are also stored.</Paragraph> <Paragraph position="11"> Currently, our system doesn't have the capability of matching references to the same entity that use different wordings. As a result, we keep separate profiles for each of the following: Robert Dole, Dole, and Bob Dole. We use each of these strings as the key in the database of descriptions.</Paragraph> <Paragraph position="12"> Figure 12 shows the profile associated with the key John Major. It can be seen that four different descriptions have been used in the parsed corpus to describe John Major.
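Sections 5.1.3 and 5.1.4 together can be sketched as a small categorize-and-file pipeline; the trigger list below is abbreviated, a stand-in for chasing WordNet hypernyms, and the dictionary-based store is a hypothetical simplification of the profile database:

```python
# Sketch: categorize a description via a manually compiled trigger
# list (a stand-in for WordNet hypernym traversal), then file it in a
# profile database keyed by the entity's surface string.

TRIGGERS = {  # abbreviated; the real list has about 75 entries
    "minister": "leader", "head": "leader",
    "administrator": "leader", "commissioner": "leader",
    "arm": "organization", "group": "organization",
}

def categorize(description):
    """Map a description to a concept via its trigger word, if any."""
    for word in description.lower().split():
        if word in TRIGGERS:
            return TRIGGERS[word]
    return None

profiles = {}

def add_description(key, description, source, date):
    """File a description (with its frequency) under the entity key."""
    p = profiles.setdefault(key, {"surface": key, "descriptions": {}})
    entry = p["descriptions"].setdefault(
        description, {"count": 0, "category": categorize(description)})
    entry["count"] += 1
    entry["source"], entry["date"] = source, date

add_description("Sinn Fein", "the political arm of the Irish Republican Army",
                "reuters", "reuters95_06_25")
```

Because keys are raw surface strings, Robert Dole, Dole, and Bob Dole would each get their own profile here, matching the limitation noted above.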
Two of the four are common and are used in SUMMONS, whereas the other two result from incorrect processing by POS and/or CREP. The database of profiles is updated every time a query retrieves new descriptions matching a certain key.</Paragraph> </Section> <Section position="2" start_page="490" end_page="492" type="sub_section"> <SectionTitle> 5.2 Generation of Descriptions </SectionTitle> <Paragraph position="0"> When presenting an entity to the user, the content planner of a language generation system may decide to include some background information about it if the user has not previously seen the entity. When the extracted information doesn't contain an appropriate description, the system can use some descriptions retrieved by the profile manager.</Paragraph> <Paragraph position="1"> 5.2.1 Transformation of Descriptions into Functional Descriptions. In order to reuse the extracted descriptions in the generation of summaries, we have developed a module that converts finite-state descriptions retrieved by the description extractor into functional descriptions that we can use directly in generation. A description retrieved by the system is shown in Figure 13. The corresponding FD is shown in Figure 14. We can also extract other forms of descriptions, such as relative clauses, and represent them as FDs, but we haven't yet implemented the module for including them in the output summary. We have focused so far on identifying when this kind of generation will be needed: Grammaticality. The deeper representation allows for grammatical transformations, such as aggregation: e.g., president Yeltsin + president Clinton can be generated as presidents Yeltsin and Clinton.</Paragraph> <Paragraph position="2"> Unification with existing ontologies.
For example, if an ontology contains information about the word president as being a realization of the concept &quot;head of state,&quot; then under certain conditions, the description can be replaced by a different one that realizes the concept of &quot;head of state.&quot; Generation of referring expressions. In the previous example, if president Bill Clinton is used in a sentence, then head of state can be used as a referring expression in a subsequent sentence.</Paragraph> <Paragraph position="4"> Modification/Update of descriptions. If we have retrieved prime minister as a description for Silvio Berlusconi, and later we obtain knowledge that someone else has become Italy's prime minister, then we can generate former prime minister using a transformation of the old FD. Lexical choice. When different descriptions are automatically marked for semantics, the profile manager can prefer to generate one over another based on semantic features. This is useful if a summary discusses events related to one description associated with the entity more than the others. For example, when an article concerns Bill Clinton on the campaign trail, then the description democratic presidential candidate is more appropriate. On the other hand, when an article concerns an international summit of world leaders, then the description U.S. President is more appropriate. Merging lexicons. The lexicon generated automatically by the system can be merged with a manually compiled domain lexicon.</Paragraph> </Section> </Section> <Section position="9" start_page="492" end_page="495" type="metho"> <SectionTitle> 6. 
System Status </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="492" end_page="493" type="sub_section"> <SectionTitle> 6.1 Summary Generation </SectionTitle> <Paragraph position="0"> Currently, our system can produce simple summaries consisting of one- to three-sentence paragraphs, which are limited to the MUC domain and to a few additional events for which we have manually created MUC-like templates. We have also implemented the modules to connect to the World Factbook. We have converted all ontologies related to the MUC and the Factbook into FDs. The user model, which would allow users to specify preferred sources of information, frequency of briefings, etc., hasn't been fully implemented yet.</Paragraph> <Paragraph position="1"> A problem that we haven't addressed is related to the clustering of articles according to their relevance to a specific event. This is an area that requires further research. Another such area is the development of algorithms for grouping together articles that belong to the same topic.</Paragraph> <Paragraph position="2"> Finally, one of our main topics for future work is the development of techniques that can generate summary updates. To do this, we must make use of a discourse model that represents the content and wording of summaries that have already been presented to the user. When generating an update, the summarizer must avoid repeating content and, at the same time, must be able to generate references to entities and events that were previously described.</Paragraph> </Section> <Section position="2" start_page="493" end_page="493" type="sub_section"> <SectionTitle> 6.2 The Description Generator </SectionTitle> <Paragraph position="0"> At the current stage, the description generator has the following coverage: * Syntactic coverage. Currently, the system includes an extensive finite-state grammar that can handle various premodifiers and appositions.
The grammar matches arbitrary noun phrases in each of these two cases to the extent that the POS part-of-speech tagger provides a correct tagging.</Paragraph> <Paragraph position="1"> * Precision. In Section 5.1.1 we showed the precision of the extraction of entity names. Similarly, we have computed the precision of 611 descriptions retrieved for randomly selected entities from the list described in Section 5.1.1. Of the 611 descriptions, 551 (90.2%) were correct. The others included a roughly equal number of cases of incorrect NP attachment and incorrect part-of-speech assignment.</Paragraph> <Paragraph position="2"> * Length of descriptions. The longest description retrieved by the system was nine lexical items long: Maurizio Gucci, the former head of Italy's Gucci fashion dynasty. The shortest descriptions are one lexical item in length, e.g., President Bill Clinton.</Paragraph> <Paragraph position="3"> * Protocol coverage. We have implemented retrieval facilities to extract descriptions using the NNTP (Usenet News) and HTTP (World-Wide Web) protocols. These modules can be easily reused in other systems with architectures similar to ours.</Paragraph> <Paragraph position="4"> 6.2.1 Limitations. Our system currently doesn't handle entity cross-referencing. It will not realize that Clinton and Bill Clinton refer to the same person. Nor will it link a person's profile with the profile of the organization of which he is a member. We should note that extensive research in this field exists and we plan to make use of one of the proposed methods (Wacholder, Ravin, and Choi 1997) to solve this problem.
There are no a priori restrictions in our approach that would limit SUMMONS to template-based inputs (and hence, shallow knowledge representation schemes without recursion). It would be interesting to determine the actual number of different representation schemes for news in general.</Paragraph> <Paragraph position="1"> Since there exist systems that can learn extraction rules for unrestricted domains (Lehnert et al. 1993), information extraction doesn't seem to present any fundamental bottleneck either. Rather, the questions are: how many man-hours are required to convert to each new domain? and how many of the rules from one domain are applicable to each new domain? There are no clear answers to these questions. The library of planning operators used in SUMMONS is extensible and can be ported to other domains, although it is likely that new operators will be needed. In addition, new vocabulary will also be needed. The authors plan to perform a portability analysis and report on it in the future.</Paragraph> </Section> <Section position="4" start_page="493" end_page="494" type="sub_section"> <SectionTitle> 6.4 Suggested Evaluation </SectionTitle> <Paragraph position="0"> Given that no alternative approaches to conceptual summarization of multiple articles exist, we have found it very hard to perform an adequate evaluation of the summaries generated by SUMMONS. We consider several potential evaluations: qualitative (user satisfaction and readability) and task-based. In a task-based evaluation, one set of judges would have access to the full set of articles, while another set of evaluators would have the summaries generated by SUMMONS. The task would involve decision making (e.g., deciding whether the same organization has been involved in multiple incidents). The time for decision making will be plotted against the accuracy of the answers provided by the judges from the two sets.
A third set of judges might have access to summaries generated by summarizers based on sentence extraction from multiple documents. Similar evaluation techniques have been proposed for single-document summarizers (Jing et al. 1998).</Paragraph> <Paragraph position="1"> 7. Future Work The prototype system that we have developed serves as the springboard for research in a variety of directions. First and foremost is the need to use statistical techniques to increase the robustness and vocabulary of the system. Since we were looking for phrasings that mark summarization in a full article that includes other material as well, for a first pass we found it necessary to do a manual analysis in order to determine which phrases were used for summarization. In other words, we knew of no automatic way of identifying summary phrases. However, having an initial seed set of summary phrases might allow us to automate a second-pass analysis of the corpus by looking for variant patterns of the ones we have found.</Paragraph> <Paragraph position="2"> By using automated, statistical techniques to find additional phrases, we could increase the size of the lexicon and use the additional phrases to identify new summarization strategies to add to our stock of operators.</Paragraph> <Paragraph position="3"> Our summary generator could be used both for evaluating message understanding systems by using the summaries to highlight differences between systems and for identifying weaknesses in the current systems. We have already noted a number of drawbacks with the current output, which make summarization more difficult, giving the generator less information to work with. For example, it is only sometimes indicated in the output that a reference to a person, place, or event is identical to an earlier reference; there is no connection across articles; the source of the report is not included.
Finally, the structure of the template representation is somewhat shallow, being closer to a database record than a knowledge representation. This means that the generator's knowledge of different features of the event and relations between them is correspondingly limited.</Paragraph> </Section> <Section position="5" start_page="494" end_page="495" type="sub_section"> <SectionTitle> 7.1 Generation of Descriptions </SectionTitle> <Paragraph position="0"> One of the more important current goals is to increase coverage of the system by providing interfaces to a large number of on-line sources of news. We would ideally want to build a comprehensive and shareable database of profiles that can be queried over the World-Wide Web. The database will have a defined interface that will allow systems such as SUMMONS to connect to it.</Paragraph> <Paragraph position="1"> Another goal of our research is the generation of evolving summaries that continuously update the user on a given topic of interest. In that case, the system will have a model containing all prior interaction with the user. To avoid repetitiveness, such a system will have to resort to using different descriptions (as well as referring expressions) to address a specific entity. We will be investigating an algorithm that will select a proper ordering of multiple descriptions referring to the same person within the same discourse.</Paragraph> <Paragraph position="2"> After we collect a series of descriptions for each possible entity, we need to decide how to select among them. There are two scenarios. In the first one, we have to pick one single description from the database that best fits the summary we are generating. In the second scenario, the evolving summary, we have to generate a sequence of descriptions, which might possibly view the entity from different perspectives.
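The selection factors discussed here could be combined into a simple scoring function; the weights, field names, and data below are hypothetical illustrations, not part of SUMMONS:

```python
# Illustrative scoring sketch for picking one stored description.
# The factor weights and record fields are hypothetical stand-ins
# for the selection criteria discussed in the text.

def score(desc, summary_focus, used_before, used_categories):
    """Higher score = better fit for the current summary."""
    s = 0.0
    if desc["category"] == summary_focus:
        s += 2.0                       # matches the summary's focus
    if desc["text"] in used_before:
        s -= 1.0                       # avoid repeating a description
    if desc["category"] in used_categories:
        s -= 0.5                       # vary the kind of description
    s += desc["frequency"] * 0.1       # prefer well-attested wordings
    return s

clinton = [
    {"text": "democratic presidential candidate",
     "category": "politics", "frequency": 3},
    {"text": "U.S. president", "category": "leader", "frequency": 9},
]
best = max(clinton, key=lambda d: score(d, "politics", set(), set()))
print(best["text"])  # democratic presidential candidate
```

With the summary focused on a campaign, the politics description wins; with an international-summit focus, the same function would prefer U.S. president.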
We are investigating algorithms that will decide the order of generation of the different descriptions. Among the factors that will influence the selection and ordering of descriptions, we can note the user's interests, his knowledge of the entity, and the focus of the summary (e.g., democratic presidential candidate for Bill Clinton versus U.S. president).</Paragraph> <Paragraph position="3"> We can also select one description over another based on how recently they have been included in the database, whether or not one of them has been used in a summary already, whether the summary is an update to an earlier summary, and whether another description from the same category has been used already. We have yet to decide under what circumstances a description needs to be generated at all.</Paragraph> <Paragraph position="4"> We are interested in implementing existing algorithms or designing our own that will match different instances of the same entity appearing in different syntactic forms, e.g., to establish that PLO is an alias for the Palestine Liberation Organization. We will investigate using co-occurrence information to match acronyms to full organization names as well as alternative spellings of the same name.</Paragraph> <Paragraph position="5"> We will also look into connecting the current interface with news available on the Internet and with an existing search engine such as Lycos, AltaVista, or Yahoo. We can then use the existing indices of all Web documents mentioning a given entity as a news corpus on which to perform the extraction of descriptions.</Paragraph> </Section> </Section> </Paper>