<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1031">
  <Title>Automatically Learning Cognitive Status for Multi-Document Summarization of Newswire</Title>
  <Section position="2" start_page="0" end_page="243" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Multi-document summarization has been an active area of research over the past decade (Mani and Maybury, 1999) and yet, barring a few exceptions (Daum*e III et al., 2002; Radev and McKeown, 1998), most systems still use shallow features to produce an extractive summary, an age-old technique (Luhn, 1958) that has well-known problems. Extractive summaries contain phrases that the reader cannot understand out of context (Paice, 1990) and irrelevant phrases that happen to occur in a relevant sentence (Knight and Marcu, 2000; Barzilay, 2003).</Paragraph>
    <Paragraph position="1"> Referring expressions in extractive summaries illustrate this problem, as sentences compiled from different documents might contain too little, too much or repeated information about the referent.</Paragraph>
    <Paragraph position="2"> Whether a referring expression is appropriate depends on the location of the referent in the hearer's mental model of the discourse the referent's cognitive status (Gundel et al., 1993). If, for example, the referent is unknown to the reader at the point of mention in the discourse, the reference should include a description, while if the referent was known to the reader, no descriptive details are necessary.</Paragraph>
    <Paragraph position="3"> Determining a referent's cognitive status, however, implies the need to model the intended audience of the summary. Can such a cognitive status model be inferred automatically for a general readership? In this paper, we address this question by performing a study with human subjects to con rm that reasonable agreement on the distinctions can be achieved between different humans (cf. a2 5). We present an automatic approach for inferring what the typical reader is likely to know about people in the news. Our approach uses machine learning, exploiting features based on the form of references to people in the input news articles (cf. a2 4). Learning cognitive status of referents is necessary if we want to ultimately generate new, more appropriate references for news summaries.</Paragraph>
    <Section position="1" start_page="0" end_page="241" type="sub_section">
      <SectionTitle>
1.1 Cognitive status
</SectionTitle>
      <Paragraph position="0"> In human communication, the wording used by speakers to refer to a discourse entity depends on their communicative goal and their beliefs about what listeners already know. The speaker's goals and beliefs about the listener's knowledge are both a part of a cognitive/mental model of the discourse.</Paragraph>
      <Paragraph position="1">  Cognitive status distinctions depend on two parameters related to the referent a) whether it already exists in the hearer's model of the discourse, and b) its degree of salience. The in uence of these distinctions on the form of referring expressions has been investigated in the past. For example, centering theory (Grosz et al., 1995) deals predominantly with local salience (local attentional status), and the givenness hierarchy (information status) of Prince (1992) focuses on how a referent got in the discourse model (e.g. through a direct mention in the current discourse, through previous knowledge, or through inference), leading to distinctions such as discourseold, discourse-new, hearer-old, hearer-new, inferable and containing inferable. Gundel et al. (1993) attempt to merge salience and givenness in a single hierarchy consisting of six distinctions in cognitive status (in focus, activated, familiar, uniquely identiable, referential, type-identi able).</Paragraph>
      <Paragraph position="2"> Among the distinctions that have an impact on the form of references in a summary are the familiarity of the referent: D. Discourse-old vs discourse-new H. Hearer-old vs hearer-new and its global salience1: M. Major vs minor In general, initial (discourse-new) references to entities are longer and more descriptive, while subsequent (discourse-old) references are shorter and have a purely referential function. Nenkova and McKeown (2003) have studied this distinction for references to people in summaries and how it can be used to automatically rewrite summaries to achieve better uency and readability.</Paragraph>
      <Paragraph position="3"> The other two cognitive status distinctions, whether an entity is central to the summary or not (major or minor) and whether the hearer can be assumed to be already familiar with the entity (hearerold vs hearer-new status), have not been previously studied in the context of summarization. There is a tradeoff, particularly important for a short summary, between what the speaker wants to convey 1The notion of global salience is very important to summarization, both during content selection and during generation on initial references to entities. On the other hand, in focus or local attentional state are relevant to anaphoric usage during subsequent mentions.</Paragraph>
      <Paragraph position="4"> and how much the listener needs to know. The hearer-old/new distinction can be used to determine whether a description for a character is required from the listener's perspective. The major/minor distinction plays a role in de ning the communicative goal, such as what the summary should be about and which characters are important enough to refer to by name.</Paragraph>
    </Section>
    <Section position="2" start_page="241" end_page="241" type="sub_section">
      <SectionTitle>
1.2 Hearer-Old vs Hearer-New
</SectionTitle>
      <Paragraph position="0"> Hearer-new entities in a summary should be described in necessary detail, while hearer-old entities do not require an introductory description. This distinction can have a signi cant impact on over-all length and intelligibility of the produced summaries. Usually, summaries are very short, 100 or 200 words, for input articles totaling 5,000 words or more. Several people might be involved in a story, which means that if all participants are fully described, little space will be devoted to actual news. In addition, introducing already familiar entities might distract the reader from the main story (Grice, 1975). It is thus a good strategy to refer to an entity that can be assumed hearer-old by just a title + last name, e.g. President Bush, or by full name only, with no accompanying description, e.g.</Paragraph>
      <Paragraph position="1"> Michael Jackson.</Paragraph>
    </Section>
    <Section position="3" start_page="241" end_page="242" type="sub_section">
      <SectionTitle>
1.3 Major vs Minor
</SectionTitle>
      <Paragraph position="0"> Another distinction that human summarizers make is whether a character in a story is a major or a minor one and this distinction can be conveyed by using different forms of referring expressions. It is common to see in human summaries references such as the dissident's father. Usually, discourse-initial references solely by common noun, without the inclusion of the person's name, are employed when the person is not the main focus of a story (Sanford et al., 1988). By detecting the cognitive status of a character, we can decide whether to name the character in the summary. Furthermore, many summarization systems use the presence of named entities as a feature for computing the importance of a sentence (Saggion and Gaizaukas, 2004; Guo et al., 2003). The ability to identify the major story characters and use only them for sentence weighting can bene t such systems since only 5% of all people mentioned in the input are also mentioned in the summaries.</Paragraph>
      <Paragraph position="1">  2 Why care about people in the news? News reports (and consequently, news summaries) tend to have frequent references to people (in DUC data - see a2 3 for description - from 2003 and 2004, there were on average 3.85 references to people per 100-word human summary); hence it is important for news summarization systems to have a way of modeling the cognitive status of such referents and a theory for referring to people.</Paragraph>
      <Paragraph position="2"> It is also important to note that there are differences in references to people between news reports and human summaries of news. Journalistic conventions for many mainstream newspapers dictate that initial mentions to people include a minimum description such as their role or title and af liation. However, in human summaries, where there are greater space constraints, the nature of initial references changes. Siddharthan et al. (2004) observed that in DUC'04 and DUC'03 data2, news reports contain on average one appositive phrase or relative clause every 3.9 sentences, while the human summaries contain only one per 8.9 sentences on average. In addition to this, we observe from the same data that the average length of a rst reference to a named entity is 4.5 words in the news reports and only 3.6 words in human summaries. These statistics imply that human summarizers do compress references, and thus can save space in the summary for presenting information about the events. Cognitive status models can inform a system when such reference compression is appropriate.</Paragraph>
      <Paragraph position="3"> 3 Data preparation: the DUC corpus The data we used to train classi ers for these two distinctions is the Document Understanding Conference collection (2001 2004) of 170 pairs of document input sets and the corresponding human-written multi-document summaries (2 or 4 per set). Our aim is to identify every person mentioned in the 10 news reports and the associated human summaries for each set, and assign labels for their cognitive status (hearer old/new and major/minor). To do this, we rst preprocess the data (a2 3.1) and then perform the labeling (a2 3.2).</Paragraph>
      <Paragraph position="4">  of about 10 news reports, 4 human summaries for each set, and the summaries by participating machine summarizers.</Paragraph>
    </Section>
    <Section position="4" start_page="242" end_page="243" type="sub_section">
      <SectionTitle>
3.1 Automatic preprocessing
</SectionTitle>
      <Paragraph position="0"> All documents and summaries were tagged with BBN's IDENTIFINDER (Bikel et al., 1999) for named entities, and with a part-of-speech tagger and simplex noun-phrase chunker (Grover et al., 2000).</Paragraph>
      <Paragraph position="1"> In addition, for each named entity, relative clauses, appositional phrases and copula constructs, as well as pronominal co-reference were also automatically annotated (Siddharthan, 2003). We thus obtained coreference information (cf. Figure 1) for each per-son in each set, across documents and summaries.</Paragraph>
      <Paragraph position="2">  Sakharov from two news report. 'IR' stands for 'initial reference', 'CO' for noun co-reference, 'PR' for pronoun reference, 'AP' for apposition, 'RC' for relative clause and 'IS' for copula constructs.</Paragraph>
      <Paragraph position="3"> The tools that we used were originally developed for processing single documents and we had to adapt them for use in a multi-document setting.</Paragraph>
      <Paragraph position="4"> The goal was to nd, for each person mentioned in an input set, the list of all references to the per-son in both input documents and human summaries.</Paragraph>
      <Paragraph position="5"> For this purpose, all input documents were concatenated and processed with IDENTIFINDER. This was then automatically post-processed to mark-up coreferring names and to assign a unique canonical name (unique id) for each name coreference chain. For the coreference, a simple rule of matching the last name was used, and the canonical name was the First-Name LastName string where the two parts of the name could be identi ed 3. Concatenating all documents assures that the same canonical name will be assigned to all named references to the same person.</Paragraph>
      <Paragraph position="6"> 3Occasionally, two or more different people with the same last name are discussed in the same set and this algorithm would lead to errors in such cases. We did keep a list of rst names associated with the entity, so a more re ned matching model could be developed, but this was not the focus of this work.  The tools for pronoun coreference and clause and apposition identi cation and attachment were run separately on each document. Then the last name of each of the canonical names derived from the IDENTIFINDER output was matched with the initial reference in the generic coreference list for the document with the last name. The tools that we used have been evaluated separately when used in normal single document setting. In our cross-document matching processes, we could incur more errors, for example when the general coreference chain is not accurate. On average, out of 27 unique people per cluster identi ed by IDENTIFINDER, 4 people and the information about them are lost in the matching step for a variety of reasons such as errors in the clause identi er, or the coreference.</Paragraph>
    </Section>
    <Section position="5" start_page="243" end_page="243" type="sub_section">
      <SectionTitle>
3.2 Data labeling
</SectionTitle>
      <Paragraph position="0"> Entities were automatically labeled as hearer-old or new by analyzing the syntactic form that human summarizers used for initial references to them. The labeling rests on the assumption that the people who produced the summaries used their own model of the reader when choosing appropriate references for the summary. The following instructions had been given to the human summarizers, who were not professional journalists: To write this summary, assume you have been given a set of stories on a news topic and that your job is to summarize them for the general news sections of the Washington Post. Your audience is the educated adult American reader with varied interests and background in current and recent events. Thus, the human summarizers were given the freedom to use their assumptions about what entities would be generally hearer-old and they could refer to these entities using short forms such as (1) title or role+ last name or (2) full name only with no pre- or post-modi cation. Entities that the majority of human summarizers for the set referred to using form (1) or (2) were labeled as hearer-old. From the people mentioned in human summaries, we obtained 118 examples of hearer-old and 140 examples of hearer-new persons - 258 examples in total - for supervised machine learning.</Paragraph>
      <Paragraph position="1"> In order to label an entity as major or minor, we again used the human summaries entities that were mentioned by name in at least one summary were labeled major, while those not mentioned by name in any summary were labeled minor. The underlying assumption is that people who are not mentioned in any human summary, or are mentioned without being named, are not important. There were 258 major characters who made it to a human summary and 3926 minor ones that only appeared in the news reports. Such distribution between the two classes is intuitively plausible, since many people in news articles express opinions, make statements or are in some other way indirectly related to the story, while there are only a few main characters.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>