File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/w96-0510_metho.xml

Size: 11,750 bytes

Last Modified: 2025-10-06 14:14:26

<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0510">
  <Title>Summarization: an Application for NL Generation</Title>
  <Section position="3" start_page="0" end_page="37" type="metho">
    <SectionTitle>
2 Discourse Segmentation
</SectionTitle>
    <Paragraph position="0"> Centering Theory (Grosz, Joshi, and Weinstein, 1995) is a computational model of local discourse coherence that relates each utterance to the previous and following utterances by keeping track of the center of attention. The most salient entity at a particular utterance, the center of attention, is called the backward-looking center (Cb). The Cb is defined as the highest thematically ranked element of the previous utterance that also occurs in the current utterance. If the sentence contains a pronoun, the pronoun is preferred as the Cb.</Paragraph>
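A minimal Python sketch of the Cb definition above. It assumes each utterance has already been reduced to the list of entities it mentions, with the previous utterance's entities ordered by thematic rank; the ranking itself and the hypothetical `pronouns` parameter (entities realized as pronouns, e.g. as reported by an anaphora resolver) are assumptions of the sketch, not part of the original method:

```python
from typing import Optional


def backward_center(prev: list[str], curr: list[str],
                    pronouns: frozenset[str] = frozenset()) -> Optional[str]:
    """Return Cb(curr) given the ranked entities of the previous utterance.

    prev: entities of the previous utterance, highest thematic rank first.
    curr: entities mentioned in the current utterance.
    pronouns: hypothetical input naming the entities curr realizes as pronouns.
    """
    # Candidates: entities of the previous utterance that recur in the
    # current one, kept in the previous utterance's rank order.
    candidates = [entity for entity in prev if entity in curr]
    if not candidates:
        return None  # no link back to the previous utterance: no Cb
    # Pronoun preference: the highest-ranked candidate realized as a pronoun wins.
    for entity in candidates:
        if entity in pronouns:
            return entity
    return candidates[0]


print(backward_center(["officials", "inmates"], ["inmates", "officials"]))  # officials
print(backward_center(["officials", "inmates"], ["inmates", "officials"],
                      pronouns=frozenset({"inmates"})))                     # inmates
```

Treating the pronoun preference as a tie-breaker among otherwise valid candidates is one reading of the sentence above; Centering Theory states the pronoun constraint more carefully.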
    <Paragraph position="1"> Centering Theory can be used to segment a discourse by noting whether the same center of attention, the Cb, is preserved from one utterance to the next. Basically, we can either CONTINUE to talk about the same entity or SHIFT to a new center. A SHIFT indicates the start of a new discourse segment (footnote 2). In the method that I am proposing, the original text is first divided into segments according to Centering Theory. Then, as described in the following sections, the segments that are about the most frequent Cb(s) in the text are selected for the summary, and the discourse relations of elaboration and restatement are used to further prune and select information for the summary.</Paragraph>
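The segmentation step can be sketched as follows, assuming the Cb of every utterance has already been computed and stored in a list (`None` for an utterance without a Cb). The sketch keeps only the two-way CONTINUE/SHIFT distinction used here and ignores the finer transition types of Centering Theory; the function name is hypothetical:

```python
from typing import Optional


def segment_by_center(cbs: list[Optional[str]]) -> list[list[int]]:
    """Split utterances into discourse segments at Centering SHIFTs.

    cbs[i] is the Cb of utterance i, or None if it has none. An utterance
    CONTINUEs the current segment if its Cb equals the previous utterance's
    Cb (or links back to an utterance that had no Cb, an assumption of this
    sketch); anything else is a SHIFT and opens a new segment.
    """
    segments: list[list[int]] = []
    previous: Optional[str] = None
    for i, cb in enumerate(cbs):
        if not segments:
            segments.append([i])  # the first utterance opens the first segment
        elif cb is not None and (previous is None or cb == previous):
            segments[-1].append(i)  # CONTINUE
        else:
            segments.append([i])  # SHIFT
        previous = cb
    return segments


print(segment_by_center([None, "calls", "inmates", "inmates", "device", "device"]))
# [[0, 1], [2, 3], [4, 5]]
```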
  </Section>
  <Section position="4" start_page="37" end_page="38" type="metho">
    <SectionTitle>
3 Content Selection
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="37" end_page="37" type="sub_section">
      <SectionTitle>
3.1 Frequent Centers
</SectionTitle>
      <Paragraph position="0"> After the text has been segmented, we need to decide which of the discourse segments are important for the summary. The most prevalent discourse topic should play a large role in the summary, so the most frequent Cb can be used to select the important segments in the text. I propose the following heuristic: Heuristic 1: Select for the summary those segments that are about the most frequent Cb in the text (footnote 3).</Paragraph>
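Heuristic 1 can be sketched in Python as below, assuming segments are given as lists of utterance indices and `cbs` holds the Cb of each utterance (both representations are hypothetical). Every Cb tied for the highest count is kept, since more than one frequent Cb may be picked when there is no clear winner:

```python
from collections import Counter
from typing import Optional


def frequent_center_segments(segments: list[list[int]],
                             cbs: list[Optional[str]]) -> list[int]:
    """Heuristic 1: indices of the segments about the most frequent Cb(s).

    A segment counts as "about" a Cb if that Cb is the Cb of any of its
    utterances (an assumption of this sketch).
    """
    counts = Counter(cb for cb in cbs if cb is not None)
    if not counts:
        return []
    top = max(counts.values())
    winners = {cb for cb, n in counts.items() if n == top}  # all ties kept
    return [index for index, segment in enumerate(segments)
            if any(cbs[i] in winners for i in segment)]


cbs = [None, "calls", "inmates", "inmates", "device", "device", "inmates", "inmates"]
segments = [[0, 1], [2, 3], [4, 5], [6, 7]]
print(frequent_center_segments(segments, cbs))  # [1, 3]
```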
      <Paragraph position="1"> Picking the most frequent Cb gives better results than simply picking the most frequent words or references as the most important topics in the text. For example, in the sample text (see Section 4) about a new electronic surveillance method being tried on prisoners that will allow them to be under house arrest, "wristband" occurs just as frequently as "surveillance/supervision"; however, "surveillance/supervision" is a more frequent Cb than "wristband", and this reflects the fact that it is a more central topic in the text.</Paragraph>
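The contrast drawn in this example can be made concrete by counting, for each entity, how many times it is mentioned and how many utterances have it as their Cb. The annotation below is a hypothetical toy example that mimics the pattern described, not the author's analysis of the sample text:

```python
from collections import Counter
from typing import Optional


def mention_and_center_counts(mentions: list[list[str]],
                              cbs: list[Optional[str]]) -> tuple[Counter, Counter]:
    """Count how often each entity is mentioned and how often it is the Cb."""
    mention_counts = Counter(entity for utterance in mentions for entity in utterance)
    center_counts = Counter(cb for cb in cbs if cb is not None)
    return mention_counts, center_counts


# Hypothetical toy annotation (entities per utterance and the Cb of each utterance).
mentions = [["surveillance", "wristband"], ["surveillance", "ash"],
            ["surveillance", "wristband"], ["wristband", "clasp"]]
cbs = [None, "surveillance", "surveillance", "wristband"]

mention_counts, center_counts = mention_and_center_counts(mentions, cbs)
print(mention_counts["surveillance"], mention_counts["wristband"])  # 3 3
print(center_counts["surveillance"], center_counts["wristband"])    # 2 1
```

Raw mention counts tie, while Cb counts separate the two entities, which is the distinction the heuristic relies on.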
    </Section>
    <Section position="2" start_page="37" end_page="37" type="sub_section">
      <SectionTitle>
3.2 Pruning Elaborations
</SectionTitle>
      <Paragraph position="0"> While doing the centering analysis of my sample texts, I noticed that it is the segment boundaries, the SHIFTs, that are important for summarization in the discourse analysis of the original text. In fact, the CONTINUE transitions in Centering often correspond to Elaboration relations in RST (Mann and Thompson, 1987). A restricted type of elaboration relation between sentences can be restated in Centering terms: Elaboration on the same topic: the subject of the clause is a pronoun that refers to the subject of the previous clause, i.e. a CONTINUE in Centering.</Paragraph>
      <Paragraph position="1"> Thus, I propose the following heuristic for pruning the segments in the summary: Heuristic 2: Do not include elaborations on the same topic (as defined above) in the summary. (Footnote 2: There are other cues to discourse segmentation, not yet included in this study, such as tense and aspect continuity and the use of cue words such as "and". Footnote 3: More than one frequent Cb can be picked if there is no clear winner.)</Paragraph>
      <Paragraph position="2"> For example, the second sentence below can be left out of the summary because it is an elaboration on the same topic.</Paragraph>
      <Paragraph position="3">  (1) a. Most county jail inmates did not commit violent crimes.</Paragraph>
      <Paragraph position="5"> b. They're in jail for such things as bad checks or stealing.</Paragraph>
      <Paragraph position="7"/>
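A minimal sketch of Heuristic 2, assuming each clause has been annotated with the referent of its subject (after anaphora resolution) and with whether that subject is a pronoun; the `Clause` type and function names are hypothetical. Applied to example (1), it keeps (1)a and drops (1)b:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    text: str
    subject: str               # referent of the subject after anaphora resolution
    subject_is_pronoun: bool   # True if the subject is realized as a pronoun


def is_same_topic_elaboration(prev: Optional[Clause], curr: Clause) -> bool:
    """Restricted elaboration test: curr's subject is a pronoun that refers to
    prev's subject (a CONTINUE in Centering terms)."""
    return (prev is not None and curr.subject_is_pronoun
            and curr.subject == prev.subject)


def prune_elaborations(clauses: list[Clause]) -> list[Clause]:
    """Heuristic 2: drop same-topic elaborations from a selected segment."""
    kept = []
    prev: Optional[Clause] = None
    for clause in clauses:
        if not is_same_topic_elaboration(prev, clause):
            kept.append(clause)
        prev = clause
    return kept


example = [
    Clause("Most county jail inmates did not commit violent crimes.", "inmates", False),
    Clause("They're in jail for such things as bad checks or stealing.", "inmates", True),
]
print([c.text for c in prune_elaborations(example)])  # keeps only sentence (1)a
```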
    </Section>
    <Section position="3" start_page="37" end_page="38" type="sub_section">
      <SectionTitle>
3.3 Restatement
</SectionTitle>
      <Paragraph position="0"> Another RST relation that is very important for summarization is Restatement, because restatements are a good indicator of important information. Good authors often restate the thesis, typically at the beginning and at the end of the text, to ensure that the point of the text gets across. The heuristic used is: Heuristic 3: Select for the summary the repeated or semantically synonymous LFs (logical forms, i.e. predicate-argument relations) in the original text.</Paragraph>
      <Paragraph position="1"> One way to find restatements in the text is to simply search for repeated phrases. However, most good authors restate phrases rather than simply repeating them. That is why I propose that we search for repeated LFs rather than repeated words or phrases. Since LFs capture the primary relations in a whole clause, their frequency captures dependencies that traditional statistical approaches such as bigrams and trigrams would miss. However, some inference is necessary to determine whether two LFs are semantically synonymous.</Paragraph>
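The LF-based search for restatements might be sketched as follows. LFs are modeled as predicate-argument tuples, and the semantic inference the text says is necessary is replaced here by a tiny hypothetical lookup table and one rewrite rule (`CANONICAL`, `normalize`, and `restated_lfs` are illustrative names, not part of the original method):

```python
from collections import Counter

# A logical form (LF) as a predicate-argument tuple: (predicate, arg1, ...).
LF = tuple[str, ...]

# Hypothetical equivalences standing in for real inference; a working system
# would need genuine semantic reasoning, not a lookup table.
CANONICAL = {
    "computerized call": "call",
    "former prisoner": "prisoner",
    "former prisoner's home": "prisoner",  # assumed: calling a home reaches its occupant
}


def normalize(lf: LF) -> LF:
    terms = tuple(CANONICAL.get(term, term) for term in lf)
    # Hypothetical rule: make(call, X) is read as call(computer, X).
    if len(terms) == 3 and terms[0] == "make" and terms[1] == "call":
        return ("call", "computer", terms[2])
    return terms


def restated_lfs(lfs: list[LF], min_count: int = 2) -> set[LF]:
    """Heuristic 3: LFs that occur at least min_count times after normalization."""
    counts = Counter(normalize(lf) for lf in lfs)
    return {lf for lf, n in counts.items() if n >= min_count}


lfs = [
    ("call", "computer", "prisoner"),
    ("plugin", "former prisoner"),
    ("make", "computerized call", "former prisoner's home"),
    ("plugin", "prisoner"),
    ("attach", "device", "wristband"),
]
print(sorted(restated_lfs(lfs)))
# [('call', 'computer', 'prisoner'), ('plugin', 'prisoner')]
```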
      <Paragraph position="2"> For example, the following two sentences from the sample text are very similar.</Paragraph>
      <Paragraph position="3"> After anaphora resolution, the semantic representations of both sentences in (2) contain the propositions call(computer, prisoner) and plugin(prisoner), given inferences such as that call(computer, prisoner) is equivalent to make(a computerized call, to a former prisoner's home). Notice that a simple trigram model would not recognize "that person answers by plugging in" in (2)b as a restatement of "the former prisoner plugs in" in (2)a. We need to consider the predicate-argument relations instead of simple word collocations.</Paragraph>
      <Paragraph position="4"> (2) a. Whenever a computer randomly calls them from jail, the former prisoner plugs in to let corrections officials know they're in the right place at the right time.</Paragraph>
      <Paragraph position="5"> b. When a computerized call is made to a former prisoner's home, that person answers by plugging in the device.</Paragraph>
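To illustrate the point about trigrams, the sketch below compares the word trigrams of (2)a and (2)b, using simple whitespace tokenization with commas and periods stripped (an assumption of the sketch). The two sentences share no trigram at all, even though the text identifies the same propositions in both:

```python
def word_trigrams(sentence: str) -> set[tuple[str, str, str]]:
    """Lowercased word trigrams, ignoring commas and periods."""
    words = sentence.lower().replace(",", "").replace(".", "").split()
    return set(zip(words, words[1:], words[2:]))


a = ("Whenever a computer randomly calls them from jail, the former prisoner plugs in "
     "to let corrections officials know they're in the right place at the right time.")
b = ("When a computerized call is made to a former prisoner's home, "
     "that person answers by plugging in the device.")

print(word_trigrams(a).intersection(word_trigrams(b)))  # set()
```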
      <Paragraph position="6"> Searching for similar LFs captures important information that is restated many times in the text (footnote 4). This method is similar to the aggregation methods used in NL generation; summarization can be seen as a massive application of aggregation algorithms. We need to look for shared elements, agents, propositions, etc. in the semantic representation of the original text in order to aggregate similar elements as well as to recognize important elements that the author restates many times.</Paragraph>
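As an illustration of looking for shared agents, here is a minimal sketch that groups LFs by their first argument, treated here as the agent, so that propositions about the same participant could later be merged by an aggregation step. The data and the function name are hypothetical:

```python
from collections import defaultdict

# LFs as (predicate, agent, *other arguments); the examples below are hypothetical.
LF = tuple[str, ...]


def aggregate_by_agent(lfs: list[LF]) -> dict[str, list[LF]]:
    """Group propositions that share an agent, the kind of shared element an
    aggregation step could merge into a single clause."""
    groups: dict[str, list[LF]] = defaultdict(list)
    for predicate, agent, *rest in lfs:
        groups[agent].append((predicate, *rest))
    return dict(groups)


lfs = [
    ("wear", "prisoner", "wristband"),
    ("plugin", "prisoner"),
    ("call", "computer", "prisoner"),
]
print(aggregate_by_agent(lfs))
# {'prisoner': [('wear', 'wristband'), ('plugin',)], 'computer': [('call', 'prisoner')]}
```

A generator could then realize the two "prisoner" propositions as one clause with a conjoined predicate.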
    </Section>
  </Section>
  <Section position="5" start_page="38" end_page="39" type="metho">
    <SectionTitle>
4 An Example Text
</SectionTitle>
    <Paragraph position="0"> The following is a sample text from the Penn Treebank. The A markers and the alternation between normal and italicized script mark segment breaks in the text as determined by Centering Theory. Embedded subsegments are shown with brackets.</Paragraph>
    <Paragraph position="1"> The Cbs are shown in bold.</Paragraph>
    <Paragraph position="2"> TEXT: A Computerized phone calls [which do everything from selling magazine subscriptions to reminding people about meetings] have become the telephone equivalent of junk mail, but a new application of the technology is about to be tried out in Massachusetts [to ease crowded jail conditions]. AA Next week some inmates [T released early from the Hampton County jail] in Springfield will be wearing a wristband [that T hooks up with a special jack on their home phones]. [Whenever a computer randomly calls them from jail], the former prisoner plugs in [[to let corrections officials know] they're in the right place at the right time]]. A The device is attached to a plastic wristband. It looks like a watch. It functions like an electronic probation officer. A [When a computerized call is made to a former prisoner's home phone], that person answers by plugging in the device. A The wristband can be removed only by breaking its clasp and [if that's done] the inmate immediately is returned to jail. A The description conjures up images of big brother watching, A but Jay Ash, [deputy superintendent of the Hampton County jail in Springfield], says [the surveillance system is not that sinister]. Such supervision, [according to Ash], is a sensible cost effective alternative to incarceration [that T should not alarm civil libertarians].</Paragraph>
    <Paragraph position="3"> A Dr. Norman Rosenblatt, [dean of the college of criminal justice at Northeastern University], agrees. Rosenblatt expects electronic surveillance in parole situations to become more wide spread, and he thinks [eventually people will get used to the idea]. A Springfield jail deputy superintendent Ash says [[although it will allow some prisoners to be released a few months before their sentences are up], concerns that may raise about public safety are not well founded]. AA Most county jail inmates did not commit violent crimes. They're in jail for such things as bad checks or stealing.</Paragraph>
    <Paragraph position="4"> Those on early release must check in with corrections officials fifty times a week according to Ash [who says about half the contacts for a select group will now be made by the computerized phone calls]. A Initially the program will involve only a handful of inmates. Ash says the ultimate goal is to use it [to get about forty out of jail early]. A The Springfield jail [T built for 270 people] now houses more than 500. A</Paragraph>
    <Paragraph position="5"> The content of the summary is selected by picking the two segments with the most frequent Cb, the inmate(s)/prisoner; these are marked with AA at the beginning of the segments above. Then, elaborations (i.e. CONTINUEs) in these segments are deleted.</Paragraph>
    <Paragraph position="6"> Essentially, this leaves the first sentence of each segment whose Cb is the inmates. In addition, we search for restatements in the text. (Footnote 4: Many restatements in the texts involve the most frequent Cb, which may serve as an additional heuristic.) As a result, the following sentences from the text are selected for the summary. The first and third sentences are the first sentences of the segments about the most frequent Cb, the inmates; the second sentence, as well as part of the first sentence, is found by recognizing restatements in the text.</Paragraph>
    <Paragraph position="7"> Summary: A Next week some inmates released early from the Hampton County jail in Springfield will be wearing a wristband that hooks up with a special jack on their home phones. A When a computerized call is made to a former prisoner's home phone, that person answers by plugging in the device. A Most county jail inmates did not commit violent crimes. A The summary above simply shows the portions of the original text that are selected for the summary, in their original order. The heuristics for content selection actually operate on LFs; the selected LFs will then be sent to a generator, which can plan a more coherent summary than the one shown above (footnote 5).</Paragraph>
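The selection procedure described for this example can be approximated in a short sketch: keep the first sentence of each selected segment (what remains once elaborations are pruned) plus any sentence chosen as a restatement, in the original text order. It works on whole sentences rather than LFs and cannot select part of a sentence, so it only approximates the procedure above; the index-based inputs and the function name are hypothetical:

```python
def assemble_summary(sentences: list[str], segments: list[list[int]],
                     selected: list[int], restated: set[int]) -> list[str]:
    """Combine segment-initial sentences of the selected segments with
    restated sentences, preserving the original order."""
    chosen = {segments[s][0] for s in selected}  # first sentence of each selected segment
    chosen.update(restated)                      # add sentences found as restatements
    return [sentences[i] for i in sorted(chosen)]


sentences = ["s0", "s1", "s2", "s3", "s4"]
print(assemble_summary(sentences, [[0, 1], [2, 3], [4]], selected=[1, 2], restated={3}))
# ['s2', 's3', 's4']
```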
  </Section>
</Paper>