<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1085"> <Title>Detecting Shifts in News Stories for Paragraph Extraction</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 An Event, A Subject Class and A Subject </SectionTitle> <Paragraph position="0"> Our hypothesis about key paragraphs in multiple documents related to the target event is that they include words related to the subject of a document, a subject class among documents, and the target event. We call these words subject, subject class and event words. The notion of a subject word refers to the theme of the document itself, i.e., something a writer wishes to express, and it appears across paragraphs, but does not appear in other documents (Luhn, 1958). A subject class word differs from a specific subject word, i.e., it denotes a broader class of subjects, but one narrower than an event. It appears across documents, and these documents discuss related subjects. An event word, on the other hand, refers to something that occurs at a specific place and time associated with some specific actions, and it appears across documents about the target event.</Paragraph> <Paragraph position="1"> Let us take a look at the following three documents concerning the Kobe Japan quake from the TDT1.</Paragraph> <Paragraph position="2"> 1. Emergency work continues after earthquake in Japan 1-1. Casualties are mounting in [Japan], where a strong [earthquake] eight hours ago struck [Kobe]. Up to 400 {people} related {deaths} are confirmed, thousands of {injuries}, and rescue crews are searching *************** 2. Quake Collapses Buildings in Central Japan 2-1. At least two {people} died and dozens {injuries} when a powerful [earthquake] rolled through central [Japan] Tuesday morning, collapsing buildings and setting off fires in the cities of [Kobe] and Osaka. 2-2.
The [Japan] Meteorological Agency said the [earthquake], which measured 7.2 on the open-ended Richter scale, rumbled across Honshu Island from the Pacific Ocean to the [Japan] Sea.</Paragraph> <Paragraph position="3"> 2-3. The worst hit areas were the port city of [Kobe] and the nearby island of Awajishima where in both places dozens of fires broke out and up to 50 buildings, including several apartment blocks, *************** 3. US forces to fly blankets to Japan quake survivors 3-1. United States forces based in [Kobe] [Japan] will take blankets to help [earthquake] survivors Thursday, in the U.S. military's first disaster relief operation in [Japan] since it set up bases here.</Paragraph> <Paragraph position="4"> 3-2. A military transporter was scheduled to take off in the afternoon from Yokota air base on the outskirts of Tokyo and fly to Osaka with 37,000 blankets.</Paragraph> <Paragraph position="5"> 3-3. Following the [earthquake] Tuesday, President Clinton offered the assistance of U.S. military forces in [Japan], and Washington provided the Japanese The underlined words in Figure 1 denote a subject word in each document. Words marked with '{}' and '[]' refer to a subject class word and an event word, respectively. Words such as 'Kobe' and 'Japan' are associated with an event, since all of these documents concern the Kobe Japan quake. The first document says that emergency work continues after the earthquake in Japan. Underlined words such as 'rescue' and 'crews' denote the subject of the document. The second document states that the quake collapsed buildings in central Japan. These two documents mention the same thing: A powerful earthquake rolled through central Japan, and many people were injured. Therefore, words such as 'people' and 'injuries' which appear in both documents are subject class words, and these documents are classified into the same set. If we can determine that these documents discuss related subjects, we can eliminate redundancy between them.
The third document, on the other hand, states that the US military will fly blankets to Japan quake survivors. The subject of the document is different from the earlier ones, i.e., the subject has shifted.</Paragraph> <Paragraph position="6"> Though it is hard to make a clear distinction between a subject and a subject class, it is easier to find properties to determine whether the later document discusses the same subject as an earlier one or not. Our method exploits this feature of documents.</Paragraph> </Section> </Paper>