<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1043">
  <Title>An Automatic Extraction of Key Paragraphs Based on Context Dependency</Title>
  <Section position="3" start_page="291" end_page="291" type="intro">
    <SectionTitle>
2 Context Dependency
</SectionTitle>
    <Paragraph position="0"> Following Luhn's assumption about keywords, our method is based on the observation that a writer normally repeats certain words (keywords) as he advances or varies his arguments and as he elaborates on an aspect of a subject (Luhn, 1958). In this paper, we focus on newspaper articles. Figure 1 shows the structure of the Wall Street Journal corpus.</Paragraph>
    <Paragraph position="1">  In Figure 1, one day's newspaper articles cover several different topics, such as 'Economic news' and 'International news'. We call this level the Domain, and each element ('Economic news' or 'International news') a context. A particular domain, for example 'Economic news', consists of several articles, each of which has a different title. In Figure 1, 'General signal corp.', 'Safecard services inc.', and 'Jostens inc.' are titles. We call this level the Article, and each element ('General signal corp.', etc.) a context. Furthermore, a particular article, for example 'General signal corp.', consists of several paragraphs, and keywords of the 'General signal corp.' article appear throughout its paragraphs. We call this level the Paragraph, and each paragraph a context.</Paragraph>
    <Paragraph position="2"> We introduce a degree of context dependency into the structure of newspaper articles shown in Figure 1 in order to extract keywords. The degree of context dependency is a measure of how strongly each word is related to a given context, that is, a particular context of the Paragraph, Article, or Domain. In Figure 1, let 'O' be a keyword in the article 'General signal corp.'. According to Luhn's assumption, 'O' appears frequently throughout the paragraphs. Therefore, the deviation value of 'O' in the Paragraph is small.</Paragraph>
    <Paragraph position="3"> On the other hand, the deviation value of 'O' in the Article is larger than that in the Paragraph, since in the Article 'O' appears in a particular element, 'General signal corp.'. Furthermore, the deviation value of 'O' in the Domain is larger than those in the Article and the Paragraph, since in the Domain 'O' appears frequently in a particular context, 'Economic news'. We extract keywords using this feature of the degree of context dependency. In Figure 1, if a word is a keyword in a given article, it satisfies the following two conditions:  1. The deviation value of the word in the Paragraph is smaller than that in the Article.</Paragraph>
    <Paragraph position="4"> 2. The deviation value of the word in the Article is smaller than that in the Domain.</Paragraph>
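    <Paragraph position="5"> The two conditions can be sketched in Python as follows. This is only an illustration under stated assumptions: this section does not define how the deviation value is computed, so the sketch substitutes the coefficient of variation of a word's relative frequency across the contexts of a level. The names deviation and is_keyword, and the (counts, sizes) input layout, are hypothetical.
<![CDATA[
```python
from statistics import mean, pstdev


def deviation(counts, sizes):
    """Stand-in deviation value (an assumption, not the paper's formula).

    counts[i] is how often the word occurs in context i of one level;
    sizes[i] is the total number of words in that context.
    Returns the coefficient of variation of the per-context rates:
    0.0 when the word is spread evenly, larger when it is concentrated.
    """
    rates = [c / s for c, s in zip(counts, sizes)]
    avg = mean(rates)
    if avg == 0:
        return 0.0  # the word never occurs at this level
    return pstdev(rates) / avg


def is_keyword(paragraph, article, domain):
    """Check the two conditions; each argument is a (counts, sizes) pair."""
    dev_paragraph = deviation(*paragraph)
    dev_article = deviation(*article)
    dev_domain = deviation(*domain)
    # Condition 1: Paragraph deviation is smaller than Article deviation.
    # Condition 2: Article deviation is smaller than Domain deviation.
    return dev_article > dev_paragraph and dev_domain > dev_article


# A word spread over the three paragraphs of one article, absent from the
# other two articles and from the other three domains.
candidate = is_keyword(
    ([3, 2, 3], [100, 100, 100]),
    ([8, 0, 0], [300, 300, 300]),
    ([8, 0, 0, 0], [900, 900, 900, 900]),
)
```
]]>
    Any dispersion measure that is small for evenly spread words and large for concentrated ones could replace the stand-in; only the ordering across the three levels matters for the two conditions.</Paragraph>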
  </Section>
</Paper>