<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2164"> <Title>A Method for Abstracting Newspaper Articles by Using Surface Clues</Title> <Section position="8" start_page="975" end_page="976" type="evalu"> <SectionTitle> 5 Experiment </SectionTitle> <Paragraph position="0"> We conducted an experiment to check the validity of the proposed method.</Paragraph> <Paragraph position="1"> The testers were divided into two groups, A and B, each consisting of 10 people. Those in group A selected important sentences (about 1/3 of the article) in 5 editorials and 3 general articles from the Nikkei Newspaper. Those in group B selected important sentences (about 1/3 of the article) in 3 editorials and 3 general articles, which were different from those used for group A. A general article and an editorial used for group B are shown in Figures 1 (a) and 2 (a), respectively. In each of these figures, the first number is a sentence number, the second number is the number of supporters in group B, and the last part is a rough English translation. Table 1 shows two weight sets. Weight set 1 was created by the author in such a way that sentences located near the beginning and end are regarded as important, sentence importance is not proportional to points for rhetorical relation, and the importance of insistence-type sentences is higher in editorials than in general articles. Weight set 2, on the other hand, was calculated from the results obtained from group A by the method described in the previous section. Weight set 2 for general articles implies that sentences near the beginning are more important than ones near the end, that insistence-type sentences are less important, and so on.
On the other hand, weight set 2 for editorials implies that sentences both near the beginning and near the end are important, and that insistence-type sentences are important. To check the validity of these weight sets, we compared the abstracts created by the system, using weight sets 1 and 2, from the articles supplied to group B, with the abstracts created by group B. (Footnote: ... material for creating a practical system, because it is difficult to ask enough people to do this experiment to ensure that the result is statistically meaningful. However, the general tendency can be extracted, and the weight set is determined on the basis of this experiment.)</Paragraph> <Paragraph position="2"> For the general article in Figure 1 (a), the three most important sentences (roughly 1/3 of the article) determined by using weight sets 1 and 2 are listed in Figures 1 (b) and (c), respectively. In this case, the three most important sentences selected by group B were 0, 2, and 3. Likewise, for the editorial in Figure 2 (a), the eight most important sentences (roughly 1/3 of the article) determined by using weight sets 1 and 2 are listed in Figures 2 (b) and (c), respectively. In this case, the eight most important sentences selected by group B were 0, 2, 3, 12, 15, 20, 21, and 22. Here, we introduce the following metric of estrangement to check which abstract is most similar to the result of group B: $\text{Estrangement} = \sum_{i} (\text{the number of supporters of sentence } s_i) - \sum_{j} (\text{the number of supporters of sentence } s_j)$, where $s_i$ is a sentence that is included in an abstract by group B but not in an abstract created by the system, and $s_j$ is a sentence that is not included in an abstract by group B but is included in an abstract created by the system.</Paragraph> <Paragraph position="3"> The estrangements of the articles in Figures 1 and 2 are as follows: From this result, the weight set 2 calcu- [...] to adjust these heuristics for the given text.
This paper proposed a method for this adjustment; that is, a method for determining the weights of surface features by multiple-regression analysis of abstracts created by humans. By using this method, a system can be applied to a variety of texts.</Paragraph> </Section> </Paper>
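<!--
Editorial sketch of the estrangement metric defined in the Experiment section. It is a minimal illustration, not the author's implementation. The function name, the argument names, and the supporter counts in the example are all hypothetical; the sentence numbers 0, 2, and 3 are the group B selection reported for the general article.

```python
from typing import Dict, Iterable


def estrangement(supporters: Dict[int, int],
                 human_abstract: Iterable[int],
                 system_abstract: Iterable[int]) -> int:
    """Sum of supporters of sentences the system missed, minus the sum of
    supporters of sentences the system added.

    supporters maps a sentence number to its number of supporters in the
    human group; sentences absent from the mapping count as 0 (assumption).
    """
    human = set(human_abstract)
    system = set(system_abstract)
    missed = human - system   # the s_i: in the human abstract only
    extra = system - human    # the s_j: in the system abstract only
    missed_total = sum(supporters.get(s, 0) for s in missed)
    extra_total = sum(supporters.get(s, 0) for s in extra)
    return missed_total - extra_total


# Hypothetical supporter counts for a five-sentence article.
example_supporters = {0: 9, 1: 2, 2: 7, 3: 6, 4: 1}
# Group B chose sentences 0, 2, 3; a hypothetical system chose 0, 1, 2.
print(estrangement(example_supporters, [0, 2, 3], [0, 1, 2]))  # prints 4
```

Under this reading, a system abstract identical to the human abstract yields 0, and the value grows as the system drops well-supported sentences in favour of weakly supported ones.
-->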