<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2164"> <Title>A Method for Abstracting Newspaper Articles by Using Surface Clues</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Surface Features of a Sentence </SectionTitle> <Paragraph position="0"> The proposed method creates an abstract by determining important sentences according to features extracted from each sentence. For each sentence in a given Japanese newspaper article, the following features (footnote 1) are analyzed:</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> * Important Keywords: </SectionTitle> <Paragraph position="0"> An important keyword is defined as a keyword that appears in another sentence or in a title. The number of points for this feature is the total number of occurrences of important keywords.</Paragraph> </Section> <Section position="5" start_page="0" end_page="974" type="metho"> <SectionTitle> * Tense: </SectionTitle> <Paragraph position="0"> The tense of a sentence is analyzed as past or present. This feature gives 1 point for present, and 0 for past.</Paragraph> <Paragraph position="1"> Footnote 1: Most of these features were proposed in previous studies. Keywords were proposed in [6], sentence location was proposed in [1], sentence type was proposed in [1, 9], etc., and rhetorical relations were proposed in studies using rhetorical structures such as [a].</Paragraph> <Paragraph position="2"> * Type of a Sentence: Sentence types are fact, conjecture, or insistence. This feature gives 0 points for fact, 1 for conjecture, and 2 for insistence.</Paragraph> </Section> <Section position="6" start_page="974" end_page="975" type="metho"> <SectionTitle> * Rhetorical Relation: </SectionTitle> <Paragraph position="0"> The rhetorical relation to the preceding context is analyzed as example, adverse, parallel, comparison, or connection. 
This feature gives 1 point for reason, 2 for example, and 0 for others.</Paragraph> <Paragraph position="1"> * Distance from the beginning of a text: In general, sentences located near the beginning of a text tend to be important. Therefore, sentences in the first paragraph are given 5 points for this feature, sentences in the next paragraph 4, and so on.</Paragraph> <Paragraph position="2"> * Distance from the end of a text: Sentences located near the end of a text also tend to be important. Therefore, sentences in the last paragraph are given 5 points for this feature, sentences in the preceding paragraph 4, and so on. The tense of a sentence is simply determined to be past if it has &quot;ta&quot; (an inflection for the past tense) in the last phrase (footnote 3). Tense is used because sentences stating current facts seem to be more important than ones stating past facts in the context of editorial articles.</Paragraph> <Paragraph position="3"> The sentence type is determined by checking special expressions in the last phrase (footnote 3). For instance, if the final phrase contains &quot;bekida&quot; (&quot;should&quot;) or &quot;nakerebanaranai&quot; (&quot;must&quot;), then its sentence type is insistence; if it contains &quot;darou&quot; (&quot;probably ...&quot;), then its type is conjecture; otherwise, its type is fact. Examples of special expressions used to determine sentence type are as follows: * Conjecture: kamosirenai (may), kanenai (be capable of), souda (likely to), youda (likely to), darou (probably), etc.</Paragraph> <Paragraph position="4"> * Insistence: tai (want to do), hosii (want someone to do), bekida (should), nakereba-naranai (must), taisetu-dearu (important), hituyouda (necessary), etc.</Paragraph> <Paragraph position="5"> Footnote 2: In this method, past does not imply the past tense in a strict sense but rather that the sentence is not in the present tense. In Japanese, &quot;ta&quot; implies the past tense, completion, and so on. 
Most cases are actual instances of the past tense.</Paragraph> <Paragraph position="6"> Footnote 3: It is sufficient to check the last phrase for Japanese sentences, because a predicative phrase is always located at the end of a Japanese sentence. Therefore, another strategy is needed for languages in which a predicative phrase may be located in the middle of a sentence.</Paragraph> <Paragraph position="7"> The rhetorical relation is determined by checking special expressions both in the first phrase and in the last phrase of a sentence. For instance, if &quot;sitakarada&quot; (footnote 4) is found in the last phrase, then the rhetorical relation is reason, and if the conjunction &quot;sikasi&quot; (&quot;but&quot;) is found, then the rhetorical relation is adverse (footnote 5). Examples of special expressions used to determine rhetorical relations are listed below: * Example: tatoeba (for instance), nado (etc.), etc.</Paragraph> <Paragraph position="8"> * Adverse: sikasi (but), tokoroga (however), etc.</Paragraph> <Paragraph position="9"> * Comparison: koreni-taisi (while), etc.</Paragraph> <Paragraph position="10"> * Parallel: mata (further), sarani (in addition), etc.</Paragraph> <Paragraph position="11"> * Reason: karada (because), tameda (because), etc.</Paragraph> <Paragraph position="12"> 3 Process of Creating an Abstract The basic method for creating an abstract in most previous studies has been to analyze the sentences of a text in terms of some surface features, and to use a heuristic to determine the most important sentences on the basis of these features.</Paragraph> <Paragraph position="13"> The method proposed in this paper formalizes the above approach so that the importance of each sentence is calculated as the sum of feature points multiplied by their feature weights. The most important sentences are then extracted as an abstract. 
The importance S of a sentence is calculated as follows: $S = a + \sum_{i=1}^{n} W_i \cdot P_i$ where a is a constant, $P_i$ is the number of points assigned to the i-th feature, which is normalized to be between 0 and 1, and $W_i$ is the weight assigned to the i-th feature.</Paragraph> <Paragraph position="14"> The steps in creating an abstract are as follows: 1. For each sentence, calculate the importance.</Paragraph> <Paragraph position="15"> 2. Select the sentence that has the highest importance value among the unselected sentences.</Paragraph> <Paragraph position="16"> 3. If the selected sentence s1 has another sentence based on [10].</Paragraph> <Paragraph position="17"> 4. If the ratio of the number of selected sentences to the number of sentences in the text exceeds the specified ratio, then terminate this process; otherwise, go to step 2.</Paragraph> <Paragraph position="18"> These steps select sentences on the basis of their importance value, but they also respect the rhetorical structure to some extent (step 3), because if the rhetorical structure is totally ignored, the output text will be awkward to read.</Paragraph> </Section> <Section position="7" start_page="975" end_page="975" type="metho"> <SectionTitle> 4 A Method for Determining the Weights of Features </SectionTitle> <Paragraph position="0"> Most previous systems can be regarded as determining the weights of features according to human intuition. In contrast, this paper proposes a method for determining the weights of features by multiple-regression analysis of correct examples, which are abstracts created by testers. A tester selects important sentences that should be included in an abstract. The importance value of a sentence is defined as the number of supporters (testers who selected it as an important one) divided by the total number of testers. 
Let this importance value be S; we then get the following equation for each sentence: $S = a + \sum_{i=1}^{n} W_i \cdot P_i$ where a is a constant, $P_i$ is the number of points assigned to the i-th feature, which is normalized to be between 0 and 1, and $W_i$ is the weight assigned to the i-th feature.</Paragraph> <Paragraph position="1"> In this equation, $W_i$ is the only variable. Therefore, the feature weight $W_i$ is calculated by multiple-regression analysis.</Paragraph> </Section> </Paper>