<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1053"> <Title>Extracting Important Sentences with Support Vector Machines</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Important Sentence Extraction </SectionTitle> <Paragraph position="0"> based on Support Vector Machines</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Support Vector Machines (SVMs) </SectionTitle> <Paragraph position="0"> SVM is a supervised learning algorithm for two-class problems.</Paragraph> <Paragraph position="2"> Suppose we are given training examples (x_1, y_1), ..., (x_u, y_u), where each x_j is a feature vector and y_j is its class label, positive (+1) or negative (-1). SVM separates positive and negative examples by a hyperplane defined by w * x + b = 0, (1)</Paragraph> <Paragraph position="4"> where &quot;*&quot; represents the inner product, w is a weight vector, and b is a bias term.</Paragraph> <Paragraph position="5"> In general, such a hyperplane is not unique.</Paragraph> <Paragraph position="6"> Figure 1 shows a linearly separable case. The SVM determines the optimal hyperplane by maximizing the margin, that is, the distance between the hyperplane and the nearest positive and negative examples.</Paragraph> <Paragraph position="7"> Since training data is not necessarily linearly separable, slack variables (xi_j) are introduced. They</Paragraph> <Paragraph position="9"> incur misclassification error, and should satisfy the following inequalities: y_j (w * x_j + b) >= 1 - xi_j, with xi_j >= 0, for all j. (2)</Paragraph> <Paragraph position="11"> Under these constraints, the following objective function is to be minimized: (1/2) ||w||^2 + C sum_j xi_j. (3)</Paragraph> <Paragraph position="13"> The first term in (3) corresponds to the size of the margin and the second term represents misclassification.</Paragraph> <Paragraph position="14"> By solving a quadratic programming problem, the decision function f(x) = sgn(g(x)) can be derived, where g(x) = sum_j lambda_j y_j (x_j * x) + b, with multipliers lambda_j >= 0. (4)</Paragraph> <Paragraph position="16"> The decision function depends only on the support vectors, i.e., the training examples x_j with lambda_j > 0.
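The dependence of g(x) on the support vectors alone can be sketched in plain Python; the support vectors, multipliers, and bias below are toy assumptions for illustration, not values from the paper:

```python
# Sketch: linear SVM decision function g(x) = sum_j lambda_j * y_j * (x_j . x) + b.
# Only support vectors (examples with lambda_j above zero) contribute to the sum.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def g(x, support_vectors, lambdas, labels, b):
    """Signed distance-like score of x relative to the separating hyperplane."""
    return sum(l * y * dot(sv, x)
               for sv, l, y in zip(support_vectors, lambdas, labels)) + b

def f(x, *svm):
    """Decision function f(x) = sgn(g(x))."""
    return 1 if g(x, *svm) >= 0 else -1

# Toy model: one positive and one negative support vector, symmetric about the origin.
svm = ([(1.0, 1.0), (-1.0, -1.0)],  # support vectors x_j
       [0.5, 0.5],                  # multipliers lambda_j
       [+1, -1],                    # labels y_j
       0.0)                         # bias b

print(g((2.0, 2.0), *svm))    # -> 4.0
print(f((-2.0, -2.0), *svm))  # -> -1
```

Training examples with a zero multiplier could be deleted from `svm` without changing any prediction, which is the point made in the text.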
Training examples other than the support vectors have no influence on the decision function.</Paragraph> <Paragraph position="17"> Non-linear decision surfaces can be realized by replacing the inner product x_j * x in g(x) with a kernel function K(x_j, x).</Paragraph> <Paragraph position="19"> In this paper, we use polynomial kernel functions, which have been very effective when applied to other tasks such as natural language processing (Joachims, 1998; Kudo and Matsumoto, 2001; Kudo and Matsumoto, 2000): K(x, y) = (x * y + 1)^d. (5)</Paragraph> <Paragraph position="21"/> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Sentence Ranking by using Support Vector Machines </SectionTitle> <Paragraph position="0"> Important sentence extraction can be regarded as a two-class problem: a sentence is important or unimportant. However, the proportion of important sentences in the training data will differ from that in the test data, because the number of important sentences in a document is determined by a summarization rate that is given at run-time. A simple solution to this problem is to rank the sentences in a document. We use g(x), the distance from the hyperplane to x, to rank the sentences.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Features </SectionTitle> <Paragraph position="0"> Taking past studies into account (Zechner, 1996; Nobata et al., 2001; Hirao et al., 2001; Nomoto and Matsumoto, 1997), we define the boolean features discussed below, which are associated with sentence S_i.</Paragraph> <Paragraph position="1"> We use 410 boolean variables for each S_i,</Paragraph> <Paragraph position="3"> where x = (x[1], ..., x[410]). A real-valued feature normalized between 0 and 1 is represented by 10 boolean variables. Each variable corresponds to an interval [i/10, (i+1)/10), where i = 0 to 9.
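The 10-interval boolean encoding of a normalized real value can be sketched as follows; `encode_real` is a hypothetical helper name, and clamping 1.0 into the last interval is our assumption, since the paper does not state how that boundary is handled:

```python
# Sketch: map a real value in [0, 1] to 10 boolean variables, one per
# interval [i/10, (i+1)/10) for i = 0..9. We assume v = 1.0 falls into
# the last interval, [9/10, 1].

def encode_real(v):
    """Return the 10 boolean variables as a bit string."""
    i = min(int(v * 10), 9)  # index of the interval containing v
    return "".join("1" if j == i else "0" for j in range(10))

print(encode_real(0.75))  # -> 0000000100, since 0.75 lies in [7/10, 8/10)
```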
For example, Posd = 0.75 is represented by &quot;0000000100&quot; because 0.75 belongs to [7/10, 8/10).</Paragraph> <Paragraph position="4"> Position of sentences: We define three feature functions for the position of sentence S_i.</Paragraph> <Paragraph position="6"> First, Lead is a boolean that corresponds to the output of the lead-based method: if S_i is among the first N sentences of the document, we assign 1 to the sentence. N was given for each document by the TSC committee.</Paragraph> <Paragraph position="8"> Second, Posd scores S_i's position in the document, and third, Posp scores S_i's position in a paragraph. In both, the first sentence obtains the highest score and the last obtains the lowest score:</Paragraph> <Paragraph position="10"> Posd(S_i) = 1 - BD(S_i)/|D|, Posp(S_i) = 1 - BP(S_i)/|P|,</Paragraph> <Paragraph position="12"> where BD(S_i) and BP(S_i) are the numbers of characters before S_i in the document and in the paragraph, and |D| and |P| are the total numbers of characters in the document and in the paragraph. Length of sentences: We define a feature function that addresses the length of sentence S_i as Len(S_i), its number of characters.</Paragraph> <Paragraph position="14"> Weight of sentences: We define a feature function that weights sentences based on frequency-based word weighting, as the sum of w(t) over the words t in S_i, with w(t) = a * wf(t) + b * P(t).</Paragraph> <Paragraph position="16"> Here, T is the number of sentences in a document.</Paragraph> <Paragraph position="18"> The first term of the equation above, wf(t), is the weight of the word in a specific field. The second term, P(t), is the occurrence probability of word t.</Paragraph> <Paragraph position="19"> We set parameters a and b to 0.8 and 0.2, respectively. The weight of the previous sentence, W(S_{i-1}),</Paragraph> <Paragraph position="21"> and the weight of the next sentence, W(S_{i+1}), are also used as features. Density of key words:</Paragraph> <Paragraph position="23"> We define the feature function Den(S_i), which</Paragraph> <Paragraph position="25"> represents the density of key words in the sentence. Part of speech: x[r] = 1 if and only if a certain part of speech, such as &quot;Noun-jiritsu&quot; or &quot;Verb-jiritsu&quot;, is used in the sentence. The number of parts of speech is 66.</Paragraph> <Paragraph position="26"> Semantical depth of nouns: x[r] = 1 (301 &lt;= r &lt;= 311) if and only if S_i has</Paragraph> <Paragraph position="28"> a noun at a certain semantical depth according to a Japanese lexicon, Goi-Taikei (Ikehara et al., 1997). The number of depth levels is 11.
For instance, Semdep = 2 means that a noun in S_i</Paragraph> <Paragraph position="30"> belongs to the second depth level.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Document genre </SectionTitle> <Paragraph position="0"> x[r] = 1 (312 &lt;= r &lt;= 315) if and only if the document belongs to a certain genre. The genre is explicitly written in the header of each document. The number of genres is four: General, National, Editorial, and Commentary.</Paragraph> <Paragraph position="2"> Assertive expressions: x[r] = 1 if and only if S_i includes an assertive expression.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="2" type="metho"> <SectionTitle> 3 Experimental settings </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="2" type="sub_section"> <SectionTitle> 3.1 Corpus </SectionTitle> <Paragraph position="0"> We used the data set of the TSC (Fukushima and Okumura, 2001) summarization collection for our evaluation. TSC was established as a subtask of NTCIR-2 (NII-NACSIS Test Collection for IR Systems). The corpus consists of 180 Japanese documents from the Mainichi Newspapers of 1994, 1995, and 1998. In each document, important sentences were manually extracted at summarization rates of 10%, 30%, and 50%. Note that the summarization rates depend on the number of sentences in a document, not the number of characters. Table 1 shows the statistics.</Paragraph> </Section> <Section position="3" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 3.2 Evaluated methods </SectionTitle> <Paragraph position="0"> We compared four methods: decision tree learning, boosting, lead, and SVM.
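Rate-based extraction by ranking can be sketched as follows; the sentence scores, the gold-standard set, and the accuracy computation (correctly extracted sentences divided by the specified count, times 100) are toy assumptions here, since real scores would come from the trained SVM:

```python
# Sketch: rank sentences by their SVM score g(x) and extract the top a,
# where a is the number of sentences implied by the summarization rate.

def extract_top(scores, a):
    """Return the ids of the a highest-scoring sentences."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:a])

def accuracy(selected, gold, a):
    """Accuracy = b / a * 100, where b counts correctly extracted sentences."""
    return len(selected & gold) / a * 100

scores = {"s1": 1.8, "s2": 0.3, "s3": -0.9, "s4": -1.2}  # toy g(x) values
gold = {"s1", "s3"}                                      # toy important set
a = 2                                                    # e.g. 50% of 4 sentences

selected = extract_top(scores, a)
print(sorted(selected))             # -> ['s1', 's2']
print(accuracy(selected, gold, a))  # -> 50.0
```

Ranking sidesteps the mismatch between the class proportions in training and test data: the cut-off a is applied at run-time, after scoring.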
At each summa-</Paragraph> </Section> <Section position="4" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 3.3 Measures for evaluation </SectionTitle> <Paragraph position="0"> In the TSC corpus, the number of sentences to be extracted was explicitly given by the TSC committee. When we extract sentences according to that number, Precision, Recall, and F-measure become the same value. We call this value Accuracy, defined as follows: Accuracy = b/a x 100, where a is the specified number of important sentences, and b is the number of true important sentences contained in the system's output.</Paragraph> </Section> </Section> <Section position="5" start_page="2" end_page="3" type="metho"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"> Table 2 shows the results of five-fold cross-validation using all 180 documents.</Paragraph> <Paragraph position="1"> For all summarization rates and all genres, SVM achieved the highest accuracy and the lead-based method the lowest. Let the null hypothesis be &quot;There are no differences among the scores of the four methods.&quot; We tested this null hypothesis at a significance level of 1% by using Tukey's method. Although SVM's performance was best at the 10% summarization rate, the differences there were not statistically significant. At the 30% and 50% rates, SVM performed better than the other methods with statistical significance.</Paragraph> </Section> <Section position="6" start_page="3" end_page="4" type="metho"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> Table 2 shows that Editorial and Commentary are more difficult than the other genres.
We can consider two reasons for the poor scores of</Paragraph> <Section position="1" start_page="3" end_page="4" type="sub_section"> <SectionTitle> Editorial and Commentary: </SectionTitle> <Paragraph position="0"></Paragraph> <Paragraph position="1"> Table 3 implies that non-standard features are useful in Editorial and Commentary documents.</Paragraph> <Paragraph position="2"> Now, we examine effective features in each genre. Since we used the second-order polynomial kernel, we can expand g(x) as follows: g(x) = sum_{j=1}^{l} lambda_j y_j (x_j * x + 1)^2 + b,</Paragraph> <Paragraph position="4"> where l is the number of support vectors, and lambda_j, y_j, and x_j are the multiplier, the label, and the feature vector of the j-th support vector.</Paragraph> <Paragraph position="6"> We can rewrite it as follows when all vectors are boolean: g(x) = w_0 + sum_h w[h] x[h] + sum_{h,k} w[h,k] x[h] x[k]. If |w[h]| or |w[h,k]| was large, the feature or the feature pair had a strong influence on the optimal hyperplane.</Paragraph> <Paragraph position="7"> Effective features common to the three genres at the three rates were sentence positions. Since National has a typical newspaper style, the beginning of the document was important. Moreover, &quot;ga&quot; and &quot;ta&quot; were important. These functional words are used when a new event is introduced.</Paragraph> <Paragraph position="8"> In Editorial and Commentary, the end of a paragraph and that of a document were important. The reason for this result is that subtopic or main-topic conclusions are common in those positions. This implies that National has a different text structure from Editorial and Commentary.</Paragraph> <Paragraph position="9"> Moreover, in Editorial, &quot;de&quot; and sentence weight were important. In Commentary, semantically shallow words, sentence weight, and the length of the next sentence were important.</Paragraph> <Paragraph position="10"> In short, we confirmed that the features effective for important sentence extraction differ with the genre.</Paragraph> </Section> </Section> </Paper>