<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1053">
  <Title>Extracting Important Sentences with Support Vector Machines</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Important Sentence Extraction based on Support Vector Machines
</SectionTitle>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Support Vector Machines (SVMs)
</SectionTitle>
      <Paragraph position="0"> SVM is a supervised learning algorithm for two-class problems.</Paragraph>
      <Paragraph position="2"> Each training example is a pair (x_i, y_i), where x_i is a feature vector and y_i is its class label, positive (+1) or negative (-1). SVM separates positive and negative examples by a hyperplane defined by w * x + b = 0,</Paragraph>
      <Paragraph position="4"> where &amp;quot;*&amp;quot; represents the inner product.</Paragraph>
      <Paragraph position="5"> In general, such a hyperplane is not unique.</Paragraph>
      <Paragraph position="6"> Figure 1 shows a linearly separable case. The SVM determines the optimal hyperplane by maximizing the margin, the distance between the hyperplane and the positive and negative examples nearest to it.</Paragraph>
      <Paragraph position="7"> Since training data is not necessarily linearly separable, slack variables (ξ_i >= 0) are introduced. They</Paragraph>
      <Paragraph position="9"> incur misclassification error, and should satisfy the following inequalities: y_i (w * x_i + b) >= 1 - ξ_i.</Paragraph>
      <Paragraph position="11"> Under these constraints, the following objective function is to be minimized: (1/2) ||w||^2 + C Σ_i ξ_i. (3)</Paragraph>
      <Paragraph position="13"> The first term in (3) corresponds to the size of the margin and the second term represents misclassification.</Paragraph>
      <Paragraph position="14"> By solving a quadratic programming problem, the decision function f(x) = sgn(g(x)) can be derived, where g(x) = Σ_i λ_i y_i (x_i * x) + b.</Paragraph>
      <Paragraph position="16"> The decision function depends only on the support vectors x_i; training examples other than the support vectors have no influence on it.</Paragraph>
      <Paragraph position="17"> Non-linear decision surfaces can be realized by replacing the inner product with a kernel function K(x_i, x).</Paragraph>
      <Paragraph position="19"> In this paper, we use polynomial kernel functions, which have been very effective when applied to other tasks, such as natural language processing (Joachims, 1998; Kudo and Matsumoto, 2001; Kudo and Matsumoto, 2000): K(x_i, x) = (x_i * x + 1)^d.</Paragraph>
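The classifier above can be sketched in a few lines of Python. This assumes the standard soft-margin form g(x) = Σ_i λ_i y_i K(x_i, x) + b with a polynomial kernel; the support vectors, multipliers, and bias in the usage example are hypothetical values, not taken from the paper.

```python
# Minimal sketch of an SVM decision function with a polynomial kernel.
# All numeric values used to exercise it are illustrative only.

def poly_kernel(x1, x2, d=2):
    """Polynomial kernel K(x1, x2) = (x1 * x2 + 1)^d, where * is the inner product."""
    dot = sum(a * b for a, b in zip(x1, x2))
    return (dot + 1) ** d

def g(x, support_vectors, labels, lambdas, bias, d=2):
    """g(x) = sum_i lambda_i * y_i * K(x_i, x) + b; only support vectors contribute."""
    return bias + sum(lam * y * poly_kernel(sv, x, d)
                      for sv, y, lam in zip(support_vectors, labels, lambdas))

def f(x, support_vectors, labels, lambdas, bias, d=2):
    """Decision function f(x) = sgn(g(x))."""
    return 1 if g(x, support_vectors, labels, lambdas, bias, d) >= 0 else -1

# Usage with two hypothetical support vectors:
svs = [[1.0, 0.0], [0.0, 1.0]]
print(f([1.0, 0.0], svs, [1, -1], [0.5, 0.5], 0.0))  # g = 1.5, so prints 1
```

Note that g(x), not just its sign, is available; Section 2.2 uses exactly this value for ranking.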
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Sentence Ranking by using Support
Vector Machines
</SectionTitle>
      <Paragraph position="0"> Important sentence extraction can be regarded as a two-class problem: important or unimportant. However, the proportion of important sentences in the training data will differ from that in the test data, because the number of important sentences in a document is determined by a summarization rate that is given at run-time. A simple solution to this problem is to rank the sentences in a document. We use g(x), the distance from the hyperplane to x, to rank the sentences.</Paragraph>
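The ranking step can be sketched as follows: score each sentence with g(x), sort, and keep the top fraction given by the summarization rate. The rounding rule for the number of sentences is an assumption; the paper only says the count is given at run-time.

```python
def select_important(sentences, scores, rate):
    """Rank sentences by their g(x) scores and keep the top fraction
    given by the summarization rate, preserving document order."""
    n = max(1, round(len(sentences) * rate))  # rounding rule is assumed
    ranked = sorted(zip(sentences, scores), key=lambda p: p[1], reverse=True)
    chosen = set(s for s, _ in ranked[:n])
    return [s for s in sentences if s in chosen]

# Usage: at a 50% rate, the two highest-scoring sentences survive.
print(select_important(["s1", "s2", "s3", "s4"], [0.2, 1.5, -0.3, 0.9], 0.5))
```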
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Features
</SectionTitle>
      <Paragraph position="0"> We define the boolean features discussed below, associated with sentence S_i, by taking past studies into account (Zechner, 1996; Nobata et al., 2001; Hirao et al., 2001; Nomoto and Matsumoto, 1997).</Paragraph>
      <Paragraph position="1"> We use 410 boolean variables for each S_i,</Paragraph>
      <Paragraph position="3"> where x = (x[1], ..., x[410]). A real-valued feature normalized between 0 and 1 is represented by 10 boolean variables. Each variable corresponds to an interval [i/10, (i+1)/10), where i = 0 to 9. For example, Posd = 0.75 is represented by &amp;quot;0000000100&amp;quot; because 0.75 belongs to [7/10, 8/10).</Paragraph>
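The discretization above can be sketched directly; the handling of the boundary value 1.0, which falls outside the stated intervals, is an assumption (it is mapped to the last interval).

```python
def discretize(value):
    """Map a real-valued feature in [0, 1] to 10 boolean variables,
    one per interval [i/10, (i+1)/10); 1.0 is assumed to fall in the last."""
    i = min(int(value * 10), 9)
    bits = [0] * 10
    bits[i] = 1
    return bits

# Usage: Posd = 0.75 lies in [7/10, 8/10), so bit 7 is set.
print("".join(str(b) for b in discretize(0.75)))  # prints 0000000100
```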
      <Paragraph position="4"> Position of sentences We define three feature functions for the position</Paragraph>
      <Paragraph position="6"> of S_i. First, Lead is a boolean feature that corresponds to the output of the lead-based method.</Paragraph>
      <Paragraph position="8"> The second function scores S_i's position in a paragraph. The first sentence obtains the highest score, the last obtains the lowest score:</Paragraph>
      <Paragraph position="10"> we assign 1 to the sentence. N was given for each document by the TSC committee.</Paragraph>
      <Paragraph position="12"> the number of characters before S_i in the paragraph. Length of sentences We define a feature function that addresses the length of a sentence as</Paragraph>
      <Paragraph position="14"> Weight of sentences We define a feature function that weights sentences based on frequency-based word weighting as</Paragraph>
      <Paragraph position="16"> Here, T is the number of sentences in a document.</Paragraph>
      <Paragraph position="18"> The first term of the equation above is the weighting of a word in a specific field. The second term is the occurrence probability of word t.</Paragraph>
      <Paragraph position="19"> We set parameters a and b to 0.8 and 0.2, respectively. The weight of the previous sentence</Paragraph>
      <Paragraph position="21"> and the weight of the next sentence are also used as features.</Paragraph>
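The word weighting described above can be sketched as an interpolation of a field-specific probability and a corpus-wide occurrence probability with a = 0.8 and b = 0.2. The exact equation was lost in extraction, so the functional form, the probability tables, and all names below are assumptions for illustration.

```python
def word_weight(t, field_prob, corpus_prob, a=0.8, b=0.2):
    """Assumed interpolation: a * field-specific probability of word t
    plus b * overall occurrence probability of t."""
    return a * field_prob.get(t, 0.0) + b * corpus_prob.get(t, 0.0)

def sentence_weight(words, field_prob, corpus_prob):
    """Weight of a sentence as the sum of its word weights (assumed form)."""
    return sum(word_weight(t, field_prob, corpus_prob) for t in words)

# Usage with toy probability tables:
field = {"budget": 0.5}
corpus = {"budget": 0.1, "the": 0.2}
print(round(sentence_weight(["budget", "the"], field, corpus), 6))
```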
      <Paragraph position="23"> We define the feature function Den(S_i), which</Paragraph>
      <Paragraph position="25"> represents the density of key words in a sentence. Part of speech x[r]=1 if and only if a certain part of speech, such as &amp;quot;Noun-jiritsu&amp;quot; or &amp;quot;Verb-jiritsu&amp;quot;, is used in the sentence. The number of parts of speech is 66.</Paragraph>
      <Paragraph position="26"> Semantical depth of nouns x[r]=1 (301[?]r[?]311) if and only if S</Paragraph>
      <Paragraph position="28"> a noun at a certain semantical depth according toaJapaneselexicon,Goi-Taikei(Ikeharaetal., 1997). The number of depth levels is 11. For instance, Semdep=2 means that a noun in S</Paragraph>
      <Paragraph position="30"> belongs to the second depth level.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Document genre
</SectionTitle>
      <Paragraph position="0"> x[r]=1 (312 ≤ r ≤ 315) if and only if the document belongs to a certain genre. The genre is explicitly written in the header of each document. The number of genres is four: General, National, Editorial, and Commentary.</Paragraph>
      <Paragraph position="2"> In addition, x[r]=1 if and only if S_i includes an assertive expression.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="2" type="metho">
    <SectionTitle>
3 Experimental settings
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Corpus
</SectionTitle>
      <Paragraph position="0"> We used the data set of the TSC (Fukushima and Okumura, 2001) summarization collection for our evaluation. TSC was established as a sub-task of NTCIR-2 (NII-NACSIS Test Collection for IR Systems). The corpus consists of 180 Japanese documents from the Mainichi Newspapers of 1994, 1995, and 1998. In each document, important sentences were manually extracted at summarization rates of 10%, 30%, and 50%. Note that the summarization rates depend on the number of sentences in a document, not the number of characters. Table 1 shows the statistics.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.2 Evaluated methods
</SectionTitle>
      <Paragraph position="0"> We compared four methods: decision tree learning, boosting, lead, and SVM. Each method was trained and evaluated at each summarization rate.</Paragraph>
    </Section>
    <Section position="4" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.3 Measures for evaluation
</SectionTitle>
      <Paragraph position="0"> In the TSC corpus, the number of sentences to be extracted was explicitly given by the TSC committee. When we extract sentences according to that number, Precision, Recall, and F-measure take the same value, which we call Accuracy. Accuracy is defined as Accuracy = b/a x 100, where a is the specified number of important sentences and b is the number of truly important sentences contained in the system's output.</Paragraph>
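The measure above can be computed directly from the gold and system sentence sets; the sentence identifiers in the usage example are illustrative.

```python
def accuracy(gold, system_output):
    """Accuracy = b/a x 100: a is the specified number of important
    sentences (the sizes of gold and system output match by construction),
    b is the number of gold sentences in the system output."""
    b = len(set(gold).intersection(system_output))
    return 100.0 * b / len(gold)

# Usage: 2 of the 4 extracted sentences are truly important.
print(accuracy(["s1", "s2", "s3", "s4"], ["s2", "s4", "s5", "s6"]))  # prints 50.0
```

Because both sets have the same size a, this single number equals Precision, Recall, and F-measure, as the text notes.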
    </Section>
  </Section>
  <Section position="5" start_page="2" end_page="3" type="metho">
    <SectionTitle>
4 Results
</SectionTitle>
    <Paragraph position="0"> Table 2 shows the results of five-fold cross validation by using all 180 documents.</Paragraph>
      <Paragraph position="1"> For all summarization rates and all genres, SVM achieved the highest accuracy and the lead-based method the lowest. Let the null hypothesis be &amp;quot;There are no differences among the scores of the four methods.&amp;quot; We tested this null hypothesis at a significance level of 1% by using Tukey's method. At the 10% summarization rate, SVM performed best but the differences were not statistically significant. At the 30% and 50% rates, SVM performed better than the other methods with statistical significance.</Paragraph>
  </Section>
  <Section position="6" start_page="3" end_page="4" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> Table 2 shows that Editorial and Commentary are more difficult than the other genres. We can consider two reasons for the poor scores of Editorial and Commentary.</Paragraph>
    <Section position="1" start_page="3" end_page="4" type="sub_section">
      <SectionTitle>
Editorial and Commentary:
</SectionTitle>
      <Paragraph position="1"> Table 3 implies that non-standard features are useful in Editorial and Commentary documents.</Paragraph>
      <Paragraph position="2"> Now, we examine the effective features in each genre. Since we used the second-order polynomial kernel, we can expand g(x) as follows:</Paragraph>
      <Paragraph position="4"> where ℓ is the number of support vectors, and</Paragraph>
      <Paragraph position="6"> We can rewrite it as follows when all vectors are binary. If |w[h,k]| was large, the feature or the feature pair (h,k) had a strong influence on the optimal hyperplane.</Paragraph>
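The pair weights can be sketched as follows, assuming the standard expansion of the second-order polynomial kernel (x_i * x + 1)^2: for binary vectors, each unordered feature pair (h, k) receives a term 2 * x_i[h] * x_i[k] from each support vector, so w[h,k] = Σ_i λ_i y_i 2 x_i[h] x_i[k]. The function and the toy support vectors are illustrative, not from the paper.

```python
from itertools import combinations

def pair_weights(support_vectors, labels, lambdas):
    """Collect the weight w[h,k] of each feature pair from the expanded
    second-order polynomial kernel, assuming binary feature vectors."""
    dim = len(support_vectors[0])
    w = {}
    for h, k in combinations(range(dim), 2):
        w[(h, k)] = sum(lam * y * 2.0 * sv[h] * sv[k]
                        for sv, y, lam in zip(support_vectors, labels, lambdas))
    return w

# Usage with two toy binary support vectors:
print(pair_weights([[1, 1, 0], [1, 0, 1]], [1, -1], [1.0, 1.0]))
```

Ranking features and pairs by |w[h,k]| then reveals which (combinations of) features drive the hyperplane, which is how the genre analysis below reads Table 3.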
      <Paragraph position="7"> Effective features common to three genres at three rates were sentence positions. Since National has a typical newspaper style, the beginning of the document was important. Moreover, &amp;quot;ga&amp;quot; and &amp;quot;ta&amp;quot; were important. These functional words are used when a new event is introduced.</Paragraph>
      <Paragraph position="8"> In Editorial and Commentary, the end of a paragraph and that of a document were important. The reason for this result is that subtopic or main-topic conclusions are common in those positions. This implies that National has a different text structure from Editorial and Commentary. Moreover, in Editorial, &amp;quot;de&amp;quot; and sentence weight were important. In Commentary, semantically shallow words, sentence weight, and the length of the next sentence were important.</Paragraph>
      <Paragraph position="9"> In short, we confirmed that the effective features differ from genre to genre.</Paragraph>
    </Section>
  </Section>
</Paper>