<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1035">
  <Title>A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Evaluation Framework
</SectionTitle>
    <Paragraph position="0"> Our experiments involve classifying movie reviews as either positive or negative, an appealing task for several reasons. First, as mentioned in the introduction, providing polarity information about reviews is a useful service: witness the popularity of www.rottentomatoes.com. Second, movie reviews are apparently harder to classify than reviews of other products (Turney, 2002; Dave, Lawrence, and Pennock, 2003). Third, the correct label can be extracted automatically from rating information (e.g., number of stars). Our data4 contains 1000 positive and 1000 negative reviews all written before 2002, with a cap of 20 reviews per author (312 authors total) per category. We refer to this corpus as the polarity dataset.</Paragraph>
    <Paragraph position="1"> Default polarity classifiers We tested support vector machines (SVMs) and Naive Bayes (NB). Following Pang et al. (2002), we use unigram-presence features: the ith coordinate of a feature vector is 1 if the corresponding unigram occurs in the input text, 0 otherwise. (For SVMs, the feature vectors are length-normalized). Each default document-level polarity classifier is trained and tested on the extracts formed by applying one of the sentence-level subjectivity detectors to reviews in the polarity dataset.</Paragraph>
    <Paragraph position="2"> Subjectivity dataset To train our detectors, we need a collection of labeled sentences. Riloff and Wiebe (2003) state that &amp;quot;It is [very hard] to obtain collections of individual sentences that can be easily identified as subjective or objective&amp;quot;; the polarity-dataset sentences, for example, have not  been so annotated.5 Fortunately, we were able to mine the Web to create a large, automaticallylabeled sentence corpus6. To gather subjective sentences (or phrases), we collected 5000 movie-review snippets (e.g., &amp;quot;bold, imaginative, and impossible to resist&amp;quot;) from www.rottentomatoes.com. To obtain (mostly) objective data, we took 5000 sentences from plot summaries available from the Internet Movie Database (www.imdb.com). We only selected sentences or snippets at least ten words long and drawn from reviews or plot summaries of movies released post-2001, which prevents overlap with the polarity dataset.</Paragraph>
    <Paragraph position="3"> Subjectivity detectors As noted above, we can use our default polarity classifiers as &amp;quot;basic&amp;quot; sentence-level subjectivity detectors (after retraining on the subjectivity dataset) to produce extracts of the original reviews. We also create a family of cut-based subjectivity detectors; these take as input the set of sentences appearing in a single document and determine the subjectivity status of all the sentences simultaneously using per-item and pairwise relationship information. Specifically, for a given document, we use the construction in Section 2.2 to build a graph wherein the source s and sink t correspond to the class of subjective and objective sentences, respectively, and each internal node vi corresponds to the document's ith sentence si. We can set the individual scores ind1(si) to PrNBsub (si) and ind2(si) to 1 [?] PrNBsub (si), as shown in Figure 3, where PrNBsub (s) denotes Naive Bayes' estimate of the probability that sentence s is subjective; or, we can use the weights produced by the SVM classifier instead.7 If we set all the association scores to zero, then the minimum-cut classification of the sentences is the same as that of the basic subjectivity detector. Alternatively, we incorporate the degree of proximity between pairs of sentences, controlled by three parameters. The threshold T specifies the maximum distance two sentences can be separated by and still be considered proximal. The  (negative=objective) from the separating hyperplane, to non-negative numbers by</Paragraph>
    <Paragraph position="5"> and ind2(si) = 1 [?] ind1(si). Note that scaling is employed only for consistency; the algorithm itself does not require probabilities for individual scores.</Paragraph>
    <Paragraph position="6"> non-increasing function f(d) specifies how the influence of proximal sentences decays with respect to distance d; in our experiments, we tried f(d) = 1, e1[?]d, and 1/d2. The constant c controls the relative influence of the association scores: a larger c makes the minimum-cut algorithm more loath to put proximal sentences in different classes. With these in hand8, we set (for j &gt; i)</Paragraph>
    <Paragraph position="8"/>
  </Section>
  <Section position="5" start_page="0" end_page="83" type="metho">
    <SectionTitle>
4 Experimental Results
</SectionTitle>
    <Paragraph position="0"> Below, we report average accuracies computed by ten-fold cross-validation over the polarity dataset.</Paragraph>
    <Paragraph position="1"> Section 4.1 examines our basic subjectivity extraction algorithms, which are based on individualsentence predictions alone. Section 4.2 evaluates the more sophisticated form of subjectivity extraction that incorporates context information via the minimum-cut paradigm.</Paragraph>
    <Paragraph position="2"> As we will see, the use of subjectivity extracts can in the best case provide satisfying improvement in polarity classification, and otherwise can at least yield polarity-classification accuracies indistinguishable from employing the full review. At the same time, the extracts we create are both smaller on average than the original document and more effective as input to a default polarity classifier than the same-length counterparts produced by standard summarization tactics (e.g., first- or last-N sentences). We therefore conclude that subjectivity extraction produces effective summaries of document sentiment.</Paragraph>
    <Section position="1" start_page="0" end_page="83" type="sub_section">
      <SectionTitle>
4.1 Basic subjectivity extraction
</SectionTitle>
      <Paragraph position="0"> As noted in Section 3, both Naive Bayes and SVMs can be trained on our subjectivity dataset and then used as a basic subjectivity detector. The former has somewhat better average ten-fold cross-validation performance on the subjectivity dataset (92% vs.</Paragraph>
      <Paragraph position="1"> 90%), and so for space reasons, our initial discussions will focus on the results attained via NB sub- null (Full review); indeed, the difference is highly statistically significant (p &lt; 0.01, paired t-test). With SVMs as the polarity classifier instead, the Full review performance rises to 87.15%, but comparison via the paired t-test reveals that this is statistically indistinguishable from the 86.4% that is achieved by running the SVM polarity classifier on ExtractNB input. (More improvements to extraction performance are reported later in this section.) These findings indicate10 that the extracts preserve (and, in the NB polarity-classifier case, apparently clarify) the sentiment information in the originating documents, and thus are good summaries from the polarity-classification point of view. Further support comes from a &amp;quot;flipping&amp;quot; experiment: if we give as input to the default polarity classifier an extract consisting of the sentences labeled objective, accuracy drops dramatically to 71% for NB and 67% for SVMs. This confirms our hypothesis that sentences discarded by the subjectivity extraction process are indeed much less indicative of sentiment polarity.</Paragraph>
      <Paragraph position="2"> Moreover, the subjectivity extracts are much more compact than the original documents (an important feature for a summary to have): they contain on average only about 60% of the source reviews' words. (This word preservation rate is plotted along the x-axis in the graphs in Figure 5.) This prompts us to study how much reduction of the original documents subjectivity detectors can perform and still accurately represent the texts' sentiment information. null We can create subjectivity extracts of varying lengths by taking just the N most subjective sentences11 from the originating review. As one base10Recall that direct evidence is not available because the polarity dataset's sentences lack subjectivity labels.</Paragraph>
      <Paragraph position="3"> 11These are the N sentences assigned the highest probability by the basic NB detector, regardless of whether their probabilline to compare against, we take the canonical summarization standard of extracting the first N sentences -- in general settings, authors often begin documents with an overview. We also consider the last N sentences: in many documents, concluding material may be a good summary, and www.rottentomatoes.com tends to select &amp;quot;snippets&amp;quot; from the end of movie reviews (Beineke et al., 2004). Finally, as a sanity check, we include results from the N least subjective sentences according to Naive Bayes.</Paragraph>
      <Paragraph position="4"> Figure 4 shows the polarity classifier results as N ranges between 1 and 40. Our first observation is that the NB detector provides very good &amp;quot;bang for the buck&amp;quot;: with subjectivity extracts containing as few as 15 sentences, accuracy is quite close to what one gets if the entire review is used. In fact, for the NB polarity classifier, just using the 5 most subjective sentences is almost as informative as the Full review while containing on average only about 22% of the source reviews' words.</Paragraph>
      <Paragraph position="5"> Also, it so happens that at N = 30, performance is actually slightly better than (but statistically indistinguishable from) Full review even when the SVM default polarity classifier is used (87.2% vs.</Paragraph>
      <Paragraph position="6"> 87.15%).12 This suggests potentially effective extraction alternatives other than using a fixed probability threshold (which resulted in the lower accuracy of 86.4% reported above).</Paragraph>
      <Paragraph position="7"> Furthermore, we see in Figure 4 that the N mostsubjective-sentences method generally outperforms the other baseline summarization methods (which perhaps suggests that sentiment summarization cannot be treated the same as topic-based summarizaities exceed 50% and so would actually be classified as subjective by Naive Bayes. For reviews with fewer than N sentences, the entire review will be returned.</Paragraph>
      <Paragraph position="8"> 12Note that roughly half of the documents in the polarity dataset contain more than 30 sentences (average=32.3, standard deviation 15).</Paragraph>
      <Paragraph position="9">  Also indicated are results for some statistical significance tests. tion, although this conjecture would need to be verified on other domains and data). It's also interesting to observe how much better the last N sentences are than the first N sentences; this may reflect a (hardly surprising) tendency for movie-review authors to place plot descriptions at the beginning rather than the end of the text and conclude with overtly opinionated statements.</Paragraph>
    </Section>
    <Section position="2" start_page="83" end_page="83" type="sub_section">
      <SectionTitle>
4.2 Incorporating context information
</SectionTitle>
      <Paragraph position="0"> The previous section demonstrated the value of subjectivity detection. We now examine whether context information, particularly regarding sentence proximity, can further improve subjectivity extraction. As discussed in Section 2.2 and 3, contextual constraints are easily incorporated via the minimum-cut formalism but are not natural inputs for standard Naive Bayes and SVMs.</Paragraph>
      <Paragraph position="1"> Figure 5 shows the effect of adding in proximity information. ExtractNB+Prox and ExtractSVM+Prox are the graph-based subjectivity detectors using Naive Bayes and SVMs, respectively, for the individual scores; we depict the best performance achieved by a single setting of the three proximity-related edge-weight parameters over all ten data folds13 (parameter selection was not a focus of the current work). The two comparisons we are most interested in are ExtractNB+Prox versus ExtractNB and ExtractSVM+Prox versus ExtractSVM.</Paragraph>
      <Paragraph position="2"> We see that the context-aware graph-based subjectivity detectors tend to create extracts that are more informative (statistically significant so (paired t-test) for SVM subjectivity detectors only), although these extracts are longer than their contextblind counterparts. We note that the performance 13Parameters are chosen from T [?] {1,2,3}, f(d) [?] {1,e1[?]d,1/d2}, and c [?] [0,1] at intervals of 0.1. enhancements cannot be attributed entirely to the mere inclusion of more sentences regardless of whether they are subjective or not -- one counterargument is that Full review yielded substantially worse results for the NB default polarity classifier-and at any rate, the graph-derived extracts are still substantially more concise than the full texts.</Paragraph>
      <Paragraph position="3"> Now, while incorporating a bias for assigning nearby sentences to the same category into NB and SVM subjectivity detectors seems to require some non-obvious feature engineering, we also wish to investigate whether our graph-based paradigm makes better use of contextual constraints that can be (more or less) easily encoded into the input of standard classifiers. For illustrative purposes, we consider paragraph-boundary information, looking only at SVM subjectivity detection for simplicity's sake.</Paragraph>
      <Paragraph position="4"> It seems intuitively plausible that paragraph boundaries (an approximation to discourse boundaries) loosen coherence constraints between nearby sentences. To capture this notion for minimum-cut-based classification, we can simply reduce the association scores for all pairs of sentences that occur in different paragraphs by multiplying them by a cross-paragraph-boundary weight w [?] [0,1]. For standard classifiers, we can employ the trick of having the detector treat paragraphs, rather than sentences, as the basic unit to be labeled. This enables the standard classifier to utilize coherence between sentences in the same paragraph; on the other hand, it also (probably unavoidably) poses a hard constraint that all of a paragraph's sentences get the same label, which increases noise sensitivity.14 Our experiments reveal the graph-cut formulation to be the better approach: for both default polarity classifiers (NB and SVM), some choice of parameters (including w) for ExtractSVM+Prox yields statistically significant improvement over its paragraphunit non-graph counterpart (NB: 86.4% vs. 85.2%; SVM: 86.15% vs. 85.45%).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>