<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1035"> <Title>A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Method </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Architecture </SectionTitle> <Paragraph position="0"> One can consider document-level polarity classification to be just a special (more difficult) case of text categorization with sentiment- rather than topic-based categories. Hence, standard machine-learning classification techniques, such as support vector machines (SVMs), can be applied to the entire documents themselves, as was done by Pang, Lee, and Vaithyanathan (2002). We refer to such classification techniques as default polarity classifiers. null However, as noted above, we may be able to improve polarity classification by removing objective sentences (such as plot summaries in a movie review). We therefore propose, as depicted in Figure 1, to first employ a subjectivity detector that determines whether each sentence is subjective or not: discarding the objective ones creates an extract that should better represent a review's subjective content to a default polarity classifier.</Paragraph> <Paragraph position="1"> To our knowledge, previous work has not integrated sentence-level subjectivity detection with document-level sentiment polarity. Yu and Hatzivassiloglou (2003) provide methods for sentence-level analysis and for determining whether a document is subjective or not, but do not combine these two types of algorithms or consider document polarity classification. The motivation behind the single-sentence selection method of Beineke et al. (2004) is to reveal a document's sentiment polarity, but they do not evaluate the polarity-classification accuracy that results.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Context and Subjectivity Detection </SectionTitle> <Paragraph position="0"> As with document-level polarity classification, we could perform subjectivity detection on individual sentences by applying a standard classification algorithm on each sentence in isolation. However, modeling proximity relationships between sentences would enable us to leverage coherence: text spans occurring near each other (within discourse boundaries) may share the same subjectivity status, other things being equal (Wiebe, 1994).</Paragraph> <Paragraph position="1"> We would therefore like to supply our algorithms with pair-wise interaction information, e.g., to specify that two particular sentences should ideally receive the same subjectivity label but not state which label this should be. Incorporating such information is somewhat unnatural for classifiers whose input consists simply of individual feature vectors, such as Naive Bayes or SVMs, precisely because such classifiers label each test item in isolation.</Paragraph> <Paragraph position="2"> One could define synthetic features or feature vectors to attempt to overcome this obstacle. However, we propose an alternative that avoids the need for such feature engineering: we use an efficient and intuitive graph-based formulation relying on finding minimum cuts. 
Our approach is inspired by Blum and Chawla (2001), although they focused on similarity between items (the motivation being to combine labeled and unlabeled data), whereas we are concerned with physical proximity between the items to be classified; indeed, in computer vision, modeling proximity information via graph cuts has led to very effective classification (Boykov, Veksler, and Zabih, 1999).</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Cut-based classification </SectionTitle> <Paragraph position="0"> Figure 2 shows a worked example of the concepts in this section.</Paragraph> <Paragraph position="1"> Suppose we have n items x1,...,xn to divide into two classes C1 and C2, and we have access to two types of information: * Individual scores indj(xi): non-negative estimates of each xi's preference for being in Cj based on just the features of xi alone; and * Association scores assoc(xi,xk): non-negative estimates of how important it is that xi and xk be in the same class.1 We would like to maximize each item's &quot;net happiness&quot;: its individual score for the class it is assigned to, minus its individual score for the other class. But, we also want to penalize putting tightly-associated items into different classes. Thus, after some algebra, we arrive at the following optimization problem: assign the xi's to C1 and C2 so as to minimize the partition cost</Paragraph> <Paragraph position="3"> cost(C1,C2) = sum_{x in C1} ind2(x) + sum_{x in C2} ind1(x) + sum_{xi in C1, xk in C2} assoc(xi,xk).</Paragraph> <Paragraph position="4"> The problem appears intractable, since there are 2^n possible binary partitions of the xi's. However, suppose we represent the situation in the following manner. Build an undirected graph G with vertices {v1,...,vn,s,t}; the last two are, respectively, the source and sink. Add n edges (s,vi), each with weight ind1(xi), and n edges (vi,t), each with weight ind2(xi). Finally, add (n choose 2) edges (vi,vk), each with weight assoc(xi,xk). Then, cuts in G are defined as follows: Definition 1 A cut (S,T) of G is a partition of its nodes into sets S = {s} ∪ S′ and T = {t} ∪ T′, where s ∉ S′ and t ∉ T′. Its cost cost(S,T) is the sum of the weights of all edges crossing from S to T. A minimum cut of G is one of minimum cost.</Paragraph> <Paragraph position="5"> [Figure 2: in this example the individual scores happen to be probabilities. Based on individual scores alone, we would put Y (&quot;yes&quot;) in C1, N (&quot;no&quot;) in C2, and be undecided about M (&quot;maybe&quot;). But the association scores favor cuts that put Y and M in the same class, as shown in the table. Thus, the minimum cut, indicated by the dashed line, places M together with Y in C1.] Observe that every cut corresponds to a partition of the items and has cost equal to the partition cost.</Paragraph> <Paragraph position="6"> Thus, our optimization problem reduces to finding minimum cuts.</Paragraph> <Paragraph position="7"> Practical advantages As we have noted, formulating our subjectivity-detection problem in terms of graphs allows us to model item-specific and pair-wise information independently. Note that this is a very flexible paradigm. For instance, it is perfectly legitimate to use knowledge-rich algorithms employing deep linguistic knowledge about sentiment indicators to derive the individual scores.</Paragraph> <Paragraph position="8"> And we could also simultaneously use knowledge-lean methods to assign the association scores.
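As a concrete rendering of the construction in Section 2.3, the sketch below builds the graph G (source s, sink t, ind-weighted terminal edges, assoc-weighted item-item edges) for three items and computes a minimum cut with networkx, whose minimum_cut routine runs a maximum-flow algorithm internally. The item names echo Figure 2, but the numeric scores are illustrative values invented for this sketch rather than the values shown in the paper's figure.

    import networkx as nx

    # Illustrative scores (hypothetical values, not those of Figure 2).
    ind1 = {"Y": 0.8, "N": 0.1, "M": 0.5}         # ind1(x): preference for class C1
    ind2 = {x: 1.0 - p for x, p in ind1.items()}  # ind2(x): preference for class C2
    assoc = {("Y", "M"): 1.0, ("M", "N"): 0.1}    # assoc(xi,xk): importance of co-membership

    G = nx.Graph()
    for x in ind1:
        G.add_edge("s", x, capacity=ind1[x])      # edge (s, vi) with weight ind1(xi)
        G.add_edge(x, "t", capacity=ind2[x])      # edge (vi, t) with weight ind2(xi)
    for (xi, xk), w in assoc.items():
        G.add_edge(xi, xk, capacity=w)            # edge (vi, vk) with weight assoc(xi,xk)

    # A minimum s-t cut; its capacity equals the partition cost defined above.
    cut_value, (S, T) = nx.minimum_cut(G, "s", "t")
    C1, C2 = S - {"s"}, T - {"t"}                 # source side = C1, sink side = C2
    print(cut_value, C1, C2)                      # ~0.9, {'Y', 'M'}, {'N'}

With these made-up numbers, the strong association between Y and M pulls the otherwise undecided M onto Y's side, mirroring the qualitative behavior described for Figure 2.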
Interestingly, Yu and Hatzivassiloglou (2003) compared an individual-preference classifier against a relationship-based method, but did not combine the two; the ability to coordinate such algorithms is precisely one of the strengths of our approach.</Paragraph> <Paragraph position="9"> But a crucial advantage specific to the utilization of a minimum-cut-based approach is that we can use maximum-flow algorithms with polynomial asymptotic running times -- and near-linear running times in practice -- to exactly compute the minimum-cost cut(s), despite the apparent intractability of the optimization problem (Cormen, Leiserson, and Rivest, 1990; Ahuja, Magnanti, and Orlin, 1993).2 In contrast, other graph-partitioning problems that have been previously used to formulate NLP classification problems3 are NP-complete (Hatzivassiloglou and McKeown, 1997; Agrawal et al., 2003; Joachims, 2003).</Paragraph> </Section> </Section> </Paper>
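As a small follow-on to the min-cut sketch above (it reuses the graph G and cut_value defined there), the check below makes the practical-advantage point explicit: the minimum cut is certified by a maximum flow, and networkx lets a specific polynomial-time flow routine be supplied. Again, this is an illustrative check, not code from the paper.

    # Continuation of the previous sketch: reuses nx, G, and cut_value from it.
    from networkx.algorithms.flow import shortest_augmenting_path

    flow_value, _ = nx.maximum_flow(G, "s", "t", flow_func=shortest_augmenting_path)
    assert abs(flow_value - cut_value) < 1e-9     # max-flow value equals min-cut cost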