File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/w06-1639_metho.xml
Size: 20,921 bytes
Last Modified: 2025-10-06 14:10:46
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1639"> <Title>floor-debate transcripts</Title> <Section position="5" start_page="328" end_page="330" type="metho"> <SectionTitle> 3 Method </SectionTitle> <Paragraph position="0"> The support/oppose classification problem can be approached through the use of standard classifiers such as support vector machines (SVMs), which consider each text unit in isolation. As discussed in Section 1, however, the conversational nature of our data implies the existence of various relationships that can be exploited to improve cumulative classification accuracy for speech segments belonging to the same debate. Our classification framework, directly inspired by Blum and Chawla (2001), integrates both perspectives, optimizing its labeling of speech segments based on both individual speech-segment classification scores and preferences for groups of speech segments to receive the same label. In this section, we discuss the specific classification framework that we adopt and the set of mechanisms that we propose for modeling specific types of relationships.</Paragraph> <Section position="1" start_page="329" end_page="329" type="sub_section"> <SectionTitle> 3.1 Classification framework </SectionTitle> <Paragraph position="0"> Let s1,s2,...,sn be the sequence of speech segments within a given debate, and let Y and N stand for the &quot;yea&quot; and &quot;nay&quot; class, respectively. Assume we have a non-negative function ind(s,C) indicating the degree of preference that an individual-document classifier, such as an SVM, has for placing speech-segment s in class C. Also, assume that some pairs of speech segments have weighted links between them, where the non-negative strength (weight) str(lscript) for a link lscript indicates the degree to which it is preferable that the linked speech segments receive the same label. Then, any class assignment c = c(s1),c(s2),...,c(sn) can be assigned a cost</Paragraph> <Paragraph position="2"> str(lscript), where c(s) is the &quot;opposite&quot; class from c(s). A minimum-cost assignment thus represents an optimum way to classify the speech segments so that each one tends not to be put into the class that the individual-document classifier disprefers, but at the same time, highly associated speech segments tend not to be put in different classes.</Paragraph> <Paragraph position="3"> As has been previously observed and exploited in the NLP literature (Pang and Lee, 2004; Agarwal and Bhattacharyya, 2005; Barzilay and Lapata, 2005), the above optimization function, unlike many others that have been proposed for graph or set partitioning, can be solved exactly in an provably efficient manner via methods for finding minimum cuts in graphs. In our view, the contribution of our work is the examination of new types of relationships, not the method by which such relationships are incorporated into the classification decision.</Paragraph> </Section> <Section position="2" start_page="329" end_page="329" type="sub_section"> <SectionTitle> 3.2 Classifying speech segments in isolation </SectionTitle> <Paragraph position="0"> In our experiments, we employed the well-known classifier SVMlight to obtain individual-document classification scores, treating Y as the positive class and using plain unigrams as features.5 Following standard practice in sentiment analysis (Pang et al., 2002), the input to SVMlight consisted of normalized presence-of-feature (rather than frequency-of-feature) vectors. 
<Paragraph position="2"> In our view, the contribution of our work is the examination of new types of relationships, not the method by which such relationships are incorporated into the classification decision.</Paragraph> </Section>
<Section position="2" start_page="329" end_page="329" type="sub_section"> <SectionTitle> 3.2 Classifying speech segments in isolation </SectionTitle>
<Paragraph position="0"> In our experiments, we employed the well-known classifier SVMlight to obtain individual-document classification scores, treating Y as the positive class and using plain unigrams as features.5 Following standard practice in sentiment analysis (Pang et al., 2002), the input to SVMlight consisted of normalized presence-of-feature (rather than frequency-of-feature) vectors.</Paragraph>
<Paragraph> [Footnote 5] SVMlight is available at svmlight.joachims.org. Default parameters were used, although experimentation with different parameter settings is an important direction for future work (Daelemans and Hoste, 2002; Munson et al., 2005).</Paragraph>
<Paragraph position="1"> The ind value for each speech segment $s$ was based on the signed distance $d(s)$ from the vector representing $s$ to the trained SVM decision plane: $\mathrm{ind}(s, Y)$ is defined to be $1$ if $d(s) > 2\sigma_s$; $0$ if $d(s) \le -2\sigma_s$; and $\bigl(1 + d(s)/(2\sigma_s)\bigr)/2$ otherwise, where $\sigma_s$ is the standard deviation of $d(s)$ over all speech segments $s$ in the debate in question, and $\mathrm{ind}(s, N) \stackrel{\mathrm{def}}{=} 1 - \mathrm{ind}(s, Y)$.</Paragraph>
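<Paragraph> A minimal sketch of this clipped-linear mapping (our rendering in Python of the definition above; the function names are ours):

def ind_yea(d, sigma):
    """Map a signed SVM distance d to a [0, 1] preference for class Y;
    sigma is the per-debate standard deviation of the distances."""
    if d > 2 * sigma:
        return 1.0
    if -d > 2 * sigma:  # d is below -2 * sigma
        return 0.0
    return (1 + d / (2 * sigma)) / 2

def ind_nay(d, sigma):
    return 1.0 - ind_yea(d, sigma)
</Paragraph>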
<Paragraph position="2"> We now turn to the more interesting problem of representing the preferences that speech segments may have for being assigned to the same class.</Paragraph> </Section>
<Section position="3" start_page="329" end_page="330" type="sub_section"> <SectionTitle> 3.3 Relationships between speech segments </SectionTitle>
<Paragraph position="0"> A wide range of relationships between text segments can be modeled as positive-strength links. Here we discuss the two types of constraints considered in this work.</Paragraph>
<Paragraph position="1"> Same-speaker constraints: In Congressional debates and in general social-discourse contexts, a single speaker may make a number of comments regarding a topic. It is reasonable to expect that in many settings, the participants in a discussion may be convinced to change their opinions midway through a debate. Hence, in the general case we wish to be able to express &quot;soft&quot; preferences for all of an author's statements to receive the same label, where the strengths of such constraints could, for instance, vary according to the time elapsed between the statements. Weighted links are an appropriate means to express such variation.</Paragraph>
<Paragraph position="2"> However, if we assume that most speakers do not change their positions in the course of a discussion, we can conclude that all comments made by the same speaker must receive the same label. This assumption holds by fiat for the ground-truth labels in our dataset, because these labels were derived from the single vote cast by the speaker on the bill being discussed.6 We can implement this assumption via links whose weights are essentially infinite. Although one can also implement this assumption via concatenation of same-speaker speech segments (see Section 4.3), we view the fact that our graph-based framework incorporates both hard and soft constraints in a principled fashion as an advantage of our approach.</Paragraph>
<Paragraph> [Footnote 6] We are attempting to determine whether a speech segment represents support or not. This differs from the problem of determining what the speaker's actual opinion is, a problem that, as an anonymous reviewer put it, is complicated by &quot;grandstanding, backroom deals, or, more innocently, plain change of mind ('I voted for it before I voted against it')&quot;.</Paragraph>
<Paragraph position="3"> Different-speaker agreements: In House discourse, it is common for one speaker to make reference to another in the context of an agreement or disagreement over the topic of discussion. The systematic identification of instances of agreement can, as we have discussed, be a powerful tool for deriving intelligently selected weights for links between speech segments.</Paragraph>
<Paragraph position="4"> The problem of agreement identification can be decomposed into two sub-problems: identifying references and their targets, and deciding whether each reference represents an instance of agreement. In our case, the first task is straightforward because we focused solely on by-name references.7 Hence, we will now concentrate on the second, more interesting task.</Paragraph>
<Paragraph> [Footnote 7] Since we aim to represent relationships between speech segments, we ignore references for which the target of the reference did not speak in the debate in which the reference was made.</Paragraph>
<Paragraph position="5"> We approach the problem of classifying references by representing each reference with a word-presence vector derived from a window of text surrounding the reference.8 In the training set, we label each reference connecting two speakers as positive or negative depending on whether the two voted the same way on the bill under discussion.9 These labels are then used to train an SVM classifier, the output of which is subsequently used to create weights on agreement links in the test set, as follows.</Paragraph>
<Paragraph position="6"> Let $d(r)$ denote the distance from the vector representing reference $r$ to the agreement-detector SVM's decision plane, and let $\sigma_r$ be the standard deviation of $d(r)$ over all references in the debate in question. We then define the strength agr of the agreement link corresponding to the reference as $\mathrm{agr}(r) \stackrel{\mathrm{def}}{=} \alpha \cdot d(r)/(4\sigma_r)$ if $d(r) \ge \theta_{agr}$, and $0$ otherwise. The free parameter $\alpha$ specifies the relative importance of the agr scores. The threshold $\theta_{agr}$ controls the precision of the agreement links, in that values of $\theta_{agr}$ greater than zero mean that greater confidence is required before an agreement link can be added.10</Paragraph>
<Paragraph> [Footnote 10] Our implementation puts a link between just one arbitrary pair of speech segments among all those uttered by a given pair of apparently agreeing speakers. The &quot;infinite-weight&quot; same-speaker links propagate the agreement information to all other such pairs.</Paragraph>
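<Paragraph> A small sketch of this weighting (our reconstruction in Python; alpha and theta stand for the tuned constants $\alpha$ and $\theta_{agr}$, and the normalization follows the definition reconstructed above):

def agreement_weight(d, sigma, alpha, theta=0.0):
    """Convert an agreement-classifier margin d into an agreement-link
    strength; sigma is the per-debate standard deviation of the margins."""
    if theta > d:  # below the confidence threshold: add no link
        return 0.0
    return alpha * d / (4 * sigma)

Setting theta to the positive mean margin over a debate's references gives the high-precision variant examined in Section 4.1. </Paragraph>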
</Section> </Section>
<Section position="6" start_page="330" end_page="332" type="metho"> <SectionTitle> 4 Evaluation </SectionTitle>
<Paragraph position="0"> This section presents experiments testing the utility of using speech-segment relationships, evaluating against a number of baselines. All reported results use values for the free parameter $\alpha$ derived via tuning on the development set. In the tables, boldface indicates the development- and test-set results for the development-set-optimal parameter settings, since one would make algorithmic choices based on development-set performance.</Paragraph>
<Section position="1" start_page="330" end_page="331" type="sub_section"> <SectionTitle> 4.1 Preliminaries: Reference classification </SectionTitle>
<Paragraph position="0"> Recall that to gather inter-speaker agreement information, the strategy employed in this paper is to classify by-name references to other speakers as to whether they indicate agreement or not.</Paragraph>
<Paragraph position="1"> To train our agreement classifier, we experimented with undoing the deletion of amendment-related speech segments in the training set. Note that such speech segments were never included in the development or test set, since, as discussed in Section 2, their labels are probably noisy; however, including them in the training set allows the classifier to examine more instances, even though some of them are labeled incorrectly. As Table 2 shows, using more, if noisy, data yields better agreement-classification results on the development set, and so we use that policy in all subsequent experiments.11</Paragraph>
<Paragraph> [Footnote 11] Unfortunately, this policy leads to inferior test-set agreement classification. Section 4.5 contains further discussion.</Paragraph>
<Paragraph position="2"> An important observation is that precision may be more important than accuracy in deciding which agreement links to add: false positives with respect to agreement can cause speech segments to be incorrectly assigned the same label, whereas false negatives mean only that agreement-based information about other speech segments is not employed. As described above, we can raise agreement precision by increasing the threshold $\theta_{agr}$, which specifies the required confidence for the addition of an agreement link. Indeed, Table 3 shows that we can improve agreement precision by setting $\theta_{agr}$ to the (positive) mean agreement score $\mu$ assigned by the SVM agreement-classifier over all references in the given debate.12 However, this comes at the cost of greatly reducing agreement accuracy (development: 64.38%; test: 66.18%) due to lowered recall. Whether or not better speech-segment classification is ultimately achieved is discussed in the next sections.</Paragraph>
<Paragraph> [Footnote 12] We elected not to explicitly tune the value of $\theta_{agr}$ in order to minimize the number of free parameters to deal with.</Paragraph>
</Section>
<Section position="2" start_page="331" end_page="331" type="sub_section"> <SectionTitle> 4.2 Segment-based speech-segment classification </SectionTitle>
<Paragraph position="0"> Baselines: The first two data rows of Table 4 depict baseline performance results. The #(&quot;support&quot;) - #(&quot;oppos&quot;) baseline is meant to explore whether the speech-segment classification task can be reduced to simple lexical checks. Specifically, this method uses the signed difference between the number of words containing the stem &quot;support&quot; and the number of words containing the stem &quot;oppos&quot; (returning the majority class if the difference is 0). No better than 62.67% test-set accuracy is obtained by either baseline.</Paragraph>
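<Paragraph> A sketch of this lexical baseline (a hypothetical Python helper; the tokenization details are not specified in the paper):

def stem_baseline(words, majority_class):
    """words: tokens of a speech segment; returns "Y" or "N"."""
    diff = (sum(1 for w in words if "support" in w)
            - sum(1 for w in words if "oppos" in w))
    if diff == 0:
        return majority_class
    return "Y" if diff > 0 else "N"
</Paragraph>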
<Paragraph position="1"> Using relationship information: Applying an SVM to classify each speech segment in isolation leads to clear improvements over the two baseline methods, as demonstrated in Table 4. When we impose the constraint that all speech segments uttered by the same speaker receive the same label via &quot;same-speaker links&quot;, both test-set and development-set accuracy increase even more, in the latter case quite substantially so.</Paragraph>
<Paragraph position="2"> The last two lines of Table 4 show that the best results are obtained by incorporating agreement information as well. The highest test-set result, 71.16%, is obtained by using a high-precision threshold to determine which agreement links to add. While the development-set results would induce us to utilize the standard threshold value of 0, which is sub-optimal on the test set, the $\theta_{agr} = 0$ agreement-link policy still achieves a noticeable improvement over not using agreement links (test set: 70.81% vs. 67.21%).</Paragraph> </Section>
<Section position="3" start_page="331" end_page="332" type="sub_section"> <SectionTitle> 4.3 Speaker-based speech-segment classification </SectionTitle>
<Paragraph position="0"> We use speech segments as the unit of classification because they represent natural discourse units. As a consequence, we are able to exploit relationships at the speech-segment level. However, it is interesting to consider whether we really need relationships between the speech segments themselves, or whether it suffices to consider relationships between the speakers of the speech segments. In particular, as an alternative to using same-speaker links, we tried a speaker-based approach wherein we determine the initial individual-document classification score for each speech segment uttered by a person p in a given debate by running an SVM on the concatenation of all of p's speech segments within that debate. (We also ensure that agreement-link information is propagated from speech-segment pairs to speaker pairs.)</Paragraph>
<Paragraph> [Table 5] Speaker-based speech-segment classification accuracy, in percent. Here, the initial SVM is run on the concatenation of all of a given speaker's speech segments, but the results are computed over speech segments (not speakers), so that they can be compared to those in Table 4.</Paragraph>
<Paragraph position="1"> How does the use of same-speaker links compare to the concatenation of each speaker's speech segments? Tables 4 and 5 show that, not surprisingly, the SVM individual-document classifier works better on the concatenated speech segments than on the speech segments in isolation. However, the effect on overall classification accuracy is less clear: the development set favors same-speaker links over concatenation, while the test set does not. But we stress that the most important observation we can make from Table 5 is that, once again, the addition of agreement information leads to substantial improvements in accuracy.</Paragraph>
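<Paragraph> A sketch of this speaker-based variant (our illustration; svm_score stands in for the trained individual-document classifier and is hypothetical):

from collections import defaultdict

def speaker_based_scores(segments, svm_score):
    """segments: list of (speaker, text) pairs, in debate order.
    Returns one score per segment, copied from the speaker-level score."""
    by_speaker = defaultdict(list)
    for speaker, text in segments:
        by_speaker[speaker].append(text)
    speaker_score = {p: svm_score(" ".join(texts))
                     for p, texts in by_speaker.items()}
    return [speaker_score[speaker] for speaker, _ in segments]
</Paragraph>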
</Section>
<Section position="4" start_page="332" end_page="332" type="sub_section"> <SectionTitle> 4.4 &quot;Hard&quot; agreement constraints </SectionTitle>
<Paragraph position="0"> Recall that in our experiments, we created finite-weight agreement links, so that speech segments appearing in pairs flagged by our (imperfect) agreement detector can potentially receive different labels. We also experimented with forcing such speech segments to receive the same label, either through infinite-weight agreement links or through a speech-segment concatenation strategy similar to that described in the previous subsection. Both strategies resulted in clear degradation in performance on both the development and test sets, a finding that validates our encoding of agreement information as &quot;soft&quot; preferences.</Paragraph> </Section>
<Section position="5" start_page="332" end_page="332" type="sub_section"> <SectionTitle> 4.5 On the development/test set split </SectionTitle>
<Paragraph position="0"> We have seen several cases in which the method that performs best on the development set does not yield the best test-set performance. However, we felt that it would be illegitimate to change the train/development/test sets in a post hoc fashion, that is, after seeing the experimental results.</Paragraph>
<Paragraph position="1"> Moreover, and crucially, it is very clear that using agreement information, encoded as preferences within our graph-based approach rather than as hard constraints, yields substantial improvements on both the development and test sets; this, we believe, is our most important finding.</Paragraph> </Section> </Section>
<Section position="7" start_page="332" end_page="333" type="metho"> <SectionTitle> 5 Related work </SectionTitle>
<Paragraph position="0"> Politically-oriented text: Sentiment analysis has specifically been proposed as a key enabling technology in eRulemaking, allowing the automatic analysis of the opinions that people submit (Shulman et al., 2005; Cardie et al., 2006; Kwon et al., 2006). There has also been work focused upon determining the political leaning (e.g., &quot;liberal&quot; vs. &quot;conservative&quot;) of a document or author, where most previously proposed methods make no direct use of relationships between the documents to be classified (the &quot;unlabeled&quot; texts) (Laver et al., 2003; Efron, 2004; Mullen and Malouf, 2006). An exception is Grefenstette et al. (2004), who experimented with determining the political orientation of websites essentially by classifying the concatenation of all the documents found on each site.</Paragraph>
<Paragraph position="1"> Others have applied the NLP technologies of near-duplicate detection and topic-based text categorization to politically oriented text (Yang and Callan, 2005; Purpura and Hillard, 2006).</Paragraph>
<Paragraph position="2"> Detecting agreement: We used a simple method to learn to identify cross-speaker references indicating agreement. More sophisticated approaches have been proposed (Hillard et al., 2003), including an extension that, in an interesting reversal of our problem, makes use of sentiment-polarity indicators within speech segments (Galley et al., 2004). Also relevant is work on the general problems of dialog-act tagging (Stolcke et al., 2000), citation analysis (Lehnert et al., 1990), and computational rhetorical analysis (Marcu, 2000; Teufel and Moens, 2002).</Paragraph>
<Paragraph position="3"> We currently do not have an efficient means to encode disagreement information as hard constraints; we plan to investigate incorporating such information in future work.</Paragraph>
<Paragraph position="4"> Relationships between the unlabeled items: Carvalho and Cohen (2005) consider sequential relations between different types of emails (e.g., between requests and satisfactions thereof) to classify messages, and thus also explicitly exploit the structure of conversations.</Paragraph>
<Paragraph position="5"> Previous sentiment-analysis work in different domains has considered inter-document similarity (Agarwal and Bhattacharyya, 2005; Pang and Lee, 2005; Goldberg and Zhu, 2006) or explicit inter-document references in the form of hyperlinks (Agrawal et al., 2003).</Paragraph>
<Paragraph position="6"> Notable early papers on graph-based semi-supervised learning include Blum and Chawla (2001), Bansal et al. (2002), Kondor and Lafferty (2002), and Joachims (2003).
Zhu (2005) maintains a survey of this area.</Paragraph>
<Paragraph position="7"> Recently, several alternative, often quite sophisticated approaches to collective classification have been proposed (Neville and Jensen, 2000; Lafferty et al., 2001; Getoor et al., 2002; Taskar et al., 2002; Taskar et al., 2003; Taskar et al., 2004; McCallum and Wellner, 2004). It would be interesting to investigate the application of such methods to our problem. However, we also believe that our approach has important advantages, including conceptual simplicity and the fact that it is based on an underlying optimization problem that is provably, and in practice, easy to solve.</Paragraph> </Section>
</Paper>