<?xml version="1.0" standalone="yes"?> <Paper uid="J04-3002"> <Title>Learning Subjective Language</Title> <Section position="4" start_page="294" end_page="301" type="intro"> <SectionTitle> 4. Features Used in Concert </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="294" end_page="294" type="sub_section"> <SectionTitle> 4.1 Introduction </SectionTitle> <Paragraph position="0"> In this section, we examine the various types of clues used together. In preparation for this work, all instances in OP1 and OP2 of all of the PSEs identified as described in Section 3 have been automatically identified. All training to define the PSE instances in OP1 was performed on data separate from OP1, and all training to define the PSE instances in OP2 was performed on data separate from OP2.</Paragraph> </Section> <Section position="2" start_page="294" end_page="295" type="sub_section"> <SectionTitle> 4.2 Consistency in Precision among Data Sets </SectionTitle> <Paragraph position="0"> Table 9 summarizes the results from previous sections in which the opinion piece data are used for testing. The performance of the various features is consistently good or bad on the same data sets: the performance is better for all features on W9-10 and W9-04 than on W9-22 and W9-33 (except for the ugen-4-grams, which occur with very low frequency, and the verbs, which have low frequency in W9-10). This is so despite the fact that the features were generated using different procedures and data: The adjectives and verbs were generated from WSJ document-level opinion piece classifications; the n-gram features were generated from newsgroup and WSJ expression-level subjective-element classifications; and the unique unigram feature requires no training.</Paragraph> <Paragraph position="1"> This consistency in performance suggests that the results are not brittle.</Paragraph> <Paragraph position="2"> Figure 3. Algorithm for calculating density in subjective-element data:
0. PSEs = all adjs, verbs, modals, nouns, and adverbs that appear at least once in an SE (except not, will, be, have).
1. PSEinsts = the set of all instances of PSEs
2. HiDensity = {}
3. For P in PSEinsts:
4.     leftWin(P) = the W words before P
5.     rightWin(P) = the W words after P
6.     density(P) = number of SEs whose first or last word is in leftWin(P) or rightWin(P)
7.     if density(P) >= T: add P to HiDensity
8. compute |PSEinsts| and prec(PSEinsts)
9. compute |HiDensity| and prec(HiDensity)</Paragraph> </Section> <Section position="3" start_page="295" end_page="296" type="sub_section"> <SectionTitle> 4.3 Choosing Density Parameters from Subjective-Element Data </SectionTitle> <Paragraph position="0"> In Wiebe (1994), whether a PSE is interpreted to be subjective depends, in part, on how subjective the surrounding context is. We explore this idea in the current work, assessing whether PSEs are more likely to be subjective if they are surrounded by subjective elements.
In particular, we experiment with a density feature to decide whether or not a PSE instance is subjective: If a sufficient number of subjective elements are nearby, then the PSE instance is considered to be subjective; otherwise, it is discarded.</Paragraph> <Paragraph position="1"> The density parameters are a window size W and a frequency threshold T.</Paragraph> <Paragraph position="2"> In this section, we explore the density of manually annotated PSEs in subjective-element data and choose density parameters to use in Section 4.4, in which we apply them to automatically identified PSEs in opinion piece data.</Paragraph> <Paragraph position="3"> The process for calculating density in the subjective-element data is given in Figure 3. The PSEs are defined to be all adjectives, verbs, modals, nouns, and adverbs that appear at least once in a subjective element, with the exception of some stop words (line 0 of Figure 3). Note that these PSEs depend only on the subjective-element manual annotations, not on the automatically identified features used elsewhere in the article or on the document-level opinion piece classes. PSEinsts is the set of PSE instances to be disambiguated (line 1). HiDensity (initialized on line 2) will be the subset of PSEinsts that are retained. In the loop, the density of each PSE instance P is calculated. This is the number of subjective elements that begin or end in the W words preceding or following P (line 6). P is retained if its density is at least T (line 7).</Paragraph> <Paragraph position="4"> Lines 8-9 of the algorithm assess the precision of the original (PSEinsts) and new (HiDensity) sets of PSE instances. If prec(HiDensity) is greater than prec(PSEinsts), then there is evidence that the number of subjective elements near a PSE instance is related to its subjectivity in context.</Paragraph> <Paragraph position="5"> To create more data points for this analysis, WSJ-SE was split into two (WSJ-SE1 and WSJ-SE2), and the annotations of the two judges were considered separately. WSJ-SE2-D, for example, refers to D's annotations of WSJ-SE2. The process in Figure 3 was repeated for different parameter settings (T in [1, 2, 4, ..., 48] and W in [1, 10, 20, ..., 490]) on each of the SE data sets. To find good parameter settings, the results for each data set were sorted into five-point precision intervals and then sorted by frequency within each interval. Information for the top three precision intervals for each data set is shown in Table 10: specifically, the parameter values (i.e., T and W) and the frequency and precision of the most frequent result in each interval. The intervals are in the rows labeled Range. For example, the top three precision intervals for WSJ-SE1-M are 0.87-0.92, 0.82-0.87, and 0.77-0.82 (no parameter values yield precision higher than 0.92). The top of Table 10 gives baseline frequencies and precisions, which are |PSEinsts| and prec(PSEinsts), respectively, in line 8 of Figure 3.</Paragraph> <Paragraph position="6"> The parameter values exhibit a range of frequencies and precisions, with the expected trade-off between precision and frequency. We choose the following parameters to test in Section 4.4: For each data set, for each precision interval whose lower bound is at least 10 percentage points higher than the baseline for that data set, the top two (T, W) pairs yielding the highest frequencies in that interval are chosen. Among the five data sets, a total of 45 parameter pairs were so selected. This exercise was completed once, without experimenting with different parameter settings.</Paragraph>
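<Paragraph> As a concrete illustration of the procedure in Figure 3 and of this parameter search, the following is a minimal Python sketch. The data representation (token positions for PSE instances, (start, end) token spans for subjective elements) and the definition of prec() as the fraction of instances that fall inside a subjective element are our own assumptions for illustration, not the original implementation.

    def figure3_density(pse_positions, se_spans, W, T):
        """Sketch of Figure 3. pse_positions: token indices of PSE instances;
        se_spans: (start, end) token indices of manually annotated subjective
        elements. Returns HiDensity plus baseline and high-density precision."""
        def prec(instances):
            # Assumed precision: fraction of instances inside some subjective element.
            inside = sum(1 for p in instances
                         if any(s <= p <= e for s, e in se_spans))
            return inside / max(len(instances), 1)

        hi_density = []
        for p in pse_positions:                                           # line 3
            window = set(range(p - W, p)) | set(range(p + 1, p + W + 1))  # lines 4-5
            density = sum(1 for s, e in se_spans
                          if s in window or e in window)                  # line 6
            if density >= T:                                              # line 7
                hi_density.append(p)

        return hi_density, prec(pse_positions), prec(hi_density)          # lines 8-9

The parameter search described above then amounts to running this function for T in [1, 2, 4, ..., 48] and W in [1, 10, 20, ..., 490], grouping the results into five-point precision intervals, and keeping, for each interval at least 10 points above baseline, the two (T, W) pairs with the highest |HiDensity|.</Paragraph>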
</Section> <Section position="4" start_page="296" end_page="298" type="sub_section"> <SectionTitle> 4.4 Density for Disambiguation </SectionTitle> <Paragraph position="0"> In this section, density is exploited to find subjective instances of automatically identified PSEs. The process is shown in Figure 4. There are only two differences between the algorithms in Figures 3 and 4. First, in Figure 3, density is defined in terms of the number of subjective elements nearby. However, subjective-element annotations are not available in test data. Thus in Figure 4, density is defined in terms of the number of other PSE instances nearby, where PSEinsts consists of all instances of the automatically identified PSEs described in Section 3, for which results are given in Table 9. Second, in Figure 4, we assess precision with respect to the document-level classes (lines 7-8). The test data are OP1.</Paragraph> <Paragraph position="1"> Figure 4. Algorithm for calculating density in opinion piece (OP) data:
0. PSEinsts = the set of instances in the test data of all PSEs described in Section 3
1. HiDensity = {}
2. For P in PSEinsts:
3.     leftWin(P) = the W words before P
4.     rightWin(P) = the W words after P
5.     density(P) = number of PSEinsts whose first or last word is in leftWin(P) or rightWin(P)
6.     if density(P) >= T: add P to HiDensity
7. compute |PSEinsts| and prec(PSEinsts)
8. compute |HiDensity| and prec(HiDensity)</Paragraph> <Paragraph position="3"> An interesting question arose when we were defining the PSE instances: What should be done with words that are identified to be PSEs (or parts of PSEs) according to multiple criteria? For example, sunny, radiant, and exhilarating are all unique in corpus OP1, and are all members of the adjective PSE feature defined for testing on OP1. Collocations add additional complexity. For example, consider the sequence &quot;and splendidly&quot;, which appears in the test data. The sequence &quot;and splendidly&quot; matches the ugen-2-gram (and-conj U-adj), and the word splendidly is unique. In addition, a sequence may match more than one n-gram feature. For example, &quot;is it that&quot; matches three fixed-n-gram features: &quot;is it&quot;, &quot;is it that&quot;, and &quot;it that&quot;.</Paragraph> <Paragraph position="4"> In the current experiments, the more PSEs a word matches, the more weight it is given. The hypothesis behind this treatment is that additional matches represent additional evidence that a PSE instance is subjective. This hypothesis is realized as follows: Each match of each member of each type of PSE is considered to be a PSE instance. Thus, among them, there are 11 members in PSEinsts for the five phrases sunny, radiant, exhilarating, &quot;and splendidly&quot;, and &quot;is it that&quot;, one for each of the matches mentioned above.</Paragraph> <Paragraph position="5"> The process in Figure 4 was conducted with the 45 parameter pair values (T and W) chosen from the subjective-element data as described in Section 4.3. Table 11 shows results for a subset of the 45 parameters, namely, the most frequent parameter pair chosen from the top three precision intervals for each training set. The bottom of the table gives a baseline frequency and a baseline precision in OP1, defined as |PSEinsts| and prec(PSEinsts), respectively, in line 7 of Figure 4.</Paragraph>
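<Paragraph> To make the weighting scheme and the evaluation of Figure 4 concrete, here is a minimal Python sketch. The matcher functions (stand-ins for the automatically identified PSE features of Section 3), the mapping from token positions to document-level classes, and the treatment of precision as the fraction of instances falling in opinion pieces are our own illustrative assumptions, not the original implementation.

    def collect_pse_instances(tokens, matchers):
        """Each match of each member of each type of PSE is one instance, so a
        token that satisfies several criteria contributes several instances."""
        instances = []                      # token positions, possibly repeated
        for matcher in matchers:            # hypothetical stand-ins for the Section 3 features
            instances.extend(matcher(tokens))
        return instances

    def figure4_density(instances, in_opinion_piece, W, T):
        """Sketch of Figure 4: density is the number of other PSE instances
        within W words, and precision is measured against the document-level
        opinion piece classes (document boundaries are ignored for brevity)."""
        def prec(insts):
            return sum(1 for p in insts if in_opinion_piece(p)) / max(len(insts), 1)

        hi_density = []
        for i, p in enumerate(instances):
            density = sum(1 for j, q in enumerate(instances)
                          if j != i and abs(q - p) <= W)
            if density >= T:
                hi_density.append(p)

        return prec(instances), prec(hi_density)

Run over the 45 (T, W) pairs chosen in Section 4.3, prec(instances) corresponds to the baseline precision and prec(hi_density) to the high-density precision reported for OP1.</Paragraph>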
<Paragraph position="6"> The density features result in substantial increases in precision. Of the 45 parameter pairs, the minimum percentage increase over baseline is 22%. Fully 24% of the 45 parameter pairs yield increases of 200% or more; 38% yield increases between 100% and 199%, and 38% yield increases between 22% and 99%. In addition, the increases are significant. Using the set of high-density PSEs defined by the parameter pair with the least increase over baseline, we tested the difference between the proportion of PSEs in opinion pieces that are high-density and the proportion of PSEs in nonopinion pieces that are high-density. The difference between these two proportions is highly significant (z = 46.2, p < 0.0001).</Paragraph> <Paragraph position="9"> Notice that, except for one blip (T, W = 6, 10 under WSJ-SE-M), the precisions decrease and the frequencies increase as we go down each column in Table 11. The same pattern can be observed with all 45 parameter pairs (results not included here because of space considerations). But the parameter pairs are ordered in Table 11 based on performance in the manually annotated subjective-element data, not based on performance in the test data. For example, the entry in the first row, first column (T, W = 10, 20) is the parameter pair giving the highest frequency in the top precision interval of WSJ-SE-M (frequency and precision in WSJ-SE-M, using the process of Figure 3). Thus, the relative precisions and frequencies of the parameter pairs are carried over from the training to the test data. This is quite a strong result, given that the PSEs in the training data are from manual annotations, while the PSEs in the test data are our automatically identified features.</Paragraph>
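<Paragraph> The significance figure reported earlier in this section appears to come from the standard two-sample test for a difference between proportions; a minimal sketch of that computation follows, with the counts left as arguments since the raw frequencies behind z = 46.2 are not reproduced here.

    from math import sqrt

    def two_proportion_z(x1, n1, x2, n2):
        """z statistic for the difference between two proportions, e.g., x1 of n1
        PSE instances in opinion pieces are high-density versus x2 of n2 PSE
        instances in nonopinion pieces (pooled-variance form)."""
        p1, p2 = x1 / n1, x2 / n2
        pooled = (x1 + x2) / (n1 + n2)
        return (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))</Paragraph>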
</Section> <Section position="5" start_page="298" end_page="300" type="sub_section"> <SectionTitle> 4.5 High-Density Sentence Annotations </SectionTitle> <Paragraph position="0"> To assess the subjectivity of sentences with high-density PSEs, we extracted the 133 sentences in corpus OP2 that contain at least one high-density PSE and manually annotated them. We refer to these sentences as the system-identified sentences.</Paragraph> <Paragraph position="1"> We chose the density-parameter pair (T, W = 12, 30), based on its precision and frequency in OP1. This parameter setting yields results that have relatively high precision and low frequency. We chose a low-frequency setting to make the annotation study feasible.</Paragraph> <Paragraph position="2"> The extracted sentences were independently annotated by two judges. One is a coauthor of this article (judge 1), and the other has performed subjectivity annotation before, but is not otherwise involved in this research (judge 2). Sentences were annotated according to the coding instructions of Wiebe, Bruce, and O'Hara (1999), which, recall, are to classify a sentence as subjective if there is a significant expression of subjectivity, of either the writer or someone mentioned in the text, in the sentence.</Paragraph> <Paragraph position="3"> Table 12. Examples of system-identified sentences (oo = classified as objective by both judges; ss = classified as subjective by both judges):
(1) The outburst of shooting came nearly two weeks after clashes between Moslem worshippers and Somali soldiers. [oo]
(2.a) But now the refugees are streaming across the border and alarming the world. [ss]
(2.b) In the middle of the crisis, Erich Honecker was hospitalized with a gall stone operation. [oo]
(2.c) It is becoming more and more obvious that his gallstone-age communism is dying with him: ... [ss]
(3.a) Not brilliantly, because, after all, this was a performer who was collecting paychecks from lounges at Hiltons and Holiday Inns, but creditably and with the air of someone for whom &quot;Ten Cents a Dance&quot; was more than a bit autobiographical. [ss]
(3.b) &quot;It was an exercise of blending Michelle's singing with Susie's singing,&quot; explained Ms. Stevens. [oo]
(4) Enlisted men and lower-grade officers were meat thrown into a grinder. [ss]
(5) &quot;If you believe in God and you believe in miracles, there's nothing particularly crazy about that.&quot; [ss]
(6) He was much too eager to create &quot;something very weird and dynamic,&quot; &quot;catastrophic and jolly&quot; like &quot;this great and coily thing&quot; &quot;Lolita.&quot; [ss]
(7) The Bush approach of mixing confrontation with conciliation strikes some people as sensible, perhaps even inevitable, because Mr. Bush faces a Congress firmly in the hands of the opposition. [ss]
(8) Still, despite their efforts to convince the world that we are indeed alone, the visitors do seem to keep coming and, like the recent sightings, there's often a detail or two that suggests they may actually be a little on the dumb side. [ss]
(9) As for the women, they're pathetic. [ss]
(10) At this point, the truce between feminism and sensationalism gets mighty uneasy. [ss]
(11) MMPI's publishers say the test shouldn't be used alone to diagnose psychological problems or in hiring; it should be given in conjunction with other tests. [ss]
(12) While recognizing that professional environmentalists may feel threatened, I intend to urge that UV-B be monitored whenever I can. [ss]</Paragraph> <Paragraph position="8"> In addition to the subjective and objective classes, a judge can tag a sentence as unsure if he or she is unsure of his or her rating or considers the sentence to be borderline. An equal number (133) of other sentences were randomly selected from the corpus to serve as controls. The 133 system-identified sentences and the 133 control sentences were randomly mixed together. The judges were asked to annotate all 266 sentences, not knowing which were system-identified and which were control. Each sentence was presented with the sentence that precedes it and the sentence that follows it in the corpus, to provide some context for interpretation.</Paragraph> <Paragraph position="9"> Table 12 shows examples of the system-identified sentences. Sentences classified by both judges as objective are marked oo and those classified by both judges as subjective are marked ss.</Paragraph> <Paragraph position="10"> Table 14. Examples of subjective sentences adjacent to system-identified sentences:</Paragraph> <Paragraph position="11"> Bathed in cold sweat, I watched these Dantesque scenes, holding tightly the damp hand of Edek or Waldeck who, like me, were convinced that there was no God.</Paragraph> <Paragraph position="12"> &quot;The Japanese are amazed that a company like this exists in Japan,&quot; says Kimindo Kusaka, head of the Softnomics Center, a Japanese management-research organization.</Paragraph> <Paragraph position="13"> And even if drugs were legal, what evidence do you have that the habitual drug user wouldn't continue to rob and steal to get money for clothes, food or shelter?
The moral cost of legalizing drugs is great, but it is a cost that apparently lies outside the narrow scope of libertarian policy prescriptions.</Paragraph> <Paragraph position="14"> I doubt that one exists.</Paragraph> <Paragraph position="15"> They were upset at his committee's attempt to pacify the program critics by cutting the surtax paid by the more affluent elderly and making up the loss by shifting more of the burden to the elderly poor and by delaying some benefits by a year.</Paragraph> <Paragraph> Judge 1 classified 103 of the system-identified sentences as subjective, 16 as objective, and 14 as unsure. Judge 2 classified 102 of the system-identified sentences as subjective, 27 as objective, and 4 as unsure. In contrast, Judge 1 classified only 53 (45%) of the control sentences as subjective, and Judge 2 classified only 47 (36%) of them as subjective. The contingency table is given in Table 13. The kappa value using all three classes is 0.60, reflecting the highly skewed distribution in favor of subjective sentences and the disagreement on the lower-frequency classes (unsure and objective). Consistent with the findings in Wiebe, Bruce, and O'Hara (1999), the kappa value for agreement on the sentences for which neither judge is unsure is very high: 0.86.</Paragraph> <Paragraph position="16"> A different breakdown of the sentences is illuminating. For 98 of the sentences (call them SS), judges 1 and 2 both tag the sentence as subjective. Among the other sentences, 20 appear in a block of contiguous system-identified sentences that includes a member of SS. For example, in Table 12, (2.a) and (2.c) are in SS and (2.b) is in the same block of subjective sentences as they are. Similarly, (3.a) is in SS and (3.b) is in the same block. Among the remaining 15 sentences, 6 are adjacent to subjective sentences that were not identified by our system (and so were not annotated by the judges). All of those sentences contain significant expressions of subjectivity of the writer or someone mentioned in the text, the criterion used in this work for classifying a sentence as subjective. Samples are shown in Table 14.</Paragraph> <Paragraph position="17"> Thus, 93% of the sentences identified by the system are subjective or are near subjective sentences. All the sentences, together with their tags and the sentences adjacent to them, are available on the Web at www.cs.pitt.edu/~wiebe.</Paragraph> </Section> <Section position="6" start_page="300" end_page="301" type="sub_section"> <SectionTitle> 4.6 Using Features for Opinion Piece Recognition </SectionTitle> <Paragraph position="0"> In this section, we assess the usefulness of the PSEs identified in Section 3 and listed in Table 9 by using them to perform document-level classification of opinion pieces.</Paragraph> <Paragraph position="1"> Opinion-piece classification is a difficult task for two reasons. First, as discussed in Section 2.1, both opinionated and factual documents tend to be composed of a mixture of subjective and objective language. Second, the natural distribution of documents in our data is heavily skewed toward nonopinion pieces. Despite these hurdles, using only our PSEs, we achieve positive results in opinion-piece classification using the basic k-nearest-neighbor (KNN) algorithm with leave-one-out cross-validation (Mitchell 1997).</Paragraph>
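<Paragraph> To make this setup concrete, the following is a minimal Python sketch of the KNN classifier with leave-one-out cross-validation described in the paragraphs below. The function names and the list-based representation are our own; as in the text, each document is reduced to a single feature, its PSE-instance count normalized by document length.

    from collections import Counter

    def knn_classify(target_count, train_counts, train_labels, k):
        """Classify one document by the majority label of its k nearest neighbors,
        where distance is the absolute difference between normalized PSE counts."""
        neighbors = sorted(range(len(train_counts)),
                           key=lambda i: abs(train_counts[i] - target_count))[:k]
        votes = Counter(train_labels[i] for i in neighbors)
        return votes.most_common(1)[0][0]

    def leave_one_out_accuracy(counts, labels, k):
        """Leave-one-out cross-validation: each document is classified from the
        remaining n - 1 documents."""
        correct = 0
        for i in range(len(counts)):
            train_counts = counts[:i] + counts[i + 1:]
            train_labels = labels[:i] + labels[i + 1:]
            if knn_classify(counts[i], train_counts, train_labels, k) == labels[i]:
                correct += 1
        return correct / len(counts)

As described below, k itself is chosen by running this procedure on a separate training set (OP1 in the text) for odd values of k from 1 to 15 and keeping the best-performing value.</Paragraph>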
<Paragraph position="2"> Given a document, the basic KNN algorithm classifies the document according to the majority classification of the document's k closest neighbors. For our purposes, each document is characterized by one feature: the count of all PSE instances (regardless of type) in the document, normalized by document length in words. The distance between two documents is simply the absolute value of the difference between the normalized PSE counts of the two documents.</Paragraph> <Paragraph position="3"> With leave-one-out cross-validation, the set of n documents to be classified is divided into a training set of size n - 1 and a validation set of size 1. The one document in the validation set is then classified according to the majority classification of its k closest-neighbor documents in the training set. This process is repeated until every document has been classified.</Paragraph> <Paragraph position="4"> The value of k is chosen during a preprocessing phase, in which we run the KNN algorithm with leave-one-out cross-validation on a separate training set for odd values of k from 1 to 15. The value of k that results in the best classification during this phase is the one used for later KNN classification.</Paragraph> <Paragraph position="5"> For the classification experiment, the data set OP1 was used in the preprocessing phase to select the value of k, and then classification was performed on the 1,222 documents in OP2. During training on OP1, k = 15 resulted in the best classification. On the test set, OP2, we achieved a classification accuracy of 0.939; the baseline accuracy for choosing the most frequent class (nonopinion pieces) was 0.915. Our classification accuracy represents a 28% reduction in error and is significantly better than baseline according to McNemar's test (Everitt 1997).</Paragraph> <Paragraph position="6"> The positive results from the opinion piece classification show the usefulness of the various PSE features when used together.</Paragraph> </Section> </Section> </Paper>