<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1014"> <Title>Learning Extraction Patterns for Subjective Expressions</Title> <Section position="7" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Experimental Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Subjectivity Data </SectionTitle> <Paragraph position="0"> The text collection that we used consists of English-language versions of foreign news documents from FBIS, the U.S. Foreign Broadcast Information Service. The data is from a variety of countries. Our system takes unannotated data as input, but we needed annotated data to evaluate its performance. We briefly describe the manual annotation scheme used to create the gold standard, and give interannotator agreement results.</Paragraph> <Paragraph position="1"> In 2002, a detailed annotation scheme (Wilson and Wiebe, 2003) was developed for a government-sponsored project. We mention only the aspects of the annotation scheme relevant to this paper. The scheme was inspired by work in linguistics and literary theory on subjectivity, which focuses on how opinions, emotions, etc. are expressed linguistically in context (Banfield, 1982). The goal is to identify and characterize expressions of private states in a sentence. Private state is a general covering term for opinions, evaluations, emotions, and speculations (Quirk et al., 1985). For example, in sentence (1) the writer is expressing a negative evaluation.</Paragraph> <Paragraph position="2"> (1) &quot;The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long.&quot; Sentence (2) reflects the private state of Western countries. Mugabe's use of overwhelmingly also reflects a private state, his positive reaction to and characterization of his victory.</Paragraph> <Paragraph position="3"> (2) &quot;Western countries were left frustrated and impotent after Robert Mugabe formally declared that he had overwhelmingly won Zimbabwe's presidential election.&quot; Annotators are also asked to judge the strength of each private state. A private state may have low, medium, high, or extreme strength.</Paragraph> <Paragraph position="4"> To allow us to measure interannotator agreement, three annotators (who are not authors of this paper) independently annotated the same 13 documents, with a total of 210 sentences. We begin with a strict measure of agreement at the sentence level by first considering whether the annotator marked any private-state expression, of any strength, anywhere in the sentence. If so, the sentence is subjective. Otherwise, it is objective. The average pairwise percentage agreement is 90% and the average pairwise κ value is 0.77.</Paragraph> <Paragraph position="5"> One would expect that there are clear cases of objective sentences, clear cases of subjective sentences, and borderline sentences in between. The agreement study supports this. In terms of our annotations, we define a sentence as borderline if it has at least one private-state expression identified by at least one annotator, and all strength ratings of private-state expressions are low. On average, 11% of the corpus is borderline under this definition. When those sentences are removed, the average pairwise percentage agreement increases to 95% and the average pairwise κ value increases to 0.89.</Paragraph>
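<Paragraph> A minimal sketch of how these pairwise agreement statistics can be computed, assuming binary sentence-level labels (1 = subjective, 0 = objective) for each annotator; the function names and toy labels below are illustrative assumptions, not data from the annotation study:
```python
# Minimal sketch (illustrative): average pairwise percentage agreement and
# average pairwise Cohen's kappa over binary sentence-level labels.
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' binary label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label rates.
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)

def pairwise_stats(annotations):
    """annotations: one list of 0/1 labels per annotator, same sentence order."""
    agree, kappas = [], []
    for a, b in combinations(annotations, 2):
        agree.append(sum(x == y for x, y in zip(a, b)) / len(a))
        kappas.append(cohen_kappa(a, b))
    return sum(agree) / len(agree), sum(kappas) / len(kappas)

# Toy example with three annotators (the study used 210 sentences):
ann1 = [1, 1, 0, 0, 1]
ann2 = [1, 1, 0, 1, 1]
ann3 = [1, 0, 0, 0, 1]
avg_agree, avg_kappa = pairwise_stats([ann1, ann2, ann3])
print(f"agreement={avg_agree:.2f}, kappa={avg_kappa:.2f}")
```
</Paragraph>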
<Paragraph position="6"> As expected, the majority of disagreement cases involve low-strength subjectivity. The annotators consistently agree about which are the clear cases of subjective sentences. This leads us to define the gold standard that we use when evaluating our results. A sentence is subjective if it contains at least one private-state expression of medium or higher strength. The second class, which we call objective, consists of everything else.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Evaluation of the Learned Patterns </SectionTitle> <Paragraph position="0"> Our pool of unannotated texts consists of 302,163 individual sentences. The HP-Subj classifier initially labeled roughly 44,300 of these sentences as subjective, and the HP-Obj classifier initially labeled roughly 17,000 sentences as objective. To keep the training set relatively balanced, we used all 17,000 objective sentences and 17,000 of the subjective sentences as training data for the extraction pattern learner.</Paragraph> <Paragraph position="1"> 17,073 extraction patterns were learned that have frequency ≥ 2 and Pr(subjective | pattern_i) ≥ .60 on the training data. We then wanted to determine whether the extraction patterns are, in fact, good indicators of subjectivity. To evaluate the patterns, we applied different subsets of them to a test set to see whether they consistently occur in subjective sentences. This test set consists of 3947 sentences. Figure 4 shows the recall and (sentence-level) precision for the learned extraction patterns on the test set. In this figure, precision is the proportion of pattern instances found in the test set that are in subjective sentences, and recall is the proportion of subjective sentences that contain at least one pattern instance.</Paragraph> <Paragraph position="2"> We evaluated 18 different subsets of the patterns, by selecting the patterns that pass certain thresholds in the training data. We tried all combinations of θ1 ∈ {2, 10} and θ2 ∈ {.60, .65, .70, .75, .80, .85, .90, .95, 1.0}. The data points corresponding to θ1=2 are shown on the upper line in Figure 4, and those corresponding to θ1=10 are shown on the lower line. For example, the data point corresponding to θ1=10 and θ2=.90 evaluates only the extraction patterns that occur at least 10 times in the training data with probability ≥ .90 (i.e., at least 90% of their occurrences are in subjective training sentences).</Paragraph> <Paragraph position="3"> Overall, the extraction patterns perform quite well.</Paragraph> <Paragraph position="4"> The precision ranges from 71% to 85%, with the expected tradeoff between precision and recall. This experiment confirms that the extraction patterns are effective at recognizing subjective expressions.</Paragraph>
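<Paragraph> A minimal sketch of this threshold sweep, assuming per-pattern training counts and a test set represented as (patterns-in-sentence, label) pairs; the data structures below are assumptions, and pattern instances are approximated as at most one match per pattern per sentence:
```python
# Minimal sketch (illustrative): select patterns by the two training-set
# thresholds and score them on a test set, mirroring the definitions above.

def select_patterns(freq, subj_freq, theta1, theta2):
    """freq: pattern -> training frequency; subj_freq: pattern -> occurrences
    in subjective training sentences. Keep patterns with frequency >= theta1
    and Pr(subjective | pattern) >= theta2."""
    return {p for p, f in freq.items()
            if f >= theta1 and subj_freq.get(p, 0) / f >= theta2}

def evaluate(patterns, test_sentences):
    """test_sentences: (set of patterns in sentence, is_subjective) pairs.
    Precision: fraction of matched pattern instances in subjective sentences.
    Recall: fraction of subjective sentences with >= 1 matched instance."""
    subj_hits = instances = subj_total = subj_covered = 0
    for pats, is_subj in test_sentences:
        matched = len(pats & patterns)
        instances += matched
        if is_subj:
            subj_hits += matched
            subj_total += 1
            subj_covered += bool(matched)
    precision = subj_hits / instances if instances else 0.0
    recall = subj_covered / subj_total if subj_total else 0.0
    return precision, recall

# The 18 evaluated subsets correspond to sweeping both thresholds:
# for theta1 in (2, 10):
#     for theta2 in (.60, .65, .70, .75, .80, .85, .90, .95, 1.0):
#         subset = select_patterns(freq, subj_freq, theta1, theta2)
#         print(theta1, theta2, evaluate(subset, test_sentences))
```
</Paragraph>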
</Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Evaluation of the Bootstrapping Process </SectionTitle> <Paragraph position="0"> In our second experiment, we used the learned extraction patterns to classify previously unlabeled sentences from the unannotated text collection. The new subjective sentences were then fed back into the Extraction Pattern Learner to complete the bootstrapping cycle depicted by the rightmost dashed line in Figure 1. The Pattern-based Subjective Sentence Classifier classifies a sentence as subjective if it contains at least one extraction pattern with θ1 ≥ 5 and θ2 ≥ 1.0 on the training data. This process produced approximately 9,500 new subjective sentences that were previously unlabeled.</Paragraph> <Paragraph position="1"> Since our bootstrapping process does not learn new objective sentences, we did not want to simply add the new subjective sentences to the training set, or it would become increasingly skewed toward subjective sentences.</Paragraph> <Paragraph position="2"> Since HP-Obj had produced roughly 17,000 objective sentences used for training, we used the 9,500 new subjective sentences along with 7,500 of the previously identified subjective sentences as our new training set. In other words, the training set that we used during the second bootstrapping cycle contained exactly the same objective sentences as the first cycle, half of the same subjective sentences as the first cycle, and 9,500 brand-new subjective sentences.</Paragraph> <Paragraph position="3"> On this second cycle of bootstrapping, the extraction pattern learner generated many new patterns that were not discovered during the first cycle. 4,248 new patterns were found that have θ1 ≥ 2 and θ2 ≥ .60. If we consider only the strongest (most subjective) extraction patterns, 308 new patterns were found that had θ1 ≥ 10 and θ2 ≥ 1.0. This is a substantial set of new extraction patterns that seem to be very highly correlated with subjectivity.</Paragraph> <Paragraph position="4"> An open question was whether the new patterns provide additional coverage. To assess this, we did a simple test: we added the 4,248 new patterns to the original set of patterns learned during the first bootstrapping cycle. Then we repeated the same analysis depicted in Figure 4. In general, the recall numbers increased by about 2-4%, while the precision numbers decreased by a smaller amount, 0.5-2%.</Paragraph> <Paragraph position="5"> In our third experiment, we evaluated whether the learned patterns can improve the coverage of the high-precision subjectivity classifier (HP-Subj), completing the bootstrapping loop depicted by the topmost dashed line in Figure 1. Our hope was that the patterns would allow more sentences from the unannotated text collection to be labeled as subjective, without a substantial drop in precision. For this experiment, we selected the learned extraction patterns that had θ1 ≥ 10 and θ2 ≥ 1.0 on the training set, since these seemed likely to be the most reliable (high-precision) indicators of subjectivity.</Paragraph> <Paragraph position="6"> We modified the HP-Subj classifier to use extraction patterns as follows. All sentences labeled as subjective by the original HP-Subj classifier are also labeled as subjective by the new version. For previously unlabeled sentences, the new version classifies a sentence as subjective if (1) it contains two or more of the learned patterns, or (2) it contains one of the clues used by the original HP-Subj classifier and at least one learned pattern. Table 1 shows the performance results on the test set mentioned in Section 3.1 (2197 sentences) for both the original HP-Subj classifier and the new version that uses the learned extraction patterns. The extraction patterns produce a 7.2 percentage point gain in coverage, with only a 1.1 percentage point drop in precision. This result shows that the learned extraction patterns do improve the performance of the high-precision subjective sentence classifier, allowing it to classify more sentences as subjective with nearly the same high reliability.</Paragraph>
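<Paragraph> A minimal sketch of the augmented labeling rule described above, assuming set-valued pattern and clue matches per sentence; all names are illustrative:
```python
# Minimal sketch (illustrative): the augmented HP-Subj decision rule.

def augmented_hp_subj(original_label, sentence_patterns, sentence_clues,
                      learned_patterns, known_clues):
    """original_label: output of the original HP-Subj classifier, or None if
    the sentence was left unlabeled. Returns 'subjective' or None."""
    if original_label == "subjective":
        return "subjective"          # original subjective labels are kept
    n_matched = len(sentence_patterns & learned_patterns)
    has_clue = bool(sentence_clues & known_clues)
    # (1) two or more learned patterns, or (2) one known clue plus a pattern.
    if n_matched >= 2 or (has_clue and n_matched >= 1):
        return "subjective"
    return None                      # otherwise the sentence stays unlabeled
```
Requiring either two learned patterns or corroboration between a known clue and a learned pattern keeps the evidence bar high for previously unlabeled sentences, consistent with the classifier's high-precision design.</Paragraph>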
</Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> High-Precision Sentence Classifier </SectionTitle> <Paragraph position="0"> Table 2 gives examples of patterns used to augment the HP-Subj classifier; these patterns do not overlap in non-function words with any of the clues already known by the original system. For each pattern, we show an example sentence from our corpus that matches the pattern.</Paragraph> </Section> </Section> </Paper>