<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1014">
  <Title>Learning Extraction Patterns for Subjective Expressions</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Learning and Bootstrapping Extraction
</SectionTitle>
    <Paragraph position="0"> We have developed a bootstrapping process for subjectivity classification that explores three ideas: (1) high-precision classifiers can be used to automatically identify subjective and objective sentences from unannotated texts, (2) this data can be used as a training set to automatically learn extraction patterns associated with subjectivity, and (3) the learned patterns can be used to grow the training set, allowing this entire process to be bootstrapped. Figure 1 shows the components and layout of the bootstrapping process. The process begins with a large collection of unannotated text and two high-precision subjectivity classifiers. One classifier searches the unannotated corpus for sentences that can be labeled as subjective with high confidence, and the other classifier searches for sentences that can be labeled as objective with high confidence. All other sentences in the corpus are left unlabeled. The labeled sentences are then fed to an extraction pattern learner, which produces a set of extraction patterns that are statistically correlated with the subjective sentences (we will call these the subjective patterns). These patterns are then used to identify more sentences within the unannotated texts that can be classified as subjective. The extraction pattern learner can then re-train using the larger training set, and the process repeats.</Paragraph>
    <Paragraph position="1"> The subjective patterns can also be added to the high-precision subjective sentence classifier as new features to improve its performance. The dashed lines in Figure 1 represent the parts of the process that are bootstrapped.</Paragraph>
    <Paragraph position="2"> In this section, we will describe the high-precision sentence classifiers, the extraction pattern learning process, and the details of the bootstrapping process.</Paragraph>
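The overall loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names (hp_subj, hp_obj, learn_patterns) and the substring-based pattern match are assumptions standing in for the real classifiers and the syntactic pattern matcher.

```python
def bootstrap(corpus, hp_subj, hp_obj, learn_patterns, iterations=3):
    """Sketch of the bootstrapping loop: label sentences with the two
    high-precision classifiers, learn subjective extraction patterns,
    use the patterns to grow the subjective training set, and repeat."""
    subjective = [s for s in corpus if hp_subj(s)]
    objective = [s for s in corpus if hp_obj(s)]
    patterns = set()
    for _ in range(iterations):
        patterns = learn_patterns(subjective, objective)
        labeled = set(subjective) | set(objective)
        # Learned patterns label additional unannotated sentences
        # as subjective (here: naive substring matching).
        new_subj = [s for s in corpus
                    if s not in labeled and any(p in s for p in patterns)]
        if not new_subj:
            break
        subjective.extend(new_subj)
    return patterns, subjective, objective
```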
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 High-Precision Subjectivity Classifiers
</SectionTitle>
      <Paragraph position="0"> The high-precision classifiers (HP-Subj and HP-Obj) use lists of lexical items that have been shown in previous work to be good subjectivity clues. Most of the items are single words, some are N-grams, but none involve syntactic generalizations as in the extraction patterns. Any data used to develop this vocabulary does not overlap with the test sets or the unannotated data used in this paper.</Paragraph>
      <Paragraph position="1"> Many of the subjective clues are from manually developed resources, including entries from (Levin, 1993; Ballmer and Brennenstuhl, 1981), Framenet lemmas with frame element experiencer (Baker et al., 1998), adjectives manually annotated for polarity (Hatzivassiloglou and McKeown, 1997), and subjectivity clues listed in (Wiebe, 1990). Others were derived from corpora, including subjective nouns learned from unannotated data using bootstrapping (Riloff et al., 2003).</Paragraph>
      <Paragraph position="2"> The subjectivity clues are divided into those that are strongly subjective and those that are weakly subjective, using a combination of manual review and empirical results on a small training set of manually annotated data.</Paragraph>
      <Paragraph position="3"> As the terms are used here, a strongly subjective clue is one that is seldom used without a subjective meaning, whereas a weakly subjective clue is one that commonly has both subjective and objective uses.</Paragraph>
      <Paragraph position="4"> The high-precision subjective classifier classifies a sentence as subjective if it contains two or more of the strongly subjective clues. On a manually annotated test set, this classifier achieves 91.5% precision and 31.9% recall (that is, 91.5% of the sentences that it selected are subjective, and it found 31.9% of the subjective sentences in the test set). This test set consists of 2197 sentences, 59% of which are subjective.</Paragraph>
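The HP-Subj rule and its evaluation can be written down directly. A minimal sketch, assuming a simple word-level clue list (the clue set and sentences in the test below are illustrative, not the paper's lexicon):

```python
def hp_subj_classify(sentence, strong_clues):
    """HP-Subj rule: label a sentence subjective iff it contains
    two or more strongly subjective clues."""
    hits = sum(1 for w in sentence.lower().split() if w in strong_clues)
    return hits >= 2

def precision_recall(predictions, gold):
    """predictions, gold: dicts mapping sentence -> bool (subjective).
    Precision: fraction of selected sentences that are truly subjective.
    Recall: fraction of truly subjective sentences that were selected."""
    selected = [s for s, p in predictions.items() if p]
    tp = sum(1 for s in selected if gold[s])
    precision = tp / len(selected) if selected else 0.0
    total_subj = sum(1 for v in gold.values() if v)
    recall = tp / total_subj if total_subj else 0.0
    return precision, recall
```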
      <Paragraph position="5"> The high-precision objective classifier takes a different approach. Rather than looking for the presence of lexical items, it looks for their absence. It classifies a sentence as objective if there are no strongly subjective clues and at most one weakly subjective clue in the current, previous, and next sentence combined. Why doesn't the objective classifier mirror the subjective classifier, and consult its own list of strongly objective clues? There are certainly lexical items that are statistically correlated with the objective class (examples are cardinal numbers (Wiebe et al., 1999), and words such as per, case, market, and total), but the presence of such clues does not readily lead to high precision objective classification. Add sarcasm or a negative evaluation to a sentence about a dry topic such as stock prices, and the sentence becomes subjective. Conversely, add objective topics to a sentence containing two strongly subjective words such as odious and scumbag, and the sentence remains subjective.</Paragraph>
      <Paragraph position="6"> The performance of the high-precision objective classifier is a bit lower than the subjective classifier: 82.6% precision and 16.4% recall on the test set mentioned above (that is, 82.6% of the sentences selected by the objective classifier are objective, and the objective classifier found 16.4% of the objective sentences in the test set). Although there is room for improvement, the performance proved to be good enough for our purposes.</Paragraph>
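The HP-Obj rule can be sketched similarly, counting clues over a three-sentence window (previous, current, next). The clue sets below are placeholders for the paper's strongly and weakly subjective clue lists:

```python
def hp_obj_classify(sentences, i, strong_clues, weak_clues):
    """HP-Obj rule: label sentence i objective iff the current,
    previous, and next sentences combined contain no strongly
    subjective clue and at most one weakly subjective clue."""
    window = sentences[max(0, i - 1): i + 2]
    words = [w for s in window for w in s.lower().split()]
    if any(w in strong_clues for w in words):
        return False
    weak_hits = sum(1 for w in words if w in weak_clues)
    return weak_hits <= 1
```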
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Learning Subjective Extraction Patterns
</SectionTitle>
      <Paragraph position="0"> To automatically learn extraction patterns that are associated with subjectivity, we use a learning algorithm similar to AutoSlog-TS (Riloff, 1996). For training, AutoSlog-TS uses a text corpus consisting of two distinct sets of texts: &amp;quot;relevant&amp;quot; texts (in our case, subjective sentences) and &amp;quot;irrelevant&amp;quot; texts (in our case, objective sentences).</Paragraph>
      <Paragraph position="1"> A set of syntactic templates represents the space of possible extraction patterns.</Paragraph>
      <Paragraph position="2"> The learning process has two steps. First, the syntactic templates are applied to the training corpus in an exhaustive fashion, so that extraction patterns are generated for (literally) every possible instantiation of the templates that appears in the corpus. The left column of Figure 2 shows the syntactic templates used by AutoSlog-TS. The right column shows a specific extraction pattern that was learned during our subjectivity experiments as an instantiation of the syntactic form on the left. For example, the pattern &lt;subj&gt; was satisfied will match any sentence where the verb satisfied appears in the passive voice. The pattern &lt;subj&gt; dealt blow represents a more complex expression that will match any sentence that contains a verb phrase with head=dealt followed by a direct object with head=blow. This would match sentences such as &amp;quot;The experience dealt a stiff blow to his pride.&amp;quot; It is important to recognize that these patterns look for specific syntactic constructions produced by a (shallow) parser, rather than exact word sequences.</Paragraph>
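The exhaustive instantiation step can be illustrated with a toy version. Real AutoSlog-TS operates over shallow parses and the full template set in Figure 2; this sketch assumes each clause has already been reduced to a (subject, voice, verb, direct_object) tuple, which is a simplification of ours, not the paper's representation:

```python
def instantiate_patterns(clauses):
    """Generate one extraction pattern per template instantiation
    observed in the corpus (a toy stand-in for applying AutoSlog-TS's
    syntactic templates exhaustively)."""
    patterns = set()
    for subj, voice, verb, dobj in clauses:
        if voice == "passive":
            # e.g. <subj> was satisfied
            patterns.add(("<subj>", "passive_verb", verb))
        else:
            patterns.add(("<subj>", "active_verb", verb))
            if dobj:
                # e.g. <subj> dealt blow: verb phrase plus direct object head
                patterns.add(("<subj>", "verb_dobj", verb, dobj))
    return patterns
```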
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SYNTACTIC FORM EXAMPLE PATTERN
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PATTERN FREQ %SUBJ
</SectionTitle>
    <Paragraph position="0"> The second step of AutoSlog-TS's learning process applies all of the learned extraction patterns to the training corpus and gathers statistics for how often each pattern occurs in subjective versus objective sentences.</Paragraph>
    <Paragraph position="1"> AutoSlog-TS then ranks the extraction patterns using a metric called RlogF (Riloff, 1996) and asks a human to review the ranked list and make the final decision about which patterns to keep.</Paragraph>
    <Paragraph position="2"> In contrast, for this work we wanted a fully automatic process that does not depend on a human reviewer, and we were most interested in finding patterns that can identify subjective expressions with high precision. So we ranked the extraction patterns using a conditional probability measure: the probability that a sentence is subjective given that a specific extraction pattern appears in it. The exact formula is:

Pr(subjective | pattern_i) = subjfreq(pattern_i) / freq(pattern_i)

where subjfreq(pattern_i) is the frequency of pattern_i in subjective training sentences, and freq(pattern_i) is the frequency of pattern_i in all training sentences. (This may also be viewed as the precision of the pattern on the training data.) Finally, we use two thresholds to select extraction patterns that are strongly associated with subjectivity in the training data. We choose extraction patterns for which freq(pattern_i) &gt;= θ1 and Pr(subjective | pattern_i) &gt;= θ2.</Paragraph>
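The selection step follows directly from the formula. A minimal sketch; the threshold values in the test below are illustrative, not the settings used in the paper:

```python
def select_patterns(pattern_counts, theta1, theta2):
    """Keep patterns with freq >= theta1 and
    Pr(subjective | pattern) >= theta2, where the probability is the
    pattern's precision on the training sentences.

    pattern_counts maps pattern -> (subj_freq, total_freq).
    Returns the kept patterns ranked by conditional probability,
    breaking ties by frequency."""
    kept = {}
    for pattern, (subj_freq, freq) in pattern_counts.items():
        if freq == 0:
            continue
        prob = subj_freq / freq
        if freq >= theta1 and prob >= theta2:
            kept[pattern] = prob
    return sorted(kept, key=lambda p: (-kept[p], -pattern_counts[p][1]))
```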
Figure 3 shows some patterns learned by our system, the frequency with which they occur in the training data (FREQ), and the percentage of times they occur in subjective sentences (%SUBJ). For example, the first two rows show the behavior of two similar expressions using the verb asked. 100% of the sentences that contain asked in the passive voice are subjective, but only 63% of the sentences that contain asked in the active voice are subjective. A human would probably not expect the active and passive voices to behave so differently. To understand why this is so, we looked in the training data and found that the passive voice is often used to query someone about a specific opinion. For example, here is one such sentence from our training set: &amp;quot;Ernest Bai Koroma of RITCORP was asked to address his supporters on his views relating to 'full blooded Temne to head APC'.&amp;quot; In contrast, many of the sentences containing asked in the active voice are more general in nature, such as &amp;quot;The mayor asked a newly formed JR about his petition.&amp;quot; Figure 3 also shows that expressions using talk as a noun (e.g., &amp;quot;Fred is the talk of the town&amp;quot;) are highly correlated with subjective sentences, while talk as a verb (e.g., &amp;quot;The mayor will talk about...&amp;quot;) is found in a mix of subjective and objective sentences. Not surprisingly, longer expressions tend to be more idiomatic (and subjective) than shorter expressions (e.g., put an end (to) vs. put; is going to be vs. is going; was expected from vs. was expected). Finally, the last two rows of Figure 3 show that expressions involving the noun fact are highly correlated with subjective expressions! These patterns match sentences such as The fact is... and ... is a fact, which apparently are often used in subjective contexts.
This example illustrates that the corpus-based learning method can find phrases that might not seem subjective to a person intuitively, but that are reliable indicators of subjectivity.</Paragraph>
  </Section>
</Paper>