<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1014">
  <Title>Learning Extraction Patterns for Subjective Expressions</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Subjectivity Analysis
</SectionTitle>
      <Paragraph position="0"> Much previous work on subjectivity recognition has focused on document-level classification. For example, (Spertus, 1997) developed a system to identify inflammatory texts and (Turney, 2002; Pang et al., 2002) developed methods for classifying reviews as positive or negative.</Paragraph>
      <Paragraph position="1"> Some research in genre classification has included the recognition of subjective genres such as editorials (e.g., (Karlgren and Cutting, 1994; Kessler et al., 1997; Wiebe et al., 2001)).</Paragraph>
      <Paragraph position="2"> In contrast, the goal of our work is to classify individual sentences as subjective or objective. Document-level classification can distinguish between &quot;subjective texts,&quot; such as editorials and reviews, and &quot;objective texts,&quot; such as newspaper articles. But in reality, most documents contain a mix of both subjective and objective sentences.</Paragraph>
      <Paragraph position="3"> Subjective texts often include some factual information.</Paragraph>
      <Paragraph position="4"> For example, editorial articles frequently contain factual information to back up the arguments being made, and movie reviews often mention the actors and plot of a movie as well as the theatres where it's currently playing.</Paragraph>
      <Paragraph position="5"> Even if one is willing to discard subjective texts in their entirety, the objective texts usually contain a great deal of subjective information in addition to facts. For example, newspaper articles are generally considered to be relatively objective documents, but in a recent study (Wiebe et al., 2001) 44% of sentences in a news collection were found to be subjective (after editorial and review articles were removed).</Paragraph>
      <Paragraph position="6"> One of the main obstacles to producing a sentence-level subjectivity classifier is a lack of training data. To train a document-level classifier, one can easily find collections of subjective texts, such as editorials and reviews. For example, (Pang et al., 2002) collected reviews from a movie database and rated them as positive, negative, or neutral based on the rating (e.g., number of stars) given by the reviewer. It is much harder to obtain collections of individual sentences that can be easily identified as subjective or objective. Previous work on sentence-level subjectivity classification (Wiebe et al., 1999) used training corpora that had been manually annotated for subjectivity. Manually producing annotations is time consuming, so the amount of available annotated sentence data is relatively small.</Paragraph>
      <Paragraph position="7"> The goal of our research is to use high-precision subjectivity classifiers to automatically identify subjective and objective sentences in unannotated text corpora. The high-precision classifiers label a sentence as subjective or objective when they are confident about the classification, and they leave a sentence unlabeled otherwise. Unannotated texts are easy to come by, so even if the classifiers can label only 30% of the sentences as subjective or objective, they will still produce a large collection of labeled sentences. Most importantly, the high-precision classifiers can generate a much larger set of labeled sentences than are currently available in manually created data sets.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Extraction Patterns
</SectionTitle>
      <Paragraph position="0"> Information extraction (IE) systems typically use lexico-syntactic patterns to identify relevant information. The specific representation of these patterns varies across systems, but most patterns represent role relationships surrounding noun and verb phrases. For example, an IE system designed to extract information about hijackings might use the pattern hijacking of &lt;x&gt;, which looks for the noun hijacking and extracts the object of the preposition of as the hijacked vehicle. The pattern &lt;x&gt; was hijacked would extract the hijacked vehicle when it finds the verb hijacked in the passive voice, and the pattern &lt;x&gt; hijacked would extract the hijacker when it finds the verb hijacked in the active voice.</Paragraph>
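The role-filling behavior described above can be sketched with toy surface patterns. This is an illustrative approximation only; the function name, regexes, and example sentences below are invented here, and real IE systems match lexico-syntactic patterns against parsed text rather than raw strings.

```python
import re

# Toy surface patterns standing in for the hijacking patterns in the text.
# Each regex captures the filler of one role.
noun_of = re.compile(r"hijacking of (\w+(?:\s\w+)*)")    # hijacked vehicle
passive = re.compile(r"(\w+(?:\s\w+)*) was hijacked")    # hijacked vehicle
active  = re.compile(r"(\w+(?:\s\w+)*) hijacked \w+")    # hijacker

def extract(sentence):
    """Return (role, filler) pairs found by the toy patterns."""
    found = []
    m = noun_of.search(sentence)
    if m:
        found.append(("vehicle", m.group(1)))
    m = passive.search(sentence)
    if m:
        found.append(("vehicle", m.group(1)))
    else:
        m = active.search(sentence)
        if m:
            found.append(("hijacker", m.group(1)))
    return found

print(extract("the plane was hijacked"))
print(extract("terrorists hijacked the plane"))
```

Note how the same verb yields different roles depending on voice, which is exactly the distinction the pattern representation encodes.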
      <Paragraph position="1"> One of our hypotheses was that extraction patterns would be able to represent subjective expressions that have noncompositional meanings. For example, consider the common expression drives (someone) up the wall, which expresses the feeling of being annoyed with something. The meaning of this expression is quite different from the meanings of its individual words (drives, up, wall). Furthermore, this expression is not a fixed word sequence that could easily be captured by N-grams. It is a relatively flexible construction that may be more generally represented as &lt;x&gt; drives &lt;y&gt; up the wall, where x and y may be arbitrary noun phrases. This pattern would match many different sentences, such as &quot;George drives me up the wall,&quot; &quot;She drives the mayor up the wall,&quot; or &quot;The nosy old man drives his quiet neighbors up the wall.&quot; We also wondered whether the extraction pattern representation might reveal slight variations of the same verb or noun phrase that have different connotations. For example, you can say that a comedian bombed last night, which is a subjective statement, but you can't express this sentiment with the passive voice of bombed. In Section 3.2, we will show examples of extraction patterns representing subjective expressions which do in fact exhibit both of these phenomena.</Paragraph>
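The flexibility argument can be made concrete: a single pattern with two open slots covers all three example sentences, whereas an N-gram representation would need a separate fixed sequence for every filler. A minimal regex sketch (the variable name is ours, not the paper's):

```python
import re

# One flexible pattern, "x drives y up the wall", with lazy groups as the
# open slots for the two noun phrases. (Illustrative sketch only.)
drives_up_the_wall = re.compile(r"(.+?) drives (.+?) up the wall")

for s in [
    "George drives me up the wall.",
    "She drives the mayor up the wall.",
    "The nosy old man drives his quiet neighbors up the wall.",
]:
    m = drives_up_the_wall.search(s)
    print(m.group(1), "|", m.group(2))
```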
      <Paragraph position="2"> A variety of algorithms have been developed to automatically learn extraction patterns. Most of these algorithms require special training resources, such as texts annotated with domain-specific tags (e.g., AutoSlog (Riloff, 1993), CRYSTAL (Soderland et al., 1995), RAPIER (Califf, 1998), SRV (Freitag, 1998), WHISK (Soderland, 1999)) or manually defined keywords, frames, or object recognizers (e.g., PALKA (Kim and Moldovan, 1993) and LIEP (Huffman, 1996)).</Paragraph>
      <Paragraph position="3"> AutoSlog-TS (Riloff, 1996) takes a different approach, requiring only a corpus of unannotated texts that have been separated into those that are related to the target domain (the &quot;relevant&quot; texts) and those that are not (the &quot;irrelevant&quot; texts). Most recently, two bootstrapping algorithms have been used to learn extraction patterns. Meta-bootstrapping (Riloff and Jones, 1999) learns both extraction patterns and a semantic lexicon using unannotated texts and seed words as input. ExDisco (Yangarber et al., 2000) uses a bootstrapping mechanism to find new extraction patterns using unannotated texts and some seed patterns as the initial input.</Paragraph>
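As a rough illustration of how the relevant/irrelevant split can be exploited, a relevance-weighted score in the spirit of AutoSlog-TS's pattern ranking rewards patterns that occur mostly in relevant texts while discounting rare ones. The exact metric and all counts below are illustrative assumptions, not figures from any of the cited papers.

```python
import math

# Illustrative relevance-weighted score: relevance_rate * log2(freq_relevant).
# Patterns common everywhere score low; domain-concentrated patterns score high.
def rlogf(freq_relevant, freq_total):
    """Return 0.0 for unseen patterns, otherwise the weighted score."""
    if freq_relevant == 0:
        return 0.0
    relevance_rate = freq_relevant / freq_total
    return relevance_rate * math.log2(freq_relevant)

# pattern: (occurrences in relevant texts, occurrences in all texts); invented counts
counts = {
    "was hijacked": (16, 18),   # concentrated in relevant texts
    "was reported": (20, 200),  # common everywhere, weak evidence
}
for pattern, (rel, total) in counts.items():
    print(pattern, round(rlogf(rel, total), 2))
```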
      <Paragraph position="4"> For our research, we adopted a learning process very similar to that used by AutoSlog-TS, which requires only relevant texts and irrelevant texts as its input. We describe this learning process in more detail in the next section.</Paragraph>
    </Section>
  </Section>
</Paper>