<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1045">
  <Title>Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns</Title>
  <Section position="4" start_page="355" end_page="357" type="metho">
    <SectionTitle>
3 Semantic Tagging via Conditional
Random Fields
</SectionTitle>
    <Paragraph position="0"> We defined the problem of opinion source identification as a sequence tagging task via CRFs as follows. Given a sequence of tokens, x = x1x2...xn, we need to generate a sequence of tags, or labels, y = y1y2...yn. We define the set of possible label values as 'S', 'T', '-', where 'S' is the first token (or Start) of a source, 'T' is a non-initial token (i.e., a conTinuation) of a source, and '-'is a token that is not part of any source.2 A detailed description of CRFs can be found in  Lafferty et al. (2001). For our sequence tagging problem, we create a linear-chain CRF based on an undirected graph G = (V,E), where V is the set of random variables Y = fYij1 i ng, one for each of n tokens in an input sentence; and E = f(Yi[?]1,Yi)j1 &lt; i ng is the set of n 1 edges forming a linear chain. For each sentence x, we define a non-negative clique potential exp(summationtextKk=1 lkfk(yi[?]1,yi,x)) for each edge, and exp(summationtextKprimek=1 lprimekfprimek(yi,x)) for each node, where fk(...) is a binary feature indicator function, lk is a weight assigned for each feature function, and K and Kprime are the number of features defined for edges and nodes respectively. Following Lafferty et al. (2001), the conditional probability of a sequence of labels y given a sequence of tokens x is:</Paragraph>
    <Paragraph position="2"> (2) where Zx is a normalization constant for each x. Given the training data D, a set of sentences paired with their correct 'ST-' source label sequences, the parameters of the model are trained to maximize the conditional log-likelihoodproducttext (x,y)[?]D P(yjx). For inference, given a sentence x in the test data, the tagging sequence y is given by argmaxyprimeP(yprimejx).</Paragraph>
    <Section position="1" start_page="356" end_page="357" type="sub_section">
      <SectionTitle>
3.1 Features
</SectionTitle>
      <Paragraph position="0"> To develop features, we considered three properties of opinion sources. First, the sources of opinions are mostly noun phrases. Second, the source phrases should be semantic entities that can bear or express opinions. Third, the source phrases should be directly related to an opinion expression. When considering only the first and second criteria, this task reduces to named entity recognition. Because of the third condition, however, the task requires the recognition of opinion expressions and a more sophisticated encoding of sentence structure to capture relationships between source phrases and opinion expressions. null With these properties in mind, we define the following features for each token/word xi in an input sentence. For pedagogical reasons, we will describe some of the features as being multi-valued or categorical features. In practice, however, all features are binarized for the CRF model.</Paragraph>
      <Paragraph position="1"> Capitalization features We use two boolean features to represent the capitalization of a word: all-capital, initial-capital.</Paragraph>
      <Paragraph position="2"> Part-of-speech features Based on the lexical categories produced by GATE (Cunningham et al., 2002), each token xi is classified into one of a set of coarse part-of-speech tags: noun, verb, adverb, wh-word, determiner, punctuation, etc. We do the same for neighboring words in a [ 2, +2] window in order to assist noun phrase segmentation.</Paragraph>
      <Paragraph position="3"> Opinion lexicon features For each token xi, we include a binary feature that indicates whether or not the word is in our opinion lexicon -- a set of words that indicate the presence of an opinion. We do the same for neighboring words in a [ 1, +1] window.</Paragraph>
      <Paragraph position="4"> Additionally, we include for xi a feature that indicates the opinion subclass associated with xi, if available from the lexicon. (e.g., &amp;quot;bless&amp;quot; is classified as &amp;quot;moderately subjective&amp;quot; according to the lexicon, while &amp;quot;accuse&amp;quot; and &amp;quot;berate&amp;quot; are classified more specifically as &amp;quot;judgments&amp;quot;.) The lexicon is initially populated with approximately 500 opinion words 3 from (Wiebe et al., 2002), and then augmented with opinion words identified in the training data. The training data contains manually produced phrase-level annotations for all expressions of opinions, emotions, etc. (Wiebe et al., 2005). We collected all content words that occurred in the training set such that at least 50% of their occurrences were in opinion annotations.</Paragraph>
      <Paragraph position="5"> Dependency tree features For each token xi, we create features based on the parse tree produced by the Collins (1999) dependency parser. The purpose of the features is to (1) encode structural information, and (2) indicate whether xi is involved in any grammatical relations with an opinion word. Two pre-processing steps are required before features can be constructed: 3Some words are drawn from Levin (1993); others are from Framenet lemmas (Baker et al. 1998) associated with communication verbs.</Paragraph>
      <Paragraph position="6">  1. Syntactic chunking. We traverse the depen- null dency tree using breadth-first search to identify and group syntactically related nodes, producing a flatter, more concise tree. Each syntactic &amp;quot;chunk&amp;quot; is also assigned a grammatical role (e.g., subject, object, verb modifier, time, location, of-pp, by-pp) based on its constituents. Possessives (e.g., &amp;quot;Clinton's idea&amp;quot;) and the phrase &amp;quot;according to X&amp;quot; are handled as special cases in the chunking process.</Paragraph>
      <Paragraph position="7"> 2. Opinion word propagation. Although the opinion lexicon contains only content words and no multi-word phrases, actual opinions often comprise an entire phrase, e.g., &amp;quot;is really willing&amp;quot; or &amp;quot;in my opinion&amp;quot;. As a result, we mark as an opinion the entire chunk that contains an opinion word. This allows each token in the chunk to act as an opinion word for feature encoding.</Paragraph>
      <Paragraph position="8"> After syntactic chunking and opinion word propagation, we create the following dependency tree features for each token xi: the grammatical role of its chunk the grammatical role of xi[?]1's chunk whether the parent chunk includes an opinion word whether xi's chunk is in an argument position with respect to the parent chunk whether xi represents a constituent boundary Semantic class features We use 7 binary features to encode the semantic class of each word xi: authority, government, human, media, organizationor company, proper name, and other. The other class captures 13 semantic classes that cannot be sources, such as vehicle and time.</Paragraph>
      <Paragraph position="9"> Semantic class information is derived from named entity and semantic class labels assigned to xi by the Sundance shallow parser (Riloff, 2004). Sundance uses named entity recognition rules to label noun phrases as belonging to named entity classes, and assigns semantic tags to individual words based on a semantic dictionary. Table 1 shows the hierarchy that Sundance uses for semantic classes associated with opinion sources. Sundance is also used to recognize and instantiate the source extraction patterns  that are learned by AutoSlog-SE, which is described in the next section.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="357" end_page="358" type="metho">
    <SectionTitle>
4 Semantic Tagging via Extraction Patterns
</SectionTitle>
    <Paragraph position="0"> We also learn patterns to extract opinion sources using a statistical adaptation of the AutoSlog IE learning algorithm. AutoSlog (Riloff, 1996a) is a supervised extraction pattern learner that takes a training corpus of texts and their associated answer keys as input. A set of heuristics looks at the context surrounding each answer and proposes a lexico-syntactic pattern to extract that answer from the text. The heuristics are not perfect, however, so the resulting set of patterns needs to be manually reviewed by a person.</Paragraph>
    <Paragraph position="1"> In order to build a fully automatic system that does not depend on manual review, we combined AutoSlog's heuristics with statistics from the annotated training data to create a fully automatic supervised learner. We will refer to this learner as AutoSlog-SE (Statistically Enhanced variation of AutoSlog). AutoSlog-SE's learning process has three steps: Step 1: AutoSlog's heuristics are applied to every noun phrase (NP) in the training corpus. This generates a set of extraction patterns that, collectively, can extract every NP in the training corpus.</Paragraph>
    <Paragraph position="2"> Step 2: The learned patterns are augmented with selectional restrictions that semantically constrain the types of noun phrases that are legitimate extractions for opinion sources. We used  the semantic classes shown in Figure 1 as selectional restrictions.</Paragraph>
    <Paragraph position="3"> Step 3: The patterns are applied to the training corpus and statistics are gathered about their extractions. We count the number of extractions that match annotations in the corpus (correct extractions) and the number of extractions that do not match annotations (incorrect extractions). These counts are then used to estimate the probability that the pattern will extract an opinion source in new texts: P(source  |patterni) = correct sourcescorrect sources + incorrect sources This learning process generates a set of extraction patterns coupled with probabilities. In the next section, we explain how these extraction patterns are represented as features in the CRF model.</Paragraph>
  </Section>
  <Section position="6" start_page="358" end_page="358" type="metho">
    <SectionTitle>
5 Extraction Pattern Features for the CRF
</SectionTitle>
    <Paragraph position="0"> The extraction patterns provide two kinds of information. SourcePatt indicates whether a word activates any source extraction pattern. For example, the word &amp;quot;complained&amp;quot; activates the pattern &amp;quot;&lt;subj&gt; complained&amp;quot; because it anchors the expression. SourceExtrindicates whether a word is extracted by any source pattern. For example, in the sentence &amp;quot;President Jacques Chirac frequently complained about France's economy&amp;quot;, the words &amp;quot;President&amp;quot;, &amp;quot;Jacques&amp;quot;, and &amp;quot;Chirac&amp;quot; would all be extracted by the &amp;quot;&lt;subj&gt; complained&amp;quot; pattern.</Paragraph>
    <Paragraph position="1"> Each extraction pattern has frequency and probability values produced by AutoSlog-SE, hence we create four IE pattern-based features for each token xi: SourcePatt-Freq, SourceExtr-Freq, SourcePatt-Prob, and SourceExtr-Prob, where the frequency values are divided into three ranges: f0, 1, 2+g and the probability values are divided into five ranges of equal size.</Paragraph>
  </Section>
</Paper>