OPINE: Extracting Product Features and Opinions from Reviews

2 OPINE Overview

OPINE is built on top of KNOWITALL, a Web-based, domain-independent information extraction system (Etzioni et al., 2005). Given a set of relations of interest, KNOWITALL instantiates relation-specific generic extraction patterns into extraction rules, which find candidate facts. The Assessor module then assigns a probability to each candidate using a form of Pointwise Mutual Information (PMI) between phrases that is estimated from Web search engine hit counts (Turney, 2003). It computes the PMI between each fact and discriminator phrases (e.g., "is a scanner" for the isA() relationship in the context of the Scanner class). Given fact f and discriminator d, the computed PMI score is:

$$\mathrm{PMI}(f, d) = \frac{\mathrm{Hits}(f + d)}{\mathrm{Hits}(f)\,\mathrm{Hits}(d)}$$

where Hits(f + d) denotes the hit count for the discriminator phrase instantiated with the fact. The PMI scores are converted to binary features for a Naive Bayes Classifier, which outputs a probability associated with each fact.
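As an illustration of this assessment step, the following minimal Python sketch (ours, not the paper's code) computes the PMI score above; hits() and TOY_CORPUS are hypothetical stand-ins for the Web search engine hit counts that KNOWITALL actually queries.

# Illustrative sketch of PMI-based fact assessment. hits() and TOY_CORPUS
# are hypothetical stand-ins for Web search engine hit counts.
TOY_CORPUS = [
    "the epson 1200 is a scanner with a large cover",
    "i bought the epson 1200 yesterday",
    "this scanner produces sharp images",
]

def hits(phrase: str) -> int:
    """Stand-in for a Web hit count: toy-corpus lines containing the phrase."""
    return sum(phrase in line for line in TOY_CORPUS)

def pmi(fact: str, discriminator: str) -> float:
    """PMI(f, d) = Hits(f + d) / (Hits(f) * Hits(d)), where f + d is the
    discriminator phrase instantiated with the candidate fact."""
    joint = hits(f"{fact} {discriminator}")  # e.g. "epson 1200 is a scanner"
    denom = hits(fact) * hits(discriminator)
    return joint / denom if denom else 0.0

# pmi("epson 1200", "is a scanner") -> 0.5 on this toy corpus; such scores
# are thresholded into binary features for the Naive Bayes classifier.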
Given product class C with instances I and reviews R, OPINE's goal is to find the set of (feature, opinions) tuples {(f, o_i, ..., o_j)} s.t. f ∈ F and o_i, ..., o_j ∈ O, where:

a) F is the set of product class features in R;
b) O is the set of opinion phrases in R;
c) the opinions associated with a particular feature are ranked based on their strength.

OPINE's solution to this task is outlined in Figure 1. In the following, we describe each step in detail.

[Figure 1: OPINE Overview. Input: product class C, reviews R. Output: set of (feature, ranked opinion list) tuples.]

Explicit Feature Extraction. OPINE parses the reviews using the MINIPAR dependency parser (Lin, 1998) and applies a simple pronoun-resolution module to the parsed data. The system then finds explicitly mentioned product features (E) using an extended version of KNOWITALL's extract-and-assess strategy described above. OPINE extracts the following types of product features: properties, parts, features of product parts (e.g., ScannerCoverSize), related concepts (e.g., Image is related to Scanner), and parts and properties of related concepts (e.g., ImageSize). When compared on this task with the most relevant previous review-mining system (Hu and Liu, 2004), OPINE obtains a 22% improvement in precision with only a 3% reduction in recall on the 5 relevant datasets. One third of this increase is due to OPINE's feature assessment step and the rest is due to the use of Web PMI statistics.

Opinion Phrases. OPINE extracts adjective, noun, verb, and adverb phrases attached to explicit features as potential opinion phrases. OPINE then collectively assigns positive, negative, or neutral semantic orientation (SO) labels to their respective head words. This problem is similar to labeling problems in computer vision, and OPINE uses a well-known computer vision technique, relaxation labeling, as the basis of a 3-step SO label assignment procedure. First, OPINE identifies the average SO label for a word w in the context of the review set. Second, OPINE identifies the average SO label for each word w in the context of a feature f and of the review set ("hot" has a negative connotation in "hot room", but a positive one in "hot water"). Finally, OPINE identifies the SO label of word w in the context of feature f and sentence s. For example, some people like large scanners ("I love this large scanner") and some do not ("I hate this large scanner"). The phrases with non-neutral head words are retained as opinion phrases, and their polarity is established accordingly. On the task of opinion phrase extraction, OPINE obtains a precision of 79% and a recall of 76%; on the task of opinion phrase polarity extraction, it obtains a precision of 86% and a recall of 84%.

Implicit Features. Opinion phrases refer to properties, which are sometimes implicit (e.g., "tiny phone" refers to the phone's size). In order to extract such properties, OPINE first clusters opinion phrases (e.g., tiny and small will be placed in the same cluster), automatically labels the clusters with property names (e.g., Size), and uses them to build implicit features (e.g., PhoneSize). Opinion phrases are clustered using a mixture of WordNet information (e.g., antonyms are placed in the same cluster) and lexical pattern information (e.g., "clean, almost spotless" suggests that "clean" and "spotless" are likely to refer to the same property). The system of Hu and Liu (2004) does not handle implicit features, so we have evaluated the impact of implicit feature extraction on two separate sets of reviews in the Hotels and Scanners domains. Extracting implicit features (in addition to explicit features) resulted in a 2% increase in precision and a 6% increase in recall for OPINE on the feature extraction task.

Ranking Opinion Phrases. Given an opinion cluster, OPINE uses the final probabilities associated with the SO labels in order to derive an initial opinion phrase strength ranking (e.g., great > good > average) in the manner of (Turney, 2003). OPINE then uses Web-derived constraints on the relative strength of phrases in order to improve this ranking. Patterns such as "a1, (*) even a2" are good indicators of how strong a1 is relative to a2. OPINE bootstraps a set of such patterns and instantiates them with pairs of opinions in order to derive constraints such as strength(deafening) > strength(loud). OPINE also uses synonymy- and antonymy-based constraints such as strength(clean) = strength(dirty). The constraint set induces a constraint satisfaction problem whose solution is a ranking of the respective cluster opinions (the remaining opinions maintain their default ranking). OPINE's accuracy on the opinion ranking task is 87%. Finally, OPINE outputs a set of (feature, ranked opinions) tuples for each product.
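To make the constraint-solving step concrete, here is a small Python sketch (ours, not OPINE's actual solver); the constraint list is hypothetical, and a plain topological sort over the strength(a) > strength(b) constraints stands in for the paper's constraint satisfaction machinery.

from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical constraints of the kind OPINE bootstraps from patterns such as
# "a1, (*) even a2": each pair asserts strength(stronger) > strength(weaker).
STRONGER_THAN = [
    ("deafening", "loud"),
    ("loud", "noisy"),
]

def rank_by_strength(pairs):
    """Order opinion words from strongest to weakest by topologically sorting
    the 'stronger than' constraints. Equality constraints such as
    strength(clean) = strength(dirty) would merge nodes before sorting."""
    graph = defaultdict(set)  # node -> set of nodes that must precede it
    for stronger, weaker in pairs:
        graph[weaker].add(stronger)  # the stronger word must come first
        graph.setdefault(stronger, set())
    return list(TopologicalSorter(graph).static_order())

# rank_by_strength(STRONGER_THAN) -> ['deafening', 'loud', 'noisy'],
# i.e., deafening > loud > noisy; unconstrained opinions keep their
# default PMI-derived ranking.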