<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2079"> <Title>Examining the Role of Linguistic Knowledge Sources in the Automatic Identification and Classification of Reviews</Title> <Section position="4" start_page="611" end_page="612" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="611" end_page="611" type="sub_section"> <SectionTitle> 2.1 Review Identification </SectionTitle> <Paragraph position="0"> As noted in the introduction, while a review can contain both subjective and objective phrases, our non-reviews are essentially factual documents in which subjective expressions can rarely be found.</Paragraph> <Paragraph position="1"> Hence, review identification can be viewed as an instance of the broader task of classifying whether a document is mostly factual/objective or mostly opinionated/subjective. There have been attempts at tackling this so-called document-level subjectivity classification task, with very encouraging results (see Yu and Hatzivassiloglou (2003) and Wiebe et al. (2004) for details).</Paragraph> </Section> <Section position="2" start_page="611" end_page="612" type="sub_section"> <SectionTitle> 2.2 Polarity Classification </SectionTitle> <Paragraph position="0"> There is a large body of work on classifying the polarity of a document (e.g., Pang et al. (2002), Turney (2002)), a sentence (e.g., Liu et al. (2003), Yu and Hatzivassiloglou (2003), Kim and Hovy (2004), Gamon et al. (2005)), a phrase (e.g., Wilson et al. (2005)), and a specific object (such as a product) mentioned in a document (e.g., Morinaga et al. (2002), Yi et al. (2003), Popescu and Etzioni (2005)). Below we will center our discussion of related work around the four types of features we will explore for polarity classification.</Paragraph> <Paragraph position="1"> Higher-order n-grams. 
While n-grams offer a simple way of capturing context, previous work has rarely explored the use of n-grams beyond unigrams as features in a polarity classification system. Two notable exceptions are the work of Dave et al. (2003) and Pang et al. (2002). Interestingly, while Dave et al. report good performance on classifying reviews using bigrams or trigrams alone, Pang et al. show that bigrams are not useful features for the task, whether they are used in isolation or in conjunction with unigrams. This motivates us to take a closer look at the utility of higher-order n-grams in polarity classification.</Paragraph> <Paragraph position="2"> Manually-tagged term polarity. Much work has been performed on learning to identify and classify polarity terms (i.e., terms expressing a positive sentiment (e.g., happy) or a negative sentiment (e.g., terrible)) and exploiting them for polarity classification (e.g., Hatzivassiloglou and McKeown (1997), Turney (2002), Kim and Hovy (2004), Whitelaw et al. (2005), Esuli and Sebastiani (2005)). Though reasonably successful, these (semi-)automatic techniques often yield lexicons that have either high coverage/low precision or low coverage/high precision. While manually constructed positive and negative word lists exist (e.g., General Inquirer1), they too suffer from low coverage. This prompts us to manually construct our own polarity word lists2 and study their use in polarity classification.</Paragraph> <Paragraph position="3"> Dependency relations. There have been several attempts at extracting features for polarity classification from dependency parses, but most focus on extracting specific types of information such as adjective-noun relations (e.g., Dave et al. (2003), Yi et al. (2003)) or nouns that enjoy a dependency relation with a polarity term (e.g., Popescu and Etzioni (2005)). Wilson et al. 
(2005) extract a larger variety of features from dependency parses, but unlike us, their goal is to determine the polarity of a phrase, not a document. In comparison to previous work, we investigate the use of a larger set of dependency relations for classifying reviews.</Paragraph> <Paragraph position="4"> Objective information. The objective portions of a review do not contain the author's opinion; hence, features extracted from objective sentences and phrases are irrelevant to polarity classification, and their presence may complicate the learning task. Indeed, recent work has shown that gains can be obtained by first separating facts from opinions in a document (e.g., Yu and Hatzivassiloglou (2003)) and classifying the polarity based solely on the subjective portions of the document (e.g., Pang and Lee (2004)). Motivated by the work of Koppel and Schler (2005), we identify and extract objective material from non-reviews and show how to exploit such information in polarity classification.</Paragraph> <Paragraph position="5"> Finally, previous work has also investigated features that do not fall into any of the above categories. For instance, instead of representing the polarity of a term as a binary value, Mullen and Collier (2004) use Turney's (2002) method to assign a real value to term polarity and introduce a variety of numerical features that are aggregate measures of the polarity values of terms selected from the document under consideration.</Paragraph> </Section> </Section> </Paper>