<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2008">
  <Title>Using Emoticons to reduce Dependency in Machine Learning Techniques for Sentiment Classification</Title>
  <Section position="3" start_page="0" end_page="43" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recent years have seen an increasing amount of research effort expended in the area of understanding sentiment in textual resources. A sub-topic of this research is that of Sentiment Classification. That is, given a problem text, can computational methods determine if the text is generally positive or generally negative? Several diverse applications exist for this potential technology, ranging from the automatic filtering of abusive messages (Spertus, 1997) to an in-depth analysis of market trends and consumer opinions (Dave et al., 2003). This is a complex and challenging task for a computer to achieve -- consider the difficulties involved in instructing a computer to recognise sarcasm, for example.</Paragraph>
    <Paragraph position="1"> Previous work has shown that traditional text classification approaches can be quite effective when applied to the sentiment analysis problem. Models such as Naïve Bayes (NB), Maximum Entropy (ME) and Support Vector Machines (SVM) can determine the sentiment of texts. Pang et al. (2002) used a bag-of-features framework (based on unigrams and bigrams) to train these models from a corpus of movie reviews labelled as positive or negative. The best accuracy achieved was 82.9%, using an SVM trained on unigram features. A later study (Pang and Lee, 2004) found that performance increased to 87.2% when considering only those portions of the text deemed to be subjective.</Paragraph>
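The bag-of-features idea described above can be sketched with a toy unigram Naïve Bayes classifier. This is an illustrative reconstruction only, not the models, features or data used by Pang et al. (2002); the tiny corpus, the whitespace tokeniser and the Laplace smoothing are all assumptions for the sake of a runnable example.

```python
from collections import Counter
import math

def tokenize(text):
    # Lowercase unigram features, as in a minimal bag-of-features setup.
    return text.lower().split()

def train_nb(docs):
    # docs: list of (text, label) pairs.
    # Returns, per class: log prior, Laplace-smoothed unigram log
    # likelihoods, and the log probability reserved for unseen words.
    counts = {}              # label -> Counter of word frequencies
    doc_counts = Counter()   # label -> number of training documents
    vocab = set()
    for text, label in docs:
        doc_counts[label] += 1
        c = counts.setdefault(label, Counter())
        for w in tokenize(text):
            c[w] += 1
            vocab.add(w)
    total_docs = sum(doc_counts.values())
    model = {}
    for label, c in counts.items():
        total = sum(c.values())
        model[label] = (
            math.log(doc_counts[label] / total_docs),
            {w: math.log((c[w] + 1) / (total + len(vocab))) for w in vocab},
            math.log(1 / (total + len(vocab))),
        )
    return model

def classify(model, text):
    # Pick the class with the highest posterior log probability.
    def score(label):
        prior, likes, unseen = model[label]
        return prior + sum(likes.get(w, unseen) for w in tokenize(text))
    return max(model, key=score)

# Hypothetical four-review "corpus" standing in for labelled movie reviews.
docs = [
    ("a gripping and moving film", "pos"),
    ("brilliant acting and a gripping plot", "pos"),
    ("a dull and predictable film", "neg"),
    ("dull plot and wooden acting", "neg"),
]
model = train_nb(docs)
print(classify(model, "gripping plot"))   # -> pos
```

The same pipeline shape (tokenise, count, weight, classify) underlies the NB, ME and SVM variants; only the way feature weights are estimated differs.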
    <Paragraph position="2"> However, Engström (2004) showed that the bag-of-features approach is topic-dependent. A classifier trained on movie reviews is unlikely to perform as well on (for example) reviews of automobiles. Turney (2002) noted that the unigram "unpredictable" might have a positive sentiment in a movie review (e.g. unpredictable plot), but could be negative in the review of an automobile (e.g. unpredictable steering). In this paper, we demonstrate how the models are also domain-dependent -- how a classifier trained on product reviews is not effective when evaluating the sentiment of newswire articles, for example. Furthermore, we show how the models are temporally-dependent -- how classifiers are biased by the trends of sentiment apparent during the time-period represented by the training data.</Paragraph>
    <Paragraph position="3"> We propose a novel source of training data based on the language used in conjunction with emoticons in Usenet newsgroups. Training a classifier using this data provides a breadth of features that, while it does not perform to the state-of-the-art, could function independently of domain, topic and time.</Paragraph>
    [Table caption fragment: accuracies, in percent. Best performance on a test set for each model is highlighted in bold.]
  </Section>
</Paper>