<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2081"> <Title>Whose thumb is it anyway? Classifying author personality from weblog text</Title> <Section position="3" start_page="0" end_page="627" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> There is now considerable interest in affective language processing. Work focusses on analysing subjective features of text or speech, such as sentiment, opinion, emotion or point of view (Pang et al., 2002; Turney, 2002; Dave et al., 2003; Liu et al., 2003; Pang and Lee, 2005; Shanahan et al., 2005). Discussing affective computing in general, Picard (1997) notes that phenomena vary in duration, ranging from short-lived feelings, through emotions, to moods, and ultimately to long-lived, slowly-changing personality characteristics.</Paragraph> <Paragraph position="1"> Within computational linguistics, most work has focussed on sentiment and opinion concerning specific entities or events, and on binary classifications of these. For instance, both Pang et al. (2002) and Turney (2002) consider the thumbs up/thumbs down decision: is a film review positive or negative? However, Pang and Lee (2005) point out that ranking items or comparing reviews will benefit from finer-grained classifications, over multiple ordered classes: is a film review two- or three- or four-star? At the same time, some work now considers longer-term affective states. For example, Mishne (2005) aims to classify the primary mood of weblog postings; the study encompasses both fine-grained (but non-ordered) multiple classification (frustrated/loved/etc.)
and coarse-grained binary classification (active/passive, positive/negative).</Paragraph> <Paragraph position="2"> This paper is about the move to finer-grained multiple classifications; and also about weblogs.</Paragraph> <Paragraph position="3"> But it is also about even more persistent affective states; in particular, it focusses on classifying author personality. We would argue that ongoing work on sentiment analysis or opinion-mining stands to benefit from progress on personality classification. The reason is that people vary in personality, and they vary in how they appraise events, and hence in how strongly they phrase their praise or condemnation. Reiter and Sripada (2004) suggest that lexical choice may sometimes be determined by a writer's idiolect, that is, their personal language preferences. We suggest that while idiolect can be a matter of accident or experience, it may also reflect systematic, personality-based differences. This can help explain why, as Pang and Lee (2005) note, one person's four-star review is another's two-star. To put it more bluntly, if you're not a very outgoing sort of person, then your thumbs up might be mistaken for someone else's thumbs down. But how do we distinguish such people? Or, if we spot a thumbs-up review, how can we tell whose thumb it is, anyway? The paper is structured as follows. It introduces trait theories of personality, notes work to date on personality classification, and raises some questions. It then outlines the weblog corpus and the experiments, which compare classification accuracies for four personality dimensions, seven tasks, and five feature selection policies. We discuss the implications of the results, and related work, and end with suggestions for next steps.</Paragraph> </Section> </Paper>