File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/n06-2022_metho.xml
Size: 6,107 bytes
Last Modified: 2025-10-06 14:10:13
<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2022"> <Title>Automatic Recognition of Personality in Conversation</Title> <Section position="3" start_page="0" end_page="86" type="metho"> <SectionTitle> 2 Experimental method </SectionTitle> <Paragraph position="0"> Our approach can be summarized in five steps: (1) collect individual corpora; (2) collect personality ratings for each participant; (3) extract relevant features from the texts; (4) build statistical models of the personality ratings based on the features; and (5) test the learned models on the linguistic outputs of unseen individuals.</Paragraph> <Section position="1" start_page="85" end_page="85" type="sub_section"> <SectionTitle> 2.1 Spoken language and personality ratings </SectionTitle> <Paragraph position="0"> The data consists of daily-life conversation extracts from 96 participants who wore an Electronically Activated Recorder (EAR) for two days, collected by Mehl et al. (in press). To preserve the participants' privacy, only random snippets of conversation were recorded, and only the participants' own utterances were transcribed, making it impossible to reconstruct whole conversations. The corpus contains 97,468 words and 15,269 utterances. Table 1 shows utterances for two participants judged as introvert and extravert.</Paragraph> <Paragraph position="1"> Introvert:
- Yeah you would do kilograms. Yeah I see what you're saying.
- On Tuesday I have class. I don't know.</Paragraph> <Paragraph position="2">
- I don't know. A16. Yeah, that is kind of cool.</Paragraph> <Paragraph position="3">
- I don't know. I just can't wait to be with you and not have to do this every night, you know?
- Yeah. You don't know. Is there a bed in there? Well ok just...
Extravert:
- That's my first yogurt experience here. Really watery. Why?
- Damn. New game.</Paragraph> <Paragraph position="4">
- Oh.</Paragraph> <Paragraph position="5">
- Yeah, but he, they like each other. He likes her.
- They are going to end up breaking up and he's going to be like.
Table 1: Utterances of two participants rated as extremely introvert and extravert.</Paragraph> <Paragraph position="6"> Between 5 and 7 independent observers scored each extract using the Big Five Inventory (John and Srivastava, 1999). Mehl et al. (in press) report strong inter-observer reliabilities for all dimensions (r = 0.84, p &lt; 0.01). The ratings were averaged across observers to obtain the scores used in our experiments.</Paragraph> </Section> <Section position="2" start_page="85" end_page="85" type="sub_section"> <SectionTitle> 2.2 Feature selection </SectionTitle> <Paragraph position="0"> Features are extracted automatically from each conversation extract (see Table 2). We compute the proportion of words in each category of the LIWC utility (Pennebaker et al., 2001), as those features are correlated with the Big Five dimensions (Pennebaker and King, 1999).</Paragraph> <Paragraph position="1"> Additional psychological characteristics were computed by averaging word feature counts from the MRC psycholinguistic database (Coltheart, 1981).</Paragraph> <Paragraph position="2"> In an attempt to capture initiative-taking in conversation (Walker and Whittaker, 1990; Furnham, 1990), we introduce utterance type features using heuristics on the parse tree to tag each utterance as a command, prompt, question or assertion.
Overall tagging accuracy over 100 randomly selected utterances is 88%.</Paragraph> <Paragraph position="3"> As personality influences speech, we also use Praat (Boersma, 2001) to compute prosodic features characterizing the voice's pitch, intensity, and speech rate.</Paragraph> <Paragraph position="4"> Table 2 (excerpt): LIWC feature categories, with labels in brackets.
- Affective or emotional processes (Affect): positive emotions (Posemo), positive feelings (Posfeel), optimism and energy (Optim), negative emotions (Negemo), anxiety or fear (Anx), anger (Anger), sadness (Sad)
- Cognitive processes (Cogmech): causation (Cause), insight (Insight), discrepancy (Discrep), inhibition (Inhib), tentative (Tentat), certainty (Certain)
- Sensory and perceptual processes (Senses): seeing (See), hearing (Hear), feeling (Feel)
- Social processes (Social): communication (Comm), other references to people (Othref), friends (Friends), family (Family), humans (Humans)
* RELATIVITY:
- Time (Time), past tense verb (Past), present tense verb (Present), future tense verb (Future)
- Space (Space): up (Up), down (Down), inclusive (Incl), exclusive (Excl)
- Motion (Motion)
* PERSONAL CONCERNS:
- Occupation (Occup): school (School), work and job (Job), achievement (Achieve)
- Leisure activity (Leisure): home (Home), sports (Sports), television and movies (TV), music (Music)
- Money and financial issues (Money)
- Metaphysical issues (Metaph): religion (Relig), death (Death), physical states and functions (Physcal), body states and symptoms (Body), sexuality (Sexual), eating and drinking (Eating), sleeping (Sleep),</Paragraph> </Section> <Section position="3" start_page="85" end_page="86" type="sub_section"> <SectionTitle> 2.3 Statistical model </SectionTitle> <Paragraph position="0"> By definition, personality evaluation assesses relative differences between individuals, e.g. one person is described as an extravert because the average population is not.
Thus, we formulate personality recognition as a ranking problem: given two individuals' extracts, which shows more extraversion? For each Big Five trait, a model is trained on the observers' personality ratings using RankBoost, a boosting algorithm for ranking (Freund et al., 1998). RankBoost expresses the learned models as rules, which support the analysis of differences in the personality models (see Section 3). Each rule modifies the conversation extract's ranking score by α whenever a feature value exceeds an experimentally learned threshold, e.g. Rule 1 of the extraversion model in Table 4 increases the score of an extract by α = 1.43 if the speech rate is above 0.73 words per second. Models are evaluated by a ranking error function which reports the percentage of misordered pairs of conversation extracts.</Paragraph> </Section> </Section> </Paper>
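The rule-based scoring and the ranking error described in Section 2.3 can be illustrated with a minimal Python sketch. It is not the RankBoost training procedure and not the authors' implementation: it only applies already-learned threshold rules and counts misordered pairs. The rule format, the function names, the `word_count` rule, and the example ratings are hypothetical; only the speech-rate rule (threshold 0.73 words per second, increment 1.43) comes from the text. Counting score ties as misordered and skipping pairs with equal ratings are assumptions as well.

```python
from itertools import combinations

# Hypothetical rule format: (feature name, threshold, score increment).
# Only the first rule's numbers come from the text; the second is invented.
RULES = [
    ("speech_rate", 0.73, 1.43),
    ("word_count", 50.0, 0.50),  # hypothetical rule, for illustration only
]


def ranking_score(features, rules=RULES):
    """Sum the increments of all rules whose feature exceeds its threshold."""
    return sum(
        increment
        for name, threshold, increment in rules
        if features.get(name, 0.0) > threshold
    )


def ranking_error(scores, ratings):
    """Fraction of extract pairs ordered differently by scores and ratings.

    Pairs with equal ratings carry no ordering and are skipped (assumption).
    Pairs with tied scores are counted as misordered (assumption).
    """
    pairs = [
        (i, j)
        for i, j in combinations(range(len(ratings)), 2)
        if ratings[i] != ratings[j]
    ]
    if not pairs:
        return 0.0
    misordered = sum(
        1
        for i, j in pairs
        if (scores[i] - scores[j]) * (ratings[i] - ratings[j]) <= 0
    )
    return misordered / len(pairs)


# Three hypothetical extracts with made-up observer ratings.
extracts = [
    {"speech_rate": 0.9, "word_count": 80.0},
    {"speech_rate": 0.5, "word_count": 60.0},
    {"speech_rate": 0.8, "word_count": 20.0},
]
ratings = [5.5, 4.0, 3.0]
scores = [ranking_score(e) for e in extracts]
error = ranking_error(scores, ratings)
print(f"ranking error: {error:.2%}")
```

In this toy example the second and third extracts are scored in the opposite order to their ratings, so one of the three pairs is misordered. Multiplying the returned fraction by 100 gives the percentage form mentioned in the text.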