<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1073"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 579-586, Vancouver, October 2005. ©2005 Association for Computational Linguistics Emotions from text: machine learning for text-based emotion prediction</Title> <Section position="10" start_page="582" end_page="583" type="evalu"> <SectionTitle> 5 Results and discussion </SectionTitle> <Paragraph position="0"> This section first presents the results from experiments with the two confusion sets described above, as well as from the feature experimentation.</Paragraph> <Section position="1" start_page="582" end_page="583" type="sub_section"> <SectionTitle> 5.1 Classification results </SectionTitle> <Paragraph position="0"> Average accuracy from 10-fold cross-validation for the first experiment, i.e. classifying sentences as either NEUTRAL or EMOTIONAL, is included in table 5 and figure 1 for the two tuning conditions on the main feature sets and baselines. As expected, the degree of success reflects the parameter settings, both for content BOW and for all features. Nevertheless, under these circumstances, performance above a naïve baseline and a BOW approach is obtained. Moreover, sequencing shows potential to contribute in one case. However, the observations also point to three issues: first, the current data set appears to be too small. Second, the data is not easily separable. This comes as no surprise, given the subjective nature of the task and the rather low interannotator agreement reported above. Moreover, despite the schematic narrative plots of children's stories, tales still differ in their overall affective orientation, which increases data complexity. 
Third and finally, the EMOTION class conflates several basic emotion labels, rather than reflecting a single annotated label.</Paragraph> <Paragraph position="1"> More detailed averaged results from 10-fold cross-validation are included in table 6, using all features and the separated tuning and evaluation data condition sep-tune-eval. With these parameters, approximately 3% improvement in accuracy over the naïve baseline P(Neutral) was recorded, and 5% over the content BOW, which obviously did poorly with these parameters. Moreover, precision is higher than recall for the combined EMOTION class.</Paragraph> <Paragraph position="2"> In comparison, with the same-tune-eval procedure, the accuracy improved by approximately 9% over P(Neutral) and by 8% over content BOW.</Paragraph> <Paragraph position="3"> In the second experiment, the emotion category was split into two classes: emotions with positive versus negative valence. The results in terms of precision, recall, and F-score are included in table 7, using all features and the sep-tune-eval condition. The decrease in performance for the emotion classes mirrors the smaller amounts of data available for each class. As noted in section 4.3, only 9.87% of the sentences were annotated with a positive emotion, and the results for this class are worse. Thus, performance seems likely to improve as more annotated story data becomes available; at this point, we are experimenting with only around 12% of the total texts targeted by the data annotation project.</Paragraph> </Section> <Section position="2" start_page="583" end_page="583" type="sub_section"> <SectionTitle> 5.2 Feature experiments </SectionTitle> <Paragraph position="0"> Emotions are poorly understood, and it is especially unclear which features may be important for their recognition from text. Thus, we experimented with different feature configurations. 
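As a toy illustration of this kind of feature experimentation, the cumulative subtraction of feature groups can be sketched as follows. The group names, weights, interaction bonus, and accuracy function below are entirely hypothetical stand-ins, not the paper's actual features or classifier:

```python
def ablation_curve(groups, evaluate, order):
    """Remove feature groups in `order`, recording accuracy at each step.

    `evaluate` maps the set of still-active feature groups to an accuracy
    score; in a real experiment it would retrain the classifier on the
    remaining features and report cross-validated accuracy.
    """
    active = set(groups)
    curve = [("all", evaluate(active))]
    for group in order:
        active.discard(group)
        curve.append((f"-{group}", evaluate(active)))
    return curve

# Hypothetical feature groups and a made-up accuracy function: each group
# adds a fixed amount, and two groups interact (their joint presence is
# worth more than the sum of their parts), so removal order matters.
groups = ["bow", "syntactic", "orthographic", "conjunctions", "story_progress"]
weights = {"bow": 0.10, "syntactic": 0.02, "orthographic": 0.03,
           "conjunctions": 0.04, "story_progress": 0.05}

def toy_accuracy(active):
    acc = 0.50 + sum(weights[g] for g in active)
    if "syntactic" in active and "conjunctions" in active:
        acc += 0.02  # interaction bonus: features work together
    return round(acc, 4)

curve = ablation_curve(groups, toy_accuracy,
                       order=["orthographic", "syntactic", "conjunctions"])
for step, acc in curve:
    print(f"{step:15s} {acc:.4f}")
```

In this sketch, removing `syntactic` while `conjunctions` is still active costs its own weight plus the interaction bonus; had `conjunctions` been removed first, the same removal would cost less, mirroring the observation that the subtraction order matters.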
Starting with all features, again using 10-fold cross-validation under the separated tuning-evaluation condition sep-tune-eval, feature groups were removed one at a time until none remained. The feature groups are listed in table 8. Figure 2 on the next page shows the accuracy at each step of the cumulative subtraction process. While some feature groups, e.g. the syntactic one, appeared less important, the removal order mattered; e.g. if syntactic features were removed first, accuracy decreased. This also illustrates that the features work together; removing any one group degraded performance, because the features interact and are not truly independent. The features' contributions were also observed to be sensitive to parameter tuning. Clearly, further work on developing features which fit the TEP problem is needed.</Paragraph> </Section> </Section> <Section position="11" start_page="583" end_page="585" type="evalu"> <SectionTitle> 6 Refining the model </SectionTitle> <Paragraph position="0"> This was a &quot;first pass&quot; at addressing TEP for TTS.</Paragraph> <Paragraph position="1"> At this point, the annotation project is still ongoing, and we only had a fairly small data set to draw on.</Paragraph> <Paragraph position="2"> Nevertheless, the results indicate that our learning approach benefits emotion recognition. For example, the following instances, also labeled with the same valence by both annotators, were correctly classified both in the binary (N vs. 
E) and the tripartite polarity task (N, NE, PE), given the separated tuning and evaluation data condition, and using all features: (1a) E/NE: Then he offered the dwarfs money, and prayed and besought them to let him take her away; but they said, &quot;We will not part with her for all the gold in the world.&quot; (1b) N: And so the little girl really did grow up; her skin was as white as snow, her cheeks as rosy as the blood, and her hair as black as ebony; and she was called Snowdrop.</Paragraph> <Paragraph position="3"> (2a) E/NE: &quot;Ah,&quot; she answered, &quot;have I not reason to weep? (2b) N: Nevertheless, he wished to try him first, and took a stone in his hand and squeezed it together so that water dropped out of it.</Paragraph> <Paragraph position="4"> Cases (1a) and (1b) are from the well-known FOLK TALE Snowdrop, also called Snow White. (1a) and (1b) are also correctly classified by the simple content BOW approach, although our approach has higher prediction confidence for the E/NE case (1a); it also considers, e.g., direct speech, a fairly high verb count, advanced story progress, connotative words and their conjunctions with story progress features, all of which the BOW misses. In addition, the simple content BOW approach makes incorrect predictions at both the binary and tripartite levels for examples (2a) and (2b), from the JOKES AND ANECDOTES stories Clever Hans and The Valiant Little Tailor, while our classifier captures the affective differences by considering, e.g. 
distinctions in verb count, interjections, POS, sentence length, connotations, story subtype, and conjunctions.</Paragraph> <Paragraph position="5"> Next, we intend to use a larger data set to conduct a more complete study and establish mature findings.</Paragraph> <Paragraph position="6"> We also plan to explore finer emotional meaning distinctions, using a hierarchical sequential model which better corresponds to the different levels of cognitive difficulty in emotional categorization by humans, and to classify the full set of basic-level emotional categories discussed in section 4.3. Sequential modeling of simple classifiers has been successfully applied to question classification, for example by (Li and Roth, 2002). In addition, we are working on refining and improving the feature set; given more data, tuning can also be improved on a sufficiently large development set. The three subcorpora in the annotation project can reveal how authorship affects emotion perception and classification.</Paragraph> <Paragraph position="7"> Moreover, arousal appears to be an important dimension for emotional prosody (Scherer, 2003), especially in storytelling (Alm and Sproat, 2005).</Paragraph> <Paragraph position="8"> Thus, we plan to explore degrees of emotional intensity in a learning scenario, i.e. a problem similar to measuring the strength of opinion clauses (Wilson, Wiebe and Hwa, 2004).</Paragraph> <Paragraph position="9"> Finally, emotions are not discrete objects; rather, they have a transitional nature, blending and overlapping along the temporal dimension. For example, (Liu, Lieberman and Selker, 2003) include parallel estimations of emotional activity, as well as smoothing techniques such as interpolation and decay, to capture sequential and interactive emotional activity.</Paragraph> <Paragraph position="10"> Observations from the tales indicate that some emotions are more likely to be prolonged than others.</Paragraph> </Section> </Paper>