Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pages 208–215,
Sydney, July 2006. c©2006 Association for Computational Linguistics
Humor: Prosody Analysis and Automatic Recognition
for F * R * I * E * N * D * S *
Amruta Purandare and Diane Litman
Intelligent Systems Program
University of Pittsburgh
CUamruta,litmanCV@cs.pitt.edu
Abstract
We analyze humorous spoken conversa-
tions from a classic comedy television
show, FRIENDS, by examining acoustic-
prosodic and linguistic features and their
utility in automatic humor recognition.
Using a simple annotation scheme, we au-
tomatically label speaker turns in our cor-
pus that are followed by laughs as hu-
morous and the rest as non-humorous.
Our humor-prosody analysis reveals sig-
nificant differences in prosodic character-
istics (such as pitch, tempo, energy etc.)
of humorous and non-humorous speech,
even when accounted for the gender and
speaker differences. Humor recognition
was carried out using standard supervised
learning classifiers, and shows promising
results significantly above the baseline.
1 Introduction
As conversational systems are becoming preva-
lent in our lives, we notice an increasing need for
adding social intelligence in computers. There has
been a considerable amount of research on incor-
porating affect (Litman and Forbes-Riley, 2004)
(Alm et al., 2005) (D’Mello et al., 2005) (Shroder
and Cowie, 2005) (Klein et al., 2002) and person-
ality (Gebhard et al., 2004) in computer interfaces,
so that, for instance, user frustrations can be rec-
ognized and addressed in a graceful manner. As
(Binsted, 1995) correctly pointed out, one way to
alleviate user frustrations, and to make human-
computer interaction more natural, personal and
interesting for the users, is to model HUMOR.
Research in computational humor is still in
very early stages, partially because humorous lan-
guage often uses complex, ambiguous and incon-
gruous syntactic and semantic expressions (At-
tardo, 1994) (Mulder and Nijholt, 2002) which re-
quire deep semantic interpretation. Nonetheless,
recent studies have shown a feasibility of auto-
matically recognizing (Mihalcea and Strapparava,
2005) (Taylor and Mazlack, 2004) and generating
(Binsted and Ritchie, 1997) (Stock and Strappar-
ava, 2005) humor in computer systems. The state
of the art research in computational humor (Bin-
sted et al., 2006) is, however, limited to text (such
as humorous one-liners, acronyms or wordplays),
and to our knowledge, there has been no work to
date on automatic humor recognition in spoken
conversations.
Before we can model humor in real application
systems, we must first analyze features that char-
acterize humor. Computational approaches to hu-
mor recognition so far primarily rely on lexical
and stylistic cues such as alliteration, antonyms,
adult slang (Mihalcea and Strapparava, 2005). The
focus of our study is, on the other hand, on ana-
lyzing acoustic-prosodic cues (such as pitch, in-
tensity, tempo etc.) in humorous conversations
and testing if these cues can help us to auto-
matically distinguish between humorous and non-
humorous (normal) utterances in speech. We hy-
pothesize that not only the lexical content but
also the prosody (or how the content is expressed)
makes humorous expressions humorous.
The following sections describe our data collec-
tion and pre-processing, followed by the discus-
sion of various acoustic-prosodic as well as other
types of features used in our humorous-speech
analysis and classification experiments. We then
present our experiments, results, and finally end
with conclusions and future work.
208
2 FRIENDS Corpus
(Scherer, 2003) discuss a number of pros and cons
of using real versus acted data, in the context of
emotional speech analysis. His main argument is
that while real data offers natural expressions of
emotions, it is not only hard to collect (due to eth-
ical issues) but also very challenging to annotate
and analyze, as there are very few instances of
strong expressions and the rest are often very sub-
tle. Acted data (also referred to as portrayed or
simulated), on the other hand, offers ample of pro-
totypical examples, although these are criticized
for not being natural at times. To achieve some
balance between naturalness and strength/number
of humorous expressions, we decided to use di-
alogs from a comedy television show FRIENDS,
which provides classical examples of casual, hu-
morous conversations between friends who often
discuss very real-life issues, such as job, career,
relationships etc.
We collected a total of 75 dialogs (scenes) from
six episodes of FRIENDS, four from Season I
(Monica Gets a New Roommate, The One with
Two Parts: Part 1 and 2, All the Poker) and two
from Season II (Ross Finds Out, The Prom Video),
all available on The Best of Friends Volume I
DVD. This gave us approximately 2 hrs of audio.
Text transcripts of these episodes were obtained
from: http://www.friendscafe.org/scripts.shtml,
and were used to extract lexical features (used later
in classification).
Figure 1 shows an excerpt from one of the di-
alogs in our corpus.
3 Audio Segmentation and Annotation
We segmented each audio file (manually) by mark-
ing speaker turn boundaries, using Wavesurfer
(http://www.speech.kth.se/wavesurfer). We apply
a fairly straightforward annotation scheme to au-
tomatically identify humorous and non-humorous
turns in our corpus. Speaker turns that are fol-
lowed by artificial laughs are labeled as Humor-
ous, and all the rest as Non-Humorous. For ex-
ample, in the dialog excerpt shown in figure 1,
turns 3, 7, 9, 11 and 16 are marked as humor-
ous, whereas turns 1, 2, 5, 6, 13, 14, 15 are
marked as non-humorous. Artificial laughs, si-
lences longer than 1 second and segments of au-
dio that contain purely non-verbal sounds (such
as phone rings, door bells, music etc.) were ex-
cluded from the analysis. By considering only
[1] Rachel: Guess what?
[2] Ross: You got a job?
[3] Rachel: Are you kidding? I am trained for
nothing!
[4] BOLaughterBQ
[5] Rachel: I was laughed out of twelve inter-
views today.
[6] Chandler: And yet you’re surprisingly up-
beat.
[7] Rachel: You would be too if you found John
and David boots on sale, fifty percent off!
[8] BOLaughterBQ
[9] Chandler: Oh, how well you know me...
[10] BOLaughterBQ
[11] Rachel: They are my new, I don’t need a job,
I don’t need my parents, I got great boots, boots!
[12] BOLaughterBQ
[13] Monica: How’d you pay for them?
[14] Rachel: Uh, credit card.
[15] Monica: And who pays for that?
[16] Rachel: Um... my... father.
[17] BOLaughterBQ
Figure 1: Dialog Excerpt
speaker turns that are followed by laughs as hu-
morous, we also automatically eliminate cases of
pure visual comedy where humor is expressed us-
ing only gestures or facial expressions. In short,
non-verbal sounds or silences followed by laughs
are not treated as humorous. Henceforth, by
turn, we mean proper speaker turns (and not non-
verbal turns). We currently do not apply any spe-
cial filters to remove non-verbal sounds or back-
ground noise (other than laughs) that overlap with
speaker turns. However, if artificial laughs overlap
with a speaker turn (there were only few such in-
stances), the speaker turn is chopped by marking a
turn boundary exactly before/after the laughs be-
gin/end. This is to ensure that our prosody anal-
ysis is fair and does not catch any cues from the
laughs. In other words, we make sure that our
speaker turns are clean and not garbled by laughs.
After segmentation, we got a total of 1629
speaker turns, of which 714 (43.8%) are humor-
ous, and 915 (56.2%) are non-humorous. We also
made sure that there is a 1-to-1 correspondence be-
tween speaker turns in text transcripts that were
obtained online and our audio segments, and cor-
rected few cases where there was a mis-match (due
to turn-chopping or errors in online transcripts).
209
Figure 2: Audio Segmentation, Transcription and Feature Extraction using Wavesurfer
4 Speaker Distributions
There are 6 main actors/speakers (3 male and 3 fe-
male) in this show, along with a number of (in our
data 26) guest actors who appear briefly and rarely
in some of our dialogs. As the number of guest
actors is quite large, and their individual contribu-
tion is less than 5% of the turns in our data, we
decided to group all the guest actors together in
one GUEST class.
As these are acted (not real) conversations,
there were only few instances of speaker turn-
overlaps, where multiple speakers speak together.
These turns were given a speaker label MULTI. Ta-
ble 1 shows the total number of turns and humor-
ous turns for each speaker, along with their per-
centages in braces. Percentages for the Humor col-
umn show, out of the total (714) humorous turns,
how many are by each speaker. As one can notice,
the distribution of turns is fairly balanced among
the six main speakers. We also notice that even
though each guest actors’ individual contribution
is less than 5% in our data, their combined contri-
bution is fairly large, almost 16% of the total turns.
Table 2 shows that the six main actors together
form a total of 83% of our data. Also, of the to-
tal 714 humorous turns, 615 (86%) turns are by
the main actors. To study if prosody of humor dif-
fers across males and females, we also grouped
the main actors into two gender classes. Table
2 shows that the gender distribution is fairly bal-
Speaker #Turns(%) #Humor (%)
Chandler (M) 244 (15) 163 (22.8)
Joey (M) 153 (9.4) 57 (8)
Monica (F) 219 (13.4) 74 (10.4)
Phoebe (F) 180 (11.1) 104 (14.6)
Rachel (F) 273 (16.8) 90 (12.6)
Ross (M) 288 (17.7) 127 (17.8)
GUEST (26) 263 (16.1) 95 (13.3)
MULTI 9 (0.6) 4 (0.6)
Table 1: Speaker Distribution
anced among the main actors, with 50.5% male
and 49.5% female turns. We also see that of the
685 male turns, 347 turns (almost 50%) are hu-
morous, and of the 672 female turns, 268 (ap-
proximately 40%) are humorous. Guest actors and
multi-speaker turns are not considered in the gen-
der analysis.
Speaker #Turns #Humor
Male 685 347
(50.5% of Main) (50.6% of Male)
Female 672 268
(49.5% Of Main) (39.9% of Female)
Total 1357 615
Main (83.3% of Total) (86.1% of Humor)
Table 2: Gender Distribution for Main Actors
210
5 Features
Literature in emotional speech analysis (Liscombe
et al., 2003)(Litman and Forbes-Riley, 2004)
(Scherer, 2003)(Ang et al., 2002) has shown that
prosodic features such as pitch, energy, speak-
ing rate (tempo) are useful indicators of emotional
states, such as joy, anger, fear, boredom etc. While
humor is not necessarily considered as an emo-
tional state, we noticed that most humorous ut-
terances in our corpus (and also in general) often
make use of hyper-articulations, similar to those
found in emotional speech.
For this study, we use a number of acoustic-
prosodic as well as some non acoustic-prosodic
features as listed below:
Acoustic-Prosodic Features:
AF Pitch (F0): Mean, Max, Min, Range, Stan-
dard Deviation
AF Energy (RMS): Mean, Max, Min, Range,
Standard Deviation
AF Temporal: Duration, Internal Silence, Tempo
Non Acoustic-Prosodic Features:
AF Lexical
AF Turn Length (#Words)
AF Speaker
Our acoustic-prosodic features make use of
the pitch, energy and temporal information in
the speech signal, and are computed using
Wavesurfer. Figure 2 shows Wavesurfer’s energy
(dB), pitch (Hz), and transcription (.lab) panes.
The transcription interface shows text correspond-
ing to the dialog turns, along with the turn bound-
aries. All features are computed at the turn level,
and essentially measure the mean, maximum, min-
imum, range (maximum-minimum) and standard
deviation of the feature value (F0 or RMS) over
the entire turn (ignoring zeroes). Duration is mea-
sured in terms of time in seconds, from the be-
ginning to the end of the turn including pauses
(if any) in between. Internal silence is measured
as the percentage of zero F0 frames, and essen-
tially account for the amount of silence in the turn.
Tempo is computed as the total number of sylla-
bles divided by the duration of the turn. For com-
puting the number of syllables per word, we used
the General Inquirer database (Stone et al., 1966).
Our lexical features are simply all words (alpha-
numeric strings including apostrophes and stop-
words) in the turn. The value of these features is
integral and essentially counts the number of times
a word is repeated in the turn. Although this indi-
rectly accounts for alliterations, in the future stud-
ies, we plan to use more stylistic lexical features
like (Mihalcea and Strapparava, 2005).
Turn length is measured as the number of words
in the turn. For our classification study, we con-
sider eight speaker classes (6 Main actors, 1 for
Guest and Multi) as shown in table 1, whereas for
the gender study, we consider only two speaker
categories (male and female) as shown in table 2.
6 Humor-Prosody Analysis
Feature Humor Non-Humor
Mean-F0 206.9 208.9
Max-F0* 299.8 293.5
Min-F0* 121.1 128.6
Range-F0* 178.7 164.9
StdDev-F0 41.5 41.1
Mean-RMS* 58.3 57.2
Max-RMS* 76.4 75
Min-RMS* 44.2 44.6
Range-RMS* 32.16 30.4
StdDev-RMS* 7.8 7.5
Duration* 3.18 2.66
Int-Sil* 0.452 0.503
Tempo* 3.21 3.03
Length* 10.28 7.97
Table 3: Humor Prosody: Mean feature values for
Humor and Non-Humor groups
Table 3 shows mean values of various acoustic-
prosodic features over all speaker turns in our data,
across humor and non-humor groups. Features
that have statistically (pBO=0.05 as per indepen-
dent samples t-test) different values across the two
groups are marked with asterisks. As one can
see, all features except Mean-F0 and StdDev-F0
show significant differences across humorous and
non-humorous speech. Table 3 shows that humor-
ous turns in our data are longer, both in terms of
the time duration and the number of words, than
non-humorous turns. We also notice that humor-
ous turns have smaller internal silence, and hence
rapid tempo. Pitch (F0) and energy (RMS) fea-
tures have higher maximum, but lower minimum
211
values, for humorous turns. This in turn gives
higher values for range and standard deviation for
humor compared to the non-humor group. This re-
sult is somewhat consistent with previous findings
of (Liscombe et al., 2003) who found that most of
these features are largely associated with positive
and active emotional states such as happy, encour-
aging, confident etc. which are likely to appear in
our humorous turns.
7 Gender Effect on Humor-Prosody
To analyze prosody of humor across two genders,
we conducted a 2-way ANOVA test, using speaker
gender (male/female) and humor (yes/no) as our
fixed factors, and each of the above acoustic-
prosodic features as a dependent variable. The
test tells us the effect of humor on prosody ad-
justed for gender, the effect of gender on prosody
adjusted for humor and also the effect of interac-
tion between gender and humor on prosody (i.e.
if the effect of humor on prosody differs accord-
ing to gender). Table 4 shows results of 2-way
ANOVA, where Y shows significant effects, and
N shows non-significant effects. For example, the
result for tempo shows that tempo differs signifi-
cantly only across humor and non-humor groups,
but not across the two gender groups, and that
there is no effect of interaction between humor
and gender on tempo. As before, all features ex-
cept Mean-F0 and StdDev-F0 show significant dif-
ferences across humor and no-humor conditions,
even when adjusted for gender differences. The
table also shows that all features except inter-
nal silence and tempo show significant differences
across two genders, although only pitch features
(Max-F0, Min-F0, and StdDev-F0) show the ef-
fect of interaction between gender and humor. In
other words, the effect of humor on these pitch fea-
tures is dependent on gender. For instance, if male
speakers raise their pitch while expressing humor,
female speakers might lower. To confirm this,
we computed means values of various features for
males and females separately (See Tables 5 and
6). These tables indeed suggest that male speak-
ers show higher values for pitch features (Mean-
F0, Min-F0, StdDev-F0), while expressing humor,
whereas females show lower. Also for male speak-
ers, differences in Min-F0 and Min-RMS values
are not statistically significant across humor and
non-humor groups, whereas for female speakers,
features Mean-F0, StdDev-F0 and tempo do not
show significant differences across the two groups.
One can also notice that the differences in the
mean pitch feature values (specifically Mean-F0,
Max-F0 and Range-F0) between humor and non-
humor groups are much higher for males than for
females.
In summary, our gender analysis shows that al-
though most acoustic-prosodic features are differ-
ent for males and females, the prosodic style of ex-
pressing humor by male and female speakers dif-
fers only along some pitch-features (both in mag-
nitude and direction).
Feature Humor Gender Humor
x Gender
Mean-F0 N Y N
Max-F0 Y Y Y
Min-F0 Y Y Y
Range-F0 Y Y N
StdDev-F0 N Y Y
Mean-RMS Y Y N
Max-RMS Y Y N
Min-RMS Y Y N
Range-RMS Y Y N
StdDev-RMS Y Y N
Duration Y Y N
Int-Sil Y N N
Tempo Y N N
Length Y Y N
Table 4: Gender Effect on Humor Prosody: 2-Way
ANOVA Results
8 Speaker Effect on Humor-Prosody
We then conducted similar ANOVA test to account
for the speaker differences, i.e. by considering hu-
mor (yes/no) and speaker (8 groups as shown in ta-
ble 1) as our fixed factors and each of the acoustic-
prosodic features as a dependent variable for a 2-
Way ANOVA. Table 7 shows results of this analy-
sis. As before, the table shows the effect of humor
adjusted for speaker, the effect of speaker adjusted
for humor and also the effect of interaction be-
tween humor and speaker, on each of the acoustic-
prosodic features. According to table 7, we no
longer see the effect of humor on features Min-
F0, Mean-RMS and Tempo (in addition to Mean-
F0 and StdDev-F0), in presence of the speaker
variable. Speaker, on the other hand, shows sig-
nificant effect on prosody for all features. But
212
Feature Humor Non-Humor
Mean-F0* 188.14 176.43
Max-F0* 276.94 251.7
Min-F0 114.54 113.56
Range-F0* 162.4 138.14
StdDev-F0* 37.83 34.27
Mean-RMS* 57.86 56.4
Max-RMS* 75.5 74.21
Min-RMS 44.04 44.12
Range-RMS* 31.46 30.09
StdDev-RMS* 7.64 7.31
Duration* 3.1 2.57
Int-Sil* 0.44 0.5
Tempo* 3.33 3.1
Length* 10.27 8.1
Table 5: Humor Prosody for Male Speakers
surprisingly, again only pitch features Mean-F0,
Max-F0 and Min-F0 show the interaction effect,
suggesting that the effect of humor on these pitch
features differs from speaker to speaker. In other
words, different speakers use different pitch varia-
tions while expressing humor.
9 Humor Recognition by Supervised
Learning
We formulate our humor-recognition experiment
as a classical supervised learning problem, by
automatically classifying spoken turns into hu-
mor and non-humor groups, using standard ma-
chine learning classifiers. We used the decision
tree algorithm ADTree from Weka, and ran a
10-fold cross validation experiment on all 1629
turns in our data1. The baseline for these ex-
periments is 56.2% for the majority class (non-
humorous). Table 8 reports classification results
for six feature categories: lexical alone, lexical +
speaker, prosody alone, prosody + speaker, lexical
+ prosody and lexical + prosody + speaker (all).
Numbers in braces show the number of features
in each category. There are total 2025 features
which include 2011 lexical (all word types plus
turn length), 13 acoustic-prosodic and 1 for the
speaker information. Feature Length was included
in the lexical feature group, as it counts the num-
ber of lexical items (words) in the turn.
1We also tried other classifiers like Naive Bayes and Ad-
aBoost, although since the results were equivalent to ADTree,
we do not report those here.
Feature Humor Non-Humor
Mean-F0 235.79 238.75
Max-F0* 336.15 331.14
Min-F0* 133.63 143.14
Range-F0* 202.5 188
StdDev-F0 46.33 46.6
Mean-RMS* 58.44 57.64
Max-RMS* 77.33 75.57
Min-RMS* 44.08 44.74
Range-RMS* 33.24 30.83
StdDev-RMS* 8.18 7.59
Duration* 3.35 2.8
Int-Sil* 0.47 0.51
Tempo 3.1 3.1
Length* 10.66 8.25
Table 6: Humor Prosody for Female Speakers
All results are significantly above the baseline
(as measured by a pair-wise t-test) with the best
accuracy of 64% (8% over the baseline) obtained
using all features. We notice that the classifica-
tion accuracy improves on adding speaker infor-
mation to both lexical and prosodic features. Al-
though these results do not show a strong evidence
that prosodic features are better than lexical, it is
interesting to note that the performance of just a
few (13) prosodic features is comparable to that
of 2011 lexical features. Figure 3 shows the deci-
sion tree produced by the classifier in 10 iterations.
Numbers indicate the order in which the nodes are
created, and indentations mark parent-child rela-
tions. We notice that the classifier primarily se-
lected speaker and prosodic features in the first
10 iterations, whereas lexical features were se-
lected only in the later iterations (not shown here).
This seems consistent with our original hypothe-
sis that speech features are better at discriminating
between humorous and non-humorous utterances
in speech than lexical content.
Although (Mihalcea and Strapparava, 2005) ob-
tained much higher accuracies using lexical fea-
tures alone, it might be due to the fact that our data
is homogeneous in the sense that both humorous
and non-humorous turns are extracted from the
same source, and involve same speakers, which
makes the two groups highly alike and hence chal-
lenging to distinguish. To make sure that the
lower accuracy we get is not simply due to using
smaller data compared to (Mihalcea and Strappar-
213
Feature Humor Speaker Humor
x Speaker
Mean-F0 N Y Y
Max-F0 Y Y Y
Min-F0 N Y Y
Range-F0 Y Y N
StdDev-F0 N Y N
Mean-RMS N Y N
Max-RMS Y Y N
Min-RMS Y Y N
Range-RMS Y Y N
StdDev-RMS Y Y N
Duration Y Y N
Int-Sil Y Y N
Tempo N Y N
Length Y Y N
Table 7: Speaker Effect on Humor Prosody: 2-
Way ANOVA Results
Feature -Speaker +Speaker
Lex 61.14 (2011) 63.5 (2012)
Prosody 60 (13) 63.8 (14)
Lex + Prosody 62.6 (2024) 64 (2025)
Table 8: Humor Recognition Results (% Correct)
ava, 2005), we looked at the learning curve for the
classifier (see figure 4) and found that the classi-
fier performance is not sensitive to the amount of
data.
Table 9 shows classification results by gender,
using all features. For the male group, the base-
line is 50.6%, as the majority class humor is 50.6%
(See Table 2). For females, the baseline is 60%
(for non-humorous) as only 40% of the female
turns are humorous.
Gender Baseline Classifier
Male 50.6 64.63
Female 60.1 64.8
Table 9: Humor Recognition Results by Gender
As Table 9 shows, the performance of the classi-
fier is somewhat consistent cross-gender, although
for male speakers, the relative improvement is
much higher (14% above the baseline), than for
females (only 5% above the baseline). Our earlier
observation (from tables 5 and 6) that differences
in pitch features between humor and non-humor
CY (1)SPEAKER = chandler: 0.469
CY (1)SPEAKER != chandler: -0.083
CY CY (4)SPEAKER = phoebe: 0.373
CY CY (4)SPEAKER != phoebe: -0.064
CY (2)DURATION BO 1.515: -0.262
CY CY (5)SILENCE BO 0.659: 0.115
CY CY (5)SILENCE BQ= 0.659: -0.465
CY CY (8)SD F0 BO 9.919: -1.11
CY CY (8)SD F0 BQ= 9.919: 0.039
CY (2)DURATION BQ= 1.515: 0.1
CY CY (3)MEAN RMS BO 56.117: -0.274
CY CY (3)MEAN RMS BQ= 56.117: 0.147
CY CY CY (7)come BO 0.5: -0.056
CY CY CY (7)come BQ= 0.5: 0.417
CY CY (6)SD F0 BO 57.333: 0.076
CY CY (6)SD F0 BQ= 57.333: -0.285
CY CY (9)MAX RMS BO 86.186: 0.011
CY CY CY (10)MIN F0 BO 166.293: 0.047
CY CY CY (10)MIN F0 BQ= 166.293: -0.351
CY CY (9)MAX RMS BQ= 86.186: -0.972
Legend: +ve = humor, -ve = non-humor
Figure 3: Decision Tree (only the first 10 iterations
are shown)
groups are quite higher for males than for females,
may explain why we see higher improvement for
male speakers.
10 Conclusions
In this paper, we presented our experiments on
humor-prosody analysis and humor recognition
in spoken conversations, collected from a clas-
sic television comedy, FRIENDS. Using a sim-
ple automated annotation scheme, we labeled
speaker turns in our corpus that are followed
by artificial laughs as humorous, and the rest as
non-humorous. We then examined a number of
acoustic-prosodic features based on pitch, energy
and temporal information in the speech signal,
that have been found useful by previous studies in
emotion recognition.
Our prosody analysis revealed that humorous
and non-humorous turns indeed show significant
differences in most of these features, even when
accounted for the speaker and gender differences.
Specifically, we found that humorous turns tend
to have higher tempo, smaller internal silence, and
higher peak, range and standard deviation for pitch
and energy, compared to non-humorous turns.
On the humor recognition task, our classifier
214
Figure 4: Learning Curve: %Accuracy versus
%Fraction of Data
achieved the best performance when acoustic-
prosodic features were used in conjunction with
lexical and other types of features, and in all ex-
periments attained the accuracy statistically signif-
icant over the baseline. While prosody of humor
shows some differences due to gender, the perfor-
mance on the humor recognition task is equiva-
lent for males and females, although the relative
improvement over the baseline is much higher for
males than for females.
Our current study focuses only on lexical and
speech features, primarily because these features
can be computed automatically. In the future, we
plan to explore more sophisticated semantic and
pragmatic features such as incongruity, ambiguity,
expectation-violation etc. We also like to inves-
tigate if our findings generalize to other types of
corpora besides TV-show dialogs.

References
C. Alm, D. Roth, and R. Sproat. 2005. Emotions from
text: Machine learning for text-based emotion pre-
diction. In Proceedings of HLT/EMNLP, Vancou-
ver, CA.
J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and
A. Stolcke. 2002. Prosody-based automatic de-
tection of annoyance and frustration in human-
computer dialog. In Proceedings of ICSLP.
S. Attardo. 1994. Linguistic Theory of Humor. Moun-
ton de Gruyter, Berlin.
K. Binsted and G. Ritchie. 1997. Computational rules
for punning riddles. Humor, 10(1).
K. Binsted, B. Bergen, S. Coulson, A. Nijholt,
O. Stock, C. Strapparava, G. Ritchie, R. Manurung,
H. Pain, A. Waller, and D. O’Mara. 2006. Com-
putational humor. IEEE Intelligent Systems, March-
April.
K. Binsted. 1995. Using humour to make natural lan-
guage interfaces more friendly. In Proceedings of
the AI, ALife and Entertainment Workshop, Mon-
treal, CA.
S. D’Mello, S. Craig, G. Gholson, S. Franklin, R. Pi-
card, and A. Graesser. 2005. Integrating affect sen-
sors in an intelligent tutoring system. In Proceed-
ings of Affective Interactions: The Computer in the
Affective Loop Workshop.
P. Gebhard, M. Klesen, and T. Rist. 2004. Color-
ing multi-character conversations through the ex-
pression of emotions. In Proceedings of Affective
Dialog Systems.
J. Klein, Y. Moon, and R. Picard. 2002. This computer
responds to user frustration: Theory, design, and re-
sults. Interacting with Computers, 14.
J. Liscombe, J. Venditti, and J. Hirschberg. 2003.
Classifying subject ratings of emotional speech us-
ing acoustic features. In Proceedings of Eurospeech,
Geneva, Switzerland.
D. Litman and K. Forbes-Riley. 2004. Predicting
student emotions in computer-human tutoring dia-
logues. In Proceedings of ACL, Barcelona, Spain.
R. Mihalcea and C. Strapparava. 2005. Making
computers laugh: Investigations in automatic humor
recognition. In Proceedings of HLT/EMNLP, Van-
couver, CA.
M. Mulder and A. Nijholt. 2002. Humor research:
State of the art. Technical Report 34, CTIT Techni-
cal Report Series.
Scherer. 2003. Vocal communication of emotion: A
review of research paradigms. Speech Communica-
tion, 40(1-2):227–256.
M. Shroder and R. Cowie. 2005. Toward emotion-
sensitive multimodal interfaces: the challenge of the
european network of excellence humaine. In Pro-
ceedings of User Modeling Workshop on Adapting
the Interaction Style to Affective Factors.
O. Stock and C. Strapparava. 2005. Hahaacronym:
A computational humor system. In Proceedings of
ACL Interactive Poster and Demonstration Session,
pages 113–116, Ann Arbor, MI.
P. Stone, D. Dunphy, M. Smith, and D. Ogilvie. 1966.
The General Inquirer: A Computer Approach to
Content Analysis. MIT Press, Cambridge, MA.
J. Taylor and L. Mazlack. 2004. Computationally rec-
ognizing wordplay in jokes. In Proceedings of the
CogSci 2004, Chicago, IL.
