Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language
Processing (HLT/EMNLP), pages 531–538, Vancouver, October 2005. c©2005 Association for Computational Linguistics
Making Computers Laugh:
Investigations in Automatic Humor Recognition
Rada Mihalcea
Department of Computer Science
University of North Texas
Denton, TX, 76203, USA
rada@cs.unt.edu
Carlo Strapparava
Istituto per la Ricerca Scientifica e Tecnologica
ITC – irst
I-38050, Povo, Trento, Italy
strappa@itc.it
Abstract
Humor is one of the most interesting and
puzzling aspects of human behavior. De-
spite the attention it has received in fields
such as philosophy, linguistics, and psy-
chology, there have been only few at-
tempts to create computational models for
humor recognition or generation. In this
paper, we bring empirical evidence that
computational approaches can be success-
fully applied to the task of humor recogni-
tion. Through experiments performed on
very large data sets, we show that auto-
matic classification techniques can be ef-
fectively used to distinguish between hu-
morous and non-humorous texts, with sig-
nificant improvements observed over apri-
ori known baselines.
1 Introduction
... pleasure has probably been the main goal all along. But I hesitate
to admit it, because computer scientists want to maintain their image
as hard-working individuals who deserve high salaries. Sooner or
later society will realize that certain kinds of hard work are in fact
admirable even though they are more fun than just about anything
else. (Knuth, 1993)
Humor is an essential element in personal com-
munication. While it is merely considered a way
to induce amusement, humor also has a positive ef-
fect on the mental state of those using it and has the
ability to improve their activity. Therefore computa-
tional humor deserves particular attention, as it has
the potential of changing computers into a creative
and motivational tool for human activity (Stock et
al., 2002; Nijholt et al., 2003).
Previous work in computational humor has fo-
cused mainly on the task of humor generation (Stock
and Strapparava, 2003; Binsted and Ritchie, 1997),
and very few attempts have been made to develop
systems for automatic humor recognition (Taylor
and Mazlack, 2004). This is not surprising, since,
from a computational perspective, humor recogni-
tion appears to be significantly more subtle and dif-
ficult than humor generation.
In this paper, we explore the applicability of
computational approaches to the recognition of ver-
bally expressed humor. In particular, we investigate
whether automatic classification techniques are a vi-
able approach to distinguish between humorous and
non-humorous text, and we bring empirical evidence
in support of this hypothesis through experiments
performed on very large data sets.
Since a deep comprehension of humor in all of
its aspects is probably too ambitious and beyond
the existing computational capabilities, we chose
to restrict our investigation to the type of humor
found in one-liners. A one-liner is a short sen-
tence with comic effects and an interesting linguistic
structure: simple syntax, deliberate use of rhetoric
devices (e.g. alliteration, rhyme), and frequent use
of creative language constructions meant to attract
the readers attention. While longer jokes can have
a relatively complex narrative structure, a one-liner
must produce the humorous effect “in one shot”,
with very few words. These characteristics make
this type of humor particularly suitable for use in an
automatic learning setting, as the humor-producing
features are guaranteed to be present in the first (and
only) sentence.
We attempt to formulate the humor-recognition
531
problem as a traditional classification task, and feed
positive (humorous) and negative (non-humorous)
examples to an automatic classifier. The humor-
ous data set consists of one-liners collected from
the Web using an automatic bootstrapping process.
The non-humorous data is selected such that it
is structurally and stylistically similar to the one-
liners. Specifically, we use three different nega-
tive data sets: (1) Reuters news titles; (2) proverbs;
and (3) sentences from the British National Corpus
(BNC). The classification results are encouraging,
with accuracy figures ranging from 79.15% (One-
liners/BNC) to 96.95% (One-liners/Reuters). Re-
gardless of the non-humorous data set playing the
role of negative examples, the performance of the
automatically learned humor-recognizer is always
significantly better than apriori known baselines.
The remainder of the paper is organized as fol-
lows. We first describe the humorous and non-
humorous data sets, and provide details on the Web-
based bootstrapping process used to build a very
large collection of one-liners. We then show experi-
mental results obtained on these data sets using sev-
eral heuristics and two different text classifiers. Fi-
nally, we conclude with a discussion and directions
for future work.
2 Humorous and Non-humorous Data Sets
To test our hypothesis that automatic classification
techniques represent a viable approach to humor
recognition, we needed in the first place a data set
consisting of both humorous (positive) and non-
humorous (negative) examples. Such data sets can
be used to automatically learn computational mod-
els for humor recognition, and at the same time eval-
uate the performance of such models.
2.1 Humorous Data
For reasons outlined earlier, we restrict our attention
to one-liners, short humorous sentences that have the
characteristic of producing a comic effect in very
few words (usually 15 or less). The one-liners hu-
mor style is illustrated in Table 1, which shows three
examples of such one-sentence jokes.
It is well-known that large amounts of training
data have the potential of improving the accuracy of
the learning process, and at the same time provide
insights into how increasingly larger data sets can
affect the classification precision. The manual con-
enumerations matching
stylistic constraint (2)?
yes
yes
seed one−liners
automatically identified
        one−liners
Web search
webpages matching 
thematic constraint (1)?
candidate
webpages
Figure 1: Web-based bootstrapping of one-liners.
struction of a very large one-liner data set may be
however problematic, since most Web sites or mail-
ing lists that make available such jokes do not usu-
ally list more than 50–100 one-liners. To tackle this
problem, we implemented a Web-based bootstrap-
ping algorithm able to automatically collect a large
number of one-liners starting with a short seed list,
consisting of a few one-liners manually identified.
The bootstrapping process is illustrated in Figure
1. Starting with the seed set, the algorithm auto-
matically identifies a list of webpages that include at
least one of the seed one-liners, via a simple search
performed with a Web search engine. Next, the web-
pages found in this way are HTML parsed, and ad-
ditional one-liners are automatically identified and
added to the seed set. The process is repeated sev-
eral times, until enough one-liners are collected.
An important aspect of any bootstrapping algo-
rithm is the set of constraints used to steer the pro-
cess and prevent as much as possible the addition of
noisy entries. Our algorithm uses: (1) a thematic
constraint applied to the theme of each webpage;
and (2) a structural constraint, exploiting HTML an-
notations indicating text of similar genre.
The first constraint is implemented using a set
of keywords of which at least one has to appear
in the URL of a retrieved webpage, thus poten-
tially limiting the content of the webpage to a
theme related to that keyword. The set of key-
words used in the current implementation consists
of six words that explicitly indicate humor-related
content: oneliner, one-liner, humor, humour, joke,
532
One-liners
Take my advice; I don’t use it anyway.
I get enough exercise just pushing my luck.
Beauty is in the eye of the beer holder.
Reuters titles
Trocadero expects tripling of revenues.
Silver fixes at two-month high, but gold lags.
Oil prices slip as refiners shop for bargains.
BNC sentences
They were like spirits, and I loved them.
I wonder if there is some contradiction here.
The train arrives three minutes early.
Proverbs
Creativity is more important than knowledge.
Beauty is in the eye of the beholder.
I believe no tales from an enemy’s tongue.
Table 1: Sample examples of one-liners, Reuters ti-
tles, BNC sentences, and proverbs.
funny. For example, http://www.berro.com/Jokes
or http://www.mutedfaith.com/funny/life.htm are the
URLs of two webpages that satisfy this constraint.
The second constraint is designed to exploit the
HTML structure of webpages, in an attempt to iden-
tify enumerations of texts that include the seed one-
liner. This is based on the hypothesis that enumer-
ations typically include texts of similar genre, and
thus a list including the seed one-liner is likely to
include additional one-line jokes. For instance, if a
seed one-liner is found in a webpage preceded by the
HTML tag <li> (i.e. “list item”), other lines found
in the same enumeration preceded by the same tag
are also likely to be one-liners.
Two iterations of the bootstrapping process,
started with a small seed set of ten one-liners, re-
sulted in a large set of about 24,000 one-liners.
After removing the duplicates using a measure of
string similarity based on the longest common sub-
sequence metric, we were left with a final set of
approximately 16,000 one-liners, which are used in
the humor-recognition experiments. Note that since
the collection process is automatic, noisy entries are
also possible. Manual verification of a randomly se-
lected sample of 200 one-liners indicates an average
of 9% potential noise in the data set, which is within
reasonable limits, as it does not appear to signifi-
cantly impact the quality of the learning.
2.2 Non-humorous Data
To construct the set of negative examples re-
quired by the humor-recognition models, we tried
to identify collections of sentences that were non-
humorous, but similar in structure and composition
to the one-liners. We do not want the automatic clas-
sifiers to learn to distinguish between humorous and
non-humorous examples based simply on text length
or obvious vocabulary differences. Instead, we seek
to enforce the classifiers to identify humor-specific
features, by supplying them with negative examples
similar in most of their aspects to the positive exam-
ples, but different in their comic effect.
We tested three different sets of negative exam-
ples, with three examples from each data set illus-
trated in Table 1. All non-humorous examples are
enforced to follow the same length restriction as the
one-liners, i.e. one sentence with an average length
of 10–15 words.
1. Reuters titles, extracted from news articles pub-
lished in the Reuters newswire over a period of
one year (8/20/1996 – 8/19/1997) (Lewis et al.,
2004). The titles consist of short sentences with
simple syntax, and are often phrased to catch
the readers attention (an effect similar to the
one rendered by one-liners).
2. Proverbs extracted from an online proverb col-
lection. Proverbs are sayings that transmit, usu-
ally in one short sentence, important facts or
experiences that are considered true by many
people. Their property of being condensed, but
memorable sayings make them very similar to
the one-liners. In fact, some one-liners attempt
to reproduce proverbs, with a comic effect, as
in e.g. “Beauty is in the eye of the beer holder”,
derived from “Beauty is in the eye of the be-
holder”.
3. British National Corpus (BNC) sentences, ex-
tracted from BNC – a balanced corpus covering
different styles, genres and domains. The sen-
tences were selected such that they were similar
in content with the one-liners: we used an in-
formation retrieval system implementing a vec-
torial model to identify the BNC sentence most
similar to each of the 16,000 one-liners1. Un-
like the Reuters titles or the proverbs, the BNC
sentences have typically no added creativity.
However, we decided to add this set of negative
examples to our experimental setting, in order
1The sentence most similar to a one-liner is identified by
running the one-liner against an index built for all BNC sen-
tences with a length of 10–15 words. We use a tf.idf weighting
scheme and a cosine similarity measure, as implemented in the
Smart system (ftp.cs.cornell.edu/pub/smart)
533
to observe the level of difficulty of a humor-
recognition task when performed with respect
to simple text.
To summarize, the humor recognition experiments
rely on data sets consisting of humorous (positive)
and non-humorous (negative) examples. The posi-
tive examples consist of 16,000 one-liners automat-
ically collected using a Web-based bootstrapping
process. The negative examples are drawn from: (1)
Reuters titles; (2) Proverbs; and (3) BNC sentences.
3 Automatic Humor Recognition
We experiment with automatic classification tech-
niques using: (a) heuristics based on humor-specific
stylistic features (alliteration, antonymy, slang); (b)
content-based features, within a learning framework
formulated as a typical text classification task; and
(c) combined stylistic and content-based features,
integrated in a stacked machine learning framework.
3.1 Humor-Specific Stylistic Features
Linguistic theories of humor (Attardo, 1994) have
suggested many stylistic features that characterize
humorous texts. We tried to identify a set of fea-
tures that were both significant and feasible to im-
plement using existing machine readable resources.
Specifically, we focus on alliteration, antonymy, and
adult slang, which were previously suggested as po-
tentially good indicators of humor (Ruch, 2002; Bu-
caria, 2004).
Alliteration. Some studies on humor appreciation
(Ruch, 2002) show that structural and phonetic prop-
erties of jokes are at least as important as their con-
tent. In fact one-liners often rely on the reader’s
awareness of attention-catching sounds, through lin-
guistic phenomena such as alliteration, word repeti-
tion and rhyme, which produce a comic effect even if
the jokes are not necessarily meant to be read aloud.
Note that similar rhetorical devices play an impor-
tant role in wordplay jokes, and are often used in
newspaper headlines and in advertisement. The fol-
lowing one-liners are examples of jokes that include
one or more alliteration chains:
Veni, Vidi, Visa: I came, I saw, I did a little shopping.
Infants don’t enjoy infancy like adults do adultery.
To extract this feature, we identify and count the
number of alliteration/rhyme chains in each exam-
ple in our data set. The chains are automatically ex-
tracted using an index created on top of the CMU
pronunciation dictionary2.
Antonymy. Humor often relies on some type of
incongruity, opposition or other forms of apparent
contradiction. While an accurate identification of
all these properties is probably difficult to accom-
plish, it is relatively easy to identify the presence of
antonyms in a sentence. For instance, the comic ef-
fect produced by the following one-liners is partly
due to the presence of antonyms:
A clean desk is a sign of a cluttered desk drawer.
Always try to be modest and be proud of it!
The lexical resource we use to identify antonyms
is WORDNET (Miller, 1995), and in particular the
antonymy relation among nouns, verbs, adjectives
and adverbs. For adjectives we also consider an in-
direct antonymy via the similar-to relation among
adjective synsets. Despite the relatively large num-
ber of antonymy relations defined in WORDNET,
its coverage is far from complete, and thus the
antonymy feature cannot always be identified. A
deeper semantic analysis of the text, such as word
sense disambiguation or domain disambiguation,
could probably help detecting other types of seman-
tic opposition, and we plan to exploit these tech-
niques in future work.
Adult slang. Humor based on adult slang is very
popular. Therefore, a possible feature for humor-
recognition is the detection of sexual-oriented lexi-
con in the sentence. The following represent exam-
ples of one-liners that include such slang:
The sex was so good that even the neighbors had a cigarette.
Artificial Insemination: procreation without recreation.
To form a lexicon required for the identification of
this feature, we extract from WORDNET DOMAINS3
all the synsets labeled with the domain SEXUALITY.
The list is further processed by removing all words
with high polysemy (≥ 4). Next, we check for the
presence of the words in this lexicon in each sen-
tence in the corpus, and annotate them accordingly.
Note that, as in the case of antonymy, WORDNET
coverage is not complete, and the adult slang fea-
ture cannot always be identified.
Finally, in some cases, all three features (alliteration,
2Available at http://www.speech.cs.cmu.edu/cgi-bin/cmudict
3WORDNET DOMAINS assigns each synset in WORDNET
with one or more “domain” labels, such as SPORT, MEDICINE,
ECONOMY. See http://wndomains.itc.it.
534
antonymy, adult slang) are present in the same sen-
tence, as for instance the following one-liner:
Behind every greatal manant is a greatal womanant, and
behind every greatal womanant is some guy staring at her
behindsl!
3.2 Content-based Learning
In addition to stylistic features, we also experi-
mented with content-based features, through ex-
periments where the humor-recognition task is for-
mulated as a traditional text classification problem.
Specifically, we compare results obtained with two
frequently used text classifiers, Na¨ıve Bayes and
Support Vector Machines, selected based on their
performance in previously reported work, and for
their diversity of learning methodologies.
Na¨ıve Bayes. The main idea in a Na¨ıve Bayes text
classifier is to estimate the probability of a category
given a document using joint probabilities of words
and documents. Na¨ıve Bayes classifiers assume
word independence, but despite this simplification,
they perform well on text classification. While there
are several versions of Na¨ıve Bayes classifiers (vari-
ations of multinomial and multivariate Bernoulli),
we use the multinomial model, previously shown to
be more effective (McCallum and Nigam, 1998).
Support Vector Machines. Support Vector Ma-
chines (SVM) are binary classifiers that seek to find
the hyperplane that best separates a set of posi-
tive examples from a set of negative examples, with
maximum margin. Applications of SVM classifiers
to text categorization led to some of the best results
reported in the literature (Joachims, 1998).
4 Experimental Results
Several experiments were conducted to gain insights
into various aspects related to an automatic hu-
mor recognition task: classification accuracy using
stylistic and content-based features, learning rates,
impact of the type of negative data, impact of the
classification methodology.
All evaluations are performed using stratified ten-
fold cross validations, for accurate estimates. The
baseline for all the experiments is 50%, which rep-
resents the classification accuracy obtained if a label
of “humorous” (or “non-humorous”) would be as-
signed by default to all the examples in the data set.
Experiments with uneven class distributions were
also performed, and are reported in section 4.4.
4.1 Heuristics using Humor-specific Features
In a first set of experiments, we evaluated the classi-
fication accuracy using stylistic humor-specific fea-
tures: alliteration, antonymy, and adult slang. These
are numerical features that act as heuristics, and the
only parameter required for their application is a
threshold indicating the minimum value admitted for
a statement to be classified as humorous (or non-
humorous). These thresholds are learned automat-
ically using a decision tree applied on a small subset
of humorous/non-humorous examples (1000 exam-
ples). The evaluation is performed on the remaining
15,000 examples, with results shown in Table 24.
One-liners One-liners One-liners
Heuristic Reuters BNC Proverbs
Alliteration 74.31% 59.34% 53.30%
Antonymy 55.65% 51.40% 50.51%
Adult slang 52.74% 52.39% 50.74%
ALL 76.73% 60.63% 53.71%
Table 2: Humor-recognition accuracy using allitera-
tion, antonymy, and adult slang.
Considering the fact that these features represent
stylistic indicators, the style of Reuters titles turns
out to be the most different with respect to one-
liners, while the style of proverbs is the most sim-
ilar. Note that for all data sets the alliteration feature
appears to be the most useful indicator of humor,
which is in agreement with previous linguistic find-
ings (Ruch, 2002).
4.2 Text Classification with Content Features
The second set of experiments was concerned with
the evaluation of content-based features for humor
recognition. Table 3 shows results obtained using
the three different sets of negative examples, with
the Na¨ıve Bayes and SVM text classifiers. Learning
curves are plotted in Figure 2.
One-liners One-liners One-liners
Classifier Reuters BNC Proverbs
Na¨ıve Bayes 96.67% 73.22% 84.81%
SVM 96.09% 77.51% 84.48%
Table 3: Humor-recognition accuracy using Na¨ıve
Bayes and SVM text classifiers.
4We also experimented with decision trees learned from a
larger number of examples, but the results were similar, which
confirms our hypothesis that these features are heuristics, rather
than learnable properties that improve their accuracy with addi-
tional training data.
535
 40
 50
 60
 70
 80
 90
 100
 0  20  40  60  80  100
Classification accuracy (%)
Fraction of data (%)
Classification learning curves
Naive Bayes
SVM  40
 50
 60
 70
 80
 90
 100
 0  20  40  60  80  100
Classification accuracy (%)
Fraction of data (%)
Classification learning curves
Naive Bayes
SVM  40
 50
 60
 70
 80
 90
 100
 0  20  40  60  80  100
Classification accuracy (%)
Fraction of data (%)
Classification learning curves
Naive Bayes
SVM
(a) (b) (c)
Figure 2: Learning curves for humor-recognition using text classification techniques, with respect to three
different sets of negative examples: (a) Reuters; (b) BNC; (c) Proverbs.
Once again, the content of Reuters titles appears
to be the most different with respect to one-liners,
while the BNC sentences represent the most simi-
lar data set. This suggests that joke content tends to
be very similar to regular text, although a reasonably
accurate distinction can still be made using text clas-
sification techniques. Interestingly, proverbs can be
distinguished from one-liners using content-based
features, which indicates that despite their stylistic
similarity (see Table 2), proverbs and one-liners deal
with different topics.
4.3 Combining Stylistic and Content Features
Encouraged by the results obtained in the first
two experiments, we designed a third experiment
that attempts to jointly exploit stylistic and con-
tent features for humor recognition. The feature
combination is performed using a stacked learner,
which takes the output of the text classifier, joins it
with the three humor-specific features (alliteration,
antonymy, adult slang), and feeds the newly created
feature vectors to a machine learning tool. Given
the relatively large gap between the performance
achieved with content-based features (text classifi-
cation) and stylistic features (humor-specific heuris-
tics), we decided to implement the second learning
stage in the stacked learner using a memory based
learning system, so that low-performance features
are not eliminated in the favor of the more accu-
rate ones5. We use the Timbl memory based learner
(Daelemans et al., 2001), and evaluate the classifica-
tion using a stratified ten-fold cross validation. Table
5Using a decision tree learner in a similar stacked learning
experiment resulted into a flat tree that takes a classification de-
cision based exclusively on the content feature, ignoring com-
pletely the remaining stylistic features.
4 shows the results obtained in this experiment, for
the three different data sets.
One-liners One-liners One-liners
Reuters BNC Proverbs
96.95% 79.15% 84.82%
Table 4: Humor-recognition accuracy for combined
learning based on stylistic and content features.
Combining classifiers results in a statistically sig-
nificant improvement (p < 0.0005, paired t-test)
with respect to the best individual classifier for the
One-liners/Reuters and One-liners/BNC data sets,
with relative error rate reductions of 8.9% and 7.3%
respectively. No improvement is observed for the
One-liners/Proverbs data set, which is not surpris-
ing since, as shown in Table 2, proverbs and one-
liners cannot be clearly differentiated using stylistic
features, and thus the addition of these features to
content-based features is not likely to result in an
improvement.
4.4 Discussion
The results obtained in the automatic classification
experiments reveal the fact that computational ap-
proaches represent a viable solution for the task of
humor-recognition, and good performance can be
achieved using classification techniques based on
stylistic and content features.
Despite our initial intuition that one-liners are
most similar to other creative texts (e.g. Reuters ti-
tles, or the sometimes almost identical proverbs),
and thus the learning task would be more difficult in
relation to these data sets, comparative experimental
results show that in fact it is more difficult to distin-
guish humor with respect to regular text (e.g. BNC
536
sentences). Note however that even in this case the
combined classifier leads to a classification accuracy
that improves significantly over the apriori known
baseline.
An examination of the content-based features
learned during the classification process reveals in-
teresting aspects of the humorous texts. For in-
stance, one-liners seem to constantly make reference
to human-related scenarios, through the frequent use
of words such as man, woman, person, you, I. Simi-
larly, humorous texts seem to often include negative
word forms, such as the negative verb forms doesn’t,
isn’t, don’t, or negative adjectives like wrong or bad.
A more extensive analysis of content-based humor-
specific features is likely to reveal additional humor-
specific content features, which could also be used in
studies of humor generation.
In addition to the three negative data sets, we also
performed an experiment using a corpus of arbitrary
sentences randomly drawn from the three negative
sets. The humor recognition with respect to this neg-
ative mixed data set resulted in 63.76% accuracy for
stylistic features, 77.82% for content-based features
using Na¨ıve Bayes and 79.23% using SVM. These
figures are comparable to those reported in Tables 2
and 3 for One-liners/BNC, which suggests that the
experimental results reported in the previous sec-
tions do not reflect a bias introduced by the negative
data sets, since similar results are obtained when the
humor recognition is performed with respect to ar-
bitrary negative examples.
As indicated in section 2.2, the negative exam-
ples were selected structurally and stylistically sim-
ilar to the one-liners, making the humor recognition
task more difficult than in a real setting. Nonethe-
less, we also performed a set of experiments where
we made the task even harder, using uneven class
distributions. For each of the three types of nega-
tive examples, we constructed a data set using 75%
non-humorous examples and 25% humorous exam-
ples. Although the baseline in this case is higher
(75%), the automatic classification techniques for
humor-recognition still improve over this baseline.
The stylistic features lead to a classification accu-
racy of 87.49% (One-liners/Reuters), 77.62% (One-
liners/BNC), and 76.20% (One-liners/Proverbs),
and the content-based features used in a Na¨ıve
Bayes classifier result in accuracy figures of 96.19%
(One-liners/Reuters), 81.56% (One-liners/BNC),
and 87.86% (One-liners/Proverbs).
Finally, in addition to classification accuracy, we
were also interested in the variation of classifica-
tion performance with respect to data size, which
is an aspect particularly relevant for directing fu-
ture research. Depending on the shape of the learn-
ing curves, one could decide to concentrate future
work either on the acquisition of larger data sets, or
toward the identification of more sophisticated fea-
tures. Figure 2 shows that regardless of the type of
negative data, there is significant learning only un-
til about 60% of the data (i.e. about 10,000 positive
examples, and the same number of negative exam-
ples). The rather steep ascent of the curve, especially
in the first part of the learning, suggests that humor-
ous and non-humorous texts represent well distin-
guishable types of data. An interesting effect can
be noticed toward the end of the learning, where for
both classifiers the curve becomes completely flat
(One-liners/Reuters, One-liners/Proverbs), or it even
has a slight drop (One-liners/BNC). This is probably
due to the presence of noise in the data set, which
starts to become visible for very large data sets6.
This plateau is also suggesting that more data is not
likely to help improve the quality of an automatic
humor-recognizer, and more sophisticated features
are probably required.
5 Related Work
While humor is relatively well studied in scientific
fields such as linguistics (Attardo, 1994) and psy-
chology (Freud, 1905; Ruch, 2002), to date there
is only a limited number of research contributions
made toward the construction of computational hu-
mour prototypes.
One of the first attempts is perhaps the work de-
scribed in (Binsted and Ritchie, 1997), where a for-
mal model of semantic and syntactic regularities was
devised, underlying some of the simplest types of
puns (punning riddles). The model was then ex-
ploited in a system called JAPE that was able to au-
tomatically generate amusing puns.
Another humor-generation project was the HA-
HAcronym project (Stock and Strapparava, 2003),
whose goal was to develop a system able to au-
tomatically generate humorous versions of existing
6We also like to think of this behavior as if the computer
is losing its sense of humor after an overwhelming number of
jokes, in a way similar to humans when they get bored and stop
appreciating humor after hearing too many jokes.
537
acronyms, or to produce a new amusing acronym
constrained to be a valid vocabulary word, starting
with concepts provided by the user. The comic ef-
fect was achieved mainly by exploiting incongruity
theories (e.g. finding a religious variation for a tech-
nical acronym).
Another related work, devoted this time to the
problem of humor comprehension, is the study re-
ported in (Taylor and Mazlack, 2004), focused on
a very restricted type of wordplays, namely the
“Knock-Knock” jokes. The goal of the study was
to evaluate to what extent wordplay can be automati-
cally identified in “Knock-Knock” jokes, and if such
jokes can be reliably recognized from other non-
humorous text. The algorithm was based on auto-
matically extracted structural patterns and on heuris-
tics heavily based on the peculiar structure of this
particular type of jokes. While the wordplay recog-
nition gave satisfactory results, the identification of
jokes containing such wordplays turned out to be
significantly more difficult.
6 Conclusion
A conclusion is simply the place where you got tired of thinking.
(anonymous one-liner)
The creative genres of natural language have been
traditionally considered outside the scope of any
computational modeling. In particular humor, be-
cause of its puzzling nature, has received little atten-
tion from computational linguists. However, given
the importance of humor in our everyday life, and
the increasing importance of computers in our work
and entertainment, we believe that studies related to
computational humor will become increasingly im-
portant.
In this paper, we showed that automatic classifi-
cation techniques can be successfully applied to the
task of humor-recognition. Experimental results ob-
tained on very large data sets showed that computa-
tional approaches can be efficiently used to distin-
guish between humorous and non-humorous texts,
with significant improvements observed over apriori
known baselines. To our knowledge, this is the first
result of this kind reported in the literature, as we
are not aware of any previous work investigating the
interaction between humor and techniques for auto-
matic classification.
Finally, through the analysis of learning curves
plotting the classification performance with respect
to data size, we showed that the accuracy of the au-
tomatic humor-recognizer stops improving after a
certain number of examples. Given that automatic
humor-recognition is a rather understudied problem,
we believe that this is an important result, as it pro-
vides insights into potentially productive directions
for future work. The flattened shape of the curves
toward the end of the learning process suggests that
rather than focusing on gathering more data, fu-
ture work should concentrate on identifying more
sophisticated humor-specific features, e.g. semantic
oppositions, ambiguity, and others. We plan to ad-
dress these aspects in future work.
References
S. Attardo. 1994. Linguistic Theory of Humor. Mouton de
Gruyter, Berlin.
K. Binsted and G. Ritchie. 1997. Computational rules for pun-
ning riddles. Humor, 10(1).
C. Bucaria. 2004. Lexical and syntactic ambiguity as a source
of humor. Humor, 17(3).
W. Daelemans, J. Zavrel, K. van der Sloot, and A. van den
Bosch. 2001. Timbl: Tilburg memory based learner, ver-
sion 4.0, reference guide. Technical report, University of
Antwerp.
S. Freud. 1905. Der Witz und Seine Beziehung zum Unbe-
wussten. Deutike, Vienna.
T. Joachims. 1998. Text categorization with Support Vector
Machines: learning with many relevant features. In Pro-
ceedings of the European Conference on Machine Learning.
D.E. Knuth. 1993. The Stanford Graph Base: A Platform for
combinatorial computing. ACM Press.
D. Lewis, Y. Yang, T. Rose, and F. Li. 2004. RCV1: A new
benchmark collection for text categorization research. The
Journal of Machine Learning Research, 5:361–397.
A. McCallum and K. Nigam. 1998. A comparison of event
models for Naive Bayes text classification. In Proceedings
of AAAI-98 Workshop on Learning for Text Categorization.
G. Miller. 1995. Wordnet: A lexical database. Communication
of the ACM, 38(11):39–41.
A. Nijholt, O. Stock, A. Dix, and J. Morkes, editors. 2003. Pro-
ceedings of CHI-2003 workshop: Humor Modeling in the
Interface, Fort Lauderdale, Florida.
W. Ruch. 2002. Computers with a personality? lessons to be
learned from studies of the psychology of humor. In Pro-
ceedings of the The April Fools Day Workshop on Computa-
tional Humour.
O. Stock and C. Strapparava. 2003. Getting serious about the
development of computational humour. In Proceedings of
the 8th International Joint Conference on Artificial Intelli-
gence (IJCAI-03), Acapulco, Mexico.
O. Stock, C. Strapparava, and A. Nijholt, editors. 2002. Pro-
ceedings of the The April Fools Day Workshop on Computa-
tional Humour, Trento.
J. Taylor and L. Mazlack. 2004. Computationally recognizing
wordplay in jokes. In Proceedings of CogSci 2004, Chicago.
538
