Characterizing Response Types and Revealing Noun Ambiguity
in German Association Norms
Alissa Melinger and Sabine Schulte im Walde and Andrea Weber
Psycholinguistics and Computational Linguistics
Saarland University, Saarbr¨ucken, Germany
a0 melinger,schulte,aweber
a1 @coli.uni-sb.de
Abstract
This paper presents an analysis of seman-
tic association norms for German nouns.
In contrast to prior studies, we not only
collected associations elicited by written
representations of target objects but also
by their pictorial representations. In a first
analysis, we identified systematic differ-
ences in the type and distribution of as-
sociate responses for the two presentation
forms. In a second analysis, we applied a
soft cluster analysis to the collected target-
response pairs. We subsequently used the
clustering to predict noun ambiguity and
to discriminate senses in our target nouns.
1 Introduction
Language is rife with ambiguity. Sentences can
be structurally ambiguous, pronouns can be ref-
erentially ambiguous, and words can be polyse-
mous. The human language faculty deals remark-
ably well with the omnipresent ambiguity, so well
in fact that we are rarely aware of the multiple al-
ternatives that are available. Despite our appar-
ent lack of awareness, psycholinguistic research
has shown that alternative meanings are never-
theless activated during processing. For exam-
ple, in a seminal study of homograph recognition,
Tanenhaus et al. (1979) demonstrated that multi-
ple meanings of a homograph are initially acti-
vated even in highly constraining syntactic con-
texts, such as They all rose vs. They bought a rose.
Likewise in speech production, Cutting and Fer-
reira (1999) showed that non-depicted senses of
homophones are activated during picture naming.
Thus, when either a homograph word or a picture
with a homophone name are processed, multiple
meanings are initially activated.
Intuitively, however, one might expect differ-
ences in the degree to which multiple meanings
are activated depending on the presentation mode.
To our knowledge no investigation has compared
picture (top-down) and word (bottom-up) seman-
tic processing. In this paper, we investigate differ-
ences in the semantic information, namely asso-
ciations, elicited in these two presentation modes.
We reason that, if multiple meanings of an am-
biguous word are activated when the stimulus is
processed, then the elicited associates should re-
flect the ambiguity. If the degree of activation dif-
fers with respect to the presentation mode, the as-
sociates should reflect this difference as well.
Manuallylinkingassociates toaparticular word
sense would be time intensive and subjective.
Thus, we rely on computational methods that have
the potential to automatically compare the asso-
ciatesprovided forthetwopresentationmodesand
classify them into meaning-referring sets. These
methods thus not only reveal differences in the as-
sociates elicited in the two presentation conditions
but also, in the case of ambiguous nouns, identify
which associates are related to which meaning of
the word.
Our analyses are guided by the following two
questions:
1. Are there systematic differences in associate
response types when target objects are pre-
sented in written form compared to when the
written form is accompanied by a pictorial
representation? Predictions about which dif-
ferences we expected in the response types
are made, and the associate responses are an-
alyzed accordingly (Section 4).
2. Can we identify multiple senses of the nouns
and discriminate between noun senses based
on the associate responses? We apply a clus-
tering technique to the target-response pairs;
the cluster analysis gathers semantically sim-
ilar target nouns, based on overlapping sets
of associate responses, and predicts the am-
biguity of nouns and their senses (Section 5).
41
Ina2 Section 2, we provide an overview of the types
of differences we anticipate; Section 3 describes
the materials and procedure used for the associ-
ation elicitation; in Sections 4 and 5, we explain
how response types were characterized and noun
senses identified.
2 Intuitions
A critical component of the current study was the
presentation of target stimuli in two forms: Lexi-
cal stimuli consisted of the written name of target
objects; pictorial stimuli consisted of the written
names accompanied by black and white line draw-
ings of the referred-to objects.
We assumed that, in some cases, associate re-
sponses elicited by written words would be dif-
ferent from associate responses elicited by pic-
tures. Differences in responses might arise from
a variety of sources: a) images might increase the
salience of physical attributes of objects, b) im-
ages might show non-prototypical characteristics
of objects that would not be evoked by words, c)
when word forms have different shades of mean-
ing, responses evoked by lexical stimuli might in-
dex any of the words’ meanings while responses
evoked by pictorial representations might be more
biased towards the depicted sense.
To illustrate these points, consider the follow-
ing example. The picture of a Hexe ‘witch’ from
our study showed a witch riding on a broom, see
Figure 1. This particular choice of activity, rather
than, for example, a plausible alternative like stir-
ring a cauldron or simply standing by herself,
accentuated the relationship between witch and
broom. Indeed, we found that this accentuation
was reflected in the associate responses: 27 of the
50 participants (54%) who saw the picture of the
witch produced broom as an associate while only
18 participants (36%) who read the word witch
produced broom. Thus, the association strength of
a response elicited by words does not necessarily
generalize to picture stimuli, and vise versa.
To demonstrate the relevance of presentation
mode for potentially ambiguous nouns, consider
a second example. The German word for ‘lock’
is Schloss. Schloss, however, also means ‘cas-
tle’. Associate responses such as Schl¨ussel ‘key’
and Fahrrad ‘bicycle’ might be elicited by the
lock meaning of the word while responses such as
Prinzessin ‘princess’ or Burg ‘castle’ would index
the alternative meaning.
Figure 1: Example picture for witch.
3 Data Collection Method
This section introduces our elicitation procedure.
Materials: 409 German nouns referring to pic-
turable objects were chosen as target stimuli. To
ensure broad coverage, target objects represented
a variety of semantic classes including animals,
plants, professions, food, furniture, vehicles, and
tools. Simple black and white line drawings of
target stimuliweredrawn fromseveral sources, in-
cluding Snodgrass and Vanderwart (1980) and the
picture database from the Max Planck Institute for
Psycholinguistics in the Netherlands.
Participants: 300 German participants, mostly
students from Saarland University, received either
course credit or monetary compensation for filling
out a questionnaire.
Procedure: The409target stimuliweredivided
randomly into three separate questionnaires con-
sisting of approximately 135 nouns each. Each
questionnaire was printed in two formats: target
objects were either presented as pictures together
with their preferred name (to ensure that associate
responses were provided for the desired lexical
item) or the name of the target objects was pre-
sented without a representative picture accompa-
nying it. Next to each target stimulus three lines
were printed on which participants could write up
to three semantic associate responses for the stim-
ulus. The order of stimulus presentation was indi-
vidually randomized for each participant. Partici-
pants were instructed to give one associate word
per line, for a maximum of three responses per
trial. No time limits were given for responding,
though participants were told to work swiftly and
withoutinterruption. Eachversion ofthequestion-
naire was filled out by 50 participants, resulting in
a maximum of 300 data points for any given target
stimulus (50 participants a3 2 presentation modes
a3 3 responses).
Collected associate responses were entered into
a database with the following additional infor-
42
mation:a4 For each target stimulus we recorded a)
whether it was presented as a picture or in written
form, and b) whether the name was a homophone
(and thus likely to elicit semantic associates for
multiple meanings). For each response type pro-
vided by a participant, we coded a) the order of
the response, i.e., first, second, third, b) the part-
of-speech of the response, and c) the type of se-
manticrelationbetweenthetarget stimulusandthe
response (e.g., part-whole relations such as car –
wheel, and categorical relationship such as hyper-
nymy, hyponymy, and synonymy).
4 Analysis of Response Types
As described in Section 2, one might expect vari-
ation in the response types for the two presenta-
tion modes, because the associations provided in
the ‘picture+word’ condition were biased towards
the depicted sense of the target noun. Our first
analysis evaluates what sorts of differences are in
fact observed in the data, i.e., which intuitions
are empirically supported, and which are not. To
this end, this section is concerned with systematic
differences in response types when target stimuli
were presented in written form (‘word only’, sub-
sequently W condition) or when the written form
was accompanied by a picture (‘picture+word’,
subsequently PW condition). We first give our
predictions for the differences in response types
and then continue with the corresponding analy-
ses of response types.
4.1 Predictions
Based on our intuitions, we predicted the follow-
ing differences.
1. The overall number of response tokens is
unlikely to differ for the two presentation
modes, since participants are limited to three
associate responses per target stimulus in
both presentation modes.
2. The overall number of response types, how-
ever, should differ: in the PW condition
we expect a bias towards the depicted noun
sense, resulting in a smaller number of re-
sponse types than in the W condition.
3. The PW condition produces less idiosyn-
cratic response types than the W condition,
because pictures reinforce associations that
are either depicted, or at least related to the
depicted sense and its characteristics, result-
ing in less response diversity.
4. The PWcondition receives more associations
that show a part-of relation to the target stim-
ulus than the W condition, because charac-
teristics of the pictures can highlight specific
parts of the whole.
5. The type agreement, i.e., the number of re-
sponse types on which the PW and the W
conditions agree is expected to differ with re-
spect to the target noun. For target nouns
thatarehighlyambiguousweexpectlowtype
agreement. Note that this prediction does not
refer to a PW-W distinction, but instead uses
the PW-W distinction to approach the issue
of noun senses.
4.2 Response Type Distributions
The analyses to follow are based on stimulus-
response frequency distributions: For each target
stimulus and each response type, we calculated
how often the response type was provided. The
result was a frequency distribution for the 409
target nouns, providing frequencies for each re-
sponsetype. Thefrequency distributions were dis-
tinguished for the PW condition and the W condi-
tion. Table 1 provides an example of the most fre-
quent response types and their frequencies for the
homophone target noun Schloss, as described in
Section 2; the ‘lock’ meaning was depicted, ‘cas-
tle’ is an alternative meaning. Hereafter, we will
refer to an association provided in the PW con-
dition as association PW, and an association pro-
vided in the W condition as association W, e.g.,
Burg PW vs. Burg W.
Association POS PW W
Schl¨ussel ‘key’ N 38 13
T¨ur ‘door’ N 10 5
Prinzessin ‘princess’ N 0 8
Burg ‘castle’ N 0 8
sicher ‘safe’ ADJ 7 0
Fahrrad ‘bike’ N 7 0
schließen ‘close’ V 6 1
Keller ‘cellar’ N 7 0
K¨onig ‘king’ N 0 7
Turm ‘tower’ N 0 6
Sicherheit ‘safety’ N 5 1
Tor ‘gate’ N 2 4
zu ‘shut’ ADV 4 1
Table 1: Response type frequencies for Schloss.
4.3 Results
Based on the frequency distributions in Sec-
tion4.2, we analyzed theresponse types according
to our predictions in Section 4.1.
43
Number of response tokens: The number of
response tokens was compared for each target
stimulus in both presentation modes. The total
number ofresponse tokens was58,642 (with mean
a5a7a6a9a8a11a10a13a12 ) in the PW condition and 58,072 (a5a14a6
142)in theW condition. Wehad predicted that To-
ken(PW) a15 Token(W). The analysis showed, how-
ever, that in 243 of 409 cases (59%) the number
of response tokens was larger for PW than for W
(Token(PW) a16 Token(W)); in 132 cases (32%) To-
ken(PW) a17 Token(W), and in 34 cases (8%) To-
ken(PW) a6 Token(W). The unpredicted difference
betweenpresentationmodeswassignificantacross
items in a two-tailed t-test, a18a20a19 a10a22a21a22a23a22a24a25a6a27a26a29a28a30a21a13a31a32a31a34a33a36a35 a17
a28a30a21a32a21a37a8 . We take the result as an indication that pic-
turesfacilitate the production ofassociations. This
is an interesting insight especially since the num-
ber of associate responses per target stimulus was
limited while response time was not.
Number of response types: The number of re-
sponse types was compared for each target stim-
ulus in both presentation modes. The total num-
ber of response types in the PW condition was
19,800 (a5a38a6 48) compared with 20,332 (a5a39a6
50) in the W condition. We had predicted that
Type(W) a16 Type(PW). The results showed in-
deed that in 229 of the 409 cases (56%) the
number of response types was larger for W than
for PW (Type(W) a16 Type(PW)); in 152 cases
(37%) Type(PW) a16 Type(W), and in 28 cases
(7%) Type(PW) = Type(W). This predicted differ-
ence, although small, was significant, a18a11a19 a10a22a21a22a23a22a24a40a6
a12a29a28a41a26a32a12a29a33a36a35
a17
a28a30a21a32a21a37a8 .
Idiosyncratic response types: The proportions
of idiosyncratic response types (i.e., associate re-
sponses that were provided only once for a cer-
tain target stimulus) were compared for each tar-
get stimulus in both presentation modes. In total,
12,011 (a5a42a6 29) idiosyncratic responses were pro-
vided in the PW condition and 12,582 (a5a42a6 31) id-
iosyncratic responses in the W condition. We had
predicted that Idio(W) a16 Idio(PW). The analysis
showed indeed that in 216 of the 409 cases (53%)
the number of idiosyncratic responses was larger
for W than for PW (Idio(W) a16 Idio(PW)); in 175
cases (43%) Idio(PW) a16 Idio(W), and in 18 cases
(4%) Idio(PW) a6 Idio(W). The predicted differ-
encewasreliableacrossitems,a18a11a19 a10a22a21a22a23a22a24a43a6a44a12a29a28a45a31a46a26a29a33a36a35 a17
a28a30a21a32a21a37a8 . This pattern of results is consistent with the
notion of a restricted set of responses in the PW
condition relative to the W condition.
Part-of response types: Based on the man-
ual annotation of semantic relations between tar-
get nouns and responses, proportions of response
types which stand in a part-of relation to the target
nouns were determined. The total number of part-
of response types was 876 (a5a47a6 2.7) in the PW
condition, and 901 (a5a48a6 2.8) in the W condition.
We predicted that Part(PW) a16 Part(W). The anal-
ysis showed however that in only 94 of the 409
cases (29%) the number of part-of responses was
larger for PW than for W (Part(PW) a16 Part(W));
in 114 cases (35%) Part(W) a16 Part(PW), and in
115 cases (36%) Part(W) a6 Part(PW). The differ-
encebetweenconditionswasnot significantacross
items, a18a20a19 a12a32a49a32a49a22a24a50a6a51a8a46a28a52a10a13a49a29a33a36a35 a16 a28a53a8 . The absence of a re-
liable difference in this analysis possibly suggests
that our pictures did not regularly enhance a part-
whole relationship.
Type agreement: The final analysis was based
on response type agreement for PW and W. How-
ever, this analysis did not aim to distinguish be-
tween the two presentation modes but rather used
the agreement proportions as a diagnostic of po-
tential target noun ambiguity. Here we calculated
thetotalamountofoverlapbetweenthePWandW
conditions. For this calculation, we identified the
number of response types that occur in both the
PW and W conditions for a particular target stim-
ulus and divided that number by the total number
of response types produced for that target stimu-
lus, irrespective of condition. In other words, if a
noun PW receives responses A and B and noun W
receives responses B and C, then the total number
of shared response types is 1, namely response B,
and the total number of response types across con-
ditions is 3, namely A, B and C. Thus, the propor-
tion of agreement is .33.
We reasoned that target nouns with low type
agreement are likely to be ambiguous. To test this,
we sorted the targets by their proportion of agree-
ment, and comparedthe top and bottom 20 targets.
In the manual annotation of our stimuli, cf. Sec-
tion 3, we had recorded that 10% of our stimuli
were homophones. Thus, a random distribution
would predict two ambiguous items in a 20 item
sample if the proportion of agreement is not an in-
dicator of ambiguity. Instead, we found 11 am-
biguous nouns in the set of 20 targets with lowest
agreement proportions and 2 ambiguous nouns in
the set of 20 targets with highest agreement pro-
portions. A a54a56a55 test indicated that the number of
44
ambiguousa57 nouns found in the two sets differed
significantly, a54a56a55 a6a58a31a34a28a41a49a32a59a29a33a36a35 a17 a28a30a21a37a8 .
Summarizing this first set of analyses, we found
that the associate responses for concrete German
nouns differed significantly depending on the for-
mat under which they were elicited, namely the
presentation mode. The fact that we found more
response types in total and also more idiosyncratic
responses when target nouns were presented in
the ‘word only’ vs. the ‘picture+word’ condition
suggests that alternative meanings were more ac-
tive when participants were presented with writ-
ten stimuli compared to depicted stimuli. It is also
interesting to note that not all our intuitive predic-
tions wereborn out. Forexample, despiteour feel-
ing that the picture should bias the inclusion of de-
picted part-of relations, such as the broom a15 witch
example discussed above, this intuition was not
supported by the data. This fact highlights the im-
portance of first analyzing the responses to ensure
the necessary conditions are present for the identi-
fication of ambiguous words.
5 Analysis of Noun Senses
The second analysis in this paper addresses the
distinction of noun senses on the basis of asso-
ciations. Our goal is to identify the – potentially
multiple – senses of target nouns, and to reveal
differences in the noun senses with respect to the
presentation modes. The analysis was done as fol-
lows.
1. Thetarget-responsepairswereclustered. The
soft cluster analysis was expected to assign
semantically similar noun senses into com-
mon clusters, as based on shared associate re-
sponses. (Section 5.1)
2. The clusters were used to predict the ambigu-
ityof nouns andtheir respective senses. (Sec-
tion 5.2)
3. The clusters and their predictability were
evaluated by annotating noun senses with
Duden dictionary definitions, and calculating
interannotator agreement. (Section 5.3)
5.1 Latent Semantic Noun Clusters
Target nouns were clustered on the basis of their
association frequencies, cf. Table 1. I.e., the
clustering result was determined by joint frequen-
cies of the target nouns and the respective associ-
ations. The targets themselves were described by
the noun-condition combination, e.g. Schloss PW,
and Schloss W. We used noun-condition combina-
tions as compared to nouns only, because the clus-
tering result should not only distinguish senses of
nouns in general, but in addition predict the noun
senses with respect to the condition.
Various techniques have been exploited for
word sense disambiguation. Closely related to
our work, Schvaneveldt’s pathfinder networks
(Schvaneveldt, 1990) were based on word asso-
ciations and were used to identify word senses.
An enourmous number of approaches in compu-
tational linguistics can be found on the SENSE-
VAL webpage (SENSEVAL,), which hosts a word
sense disambiguation competition. We applied
Latent Semantic Clusters (LSC) to our associa-
tion data. The LSC algorithm is an instance of
the Expectation-Maximisation algorithm (Baum,
1972) for unsupervised training based on unan-
notated data, and has been applied to model the
selectional dependency between two sets of words
participatinginagrammaticalrelationship(Rooth,
1998; Rooth et al., 1999). The resulting clus-
ter analysis defines two-dimensional soft clusters
whichareabletogeneraliseoverhiddendata. LSC
training learns three probability distributions, one
for the probabilities of the clusters, and one for
each tuple input item and each cluster (i.e., a prob-
ability distribution for the target nouns and each
cluster, and one for the associations and each clus-
ter), thus the two dimensions. We use an imple-
mentation of the LSC algorithm as provided by
Helmut Schmid.
The LSC output depends not only on the distri-
butional input, but also on the number of clusters
tomodel. Asarule, themoreclustersaremodeled,
the more skewed the resulting probability distribu-
tions for cluster membership are. Since the goal
of this work was not to optimize the clustering pa-
rameters, but to judge the general predictability of
such models, we concentrated on two clustering
models, with 100 and 200 clusters, respectively.
Table 2 presents the most probable noun-
condition combinations for a cluster from the 100-
clusteranalysis: Theclusterprobability is0.01295
(probabilities ranged from 0.00530 to 0.02674).
Themostprobable associationsthatwerecommon
to members of this cluster were Ritter ‘knight’,
Mittelalter ‘medieval times’, R¨ustung ‘armour’,
Burg ‘castle’, Kampf ‘fight’, k¨ampfen ‘fight’,
Schwert ‘sword’, Waffe‘weapon’, Schloss‘castle’,
45
sca60 harf ‘sharp’. This example shows that the asso-
ciationsprovide asemanticdescription oftheclus-
ter, and the target nouns themselves appear in the
cluster if one of their senses is related to the clus-
ter description. In addition, we can see that, e.g.,
Schloss appearsinthisclusteronlyinthe Wcondi-
tion. The reason for this is that the picture showed
the ‘lock’ sense of Schloss, so the PW condition
was less likely to elicit ‘castle’-related responses.
This example cluster illustratesnicely what we ex-
pect from the cluster analysis with respect to dis-
tinguishing noun senses.
Target Noun Cond Prob
R¨ustung ‘armour’ W 0.097
Schwert ‘sword’ W 0.097
Burg ‘castle’ W 0.096
R¨ustung ‘armour’ PW 0.096
Dolch ‘dagger’ PW 0.095
Schwert ‘sword’ PW 0.093
Burg ‘castle’ PW 0.091
Dolch ‘dagger’ W 0.089
Ritter ‘knight’ PW 0.073
Ritter ‘knight’ W 0.068
Schloss ‘castle’ W 0.040
Turm ‘tower’ PW 0.014
Table 2: Sample cluster, 100-cluster analysis.
5.2 Prediction of Noun Ambiguity and Noun
Senses
The noun clusters were used to predict the ambi-
guity of nouns and their respective senses. The
two-dimensional cluster probabilities, as intro-
duced above, offer the following information:
a61 Which associations are highly probable for a
cluster? The most probable associations are
considered as defining the semantic content
of the cluster.
a61 Which target nouns are highly probable for
a cluster and its semantic content, i.e. the
associations? Relating the target nouns in a
cluster with the cluster associations defines
the respective sense of the noun. To refer to
the above example, finding Schloss in a clus-
ter together with associations such as ‘castle’
and ‘fight’ relates this instance of Schloss to
the ‘castle’ sense and not the ‘lock’ sense.
a61 Which target nouns are in the same cluster
and therefore refer to a common sense/aspect
of the nouns? This information is relevant for
revealing sense differences of target nouns
with respect to the conditions PW vs. W.
In order to predict whether a noun is in a cluster or
not, we needed a cut-off value for the membership
probability. We settled on 1%, i.e., a target noun
withaprobabilityof a62 1%wasconsideredamem-
ber of a cluster. Based on the 200-cluster informa-
tion, we then performed the following analyses on
noun ambiguity and noun senses.
Prediction of noun ambiguity: For each tar-
get noun, we predicted its ambiguity by the num-
ber of clusters it was a member of. For example,
the highly ambiguous noun Becken ‘basin, cym-
bal, pelvis’ (among other senses), was a mem-
ber of 8 clusters, as compared to the unambigu-
ous B¨acker ‘baker’ which was a member of only
one cluster. Membership in several clusters does
not necessarily point to multiple noun senses (be-
causedifferent combinationsofassociationsmight
define similar semantic contents), but nevertheless
the clusters provide an indication of the degree of
noun ambiguity. The total number of senses in the
200-cluster analysis was 735, which means an av-
erage of 1.8 senses for each target stimulus (across
presentation condition).
Discrimination of noun senses: The most
probableassociationsin the clusterswereassumed
to describe the semantic content of the clusters.
They can be used to discriminate noun senses of
polysemous nouns. Referring back to our exam-
ple noun Becken, it appeared in one cluster with
the most probable associations Wasser ‘water’,
Garten ‘garden’, Feuerwehr ‘fire brigade’, gießen
‘water’, and nass ‘wet’, describing the ‘basin’
sense of the target noun; in a second cluster it ap-
pearedwithMusik ‘music’, laut ‘loud’, Instrument
‘instrument’, Orchester ‘orchestra’, and Jazz, de-
scribing the music-related sense; and in a third
cluster it appeared with Hand ‘hand’, Bein ‘leg’,
Ellenbogen ‘elbow’, K¨orper ‘body’ and Muskel
‘muscle’, describing the body-related sense, etc.
Noun similarity: Those target nouns which
were assigned to a common cluster were assumed
tobesemanticallysimilar(withrespecttotheclus-
ter content). Again, referring back to our example
noun Becken and the three senses discriminated
above, in the first cluster refering to the ‘basin’
sense we find other nouns such as Eimer ‘bucket’,
Font¨ane ‘fountain’, Brunnen ‘fountain, well’, Wei-
her ‘pond’, and Vase ‘vase’, all related to water
andwatercontainer; inthesecondclusterreferring
to the music sense we find Tuba ‘tuba’, Trompete
‘trumpet’, Saxophon ‘sax’, and Trommel ‘drum’,
46
anda57 in the third cluster referring to the body sense
we find Arm ‘arm’, and Knochen ‘bone’.
Discrimination of PW vs. W noun senses:
Combining the previous two analyses allowed us
to discriminate senses as provided by the two ex-
perimental conditions. Remember that the target
nouns in the clusters included the specification of
the condition. If we find a target noun in a cer-
tain cluster with both condition specifications, it
means that some associations produced to both the
PW and the W conditions referred to the same
noun sense. If a target noun appears in a certain
cluster only with one condition specified, it means
that the associations captured the respective noun
sense only in one condition. Thus, a target noun
appearing in a cluster in only one condition was
an indication for ambiguity. Going back to our ex-
ample noun Becken and its three example clusters,
we find the noun in both conditions only in one of
the three clusters, namely the cluster for the music
sense, and this happens to be the sense depicted
in the PW condition. In the two other clusters,
we only find Becken in the W condition. In total,
Becken appears in both conditions only in 1 out of
8 clusters, in only the PW condition in 1 cluster,
and in only the W condition in 6 clusters.
The four analyses demonstrate that and how the
clusters can be used to predict and discriminate
noun senses. Of course, the predictions are not
perfect, but they apprximately correspond to our
linguistic intuitions. Impressively, the clusters re-
vealed not only blatantly polysemous words such
as Becken but also distinct facets of a word. For
example, the stimulus Filter ‘filter’ had associa-
tions to coffee-related senses as well as cigarette-
related senses, both of which were then reflected
in the clusters.
5.3 Evaluation of Noun Clusters
In order to perform a more independent evaluation
of the clusters which is not only based on specific
examples, we assessed the clusters by two annota-
tors. 20homophones weremanuallyselectedfrom
the 409 target nouns. In addition, we relied on the
indicators for ambiguity as defined in Section 4,
and selected the 20 top and bottom nouns from the
ordered list of type agreement for the two condi-
tions. The manual list showed some overlap with
the selection dependent on type agreement, result-
ing in a list of 51 target nouns.
For each of the selected target nouns, we looked
up the noun senses as defined by the Duden, a
standard German dictionary. We primarily used
the stylistic dictionary (Dudenredaktion, 2001),
but used the foreign language dictionary (Du-
denredaktion, 2005) ifthe noun was missing in the
former. Each target noun was defined by its (short
version) sense definitions. For example, Schloss
was defined by the senses Vorrichtung zum Ver-
schließen ‘device for closing’ and Wohngeb¨aude
von F¨ursten und Adeligen ‘residential building for
princes and noblemen’.
As targets for the evaluation, we used the two
cluster analyses as mentioned above, containing
100 and 200 clusters with membership probabil-
ity cut-offs at 1%. Two annotators were then pre-
sented with two lists each: For each cluster analy-
sis, they saw a list of the 51 selected target nouns,
accompanied by the clusters they were members
of,i.e., for whichthey showed aprobability a62 a8a64a63 ,
ignoring the condition of the target noun (PW vs.
W). In total, the annotators were given 82/91 clus-
ters which included any of the 51 selected nouns.
For each cluster, the annotators saw the five most
probable associations, and all cluster members.
The annotators were asked to select a Duden sense
for each cluster, if possible. The results of the an-
notation are presented in Table 3. Annotator 1
identified a Duden sense for 72/75% of the clus-
ters, annotator 2for 78/71%. Interannotator agree-
ment on which of the Duden senses was appropri-
ate for a cluster (if any) was 81/85%; a65 a6a66a28a45a31a46a23a22a67a34a28a45a31a46a68 .
Source 100 clusters 200 clusters
No. of clusters 82 91
Annotator 1 59 72% 68 75%
Annotator 2 64 78% 65 71%
Table 3: Clusters and identified Duden senses.
The evaluation of the clusters as carried out by
the sense annotation demonstrates that the cluster
senses correspond largely to Duden senses. This
first kind of evaluation models the precision of the
cluster analyses. A second kind of evaluation as-
sessed how many different Duden senses we cap-
turewiththe clusteranalyses; thisevaluation mod-
ells the recall of the cluster analyses. Duden de-
fines a total of 113 senses to our target nouns. Ta-
ble 4 specifies the recall for the data sets and an-
notators.
The evaluations show that the precision is much
larger than the recall. It might be worth applying
the clustering with a different number of clusters
47
Source 100 clusters 200 clusters
Annotator 1 46 41% 54 48%
Annotator 2 51 45% 52 46%
Table 4: Cluster recall of Duden senses.
and/or a different cut-off for the cluster member-
ship probability, but that would lower the preci-
sionoftheanalyses. Webelieve thattheevaluation
numbers are quite impressive, especially consider-
ing that Duden not only specifies everyday vocab-
ulary, but includes colloquial expressions (such as
Ballon as ‘human head’), out-dated senses (such
as Mond as ‘month’), and domain-specific senses
(such as Blatt as ‘shoulder of a hoofed game’).
6 Conclusions
In this paper we evaluated differences in the types
and strengths of semantic associations elicited un-
der twoconditions of presentation, ‘picture+word’
and ‘word only’. Consistent with prior psycholin-
guistic research, we observed associations to dif-
ferent meanings of a word in both conditions,
supporting the idea that multiple meanings of
homonymsareactiveduringbothpictureandword
processing. However, our analyses of response
types also showed that responses to pictures were
less diverse and idiosyncratic than responses to
words, suggesting that the degree to which alter-
native meanings are active in the two presentation
modes mayindeed be different. One further impli-
cation of the analyses is that semantic associations
(and especially association strengths) from word-
based norming studies do not necessarily general-
ize for the purpose of experiments using depicted
materials. This insight should have an impact on
psycholinguistic studies when selecting depicted
vs. written stimuli.
Our predictions for the types of differences we
expected were based on intuitive grounds. One
might therefore question the value of the analy-
ses presented in Section 4. It is interesting to note,
however, that some of the predictions were in fact
not born out. As the cluster analysis presented
in Section 5 required differences between the two
stimulus modes, it was critical that a proper eval-
uation of those differences be conducted, even if
some of them seem trivially true.
The cluster analysis demonstrated that we can
capitalize on the semantic associations and both
identify and discriminate the various senses of the
target nouns. Indeed, the clusters not only re-
vealed sense differences of target nouns with re-
specttotheirpresentationmodes, butalsodetected
noun senses which had not been identified by the
authors initially. This indicates that this method
not only can discriminate between senses but it
can also detect ambiguity. The cluster analysis al-
lowed us to apply automatic methods of identify-
ingwhich meaning ofawordaparticularassociate
refers to, which would otherwise be a time con-
suming and error-prone manual activity.

References
Leonard E. Baum. 1972. An inequality and associ-
atedmaximization technique in statistical estimation
for probabilistic functions of Markov processes. In-
equalities, III:1–8.
J. Cooper Cutting and Victor S. Ferreira. 1999.
Semantic and phonological information flow in
the production lexicon. Journal of Experimen-
tal Psychology: Learning, Memory, and Cognition,
25(2):318–344.
Dudenredaktion, editor. 2001. DUDEN – Das
Stilw¨orterbuch. Number 2 in ‘Duden in zw¨olf
B¨anden’. Dudenverlag, Mannheim, 8th edition.
Dudenredaktion, editor. 2005. DUDEN – Deutsch als
Fremdsprache Standardw¨orterbuch. Dudenverlag,
Mannheim, 1st edition.
Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn
Carroll, and Franz Beil. 1999. Inducing a seman-
tically annotated lexicon via EM-based clustering.
In Proceedings of the 37th Annual Meeting of the
Association for Computational Linguistics.
Mats Rooth. 1998. Two-dimensional clusters
in grammatical relations. In Inducing Lexicons
with the EM Algorithm, AIMS Report 4(3). Insti-
tut f¨ur Maschinelle Sprachverarbeitung, Universit¨at
Stuttgart.
Roger W. Schvaneveldt,editor. 1990. Pathfinder Asso-
ciative Networks. Studies in Knowledge Organiza-
tion. Ablex Publishing Corporation, Norwood, NJ.
SENSEVAL. Evaluation exercises for Word Sense
Disambiguation. http://www.senseval.org/. Orga-
nized by ACL-SIGLEX.
Joan Gay Snodgrass and Mary Vanderwart. 1980. A
standardized set of 260 pictures: Norms for name
agreement, image agreement, familiarity, and visual
complexity. Journal of Experimental Psychology:
Human Learning and Memory, 6:174–215.
Michael K. Tanenhaus, James M. Leiman, and Mark S.
Seidenberg. 1979. Evidence for multiple stages in
the processing of ambiguous words in syntactic con-
texts. Journal ofVerbal Learning and VerbalBehav-
ior, 18:427–440.
