Tree-cut and A Lexicon based on Systematic Polysemy
Noriko Tomuro
DePaul University
School of Computer Science, Telecommunications and Information Systems
243 S. Wabash Ave.
Chicago, IL 60604
tomuro@cs.depaul.edu
Abstract
This paper describes a lexicon organized around sys-
tematic polysemy: a set of word senses that are
related in systematic and predictable ways. The
lexicon is derived by a fully automatic extraction
method which utilizes a clustering technique called
tree-cut. We compare our lexicon to WordNet
cousins, and the inter-annotator disagreement ob-
served between WordNet Semcor and DSO corpora.
1 Introduction
In recent years, the granularity of word senses
for computational lexicons has been discussed fre-
quently in Lexical Semantics (for example, (Kilgar-
ri, 1998a;; Palmer, 1998)). This issue emerged as a
prominent problem after previous studies and ex-
ercises in Word Sense Disambiguation (WSD) re-
ported that, when ne-grained sense denitions such
as those in WordNet (Miller, 1990) were used, en-
tries became very similar and indistinguishable to
human annotators, thereby causing disagreementon
correct tags (Kilgarri, 1998b;; Veronis, 1998;; Ng et
al., 1999). In addition to WSD, the selection of sense
inventories is fundamentally critical in other Natural
Language Processing (NLP) tasks such as Informa-
tion Extraction(IE) and Machine Translation(MT),
as well as in Information Retrieval (IR), since the
dierence in the correct sense assignments aects re-
call, precision and other evaluation measures.
In response to this, several approaches have been
proposed which group ne-grained word senses in
various ways to derive coarse-grained sense groups.
Some approachesutilize an abstractionhierarchyde-
ned in a dictionary (Kilgarri, 1998b), while others
utilize surface syntactic patterns of the functional
structures (such as predicate-argument structure for
verbs) of words (Palmer, 1998). Also, the current
version of WordNet (1.6) encodes groupings of sim-
ilar/related word senses (or synsets) by a relation
called cousin.
Another approach to grouping word senses is to
utilize a linguistic phenomenon called systematic
polysemy: a set of wordsensesthat arerelated in sys-
tematic and predictableways.
1
Forexample, ANIMAL
and MEAT meanings of the word \chicken" are re-
lated because chicken as meat refers to the 
esh of
achicken as a bird that is used for food.
2
This rela-
tion is systematic, since many ANIMAL words suchas
\duck" and \lamb" haveaMEAT meaning. Another
example is the relation QUANTITY-PROCESSobserved
in nouns such as \increase" and \supply".
Sense grouping based on systematic polysemy is
lexico-semantically motivated in that it expresses
general human knowledge about the relatedness of
word meanings. Such sense groupings have advan-
tages compared to other approaches. First, related
senses of a word often exist simultaneously in a
discourse (for example the QUANTITY and PROCESS
meanings of \increase" above). Thus, systematic
polysemy can be eectively used in WSD (and WSD
evaluation) to accept multiple or alternative sense
tags (Buitelaar, personal communication). Second,
many systematic relations are observed between
senses which belong to dierent semantic categories.
So if a lexicon is dened by a collection of sepa-
rate trees/hierarchies (such as the case of Word-
Net), systematic polysemy can express similaritybe-
tween senses that are not hierarchically proximate.
Third, by explicitly representing(inter-)relationsbe-
tween senses, a lexicon based on systematic poly-
semy can facilitate semantic inferences. Thus it is
useful in knowledge-intensive NLP tasks such as dis-
course analysis, IE and MT. More recently, (Gonzalo
et al., 2000)alsodiscussespotential usefulnessof sys-
tematic polysemy for clustering word senses for IR.
However, extracting systematic relations from
large sense inventories is a dicult task. Most of-
ten, this procedure is done manually. For example,
WordNet cousin relations were identied manually
bytheWordNet lexicographers. A similar eort was
also made in the EuroWordnet project (Vossen et
1
Systematic polysemy (in the sense we use in this paper) is
also referred to as regular polysemy (Apresjan, 1973) or logical
polysemy (Pustejovsky,1995).
2
Note that systematic polysemy should be contrasted
with homonymy, which refers to words which have more
than one unrelated sense (e.g. FINANCIAL INSTITUTION and
SLOPING LAND meanings of the word \bank").
al., 1999). The problem is not only that manual
inspection of a large, complex lexicon is very time-
consuming, it is also prone to inconsistencies.
In this paper, we describes a lexicon organized
around systematic polysemy. The lexicon is derived
by a fully automatic extraction method which uti-
lizes a clustering technique called tree-cut (Li and
Abe, 1998). In our previous work (Tomuro, 2000),
we applied this method to a small subset of Word-
Net nouns and showed potential applicability. In the
current work, we applied the method to all nouns
and verbs in WordNet, and built a lexicon in which
word senses are partitioned by systematic polysemy.
We report results of comparing our lexicon with the
WordNet cousins as well as the inter-annotator dis-
agreement observed between two semantically an-
notated corpora: WordNet Semcor (Landes et al.,
1998) and DSO (Ng and Lee, 1996). The results are
quite promising: our extraction method discovered
89% of the WordNet cousins, and the sense parti-
tions in our lexicon yielded better  values (Car-
letta, 1996) than arbitrary sense groupings on the
agreement data.
2 The Tree-cut Technique
The tree-cut technique is an unsupervised learning
technique which partitions data items organized in a
tree structure into mutually-disjoint clusters. It was
originally proposed in (Li and Abe, 1998), and then
adopted in our previous method for automatically
extracting systematic polysemy(Tomuro, 2000). In
this section, we give a brief summary of this tree-cut
technique using examples from (Li and Abe, 1998)'s
original work.
2.1 Tree-cut Models
The tree-cut technique is applied to data items that
are organized in a structure called a thesaurus tree.
A thesaurus tree is a hierarchicallyorganized lexicon
where leaf nodes encode lexicaldata (i.e., words)and
internal nodes represent abstract semantic classes.
A tree-cut is a partition of a thesaurus tree. It is
a list of internal/leaf nodes in the tree, and each
node represents a set of all leaf nodes in asubtree
rooted by the node. Such a set is also considered as a
cluster.
3
Clusters in a tree-cut exhaustively cover all
leaf nodes of the tree, and they are mutually disjoint.
For instance, Figure 1 shows an example thesaurus
tree andone possible tree-cut[AIRCRAFT,ball, kite,
puzzle], which is indicated by a thick curve in the
gure. There are also four other possible tree-cuts
for this tree: [airplane, helicopter, ball, kite, puzzle],
[airplane, helicopter, TOY], [AIRCRAFT, TOY]and
[ARTIFACT].
In (Li and Abe, 1998), the tree-cut technique
was applied to the problem of acquiring general-
3
A leaf node is also a cluster whose cardinalityis1.
ized case frame patterns from a corpus. Thus, each
node/word in the tree received as its value the num-
ber of instances where the word occurred as a case
role (subject, object etc.) of a given verb. Then the
acquisition of a generalized case frame was viewed as
a problem of selecting the best tree-cut model that
estimates the true probability distribution, given a
sample corpus data.
Formally,atree-cut model M is a pair consisting
of a tree-cut ; and a probability parameter vector
 of the same length,
M =(;;;) (1)
where ; and  are:
;=[C
1
;;::;;C
k
];;=[P(C
1
);;::;;P(C
k
)] (2)
where C
i
(1  i  k) is a cluster in the tree-
cut, P(C
i
) is the probability of a cluster C
i
, and
P
k
i=1
P(C
i
) = 1. Note that P(C) is the prob-
ability of cluster C = fn
1
;;::;;n
m
g as a whole,
that is, P(C) =
P
m
j=1
P(n
j
). For example, sup-
pose a corpus contains 10 instances of verb-object
relation for the verb \
y", and the frequencies
of object nouns n, denoted f(n), are as follows:
f(airplane) = 5;;f(helicopter) = 3;;f(ball) =
0;;f(kite)=2;;f(puzzle)=0.Then, the set of tree-
cut models for the example thesaurus tree shown in
Figure 1 includes ([airplane, helicopter, TOY], [.5,
.3, .2]) and ([AIRCRAFT, TOY], [.8, .2]).
2.2 The MDL Principle
To select the best tree-cut model, (Li and Abe, 1998)
uses the Minimal Description Length (MDL). The
MDL is a principle of data compression in Informa-
tion Theory which states that, for a given dataset,
the best model is the one which requires the min-
imum length (often measured in bits) to encode
the model (the model description length) and the
data (the data description length) (Rissanen, 1978).
Thus, the MDL principle captures the trade-o be-
tween the simplicity of a model, which is measured
bythenumberof clustersin atree-cut, and the good-
ness of t to the data, which is measured by the
estimation accuracy of the probability distribution.
The calculation of the description length for a
tree-cut model is as follows. Given a thesaurus tree
T and a sample S consisting of the case frame in-
stances, the total description length L(M;;S) for a
tree-cut model M =(;;;) is
L(M;;S)=L(;) +L(j;)+L(Sj;;;) (3)
where L(;) is the model description length, L(j;)
is the parameter description length (explained
shortly), and L(Sj;;;) is the data description
length. Note that L(;) + L(j;) essentially corre-
sponds to the usual notion of the model description
length.
ARTIFACT
AIRCRAFT TOY
airplane helicopter ball kite puzzle
Γ L(Θ|Γ) L(S|Γ,Θ)  L(M,S)
[A]                   1.66 11.60 13.26
[AC,TOY]                 3.32    14.34 17.66
[ap,heli,TOY]            4.98    14.44 19.42
[AC,ball,kite,puz]       6.64 4.96   11.60 
[ap,hel,ball,kite,puz] 8.31     5.06   13.37
0.8
0.0
0.2 0.0
530 02
frequency
Figure 1: The MDL lengths and the nal tree-cut
Each length in L(M;;S) is calculated as follows.
4
The model description length L(;) is
L(;) = log
2
jGj (4)
where G is the set of all cuts in T,andjGj denotes
the size of G. This value is a constantforallmod-
els, thus it is omitted in the calculation of the total
length.
The parameter description length L(j;) indi-
cates the complexityofthemodel. It is the length
required to encode the probability distribution of the
clusters in the tree-cut ;. It is calculated as
L(j;) =
k
2
log
2
jSj (5)
where k is the length of , and jSj is the size of S.
Finally, the data description length L(Sj;;;) is
the length required to encode the whole sample data.
It is calculated as
L(Sj;;;) = ;
X
n2S
log
2
P(n) (6)
where, for each n 2 C and each C 2 ;,
P(n)=
P(C)
jCj
and P(C)=
f(C)
jSj
(7)
Note the equation (7) essentially computes the Max-
imum Likelihood Estimate (MLE) for all n.
5
A table in Figure 1 shows the MDL lengths for all
ve tree-cut models. The best model is the one with
the tree-cut [AIRCRAFT, ball, kite, puzzle].
3 Clustering Systematic Polysemy
Using the tree-cut technique described above, our
previous work (Tomuro, 2000) extracted systematic
polysemyfromWordNet. In this section, wegivea
summary of this method, and describe the cluster
pairs obtained by the method.
4
For justication and detailed explanation of these formu-
las, see (Li and Abe, 1998).
5
In our previous work, weusedentropy instead of MLE.
That is because the lexicon represents true population, not
samples;; thus there is no additional data to estimate.
3.1 Extraction Method
In our previous work, systematically related word
senses are derived as binary cluster pairs, by apply-
ing the extraction procedure to a combination of two
WordNet (sub)trees. This process is done in the fol-
lowing three steps. In the rst step, all leaf nodes
of the two trees are assigned avalue of either 1, if
a node/word appears in both trees, or 0 otherwise.
6
In the second step, the tree-cut technique is applied
to each tree separately,andtwo tree-cuts (or sets of
clusters) are obtained. To searchthebest tree-cut
for atree(i.e., the model which requires the mini-
mum total description length), a greedy algorithm
called Find-MDL described in (Li and Abe, 1998)
is used to speed up the search. Finally in the third
step, clusters in those two tree-cuts are matched up,
and the pairs whichhavesubstantial overlap (more
than three overlapping words) are selected as sys-
tematic polysemies.
Figure 2 shows parts of the nal tree-cuts for the
ARTIFACT and MEASURE classes. Note in the gure,
bold letters indicate words which are polysemous in
the two trees (i.e., assigned a value 1).
3.2 Modication
In the currentwork, we made a minor modication
to the extraction method described above, by re-
moving nodes that are assigned a value 0 from the
trees. The purpose was to make the tree-cut tech-
nique less sensitive to the structure of a tree and
produce more specic clusters dened at deeper lev-
els.
7
The MDL principle inherently penalizes a com-
plex tree-cut by assigning a long parameter length.
Therefore, shorter tree-cuts partitioned at abstract
levels are often preferred. This causes a problem
when the tree is bushy, which is the case with Word-
Net trees. Indeed, many tree-cut clusters obtained
in our previous work were from nodes at depth 1
(counting the root as depth 0) { around 88% (122
6
Prior to this, eachWordNet (sub)tree is transformed into
a thesaurus tree, since WordNet tree is a graph rather than a
tree, and internal nodes as well as leaf nodes carry data. In
the transformation, all internal nodes in a WordNet tree are
copied as leaf nodes, and shared subtrees are duplicated.
7
Removing nodes with 0 is also warranted since we are not
estimating values for those nodes (as explained in footnote 5).
MEASURE
INDEFINITE
QUANTITY
LINEAR
MEASURE
yard
LINEAR
UNIT
foot
CONTAINERFUL
bottle
bucket
spoon
DEFINITE
QUANTITY
TIME
PERIOD
mile
0.07
0.12
0.330.36
load
blockbit
ounce
quarter
flash
morning
knot
0.53
ARTIFACT
foot
STRUCTURE
INSTRUMEN-
TALITY
yard
ROD
bottle bucket
CONTAINER
spoon
VESSEL
base
building
ARTICLE
IMPLEMENT
UTENSIL
mixer porcelain
0.1
TABLEWARE
spoon
dish
plate
0.02
DEVICE
foot
knot
Figure 2: Parts of the nal tree-cuts for ARTIFACT and MEASURE
Table 1: Automatically Extracted Cluster Pairs
Category Basic Underspecied Cluster
classes classes pairs
Nouns 24 99 2,377
Verbs 10 59 1,710
Total 34 158 4,077
out of total 138clusters)obtained for 5 combinations
of WordNet noun trees. Note that we did not allow
a cluster at the root of a tree;; thus, depth 1isthe
highest level for any cluster. After the modication
above, the proportion of depth 1 clusters decreased
to 49% (169 out of total 343 clusters) for the same
tree combinations.
3.3 Extracted Cluster Pairs
We applied the modied method described aboveto
all nouns and verbs in WordNet. We rst parti-
tioned words in the two categories into basic classes.
A basic class is an abstract semantic concept, and
it corresponds to a (sub)tree in the WordNet hier-
archies. Wechose 24 basic classes for nouns and 10
basic classes for verbs, from WordNet Top categories
for nouns and lexicographers' le names for verbs
respectively. Those basic classes exhaustively cover
all words in the two categories encoded in Word-
Net. For example, basic classes for nouns include
ARTIFACT, SUBSTANCE and LOCATION, while basic
classes for verbs include CHANGE, MOTION and STATE.
For each part-of-speech category, we applied our
extraction method to all combinations of two ba-
sic classes. Here, a combined class, for instance
ARTIFACT-SUBSTANCE, represents an underspecied
semantic class. We obtained 2,377 cluster pairs in
99 underspecied classes for nouns, and 1,710cluster
pairs in 59 underspecied classes for verbs. Table 1
shows a summary of the number of basic and under-
specied classes and cluster pairs extracted by our
method.
Although the results vary among category combi-
nations, the accuracy (precision) of the derived clus-
ter pairswasratherlow: 50to 60%on average,based
on our manual inspection using around 5% randomly
chosen samples.
8
This means our automatic method
over-generates possible relations. We speculate that
this is because in general, there are many homony-
mous relations that are 'systematic' in the English
language. For example, in the ARTIFACT-GROUP
class, a pair [LUMBER, SOCIAL GROUP]was extracted.
Words which are common in the two clusters are
\picket", \board" and \stock". Since there are
enough numberofsuchwords (for our purpose), our
automatic method could not dierentiate them from
true systematic polysemy.
4 Evaluation: Comparison with
WordNet Cousins
To test our automatic extraction method, wecom-
pared the cluster pairs derived by our method to
WordNet cousins. The cousin relation is relatively
new in WordNet, and the coverage is still incom-
plete. Currently a total of 194 unique relations are
encoded. A cousin relation in WordNet is dened
between two synsets, and it indicates that senses of
aword that appear in both of the (sub)trees rooted
by those synsetsare related.
9
The cousinswere man-
8
Note that the relatedness between clusters was deter-
mined solely by our subjective judgement. That is because
there is no existing large-scale lexicon which encodes related
senses completely for all words in the lexicon. (Note that
WordNet cousin relation is encoded only for some words).
Although the distinction between related vs. unrelated mean-
ings is sometimes unclear, systematicity of the related senses
among words is quite intuitive and has been well studied in
Lexical Semantics (for example, (Apresjan, 1973;; Nunberg,
1995;; Copestake and Briscoe, 1995)). A comparison with
WordNet cousin is discussed in the next section 4.
9
Actually, cousin is one of the three relations which in-
dicate the grouping of related senses of a word. Others are
sister and twin. In this paper, we use cousin to refer to all
relations listed in \cousin.tps" le (available in a WordNet
distribution).
ually identied by the WordNet lexicographers.
To compare the automatically derived cluster
pairs to WordNet cousins, we used the hypernym-
hyponym relation in the trees, instead of the number
or ratio of the overlapping words. This is because
the levels at which the cousin relations are dened
dier quite widely, from depth 0 to depth 6, thus the
number of polysemous words covered in each cousin
relation signicantly varies. Therefore, it was di-
cult to decide on an appropriate threshold value for
either criteria.
Using the hypernym-hyponym relation, we
checked, for each cousin relation, whether there was
at least one cluster pair that subsumed or was sub-
sumed by the cousin. More specically, for a cousin
relation dened between nodes c1 and c2 in trees
T1 and T2 respectively and a cluster pair dened
between nodes r1 and r2 in the same trees, wede-
cided on the correspondence if c1isahypernym or
hyponym of r1, and c2isahypernym or hyponym
r2 at the same time.
Based on this criteria, we obtained a result indi-
cating that 173 out of the 194 cousin relations had
corresponding cluster pairs. This makes the recall
ratio 89%, whichwe consider to be quite high.
In addition to the WordNet cousins, our auto-
matic extraction method discovered several interest-
ing relations. Table 2 shows some examples.
5 A Lexicon based on Systematic
Relations
Using the extracted cluster pairs, we partitioned
word senses for all nouns and verbs in WordNet, and
produced a lexicon. Recall from the previous section
that our cluster pairs are generated for all possible
binary combinations of basic classes, thus one sense
could appear in more than one cluster pair. For ex-
ample, Table3shows the cluster pairs (and a set of
senses covered by each pair, which we call a sense
cover) extracted for the noun \table" (which has 6
senses in WordNet). Also as wehave mentioned ear-
lier in section accuracy-result, our cluster pairs con-
tain many false positives ones. For those reasons, we
took a conservativeapproach, by disallowing transi-
tivity of cluster pairs.
To partition senses of a word, we rst assign each
sense cover a value whichwecallaconnectedness. It
is dened as follows. For a given word w whichhasn
senses, let S be the set of all sense covers generated
for w. Let c
ij
denote the numberofsensecovers in
which sense i (s
i
) and sense j (s
j
) occurred together
in S (where c
ii
= 0 for all 1  i  n), and d
ij
=
P
n
k=1
c
ik
+c
kj
C
, where k 6= i, k 6= j, c
ik
> 0, c
kj
> 0,
and C =
P
i;;j
c
ij
. A connectedness of a sense cover
sc 2 S, denoted CN
sc
, where sc =(s
l
;;::;;s
m
) (1 
Table 3: Extracted Relations for \table"
Sense Cover Cluster Pair CN
(1 4) [ARRANGEMENT, NAT OBJ] 1.143
(1 5) [ARRANGEMENT, SOC GROUP] 1.143
(2 3) [FURNITURE] 4.429
(2 3 4) [FURNITURE, NAT OBJ] 7.429
(2 3 5) [FURNITURE, SOC GROUP] 7.714
(2 3 6) [FURNITURE, FOOD] 7.429
(4 5) [NAT OBJ, SOC GROUP] 1.429
(5 6) [SOC GROUP, FOOD] 1.286
l<m n) is dened as:
CN
sc
=
m
X
i=l
m
X
j=1
c
ij
+d
ij
(8)
Intuitively, c
ij
represents the weightofa direct re-
lation, and d
ij
represents the weightofan indirect
relation between anytwo senses i and j. The idea
behind this connectedness measure is to favor sense
covers that have strong intra-relations. This mea-
sure also eectively takes into account a one-level
transitivityind
ij
. As an example, the connectedness
of (2 3 4) is the summation of c
23
;;c
34
;;c
24
;;d
23
;;d
34
and d
24
. Here, c
23
= 4 because sense 2 and 3 co-
occur in four sense covers, and c
34
= c
24
=1.Also,
d
23
=
(c
24
+c
43
)+(c
25
+c
53
)+(c
26
+c
63
)
C
=
2+2+2
14
= :429
(omitting cases where either or both c
ik
and c
kj
are
zero), and similarly d
34
= :5 and d
24
= :5. Thus,
CN
(234)
=4+1+1+:429+:5+:5=7:429. Table 3
shows the connectedness values for all sense covers
for \table".
Then, we partition the senses by selecting a set of
non-overlapping sense covers which maximizes the
total connectedness value. So in the example above,
the set f(1 4),(2 3 5)g yields the maximum con-
nectedness. Finally, senses that are not covered by
any sense covers are taken as singletons, and added
to the nal sense partition. So the sense partition
for \table" becomes f(1 4),(2 3 5),(6)g.
Table 4 shows the comparison between Word-
Net and our new lexicon. As you can see,
our lexicon contains much less ambiguity: the
ratio of monosemous words increased from 84%
(88,650/105,461.84)to 92% (96,964/105,461.92),
and the average number of senses for polysemous
words decreased from 2.73 to 2.52 for nouns, and
from 3.57 to 2.82 for verbs.
As a note, our lexicon is similar to CORELEX
(Buitelaar, 1998) (or CORELEX-II presented in
(Buitelaar, 2000)), in that both lexicons share the
same motivation. However, our lexicon diers from
CORELEX in that CORELEX looks at all senses of
aword and groups words that have the same sense
distribution pattern, whereas our lexicon groups
Table 2: Examples of Automatically Extracted Systematic Polysemy
Underspecied Class Cluster Pair Common Words
ACTION-LOCATION [ACTION, POINT] \drop", \circle", \intersection", \dig",
\crossing", \bull's eye"
ARTIFACT-GROUP [STRUCTURE, PEOPLE] \house", \convent", \market", \center"
ARTIFACT-SUBSTANCE [FABRIC, CHEMICAL COMPOUND] \acetate", \nylon", \acrylic", \polyester"
COMMUNICATION-PERSON [VOICE, SINGER] \soprano", \alto", \tenor", \baritone"
[WRITING, RELIGIOUS PERSON] \John", \Matthew", \Jonah", \Joshua",
\Jeremiah"
Table 4: WordNet vs. the New Lexicon
Category WordNet New
Nouns Monosemous 82,892 88,977
Polysemous 12,243 6,158
Total words 95,135 95,135
Ave # senses 2.73 2.52
Verbs Monosemous 5,758 7,987
Polysemous 4,568 2,339
Total words 10,326 10,326
Ave # senses 3.57 2.82
Total Monosemous 88,650 96,964
Polysemous 16,811 8,497
Total words 105,461 105,461
word senses that have the same systematic relation.
Thus, our lexicon represents systematic polysemyat
anerlevel than CORELEX, by pinpointing related
senses within eachword.
6 Evaluation: Inter-annotator
Disagreement
To test if the sense partitions in our lexicon con-
stitute an appropriate (or useful) level of granular-
ity, we applied it to the inter-annotator disagree-
ment observed in two semantically annotated cor-
pora: WordNet Semcor (Landes et al., 1998) and
DSO (Ng and Lee, 1996). The agreementbetween
those corpora is previously studied in (Ng et al.,
1999). In our current work, we rst re-produced
their agreement data, then used our sense partitions
to see whether or not they yield a better agreement.
In this experiment, we extracted 28,772 sen-
tences/instances for 191 words (consisting of 121
nouns and 70 verbs) tagged in the intersection of
the two corpora. This constitutes the base data set.
Table 5 shows the breakdown of the number of in-
stances where tags agreed and disagreed.
10
As you
10
Note that the numbers reported in (Ng et al., 1999) are
slightly more than the ones reported in this paper. For in-
stance, the number of sentences in the intersected corpus re-
ported in (Ng et al., 1999) is 30,315. We speculate the dis-
crepancies are due to the dierentsentence alignment meth-
Table 5: Agreementbetween Semcor and DSO
Category Agree Disagree Total Ave. 
Nouns 6,528 5,815 12,343 .268
Verbs 7,408 9,021 16,429 .260
Total 13,936 14,836 28,772 .264
(%) (48.4) (51.6) (100.0)
can see, the agreementisnotvery high: only around
48%.
11
This low agreement ratio is also re
ected in a mea-
sure called the  statistic (Carletta, 1996;; Bruce and
Wiebe, 1998;; Ng et al., 1999).  measure takes into
accountchance agreement, thus better representing
the state of disagreement. A  value is calculated
for each word, on a confusion matrix where rows
represent the senses assigned by judge 1 (DSO) and
columns represent the senses assigned by judge 2
(Semcor). Table 6 shows an example matrix for the
noun \table".
A  value for a word is calculated as follows. We
use the notation and formula used in (Bruce and
Wiebe, 1998). Let n
ij
denote the number of in-
stances where the judge 1 assigned sense i and the
judge 2 assigned sense j to the same instance, and
n
i+
and n
+i
denote the marginal totals of rows and
columns respectively. The formula is:
k =
P
i
P
ii
;
P
i
P
i+
P
+i
1;
P
i
P
i+
P
+i
(9)
where P
ii
=
n
ii
n
++
(i.e., proportion of n
ii
, the number
of instances where both judges agreed on sense i,to
the total instances), P
i+
=
n
i+
n
++
and P
+i
=
n
+i
n
++
.
The  value is 1.0 when the agreement is perfect
(i.e., values in the o-diagonal cells are all 0, that
is,
P
i
P
ii
=1),or 0 when the agreement is purely
ods used in the experiments.
11
(Ng et al., 1999) reports a higher agreement of 57%. We
speculate the discrepancy might be from the version of Word-
Net senses used in DSO, whichwas slightly dierent from the
standard delivery version (as noted in (Ng et al., 1999)).
Table 6: Confusion Matrix for the noun \table" ( = :611)
Judge 2 (Semcor)
1 2 3 4 5 6 Total
1 43 0 0 0 0 0 43 (= n
1+
)
2 6 17 3 0 0 0 26 (= n
2+
)
Judge 1 3 0 0 0 0 0 0 0 (= n
3+
)
(DSO) 4 1 0 0 0 0 0 1 (= n
4+
)
5 0 0 0 0 0 0 0 (= n
5+
)
6 2 2 1 0 0 0 5 (= n
6+
)
Total 52 19 4 0 0 0 75
(= n
+1
) (= n
+2
) (= n
+3
) (= n
+4
) (= n
+5
) (= n
+6
) (= n
++
)
Table 7: Reduced Matrix for \table" ( = :699)
1,4 2,3,5 6 Total
1,4 44 0 0 44
2,3,5 6 20 0 26
6 2 3 0 5
Total 52 23 0 75
bychance (i.e., values in a row (or column) are uni-
formly distributed across rows (or columns), that is,
P
ii
= P
i+
P
+i
for all 1  i  M, where M is the
number of rows/columns).  also takes a negative
value when there is a systematic disagreement be-
tween the two judges (e.g., some values in the diago-
nal cells are 0, that is, P
ii
= 0 for some i). Normally,
  :8 is considered a good agreement (Carletta,
1996).
By using the formula above, the average  for the
191 words was .264, as shown in Table 5.
12
This
means the agreement between Semcor and DSO is
quite low.
We selected the same 191 words from our lexicon,
and used their sense partitions to reduce the size of
the confusion matrices. For eachword, we computed
the  for the reduced matrix, and compared it with
the  for a random sense grouping of the same parti-
tion pattern.
13
For example, the partition pattern of
f(1 4),(2 3 5),(6)g for \table" mentioned earlier
(where Table 7 shows its reduced matrix) is a multi-
nomial combination
;
6
231

. The  value for a ran-
dom grouping is obtained by generating 5,000 ran-
dom partitions whichhave the same pattern as the
corresponding sense partition in our lexicon, then
taking the mean of their 's. Then we measured the
possible increase in  by our lexicon by taking the
dierence between the paired  values for all words
(i.e., 
w
by our sense partition - 
w
by random par-
tition, for a word w), and performed a signicance
12
(Ng et al. 1999)'s result is slightly higher:  = :317.
13
For this comparison, we excluded 23 words whose sense
partitions consisted of only 1 sense cover. This is re
ected in
the total number of instances in Table 8.
Table 8: Our Lexicon vs. Random Partitions
Category Total Our Lexicon Random
Ave.  Ave. 
Nouns 10,980 .247 .217
Verbs 14,392 .283 .262
Total 25,372 .260 .233
test, with a null hypothesis that there was no signif-
icant increase. The result showed that the P-values
were 4.17 and 2.65 for nouns and verbs respectively,
whichwere both statistically signicant. Therefore,
the null hypothesis was rejected, and we concluded
that there was a signicant increase in  by using
our lexicon.
As a note, the average 's for the 191 words from
our lexicon and their corresponding random parti-
tions were .260 and .233 respectively. Those values
are in fact lower than that for the original WordNet
lexicon. There are two major reasons for this. First,
in general, combining any arbitrary senses does not
always increase . In the given formula 9,  actually
decreases when the increase in
P
i
P
ii
(i.e., the diag-
onal sum) in the reduced matrix is less than the in-
creasein
P
i
P
i+
P
+i
(i.e., the marginalproduct sum)
by some factor.
14
This situation typically happens
when senses combined are well distinguished in the
original matrix, in the sense that, for senses i and j,
n
ij
and n
ji
are 0 or very small (relative to the total
frequency). Second, some systematic relations are in
fact easily distinguishable. Senses in such relations
often denote dierent objects in a context, for in-
stance ANIMAL and MEAT senses of \chicken". Since
our lexicon groups those senses together, the 's for
the reduce matrices decrease for the reason we men-
tioned above. Table 8shows the breakdown of the
average  for our lexicon and random groupings.
14
This is because
P
i
P
i+
P
+i
is subtracted in both the nu-
merator and the denominator in the  formula. Note that
both
P
i
P
ii
and
P
i
P
i+
P
+i
always increase when any ar-
bitrary senses are combined. The factor mentioned here is
1;
P
i
P
ii
1;
P
i
P
i+
P
+i
.
7 Conclusions and Future Work
As we reported in previous sections, our tree-cut
extraction method discovered 89% of the Word-
Net cousins. Although the precision was rela-
tively low (50-60%), this is an encouraging re-
sult. As for the lexicon, our sense partitions con-
sistently yielded better  values than arbitrary
sense groupings. We consider these results to
be quite promising. Our data is available at
www.depaul.edu/ntomuro/research/naacl-01.html.
It is signicant to note that cluster pairs and sense
partitions derived in this work are domain indepen-
dent. Such information is useful in broad-domain
applications, or as a background lexicon (Kilgarri,
1997) in domain specic applications or text catego-
rization and IR tasks. For those tasks, weanticipate
that our extraction methods may be useful in deriv-
ing characteristics of the domains or given corpus,
as well as customizing the lexical resource. This is
our next future research.
For other future work, we plan to investigate an
automatic way of detecting and ltering unrelated
relations. We are also planning to compare our sense
partitions with the systematic disagreement ob-
tained by (Wiebe, et al., 1998)'s automatic classier.
Acknowledgments
The author wishes to thank Steve Lytinen at
DePaul University and the anonymous reviewers for
very useful comments and suggestions.
References
Apresjan, J. (1973). RegularPolysemy. Linguistics,
(142).
Bruce, R. and Wiebe, J. (1998). Word-sense Dis-
tinguishability and Inter-coder Agreement. In
Proceedings of the COLING/ACL-98, Montreal,
Canada.
Buitelaar, P. (1998). CORELEX: Systematic Poly-
semy and Underspecication. Ph.D. dissertation,
Department of Computer Science, Brandeis Uni-
versity.
Buitelaar, P. (2000). Reducing Lexical Semantic
Complexity with Systematic Polysemous Classes
and Underspecication. In Proceedings of the
ANLP/NAACL-00 Workshop on Syntactic and
Semantic Complexity in Natural Language Pro-
cessing,Seattle,WA.
Carletta, J. (1996). Assessing AgreementonClas-
sication Tasks: The Kappa Statistic, Computa-
tional Linguistics, 22(2).
Copestake, A. and Briscoe, T. (1995). Semi-
productivePolysemy and Sense Extension. Jour-
nal of Semantics, 12.
Gonzalo, J., Chugur, I. and Verdejo, F. (2000).
Sense Clusters for Information Retrieval: Evi-
dence from Semcor and the InterLingual Index.
In Proceedings of the ACL-2000 Workshop on
Word Senses and Multilinguality, Hong-Kong.
Kilgarri, A. (1997). Foreground and Background
Lexicons and Word Sense Disambiguation for In-
formation Extraction. In Proceedings of the In-
ternational Workshop on Lexically Driven Infor-
mation Extraction.
Kilgarri, A. (1998a). SENSEVAL: An Exercise
in Evaluating Word Sense Disambiguation Pro-
grams. In Proceedings of the LREC.
Kilgarri, A. (1998b). Inter-tagger Agreement. In
Advanced Papers of the SENSEVAL Workshop,
Sussex, UK.
Landes, S., Leacock, C. and Tengi, R. (1998).
Building Semantic Concordance. In WordNet:
An Electronic Lexical Database, The MIT Press.
Li, H. and Abe, N. (1998). Generalizing Case
Frames Using a Thesaurus and the MDL Prin-
ciple, Computational Linguistics, 24(2).
Miller, G. (eds.) (1990). WORDNET: An Online
Lexical Database. International Journal of Lex-
icography, 3(4).
Ng, H.T., and Lee, H.B. (1996). Integrating Mul-
tiple Knowledge Sources to Disambiguate Word
Sense. In ProceedingsoftheACL-96,SantaCruz,
CA.
Ng, H.T., Lim, C. and Foo, S. (1999). A Case
Study on Inter-Annotator Agreement for Word
Sense Disambiguation. In Proceedings of the
ACL SIGLEX Workshop on Standardizing Lexi-
cal Resources,CollegePark, MD.
Nunberg, G. (1995). Transfers of Meaning. Journal
of Semantics, 12.
Palmer, M. (1998). Are Wordnet sense distinctions
appropriate for computational lexicons? In Ad-
vancedPapers of the SENSEVAL Workshop, Sus-
sex, UK.
Pustejovsky, J. (1995). The Generative Lexicon,
The MIT Press.
Rissanen, J. (1978). Modeling by Shortest Data
Description. Automatic, 14.
Tomuro, N. (2000). Automatic Extraction of Sys-
tematic Polysemy Using Tree-cut. In Proceedings
of the ANLP/NAACL-00 Workshop on Syntactic
and Semantic Complexity in Natural Language
Processing, Seattle, WA.
Veronis, J. (1998). A Study of Polysemy Judge-
ments and Inter-annotator Agreement. In Ad-
vancedPapers of the SENSEVAL Workshop, Sus-
sex, UK.
Vossen, P., Peters, W. and Gonzalo, J. (1999). To-
wards a Universal Index of Meaning. In Proceed-
ings of the ACL SIGLEX Workshop on Standard-
izing Lexical Resources, College Park, MD.
Wiebe, J., Bruce, R. and O'Hara, T. (1999). De-
velopment and Use of a Gold-Standard Data Set
for Subjectivity Classications. In Proceedings of
the ACL-99, College Park, MD.
