KUNLP System in SENSEVAL-3
Hee-Cheol Seo, Hae-Chang Rim
Dept. of Computer Science
and Engineering,
Korea University
1, 5-ka, Anam-dong, Seongbuk-Gu,
Seoul, 136-701, Korea
CUhcseo, rimCV@nlp.korea.ac.kr
Soo-Hong Kim
Dept. of Computer Software Engineering,
College of Engineering,
Sangmyung University,
San 98-20, Anso-Dong,
Chonan, Chungnam, Korea
soohkim@smuc.ac.kr
Abstract
We have participated in both English all words
task and English lexical sample task of SENSEVAL-
3. Our system disambiguates senses of a target
word in a context by selecting a substituent among
WordNet relatives of the target word, such as syn-
onyms, hypernyms, meronyms and so on. The deci-
sion is made based on co-occurrence frequency be-
tween candidate relatives and each of the context
words. Since the co-occurrence frequency is obtain-
able from raw corpus, our method is considered to
be an unsupervised learning algorithm that does not
require a sense-tagged corpus.
1 Introduction
At SENSEVAL-3, we adopted an unsupervised ap-
proach based on WordNet and raw corpus, which
does not require any sense tagged corpus. Word-
Net specifies relationships among the meanings of
words.
Relatives of a word in WordNet are defined as
words that have a relationship with it, e.g. they
are synonyms, antonyms, superordinates (hyper-
nyms), or subordinates (hyponyms). Relatives, es-
pecially those in a synonym class, usually have
related meanings and tend to share similar con-
texts. Hence, some WordNet-based approaches ex-
tract relatives of each sense of a polysemous word
from WordNet, collect example sentences of the rel-
atives from a raw corpus, and learn the senses from
the example sentences for WSD. Yarowsky (1992)
first proposed this approach, but used International
Roget’s Thesaurus as a hierarchical lexical database
instead of WordNet. However, the approach seems
to suffer from examples irrelevant to the senses of
a polysemous word since many of the relatives are
polysemous. Leacock et al. (1998) attempted to ex-
clude irrelevant or spurious examples by using only
monosemous relatives in WordNet. However, some
senses do not have short distance monosemous rel-
atives through a relation such as synonym, child,
and parent. A possible alternative of using only
monosemous relatives in the long distance, how-
ever, is problematic because the longer the distance
of two synsets in WordNet, the weaker the relation-
ship between them. In other words, the monose-
mous relatives in the long distance may provide ir-
relevant examples for WSD.
Our approach is somewhat similar to the Word-
Net based approach of Leacock et al. (1998) in that
it acquires relatives of a target word from WordNet
and extracts co-occurrence frequencies of the rela-
tives from a raw corpus, but our system uses poly-
semous as well as monosemous relatives. To avoid
a negative effect of polysemous relatives on the co-
occurrence frequency calculation, our system han-
dles the example sentences of each relative sepa-
rately instead of putting together the example sen-
tences of all relatives into a pool. Also we devised
our system to efficiently disambiguate senses of all
words using only co-occurrence frequency between
words.
2 KUNLP system
2.1 Word Sense Disambiguation
We disambiguate senses of a word in a context
1
by selecting a substituent word from the WordNet
2
relatives of the target word. Figure 1 represents a
flowchart of the proposed approach. Given a target
word and its context, a set of relatives of the target
word is created by searches in WordNet. Next, the
most appropriate relative that can be substituted for
the word in the context is chosen. In this step, co-
occurrence frequency is used. Finally, the sense of
the target word that is related to the selected relative
is determined.
The example in Figure 2 illustrates how the pro-
posed approach disambiguates senses of the tar-
get word chair given the context. The set of rel-
atives CUpresident, professorship, ...CV of chair is
built by WordNet searches, and the probability,
1
In this paper, a context indicates a target word and six
words surrounding the target word in an instance.
2
The WordNet version is 1.7.1.
                                             Association for Computational Linguistics
                        for the Semantic Analysis of Text, Barcelona, Spain, July 2004
                 SENSEVAL-3: Third International Workshop on the Evaluation of Systems
Context
Target Word
Context
Words
Surrounding
Target Word
Acquire
Set of Relatives
Select
a Relative
Determine
a Sense
WordNet
Co-occurrence
Information
Matrix
Sense
Figure 1: Flowchart of KUNLP System
“C8D6B4D4D6D3CUCTD7D7D3D6D7CWCXD4CYBVD3D2D8CTDCD8B5,” that a relative can
be substituted for the target word in the given con-
text is estimated by the co-occurrence frequency be-
tween the relative and each of the context words. In
this example, the relative, seat, is selected with the
highest probability and the proper sense, “a seat for
one person, with a support for the back,” is chosen.
Thus, the second step of our system (i.e. selecting
a relative) has to be carefully implemented to select
the proper relative that can substitute for the target
word in the context, while the first step (i.e. acquir-
ing the set of relatives) and the third step (i.e. deter-
mining a sense) are done simply through searches in
WordNet.
The substituent word of the CX-th target word D8DB
CX
in a context BV is defined to be the relative of D8DB
CX
which has the largest co-occurrence probability with
the words in the context:
CBCFB4D8DB
CX
BNBVB5
CSCTCU
BP CPD6CVD1CPDC
D6
CXCY
C8B4D6
AB
CXCY
CYBVB5 (1)
where CBCF is the substituent word, D6
CXCY
is the CY-th
relative of D8DB
CX
, and D6
AB
CXCY
is the AB-th sense related to
D8DB
CX
3
.IfAB is 2, the 2-nd sense of D6
CXCY
is related to
D8DB
CX
. The right hand side of Equation 1 is calculated
with logarithm as follows:
CPD6CVD1CPDC
D6
CXCY
C8B4D6
AB
CXCY
CYBVB5
BP CPD6CVD1CPDC
D6
CXCY
C8B4BVCYD6
AB
CXCY
B5C8B4D6
AB
CXCY
B5
C8B4BVB5
BP CPD6CVD1CPDC
D6
CXCY
C8B4BVCYD6
AB
CXCY
B5C8B4D6
AB
CXCY
B5
BP CPD6CVD1CPDC
D6
CXCY
CUD0D3CVC8B4BVCYD6
AB
CXCY
B5B7D0D3CVC8B4D6
AB
CXCY
B5CV (2)
3
AB is a function with two parameters D8DB
CX
and D6
CXCY
, but it can
be written in brief without parameters.
Instance :
    He should sit in the chair beside the desk.
Target Word :
    'chair'
Context :
    sit in the chair beside the desk
Set of Relatives :
    {professorship, president, chairman,
     electronic chair, death chair, seat,
     office, presiding officer, ...}
Probability of Relative given the Context :
    P( professorship | Context )
    P( president | Context )
    ...
    P( seat | Context )
    ...
Selected Relative :
    'seat' - it is the most likely word occurred
                from the above context among
                the relatives of 'chair'
Determined Sense :
    chair%1:06:00 - "a seat for one person,
                                with a support for the back."
    'seat' - the hypernym of chair%1:06:00.
Figure 2: Example of sense disambiguation proce-
dure for chair
Then Equation 2 may be calculated under the as-
sumption that words in BV occur independently:
CPD6CVD1CPDC
D6
CXCY
CUD0D3CVC8B4BVCYD6
AB
CXCY
B5B7D0D3CVC8B4D6
AB
CXCY
B5CV
AP CPD6CVD1CPDC
D6
CXCY
CU
D2
CG
CZBPBD
D0D3CVC8B4DB
CZ
CYD6
AB
CXCY
B5B7D0D3CVC8B4D6
AB
CXCY
B5CV (3)
where DB
CZ
is the CZ-th word in BV and D2 is the number
of words in BV. In Equation 3, we assume indepen-
dence among words in BV.
The first probability in Equation 3 is calculated as
follows:
C8B4DB
CZ
CYD6
AB
CXCY
B5
AP C8B4DB
CZ
CYD6
CXCY
B5
BP
C8B4D6
CXCY
CYDB
CZ
B5C8B4DB
CZ
B5
C8B4D6
CXCY
B5
(4)
The second probability in Equation 3 is computed
as follows:
C8B4D6
AB
CXCY
B5BPACB4D6
AB
CXCY
B5C8B4D6
CXCY
B5 (5)
where ACB4D6
AB
CXCY
B5 is the ratio of the frequency of D6
AB
CXCY
to
that of D6
CXCY
:
ACB4D6
AB
CXCY
B5BP
CFC6CUB4D6
AB
CXCY
B5B7BCBMBH
D2A3 BCBMBHB7CFC6CUB4D6
CXCY
B5
where CFC6CUB4D6
AB
CXCY
B5 is the frequency of D6
AB
CXCY
in Word-
Net, CFC6CUB4D6
CXCY
B5 is the frequency of D6
CXCY
in WordNet,
0.5 is a smoothing factor, and D2 is the number of
senses of D6
CXCY
.
Applying Equations 4 and 5 to Equation 3, we
have the following equation for acquiring the rela-
tive with the largest co-occurrence probability:
CPD6CVD1CPDC
D6
CXCY
C8B4D6
AB
CXCY
CYBVB5
AP CPD6CVD1CPDC
D6
CXCY
D2
CG
CZBPBD
D0D3CV
C8B4D6
CXCY
CYDB
CZ
B5C8B4DB
CZ
B5
C8B4D6
CXCY
B5
B7 D0D3CVACB4D6
AB
CXCY
B5C8B4D6
CXCY
B5
BP CPD6CVD1CPDC
D6
CXCY
D2
CG
CZBPBD
D0D3CV
C8B4D6
CXCY
CYDB
CZ
B5
C8B4D6
CXCY
B5
B7 D0D3CVACB4D6
AB
CXCY
B5C8B4D6
CXCY
B5
In the case that several relatives have the largest
co-occurrence probability, all senses related to the
relatives are determined as proper senses.
2.2 Co-occurrence Frequency Matrix
In order to select a substituent word for a target
word in a given context, we must calculate the
probabilities of finding relatives, given the con-
text. These probabilities can be estimated based on
the co-occurrence frequency between a relative and
context words as follows:
C8B4D6
CXCY
B5BP
CUD6CTD5B4D6
CXCY
B5
BVCB
(6)
C8B4D6
CXCY
CYDB
CZ
B5 BP
C8B4D6
CXCY
BNDB
CZ
B5
C8B4DB
CZ
B5
BP
CUD6CTD5B4D6
CXCY
BNDB
CZ
B5
CUD6CTD5B4DB
CZ
B5
(7)
where CUD6CTD5B4D6
CXCY
B5 is the frequency of D6
CXCY
, BVCB is the
corpus size, C8B4D6
CXCY
BNDB
CZ
B5 is the probability that D6
CXCY
and DB
CZ
co-occur, and CUD6CTD5B4D6
CXCY
BNDB
CZ
B5 is the frequency
that D6
CXCY
and DB
CZ
co-occur.
In order to calculate these probabilities, frequen-
cies of words and word pairs are required. For this,
we build a co-occurrence frequency matrix that con-
tains co-occurrence frequencies of words pairs. In
this matrix, an element D1
CXCY
represents the frequency
that the i-th word and j-th word in the vocabulary co-
occur in a corpus
4
. The frequency of a word can be
calculated by counting all frequencies in the same
row or column. The vocabulary is composed of all
content words in the corpus. Now, the equations 6
and 7 can be calculated with the matrix.
The matrix is easily built by counting each word
pair in a given corpus. It is not necessary to make an
individual matrix for each polysemous word, since
the matrix contains co-occurrence frequencies of all
word pairs. Hence, it is possible to disambiguate all
words with only one matrix. In other words, the pro-
posed method disambiguates the senses of all words
efficiently with only one matrix.
2.3 WordNet Relatives
Our system used most of relationship types in Word-
Net, except sister and attribute types, to acquire
the relatives of target words. For a nominal word,
we included all hypernyms and hyponyms in dis-
tance 3 from a sense, which indicate parents, grand-
parents and great-grand parents for hypernymy and
children, grandchildren and great-children for hy-
ponymy
5
.
In order to identify part-of-speech (POS) of
words including target words in instances, our sys-
tem uses TreeTagger (Schmid, 1994). After POS
4
The co-occurrence frequency matrix is a symmetric ma-
trix, thus D1
CXCY
is the same as D1
CYCX
.
5
We implemented WordNet APIs with index files and
data files in WordNet package, which is downloadable from
http://www.cogsci.princeton.edu/ wn/.
fine grained coarse grained
recall prec. recall prec.
noun 0.451 0.451 0.556 0.556
verb(R) 0.354 0.354 0.496 0.496
adjective 0.497 0.497 0.610 0.610
overall 0.404 0.404 0.528 0.528
Table 1: Official Results : English Lexical Sample
with U without U
recall prec. recall prec.
overall 0.500 0.500 0.496 0.510
Table 2: Official Results (fine grained) : English All
Words
of the target word is determined, relationship types
related to the POS are considered to acquire the can-
didate relatives of the target word. For instance, if a
target word is adverb, the following relationships of
the word are considered: synonymy, antonymy, and
derived.
2.4 WordNet Multiword Expression
Our system recognizes multiword expressions of
WordNet in an instance by a simple string match
before disambiguating senses of a target word. If
the instance has a multiword expression including
the target word, our system does not disambiguate
the senses of the multiword expression but just as-
signs all senses of the multiword expression to the
instance.
3 Official Results
We have participated in both English lexical sample
task and English all words task. Table 1 and 2 show
the official results of our system for two tasks. Our
system disambiguates all instances, thus the cover-
age of our system is 100% and precision of our sys-
tem is the same as the recall.
Our system assigns WordNet sense key to each
instance, but verbs in English lexical sample task
are annotated based on Wordsmyth definitions. In
official submission, we did not map the WordNet
sense keys of verbs to Wordsmyth senses, thus
the recall of our system for verbs is 0%. Ta-
ble 1 shows the results after a mapping between
Wordsmyth and WordNet verb senses using the file
EnglishLS.dictionary.mapping.xml.
In English all word task, there are two additional
scoring measures in addition to fine- and coarse-
grained scoring: with U and without U
6
.Inwith U,
6
These measures are described in Benjamin Synder’s mail
any instance without a WN sensekey is assumed to
be tagged with a ‘U’ and thus is tagged as correct
if the answer file (i.e. answer.key) has a ‘U’, incor-
rect otherwise. In without U, any instance without
a WN sensekey is assumed to have been skipped,
thus precision will not be affected, but recall will be
lowered.
4 Conclusions
In SENSEVAL-3, we participated in both English all
words task and English lexical sample task with an
unsupervised system based on WordNet and a raw
corpus, which did not use any sense tagged cor-
pus. Our system disambiguated the senses of a tar-
get word by selecting a substituent among WordNet
relatives of the target word, which frequently co-
occurs with each word surrounding the target word
in a context. Since each relative is usually related
to only one sense of the target word, our system
identifies the proper sense with the selected rela-
tive. The substituent word is selected based on the
co-occurrence frequency between the relative and
the words surrounding the target word in a given
context. We collected the co-occurrence frequency
from a raw corpus, not a sense-tagged one that is
often required by other approaches. In short, our
system disambiguates senses of words only through
the set of WordNet relatives of the target words and
a raw corpus. The system was simple but seemed
to achieve a good performance when considered the
performance of systems in last SENSEVAL-2 En-
glish tasks.
For future research, we will investigate the depen-
dency between the types of relatives and the char-
acteristics of words or senses in order to devise an
improved method that better utilizes various types
of relatives for WSD.

References
Claudia Leacock, Martin Chodorow, and George A.
Miller. 1998. Using corpus statistics and Word-
Net relations for sense identification. Computa-
tional Linguistics, 24(1):147–165.
Helmut Schmid. 1994. Probabilistic part-of-speech
tagging using decision trees. In Proceedings of
International Conference on New Methods in
Language Processing, Manchester,U.K.
David Yarowsky. 1992. Word-sense disambigua-
tion using statistical models of Roget’s cate-
gories trained on large corpora. In Proceedings
of COLING-92, pages 454–460, Nantes, France,
July.
