Domain-Specific Semantic Class Disambiguation Using WordNet 
Li Shiuan Peh 
DSO National Laboratories 
20 Science Park Drive 
Singapore 118230 
plishiua@dso.org.sg
Hwee Tou Ng 
DSO National Laboratories 
20 Science Park Drive 
Singapore 118230 
nhweetou@dso.org.sg
Abstract 
This paper presents an approach which exploits general-purpose algorithms and resources for domain-specific semantic class disambiguation, thus facilitating the generalization of semantic patterns from word-based to class-based representations. Through the mapping of the domain-specific semantic hierarchy onto WordNet and the application of general-purpose word sense disambiguation and semantic distance metrics, the approach proposes a portable, wide-coverage method for disambiguating semantic classes. Unlike existing methods, the approach does not require annotated corpora. When tested on the MUC-4 terrorism domain, the approach is shown to outperform the most frequent heuristic substantially and achieve comparable accuracy with human judges. Its performance also compares favourably with two supervised learning algorithms.
1 Introduction 
The semantic classification of words refers to the abstraction of ambiguous (surface) words to unambiguous concepts. These concepts may be explicitly expressed in a pre-defined taxonomy of classes, or implicitly derived through the clustering of semantically-related words. Semantic classification has proved useful in a range of application areas, such as information extraction (Soderland et al., 1995), acquisition of domain knowledge (Mikheev and Finch, 1995) and improvement of parsing accuracy through the specification of selectional restrictions (Grishman and Sterling, 1994; Grishman and Sterling, 1992).
In this paper, we address the problem of semantic class disambiguation, with a view towards applying it to information extraction. The disambiguation of the semantic class of words in a particular context facilitates the generalization of semantic extraction patterns used in information extraction from word-based to class-based forms. This abstraction is effectively tapped by CRYSTAL (Soderland et al., 1995), one of the first few approaches to the automatic induction of extraction patterns.
Many existing information extraction systems (MUC-6, 1996) rely on tedious knowledge engineering approaches to hard-code semantic classes of words in a semantic lexicon, thus hampering the portability of their systems to different domains. A notable exception is the approach taken by the University of Massachusetts. Its knowledge acquisition framework, Kenmore, uses a case-based learning mechanism to learn domain knowledge automatically (Cardie, 1993). Kenmore, being a supervised algorithm, relies on an annotated corpus of domain-specific classes. Grishman et al. (1992) too ventured towards automatic semantic acquisition for information extraction. However, they expressed reservations regarding the use of WordNet to augment their semantic hierarchy automatically, citing examples of unintended senses of words resulting in erroneous semantic classification.
To circumvent the annotation bottleneck faced by Kenmore, our approach exploits general algorithms and resources for the disambiguation of domain-specific semantic classes. Unlike Grishman et al.'s approach, our application of general word sense disambiguation algorithms and semantic distance metrics allows for an effective use of the fine sense granularity of WordNet. Experiments carried out on the MUC-4 (1992) terrorism domain saw our approach outperforming supervised algorithms and matching human judgements.
Figure 1: Semantic Class Disambiguation.
2 Our Approach 
As opposed to proponents of "domain-specific information for domain-specific applications", our approach ventures towards the application of general-purpose algorithms and resources to our domain-specific semantic class disambiguation problem. Our information source is the extensive semantic hierarchy WordNet (Miller, 1990) which was designed to capture the semantics of general nuances and uses of the English language. Our approach reconciles the domain-specific hierarchy with this vast network and exploits WordNet to uncover semantic classes, without the need of an annotated corpus.

Firstly, the domain-specific hierarchy is mapped onto the semantic network of WordNet, by manually assigning corresponding WordNet node(s) to the classes in the domain-specific hierarchy. To disambiguate a word, the sentence context of the word is first streamed through a general word sense disambiguation module which assigns the appropriate sense of the word. The word sense disambiguation module hence effectively pinpoints a particular node in WordNet that corresponds to the current sense of the word. Thereafter, this chosen concept node is piped through a semantic distance module which determines the semantic distances between this concept node and all the semantic class nodes in the domain-specific hierarchy. If the distance between the concept node and a semantic class node is below some threshold, the semantic class node becomes a candidate class node. The nearest candidate class node is then chosen as the semantic class of the word. If no such candidates exist, the word does not belong to any of the semantic classes in the hierarchy, and is usually labelled as the "entity" class. The flow of our approach is illustrated in Figure 1.
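The flow above can be sketched as a small driver into which any word sense disambiguation module and distance metric can be plugged. All names here are hypothetical, and the toy distance table merely stands in for a real metric computed over WordNet:

```python
def disambiguate_class(word, context, wsd, distance, class_nodes, threshold):
    """Assign `word` the nearest domain-specific class within `threshold`,
    falling back to the catch-all "entity" class."""
    concept = wsd(word, context)  # WordNet node for the chosen sense
    scored = [(distance(concept, node), label)
              for label, node in class_nodes.items()]
    candidates = [(d, label) for d, label in scored if d <= threshold]
    return min(candidates)[1] if candidates else "entity"

# Toy stand-ins for the two modules:
def most_frequent_sense(word, context):
    return word + ":1"  # sense 1 heuristic

DISTANCES = {("plane:1", "aircraft:1"): 1, ("plane:1", "vehicle:1"): 2,
             ("plane:1", "car:1"): 4, ("peace:1", "aircraft:1"): 999,
             ("peace:1", "vehicle:1"): 999, ("peace:1", "car:1"): 999}

CLASS_NODES = {"VEHICLE": "vehicle:1", "AIRCRAFT": "aircraft:1", "CAR": "car:1"}

label = disambiguate_class("plane", [], most_frequent_sense,
                           lambda a, b: DISTANCES[(a, b)], CLASS_NODES, 3)
print(label)  # AIRCRAFT
```

A word such as "peace", whose distance to every class node exceeds the threshold, would fall through to "entity", mirroring the fallback described above.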
A walkthrough of the approach with a simple example will better illustrate it. Consider a domain-specific hierarchy with just 3 classes: VEHICLE, AIRCRAFT and CAR, as shown in Figure 2(a). Mapping this domain-specific hierarchy to WordNet simply involves finding the specific sense(s) of the classes. In this case, all three classes correspond to their first sense in WordNet.

Figure 2: (a) A simple domain-specific hierarchy (b) The classes of the domain-specific hierarchy as mapped onto WordNet, together with the word to be disambiguated, "plane".
Then, given a sentence, say, "The plane will be taking off in 5 minutes time.", to disambiguate the semantic class of the word "plane", the sentence is fed to the word sense disambiguation module. The module will determine the sense of this word. In this example, the correct sense of "plane" is sense 1, i.e. the sense of an aeroplane. Having identified the particular concept node in WordNet that "plane" corresponds to, the distances between this concept node and the three semantic class nodes are then calculated by the semantic distance module. Based on WordNet, the module will conclude that the concept node "plane:1" is nearer to the semantic class node "aircraft:1" and should hence be classified as AIRCRAFT. Figure 2(b) shows the relative positions of the concept node "plane:1" and the three semantic class nodes in WordNet.
2.1 Word Sense Disambiguation

Word sense disambiguation is an active research area in natural language processing, with a great number of novel methods proposed. Methods can typically be delineated along two dimensions, corpus-based vs. dictionary-based approaches.

Corpus-based word sense disambiguation algorithms such as (Ng and Lee, 1996; Bruce and Wiebe, 1994; Yarowsky, 1994) relied on supervised learning from annotated corpora. The main drawback of these approaches is their requirement of a sizable sense-tagged corpus. Attempts to alleviate this tagging bottleneck include bootstrapping (Teo et al., 1996; Hearst, 1991) and unsupervised algorithms (Yarowsky, 1995).

Dictionary-based approaches rely on linguistic knowledge sources such as machine-readable dictionaries (Luk, 1995; Veronis and Ide, 1990) and WordNet (Agirre and Rigau, 1996; Resnik, 1995) and exploit these for word sense disambiguation.

Thus far, two notable sense-tagged corpora, the semantic concordance of WordNet 1.5 (Miller et al., 1994) and the DSO corpus of 192,800 sense-tagged occurrences of 191 words used by (Ng and Lee, 1996), are still insufficient in scale for supervised algorithms to perform well on a wide range of texts.

Unsupervised algorithms such as (Yarowsky, 1995) have reported good accuracy that rivals that of supervised algorithms. However, the algorithm was only tested on coarse-level senses and not on the refined sense distinctions of WordNet, which is the required sense granularity of our approach.

We hence turn to dictionary-based approaches, focusing on WordNet-based algorithms since they fit in snugly with our WordNet-based semantic class disambiguation task.
Information Content

Resnik (1995) proposed a word sense disambiguation algorithm which determines the senses of words in noun groupings. The sense of a word is disambiguated by choosing the sense which is most highly supported by the other nouns of the noun group. The extent of support depends on the information content of the subsumers of the nouns in WordNet, whereby information content is defined as the negative log likelihood -log p(c), and p(c) is the probability of encountering an instance of concept c.

As mentioned in his paper, although his approach was only reported on the disambiguation of words in related noun groupings, it can potentially be applied to word sense disambiguation of nouns in running text.

In our implementation of his approach¹, we applied the method to general word sense disambiguation. We used the surrounding nouns² of a word in free running text as the "noun grouping" and followed his algorithm without modifications.
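As a rough illustration of the information-content idea (a simplification, not Resnik's full algorithm), the sketch below scores each sense of a target noun by the information content of the most informative subsumer it shares with a context noun. The tiny taxonomy and counts are invented for illustration:

```python
import math

# Invented IS-A links and corpus counts, standing in for WordNet and
# real corpus statistics.
PARENT = {"doctor.n.1": "professional", "doctor.n.2": "scholar",
          "nurse.n.1": "professional", "professional": "person",
          "scholar": "person", "person": "entity"}
COUNT = {"doctor.n.1": 80, "doctor.n.2": 5, "nurse.n.1": 60,
         "professional": 200, "scholar": 50, "person": 5000, "entity": 10000}
N = 10000  # total noun instances in the toy corpus

def ancestors(c):
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def info_content(c):
    return -math.log(COUNT[c] / N)  # -log p(c)

def support(sense, context_sense):
    shared = set(ancestors(sense)) & set(ancestors(context_sense))
    return max(info_content(c) for c in shared)  # most informative subsumer

# "nurse" in the context favours the physician sense of "doctor",
# because their shared subsumer "professional" is highly informative:
best = max(["doctor.n.1", "doctor.n.2"], key=lambda s: support(s, "nurse.n.1"))
print(best)  # doctor.n.1
```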
Conceptual Density

Agirre and Rigau's (1996) approach has a similar motivation as Resnik's. Both approaches hinge on the belief that surrounding nouns provide strong clues as to the sense of a word.

The main difference lies in how they determine the extent of support offered by the surrounding nouns. Agirre and Rigau use the conceptual density of the ancestors of the nouns in WordNet as their metric.

Our implementation follows the pseudo-code presented in (Agirre and Rigau, 1996)³. For words which the algorithm failed to disambiguate (when no senses or more than one sense is returned), we relied on the most frequent heuristic.

¹The pseudo-code of his algorithm is detailed in (Resnik, 1995).
²Surrounding nouns in the original Resnik's approach refers to the other nouns in the noun grouping.
2.2 Semantic Distance

The task of the semantic distance module is to reflect accurately the notion of "closeness" between the chosen concept node of the word and the semantic class nodes. It thus requires a metric which can effectively represent the semantic distance between two nodes in a taxonomy such as WordNet.

Conceptual Distance

Rada et al. (1989) proposed such a metric termed as conceptual distance. Conceptual distance between two nodes is defined as the minimum number of edges separating the two nodes. Take the example in Figure 2(b): the conceptual distance between "plane:1" and "aircraft:1" is 1, that between "plane:1" and "vehicle:1" is 2, and that between "plane:1" and "car:1" is 4⁴.
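On the Figure 2(b) fragment, this edge count can be computed with a plain breadth-first search over the IS-A links, treated as undirected. A minimal sketch on a hand-coded graph (not WordNet itself):

```python
from collections import deque

# Hand-coded IS-A edges from the Figure 2(b) fragment.
EDGES = [("vehicle:1", "aircraft:1"), ("aircraft:1", "plane:1"),
         ("vehicle:1", "motor_vehicle:1"), ("motor_vehicle:1", "car:1")]
NEIGHBOURS = {}
for a, b in EDGES:
    NEIGHBOURS.setdefault(a, set()).add(b)
    NEIGHBOURS.setdefault(b, set()).add(a)

def conceptual_distance(src, dst):
    """Minimum number of edges separating two nodes (BFS)."""
    queue, seen = deque([(src, 0)]), {src}
    while queue:
        node, depth = queue.popleft()
        if node == dst:
            return depth
        for nxt in NEIGHBOURS.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # disconnected (cf. the 25 unique beginners)

print(conceptual_distance("plane:1", "aircraft:1"))  # 1
print(conceptual_distance("plane:1", "car:1"))       # 4
```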
Link Probability

The link probability metric is our variant of the conceptual distance metric. Instead of considering all edges as equidistant, the probability of the link (or edge) is used to bias its distance. This metric is motivated by Resnik's use of the probability of instance occurrences of concepts, p(c) (Resnik, 1995). Link probability is defined as the difference between the probability of instance occurrences of the parent and child of the link. Formally,

    LinkP(a, b) = p(a) - p(b),
³We clarified with the authors certain parts of the algorithm which we find unclear. These are the points worth noting:
(1) compute_conceptual_density of Step 2 only computes the conceptual density of concepts which are not marked invalid;
(2) exit_loop of Step 3 occurs when all senses subsumed by concept were already previously disambiguated or when one or more senses of the word to be disambiguated are subsumed by concept;
(3) mark_disambiguated_senses of Step 4 marks senses subsumed by concept as disambiguated, marks concept and its children as invalid, and discards other senses of the words with sense(s) disambiguated by concept;
(4) disambiguated senses of words which form the context are not brought forward to the next window.

⁴In WordNet, there are 25 unique beginners of the taxonomy, instead of a common root. Hence, in our implementation, we assign a large conceptual distance of 999 to the virtual edges between two unique beginners.
where

    p(c) = ( Σ_{w ∈ words(c)} count(w) ) / N,

where words(c) is the set of words in the corpus which are subsumed by the concept c, and N is the total number of noun instances occurring in the corpus (Resnik, 1995);
    a = parent of the link,
    b = child of the link.
The intuition behind this metric is that the distance between the parent and the child should be "closer" if the probability of the parent is close to that of the child, since that implies that whenever an instance of the parent occurs in the corpus, it is usually an instance of the child.
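A minimal sketch of the metric, with invented instance counts standing in for the corpus statistics:

```python
# Invented counts; in the paper these come from corpus statistics.
COUNT = {"vehicle:1": 900, "motor_vehicle:1": 520, "car:1": 500}
N = 10000  # total noun instances in the corpus

def p(c):
    return COUNT[c] / N

def link_probability(parent, child):
    return p(parent) - p(child)  # LinkP(a, b) = p(a) - p(b)

# Nearly every motor-vehicle occurrence is also a car occurrence,
# so that link counts as "short":
print(round(link_probability("motor_vehicle:1", "car:1"), 3))      # 0.002
print(round(link_probability("vehicle:1", "motor_vehicle:1"), 3))  # 0.038
```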
Descendant Coverage

In the same spirit, the descendant coverage metric attempts to tweak the constant edge distance assumption of the conceptual distance metric. Instead of relying on corpus statistics, static information from WordNet is exploited. Descendant coverage of a link is defined as the difference in the percentage of descendants subsumed by the parent and that subsumed by the child:

    DescCov(a, b) = d(a) - d(b),

where

    d(c) = (number of descendants subsumed by c) / (total number of descendants in WordNet),
    a = parent of the link,
    b = child of the link.
The same intuition underlies this metric; that the distance between the parent and the child should be "nearer" if the percentage of descendants subsumed by the parent is close to that of the child, since it indicates that most descendants of the parent are also descendants of the child.
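The analogous sketch for descendant coverage, with invented descendant counts in place of the real WordNet figures:

```python
# Invented descendant counts; the real figures come from WordNet itself.
DESCENDANTS = {"vehicle:1": 400, "motor_vehicle:1": 150, "car:1": 140}
TOTAL_DESCENDANTS = 60000  # stand-in for the WordNet-wide total

def d(c):
    return DESCENDANTS[c] / TOTAL_DESCENDANTS

def descendant_coverage(parent, child):
    return d(parent) - d(child)  # DescCov(a, b) = d(a) - d(b)

# Most descendants of motor_vehicle are also descendants of car,
# so that link is "near"; the vehicle -> motor_vehicle link is not:
near = descendant_coverage("motor_vehicle:1", "car:1")
far = descendant_coverage("vehicle:1", "motor_vehicle:1")
print(near < far)  # True
```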
Taxonomic Link (IS-A)

All the metrics detailed above were designed to capture semantic similarity or closeness. The semantic class disambiguation problem, however, is essentially to identify membership of the chosen concept node in the semantic class nodes.

A simple implementation of the semantic distance module can thus be just a traversal of the taxonomic links (IS-A) of WordNet. If the chosen concept node is a descendant of a semantic class node, it should be classified as that semantic class.
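This traversal can be sketched as a walk up a hypernym chain. The chain is hand-coded here; a real implementation would follow WordNet's hypernym pointers:

```python
# Hand-coded hypernym chain standing in for WordNet's IS-A pointers.
HYPERNYM = {"plane:1": "aircraft:1", "aircraft:1": "vehicle:1",
            "car:1": "motor_vehicle:1", "motor_vehicle:1": "vehicle:1"}
CLASS_OF = {"aircraft:1": "AIRCRAFT", "car:1": "CAR", "vehicle:1": "VEHICLE"}

def classify_by_isa(concept):
    """Return the first semantic class found on the hypernym chain,
    or "entity" if the chain reaches the top without a match."""
    while concept is not None:
        if concept in CLASS_OF:
            return CLASS_OF[concept]
        concept = HYPERNYM.get(concept)
    return "entity"

print(classify_by_isa("plane:1"))  # AIRCRAFT
print(classify_by_isa("peace:1"))  # entity
```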
3 Evaluation

The domain we worked on is the MUC-4 (1992) terrorism domain. Nouns are extracted from the first 18 passages (dev-muc4-0001 to dev-muc4-0018) of the corpus of news wire articles to form our test corpus. The nouns extracted are the head nouns within noun phrases which are recognised by WordNet, including proper nouns such as "United States". These 1023 nouns are hand-tagged with their sense and semantic class in the particular context to form the answer keys for subsequent experiments.
3.1 Mapping domain-specific hierarchy onto WordNet

The domain-specific hierarchy used in our work is that crafted by researchers from the University of Massachusetts for their information extraction system, which was one of the participants at MUC-4 (Riloff, 1994).

Mapping from the domain-specific hierarchy to WordNet typically requires only the assignment of senses to the classes. For instance, the semantic class "human" is mapped onto its sense 1 node in WordNet, the "human:1" concept node. Classes can also be mapped onto more than one concept node in WordNet. The semantic class "attack", for example, is mapped onto both senses 1 and 5.

There are cases where the exact wording of a semantic class in the domain-specific hierarchy is not present in WordNet. Take for instance the semantic class "government_official" in the domain-specific hierarchy. Since the collocation is not in WordNet, we mapped it to the concept node "government_agent:1" which we felt is closest in meaning.

The set of mapped semantic classes in WordNet is shown in Figure 3⁵.
3.2 Word Sense Disambiguation

We ran our two implementations of word sense disambiguation algorithms, the information content algorithm and the conceptual density method, on our domain-specific test set. For the information content algorithm, a window size of 10, i.e. 5 nouns to the left and right, was found to yield the best results; whilst for the conceptual density algorithm, the optimum window size was found to be 30. For both algorithms, only the nouns of the same passage are incorporated into the context window. If the noun to be disambiguated is the first noun of the passage, the window will include the subsequent N nouns of the same passage.

The probability statistics required for Resnik's information content algorithm were collected from the 777,857 noun occurrences of the entire Brown corpus and Wall Street Journal corpus.

Figure 3: MUC-4 semantic class hierarchy as mapped onto WordNet.

⁵As this hierarchy is adopted, and not created by us, occasionally we can only furnish guesses as to the exact meaning of the semantic classes.
The results are shown in Table 1. The most frequent baseline is obtained by following the strategy of always picking sense 1 of WordNet, since WordNet orders its senses such that sense 1 is the most likely sense.

As both algorithms performed below the most frequent baseline, it prompted us to evaluate the indicativeness of surrounding nouns for word sense disambiguation. We hence provided 2 human judges with a randomly selected sample of 80 examples from the 734 polysemic nouns of our test corpus of 1023 examples. The human judges are provided with the 10 nouns surrounding the word to be disambiguated. Based only on these clues, they have to select a single sense of the word in the particular sentence context. Their responses are then tallied with the sense-tagged test corpus.

Table 2 shows the accuracies attained by the human judges. Both judges are able to perform substantially better than the most frequent heuristic baseline, despite the seemingly impoverished knowledge source. Feedback from the judges reveals possible leverage for future improvements. Firstly, judges reflect that frequently, just one indicative surrounding noun is enough to provide clear evidence for sense disambiguation. The other nouns will just be glossed over and do not contribute to the decision. Also, indicative nouns may not just hold is-a relationships, which are the only relationships exploited by both algorithms. Rather, they are simply related in some manner to the noun to be disambiguated. For instance, a surrounding context including the word "church" will indicate a strong support for the "pastor" sense of "minister" as opposed to its other senses. These reflections of the human judges seem to point towards the need for an effective method for selecting only particular nouns in the surrounding context as evidence. Use of other relationships besides is-a may also help in disambiguation, as is already expounded by (Sussna, 1993).
3.3 Semantic Distance Metrics

To evaluate the semantic distance metrics, we feed the semantic distance module with the correct senses of the entire test corpus and observe the resultant semantic class disambiguation accuracy.

The conceptual distance, link probability and descendant coverage metrics all require traversal of links from one node to another. However, all of the metrics are commutative, i.e. distance from concept a to b is the same as that from b to a. In semantic class disambiguation, a distinction is necessary since the taxonomic links indicate membership relationships which are not commutative ("aircraft:1" is a "vehicle:1" but "vehicle:1" need not be an "aircraft:1"). We hence associate different weights to the upwards and downwards traversal of links, with the 25 unique beginners of WordNet being the topmost nodes. Upward traversal of links towards the unique beginners are weighted consistently at 0.3 whilst downward traversal of links towards the leaves
                                         #Examples  #Disambiguated  #Correct  Accuracy
Information content (polysemic)                734             734       292   39.78 %
Conceptual density (polysemic)                 734             275        68   24.73 %
Conceptual density +
  Most frequent heuristic (polysemic)          734             734       385   52.45 %
Most frequent heuristic (polysemic)            734             734       464   63.22 %
Information content (overall)                 1023
Conceptual density (overall)                  1023
Conceptual density +
  Most frequent heuristic (overall)           1023            1023       674   65.88 %
Most frequent heuristic (overall)             1023            1023       753   73.61 %

Table 1: Word sense disambiguation results.
                           #Examples  #Correct  Accuracy
Human A                            80        57   71.25 %
Human B                            80        59   73.75 %
Most frequent heuristic            80        45   56.25 %

Table 2: Word sense disambiguation using surrounding nouns.
are weighted at 1.7⁶.

Also, different thresholds are used for different levels of the domain-specific hierarchy. Since higher level classes, such as the level 0 "human" class, encompass a wider range of words, it is evident that the thresholds for higher level classes cannot be stricter than those of lower level classes. For fair comparison of each metric, the best thresholds are arrived at through exhaustive searching of a reasonable space⁷. The results are detailed in Table 3.
Accuracy on specific semantic classes refers to an exact match of the program's response with the corpus answer. The general semantic class disambiguation accuracy, on the other hand, considers a response correct as long as the response class is in the sub-hierarchy which originated from the same level 0 class as the answer. For example, if the program's response is class "politician", whilst the answer is class "lawyer", since both classes originated from the same level 0 class "human", this response is considered correct when calculating the general semantic class accuracy. The specific semantic class disambiguation accuracy is hence the stricter measure.
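The two scoring schemes can be sketched as follows, with a hypothetical mapping from each class to its level 0 ancestor:

```python
# Hypothetical level-0 ancestors for a few classes of the hierarchy.
LEVEL0 = {"politician": "human", "lawyer": "human", "human": "human",
          "car": "vehicle", "vehicle": "vehicle", "entity": "entity"}

def is_correct(response, answer, general=False):
    """Specific scoring: exact match; general scoring: sharing the
    same level-0 class suffices."""
    if general:
        return LEVEL0[response] == LEVEL0[answer]
    return response == answer

print(is_correct("politician", "lawyer"))                # False
print(is_correct("politician", "lawyer", general=True))  # True
```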
It may seem puzzling that semantic class disambiguation does not achieve 100% accuracy even when supplied with the correct senses, i.e. even when the word sense disambiguation module is able to attain 100% accuracy, the overall semantic class disambiguation accuracy still lags behind the ideal. Since the taxonomic links in WordNet are designed to capture membership of words in classes, it may seem odd that the correct identification of the word sense coupled with the IS-A taxonomic links still do not guarantee correct semantic class disambiguation.

⁶These weights are found to be optimum for all three metrics.
⁷Integral thresholds are searched for the conceptual distance metric, whilst the thresholds of the other metrics are searched in steps of 0.01.
The reason for this paradox is perceptive differences; that between the designers of the MUC-4 domain-specific hierarchy we adopted and the WordNet hierarchy, and that between the annotator of the answer corpus and the WordNet designers.

Take for example the monosemic word "kidnapping". Its correct semantic class is "attack:5"⁸. However, it is not a descendant of "attack:5" in WordNet. The hypernyms of "kidnapping" are [capture -> felony -> crime -> evil-doing -> wrong-doing -> activity -> act] and that of "attack:5" are [battery -> crime -> evil-doing -> wrong-doing -> activity -> act]. Both perceptions of "kidnapping" are correct. "kidnapping" can be viewed as a form of "attack:5" and similarly, it can be viewed as a form of "capture".

An effective semantic distance metric is hence needed here. The semantic distance module should infer the close distance between the two concept nodes "kidnapping" and "attack:5" and thus correctly classify "kidnapping".
3.4 Semantic Class Disambiguation

After evaluation of the separate phases, we combined the best algorithms of the two phases and evaluated the performance of our semantic class disambiguation approach. Hence, the most frequent

⁸"attack:5" refers to an assault on someone whilst "attack:1" refers to the beginning of an offensive.
                           Disambiguation Accuracy              Thresholds†
                        Specific Classes  General Classes
Conceptual Distance          81.52 %          87.10 %           (3, 2, 2, 1)
Link Probability             80.16 %          85.24 %
Descendant Coverage                           83.87 %           (0.02, 0.01, 0.01, 0.01)
Taxonomic Link (IS-A)        79.67 %          85.14 %           Not applicable

Table 3: Effect of different semantic distance metrics on semantic class disambiguation.
(Assuming perfect word sense disambiguation)
†Format: (th0, th1, th2, th3), where thi is the threshold that is applied to the ith level of the hierarchy.
sense heuristic is used for the word sense disambiguation module and the conceptual distance metric is adopted for the semantic distance module.

It should be emphasized, however, that our approach to semantic class disambiguation need not be coupled with any specific word sense disambiguation algorithm. The most frequent WordNet sense is chosen simply because current word sense disambiguation algorithms still cannot beat the most frequent baseline consistently for all words. Our approach, in effect, allows domain-specific semantic class disambiguation to latch onto the improvements in the active research area of word sense disambiguation.

As a baseline, we again sought the most frequent heuristic, which is the occurrence probability of the most frequent semantic class "entity"⁹.
We compared our approach with supervised methods to contrast their reliance on annotated corpora with our reliance on WordNet. One of the foremost semantic class disambiguation systems which employs machine learning is the Kenmore framework (Cardie, 1993). However, as we are unable to report comparative tests with Kenmore¹⁰, we adapted two other supervised algorithms, both successfully applied to general word sense disambiguation, to the task of semantic class disambiguation.

The first is the LEXAS algorithm which uses an exemplar-based learning framework similar to the case-based reasoning foundation of Kenmore (Ng, 1997; Ng and Lee, 1996). LEXAS was shown to achieve high accuracy as compared to other word sense disambiguation algorithms.

We also applied Teo et al.'s Bayesian word sense disambiguation algorithm to the task (Teo et al., 1996). The approach compares favourably with other methods in word sense disambiguation when tested on a common data set of the word "interest".

⁹This baseline is also used to evaluate the performance of Kenmore (Cardie, 1993).
¹⁰As work on one of the important input sources, the conceptual parser, is underway, performance results of Kenmore on semantic class disambiguation cannot yet be reported.
The features used for both supervised algorithms are the local collocations of the surrounding 4 words¹¹. Local collocation was shown to be the most indicative knowledge source for LEXAS and these 7 features are the common features used in both LEXAS and Teo et al.'s Bayesian algorithm. Both algorithms are used for learning the specific semantic class of words.

For both algorithms, the 1023-sentence test set is randomly partitioned into a 90% training set and a 10% testing set, in proportion with the overall class distribution. The algorithms are trained on the training set and then used to disambiguate the distinct testing set. This was averaged over 10 runs. As with Kenmore, the training set contains features of all the words in the training sentences, and the algorithms are to pick one semantic class for each word in the testing set. A word in the testing set need not have occurred in the training set. This is unlike word sense disambiguation, whereby the training set contains features of one word, and the algorithm picks one sense for each occurrence of this word in the testing set.
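A sketch of extracting the seven local-collocation features of the surrounding four words, as we read the feature description; the padding tokens and whitespace tokenisation are our own simplifications:

```python
def collocation_features(tokens, i):
    """Features for tokens[i]: l2-l1, l1-r1, r1-r2, l2, l1, r1, r2,
    where l1/l2 are the words to the left and r1/r2 to the right."""
    pad = ["<s>", "<s>"] + tokens + ["</s>", "</s>"]
    l2, l1, r1, r2 = pad[i], pad[i + 1], pad[i + 3], pad[i + 4]
    return [f"{l2} {l1}", f"{l1} {r1}", f"{r1} {r2}", l2, l1, r1, r2]

print(collocation_features("the plane will take off".split(), 1))
# ['<s> the', 'the will', 'will take', '<s>', 'the', 'will', 'take']
```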
To obtain a gauge of human performance on this task, we sourced two independent human judgements. Two human judges are presented with a set of 80 sentences randomly selected from the 1023-example test corpus, each with a noun to be disambiguated. Based on their understanding of the sentence, each noun is assigned a specific semantic class of the domain-specific hierarchy. Their responses are then compared against the tagged answers of the test corpus.

The semantic class disambiguation results are compiled and tabulated in Table 4. The definitions of general and specific semantic class disambiguation accuracy are detailed in Section 3.3.

As is evident, our approach outperforms the most frequent heuristic substantially. Also, the perfor-

¹¹Given a word w in the following sentence segment: l2 l1 w r1 r2, the 7 features used are l2-l1, l1-r1, r1-r2, l2, l1, r1 and r2, whereby the first 3 features are concatenations of the words.
                                          Disambiguation Accuracy
                                       Specific Classes  General Classes
Our Approach (1023 examples)               78.90 %           80.16 %
Most frequent heuristic (1023 examples)    46.92 %           46.92 %
Supervised (LEXAS)                         57.30 %
Supervised (Bayes)                         58.88 %
Our Approach (80 examples)                 71.25 %           75.00 %
Human C (80 examples)                      77.50 %           82.50 %
Human D (80 examples)                      70.00 %           75.00 %
Most frequent heuristic (80 examples)      51.25 %           51.25 %

Table 4: Semantic class disambiguation results.
mance of both supervised algorithms lags behind that of our approach. Comparable performance with the two human judges is also achieved.

It should be noted, though, that the amount of training data available to the supervised algorithms may not be sufficient. Ng and Lee (1996) found that training sets of 1000-1500 examples per word are necessary for sense disambiguation of one highly ambiguous word. The amount of training data needed for a supervised learning algorithm to achieve good performance on semantic class disambiguation may be larger than what we have used. Cardie (1993), for instance, used a larger 2056-instance case base in the evaluation of Kenmore.
4 Conclusion

We have presented a portable, wide-coverage approach to domain-specific semantic class disambiguation which performs comparably with human judges. Our approach harnesses WordNet effectively to outperform supervised methods which rely on annotated corpora. Unlike existing methods which require hand-crafting of lexicons or manual annotation, the only human effort involved in our approach is the mapping of the domain-specific semantic classes onto WordNet. Through the use of general word sense disambiguation algorithms and semantic distance metrics, our approach correlates the performance of semantic class disambiguation with the improvements in these actively researched fields.

References

Eneko Agirre and German Rigau. 1996. Word Sense Disambiguation Using Conceptual Density. In Proceedings of COLING-96.

Rebecca Bruce and Janyce Wiebe. 1994. Word Sense Disambiguation using Decomposable Models. In Proceedings of ACL-94.

Claire Cardie. 1993. A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis. In Proceedings of AAAI-93.

Ralph Grishman, John Sterling and Catherine Macleod. 1992. New York University Proteus System: MUC-4 Test Results and Analysis. In Proceedings of MUC-4.

Ralph Grishman and John Sterling. 1992. Acquisition of Selectional Patterns. In Proceedings of COLING-92.

Ralph Grishman and John Sterling. 1994. Generalizing Automatically Generated Selectional Patterns. In Proceedings of COLING-94.

Marti A. Hearst. 1991. Noun Homograph Disambiguation Using Local Context in Large Text Corpora. In Using Corpora, University of Waterloo, Waterloo, Ontario.

Alpha K. Luk. 1995. Statistical Sense Disambiguation with Relatively Small Corpora Using Dictionary Definitions. In Proceedings of ACL-95.

Andrei Mikheev and Steven Finch. 1995. Towards a Wordbench for Acquisition of Domain Knowledge from Natural Language. In Proceedings of EACL-95.

George A. Miller. 1990. An On-Line Lexical Database. In International Journal of Lexicography, 3(4):235-312, 1990.

George A. Miller, Martin Chodorow, Shari Landes, Claudia Leacock, and Robert G. Thomas. 1994. Using a Semantic Concordance for Sense Identification. In Proceedings of the ARPA Human Language Technology Workshop.

MUC-4 Proceedings. 1992. Proceedings of the Fourth Message Understanding Conference (MUC-4), San Mateo, CA: Morgan Kaufmann.

MUC-6 Proceedings. 1996. Proceedings of the Sixth Message Understanding Conference (MUC-6), San Mateo, CA: Morgan Kaufmann.

Hwee Tou Ng and Hian Beng Lee. 1996. Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach. In Proceedings of ACL-96.

Hwee Tou Ng. 1997. Exemplar-Based Word Sense Disambiguation: Some Recent Improvements. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing.

Roy Rada, Hafedh Mili, Ellen Bicknell and Maria Blettner. 1989. Development and Application of a Metric on Semantic Nets. In IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19, No. 1, Jan/Feb.

Philip Resnik. 1995. Disambiguating Noun Groupings with Respect to WordNet Senses. In Proceedings of the Third Workshop on Very Large Corpora.

Ellen M. Riloff. 1994. Information Extraction As a Basis for Portable Text Classification Systems. PhD thesis, University of Massachusetts, September 1994.

Stephen Soderland, David Fisher, Jonathan Aseltine and Wendy Lehnert. 1995. CRYSTAL: Inducing a Conceptual Dictionary. In Proceedings of IJCAI-95.

Michael Sussna. 1993. Word Sense Disambiguation for Free-Text Indexing Using a Massive Semantic Network. In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM-93).

Edward Teo, Christopher Ting, Li-Shiuan Peh and Hian-Beng Lee. 1996. Probabilistic Word-Sense Disambiguation and Bootstrapping with Unsupervised Gradient Descent. Manuscript.

Jean Veronis and Nancy Ide. 1990. Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionaries. In Proceedings of COLING-90.

David Yarowsky. 1994. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In Proceedings of ACL-94.

David Yarowsky. 1995. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of ACL-95.
