Using a Probabilistic Class-Based Lexicon 
for Lexical Ambiguity Resolution 
Detlef Prescher and Stefan Riezler and Mats Rooth 
Institut für Maschinelle Sprachverarbeitung
Universität Stuttgart, Germany
Abstract 
This paper presents the use of probabilistic class-based lexica for disambiguation in target-word selection. Our method employs minimal but precise contextual information for disambiguation. That is, only information provided by the target-verb, enriched by the condensed information of a probabilistic class-based lexicon, is used. Induction of classes and fine-tuning to verbal arguments is done in an unsupervised manner by EM-based clustering techniques. The method shows promising results in an evaluation on real-world translations.
1 Introduction 
Disambiguation of lexical ambiguities in naturally occurring free text is considered a hard task for computational linguistics. For instance, word sense disambiguation is concerned with the problem of assigning sense labels to occurrences of an ambiguous word. Resolving such ambiguities is useful in constraining semantic interpretation. A related task is target-word disambiguation in machine translation. Here a decision has to be made which of a set of alternative target-words is the most appropriate translation of a source-word. A solution to this disambiguation problem is directly applicable in a machine translation system which is able to propose the translation alternatives. A further problem is the resolution of attachment ambiguities in syntactic parsing. Here the decision of verb versus argument attachment of noun phrases, or the choice for verb phrase versus noun phrase attachment of prepositional phrases, can build upon a resolution of the related lexical ambiguities.
Statistical approaches have been applied successfully to these problems. The great advantage of statistical methods over symbolic-linguistic methods has been deemed to be their effective exploitation of minimal linguistic knowledge. However, the best performing statistical approaches to lexical ambiguity resolution themselves rely on complex information sources such as "lemmas, inflected forms, parts of speech and arbitrary word classes [...] local and distant collocations, trigram sequences, and predicate argument association" (Yarowsky (1995), p. 190) or large context-windows up to 1000 neighboring words (Schütze, 1992). Unfortunately, in many applications such information is not readily available. For instance, in incremental machine translation, it may be desirable to decide for the most probable translation of the arguments of a verb with only the translation of the verb as information source but no large window of surrounding translations available. In parsing, the attachment of a nominal head may have to be resolved with only information about the semantic roles of the verb but no other predicate argument associations at hand.
The aim of this paper is to use only minimal, yet precise, information for lexical ambiguity resolution. We will show that good results are obtainable by employing a simple and natural look-up in a probabilistic class-labeled lexicon for disambiguation. The lexicon provides a probability distribution on semantic selection-classes labeling the slots of verbal subcategorization frames. Induction of distributions on frames and class-labels is accomplished in an unsupervised manner by applying the EM algorithm. Disambiguation then is done by a simple look-up in the probabilistic lexicon. We restrict our attention to a definition of senses as alternative translations of source-words. Our approach provides a very natural solution for such a target-word disambiguation task: look for the most frequent target-noun whose semantics fits best with the
[Figure 1: class matrix for class 19 (PROB 0.0235). The verb rows, with probabilities from 0.0629 down to 0.0082, include enter.aso:o, cover.aso:o, call.aso:o, include.aso:o, attend.aso:o, cross.aso:o, dominate.aso:o, have.aso:s, attract.aso:s, occupy.aso:o, include.aso:s, contain.aso:s, become.as:s, form.aso:o, collapse.as:s, create.aso:o, provide.aso:s, organize.aso:o, and offer.aso:s. The noun columns and the dots of the matrix are not reproduced.]

Figure 1: Class 19: "locative action". At the top are listed the 20 most probable nouns in the $p_{LC}(n|19)$ distribution and their probabilities, and at left are the 30 most probable verbs in the $p_{LC}(v|19)$ distribution. 19 is the class index. Those verb-noun pairs which were seen in the training data appear with a dot in the class matrix. Verbs with suffix .as:s indicate the subject slot of an active intransitive. Similarly .aso:s denotes the subject slot of an active transitive, and .aso:o denotes the object slot of an active transitive.
semantics required by the target-verb. We evaluated this simple method on a large number of real-world translations and got results comparable to related approaches such as that of Dagan and Itai (1994) where much more selectional information is used.
2 Lexicon Induction via EM-Based 
Clustering 
2.1 EM-Based Clustering 
For clustering, we used the method described in Rooth et al. (1999). There classes are derived from distributional data, a sample of pairs of verbs and nouns, gathered by parsing an unannotated corpus and extracting the fillers of grammatical relations. The semantically smoothed probability of a pair $(v, n)$ is calculated in a latent class (LC) model as $p_{LC}(v, n) = \sum_{c \in C} p_{LC}(c, v, n)$. The joint distribution is defined by $p_{LC}(c, v, n) = p_{LC}(c)\, p_{LC}(v|c)\, p_{LC}(n|c)$. By construction, conditioning of $v$ and $n$ on each other is solely made through the classes $c$. The parameters $p_{LC}(c)$, $p_{LC}(v|c)$, $p_{LC}(n|c)$ are estimated by a particularly simple version of the EM algorithm for context-free models.
Input to our clustering algorithm was a training corpus of 1,178,698 tokens (608,850 types) of verb-noun pairs participating in the grammatical relations of intransitive and transitive verbs and their subject- and object-fillers. Fig. 1 shows an induced class from a model with 35 classes. Induced classes often have a basis in lexical semantics; class 19 can be interpreted as locative, involving location nouns "room", "area", and "world" and verbs such as "enter" and "cross".
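The latent class model of this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `em_latent_class` and `p_lc`, the random initialization, and the fixed iteration count are assumptions made here, not details taken from Rooth et al. (1999).

```python
import random
from collections import defaultdict

def em_latent_class(pairs, num_classes, iterations=50, seed=0):
    """Fit p(c), p(v|c), p(n|c) to a list of (verb, noun) tokens by EM.

    Hypothetical helper: initialization and stopping rule are assumptions.
    """
    rng = random.Random(seed)
    verbs = sorted({v for v, _ in pairs})
    nouns = sorted({n for _, n in pairs})
    counts = defaultdict(int)
    for pair in pairs:
        counts[pair] += 1

    def random_dist(keys):
        weights = [rng.random() + 0.1 for _ in keys]
        total = sum(weights)
        return {k: w / total for k, w in zip(keys, weights)}

    p_c = random_dist(range(num_classes))
    p_v = [random_dist(verbs) for _ in range(num_classes)]
    p_n = [random_dist(nouns) for _ in range(num_classes)]

    for _ in range(iterations):
        exp_c = [0.0] * num_classes
        exp_v = [defaultdict(float) for _ in range(num_classes)]
        exp_n = [defaultdict(float) for _ in range(num_classes)]
        # E-step: posterior p(c | v, n), weighted by the observed pair count.
        for (v, n), freq in counts.items():
            joint = [p_c[c] * p_v[c][v] * p_n[c][n] for c in range(num_classes)]
            z = sum(joint)
            for c in range(num_classes):
                w = freq * joint[c] / z
                exp_c[c] += w
                exp_v[c][v] += w
                exp_n[c][n] += w
        # M-step: relative frequencies of the expected counts.
        total = sum(exp_c)
        for c in range(num_classes):
            denom = exp_c[c] or 1.0  # guard against a class that lost all mass
            p_c[c] = exp_c[c] / total
            p_v[c] = {v: exp_v[c][v] / denom for v in verbs}
            p_n[c] = {n: exp_n[c][n] / denom for n in nouns}
    return p_c, p_v, p_n

def p_lc(v, n, p_c, p_v, p_n):
    """Smoothed pair probability: sum over classes of p(c) p(v|c) p(n|c)."""
    return sum(p_c[c] * p_v[c][v] * p_n[c][n] for c in range(len(p_c)))
```

Because verbs and nouns interact only through the classes, a pair that never occurred in the sample can still receive positive probability when some class gives mass to both its verb and its noun.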
2.2 Probabilistic Labeling with Latent 
Classes using EM-estimation 
To induce latent classes for the object slot of a fixed transitive verb $v$, another statistical inference step was performed. Given a latent class model $p_{LC}(\cdot)$ for verb-noun pairs, and a sample $n_1, \ldots, n_M$ of objects for a fixed transitive verb, we calculate the probability of an arbitrary object noun $n \in N$ by $p(n) = \sum_{c \in C} p(c, n) = \sum_{c \in C} p(c)\, p_{LC}(n|c)$. This fine-tuning of the class parameters $p(c)$ to the sample of objects for a fixed verb is formalized again as a simple instance of the EM algorithm. In an experiment with English data, we used a clustering model with 35 classes. From the maximum probability parses derived for the British National Corpus with the head-lexicalized parser of Carroll and Rooth (1998), we extracted frequency tables for transitive verb-noun pairs. These tables were used to induce a small class-labeled lexicon (336 verbs).
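The fine-tuning step can be read as estimating mixture weights $p(c)$ while the class-conditional noun distributions $p_{LC}(n|c)$ stay fixed. The following sketch illustrates that reading; the function name `fit_class_weights`, the uniform starting point, and the handling of nouns that no class covers are assumptions of this sketch, not details given in the paper.

```python
def fit_class_weights(noun_freqs, p_n_given_c, iterations=100):
    """Re-estimate p(c) for one verb slot by EM, keeping p_LC(n|c) fixed.

    noun_freqs:  {noun: count} for the objects observed with the verb.
    p_n_given_c: list with one {noun: probability} dict per class.
    """
    k = len(p_n_given_c)
    p_c = [1.0 / k] * k  # assumption: uniform start
    for _ in range(iterations):
        expected = [0.0] * k
        for noun, freq in noun_freqs.items():
            joint = [p_c[c] * p_n_given_c[c].get(noun, 0.0) for c in range(k)]
            z = sum(joint)
            if z == 0.0:
                continue  # assumption: nouns unknown to every class are skipped
            for c in range(k):
                expected[c] += freq * joint[c] / z
        mass = sum(expected)
        if mass == 0.0:
            break  # nothing in the sample is covered by the class model
        p_c = [e / mass for e in expected]
    return p_c
```

With two classes whose noun sets do not overlap, the weights simply converge to the share of the sample that each class covers.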
cross.aso:o   19   0.692        mobilize.aso:o   6   0.386
mind          74.2              force            2.00
road          30.3              people           1.95
line          28.1              army             1.46
bridge        27.5              sector           0.90
room          20.5              society          0.90
border        17.8              worker           0.90
boundary      16.2              member           0.88
river         14.6              company          0.86
street        11.5              majority         0.85
atlantic       9.9              party            0.80

Figure 2: Estimated frequencies of the objects of the transitive verbs cross and mobilize

(ID 160867) Es gibt einige alte Passvorschriften, die besagen, dass man einen Pass haben muss, wenn man die Grenze überschreitet. There are some old provisions regarding passports which state that people crossing the {border/ frontier/ boundary/ limit/ periphery/ edge} should have their passport on them.

(ID 201946) Es gibt schliesslich keine Lösung ohne die Mobilisierung der bürgerlichen Gesellschaft und die Solidarität der Demokraten in der ganzen Welt. There can be no solution, finally, unless civilian {company/ society/ companionship/ party/ associate} is mobilized and solidarity demonstrated by democrats throughout the world.

Figure 3: Examples for target-word ambiguities
Fig. 2 shows the topmost parts of the lexical entries for the transitive verbs cross and mobilize. Class 19 is the most probable class-label for the object-slot of cross (probability 0.692); the objects of mobilize belong with probability 0.386 to class 16, which is the most probable class for this slot. Fig. 2 shows for each verb the ten nouns $n$ with highest estimated frequencies $f_c(n) = f(n)\, p(c|n)$, where $f(n)$ is the frequency of $n$ in the sample $n_1, \ldots, n_M$. For example, the frequency of seeing mind as object of cross is estimated as 74.2 times, and the most frequent object of mobilize is estimated to be force.
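The estimated frequencies $f_c(n) = f(n)\, p(c|n)$ can be computed as sketched below. The sketch assumes that the class-membership probability $p(c|n)$ is obtained by Bayes' rule from the fine-tuned $p(c)$ and the fixed $p_{LC}(n|c)$; the function names and the toy numbers in the usage are hypothetical.

```python
def class_posterior(noun, p_c, p_n_given_c):
    """p(c|n) by Bayes' rule: proportional to p(c) * p_LC(n|c)."""
    joint = [p_c[c] * p_n_given_c[c].get(noun, 0.0) for c in range(len(p_c))]
    z = sum(joint)
    return [j / z if z > 0 else 0.0 for j in joint]

def estimated_frequencies(noun_freqs, p_c, p_n_given_c):
    """Return {(noun, class): f(n) * p(c|n)} for every noun in the sample."""
    table = {}
    for noun, freq in noun_freqs.items():
        posterior = class_posterior(noun, p_c, p_n_given_c)
        for c, prob in enumerate(posterior):
            table[(noun, c)] = freq * prob
    return table
```

For one class, sorting the nouns by their entry in this table gives a ranked list of the kind shown in a lexical entry.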
3 Disambiguation with Probabilistic 
Cluster-Based Lexicons 
In the following, we will describe the simple and natural lexicon look-up mechanism which is employed in our disambiguation approach. Consider Fig. 3 which shows two bilingual sentences taken from our evaluation corpus (see Sect. 4). The source-words and their corresponding target-words are highlighted in bold face. The correct translation of the source-noun (e.g. Grenze) as determined by the actual translators is replaced by the set of alternative translations (e.g. { border, frontier, boundary, limit, periphery, edge }) as proposed by the word-to-word dictionary of Fig. 5 (see Sect. 4).
The problem to be solved is to find a correct translation of the source-word using only minimal contextual information. In our approach, the decision between alternative target-nouns is done by using only information provided by the governing target-verb. The key idea is to back up this minimal information with the condensed and precise information of a probabilistic class-based lexicon. The criterion for choosing an alternative target-noun is thus the best fit of the lexical and semantic information of the target-noun to the semantics of the argument-slot of the target-verb. This criterion is checked by a simple lexicon look-up where the target-noun with highest estimated class-based frequency is determined. Formally, choose the target-noun $\hat{n}$ (and a class $\hat{c}$) such that

$$f_{\hat{c}}(\hat{n}) = \max_{n \in N,\, c \in C} f_c(n)$$

where $f_c(n) = f(n)\, p(c|n)$ is the estimated frequency of $n$ in the sample of objects of a fixed target-verb, $p(c|n)$ is the class-membership probability of $n$ in $c$ as determined by the probabilistic lexicon, and $f(n)$ is the frequency of $n$ in the combined sample of objects and translation alternatives [1].
Consider example ID 160867 from Fig. 3. The ambiguity to be resolved concerns the direct objects of the verb cross whose lexical entry is partly shown in Fig. 2. Class 19 and the noun border is the pair yielding a higher estimated frequency than any other combination of a class and an alternative translation such as boundary. Similarly, for example ID 301946, the pair of the
[1] Note that $p(\hat{c}) = \max_{c \in C} p(c)$ in most, but not all cases.
target-noun society and class 6 gives highest es- 
timated frequency of the objects of mobilize. 
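The look-up described in this section amounts to an argmax over pairs of a translation alternative and a class. The sketch below illustrates it under one explicit assumption: the "combined sample" is formed by adding each alternative once to the verb's object counts. The function name `choose_target_noun` and all numbers in the usage are made up for illustration.

```python
def choose_target_noun(alternatives, object_freqs, p_c, p_n_given_c):
    """Pick (score, noun, class) maximizing f_c(n) = f(n) * p(c|n).

    Assumption: every translation alternative is added once to the object
    sample of the target-verb before f(n) is read off.
    """
    combined = dict(object_freqs)
    for noun in alternatives:
        combined[noun] = combined.get(noun, 0) + 1

    best = None
    for noun in alternatives:
        joint = [p_c[c] * p_n_given_c[c].get(noun, 0.0) for c in range(len(p_c))]
        z = sum(joint)
        if z == 0.0:
            continue  # no class of the lexicon covers this alternative
        for c, j in enumerate(joint):
            score = combined[noun] * j / z
            if best is None or score > best[0]:
                best = (score, noun, c)
    return best  # None if no alternative is covered


# Toy usage with invented counts and probabilities.
p_c = [0.7, 0.3]
p_n_given_c = [{"border": 0.5, "boundary": 0.5}, {"limit": 1.0}]
choice = choose_target_noun(["border", "boundary", "limit"],
                            {"border": 17, "boundary": 16},
                            p_c, p_n_given_c)
```

Returning `None` when no alternative is covered leaves room for a "don't know" decision; whether the original system ever abstains in this way is not stated in the text.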
4 Evaluation 
We evaluated our resolution methods on a pseudo-disambiguation task similar to that used in Rooth et al. (1999) for evaluating clustering models. We used a test set of 298 $(v, n, n')$ triples where $(v, n)$ is chosen randomly from a test corpus of pairs, and $n'$ is chosen randomly according to the marginal noun distribution for the test corpus. Precision was calculated as the number of times the disambiguation method decided for the non-random target noun ($\hat{n} = n$).
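A pseudo-disambiguation test set of this kind could be built and scored roughly as follows. This is a sketch under assumptions: the names `make_triples` and `pseudo_precision` are hypothetical, the sketch does not prevent $n'$ from coinciding with $n$, and `score` stands for whatever disambiguator is being evaluated.

```python
import random
from collections import Counter

def make_triples(test_pairs, num_triples, seed=0):
    """Sample (v, n, n') with n' drawn from the marginal noun distribution."""
    rng = random.Random(seed)
    noun_counts = Counter(n for _, n in test_pairs)
    nouns = list(noun_counts)
    weights = [noun_counts[n] for n in nouns]
    triples = []
    for _ in range(num_triples):
        v, n = rng.choice(test_pairs)
        n_prime = rng.choices(nouns, weights=weights, k=1)[0]
        triples.append((v, n, n_prime))
    return triples

def pseudo_precision(triples, score):
    """Share of triples where the method prefers n over the random n'."""
    hits = sum(1 for v, n, n_prime in triples if score(v, n) > score(v, n_prime))
    return hits / len(triples)
```

Ties are counted as failures here because the comparison is strict; that choice is also an assumption of the sketch.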
As shown in Fig. 4, we obtained 88 % precision for the class-based lexicon (ProbLex), which is a gain of 9 % over the best clustering model and a gain of 15 % over the human baseline [2].

ambiguity   human baseline   clustering   ProbLex
2           73.5 %           79.0 %       88.3 %

Figure 4: Evaluation on pseudo-disambiguation task for noun-ambiguity
The results of the pseudo-disambiguation could be confirmed in a further evaluation on a large number of randomly selected examples of a real-world bilingual corpus. The corpus consists of sentence-aligned debates of the European parliament (mlcc = multilingual corpus for cooperation) with ca. 9 million tokens for German and English. From this corpus we prepared a gold standard as follows. We gathered word-to-word translations from online-available dictionaries and eliminated German nouns for which we could not find at least two English translations in the mlcc-corpus. The resulting 35 word dictionary is shown in Fig. 5. Based on this dictionary, we extracted all bilingual sentence pairs from the corpus which included both the source-noun and the target-noun. We restricted the resulting ca. 10,000 sentence pairs to those which included a source-noun from this
[2] Similar results for pseudo-disambiguation were obtained for a simpler approach which avoids another EM application for probabilistic class labeling. Here $\hat{n}$ (and $\hat{c}$) was chosen such that $f_{\hat{c}}(v, \hat{n}) = \max((f_{LC}(v, n) + 1)\, p_{LC}(c|v, n))$. However, the sensitivity to class-parameters was lost in this approach.
dictionary in the object position of a verb. Furthermore, the target-object was required to be included in our dictionary and had to appear in a similar verb-object position as the source-object for an acceptable English translation of the German verb. We marked the German noun $n_g$ in the source-sentence, its English translation $n_e$ as appearing in the corpus, and the English lexical verb $v_e$. For the 35 word dictionary of Fig. 5 this semi-automatic procedure resulted in a test corpus of 1,340 examples. The average ambiguity in this test corpus is 8.63 translations per source-word. Furthermore, we took the semantically most distant translations for 25 words which occurred with a certain frequency in the evaluation corpus. This gave a corpus of 814 examples with an average ambiguity of 2.83 translations. The entries belonging to this dictionary are highlighted in bold face in Fig. 5. The dictionaries and the related test corpora are available on the web [3].
We believe that an evaluation on these test corpora is a realistic simulation of the hard task of target-word disambiguation in real-world machine translation. The translation alternatives are selected from online dictionaries, correct translations are determined as the actual translations found in the bilingual corpus, no examples are omitted, the average ambiguity is high, and the translations are often very close to each other. In contrast to this, most other evaluations are based on frequent uses of only two clearly distant senses that were determined as interesting by the experimenters.
Fig. 6 shows the results of lexical ambiguity resolution with probabilistic lexica in comparison to simpler methods. The rows show the results for evaluations on the two corpora with average ambiguity of 8.63 and 2.83 respectively. Column 2 shows the percentage of correct translations found by disambiguation by random choice. Column 3 presents as another baseline disambiguation with the major sense, i.e., always choose the most frequent target-noun as translation of the source-noun. In column 4, the empirical distribution of $(v, n)$ pairs in the training corpus extracted from the BNC is used as disambiguator. Note that this method yields good results in terms of precision (P = #correct / (#correct + #incorrect)), but is much
[3] http://www.ims.uni-stuttgart.de/projekte/gramotron/
Angriff: aggression, assault, offence, onset, onslaught, attack, charge, raid, whammy, inroad
Art: form, type, way, fashion, lit, kind, wise, manner, species, mode, sort, variety
Aufgabe: abandonment, office, task, exercise, lesson, giveup, job, problem, tax
Auswahl: eligibility, selection, choice, variety, assortment, extract, range, sample
Begriff: concept, item, notion, idea
Boden: ground, land, soil, floor, bottom
Einrichtung: arrangement, institution, constitution, establishment, feature, installation, construction, setup, adjustment, composition, organization
Erweiterung: amplification, extension, enhancement, expansion, dilatation, upgrading, add-on, increment
Fehler: error, shortcoming, blemish, blunder, bug, defect, demerit, failure, fault, flaw, mistake, trouble, slip, blooper, lapsus
Genehmigung: permission, approval, consent, acceptance, approbation, authorization
Geschichte: history, story, tale, saga, strip
Gesellschaft: company, society, companionship, party, associate
Grenze: border, frontier, boundary, limit, periphery, edge
Grund: master, matter, reason, base, cause, ground, bottom, root
Karte: card, map, ticket, chart
Lage: site, situation, position, bearing, layer, tier
Mangel: deficiency, lack, privation, want, shortage, shortcoming, absence, dearth, demerit, desideratum, insufficiency, paucity, scarceness
Menge: amount, deal, lot, mass, multitude, plenty, quantity, quiverful, volume, abundance, aplenty, assemblage, crowd, batch, crop, heap, lashings, scores, set, loads, bulk
Prüfung: examination, scrutiny, verification, ordeal, test, trial, inspection, tryout, assay, canvass, check, inquiry, perusal, reconsideration, scruting
Schwierigkeit: difficulty, trouble, problem, severity, arduousness, heaviness
Seite: page, party, side, point, aspect
Sicherheit: certainty, guarantee, safety, immunity, security, collateral, doubtlessness, sureness, deposit
Stimme: voice, vote, tones
Termin: date, deadline, meeting, appointment, time, term
Verbindung: association, contact, link, chain, conjunction, connection, fusion, joint, compound, alliance, catenation, tie, union, bond, interface, liaison, touch, relation, incorporation
Verbot: ban, interdiction, prohibition, forbiddance
Verpflichtung: commitment, obligation, undertaking, duty, indebtedness, onus, debt, engagement, liability, bond
Vertrauen: confidence, reliance, trust, faith, assurance, dependence, private, secret
Wahl: election, option, choice, ballot, alternative, poll, list
Weg: path, road, way, alley, route, lane
Widerstand: resistance, opposition, drag
Zeichen: character, icon, sign, signal, symbol, mark, token, figure, omen
Ziel: aim, destination, end, designation, target, goal, object, objective, sightings, intention, prompt
Zusammenhang: coherence, context, contiguity, connection
Zustimmung: agreement, approval, assent, accordance, approbation, consent, affirmation, allowance, compliance, compliancy, acclamation

Figure 5: Dictionaries extracted from online resources
ambiguity   random   major sense   empirical distrib.        clustering   ProbLex
8.63        14.2 %   31.9 %        P: 46.1 %, E: 36.2 %      43.3 %       49.4 %
2.83        35.9 %   45.5 %        P: 60.8 %, E: 49.4 %      61.5 %       68.2 %

Figure 6: Disambiguation results for clustering versus probabilistic lexicon methods
worse in terms of effectiveness (E = #correct / (#correct + #incorrect + #don't know)). The reason for this is that even if the distribution of $(v, n)$ pairs is estimated quite precisely for the pairs in the large training corpus, there are still many pairs which receive the same or no positive probability at all. These effects can be overcome by a clustering approach to disambiguation (column 5). Here the class-smoothed probability of a $(v, n)$ pair is used to decide between alternative target-nouns. Since the clustering model assigns a more fine-grained probability to nearly every pair in its domain, there are no don't know cases for comparable precision values. However, the semantically smoothed probability of the clustering models is still too coarse-grained when compared to a disambiguation with a probabilistic lexicon. Here a further gain in precision and equally effectiveness of ca. 7 % is obtained on both corpora (column 6). We conjecture that this gain can be attributed to the combination of frequency information of the nouns and the fine-tuned distribution on the selection classes of the nominal arguments of the verbs. We believe that including the set of translation alternatives in the ProbLex distribution is important for increasing efficiency, because it gives the disambiguation model the opportunity to choose among unseen alternatives. Furthermore, it seems that the higher precision of ProbLex cannot be attributed to filling in zeroes in the empirical distribution. Rather, we speculate that ProbLex intelligently filters the empirical distribution by reducing maximal
counts for observations which do not fit into 
classes. This might help in cases where the em- 
pirical distribution has equal values for two al- 
ternatives. 
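The two measures used in Fig. 6, precision P = #correct / (#correct + #incorrect) and effectiveness E = #correct / (#correct + #incorrect + #don't know), can be computed as in the following sketch. Representing a "don't know" decision as `None` is an assumption of the sketch, as is the function name.

```python
def precision_and_effectiveness(decisions, gold):
    """Return (P, E) for a list of decisions against the gold translations.

    A decision of None stands for "don't know" (assumed encoding).
    """
    correct = sum(1 for d, g in zip(decisions, gold) if d is not None and d == g)
    incorrect = sum(1 for d, g in zip(decisions, gold) if d is not None and d != g)
    dont_know = sum(1 for d in decisions if d is None)
    decided = correct + incorrect
    precision = correct / decided if decided else 0.0
    total = decided + dont_know
    effectiveness = correct / total if total else 0.0
    return precision, effectiveness
```

A method that never abstains has P = E, which matches the single values reported for the clustering and ProbLex columns.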
source           target
Seite            page, side
Sicherheit       guarantee, safety
Verbindung       connection, link
Verpflichtung    commitment, obligation
Ziel             objective, target

Figure 7: Precision for finding correct and acceptable translations by lexicon look-up
Fig. 7 shows the results for disambiguation with probabilistic lexica for five sample words with two translations each. For this dictionary, a test corpus of 219 sentences was extracted, 200 of which were additionally labeled with acceptable translations. Precision is 78 % for finding correct translations and 90 % for finding acceptable translations.
Furthermore, in a subset of 100 test items with average ambiguity 8.6, a human judge having access only to the English verb and the set of candidates for the target-noun, i.e. the information used by the model, selected among translations. On this set, human precision was 39 %.
5 Discussion 
Fig. 8 shows a comparison of our approach to state-of-the-art unsupervised algorithms for word sense disambiguation. Column 2 shows the number of test examples used to evaluate the various approaches. The range is from ca. 100 examples to ca. 37,000 examples. Our method was evaluated on test corpora of sizes 219, 814, and 1,340. Column 3 gives the average number of senses/translations for the different disambiguation methods. Here the range of the ambiguity rate is from 2 to about 9 senses [4]. Column 4 shows the random baselines cited for the respective experiments, ranging from ca. 11 % to 50 %. Precision values are given in column 5. In order to compare these results which were computed for different ambiguity factors, we standardized the measures to an evaluation for binary ambiguity. This is achieved by calculating $p^{1/\log_2 amb}$ for precision $p$ and ambiguity factor $amb$. The consistency of this "binarization" can be seen by a standardization of the different random baselines which yields a value of ca. 50 % for all approaches [5]. The standardized precision of our approach is ca. 79 % on all test corpora. The most direct point of comparison is the method of Dagan and Itai (1994) which gives 91.4 % precision (92.7 % standardized) and 62.1 % effectiveness (66.8 % standardized) on 103 test examples for target word selection in the transfer of Hebrew to English. However, compensating this high precision measure for the low effectiveness gives values comparable to our results. Dagan and Itai's (1994) method is based on a large variety of grammatical relations for verbal, nominal, and adjectival predicates, but no class-based information or slot-labeling is used. Resnik (1997) presented a disambiguation method which yields 44.3 % precision (63.8 % standardized) for a test set of 88 verb-object tokens. His approach is comparable to ours in terms of informedness of the disambiguator. He also uses a class-based selection measure, but based on WordNet classes. However, the task of his evaluation was to select WordNet-senses for the objects rather than the objects themselves, so the results cannot be compared directly. The same is true for the SENSEVAL evaluation exercise (Kilgarriff and Rosenzweig, 2000); there word senses from the HECTOR-dictionary had to be disambiguated. The precision results for the ten unsupervised systems taking part in the competitive evaluation ranged from 20-65 % at efficiency values from 3-54 %. The SENSEVAL standard is clearly beaten by the earlier results of Yarowsky (1995) (96.5 % precision) and Schütze (1992) (92 % precision). However, a comparison to these results is again somewhat difficult. Firstly, these approaches were evaluated on words with two clearly distant senses which were determined by the experimenters. In contrast, our method was evaluated on randomly selected actual translations of a large bilingual corpus. Furthermore, these approaches use large amounts of information in terms of linguistic categorizations, large context windows, or even manual intervention such as initial sense seeding (Yarowsky, 1995). Such information is easily obtainable, e.g., in IR applications, but often burdensome to gather or simply unavailable in situations such as incremental parsing or translation.

disambiguation    corpus   ambiguity   random   precision               random           precision
method            size                                                  (standardized)   (standardized)
ProbLex           1 340    8.63        14.2 %   49.4 %                  53.4 %           79.7 %
                  814      2.83        35.9 %   68.2 %                  50.5 %           77.5 %
                  219      2           50.0 %   78.0 %                  50.0 %           78.0 %
Dagan, Itai 94    103      2.27        44.1 %   P: 91.4 %, E: 62.1 %    50.0 %           P: 92.7 %, E: 66.8 %
Resnik 97         88       3.51        28.5 %   44.3 %                  50.0 %           63.8 %
SENSEVAL 00       2 756    9.17        10.9 %   P: 20-65 %, E: 3-54 %   50.0 %           P: 60-87 %, E: 33-83 %
Yarowsky 95       37 000   2           50.0 %   96.5 %                  50.0 %           96.5 %
Schütze 92        3 000    2           50.0 %   92.0 %                  50.0 %           92.0 %

Figure 8: Comparison of unsupervised lexical disambiguation methods.

[4] The ambiguity factor 2.27 attributed to Dagan and Itai's (1994) experiment is calculated by dividing their average of 3.27 alternative translations by their average of 1.44 correct translations. Furthermore, we calculated the ambiguity factor 3.51 for Resnik's (1997) experiment from his random baseline 28.5 % by taking 100/28.5; reversely, Dagan and Itai's (1994) random baseline can be calculated as 100/2.27 = 44.05. The ambiguity factor for SENSEVAL is calculated for the noun task in the English SENSEVAL test set.

[5] Note that we are guaranteed to get exactly 50 % standardized random baseline if random · amb = 100 %.
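The standardization to binary ambiguity used in Fig. 8, $p^{1/\log_2 amb}$, is easy to check numerically. The sketch below applies it to figures quoted in this section; the function name `standardize` is a label chosen here.

```python
import math

def standardize(p, ambiguity):
    """Map a precision p at a given ambiguity factor to binary ambiguity."""
    return p ** (1.0 / math.log2(ambiguity))

# ProbLex on the corpus with average ambiguity 8.63.
problex_std = standardize(0.494, 8.63)   # close to the reported 79.7 %
random_std = standardize(0.142, 8.63)    # close to the reported 53.4 %

# A random baseline of exactly 1/amb always maps to 0.5.
half = standardize(1 / 9.17, 9.17)
```

For an ambiguity of 2 the exponent is 1, so precision values are left unchanged, which is why the rows with ambiguity 2 show identical raw and standardized figures.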
6 Conclusion 
The disambiguation method presented in this paper is deliberately restricted to the limited amount of information provided by a probabilistic class-based lexicon. This information nevertheless proves accurate enough to yield good empirical results, e.g., in target-word disambiguation. The probabilistic class-based lexica are induced in an unsupervised manner from large unannotated corpora. Once the lexica are constructed, lexical ambiguity resolution can be done by a simple lexicon look-up. In target-word selection, the most frequent target-noun whose semantics fits best to the semantics of the argument-slot of the target-verb is chosen. We evaluated our method on randomly selected examples from real-world bilingual corpora, which constitutes a realistic hard task. Disambiguation based on probabilistic lexica performed satisfactorily for this task. The lesson learned from our experimental results is that hybrid models combining frequency information and class-based probabilities outperform both pure frequency-based models and pure clustering models. Further improvements are to be expected from extended lexica including, e.g., adjectival and prepositional predicates.

References 

Glenn Carroll and Mats Rooth. 1998. Valence induction with a head-lexicalized PCFG. In Proceedings of EMNLP-3, Granada.

Ido Dagan and Alon Itai. 1994. Word sense disambiguation using a second language monolingual corpus. Computational Linguistics, 20:563-596.

Adam Kilgarriff and Joseph Rosenzweig. 2000. English SENSEVAL: Report and results. In Proceedings of LREC 2000.

Philip Resnik. 1997. Selectional preference and sense disambiguation. In Proceedings of the ANLP'97 Workshop: Tagging Text with Lexical Semantics: Why, What, and How?, Washington, D.C.

Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil. 1999. Inducing a semantically annotated lexicon via EM-based clustering. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL'99), Maryland.

Hinrich Schütze. 1992. Dimensions of meaning. In Proceedings of Supercomputing '92.

David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL'95), Cambridge, MA.
