Learning Word Clusters from Data Types 
Paolo Allegrini, Simonetta Montemagni, Vito Pirrelli 
Istituto di Linguistica Comlmtazionale - CNR 
Via della Faggiola 32, Pisa, Italy 
{ allegrii),simo,vito } @ilc.pi.cnr.it 
Abstract 
The paper illustrates a linguistic knowledge ac- 
quisition model making use of data types, in- 
finite nlenlory, and an inferential mechanism 
tbr inducing new intbrmation Dora known data. 
The mode\] is colnpared with standard stochas- 
tic lnethods applied to data tokens, and tested 
on a task of lexico semantic classification. 
1 Introduction and Background 
Of late, consideral)le interest has been raised by 
the use of local syntactic contexts to automati- 
cally in&me lexico-semantic classes from parsed 
corI)ora (Pereira and Tishby 1992; Pereira ct 
al. 1993; Rooth 1995; Rooth et al. 1999). This 
family of approaches takes a 1)air of words (usu- 
ally a verb plus a noun), and a syntactic rela- 
tion holding 1)etween the two in context (llSll- 
ally the ol)ject), and calculates its token distri- 
bution in a training corI)us. These (:omits de- 
line the range of more or less tyi)ical syntac- 
tic collocates selected by a verb. The semat> 
tic similarity between words is then delined in 
terms of sul)stitutability in local contexts (see 
also Grefenstette 1994; I.,in 1998): two verbs 
are semantically close if they typically share the 
same range of collocates; conversely, two nouns 
are semantically close if they take l)art in the 
same tyl)e of selection dependencies, i.e. if they 
arc selected t)y the same verbs, with the same 
function. \]~q'onl this perspective, a syntactically 
asymmetric relation (a dependency) is reinter- 
preted as a semantic co-selection, where each 
term of the relation can be defined with respect 
to the other. 
This symmetric similarity metric is often ac- 
(;omi)anied by the non trivial assuml)tion that 
th, c ,semantic classification of both vcrb,s and 
nouns be symmetric too. This is enforced by 
maximizing I7 ?tj), with 
to 
1--1 
where p(Ck) is the probability of class Ck 1)e- 
ing tbund in the training corpus, and p(vi\[Ck) 
and p(nj\[Ci~) define the probability that verb vi 
and noun nj be associated with the semantic 
dimension (or meaning component) of class C/~. 
Intuitively, the joint distribution of fimctiona.lly 
mmotated verb-noun pairs is accounted for t)y 
assuming thai; each l)air meml)er indel)endently 
correlates with the same semantic dimension, or 
selection type (Rooth 1995), a concel)tual pair 
de.fining all pairs in the class: <g. "scalar mo- 
tion", "communicative action" etc. 
The al)proach has the potential of dealing 
with polysemous words: as the same word can 
in principle belong to more than one (-lass, there 
are good reasons to expect that the correspond- 
ing selection type positively correlates with one 
and only one sense of polysemous words. A fur- 
ther t)omls of the al)I)roach is that it makes it ex- 
i)licit the perspectivizing factor underlying the 
discovered similarity of words in a class. 
On a less positive side, poorly selective verbs 
(i.e. verbs which potentially combine with any 
1101111) Sllch as give, find or get tend to stick to- 
gether in highly probable classes, 1)ut apl)ear to 
stake out rather uIfinlbrmative senlantic dimen- 
sions, relating a motley collection of 11011118, such 
as part, way, reason and problem (Rooth 1995), 
whose only cominonality is the property of be- 
ing freely interchangeable in the context of the 
above-mentioned verbs. 
Another related issue is how many such di- 
nlensions are necessary to account tbr the entire 
variety of senses attested in the training corpus. 
This is an empirical question, but we contend 
an inipor~ant one, as the usal)ility of the result;- 
lag (:lasses heavily dot)ends (m it. It is (:ommon 
knowledge that verbs can 1)c, exceedingly (-hoosy 
ill tile way they select their collocates. Hence, 
one is Mlowed to use the class Ck to make t)re - 
dictions about the set of collocates of a verb 
v~, only if P(v,:\]C~) is sufficiently high. Con- 
versely, if CI~ hal)pens to poorly (:orrelate with 
any verb, the set of nouns in Ck is unlikely 1;o 
reflect any lexical selection. This coml)ounds 
wii;h the 1)rol)lent that the meaning of a verb vi 
can significantly involve more lihalt Olte Selltall- 
tic (timension: at i;he plesenl; stage of research 
in (:Olnlmtat;iollal lexical SO, lnanti('s, Ire scholar 
has shown what fun(:tion relal;es l;lm nmaning 
components of vi to its sehx:tional behaviom'. 
There is wide room for flu'ther resear(:h in this 
area, but truly ext)lorative tools are still needed. 
Finally, the des(:ribe(t method is a(:ul;c, ly 
t)rone to the 1)roblem of si)arse data. A ll;l)ough i,(cl,,,) 
is rightly ext)e(:ted to converg(, faster 
,:ha:, p(,,l.,), still (:o,,vergcu(:e of,(Cl',,) ,:al, b(, 
ex(:(;edingly sh)w with low frequen('y nouns. It 
is moo|; i;ll;~I; sieving more and more (:()rims (tat~ 
is a solution in all (:ases, its word fl'cquen(:y is 
highly sensitive to changes in text genre, topic 
and domain (S(:hiitze and Pede, rsen 1993). 
2 The approach 
Ih~re we illustrate a (lifl'er(mt; at)l)roa('h t() a(> 
(is|ring lexico semant;i(" (:lasses from sy~,ta('t;i- 
(:ally lo(-al ('ontexts. Like the family of sto(:has- 
tie metho(ts of se(:tion 1, we make use of ~t 
simibu:ity ntel;ric 1)ased on sul)stitui;ability ill 
(ve, rb,noun,flntction) tril)les. We also share the 
assumption that lexi(:o semantic classes are in- 
herently multidimensional, as they heavily de- 
pend on cxis|;ence of a perspectivizing factor. 
Yet, we depart from other assmnt)tions. Clas- 
sification of verl)s an(l n(mns is asymmetric: 
two IIOIIIIS {Ll'e similar if (;hey collo(:ate with as 
many semantically diverse vcrl)s as possible in 
as many (tifli:rent syntactic contexts as l)ossit)le. 
The converse apl)lies to verbs. In other words, 
semantic similarity of nom:s is not conditional 
on the similarity of their ac(:oml)anying verbs, 
and vicevcrsa. In a sells(',, classification brc, aks 
th, c symmetry: maximization of the silnilarity 
of nouns (verbs) may cause minimization of the 
similarity of i;heir accoml)anying verbs (nouns). 
A (:lass where a maximum of noun similarity 
correlates with a lni~ximmn of verb similarity 
cm~ be uninforniative, as exeml)litied above by 
the ease of t)oorly selective verbs. 
Secondly, we assmne (following Fodor 1998) 
that the number of t)erst)ectivizing factors gov- 
erning lexieal selection may have the order of 
magnitude of the lexicon itself. The use of 
global semantic dimensions may smooth out lex- 
ical t)refcrences. This is hardly what we need 
to semantically annotate lexical l)reti;rences. A 
more conservative al)proa(:h to the t)rol)lem, in- 
ducing local semantic ('lasses, (:an (:oml)ine al)- 
1)licability to real  1)recessing l)roblen~s 
with the fln'l;lmr t)omls of exploring a relatively 
mmharted territory. 
Thirdly, p(vi, nj) ai)t)ears to be too sensitive 
to changes in text genre, tol)ic lind domain to 
be eXl)ectcd to converge reliably. We prefer to 
ground a similarity metric oIx measuring the cor- 
relation among verb noun, type, s ratlw, r than |e- 
ke, as, tbr two basic reasons: i) verl) noun types 
arc (tis(:retc, ;m(t less l)rone t;o random varia- 
tion in it (parsed) (:orpus, ii) verl) noun tyl)eS 
(:ml reliably l)e a(:(tuir(xl from highly intbrm~ttive 
trot hardly redundant knowledge sources such as 
h~xi(:a an(1 encyclot)aedias. 
Finally, our information refit tbr measuring 
wor(t similarity is not a (:Oul)le of context 
sharing pairs (e.g. (set, sta.ndard,obj) and 
(sct, re.cord,obj)) but a qv, ad'r'uplc, of such con- 
text;s, tbrme(t 1)y (:ombining two verbs with two 
.(,.))s (,.,. 
a.d ), 
such that they enter an av, ah)gical proportion. 
2.1 The analogical proportion 
In the t)resenl; conl;ext, an anah)gieal prot)ortion 
(hereafl;er AP) is a quadrui)le of flmctionally 
mmotated t)airs resulting from tile combination 
of any two ltOllllS 'l~, i and ~3.j wit;h any two verbs 
v/,. and vt su(:h as (2) holds: 
(v~.,ni,f,,,) : (v~.,nj,L,,)= 
(v,,ni, fn) : (vt,nj,fiz), (2) 
where terms along the two diagonals can sw~p 
t)lace in the 1)rot)ortion , and identity of sub- 
s(:ript indi(:ates identity of wflues. Three aspects 
of (2) are worth eml)hasizing in this context. 
First, it does not require that the stone syn- 
tactic time|ion hold 1)etween all pairs, but only 
that time|ions be pairwisc identical. Moreover, 
(2) does not cover all possible syntactic contexts 
where hi, uj, "vk and vt may coral)|he, but only 
th, ose where verb and .function values co-vary. 
(.set, standard, obj) : (,set, record, obj) = 
(meet, standard, obj) : (,,,.tet:,record, x) (3) 
We call this constraint tile "same-verb-same 
flmction" principle. As we will see in section 2.3, 
the principle has important consequences on tile 
sort of similarity induced by (2). Finally, if one 
uses subscripts as tbrmal constraints on type 
identity, then any term can be derived from (2) 
if the values of all other terms are known. For 
example given tile partially instantiated propof 
tion in (3), the last term is filled in unambigu- 
ously by substituting x = fn = obj. 
AP is an important generalization of the 
inter-substitutability assumption, as it extends 
tile assumption to cases of flmctionally hetero- 
geneous verb-nonn pairs. Intuitively, an AP 
says that, for two nouns to be taken as sys- 
tematically similar, one has to be ready to 'ase 
th, cm interchangeably in at lea,st two different lo- 
cal contexts. This is where the inferential and 
the classificatory perspectives meet. 
2.2 Mathematical background 
We gave reasons for defining the similarity met- 
ric as a flmction of verb-noun type correlation 
rather than verb noun token correlation. In 
this section we sketch the mathematical frame- 
work underlying this assumption, to show that, 
tbr a set of verb nonn pairs with a unique syn- 
tactic function, AP is the smallest C that sat- 
isfies eq.(1). 
Eq.(1) says that vi and nj are conditionally 
independent given C, meaning that their corre- 
lation only det)ends on the probability of their 
belonging to C, as tbrmally restated in eq.(4). 
p(n, vlC) = p(nlC)p(vlC) (4) 
In passing from token to type frequency, we as- 
sume that a projection operator simply assigns 
a mfitbrm type probability to each event (pair) 
with a nonzero token probability in the train- 
ing corpus. From a learning perspective, this 
corresponds to the assumption that an infinite 
memory filters out events already seen during 
training. The type probability pT(n,v) is de- 
fined as in eq.(5), where Np is the number of 
different pairs attested in the training corpus. 
pT(n,v) = 1/Np if the pair is attested, 
pT(n,v) = 0 otherwise. (5) 
By eq.(4), pT(n, vlC ) ¢ 0 if and only if 
pT(nlC) ~ 0 and pT(vlC) ~ O. This amounts 
to saying that all verbs in C are freely inter- 
changeable in the context of all nouns in C, and 
viceversa. We will hereafter refer to C as a sub- 
stitutability island ( SI). AP can accordingly be 
looked at as the minimal SI. 
The strength of correlation of nouns and 
verbs in each SI can be measured as a sum- 
mation over the strength of all APs where they 
enter. Formally, one can define a correlation 
score or(v, n) as the probability of v and n be- 
ing attested in a pair. This can be derived from 
our definition of pr(v,n), as shown in eq.(6), 
by substituting pT(n,v) = pr(v)pT(nlv) and 
pT(nlv ) = 1/w(v), where w(a) is tile type fi'e- 
quency of a (i.e. number of different attested 
pairs containing a). 
1 - = = 
G 
(6) N,, G 
Eq.(6), after simplification, yields the tbllowing 
o< (7) 
By the same token, the correlation flmction 
c7(AP) relative to the 4 possit)le pairs in AP 
is calculated as 
cr(dP) = 1)7'('/)11~3,1 )PSF (~?,2 \[V\] )pT(V2 I'rl,2)PT(7~l \[?)2 ) 
(3( \[C0(~%1)C0(Vl )C0(~'1,2)C0('02)\] -1. (g) 
Eq.(8) captures the intuition that the corre- 
lation score between verbs and nouns in AP 
is an inverse function of their type frequency. 
Nouns and verbs with high type frequency oc- 
cur in many different pairs: the less selective 
they are, the smaller their semantic contribu- 
tion to cr(AP). 
Our preference tbr a(AP) over el(v, n) under- 
lies the definition of correlation score of SI given 
in eq.(9) (see also section 4). 
-_- Z (9) APESI 
2.3 Breaking the symmetry 
In section 2.2 we assumed, for the sake of sim- 
plicity, that verbs and nouns are possibly re- 
lated through one syntactic function only. In a 
10 
proportion like (2), however, l;he syntactic time- 
tion is allowed to wtry. Nonetheless each rclal;ed 
S\] contains nouns which always combine with 
a given verb with one and the .same syntactic 
./:unction. Clearly, the Sallle is no|; true of verbs. 
Suppose that an S1 contains two verbs vk and vt 
(say drive and pierce) and two nouns ni and nj 
(say nail and peg) that ~re respecl;ively el)jeer 
and subject of 'vl~ and yr. The type of similar- 
ity in the resulting n(mn and verb clusters is 
of a completely ditti;rent nature: in the case of 
n(mns, we acquire dist'rib,utionally parallel words 
(e.g. nail and peg); in the case of verbs, we 
get distrib'utionaIly correlated words (say drive 
and pierce) which are not interchangeable in the 
same conl;exl;. Mixing the two types of distribu- 
tional similarity in the same class makes little 
sense. Hereafter, we will aim at maximizing the 
similarity of disl;ributionally parallel nouns. In 
doing so, we will use functionally hel;erogencous 
contexts as in (2). This breaks classitication 
symlne|,ry, and there is no guarantee I;hal; se- 
mantically coher(mt verb clust;ers be rcl;m'ncd. 
3 The method 
The section illusl;ratcs an at)plication of the 
principles of section 2 1;() 1;t5o task of clustering 
the set of object;s ot! a vo, rl) on t;he basis of a 
repository of flmctionally mmol;al;ed cont(;xts. 
3.1 The knowledge base 
The training evidence is a Knowledge. Base 
(KB) of flm('tionally anno|;ated verb noun 
1)airs, ins|;mltiating a wide rallg~e of syntactic re- 
lations: 
a) vert)objecl;, e.g. (,~'a,,,.~,,,'c, ~,,'ol, lc,,z, obj) 
'cause-1)roblenl'; 
b) verb subject, e.g. (capitare, pwblcma subj) 
'occur-problem'; 
c) verb prepositional_complement, e.g. (recap- 
pare, probIcma, in,) 'run_into-problenl'. 
The KaY contains 43,000 pair types, auto- 
matically extracted from different knowledge 
sources: dictionaries, both bilingual and mono- 
lingual (Montemagni 1995), and a corpus of ti- 
nancial newspapers (Federici st al. 1998). The 
two sources rettect two ditt'erent modes of lexi- 
cal usage: dictionaries give typical examples of 
use of a word, and rmming corpora attest ac- 
tual usage of words in specific enfl)edding do- 
mains. These differences have all impact on 
the typology of senses which the two sources 
1)rovi(le evidence for. General dictionaries tes- 
titly all possible senses of a given word; tyl)ical 
word collocates acquired from dictionaries tend 
to cover the entire range of possible senses of 
a headword. On the other hand, unrestricted 
texts reIlect actual usage and possibly bear wit- 
hess to senses which are relevant to a specific 
domain only. 
3.2 The input words 
There is abundant psycholinguistic evidence 
that semantic similarity between words is em- 
inently conI;exl; sensitive (Miller and Charles 
1991). Moreover, in many  processing 
tasks, word similarity is typically judged rela- 
tive to an actual context, as in the cases of 
syntactic disambiguation (both structural and 
fulwtional), word sense disambiguation, and se- 
lection of the contextually approt)riate transla- 
1;ion equiwflent of a word given its neighl)ouring 
words. Finally, close examination of real data 
shows that (titl'erellt word senses select classes of 
complements according to different dilnensions 
of semantic similarity. This is so pervasive, that 
it soon be(:omes impossit)le to t)rovide an efl'ec- 
live account of these, dimensions independently 
of the sense in question. 
Ewduation of botll accuracy and usability of 
any autontatic classitication of words into se- 
mantic clusters cammt lint artificially elude th(; 
1)asic question "similar in what respc, ct;?". Our 
choice of input words retlects these concerns. 
\¥e automatically clustered the set of objects of 
a given verb, as they arc attested in a test col 
pus. This yields local lexico seman{;ic classes, 
i.e. conditional on the selected verb head, as 
opposed to global classes, i.e. built once and 
tbr all to accomlt tbr the collocates of any verb. 
Among the practical advantages of local clas- 
sitication we should at least mention the follow- 
ing two. Choice of a verb head as a perspec- 
tivizing factor considerably reduces the possi- 
bility that the same polysemous object collocate 
is used in different senses with the same verb. 
Fnrthermore, the resulting clusters can give in- 
tormal;ion about the senses, or meaning facets, 
of the verb head. 
a.a Identification and ranking of noun 
clusters 
For the sake of concreteness, let us consider the 
tbllowing object-collocates of the Italian verb 
11 
{APPESANTIMENTO~ CRESCITA, 
FLESSIONE, RIALZO} 
{CRESCITA, FLESSIONE} 
{CRESCITA, GUAI0} 
{CRESCITA, PROBLEMA} 
{CRESCITA, RITARD0} 
{FLESSIONE, PROBLEMA} 
{FLESSIONE, RIALZO} 
{GUAI0, PROBLEMA} 
{RIDIMENSIONAMENT0, RITARD0} 
Figure h Some Sis 
headword causarc. 
: CAUSARE/O, REGISTRARE/O 
: CAUSARE/O, EVIDENZIARE/S, 
MEDIARE/O, MOSTRARE/O~ 
PRESENTARE/S, PRESENTIRE/O 
REGISTRARE/O, REGISTRARE/$ 
: CAUSARE/0, PROVOCARE/0 
AVERE/S~ CAUSARE/0, EVIDENZIARE/0 
PORRE/S, PRESENTARE/S 
CAUSARE/O, USARE/S 
CAUSARE/0, PRESENTARE/S, STARE/S 
CAUSARE/0, REGISTRARE/0 
REGISTRARE/S, SUBIRE/0 
CAUSARE/0, CAVARE -- SI/S, INCAPPARE/S 
CAUSARE/O, GIUSTIFICARE/O 
relative to the collocates of the 
causare 'cause', as they are found in a test cor- 
pus: 
appcsantimento 'increase in weight', 
crescita 'growth', flessionc 'decrease', 
guaio 'trouble', p~vblcma 'prol)lem', rialzo 
'rise', ridimensionamcnto 'reduction', 
ritardo 'delay', turbolenza 'turbulence'. 
Clustering these input words requires prelim- 
inary identification of Substitutability Islands 
(Sis). An example of SI is the quadruple 
tbrmed by the verb pair causare 'cause' and in- 
eapparc 'rnn into' and the noun pair guaio 'trou- 
ble' and problema 'problem', where menfl)ers of 
the same pair are inter-substitutable in context, 
give:: the constraints entbrced by the AP type 
in (2). Note that guaio and problema are ob- 
jects of eausare, and prepositional complements 
(headed by in 'in') of incappare. This makes it 
possible to maximize the sinfilarity of trouble 
and problem across fimctionally heterogeneous 
contexts. 
Bigger Sis than the one just shown will form 
as many APs as there are quadruples of col:- 
textually interchangeable nouns and verbs. We 
consider a lexico-semantic cluster of nouns the 
projection of an SI onto the set of nouns. Fig.1 
illustrates a sample of noun clusters (between 
curly brackets) projected from a set of Sis, to- 
gether with a list of the verbs tbund in the same 
Sis (the suffix 'S' stands tbr subject, and 'O' 
for object). Due to the asymmetry of classifica- 
tion, verbs in Sis are not taken to tbrm part of 
a lexico-semantic cluster in the same sense as 
nonns are. 
7.04509e-O5{GUAIO,PKOBLEMA} 
7.01459e-O5{RIDIMENSIONAHENTO,RITARDO} 
4.65858e-O5{CRESCITA,FLESSIONE} 
I.T5699e-O5{FLESSIONE,RIALZO} 
9.49509e-O6{APPESANTIMENTO,CRESClTA,FLESSIONE,RIALZO} 
1.88964e-O6{CRESCITA,GUAIO} 
1.19814e-O6{CRESCITA,RITARDO} 
8.84254e-OT{CRESCITA,PROBLEMA} 
6,TI41e-OT{FLESSIONE,PROBLEMA} 
Figure 2: Nine top-most scored noun clusters 
Not all projected noun clusters exhibit the 
same degree of semantic coherence. Intuitively, 
the cluster {appesantimento crescita flessione 
riaIzo} 'increase in weight, growth, decrease, 
rise' is semantically more appeMing tlmn the 
cluster {crescita problema} 'growth problem' 
(Fig.l). 
A quantitative measure of the sen:antic cohe- 
sion of a noun cluster CN is give:: by the con'e- 
lation score ~(SI) of the SI of which UN is a 
projection. In Fig.2 noun clusters are ranked by 
decreasing vahms of cr(SI), calculated according 
to eq.(9). 
3.4 Centroid identification 
Noun clusters of Figs.1 and 2 are admittedly 
considerably fine grained. A coarser grain can 
be attained trivially through set union of inte:'- 
secting clusters. In fact, what we want to obtain 
is a set of mazimally orthogonal and semanti- 
cally coherent noun classes, under the assump- 
tion that these (:lasses highly correlate with the 
principal meaning components of the verb head 
of which input nouns are objects. 
In the algorithm ewfluated here this is 
achieved in two steps: i) first, we select the 
best possible centroids of the prospective classes 
among the noun clusters of Fig.2; secondly, 
ii) we lmnp outstanding clusters (i.e. clusters 
which have not been selected in step i)) around 
the identified centroids. In what tbllows, we will 
only focus on step i). Results and evaluation of 
step ii) are reported in (Allegrini et al. 2000). 
In step i) we assume that centroids are 
disjunctively defined, maximally coherent 
classes; hence, there exists no pair of intersect- 
ing centroids. The best possible selection of 
centroids will include non intersecting clusters 
with the highest possible cumulative score. In 
practice, the best centroid corresponds to the 
12 
cluster with the topmost cr(SI). The second 
best centroid is the cluster with the second 
highest o-(SI) and no intersection with the first; 
centroid, and so on (the i-th centroid is 1;11(', 
i.-th highest chlster with no intersection with 
the tirst i - 1. centroids) until all clusters in the 
rank are used Ill). Clusters selected as centroids 
in the causare example above ~tre: {GUAIO PROBLEMA}, {RIDIMENSIONAMENTO RITARDO}, 
{CaP.SCITA FL~.SS-rONE}. 
Clearly, this is not the only t)ossible strategy 
t'or centroid selection, lint certainly a suitabh; 
one giv(;n our assulnl)tions mM goals. To stun 
Ul) , the targeted classification is local, i.('., con- 
ditional on a specific verl~ head, and orthogonal , 
i.c. it aims at identifying maximally disjulmtive 
classes with high correlation with the principal 
meaning ('Oral)orients of the vert) head. This 
strategy h;ads to identificalfion of the different 
senses, or possibly me,ruing facets, of a vert). 
In tlteir turn, noun clusters may capture sul)- 
tle semm~tic distinctions. For instance,, a dis- 
tinction is made between incremental eveni;s or 
results of incrt'anental e, vents, which 1)resut)l)OSe 
~ scalar dimension (as in the ease of {cre,,s'cita 
,/lcssione} 'growth, decrease') and re, scheduling 
eve, nts, where a change occurs with respect to 
a previously planned event or object (see the 
centroid {ridimensiov, amento ritardo} :reduc- 
tion debw' ). 
4 Experiment and evaluation 
We, were able to extract all S l's relative to the 
entire K\]3. However, we report here an intrinsic 
evaluation of the accuracy of acquired ce.ntroids 
which involves ()lily a small subset of our results, 
since provision of a refhrence class tyl)ology is 
extremely labour intensive. 1 
We consider 20 Italian verbs and their object 
collocates.gThe object collocates were automat- 
ically extracted fi'om the "Italian SPAIIKLE 
Reference Corpus", a corpus of Italian financial 
1For an extrinsic evahlation of the proposed similar- 
ity measure the reader is referred to (Montemagni et aI. 
1996; Briscoe ct al. 1999; Federici ct al. 1999a). 
2The |x,'st verbs are: a.qgiungcrc 'add', aiutare 'hell)' , 
a,sl)cttarc 'expect', cambiarc 'change', C(t*taO,?'t: t(:}tllSe:~ 
chicdcrc 'ask', considc.rarc 'consider', dare. 'give', dc- 
ciderc 'decide', fornive 'provide', muoverc. 'move', pcrm~'.- 
ttere 'allow', portarc 'bring', p~wlurrc 'produce', sccglicrc 
'choose', sentirc 'feel', stabilire 'establislF, tagliarc 'cut', 
terminate 'end', trovarc 'find'. 
newspapers of about one million word tokens 
(Federici ct al. 1998). 
l?or each test verb, an indetmndent classifica- 
tion of its collocates was created lnanually, by 
partitioning the collocates into disjoint sets of 
semantically coherent lexical preferences, each 
set pointing to distinct senses of the test; verb, 
according to a reference monolingual dictionary 
(Garzanti 1984). This considerably reduces the 
anlount of subjectivity inevitably involved in 
the creation of a reference partition, and mini- 
mizes the probability that more than one sense 
of a t)olysemous noun can appear in the same 
class of collocates. 
The inferred centroids, selected from clusters 
ranked by c~(SI) defined as in (9), are t)rojected 
~gainst the reference classification. Precision is 
delined as the ratio between 1;t1(; mmflmr of con- 
troids t)roperly inchlded in one reference class 
and l;he nmnber of inferred centroids. Recall is 
defined as the ratio between the number of relhr- 
time classes which properly inchlde at least one 
centroid an(t the nmnber of all reference classes. 
Fig.3 shows results for the sets of object collo- 
cates of 1)olysemous {;est verbs only, as lttOltOSe- 
mous verbs trivially yMd 100% precision recall. 
An average, wflue over the sets of object col- 
locates of all verbs is also shown, with 86% 
88% of precision recall. Another average value 
is also l)lotted (as a black ul)right triangle), ol)- 
l:ained \])y ranking n(mn clusters l)y ~(S\]) calcu- 
lated as ill (10). This average wflue (53% 53% 
precision recall) provides a sort of baseline of 
the difliculty of the task, and sheds consider- 
able light on the use of APs, rather than simple 
verb noun pairs, as inforlnation units ibr mea- 
suring internal cohesion of centroids. 
 (si) =_ (10) 
(n,v)G91 
5 Conclusion 
We described a linguistic knowledge acquisition 
model and tested it; on a word classification task. 
The main points of our prot)osal are: 
• classification is asymmetric, grounded on 
principles of machine learning with intinite 
memory; 
• the algorithm is explorative and non- 
reductionist; no a priori model of class dis- 
13 
' i ............ T L ......... i:i:  
! AIUTARE 
I ICAMBIARE 
0.9 CHIEDERE 
CONSIDERARE • 
- bARE 
' ,?DECIDERE 
0.8 PORTARE 
PRODURRE 
• SCEGLIERE ~ 
SENTIRE 
STABILIRE 
~TAGLIARE \[:1~ • 
~:TFIOVA RE 
0.6 ~'AVERAGE(AP) 
~AVERAGE(pai 0 
i • i 
o.5 ::.i • '1 
I 
i 
0.3 0.4 0.5 0.6 0,7 0.8 0.9 1 
PRECISION 
d < ~ 
0.7 
Figure 3: Centroid precision and recall for object 
collocates of polysemous verbs. 
tril)ution is assmned; 
• classification is modelled z~s the task of 
forming a web of context dependent se- 
mantic associations among words; 
• the approach uses a context--sensitive no- 
tion of semantic similarity; 
• the approach rests on the notion of analog- 
ical proportion, which proves to t)e a reli- 
able intbrmation refit for measuring seman- 
tic similarity; 
• analogical t)roportions are harder to track 
down than simple pairs, and interconnected 
in a highly complex way; yet, reliance on 
data types, as opposed to token frequen- 
cies, makes the proposed method comtm- 
rationally tractable and resistant to data 
sparseness. 

References 

Allegrini P., Montemagni S., Pirrelli V. (2000) Con- 
trolled Bootstrapt)ing of Lexico-semantic Classes 
as a Bridge between Paradigmatic and Syntag- 
matic Knowledge: Methodology and Evaluation. 
In Precccdings of LREC-2000, Athens, Greece 
May-June 2000, pp. 601-608. 

Briseoe T., McCarthy D., Carroll J., Allegrini P., 
Calzolari N., Federici S., Montemagni S., Pir- 
relli V., Abney S., Beil F., Carroll G., Light M., 
Prescher D., Riezler S., Rooth M. (1999) Acqui- 
sition System for Syntactic and Semantic Type 
and Selection. Deliverable 7.2. WP 7, EC project 
SPARKLE "Shallow Parsing and Knowledge Ex- 
traction for Language Engineering" (LE-2111). 

Federici, S., Montemagni, S., Pirrelli, V. (1999) 
SENSE: an Analogy based Word Sense Disam- 
biguation System. In M. Light and M. Pahner 
(eds.), Special Issue of Natural Language Engi- 
neerin 9 on Lexical Semantic Tagging. 

Federici, S., Montemagni, S., Pirrelli, V., Calzolari, 
N. (1998) Analogy-based Extraction of Lexical 
Knowledge from Corpora: the SPARKLE Expe- 
rience. In Proceedings of LREC-1998, Granada, 
SP, May 1998. 

Fodor, J.A. (1998) Concepts. Where Co.qnitivc Sci- 
ence Went Wzvng. Clarendon Press, Oxtbrd, 
1998. 

Garzanti (1984) ll Nuovo Dizionario Italiano 
Garzanti. Garzanti, Milano, 1984. 

Grefenstette, G. (1994) Explorations in Automatic 
Thesaurus Discovery. Kluwer Acadenfic Publish- 
ers, Boston, 1994. 

Lin D. (1998) Automatic Retrieval and Clustering 
of Similar Words. In Proceedings of COLING-ACL'98, Montreal, Canada, August 1998. 

Miller, G.A., Charles, W.G. (1991) Contextual Cor- 
relates of Semantic SimilariW. In Language and 
Cognitive Processes, 6 (1), pp. 1-28. 

Montemagni, S. (1.995) Subject and Object in Italian 
Sentence Processing. PhD Dissertation, UMIST, 
Manchester, UK, 1995. 

MontemagIfi, S., Federici, S., Pirrelli, V. (1996) 
Resolving syntactic ambiguities with lexico 
semantic patterns: an analogy-based aplproach. 
In Proceedings of COLING-96, Copenhagen, August 1996, pp. 376 381. 

Pereira, F., Tishby, N. (1992) Distributional Sim- 
ilarity, Phase Transitions and Hierarchical Clus- 
tering. In Working Notes, Fall Symposium Series. 
AAAI, pp. 54 64. 

Pereira, F., Tishby, N., Lee, L. (1993) Distrilmtional 
Clustering Of English Words. In Proceedings of 
the 30th Annual Meeting of the Association for 
Computational Linguistics, pp. 183 190. 

Rooth, M. (Ms) Two-dimensional clusters in gram- 
matical relations. In Symposium on 12epresenta- 
tion and Acquisition of Lcxical Knowledge: Pol- 
ysemy, Ambiguity and Gcncrativity, AAAI 1995 
Spring Symposium Series, Stanford University. 

Rooth, M., Riezler, S., Prescher, D., Carroll, G., 
Bell, F. (1999) Inducing a Semantically Annotated 
Lexicon via EM-Based Clustering. In Procccdings 
of the 37th Annual Meeting of the Association for 
Computational Linguistics, Maryland, USA, June 
1999, IP. 104-111. 

Schfitze, H., Pedersen, J. (1993) A vector model 
for syntagmatic and paradigmatic relatedness. In 
Proceedings of the 9th Annual Confcrcncc of the 
UW Centrc for" the New OED and Text 12cscarch, 
Oxford, England, 1993, pp. 104-113. 
