Lexical Discrimination with the Italian Version of WORDNET 
Alessandro Artale, Bernardo Magnini and Carlo Strapparava 
IRST, 1-38050 Povo TN, Italy 
e-mail: {artalelmagninilstrappa}@irst. itc. it 
Abstract 
We present a prototype of the Italian version of 
WORDNET, a general computational lexical re- 
source. Some relevant extensions are discussed 
to make it usable for parsing: in particular we 
add verbal selectional restrictions to make lexi- 
cal discrimination effective. Italian WORDNET 
has been coupled with a parser and a number of 
experiments have been performed to individu- 
ate the methodology with the best trade-off be- 
tween disambiguation rate and precision. Re- 
sults confirm intuitive hypothesis on the role of 
selectional restrictions and show evidences for 
a WORDNET-Iike organization of lexical senses. 
1 Introduction 
WORDNET is a thesaurus for the English language based 
on psycholinguistics principles and developed at the 
Princeton University by George Miller \[Miller, 1990\]. 
It has been conceived as a computational resource, so 
improving some of the drawbacks of traditional dictio- 
naries, such as the circularity of the definitions and the 
ambiguity of sense references. Lemmas (about 130,000 
for version 1.5) are organized in synonyms classes (about 
100,000 synsets). 
The more evident problem with WORDNET is that 
it is a lexical knowledge base for English, and so it 
is not usable for other languages. Here we present 
the efforts made in the development of the Italian ver- 
sion of WoRDNET \[Magnini and Strapparava, 1994; 
Magnini et al., 1994\], a project started at IltSW about 
one year ago in the context of ILEX \[Delmonte et al., 
1996\] a more general project aiming at the realization of 
a computational dictionary for Italian 1. 
A second problem with WORDNET is that it needs 
some important extensions to make it usable for effective 
parsing. In particular, parsing requires a powerful mech- 
anism for lexical discrimination, in order to select the 
appropriate lexical readings for each word in the input 
sentence. In this paper we also explore the integration 
of "selectwnal res~mctwng', a traditional technique used 
1The ILEX consortium includes the Computer Science 
Department of the University of Torino, the University of 
Venezia, and the brench of the University of Torino at 
Vercelli. 
for lexical discrimination, with Italian WOttDNET. Se- 
lectional restrictions provide explicit semantic informa- 
tion that the verb supplies about its arguments \[Jackend- 
off, 1990\], and should be fully integrated into the verb's 
argument structure. 
Although selectional restrictions are different in dif- 
ferent domains \[Basili et al., 1996\] we are interested in 
finding common invariants across sublanguages. It is our 
intention to build a very general instrument that can be 
afterwards tuned to particular domains by identifying 
more specific uses. The main motivation is to have both 
a robust and a computational efficient natural language 
system. On one hand, robustness is emphasized because 
sentences that are syntactically correct, but which are 
not successfully analyzed in the specific application do- 
main, can have a valid linguistic meaning. On the other 
hand, we are able to filter the sentence meanings on a 
linguistic basis. This phase discards the unplausible lec- 
tures pruning the search space by looking for compatibil- 
ity semantic relations. This kind of discrimination can 
be realized with computationally effective algorithms by 
exploiting the lexical taxonomy of WORDNET, postpon- 
ing more complex and expensive computations to the 
domanin specific analysis. 
The paper is structured as follow. Section 2 de- 
scribes the Italian prototype of WORDNET; while sec- 
tion 3 shows how selectional restrictions has been added 
to verb senses. Section 4 shows how Italian WOttDNET 
has been coupled with the parser, both for describing 
lexical senses and as a repository for selectional restric- 
tions. Section 5 reports a number of experiments that 
has been performed to individuate the methodology de- 
sign with the best trade-off between disambiguation rate 
and precision. Finally section 6 provides some conclusive 
remarks. 
2 The Italian WORDNET Prototype 
The Italian version of WORDNET is based on the as- 
sumption that a large part of the conceptual relations 
defined for English (about 72,000 ISA relations and 
5,600 PART-OF relations) can be shared with Italian. 
WORDNET can be described as a lexical matrix with 
two dimensions: the lexical relations, which hold among 
words and so are language specific, and the conceptual 
relations, which hold among senses and that, at least 
in part, we consider independent from a particular lan- 
32 
Wo~s P 
Meanings 
HASPAR'r \ 
Lonymy 
;ynsS) 
Pohsemy 
Figure h Multilingual lexical matrix 
guage. The Italian version of WORDNET aims at the 
realization of a multilingual lexical matrix through the 
addition of a third dimension relative to the language. 
Figure 1 shows the three dimensions of the matrix: (a) 
words in a language, indicated by YY~ ; (b) meanings, in- 
dicated by .A4~; (c) languages, indicated by £k. From 
an abstract point of view, to develop the multilingual 
matrix it is necessary to re-map the Italian lexical forms 
with corresponding meanings (.A4,), building the set of 
synsets for Italian (making explicit the values for the 
I intersections £~). The result will be a complete redefi- 
nition of the lexical relations, while for the semantic re- 
lations, those originally defined for English will be used 
as much as possible. 
An implementation of the Multilingual lexical matrix 
has been realized which allows a complete integration 
with the English version and the availability of all the 
translations for the Italian lemmas. The architecture is 
easily extendable to other languages. The integration 
with the computational lexicon ILEX is under develop- 
ment: it will make the access to other levels of lexical in- 
formation, such as morphological classes, syntactic cate- 
gories and sub-categorization frames available. The Ital- 
ian version of WORDNET, in December 1996, included 
about 10,000 lemmas (7,000 nouns, 700 verbs, 1,500 
adjectives, 600 adverbs). 
Till now, data acquisition has been mostly manual, 
with the help of a graphical interface; however a basic 
goal of the project is the experimentation of techniques 
for the (semi)automatic acquisition of data. Algorithms 
for the resolution of the ambiguities in the coupling with 
the English WORDNET have been developed. Versions 
automatically created are then tested against manually 
acquired data, with the aim of incrementally improve the 
precision level. A final manual check is performed for all 
the data automatically acquired. It is also foreseen the 
use of corpora to extract contextual information to be 
used during the disambiguation process. 
3 Adding Selectional Restrictions to 
Verbs 
A number of steps have been followed to add selectional 
restrictions to Italian WORDNET. First, Italian verb 
senses were extracted from a paper version of an Italian 
dictionary and checked against a corpus of genereric Ital- 
ian texts. Each verb sense has been then coupled with 
one or more English WORDNET synsets 2. This phase 
has been performed manually with the help of a graph- 
ical interface (see figure 2) that includes four integrated 
working tools: (i) a bilingual dictionary with more than 
30,000 lemmas; (ii) a graph that allows the visualiza- 
tion of the coupling with the English WORDNET; (iii) 
the bilingual WoRDNET, that behaves exactly like the 
English version with the additional possibility to browse 
the Italian semantic network; (iv) finally, the working 
cards allow the insertion, modification and check of the 
data for a synset. The result of this phase is the ex- 
tension of the English WordNet with the Italian synsets. 
Figure 3 shows the correspondence between English and 
Italian synsets for the verb Scrivere (Write). 
The next step is the definition of the sense subgate- 
gorization frame. This includes both syntactic informa- 
tion (i.e., argumental positions, prepositions on indirect 
objects, category type) and semantic information, such 
as thematic roles and selectional restrictions. Syntactic 
information are associated to single verbs, while seman- 
tic information are associated to the whole synset, i.e., 
semantic participants are shared among all the verbs be- 
longing to the synset. 
We built selectional restrictions using the synsets of 
the noun hyerarchy. Two different possibilities for defin- 
ing selectional restrictions are considered: 
1. Selectional restrictions obtained from the frames 
currently provided by WoRDNET. 
~As for figurative uses, they can also be coupled with 
WoRDNET provided that an appropriate synset do exist. 
33 
!  en$$ Z in¥1are msrld~e $cnveM spedlre 
~eeilse 3 8c~vsre pubbl~csr8 
Se.se 4 
Se"le 5 Sc~Wm ~dlge~ comporm 
Figure 2: The Italian WORDNET interface. 
2. Selectional restrictions obtained from the whole 
WORDNET noun hierarchy. 
As far as the first hypothesis is concerned, WoRDNET 
describes all the English verbs resorting to a set of 35 
different syntactic frames, which in turn include only 
two restrictions, that is Something and Somebody. For 
example, the frames provided for the verb Write in the 
synset {Publish, Write} are: 
Somebody... s 
Somebody... s Something 
The problem arising in using these two restrictions 
is that they are completely uncorrelated to the noun 
synsets, then, they have to be matched with the proper 
synsets in the noun hierarchy. The concept Somebody in- 
cludes not only the synset Person but also all the synsets 
denoting group of people that could hold the agent the- 
matic role. We defined Somebody using the following 
boolean combination of synsets: 
Somebody 
Person V People V People-Multitude V 
(Social-Group A ~(Society V Subculture V 
Political-System V Moiety V Clan)) 
Something is defined as ~he complement of Somebody. 
In the second hypothesis selectional restrictions are 
taken from the whole noun hierarchy. As an example, fig- 
ure 4 illustrates the senses for the Italian verb Scrivere 
(Write) found in Italian WORDNET. For each sense we 
report a conventional name - which unambiguously iden- 
tifies the synset - and the argumental positions admit- 
ted for that sense, with the indication of the selectional 
restrictions. The appropriate combination of synsets for 
an argumental position has to be both enough general to 
preserve all the human readings, and enough restricted 
for discriminating among different senses of both verb 
and noun. 
Founding the appropriate selectional restrictions re- 
vealed itself difficult and time consuming. The process 
required a deep search into the WOI~DNET noun hier- 
archy. In order to achieve a good trade-off between dis- 
crimination power and precision level we adopted an em- 
pirical process with successive steps of refinement. We 
started with general selectional restrictions and then we 
validate them against experimental results. This itera- 
tire process ended with complex selectional restritions 
for verbs, as the figure 4 shows. 
The WORDNET verb taxonomy is based on the ~ro- 
ponymy relation, which is defined as the co-occurrence 
of both lexical implication and temporal co-extension be- 
tween two verbs. We would note that, every time a tro- 
ponymy relation between two verbs holds, an ISA rela- 
34 
Synset Label Italian Synset English Synset 
Write 
Write-Music 
Writs-Communicate 
Write-Publish 
Write-Send 
{scrivere redigere comporre} 
{scrivere comporre} 
{scrivere comunlcare_per..iscritto} 
{scrivere pubblicare } 
{inviare mandare scrivere spedire} 
{write compose pen indite} 
{compose write write_music} 
{write communicate_by.writing express.by_writing} 
{publish write} 
{mail write post send} 
Figure 3: Correspondences between Italian and English synsets for the verb 'scrsrere' (write). 
WoRDNET Synset \[ Subject Object Indlrect-Object 
Write Somebody ---- Written-MateriaiV 
Symbolic-Repres V SayingV 
Correspondence V Sentencer 
Message V Message-ContentV 
Code V Symbol V Dater 
Lan~..mge--Unit V PropertyV 
Address-SpeechV Print-Media 
Write-Music Person Music -- 
Write-Communicate Somebody Somebody (Written-4{aterialA ~Section)V 
Symbolic-Repres V SayingV 
Sentence V NameV 
Message V Message-ContentV 
Code V Date V Property 
Write-Publish Somebody Written-MateriaiV Print..-MediaV Publishing-House 
(Print-MediaA ~Section) 
Write-Send Somebody CorrespondenceV MessageV Somebody 
Letter-Missive 
Figure 4: Lexical entries for Scriveze (Writs). 
tion between the correspondent selectional restrictions 
holds, too. 
4 Coupling WORDNET and a TFS Parser 
In this section we describe the architecture we used for 
checking WORDNET usability in parsing. Italian WORD- 
NET has been used in two different phases of the linguis- 
tic analysis. On a first phase, we use Italian WORD- 
NET as a lexicon repository to carry on lexical analysis. 
During the semantic analysis Italian WORDNET is used 
as a kind of Knowledge Base (KB) exploiting the struc- 
tural relationships among synsets. In particular, we used 
the supertype/subtype-like hierarchy of synsets during 
the parsing process in order to discard unplausible con- 
stituents on a semantic base. 
The parser used is a CYK chart parser embedded in 
the GEPPETTO environment \[Ciravegna ¢t al., 1996\], and 
coupled with a proper unification algorithm. GEPPETTO 
is based on a Typed Feature Logic \[-Carpenter, 1992\] for 
the specification of linguistic data. The GEPPETTO envi- 
ronment allows to edit and debug grammars and lexica, 
linking linguistic data to a parser and/or a generator, in- 
tegrating various form of KBs, and using specialized pro- 
cessors (e.g., morphological analyzers). In particular, we 
integrated the hierarchical structure of WORDNET as an 
external KB, while an ISA function uses the WORDNET 
hierarchy in order to check subsumption relationships 
between WoRDNET synsets. 
The grammar is written adopting a HPSG-like style, 
and each rule is regarded as Typed Feature Structure 
(TFS). For the current experiment the grammar cover- 
age is limited to very simple verbal sentences formed by 
a subject, a main verb together with its internal argu- 
ments and, possibly, an adjunct phrase. Observe that, 
the syntactic analysis does not take into account the pp- 
attachment case. We excluded the possibility to capture 
these complex nominal phrases. Indeed, the object of the 
experiment is to disambiguate among WORDNET senses 
of both verbs and nouns on the basis of the lexical se- 
mantic restrictions for the arguments of the verb and the 
lexical semantic associated to the noun. 
A condition for using WORDNET coupled with the 
GEPPETTO environment is to bring it in a format ef- 
fectively usable. The exploited idea was to rebuild the 
WORDNET hierarchy in CLOS, the object-oriented part 
of COMMON LISP. The advantages of this approach is 
35 
Word 
Regina (Queen) 
Art~colo (Article) 
Lettera (Letter) 
Libro (Book) 
WoBDNET SynsetLabe~ 
queen-Insect,Queen-Regnant,Queen-Wi~e, 
Queen-Card, Queen-Chess 
Article-Artefact,Article-Clause, 
Article-Grawmar, Article-Document 
Letter-Missive, Letter-Alphabet 
Book-Publication, Book-Section,Book-Object 
Figure 5: Lexical entries for nouns. 
the possibdity to implement a fast and flexible access 
to the synsets hierarchy and, in particular, an efficient 
ISA functionality as required for the semantic checking 
during the parsing. The arguments to ISA function may 
be a complex boolean combination of synsets (e.g., see 
selectional restrictions in figure 4). 
The parser controls the overall processing. Whenever 
it tries to build a (partially recognized) constituent it 
incrementally verifies the admissibility of the semantic 
part of such a constituent, using the WORDNET hierar- 
chy. In particular, whenever a noun is associated with 
a verbal argument the ISA function is triggered to check 
whether the synset of the noun is subsumed by the selec- 
tional restriction of the corresponding verbal argument. 
Due to the large number of analyses, it is useful to dis- 
card unplausible constituents as soon as possible to cut 
the search space. This has been obtained interliving the 
syntactic and semantic processes: as soon as the seman- 
tic test fails the constituent is rejected. 
5 Experiments and Results 
In this section we describe the empirical results obtained 
by coupling a WoRDNET based lexicon with a parser In 
our intention, the experiment would bring evidence for 
the following aspects: 
. Plausibility of WoRDNBT senses for describing a 
lexical entries; 
e Usability of WoRDNET for carrying out lexical dis- 
crimination. 
The experiment has been carried out on 60 sentences 
with 1201 different lectures, and formed by using seven 
verbs (wr~te, eat, smell, corrode, buy, receive, assocza~e) 
coupled with fifty common nouns and two proper nouns. 
In the general experimental setting a sentence is given to 
the parser in a situation characterized by multiple lexi- 
cal entries for each single word (one for each WoRDNET 
sense). The analyses produced by the parser are com- 
pared with the set of interpretations given by a human. 
As far as nouns are concerned, a lexlcal entry includes 
all the senses found in Italian WORDNET. Some of the 
nouns used in the experiment are shown in figure 5. As 
for verbs, we started from the Italian WoRDNET senses 
and then we faced to the problem of mdividuatmg the 
proper selectional restrictions for each argumental posi- 
tion of the verb subcategorization frame as seen before. 
So we build a small number of lexical entries, by means 
of which we composed the sentences of the experiment. 
We experiment the two hypotheses on selectional restric- 
tions presented in section 3, i.e., the one with general 
WoRDNET frames and the other with more refined se- 
lectional restrictions. 
As an example, figure 6 shows the output of the parser 
for the sentence "La regzna scrtsse ~ma leltera a Gzo- 
vanng' ("The queen wrote a letter to John"). As a con- 
venction, we decided to describe internal arguments with 
the symbol '/', while a '//' denotes a verbal adjunct. 
This sentence was selected because it produces an high 
number of lectures (40) among the test suite sentences. 
This is due to both the verb sense ambiguity (write has 
five senses) and to the noun ambiguitms (queen has five 
senses, and letter two). Note that the parser excludes 
the sense Write-Publish since the indirect object must 
be introduced by the Italian prepositions "su" or "per ~' 
(in English "on" or "fort'), while in this example we have 
the preposition "a" ("to"). 
Let us first consider the results obtained in the second 
experimental setting, which best approximate the human 
judgment. Out of the eight interpretations accepted, 
two are implausible for a human reader. This caused by 
the contemporary presence of the sense Letter-Missive 
and of the proper noun John as, respectively, patient and 
beneficiary of the write verb sense. Note that, each of 
these senses are, per se, valid arguments since they sat- 
isfy the selectional restrictions. 
In the first experimental setting, the presence of 
weaker selectional restrictions (just ~omebody, Some- 
thzng) ymlds more spurious readings. As a matter of 
fact, the more evzdent problem is that many times ar- 
gumental positions are not properly filled. For ex- 
ample, that "a queen-Regnant could ~/rlte-Music an 
Letter-Missive" (i.e., a kind of correspondence) is one 
of the allowed readings. 
Figure 7 reports the quantitative results of the ex- 
periment. They are preliminary since they have been 
obtained on a limited number of sentences (60). The 
figure shows, for each experimental setting, the num- 
ber of total readings produced by the parser, the dis- 
crlmination rate (i.e., the rate of the lectures rejected: 
(1201- z)/1201), and the precision (i.e., the rate of cor- 
rect lectures: 122/z). These results have to be inter- 
preted considering that the focus of the experiment is 
on selectional restrictions, which of course is just one 
among the various kinds of information occurring dur- 
ing lexical discrimination. It is worth mentioning here, 
among the others, some crucial information sources: (i) 
36 
Sentence 
Restrictions 
Number of lectures 
Restrictions 
Number of lectures 
Restrictions 
Number of lectures 
Lectures 
Human Judgment 
Number of lectures 
Lectures 
La regina scrisse una lettera a Giovanni 
(THE QUEEN WROTE A LETTER TO JOHN) 
No semantic discrimination 
40 
Discrimination with WoRDNET Frames (I exp. setting) 
16 
Discrimination with WORDNET Full Hierarchy (II exp. setting) 
8 
(1, 2, 3, 4, 5, 6) 
Queen-Wife/Write/Letter-Alphabet//John 
Queen-Regnant/Write/Letter-Alphabet//John 
(~) 
(8) 
6 
Queen, Regnant/Wfite-Comunicate/Letter-Missive/John (1) 
Queen-Wife/Write-Comunicate/Letter- Missive/John (2) 
Queen-Regnant/Writ e-Send/Letter-Missive/John (3) 
Queen-Wife/Wnte-Send/Letter-Missive/John (4) 
Queen-Regnant /Write /Let ter- Missive/ / John (5) 
Queen-Wife/Write/Letter- Missive//John (6) 
Figure 6: An example of sentence 
Experimental Setting ~ of lectures \[ Discrimination Rate Precision 
Without discrimination 1201 0% 10% 
Discrimination with WoRDNET Frames 688 43% 18% 
Discrimination with the WoRDNET Full Hierarchy 164 86% 74% 
Human Judgment 122 90% 100% 
Figure 7: Quantitative results obtained on 60 sentences 
world knowledge: e.g., it is very strange to Write an 
Article-Clause on a Newspaper-Periodic; (ii) aspec- 
tual properties of the verb: e.g., it is very difficult to 
interpret La regina sta scmvendo un artscolo sul gzonale 
(The queen ss wmtmg an article on the newspaper) with 
the Write-Publish sense, because publishing is a cul- 
minative process. 
6 Conclusions 
they are more detailed. 
Some general suggestions can be drawn in order to 
individuate a trade-of between the effort necessary for 
describing selectional restrictions and the lexical disam- 
biguation obtained. Although the definition of detailed 
selectional restrictions was highly time comsuming, our 
experience shown that this approach obtains good results 
both in the discrimination rate and in the precision. 
In this paper we presented the approach underlying the 
Italian WoRDNET, a general computational lexical re- 
source. A prototype has been realized which implements 
a multilingual lexical matrix. Data acquisition has been 
mostly manual with the help of a graphical interface. 
In light of the concrete use of the Italian WORDNET we 
propose the integration of selectional restrictions into the 
verbal taxonomy. An empirical verification has been per- 
formed which confirms the intuitive hypothesis that se- 
lectional restrictions crucially affect lexical disambigua- 
tion and that the discrimination rate improves as far as 
The experiment also brings evidence for a WORDNET 
like sense organization. In fact, different selectional re- 
strictions apply to different senses allowing to discrimi- 
nate among different readings. However, an important 
drawback in WoRDNET is the lack of relations among 
related senses of the same word. This is particularly 
crucial for the logzcal pohsemy cases \[Pustejovsky, 1995\], 
when a sense can be generated from another in a pre- 
dictable way, and, in general, to treat the so called "verb 
mutability effect" as discussed in \[Gentner and France, 
1988\]. 
37 

References 
\[Basili et al., 1996\] It. Basili, M.T. Pazienza, and P. Ve- 
lardi. Integrating general-purpose and corpus-based 
verb classification. Computational Linguislies, 22(4), 
1996. 
\[Carpenter, 1992\] B. Carpenter. The logic of typed fea- 
ture Structures. Cambridge University Press, Cam- 
bridge, Massachusetts, 1992. 
\[Ciravegna e~ al., 1996\] F. Ciravegna, A. Lavelli, D. Pe- 
trelli, and F. Pianesi. The GEPPETTO environment, 
Version 2.0.b. User Manual. Technical report, IRST, 
1996. 
\[Delmonte et al., 1996\] R. Delmonte, G. Ferrari, A. Goy, 
L. Lesmo, B. Magnini, E. Pianta, O. Stock, and 
C. Strapparava. ILEX - un dizionario computazionale 
dell'italiano. In Proc. of 5 ~h Convegno Nazionale 
della Associazione Italiana per l'Intelligenza Artifi- 
eiale, Napoli, September 1996. 
\[Gentner and France, 1988\] D. Gentner and I.M. Fran- 
ce. The verb mutability effect: Studies of the combi- 
natorial semantics of nouns and verbs. In S. Small, 
G.W. Cottrell, and M.K. Tanenhaus, editors, Lezicai 
Ambiguity Resolution. Morgan Kaufman, San Mateo, 
California, 1988. 
\[Jackendoff, 1990\] Ray Jackendoff. Semantic Structures. 
Current Studies in Linguistics. The MIT Press, Cam- 
bridge, Massachusetts/London, England, 1990. 
\[Magnini and Strapparava, 1994\] B. Magnini and 
C. Strapparava. Costruzione di una base di conoscenza 
lessicale per l'italiano basata su WordNet. In Proc. of 
the 28 ~h International Congress of the Sociel~i Lin- 
guistica Italiana; Palermo, Italy, Ottobre 1994. 
\[Magnini el al., 1994\] B. Magnini, C. Strapparava, 
F. Ciravegna, and E. Pianta. Multilingual lexical 
knowledge bases: Applied WordNet prospects. In The 
Future of Dictionary - Workshop sponsored by Rank 
Xerox European Research Centre and ESPRIT Project 
Acquilex II, Grenoble, France, October 1994. 
\[Miller, 1990\] G. A. Miller. WordNet: "An on-line lexi- 
cal database", International Journal of Lexicography 
(special issue), 3(4):235-312, 1990. 
\[Pustejovsky, 1995\] J. Pustejovsky. The Generative Lex- 
icon. The MIT Press, Cambridge, Massachusetts, 
1995. 
