Towards a Meaning-Full Comparison of Lexical Resources
Kenneth C. Litkowski
CL Research
9208 Gue Road
Damascus, MD 20872
ken@clres.com
http://www.clres.com

Abstract 
The mapping from WordNet to Hector senses in Senseval provides a "gold standard" against which to
judge our ability to compare lexical resources. The "gold standard" is provided through a word overlap analysis
(with and without a stop list) for this mapping, achieving at most a 36 percent correct mapping (inflated by 9
percent from "empty" assignments). An alternative componential analysis of the definitions, using syntactic,
collocational, and semantic component and relation identification (through the use of defining patterns integrated
seamlessly into the parsing dictionary), provides an almost 41 percent correct mapping, with an additional 4
percent by recognizing semantic components not used in the Senseval mapping. Definition sets of the Senseval
words from three published dictionaries and Dorr's lexical knowledge base were added to WordNet and the Hector
database to examine the nature of the mapping process between definition sets of more and less scope. The
techniques described here constitute only an initial implementation of the componential analysis approach and
suggest that considerable further improvements can be achieved.

Introduction 
The difficulty of comparing lexical resources, long a significant
challenge in computational linguistics (Atkins, 1991), came to the
fore in the recent Senseval competition (Kilgarriff, 1998), when some
systems that relied heavily on the WordNet (Miller, et al., 1990)
sense inventory were faced with the necessity of using another sense
inventory (Hector). A hasty solution to the problem was the
development of a map between the two inventories, but some
participants expressed concerns that use of this map may have degraded
their performance to an unknown degree.

Although there were disclaimers about the WordNet-Hector map, it
nonetheless stands as a usable gold standard for efforts to compare
lexical resources. Moreover, we have a usable baseline (a word overlap
method suggested in (Lesk, 1986)) against which to compare whether we
are able to make improvements in the mapping (since this method has
been shown to perform not as well as expected (Krovetz, 1992)).

We first describe the lexical resources used in the study (Hector,
WordNet, other dictionaries, and a lexical knowledge base),
characterizing them in terms of polysemy and the types of lexical
information each contains (syntactic properties and features, semantic
components and relations, and collocational properties). We then
present results of performing the word overlap analysis of the 18
verbs used in Senseval, analyzing the definitions in WordNet and
Hector. We then expand our analysis to include other dictionaries. We
describe our methods of analysis, particularly the methods of parsing
definitions and identifying semantic relations (semrels) based on
defining patterns, essentially taking first steps in implementing the
program described by Atkins and focusing on the use of "meaning"-full
information rather than statistical information. We identify the
results that have been achieved thus far and outline further steps
that may add more "meaning" to the analysis.

1 All analyses described in this paper were performed automatically
using functionality incorporated in DIMAP (Dictionary Maintenance
Programs) (available for immediate download at (CL Research, 1999a)).
This includes automatic extraction of WordNet information for the
selected words (integrated in DIMAP). Hector definitions were uploaded
into DIMAP dictionaries after use of a conversion program. Definitions
for other dictionaries were entered by hand.

The Lexical Resources
This analysis focuses on the main verb senses used in Senseval (not
idioms and phrases), specifically the following:
AMAZE, BAND, BET, BOTHER, BURY, CALCULATE,
CONSUME, DERIVE, FLOAT, HURDLE, INVADE,
PROMISE, SACK, SANCTION, SCRAP, SEIZE,
SHAKE, SLIGHT
The Hector database used in Senseval consists of a tree of senses,
each of which contains definitions, syntactic properties, example
usages, and "clues" (collocational information about the syntactic and
semantic environment in which a word appears in the specific sense).
The WordNet database contains synonyms (synsets), perhaps a definition
or example usages (gloss), some syntactic information (verb frames),
hypernyms, hyponyms, and some other semrels (ENTAILS, CAUSES).

To extend our analysis in order to look at other issues of lexical
resource comparison, we have included the definitions or lexical
information from the following additional sources:
• Webster's 3rd New International Dictionary (W3)
• Oxford Advanced Learner's Dictionary (OALD)
• American Heritage Dictionary (AHD)
• Dorr's Lexical Knowledge Base (Dorr)
We used only the definitions from W3, OALD, and AHD (which also
contain sample usages and some collocational information in the form
of usage notes, not used at the present time). Dorr's database
contains thematic grids which characterize the thematic roles of
obligatory and optional semantic components, frequently identifying
accompanying prepositions (Olsen, et al., 1998).

The following table identifies the number of senses and average
overall polysemy for each of these resources.
Word              Hector  WordNet    W3   AHD  OALD  Dorr
amaze                  1        2     4     2     1     2
band                   3        1    11     4     2     4
bet                    4        2     5     5     1     3
bother                 7        6     9     7     4     4
bury                  12        6    14     5     8     1
calculate              5        5    10     9     3     1
consume                6        6     8     8     3     1
derive                 6        5    15     5     3     2
float                 16        4    41    14    10     5
hurdle                 2        1     4     3     1     0
invade                 6        2    10     5     3     1
promise                5        4     7     4     3     2
sack                   4        4     6     3     2     0
sanction               2        2     5     2     1     1
scrap                  3        1     3     3     1     0
seize                 11        6    21    13     7     1
shake                  8        8    37    17     7    12
slight                 1        1     6     3     1     0
Average polysemy     5.7      3.7  12.0   6.2   3.4   2.2

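The average polysemy row can be recomputed from the per-word sense counts as a consistency check on the table. The following is only a sketch: the data layout and the name `average_polysemy` are illustrative assumptions, not part of DIMAP, and the column order (Hector, WordNet, W3, AHD, OALD, Dorr) is the one that agrees with the totals quoted elsewhere in the paper (102 Hector senses, 66 WordNet senses).

```python
# Sense counts per verb, in the assumed column order below.
RESOURCES = ("Hector", "WordNet", "W3", "AHD", "OALD", "Dorr")
SENSES = {
    "amaze": (1, 2, 4, 2, 1, 2),      "band": (3, 1, 11, 4, 2, 4),
    "bet": (4, 2, 5, 5, 1, 3),        "bother": (7, 6, 9, 7, 4, 4),
    "bury": (12, 6, 14, 5, 8, 1),     "calculate": (5, 5, 10, 9, 3, 1),
    "consume": (6, 6, 8, 8, 3, 1),    "derive": (6, 5, 15, 5, 3, 2),
    "float": (16, 4, 41, 14, 10, 5),  "hurdle": (2, 1, 4, 3, 1, 0),
    "invade": (6, 2, 10, 5, 3, 1),    "promise": (5, 4, 7, 4, 3, 2),
    "sack": (4, 4, 6, 3, 2, 0),       "sanction": (2, 2, 5, 2, 1, 1),
    "scrap": (3, 1, 3, 3, 1, 0),      "seize": (11, 6, 21, 13, 7, 1),
    "shake": (8, 8, 37, 17, 7, 12),   "slight": (1, 1, 6, 3, 1, 0),
}


def total_senses(senses):
    """Total number of senses per resource, summed over all words."""
    return dict(zip(RESOURCES, (sum(col) for col in zip(*senses.values()))))


def average_polysemy(senses):
    """Average number of senses per word for each resource, to one decimal."""
    n_words = len(senses)
    return {name: round(total / n_words, 1)
            for name, total in total_senses(senses).items()}


if __name__ == "__main__":
    print(total_senses(SENSES))
    print(average_polysemy(SENSES))
```
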
Word Overlap Analysis 
We first establish a baseline for automatic replication of the
lexicographer's mapping from WordNet 1.6 to Hector, using a simple
word overlap analysis similar to (Lesk, 1986). The lexicographer
mapped the 66 WordNet senses (each synset in which a test word
occurred) into 102 Hector senses. A total of 86 assignments were made:
9 WordNet senses were given no assignments, 40 received exactly one,
and 17 senses received 2 or 3 assignments. The WordNet senses
contained 348 words (about half of which were common words appearing
on our stop list, which contained 165 words, mostly prepositions,
pronouns, and conjunctions). The Hector senses selected in the word
overlap analysis contained about 960 words (all Hector senses
contained 1878 words).

We performed a strict word overlap analysis (with and without a stop
list) between the definitions in WordNet and the Hector senses; that
is, we did not attempt to identify root forms of inflected words. We
took each word in a WordNet sense and determined whether it appeared
in a Hector sense; we selected a Hector sense based on the highest
percentage of words over all Hector senses. An empty selection was
made if none of the words in the WordNet sense appeared in any Hector
sense; only content words were considered when the stop list was used.

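The selection procedure just described can be sketched as follows. This is a minimal illustration and not the DIMAP implementation: the tokenizer, the tiny stop list, and the function names are assumptions, and the fraction is taken over all words of the target definition, which is one reading of "highest percentage of words".

```python
import re

# Illustrative stop list only; the list used in the paper had 165 words.
STOP_WORDS = frozenset({"a", "an", "the", "of", "on", "in", "to", "or", "this"})


def tokens(definition):
    """Lower-case word tokens with no stemming (strict matching)."""
    return re.findall(r"[a-z]+", definition.lower())


def overlap(source_def, target_def, stop_words=frozenset()):
    """Return (shared words, fraction of the target definition they cover)."""
    source = set(tokens(source_def)) - stop_words
    target = tokens(target_def)
    shared = sorted(source & set(target))
    fraction = len(shared) / len(target) if target else 0.0
    return shared, fraction


def select_senses(source_def, target_defs, stop_words=frozenset()):
    """Pick the target sense(s) with the highest fraction.

    An empty list is an "empty selection": no word of the source
    definition appears in any target definition.
    """
    scores = {sense_id: overlap(source_def, text, stop_words)[1]
              for sense_id, text in target_defs.items()}
    best = max(scores.values(), default=0.0)
    return [sense_id for sense_id, s in scores.items() if s == best and best > 0.0]


if __name__ == "__main__":
    wordnet_bet_2 = "stake (money) on the outcome of an issue"
    hector_bet_4 = "(of a person) to risk (a sum of money or property) in this way"
    print(overlap(wordnet_bet_2, hector_bet_4))
    print(overlap(wordnet_bet_2, hector_bet_4, STOP_WORDS))
```

Because this tokenizer counts 14 words in the Hector definition where the paper counts 15, the fractions it reports differ slightly from those quoted in the text; the shared words are the same.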
For example, for bet, WordNet sense 2 (stake (money) on the outcome of
an issue) mapped into Hector sense 4 ((of a person) to risk (a sum of
money or property) in this way). In this case, there was an overlap on
two words (money, of) in the Hector definition (0.13 of its 15 words)
without the stop list. When the stop list was invoked, there was an
overlap of only one word (money, 0.07 of the Hector definition). In
this case, the lexicographer had made three assignments (Hector senses
2, 3, and 4); our scoring method treated this as only 1 out of 3
correct (not using the relaxed method employed in Senseval of treating
this as completely correct).

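The strict scoring just described, in which each of the lexicographer's assignments counts as a separate case, might be sketched as below. The function name and data layout are hypothetical, and counting an unassigned sense as one case (correct when our selection is also empty) is an assumption based on the way empty assignments are reported among the 86 cases.

```python
def score_mapping(selected, gold):
    """Count correct cases under the strict scoring.

    selected, gold: dicts mapping a source sense to a set of target
    senses; an empty set means "no assignment".
    Returns (correct, total).
    """
    correct = total = 0
    for sense, gold_targets in gold.items():
        chosen = selected.get(sense, set())
        if not gold_targets:
            # An empty gold assignment is one case, matched only by an
            # empty selection (assumption, see lead-in).
            total += 1
            correct += 0 if chosen else 1
        else:
            # Each gold assignment is its own case.
            total += len(gold_targets)
            correct += len(gold_targets & chosen)
    return correct, total


if __name__ == "__main__":
    # The bet example: three gold assignments, one of which we selected.
    print(score_mapping({"bet.2": {4}}, {"bet.2": {2, 3, 4}}))
```
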
Without the stop list, our selections matched the lexicographer's in
28 of 86 cases (32.6%); using the stop list, we were successful in 31
of 86 cases (36.1%). The improvement arising when the stop list was
used is deceptive, since 8 cases were due to empty assignments (so
that only 23 cases, 26.7%, were due to matching content words).
Overall, only 41 content words were involved in these 23 successes
when the stop list was used, an average of 1.8 content words.

To summarize the word overlap analysis: (1) despite a richer set of
definitions in Hector, 9 of 66 WordNet senses (13.6%) could not be
assigned; (2) despite the greater detail in Hector senses compared to
WordNet senses (2.8 times as many words), an average of only 1.8
content words participated in the assignments; and (3) therefore, the
defining vocabulary between these two definition sets seems to be
somewhat divergent. Although it might appear as if the word overlap
analysis does not perform well, this is not the case. The analysis
provides a broad overview of the definition comparison process between
two definition sets and frames a deeper analysis of the differences.
Moreover, it appears that the accuracy of a "gold standard" mapping is
not crucially important. The quality of the mapping may help frame the
subsequent analysis more precisely, but it seems that any reasonable
mapping will suffice. This will be discussed further after presenting
the results of the componential analysis of the definitions.

Meaning-Full Analysis of Definitions 
The deeper analysis of the mapping between two definition sets relies
primarily on two major steps: (1) parsing definitions and using
defining patterns to identify semrels present in the definitions and
(2) relaxing the values of these relations by allowing "synonymic"
substitution (using WordNet). Thus, for example, if we identify
hypernyms or instruments from parsing a definition, we would say that
the definitions are "equal" not just if the hypernym or instrument is
the same word, but also if the hypernyms or instruments are members of
the same synset.

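The relaxed notion of equality can be illustrated with a small sketch. The synsets below are toy data invented for the example (a real run would look the words up in WordNet), and `same_concept` is a hypothetical name.

```python
# Toy synsets for illustration only; not taken from WordNet.
TOY_SYNSETS = [
    frozenset({"tool", "instrument", "implement"}),
    frozenset({"move", "go", "travel"}),
]


def same_concept(word_a, word_b, synsets=TOY_SYNSETS):
    """True if the two values are the same word or share a synset."""
    if word_a == word_b:
        return True
    return any(word_a in synset and word_b in synset for synset in synsets)


if __name__ == "__main__":
    print(same_concept("tool", "implement"))  # same synset
    print(same_concept("tool", "move"))       # different synsets
```
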
This approach is based on the finding (Litkowski, 1978) that a
dictionary induces a semantic network where nodes represent "concepts"
that may be lexicalized and verbalized in more than one way. This
finding implies, in general, the absence of true synonyms, and instead
the kind of "concept" embodied in WordNet synsets (with several
lexical items and phraseologies). A similar approach, parsing
definitions and relaxing semrel values, was followed in (Dolan, 1994)
for clustering related senses within a single dictionary.

The ideal toward which this approach strives is a complete
identification of the meaning components included in a definition. The
meaning components can include syntactic features and characteristics
(including subcategorization patterns), semantic components (realized
through identification of semrels), selectional restrictions, and
collocational specifications.

The first stage of the analysis parses the definitions (CL Research,
1999b; Litkowski, to appear) and uses the parse results to extract
(via defining patterns) semrels. Since definitions have many
idiosyncrasies (that do not follow ordinary text), an important first
step in this stage is preprocessing the definition text to put it into
a sentence frame that facilitates the extraction of semrels.2
2 Note that the stop list is not applicable to the definition parsing.
The parser is a full-scale sentence parser, where prepositions and
other words on the stop list are necessary for successful parsing.
Moreover, inclusion of the prepositions is crucial to the method,
since they are the bearers of much semrel information.

The extraction of semrels examines the parse results, i.e., a tree
whose intermediate nodes represent non-terminals and whose leaves
represent the lexical items that comprise the definitions, where any
node may also include annotations such as characterizations of number
and tense. For all noun or verb definitions, this includes
identification of the head noun (with recognition of "empty" heads) or
verb; for verbs, we signal whether the definition contained any
selectional restrictions (that is, particular parenthesized
expressions) for the subject and object. We then examine prepositional
phrases in the definition and determine whether we have a "defining
pattern" for the preposition which we can use as indicative of a
particular semrel. We also identify adverbs in the parse tree and look
these up in WordNet to identify an adjective synset from which they
are derived (if one is given).

The defining patterns are actually part of the dictionary used by the
parser. That is, we do not have to develop specific routines to look
for specific patterns. A defining pattern is a regular expression that
articulates a syntactic pattern to be matched. Thus, to recognize a
"manner" semrel, we have the following entry for "in":
in(dpat((~ rep01(det(0)) adj manner(0) st(manner))))
This allows us to recognize "in" as possibly giving rise to a "manner"
component, where we recognize "in" (the tilde, which allows us to
specify particular elements before the "in" as well), with a noun
phrase that consists of 0 or 1 determiner, an adjective, and the
literal "manner". The "0" after the determiner and the literal
indicate that these words are not copied into the value for a "manner"
role, so that the value to the "manner" semrel becomes only the
adjective that is recognized.

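The behaviour of this "manner" pattern can be imitated over a sequence of part-of-speech tagged words. The sketch below is not DIMAP's pattern language or matcher: the (word, tag) representation, the tag names "det" and "adj", and the function name are assumptions made for illustration.

```python
def match_manner(tagged):
    """Return adjective values for the pattern: "in" (det)? adj "manner".

    tagged: list of (word, pos) pairs. The determiner and the literal
    "manner" are matched but not copied into the value, mirroring the
    "0" flags in the dictionary entry.
    """
    values = []
    for i, (word, _pos) in enumerate(tagged):
        if word != "in":
            continue
        j = i + 1
        # Zero or one determiner.
        if j < len(tagged) and tagged[j][1] == "det":
            j += 1
        # An adjective followed by the literal "manner".
        if (j + 1 < len(tagged)
                and tagged[j][1] == "adj"
                and tagged[j + 1][0] == "manner"):
            values.append(tagged[j][0])
    return values


if __name__ == "__main__":
    definition = [("move", "verb"), ("in", "prep"), ("a", "det"),
                  ("jerky", "adj"), ("manner", "noun")]
    print(match_manner(definition))
```
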
The second stage of the analysis uses the populated lexical database
to compare senses and make the selections. This process follows the
general methodology used in Senseval (Litkowski, to appear).
Specifically, in the definition comparison, we first examine exclusion
criteria to rule out specific mappings. These criteria include
syntactic properties (e.g., a verb sense that is only transitive
cannot map into one that is only intransitive) and collocational
properties (e.g., a sense that is used with a particle cannot map into
one that uses a different particle). At the present time, these are
used only minimally.

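The two exclusion criteria given as examples can be expressed as a simple filter. The `Sense` record and its fields below are hypothetical stand-ins for whatever the lexical database stores; only the two rules themselves come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Sense:
    sense_id: str
    # Subset of {"transitive", "intransitive"}; empty means unknown.
    transitivity: FrozenSet[str] = frozenset()
    # Particle the sense is used with, if any.
    particle: Optional[str] = None


def is_excluded(source, target):
    """True if the source sense cannot map into the target sense."""
    # Only-transitive cannot map into only-intransitive (and vice versa).
    if (source.transitivity and target.transitivity
            and not source.transitivity & target.transitivity):
        return True
    # Senses used with different particles cannot map into each other.
    if source.particle and target.particle and source.particle != target.particle:
        return True
    return False


def viable(source, candidates):
    """Candidate senses that survive the exclusion criteria."""
    return [c for c in candidates if not is_excluded(source, c)]
```
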
We next score each viable sense based on its semrels. We increment the
score if the senses have a common hypernym or if a sense's hypernyms
belong to the same synset as the other sense's hypernyms. If a
particular sense contains a large number of synonyms (that is, no
differentiae on the hypernym) and they overlap considerably in the
synsets they evoke, the score can be increased substantially.
Currently, we add 5 points for each match.3

We also increment the score based on common semrels. In this initial
implementation, we have defining patterns (usually quite minimal) for
recognizing instrument, means, location, purpose, source, manner,
has-constituents, has-members, is-part-of, locale, and goal.4 We
increment the score by 2 points when we have a common semrel and then
by another 5 points when the value is identical or in the same synset.

After all possible increments to the scores have been made, we then
select the sense(s) with the highest score. Finally, we compare our
selection with that of the gold standard to assess our mapping over
all senses.

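The scoring scheme can be summarized in a short sketch using the point values given above (5 per hypernym match, 2 per common semrel, 5 more when the semrel values agree). The dictionary layout of a sense, the toy synsets, and the function names are illustrative assumptions rather than the DIMAP data structures.

```python
HYPERNYM_POINTS = 5   # common hypernym, or hypernyms in the same synset
SEMREL_POINTS = 2     # both senses have the same kind of semrel
VALUE_POINTS = 5      # the semrel values are identical or share a synset


def same_concept(a, b, synsets):
    """Same word, or members of a common synset."""
    return a == b or any(a in s and b in s for s in synsets)


def score(source, target, synsets=()):
    """Score one candidate mapping between two parsed senses."""
    total = 0
    for h_source in source["hypernyms"]:
        for h_target in target["hypernyms"]:
            if same_concept(h_source, h_target, synsets):
                total += HYPERNYM_POINTS
    for relation, value in source["semrels"].items():
        if relation in target["semrels"]:
            total += SEMREL_POINTS
            if same_concept(value, target["semrels"][relation], synsets):
                total += VALUE_POINTS
    return total


def select(source, candidates, synsets=()):
    """Return the candidate sense(s) with the highest non-zero score."""
    scores = {sid: score(source, sense, synsets) for sid, sense in candidates.items()}
    best = max(scores.values(), default=0)
    return [sid for sid, s in scores.items() if s == best and best > 0]


if __name__ == "__main__":
    # Invented senses, for illustration only.
    synsets = [{"move", "go"}]
    source = {"hypernyms": ["move"], "semrels": {"manner": "jerky"}}
    candidates = {
        "A": {"hypernyms": ["go"], "semrels": {"manner": "jerky"}},
        "B": {"hypernyms": ["say"], "semrels": {"manner": "loud"}},
    }
    print(select(source, candidates, synsets))
```
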
Another way in which our methodology follows the Senseval process is
that it proceeds incrementally. Thus, it is not necessary to have a
"final" perfect parse and mapping routine. We can make continual
refinements at any stage of the process and examine the overall
effect. As in Senseval, we may make changes to deal with a particular
phenomenon with the result that overall performance declines, but with
a sounder basis for making subsequent improvements.

3 At the present time, we use WordNet to identify semrels. We envision
using the full semantic network created by parsing all of a
dictionary's definitions. This would include a richer set of semrels
than currently included in WordNet.
4 The defining patterns are developed by hand. We have only just begun
this effort, so the current set is somewhat impoverished.

Results of Componential Analysis
The "gold standard" analysis involves mapping 66 WordNet senses with
348 words into 102 Hector senses with 1878 words. Using the method
described above, we obtained 35 out of 86 correct mappings (40.7%), a
slight improvement over the 31 correct assignments using the stop-list
word overlap technique. However, as mentioned above, the stop-list
technique had achieved 8 of its successes by matching null
assignments. Considered on this basis, it seems that the componential
analysis technique provides substantial improvement. In addition, our
technique "erred" on 4 cases by making assignments where none were
made by the lexicographer. We suggest that these cases do contain some
common elements of meaning and may conceivably not be construed as
errors.

The mapping from WordNet to Hector had relatively few empty mappings,
senses for which it was not possible to make an assignment. These are
the cases where it appears that the dictionaries do not overlap and
thus provide a tentative indication of where two dictionaries may have
different coverage. The cases of multiple assignments indicate the
degree of ambiguity in the mapping. The averages in both directions
between Hector and WordNet were dominated by the inability to obtain
good discrimination for the word "seize". Thus, this method identifies
individual words where the discriminative ability needs to be further
refined.

Perhaps more importantly, the componential analysis method exploits
considerably more information than the word overlap methods. Whereas
the stop-list word overlap mapping was based on only 41 content words,
the componential approach (in the selected mappings) had 228 hits in
developing its scores, with only a small number of defining patterns.

Comparison of Dictionaries 
We next examined the nature of the interrelations between pairs of
dictionaries without use of a "gold standard" to assess the process of
mapping. For this purpose, we mapped in both directions between the
pairs {WordNet, Hector}, {W3, OALD}, and {W3, AHD}. We also examine
Dorr's lexical knowledge base for the implications it may have in the
mapping process.

Neither WordNet nor Hector is properly viewed as a dictionary, since
there was no intention to publish them as such. WordNet "glosses" are
generally smaller (5.3 words per sense) compared to Hector (18.4 words
per sense), which contains many words specifying selectional
restrictions on the subject and object of the verbs. Hector was used
primarily for a large-scale sense tagging project.

The three formal dictionaries were subject to rigorous publishing and
style standards. The average number of words per sense was 8.7 (OALD),
7.1 (AHD), and 9.9 (W3), with an average of 3.4, 6.2, and 12.0 senses
per word, respectively.

Each table shows the average number of senses being mapped, the
average number of assignments in the target dictionary, the average
number of senses for which no assignment could be made, the average
number of multiple assignments per word, and the average score of the
assignments that were made.

WordNet - Hector
Direction    Senses  Assignments  Empty  Multiple  Score
WN-Hector       3.7          4.7    0.6       1.7   11.9
Hector-WN       5.7          6.4    1.4       2.2   11.3

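The five columns of each table can be derived from per-word mapping results roughly as follows. This is a sketch under stated assumptions: the input layout and function name are hypothetical, and "multiple" is taken to mean the number of assignments beyond the first for a sense. That reading implies assignments = senses - empty + multiple, which appears to hold for the reported rows up to rounding (for example, for W3 to OALD, 12.0 - 6.0 + 1.8 = 7.8).

```python
def direction_stats(results):
    """Per-word averages for one mapping direction.

    results: {word: {source_sense: [(target_sense, score), ...]}}
    An empty list means no assignment could be made for that sense.
    """
    if not results:
        raise ValueError("no words to summarize")
    n_words = len(results)
    senses = assignments = empty = multiple = 0
    scores = []
    for mapping in results.values():
        for targets in mapping.values():
            senses += 1
            assignments += len(targets)
            if not targets:
                empty += 1
            else:
                # Assignments beyond the first (assumed reading).
                multiple += len(targets) - 1
            scores.extend(score for _target, score in targets)
    return {
        "senses": senses / n_words,
        "assignments": assignments / n_words,
        "empty": empty / n_words,
        "multiple": multiple / n_words,
        "score": sum(scores) / len(scores) if scores else 0.0,
    }


if __name__ == "__main__":
    # Invented results for two words, for illustration only.
    toy = {
        "w1": {"s1": [("t1", 12), ("t2", 12)], "s2": []},
        "w2": {"s1": [("t1", 7)]},
    }
    print(direction_stats(toy))
```
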
These points are further emphasized in the mapping between W3 and
OALD, where the disparity between the empty and multiple assignments
indicates that we are mapping between quite disparate dictionaries.
This tends to be the case not only for the entire set of words, but is
also evident for individual words where there is a considerable
disparity in the number of senses, which then dominate the overall
disparity. Thus, for example, W3 has 41 definitions for "float", while
OALD has 10. We tend to be unable to find the specific sense in going
from W3 to OALD, because it is likely that we have many more specific
definitions that are not present. In the other direction, we are
likely to have considerable ambiguity and multiple assignments.

W3 - OALD
Direction    Senses  Assignments  Empty  Multiple  Score
W3-OALD        12.0          7.8    6.0       1.8    9.9
OALD-W3         3.4          6.0    0.7       3.2    8.6

Between W3 and AHD, there is less overall disparity between the
definition sets, although since W3 is unabridged, we still have a
relatively high number of senses in W3 that do not appear to be
present in AHD. Finally, it should be noted that the scores for the
published dictionaries tend to be a little lower than for WordNet and
Hector. This reflects the likelihood that we have not extracted as
much information as we did in parsing and analyzing the definition
sets used in Senseval.

W3 - AHD
Direction    Senses  Assignments  Empty  Multiple  Score
W3-AHD         12.0         11.5    4.0       3.6    9.0
AHD-W3          6.2          9.1    1.2       4.1    9.1

We next considered Dorr's lexical database. We first transformed her
theta grids into syntactic specifications (transitive or intransitive)
and identification of semrels (e.g., where she identified an instr
component, we added such a semrel to the DIMAP sense). We were able to
identify a mapping from WordNet to her senses for two words ("float"
and "shake") for which Dorr has several entries. However, since she
has considerably more semantic components than we are currently able
to recognize, we did not pursue this avenue any further at this time.

More important than just mapping between two words, Dorr's data
indicates the possibility of further exploitation of a richer set of
semantic components. Specifically, as reported in (Olsen, et al.,
1998), in describing procedures for automatically acquiring thematic
grids for Mandarin Chinese, it was noted that "verbs that incorporate
thematic elements in their meaning would not allow that element to
appear in the complement structure". Thus, by using Dorr's thematic
grids when verbs are parsed in definitions, it is possible to identify
where particular semantic components are lexicalized and which others
are transmitted through to the thematic grid (complement or
subcategorization pattern) for the definiendum.

The transmission of semantic components to the thematic grid is also
reflected overtly in many definitions. For example, shake has one
definition, "to bring to a specified condition by or as if by repeated
quick jerky movements". We would thus expect that the thematic grid
for this definition should include a "goal". And, indeed, Dorr's
database has two senses which require a "goal" as part of their
thematic grid. Similarly, for many definitions in the sample set, we
identified a source defining pattern based on the word "from";
frequently, the object of the preposition was the word "source"
itself, indicating that the subcategorization properties of the
definiendum should include a source component.

Discussion 
While the improvement in mapping by using the componential analysis
technique (over the word overlap methods) is modest, we consider these
results quite significant in view of the very small number of defining
patterns we have implemented. Most of the improvement stems from the
word substitution principle described earlier (as evidenced by the
preponderance of 5 point scores). This technique also provides a
mechanism for bringing back the stop words, viz., the prepositions,
which are the carriers of information about semrels (the 2 point
scores).

The more general conclusion (from the word substitution) is that the
success arises from no longer considering a definition in isolation.
The proper context for a word and its definitions consists not just of
the words that make up the definition, but also the total semantic
network represented by the dictionary.

We have achieved our results by exploiting only a small part of that
network. We have moved only a few steps into that network beyond the
individual words and their definitions. We would expect that further
expansion, first by the addition of further and improved semrel
defining patterns, and second, through the identification of more
primitive semantic components, will add considerably to our ability to
map between lexical resources. We also expect improvements from
consideration of other techniques, such as attempts at ontology
alignment (Hovy, 1998).

Although the definition analysis provided here was performed on
definitions within a single language, the various meaning components
correspond to those used in an interlingua. The use of the extinction
method (developed in order to characterize verbs in another language,
Chinese) can fruitfully be applied here as well.

Two further observations about this process can be made. The first is
that reliance on a well-established semantic network such as WordNet
is not necessary. The componential analysis method relies on the local
neighborhood of words in the definitions, not on the completeness of
the network. Indeed, the network itself can be bootstrapped based on
the parsing results. The method can work with any semantic network or
ontology and may be used to refine or flesh out the network or
ontology.

The second observation is that it is not necessary to have a
well-established "gold standard". Any mapping will do. All that is
necessary is for any investigator (lexicographer or not) to create a
judgmental mapping. The methods employed here can then quantify this
mapping based on a word overlap analysis and then further examine it
based on the componential analysis. The componential analysis method
can then be used to examine underlying subtleties and nuances in the
definitions, which a lexicographer or analyst can then examine in
further detail to assess the mapping.

Future Work 
This work has marked the first time that all the necessary
infrastructure has been combined in a rudimentary form. Because of its
rudimentary status, the opportunities for improvement are quite
extensive. In addition, there are many opportunities for using the
techniques described here in further NLP applications.

First, the techniques described here have immediate applicability as
part of a lexicographer's workstation. When definitions are parsed and
semrels are identified, the resulting data structures can be applied
against a corpus of instances for particular words (as in Senseval)
for improving word-sense disambiguation. The techniques will also
permit comparing an entry with itself to determine the
interrelationships among its definitions and comparing the definitions
of two "synonyms" to determine the amount of overlap between them on a
definition by definition basis.

Although the analysis here has focused on the parsing of definitions,
the development of defining patterns clearly extends to generalized
text parsing. Since the defining patterns have been incorporated into
the same dictionary used for parsing free text, the patterns can be
used directly to identify the presence of particular semrels among
sentential constituents. We are working to integrate this
functionality into our word-sense disambiguation techniques (both the
defining patterns and the semrels). Even further, it seems that
matching defining patterns in free text can be used for lexical
acquisition. Textual material that contains these patterns could
conceivably be flagged as providing definitional material which can
then be compared to existing definitions to assess whether their use
is consistent with these definitions, and if not, at least to flag the
inconsistency.

The techniques described here can be applied directly to the fields of
ontology development and analysis of terminological databases. For
ontologies, with or without definitions, the methods employed can be
used to compare entries in different ontologies based primarily on the
relations in the ontology, both hierarchical and other. For
terminological databases, the methods described here can be used to
examine the set of conceptual relations implied by the definitions.
The definition parsing will facilitate the development of the
terminological network in the particular field covered by the
database.

The componential analysis methods result in a richer semantic network
that can be used in other applications. Thus, for example, it is
possible to extend the lexical chaining methods described in (Green,
1997), which are based on the semrels used in WordNet. The semrels
developed with the componential analysis method would provide
additional detail available for application of lexical cohesion
methods. In particular, additional relations would permit some
structuring within the individual lexical chains, rather than just
considering each chain as an amorphous set (Green, 1999).

Finally, we are currently investigating the use of the componential
analysis technique for information extraction. The technique
identifies (from definitions) slots that can be used as slots or
fields in template generation. Once these slots are identified, we
will be attempting to extract slot values from items in large catalog
databases (millions of items).

In conclusion, it would seem that, instead of a paucity of information
allowing us to compare lexical resources, by bringing in the full
semantic network of the lexicon, we are overwhelmed with a plethora of
data.

Acknowledgments 
I would like to thank Bonnie Dorr, Christiane Fellbaum, Steve Green,
Ed Hovy, Ramesh Krishnamurthy, Bob Krovetz, Thomas Potter, Lucy
Vanderwende, and an anonymous reviewer for their comments on an
earlier draft of this paper.

References 
Atkins, B. T. S. (1991). Building a lexicon: The contribution of
lexicography. International Journal of Lexicography, 4(3), 167-204.

CL Research. (1999a). CL Research Demos.
http://www.clres.com/Demo.html

CL Research. (1999b). Dictionary Parsing Project.
http://www.clres.com/dpp.html

Dolan, W. B. (1994, 5-9 Aug). Word Sense Ambiguation: Clustering
Related Senses. COLING-94, The 15th International Conference on
Computational Linguistics. Kyoto, Japan.

Green, S. J. (1997). Automatically generating hypertext by computing
semantic similarity [Diss], Toronto, Canada: University of Toronto.

Green, S. J. (sjgreen@mri.mq.edu.au). (1999, 1 June). (Rich semantic
networks).

Hovy, E. (1998, May). Combining and Standardizing Large-Scale,
Practical Ontologies for Machine Translation and Other Uses. Language
Resources and Evaluation Conference. Granada, Spain.

Kilgarriff, A. (1998). SENSEVAL Home Page.
http://www.itri.bton.ac.uk/events/senseval/

Krovetz, R. (1992, June). Sense-Linking in a Machine Readable
Dictionary. 30th Annual Meeting of the Association for Computational
Linguistics. Newark, Delaware: Association for Computational
Linguistics.

Lesk, M. (1986). Automatic Sense Disambiguation Using Machine Readable
Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone.
Proceedings of SIGDOC.

Litkowski, K. C. (1978). Models of the semantic structure of
dictionaries. American Journal of Computational Linguistics, Mf. 81,
25-74.

Litkowski, K. C. (to appear). SENSEVAL: The CL Research Experience.
Computers and the Humanities.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J.
(1990). Introduction to WordNet: An on-line lexical database.
International Journal of Lexicography, 3(4), 235-244.

Olsen, M. B., Dorr, B. J., & Thomas, S. C. (1998, 28-31 October).
Enhancing Automatic Acquisition of Thematic Structure in a Large-Scale
Lexicon for Mandarin Chinese. Third Conference of the Association for
Machine Translation in the Americas, AMTA-98. Langhorne, PA.
