An Overt Semantics with a Machine-guided Approach for Robust 
LKBs 
Evelyne Viegas 
New Mexico State Umverslty 
Computing Research Laboratory 
Las Cruces, NM 88003 
USA 
wegas@crl nmsu edu 
Abstract 
In this paper, we report on our experience in build- 
mg computational semantic lexicons for use in NLP 
applications In a machine-graded approach, the 
computer reduces part of the semantic knowledge 
to be acquired by an acqulrer An overt semantics 
can help predict the syntactic behavior of words By 
overt semantics we mean applying the hnkmg or lexl- 
cal rules at the semantic level and not on lexlcal base 
forms More specffically~ we address the different 
strategies of acqms~tlon arguing for an application- 
driven, training-intensive effort We also report on 
how to develop lexicons using off the shelf resources, 
and address multlhngual issues We will try to pro- 
vide an assessment of the difficulties we encountered 
and some directions to bypass them 
1 Introduction 
Our experience in building computational semantic 
lexlcons which are used by Natural Language Pro- 
cessmg (NLP) systems comes from Mlkrokosmos, 
a knowledge-based machine translation system, 1 
where texts from Spanish and Chinese are translated 
into Enghsh Mlkr0k~n~os adopts an xnterhngua- 
based approach (Nlrenfurg et al, 1992) and all lexi- 
cons can be used for multdmgual analysis and gener- 
atmn each word is mapped to an mterhngua struc- 
ture The lexicons built for Mlkrokosmos are multi- 
purpose multlhngual to support translation or 
multflmgual generatmn tasks, reusable, that is, ap- 
phcable to several NLP tasks~ (e g, generation, anal- 
ysis, information extraction), and maintainable, 
that Is, supporting semi-automatic acquisition and 
restructuring of the lexicons 
The content of the Lexlcal Knowledge Base (LKB) 
~s essentially the same ~rrespective of a particular 
application The types of information important for 
analysis and generation might differ, as suggested by 
Dale and Melhsh (1998) For instance, recording all 
the senses of a lexeme is more important for analy- 
sis than generation, conversely, knowing styhstlc m- 
formation on words such as hzghfalutm or formal m 
1For a descnptmn of Mlkrokosmos, see 
http//crl nmsu edu/Research/Projects/nukro/mdex html 
important for generation (Hovy, 1988) The content 
of a multl-purpose LKB is apphcatlon Independent 
(modulo its indexing m analysis the LKB Is Indexed 
on lexemes whereas for generation the LKB is in- 
dexed on concepts) We argue, m section 2 that the 
acquisition process ~s apphcatmn-dependent Mole- 
over we argue that defining the meamng of a word 
for NLP systems requires a training-intensive effolt 
In other words, the fact that we, as humans, under- 
stand texts does not entail that we can determine 
the "computational" meaning of a word Chomsklan 
trees are linguists' constructs, not Innate structures 
A hngmst must be trained to be able to build syn- 
tactic patterns (e g, trees) In computational se- 
mantics, the same rule applies one must be trained 
to build the corresponding semantics (e g frames, 
predicates, ) for a word In order to approach 
the "computational" meaning of a word, training Is 
the most important means we have to date to en- 
sure consistency among acquuers Other means are 
to adopt an overt semantics with a machine-guided 
approach which directs as much as possible the ac- 
qulrer (Section 3) This machine guided approach 
could also act behind the ' back" of an acquner "cor- 
recting ~' some mc0nslstencms m lexmal descnptlons 
between acqmrers, as will be shown m Sectmn 6 
In Section 4, we &scuss our use of off the shelf le- 
sources, such as WordNet (Miller, 1990), to accel- 
erate the machine-graded acqmsltlon of the Enghsh 
lexicon by taking advantage of the existing database 
of synsets 2 whmh provide synonym lists for a lexeme 
We also show how a semantic-based approach, can 
help predict the syntactic behavior of words Note 
that the reverse (predicting semantms from syntax) 
is not true, as some experiments on Levm's work 
(1993) have shown (Sectmn 4) In Sectmn 5, we ad- 
dress mulUlmgual issues in lexicon development 
2 Application-driven Acquisition 
The semantics of an entry is an underspecffied Text 
Meaning Representation (TMR) fragment (e g, De- 
2Synsets represent WordNet's building blocks whmh are 
words, synonyms or Rear-synonyms, that can be used to refer 
to a given concept (Miller, 1990) 
62 
\[\] 
fnse and Nlrenburg, 1991) Th,s TMR fragment can 
be a concept from the ontology or some lnterhngua 
structures such as att,tudes, modahtms, aspects, sets 
and TMR relatmns (addltion, enumeratlon, com- 
pamson) Concepts and lnterhngua structures can 
appear together or independently The ontology, 
to which lexemes are mapped, conmsts of concepts 
(named sets of property-value pairs) organized hl- 
erarchmally along subsumptlon hnks, w~th an aver- 
age of 14 relational hnks (such as ISA, SUBCLASS, 
AGENT, THEME-OF, HEADED-BY, HAS-MEMBER) 
per concept (Mahesh, 1996) In a multflmgual enw- 
ronment, the main practical advantage of connect- 
mg the lexlcon to an ontology is cost-effect,veness, 
as only the "language-dependent" propertms have tO 
be acquired when adding new natural laffh~iages to 
the system 
The mapping between a word and the ontology 
is the most difficult task of lexicon acquisition, and 
requires to develop the most cost-effective approach 
in terms of trmmng and strategms 
2 1 Importance of Training 
The expemment reported below shows that training 
is essential to determine the "computational" mean- 
mg of a word A native spea\]~er of Spanish, who had 
not taken part m the lexacon traanmg process, was 
asked to add some senses to entries m the Spanish 
lexicon Thls was mainly done for testing the ana- 
lyzer, as there were only 23 out of 167 words which 
were ambiguous m one text we were analyzing But 
we also d~scovered thls was a very useful exercise for 
testing the quahty of a semantic lexicon 
The list of added senses was reviewed by two com- 
putational hngmsts, one in charge of supervising the 
training and the other with proficiency m our frame- 
work who had seen entries as they were used by the 
analyzer but had not taken part to the training pro- 
cess either The untrained acqmrer, hereafter UN- 
ACQ, added a total of 111 to 55 open class words 
or so Among these 55 words where ambiguity had 
been added, 33 were already ambiguous in the Span- 
zsh lexicon 
After a closer look at the Spanish lexicon, and at 
the senses retrieved by the semantic analyzer, and 
after doing an on-hne corpora search, the compu- 
tational hngulsts accepted less than 20 new senses 
among the 111 suggested This "overge,aeratlor~" of 
senses by UNACQ had different origins 1) the ana- 
lyzer did not present all the senses from the Span- 
lsh lexicon to UNACQ, it only presented the ones 
that were accepted after syntactic binding, u) the 
senses added by UNACQ were "equivalent" to the 
senses already in the Spanish lexicon, but not rec- 
ogmzed by UNACQ, as they were acquired as "un- 
specffied" in the Spanish lexicon, m) UNACQ hard- 
coded non-hteral meanings of the words, iv) the ad- 
dition of senses was MRD-dnven UNACQ acquired 
the list of meanings provzded by the Spanlsh-Enghsh 
Larousse and Colhns, adopt,ng an enumeration ap- 
proach Such a task Is not superficial, it ensures that 
the quahty of the core lexicon ,s good enough so that 
it can serve as a basis for lexicon expansion tech- 
tuques, some of which we develop below (see Vmgas 
(1999) for the choices an acqu,rer faces when work- 
mg out the semantic mapping Of a word) 
2.2 Strategms 
There are mainly two approaches to word sense as- 
signment corpus-dr,yen and mental-driven The 
former is better adapted to braiding lexicons used 
m analysis, whereas the latter better suits lexicons 
to be used in generation We refer to Kllganff (1997) 
for the corpus-driven approach, and discuss m this 
paper the mental-driven.approach A mental-driven 
or thesaurus-drlven approach consists m grouping 
together lexemes which share the same meaning In 
order to ensure consistency among acqulrers' map- 
pings we have divided the process of acqu,rmg a coin- 
putatlonal semantic lexicon into two phases pre- 
acquisition and acquisition There is still time to 
revise a pre-acqulred mapping at acquis,tlon time, if 
needed 
2.3 The Pre-Acquisitmn Phase 
For a generatmn lexicon, the method of preparing 
the pre-acqulsltlon files can be as follows 1) extract 
all concepts from the ontology, n) lexlcahze them us- 
mg on-hne thesauri, dlctlonanes and native speak- 
ers' lntultmns, m) order pre-acqulsltion files accord- 
mg to the semantic Mapping-Tag (see below) 
A pre-acqmmtion record includes 7 fields Seman- 
tics, Mapping-Tag, Lexeme, POS, Translations, Fie- 
quency_, and Polysemy-Count 
The Semantics field includes only the ontological 
head concept, in which the word sense should be 
anchored (no selectlonal restrictions or other ptop- 
ertms are specified at this stage) The Mapping-Tag 
field (see below) describes the type of connectloa 
between the word sense and its conceptual mean- 
mg some word senses are directly mapped ("dim" 
map) to a single concept in the ontology, wheleas 
the meaning of some other word senses is descllbed 
through the combination of concepts hnked vm prop- 
ertles (relations or attributes) We defined seven 
tags which flag the entry for a specific task For m- 
stance, "devb" (deverbal) is used primarily for nouns 
and adjectives when their meaning is a composition 
of a filler and an event (e g bombing, readable), 
"asp" (aspecttral) is used for true aspectuals (e g 
begin) and also with actions expressing aspectual- 
lty (e g stare, duration prolonged) The Transla- 
tmns field includes an English translation (for lan- 
guages other than English) Frequency, POS and 
polysemy count are extracted automatically, using 
on-hne large corpora for frequency, and WordNet for 
• .in 
mm 
63 
the part of speech and the polysemy count Bihn- 
gum dictionaries, filtered by a native speaker of the 
foreign language, are used for the translations into 
English (for the acquisition of languages other than 
English) 
In order to increase speed at acqulsitmn time, each 
acqmrer works on one type of Mapping-Tag at a 
time For instance, some acqmrers work on type 
OBJECT Type OBJECT call only be lexlcahzed into 
nouns, e g DEVICE -+ devzce instrument tool apph- 
ance Others work on the type EVENT ~VENTS 
can be lexlcahzed into nouns, e g EXPLODE --~ 
bombzn9, bombardment, or into verbs bomb, bombard, 
drop_bombs_on, throw_bombs_at In order to increase 
consistency, acquIrers go through specially designed 
trammg sessions 
3 Overt Semantics to Predict 
Syntax: A Machine-guided 
Approach 
Mappings between semantic roles and syntacUc com- 
plements axe defined via a mapping (a rule) These 
mappings can be defined for large sub-classes of lex- 
ical entries For example, the rule Atl;-Pred-Adj 
creates an entry which accepts in the semantic fea- 
ture a concept from the subtree of ATTRIBUTE or 
an ATTITUDE and accepts attributive (e g safe 
car) and predlcat,ve uses (e g the car zs sa\]e) In 
the case of an adjecUve mapped to a RELATION 
(e g MENTAL-OBJECT-RELATION) the preferred 
rule would be Att-Adj generating an attributive 
reading (e g, dental practzce), and not (~the prac- 
tice ~s dental) 
By selecting the appropriate mapping for classes 
of entries, it is possible to hide the mapping from 
the acqulrer since these mappings are defined in a 
lexlcal class, not m an instance As defined by an 
acqulrer, an entry looks as follows 
\[key "safe", 
syn Att-Pred-Adj, 
sem \[name Safety-Attrlbute, 
range Safe\]\], 
During compilation of the dictionary, the 
Att-Pred-Ad 3 label is replaced with its definition 
and makes explicit the co-reference between the sub- 
categorization and the semantics 
So far we have developed for the English lexicon 
about twenty syntactic patterns whmh apply to a 
large number of semantic frames In the case of ad- 
jectives, we have 3 rules, one for attributive adjec- 
tives, another one for predicative adjectives, and a 
third one for attributive adjective used predicatively 
In the case of nouns, we have developed four pat- 
terns as Illustrated below 
Subcat pattern Example ................................................. 
weapon 
NObllOpt father (of two) 
NObll-0bl20pt bomblng of Iraq (by the US) 
NObllOpt-Obl2Opt computatlon (of the bank reserve) 
(by its clerks) 
We presented above the labels of subcategorlza- 
tion patterns as they appear at acqms~tlon time At 
processing time, there is no difference between Obll 
and Ob12, which are both of type Oblique Our 
machine-graded approach helps the acqmrer to se- 
lect a rule as it only presents the relevant ones for 
a specffic semantic type For instance, in the case 
of a lexeme mapped to an OBJECT no rules having 
obliques will be presented to the acqmrer as de- 
scribed below 
example semantzcs subeat lexlcal class 
.................................................. 
weapon 0hi N 0bin 
father . Prop NObll0pt PropN0bjEvent0bllOpt 
bomblng Event N0110b120pt EventN0b\]0bll0b\] 
computatlon N0bllOp~0bl20pt -Event0bl20pt .................................................. 
The table above should be read as follows the first 
column provides type examples for nouns, the sec- 
ond column (semantics) provides the list of semantlc 
types that a noun can be, Obj (Object), Prop (Prop- 
erty) and Event, the third column (subcategonza- 
tion) presents all subcategorizations a noun can sub- 
categorize for, the fourth column (lexlcal class) con- 
catenates the semantics and the subcategorization 
For instance EventNObjObllObjEventOb12Opt' is 
the lexmal, class of nouns.which are of.type 'Event' 
and therefore subcategorlze for two obhques (Obl) 
- the former must be Obj whereas the latter can 
be either Obj or Event These Obl can be optional 
(Opt) 
Acqmrers may specify the preposition (head of the 
oblique or preposlUonal phrase) For Instance, in the 
case of lather, once an acqulrer has mapped the word 
to the concept ' Father" which is a Prop (Property) 
the acquisition tool presents the subcategorizatmn 
NObllOpt This allows the acquirer to select wluch 
prepositlon(s) can go with the range of Father (in 
this case "of" will be selected) This Information is 
important in generation For generation, one must 
specify, at acquisition tlme, whether or not one can 
say the bombing o/Iraq , the bombing of Iraq by the 
US ~ the bombing by the US It also helps in word 
sense dxsamblguation 
In the case of verbs, one can also define lexico- 
syntactic classes for different semantic classes For 
instance, in the case of ASSERTIVEACT the lexemes 
mapped to it will accept a comp clause (e g he 
A 
64 
sazd (that,) he would come) One class of aspec- 
tuals subcategonzes for nps (e g I started a new 
book), xcomps (e g I started reading/Tinting a new 
book), and accepts the intransitive alternation when 
the grammatical object is of type Event (e g the 
surgery started very late) 3 
4 Propagation of Lexicons 
In this section, we briefly discuss how to extend a 
lexmon using denvatlonal morphology, and off the 
shelf resources such as WordNet (Miller, 1990) to 
propagate the English lexicon with synonyms, and 
Levm's database of subcategonzatlons and alterna- 
tions for Enghsh verbs (Levm, 1993) to encode syn- 
tactic information m the verb entries 4 
4.1 Morpho-semantics for Derlvatlonal 
Morphology 
We refer the reader to Vmgas et al (1996) for the 
details on this type of acquisition and theoretical 
background of Lexlcal Rules (LRs) To sketch this 
operation briefly, applying morpho semantic LRs 
to the entry for the Spanish verb comprar (buy), 
our acquisition system produced automatically 26 
new entries (comprador-N1 (buyer), comprable-Ad\] 
(buyable), etc) This includes creating new syntax, 
semantics and syntax-semantm mappings with cor- 
rect subcategomzations and also the right semantics 
For instance, the lexmal entry for comprable will have 
the subcategonzatlon for predicative and attributive 
adjectives and the semantics adds the attribute FEA- 
SIBILITYATTRIBUTE to the basic meaning BuY of 
comprar (Vmgas et al, 1996) describes about 100 
morpho-semantlc LRs, which were applied to 1056 
verb citation forms with 1,263 senses among them 
The rules helped acquire an average of 26 candidate 
new ~ntrms per verb sense This produced a total 
of 31,680 candidate entries, with an average of over 
90% and 85% correctness in the assignment of syntax 
and semantics respectively LRs constitute a power- 
ful tool to extend a core lexicon from a monohngual 
viewpoint We present other ways of extending lex- 
icons, from monohngual (next paragraph) and mul- 
tlhngual (Section 5) perspectives 
4 2 Using WordNet 
WordNet has been used as follows We extracted 
the synsets assooated with a lexeme using fuzzy 
string matches between, on the one hand, the value 
of the ontological concept (e g, DESIRE), its defim- 
tlon (e g, for DESIRE "to want something") and 
the concept and definition of ~ts corresponding ISA 
concept (e g, INTEND) and, on the other hand, the 
direct hypernyms and hyponyms for the lexeme m 
3See Vmgas et al (1999) for the details on the web rater- 
faces used for the acqulsltmn 
4See Vlegas et al (1998) for more details 
WordNet synsets For instance, for the English verb 
expect, mapped to the ontological concept DESIRE 
our algorithm only kept one synset hope, expect, 
trust, deswe for expect and the following synsets for 
Its hypernyms wzsh, des:re, want 
The output of our automatlc procedure and man- 
ual filtering Is illustrated below for the ontological 
concept DESIRE, along with the synonyms from 
WordNet belonging to the same ontological class 
DESIRE 
want 
expect 
trust 
wish 
All these lexemes will be mapped to the concept 
DESIRE and minimally accept the same subcatego- 
mzatlons (e g np-v-np-xcomp as in I want you to 
/eel comfortable) 
We should mention that this step also Involved 
some manual filtering by acqulrers We used a 
machine-guided mode to help the acqulrer in this 
task This type of filtering was done very quickly, 
mainly due to the fact that WordNet is orgamzed 
on a semantic basis 
4.3 Using Levin's DB 
One of the major problems m using Levm's database 
was filtering out homonyms, as classes in Levm's 
database are defined on the basis of the same subcat- 
egonzatlon pattern (as seen in alternations) and not 
on a semantic basis, as shown by many researchers s 
The advantage of our approach is that ~t is 
semantic-based, this allows us to organize verbs into 
true (frame-based) semantic classes, with their asso- 
ciated sets of subcategonzatlons Therefore, we can 
pledlct that all velbs belonging to a particular se- 
mantra class Will have the Same syntactic behavior 
For instance, if one considers the serhantm class of 
aspectual verbs which selects a theme of type Event, 
e g begin, continue, finzsh, then one can minimally 
associate to any verb belonging to this semantic class 
the following subcategonzatlons (a) NP-V-NP m 
John began h:s homework, (b) NP-V-XCOMP John 
began to work//workzng Note that the reverse is not 
necessarily true verbs which accept (a) and (b) are 
not necessarily aspectuals, e g forget in I forgot the 
key or I forgot to brzng the key 
5 Applicability to Other Languages 
In thls sectlon,-we briefly address what can be gen- 
erahzed to multiple languages The methodologms 
described here are part of what is needed to build 
5Many experiments have resulted m a sumlar finding, as 
described In (Dorr et al, 1997), (Dang et al, 1997) Samt- 
D~z~er (1996) also showed that these classes do not apply eas- 
fly to French 
65 
a multi-purpose LKB while keeping the costs of ac- 
qmsltmn as low as possible 
5.1 Semantic Multillnguahty 
By mapping lexemes to concepts, it is possible to 
create lexicons for dufferent languages, at a mml- 
mum cost, once a core lexicon has been acquired 
Th~s task can be further accelerated if one has ac- 
cess to blhngual d~ctmnames to semi-automate the 
translatmn task Finally, if one has access to a rlch 
structured ontology (as is the case in Mlkrokosmos) 
then dynamic procedures (e g, generalization, spe- 
ciahzatmn) can help the acqulrer in "filling" the gap 
m the case of lexlco-semantm mismatches (e g, cook, 
bake ~ cuwe) 
5.2 More Related Languag e 
Multflinguahty: Morpho:semantics 
All the LRs (e g, LR2agent-o\]) developed for Span- 
lsh can be used to extend other languages, even un- 
related ones, in other words, these rules are lan- 
guage mdependent The morpho-semant,c aspect 
of the LRs is, however, specific to particular lan- 
guages But, in order to benefit from the work done 
on morpho-semantlc LRs, we separated the assign- 
meat of affixes from the assignment of LRs In other 
words, if m Spanish LR~agent-of m ass~ghed to say 
the suffix -dot, by translating suffixes between lan- 
guages, (-dot -+ -cur in French), the French lexl- 
con can be extended m the same way (comprador 
--+ acheteur) Again, this work will necesmtate some 
manual checking, because of some overgeneration, 
which cannot be accepted for generation But over- 
all, one can use the same methodology, the same LRs 
and engine to produce new entries 
5.3 Even More Related Language 
MultihnguahtY 
The subcategonzatlons attached to a lexeme have an 
even more idiosyncratic behavmr than lexlcal LRs 
But here agam, the rules we developed can be ap- 
plied at least to family-related languages, and then 
filtered out by a human For mstance, the Spanish 
word comer has the pattern np-v-np associated to it 
(e g Juan come una pera), so this same pattern will 
be attached to the translatmn of comer (eat) as m 
(Juan eats a pear) However, gomg from Spanish to 
English, one misses all the alternatmns (Lewn, 1993) 
not common in Spanish such as John gave Mary a 
book 
6 Summary 
The MIkrokosmos lexicon acqulmtlon group has ac- 
quired the following data - Spanish lexicon 7,000 
word sense entrms (35,000 word sense entries after 
applying the morpho-semantm lexlcal rules), Chi- 
nese lexicon about 3,000 word sense entries, and En- 
ghsh, about 15,000 word sense entrms so far For 
instance, the acquisition of 15,000 word sense en- 
tries took one year and involved 50% of the time 
of a computational hngmst (to develop the method- 
ology, train the acqulrers and design the GUIs), 50 
acqmrer hours per week, 10 hours per week of a pro- 
grammer to tmplement the GUIs, mamtam the tools 
and test the entries 
Our approach to the development of lex, cons dif- 
fers from others m that our rules apply directly to 
semantlc frames and not to the basic forms of verbs 
Our methodology allows us to alleviate the burden 
of manual checking by applying linking rules directly 
on the semantics of the lexemes Some rules add 
discourse related features, such as focus m some al- 
ternations, e g, they zmproved the s~tuatwn --+ the 
s~tuatzon zmproved 
What is Important to evaluate is how much do we 
gam by using rules and other resources Today, ~t ~s 
still d~fficult to say exactly how much 
Adequately predicting the subcategonzatlons for 
a semantic class depends on its gram size the finer- 
grained, the better the pred2ctwn wall be However, 
m NLP apphcatlons, where one Is constramed by 
tlme, only the semantics necessary for a particular 
application is acquired, which means that m many 
cases the semantms is left at a coarser grain s~ze 
than the one required to predict the subcategomza- 
tlons In practice, we overgenerate some subcatego- 
rtzatlons and need therefore to have them checked 
by humans This ~s why we have concentrated on 
a small set of rules Results on that trade-off issue 
have been reported in Vmgas et al (1998) 
Our experience in large-scale acqulsltmn of lexi- 
cons shows that Idiosyncrasies overrule many of our 
general rules This is mainly due to the fact that 
we need a more fine-gramed semantms than the one 
which is available now This ,s not just a criti- 
cism of our framework, ,t is a genelal fact that we 
all encounter when mvestlgatmg lexlcal semantics 
This might be due to the fact that we work m a 
synchronic perspective (a highly recommended ap- 
proachl), whereas language evolves constantly, thus 
creatmg "artificial" ldmsyncrasms In any case one 
cannot avold them when butldmg a computatmnal 
semantic lexicon 
We have also learnt dunng the acquisition of the 
Mtkrokosmos lexmons that different acqulrers, who 
have been through the same Intensive tralmng, will 
arrive at the same numbel of meanings for a word, 
in more than 90% of the cases The meaning of 
a word might differ, for different trained acqulrers, 
along ISA links Corpora also Influence the decision 
of the acqmrers, and here too we have seen some hu- 
man "mconsltencms" which we thmk could be "cor- 
rected" automatically, as discussed in the followmg 
section 
66 
7 Perspectives 
We are investigating the msue of taking into account 
mconmstent lex~cal descnptmns between the lexicon 
acqmrers by taking advantage of the semantm tarot- 
matron encoded in the ontology 
If we look at the following data and their subcat- 
egonzatlons 
(1) I fixed the meal - \[NP1, NP2\] 
(2) I fixed a sandwlch for you - \[NP1, NP2, PP1\] 
(3) I fixed you a sandwich - \[NP1, NP3, NP2\] 
where fix means PREPAREFOOD, then one must 
subcategonzatmns m between square brackets are 
assocmted to the lexmal ~tems mapped to A, B and 
C 
A- CREATEINGEST 
\[NP1, NP2, PP1, PP2, \] 
allow the analysm and generation of any of sentences 
(1), (2) and (3), whether there is a mapping of fix s - CREATEINGESTBENEF C - CREATEINGESTTHEME 
onto the concepts A or B or C 6 \[NPI, NP3, NP2\] \[NPI, NP2\] \[NPI, NP2, PPI\] 
A- CREATEINGEST 
\[ARGI, ARO2, ARQ3, ARC4, \] 
B - CREATEINGESTBENEF C- CREATEINGESTTHEME 
\[ARGI, ARG2, ARG3\] \[ARGI, haG2\] 
Looking at example (1), and m absence of exam- 
ples (2) and (3) in the corpus, fix could easily be 
mapped into A or C, whereas with examples (2) and 
(3), and m absence of example (1) m the corpus, 
~t could be easily mapped into A or B by the ac- 
qmrers We clmm that thin is of no importance as 
far as there are mechamsms to go from one to the 
other This requires to have access to semantic refor- 
mation The dmgram above is a computational hn- 
gumt construct and has no "reality" per se B and 
C are constructs which provide for every semantic 
class the different semantic patterns that a particu- 
lar semantic class accepts, such as the pattern CRE- 
ATEINGESTBENEF requires 3 semantm arguments 
(AGENT, BENEFICIARY and THEME), whereas CRE- 
ATEINGESTTHEME only reqmres 2 semantic argu- 
ments In this case, thin means that the BENEFI- 
CIARY IS optional, a fact the acqmrer "failed" to 
recogmze 
This diagram can be further specified for a partic- 
ular natural language, where the required arguments 
are mapped to syntactic arguments and where lex- 
lcal rules for a particular language provide the link 
between the different semantic patterns for a seman- 
tic class The dmgram below is for English where 
6We g*ve the general diagram for CREATEINGEST events, as 
PREPAREFOOD is a subtype and will inherit all the properties 
of the semantic class CREATEINGEST 
The corpus can indeed Influence the way a lexi- 
con acqmrer will do the mapping So if a lexicon 
acqmrer creates an unspecffied entry (mapping fix 
on (A), as opposed to (B) or (C)), dynamic mecha- 
msms such as speclahzatlon or generahzatmn would 
enable the system to get to (B) and (C) from (A) 
and vice versa (to (A) from (B) or (C)) Moreover, 
ff a lexicon acqmrer decides to map to (B) instead of 
(C) or vine versa, then a lexmal rule (LR) between 
(B) and (C) will enable the system to go from (B) 
to (C) and wce versa 
In other words, although there are three poten- 
trolly different ways of writing the lexlcon entry for 
fix for example sentences (1), (2) and (3), these dif- 
ferent ways of encoding fix should remain a virtual 
difference at processing t~me the system must en- 
code mechanisms and rules to "interpret" and recon- 
cile the different points of wew of different acqmrels 
This enables the system to process sentences (1), (2) 
and (3) from any of the three p0tentml lexicon en- 
tries 
We beheve that an unportant msue m compu- 
tational semantics m to study how lexicon entries 
could be dynamically changed to fit different hngms- 
tic contexts and different acqmrers' analysis of the 
data This is what we plan to investigate m our 
future research 
Acknowledgments. 
This work has been supported m part by DoD un- 
der contract number MDA-904-92-C-5189 We are 
grateful to the Mikrokosmos team, and m partmu- 
lar Stephen Be'ale, Kavl Mahesh, Sergel Nlrenburg, 
Boyan Onyshkevych and Victor Raskm Without 
their input and the many d~scusmons with them 
it would not have been posmble for the author to 
develop the methodologms described here and su- 
pervise the acqmmtlon of the Mlkrokosmos lexi- 
cons However, the opinions expressed m th~s at- 
67 
hcle are those of the author and do not necessar- 
tly reflect their opmmns We would also hke to 
thank the anonymous rewewers for their useful com- 
ments Last but not least, we would hke to thank 
all the acqmrers who developed the lexicons over the 
years Oscar Cossm, Ron Dolan, Margarita Gon- 
zales, Wanymg Jm, Juhe Lonergan, Jeff Longwell, 
Maya, Jawer Ochoa, Armm Ruelas 

References 

Dale, R & C Melhsh (1998) Towards Evaluation m 
Natural Language Generahon In Proc of LRE, 
Granada, Spare 

Dang, H, Rosenzwelg, J & M Palmer (1997) As- 
sociating Semantic Components with Intersechve 
Levm Classes In MCCS-97-314, CRL, NMSU 

Dorr, B , Olsen, M & D Clark (1997) Using Word- 
Net to Point Hmrarchlcal Structure m Levm's Verb 
Classes In MCCS-97-314, CRL, NMSU 

Hovy, E (1988) Generating Natural Language Under 
Pragmatzc Constraznts L Erlbaum, Hlllsdale, NJ 

Kllgarrtff, A (1997) Sample the Lezzcon Techmcal 
Report, ITRI-97-01 

Levm, B (1993) Enghsh Verb Classes and Alterna- 
hons A Prehmmary Investigation Chmago Um- 
verslty of Chicago Press 

Mahesh, K (1996) Ontology Development Ideology 
and Methodology MCCS-96-292, CRL, NMSU 

Miller, G A (1990) Nouns m WordNet A Lexlcal 
Inheritance System In Internatzonal Journal o/ 
Lexicography 3 4 

Nlrenburg, S, J Carbonell, M Tomlta & K Good- 
man (1992) Machine Translatzon A Knowledge- 
Based Approach Morgan Kaufmann 

Nlrenburg, S & C Defnse (1991) Practmal Compu- 
tat!onal Llngulstlcs In Johnson & Rosner (eds) 
Computatzonal Lmguzstzcs and Formal Semantics 
CUP 

Samt-Dlzmr, P (1996) Verb semantm classes based 
on 'alternatmns' and on WordNet-hke seman- 
tic criteria A Powerful Convergence In Proc 
of Predzcatwe Forms m Lezzcal Semantzcs and 
LKBs, Toulouse 

Vmgas, E, Onyshkevych, B, Raskm, V & S Nlren- 
burg (1996) From Submzt to Submztted via Sub- 
mzsswn on Lexlcal Rules m Large-scale Lexicon 
Acqmmtmn In Proc of the 34th ACL, UCSC, CA 

Vmgas, E (1999) The Manifesto of Large-scale Se- 
mantra Lexicon Acqulsltmn In R Zajac (ed) 
TALN- Le Multdmguzsme, Vol 40-1 

Vmgas, E, Ruelas, A, Beale, S & S Nlrenburg 
(1998) Extending a Core Lexicon Using On-hne 
Language Resources with Savolr-Fmre In Proc 
of the lrst LRE, Granada, Spare 

Vmgas, E, Ruelas, A, Lonergan, J, Longwell, J, 
Beale, S and S Nlrenburg (1999) Developing a 
Large-scale Semantic LKB to Su~t an Intelligent 
Planner In Proc of the 7th ENLG Toulouse 
