SIMPLE - Semantic Information for Multifunctional Plurilingual
Lexica: Some Examples of Danish Concrete Nouns
Bolette Sandford Pedersen
Center for Sprogteknologi
Njalsgade 80
DK-2300 S Denmark
bolette@cst.ku.dk
Britt Keson
Det Danske Sprog- og Litteraturselskab
Christians Brygge 1, 1.
DK-1219 Copenhagen K Denmark
bntt_keson@lrct org 
Abstract 
SIMPLE is a large-scale European lexicon project
funded by the European Commission with the
participation of 12 European countries. The aim of
the project is to add harmonized semantic
information to the LE-PAROLE lexicons¹, which
contain morphological and syntactic information. In
this paper we present some examples of concrete
nouns from the Danish SIMPLE lexicon which
illustrate two central aspects of the SIMPLE model:
1) the expressive power of the Qualia Structure,
exemplified with a phenomenon relevant to a
Scandinavian language like Danish, namely the
representation of the internal structure of Danish
non-deverbal nominal compounds, and 2) the
representation of regular polysemy in the Danish
SIMPLE lexicon.
1 Introduction 
The SIMPLE model is primarily based on three
lexical frameworks (Lenci et al., 1998): The
Generative Lexicon (cf. Pustejovsky, 1995),
WordNet (cf. Miller and Fellbaum, 1991), and
EuroWordNet (cf. Vossen et al., 1998). The
basic underlying assumption in the model is that
word senses differ in terms of their internal
complexity. Hence the SIMPLE model consists
of three different semantic types: (i) simple
types, which can be characterized in terms of
monodimensional relations, (ii) unified types,
which involve multidimensional information,
and (iii) complex types, which identify regular
polysemous classes.

1 The LE-PAROLE lexicons contain 20,000 entries with
corresponding morphological and syntactic information for
each of the 12 languages that participated in this project,
which was also funded by the European Commission (cf.
Ruimy et al., 1998).

One of the basic tasks during the SIMPLE
lexicon encoding phase is the assignment of
semantic typing to the word senses to be
encoded (called the semantic units or SemUs).
A set of schematic structures called 'templates',
constituting the SIMPLE Ontology (consisting
of approx. 140 semantic types in all), guides this
encoding process. A template is a cohort of
different information types which is
primarily used by the lexicon encoder to express
the semantic type of a word sense, but also to
express its domain, definition, predicative
representation, argument structure, polysemous
classes, etc.
The multiple dimensions of meaning are
represented in SIMPLE by the use of the Qualia
Structure from the Generative Lexicon
(Pustejovsky 1995) to represent lexical meaning
expressed by means of orthogonal inheritance.
The Qualia Structure involves four different
roles: (i) the formal role, which provides
information that distinguishes an entity within a
larger set, (ii) the agentive role, which concerns
the origin of an entity, (iii) the telic role, which
concerns the typical function of an entity, and
(iv) the constitutive role, which expresses a
variety of relations concerning the internal
constitution of an entity. As an illustration,
consider in Figure 1 the meaning components
involved in the noun pudding.
Figure 1: The meaning components of pudding (Lenci et al. 1998, p. 17)

    pudding
      formal        substance
      constitutive  ingredients
      telic         eat
      agentive      make

The central meaning aspects are mirrored in the
linguistic contexts surrounding the word, so for
pudding we could have John refused the
pudding referring to the eat event, that's an easy
pudding referring to the make event, there is
pudding on the floor referring to the substance
dimension, and that was a nice bread pudding
referring to the ingredients of which it is made.
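
As an illustration only, the four qualia roles for pudding and the way each example context highlights one of them could be sketched in Python as follows. The class name, the function name and the context table are hypothetical and not part of the SIMPLE specification; assigning 'substance' to the formal role is an assumption based on the role definitions given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualia:
    """The four qualia roles of a word sense (illustrative only)."""
    formal: str        # what distinguishes the entity within a larger set
    constitutive: str  # internal constitution of the entity
    telic: str         # typical function of the entity
    agentive: str      # origin of the entity


# Role values for "pudding" as described in the text.
pudding = Qualia(formal="substance", constitutive="ingredients",
                 telic="eat", agentive="make")

# Hypothetical table: which role each example context brings out.
CONTEXT_ROLE = {
    "John refused the pudding": "telic",
    "that's an easy pudding": "agentive",
    "there is pudding on the floor": "formal",
    "that was a nice bread pudding": "constitutive",
}


def highlighted_value(sentence: str) -> str:
    """Return the qualia value that the given context refers to."""
    return getattr(pudding, CONTEXT_ROLE[sentence])
```

For instance, highlighted_value("that's an easy pudding") looks up the agentive role and returns the make event.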
As an example of the semantic types expressed
in the SIMPLE Ontology and of how the
different dimensions of meaning are involved
for each semantic type, consider Figure 2, which
shows a subset of the SIMPLE ontology
referring to human beings.
Figure 2: Subset of the SIMPLE Core Ontology representing human beings 
Some examples of word senses encoded as
simple types are russer (a Russian) under the
template type 'people', jøde (Jew) under 'ideo',
kusine (female cousin) under 'kinship', and ven
(friend) under 'social status'. These senses may
naturally involve additional dimensions of
meaning; however, they are not considered
type-defining in the SIMPLE model. In contrast,
word senses encoded under the emphasised
boxes above, such as borddame (female dinner
partner) under 'agent of temporary activity',
alkoholiker (an alcoholic) under 'agent of
persistent activity', and læge (doctor) under
'profession', are unified types. These are
identified by more than one coordinate in the
type hierarchy, since they involve more than one
type-defining dimension of meaning. Thus
'agent of temporary activity' also involves an
agentive role. For borddame this is defined by
the act of sitting next to someone at a dinner.
'Agent of persistent activity' and 'profession'
involve a telic role, which for alkoholiker is the
act of drinking, and for læge is the act of curing.
2 Danish nominal compounds denoting
containers: expressing their internal
structure via the Qualia Structure
The internal semantic structure of Danish
deverbal nominal compounds (e.g.
flisebelægning (flagging, lit. 'flagstone paving'))
can to some extent be identified by the argument
structure of the derived verb and the internal
ranking of its arguments (cf. Ørsnes, 1995). In
contrast, non-deverbal nominal compounds
generally display a more arbitrary internal
structure in Danish (cf. Paggio & Ørsnes 1993),
and hence they require a higher degree of
expressive power in the lexical entries. The
Qualia Structure as it is expressed in the
SIMPLE model provides a good basis for a
lexicalised encoding of Danish non-deverbal
nominal compounds, as for example nominal
compounds denoting containers.
The template type 'Container' belongs to the set
of templates constituting the SIMPLE Core
Ontology. It is a unified type that has the
unification path 'Concrete entity +
Artifact/Agentive + Telic'. This indicates that
the template denotes a kind of concrete entity,
and that it has been augmented with two kinds of
additional type-defining information: (i)
agentive information (namely that these concrete
entities are man-made artifacts), and (ii) telic
information (namely that these concrete entities
are used for a specific purpose: to contain
things).
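
A minimal sketch of how such a unification path could be read is given below, assuming that the coordinates of a path are separated by '+' as in the path quoted above. The function names are hypothetical; the sketch only illustrates that a unified type is identified by more than one coordinate, whereas a simple type has a single one.

```python
from __future__ import annotations


def parse_unification_path(path: str) -> list[str]:
    """Split a path such as 'A + B/C + D' into its coordinates."""
    return [part.strip() for part in path.split("+")]


def is_unified(path: str) -> bool:
    """A unified type is identified by more than one coordinate."""
    return len(parse_unification_path(path)) > 1


# Unification path of the 'Container' template, as given in the text.
CONTAINER_PATH = "Concrete entity + Artifact/Agentive + Telic"
```

Applied to CONTAINER_PATH, the parser yields the base coordinate 'Concrete entity' followed by the two additional type-defining dimensions.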
All Danish containers encoded under this
template type have been encoded with Danish
information about the formal role via an
is_a-hierarchy. As a default, beholder (container) is
chosen as hypernym for the Danish containers in
the is_a-hierarchy. Since containers are
(man-made) artifacts, the process of their creation is
specified via the agentive role. For Danish
containers this is the process fremstille (to
create). As is apparent from the unification path
for this template, containers are also encoded
with the type-defining telic information that
their function is to contain things. For Danish
containers this is specified with the verb
indeholde (to contain) in the encoding of the
telic role.
The Qualia Structure used in the SIMPLE model
is well-suited for capturing a majority of the
internal relationships for Danish non-deverbal
nominal compounds denoting containers. As an
example, the encoding of the meaning of a word
like bæger (cup) can be further augmented with
the telic role for such compounds as målebæger
(measuring cup), raflebæger (lit. cup for casting
dice = dice cup), or drikkebæger (drinking cup).
In the SIMPLE model, the encoding of the
meaning of bæger can also be further specified
by including information about the constitutive
role for other types of compounds. The
encodings can include additional information on
(i) the material of which the container is made,
e.g. plasticbæger (plastic cup), messingbæger
(brass cup), or papbæger (paper cup), or (ii)
what the container (prototypically) contains, e.g.
askebæger (ashtray), giftbæger (poisoned
chalice), or yoghurtbæger (yoghurt cup).
The following SGML-encoded examples from
the Danish SIMPLE lexicon² demonstrate the
use of the SIMPLE Qualia Structure to
distinguish between different types of compound
containers. The first example shows the
encoding of vinflaske (wine bottle). This is
shown to be a hyponym of flaske (bottle) in the
formal is_a-relationship. The encoding of
vinflaske differs from flaske by the additional
constitutive information expressing that it
(prototypically) contains vin (wine). In the
second example, the meaning of bærepose
(carrying bag) is linked to its hypernym pose
(bag) in the is_a-hierarchy. The encoding of
bærepose also contains the additional telic
information that its prototypical function is to
carry something (bære). In the last example, the
meaning of blikdåse (tin can) is shown to be a
hyponym of dåse (can) in the is_a-hierarchy.
The encoding of blikdåse also contains the
additional constitutive information that it is
made of blik (tin), whereas dåse would be
underspecified with respect to this role.

2 The SGML encodings displayed here are slightly
simplified versions of the actual Danish lexicon encodings.
The example sentences were taken from a composition of
several Danish corpora (45 mill. running words in all). The
definitions marked '(NDO)' were taken from the CD-ROM
edition of Politikens Store Nye Nudansk Ordbog (1997).

<SemU
  id="USEM_N_vinflaske_CON_1"
  naming="vinflaske" /wine bottle/
  example="en vinflaske kan genbruges syv til otte gange" /a wine bottle can be reused seven-eight times/
  freedefinition="flaske til vin" /a bottle for wine/
  weightvalsemfeaturel="
    WVSFTemplateContainerPROT
    WVSFUnificationPathConcreteentity-ArtifactAgentive-TelicPROT">
  <RWeightValSemU
    target="USEM_N_flaske_CON_1" /bottle/
    semr="SRIsa">
  <RWeightValSemU
    target="USEM_V_fremstille_1" /to produce/
    semr="SRCreatedby">
  <RWeightValSemU
    target="USEM_V_indeholde_1" /to contain/
    semr="SRUsedfor">
  <RWeightValSemU
    target="USEM_N_vin_ARD_1" /wine/
    semr="SRContains"> </SemU>
Additional constitutive information ('Contains')

<SemU
  id="USEM_N_bærepose_CON_1"
  naming="bærepose" /carrying bag/
  example="hav mindst en hel bærepose fuld parat til at sætte på bordet"
    /keep at least a whole carrying bag full ready to put on the table/
  freedefinition="en pose af papir, plastic el. stof med hanke som bruges til at bære fx købmandsvarer i (NDO)"
    /a bag with handles made of paper, plastic or cloth which is used to carry e.g. groceries (NDO)/
  weightvalsemfeaturel="
    WVSFTemplateContainerPROT
    WVSFUnificationPathConcreteentity-ArtifactAgentive-TelicPROT">
  <RWeightValSemU
    target="USEM_N_pose_CON_1" /bag/
    semr="SRIsa">
  <RWeightValSemU
    target="USEM_V_fremstille_1" /to produce/
    semr="SRCreatedby">
  <RWeightValSemU
    target="USEM_V_indeholde_1" /to contain/
    semr="SRUsedfor">
  <RWeightValSemU
    target="USEM_V_bære_1" /to carry/
    semr="SRUsedfor"> </SemU>
Additional telic information ('Usedfor')

<SemU
  id="USEM_N_blikdåse_CON_1"
  naming="blikdåse" /tin can/
  example="en urtepotteunderskål, hvori man omvendt har sat en tom blikdåse fyldes med vand"
    /a flower pot saucer in which one has placed an empty tin can upside down is then filled with water/
  freedefinition="dåse lavet af blik" /can made of tin/
  weightvalsemfeaturel="
    WVSFTemplateContainerPROT
    WVSFUnificationPathConcreteentity-ArtifactAgentive-TelicPROT">
  <RWeightValSemU
    target="USEM_N_dåse_CON_1" /can/
    semr="SRIsa">
  <RWeightValSemU
    target="USEM_V_fremstille_1" /to produce/
    semr="SRCreatedby">
  <RWeightValSemU
    target="USEM_V_indeholde_1" /to contain/
    semr="SRUsedfor">
  <RWeightValSemU
    target="USEM_N_blik_ARS_1" /tin/
    semr="SRMadeof"> </SemU>
Additional constitutive information ('Madeof')
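
To make the contrast between the three entries explicit, the following sketch compares the relations of each compound with the relations that, according to the text, all Danish containers share (created by fremstille, used for indeholde). It is an illustration under simplifying assumptions: the relations are reduced to (semr, lemma) pairs rather than full SemU identifiers, and the names CONTAINER_DEFAULTS, ENTRIES and additional_relations are hypothetical, not part of the SIMPLE encoding.

```python
# Relations shared by all Danish containers, as described in the text.
CONTAINER_DEFAULTS = {
    ("SRCreatedby", "fremstille"),
    ("SRUsedfor", "indeholde"),
}

# Relations of the three compound entries, reduced to (semr, lemma) pairs.
ENTRIES = {
    "vinflaske": {("SRIsa", "flaske"), ("SRCreatedby", "fremstille"),
                  ("SRUsedfor", "indeholde"), ("SRContains", "vin")},
    "bærepose": {("SRIsa", "pose"), ("SRCreatedby", "fremstille"),
                 ("SRUsedfor", "indeholde"), ("SRUsedfor", "bære")},
    "blikdåse": {("SRIsa", "dåse"), ("SRCreatedby", "fremstille"),
                 ("SRUsedfor", "indeholde"), ("SRMadeof", "blik")},
}


def additional_relations(word):
    """Relations beyond the hypernym link and the container defaults."""
    extra = ENTRIES[word] - CONTAINER_DEFAULTS
    return sorted(rel for rel in extra if rel[0] != "SRIsa")
```

Under these assumptions, each compound is left with exactly one extra relation, which corresponds to the label given after each entry.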
3 Regular polysemy in the Danish 
lexicon 
Regular polysemy - when groups of related
words display the same ambiguity - is handled in
a uniform way in the SIMPLE model via the
identification of a set of well-established regular
semantic classes, which are adjusted for each of
the languages involved. While unsystematic
ambiguous readings of a word are represented as
totally unrelated semantic units, regular
polysemous senses can be encoded as
interlinked semantic units. In the SIMPLE
model this is represented by an information slot
called complex, whose value is the polysemous
class to which the semantic unit belongs. This
strategy relates to Pustejovsky (1995), where
regular polysemous classes correspond to
complex types, which allows for an
underspecified semantic typing of word senses.
The solution adopted in SIMPLE is intended as a
first step towards the future development of
underspecified semantic types (Lenci et al.,
1998).
Empirically-based studies of regular
polysemous semantic classes of Danish are at
present very scarce (see however Boje &
Schøsler (ed.) (1992) pp. 11-12 and Braasch &
Pedersen (forthcoming) for some minor
considerations of regular polysemy in Danish
nouns, as well as Malmgren (1988) for an
extensive study of regular polysemy in Swedish,
a language which displays polysemous
behaviour very similar to Danish). This fact
underpins the need for corpus-oriented encoding
procedures, which have therefore been given a
central focus in the Danish SIMPLE
dictionary in the sense that each semantic
encoding is supported by corpus examinations³.
In the Danish lexicon the most productive cases
of regular polysemy involving concrete nouns
prove to be the following:
• animal / food
• geographical location / human group
• fruit / plant
• human group / institution
• semiotic artifact / information

3 We apply the corpus tool Xkwic on a composite corpus
of 44 mill. running words (consisting of 'Berlingske
Korpus', 'Bergenholtz Korpus' and 'Parole Korpus').

Other well-known polysemous pairs are not
productive in Danish, as for example 'people /
language' and 'flower / colour', where only a few
examples of each can be found. This difference
relates to the distinction made by Apresjan (apud
Malmgren, 1988) between productive and
regular polysemy. Here productive polysemy
refers to cases where more or less the whole
group of nouns within a semantic class displays
the same polysemy relations, whereas regular
polysemy refers to cases where at least two
words - but not the whole class - follow the
same polysemy pattern.
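
The distinction can be made concrete with a small sketch. The numeric threshold for "more or less the whole group" is not given in the sources cited here; the value 0.9 below is purely an assumption for illustration, as are the function name and the label 'not regular' for patterns shown by fewer than two words.

```python
def classify_polysemy(class_size: int, n_polysemous: int,
                      productive_ratio: float = 0.9) -> str:
    """Classify a polysemy pattern within one semantic class.

    class_size       -- number of nouns in the semantic class
    n_polysemous     -- number of those nouns showing the pattern
    productive_ratio -- assumed cut-off for 'more or less the whole class'
    """
    if class_size <= 0 or not (0 <= n_polysemous <= class_size):
        raise ValueError("need class_size > 0 and 0 <= n_polysemous <= class_size")
    if n_polysemous < 2:
        return "not regular"   # fewer than two words follow the pattern
    if n_polysemous / class_size >= productive_ratio:
        return "productive"    # (nearly) the whole class follows it
    return "regular"           # at least two words, but not the whole class
```

With counts obtained from a corpus, a pattern followed by only a few nouns of a class would be labelled regular, whereas a pattern followed by nearly all of them would be labelled productive.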
Below is shown an example of the semantic
encoding of a proper noun denoting a Danish
city, which belongs to the productive
'geographical location / human group' polysemy.
This example shows the encoding of the 'human
group' sense of the word. This can be seen in the
corpus example ('example'), the definition
('freedefinition'), and the kinds of qualia roles
encoded.
We believe that the corpus-oriented approach
used during the encoding of the Danish SIMPLE
lexicon facilitates the identification of new
polysemous classes, since the differences in
distributional patterns of the encoded word
senses are a good indication of whether a regular
polysemy relation could be involved.
<SemU
  id="USEM_N_Dragør_HUG_1"
  naming="Dragør" /Dragør - Danish city/
  example="Dragør må i år af med godt 31 mill. kr. til den kommunale udligning"
    /This year Dragør must pay approx. 31 mill. crowns to the community equalization/
  freedefinition="de mennesker der bor i Dragør" /The people living in Dragør/
  weightvalsemfeaturel="
    WVSFTemplateHumanGroup
    WVSFTemplateSuperTypeGroupPROT">
  <RWeightValSemU
    target="USEM_N_befolkning_HUG_1" /population/
    semr="SRIsa">
  <RWeightValSemU
    target="USEM_N_indbygger_HUM_1" /citizen/
    semr="SRHasasmember">
  <RWeightValSemU
    target="USEM_N_Dragør_GEO_1"
    semr="SRPolysemyHumanGroup-GeopoliticalLocation"> </SemU>
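
As a rough sketch of how interlinked semantic units of this kind could be followed programmatically, the example below stores the relations of the 'human group' unit and retrieves the unit it is linked to by a polysemy relation. The SemU class, the in-memory lexicon and the assumption that polysemy relations can be recognised by the prefix 'SRPolysemy' are illustrative choices, not part of the SIMPLE specification.

```python
from dataclasses import dataclass, field


@dataclass
class SemU:
    """A semantic unit with its outgoing (semr, target id) relations."""
    id: str
    naming: str
    relations: list = field(default_factory=list)


POLYSEMY_PREFIX = "SRPolysemy"  # assumed naming convention for such links


def polysemous_counterparts(unit, lexicon):
    """Return the units that `unit` points to via a polysemy relation."""
    return [lexicon[target] for semr, target in unit.relations
            if semr.startswith(POLYSEMY_PREFIX) and target in lexicon]


geo = SemU("USEM_N_Dragør_GEO_1", "Dragør")
hug = SemU("USEM_N_Dragør_HUG_1", "Dragør", [
    ("SRIsa", "USEM_N_befolkning_HUG_1"),
    ("SRHasasmember", "USEM_N_indbygger_HUM_1"),
    ("SRPolysemyHumanGroup-GeopoliticalLocation", "USEM_N_Dragør_GEO_1"),
])
lexicon = {unit.id: unit for unit in (geo, hug)}
```

Calling polysemous_counterparts(hug, lexicon) returns the geopolitical-location unit, while the hypernym and membership relations are ignored because they are not polysemy links.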
4 Conclusion 
Danish is a typical Scandinavian language with
respect to nominal compounding and patterns of
regular polysemy. In this paper we have
demonstrated how a large-scale, plurilingual,
and multifunctional lexicon project like SIMPLE
facilitates a flexible semantic encoding that can
capture universal semantic principles as well as
these language-specific characteristics.
Considering the current status of language
technology for a 'small' European language such
as Danish, the scope of the SIMPLE project
makes it a truly pioneering project. The
development of this harmonized large-scale
semantic lexicon for 12 European languages will
enable system developers to implement
sophisticated language technology that will also
encompass small European languages in the
future.

References 

Politikens Store Nudansk Ordbog på cd-rom, Version 2.1
(1997) Politikens Forlag, København.

Boje, F. & L. Schøsler (ed.) (1992) 'DISEM - A Semantic
MT-Component', in CST Working Papers no. 1, Center
for Sprogteknologi, Copenhagen.

Braasch, A. & B. Pedersen (forthcoming) 'En stor
sprogteknologisk ordbog for dansk - med særligt fokus
på håndtering af flertydighed i en niveaudelt ordbog', in
7. Møde om Udforskning af Dansk Sprog, Århus
University.

Lenci, A., F. Busa, N. Ruimy, E. Gola, M. Monachini, N.
Calzolari, A. Zampolli, E. Guimier, G. Recourcé, L.
Humphreys, U. Von Rekovsky, A. Ogonowski, C.
McCauley, W. Peters, I. Peters, M. Villegas (1998)
'Specifications', SIMPLE Work, Linguistic Deliverable
D2.1, Pisa.

Malmgren, S. (1988) 'On Regular Polysemy in Swedish',
in Studies in Computer-Aided Lexicography, Almqvist
& Wiksell, Stockholm.

Ruimy, N., O. Corazzari, E. Gola, A. Spanu, N. Calzolari,
A. Zampolli (1998) 'The European LE-PAROLE
Project: The Italian Syntactic Lexicon', in First
International Conference on Language Resources &
Evaluation, Granada, Spain.

Paggio, P. & B. Ørsnes (1993) 'Automatic translation of
nominal compounds: A case study of Danish and Italian',
in Rivista di Linguistica 5.1, pp. 129-156, Rosenberg &
Sellier, Torino.

Pustejovsky, J. (1995) The Generative Lexicon,
Cambridge, MA, The MIT Press.

Vossen, P., L. Bloksma, H. Rodriguez, S. Climent, A.
Roventini, F. Bertagna, A. Alonge, W. Peters (1998)
'The EuroWordNet Base Concepts and Top Ontology',
Deliverable D017, D034, D036, WP5, LE2-4003.

Ørsnes, B. (1995) The Derivation and Compounding of
Complex Event Nominals in Modern Danish. PhD
Dissertation, University of Copenhagen.
