GermaNet - a Lexical-Semantic Net for German 
Birgit Hamp and Helmut Feldweg 
Seminar fiir Sprachwissenschaft 
University of Tiibingen 
Germany 
email: {hamp, feldweg}~sfs, nphil, uni-tuebingen, de 
Abstract 
We present the lexical-semantic net for 
German "GermaNet" which integrates 
conceptual ontological information with 
lexical semantics, within and across word 
classes. It is compatible with the Prince- 
ton WordNet but integrates principle- 
based modifications on the construc- 
tional and organizational level as well as 
on the level of lexical and conceptual re- 
lations. GermaNet includes a new treat- 
ment of regular polysemy, artificial con- 
cepts and of particle verbs. It further- 
more encodes cross-classification and ba- 
sic syntactic information, constituting an 
interesting tool in exploring the interac- 
tion of syntax and semantics. The de- 
velopment of such a large scale resource 
is particularly important as German up 
to now lacks basic online tools for the se- 
mantic exploration of very large corpora. 
1 Introduction 
GermaNet is a broad-coverage lexical-semantic 
net for German which currently contains some 
16.000 words and alms at modeling at least the 
base vocabulary of German. It can be thought 
of as an online ontology in which meanings asso- 
ciated with words (so-called synsets) are grouped 
according to their semantic relatedness. The basic 
framework of GermaNet is similar to the Prince- 
ton WordNet (Miller et al., 1993), guarantee- 
ing maximal compatibility. Nevertheless some 
principle-based modifications have been applied. 
GermaNet is built from scratch, which means that 
it is neither a translation of the English Word- 
Net nor is it based on a single dictionary or the- 
saurus. The development of a German wordnet 
has the advantage that the applications devel- 
oped for English using WordNet as a resource can 
be used for German with only minor modifica- 
tions. This affects for example information extrac- 
tion, automatic sense disambiguation and intelli- 
gent document retrieval. Furthermore, GermaNet 
can serve as a training source for statistical meth- 
ods in natural language processing (NLP) and it 
makes future integration of German in multilin- 
gual resources such as EuroWordNet (Bloksma et 
al., 1996) possible. 
This paper gives an overview of the resource 
situation, followed by sections on the coverage of 
the net and the basic relations used for linkage 
of lexical and conceptual items. The main part 
of the paper is concerned with the construction 
principles of GermaNet and particular features of 
each of the word classes. 
2 Resources and Modeling 
Methods 
In English a variety of large-scale online linguistic 
resources are available. The application of these 
resources is essential for various NLP tasks in re- 
ducing time effort and error rate, as well as guar- 
anteeing a broader and more domain-independent 
coverage. The resources are typically put to use 
for the creation of consistent and large lexical 
databases for parsing and machine translation as 
well as for the treatment of lexical, syntactic and 
semantic ambiguity. Furthermore, linguistic re- 
sources are becoming increasingly important as 
training and evaluation material for statistical 
methods. 
In German, however, not many large-scale 
monolingual resources are publically available 
which can aid the building of a semantic net. The 
particular resource situation for German makes 
it necessary to rely to a large extent on manual 
labour for the creation process of a wordnet, based 
on monolingual general and specialist dictionaries 
and literature, as well as comparisons with the 
English WordNet. However, we take a strongly 
9 
corpus-based approach by determining the base 
vocabulary modeled in GermaNet by lemmatized 
frequency lists from text corpora x. This list is fur- 
ther tuned by using other available sources such as 
the CELEX German database. Clustering meth- 
ods, which in principle can apply to large corpora 
without requiring any further information in order 
to give similar words as output, proved to be inter- 
esting but not helpful for the construction of the 
core net. Selectional restrictions of verbs for nouns 
will, however, be automatically extracted by clus- 
tering methods. We use the Princeton Word- 
Net technology for the database format, database 
compilation, as well as the Princeton WordNet in- 
terface, applying extensions only where necessary. 
This results in maximal compatibility. 
3 Implementation 
3.1 Coverage 
GermaNet shares the basic database division into 
the four word classes noun, adjective, verb, and 
adverb with WordNet, although adverbs are not 
implemented in the current working phase. 
For each of the word classes the semantic space 
is divided into some 15 semantic fields. The pur- 
pose of this division is mainly of an organizational 
nature: it allows to split the work into packages. 
Naturally, the semantic fields are closely related 
to major nodes in the semantic network. How- 
ever, they do not have to agree completely with 
the net's top-level ontology, since a lexicographer 
can always include relations across these fields and 
the division into fields is normally not shown to 
the user by the interface software. 
GermaNet only implements lemmas. We as- 
sume that inflected forms are mapped to base 
forms by an external morphological analyzer 
(which might be integrated into an interface to 
GermaNet). In general, proper names and ab- 
breviations are not integrated, even though the 
lexicographer may do so for important and fre- 
quent cases. Frequency counts from text corpora 
serve as a guideline for the inclusion of lemmas. In 
the current version of the database multi-word ex- 
pressions are only covered occasionaly for proper 
names (Olympische Spiele) and terminological ex- 
pressions (weifles Blutk6rperchen). Derivates and 
a large number of high frequent German com- 
pounds are coded manually, making frequent use 
1We have access to a large tagged and lemma- 
tized online corpus of 60.000.000 words, compris- 
ing the ECI-corpus (1994) (Frankfurter Rundschau, Danau-Kumer, VDI Nachr~chten) 
and the T~b,nger NewsKorpus, 
consisting of texts collected m Tfibingen 
from electronic newsgroups. 
of cross-classification. An implementation of a 
more suitable rule-based classification of derivates 
and the unlimited number of semantically trans- 
parent compounds fails due to the lack of algo- 
rithms for their sound semantic classification. The 
amount of polysemy is kept to a minimum in Ger- 
manet, an additional sense of a word is only intro- 
duced if it conflicts with the coordinates of other 
senses of the word in the network. When in doubt, 
GermaNet refers to the degree of polysemy given 
in standard monolingual print dictionaries. Addi- 
tionally, GermaNet makes use of systematic cross- 
classification. 
3.2 Relations 
Two basic types of relations can be distinguished: 
lexlcal relations which hold between different 
lexical realizations of concepts, and conceptual 
relations which hold between different concepts 
in all their particular realizations. 
Synonymy and antonymy are bidirectional 
lexical relations holding for all word classes. All 
other relations (except for the 'pertains to' rela- 
tion) are conceptual relations. An example for 
synonymy are torkeln and taumeln, which both 
express the concept of the same particular lurch- 
ing motion. An example for antonymy are the 
adjectives kalt (cold) and warm (warm). These 
two relations are implemented and interpreted in 
GermaNet as in WordNet. 
The relation pertains to relates denominal ad- 
jectives with their nominal base (finanzzell 'finan- 
cial' with Finanzen 'finances'), deverbal nominal- 
izations with their verbal base (Entdeckung 'dis- 
covery' with entdecken 'discover') and deadjecti- 
val nominalizations with their respective adjecti- 
val base (Mi~digkeit 'tiredness' with miide 'tired'). 
This pointer is semantic and not morphological 
in nature because different morphological realiza- 
tions can be used to denote derivations from dif- 
ferent meanings of the same lemma (e.g. konven- 
tionell is related to Konvention (Regeln des Urn- 
gangs) (social rule), while konventzonal is related 
to Konvention Ouristiseher Text) (agreement). 
The relation of hyponymy ('is-a') holds for all 
word classes and is implemented in GermaNet as 
in WordNet, so for example Rotkehlchen (robin) 
is a hyponym of Vogel (bird). 
Meronymy ('has-a'), the part-whole rela- 
tion, holds only for nouns and is subdivided 
into three relations in WordNet (component- 
relation, member-relation, stuff-relation). Get- 
maNet, however, currently assumes only one basic 
meronymy relation. An example for meronymy is 
Arm (arm) standing in the meronymy relation to 
KSrper (body). 
10 
For verbs, WordNet makes the assumption that 
the relation of entailment holds in two differ- 
ent situations. (i) In cases of 'temporal inclusion' 
of two events as in schnarchen (snoring) entailing 
schlafen (sleeping). (ii) In cases without tempo- 
ral inclusion as in what Fellbaum (1993, 19) calls 
'backward presupposition', holding between gelin- 
gen (succeed) and versuchen (try). However, these 
two cases are quite distinct from each other, justi- 
fying their separation into two different relations 
in GermaNet. The relation of entailment is kept 
for the case of backward presupposition. Follow- 
ing a suggestion made in EuroWordNet (Alonge, 
1996, 43), we distinguish temporal inclusion by 
its characteristics that the first event is always a 
subevent of the second, and thus the relation is 
called subevent relation. 
The cause relation in WordNet is restricted to 
hold between verbs. We extend its coverage to 
account for resultative verbs by connecting the 
verb to its adjectival resultative state. For ex- 
ample 5When (to open) causes often (open). 
Selectional restrictions, giving information 
about typical nominal arguments for verbs and 
adjectives, are additionally implemented. They do 
not exist in WordNet even though their existence 
is claimed to be important to fully characterize a 
verbs lexical behavior (Fellbaum, 1993, 28). These 
selectional properties will be generated automat- 
ically by clustering methods once a sense-tagged 
corpus with GermaNet classes is available. 
Another additional pointer is created to account 
for regular polysemy in an elegant and efficient 
way, marking potential regular polysemy at a very 
high level and thus avoiding duplication of entries 
and time-consuming work (c.f. section 5.1). 
As opposed to WordNet, connectivity between 
word classes is a strong point of GermaNet. This 
is achieved in different ways: The cross-class rela- 
tions ('pertains to') of WordNet are used more fre- 
quently. Certain WordNet relations are modified 
to cross word classes (verbs are allowed to 'cause' 
adjectives) and new cross-class relations are in- 
troduced (e.g. 'selectional restrictions'). Cross- 
class relations are particularly important as the 
expression of one concept is often not restricted 
to a single word class. 
Additionally, the final version will contain ex- 
amples for each concept which are to be automat- 
ically extracted from the corpus. 
4 Guiding Principles 
Some of the guiding principles of the GermaNet 
ontology creation are different from WordNet and 
therefore now explained. 
4.1 Artificial Concepts 
WordNet does contain artificial concepts, that is 
non-lexicaiized concepts. However, they are nei- 
ther marked nor put to systematic use nor even 
exactly defined. In contrast, GermaNet enforces 
the systematic usage of artificial concepts and es- 
pecially marks them by a "?'. Thus they can be 
cut out on the interface level if the user wishes 
so. We encode two different sorts of artificial con- 
cepts: (i) lexical gaps which are of a conceptual 
nature, meaning that they can be expected to be 
expressed in other languages (see figure 2) and (ii) 
proper artificial concepts (see figure 3). 2 Advan- 
tages of artificial concepts are the avoidance of un- 
motivated co-hyponyms and a systematic struc- 
turing of the data. See the following examples: 
In figure 1 noble man is a co-hyponym to the 
other three hyponyms of human, even though the 
first three are related to a certain education and 
noble man refers to a state a person is in from birth 
on. This intuition is modeled in figure 2 with the 
additional artificial concept feducated human. 
Imaslor c~ 
Figure 1: Without Artifical Concepts 
Figure 2: Lexical Gaps 
In figure 3, all concepts except for the leaves 
are proper artificial concepts. That is, one would 
not expect any language to explicitly verbalize the 
concept of for example manner of motion verbs 
which specify the specific instrument used. Nev- 
ertheless such a structuring is important because 
~Note that these are not notationally distinguished 
up to now; this still needs to be added. 
ll 
it captures semantic intuitions every speaker of 
German has and it groups verbs according to their 
semantic relatedness. 
4.2 Cross-Classification 
Contrary to WordNet, GermaNet enforces the use 
of cross-classification whenever two conflicting hi- 
erarchies apply. This becomes important for ex- 
ample in the classification of animals, where folk 
and specialized biological hierarchy compete on 
a large scale. By cross-classifying between these 
two hierarchies the taxonomy becomes more ac- 
cessible and integrates different semantic compo- 
nents which are essential to the meaning of the 
concepts. For example, in figure 4 the concept of 
a cat is shown to biologically be a vertebrate, and 
a pet in the folk hierarchy, whereas a whale is only 
a vertebrate and not a pet. 
Figure 4: Cross-Classification 
The concept of cross-classification is of great 
importance in the verbal domain as well, where 
most concepts have several meaning components 
according to which they could be classified. How- 
ever, relevant information would be lost if only 
one particular aspect was chosen with respect to 
hyponymy. Verbs of sound for example form a 
distinct semantic class (Levin et al., in press), the 
members of which differ with respect to additional 
verb classes with which they cross-classify, in En- 
glish as in German. According to Levin (in press, 
7), some can be used as verbs of motion accompa- 
nied by sound ( A train rumbled across the loop- 
line bridge.), others as verbs of introducing direct 
speech (Annabel squeaked, "Why can't you stay 
with us?") or verbs expressing the causation of 
the emission of a sound (He crackled the news- 
paper, folding it carelessly). Systematic cross- 
classification allows to capture this fine-grained 
distinction easily and in a principle-based way. 
5 Individual Word Classes 
5.1 Nouns 
With respect to nouns the treatment of regular 
polysemy in GermaNet deserves special atten- 
tion. 
A number of proposals have been made for the 
representation of regular polysemy in the lexicon. 
It is generally agreed that a pure sense enumera- 
tion approach is not sufficient. Instead, the differ- 
ent senses of a regularly polysemous word need to 
be treated in a more principle-based manner (see 
for example Pustejovsky (1996)). 
GermaNet is facing the problem that lexical en- 
tries are integrated in an ontology with strict in- 
heritance rules. This implies that any notion of 
regular polysemy must obey the rules of inheri- 
tance. It furthermore prohibits joint polysemous 
entries with dependencies from applying for only 
one aspect of a polysemous entry. 
A familiar type of regular polysemy is the "or- 
ganization - building it occupies" polysemy. Ger- 
maNet lists synonyms along with each concept. 
Therefore it is not possible to merge such a type 
of polysemy into one concept and use cross- 
classification to point to both, institution and 
buil&ng as in figure 5. This is only possible if 
all synonyms of both senses and all their depen- 
dent nodes in the hierarchy share the same regular 
polysemy, which is hardly ever the case. 
lartlfact I Iorganizativnl 
I 1 If, ilityl lin,titutio. I 
Figure 5: Regular Polysemy 
Cross-Classification 
as 
To allow for regular polysemy, GermaNet in- 
troduces a special bidirectional relator which is 
placed to the top concepts for which the regular 
polysemy holds (c.f. figure 6). 
In figure 6 the entry bank1 (a financial institu- 
tzon that accepts depossts and channels the money 
into lending activities) may have the synonyms 
depository financial institution, banking concern, 
12 
i~manner of motion I 
[ '~iter'atlve I I?spoed I lTin..trumentl I'~sound I'general 
.o- oo-- .0 
Figure 3: Proper Artificial Concepts 
Ior  za"on I F 
regular polysemy pointer 
I 
I 
i depository financial institufionl 
bankl, banking concern, \[ 
banking company | 
L I 
I I I 
I 
Figure 6: Regular Polysemy Pointer 
banking company, which are not synonyms of 
banks (a building in which commercial banking 
is transacted). In addition, bankl may have hy- 
ponyms such as credit union, agent bank, commer- 
cial bank, full service bank, which do not share the 
regular polysemy of bank1 and banks. 
Statistically frequent cases of regular polysemy 
are manually and explicitly encoded in the net. 
This is necessary because they often really are 
two separate concepts (as in pork, pig) and each 
sense may have different synonyms (pork meat is 
only synonym to pork). However, the polysemy 
pointer additionally allows the recognition of sta- 
tistically infrequent uses of a word sense created 
by regular polysemy. So for example the sentence 
I had crocodile for lunch is very infrequent in that 
crocodile is no t commonly perceived as meat but 
only as animal. Nevertheless we know that a reg- 
ular polysemy exists between meat and animal. 
Therefore we can reconstruct via the regular pol- 
ysemy pointer that the meat sense is referred to 
in this particular sentence even though it is not 
explicitly encoded. Thus the pointer can be con- 
ceived of as an implementation of a simple default 
via which the net can account for language pro- 
ductivity and regularity in an effective manner. 
5.2 Adjectives 
Adjectives in GermaNet are modeled in a tax- 
onomical manner making heavy use of the hy- 
ponymy relation, which is very different from the 
satellite approach taken in WordNet. Our ap- 
proach avoids the rather fuzzy concept of indi- 
rect antonyms introduced by WordNet. Addition- 
ally we do not introduce artificial antonyms as 
WordNet does (pregnant, unpregnant). The taxo- 
13 
nomical classes follow (Hundsnurscher and Splett, 
1982) with an additional class for pertainyms 3. 
5.3 Verbs 
Syntactic frames and particle verbs deserve spe- 
cial attention in the verbal domain. The frames 
used in GermaNet differ from those in WordNet, 
and particle verbs as such are treated in WordNet 
at all. 
Each verb sense is linked to one or more syntac- 
tic frames which are encoded on a lexical rather 
than on a conceptual level. The frames used 
in GermaNet are based on the complementation 
codes provided by CELEX (Burnage, 1995). The 
notation in GermaNet differs from the CELEX 
database in providing a notation for the subject 
and a complementation code for Obligatory reflex- 
ive phrases. GermaNet provides frames for verb 
senses, rather than for lemmas, implying a full 
disambiguation of the CELEX complementation 
codes for GermaNet. 
Syntactic information in GermaNet differs from 
that given in WordNet in several ways. It marks 
expletive subjects and reflexives explicitly, en- 
codes case information, which is especially impor- 
tant in German, distinguishes between different 
realizations of prepositional and adverbial phrases 
and marks to-infinitival as well as pure infinitival 
complements explicitly. 
Particles pose a particular problem in German. 
They are very productive, which would lead to an 
explosion of entries if each particle verb was ex- 
plicitly encoded. Some particles establish a regu- 
lar semantic pattern which can not be accounted 
for by a simple enumeration approach, whereas 
others are very irregular and ambiguous. We 
therefore propose a mixed approach, treating ir- 
regular particle verbs by enumeration and regular 
particle verbs in a compositional manner. Com- 
position can be thought of as a default which can 
be overwritten by explicit entries in the database. 
We assume a morphological component such as 
GERTWOL (1996) to apply before the composi- 
tional process starts. Composition itself is imple- 
mented as follows, relying on a separate lexicon 
for particles. The particle lexicon is hierarchically 
structured and lists selectional restrictions with 
respect to the base verb selected. An example for 
the hierarchical structure is given in figure 7 (with- 
out selectional restrictions for matters of simplic- 
ity), where heraus- is a hyponym of her- and aus-. 
SAdjectives pertaining to a noun from which they 
derive their meaning (financial, finances). 
Selectional restrictions for particles include Ak- 
tionsart, a particular semantic verb field, deictic 
orientation and directional orientation of the base 
verb. 
The evaluation of a particle verb takes the fol- 
lowing steps. First, GermaNet is searched for an 
explicit entry of the particle verb. If no such entry 
exists the verb is morphologically analyzed and its 
semantics is compositionally determined. For ex- 
ample the particle verb herauslau\]en in figure7 is 
a hyponym to lau\]en (walk) as well as to heraus-. 
Criteria for a compositional treatment are sep- 
arability, productivity and a regular semantics 
of the particle (see Fleischer and Barz (1992), 
Stiebels (1994), Stegmann (1996)). 
6 Conclusion 
A wordnet for German has been described which, 
compared with the Princeton WordNet, integrates 
principle-based modifications and extensions on 
the constructional and organizational level as well 
as on the level of lexical and conceptual relations. 
Innovative features of GermaNet are a new treat- 
ment of regular polysemy and of particle verbs, 
as well as a principle-based encoding of cross- 
classification and artificial concepts. As com- 
patibility with the Princeton WordNet and Eu- 
roWordNet is a major construction criteria of Ger- 
maNet, German can now, finally, be integrated 
into multilingual large-scale projects based on on- 
tological and conceptual information. This con- 
stitutes an important step towards the design of 
truly multilingual tools applicable in key areas 
such as information retrieval and intelligent In- 
ternet search engines. 

References 
A. Alonge. 1996. Definition of the links and sub- 
sets for verbs. Deliverable D006, WP4.1, Eu- 
roWordNet, LE2-4003. 
L. Bloksma, P. Diez-Orzas, and P. Vossen. 1996. 
User Requirements and Functional Specifica- 
tion of the EuroWordNet Project. Deliverable 
D001, WP1, EuroWordNet, LE2-4003. 
G. Burnage, 1995. The CELEX Lexical Database, 
Release 2. CELEX - Centre for Lexical In- 
formation; Max Planck Institute for Psycholin- 
guistics, The Netherlands. 
C. Fellbaum. 1993. A Semantic Network of 
English Verbs. In G.A. Miller, It. Beckwith, 
C. Fellbaum, D. Gross, and K. Miller, editors, 
Five Papers on WordNet. August. Revised ver- 
sion. 
Wolfgang Fleischer and Irmhild Barz. 1992. 
Wortbildung der deutsehen Gegenwartssprache. 
Max Niemeyer Verlag, Tiibingen. 
GERTWOL. 1996. German morphological anal- 
yser. http://www.lingsoft.fi/doc/gertwol/. 
Franz Hundsnurscher and Jochen Splett. 1982. 
Semantik der Adjektive des Deutschen: Analyse 
der semantisehen Relationen. Westdeutscher 
Verlag, Opladen. 
European Corpus Initiative. 1994. European Cor- 
pus Initiative Multilingual Corpus. 
B. Levin, G. Song, and B.T.S. Atkins. in press. 
Making sense of corpus data: A case study of 
verbs of sound. International Journal o\] Corpus 
Linguistics, page 41 pages. 
G.A. Miller, R. Beckwith, C. Fellbanm, D. Gross, 
and K. Miller. 1993. Five Papers on WordNet. 
Technical report, Cognitive Science Laboratory, 
Princeton University, August. Revised version. 
James Pustejovsky. 1996. The Generative Lexi- 
con. MIT Press. 
Rosmary Stegmann. 1996. Semantic Analysis 
and Classification of Verbs of Direction. MA- 
Thesis. Seminar ffir Sprachwissenschaft, Uni- 
versit/it Tfibingen. 
Barbara Stiebels. 1994. Lexikalische Argumente 
und Adjunkte. Zum semantischen Beitrag von 
verbalen Priifixen und Partikeln. Disser- 
tation. Philosophische Fakult~t, Universit/it 
Dfisseldorf. 
