<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0802">
  <Title>GermaNet - a Lexical-Semantic Net for German</Title>
  <Section position="4" start_page="9" end_page="10" type="metho">
    <SectionTitle>
3 Implementation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="9" end_page="9" type="sub_section">
      <SectionTitle>
3.1 Coverage
</SectionTitle>
      <Paragraph position="0"> GermaNet shares the basic database division into the four word classes noun, adjective, verb, and adverb with WordNet, although adverbs are not implemented in the current working phase.</Paragraph>
      <Paragraph position="1"> For each of the word classes the semantic space is divided into some 15 semantic fields. The purpose of this division is mainly organizational: it allows the work to be split into packages.</Paragraph>
      <Paragraph position="2"> Naturally, the semantic fields are closely related to major nodes in the semantic network. However, they do not have to agree completely with the net's top-level ontology, since a lexicographer can always include relations across these fields and the division into fields is normally not shown to the user by the interface software.</Paragraph>
      <Paragraph position="3"> GermaNet only implements lemmas. We assume that inflected forms are mapped to base forms by an external morphological analyzer (which might be integrated into an interface to GermaNet). In general, proper names and abbreviations are not integrated, although the lexicographer may include them in important and frequent cases. Frequency counts from text corpora [1] serve as a guideline for the inclusion of lemmas. In the current version of the database multi-word expressions are only covered occasionally, for proper names (Olympische Spiele) and terminological expressions (weißes Blutkörperchen). Derivatives and a large number of highly frequent German compounds are coded manually, making frequent use of cross-classification. [1] We have access to a large tagged and lemmatized online corpus of 60,000,000 words, comprising the ECI corpus (1994) (Frankfurter Rundschau, Donau-Kurier, VDI Nachrichten) and the Tübinger NewsKorpus, consisting of texts collected in Tübingen from electronic newsgroups.</Paragraph>
      <Paragraph position="4"> A more suitable rule-based classification of derivatives and of the unlimited number of semantically transparent compounds is currently not possible, due to the lack of algorithms for their sound semantic classification. The amount of polysemy is kept to a minimum in GermaNet: an additional sense of a word is only introduced if it conflicts with the coordinates of other senses of the word in the network. When in doubt, GermaNet refers to the degree of polysemy given in standard monolingual print dictionaries. Additionally, GermaNet makes use of systematic cross-classification.</Paragraph>
    </Section>
    <Section position="2" start_page="9" end_page="10" type="sub_section">
      <SectionTitle>
3.2 Relations
</SectionTitle>
      <Paragraph position="0"> Two basic types of relations can be distinguished: lexical relations, which hold between different lexical realizations of concepts, and conceptual relations, which hold between different concepts in all their particular realizations.</Paragraph>
      <Paragraph position="1"> Synonymy and antonymy are bidirectional lexical relations holding for all word classes. All other relations (except for the 'pertains to' relation) are conceptual relations. An example of synonymy is the pair torkeln and taumeln, which both express the concept of the same particular lurching motion. An example of antonymy is the pair of adjectives kalt (cold) and warm (warm). These two relations are implemented and interpreted in GermaNet as in WordNet.</Paragraph>
      <Paragraph position="2"> The relation 'pertains to' relates denominal adjectives with their nominal base (finanziell 'financial' with Finanzen 'finances'), deverbal nominalizations with their verbal base (Entdeckung 'discovery' with entdecken 'discover') and deadjectival nominalizations with their respective adjectival base (Müdigkeit 'tiredness' with müde 'tired'). This pointer is semantic and not morphological in nature because different morphological realizations can be used to denote derivations from different meanings of the same lemma (e.g. konventionell is related to Konvention (Regeln des Umgangs) (social rule), while konventional is related to Konvention (juristischer Text) (agreement)).</Paragraph>
      <Paragraph position="3"> The relation of hyponymy ('is-a') holds for all word classes and is implemented in GermaNet as in WordNet, so for example Rotkehlchen (robin) is a hyponym of Vogel (bird).</Paragraph>
      <Paragraph position="4"> Meronymy ('has-a'), the part-whole relation, holds only for nouns and is subdivided into three relations in WordNet (component relation, member relation, stuff relation). GermaNet, however, currently assumes only one basic meronymy relation. An example of meronymy is Arm (arm) standing in the meronymy relation to Körper (body).</Paragraph>
      <Paragraph position="5"> For verbs, WordNet makes the assumption that the relation of entailment holds in two different situations: (i) in cases of 'temporal inclusion' of two events, as in schnarchen (snoring) entailing schlafen (sleeping); (ii) in cases without temporal inclusion, as in what Fellbaum (1993, 19) calls 'backward presupposition', holding between gelingen (succeed) and versuchen (try). However, these two cases are quite distinct from each other, justifying their separation into two different relations in GermaNet. The relation of entailment is kept for the case of backward presupposition. Following a suggestion made in EuroWordNet (Alonge, 1996, 43), we distinguish temporal inclusion by its characteristic that the first event is always a subevent of the second, and thus the relation is called the subevent relation.</Paragraph>
      <Paragraph position="6"> The cause relation in WordNet is restricted to hold between verbs. We extend its coverage to account for resultative verbs by connecting the verb to its adjectival resultative state. For example, öffnen (to open) causes offen (open).</Paragraph>
      <Paragraph position="7"> Selectional restrictions, giving information about typical nominal arguments for verbs and adjectives, are additionally implemented. They do not exist in WordNet even though their existence is claimed to be important to fully characterize a verb's lexical behavior (Fellbaum, 1993, 28). These selectional properties will be generated automatically by clustering methods once a sense-tagged corpus with GermaNet classes is available.</Paragraph>
      <Paragraph position="8"> A further pointer is created to account for regular polysemy in an elegant and efficient way, marking potential regular polysemy at a very high level and thus avoiding duplication of entries and time-consuming work (cf. section 5.1).</Paragraph>
      <Paragraph position="9"> As opposed to WordNet, connectivity between word classes is a strong point of GermaNet. This is achieved in different ways: The cross-class relations ('pertains to') of WordNet are used more frequently. Certain WordNet relations are modified to cross word classes (verbs are allowed to 'cause' adjectives) and new cross-class relations are introduced (e.g. 'selectional restrictions'). Cross-class relations are particularly important as the expression of one concept is often not restricted to a single word class.</Paragraph>
      <Paragraph position="10"> Additionally, the final version will contain examples for each concept which are to be automatically extracted from the corpus.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="10" end_page="10" type="metho">
    <SectionTitle>
4 Guiding Principles
</SectionTitle>
    <Paragraph position="0"> Some of the guiding principles of the GermaNet ontology creation differ from those of WordNet and are therefore explained here.</Paragraph>
    <Section position="1" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
4.1 Artificial Concepts
</SectionTitle>
      <Paragraph position="0"> WordNet does contain artificial concepts, that is, non-lexicalized concepts. However, they are neither marked nor put to systematic use nor even exactly defined. In contrast, GermaNet enforces the systematic usage of artificial concepts and specifically marks them by a &quot;?&quot;. Thus they can be cut out on the interface level if the user so wishes. We encode two different sorts of artificial concepts: (i) lexical gaps which are of a conceptual nature, meaning that they can be expected to be expressed in other languages (see figure 2) and (ii) proper artificial concepts (see figure 3) [2]. Advantages of artificial concepts are the avoidance of unmotivated co-hyponyms and a systematic structuring of the data. See the following examples: In figure 1 noble man is a co-hyponym to the other three hyponyms of human, even though the first three are related to a certain education and noble man refers to a state a person is in from birth on. This intuition is modeled in figure 2 with the additional artificial concept ?educated human. [2] Note that these are not notationally distinguished up to now; this still needs to be added.</Paragraph>
      <Paragraph position="1"> In figure 3, all concepts except for the leaves are proper artificial concepts. That is, one would not expect any language to explicitly verbalize the concept of, for example, manner of motion verbs which specify the specific instrument used.</Paragraph>
      <Paragraph position="2"> Nevertheless such a structuring is important because it captures semantic intuitions every speaker of German has and it groups verbs according to their semantic relatedness.</Paragraph>
    </Section>
    <Section position="2" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
4.2 Cross-Classification
</SectionTitle>
      <Paragraph position="0"> Contrary to WordNet, GermaNet enforces the use of cross-classification whenever two conflicting hierarchies apply. This becomes important for example in the classification of animals, where folk and specialized biological hierarchies compete on a large scale. By cross-classifying between these two hierarchies the taxonomy becomes more accessible and integrates different semantic components which are essential to the meaning of the concepts. For example, in figure 4 the concept of a cat is shown to be a vertebrate in the biological hierarchy and a pet in the folk hierarchy, whereas a whale is only a vertebrate. The concept of cross-classification is of great importance in the verbal domain as well, where most concepts have several meaning components according to which they could be classified. However, relevant information would be lost if only one particular aspect were chosen with respect to hyponymy. Verbs of sound for example form a distinct semantic class (Levin et al., in press), the members of which differ with respect to additional verb classes with which they cross-classify, in English as in German. According to Levin (in press, 7), some can be used as verbs of motion accompanied by sound (A train rumbled across the loopline bridge.), others as verbs introducing direct speech (Annabel squeaked, &quot;Why can't you stay with us?&quot;) or verbs expressing the causation of the emission of a sound (He crackled the newspaper, folding it carelessly). Systematic cross-classification makes it possible to capture this fine-grained distinction easily and in a principle-based way.</Paragraph>
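      <Paragraph position="1"> As a minimal sketch of what cross-classification means for the data structure, the cat and whale example can be modeled by letting a concept carry more than one direct hypernym. The dictionary layout, the English labels, and the function name all_hypernyms are assumptions made for illustration; they are not GermaNet's actual database format.

```python
from collections import deque

# Hypothetical toy data: each concept maps to the set of its direct hypernyms.
# Allowing more than one hypernym per concept is what cross-classification
# amounts to in this sketch.
HYPERNYMS = {
    "cat": {"vertebrate", "pet"},   # biological and folk hierarchy
    "whale": {"vertebrate"},        # biological hierarchy only
    "vertebrate": set(),
    "pet": set(),
}


def all_hypernyms(concept, hypernyms=HYPERNYMS):
    """Return every concept reachable from `concept` via hypernym links."""
    seen = set()
    queue = deque(hypernyms.get(concept, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a node can be reached along several paths
        seen.add(current)
        queue.extend(hypernyms.get(current, ()))
    return seen
```

In this toy hierarchy, all_hypernyms("cat") yields both vertebrate and pet, while all_hypernyms("whale") yields vertebrate only.</Paragraph>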
    </Section>
  </Section>
  <Section position="6" start_page="10" end_page="13" type="metho">
    <SectionTitle>
5 Individual Word Classes
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="10" end_page="12" type="sub_section">
      <SectionTitle>
5.1 Nouns
</SectionTitle>
      <Paragraph position="0"> With respect to nouns the treatment of regular polysemy in GermaNet deserves special attention. A number of proposals have been made for the representation of regular polysemy in the lexicon. It is generally agreed that a pure sense enumeration approach is not sufficient. Instead, the different senses of a regularly polysemous word need to be treated in a more principle-based manner (see for example Pustejovsky (1996)).</Paragraph>
      <Paragraph position="1"> GermaNet is facing the problem that lexical entries are integrated in an ontology with strict inheritance rules. This implies that any notion of regular polysemy must obey the rules of inheritance. It furthermore prohibits joint polysemous entries with dependencies from applying for only one aspect of a polysemous entry.</Paragraph>
      <Paragraph position="2"> A familiar type of regular polysemy is the &quot;organization - building it occupies&quot; polysemy. GermaNet lists synonyms along with each concept.</Paragraph>
      <Paragraph position="3"> Therefore it is not possible to merge such a type of polysemy into one concept and use cross-classification to point to both institution and building, as in figure 5. This is only possible if all synonyms of both senses and all their dependent nodes in the hierarchy share the same regular polysemy, which is hardly ever the case.</Paragraph>
      <Paragraph position="4"> [Figure 5: node labels artifact, organization, facility, institution; surviving caption fragment: "Cross-Classification as"] To allow for regular polysemy, GermaNet introduces a special bidirectional relator which is placed at the top concepts for which the regular polysemy holds (cf. figure 6).</Paragraph>
      <Paragraph position="5"> In figure 6 the entry bank1 (a financial institution that accepts deposits and channels the money into lending activities) may have the synonyms depository financial institution, banking concern, banking company, which are not synonyms of bank2 (a building in which commercial banking is transacted). In addition, bank1 may have hyponyms such as credit union, agent bank, commercial bank, full service bank, which do not share the regular polysemy of bank1 and bank2.</Paragraph>
      <Paragraph position="6"> Statistically frequent cases of regular polysemy are manually and explicitly encoded in the net. This is necessary because they often really are two separate concepts (as in pork, pig) and each sense may have different synonyms (pork meat is a synonym only of pork). However, the polysemy pointer additionally allows the recognition of statistically infrequent uses of a word sense created by regular polysemy. For example, the sentence I had crocodile for lunch is very infrequent in that crocodile is not commonly perceived as meat but only as an animal. Nevertheless we know that a regular polysemy exists between meat and animal. Therefore we can reconstruct via the regular polysemy pointer that the meat sense is referred to in this particular sentence even though it is not explicitly encoded. Thus the pointer can be conceived of as an implementation of a simple default via which the net can account for language productivity and regularity in an effective manner.</Paragraph>
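      <Paragraph position="7"> The default behaviour of the polysemy pointer can be sketched as follows. This is an illustration under stated assumptions, not GermaNet's implementation: the dictionaries HYPERNYM, SENSES and POLYSEMY_POINTERS, the sense labels, and the lemma chicken (added as a hypothetical case in which both senses are explicitly encoded) are all invented for the example. Only the animal and meat pairing and the crocodile case come from the text.

```python
# Hypothetical toy net: each concept has at most one hypernym here.
HYPERNYM = {
    "crocodile_animal": "animal",
    "chicken_animal": "animal",
    "chicken_meat": "meat",
    "animal": None,
    "meat": None,
}

# Explicitly encoded senses per lemma (assumed data).
SENSES = {
    "crocodile": ["crocodile_animal"],
    "chicken": ["chicken_animal", "chicken_meat"],
}

# Bidirectional regular-polysemy pointers, placed at top concepts.
POLYSEMY_POINTERS = {frozenset({"animal", "meat"})}


def ancestors(concept):
    """Return the chain of hypernyms above `concept`, nearest first."""
    chain = []
    current = HYPERNYM.get(concept)
    while current is not None:
        chain.append(current)
        current = HYPERNYM.get(current)
    return chain


def polysemy_partners(concept):
    """Concepts linked to `concept` by a regular-polysemy pointer."""
    partners = set()
    for pair in POLYSEMY_POINTERS:
        if concept in pair:
            partners |= pair - {concept}
    return partners


def readings(lemma):
    """Return (explicit senses, top concepts of readings inferred by default)."""
    explicit = list(SENSES.get(lemma, []))
    covered = set()
    for sense in explicit:
        covered.update([sense] + ancestors(sense))
    inferred = set()
    for sense in explicit:
        for node in [sense] + ancestors(sense):
            # A pointer only adds a reading that is not already encoded.
            inferred |= polysemy_partners(node) - covered
    return explicit, sorted(inferred)
```

With this toy data, readings("crocodile") keeps the explicit animal sense and adds meat as an inferred reading, whereas readings("chicken") infers nothing because both senses are already encoded.</Paragraph>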
    </Section>
    <Section position="2" start_page="12" end_page="13" type="sub_section">
      <SectionTitle>
5.2 Adjectives
</SectionTitle>
      <Paragraph position="0"> Adjectives in GermaNet are modeled in a taxonomical manner making heavy use of the hyponymy relation, which is very different from the satellite approach taken in WordNet. Our approach avoids the rather fuzzy concept of indirect antonyms introduced by WordNet. Additionally we do not introduce artificial antonyms as WordNet does (pregnant, unpregnant). The taxonomical classes follow Hundsnurscher and Splett (1982), with an additional class for pertainyms [3].</Paragraph>
    </Section>
    <Section position="3" start_page="13" end_page="13" type="sub_section">
      <SectionTitle>
5.3 Verbs
</SectionTitle>
      <Paragraph position="0"> Syntactic frames and particle verbs deserve special attention in the verbal domain. The frames used in GermaNet differ from those in WordNet, and particle verbs as such are not treated in WordNet at all.</Paragraph>
      <Paragraph position="1"> Each verb sense is linked to one or more syntactic frames which are encoded on a lexical rather than on a conceptual level. The frames used in GermaNet are based on the complementation codes provided by CELEX (Burnage, 1995). The notation in GermaNet differs from the CELEX database in providing a notation for the subject and a complementation code for obligatory reflexive phrases. GermaNet provides frames for verb senses, rather than for lemmas, implying a full disambiguation of the CELEX complementation codes for GermaNet.</Paragraph>
      <Paragraph position="2"> Syntactic information in GermaNet differs from that given in WordNet in several ways. It marks expletive subjects and reflexives explicitly, encodes case information, which is especially important in German, distinguishes between different realizations of prepositional and adverbial phrases and marks to-infinitival as well as pure infinitival complements explicitly.</Paragraph>
      <Paragraph position="3"> Particles pose a particular problem in German.</Paragraph>
      <Paragraph position="4"> They are very productive, which would lead to an explosion of entries if each particle verb were explicitly encoded. Some particles establish a regular semantic pattern which cannot be accounted for by a simple enumeration approach, whereas others are very irregular and ambiguous. We therefore propose a mixed approach, treating irregular particle verbs by enumeration and regular particle verbs in a compositional manner. Composition can be thought of as a default which can be overridden by explicit entries in the database. We assume a morphological component such as GERTWOL (1996) to apply before the compositional process starts. Composition itself is implemented as follows, relying on a separate lexicon for particles. The particle lexicon is hierarchically structured and lists selectional restrictions with respect to the base verb selected. An example of the hierarchical structure is given in figure 7 (without selectional restrictions, for the sake of simplicity), where heraus- is a hyponym of her- and aus-. [3] Adjectives pertaining to a noun from which they derive their meaning (financial, finances).</Paragraph>
      <Paragraph position="5"> Selectional restrictions for particles include Aktionsart, a particular semantic verb field, deictic orientation and directional orientation of the base verb.</Paragraph>
      <Paragraph position="6"> The evaluation of a particle verb takes the following steps. First, GermaNet is searched for an explicit entry of the particle verb. If no such entry exists, the verb is morphologically analyzed and its semantics is compositionally determined. For example, the particle verb herauslaufen in figure 7 is a hyponym of laufen (walk) as well as of heraus-.</Paragraph>
      <Paragraph position="7"> Criteria for a compositional treatment are separability, productivity and a regular semantics of the particle (see Fleischer and Barz (1992), Stiebels (1994), Stegmann (1996)).</Paragraph>
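      <Paragraph position="8"> The evaluation steps for particle verbs described above can be sketched as a lookup with a compositional fallback. The following is a simplified illustration, not the GermaNet implementation: the names EXPLICIT_ENTRIES, BASE_VERBS, PARTICLE_HYPERNYMS, split_particle and evaluate are hypothetical, the entry for aufhören and its hypernym are invented sample data, and a plain prefix match stands in for a real morphological analyzer such as GERTWOL. Selectional restrictions of the particle lexicon are left out.

```python
# Hypothetical explicit entries: irregular particle verbs listed by enumeration.
EXPLICIT_ENTRIES = {
    "aufhören": ["enden"],  # invented sample entry with an assumed hypernym
}

# Base verbs assumed to be present in the net.
BASE_VERBS = {"laufen"}

# Hierarchical particle lexicon: heraus- is a hyponym of her- and aus-.
PARTICLE_HYPERNYMS = {
    "heraus-": ["her-", "aus-"],
    "her-": [],
    "aus-": [],
}


def split_particle(verb):
    """Stand-in for morphological analysis: return (particle, base) or None."""
    # Try longer particles first so that heraus- wins over her-.
    for particle in sorted(PARTICLE_HYPERNYMS, key=len, reverse=True):
        prefix = particle.rstrip("-")
        base = verb[len(prefix):]
        if verb.startswith(prefix) and base in BASE_VERBS:
            return particle, base
    return None


def evaluate(verb):
    """Look up an explicit entry first; otherwise compose particle and base."""
    if verb in EXPLICIT_ENTRIES:
        # Explicit entries override the compositional default.
        return {"source": "explicit", "hypernyms": list(EXPLICIT_ENTRIES[verb])}
    analysis = split_particle(verb)
    if analysis is None:
        return None  # neither listed nor analyzable in this toy lexicon
    particle, base = analysis
    # The composed verb is a hyponym of both the base verb and the particle.
    return {"source": "composed", "hypernyms": [base, particle]}
```

In this sketch evaluate("herauslaufen") falls through to composition and returns laufen and heraus- as hypernyms, while an enumerated verb is answered from the explicit entries without any analysis.</Paragraph>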
    </Section>
  </Section>
</Paper>