Semantic di(:tionary viewed as a lexic~l database 
Elena V.l'aducho, w~ 
Ekaterina V.Rakhilina 
Marina V.Filipenko 
Institute of scientilic and technical inlbrmation (V1NITI) 
Academy of sciences of lLu~,d;t 
125219 Moscow, Usievicha 20a 
e-mail t)sy-pub@comlab.vega.msk.su 
Telefax: (7.095) 9430060 
Telex: 411249 
Abstract 
In this paper an expert system is described 
which is ealle(l Lexicographer and which aims 
at supplying the user with diverse informa- 
tion about lhlssian words, including biblio- 
graphic information concerning individual lex.. 
ieM items. It is SUl)posed that tim system may 
be of use for a practical contputationa| linguist 
,iLnd at the same time will serw~ as nn instru- 
ment of linguistic research. 
the user with diverse inform~tion about t/.m; 
slan words, of. \[2\]. 
The system is conceived ;~s an aid both in the 
area of natural language t)ro(:essing and in the 
traditi(mal lexicogr~qflly. 
The system consists o\[ two I)asi(: colnpollellts: 
\],cxi(:on (containing ~'~ome 13.000 most 
corn\[non words); 
- l|ibliograt)hical (1;md)ase. 
It is tim l,exicon that is of prim;~ry c<)ncern 
in this l~al)cr. 
Lexieal database and il,s The idea was topresent the l,exicon in :~ form 
advantages over traditional of a lexical dat, aha:;e (IA)B). 
dictionaries \[,l)t~ is a vo(:abuiary presented in ;~ machine 
rea(l~ble form and consisting of sever;d do 
mMnes, ;ts in a usuM relational databmse. 'l'}te In this paper we investigate general princi- 
ples implemented in nn expert systeln (cMled user may get information ahout morphology, 
LEXICOGRAPHER), designed to SUl)ply synl,~ctic combimd)ility and semantit: l'eatnres 
ACIT~ DE COLING-92, NANqES, 23-28 AOt~rr 1992 1 2 9 5 1)ROe. O1: COLING-92, NAN-n!S, Auc;. 23-28, 1992 
of individual lexical items. It is semantics that 
we concentrate upon in this paper. 
Many attempts have been made to use tra- 
ditional dictionaries in order to assign word 
senses to general semantic categories, cf. \[1\]. 
Our LDB contains semantic information that 
cannot be elicited from the existing dictionar- 
ies. The priority is given to semantic fea- 
tures influencing lexieal or grammatical co- 
occurrence. In this paper possibilities are 
discussed of predicting selection~l restrictions, 
syntactic features and other formal character- 
istics of the utterance - such as the array of ar- 
guments and their semantic interpretation, the 
meaning of an aspeetual form of a verb etc., - 
on the basis of semantic features of a word in 
the lexicon. 
The main advantage of a lexicai database as 
compared with a traditional dictionary consists 
in the fact that a database makes it possible to 
present semantic information in a format en- 
abling the computer to locate efficiently vari- 
ous types of information specified for a given 
class of words. To put it differently, the main 
advantage of a database consists in the possi- 
bility of compiling lists of words possessing a 
common feature or a set of features. 
There are three main principles that the sys- 
tem is based upon. 
1. We are convinced that semantic features 
of words determine co-occurence to a much 
greater extent than it is usually acknowledged. 
In other words, we claim that many aspects of 
syntactic subcategorization of lexical items are 
predictable from their meaning. 
2. A semantic feature of a word is essentially 
a semantic component (or components) in its 
lexieographic definition. 
3. A great amount of information about the 
meaning of a lexical unit; about its combina- 
tory possibilities; prosody; referential features; 
or about its regular ambiguity, need not he 
stored in the dictionary: this information be- 
longs to wi~at may be called a grammar of 
lexicon and should be formulated in a gen- 
eralized form. In this form it can be stored 
in a Lexical Knowledge-Base of semantic 
and syntactic regularities. This Knowledge- 
Base has not yet been designed, but semantic 
features of words in LDB are conceived as an 
input for general rules that will be stored in 
this hypothetical Knowledge-Base. 
2 Lexical Database for Con- 
crete Nouns 
There are different layers of lexicon that require 
specific formats of a database, and the choice 
of the format is one of the main problems of 
database formation. 
In what follows we list domains in the Lexical 
Database for Concrete Nouns - one of the com- 
ponents of Lexicographer, now implemented in 
a working program~ Each domain is interpreted 
as a feature that can take a definite set of val- 
ues. 
Domain I. Morphological and syntactico- 
ACTES DE COLING-92, NANTEs, 23-28 AOt~T 1992 ! 2 9 6 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
morphological information (taken front the 
graminatic~d dictionary \[3\]). 
'\].'his domain is subdivided into thre, e do- 
mains: 
1.1. Gender (fern., nlasc., neuter., com- 
nlon). 
1.2. Animate/Inanimate 
1.3. \])eclension and accentuation. 
All the other domains contain semantic in 
formation. We do not mean that the system of 
semantic features wouhl provide a word with 
an exhaustive texicographic delinition - this is 
not the appropriate task for a lexical database. 
Tire purpose of a database is to highlight those 
semantic aspects of a word that unite semanti 
eally cognate words and differentiate many of 
semantically different words from one another. 
In othe.r words, lexical database is an instru- 
ment of predicting and calculating all sorts of 
usefifl semantic classes of words. 
Domains ILl and II.2 specify MereologieM 
status of a word (more precisely, of a lexeme 
- namely, of a word taken in one of its lexi- 
cal meanings). The wdues of the feature I1.1 
may be: PART, SET or WII()LE. In the later 
case dommn 11.2 is emtrty while in the tlrst two 
cases it specifies the WIIOLE for the PART 
and the ELEMENT for the SI';T: I'ART (SI';T) 
of what? E.g., 
(1) krylo 'all' 
M-status I PART 
Of what { body 
(2) stado 'herd' 
M-status I SF/I' 
Of what \]animals 
(3) ehelovek 'man' 
M-status I WflOI,E 
Of what I - 
I)omain 1\[.3 provides a lexeme with a tax- 
onumic supercategory, such as Person, Plant, 
Animal, Metal, Building, Sphere of activity etc. 
Tiffs domain is of primary importance arrd it is 
tlds domain that defines the most interesting 
classes of concrete \]exenle8. The systenl of tax- 
onomic categories h~ a hierarchical structure. 
Thus, the possibility is provided to state imo 
plieative (lelmndencies between categories, so 
that the lower category inherits all the informa- 
tion from the category of a higher \[rwel. E.g., 
T. eategol'y (osobnjak 'private residence') = 
dora 'h<)use'; 
T-category (dora) - postrojka 'building'; 
T-category (postrojka) sooruzenie 'con- 
struction'. 
Thus, lexeme osobnjak will be assigned not 
only to the class of houses but ~dso to the class 
of lmildings and to the class of constructions. 
Domain II.4 specifies a Predicate semanti- 
cally connected with the noun in question. It 
turns out that such predicates occupy the nr()st 
prominent place in lexicographic definitions of 
a great majority of concrete nouns. Usually 
these are predicates that determine a standard 
way in which the corresponding object is used 
ACRES DE COLING-92, NAN'rE.S, 23-28 ^ot;r 1992 1 2 9 7 PROC. O1: COLING-92, NAturEs, AUG. 23-28, 1992 
(functional predicates): 
Predicate (house) = to live 
Predicate (chair) = to sit 
Predicate (goblet) = to drink. There are 
also nouns that imply a non-functional predi- 
cate in ttteir lexicographic definition - a predi- 
cate that determines its characteristic property, 
cf. 
Predicate (liquide) = to flow. 
Some nouns require predicates of both types, 
cf. 
Predicate (cellar) = 
1) to store products; 
2) digged under the floor of a house. 
For some classes of nouns Domain 1.4 Predi- 
cate is empty, e.g., for some (not all!) names of 
the so-called natural classes and for the names 
of parts of the corresponding objects, cf. 
krab 'crab': 
M-status I WHOLE 
Of what I - 
T-category \] animal 
Predicate \] - 
Inclusion of predicates into a lexicographic 
definition of concrete nouns may be considered 
an attempt to fertilize theoretical lexicography 
with the ideas of frame semantics. 
Domain II.5: Predicate may have a Restric- 
tion as for the range of possible taxonomic 
classes of its arguments, e.g. 
khishchnik 'beast of prey' 
M-status \[ WHOLE 
Of what \[ - 
T-category\] animM 
Predicate I to eat 
Restriction I animal 
The Database for Concrete Nouns is ready for 
demonstration. The database for verbs and a 
sma21 base for pronouns are in 'a stage of prepa- 
ration. 
3 Combinability predictions 
for concrete nouns 
Here are some examples of how semantic infor- 
mation contained in the database can be used 
to predict syntactic regularities. 
Example 1. As was stated earlier, domains 
II.1, II.2 define the following relations: 
1) PART-WHOLE; 
2) SET-ELEMENT. 
There are propositions that differentiate 
these two relations; thus, combinations in (a), 
with PART-WHOLE relation are possible with 
a preposition U, while combinations in (b), 
with a SET-ELEMENT relation, are not: 
a. nozka 'leg' U stula 'chair' 
pugovica 'button' U paljto ~coat' 
b. *chaschka 'cup' U serviza 'service' 
*korova 'cow' U stada 'herd' 
ACTES DE COLING-92, NAr, rrEs. 23-28 ^ot~'r 1992 1 2 9 8 Paoc. ol, COLING-92, NANTES. AUG. 23-28, 1992 
Note that Genitiw~ C~uue can l)e used to ex- 
press both relations. 
Example 2 makes use of the domain f)red 
icate: it is the predicate implied by a lexico- 
gr~l)hic definition of a noun th;Lt deterinine, ill 
very inany eases, the exact interpretation of the 
Genitive construction with a concrete noun ~ 
a heard. 
Thus, a noun gnezdo eaest' h:us a possessive 
valency gnezdo orla Chest of an eagle', chjo 
gnezdo? ~whose nest', \[)ccausc of the predi- 
cate ~to liw~' included in the lexicographic deft- 
nition of gnezdo 'nest' has an unbounded vari- 
able: who lives? On the other hand, for such 
a noun as professor 'professor' Genitive con. 
struction realizes its object valency, ef. profes- 
sor inatenlatiki 'professor of mathematics', be. 
cause of the l)re(licat(* ~to study', included in its 
lexicographic detinition; an unbounded variable 
here corresponds to the object wdency: studies 
what? 
Examples of this kind are ,~bundant. 
To stun up, the following aspects of the pro 
posed type of a semantic dictionary are of pri 
mary illlportance. 
1. The fact that information is presented in 
the forill of a (lataln~se, which provides the fit- 
cility of compiling all sorts of \[exical lists. 
2. intensive use ofT-categories (and other re.- 
current semantic features), which gives seman- 
tic explications for combinability restrictions. 
3. l)ivision of lexical information into two 
parts - Lexical Data Ba.se a~nd l,exicM Knowl 
edge Base, which widens the range of possible 
lexicographic generMiz~ttions. 
ReterellCCS 
\[1\] Cellerstan, M. (ed.) Stu,(~es in co,nput,,,' 
aided lexicoh)gy. Stockholm, 1988. 
\[2\] t'aducheva E.V., Rakhilina E.V. Pre- 
dicting co-oceurenee restrictions by us 
ing semantic classiilcations in the lexicon. 
(X)I,INC,-90, IIelsinki, 199(I. 
\[:1\] Zalizniak A.A. Granmlaticheskij slovar' 
russkogo jazyka. 2-d ed. Moscow, 198(I. 
Acn .~s l)E COLING-92, NANrES, 23 28 ^o(rr 1992 1 2 9 9 Paoc. o1: COLING-92, NAml~s, Au(J. 23-28, 1992 
