Word Manager. A System for the Definition, Access and 
Maintenance of Lexical Databases 
Marc DOMENIG 
Institut ffir Informatik 
Universitt~t Zfirich-Irchel 
CH-8057 Zfirich 
Swi~erland 
Abstract 
This paper describes Word Manager, a system which 
is currently the object of a research project at the 
University of Zfirich Computer Science Department. 
Word Manager supports the definition, access and 
maintenance of lexical databases. It comprises a 
formal language for the implemenation of 
morphological knowledge. This formal language is 
integrated in a graphics-oriented, high-level user 
Interface and is language independent. The project is 
now in the prototyping phase where parts of the 
software are pretty far advanced (the user interface) 
and .others are still rudimentary (the rule 
compilcr/runtime system). The design of the system 
was strongly influenced by Koskenniemi's two-level 
model /Koskenniemi 1983/, its successors /Bear 
1986/, /Black 1986/, /Borin 1986/, /Darymple 
1987/, the ANSI-SPARC 3-Schema Concept /ANSI- 
X3-SPARC 1975/ and visual programming techniques 
/Bocker 1986/, /Myers 1986/: We will focus the 
discussion on one aspect: the user interfacing for the 
construction of the lexlcal data base. 
I. Introduction 
As i have argued elsewhere /Domenig 1986, 1987a, 
1987b/, a dedicated system yields many advantages 
for the implementation, use and maintenance of 
lexical databases. The functionality of general 
purpose database management systems - e.g. 
relational ones - ls too limited for lexlcal databases 
because they are not tuned to the task at hand; In 
particular, they do not provide for a formalism which 
is suited to describe linguistic knowledge. The 
reason wily we would like to have such a formalism is 
that it allows us to take advantage of a computer's 
processing abilities, i.e. we may construct a lexieal 
database which is not only a collection of purely 
'static' information - a set of entries - but has 
'dynamic ~ capabilities. For instance, the latter might 
be that it can analyse and generate inflected or 
composed word forms. "What would be the advantage 
of that?" one might ask. "It is no problem to add on 
these capabilities to a purely 'static' set of entries 
stored within a commercially available database 
management system by writing programs in the host 
language to this systemI" 
The answer is: there are a lot of advantages and 
I hope to clarify some of them in this paper. A 
dedicated system supports the construction, use and 
maintenance of lexical databases much more directly 
than a general purpose database management system 
in conjunction with a conventional programming 
154 
language interface. Word Manager was designed as 
such a system, whereas Word Manager does not 
necessarily manage all the information stored in a 
lexical database. At this stage of the project, it 
manages only morphological knowledge, i.e. it would 
be quite feasible to use it as a front-end to a database 
managed by a general purpose system. 
2. Overview of the user interfacing 
Word Manager distinguishes two quite different 
Interfaces for the construction and maintenance of 
lexical databases: one for the specification of what I 
term conceptual knowledge (linguist interface) and 
one for the specification of what I call non- 
conceptual knowledge (lexicographer interface). The 
former is the place where the kind of morphological 
knowledge is defined which can be typically found In 
grammars, the latter is a dialogue-oriented interface 
for the entering of the bulk of the data. 
The relationship between the two Interfaces is 
one of a strong dependency, i.e. the lexicographer 
interface depends very much on the specifications In 
the linguist interface. Much of the machine- 
lexicographer dialogue can be inferred automatically 
from these specifications. The formalism employed 
in the linguist interface was designed to be powerful 
enough to implement morphological knowledge of 
several natural languages on the one hand, yet 
dedicated enough to be easy to handle for linguists. 
Moreover, it provides the opportunity to experiment 
with different conceptual approaches within a 
certain framework. The following section will oufline 
it. 
3. The specification of morphological knowledge in 
the linguist interface 
The linguist interface is conceived as a highly 
controlled environment which takes advantage of the 
latest hard- and software technology. This means that 
the user does not communicate with the computer 
on the level of its operating system except for when 
the application is started. On the level of the 
operating system, each morphological knowledge 
specification is represented by a so-called document 
/con (the two rightmost icons in Fig. 1 are document 
icons). By mousing such an icon, the user may start 
the application and load the specification stored in 
the document. Alternatively, he could start it by 
mousing the application Icon (the leftmost icon In 
Fig. I is the application icon). Within the application 
environment, each document (morphological 
knowledge specification) is represented by a so- 
@ Fi|e Compile 
Fig. I: The level of the operating system 
Fig. 2: The top level of the lin~ist interface application 
called tool-window which contains eight labelled 
check-bones (see Fig. 2). Each of these check-boxes 
represents a window, the name and purpose of 
which is indicated by the labeh 
The window surface character set provides for 
the definition of the character set out of which so- 
called surface strings are built. Surfhce strings are 
used for the surface representation of word forms. 
The window is graphics-oriented, i.e. most of the 
definitions are done with mouse- and menu 
commands (see Fig. 3). 
The window lexlcal character set provides for 
the definition of the character set out of which so~ 
called lexical stzlngs are built. Lexical strings are 
used to define linguistically motivated abstractions of 
surface strings. The set is usually defined to include 
characters denoting morpheme boundaries and/or 
morphophonemes. The window is ve~ T similar to the 
surface character set window. 
The window feature domains provides for the 
domain specifications of the attrlbute-value pairs 
which are used in the rule- and constituent 
specifications (see below). The window is a text- 
oriented editor. An examp~.e specification is shown in 
v~g. 4. 
Cat; (N V A \]? Q) 
Case (NCH G%IN DAT ACC) 
Genciar (M F N) 
hbn (S~ PL) 
~lg. 4: Example definition in windowjkature domains 
The window feature dependencies provides 
for the definition of dependencies between features. 
/~m example specification is stiown in Fig. 5. 
(Cat N) demands G~nder 
Fig. 5: F~ample deflui¢ion in window feature 
dependencies 
155 
-0 ..... ,'< ,i,',.'iGorman:surface character set 
0 i 2 8 4 5 8 7 8 
P p ti 
""A O a q 
B R b r 
C S c s 
D T d t 
E U e u 
F D f v 
G Wg w 
H X h x 
I Y i y 
J IZ j z i~ 
K k 
L I 
M m 
N n 
0 o 
C D E F 9 A B 
i 
I I II \]r I.~ii Ii°i °ol, r, o o o : 
. ¢ . . j k , °l",I 
x c v b . m . I I I~II ........ I l ill 
specie/characters 
character sort order 
,___I~A.A a ~ B b C c O d IEo\[~>{~\[\] 
I¢\[ , 
ligature sort order 
AE A ae ~ ----- 
OE 0 oe 
ssB 
UE 0 ue 0 
Ol I0~ 
comment 
@ 
The German surface ,,, -- 
Fig. 3: The window surface character set 
The window two-level rules provides for the 
definition of morphophonemic rules which realize 
the mapping function between the surface- and 
lexical strings. The rules specified here are similar to 
those in DKIMMO/TWOL /Darymple 1987/. The 
window is a text--oriented editor. An example 
specification is shown in Fig. 6 (the two rules handle 
noun genitive \[e\]s: the first one replaces "+" by "e" as 
in Strausses, Reflexes, Reizes, the second one 
duplicates "s" as in VerhtHtnisses, Verht~ngnisses, 
Erschwernisses). 
The window functions provides for the 
detinition of rules for the kind of string- 
manipulations which should not be realized with two- 
level rules (because their power would be excessive 
or they would imply the introduction of linguistically 
unmotivated morphophonemes). The window is a 
text-oriented editor. An example specification is 
shown in Fig. 7 (ReCap recapitalizes prefixed nouns). 
ReCap 
" (.*)A(.*)/\la\2" value 
" (. *) B (. *)/\Ib\2" value 
..o 
" (.*) Z (.*)/\iz\2" value 
"^a(.*)/A\I" value 
"^b (.*)/B\I" value 
"^z (.*)/Z\I" value 
Fig. 7: Example definition in windowfunct/ons 
"o* \[SXZ\] " (ICat N-ROC~f) '"' (ICat N-ENDING) 
".*" {ICat N-ROOT) "nis/niss" (ICat N-~qgING) 
"+s/es" (Case GEN) 
"+s/es" (Case (~lq) 
156 
Fig. 6: Example definition in window two-level rules 
German:inflection I I q| 
I(c.t N) 
i (Cat u) \] 
l(cot A) l 
I! \] 
\] 
(IRuie UMLAUT) I 
('""__'e \] 
i(ICat N-SUFFIX) 
~(IRule ÷\[EIS/+E) 
L~ ...... , 
Fig. 8: The window i~IjIeetion 
The: window inflection provides for the 
definition of word classes with their inflectional rules 
and paradigms. This window is a graphical tree 
editor which allows the interactive construction of 
an n-ary tree. This tree is used to structure the rules 
and constituents which define the word classes. The 
structuring criteria are features (attribute value 
pairs) and the structure has the following semantics: 
the rule:; specified in a subtree operate on the 
constitueats specified within the same subtree. Fig. 8 
shows a subtree which contains rules and 
constituents for German noun Inflection (only the top 
branch (IRule UMLAUT) ls expanded down to the 
terminal nodes). The terminal nodes of the tree 
contain either rules or constituents. By mousing 
them, the user may open text-oriented editor 
windows. An example of a rule is shown in Fig. 9: it 
consists of matching constraints (realized by feature 
sets) on the constituents and specifies a set of lemma 
forms and a set of word forms. In the example, the 
set of lemma forms - specified below the kcyword 
'lemma' - is a single word form (nominative singular; 
i\[, .............. German:inflecUon:(Cat N)(IRule UMLAUT.+\[E\]S/+E) '"::P\]- 
i@llllllk 
(IC~ UklLAUT.II-HOO1}\[Num SG) 
plrldigm 
{l(:ll Ukf.AUT.N-I~OOTj{Num SG) (IC~t UhiLAUT.N-ROOT)(Num PL) 
ixulple~1: AslI, Sch~iluch 
( ICel UMLAUT.N-END INGl(flum SG) 
( IC~: UMLAUT.N-lEND INGI(Num SG) (ICIt UMLAUT.N-ENDING)(Hum PL) 
(iCaJt H-SUFFIX)\[Num $G)( ICalt SG+\[ElS)(C:sse NOM} 
{ICat N-~SUFFIX)(Num SG)( ICat SG-~E\]S) (ICat N-,SUFFIX)(Num PL)(ICM PL~E) 
IPlg. ~. Ezmnple ~Uon~ rule wiudow 
157 
the pattern of feature sets identifies exactly one form 
which is put together by the concatenation of three 
constituents). The set of word forms - specified 
below the keyword 'paradigm' - consists of eight 
elements (the case paradigm; the two patterns of 
feature sets identify exactly eight forms, each of 
which is put together by the concatenation of three 
constituents). The constituent windows specify 
either so-called hard-coded constituents or 
constituent types. The former are feature sets which 
are associated with 'hard-coded' lexical strings (see 
Fig. 10); they are typically used to specify inflectional 
eO~ German:inflection:(Cat N)(ICot N-SUFFIX.SG+\[EIS)(Num SG) ~ilJl~ 
%° (Case NOM) 
=+\[e\]s = (Case GEN) '° ~£\[e\]" 
(Case DAT) 
=. " (Case ACC I 
constituents arc structured by features which qualify 
them. The rules in the terminal nodes (see Fig. 12) 
define new potential word entries by specifying how 
constituents of existing entries are combined with 
IO~ German:compo$1tion:(CRuie T0-N.N-I'0-N,PREFIII) Lim~ 
Source ~I 
I (CCa\[ PREFIX) 
(IRulo ?x) I ! 2 (ICat H-ROO D (Num SG) I I 
3 (ICat N-ENDING) (Hum SG) \[ I 4 (lest N-ROOT) (Num PL) \] \] 
S (ICat N-ENDING) (Hum PL) I 1 
I I ~rget i t 
(m.le ?x) I I (ReCap (~ I 2)) (ICat N=ROOT) (~um SG) I I 
3 (ICat N-ENDING) (Hum SG) |: (ReCap (+ 1 4)) (ICat N-ROOT) (Hum PL) | 
5 (Ieat N-ENDING) (N.m PL) \] 
examples \[ Liliputl-o~raat, MinisendeL Riesewischlauch \[__ 
Fig. I0: Example window with hard-coded 
constituents 
affixes. The latter are feature sets where the 
associated strings are represented by place holders, 
i.e. the strings are not specified yet but will be 
entered later, either via the lexicographer interface 
or by the firing of compositional rules (see Fig. 1 1). 
~0- German:inflectlon:(Cat N)(ICat N-STEM.UMLAUT)-~ 
(ICat N-ROOT) (Num SG) 
(ICat N-ENDING) (Num SG) 
(ICat N-ROOT) (Num PL) 
(ICat N-ENDING) (Num PL) 
1 entered 
2 entered 
3 (Umlaut 1) 
4 (Copy 2) 
examples 
Ast, Schiauch 
Fig. I I: Ex~ple window wlth congtituent ty~e~ 
They are typically used to specify word roots. From 
what has been said so far, we may infer how an entry 
into the database is made and what it will generate: 
the specification of an entry requires the 
identification of an inflectional rule and the 
specification of the lexical strings which are 
represented as place holders in the constituents 
matched by the rule. Usually, this means that one or 
two strings have to be entered. From this, the system 
may generate the entire inflectional paradigm of the 
word. Notice that the user of the linguist interface 
defines with his specification what a word is (viz, a 
set of lemma forms and a set of word forms). 
Moreover, Word Manager imposes the convention 
that only entire words - and no isolated word forms - 
may be entered into the database. 
The window composition provides for the 
definition of compositional rules and constituents 
(affixes). This window is a graphical tree editor 
similar to the window inflection where the rules and 
each other and with constituents defined in the 
window composition (dertvatlonal affixes). These 
rules are usually not applied generatively but 
analytically, because a generative application is likely 
to overgenerate (theoretically, the user may specify 
an arbitrary number of features which restrict 
excessive generation, but I believe that this is 
unpractical in most cases, because it implles that the 
lexicographer has to specify a host of features for 
each entry)~ The purpose of the rules is that all 
derived and compound words may be entered Into 
the database via the triggering of such l~les. This has 
the advantage thai: the system (automatically) keeps 
track of the derivational history and therefore the 
morphological s~_xucturing of each entry. 
40 The lexicographer interface 
Given a compiled specification of the conceptual 
morphological knowledge defined within the linguist 
interface, Word Manager may generate a dialogue 
which guides the lexicographer towards the 
Identification of the inflectional/compositional rules 
that must be triggered in order to add an entry to 
the database. In the case of non-composed words, for 
example, Word Mm\]ager may simply navigate in the 
tree which structures the Inflectional rules (specified 
in the window inflection), posing questions 
according to the structuring criteria. 
In the case of composed words, Word Manager 
may apply the compositional rules in analytical mode, 
provided that the 'initial' infbrmation consists of a 
word string, Such an analytical application of the 
rules is usually not very overgeneratlng - In contrast 
to a generative application-, i.e. the system will be 
able to present a reasonably limited number of 
selection choices. 
50 Concl~ion 
The advantages of a dedicated system like Word 
Manager for the mmlagement of lexical databases are 
manifold. In this paper, we have restricted the 
discussion to the advantages yielded during the 
construction of the database. These are by no means 
the only ones: the dedication also Implies that the 
overhead of non-dedicated systems (e.g. general 
purpose DBMS in conjunction with conventional 
programming languagesL Le. the featt~res which are 
\].58 
superfluons for lexical databases, is avoided. On the 
othex' hanc;i, Word l~Aanager provides features which a 
general pltrpose system will never have, viz. the 
special t'ormalis~ to implement morphological 
knowledge, "~his is not only beneficial from the point 
of view of the interfacing to the database but also 
from the point of view of the software design: in the 
dedicated :~ystem, the morphological knowledge is a 
part of ttL~ conceptual database .~chem~ (in the 
terminoloKy of database theory) and thus belongs to 
the kernel of the sysi~cm, as: it were. When a general 
purpose database management system in conjunction 
with a conventional programming language is used to 
implement the same kind of knowledge, it has to be 
hnplemented within the external schemata to the 
database and thus repeatedly fox' each of them. The 
soocalted code factoring is i:hcretbre much better in 
a dedicalcd system: the knowledge is more 
centralizecz, and Implemented with a minimum of 
x'eduncancy. 

References

/ANSI-X3.~Pt~RC t975/ ANSI/X3/SPARC Study 
G~'oup on Data Base Management Systems: 
"lntcrh~J. ltepoit 75-02-08," I,'DT (Bull, of 
the ACIvl 31GMOD) 7. 1975. 

/B~.a~ • 198~3/ Bear J. : "A Morphological 
Reeogi.dzer with Syntaciic and Phonological 
Rules." in: Proceedings of the 1 ittt 
btternational Conj~:rence on Computational 
Linguistic.~, Bonn, August 25-29~ 1986. 

/Black 19116/ Black A.W~, et al.: "A Dictionmy and 
Morpl-mlogical Analyser for English." in: 
Proceedings oj the i l th Intet~national 
Conference on Co~wutationul Li~tyui.~:ttc~J, 
I~onn, August 25-29, 19t~tl. 

/B0cker i\[986/ Bocker H. \] )., et al. : "The 
Enhancement of Understanding Through 
Visual Representations." in: Human Factors 
(n Computing Systems, CI-II'86 Conference 
Pcoceedinga (Special issue of tbc SIGCHI 
Bulletin), Bosi:on, April 13-17, 1986. 

/Borin 19116/ Borin L,: "What is a Lexical 
l~epresentation?" in: Papers for the Fifth 
Scandinavian Con\]e~nce qf Computational 
Linguistics, Helsinki, December 1 1-12, 
1985. University of Helsinki, Department of 
General IAnguistics, Publications No. 15, 
i\[986. 

/DarympI(,. 1987/ Dah'ymple M., et al. : 
DKIMMO/TWOL: A Development 
Environment .\['or Morphological .Analysis, in 
Kaplan R., Karttunen L.: "Computational 
IVtorphol0gy." Course Script LI283, 1987 
Llngulstlc Institute, Stanford University, 
J~Lme 29oAuguat 7. I,¢~7. 

/Domenlg 1986/ Domenig M., Shann P.: 
'"t'owards a Dedicated Database Management 
System for Dictionaries." in: Proceedings of 
the l lth International Conference on 
Computational Linguistics, Bonn, August 
25-29, 1986. 

/Domenig 1987a/ Domenig M.: Entwurf eines 
dedizierten Datenbanksystems f~r Lexika. 
Problemanalyse und Software-Entwmf 
anhand eine~.; Projektes Jhr maschinelle 
Sprach~bersetzung. Niemeyer Verlag, 
Ttibingen, Retbe "Spraehe tu~d Inlbrmatlon" 
lid. 17, 1987. 

/Domenlg 19871)/ Domenlg M. : "On the 
Formallsatlon of Dictionaries." in: Sprache 
und Datenverarbeitung, 1/1987. 

/Koskenniemi 1983/ Koskenniemi K.: Two-Level 
Morphology: A General Computational 
Model ,for Word-Form Recognition and 
Production. Dissertation at the University of 
Helsinki, Department of General 
Linguistics, Publications No. 11, 1983. 

/Myers 1986/ Myers B.A.: "Visual Programming, 
Programming by Example, and Program 
Visualization, a Taxonomy." in: Human 
Factors In Computing Systems, CHI'86 
Conference Proceedings (Special Issue of 
the SIGCHI Bulletin), Boston, April 13-17. 
1986. 
