A REUSABLE LEXICAL DATABASE TOOL FOR MACltlNE TRANSLATION 
BRI(HTTE BL,~SER ULRIKF, SCHWALL 
IBM Germany IBM Germany 
Institute for Knowledgc Institute for Knowledge 
Based Systems Based Systems 
P.O. Box 10 30 68 P.O. Box 10 30 68 
W-6900 lleidelberg W-6900 lteidelberg 
Email: alscbwee schwall 
at dhdibm 1 .bitnet at dhdibm l.bitnet 
O. ABSTRACT 
This paper describes the lexical database tool LOLA 
(Linguistic-Oriented Lexical database Approach) 
which has been developed fur the construction and 
maintenance of lexicons for the maclfine translation 
system I,MT. First, the requirements such a tool 
should meet are discussed, then I,MT and the lexi- 
cal information it requires, and some issues con- 
cerning vocabulary acquisition are presented. 
Afterwards the architecture aml the components of 
the I,OLA system are described and it is shown how 
we tried to meet the requirements worked out em'- 
hier. Although I,OI,A originally has been designed 
and implemented for the German-English I,MT 
prototype, it aimed from the beginning at a repre- 
sentation of lexical data that can be reused for uther 
LMT or MT prototypes or even other NLP appli- 
cations. A special point of discussion will therefore 
be the adaptability of the tool and its cumponents 
as well as the reusability of the lexical data stored in 
the database for the lexicon development for I,MT 
or for other applications. 
i. Introduction 
The availability of large-scale lexical information has 
widely been recognized as a bottleneck in the con- 
struction of Natural Language Processing (NLI') 
systems. The lexical database I,OLA has been de- 
veloped in connection with the Logic- 
programming-based Machine Translation (LMT) 
system and shall be presented here. This work is 
part of the objectives of the project Transl,exis 
launched in 1991 at the Institute of Knowledge 
Based Systems of the IBM Germany Scientific 
Center. Transl,exis aims at the theoretically and 
empirically well motivated lexical description and 
the management of the lexical information of LMT 
in a database. It is conceived as a first step towards 
a reusable lexical knowledge base. 
1.1. Requirements for convenient 
construction and maintenance of Lexicons 
Based on our experience and existing literature, a 
tool for the construction and maintenance of large 
NLP lexicons with a complex entry structure should 
meet the following requirements: 
tJ Adequate expressive power of the represen- 
tation formalism: the expressive power must 
be sufficient to cover the facts of lexical de- 
scription. 
ANGELIKA STORRER 
University of Tiibingen 
Seminar ffir 
natfirlichsprachl. Systeme 
Wilhelmstr. 113 
W-7400 'Fiibingen 
storrer at arbuckle.sns. 
ncuphilologie.uni-tuebingen.de 
t~ Methodology for the description of lexical in- 
formation: criteria and guidelines relevant for 
encoding should be developed and docu- 
mented. 
n Orientation towards lexicographic procedure: 
the design of the tool should take the logical 
course of the lexicographie work procedure into 
consideration and support it during all its steps 
and phases. The lexicographer should be ena- 
bled to concentrate on the lexicographic de- 
scription of lexical units while the tool itself 
automatically takes care of the remaining tasks 
in lexicon development. 
n Consistency mid integrity checking of the lexi- 
cal data: when entries are added or updated, 
the system should reject invalid values for par- 
ticular features and check if the input leads to 
inconsistency of the database. 
t~ Data independence: An extreme dependency 
between the structuring of lexical data stored in 
the database and the structure of the lexical 
entries in a given application system should be 
avoided. In tiffs way the lexical data will remain 
resistant to modifications in the 
NLP/MT-systems that make use of these data. 
t~ Reusability/Rcversability of the data (cf. 
Calzolari 1989, tlcid 1991): lexical data should 
be represented in such a way that it can -- apart 
from its transfer specific components - be re- 
used for other MT-prototypes with the same 
source or target languagc, or with the reverse 
language pair (e.g. German-English and 
English-German). Ideally, the lexical data 
should be independent to such a degree that 
they are also reusable for other 
NLP-applications. 
D Multi-user access: it should be possible for se- 
veral users to work on the lexicon simultane- 
ously. 
D llelp facilities: the criteria and guidelines for 
lexical description should be easily accessible. 
The availability of monolingnal and bilingual 
dictionaries are to support the lexicographer's 
linguistic competence. 
1.2 LMT 
LMT, developed by Michael MeCord, is in basic 
design a source-based transfer system in which the 
source analysis is done with Slot Grammar (cf. 
McCord 1989, 1990, forthcoming). Two main 
characteristics of LMT should be emphas~ed: 
1. the lexicalism, arising from Slot Grammar 
source analysis; 
ACRES DE COLINO-92. NANTES, 23-28 AOt'rr 1992 5 1 0 PROC. OF COLING-92, NANTES, AUG. 23-98, 1992 
2. a large language-general X4o-Y-translation 
shell. 
Both features facilitate the development of proto- 
types for ncw hmguage pairs I , Versions of I,MT 
(in wu'ious stages) exist currently for nine language 
pairs. 
I,MT currently requires the lollowing types of in- 
formation to bc specified k)r lexical units (I,lJ): 
pat1 of speech; 
u word SellSCS i 
u morphological properties; 
rl agreelnent features; 
o the valency, i.e. the li'anlc of 
optiomd/ohligatory complement slots; 
o the specification of the fillers (Nl)s, suhordinate 
clauses) for each slot; 
t~ semantic compatibility constraints and 
collocations; 
n characterization of mulliword lexmncs; 
u subject area; 
translation relations; 
tq lexieal transtbmmtions. 
In McCord (forthcoming), an external lexical tormat 
(l';lA:) is prescntcd wtfich allows the representation 
of the above information. 1Jntil now, however, the 
lexical data has been kept in sequential files attd 
updating tins been done with a text editor. Thtls 
most of tile above-mentiuncd requirenlents eoukl 
not be met. 
1.3 Vocabulary Acquisition 
The hand-coding of dictionaries is a laborious and 
timc-eonsuming task. Thercforc a nmnbcr of at- 
tempts have been made to exploit corpora and/or 
machine readable dictionaries (MRDs) for the 
build-up of NI A)-lexicons (el. 3.5) 2 . In many cases, 
however, the lexical information in MRD's is nci- 
thor complete nor sufficiently explicit for NLP/MT 
purposes anti has to he rcvised hy lexicographers. 
Ideally, the demands on a lexicograptter shoukl only 
bc of linguistic nature. For this reason a sophisti- 
cated tool is necdcd to guklc anti suppnrt the 
NlJ'/MT-lexicographer in revising entries auto- 
maticaHy converted from machine readable sources 
as well as in buikling up new vocabulary. 
2. LOLA - architecture and components 
The lexical data base tool I,OLA aims at meeting 
the above mentkmed requirements. Its design and 
development are based on work achieved in the 
l,EX-project and the COIJ';X-projcct 3 . LOLA 
makes use of automatic consistency and hltegrity 
checks as well as of the support of multi-user access 
provided as standard facilities by the relational 
I)BMS SQL/I)S. Updates are made with the help 
of a user interface tltat supports tim lexicographer 
during the encoding process. The representation of 
the lexical data has been worked out to be as inde- 
pendent as possible of the format of a specific ap- 
plicatiun lexicon, thus increasing the degree of 
reusability of the lexical data. In additiun, a cata- 
logue of criteria and guiddhms for lexical description 
is being elaborated and will be integrated into tim 
tool. 
IXlLA L~xtcal I)ar.aim~ '1"~l 
k ..... 3// \] ~x ~ 
Figure I. IX)I,A - architecture 
The main components of the architecture of the 
1,OI,A system arc tile following (of. l:igure 1): 
1. LO1,A-I)B: the database itself. 
2. COI,OI,A (COder's interface to LO1,A): 
Interface tilt hand-codiug and modificatkm of 
tim lexical data, stored in I,OLA-I)B. 
3. I)B TO LMT: program tll~d generates I,MT 
lexicon matfies from the lcxieal data stored in 
1 I)I,A-I)B. 
4. I,MT TO 1)11: program that loads already ex- 
isting 1 ,MT lexicons into I,OI,A-I)B. 
5, I,I)11 TO DB: program that converts data 
from ~M l(l)'s into I ,O 1 ,A-1)B. 
In the tbllowing we give a brief description of these 
components. 
2.1. The Database 
The database was desi~md iu two steps: develop- 
merit of the conceptual scheme and dcvclopment of 
tim database scheme. 
In the conceptual design phase, the lexical objects, 
ttmir properties, and their interrelations were re- 
presented in an entity-relatinnship diagram (of. 
I,MT is file technical basis of an international project at IBM wifll cooperation between IBM Researdl, tile IBM 
Science Centers in lleidelberg, Madrid, Paris, llaifa, and Cairo, and IBM European l.anguage Services in 
Copenhagen (cf. Rimon et al. 1991). 
2 Cf. Byrd et al. 1987; for an overview of related activities within file l,MT-project, cL Rimon et al. 1991, pp. 14-15. 
Cf. Barnett et al. 1986; Blumenthal et al. 1988; Storrer 1990. 
Ac~s DE COLING-92, NANTES, 23-28 ao(Tr 1992 5 1 l PROe. OF COLING-92, NANTES, AUG. 23-28, 1992 
Chen 1976). Although the ER-model does not have 
the expressive power to cover all aspects of lcxical 
description, especially complex constraints, it has 
been chosen here as a compromise between a cmn- 
plete lexical representation and the realization in a 
traditional database system. 
The resulting ER-diagram for the German-English 
lexicon is shown in Figure 2 a . 
The conceptual scheme is still independent of the 
choice of a specific DBMS and of other implemen- 
tation aspects. The basic principles of the concep- 
tual design of our database will be sketched out in 
the following. 
Orientation towards linguistic structure, not towards 
the structure of the application lexicon. 
The diagram reflects, in the first place, the structure 
of the linguistic objects, their properties and and 
their interrelations, and it is influenced to a smaller 
degree by the structure of the application lexicon. 
As a consequence, the data is quite resistant to 
structural changes hi the format of the application 
lexicon. The abstraction from the structures of the 
application lexicon has a positive side effect with 
regard to the exploitation of machine readable lexi- 
cai resources: on one hand, we can handle cases, in 
which not all information required by LMT is pro- 
vided in the entries of MRD's. The information 
acquired can be stored as entries to be completed 
and revised later. On the other hand, we are free to 
store types of lexical information that are of rele- 
vance for NLP applications and can be acquired 
from MRD's or other NLP lexicons but are not 
processed in a current LMT-version. We can save 
them in the database as coding aids for the 
lexicographers, for future prototype versions, or 
other NLP applications. 
Analogous structure for source and target language 
wherever possible. 
The lower part of the ER-diagram represents the 
German source, tile upper part tile English target 
language. For both languages, an entity of the type 
entry can have one or more homonyms, each of 
which can have one or more senses. The senses 
themselves can open one or more sense-specific slots 
(one-to-many relations). A sense-specific slot can 
be filled by several types of fillers and the same type 
of filler can flU several sense-specific slots (many- 
to-many relation). The basic types of entities and 
relations, which are the same for all languages, are 
described by their characteristic features represented 
as attributes. The number of attributes as well as 
their values may differ according to language- 
specific peculiarities s . 
Many-to-many relations between the lexical objects 
of both languages. 
We represent the relation of lexical equivalence be- 
tween source and target senses, as a many-to-many 
relation (one source sense can have multiple target 
equivalents and vice versa). This breaks with the 
traditional hierarchical entry structure of bilingual 
dictionaries (Calzolari et at. 1990), but it avoids re- 
dundant description and storage of one target sense 
that is lexically equivalent to different source senses. 
Another relation holds for the sense-specific slots 
of two senses that are regarded as lexicMly equiv- 
alent. We decided to establish this relation between 
slots and not between slot frames. This way we caaa 
elegantly describe lexically equivalent senses with 
non-corresponding slotframes ~ . In this way the re- 
lations between the two languages may be used to 
a great extent bidirectionally for the XY- as well as 
for tbe YX-language pair. 
The conceptual scheme captured in the ER-diagram 
was then mapped into a database scheme and im- 
plemented in tile relational DBMS SQL/DS. We 
chose a relational DBMS, because -- for the main- 
tenance of the large LMT-GE lexicon (about 50,000 
entries) -- we were in need of a stable DBMS which 
supports multi-user access, has facilities for auto- 
matic checking of consistency and integrity of the 
lexical data mad allows for the specification of mul- 
tiple user-specific views on the data. To avoid re- 
dundancy and update anomalies we tried to 
normalize our relations as far it was useful with re- 
spect to our approach. In total, 32 tables are imple- 
mented: 25 tables describe lexical objects and 
relations by means of attributes with associated val- 
ues, 7 tables serve to store "knowledge about the 
lexical knowledge", e.g. the admitted values for at- 
tributes sucia as semantic type, filler-type, slot-type 
for both languages. 
2.2. COLOLA: the user interface to LOLA 
COLOLA is the user interface to LOLA-DB that 
looks up the lexical data of a given search word and 
displays it on sequentially connected menus. The 
design of the menus as well as their sequential order 
was guided by the manner in which lexicographers 
describe lexical entries. The following operations 
can be performed: 
4 The boxes represent types of entities, the diamonds represent types of relations between entities, the ellipses represent 
attributes which characterize types of entities or relations. The labels of the connection lines indicate whether the 
relation in question is a one-to-one, one-to-many, many-to-one or many-to-many relation. 
The ER-diagram is a simplified version of the actual conceptual model. For the purpose of this paper, several entity 
types, attributes, and relations have been leR out. 
s E.g.: in German a preposition like "auf' can govern either an accusative NP ("warren auf') or a dative NP ("lasten auf') 
depending on the verb that lakes the prepositional phrase with the respective preposition as a complement. 
Therefore "case" is a feature, relevant for the description of German slot fillers filling a prepositional complement 
slot, 
s E.g. cases like "like" and "gefallen" where the subject of the English verb corresponds to the dative object of the 
German verb; or cases like "geigen" and "play the violin" where the English direct object filler "the violin" is incor- 
porated in the semantics of the German verb "geigen". 
ACTES DE CO\[JNG-92, NAbrl~, 23-28 AOt~'r 1992 5 1 2 PROC. or COLING-92, NANTES, AUO. 23-28, 1992 
\] 
" - " 7 " ~ /~" 1" \ "\ n, 
Figure 2. Entity Relationship Diagram for \[;erman-Engli,-.h 
n addition of new source or target entries 
n deletion of existing entries 
n change of existing or addition of new features 
to existing entries 
\[\] deletion of features from existing entries 
u assigmnent of new or deletitm of existing 
transit(inn equivalents for a given source sense 
N update, insertinn or deletion of transfer infnr- 
mation for each pair of translation equivalents. 
For each part of speech, a specific sequence of 
menus is defined. There are ntcnus for homonyms 
and senses of source and target entries; the qinkiug" 
of the source senses and target senses regarded to 
be lexically equivalent is clone via transfer menus. 
This allows lexicographers to spcci~dize on specific 
parts of speech or on specific fi~atures which can be 
locally updated. 
COI,OLA controls multi-user access to the 
LOI,A-I)B so tltat several lexicographers (:an up- 
date the lcxical database simultaneously. The log- 
ical unit of work is the source or target homonym: 
when a lcxicngraphcr rcqucsts to update a 
homonynt, this homonym, together with its senses, 
is locked tar other users. 
If a new entry contains blanks, a multiword menu 
is called where the multiword is split up into its 
components. For each component, the following 
lexical information is gathered: the part of speech, 
whether the word may inflect within the multiword, 
and whether a phrase can be inserted between one 
multiword component mKl tire previous one with- 
out doing away with idiomaticity. 
If new hotnonylus or 8ellses ;:ire inserted ou the 
multiword menu as well as on other menus, default 
wducs for tcaturcs arc displayed. They can either 
be accepted or rejected and overwrittcn by the 
lcxiengraphcr 7 . The assumptions on dethult values 
for attributes nf lcxical intbrmation may differ ac- 
cording to diffcrcnt grammars and systems. We 
therefore decided to store the complete lexJcal in- 
fbrmatinn and use dcfault wdues as proposals in tire 
user interface. With this approach we aUow for two 
advantages: on one hand, the data in thc database 
can be used lot difllsreut appficatinrts having distinct 
theory specific assumptions on defaults. On the 
other hand, the user of CO1,OI,A can bcuetit li'om 
the economic advantages ()f default assumptions. 
C()I~OLA does extensive crmsistency chccking of 
the values entered by tire lexicographers. Illegal val- 
ues arc rejected and warning messages are displayed 
in situations where errors nfight easily occur. Al- 
though much of tim consistency cttecking is sup.. 
pnrted by the database management system, some 
extensions were necessary. 
Further support for the lexicographers is provided 
by an interface tn the WordSmith nn-linc dictit)nary 
system (cf. Byrd/Neff 1987). Several machine read- 
able dictionaries are available e.g. Collins German- 
tinglish, F, nglish-German, Longman's I)ictionai T of 
Contemporary l':uglish, and Webster 7th Collegiate 
dictionary. The lexicographer can look up entries in 
these dictionaries during the encoding process. 
Furthermore, help menus are provided in which ttre 
vMid values tbr specific features can be looked tip. 
7 Default values are provided, for instance, for slot fillers. German direct objcct slots get an accusative noun phrase 
as lhe default filler. The lexicographers may accept this, add other fillers or write over it with ann(fief filler. 
AC1T~ BE COLING-92, NAN'fES, 23-28 AOUf 1992 5 1 3 I)ROC. OF COLING-92, NANTES, AUO. 23-28, 1992 
2.3. DB 7"0 LMT 
A conversion program I)B TO LMT has been dc- 
veloped which extracts lexical in-formation stored in 
the relations of I,OI,A-DB and converts it into 
l,MT-lbrmat. 1)11 fl'O _I,MT consists of two com- 
ponents: 
u a datalmsc extractor and 
O a conversion program 
The database extractor selects the source entries and 
the corresponding target entries and stores them in 
database format. This format can be regarded as an 
intermediate representation betwccn database 
scheme and I,MT-format. It consists nf a set of 
Proh)g predicates which correspond to the relations 
of thc database scheme. Thcrc are, for instance, 
entry, homonym, sense, and slot predicates which 
correspond to the entry, homonym, sense, and slot 
relations in the database. The conversion program 
finally converts the database format into the 
LMT-format. It has to be adapted according to the 
changes or extcnsions of" the l,MT-ff~rmat. 
2.4. LMT TO DB 
Before and during LOLA design and development, 
I,MT lexicons in EI, F were already crcated and 
updated in fdes. Since these lexicons still need up- 
dating and since this is much better supported by 
1,OLA, a conversion program LMT TO DB was 
needed which converts EI, I r entries c~f lextcon files 
into the database format and loads them into 
I,OI,A-DB. I MT_TO_I)I~ consists of three corn- 
ponents: 
t~ the lexiemt cmnpiler of LMT, 
tJ a conversion component, and 
\[\] a database loader. 
The lexicon compiler is the component of the I,M'I" 
system which converts the EI,F into tire internal 
I,MT format s . In the internal format all abbrevi- 
ation conventions and default assumptions are al- 
ready interpreted and expanded accordingly so that 
the complete lexical itfformation is represented ex- 
plicitly. The conversion component then converts 
the internal l,MT-format to database forrnat. The 
database loader generates the SQL-statements and 
updates the database. It has to check first whether 
the hmnonym or sense to be inserted is identical 
with ,an homonym or sense stored in the database. 
If all the features of two homonyms or senses can 
be unified, they are regarded to be identical and the 
already existing entry is merged with the converted 
entry. In all other cases the homonym or sense is 
inserted into the database and merging has to be 
done by the lexicographers with COI,O1,A. 
2.5. LDB TO DB 
To supplement the lexical coverage of the LMT 
system, a dictionary access module has been devel- 
opcd which allows real-time access (cf. 
Neff/McCord 1990) to Collins bilingual dictionaries 
awfilable as lexical data bases (l,l)Bs) 9 . The mod- 
ule includes a language pair independent shell com- 
ponent COl ,1 ,XY and laoguage-speeific 
components ,'rod converts the lexical data of the 
Ll)ll into the l,Ml'-format, l,Dll TO l)ll is based 
on these programs. It consists of 
\[\] a pattern matching component, 
tJ a restroettrring enmlmnent , 
a corrvel'sion eonlllofierrt , and 
t~ the database loader of 1,MT TO I)B. 
With the pattern matclfing cmnlmnent, those fea- 
tures (sub-trees) that are to be converted are selected 
from the dictionary entries. In printed dictionaries, 
features colnnlon tO more th,'m one sub-tree are of. 
ten factorcd out in order to save space. With the 
re,structuring component, those features can be 
moved 1o the sub-trees they logically belong to. 
The conversion ennlponent converts the restructured 
dictionary entry to database format. The database 
loader of LMT TO I)B merges the entry with a 
possibly already exiting one in I,OI,A-DB anti 
generates the SQl,-staternents to update the data- 
base. The converted entries can be revised by the 
lexicographers with COI,OI,A. 
3. Reusability of the LOLA system 
3.1 Reusability of the tool components 
The first I,OI,A prototype was tleveloped to sup- 
port lexicon development for the language pair 
German-English. In the meantime, work has been 
started to make the tool usable for lexicon develop- 
rnent of the Fnglish-I)anish and English-Spanish 
I,MT systerns. As a positive result of the design 
principles described in section 3.1., the database 
scheme had to be modified only slightly with regard 
to prototype-specific differences I° . The values for" 
1,'mguage-specific attributes such as types of slots 
mad fillers will be defined for" the "new" languages 
Spanish and Danish and will be stored in the data- 
base. Thcy can then bc used fi~r consistency check- 
ing (only defined valucs can be updated in the 
database). In COI,OI,A we had to take into ac- 
count the homonym level on thc target side, where 
s In tile morpho-lexical processing and compiling phase, I~,I.F entries are converted into an internal format (cf. McCord 
(forthcoming): sect. 2) which represents file initial source and transfer analysis of an individual input word string. 
9 An LI)II provides a tree representation of tile hierarchical structure of tile dictionary entries. The nodes of tile tree 
are labeled with attributes having specific values for each individual entry. The t,DB can be queried wilt the spe- 
cialized query language LQL (cf. Neff/Byrd/Rizk 1988). 
~o English-Daoish and English-Spanish use lexicon driven morphology for the target languages Spanish (cf. Rimon et 
at. 1991) and Danish, whereas German-Engllsh uses a rule-based target morphology for English (el', McCord/Wolff 
1988). 
ACRES DE COL\[NG-92, NANTES, 23-28 AUra" 1992 5 1 4 PROC. OF COLING-92, NA.'CrES, Auo. 23-28, 1992 
the features of Spanish ,anti l)anish morphology 
have to be specitied. The programs that convert the 
database entries into the timnat of the application 
lexicons and vice versa (\])llfl'O I,MT mul 
I,MT_TO DIt) need generalization \]il nrdei to 
achieve an abstraction from prototype-specific lea- 
lures of \] ,MT. 
3.2 Reusability of the lexical data 
In order to meet the requirement of data independ- 
ence, the representation of lexical entries in the da- 
tabase is highly independent of that in the 
application lexicon. In the database, the description 
of linguistic entities and their interrelatkms is given 
in a set of tables where specitic values air stored lor 
the characteristic attributes of each individual entity. 
On these tables, different views can be defined for 
different types of users, l)iffercnt programs (like 
I)B TO I,MT) can extract exactly the attribute 
values needed fur their respective appficatiou and 
convert them into each given format. This way, 
from one and the same data base several lexicons 
can be generated, in which the same "lhaguistic 
world' is structured differently or represented in a 
completely ditti~rent way. The possibilities of reus- 
ability ,are naturally defined and limited by thc 
number of the registered types of lexical information 
in the origin',d data base. As far as the LOI,A da- 
tabase is concerned, the very detailed description of 
slot frames as well as the information about nmlti- 
words and the properties of their emnponents may 
be reused for other NLP applicatkms with one of 
the languages inwflved. The reusability of the 
transter intormation (specitied in the transfer re- 
lations between the languages of a given language 
pair) for other MT systems depends highly on the 
respective MT approach. As to the question of 
rcusabilty of the data in the LMT system "family", 
three different cascs have to be distinguished: 
1. lexical-data description given for a source lan- 
guage X is reused for another language pair 
having X as source language, 
2. lexical-data description given for a source lan- 
guage X is reused for another language pair 
having X as target language, 
3. lexical-data description given for a target lan- 
guage Y is reused for another language pair 
having Y as source language. 
In the tirst two cases, reusability of tim lexical data 
of language X is very high. In tim ttfird case, the 
description of Y as source 1;mguage may have to be 
more detailed in order to achieve ml adequate syn- 
tactic analysis n . New attributes or even new types 
of entities or relationships may be needed mad the 
database scheme will have to be etd~aneed accord- 
ingly. 
4. Outlook 
Our long-term goal is a multilingnal database, in 
wlfich the lexical knowledge tbr each language in- 
volved in the 1,MT project is represented only once. 
Application lexicons lor I,MT prototypes with dif- 
ferent language pairs arc generated by extracting tile 
requitvd informatiun from the database and by 
converting it into the respective l,MT-format. 
Furthermore, tim tool is to be extended in such a 
way that it is not restricted to the construction of 
MT lexicons, hut c~m also be used as a terminology 
workbench and thus support tim construction and 
m;dntenance of terminology. An integra|ed MT 
and tcrminulogy database would have the advantage 
that the Icxical knowledge encoded by 
terminnlogists aud translators can be used by tim 
tr,'ulslalkm system as wclh For refinement and 
completion of the description of the German lan- 
guage, it is pl~mned to integrate lurtber i~ffurmation 
from available Gerinan NI,I' lexicons into the 
I,OI,A-DII. A basic proldem concerning this 
undertaking will bc to identify and to match the 
basic categories "entries", "homonyms", "senses", 
which arc detined in various Icxical resources ac- 
cording to diffbrent criteria, only some nf which be- 
ing transparent. With this erfurt, we hope to gain 
Ihrther knowledge on the limits and possibilities 
concerning the reusability tff lexical data. 
ACRES DE COLING-92, NANTES. 23-28 AO~" 1992 5 1 6 PROC. OF COLING-92, NANTES, AUC. 23-28, 1992 
ii ii.g. in a source-based translation system like LMT, tile information on whether a target slot is obligatory is not di- 
rectly encoded in tile I,MT transli~r-lexicon; tile system controls target slot asslgnmeat by tile presence of a corre- 
sponding source slot in a given input sentence and tile mapping relalion specilied within tim transfer lexicon entry. 
On the source side, however, the feature of slot obligatorlness is used for purposes of analysis disambiguation. 
A~'ES DE COLING-92. N^I,n"~S, 23-28 ^ot)r 1992 5 1 5 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 

References 

Barnett, B., 11. l,ehmann, M. Zneppritz (1986): "A 
Word Database for Natured I,anguage Processing", 
l'roceedings I l th International Con\]erence on Com- 
putational Linguistics COLlNG86 Augast 25th to 
291h, 1986, Bonn, Federal Republic of Germany, pp. 
435-440. 

Btumcnthal ct al. (1988): "Was ist eigentlich ein 
Verweis?- Konzeptuellc Dateumodellierung als 
Voraussctzung computcrgestiilzter Verwcisbehand- 
lung." , Ilarras,G.(ed.): Das WOrlerbuch. Artikel 
und Verweisstrukturen. l)fisseldorf 1988. 

Byrd, R., N. Calzolari , M. Chodorow, J. Klavans, 
M. S. Neff, O. Rizk(1987): "Tools and Methods tbr 
Computatkmal I,exicography", Computational Lin- 
guistics,13, 3-4. 

Byrd, R. J., M. S. Neff" (1987): WordSmith User's 
Guide, Research Report, IBM Research Division, 
Yorktown lleigbts, NY 10598. 

Chen, Peter P.-S. (1976): "The lintity-Relationship 
Model - Towards a Unified View of Data" , ACM 
Tratt~actioas on Database Sy.vtetrtv 1, pp. 9-36. 

Calzolmi, N. (1989): "The Development of l,arge 
Muno- and Bifingual I,exical Databases", Contrib- 
ution to the IBM Europe Institute "Computer based 
TrarLffation of Natural Language", G~ua'nisch- 
Partenkirchen. 

Calzolari et al. (1990): "Computational Model of 
the Dictionary F.ntry - Preliminary P, epnrt", Pro/ect 
AQUILEX, Pisa. 

IIeid, U. (1991): A short report on the EUROTRA-7 
Study, Univ. of Stuttgart t99t. 

MeCord, M. C. (1989) "A NEw Version of the 
Machine 'l'ranslatkm System I,MT", Literary and 
Linguistic Computing, 4, pp. 218-229. 

McCord, M. ('. (1990): "Slot Grammar: A System 
for Simpler Construction of Practical Natural l~tH- 
guage Grammars", In R. Studer (ed.), Natural Lan- 
guage and Logic: International Scientific 
Symposium, Lecture Notes in Computer Science, 
Springer Verlag, Berlin, pp. \] 18-145. 

MeCord, M. C. (forthcoming): "The Slot Gram- 
mar System", In J. Wedekind and Ch. P, ohrer (ed.), 
Unification in Grammar, to appear in MIT Press. 

McCord, M. C., Wolff, S. (1988): The Lexicon and 
Morphology for LMT, a Prolog-based MT system, 
Research P, eport RC 13403, IBM Research Divi- 
sion, Yorktown lleights, NY 1(1598. 

Neff/Byrd/Rizk (1988) M. S. Neff, R. J. Byrd, O. 
A. P.izk: "Creating and Querying }lierarclfical 
l,cxical Data Bases", l'roceedings of the 2rid ACI. 
Conference on Applied NLP, 1988. 

Neff, M. S., M. C. McCord (1990): "Acquiring 
Lcxical l)ata From Machine-readable Dictionary 
Resources for Machine Translation", Proceedings 
of the 3rd Int. Conf. on Theoretical and 
Methodological Issues in Machine Translation of 
Natural Languages, pp. 87-92. Linguistics Research 
Center, Univ. of Texas, Austin. 

Storrer, A. (1990): "()berlegungen zur 
Reprhsentation der Verbsyntax in einer 
multifunktional-polythcoretisehen lexikalischen 
I)atenb.'mk" , Sehaeder,B./Rieger,B. (ed.): Lexikon 
und Lexikographie: ma.~chinell - maschinell gestatzt. 
Grundlagen Entwieklungen Produkte, 
Hildesheim, pp. 120-133. 

Ri\[non, M., McCord, M. C., SchwaU, U., Martinez, 
P. (1991): "Advances in Maehiue Translation Re- 
search in IBM," Proceedings of MT Summit III, pp. 
11-18, Washington D.C. 
