An Intelligent Multi-Dictionary Environment 
Gábor Prószéky
MorphoLogic
Késmárki u. 8., H-1118 Budapest, Hungary
proszeky@morphologic.hu
Abstract 
An open, extendible multi-dictionary sys- 
tem is introduced in the paper. It supports 
the translator in accessing adequate entries 
of various bi- and monolingual dictionaries 
and translation examples from parallel cor- 
pora. Simultaneously an unlimited number 
of dictionaries can be held open, thus by a 
single interrogation step, all the dictionaries 
(translations, explanations, synonyms, etc.) 
can be surveyed. The implemented system
(called MoBiDic) knows the morphological
rules of the dictionaries' languages. Thus,
it never looks up the actual (inflected)
words, but always their lemmas, that is,
the right dictionary entries. MoBiDic has
an open, multimedia architecture, so it is
suitable for handling not only textual but
also speaking or picture dictionaries.
The same system is also able to find words 
and expressions in corpora, dynamically 
providing the translators with examples 
from their earlier translations or other 
translators' works. MoBiDic has been de- 
signed for translator workgroups, where the 
translators' own glossaries (built also with 
the help of the system) may also be dis- 
seminated among the members of the 
group, with different access rights, if 
needed. The system has a TCP/IP-based
client-server implementation for various
platforms and is available with a gradually
increasing number of dictionaries for
numerous language pairs.
Introduction 
"The whole world of translation is opening up, to 
new possibilities, and to technological and meth- 
odological change" (Kingscott 1993). Some years 
after the above claim, we see that software tools 
for translators, even the most recent ones, do not 
yet guarantee perfect solutions to automatic 
translation. More and more systems introduce, 
however, new facilities to the translator working 
in a computational environment. As Hutchins 
says, "the best use must be made of those systems 
that are available, and the producers and develop- 
ers must be encouraged to improve and introduce 
new facilities to meet user needs." (Hutchins 
1996) 
It is almost a commonplace that texts - books, 
newspapers, letters, official memos, brochures, 
any type of publications, reports, etc. - in the 
nineties are written, sent, read and translated with 
the help of the electronic media. Consequently, 
traditional information sources, like paper-based 
dictionaries, and lexicons, are no longer as much a 
part of the translation environment. 
For most developers, however, electronic
dictionaries just mean making the well-known
paper dictionary image appear on the computer
screen. We argue instead that dictionary
computerization does not mean producing
machine-readable versions of traditional printed
dictionaries, but combining the existing lexical
resources with up-to-date language technology.
On the other hand, there is a question whether
we have to continue in the traditional way of
developing new - and different - lexicons for any
new application/system, starting from scratch
every time and therefore consuming time, money
and manpower, or whether it is timely to think of
the possibility of making the effort to converge,
trying to avoid unnecessary duplications and -
where possible - building on what already exists
(Calzolari 1994). Consequently, in the near future
we have to combine the two above needs: making
existing lexical resources computationally
accessible and showing a strategy for developing
new lexicons.
In what follows, we try to argue for changes in
development strategies of electronic translation
dictionaries. Today's lingware technology can -
and must - use dynamic actions, like
morpho-syntactic analysis, lemmatization, spell
checking, and so on. On the other hand,
dictionaries can never be full in any sense,
therefore we have to make parallel
multi-dictionary access possible. It means that a
single dictionary look-up should use an unlimited
number of lexical resources that are available for
the translator.
1 The MoBiDic Look-up System 
To start with, the most natural activity concerning
dictionaries is searching them for a single word.
There is no problem if it can be found among the
headwords of the dictionary, that is, when the
input string matches a headword. But sometimes
the translator
starts the look-up process by clicking an inflected 
word-form of an open document that cannot be 
found among the headwords. For the user it is a 
boring and time-consuming task to type the lexical 
form, that is, the one accepted letter-by-letter by 
the dictionary. To make the system able to find 
the stem of the input word-form automatically, 
MoBiDic uses a lemmatizer that provides the dic- 
tionary look-up module with the stem(s) to be 
found (Figure 1). 
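The lemmatizer-backed look-up described above can be
sketched as follows. This is only a minimal illustration,
not MoBiDic's implementation: the lemma table, the
function names and the placeholder entries are all
hypothetical, and a real lemmatizer would apply
morphological rules instead of a fixed table.

```python
# Hypothetical toy lemmatizer: maps inflected forms to candidate stems.
# A real system would derive these by morphological analysis.
LEMMA_TABLE = {
    "ausgegangen": ["ausgehen"],
    "duties": ["duty"],
    "led": ["lead"],
}

# Hypothetical headword list; entry texts are placeholders.
HEADWORDS = {
    "ausgehen": "<entry for ausgehen>",
    "duty": "<entry for duty>",
}


def lemmatize(word_form):
    """Return candidate stems; fall back to the form itself."""
    form = word_form.lower()
    return LEMMA_TABLE.get(form, [form])


def look_up(word_form, headwords):
    """Return (stem, entry) pairs for every stem that is a headword."""
    return [
        (stem, headwords[stem])
        for stem in lemmatize(word_form)
        if stem in headwords
    ]


result = look_up("ausgegangen", HEADWORDS)
```

The inflected form itself is never matched against the
headwords unless it is its own stem, which mirrors the
behaviour the text describes.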
Figure 1
Look-up of a morphologically complex inflected form:
'ausgegangen' in a German-Hungarian dictionary.
Translators frequently want to find the word as
a part of multi-word expressions or idioms. If the
user does not know whether the actual word is
part of some phrasal compound or idiom,
traditional paper dictionaries are very difficult to
use. If the word in question is the so-called
headword of a multi-word expression, it can be
found easily. If it is not the headword, one has to
know the phrasal compound the word is a part of,
which is a typical "Catch 22" situation: if the
expression is known, why search the dictionary
for it? MoBiDic helps the user to find all the
multi-word expressions containing the actual
word's stem, regardless of whether it is a
headword or not. E.g. not only 'lead' but both
'dog' and 'life' provide us (among others) with the
multi-word expression 'lead a dog's life', which
can be found only under 'lead' in a paper
dictionary. In other words, users of traditional
dictionaries are supposed to know the expression
(what's more: the keyword of the expression) to
find it in the lexicon. Search for 'lead a dog's life'
through its components gives the following result
in MoBiDic:
lead {lead, leads, leading, led} 
27 occurrences in expressions of the basic dictionary, 
dog {dog, dogs, dog's, dogs'} 
21 occurrences in expressions of the basic dictionary, 
life {life, lives, life's, lives'} 
77 occurrences in expressions of the basic dictionary, 
lead AND life 
5 occurrences in expressions of the basic dictionary, 
dog AND life 
2 occurrences in expressions of the basic dictionary, 
lead AND dog 
1 occurrence in expressions of the basic dictionary, 
lead a dog's life 
1 occurrence as an expression in the basic dictionary.
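One way to obtain this kind of result is an index from
component lemmas to the expressions that contain them,
queried with a logical AND. The sketch below assumes a
tiny, made-up expression list and a hand-written form
table; it does not reproduce the counts above, and the
names `build_index` and `find_all` are illustrative only.

```python
from collections import defaultdict

# Word forms per lemma, as in the listing above.
FORMS = {
    "lead": {"lead", "leads", "leading", "led"},
    "dog": {"dog", "dogs", "dog's", "dogs'"},
    "life": {"life", "lives", "life's", "lives'"},
}
FORM_TO_LEMMA = {form: lemma for lemma, forms in FORMS.items() for form in forms}

# Hypothetical sample of multi-word expressions (not the real dictionary).
EXPRESSIONS = [
    "lead a dog's life",
    "lead the way",
    "a matter of life and death",
]


def build_index(expressions):
    """Map every component lemma to the expressions containing it."""
    index = defaultdict(set)
    for expression in expressions:
        for token in expression.lower().split():
            index[FORM_TO_LEMMA.get(token, token)].add(expression)
    return index


def find_all(index, *lemmas):
    """Return the expressions that contain every given lemma (AND)."""
    hit_sets = [index.get(lemma, set()) for lemma in lemmas]
    return set.intersection(*hit_sets) if hit_sets else set()


index = build_index(EXPRESSIONS)
dog_and_life = find_all(index, "dog", "life")
```

Because every component is indexed, the expression is
reachable from 'dog' and 'life' as well as from its
headword 'lead'.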
'Bi' is somewhat misleading in the name Mo- 
BiDic. Bilingual in this sense means that the 
source and the target language are not the same 
types of object for the program. For MoBiDic, 
source language is the language the morphology 
of which has to be known, to provide the user 
with adequate output. The output is expected to be
in the target language, whose characters,
alphabetic order, etc. have to be known to make
the hits appear on the screen in an adequate format.
Of course, the source and target languages can be 
the same, e.g. in explanatory or etymological dic- 
tionaries (Figure 2). 
Figure 2 
Hungarian explanation of 'acceptable quality level' in 
the English-Hungarian Economical Explanatory Dic- 
tionary. 
There is another sort of monolingual dictionary,
the synonym dictionary. The translator
frequently wants to use a synonym (antonym,
hypernym, hyponym) of the actual word. An
intelligent software tool, like MorphoLogic's
Helyette¹, is the combination of a thesaurus
(synonym dictionary), a morphological analyzer
and a generator, because the output is re-inflected
according to the morphological information
contained in the input word-form. The so-called
inflectional thesaurus works as follows:
INPUT: came 
ANALYSIS : came = come + Past 
STEM: come 
SYNONYM: go 
SYNTHESIS: go + Past = went 
OUTPUT: went 
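The analysis, synonym look-up and synthesis steps above
can be illustrated with a small sketch. The three tables
below are hypothetical stand-ins for a morphological
analyzer, a thesaurus and a generator; they are not the
data or API of Helyette.

```python
# Hypothetical analyzer output: word form -> (stem, morphological feature).
ANALYSES = {
    "came": ("come", "Past"),
    "comes": ("come", "3sg"),
}

# Hypothetical thesaurus: stem -> synonym stems.
SYNONYMS = {"come": ["go"]}

# Hypothetical generator table: (stem, feature) -> surface form.
GENERATION = {
    ("go", "Past"): "went",
    ("go", "3sg"): "goes",
}


def inflected_synonyms(word_form):
    """Analyze the input, find synonyms of its stem, re-inflect them."""
    analysis = ANALYSES.get(word_form)
    if analysis is None:
        return []
    stem, feature = analysis
    results = []
    for synonym in SYNONYMS.get(stem, []):
        surface = GENERATION.get((synonym, feature))
        if surface is not None:
            results.append(surface)
    return results


output = inflected_synonyms("came")
```

The feature found in the analysis step is carried over
unchanged to the synthesis step, so the synonym comes out
in the same inflected form as the input.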
There are special sorts of information in a
dictionary. For example, pronunciation is not
typically needed for translation, but can be useful
for language learners. Pronunciation of the word
is, therefore, a piece of information that can be
switched on and off according to the user's needs.
An electronic dictionary is expected to offer not
only the written phonetic transcription but also
spoken output. If the dictionary supports
multimedia, explanatory pictures can help users
understand the word; this is useful even for
professionals, not only for language learners
(Figure 3).
If the translator makes a spelling error, first a 
speller starts, and then the corrected word-form is 
sent to the dictionary look-up system. 
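The order of the two steps (correct first, then look up)
can be sketched as below. The paper does not say how the
speller works; this illustration swaps in approximate
string matching from Python's standard `difflib` module,
and the word list and cutoff value are assumptions.

```python
import difflib

# Hypothetical list of word forms known to the speller.
KNOWN_FORMS = ["duties", "duty", "dog", "life"]


def correct(word_form, known_forms=KNOWN_FORMS, cutoff=0.8):
    """Return the form itself if known, else the closest known form.

    Returns None when nothing is similar enough (assumed cutoff).
    """
    if word_form in known_forms:
        return word_form
    matches = difflib.get_close_matches(word_form, known_forms, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def corrected_query(word_form):
    """Produce the word form that would be passed on to the look-up."""
    corrected = correct(word_form)
    return corrected if corrected is not None else word_form


query = corrected_query("dutis")
```

The corrected form, not the mistyped one, is what the
look-up module would receive.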
¹ To be combined with MoBiDic in the near future.
Examples do belong to the entries of large,
professional paper dictionaries. In electronic
dictionaries, the translator may also want to see
occurrences of the word in texts of other authors,
or bilingual texts with their aligned translations.
This requires a monolingual or aligned bilingual
corpus, a free text search module and a
lemmatizer.
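How a lemmatizer and a free text search can work together
on an aligned corpus may be sketched as follows. The
corpus, the lemma table and the function names are all
hypothetical, and the target sides are placeholders
rather than real translations.

```python
import re

# Hypothetical aligned corpus: (source sentence, aligned translation).
ALIGNED_CORPUS = [
    ("He led a dog's life.", "<aligned translation 1>"),
    ("They lead the way.", "<aligned translation 2>"),
    ("The dog sleeps.", "<aligned translation 3>"),
]

# Hypothetical toy lemma table standing in for a real lemmatizer.
LEMMAS = {"led": "lead", "leads": "lead", "dog's": "dog", "sleeps": "sleep"}


def lemmas_of(sentence):
    """Tokenize a sentence and map every token to its lemma."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {LEMMAS.get(token, token) for token in tokens}


def concordance(lemma, corpus):
    """Return all aligned pairs whose source side contains the lemma."""
    return [pair for pair in corpus if lemma in lemmas_of(pair[0])]


examples = concordance("lead", ALIGNED_CORPUS)
```

Searching by lemma finds the sentence with 'led' as well
as the one with 'lead', which a plain string search for
'lead' would miss.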
2 Dictionaries in MoBiDic 
The lexicographic basis for MoBiDic is supplied
by various publishing houses. More precisely,
MorphoLogic has licenses to almost 50
dictionaries already published in paper format,
covering miscellaneous topics, diverse sizes and
many language pairs. The user can choose which
dictionaries to use in general, and which of them
are actually open. Currently, if all the available
dictionaries are open, MoBiDic handles
approximately 1 million lexical entries.
Some of the dictionaries, mainly the
terminological ones, usually have a very simple
list-based structure. The dictionaries shown in
Figure 1 and Figure 2, however, appear on the
screen with the traditional paper dictionary
image. This is done by using SGML
representations and an on-line SGML-RTF
conversion. MoBiDic can do exact structural
search that is not influenced by the layout at all.
Generally, the original lexical resource - even
if it was available in electronic format - did not
use SGML. For this reason, a special system for
the semi-automatic conversion of formatted text
files containing dictionary data to SGML format
has been developed for the MoBiDic
environment. This system is not available to
end-users; it serves industrial purposes.² First, in
order to enable selective access to the information
in dictionary entries, a thorough structural
analysis is done, while inconsistent and faulty
entries are marked. They are corrected later,
manually. The resulting SGML-annotated
dictionaries are enhanced with the necessary
indexes. These are lemma-variants and expanded
sub-entries made with the help of existing
language technology modules (Prószéky 1994).
² See http://www.morphologic.hu/esgml.htm
Users like to work with their own little
vocabularies and glossaries, and the professional
translator is usually asked to use official
translation equivalents provided by the employer.
These glossaries are generally never published,
but there is a need to use them in the same
environment. MoBiDic is able to treat user
dictionaries containing any type of information
source (lexicons, encyclopedias and dictionaries).
Figure 3
'grapes' (from the PicDIC picture dictionary)
with pronunciation in MoBiDic
Figure 4 
Search for the (lemma of) 'duties' in a set of English- 
Hungarian dictionaries 
The strength of this method is that user
dictionaries are searched for a word at exactly the
same time as the other dictionaries, thus the
translator's remarks can also be read when other
dictionaries provide the user with their translation
equivalents. Here we have to emphasize again
that MoBiDic is not yet another electronic
dictionary, but a multi-dictionary environment
where a single word is sent to every open
dictionary by a single mouse-click. In Figure 4
the user started from the word-form 'duties', and
eight dictionaries (that are open and contain
English either on the source or the target side)
send translations to the screen.
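The single-step interrogation of every open dictionary,
user glossaries included, can be sketched like this. The
`Dictionary` class, the dictionary names and the
placeholder entries are hypothetical; the sketch only
illustrates the idea of sending one lemma to all open
resources at once.

```python
from dataclasses import dataclass


@dataclass
class Dictionary:
    name: str
    entries: dict          # headword -> entry text
    is_open: bool = True   # closed dictionaries are skipped


def look_up_everywhere(lemma, dictionaries):
    """Return {dictionary name: entry} for every open dictionary with a hit."""
    return {
        d.name: d.entries[lemma]
        for d in dictionaries
        if d.is_open and lemma in d.entries
    }


# Hypothetical set of dictionaries; entry texts are placeholders.
DICTIONARIES = [
    Dictionary("general", {"duty": "<general entry for duty>"}),
    Dictionary("informatics", {"duty": "<informatics entry for duty>"}),
    Dictionary("user glossary", {"duty": "<translator's remark on duty>"}),
    Dictionary("closed dictionary", {"duty": "<not shown>"}, is_open=False),
]

hits = look_up_everywhere("duty", DICTIONARIES)
```

The user glossary is treated like any other dictionary,
so its remarks appear next to the published entries.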
3 Implementation Features 
The most recent development is MoBiDic's
client-server implementation. Its server side
(Windows NT, Unix and Novell) consists, in fact,
of two servers: the linguistic server and the
dictionary server. The user interface and screen
handling modules run on the client side (Win,
Mac, Linux, Java, etc.).
There are many software modules of other
vendors on the market that can also be combined
with MoBiDic through its well-defined
application programming interface (API). With
the help of this API the user can communicate
with the other modules without leaving MoBiDic.
For technical and legal reasons, this can, of
course, only be done in collaboration with the
developer of the product in question. The picture
dictionary shown in Figure 3 is a working
example: the vocabulary part of the (also
commercial) CALL program called PicDIC is
available to MoBiDic users from the familiar
environment.
Translators who generally use their favorite
word-processor while translating can use
MoBiDic from their word-processing tools with
the help of the included macros. Another
important issue is that users can use their
CD-ROM drive for other purposes while
translating: MoBiDic has minimal space
requirements because of its compression
method³, so the full dictionary system can be
copied to the hard disk and the CD drive is freed.
³ Average 1-2 Mb/dictionary.
4 Comparison with other methods
There are several dictionary programs both in
laboratories and on the market, but only some of
them share the so-called "intelligent" features
with MoBiDic. In the COMPASS and Locolex
projects, Rank Xerox developed a prototype that
accesses enhanced and structurally elaborated
dictionaries with an intelligent, context-sensitive
look-up procedure, presenting the information to
the user through an attractive graphical interface
(Feldweg and Breidt 1996). Unlike MoBiDic, it
does not have access to more than one dictionary
at the same time. Consequently, user dictionaries
are not supported. SGML is, however, used both
in the dictionary and the corpus modules. There is
a focus on the intelligent treatment of multi-word
units in the IDAREX formalism (Breidt et al.
1996). Another project with similar aims is
GLOSSER. Its prototype (Nerbonne et al. 1997)
carries out a morphological analysis of the
sentence in which the selected word occurs and a
stochastic disambiguation of the word class
information. This information is then matched
against a (single, but SGML) dictionary and
corpora. The GLOSSER prototype displays
context-dependent translations and, on request,
examples from the available corpora. Neither of
the above developments nor other web dictionary
services (e.g. WordBot) share all the important
features of MoBiDic: client-server architecture,
multi-dictionary access, user dictionary handling,
parallel (and intelligent) dictionary and corpus
look-up. What's more, MoBiDic is also
commercially available, that is, it has been tested
by thousands of "real" end-users.
Conclusion 
MoBiDic is a multi-dictionary translation
environment based on a client-server architecture.
It consists of the following main parts: linguistic
server, dictionary server and the client with the
graphical user interface. There are several
benefits:
(1) the linguistic server is dictionary independent
and language dependent⁴;
(2) the dictionary server has intelligent access to
various sorts of dictionaries (from SGML to
multimedia) and bilingual corpora;
(3) simultaneously an unlimited number of
dictionaries can be held open, thus by a single
interrogation step, all the dictionaries (with
translations, explanations, synonyms, etc.) can
be surveyed;
(4) the translators' own glossaries built with the
help of the system may also be disseminated
(as new dictionaries, with the needed
copyrights) among other users, if needed;
(5) it has an open architecture and a well-defined
API;
(6) it has been implemented and is available with
a gradually increasing number of dictionaries
for numerous language pairs.
MoBiDic is, therefore, not only a research project,
but a set of translation tools for a wider public.
⁴ Currently, English, German, Hungarian, Polish, Czech
and Romanian morphological components are
available to MoBiDic users. Descriptions for further
languages are under development; see the web site
http://www.morphologic.hu for the current list of
languages.

References 
Breidt, E., F. Segond and G. Valetto (1996) Local
Grammars for the Description of Multi-Word
Lexemes and Their Automatic Recognition in Texts.
Papers in Computational Lexicography, Linguistics
Institute, HAS, Budapest, pp. 19-28.
Calzolari, N. (1994) Issues for Lexicon Building. In: A.
Zampolli, N. Calzolari & M. Palmer (eds.) Current
Issues in Computational Linguistics: In Honour of
Don Walker. Kluwer / Giardini Editori, Pisa, pp.
267-281.
Feldweg, H. and E. Breidt (1996) COMPASS - An
Intelligent Dictionary System for Reading Text in a
Foreign Language. Papers in Computational
Lexicography, Linguistics Institute, HAS, Budapest,
pp. 53-62.
Hutchins, J. (1996) Introduction. Proceedings of the
EAMT Machine Translation Workshop, Vienna, pp.
7-8.
Kingscott, G. (1993) Applications of Machine
Translation. In: Transferre necesse est... (Current
Issues of Translation Theory), Szombathely, pp.
239-248.
Nerbonne, J., L. Karttunen, E. Paskaleva, G. Prószéky
and T. Roosmaa (1997) Reading More into Foreign
Languages. Proceedings of the Fifth Conference on
Applied Natural Language Processing, Washington.
Prószéky, G. (1994) Industrial Applications of
Unification Morphology. Proceedings of the 4th
Conference on Applied Natural Language
Processing, Stuttgart, pp. 157-159.
