Automatic Generation of Translation Dictionaries Using Intermediary
Languages
Kisuh Ahn and Matthew Frampton
ICCS, School of Informatics
Edinburgh University
K.Ahn@sms.ed.ac.uk, M.J.E.Frampton@sms.ed.ac.uk
Abstract
We describe a method which uses one or more intermediary languages
in order to automatically generate translation dictionaries. Such a
method could potentially be used to efficiently create translation
dictionaries for language groups which have as yet had little
interaction. For any given word in the source language, our method
involves first translating into the intermediary language(s), then
into the target language, back into the intermediary language(s) and
finally back into the source language. The relationship between a
word and the number of possible translations in another language is
most often 1-to-many, and so at each stage, the number of possible
translations grows exponentially. If we arrive back at the same
starting point, i.e. the same word in the source language, then we
hypothesise that the meanings of the words in the chain have not
diverged significantly. Hence we backtrack through the link structure
to the target language word and accept this as a suitable
translation. We have tested our method by using English as an
intermediary language to automatically generate a Spanish-to-German
dictionary, and the results are encouraging.
1 Introduction
In this paper we describe a method which uses one or more
intermediary languages to automatically generate a dictionary to
translate from one language, $X$, to another, $Y$. The method relies
on using dictionaries that can connect $X$ to $Y$ and back to $X$ via
the intermediary language(s), e.g. $X \rightarrow IL$,
$IL \rightarrow Y$, $Y \rightarrow IL$, $IL \rightarrow X$, where
$IL$ is an intermediary language such as English. The resources
required to exploit the method are not difficult to find since
dictionaries already exist that translate between English and a vast
number of other languages. Whereas at present the production of
translation dictionaries is manual (e.g. (Serasset 1994)), our method
is automatic. We believe that projects such as (Boitet et al. 2002)
and (Wiktionary), which are currently generating translation
dictionaries by hand, could benefit greatly from using our method.
Translation dictionaries are useful not only for end-user consumption
but also for various multilingual tasks such as cross-language
question answering (e.g. (Ahn et al. 2004)) and information retrieval
(e.g. (Argaw et al. 2004)). We have applied our method to
automatically generate a Spanish-to-German dictionary. We chose this
language pair because we were able to find an online
Spanish-to-German dictionary which could be used to evaluate our
result.
The structure of the paper is as follows. In section 2.1, we describe
how if we translate a word from a source language into an
intermediary language, and then into a target language, the number of
possible translations may grow drastically. Some of these
translations will be ‘better’ than others, and in section 2.2 we give
a detailed description of our method for identifying these ‘better’
translations. Having identified the ‘better’ translations we can then
automatically generate a dictionary that translates directly from the
source to the target language. In section 3 we describe how we used
our method to automatically generate a Spanish-to-German dictionary,
and in section 3.3, we evaluate the result. Finally, in section 4, we
conclude and suggest future work.
2 Translating Via An Intermediary
Language
2.1 The Problem
Consider the problem of finding the different possible translations
for a word $x$ from language $X$ in language $Y$ when there is no
available $X \rightarrow Y$ dictionary. Let us assume that there are
dictionaries which allow us to connect from $X$ to $Y$ and back to
$X$ via an intermediary language $IL$, i.e. dictionaries for
$X \rightarrow IL$, $IL \rightarrow Y$, $Y \rightarrow IL$ and
$IL \rightarrow X$, as shown in figure 1.

Figure 1: The cycle of dictionaries ($X \rightarrow IL$,
$IL \rightarrow Y$, $Y \rightarrow IL$, $IL \rightarrow X$).

If there were only ever one suitable translation for any given word
in another language, then it would be trivial to use dictionaries
$X \rightarrow IL$ and $IL \rightarrow Y$ in order to obtain a
translation of $x$ in language $Y$. However, this is not the case:
for any given word $x$ in language $X$ the $X \rightarrow IL$
dictionary will usually give multiple possible translations
$\{il_1, \ldots, il_n\}$, some of which diverge more than others in
meaning from $x$. The $IL \rightarrow Y$ dictionary will then produce
multiple possible translations for each of $\{il_1, \ldots, il_n\}$
to give $\{y_1, \ldots, y_m\}$ where $m \geq n$. Again, some of
$\{y_1, \ldots, y_m\}$ will diverge more than others in meaning from
their source words in $\{il_1, \ldots, il_n\}$. Hence we have $m$
possible translations of the word $x$ from language $X$ in language
$Y$. Some of $\{y_1, \ldots, y_m\}$ will have diverged less in
meaning than others from $x$, and so can be considered ‘better’
translations. The problem then is how to identify these ‘better’
translations.
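The growth in candidates described above can be sketched in a few
lines of Python. This is only an illustration: the dictionary entries
are invented toy data, and the names `ES_EN`, `EN_DE` and `lookup_all`
are hypothetical, not part of the original work.

```python
from typing import Dict, Set

Dictionary = Dict[str, Set[str]]

# Toy entries invented for illustration; they come from no real dictionary.
ES_EN: Dictionary = {"banco": {"bank", "bench"}}
EN_DE: Dictionary = {
    "bank": {"Bank", "Ufer"},       # "Ufer" is a river bank
    "bench": {"Bank", "Sitzbank"},
}


def lookup_all(words: Set[str], dictionary: Dictionary) -> Set[str]:
    """Return the union of the translations of every word that has an entry."""
    translations: Set[str] = set()
    for word in words:
        translations |= dictionary.get(word, set())
    return translations


# Source word -> intermediary-language words -> target-language candidates.
il_candidates = lookup_all({"banco"}, ES_EN)
y_candidates = lookup_all(il_candidates, EN_DE)

print(sorted(il_candidates))
print(sorted(y_candidates))
```

In this toy data, "Ufer" is reached only through the river-bank sense
of "bank", so it is an example of a candidate whose meaning has
drifted away from the source word.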
2.2 Using The Link Structure To Find ‘Better’
Translations
Our method for identifying the ‘better’ translations is to first use
dictionary $Y \rightarrow IL$ to produce $\{il'_1, \ldots, il'_p\}$,
the multiple possible translations of each of $\{y_1, \ldots, y_m\}$,
where $p \geq m$. Next we use dictionary $IL \rightarrow X$ to give
$\{x'_1, \ldots, x'_q\}$, the multiple translations of each of
$\{il'_1, \ldots, il'_p\}$, where $q \geq p$. We then select each of
the members of the set $\{x'_1, \ldots, x'_q\}$ which are equal to
the original word $x$. We hypothesise that to have returned to the
same starting word, the meanings of the words that have formed a
chain through the link structure cannot have diverged significantly,
and so we retrace two steps to the word in $\{y_1, \ldots, y_m\}$ and
accept this as a suitable translation of $x$. Figure 2 represents a
hypothetical case in which two members of the set
$\{x'_1, \ldots, x'_q\}$ are equal to the original word $x$. We
retrace our route from these through the links to $y_1$ and $y_2$,
and we accept these as suitable translations.

Figure 2: Translating from
$X \rightarrow IL \rightarrow Y \rightarrow IL \rightarrow X$.
Nodes are possible translations.
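A minimal sketch of this round-trip filter follows, assuming each
dictionary is held in memory as a mapping from a headword to a set of
translations. The function name `round_trip_translations` and all
dictionary entries are hypothetical and chosen for illustration only.

```python
from typing import Dict, Set

Dictionary = Dict[str, Set[str]]


def round_trip_translations(
    x: str,
    x_to_il: Dictionary,
    il_to_y: Dictionary,
    y_to_il: Dictionary,
    il_to_x: Dictionary,
) -> Set[str]:
    """Return target-language words that lie on a chain leading back to x."""
    accepted: Set[str] = set()
    for il in x_to_il.get(x, set()):
        for y in il_to_y.get(il, set()):
            # Accept y if any return path y -> il' -> x' ends at the start word.
            for il_back in y_to_il.get(y, set()):
                if x in il_to_x.get(il_back, set()):
                    accepted.add(y)
                    break
    return accepted


# Toy dictionaries invented for illustration.
ES_EN = {"banco": {"bank", "bench"}}
EN_DE = {"bank": {"Bank", "Ufer"}, "bench": {"Bank", "Sitzbank"}}
DE_EN = {
    "Bank": {"bank", "bench"},
    "Sitzbank": {"bench"},
    "Ufer": {"shore", "riverbank"},
}
EN_ES = {
    "bank": {"banco"},
    "bench": {"banco"},
    "shore": {"orilla"},
    "riverbank": {"ribera"},
}

result = round_trip_translations("banco", ES_EN, EN_DE, DE_EN, EN_ES)
print(sorted(result))
```

With this toy data the candidate "Ufer" is discarded because none of
its return paths reaches "banco", while "Bank" and "Sitzbank" are
kept.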
If we apply the method described here to a large number of words from
language $X$ then we can automatically generate a
language $X$-to-language $Y$ dictionary. Here we have considered
using just one intermediary language, but provided we have the
dictionaries to complete a cycle from $X$ to $Y$ and back to $X$,
then we can use any number of intermediary languages, e.g.
$X \rightarrow IL$, $IL \rightarrow IL'$, $IL' \rightarrow IL$,
$IL \rightarrow Y$, where $IL'$ is a second intermediary language.
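The same idea can be written for an arbitrary chain of dictionaries.
The sketch below is one possible reading, not the authors'
implementation: it assumes the cycle is supplied as an outbound list
of dictionaries (source to target) and an inbound list (target back
to source). The helper names and the toy Spanish, English, French and
German entries are hypothetical.

```python
from typing import Dict, List, Set

Dictionary = Dict[str, Set[str]]


def follow(words: Set[str], chain: List[Dictionary]) -> Set[str]:
    """Translate a set of words through each dictionary in turn."""
    for dictionary in chain:
        next_words: Set[str] = set()
        for word in words:
            next_words |= dictionary.get(word, set())
        words = next_words
    return words


def cycle_translations(
    x: str, outbound: List[Dictionary], inbound: List[Dictionary]
) -> Set[str]:
    """Keep target words from which the inbound chain can return to x."""
    return {y for y in follow({x}, outbound) if x in follow({y}, inbound)}


# Toy data invented for illustration, with two intermediary languages.
ES_EN = {"gato": {"cat"}}
EN_FR = {"cat": {"chat"}}
FR_DE = {"chat": {"Katze", "Plausch"}}
DE_FR = {"Katze": {"chat"}, "Plausch": {"causette"}}
FR_EN = {"chat": {"cat"}, "causette": {"chat"}}
EN_ES = {"cat": {"gato"}, "chat": {"charla"}}

accepted = cycle_translations(
    "gato", [ES_EN, EN_FR, FR_DE], [DE_FR, FR_EN, EN_ES]
)
print(sorted(accepted))
```

Passing the dictionaries as lists keeps the code independent of how
many intermediary languages are used; a single intermediary language
is simply the case of two dictionaries in each list.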
3 The Experiment
We have applied the method described in section 2 in order to
automatically generate a Spanish-to-German dictionary using
Spanish-to-English, English-to-German, German-to-English and
English-to-Spanish dictionaries. We chose Spanish and German because
we were able to find an online Spanish-to-German dictionary which
could be used to evaluate our automatically-generated dictionary.
3.1 Obtaining The Data
We first collected large lists of German and English lemmas from the
Celex Database (Baayen and Gulikers 1995). We also gathered a short
list of Spanish lemmas, all starting with the letter ‘a’, from the
Wiktionary website (Wiktionary) to use as our starting terms. We
created our own dictionaries by making use of online dictionaries. In
order to obtain the English translations for the German lemmas and
vice versa, we queried ‘The New English-German Dictionary’ site of
The Technical University of Dresden [1]. To obtain the English
translations for the Spanish lemmas and vice versa, we queried ‘The
Spanish Dict’ website [2]. Finally, we wanted to compare the
performance of our automatically-generated Spanish-to-German
dictionary with that of a manually-generated Spanish-to-German
dictionary, and for this we used a website called ‘DIX:
Deutsch-Spanisch Woerterbuch’ [3]. Table 1 gives information about
the four dictionaries which we created in order to automatically
generate our Spanish-to-German dictionary. The fifth is the
manually-generated dictionary used for evaluation.

Dicts   Ents    Trans    Trans/term
StoE    664     1202     1.8
EtoS    18845   28300    1.5
GtoE    279??   77626    2.8
EtoG    2629?   104693   4.0
StoG’   610     3126     5.1

Table 1: Dictionaries; S = Spanish, E = English, G = German, StoG’ is
the dictionary used for evaluation.

[1] http://www.iee.et.tu-dresden.de/cgi-bin/cgiwrap/wernerr/search.sh
[2] http://www.spanishdict.com/
[3] http://dix.osola.com/
3.2 Automatically Generating The Dictionary
For our experiment, we used the method described in section 2 to
automatically construct a scaled-down version of a Spanish-to-German
dictionary. It contained 664 Spanish terms, all starting with the
letter ‘a’. To store and operate on the data, we used the open source
database program PostgreSQL, version 7.3.4. Starting with the
Spanish-to-English dictionary, at each of stages 1 to 3, we produced
a new dictionary table with an additional column to the right for the
new language. We did this by using the appropriate dictionary to look
up the translations for the terms in the old rightmost column, before
inserting these translations into a new rightmost column. For
example, to create the Spanish-to-English-to-German (SEG) table, we
used the English-to-German dictionary to find the translations for
the English terms in the Spanish-to-English (SE) table, and then
inserted these translations into a new rightmost column. We kept
producing new tables in this fashion until we had generated a
Spanish-to-English-to-German-to-English-to-Spanish (SEGES) table. In
stage 4, the final stage, we selected only those rows in which the
starting and ending Spanish terms were the same. Important
characteristics of these dictionary tables are given in table 2.

Stages  Dicts   Ents   Trans   Trans/term
0       SE      664    1202    1.8
1       SEG     631    6966    11.0
2       SEGE    568    24012   42.3
3       SEGES   566    38002   67.1
4       SEGES   533    4313    8.1

Table 2: Constructing Dictionary; Ents = number of entries, Trans =
number of translations, Trans/term = average number of translations
given per entry.

Table 2 shows that the number of translations-per-term grew and grew
from 1.8 translations in the starting Spanish-to-English dictionary
to an enormous 67.1 translations per term in the SEGES table after
stage 3. However, after stage 4, having selected only those rows with
matching first and last entries for Spanish, we reduced the number of
translations back to 8.1 per term.
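The staged table construction can be sketched with SQL joins. The
example below uses Python's built-in sqlite3 module as a stand-in for
PostgreSQL so that it runs without a database server. The table
names, column names and dictionary rows are assumptions made for
illustration and are not taken from the original experiment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical two-column tables, one per starting dictionary (toy rows).
toy_dictionaries = {
    "dict_se": [("agua", "water")],
    "dict_eg": [("water", "Wasser"), ("water", "Gewässer")],
    "dict_ge": [("Wasser", "water"), ("Gewässer", "waters")],
    "dict_es": [("water", "agua"), ("waters", "aguas")],
}
for table, entries in toy_dictionaries.items():
    cur.execute(f"CREATE TABLE {table} (src TEXT, tgt TEXT)")
    cur.executemany(f"INSERT INTO {table} VALUES (?, ?)", entries)

# Stages 1 to 3: each join appends a new rightmost column.
cur.execute("""
    CREATE TABLE seg AS
    SELECT se.src AS s, se.tgt AS e, eg.tgt AS g
    FROM dict_se AS se JOIN dict_eg AS eg ON eg.src = se.tgt
""")
cur.execute("""
    CREATE TABLE sege AS
    SELECT seg.s AS s, seg.e AS e, seg.g AS g, ge.tgt AS e2
    FROM seg JOIN dict_ge AS ge ON ge.src = seg.g
""")
cur.execute("""
    CREATE TABLE seges AS
    SELECT sege.s AS s, sege.e AS e, sege.g AS g, sege.e2 AS e2,
           es.tgt AS s2
    FROM sege JOIN dict_es AS es ON es.src = sege.e2
""")

# Stage 4: keep only rows whose first and last Spanish terms match.
rows = cur.execute(
    "SELECT DISTINCT s, g FROM seges WHERE s = s2 ORDER BY s, g"
).fetchall()
print(rows)
conn.close()
```

Because each stage is an inner join, a term with no entry in the next
dictionary silently drops out of the table, which is consistent with
the number of entries shrinking from stage to stage.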
3.3 Evaluation
Having automatically generated the Spanish-to-German dictionary
containing 533 unique Spanish terms, we then compared it to the
manually-generated Spanish-to-German dictionary (see section 3.1). We
gave the same initial 664 Spanish terms to the manually-generated
dictionary but received translations for only 610.
The results are summarised in table 3. We observe that when we regard
the manually-generated dictionary as the gold standard, our
automatically-generated dictionary managed to produce a relatively
adequate coverage of some 73.6% (392 out of 533) with respect to main
entries overlap between the two dictionaries. When we look at the
number of translations per term, we find that our dictionary covered
most of the translations found in the manually-generated dictionary
(4.0 out of 5.1 on average, or 78.4%) for which there was a
corresponding entry in our dictionary. In fact, our dictionary
produced more translations-per-term than the manually-generated one.
An extra translation may be an error or it may not appear in the
manually-generated dictionary because the manually-generated
dictionary is too sparse. Further evaluation is required in order to
assess how many of the extra translations were errors.

              Auto SG   Man SG   Overlap
Entries       533       610      392 (73.6%)
Total Trans   4313      3126     1577
Trans/Entry   8.1       5.1      4.0 (78.4%)

Table 3: Result: SG automatic vs SG manual
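A comparison of this kind can be sketched as follows. The exact
definitions used in the evaluation are not spelled out in the text,
so the measures below (shared entries, matched translations and extra
translations per shared entry) are one plausible reading, and the
function name `compare` and the toy entries are hypothetical.

```python
from typing import Dict, Set

Dictionary = Dict[str, Set[str]]


def compare(auto: Dictionary, gold: Dictionary) -> Dict[str, float]:
    """Summarise how an automatic dictionary overlaps with a gold standard."""
    shared = auto.keys() & gold.keys()
    matched = sum(len(auto[term] & gold[term]) for term in shared)
    extra = sum(len(auto[term] - gold[term]) for term in shared)
    return {
        "shared_entries": len(shared),
        # Share of automatic entries that also appear in the gold standard.
        "entry_overlap": len(shared) / len(auto) if auto else 0.0,
        "matched_translations": matched,
        "matched_per_shared_entry": matched / len(shared) if shared else 0.0,
        # Translations absent from the gold standard: errors or gaps in it.
        "extra_translations": extra,
    }


# Toy dictionaries invented for illustration.
auto_sg = {
    "abrir": {"öffnen", "aufmachen", "eröffnen"},
    "agua": {"Wasser"},
}
gold_sg = {
    "abrir": {"öffnen", "aufmachen"},
    "ayer": {"gestern"},
}

summary = compare(auto_sg, gold_sg)
print(summary)
```

The "extra_translations" count is the quantity that needs manual
inspection, since the comparison alone cannot tell an error from a
correct translation that the gold standard happens to lack.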
In conclusion, we find that our automatically-generated dictionary
has an adequate but not perfect coverage and very good recall for
each term covered within our dictionary. As for the precision of the
translations found, we need more investigation and perhaps a more
complete manually-generated comparison dictionary. The results might
have been even better had it not been for several problems with the
four starting dictionaries. For example, a translation for a
particular word could sometimes not be found as an entry in the next
dictionary. This might be because the entry simply wasn’t present, or
because of different conventions, e.g. one dictionary listing verbs
as “to Z” when another simply gives “Z”. Another cause was
differences in character encoding, e.g. with German umlauts. Results
might also have improved had the starting dictionaries provided more
translations per entry term, and had we used part-of-speech
information; this was impossible since not all of the dictionaries
listed part-of-speech. All in all, given the fact that the quality of
data with which we started was far from ideal, we believe that our
method shows great promise for saving human labour in the
construction of translation dictionaries.
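The lookup failures caused by differing conventions and encodings
could be reduced by normalising headwords before matching. The sketch
below is an assumption about how this might be done, not something
the experiment included; the function names are hypothetical, and
only standard-library calls are used.

```python
import unicodedata


def decode_entry(raw: bytes, encoding: str) -> str:
    """Decode dictionary bytes and normalise to composed Unicode (NFC)."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


def normalise_headword(term: str) -> str:
    """Strip a leading infinitive marker "to " and normalise case and form."""
    term = unicodedata.normalize("NFC", term).strip()
    if term.lower().startswith("to "):
        term = term[3:].lstrip()
    return term.lower()


# The same German word as served in two different encodings.
latin1_bytes = b"\xf6ffnen"      # "öffnen" in ISO-8859-1
utf8_bytes = b"\xc3\xb6ffnen"    # "öffnen" in UTF-8

same_word = decode_entry(latin1_bytes, "latin-1") == decode_entry(utf8_bytes, "utf-8")
print(same_word)
print(normalise_headword("to open"))
```

Lower-casing is a simplification here: it also merges German nouns
with identically spelled lower-case words, which may or may not be
desirable for a given pair of dictionaries.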
4 Conclusion
In this paper we have described a method using one or more
intermediary languages to automatically generate a dictionary to
translate from one language, $X$, to another, $Y$. The method relies
on using dictionaries that can connect $X$ to $Y$ and back to $X$ via
the intermediary language(s). We applied the method to automatically
generate a Spanish-to-German dictionary, and despite the limitations
of our starting dictionaries, the result seems to be reasonably good.
As was stated in section 3.3, we did not evaluate whether
translations we generated that were not in the gold-standard manual
dictionary were errors or good translations. This is essential future
work. We also intend to empirically test what happens when further
intermediary dictionaries are introduced into the chain.
We believe that our method can make a great contribution to the
construction of translation dictionaries. Even if a dictionary
produced by our method is not considered quite complete or accurate
enough for general use, it can serve as a very good starting point,
thereby saving a great deal of human labour, human labour that
requires a large amount of linguistic expertise. Our method could be
used to produce translation dictionaries for relatively unconnected
language groups, most likely by using English as an intermediary
language. Such translation dictionaries could be important in
promoting communication between these language groups and an ever
more globalised and interconnected world.
A final point to make regards applying our method more generally
outside of the domain of translation dictionary construction. We
believe that our method, which makes use of link structures, could be
applied in different areas involving graphs.

References
Kisuh Ahn, Beatrix Alex, Johan Bos, Tiphaine Dalmas, Jochen L.
Leidner, Matthew B. Smillie, and Bonnie Webber. Cross-lingual
question answering with QED. 2004.

Atelach Alemu Argaw, Lars Asker, Richard Coester, and Jussi Kalgren.
Dictionary based Amharic - English information retrieval. 2004.

R.H. Baayen and L. Gulikers. The CELEX lexical database (release 2).
Distributed by the Linguistic Data Consortium, 1995.

Christian Boitet, Mathieu Mangeot, and Gilles Serasset. The Papillon
project: Cooperatively building a multilingual lexical data-base to
derive open source dictionaries and lexicons. In 2nd Workshop NLPXML,
pages 93–96, Taipei, Taiwan, September 2002.

Gilles Serasset. Interlingual lexical organization for multilingual
lexical databases. In Proceedings of 15th International Conference on
Computational Linguistics, COLING-94, pages 5–9, Aug 1994.

Wiktionary. A wiki based open content dictionary.
http://www.wiktionary.org/.
