Construction of a Hierarchical \]¥anslation Memory 
S. Vogel, H. Ney 
Lehrstuhl flit Intbrmatik VI, Computer Science Department 
1{1~7~111 Aach(',n University of T(~chnology 
1)-52056 Aachen, Gerinm~y 
Elnaih vogel@informatik, rwth-aachen, de 
Abstract 
_q}:anslation memories are t)ronfising devi('es for 
artt;omati(- translation. Their main weakuess, 
however, is poor coverage, on llllSeell {;ex|;. \]ill 
this t)at)er, l;he use of a hierarchical l;ransla- 
tion memory, (:onsisting of a ('as(:ade of finite 
si;~d;e transducers, is t)rot)os(;d. A mmfl)er of 
tr~nsdu(:e, rs is al)l)\]ied to (;onverl; s(;ni;enee 1)airs 
fl:om a t)ilingual cortms into translation pat- 
terns, which are then used as a translation me, m- 
ory. Pr(;liminary results on the (\]erman English 
V ERBMOIIIL ('orl)us a,re given. 
1 Introduction 
In reeenl; years, exa,int)le-1)ased t;l"ansl~l;i<)l~ has 
been 1)rol)osed as an efli<:ient ~n(;t;llo<l for auto- 
m~d;i(: translation (Sal;o and Nag;to, 1990; Ki- 
tan(), 1993; Brown, \]99(i). 'lli'anslations are 
sl;()l;ed ill a trallslai;i()ll llle.lllory tloll(t llso, d t;o coi1- 
SI;YllCI; trauslations for new sealten(:e.s. In its sin> 
1)lest version, examl)le-1)ased translation boils 
down to llSillg a (tat;fl)ase of SOllrce sell(;ell(;es 
with their l;rmlslations. For many translat;ion 
tasks, esl)eeially in coml)ul;er assisl;cd |;ransla- 
tion, this at)l)roa(:h works with greal; success. 
For flflly aul;onlat;i(" l;ranslal;ion the main t)rot) - 
\]em is t)oor COVel'a~e oi1 lleW data. To overcolIle 
this weakness, it hierar(:hi(:al translation llleln- 
ory is prot)osed. Al)plying a cascade of tinite 
sti%te |;ra, ils(hl('ers~ a~ SOllrce Selltellce is {;ralis- 
laW, d into the tin:get . 
2 The Transducers 
2.1 Overview 
A translat;ion lnemory is siml)ly a eolle(:|;ion 
of source-l;arge3; string i)airs. As a tirst Stel) ~ 
these translation examt)les (:all be (:onverted 
inl;o translation 1)atl;(.q:ns t)y lilt;reducing cate- 
gory \]abels, e.g. tbr prol)er nmnes or numbers. 
3.'0 make the translation patterns even more use- 
ful, not only single words but comph;x phrases 
can be replace.d by category labels. Which 
phrases t;o select for categorization depends on 
the aplflication, l,br example, the corpus lls0.d 
for this si;udy coal;alas many time and date ex- 
pressions. Therefore, a specialized |;ransduce, r 
was constructed to recognize and translal;e such 
e, xl)ressions. 
Each transducer is a se~ of quadrut)les of the 
tbrm: 
label # source pal;t;ern # l;arget; t)atl;ern # score, 
Som'ce l)al;terns and target patterns may con- 
tain category labels. We call su(:h l)atterns 
~(:ompomldL ~.l.~:ansdueea's working only on the 
word level are (:ailed 'simple'. \]if a la:ans- 
dll(:e,r coal;alas recursive p:tl;terns, e.g. \])ATE # 
\])NPE lind \])ATE # I)AS£I'~ and \])ATI'3 # -3.0, it; 
has |;o be. apl)lied re.cursively t;o t;he input;. 
The scores a,t|;a(:hed to the translation t)at - 
terns can be viewed ns translation scores. They 
are llse, d to bi~ts towards 1;he selection of lollg(;r 
part;eras and towards lliore likely translations 
in I;hose cases where several targol; patterns are 
associated with ()lie SOllrce t)a,l;i;ern. 
'.l'he transducers can be applied in 1)oth di- 
rections, i.e. for a given  pair, each 
 can be viewed as source . 
Thcrel)y, bilingual labeling is possilfle. This can 
l)e applied to convert a bilingual corlms into a 
selection of translation l)atterns which are. for- 
mulated in terms of words and ('ategory lal)els. 
2.2 Construction of the Transducers 
The transducers should t)e selected in such a 
way am to minimize l;he lle, ed tbr recursive ap- 
t)li(:al;ion in order l;o lint)rove efficiency. There- 
tbre, |;11(' l)atl;erns to search tbr are l)artitioned to 
forln a ('as(:ade of t;ransducers. Sonic trans(luc- 
ers analys(,' l)arts of the senten(:e and rel)la(:e it 
1131 
by a category label, which is then used at a later 
step by another transducer. The labeling of the 
days of the week or the names of the months is 
a prerequisite to apply more complex patterns 
for date expressions. The transducers currently 
used are listed in Table 1. 
'Fable 1: List of transducers. 
1. names (persons, towns, places, events, etc) 
2. spelling (e.g. 'D A double L') 
3. numbers (ordinal, cardinal, fractions, etc) 
4. time and date expressions 
5. parts of speech (tbr certain word classes) 
6. grammar (noun phrases, verb phrases) 
Some transducers are general in scope, e.g. 
the transducers for numbers, part of speech tags 
and grammar. Others are costumized towards 
the domain tbr which the translation system is 
developed. In tile VERBMOBIL corpus, which is 
used for the experiments, time and date expres- 
sions are very prominent. To recognize these 
expressions, a small grammar has been devel- 
oped and coded as finite state transducer. Ac- 
tually, two transducers are used. On the first 
level, words are replaced by labels, like DAY- 
OFWEEK = { Montag, Dienstag, ...}. On the 
second level, these labels are used to t'orm com- 
plex time and date expressions. This second 
transducer works recursively, as simpler expres- 
sions are used to build more complex expres- 
sions. 
Finally, a small grammar based on POS (part 
of speech) tags has been crafted mamlally. The 
purpose of this grammar is to recognize simple 
noun phrases. Extensions to handle the differ- 
ent word ordering in the verb phrases arc under 
development. 
2.3 Scoring 
The scores attached to the translation patterns 
can be viewed as a kind of translation scores. 
In the current implementation a rather crude 
heuristic together with some manual tuning in 
the grammar transducer is applied. The idea 
is to give preference to longer translation pat- 
terns as they take more context into account 
and encode word reordering in an explicit man- 
ner. Thus, fbr simple and compound translation 
patterns the score is exponential to the length 
of the source pattern. Tile scores are negative 
by convention: not translating a word gives zero 
cost, translating it gives a benefit, i.e. negative 
costs. In future, scoring will be refined by using 
corpus statistics to assign probabilities to the 
translation patterns. 
2.4 Bilingual Labeling 
The sentence pairs ill the bilingual training cor- 
pus can be segmented into shorter segments 
with the help of an alignment progrmn (Och et 
al., 1999). This collection of segments could be 
used directly as a translation memory. However, 
to improve the coverage on unseen data, these 
segnmnts are labeled. Applying the transducers 
as given in Table 1 transfbrms these segments 
into compound t)hrases. 
The procedure is as follows: 
1. For each transducer taken from the com- 
plete cascade - as given in Table 1 ap- 
lilY the transducer to both, the source and 
tlm target sentences of the bilingual train- 
ing cortms. 
2. Find those sentence pairs which contain 
equal number and types of category labels 
tbr both sentences. 
3. For sentence pairs which do not match in 
mmflmr and type of the category labels 
keep the original sentence pair. 
Table 2 shows examples of some translation 
patterns which resulted flom bilingual labeling. 
3 Applying the Transducers 
The working of the transducers is best described 
as tile construction of a translation graph. That 
is to say, the sentence to be translated is viewed 
as a graph which is traversed fi'om left to right. 
For each matching source pattern, as encoded 
in the transducers, a new edge is added to the 
graph. The edge is labeled with the category la- 
bel of the translation pattern. The translation 
and the translation score are attached to the 
edge. In this way a translation graph is con- 
structed. In those cases, where a source pattern 
has several translations, one edge tbr each trans- 
lation is added to the graph. 
Tim left right search on the graph is orga- 
nized in such a way that all paths are traversed 
1132 
Table 2: Coml)ound translation t)atterns (CTP). 
CTP ~ DATE_DAY ginge es wiedcr 
CTP ~ SURNAME am A1)i)~rat 
CTP ~ NP dauert DATE 
CTP @ nehmen PPER NP DATE 
@ DATE_DAY it is possible again :~ -4.6 
~/: this is SURNAME st)caking @ -3.3 
NP takes DATE :~ -3.3 
let PPER take NP DATE @ -4.6 
in parallel and tile patterns sl;ored in the trans- 
ducer are matched synchronously. For each 
~lo(te n and each edge e leading to n, all patterns 
in tile transducer starting with the label of e arc 
attached to n. This gives a mmlber of hypothe- 
ses describing partially matching patterns. Al- 
ready started hypotheses are expanded with tile 
lal)el of the edge running ti'om the l)revious node 
to the current node. This procedure is shown in 
l~'igul'e 1. For a selection of t;rmmlation patterns 
from the siml)le , word-1)ased translation mem- 
ory the hyt)otheses tbr 1)artially matching pat- 
terns generated during the left--right traversal 
are shown as well as the resulting new edges. 
The result of applying all transducers is a 
graph where each path is a (partial) transla- 
tion of the source sentence. The 1)ath with the 
best overall score is used to construct the fi- 
nal translation. For good result;s, not; only the 
scores from t;he transducers shoul(l 1)e used in 
selecting the best t)ath, but a  model 
of the target  should l)e inchlde(l. 
1 llIIl # al, on, at the 
9 Montag# Monday 
17 waere es so moeglich # would that be possilflc 
18 wic ist cs bel lhncn # how about you 
19 wie waerc es # how al)out 
20 wie wacrc cs denn # how about 
21 wie waere es denn am Montag # how about Monday 
22 wie wacrc es am Montag # Imw about Monday 
Figure 1: Ext)ansion of Pattern Hypotheses 
3.1 Error Tolerant Match 
To improve tile coverage on unseen test data, 
it may be avantageous to allow tbr approxima- 
tivc matching. The idea is, to apply longer seg- 
ments tbr syntactically better translations with- 
out loosing to much as far as tile content of the 
sentences is concerned. 
We us(; weighted edit distance, i.e. each er- 
ror (insertion, deletion, substitution) is assici- 
ated with an individual score. Thereby, the 
deletion or insertion of typical filler words can 
be allowed, whereas the deletion or insertion of 
content words is avoided. 
3.2 Translation on Word Lattices 
The approach described so far can be used for 
a tight integration of speech recognition and 
translation. Speech recognition systems typi- 
cally 1)ro(luce wor(l lattices which encode the 
most likely word sequences in an e.flicient lllall- 
net. A direct translation on the lattice has, 
compared to transforming the lattice, into an n- 
best list;, translating each word sequence, mM 
selecting the overall best translation, a nulnber 
of advantages: 
• all the paths can be covered, whereas in 
an n-best approach typically only a small 
fraction of tile paths is considered; 
• partial translation hypotheses are reused; 
• acoustic scores can be taken into account 
when calculating an overall score for each 
translation hypothesis. 
4 Experiments and Results 
In this section, we will report on first expert- 
ments and results obtained with the cascaded 
transducer approach. Experiments were per- 
tbrmed on the VERBMOBIL corpus. This cor- 
pus consists of spontaneously spoken dialogs in 
the appointment scheduling domain (Wahlster, 
1993). The vocabulary comprises 7 335 German 
1133 
words and 4382 English words. A test corI)us 
of 147 sentences with a total of 1 968 words was 
used to test the coverage of tile transducers and 
to run preliminary translation experiments. 
In Table 3 the sizes of the transducers are 
given. 
Table 3: Number of translation t)atterns of tile 
transducers. 
Transducer Patterns 
Nalne 
Spell 
Number 
Date 
POS Tags 
~ralnnlar 
442 
60 
342 
334 
671.4 
124 
4.1 Coverage 
In a first series of experiments, the coverage 
of the cascaded transducers was tested. TILe 
sentences pairs Dora the training corpus were 
segmented into shorter segments. This resulted 
in 43609 bilingual phrases running from 1 word 
up to 82 words in length. The longest phrases 
were discarded as it is very unlikely that they 
will match other sentences. Thus, for the ex- 
periments only 40000 sentence pairs were used, 
the longest sentences containing sixteen source 
words. 
Starting fi'om those simple phrases, succes- 
sively more transducers were applied 1lt) to the 
fllll cascade. In Table 4 the coverage for each 
level is shown. As expected, the coverage in- 
creases and nearly flfll coverage on the test 
sentences is reached. In tile final step, the 
POS transducer and the grammer transducer 
are both applied. 
The first cohnnn shows which transducers 
have been applied. In each step, one additional 
transducer is applied tbr bilingual labeling and 
tbr translation. Bilingual labeling reduces the 
number of distinct patterns in the translation 
memory, whereas the immber of compound pat- 
terns increases. The last column shows the 
number of words in the test sentences not cov- 
ered by the patterns ill tile translation nmmory. 
As can be seen, the coverage increases which 
each step. The large improvement in the final 
Table 4: Efl'ect of selected transducers oi1 cov- 
erage on test corpus. 
%'ansdncers Patterns Coln- not 
pound covered 
NOlle 
Name 
+ Spell 
+ Number 
+ Date 
+ Gramnlar 
40000 
39624 
39508 
38669 
36118 
35519 
1.259 
1468 
11181 
14684 
15682 
273 
254 
249 
238 
215 
9 
step results froln applying tile POS-tag trans- 
ducer whidl coveres a large part of the vocabu- 
lary. 
4.2 Translation 
First experiments have been performed to test 
tile approach tbr translation. So far, no lan- 
guage model tbr the target  is applied 
to score the different translations. 
For the sentence 'Samstag und Februar sind 
gut, aber der siebzehnte ware besser' the best 
t)ath through the resnlting translation graph 
gives a structure as shown in Figure 2. IlL Ta- 
ble 5, some translation examples for test sen- 
tences not seen ill the training corpus arc given. 
Table 5: Three translations generated t'rom the 
hierarchical translation memory. 
Ich werde lnit dem Fhlgzeug kolnnmn. 
I will come with the plane. 
Ja, wunderbar. Machen wir das so, und 
dann treflbn wir uns daim ill Hamburg. 
Vielen Dank und auf WiederhSren. 
Well, excellent. Shall we fix this, and 
then we will meet then in Hanfl)urg. 
Thank you very much goodbye. 
Das kann ich nicht einrichten. Ich habe 
eine Chance ab dreimldzwanzigsten 
Oktober. Ist es da bei Ihnen m6glich? 
It can I not arrange. I have 
a chance froln twenty-third of 
October. Is it as for you possible? 
1134 
I C_PHRASE 
the fourth would be better 
-7.4 
~DATE \] 
Saturday and February 
-4,2 
DATE DATE 
Saturday February 
-0.6 -0.6 
DAYWEEK 
Saturday 
-0.5 
Samstag I 
MONTH 
February 
-0.5 
Feb ruar 4 
I DATE 
the fourth 
-4.1 
~ DATEDAY the fourth -4.0 
are  ood but 
-2 1 -0 1 
~a_ere ~A 
Figure 2: \[\[~'m~slation example 
5 Sun'nnary and conclusions 
In this t)npcr a translation at)pronch 1)asexl on 
cascaded tin|re state l;ra,nsducers has l)een pre- 
sen|ext. A mm~l\] mm~l)er of simple l;rmlsdut'- 
(;rs is handcrafted and then used to convert; n 
bilingual cortms in|;o a translation memory con- 
sisting of som:(:c l)al;tcrn target; i)a,l;l;(;rn p~tirs, 
which inchuh; category lnlmls. Trmlslni;ion is 
then lmrformcd by applying l;he comtflel;e cas- 
ca(le of l;rans(luce.rs. 
First (;xl)e.rim(mts ha,v(; shown l;lm \])ot,cnl;i;J 
of this ai)l)ro~u:h for m~tchine l;ransla,tion. Good 
coverag(~ on mlse,(m test data ('ould 1)e ol)l;aine(l. 
The. main ditficulty in this nt)l)roach is to (te- 
l|he a (:onsistenl; scoring s('heme thr the (litt'e,r- 
ent transdu(:('rs. Especially, ~ good l)M~m('e t)(;- 
tween the grammm: and th(', word-t)as(;d |,ransb> 
lion m('mory is n(',c(;ssary. 'Phis will t)e th(' main 
focus for futur(', work. 
As Mrea(ty mentioned, ;~ l~tngmtge modal for 
th(; tnrget l~mguag(; has to bc integrated into 
t;h(, scoring of the translation hyl)othes(,s. Fi- 
mflly, the l, rmmdu('er based al)t)roadl to transla- 
tion will 1)e tested on word lattice.s as i)rodu(:ed 
by spee,(:h recognition systeans. 
Acknowledgement. This work was partly 
SUl)t)orted l)y the German Fede.ral Ministry of 
E(tuc~ttion, S(:ie.n(:e, ll.es(;m:ch mM 3b.(:hnoh)gy 
under the. Contract Nulnl)er 01 IV 701 Td 
(vl m vonu,). 

References 

R. 1). Brown. i\[996. Exmut)lc-1)ase, d machine 
translation in the pangloss system, l"rocc, cd- 
ings of the 16th, international Co~@rencc, on 
Computational Linguistics, 169-174, Copcn- 
tm,ge, n, l)emnark, August. 

It. l(itmJo. 1993. A COml)rehensive mM prn(> 
ti(-M model of memory-ha,seal machine trmls- 
la.tion, l~mcccdi,ng.~ of the 13th, hzl, c'r,natio'nal 
Joint Co'nfere, nce, o'n Art{/icial bl, tclligc,'n, ce, 
vohmm 2. 1276 1282. Morgmt Ka.ufmmm. 

F..\]. Och, C. Tillmmm, mM H. Ney. 1999. lm- 
prove, d aligmnent models for statistical ma- 
chilw, I;ranslation. Procceding,s of the Joint 
SIGDAT Co~@rcncc on Empirical Meth, ods 
in Na, t,wral Language PTwccs.sin9 and Very 
Large, Corpora, 20 28, University of Mm:y~ 
land, College Park, MD, USA, June. 

S. Sato and M. Nagao. 1990. Towmd memory- 
based tnmslation. P'rocc, edings of the 13th 
International Cm@rcnce on Computational 
Lingui,~tics, vol. 3, 24:7 ~252, Hclsinki, Fin- 
land. 

W. Wahlster. 1993. Vert)mobil: %'anslation of 
t'a(:c-to-fac(; dialogs. Proceedings of th, e MT 
Summit IV, 1.27 135, Kol)e, Jal)mL 
