Toward Memory--based Translation 
Satoshi SATO and Ma.koto NAGAO 
Dept. of Electrical Engineering, Kyoto University 
Yoshida-honmachi, Sa.kyo, K.yoto, 606, Ja.pan 
sa.to@kuee.kyoto-u.ac.jp 
Abstract 
An essential problem of example-based transla- 
tion is how to utilize more than one translation 
example for translating one source sentence. 
This 1)aper proposes a method to solve this 
problem. We introduce tile representation, 
called .matching e,,z:pressio~z, which tel)resents 
the combination of fragments of translation ex- 
amples. The translation process consists of 
three steps: (.1) Make the source matching ex- 
pression from lhe source sentence. (2) TransDr 
the source matching expression into the target 
matching expression. (3) Construct the target 
sentence from the target matching expression. 
This mechanism generates some candidates of 
translation. To select, the best translation out 
of them, we define the score of a translation. 
1 Introduction 
Use of extracted information fiom examples 
or example-based translation is becoming the 
new wave of machine translation. The ba.- 
sic idea. of example~based translation is very 
simple: translate a source sentence by imitat- 
ing the translation example of a similar sen- 
tence in the database. The idea first appeared 
in \[Nagao 84\], and some research has followed 
it \[Sumita 88\]\[Sato 89\]\[Sadler 89a.\]\[Sadler 89b\]. 
But a great deal of effort is still needed to im- 
plemenl the idea. 
In our previous work, we show how to select. 
the best target word in case-frame translation 
based on examples\[Sato 89\]. In this paper, we 
concentrate on two problems: 
1. ltow to combine some fragments of trans- 
lation examph~s in order to translate one 
sentence? 
2. tlow to select tile best tra.nslation out of 
inany candidates? 
We show partial solutions for them in MBT2. 
MBT2 is the second prototype system in our 
Memory-based Translation Project.. MBT2 ca.n 
do bi-directional m~nslation between an English 
word-dependency tree and a Japanese word- 
dependency tree. It is implemented in Sicstus 
Prolog. 
2 Need to Combine Frag- 
me nt s 
The basic idea of example-based translation is 
very simple: translate a source sentence by im- 
itating the translation example of similar sen- 
tencein the database. But in many cases, it is 
necessary to imitate more than one translation 
example and combine some fragments of them. 
Let's consider the translation of the following 
sentence. 
(1) He buys a book on international politics. 
If we know the following translation examt)le 
(2) and (3), we can translate sentence (1) into 
sentence (4) by imitating examples and colnbin- 
ing fragments of them. 
(2) He buys a notebook. 
Kate ha nouto wo ka~. 
(3) I read a boo\]~ on international polilics. 
Watt, hi ha kokusaiseiji nit,suite l:akareta 
hon wo yomu. 
(4) Kate ha kokusMseiji nitsuite kM~reta hon 
WO ka~ll. 
It is easy for a human to do this, but not 
so for a machine. The ability to combine some 
fragments of translation examples is essential to 
example-based translation. A lack of this abil- 
ity restricts the power of example-based trans- 
lation. In this paper, we concentrate on the 
implementation of this ability on machine. 
i 24 7 
3 Matching Expression 
To implenrent the ability to combine some frag- 
ments of t.ra.nslation example in order to trans- 
late one sentence, we must determine the fol- 
lowing: 
® how to represent translation examples 
• what is a fragment 
• how to represe.t the combination of flag- 
lnent.s 
3.1 Translation Database 
The translation database is the collection of 
translation examples. A t~anslation example 
consists of three parts: 
• an English word-dependency tree (EWD) 
• a Japanese word-dependency tree (JWD) 
• correspondence links 
For example, in Prolog, 
ewd e(\[el,\[buy,v\], 
\[e2,\[he,pron\]\], 
\[e3, \[notebook,n\], 
\[e4, \[a,det\]\]\]\]) . 
%% He buys a notebook. 
jwd_e( \[jI, \[kau,v\] , 
\[j2, \[ha,p\] , 
\[j3,\[kare,pron\]\]\], 
\[j4, \[wo,p\] , 
\[j5, \[nouto,n\]\]\]\]). 
%% Kare ha nouto wo kau. 
clinks(\[\[el,jl\],\[e2,j3\],\[e3,j5\]\]). 
%% el <-> jl, e2 <-> j3, e3 <-> j5 
Each number with prefix 'e' or 'j' in word- 
dependency trees represents the ID of the sub- 
tree. Each node in a tree contains a word (in 
root form) and its syntactic category. A corre- 
spondence link is represented as a pair of iDs. 
3.2 Translation Unit 
A word-dependency (sub)tree which has a cor- 
respondence link is transhttable; e.g. el, e2, e3, 
jl, j3, j5. A translatable tree in which some 
translatable subtrees are removed is also trans- 
lata.ble; e.g. el-e2, el-e3, el-e2-e3, jl-j3, 
jl-j5, jl-ja-jS. Both of them are tra.nslat- 
M)le fragments. Sadler calls them translation 
w,,its\[Sadler 89a,\]. 
3.3 Matching Expression 
Next we will introduce the concept 'matching 
expression.' Matching expression(ME) is de- 
fined as the following: 
<HE> ::= \[<ID>I<ME-Commands>\] 
<ME-Commands> : : = 
\[\] 
or \[<ME-Command> I <ME-Commands>\] 
<ME-Command> : := 
\[d, < ID>\] 
or \[r,<ID>, <ME>\] 
or \[a,<ID>,<ME>\] 
%% delete <ID> 
%% replace <ID> 
%% with <ME> 
%% add <ME> as a 
%% child of root 
%% node of <ID> 
Every ID in an ME should be translatable. 
We assume the example in Section 3.1 and 
the following example. 
ewd_e( fell, freud,v\] , 
\[el2, \['I ),prOn\]\] , 
\[el3, \[book,n\] , 
\[el4, \[a,det\] \] , 
\[elb, Ion,p\] , 
\[el6, \[politics,n\] , 
felT, \[international, adj\] 
\]\]\]\]1). 
Y,Y, I read a book on international 
%% politics. 
jwd_e(\[jll, \[yomu,v\] , 
\[j12, \[ha,p\] , 
\[j13, \[watashi,pron\] \]\] , 
\[j14, \[wo,p\] , 
\[j15, \[hon,n\] , 
\[j16, \[ta, aux\] , 
\[j17, \[reru,aux\] , 
\[j18, \[kaku,v\] , 
\[j19, \[nitsuite,p\] , 
\[j20, \[kokusaiseij i,n\] 
1\]\]11\]\]\]). 
%% Watashi ha kokusaiseiji nitsui.te 
%% kakareta hon wo yomu. 
clinks(\[ell,\]ll\],\[e12,j13\],\[e13,j15\], 
\[e16,j20\]\]). 
Under this assumption, the word-dependency 
tree (a) can be represented by the matching ex- 
pression (b). 
(a) \[ \[buy,v1 , 
\[\[he,pron\]\] , 
\[\[book,hi , 
\[\[a,det\]\] , 
\[ Ion,p\], 
\[\[politics,n\] , 
\[ \[international,adj\]\]\] \] \] \] 
%% He buys a book on international 
Y,Y, politics. 
(b) \[el,\[r,e3,\[el3\]\]\] 
248 2 
Source WD (SWD) 
# 
Source ME (SME) 
Target ME (TME) 
g 
~.,ompo~itiol~ \] 
Target WD (TWD) 
Figure 1: Flow of Translaton 
The matching expression (b) consists of two 
transla,tion units: el-e3, e13. And it has the 
information to combine them. 
4 Tl'anslation via Matching 
Expression 
Figure 1 shows the flow of the translation pro- 
. cess. The translation process consists of three 
steps: decomposition, transfer, and composi- 
tion. This process generates all candidates 
of translation using Prolog's backtrack mecha- 
nism. 
4.1 Decomposition 
In decomposition, the system decomposes a 
source word-dependency tree(SWD) into trans- 
lation units, and makes a source matching ex- 
pression(SME). For example, 
SWD = \[\[buy,v\], 
\[ \[he,pron\] \] , 
\[ \[book,n\] , 
\[\[a,det\]\] , 
\[\[on,p\], 
\[ \[politics ,n\], 
\[ \[international, adj\] \] \] \] \] \] 
SME = \[el,\[r,ea,\[el3\]\]\] 
The main tasks in this step are to retrieve 
translation units and compare the source WD 
with retrieved translation units. To retrieve 
translation units quickly, we use some hashing 
techniques. There are two program to do the 
comparison task; one for English WDs and one 
for Japanese WDs. In comparison of Japanese 
WDs, the order of subtrees is not inlportant. 
To reduce the search space and the num- 
ber of candidates, we define replaceablity be- 
tween syntactic categories. If two nodes 
are replaceable, system makes only ~ replace- 
command. As a result, the the system does 
not make some matching expressions; e.g. 
\[el, \[d,e3\] , \[a,el, \[e13\]\]\] 
4.2 Transfer 
in the transfer step, the system replaces every 
ID in the source matching expression with its 
corresponding ID. For example, 
SME = \[el,\[r,eS,\[el3\]\]\] 
TME = \[j1,\[r,jS,\[j15\]\]\] 
4.3 Composition 
in the composition step, the system composes 
the target word-dependency tree according to 
the target matching expression. For example. 
TME = \[jl,\[r,j5, \[jlS\]\]\] 
TWD = \[\[kau,v\], 
\[ \[ha,p\] , 
\[ \[kare,pron\] \] \] , 
\[\[wo,p\], 
\[ \[hon ,n\] , 
\[\[ta, aux\], 
\[ \[tern, aux\], 
\[ \[kaku, v\] , 
\[ \[nitsuite,p\] , 
\[ \[kokusaiseiji,n\] \] \] \] \] \] \] \] \] 
~,~. Kate ha kokusaiseiji nitsuite 
~,~, kakareta hon wo kau. 
This step divides into two sub-steps; the 
main composing step and validity checking. In 
the main composing step, there is no ambi- 
guity with one exception. Because an add- 
command \[a,<ID>,<ME>\] specifies only the 
parent node(<ID>) to add the tree(<ME>), there 
are some choices in composing English word- 
dependency trees. In this step, all possibilities 
are generated. 
Validity of the composed word-dependency 
trees are checked using syntactic categories. 
Validity is checked in every parent-children 
unit. For example, in the above target word- 
dependency tree, 
\[v, \[p,p\] \] , \[p, \[prom \], \[p, \[n\] \], 
In, \[aux\] \] .... 
are checked. A unit is valid if there is a 
unit which has the same category pattern in the 
database. A word-dependency tree is valid if all 
parent-children units are valid. 
3 249 
z" 
/ 7t2 L 
1 "/~ 
l I I , 
\ 7/5 71,7 .2 
.. ~ ._.___-- restricted enviornment ~/" '~" \ 
'nll / m2 (= n2) ", mlo 
/ m,8 2 
/ ,- 
d 
Translation Gx~mple Source (or Target) WD 
Figure 2: Restricted Environments of TU 
5 Score of Translation 
To select the best translation out of all can- 
didates generated by system, we introduce the 
score of a tra.nslM.ion. We define it based on the 
score of the matching expression, because the 
matching expression determines the translation 
outi)ut. The scores of.the source matching ex- 
pression and the target matching expression are 
calculated separately. 
5.1 Score of Translation Unit 
First, we will define the score of a translation 
unit. The score of a translation unit should 
reflect the correctness of the translation unit. 
Which translation unit is better? Two main fac- 
t.ors are: 
1. A larger translation unit is better. 
2. A translation unit in a matching expression 
is a fragment of a source (or target) word- 
dependency tree, and also a fragment of a 
translation example. There are two envi- 
ronments of a translation unit; in a source 
(or target) tree and in a translation exam- 
ple. The more similar these two environ- 
meuts are, the better. 
To calculate 1, we define the size of a trans- 
lation unit(TU ). 
size(TU) = the number of nodes in TU 
To calculate 2, we need a measure of simi- 
larity between two environments, i.e. external 
similarity. To estimate external similarity, we 
introduce a unit called restricted environment. 
A restricted environment consists of the nodes 
one link outside of a TU normally. If corre- 
sponding nodes are same in two environments, 
those environments are extended one more link 
outside. Figure 2 illustrates restricted environ- 
ments of a TU. We estimate external similarity 
as the best matching of two restricted environ- 
ments. To find the best matching, we first deter- 
mine the correspondences between nodes in two 
restricted environments. Some nodes have sev- 
eral candidates of correspondence. For example, 
n7 corresponds with rn6 or m7. In this case, 
we select the most similar node. To do this, 
we assume that similarity values between nodes 
(words) are defined as numeric values between 0 
and 1 in a thesaurus. When the best matching 
is found, we can calculate the matching point 
between two environments, mpoint(TU, WD). 
mpoint(TU, WD) = 
summation of similarity values between corre- 
sponding nodes in two restricted environments 
~t the best matching 
We use this value as a measure of similarity 
between two environments. 
Finally, we define the score of a translation 
unit, seore(TU, WD). 
score(TU, WD) = 
size(TU) x (size(Tg) + mpoiut(TU, WD)) 
For example, we assume that the following 
similarity vMues are defined in a thesaurus. 
250 4 
sim(\[book,n\], \[notebook,n\],O.8). 
sire( \[buy,v\] , \[read,v\] ,0.5) . 
sire( \[hon,n\] , \[nouto,n\] ,0.8). 
sim(\[kau,v\],\[yomu,v\],O.5). 
Then i.he scores of translation units in the 
previous section are the followings. 
jl-j5 I\[ '! \[ 0.S 
I J  s___2_JLL 
5.2 Score of Matching Expression 
\]?he score of a nlatching expression is defined as 
the following. 
score.( .'tiE:. It'D) F~YUCME score(TU, WD) sizc(WD) 2 
FOl; exaul ple, 
\[jl, \[r,jS, \[j15\] \] 
5.3 Score of Translation 
Finally, we define the score of a translation as 
the following. 
scur~:(SWD. SME, TME, TWD) = 
~,n i~( seo,'~( S ME. S WD), score( T~I E, TW D ) ) 
For example, the score of the translation in 
• the previous section is 0.6.I2. 
6 Examples 
The English verb eat corresponds to two 
Japanese verbs, tabcrv and okasu. For exam- 
pie. 
(4) The mall eats w.:getabtes. 
Hito ha yasal wo taberu. 
(5) Acid eats metal. 
San ha kinzoku wo oka.qu. 
Figure 3 shows translation outl)uts based on 
example (,t) and (5) by MBT2. MBT2 chooses 
htberu for he cat.s t~ota, toes and okasu for sulfuric 
acid cals iron. 
*** Tr&nslation Source *** 
\[ \[eat, v\] , 
\[ the,pron\] \] , 
\[ \[potato,n\] \] \] 
Y,Y, He eats potatoes. 
*** Translation Results *** 
No. I (Score = 0.5889) 
\[ \[taberu, v\] , 
\[ \[ha,p\], 
\[ \[kare,pron\] \] \] , 
\[\[wo,p\], 
\[ \[jagaimo,n\] \] \] \] 
No. 2 (Score = 0.4S56) 
\[ \[okasu, v\], 
\[ \[ha, p\], 
\[ \[kare,pron\] \] \] , 
\[ \[~o,p\], 
\[\[jagaimo,n\]\] \]\] 
*** Translation Source *** 
\[\[eat,v\] , 
\[\[acid,n\] , 
\[ \[sulfuric,adj\]\] \] , 
\[\[iron,n\]\]\] 
%% Sulfuric acid eats iron. 
*** Translation Results *** 
No. I (Score = 0.5500) 
\[ \[okasu, v\], 
\[ \[ha, p\] , 
\[ \[ryuusan,n\] \] \] , 
\[ \[wo,p\], 
\[\[tetsu,n\] \] \]\] 
No. 2 (Score = 0.4688) 
\[\[taberu, v\] , 
\[ \[ha, p\], 
\[ \[ryuusan,n\]\] \], 
\[\[wo,p\], 
\[\[tetsu,n\]\] \]\] 
Figure 3: Translation Outputs by MBT2 
5 251 
7 Discussion 
Although MBT2 is not a full realization of Na- 
gao's idea., it contains some merits from the orig- 
inal idea. 
1. It is easy to modify the system. 
The knowledge of the system is in the form 
of translation examl)les and thesauri. We 
can modify the system with addition of 
translation examples. 
. It can do high quality translation. 
The system sees as wide a scope as possible 
in a sentence and uses the largest transla- 
tion units. It produces high quality trans- 
lations. 
. It can translate some metaphorical sen- 
tences. 
In the system, semantic information is not 
used as constraints. As a result, the system 
can translate some metaphorical sentences. 
Demerits or problems of the system are: 
1. A great deal of computation is needed. 
2. Can we make good thesauri? 
The first l)roblem is not serious. Parallel compu- 
tation or some heuristics will overcome it. But 
the second problem is serious. We have to study 
how to construct large thesauri. 
acknowlegments 
The authors would like to thank Mort Webster 
for his proof reading. 
8 Conclusion 
This paper describes how to combine some 
translation units in order to translate one sen- 
tence and how to select tile best translation out 
of some candidates generated by system. To 
represent the combination of fragments, we in- 
troduce the representation called matching ex- 
pression. To select the best translation, we de- 
fine the score of translation based on the score 
of the matching expression. 
This framework can be applied to not only the 
translation between word-dependency trees but 
also the translation between other data struc- 
tures. We hope that generation can be imple- 
mented in same framework as the translation 
from a word-dependency tree to a list or string. 

References 

\[Nagao 84\] Makoto Nagao, A Framework of 
a Mechanical Translation between Japanese 
and English by Analogy Principle, in ARTI- 
FICIAL AND tIUMAN INTELMGENCE 
(A. Elithorn and R. Banerji, editors), El- 
sevier Science Publishers, B.V, 198.t. 

\[Sadler 89a\] Victor Sadler, The Bilingual 
Knowledge Bank(BKB), BSO/Research, 
1989. 

\[Sadler 89b\] Victor Sadler, Translating with a 
simulated Bilingual Knowledge Bank{ BKB), 
BSO/Research, 1989. 

\[Sato 89\] Satoshi Sa.to and Makoto Nagao, 
Memory-based Translation, IPSJ-WG, NL- 
70-9, 1989. (in Japanese) 

\[Sumita 88\] E. Sumita and Y. Tsutsumi, A 
Translation Aid System Using Flexible Text 
Retrieval Based on Syntax-Matching, TRL 
Research Report, TR-87-1019, Tokyo Re- 
search Laboratory, IBM, 1988. 
