Finding Structural Correspondences from Bilingual Parsed Corpus 
for Corpus-b sed Translation 
Hideo Watanabe*, Sadao Kurohashi** and Eiji Aramaki** 
* IBM Researdt, Tokyo Research Laboratory 
1623-14 Shimotsuruma, Yamato, 
Kanagawa 242-8502, Japan 
watanabe@trl.ibm.co.jp 
** Graduate School of Inforlnatics, Kyoto University 
Yoshida-homnachi, Sakyo, 
Kyoto 606-8501, .JaI)an 
kuro@i.kyoto-u.ac.jp, 
aramaki@pine.kuee.kyoto-u.ac.jp 
Abstract 
In this paper, we describe a system and meth- 
ods for finding structural correspondences from the 
paired dependency structures of a source sentence 
and its translation in a target . The sys- 
tem we have developed finds word correspondences 
first, then finds phrasal correspon(tences based on 
word correspondences. We have also developed a 
GUI system with which a user can check and cor- 
rect tile correspondences retrieved by the system. 
These structural correspondences will be used as 
raw translation I)atterns in a corpus-based transla- 
tion system. 
1 Introduction 
So far, a number of methodologies and systelns 
for machine trauslation using large corpora exist. 
They include example-based at)proaches \[7, 8, 9, 
12\], pattern-based approaches \[10, 11, 14\], and sta- 
tistical approaches. For instance, example-based 
approaches use a large set of translation patterns 
each of which is a pair of parsed structures of a 
source- fragment and its target- 
translation fragment. Figure 1 shows an exanl- 
ple of translation by an example-based method, ill 
which translation patterns (pl) and (p2) are se- 
lected as similar to a (left hand) Japanese depen- 
dency structure, and an (right hand) English de- 
pendency structure is constructed by merging the 
target parts of these translation patterns 1. 
In this kind of system, it is very important to 
collect a large set of translatiou patterns easily and 
efficiently. Previous systems, however, collect such 
translation patterns mostly manually. Therefore, 
they have problems in terms of the development 
cost. 
1Words in parenthesis at the nodes of the Japanese de- 
pendency structure are representative English translations, 
and are for explanation. 
This paper tries to provide solutions for this is- 
sue by proposing methods for finding structural 
correspondences of parsed trees of a translation 
pair. These structural correspondences are used as 
bases of translation patterns in corpus-based ap- 
proaches. 
Figure 2 shows an example of extracting struc- 
tural correspondences. In this figure, tile left tree 
is a Japanese dependency tree, the right tree is a 
dependency tree of its English translation, dotted 
arrows represent word correspondence, and a pair 
of boxes connected by a solid line represent phrasal 
correspondence. We would like to extract these 
 ,ook \ 
"4"-..~ ...," ~.. a movie ~ 
Figure 2: An Example of Finding Structural Cor- 
respoudences 
word and phrasal correspondeuces automatically. 
In what follows, we will describe details of proce- 
dures for finding these structural correspondences. 
2 Finding Structural Correspondences 
This sectiou describes methods for finding struc- 
tural correspondences for a paired parsed trees. 
2.1 Data Structure 
Before going into the details of finding structural 
correspondences, we describe the data format of a 
906 
verb -- 
9a 
noun--noun , 
n0mu--drink 
ll0un -- n0un 
verb ! 
i ,, 
t. 
"', dl lk 
 he  .__medicine 
l 
1 ~ # 
\[.-- 
I 
(p2) 
Figure 1: Translation Example by Examt)le-based ~li'anslation 
dependency structure. A det)endeney stru('ture as 
used in this pat)er is a tree consisting of nodes and 
links (or m:cs), wh('.re a node represents a content 
word, while a link rel)resents a fllnctional word or 
a relation between content words. For instance, as 
shown in Figure 2, a t)reposition "at;" is represented 
as a link in l~,nglish. 
2.2 Finding Word Correspondences 
The tirst task for finding stru('tm:al corresI)On- 
den(:c's is to lind word (:orro, sl)ondenccs t)et;ween (;he 
nodes of a sour(:e parsed tree and the nodes of a 
t;wget parsed tree. 
Word correspondences are tkmn(1 by eonsull;ing a 
source-to-target translation dictionary. Most words 
can find a unique 1;ranslation candidate in a target 
tree, but there are cases such that there are many 
translation candidates in a target parsed tree for 
a source word. Theretbre, the main task of tind- 
ing word correspondences is to determine the most 
plausible l;ranslation word mnong can(tidates. We 
call a pair of a source word and its translation 
candidate word in a target tree a word correspon- 
dence candidate denoted by WC(s,/,), where s is a 
source word and t is a target word. If 17\[TC(s,/,) ix a 
word correspondence candida.te such that there is 
rto other WC originating h'om s, then it is called 
WA word correspondence. 
The basic idea to select the most plausil)le word 
correspondence candidate ix to select a candidate 
which is near to another word correspondence whose 
source is also near to a sour(:e word in question. 
Suppose a source word s has multiple candidate 
translation target words t~ (i = 1,...,7~,), that is, 
there are multiple 17FCs originating h'om .s'. We, 
denote these multiple word corresl)ondence candi- 
dates by WC(s, tl). For each I'VC of s, this proce- 
dure finds the neighbor WA correspondence whose 
distance to WC ix below a threshold. The distance 
between WC(sl,/,~) and WA(s.2,/,2) is defined as 
the distance between sl and .s2 plus the distmme 
between s2 and 1,2 where a distance between two 
nodes is defined as the number of nodes in the t)ath 
whoso, ends are the two nodes. Among I~VCs of 
.s for which neighbor H/A ix tound, the one with 
the smallest (listan(:(~ is chosen as the word corre- 
Sl)ondenee of s, and I/VCs whMl are not chosen 
are invalidated (or deleted). We call a word corre- 
spondence found t)y this procedure WX. We use 
3 as t;he distance threshold of the above procedure 
currently. This procedure ix applied to all source 
nodes which have multii)le WCs. Figure 3 shows 
an example of WX word correspondence. In this 
examt)le, since the Japanese word "ki" has two En- 
glish l;ranslation word candidates "time" and "pe- 
riod," there are two WCs (~7C 1 and WC2). The 
direct parent node "ymlryo" of "ki" has a WA cor- 
respondence (I/VA1) to "concern," and the direct 
child node "ikou" has also a WA correspondc'nee 
(WA2) to "transition." In this ease, since the dis- 
tance between I'VC2 and WA2 is smaller than the 
distan(:e between I.VC1 and WA1, I'VC~ in clmnged 
to a 1/l/X, and I~ITC1 is adandoned. 
In addition to WX correspondences, we consider 
a special case such that given a word correspon- 
dence l'lZ(s,/,), if s has only one child node which ix 
907 
.., ........ be ....... ",,, 
.. -~%omp t 
...WAI at// concern 
." time 
yuuryo ,.." 
(concern) -" 
ni ..,Wc1 same 
.... accompany 
ki..,*\[__ 
(time) .............. W_..G2 ............ '"-- period 
ikou ......... VVA2 of 
transition (transition) 
Figure 3: An Exmnt)le of WX Word Correst)on- 
(lence 
a leaf and t has also only one child node which ix a 
leaf, th(;n we COllStrllet a lleW word correspondence 
called 1US from these two leaf nodes. This WE 
procedure is al)plied to all word correspondences. 
Note tlmt this word correst)ondence is not to se.le, ct 
one of candidates, rather it is a new finding of word 
corre, spondence by utilizing a special structm:e. For 
instance, in Figure 3, if there is a word eorrespol> 
dence 1)etween "ki" and "period" and there is no 
word correst)ondence between "ikou" and "transi- 
tion," then I<V,g(iko'u~ transition) will be found 1)3' 
this 1)roeedure. 
These WX and WS t)rocedures are continuously 
al)plied until no new word correspondences arc t'(mnd. 
Aft;er al)l)lying the above WX and I'VS pro(:e- 
dures, there are some target words t such that t is a 
destination of a l,l/C(.s ", t) and there ix no other 1,176 , 
whose destination ix t:. In this case, the lUG(s,t) 
correspondence candidate is chosen as a valid word 
correspondence between s and/,, and it; is called a 
HzZ word eorrest)ondence. 
We call a source node or a target node of a word 
correspondence an anchor node in what tbllows. 
The above t)rocedures for finding word corre- 
sI)ondences are summarized as follows: 
Find WCs by consulting translation dictionary; 
Find WAs; 
while (true) { 
find WXs; 
find WSs; 
if no new word corresp, is found, then break; 
} 
find WZs; 
2.3 Finding Phrasal Col'resl)ondences 
The next step is to tind phrasal correspondences 
based on word eorl'eSl)ondences t'(mnd t) 3, 1)roce.- 
dures described in tim previous section. What we 
would like to retrieve here, is a set of phrasal cor- 
respondences which (:overs all elements of a paired 
dependency trees. 
In what follows, we (:all a portion of a tree which 
consists of nodes in a 1)att~ from a node ?t I (;o all- 
oth(;r node nu which is a descen(lanl; of n:l a lin-. 
ear tree denoted by LT(v,1, n~), and we denote a 
minimal sul)tree including st)coiffed nodes hi, ..., n.~, 
l)y T(nl,...,n,). For instan(:(,~ in the English tree 
structure (the right tree) in Figure 4, LT(tcch, nology , 
science) is a rectangular area covering %eclmol- 
• tg "e e ~ ogy," and SOl ,no ,, anti .T(J'acl;or, cou'ntrjl ) is a 
1)olygonal area covering "factor,""atDcl,, .... t)ol- 
icy," and "country." 
The tirst step is to find a 1)air of word correst)on- 
dences W, (.~'~, t,) and ~4q(.,.~, t,~) such that .,, a.,t 
s2 constructs a linear tree LT(si, s2) and there is no 
anchor node in th(' 1)al;h from s~ to s2 other than .s'~ 
and .s2, where 1UI and H~ denote any tyi)e of word 
('orrest)on(lences 2 and we assmne there is a word 
corresI)ondence t)etwee, n roots of source and (;arget 
trees by defmflt. We construct a t)hrasal correspon- 
dence fi'om source nodes in LT(s,,s2) and target 
l/o(les itl r\]'(t:l,/'2), (l()llote(t by \];'(l~,~F'(.q'l, .";2), 5\].n(tl, t2)). 
For illstall('e~ ill Figllre 41~ \]"11~ \]~12~ 1)'2~ 1)3 and 
\])4 tu.'e source portions of phrasal et)rrespondences 
found in this step. 
The next stel) checks, for ea(:h 1', if all anchor 
llo(les of wor(1 eorres1)Oll(leile(?s wllose SOUlT(;e o1 ~;al- 
get node is included in P are al,eo included in P. 
If a t)hrasal correst)ondenee satisiies this condition, 
then it is called closed, otherwise it ix called open. 
Further, nodes which are not included in the I ) in 
question are called open nodes. If a l ) ix ot)en, then 
it ix merged with other 1)hrasal correspondences 
having ol)en nodes of P so that the merged 1)hrasal 
correspondence becomes (-losed. 
Next, each P~,, is checked if there is another l)q 
which shares any nodes ottmr than anchor nodes 
with P.,,. If this is the case, these P:., and 1~ are 
lnerged into one phrasal correspondence. In Figure 
4, t)hrasal correspondences i 11 and P12 are merged 
into P1, since their source I)ortions LT (haikei, koku) 
and LT (haikci, seisaku) share "doukou" which is 
not an anchor node. 
Finally, any path whose nodes other than the 
root are not included in any 1)s but the root node 
ix included in a 1 ) is searched for. This procedure 
2Since WC is not a word correspondence (it is a candi- 
date, of word corresi)ondence), it is llOi; conside, red here. 
908 
is apl)lied I;o 1)oth source a.nd (;arget trees. A im.th 
found 1)y this 1)ro(:(xlur(~ is called an open pal, h,, m~(t 
its root no(le is called a pivot. If such an Ol)en path 
is found, it is t)rocessed as follows: l, br each 1)ivot 
node, (a) if the t)ivot is not an mmhor nod(;, then 
open lmths originating fl:om the pivot is merged 
into a 1 ) having I;he pivot, (b) if the pivot is an 
~LIlChOf llo(lo~ {;hOll 3_ llOW t)hl'~lS~L1 c()rFos1)oII(|(~IlC( ~, iS 
created from Ol)(m 1)ai;hs originating from the m> 
thor nodes of the word (:orrcsl)on(l(:ncc. 
In Figure 4, w(: get tinally four phrasal (:orr(:- 
Sl)on(lences l~, f~, l~, an(l l~t. 
! haikei.!,,: I - ................... {-~ factor',,,l ', 
i /~0 :: a ect " 
,, ',,(tre, nd)if\ ~ ~-" li; k .'/ 
;~oy, v __: 
~, koku ( seisak~'~{t ----.~- --~ 
~' (C0UrlttV)_l , (p0%,~ :t' .. technology .lrltly ~< :::>i--- 
-::-:~ 
= 7 :: TLI io. ; I 
(major) ~,_/_ 
- X-~ / 
' giutu"\]\] (technolo~ly)l' 
.¢ science 
kagaku " 
(scie, nce) .- 
P4 
/ t 
t/ ff 
i 
Figm:e d: An l~;xaml)le of Finding Phrasal Corr(> 
S\])Olld(~,IIC(',S 
The above 1)ro(:edures fl)r finding l)hrasal (:orr(> 
Ht)oIIdoIICOS ~-LF(~ SlllIllIl¢~riz(Kl gtS follows: 
Find initial Ps; 
Mea'ge a.n Ol)Cn 1>~ with other i's having 
open nodes of 1}; 
Create new Ps 1)y merging \])s 
which have more tlmn 2 (:ommon nodes; 
Find ot)en path, alld 
if the t)ivot is ml mmhor, |;hen 
merge the path to P having the anchor, 
otherwise create new l ) by merging 
all open t)ai,hs having l;lm pivot; 
3 Experiments 
3.1 C, orlms and Dictionary 
We used (l()(;lllil(~'ll|;s t'rolil White Papers on S(:i- 
en(-e and Technology (1.994 to \]996) pul)lished by 
the S(:ience mid Technology Agency (STA) of tim 
.\]al)mmse govcrlim(~nl;. STA lmblished th(;se White 
PaI)ers in both Jat)mmse and English. The Com- 
mmfications l{esea.rch Laboratory of" the Ministry 
of Posts and Telecommuni(:a.tion of the .\]al)mmse 
goverlmmnt supl)lied us with the l)ilingual corpus 
wtfich is already roughly aligned. We made a bilin- 
gual cortms consisting of pa.rs(;d dependency struc- 
tures by using the KNP\[2\] .\]al)mmso, 1)arser ((l(wel- 
Ol)ed by Kyoto (hfive)sity) for .Jal)anes(~ sentences 
and the ESG\[5\] English 1)arser (developed by IBM 
Watson i{e, sear(:h Center) for English s(~nl;(!nces. 
We mad(} al)oul; 500 senl;(m(:e l)airs, each of whi(:h 
11~1,'4 ;I, OIlC-I;O-OII(', 80,11|;(',11(;0 (-orresl)onden(:(~,, fl'OI\[l (,\]lO 
raw (t~tta of l;he, White l)al)crs, mid s(',l(;(;i;(xl rm> 
domly aboul; 130 s('aH;en(:c pairs for (',Xl)(Mm(;nts. 
ilow(wer, since a 1)nrser does not always \])ro(hwe 
(;orl'c(;\[; 1);~l"s(t t;re(}s~ wo (~x(:lude(1 some, ,~(~ii|;(Hic(~ p;Lil's 
wlfich have severe 1)arse errors, and tinally got i\[15 
S(~,II\[;OIlC(; pairs as a, to, st s(%. 
As a trm~slation wor(1 dictionary/)etw(',(m .l at)ml(',s(; 
and English, we, tirsl; used ,l-to-l~; trmlslati()n (li(:- 
l,ionary which has mot(,' tlmn 100,000 (,ifl;l'i(;~, but 
we, fi)un(l l;}l~/{; l;ller(? are som(~ word ('orr(~sl)Oll(l(~,llt;(~s 
not (:()v(ued in this di(:ti()nary. Tlmref()rG we merged 
(retries fi:om \]';-t;o-.I translatioll dictionary in order 
to get; much broad (:ov(wag(,'. The l;oDd nulnl)(}r ()f 
entries a.re now more I;ha.n \[50,000. 
3.2 Experinmntal Results 
Td)le i shows l;he result of (~Xl)c, rimeni; fl)r tind- 
ing word correspond(nm(~s. A row with ALL in th(', 
l:yl)e cohmm shows Llle total ~CClll'~lcy of WOI'(1 cor- 
r(Lqpolld('31c(~s and ol;\]l{~r rows sh()\v Llle .~iCClll'ktcy of 
each t, yt)e. It is clear that WA (:orr(~sl)Olld(~ll(;(',s 
have a very high a('cura(:y. Other word (:orresl)On-- 
do, nc(,,s also ha.ve a roJatively high ac(:ura(:y. 
Table 2 shows tim remflt of exl)erimenl,s for find~ 
ing 1)hrasal correspondences. The row with ALL in 
I;he l;yt)c cohlmn shows l;he l;ol;al accuracy of phrasal 
(:ol'r(~sl)ondo, n(:(~s found by the 1)rol)osed 1)rocedure. 
This ac(:macy level is not I)romising and it is not; 
useful for later 1)ro(:e, sses since it needs human (:he(:k- 
ing ml(l (:orrec£ion. Therefore, we sul)categoriz(~ 
each phrasal corl'eSpond(m('es, and check l;he a('- 
(:uracy for each subca.tegory. 
We consider the following sut)catcgories for 1)hrasal 
('x)rl'(}Sl)olidell(-(~s: 
• MIN ... The minimal t)hrasal correst)ondence, 
that is, I'(1Zl'(.s'l, .s2), LT(tl, t2)) such that (;herc 
909 
type 
ALL 
WA 
WX 
WS 
WZ 
1111111. nunl. of SUCCESS of 
correct ratio found 
corresp. (%) corresp. 
771 745 96.63 
612 600 98.03 
131 118 90.07 
13 12 92.3 
15 15 100 
Table h Experimental Result of Word Correspon- 
dences 
are word correspondences W(s1, tl) and W 
(s2,t2), s2 is a direct child of St and t2 is a 
direct child of tl. 
• LTX ... P(LT(.s'I,S2),LT(tl,t2)) such that 
all nodes other titan s2 and t2 have only one 
child node. 
• LTY ... P(LT(sl,.S2), LT(tl, t2)) such that 
all nodes other than Sl, s2,1':1 and t.2 have only 
one child node. 
LTX is a special case of LTY, since Sl and tl of 
LTX must have only one child node, on the other 
hand, ones of LTY may have more than two child 
nodes. A subcategory test tbr a phrasal correspon- 
dence is done in the above order. Exmnples of these 
subcategories are shown in Fig 5. 
Tlm result of these subcategories are also shown 
in Table 2. Subcategories MIN and LTX have very 
high accuracy and this result is very promising, 
since we can avoid nmnual checking for ttmse phrasal 
correst)ondences , or we would check only these types 
of t)hrasal correspondences mmmally and discard 
other types. 
As stated earlier, since we removed only sen- 
tences with severe parsing errors from the test set, 
please note that the above mtmbers of experimental 
results are calculated for a bilingual parsed corpus 
including parsing errors. 
4 Discussion 
There have been some studies on structural align- 
Inent of bilingual texts such as \[1, 4, 13, 3, 6\]. Our 
work is similar to these previous studies at the con- 
ceptual level, but different in some aspects. \[1\] 
reported a method for extracting translation tem- 
plates by CKY parsing of bilingual sentences. This 
work is to get phrase-structure level phrasal cor- 
respondences, but our work is to get dependency- 
structure level phrasal correspondences. \[4\] pro- 
posed a method for extracting structural matclfing 
(pairs of dependency trees) by calculating matching 
similarities of two dependency structures. Their 
work focuses on tile parsing ambiguity resolution 
by calculating structural matching. Further, \[3, 6\] 
proposed structural alignnmnt of dependency struc- 
tures. Their work assuined tha.t least common an- 
cestors of each fragment of a structural correspon- 
dence are preserved, but our work does not have 
such structural restriction. \[13\] is different to oth- 
ers in that it tries to find phrasal correspondences 
by comt)aring a MT result and its manual correc- 
tion. 
In addition to these differences, the main differ- 
ence is to find classes (or categories) of phrasal cor- 
respondences which have high accuracy. In general, 
since bilingual structural alignment is very compli- 
cated and difficult task, it; is very hard to get more 
than 90% accuracy in total. If we get only such 
an accuracy rate, the result is not useful, since we 
need manual clmcks tbr the all correspondences re- 
trieved. But, if we can get some classes of phrasal 
correspondence with, for instance, more than 90% 
accuracy rate, then we can reduce manual clmck- 
ing for phrasal correspondences in such classes, and 
this reduces the development cost of translation 
patterns used in later corpus-based translation pro- 
tess. As shown in the previous section, we could 
find ttmt all (:lasses of word correspondences and 
two subclasses of phrasal correspondences are more 
than 90% accurate. 
When actually using this automatically retrieved 
structural correspondence data, we must consider 
how to manually correct the incomplete parts and 
how to reuse mamlal correction data if the parser 
results are ctmnged. 
As for the tbrlner issue, we need an easy-to-use 
tool to modify correspondences to reduce the cost 
of mmmal operation. We have developed a GUI 
tool as shown in Figure 6. In this figure, the bot- 
tom half presents a pair of source and target depen- 
dency structures with word correspondences (solid 
lines) and phrasal correspondences (sequences of 
slmded circles). You can easily correct correspon- 
dences by looking at this graplfical presentation. 
As for tlm latter issue, we must develop meth- 
ods for reusing the manual correction data as much 
as possible even if tim parser outputs are changed. 
We have developed a tool for attaching phrasal 
correspondences by using existing phrasal corm- 
spondence data. This is implemented as follows: 
Each phrasal correspondence is assigned a signa- 
ture which is a pair of source and target, sentences, 
each of which tins bracketed segments which are in- 
cluded in the phrasal correspondence. For instance, 
910 
I 
tmihatu 
((~uebiomeR) 
,~10 
gijutu -,,- 
(tedr, do£.t¢) 
I 
--,<Jeveloprned 
0f 
--,'.-tect'lrlOlogy 
ILl Zl.l~l.l ,~ 
\[<dime} 
go 
seityou 
(i, otldl} 
I 
ya t~.a d 0ute ki 
...... ,.corlti nue 
*o 
-- shob'o 
-. ~obj 
" "k 9 Kl t;dh 
/ ' / \ 
economic 
f 
kaga t,lJ 
"~. 
% 
unparallelled 
. £tij d u - \[--~e c;h n 010 £tl 
~<~o~l I "-)/. ', ,,o  ol oy-. x ,,0,,, 
ka nte n 
{a~laedl 
korera .,- 
science. 
# g 
z ee 
(a) MIN (b) LTX (c) LTY 
p Urp ose 
sPec I 
-,qhis 
Figure 5: Examples of Categories of Phrasal Correst)ondences 
A: 
5115511. of 
type found 
COl;l'esi). 
ALL 678 
MIN 223 
LTX 17 
LTY 27 
B: 
I515151. of 
(:orrect 
co5-5"(~Sl). 
431 
215 
(~: 
SllC(;(~SS 
ratio ~/A (%) 
63.56 
96A1 
D: 
nunL of nodes 
covered t)y A 
7248 
1234 
E: 
nunl. of nodes 
covered by B 
4278 
1194 
F: 
SllCCeSS 
ratio 
E/D (%) 
59.02 
96.76 
17 100 153 153 100 
20 I 74.07 253 191 75A9 I 
Tal)le 2: lgxperinmntal Fh',sult ot' Phrasal Correst)on(len :es 
the following signature is made h)r a i)hrasal corre- 
Sl)on(lence (c) in Figure 5: 
(.~i:j) 
... \[korer~ no kanten karmlo\] kagaku \[gi- 
jutu\] ... 
... science and \[technology fl:om this 
lmrl)ose\] ... (/.~io) 
In the above e, xample, segments betwee, n '\[' and '\]' 
represent a phrasal correspondence. 
If new parsed dqmndency structures for a sen- 
tence pair is given, for each phrasal correspondence 
signature of the sentence pair, nodes in the struc- 
tures wtfich are inside 1)rackets of the signature are 
marked, mid if there is a minimal sul)tree consist- 
ing of only marked nodes, then a phrasal corre- 
Sl)ondence is reconstructed from the phrasal corre- 
spondence signature. By using this tool, we can 
efficiently reuse the manual efforts as much as pos- 
sible even if parsers are updated. 
5 Conclusion 
Ill this I)al)er, we have t)rol)osed methods for 
finding structural correspondences (word correst)on- 
dences and i)hrasal corr(;spondences) of bilingual 
parsed corpus. Further, we showed that the t)reci- 
sion of word correst)ond(mces and some catc'gories 
of t)hrasal corresl)ondences found 1)y our methods 
are highly accurate, and these correst)ondences can 
reduce the cost of trm~slation pattern accumula- 
tion. 
In addition to these results, we showed a GUI 
tool for mmmal correction and a tool for reusing 
previous correspondence data. 
As fld;ure directions, we will find more subclasses 
with high accuracy to reduce the cost for transla- 
tion pattern preparation. 
We believe that these methods and tools can ac- 
celerate the collection of a large set of translation 
patterns and the developlnent of a corlms-based 
translation system. 
911 
~rel id="28" type="P4" src="3.4,9,10,11,12.13" tgt="1,2.3,4,8,9,12" eval="T~R UE" score="O" geoeratlon='' subtype="orff' org=" con'lment='"'= 
~rel id="29" type="P5" src="1,2,3" tg~"l 0,11,12" BvaI="TRUE" score="0" generation="" subtype="org" org=" comment:="'> 
~rel ld="3O" type="P5" src="5.6.7" \[g1="5.6,7" eva~"TRUE" score="0" generation=" sublype="org" org="" cornmen~''~ 
<tel id="31" type="P5" src="7.8,9" tg~"7.F' evaI="TRUE" ecore="O" generation=" subtype="org" erg=" cerumen .t=""~. 
'~rel id="32" type="P5" src="3,4.9.10,11.12.13" tgt="1,2,3,4.e,g,12" evaI="TRUE" score="0" generation="' subtype="org" org='' comment=-"'-'. 
! 
\ 
L 
4\ 
h6 aclen;~aad tactiilC:;Id6i' g,0tiCid;~ ;f rn~\[,~r.c~un~les 
.. . .... , .... • ~-. 
,. ¸ L:i :: ? : i % 
Figure 6: An GUI tool for presenting/manipulating structural correspondences 

References 

\[1\] Kaji, H., Kids, Y., and Morimoto, Y., "Learning Trans- 
lation Templates from Bilingual Texts," Proc. of Coling 
92, pp. 672-678, I992. 

\[2\] Kurohashi, S., and Nagao, M., "A Syntactic Analy~ 
sis Method of Long Japanese Sentences based on the 
Detection of Conjunctive Structures," Computational 
Linguisties~ Voh 20, No. 4, 1994. 

\[3\] Grishman, R., "Iterative Alignment of Syntactic Struc- 
tures for a Bilingual Corpus," Proe. of 2nd Workshop 
for Very Large Corpora, pp. 57-68, 1994. 

\[4\] Matsumoto, Y., Ishimoto, H., and Utsuro, T., "Struc- 
tural Matching of Parallel Texts," Proc. of the 31st of 
ACL, pp. 23-30, 1993. 

\[5\] MeCord, C. M., "Slot Grammars," Computational Lin- 
guistics, Voh 6, pp. 31-43, 1980. 

\[6\] Meyers, A., Yanharber, R., and Grishman, R., "Align- 
ment of Shared Forests for Bilingual Corpora," Proc. of 
the 16th of COLING, pp. 460-465, June 1996. 

\[7\] Nagao, M., "A Framework of a Mechanical Translation 
between Japanese and English by Analogy Principle," 
Elithorn, A. and Banerji, R. (eds.) : Artificial and Hu- 
man Intelligence , NATO 1984. 

\[8\] Sato, S., and Nagao, M. "Toward Memory-based Trans- 
lation," Proc. of 13th COLING, August 1990. 

\[19\] Sumita, E., Iida, II., and Kohyama, H. "'Translating 
with Examples: A New Approach to Machine 3Yanslao 
tion," Proc. of" Info Japan 90, 1990. 

\[10\] Takeda, K., "Pattern-Based Context-Free Grammars 
for Machine ~l~anslation," Proc. of 34th ACL, pp. 144-- 
15I, June 1996. 

\[11\] Takeda, K., "Pattern-Based Machine ~lYanslation," 
Proc. of 16th COLING, Vol. 2, pp. 1155-1158, August 
1996. 

\[12\] Watanabe, H. "A Similarity-Driven Transfer System," 
Proc. of the 14th COLING, Vol. 2, pp. 770.-776, 1992. 

\[13\] ~Vatanabe, H. "A Method for Extracting ~IYanslation 
Patterns from ~lS'anstation Examples," Proc. of the 5th 
Int. Conf. on Theoretical and Methodological Issues in 
Machine Translation, pp. 292-301, 1993. 

\[14\] Watanabe, H., and Takeda, K., "A Pattern-based Ma.- 
chine Translation System Extended by Example-based 
Processing," Proc. of the 36th ACL & 17th COLING, 
Vol. 2, pp. 1369o1373, 1998. 
