Bilingual Text, Matching using Bilingual Dictionary and Statistics 
Takehito Utsuro t IIiroshi Ikeda ~ Masaya Yamane* Yuji Matsumoto t Makoto Nagao ~ 
tGraduate School of Information Science *Dept. of Electrical Engineering 
Nara Institute of Science and Technology Kyoto University 
Abstract 
This paper describes a unified framework for bilingnal 
text matching by combining existing hand-written 
bilingual dictionaries and statistical techniques. The 
process of bilingual text matching consists of two ma- 
jor steps: sentence alignment and structural matching 
of bilingual sentences. Statistical techniques are apt 
plied to estimate word correspondences not included 
in bilingual dictionaries. Estimated word correspon- 
dences are useful for improving both sentence align- 
ment and structural matching. 
1 Introduction 
Bilingnal (or parallel) texts are useful as resources of 
linguistic knowledge as well as in applications such as 
machine translation. 
One of the major approaches to analyzing bilin- 
gual texts is the statistical approach. The statistical 
approach involves the following: alignment of bilin- 
gual texts at the sentence level nsing statistical tech- 
niques (e.g. Brown, Lai and Mercer (1991), Gale and 
Church (1993), Chen (1993), and Kay and RSscheisen 
(1993)), statistical machine translation models (e.g. 
Brown, Cooke, Pietra, Pietra et al. (1990)), finding 
character-level / word-level / phrase-level correspon- 
dences from bilingual texts (e.g. Gale and Church 
(1991), Church (1993), and Kupiec (1993)), and word 
sense disambiguation for MT (e.g. Dagan, Itai and 
Schwall (1991)). In general, the statistical approach 
does not use existing hand-written bilingual dictio- 
naries, and depends solely upon statistics. For ex- 
ample, sentence alignment of bilingual texts are per- 
formed just by measuring sentence lengths in words 
or in characters (Brown et al., 1991; Gale and Church, 
1993), or by statistically estimating word level corre- 
spondences (Chen, 1993; Kay and RSscheisen, 1993). 
The statistical approach analyzes unstructured sen- 
tences in bilingual texts, and it is claimed that the 
results are useful enough in real applications such as 
machine translation and word sense disambiguation. 
However, structured bilingual sentences are undoubt- 
edly more informative and important for future natu- 
ral language researches. Structured bilingual or mul- 
tilingual corpora serve as richer sonrces for extract- 
ing linguistic knowledge (Klavans and Tzonkermann, 
1990; Sadler and Vendelmans, 1990; Kaji, Kida attd 
Morimoto, 1992; Utsuro, Matsnmoto and Nagao, 
1992; Matsumoto, l.shimoto and Utsuro, 1993; Ut- 
suro, Matsumoto and Nagao, 1993). Compared with 
the statistical approach, those works are quite differ- 
ent in that they use word correspondence information 
available in hand-written bilingual dictionaries and 
try to extract structured linguistic knowledge such 
as structured translation patterns and case frames of 
verbs. For example, in Matsunloto et al. (1993), we 
proposed a method for finding structural matching 
of parallel sentences, making use of word level simi- 
larities calculated from a bilingual dictionary and a 
thesaurus. Then, those structurally matched parallel 
sentences are used as a source for acquiring lexical 
knowledge snch as verbal case frames (Utsuro et al., 
1992; Utsuro et al., 1993). 
With the aim of acquiring those structnred linguis- 
tic knowledge, this paper describes a unilied frame- 
work for bilingual text matching by combining exist- 
ing hand-written bilingual dictionaries and statistical 
techniques. The process of bilingual text matchin 9 
consists of two major steps: sentence alignment and 
structural matching of bilingual sentences. In those 
two steps, we use word correspondence information, 
which is available in hand-written bilingual dictionar- 
ies, or not included in bilingual dictionaries but esti- 
mated with statistical techniques. 
The reasons why we take the approach of combin- 
ing bilingual dictionaries and statistics are as follows: 
Statistical techniques are limited since 1) they re- 
quire bilingnal texts to be long enough for extract- 
ing usefifl statistics, while we need to acquire struc- 
tured liugnistic knowledge even from bilingual texts 
of about 100 sentences, 2) even with bilingual texts 
long enough for statistical techniques, useful statistics 
can not be extracted for low frequency words. For the 
reasons 1) and 2), the use of bilingual dictionaries is 
inevitable in our application. On the other hand, ex- 
isting hand-written bilingual dictionaries are limited 
in that available dictionaries are only for daily wm'ds 
and usually domain specific on-line bilingual dictio- 
naries are not available. Thus, statistical techniques 
are also inevitable for extracting domain specific word 
correspondence information uot included in existing 
bilingual dictionarie'~. 
At present, we are at tile starting point of com- 
bining existing bilingual dictionaries and statistical 
techniques. '\['herefore, as statistical techniques tbr es- 
timating word correspondences not included in bilin- 
gual dictionaries, we decided to adopt techniques a.s 
simple as possible, rather than techniques based-on 
complex probabilistic translation models such as in 
1076 
statistical = = 
UJaPanese ........ text ~ _e.st_im.{,t!,)_n o f English iext _J~l 
parso parse * Granlllll|r "Dictionary 
' 1 f \] penc!ency st rueture _j ( dependency structure J 
~..,l~ Word Sinlilarity 
( .... /~\[~l ~" ~ " bilingual dictionary X.~matcllmg~../ + thesaurus 
~./e@~llN,,,.~ -statistics 
\[-.,,,p-,,c~e 1 .. "Y'."P':'"-.. I English 
Lexical Knowledge, Translation Patterns 
Fig. t: The l,'ramework of l}itingual Text Matching 
Browu ell al. (1990), Brown, Pietra, Pietra slid Mer 
c, er (1993), and Chen (1993). What we adopt are sim- 
ple co-occurrence-frequency-based techlfiques in Gale 
and Churcl, (1991) aaM Kay and lfiSscheisen (1993). 
As techniques for sentence ;flignment, we. adopt also 
quite a simple method based-.on the number of word 
correspondence.% without ~tny probabilistic transla- 
tion models. 
in the following sections, we illustrate the specifi- 
cations of our bilingual text nlatehing fl'alnework. 
2 The Framework of Bilingual 
Text Matching 
The overall framework of bilingual text inatching is 
depicted in l:ig. 1. Although our framework is in> 
plemented for Japanese and l~;nglish, it is bmguage 
il,dependent. 
First, bilingual texts are aligned at sentence level 
using word correspondence inforin~tiol, which is avail- 
able in bilingual dictionaries or estimated by statis. 
tic~l techniques. "Statistical estim~ttioa" at text; level 
indicates that lengtl>based statistical techniques arc 
applied if necessary. (At present, they are not im- 
plemented.) "Stati,stical c.stimation" ~tt sentence level 
indicates that word-to.-word correspondences ~re e> 
timated by statistic~d techniques. Then, eueh 1110110 
lingual sentence is parsed into a disjunctive depen- 
dency structure ~Hld structurally matched using word 
correspondence information. In the course of struc. 
tural matching, lexical and synt~Lctic tunbiguities of 
monolinguM sentences are resolved. FinMly, from the 
matching results, monolinguM lexical knowledge and 
translation patterns m'e acquired. 
So fitr, we have implemented the following,: sen- 
tence ~dignment btLsed-on word correspondence in- 
formation, word correspondence estimation by co- 
occnl'rence-ffequency-based methods in GMe mid 
Church (19.~H) and Kay and R6scheisen (1993), struc- 
tured Imttehlng of parallel sentences (Matsumoto et 
a l., 1993), and case Dame acquisition of Japanese 
verbs (Utsuro et al., 1993). In the remainder of 
this paper, we describe the specifications of sentence 
aliglmlent: and word correspondence estimation in 
sections 3 and 4, then report the results of small ex- 
periments and evMuate our framework in section 5. 
I077 
3 Sentence Alignment 
3.1 Bilingual Sentence Alignment 
Problem 
Ill this section, we formally define the problem of 
bilingual sentence Mignment) 
Let S be a text of n sentences of a language, and 
T be a text of m sentences of another language and 
suppose that S and T are translation of each other: 
--__ Sl,S2~...~S n 
T = tl,t2,...,tm 
Let p be a pair of minimal corresponding segments 
in texts S and T. Suppose that p consists of x sen- 
tences sa-~+l, • • •, Sa in S and y sentences t~l,. • •, tb 
in T and is denoted by the following: 
p = {a,x;b,y) 
Note that x and y could be 0. In this paper, we 
call the pair of minimM corresponding segments in 
bilingual texts a sentence bead. 2 Then, sentences in 
bilingual texts of S and T are aligned into a sequence 
P of sentence beads: 
P = Pl,P~,...,Pk 
We put some restriction on possibilities of sentence 
alignment. We assume that each sentence belongs to 
only one sentence bead and order constraints must 
be preserved in sentence alignment. Supposing Pi = 
(ai, xi; bi, Yi), those constraints are expressed in the 
following: 
a0 = 0 , b0 = 0 
ai = ai 1 + .~ci , bi = bi-1 q- yl (1 < i _< k) 
Suppose that a scoring function h can be defined 
for estimating the validity of each sentence bead pi. 
Then, bilingual sentence alignment problem can be 
defined as an optimization problem that finds a se- 
quence P of sentence beads which optimizes the total 
score H of the sequence P: 
H(P) = IIh(h(pl) ..... h(pk)) 
3.2 Bilingual Sentence Alignment us- 
ing Word Correspondence Infor- 
mation 
In this section, we describe the specification of our 
sentence alignment method based-on word correspon- 
dence information. 3 
lIn this paper, we do not describe paragraph alignment pro- 
cess. For the moment, our paragraph alignment program is not 
reliable enough and the results of sentence alignment are better 
without paragraph alignment tban with paragraph alignment. 
Since bilingual texts in our bilingual corpus are not so long, 
the computational cost of sentence Mignment is not serious 
problem even without paragraph Mignment. 
2The term bead is taken from Brown et al. (1991). 
3We basically use dictionary-based bilingual sentence align- 
ment method originMly reported in Murao (1991). The work 
in Murao (1991) was done under the supervision of Prof. M. 
Nagao and Prof. S. Sato (JAIST, East). 
3.2.1 Score of Sentence Bead 
Before aligning sentences in bilingual texts, content 
words are extracted frmn each sentence (after each 
sentence is morphologicMly analyzed if necessary), 
and word correspondences are found using both bilin- 
gum dictionaries and statistical information source for 
word correspondence. Then, using those word corre- 
spondence information, the score h of a sentence bead 
p is calculated as follows. 
First, supposing p= (a, x; b, y), and let n~(a, x) and 
nt(b,y) be the numbers of content words in the se- 
quences of sentences s~4,... ,s, and t~v~,,...,tb 
respectively, and n~t(p) be the nunlber of correspond- 
ing word pairs in p. Then, the score h ofp is defined as 
the ratio of n~t(p) to the sum of n~(a,x) and nt(b,y): 
t,,(p) ~,(p) 
3.2.2 Dynamic Programming Method 
Let Pi be the sequence of sentence beads fi'om the 
begbming of the bilingual text up to the bead pi: 
Pi = pa,p2,...,pi 
Then, we assume that the score H(Pi) of Pi follows 
the recursion equation below: 
H(Pi) = ll(Pi-l) + h(pl) (1) 
Let Hm(ai, bl) be the maximum score of aligning a 
part of S (from the beginning up to the ai(=ai_l+xi) - 
th sentence) and a part of T (f,'om the beginning up 
to bi(=bi-l+yi) - th sentence). Then, Equation I is 
transformed into: 
//.~(a~, b~) 
where the initial condition is: 
Um(ao,6o) = H.,(o,o) = o 
We limit the pair (xi,Yi) of the numbers of sen- 
tences in a sentence bead to some probable ones. For 
the remnant, we allow only 1-1, 1-2, 1-3, 1-4, 2-2 as 
pMrs of the numbers of sentences: 
(xi,yi) e {(1,1),(1,2),(2,1),(1,3), 
(3, 1), (1,4), (4, 1), (2, 2)} 
This optimization problem is solvable as a standard 
problem in dynamic programming. Dynamic pro- 
gramming is applied to bilingual sentence alignment 
in most of previous works (Brown et al., 1991; Gate 
and Church, 1993; Chen, 1993). 
1078 
4 Word Correspondence Esti- 
mation 
In this section, first we describe estimation flmctions 
based-on co-occnrrence frequencies. Then, we show 
how to incorporate word correspondence information 
available in bilingual dictionaries and to estimate 
word correspondences not included in bilingual dic- 
tionaries. Finally, we describe the threshold fnnction 
for extracting corresponding word pairs. 
4.1 Estimation Function 
in the following, we assume that sentences in the 
bilingual text are already aligned. 
Let w, and w~ be words in the texts S and T re- 
spectively, we define the following frequencies: 
freq(w~,,wt) = (frequency of wa and wt's 
co-occurring in a sentence head) 
f','eq(w,) = (frequency of 'w.~) 
freq(wt) = (frequency of wt) 
N - (total number of sentence beads) 
Then, estimation functions of Gale's (Gale and 
Church, 1991 ) and Kay's (Kay and RSscheisen, 1993) 
are given a.s below. 
4.1.1 Gale's Method 
Let a ,-~ d be as follows: 
b := freq(w~) ~ freq(w~,wt) 
c - freq(*vt)-- freq(w~,w~) 
d = N-a-b--e 
Then, the validity of word correspondence w, and wt 
is estimated by the following value: 
(an - be) 2 
(. + 6)(~ ~ e)(b + a)(e + d) 
( .~ b. ) "~ 
freq(w~)freq(wt)(N - freq(w,,))(N- freq(wt)) 
4.1.2 Kay's Method 
The validity of word correspondence w~ and wt is 
estimated by the following value: 
2freq(w~, wt) hk(',.~, '.~) - 
:~eq(w.) + f~eq(~.,) 
4.2 Incorporating Bilingual Dictio- 
nary 
By incorporating word correspondence information 
available in bilingual dictionaries, it becomes easier to 
estimate word correspondences not included in bilin- 
gum dictionaries. 
Let w, be a word in the text S and wt,w~ be words 
in the text T. Suppose that the correspondence of w, 
and wt is included in bilingual dictionaries, while the 
correspondence of w, and w~ is not included. Then 
the problem is to estimate the validity of word corre- 
spondence of w,~ and 'w' t. 
Let freq(w~,wt), freq(w~,w~), freq(w~), 
freq(wt), and freq(w~) be the same as above, and 
frcq(ws, wt, w't)be the frequency of w,, we, and w't's 
co-occurring in a sentence bead. Then, we solve the 
problem above by defining freq'(w,,w~), fveq'(w,), 
freq'(w't) , and N' which becmne the inputs to Gale's 
method or Kay's method. We describe two different 
ways of defining those vMues. 
Estimation I 
One is to estimate all the word correspondences 
equally except that the co-occurrence of wa and wt 
is preferred to that of ,w~ and w~. freq'(w~,w'~), 
fveq'(w~), freq'(w't) , and N' are given below: 4 
fveq'(w,, ,w't) : 
freq(w,, w',) -- ~ freq(w,, wt, w',) 
~t 
freq'(w,~) - freq(ws) 
freq'(w~t) - freq(w't) 
N' -- N 
(freq'(w,,wt) - freq(w,,wt) ) 
When w~, wt, and w~ are co-occurring in a sentence 
bead, the co-occurrence of w~ and wt is preferred and 
that of w~ and w I is ignored. Thus, freq'(w,,w~t) 
is obtained by snbtracting the fi'equeney of all those 
cases fiom the real co-occurrence frequency of w, and 
w' t. But, freq'(w~) and freq'(w~) are the same as the 
real frequencies and the estimated word correspon- 
dences reflect the real co-occurrence frequencies in the 
input text. (Compare with Estimation II.) Word 
correspondences both included and not included in 
bilingual dictionaries are equally estimated their va- 
lidities. 
Estimation II 
(File other is to remove from the input text all the 
co-occurrences of word pairs included in bilingual dic- 
tionaries, freq'(w,,'a4) , freq'(w~), freq'(w~), ~md N' 
are given below: 
4It can happen thtLt, within it sentence bead, one word of 
a language has more than one corresponding word~ of the Ol)- 
posite language and all the correspondences are included in 
bilinguM dictionaries. In that case, formalizations in this sec- 
tion need some modifications. 
I079 
freq~(ws,w~t) = 
freq(w~, w't) - E freq(w~, wt, w~) 
We 
freq'(w~) = freq(w~) - E freq(w~'wt) 
wt 
freq'(w't) = freq(w't) - E freq(w'~,w;) 
,o, 
# and # (the correspondence of w~ w t 
is included in bilingual dictionaries) 
N' = N 
With this option, after all the co-occurrences of word 
pairs included in bilingual dictionaries are removed 
from the input text, word correspondences not in- 
cluded in bilingual dictionaries are estimated their 
validities. 
In the following sections, we temporarily adopt Es- 
timation I for estimating word correspondences not 
included in bilingual dictionaries. It is necessary to 
further investigate and compare the two estimation 
methods with large-scale experiments. 
4.3 Threshold Function 
As a threshold function for extracting appropriate 
corresponding word pairs, we use a hyperbolic fimc- 
tion of word frequency and estimated value for word 
correspondence. 
At first, we define the following variables and 
constants: s 
x = (co-occurrence frequency) 
y = (estimated value for word correspondence) 
a = (constant for eliminating low frequency 
words) ( 1.0 for both h 9 and hk ) 
b = (constant for eliminating words) 
with low estimated value) 
( 0.1 for h a and 0.3 for hk ) 
c - (lower bound of word frequency) 
( 2.5 for both h a and hk ) 
Then, the threshold function g(x, y) is defined as be- 
low: 
g(~,y) = ~(~-b) (~>c) a 
And the condition for extracting corresponding word 
pairs is given below: 
g(x,y) > 1 , x>c 
When using extracted word correspondences in sen- 
tence alignment and structural matching, at present 
we ignore the estimated values and use estimated 
word correspondences and word correspondences in 
bilingual dictionaries equally. 
SNote that values for constants arc determined temporarily 
and need fllrther investigation with large-scale experiments. 
Especially, constants related to word frequency have to be 
tnned to the length of texts. 
5 Experiment and Evaluation 
In this section, we report the results of a small ex- 
periment on aligning sentences in bilingual texts and 
statistically estimating word correspondences. 
The sentence alignment program and the word cor- 
respondence estimation program are named AlignCO. 
The processing steps of AlignCO are as follows: 
1. Given a bilingual text, content words are ex- 
tracted from each sentence. 
2. A Japanese-English dictionary of about 50,000 en- 
tries is consulted and word correspondence infor- 
mation is extracted for content words of each sen- 
tence. 
3. The sentence alignment program named 
AlignCO/A aligns sentences in the input text by 
the method stated in section 3.2. 
4. Given the aligned sentences in the bilingual text, 
the word correspondence estimation program 
named AlignCO/C estimates word correspon- 
dences which are not included in the Japanese- 
English dictionary with option Estimation I in 
section 4.2. 
5. Combining word correspondence information 
available in the Japanese-English dictionary and 
estimated by AlignCO/C, sentences in the input 
text are realigned. 
As input Japanese-English bilingual texts, we 
use two short texts of different length -- 1) "The 
Dilemma of National Development and Democracy" 
(305 Japanese sentences and 300 English sentences, 
henceforth "dilemma"), 2) "Pacific Asia in the Post- 
Cold-War World" (134 Japanese sentences and 123 
English sentences, henceforth "cold-war"). Since the 
results of Gale's method and Kay's method did not 
differ so much, we show the result of Gale's method 
only. 
5.1 Sentence Alignment 
The followings are five best results of sentence align- 
ment before and after estimating word correspon- 
dences not included in the Japanese-English dictio 
nary. The results are improved after estimating word 
correspondences not included in the bilingual dictio- 
nary. 
"dilemma" 
number of errors average 
(five best solutions) error rate 
1st trial 18 18 19 19 16 6.3% 
2nd trial 13 14 14 15 13 4.8% 
"cold-war" 
number of errors average 
(five best solutions) error rate 
1st trial 5 6 4 7 8 4.9% 
2nd trial 4 2 2 5 0 2.1% 
7080 
5.2 Word Correspondence Estimation 
We classify the estimated word correspondences into 
three categories, "correct", "part of phrase", and 
"wrong". "part of 1)hrase" means that the estimated 
word correspondence can be considered ~ part of cor- 
responding phrases. "errm" rate" is tile ratio of the 
number of "wrong" word correspondences to the to- 
tal nunlber. "dilemma" 
\[~otal 1\[ col:feet \]phras~ wr error ,'ate 
87 \] 53 30 __1 4.6% l_8r j _:_ / • 
"cold-war" 
I- lI l\[ -- total \[\[ correct ~l,hrase I wrong l~err°r rate 
The result of "dilemma" is better than that of "cold- 
war". This is because the former is longer than the 
latter. 
The tbllowings are example word correspondences 
of each category where f~, ft, and f~, are freq(w~), 
freq(wt), and freq(w~, we) respectively. The paren- 
thesized correspondence is not extracted by the 
threshokl flmetion. 
correct 
__ 'w~ ..... w~ ___hq f, ft f,t 
X )1/31 :/ sultan 0.75 4 3 3 
~'/~ press 0.80 5 4 4 
\['1 I\[1 liberal 0.64 20 15 14 
¢ ,.. ga~b~ econonfic 0.32 33 19 15 
part of phrase 
w, wt hg f., ft fz, 
civilian 1.00 
snpreinacy 0.83 
I-Q civilian 0.69 
J-~ supremacy 0.83 
{'~ ~\]d supremacy 0.44 
"\]l' ({,~! j ; civilian 0.37 
6 6 6 
6 5 5 
6 6 5 
6 5 5 
4 5 3 
4 6 3) 
wrong 
W,. we _ hg fs ft f*t 
~'d~ ~ does 0.49 6 3 3 
)XP V'~'f:: and ().41 47 62 5 
Most of "correct" corresl)ondences are proper 
names like " X J> Y 7/ sultan", or those which have 
different parts of speech, like " 1'1 Ill (noun) - lib- 
eral (adjective)" and " *~:'i~/(noun) econonlic (adjec- 
tive)", or those which can be considered as translation 
equivalents but not included in the Japanese-English 
dictionary, like " "~ (news) press". 
The examples of "part of phrase" form a phrase 
correspondence " ..O.:.\[~{~)lll civilian supremacy". 
The former "wrong" correspondence " ,~,~l~ (mean- 
lug) - does" comes from the cm'resf)ondence of long 
distance dependent phrases ",~I~, ~,J'7~ does 
mean". The latter "wrong" correspondence " :~.\[zf(: 
(pacitic ocean) and" is extracted by Gale's method 
because both freq(j~'lz"i'fi)and freq(and) are high 
and close to tile total number of sentence beads. This 
correspondence is not extracted by Kay's method. 
Then, in Fig. 2, we illustrate the relation between 
the estimated value h~(w~,w 0 of Gale's mettlod and 
the co-occurrence frequency freq(w~, wt) for the text 
"dilemma". Tile threshold function seenls optimized 
so that it extracts as many word correspondences of 
the category "correct" and "part of phrase" as possi- 
ble, and extracts as few word correspondences of the 
category "wrong" as possible. 
6 Concluding Remarks 
This paper described a unified framework for bilin- 
gual text matching by combining existing hand- 
written bilingual dictionaries and statistical tech- 
niques. F, specially, we described a nlethod for align- 
lug sentences using word correspondence information, 
and a method for estimating word correspondences 
,tot included in bilingual dictionaries. 
Fstimated word correspondence information will 
improve the results of structural matching of bilim 
gual sentences, it will be reported in the future. With 
the same techniques as those for estimating word co,'- 
respondences, it is quite easy 1;o estimate correspon- 
dences of phrases such as noun phrases and idiomatic 
expressions. Then, the results of structural matching 
will be much more ilnproved. 
In order to inlprove the accuracy of sentence align 
inent, we need to combine our word-correspondence- 
based method with those length-based methods in 
Brown et al. (1991) and Gale and Church (1993). 
In the case of Japanese-English texts, the word- 
based nlethod in Brown et al. (1991) inight be better 
than tile character-b~ed method in Gale and Church 
(1993). 
References 
Brown, P. F., Cocke, J., Pietra, S. A., Pietra, V. 
J. D. et al. (1990). A statistical approach to 
machine translation, Computational Linguistics 
16(2): 79-8~. 
Brown, P. F., Lai, J, C. and Mercer, R. L. (1991). 
Aligning sentences ill bilingual corpora, Pro- 
ceedings of th.e 29th Annual Meeting of ACL, 
pp. 169 176. 
Brown, P. F., Pietra, S. A. D., Pietra, V. J. D. and 
Mercer, R. L. (1993). The mathematics of statis- 
ticM machine translation: Parameter estimation, 
Computational Linguistics 19(2): 263 311. 
Chen, S. F. (1993). Aliguing sentences in bilingual 
corpora using lcxical information, Proceedings of 
th.e 31th, Annual Meeting of ACL, pp. 9-16. 
Church, K. W. (1993). Char_Mign: A program for 
Migning parMlel texts at the character level, Pro- 
ceedings of the 31th Annual Meeting of ACL, 
pp. l 8. 
Dagan, I., Itai, A. and Schwall, U. (1991). Two 
languages are more informative than one, Pro- 
I087 
A22 
O 
E 
UJ 
c 
O 
E 
LU 
0.8 
0.6 
0.4 
0.2 
0.1 
0 
0.8 
0.6 
0.4 
0.2 
0.1 
0 
o 
* 8 o 
o o 
o o 
o o 
i o ',., o 
correct o 
part of phrase , 
Threshold ........ 
freq=1.5,h=0.1 ............... 
<, 
i * i +'",,. o 
$ * ...... o o + o '--... 
:,~...*. * ......... ~ ............ ~ ............... ": ................ i.iiii.iiiii.i.iiiiiiiiiiiill 
~1 i i __ , 
0 1.52.5 5 10 15 20 
Co-occurrence Frequency 
25 
~- T 
wrong 
Threshold ....... 
freq = 1.5, h =0.1 
i o 
io',o o 
io'k 
................... 
0 1.52.5 5 10 15 20 25 
Co-occurrence Frequency 
Fig. 2: Estimatiun i)er Co-occurrence Frequency of Word Correspondences ("dilemma") 
ceedings of the 29th Annual Meeting of ACL, 
pp. 130 1137. 
Gale, W. A. and Church, K. W. (1993). A program 
for aligning sentences in bilingual corpora, Com- 
putational Linguistics 19(1): 75 102. 
Gale, W. aim Church, K. (\]991). Identifying word 
correspondences in parallel texts, Proceedings of 
the 4th DARPA @eeeh and Natural Language 
Workshop, pp. 152 157. 
Kaji, H., Kida, Y. and Morimoto, Y. (1992). Learn- 
ing translation templates from bilingual text, 
Proceedings of the 14th COLING, pp. 672 -678. 
Kay, M. and ll~ischeisen, M. (1993). 'Fext- 
translation alignment, Computational Linguis- 
tics 19(1): 121 \[42. 
Klavans, J. and Tzoukermann, F,. (1990). The BI- 
CORD System: Combining lexical information 
from bilingual corpora and machine readable 
dictionaries, Proceedings of the 13th COLING, 
Vol. 3, pp. 174 179. 
Kupiee, J. (1993). An algorithm for finding noun 
phrase correspondences in bilingual corpora, 
Proceedings of the 3lth Annual Meeting of ACL, 
pp. 17-22. 
Matsumoto, Y., Ishimoto, II. and Utsuro, T. (1993). 
Structural matching of bilingual texts, Proceed- 
ings of the 31th Anmtal Meeting of A CL, pp. 23 - 
30. 
Murao, H. (1991). Studies o11 bilingual text aligm 
ment, Bachelor Thesis, Kyoto University. (in 
Japanese). 
Sadler, V. and Vendelmans, R. (1990). Pilot imple- 
mentation of a bilingual knowledge bank, Pro- 
ceedings of the 13th COLING, Vol. 3, pp. 449 
451. 
Utsuro, T., Matsumoto, Y. and Nagao, M. (1992). 
Lexical knowledge acquisition from bilingual cor- 
pora, Proceedings of the 1\]tth COLING, pp. 581- 
587. 
Utsuro, T., Matsumoto, Y. and Nagao, M. (1993). 
Verbal case frame acquisition from bilingual cor- 
pora, Proceedings of the 13th IJCAI, pp. 1150- 
1156. 
1082 
