You'll Take the High Road and I'll Take the Low Road: 
Using a Third Language to hnprove Bilingual Word Alignment 
Lars Borin 
Department of Linguistics, Uppsala University 
Box 527 
SE-751 20 Uppsala, Sweden 
Lars.Borin@ling.uu.se 
Abstract 
While -independent selztence align- 
ment programs typically achieve a recall in the 
90 percent range, tile same cannot be said about 
word alignment systems, where normal recall 
figures tend to fall somewhere between 20 and 
40 percent, in tile -indepeudent case. 
As words (and phrases) for wlrious reasons 
are more interesting to align than sentences, 
we need methods to increase word alignment 
recall, preferably without sacrificing precision. 
This paper reports on a series of experiments 
with pivot aligtunent, which is tile use of one or 
more additional hmguages to improve bilingual 
word aligment. Tile conclusion is that in a 
multilingual parallel corpus, pivot alignment is 
a safe way to iucrcase word alignment recall 
without lowering the precision. 
1 Introduction 
For about a decade and a half now, researchers 
in Natural  processing (NLP) and 
general and applied linguistics have been 
working with parallel corpora, i.e., in the 
prototypical case corpora consisting of original 
texts in some soume  (SL) together 
with their translations into one oi" more target 
hmguages (TL). In general linguistics, they 
are used--in tile same fashion as monolingual 
corpora--as handy sources of authentic 
. In computational linguistics and 
hmguage engineering, various methods for 
(semi-)automatic extraction from such corpora 
of, among others, translation equiwdents, have 
been explored. 
2 Why is word alignment more 
interesting and why is it difficult? 
Alignment--tile explicit linking of items in the 
SL and TL texts judged to correspond to each 
other~is a prerequisite for the extraction of 
translation cquiwdents fi'om parallel corpora, 
and tile granularity of tile alignment naturally 
determines what kind of translation units you 
can get out of these resources. With sentence 
aligmnent, you get data which can be used 
in, e.g., translation memories. If you want 
to build bi- or multilingual lexica for machine 
translation systems (or for people), however, 
you want to be able to align parallel texts on 
the word (and phrase) level. This is because, 
in the last two decades, NLP grammars 
havc become increasingly lexicalized, and 
granlnlars 1'o1" machine translation--as opposed 
to translation melnories, or example-based 
machine translation, neither of which uses 
a grammar in any interesting sense of tile 
word--forna no exception in this regard. 
The entries of tile lexicon, which is the 
major repository of linguistic knowledge in 
a lexicalized gramnlar, are mainly made up 
of units oil tile linguistic levels of words and 
phrases. 
The problem here is that sentence alignment 
is a fairly well-understood problem, but 
word alignment is much less so. This means 
that while -independent sentence 
alignment programs typically achieve a recall 
in the 90 percent range, the same cannot be said 
about word alignment systems, where nornml 
recall ligures tend to fall somewhere between 
20 and 40 percent, in the -independent 
case. Thus, we need methods to increase word 
alignment recall, preferably without sacrificing 
97 
precision. 
There are many conceivable reasons for word 
alignment being less 'effective' than sentence 
alignment. Different  structures 
ensure that words comparatively more seldom 
stand in a one-to-one relationship between the 
s in a parallel text, because, e.g., 
• SL function words may correspond to 
TL grammatical structural features, i.e. 
morphology or syntax, or even to nothing 
at all, if the TL happens not to express 
the feature in question. At the same 
time, function words tend to display a 
high type frequency, both because of high 
functional load (i.e., they are needed all 
over the place) and because they tend to 
be uninflected (i.e. each function word 
is typically represented by one text word 
type, while content words tend to appear 
in several inflectional variants). This of 
course means that function words will 
account for a relatively large share of 
the differences in recall figures between 
sentence and word alignment; 
• orthographic conventions may disagree on 
where word divisions should be written, 
as when compounds are written as several 
words in English, but as single words 
in German or Swedish, the extreme case 
being that some orthographies get along 
entirely without word divisions; 
• word alignment must by necessity 
(because word orders differ between 
s) work with word types rather 
than with word tokens, while sentence 
I Alignment recall is here understood as the number 
of units aligned by the alignment program divided by 
the total number of correct alignments (established by 
independent means, normally by human annotation). 
Precisiolz is the number of correct alignments (again 
established by independent means) divided by the 
number o1' units aligned by the alignment program (i.e., 
the numerator in the recall calculation). We will not in 
this paper go into a discussion of null alignments (source 
 units having no correspondence in the target 
 expression) or partial alignments (part, but not 
all, of a phrase aligned), as we believe that the results we 
present here are not dependent on a particular treatment 
of thcse--admiuedly troublesome--phenomena. 
alignment always works with sentence 
tokens, 2 i.e., it relies on linear order. 
This means that polysemy (one type 
in the SL corresponding to several 
types in the TL), homonymy (several 
types in the SL corresponding to one 
type in the TL), and combinations of 
polysemy and homonymy will disrupt the 
correspondence even between structurally 
similar s; 
Thus, the circumstance that linear order 
cannot be used to constrain word alignment 
--beyond the restriction that putative word 
alignments must appear in one and the 
same sentence alignment unit--together with 
the other factors .just mentioned, conspire 
to make word alignment a much harder 
problem than sentence alignment in the 
-independent case. 3 
3 Improving word alignment by 
combining knowledge sources 
The project in which the research reported 
here has been carried out, the ETAP project 
(see section 8, below), is a parallel translation 
corpus project, the aim of which is to create 
an annotated--understood as part-of-speech 
(POS) tagged and aligned--multilingual 
translation corpus, which will be used as the 
basis for the development of methods and 
tools for the automatic extraction of translation 
equivalents. 
Lately, we have been concentrating on 
finding good ways to improve word alignment. 
Tile word alignment system we currently use 
(which was developed in a sister project 
in our depamnent, the PLUG project; see 
Sfigvall Hein (to appear)) works itel'atively 
with many kinds of information sources, and 
it seems that this is a good way to proceed. 
Distributional parallelism, coocurrence, string 
2In parallel corpus alignment, that is, but not e.g. in 
searching in translation memories. 
3We must stress that we are talking about the 
la,guage-i, dependellt case here. For any particular 
 pair, -specific linguistic (and possibly 
other) information can be used to improve both sentence 
and word alignment, although the former will probably 
still stay ahead of the latter in terms of perfor,nance. 
98 
similarity (both between and within s), 
and part of speech are some of the information 
sources used, and also (heuristically based) 
stemming to increase type frequencies for the 
distributional measures (see, e.g. Tiedemann 
(to appear a), Tiedemann (to appear b); 
Molamed (1995), Melamed (1998)). 111 OUl" 
work in the ETAP proiect we are looking for 
additional such information sources, and so far 
we have coricoiltratod our ell:errs ori oxploririg 
linguistically rich information, such as word 
similarity (Berth, 1998) and the combination 
Of word alignlllent and POS taggillg (Borill, to 
appear a). 
There must certainly exist other sources of 
information, in addition t:o those mentioned 
above, lhat carl be used to ilnprovo word 
alignlnent. This paper discusses one particular 
such source, namely the use of a third hlnguage 
in the aligmnent process. Apart fronl an earlier 
presentation by the present author (Berth, to 
appear b), I have not seen any mention in 
the literature of the possibility of using a 
third  in this way for improving word 
alignmorit. Simard (1999) describes how the 
use of a third  can be brought to 
bear upon the simpler problem of senlence 
alignment, but he does not consider the harder 
problem of word alignmenl. Perhaps it has 
not being thought of for the silnplc reason that 
it is possible only with ###ulUlingual parallel 
corpora, and--for obvious reasons--not with 
b/lingual corpora, which has been the kind of 
parallel corpus that has received nlost attention 
from researchers in the field. 
4 Pivot alignment 
Since the third  acts as, as it were, 
a pivot for the alignment of the two other 
s, we refer to the method as pivot 
alignment, and it works as follows, with three 
s, e.g. Swedish (SE), Polish (PL) 
and Serbian-Bosnian-Croatian (SBC), where 
the aim is to align Swedish with the other two 
s on the word level. 
. 
. 
. 
Perform the pairwise alignments SE-~PL, 
SE-->SBC, PL--->SBC, and SBC-oPL; 
Check whether there exist aligned 
words on the indirect 'alignment path '4 
SE-oSFJC-oPL, which are not on the 
direct path SE--->PL. If there are, add them 
to the SE-oPL alignnaents. 
Do the same for the indirect path 
SE-->PL-oSBC and the direct path 
SE-oSBC 
In order lor this procedure to work, we must 
believe that 
1. there will be differences in tile SE-+PL 
and SE--bSBC alignments, and 
2. that these dilTerenees will 'survive' the 
PI,---bSBC and SBC-->PL aligments. 5 
Hypothesis (1) seems plausible, since the 
word alignment system used (Tiedemann (to 
appear a), Tiedenlann (to appear b)) actually 
aheady utilizes several kinds of information 
to align the words in the two texts. In 
particuhu, it uses distributional information, 
cooccurrence statistics, iterative size reduction, 
'naive' stemming, and string simihuity to 
select arid rank word alignment carididates 
(but #*el linear order; cf. also section 3 
above). Thus it is fully conceiwtble, e.g., 
that distributional information will provide 
one o t' the links and word similarity the 
other in a three- path, such as 
SE-->PL--->SBC, 6 while synonymy or polysemy 
(i.e., distributional differences; see above) will 
4It is this metaphor of the alignments going by 
different 'paths' or 'roads' to lhe salne goal which has 
inspired nle lo borrow the firsl part of the title of this 
paper frolll tile chorus of tile song "Loeb LolllOlld". 
5Incidentally, the indirect path could be extended 
with lilt)re lallgtlaoes, e.g. Swedish--> Polish--+ 
E,lglish-o Spanish, etc., but we have not investigated this 
possibility, although we explore the possibility of using 
several additional s in parallel, below. 
6This is perhaps intuitively the most likely 
situation in this particular case, since Polish and 
Scrbian-Bosnian-Croatian are fairly closely rchlted 
Slavic s lhat share many easily recognizable 
cognates, while both ~.ll'e lllHch lllOrc reinoiely related to 
Swedish 
99 
s 
aligned 
se-sbc 
+ se-pl-sbc 
se-pl 
+ se-sbc-pl 
SO-OS 
+ se-en-es 
se-Oll 
+ se-os-0n 
found 
links 
82 
1 
83 
57 
4 
61 
87 
8 
95 
95 
4 
99 
links in recall 
standard 
429 19.11% 
19.35% 
370 15.41% 
16.49% 
454 19.16% 
20.92% 
442 21.49% 
22.40% 
correct partly not 
(C) con: (PC) con: 
57 17 8 
1 
58 17 8 
37 14 6 
4 
41 14 6 
65 14 8 
7 1 
72 14 9 
70 14 11 
2 2 
72 14 13 
precision, precision 
correct C + PC 
69.51% 90.24% 
69.88% 90.36% 
64.91% 89.47% 
67.21% 90.16% 
74.71% 90.80% 
75.79% 90.53% 
73.68% 88.42% 
72.73% 86.87% 
Table 1." First pivot alignment experiment results (null links in standard not counted) \[From Borin (to appear b)\] 
prevent the first link to be made on the direct 
path SE-+SBC. 
5 An experiment with pivot 
alignment 
In recent work (Borin, to appear b), we reported 
on a small preliminary experiment to test the 
feasibility of the method. We proceeded as 
follows: 
1. The ETAP IVTI corpus was used for 
the experiment. This is a five- 
parallel translation corpus of text from 
the Swedish newspaper for immigrants 
(Invandrartidningen; the English version 
is called News and Views). Swedish 
is the source , and the other 
four s are English (EN), Polish, 
Serbian-Bosnian-Croatian and Spanish 
(ES). The IVTI corpus has roughly 
100,000 words of text in each ; 
2. The PLUG link annotator (Merkel (1999), 
Merkel et al. (to appear )) was 
used to produce evaluation standards 
("gold standards") for the following 
alignment directions: SE-+PL, SE-+SBC, 
PL--+SBC, SBC---,'PL in one group, and 
SE-+EN, SE-+ES, EN--+ES, ES-+EN in 
the other. 500 words were sampled 
randomly fl'om the Swedish source text, 
and the standards with Swedish as the 
source were made manually by me from 
this sample. The target units of these 
. 
. 
standards were then used as the basis for 
the manual establishment (again by me) 
of the various target  alignment 
evaluation standards. Because of null 
links, misaligned or differently aligned 
sentences, etc., the size of the evaluation 
standards varied fi'om 366 to 500 words; 
In addition to the already word aligned 
SE--+{EN,ES,PL, SBC}, we aligned the 
other  pairs necessary for the 
experiment; 
The evaluation function in the aligmnent 
system was used to calculate recall and 
precision for each word alignment. In 
addition to this, we manually extracted 
the additional links, if any, that would 
be found on the indirect path through the 
third . 
The null links mentioned in (2) above were 
largely due to the sampling procedure choosing 
many function words, which often (also in this 
case) are troublesome in the context of finding 
good translation equivalents, since they may 
not correspond to words in the TL (see section 
2 above). 
The results of the preliminary experiment are 
shown in Table 1. 
We see that only a few units survived the 
trip through two s, but out of those 
that did, most contributed positively to the 
total result. SE-+ES and SE-+PL were the 
alignments which benefitted most from pivot 
100 
s 
aligned (standard) 
sc-pl (501 ) 
+ se-en-pl 
+ se-es-pl 
+ se-sbc-pl 
sc-cs (501) 
+ Se-OIl-CS 
+ se-i~l-es 
so-on (501) 
+ SC-eS-Cll 
+ se-pl-en 
se-sbc (501) 
+ sc-pl-sbc 
COI'I'{~Cg 
lillks 
112 
2 
2 
6 
167 
9 
6 
139 
7 
2 
137 
2 
Hot 
COITeCt 
11 
13 
I 
1 
12 
1 
accumttlated recall precision 
convct 
+2 
+2 
+5 
+9 
+9 
+3 
+12 
24.55% 
24.95% 
24.95% 
25.75% 
26.35% 
35.93% 
37.92 
37.13% 
38.52% 
91.06% 
91.20% 
91.20% 
91.47% 
91.67% 
92.78% 
92.63% 
93.01% 
92.75% 
30.14% 92.05% 
+7 31.74% 91.82% 
+1 30.54% 92.16% 
+8 31.94% 91.87% 
28.94% 94.48% 
+2 29.34% 94.56% 
Table 2." New pivot aligmnent experiment results 
(null links i J1 stamlard not counted," 
correct and partly correct lillk.s" counted together) 
alignment (through EN and SBC, respectively), 
while the result wets insignificant for SE-+SBC 
and perhaps even detrimental in the case of 
SE--+EN. 
We saw these results as suggestive, rather 
than conclusive. It certainly seemed that 
the closer genetic relatedness of the two 
Shtvic s worked to our advantage, 
but we concluded that we needed to do 
more experiments, bolh with more  
combinations and with a modilied sampling 
procedure. In pmticular, we wanted to get rid 
o1' the problematic function words (see above). 
Since the recall is faMy low to start with, 
even a few correct additional alignments mean 
a great deal for the overall performance of the 
word alignment system. Thus, we thought that 
this approach would be worth pursuing t'urther. 
6 A new experiment with pivot 
alignment 
To coufirm these results, we redesigned slightly 
and extended our experimental procedure, in 
tile following way. A new sampling o1' the same 
corpus was performed, but this time we lirst 
constructed a stop word list consisting of the 50 
most frequent word types in the Swedish part 
of the IVT 1 corpus, as a -independent 
way of approximating the set o1' function words 
in the . Thus, we had a new sample, 
with more content words, to compare with the 
previous one, tile hyt~othesis being that a larger 
percentage of content words would be able to 
contribute more links in the pivot alignment 
process. 
We also added some new hmguage 
combinatious, so that we now would be 
able to whether there is a difference in using 
Spanish as a pivot in aligning Swedish and 
English, as opposed to using Polish. We also 
investigated what the result would be of using 
more than one additional  in parallel. 
The new pivot alignment paths investigated 
(in addition to the ones investigated in the lirst 
experiment) are represented by the following 
' triads': 
• SE-->EN-~PL 
• SE--~ES-+PL 
• SE-->PL-+EN 
• SE-->PL-+ES 
The hypothesis wets that the new setup 
would make the possible effect of close genetic 
relatedness more discernible, which indeed 
seems to be the case (see below). 
The results of the new experiment are shown 
in Table 2. We see that 
101 
• initial (non-pivot alignment) recall has 
gone up quite a bit, presumably because 
function words have been avoided in the 
standard; 
• initial alignment precision still remains at 
the same high level as before; 
• all but two of the alignments added by 
pivot alignment are correct, i.e. recall is 
raised without a decrease in precision; 
• difl~rent pivot s add different 
alignments, i.e. there seems to be a 
cunmlative positive effect fiom adding 
more s; 
• the degree of relatedness of the s 
in a triad seems to play a role for how 
well pivot alignment will work for the 
particular triad. 
7 Discussion and conclusions 
With the new experimental setup, we conlirmed 
the results fiom the earlier experiment, i.e., 
recall increases, but precision does not suffer. 
This tendency is even more marked in the 
new series of experiments, in addition, there 
seems to be a clear division along genetic 
lines; Polish is the best pivot  for 
Swedish-Serbian-Bosnian-Croatian alignment, 
and vice versa, while Spanish works better 
together with English. Another subcorpus in 
the ETAP project contains a Finnish part, and 
we aim at investigating the effects of using this 
non-Indo-European  (all the s 
are Indo-European in the two experiments 
described here) as one of the s in a 
similar experiment 
It seemed that the choice of content words (or 
rather: lower-frequency words) over function 
words did lead to a better result, but this should 
be t'urther investigated. 
We also see that the more s we add, 
the better the results become, i.e., different 
additional s complement each other. 
In general, there was little overlap in the 
contributions that each  added to the 
final result. 
It should be mentioned at this point, that 
the sampling and annotation procedure used 
did not allow us to check up on incorrect 
alignments which may have propagated 
through the pivot . The sampling 
procedure would have to be redesigned for this 
to be possible, 7 which we plan to do in the 
future. 
For the same reason, we do not have all 
the data needed to calculate the significance 
of the results. Thus, the results will have to 
remain suggestive for the time being, although 
the suggestion is strong that pivot alignment 
works the way it was hypothesized to work. 
In sc.:nmary, the results are encouraging, in 
that the links added through pivot alignment 
were largely correct links, i.e. pivot alignment 
could be expected to make a positive and safe 
contribution--i.e, increasing recall without 
lowering precision--in a word alignment 
system as one of many independent knowledge 
sources. 
8 Acknowledgements 
The research reported here was carried out 
within the ETAP project (Borin, to appear 
c), supported by the Bank of Sweden 
Tercentenary Foundation as part of the research 
programme Translation and Interpreting--a 
Meeting between Languages and Cultures. See 
http://www.translation.su.se/ 
Leif-J6ran Olsson, who is responsible for 
systems development in the ETAP project, 
wrote most of the software which made the 
experiment reported here possible. 
I wish to thank the members of the PLUG 
project for generously letting us use the Uplug 
system and the PLUG link annotator. 
7To do this, you would sample sentences instead of 
sampling words randomly throt, ghout the corpus, which 
is the way it is done at present. Actually, the sampling 
and annotation software was devised for strictly bilingual 
word alignment evaluation, and not t"o1" the purpose 
which it has been pressed into serving here. 
102 

References 

Lars Borin. 1998. Linguistics isn't always the 
answer: Word comparison in computational 
linguistics. In The l lth Nordic Confereuce 
on Computatioual Linguistics. NODALIDA 
'98 Proceedings, pages 140-151. Center 
for Sprogteknologi and Dept. of General 
and Applied Linguistics, University of 
Copenhagen. 

Lars Borin. to appear a. Alignment and 
tagging. In Parallel Corpora, Parallel 
Worlds. Dept. of Linguistics, Uppsaht 
University. 

Lars Borin. to appear b. Pivot alignment. In 
Proceedings of the 12th Nordic Conference 
ou Compulatiomtl Liuguislics (Nodalida99). 

Lars Borin. to appear c. The ETAP 
project -- a presentation and status report. 
ETAP research report etap-rr-01, Dept. of 
Linguistics, Uppsala University. 

I. Dan Mehtmed. 1995. Automatic evaluation 
and uniform filter cascades for inducing 
N-best translation lexicons. In Proceedings of the Third Workshop ou Very, Large Corpora. 

I. Dan Melamed. 1998. Word-to-word models 
of translational equiwflence. Technical 
Report IRCS Technical Report -#98-08, 
Department of Computer and Information 
Science, University of Pennsylwmia. 

Magnus Merkel. 1999. Undelwlauding 
am/ Enhanciug Trauslatiou I) 3, Parallel 
Text Processing. Dept. of Computer and 
Information Science, Link6ping University. 

Magnus Merkei, Mikael Andersson, and Lars 
Ahrenbmg. to appear. The PLUG Link 
Annotator - interactive construction of data 
froln parallel corpora. In PcuMlel Corpolzt, 
Parallel Worlds. Dept. of Linguistics, 
Uppsala University. 

Anna Sagvall Hein. to appear. The PLUG 
project: Parallel corpora in Link(Sping, 
Uppsala, G~3teborg: Aims and achievements. 
In Parallel Corpora, Parallel Worlds. Dept. 
of Linguistics, Uppsala University. 

Michel Simard. 1999. Three s are 
better than two. In Proceedings of the 1999 
Joint SIGDAT Conference on Empirical 
Methods in Natural Language Processing 
and Very Large Corpora, pages 2-11. 

J6rg Tiedemann. to appear a. Uplug - a 
modular corpus tool for parallel corpora. In 
Parallel Corpora, l'alz~llel Worlds. Dept. of 
Linguistics, Uppsala University. 

J6rg Tiedemann. to appear b. Word alignment 
step by step. In Proceediugs q/' the 
12th Nordic Confereuce ou Compulalioual 
Liuguis'tics (Nodalida99). 
