USING SYNTACTIC DEPENDENCIES FOR WORD ALIGNMENT 
Fathi DEBILI - Elyès SAMMOUDA - Adnane ZRIBI 
CNRS- idl 
27, rue Damesme, 75013 Paris 
Phone : (33-1) 43 50 54 01 - Fax : (33-1) 45 89 17 32 
e-mail : debili@idl.msh-paris.fr 
Abstract 
We attack the problem of aligning words from pairs of 
bilingual sentences, rather than the well-known, and 
somewhat easier, problem of aligning sentences. The 
method that we develop is based on the use of bilingual 
dictionaries, having supposed that lemmatization has 
taken place. We first show that this method performs 
poorly in terms of silence and noise. To improve its 
performance we introduce syntactic dependency 
relations between the words in each of the two 
sentences considered. In this sense the syntagmatic 
level comes to the rescue of the paradigmatic level at 
which the alignment actually takes place. 
I. Introduction 
Given two sentences F and E that are translations of 
each other, is there a simple method for aligning the 
words in each sentence? In other words, is word 
alignment algorithmically simple to implement? We 
shall see that this problem is extremely delicate, even 
when done by hand. To convince oneself, one need 
merely attempt it in order to see how quickly the 
choices pass from trivial to complicated. The 
difficulties stem from there not always being a simple 
one-to-one correspondence between the words of the 
two sentences. One word may correspond to many 
words (an expression); in other cases, one or many 
words may correspond to no other words. Moreover, 
word order is rarely maintained, and to top 
things off, differences in syntactic status complicate 
the pairings even when they do exist. 
II. Conventions and restrictions concerning 
manual alignment of words 
Let us begin with the cases that will be excluded due to the 
excessive level of difficulty that they present. 
Consider the two sentences (1): 
E17 : But1 the2 similarities3 are4 illusory5 .6 
F18 : Ces1 comparaisons2 ont3 leurs4 limites5 :6 
1. All the sentence pairs used as examples are extracted from 
The Acoustics of the Harpsichord (SCIENTIFIC 
AMERICAN, February 1991) and its French translation 
L'acoustique du clavecin (POUR LA SCIENCE, April 1991). 
Although these sentences correspond to each other in 
the text that they appeared in, we cannot establish an 
alignment of their words. We will not study these cases 
for a few reasons: first, outside of their context it is 
difficult, even for human readers, to affirm their 
semantic relation as a translation; secondly, in order to 
align these sentences, the entire sentences must be 
considered as an expression, and this is debatable. 
How can manual alignments be represented? 
We will distinguish the alignment of words and groups 
of words whose mutual translation is established with 
the aid of a bilingual dictionary from alignments that 
are made from a local recomposition based on human 
"comprehension" of the two sentences. 
We will use the equal-sign (=) to mark links which 
come from a bilingual dictionary and the star symbol 
(*) to mark comprehension correspondences. We will 
call the first type of correspondence "lexical 
correspondence" and the second type "contextual 
correspondence". 
The alignments (1-n), (m-1) or (m-n) are characterized 
by the presence, on the same line, of more than one * 
or =. Let's give an example: 
F93 : Une1 partie2 seulement3 de4 ces5 vibrations6 
contribue7 au8 son9 émis10 par11 le12 clavecin13 ,14 
mais15 tous16 les17 mouvements18 déterminent19 le20 «21 
caractère22 »23 de24 l'25 instrument26 .27 
E120 : Only1 some2 of3 this4 vibrational5 activity6 
contributes7 to8 radiating9 sound10 of11 the12 
harpsichord13 .14 
F93 / E120   1 2 3 4 5 6 7 8 9 10 11 12 13 14 
Une1 = 
partie2 = 
seulement3 = 
de4 = 
ces5 = 
vibrations6 = * 
contribue7 = 
au8 = 
son9 = 
émis10 * 
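One possible way to store such a grid of marks, given here only as a minimal sketch: a mapping from (French position, English position) to the mark "=" or "*". The name rows_with_multiple_marks and the toy positions are hypothetical; in particular, the column positions below are invented for illustration and are not those of the table above. The sketch only checks rows, that is, French words carrying more than one mark.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical marks: (French position, English position) -> "=" or "*".
# The column positions are invented for illustration only.
toy_marks: Dict[Tuple[int, int], str] = {
    (1, 1): "=",
    (2, 2): "=",
    (6, 5): "=",
    (6, 6): "*",
}


def rows_with_multiple_marks(
    marks: Dict[Tuple[int, int], str]
) -> Dict[int, List[Tuple[int, str]]]:
    """Return the French positions whose row carries more than one mark."""
    per_row: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for (f_pos, e_pos), mark in sorted(marks.items()):
        per_row[f_pos].append((e_pos, mark))
    # Keep only rows with several marks, i.e. multi-word alignments.
    return {f: links for f, links in per_row.items() if len(links) > 1}
```

With the toy marks, only row 6 is reported, since it carries both a lexical (=) and a contextual (*) correspondence.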
  
III. Hypothesis 
Our algorithm rests on the following 
hypothesis. Consider two sentences F and E which are 
translations of each other. 
We say that two words fi and ej, belonging to F and E 
respectively, correspond to each other if: i) they are 
translations of each other; ii) they enter into the same 
dependency relations with their neighbors; iii) they 
occupy the same positions. 
IV. Potential Alignments 
Consider the two sentences F and E. The potential 
alignment of words is obtained by comparing each of 
the words of one sentence with all of those from the 
second sentence. The comparisons (fi, ej) are 
established with the help of a simple word transfer 
dictionary and the results are stored in an m x n matrix 
(m being the number of words in the French sentence 
and n in the English). Each element receives a score 
that is higher if the two words are: i) translations in 
the dictionary, ii) long, iii) in the same position. 
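The construction of the m x n matrix can be sketched as follows. This is a minimal illustration, not the scoring actually used: the toy dictionary, the function name potential_alignments and the numeric weights are all assumptions, as is the choice to leave a cell at zero when the dictionary provides no link.

```python
from typing import Dict, List, Set

# Hypothetical toy transfer dictionary (French lemma -> English lemmas).
TOY_DICTIONARY: Dict[str, Set[str]] = {
    "vibration": {"vibration"},
    "zone": {"area", "region", "zone"},
    "son": {"sound"},
}


def potential_alignments(
    french: List[str], english: List[str], dictionary: Dict[str, Set[str]]
) -> List[List[float]]:
    """Build an m x n score matrix for lemmatized sentences."""
    m, n = len(french), len(english)
    matrix = [[0.0] * n for _ in range(m)]
    for i, f in enumerate(french):
        for j, e in enumerate(english):
            if e not in dictionary.get(f, set()):
                continue  # assumption: no dictionary link, score stays 0
            score = 1.0  # i) the words are translations in the dictionary
            score += min(len(f), len(e)) / 10.0  # ii) longer words score higher
            # iii) closer relative positions score higher (assumed weighting)
            rel_f = i / max(m - 1, 1)
            rel_e = j / max(n - 1, 1)
            score += 1.0 - abs(rel_f - rel_e)
            matrix[i][j] = score
    return matrix
```

For a French lemma such as "zone" with two dictionary candidates in the English sentence, both cells receive a non-zero score, and the candidate whose relative position is closer receives the higher one.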
V. Ambiguity, noise and silence 
An alignment is 'ambiguous' if more than one solution 
is produced. Typology of errors (noise, silence): We 
will call errors of noise those alignments created 
between words that should not be aligned, and errors of 
silence those missing alignments between words which were 
manually aligned. 
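Under the assumption that both the automatic and the manual alignment are stored as sets of (French position, English position) pairs, the two error types reduce to set differences. The function name noise_and_silence is hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (French position, English position)


def noise_and_silence(
    automatic: Set[Link], manual: Set[Link]
) -> Tuple[Set[Link], Set[Link]]:
    """Compare an automatic alignment with the manual reference."""
    noise = automatic - manual    # links produced but absent from the reference
    silence = manual - automatic  # reference links the system failed to produce
    return noise, silence
```

For example, if the system proposes (17, 28) while the reference does not contain it, that link is counted as noise.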
VI. The reasons for noise and silence 
Noise: At the root of noisy alignments we find the 
problem of polysemy. When it is not resolved, it causes 
words to be aligned through senses that are improper in 
the current context. 
Another source of error corresponds to simple errors of 
alignment: the two words are translations of each other 
but in the present context they should not be aligned. 
For example, in the following sentences areas28 was 
incorrectly aligned with zones17. 
E91 : When1 the2 soundboard3 vibrates4 at5 one6 of7 
its8 resonant9 frequencies10 ,11 the12 glitter13 
bounces14 out15 of16 regions17 that18 are19 
moving20 and21 collects22 along23 nodal24 lines25 
,26 or27 areas28 where29 the30 soundboard31 is32 ... 
F69 : Lorsque1 la2 table d'harmonie3 vibre4 à5 l'une6 
de7 ses8 fréquences9 de10 résonance11 ,12 les13 
paillettes14 quittent15 les16 zones17 en18 
mouvement19 ... 
Silence: The main problem is a gap in 
the dictionary: either the head word is not present, or 
the correct translation is absent. The problem is 
essentially the non-recognition of synonymy. 
VII. Resolution by Analogic Reasoning 
In order to reduce both noise and silence, we use a 
mechanism based on analogical reasoning. This is 
based on the following fundamental hypothesis: 
paradigmatic relations can help determine 
syntagmatic relations and vice-versa. 
Using monolingual dependency relations. 
The resolution mechanism can be understood from the 
following diagram. 
[Figure: Analogic Rectangle. French side: fp Rf fq. 
English side: ep Re eq. The alignment relation P1 links 
fp and ep; the alignment relation P2 links fq and eq.] 
This figure represents four words, two of which 
are aligned (fp, ep). Syntactic dependencies between 
two other pairs of words [fp fq] and [ep eq] are also 
represented. We want to know how valid the 
alignment between fp and ep is. 
To answer this, we reason in the following way: 
1. On the syntagmatic plane, 
- since fp is in relation with fq (the relation Rf 
being supposed valid), 
- since ep is in relation with eq (the relation 
Re being valid), 
2. on the paradigmatic plane, 
- since fq is the translation of eq (supposing 
the alignment relation P2 is valid), 
then we conclude, by analogy, that the alignment 
relation P1 is also valid, in other words that fp and ep 
are translations of each other in this context. This 
degree of validity will be stronger as the dependency 
relations Rf and Re are close (identical or compatible) 
and as P1 and P2 get close to identity. 
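The reasoning above can be sketched as a count of the rectangles that support a candidate link. This is only an illustration under stated assumptions: dependencies are represented as (head, dependent, label) triples, "close" relations are reduced to identical labels with the same head/dependent role, and the name analogical_support is hypothetical.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]             # (French position, English position)
Dependency = Tuple[int, int, str]  # (head position, dependent position, label)


def analogical_support(
    candidate: Link,
    french_deps: List[Dependency],
    english_deps: List[Dependency],
    alignments: Set[Link],
) -> int:
    """Count analogic rectangles supporting the candidate link (fp, ep)."""
    fp, ep = candidate
    support = 0
    for f_head, f_dep, f_rel in french_deps:
        if fp not in (f_head, f_dep):
            continue  # Rf must involve fp
        fq = f_dep if f_head == fp else f_head
        for e_head, e_dep, e_rel in english_deps:
            if ep not in (e_head, e_dep):
                continue  # Re must involve ep
            eq = e_dep if e_head == ep else e_head
            # Assumption: Rf and Re are "close" only when the labels are
            # identical and fp, ep play the same role (both head or both
            # dependent).
            same_role = (f_head == fp) == (e_head == ep)
            # P2: fq and eq must themselves be aligned.
            if f_rel == e_rel and same_role and (fq, eq) in alignments:
                support += 1
    return support
```

A candidate link with no supporting rectangle, while a competing link for the same word has one, is a natural candidate for elimination; how the counts are turned into a decision is left open here.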
We will call strong resolution one that confirms an 
existing potential alignment, and weak resolution one 
that negates an existing alignment or that creates a 
new alignment. 
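The distinction between strong and weak resolution can be expressed as a comparison of the link sets before and after resolution. The sketch below assumes both are sets of (French position, English position) pairs; the function name classify_resolution is hypothetical.

```python
from typing import Optional, Set, Tuple

Link = Tuple[int, int]  # (French position, English position)


def classify_resolution(
    link: Link, potential: Set[Link], resolved: Set[Link]
) -> Optional[str]:
    """Classify the effect of resolution on one link."""
    before = link in potential
    after = link in resolved
    if before and after:
        return "strong"  # an existing potential alignment is confirmed
    if before != after:
        return "weak"    # an existing link is negated, or a new one is created
    return None          # the link was never proposed
```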
VIII. Conclusion 
The algorithm presented here subdivides into three 
phases. The first phase is construction: based on 
lexical proximity, we try to establish all the possible 
links between the words of the two sentences being 
aligned. The second phase is one of elimination: using 
syntactic dependencies we attempt to resolve 
ambiguous attachments and to undo nonambiguous but 
erroneous attachments. The third step is again one of 
construction: we attempt to reduce silence. 
We repeat that even human solutions to alignments are 
subject to wide variations, which shows the difficulty 
of the problem. 
Acknowledgements to Hadhemi Achour, Chiraz Ben 
Othman, Emna Souissi and Gregory Grefenstette. 
