Word Alignment with Cohesion Constraint
Dekang Lin and Colin Cherry
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada, T6G 2E8
{lindek,colinc}@cs.ualberta.ca
Abstract
We present a syntax-based constraint for word
alignment, known as the cohesion constraint. It
requires disjoint English phrases to be mapped
to non-overlapping intervals in the French sen-
tence. We evaluate the utility of this constraint
in two different algorithms. The results show
that it can provide a significant improvement in
alignment quality.
1 Introduction
The IBM statistical machine translation (SMT) models
have been extremely influential in computational linguis-
tics in the past decade. The (arguably) most striking char-
acteristic of the IBM-style SMT models is their total lack
of linguistic knowledge. The IBM models demonstrated
how much one can do with pure statistical techniques,
which have inspired a whole new generation of NLP re-
search and systems.
More recently, there have been many proposals to
introduce syntactic knowledge into SMT models (Wu,
1997; Alshawi et al., 2000; Yamada and Knight, 2001;
Lopez et al., 2002). A common theme among these
approaches is the assumption that the syntactic struc-
tures of a pair of source-target sentences are isomor-
phic (or nearly isomorphic). This assumption seems too
strong. Human translators often use non-literal transla-
tions, which result in differences in syntactic structures.
According to a study in (Dorr et al., 2002), such transla-
tional divergences are quite common, involving 11-31%
of the sentences.
We introduce a constraint that uses the dependency tree
of the English sentence to maintain phrasal cohesion in
the French sentence. In other words, if two phrases are
disjoint in the English sentence, the alignment must not
map them to overlapping intervals in the French sentence.
For example, in Figure 1, the cohesion constraint will rule
out the possibility of aligning "to" with "à". The phrases
"the reboot" and "the host to discover all the devices" are
disjoint, but the partial alignment in Figure 1 maps them to
overlapping intervals. This constraint is weaker than
isomorphism. However, we will show that it can produce a
significant increase in alignment quality.
[Figure: dependency tree (arc labels det, subj, aux, pre, obj, mod) over the
English sentence "The reboot causes the host to discover all the devices"
(words 1-10), with a partial alignment to the French sentence
"à la Suite réinitialisation , l' hôte repère tous les périphériques"
(words 1-11; gloss: "after to the reboot the host locate all the peripherals").]
Figure 1: A cohesion constraint violation
2 Cohesion Constraint
Given an English sentence $E = e_1 e_2 \ldots e_l$ and a French
sentence $F = f_1 f_2 \ldots f_m$, an alignment is a set of links
between the words in $E$ and $F$. An alignment can be
represented as a binary relation $A$ in $[1, l] \times [1, m]$. A
pair $(i, j)$ is in $A$ if $e_i$ and $f_j$ are a translation (or part
of a translation) of each other. We call such pairs links.
In Figure 2, the links in the alignment are represented by
dashed lines.
[Figure: dependency tree (arc labels det, subj, aux, pre, obj, comp) over the
English sentence "The reboot causes the host to discover all the devices"
(words 1-10), with alignment links drawn as dashed lines to the French sentence
"à la Suite réinitialisation , l' hôte repère tous les périphériques"
(words 1-11; gloss: "after to the reboot the host locate all the peripherals").]
Figure 2: An example pair of aligned sentences
The cohesion constraint (Fox, 2002) uses the dependency
tree $T_E$ (Mel'čuk, 1987) of the English sentence
to restrict possible link combinations. Let $T_E(e_i)$ be
the subtree of $T_E$ rooted at $e_i$. The phrase span of $e_i$,
$\mathrm{span}_P(e_i, T_E, A)$, is the image of the English phrase
headed by $e_i$ in $F$ given a (partial) alignment $A$. More
precisely, $\mathrm{span}_P(e_i, T_E, A) = [k_1, k_2]$, where
$$k_1 = \min\{\, j \mid (u, j) \in A,\ e_u \in T_E(e_i) \,\}$$
$$k_2 = \max\{\, j \mid (u, j) \in A,\ e_u \in T_E(e_i) \,\}$$
The head span is the image of $e_i$ itself. We define
$\mathrm{span}_H(e_i, T_E, A) = [k_1, k_2]$, where
$$k_1 = \min\{\, j \mid (i, j) \in A \,\}$$
$$k_2 = \max\{\, j \mid (i, j) \in A \,\}$$
In Figure 2, the phrase span of the node "discover" is
[6, 11] and the head span is [8, 8]; the phrase span of the
node "reboot" is [3, 4] and the head span is [4, 4]. The word
"causes" has a phrase span of [3, 11] and its head span is the
empty set $\emptyset$.
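The two span definitions can be sketched directly in Python. This is a minimal illustration, not the authors' implementation: the `PARENT` map encodes one plausible reading of the dependency tree (each word index mapped to its head), and `LINKS` is a hypothetical alignment chosen only so that it reproduces the spans quoted above for Figure 2. The names `subtree`, `head_span` and `phrase_span` are our own, and the empty span is represented as `None`.

```python
from typing import Dict, Optional, Set, Tuple

Span = Optional[Tuple[int, int]]  # None stands for the empty span
Link = Tuple[int, int]            # (English index i, French index j)


def subtree(node: int, parent: Dict[int, Optional[int]]) -> Set[int]:
    """Return `node` together with all of its descendants in the tree."""
    nodes = {node}
    changed = True
    while changed:  # keep adding children until nothing new is found
        changed = False
        for child, head in parent.items():
            if head in nodes and child not in nodes:
                nodes.add(child)
                changed = True
    return nodes


def head_span(i: int, links: Set[Link]) -> Span:
    """Interval covered by the French words linked to e_i itself."""
    js = [j for (u, j) in links if u == i]
    return (min(js), max(js)) if js else None


def phrase_span(i: int, parent: Dict[int, Optional[int]], links: Set[Link]) -> Span:
    """Interval covered by the French words linked to any word in T_E(e_i)."""
    nodes = subtree(i, parent)
    js = [j for (u, j) in links if u in nodes]
    return (min(js), max(js)) if js else None


# 1 The  2 reboot  3 causes  4 the  5 host  6 to  7 discover  8 all  9 the  10 devices
# Assumed tree: each word maps to its head; "causes" is the root.
PARENT = {1: 2, 2: 3, 3: None, 4: 5, 5: 7, 6: 7, 7: 3, 8: 10, 9: 10, 10: 7}
# Hypothetical links, chosen to match the spans quoted in the text.
LINKS = {(1, 3), (2, 4), (4, 6), (5, 7), (7, 8), (8, 9), (9, 10), (10, 11)}

print(phrase_span(7, PARENT, LINKS), head_span(7, LINKS))  # (6, 11) (8, 8)
print(phrase_span(2, PARENT, LINKS), head_span(2, LINKS))  # (3, 4) (4, 4)
print(phrase_span(3, PARENT, LINKS), head_span(3, LINKS))  # (3, 11) None
```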
With these definitions of phrase and head spans, we define
two notions of overlap, originally introduced in (Fox,
2002) as crossings. Given a head node $e_h$ and its
modifier $e_m$, a head-modifier overlap occurs when:
$$\mathrm{span}_H(e_h, T_E, A) \cap \mathrm{span}_P(e_m, T_E, A) \neq \emptyset$$
Given two nodes $e_{m_1}$ and $e_{m_2}$ which both modify the
same head node, a modifier-modifier overlap occurs
when:
$$\mathrm{span}_P(e_{m_1}, T_E, A) \cap \mathrm{span}_P(e_{m_2}, T_E, A) \neq \emptyset$$
Following (Fox, 2002), we say an alignment is cohesive
with respect to $T_E$ if it does not introduce any
head-modifier or modifier-modifier overlaps. For example, the
alignment $A$ in Figure 1 is not cohesive because there
is an overlap between $\mathrm{span}_P(\text{reboot}, T_E, A) = [4, 4]$ and
$\mathrm{span}_P(\text{discover}, T_E, A) = [2, 11]$.
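As an illustration, cohesion can be checked directly from the definitions by computing every span and testing every head-modifier and modifier-modifier pair. The sketch below makes several assumptions: the `PARENT` map is one plausible reading of the dependency tree, `PARTIAL` is a hypothetical partial alignment chosen only to reproduce the two spans quoted above, and the function names are our own. It is a whole-alignment check, not the incremental algorithm.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Optional[Tuple[int, int]]  # None stands for the empty span
Link = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Closed intervals overlap if both are non-empty and share a point."""
    return a is not None and b is not None and a[0] <= b[1] and b[0] <= a[1]


def widen(span: Span, j: int) -> Tuple[int, int]:
    """Smallest interval containing `span` and position j."""
    return (j, j) if span is None else (min(span[0], j), max(span[1], j))


def spans(parent: Dict[int, Optional[int]], links: Set[Link]):
    """Return (head spans, phrase spans) for every node of the tree."""
    head: Dict[int, Span] = {n: None for n in parent}
    phrase: Dict[int, Span] = {n: None for n in parent}
    for i, j in links:
        head[i] = widen(head[i], j)
        node: Optional[int] = i
        while node is not None:  # a link widens e_i's phrase and every ancestor's
            phrase[node] = widen(phrase[node], j)
            node = parent[node]
    return head, phrase


def violations(parent: Dict[int, Optional[int]], links: Set[Link]) -> List[tuple]:
    """List every head-modifier and modifier-modifier overlap."""
    head, phrase = spans(parent, links)
    children: Dict[int, List[int]] = {}
    for child, h in parent.items():
        if h is not None:
            children.setdefault(h, []).append(child)
    found = []
    for h, mods in children.items():
        for m in mods:
            if overlaps(head[h], phrase[m]):
                found.append(("head-modifier", h, m))
        for a in range(len(mods)):
            for b in range(a + 1, len(mods)):
                if overlaps(phrase[mods[a]], phrase[mods[b]]):
                    found.append(("modifier-modifier", mods[a], mods[b]))
    return found


def is_cohesive(parent: Dict[int, Optional[int]], links: Set[Link]) -> bool:
    return not violations(parent, links)


# 1 The  2 reboot  3 causes  4 the  5 host  6 to  7 discover  8 all  9 the  10 devices
PARENT = {1: 2, 2: 3, 3: None, 4: 5, 5: 7, 6: 7, 7: 3, 8: 10, 9: 10, 10: 7}
# Hypothetical partial alignment: phrase span of "reboot" is [4, 4],
# phrase span of "discover" is [2, 11].
PARTIAL = {(2, 4), (6, 2), (10, 11)}

print(violations(PARENT, PARTIAL))  # [('modifier-modifier', 2, 7)]
```

Here "reboot" (node 2) and "discover" (node 7) both modify "causes", so the overlap is reported as a modifier-modifier violation.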
If an alignment $A'$ violates the cohesion constraint, any
alignment $A$ that is a superset of $A'$ will also violate the
cohesion constraint. This is because any pair of nodes
that have overlapping spans in $A'$ will still have
overlapping spans in $A$.
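This monotonicity argument can be written out using only the span definitions: adding links can lower a minimum or raise a maximum, so every span can only grow. The subscript $X$ below is our shorthand for either the phrase span or the head span.

```latex
A' \subseteq A
  \;\Longrightarrow\;
  \mathrm{span}_X(e, T_E, A') \subseteq \mathrm{span}_X(e, T_E, A),
  \qquad X \in \{P, H\}

\text{hence}\quad
\mathrm{span}_X(e_a, T_E, A') \cap \mathrm{span}_Y(e_b, T_E, A') \neq \emptyset
  \;\Longrightarrow\;
  \mathrm{span}_X(e_a, T_E, A) \cap \mathrm{span}_Y(e_b, T_E, A) \neq \emptyset
```

A search procedure can therefore discard a partial alignment as soon as it violates the constraint, since no extension of it can become cohesive again.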
Cohesion Checking Algorithm:
We now present an algorithm that checks whether an
individual link $(e_i, f_j)$ causes a cohesion constraint
violation when it is added to a partial alignment. Let
$e_{p_0}, e_{p_1}, e_{p_2}, \ldots$ be a sequence of nodes in $T_E$ such that
$e_{p_0} = e_i$ and $e_{p_k} = \mathrm{parentOf}(e_{p_{k-1}})$ $(k = 1, 2, \ldots)$.
1. For all $k \geq 0$, update the phrase span of $e_{p_k}$ to
include $j$. Also update the head span of $e_{p_0} = e_i$ to
include $j$; by definition, the head spans of its ancestors
are not affected by the new link.
2. For each $e_{p_k}$ ($k > 0$), check for a modifier-modifier
overlap between the updated phrase span of
$e_{p_{k-1}}$ and the phrase span of each of the other
children of $e_{p_k}$.
3. For each $e_{p_k}$ ($k > 0$), check for a head-modifier
overlap between the updated phrase span of $e_{p_{k-1}}$
and the head span of $e_{p_k}$.
4. If an overlap is found, return true (the constraint is
violated). Otherwise, return false.
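The four steps above can be sketched as an incremental checker that stores the current spans and only re-examines nodes on the path from $e_i$ to the root. This is a sketch under stated assumptions, not the authors' code: the class and method names are hypothetical, the head span is updated only for $e_i$ itself (following the definition of the head span), and we add one check that the steps do not list explicitly, namely the widened head span of $e_i$ against the phrase spans of its own children. The tree and the links in the usage example are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Span = Optional[Tuple[int, int]]  # None stands for the empty span


def widen(span: Span, j: int) -> Tuple[int, int]:
    return (j, j) if span is None else (min(span[0], j), max(span[1], j))


def overlaps(a: Span, b: Span) -> bool:
    return a is not None and b is not None and a[0] <= b[1] and b[0] <= a[1]


class CohesionChecker:
    """Keeps head and phrase spans for a partial alignment over a fixed tree."""

    def __init__(self, parent: Dict[int, Optional[int]]):
        self.parent = parent
        self.children: Dict[int, List[int]] = {n: [] for n in parent}
        for child, head in parent.items():
            if head is not None:
                self.children[head].append(child)
        self.head: Dict[int, Span] = {n: None for n in parent}
        self.phrase: Dict[int, Span] = {n: None for n in parent}

    def _with_link(self, i: int, j: int):
        """Step 1: spans as they would be after adding (e_i, f_j)."""
        head, phrase = dict(self.head), dict(self.phrase)
        head[i] = widen(head[i], j)  # only e_i's own head span changes
        path = []
        node: Optional[int] = i
        while node is not None:      # e_i and all of its ancestors
            phrase[node] = widen(phrase[node], j)
            path.append(node)
            node = self.parent[node]
        return head, phrase, path

    def violates(self, i: int, j: int) -> bool:
        """Return True if adding (e_i, f_j) introduces an overlap."""
        head, phrase, path = self._with_link(i, j)
        # Extra check (our assumption): e_i's head span vs. its children.
        if any(overlaps(head[i], phrase[c]) for c in self.children[i]):
            return True
        for lower, upper in zip(path, path[1:]):
            # Step 2: modifier-modifier overlap with the other children.
            if any(overlaps(phrase[lower], phrase[s])
                   for s in self.children[upper] if s != lower):
                return True
            # Step 3: head-modifier overlap with the parent's head span.
            if overlaps(phrase[lower], head[upper]):
                return True
        return False                 # Step 4: no overlap found

    def add(self, i: int, j: int) -> None:
        """Commit the link (call after `violates` returned False)."""
        self.head, self.phrase, _ = self._with_link(i, j)


# 1 The  2 reboot  3 causes  4 the  5 host  6 to  7 discover  8 all  9 the  10 devices
PARENT = {1: 2, 2: 3, 3: None, 4: 5, 5: 7, 6: 7, 7: 3, 8: 10, 9: 10, 10: 7}
checker = CohesionChecker(PARENT)
for link in [(2, 4), (10, 11)]:      # hypothetical links: reboot-4, devices-11
    if not checker.violates(*link):
        checker.add(*link)
print(checker.violates(6, 2))  # True: "to"-2 stretches discover's phrase over reboot's
print(checker.violates(6, 7))  # False
```

Working on copies in `_with_link` keeps a rejected link from corrupting the stored spans, at the cost of copying two small dictionaries per query.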
3 Evaluation
To determine the utility of the cohesion constraint, we
incorporated it into two alignment algorithms. The algo-
rithms take as input an English-French sentence pair and
the dependency tree of the English sentence. Both algo-
rithms build an alignment by adding one link at a time.
We implement two versions of each algorithm: one with
the cohesion constraint and one without. We will describe
the versions without cohesion constraint below. For the
versions with cohesion constraint, it is understood that
each new link must also pass the test described in Sec-
tion 2.
The first algorithm is similar to Competitive Linking
(Melamed, 1997). We use a sentence-aligned corpus
to compute the $\phi^2$ correlation metric (Gale and Church,
1991) between all English-French word pairs. For a given
sentence pair, we begin with an empty alignment. We
then add links in the order of their $\phi^2$ scores so that each
word participates in at most one link. We will refer to this
as the $\phi^2$ method.
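A minimal sketch of this greedy procedure follows. It assumes the usual 2x2 contingency-table form of the $\phi^2$ statistic from Gale and Church (1991); the function names, the `is_allowed` hook (where a cohesion test could be plugged in) and the toy scores are our own illustrative choices, not part of the original system.

```python
from typing import Callable, Dict, Set, Tuple

Link = Tuple[int, int]  # (English index, French index)


def phi2(a: float, b: float, c: float, d: float) -> float:
    """phi^2 from co-occurrence counts over aligned sentence pairs.

    a: pairs containing both words, b: only the English word,
    c: only the French word, d: neither word.
    """
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return 0.0 if denom == 0 else (a * d - b * c) ** 2 / denom


def competitive_link(
    scores: Dict[Link, float],
    is_allowed: Callable[[Set[Link], Link], bool] = lambda links, link: True,
) -> Set[Link]:
    """Add links in decreasing score order; each word joins at most one link."""
    links: Set[Link] = set()
    used_e: Set[int] = set()
    used_f: Set[int] = set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (i, j), _score in ranked:
        if i in used_e or j in used_f:
            continue  # one of the two words is already linked
        if not is_allowed(links, (i, j)):
            continue  # e.g. rejected by a cohesion test
        links.add((i, j))
        used_e.add(i)
        used_f.add(j)
    return links


# Toy scores for a two-word sentence pair (made-up numbers).
toy = {(1, 1): 0.9, (1, 2): 0.8, (2, 1): 0.7, (2, 2): 0.5}
print(sorted(competitive_link(toy)))  # [(1, 1), (2, 2)]
```

Passing a predicate such as `lambda links, link: link != (1, 1)` as `is_allowed` shows how an external constraint changes the greedy result without touching the linking loop.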
The second algorithm uses a best-first search (with
fixed beam width and agenda size) to find an alignment
that maximizes $P(A \mid E, F)$. A state in this search space
is a partial alignment. A transition is defined as the
addition of a single link to the current state. The algorithm
computes $P(A \mid E, F)$ based on statistics obtained from a
word-aligned corpus. We construct the initial corpus with
a system that is similar to the $\phi^2$ method. The algorithm
then re-aligns the corpus and trains again for three
iterations. We will refer to this as the $P(A \mid E, F)$ method.
The details of this algorithm are described in (Cherry and
Lin, 2003).
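The search structure described here (states are partial alignments, transitions add one link, with a bounded beam and agenda) can be illustrated generically. The sketch below is not the model of (Cherry and Lin, 2003): the `score` callback stands in for $P(A \mid E, F)$, and `toy_score`, `WEIGHTS`, `max_steps` and all other names are hypothetical.

```python
import heapq
from typing import Callable, FrozenSet, List, Tuple

Link = Tuple[int, int]
State = FrozenSet[Link]  # a partial alignment


def best_first_align(
    candidates: List[Link],
    score: Callable[[State], float],
    beam_width: int = 3,
    max_agenda: int = 100,
    max_steps: int = 1000,
) -> State:
    """Best-first search over partial alignments; returns the best state seen."""
    start: State = frozenset()
    best, best_score = start, score(start)
    # heapq is a min-heap, so scores are negated; the counter breaks ties.
    agenda = [(-best_score, 0, start)]
    seen = {start}
    counter = 1
    for _ in range(max_steps):
        if not agenda:
            break
        neg, _tie, state = heapq.heappop(agenda)
        if -neg > best_score:
            best, best_score = state, -neg
        # Transitions: add one link that is not yet in the state.
        successors = []
        for link in candidates:
            nxt = state | {link}
            if link not in state and nxt not in seen:
                successors.append((score(nxt), nxt))
        successors.sort(key=lambda t: t[0], reverse=True)
        for s, nxt in successors[:beam_width]:  # fixed beam width
            seen.add(nxt)
            heapq.heappush(agenda, (-s, counter, nxt))
            counter += 1
        if len(agenda) > max_agenda:            # fixed agenda size
            agenda = heapq.nsmallest(max_agenda, agenda)
            heapq.heapify(agenda)
    return best


# Toy stand-in for the alignment model: made-up link weights, with a
# penalty whenever a word takes part in more than one link.
WEIGHTS = {(1, 1): 2.0, (1, 2): 0.5, (2, 1): -1.0, (2, 2): 1.5}


def toy_score(state: State) -> float:
    reuse = (len(state) - len({i for i, _ in state})) + \
            (len(state) - len({j for _, j in state}))
    return sum(WEIGHTS[link] for link in state) - 10.0 * reuse


print(sorted(best_first_align(list(WEIGHTS), toy_score)))  # [(1, 1), (2, 2)]
```

A cohesion test would fit naturally inside the successor loop, skipping any `nxt` that violates the constraint; because violations persist in every superset, such states never need to be expanded.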
We trained our alignment programs with the same 50K
pairs of sentences as (Och and Ney, 2000) and tested them on
the same 500 manually aligned sentences. Both the training
and testing sentences are from the Hansard corpus.
We parsed the training and testing corpora with Minipar.1
We adopted the evaluation methodology in (Och and Ney,
2000), which defines three metrics: precision, recall and
alignment error rate (AER).
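For concreteness, the three metrics can be computed as follows. This sketch reflects our understanding of the definitions in (Och and Ney, 2000), where the reference annotation distinguishes sure links $S$ from possible links $P$ with $S \subseteq P$; the function name and the toy link sets are illustrative only.

```python
from typing import Set, Tuple

Link = Tuple[int, int]


def alignment_metrics(A: Set[Link], S: Set[Link], P: Set[Link]):
    """Return (precision, recall, AER) for a proposed alignment A.

    S: sure reference links, P: possible reference links (S is a subset of P).
    """
    if not A or not S:
        raise ValueError("A and S must be non-empty")
    P = P | S                       # enforce S as a subset of P
    hits_sure = len(A & S)
    hits_possible = len(A & P)
    precision = hits_possible / len(A)
    recall = hits_sure / len(S)
    aer = 1.0 - (hits_sure + hits_possible) / (len(A) + len(S))
    return precision, recall, aer


# Toy example with made-up links.
A = {(1, 1), (2, 2), (3, 4)}
S = {(1, 1), (2, 2)}
P = {(1, 1), (2, 2), (3, 3)}
precision, recall, aer = alignment_metrics(A, S, P)
print(f"precision={precision:.3f} recall={recall:.3f} AER={aer:.3f}")
```

In this toy case two of the three proposed links are in $P$ and both sure links are recovered, so precision is 2/3, recall is 1 and AER is 1 - 4/5 = 0.2.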
Table 1 shows the results of our experiments. The first
four rows correspond to the methods described above. As
a reference point, we also provide the results reported in
(Och and Ney, 2000). They implemented IBM Model 4
by bootstrapping from an HMM model. The rows F→E
and E→F are the results obtained by this model when
treating French as the source and English as the target
or vice versa. The row Refined shows results obtained
by taking the intersection of E→F and F→E and then
refining this intersection to increase recall.

Table 1: Evaluation Results
Method                                  Prec   Rec    AER
$\phi^2$             w/o cohesion       82.7   84.6   16.5
$\phi^2$             w/ cohesion        89.2   82.7   13.8
$P(A \mid E, F)$     w/o cohesion       87.3   85.3   13.6
$P(A \mid E, F)$     w/ cohesion        95.7   86.4    8.7
Och&Ney              F→E                80.5   91.2   15.6
Och&Ney              E→F                80.0   90.8   16.0
Och&Ney              Refined            85.9   92.3   11.7

1 Available at http://www.cs.ualberta.ca/~lindek/minipar.htm
From Table 1, we can see that the addition of the cohesion
constraint leads to significant improvements in
performance with both algorithms. The relative reduction in
error rate is 16% with the $\phi^2$ method and 36% with the
$P(A \mid E, F)$ method. The improvement comes primarily
from increased precision. With the $P(A \mid E, F)$ method,
this increase in precision does not come at the expense of
recall.
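The two relative reductions follow directly from the AER column of Table 1 (without cohesion versus with cohesion):

```latex
\frac{16.5 - 13.8}{16.5} = \frac{2.7}{16.5} \approx 0.164 \quad (\approx 16\%),
\qquad
\frac{13.6 - 8.7}{13.6} = \frac{4.9}{13.6} \approx 0.360 \quad (\approx 36\%)
```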
4 Related Work
There has been a growing trend in the SMT community
to attempt to leverage syntactic data in word alignment.
Methods such as (Wu, 1997), (Alshawi et al., 2000) and
(Lopez et al., 2002) employ a synchronous parsing proce-
dure to constrain a statistical alignment. The work done
in (Yamada and Knight, 2001) measures statistics on op-
erations that transform a parse tree from one language
into another.
The syntactic knowledge that is leveraged in these
methods is tightly coupled with the alignment method it-
self. We have presented a modular constraint that can be
plugged into different alignment algorithms. This has al-
lowed us to test the contribution of the constraint directly.
(Fox, 2002) studied the extent to which the cohesion
constraint holds in a parallel corpus and the reasons for
the violations, but did not apply the constraint to an align-
ment algorithm.
5 Conclusion
We have presented a syntax-based constraint for word
alignment, known as the cohesion constraint. It requires
disjoint English phrases to be mapped to non-overlapping
intervals in the French sentence. Our experiments have
shown that the use of this constraint can provide a rela-
tive reduction in alignment error rate of 36%.
Acknowledgments
We wish to thank Franz Och for providing us with manu-
ally aligned evaluation data. This project is funded by and
jointly undertaken with Sun Microsystems, Inc. We wish
to thank Finola Brady, Bob Kuhns and Michael McHugh
for their help.
References
Hiyan Alshawi, Srinivas Bangalore, and Shona Douglas.
2000. Learning dependency translation models as col-
lections of finite state head transducers. Computa-
tional Linguistics, 26(1):45–60.
Colin Cherry and Dekang Lin. 2003. A probability
model to improve word alignment. Submitted.
Bonnie J. Dorr, Lisa Pearl, Rebecca Hwa, and Nizar
Habash. 2002. Duster: A method for unraveling
cross-language divergences for statistical word-level
alignment. In Stephen D. Richardson, editor, Proceed-
ings of AMTA-02, pages 31–43, Tiburon, CA, October.
Springer.
Heidi J. Fox. 2002. Phrasal cohesion and statistical ma-
chine translation. In Proceedings of EMNLP-02, pages
304–311.
W.A. Gale and K.W. Church. 1991. Identifying word
correspondences in parallel texts. In Proceedings of
the 4th Speech and Natural Language Workshop, pages
152–157. DARPA, Morgan Kaufmann.
Adam Lopez, Michael Nossal, Rebecca Hwa, and Philip
Resnik. 2002. Word-level alignment for multilingual
resource acquisition. In Proceedings of the Workshop
on Linguistic Knowledge Acquisition and Representa-
tion: Bootstrapping Annotated Language Data.
I. Dan Melamed. 1997. A word-to-word model of trans-
lational equivalence. In Proceedings of the ACL-97,
pages 490–497. Association for Computational Lin-
guistics.
Igor A. Mel'čuk. 1987. Dependency syntax: theory and
practice. State University of New York Press, Albany.
Franz J. Och and Hermann Ney. 2000. Improved sta-
tistical alignment models. In Proceedings of the 38th
Annual Meeting of the Association for Computational
Linguistics, pages 440–447, Hong Kong, China, Octo-
ber.
Dekai Wu. 1997. Stochastic inversion transduction
grammars and bilingual parsing of parallel corpora.
Computational Linguistics, 23(3):374–403.
Kenji Yamada and Kevin Knight. 2001. A syntax-based
statistical translation model. In Meeting of the Associ-
ation for Computational Linguistics, pages 523–530.
