Proceedings of the ACL Workshop on Building and Using Parallel Texts, pages 87–90,
Ann Arbor, June 2005. c©Association for Computational Linguistics, 2005
Symmetric Probabilistic Alignment
Ralf D. Brown Jae Dong Kim Peter J. Jansen Jaime G. Carbonell
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA 15213
{ralf,jdkim,pjj,jgc}@cs.cmu.edu
Abstract
We recently decided to develop a new
alignment algorithm for the purpose
of improving our Example-Based Ma-
chine Translation (EBMT) system’s per-
formance, since subsentential alignment is
critical in locating the correct translation
for a matched fragment of the input. Un-
like most algorithms in the literature, this
new Symmetric Probabilistic Alignment
(SPA) algorithm treats the source and tar-
get languages in a symmetric fashion.
In this short paper, we outline our basic
algorithm and some extensions for using
context and positional information, and
compare its alignment accuracy on the
Romanian-English data for the shared task
with IBM Model 4 and the reported results
from the prior workshop.
1 Symmetric Probabilistic Alignment
(SPA)
In subsentential alignment, mappings are produced
from words or phrases in the source language sen-
tence and those words or phrases in the target lan-
guage sentence that best express their meaning.
An alignment algorithm takes as input a bilingual
corpus consisting of corresponding sentence pairs
and strives to find the best possible alignment in the
second for selected n-grams (sequences of n words)
in the first language. The alignments are based on
a number of factors, including a bilingual dictionary
(preferably a probabilistic one), the position of the
words, invariants such as numbers and punctuation,
and so forth.
For our baseline algorithm, we make the follow-
ing simplifying assumptions, each of which we in-
tend to relax in future work, and the last of which
has already been partially relaxed:
1. A fixed bilingual probabilistic dictionary is
available.
2. Fragments (word sequences) are translated in-
dependently of surrounding context.
3. Contiguous fragments of source language text
are translated into contiguous fragments in the
target language text.
Unlike the work of (Marcu and Wong, 2002),
our alignment algorithm is not generative and does
not use the idea of a bag of concepts from which
the phrases in the sentence pair arise. It is, rather,
intended to find the corresponding target-language
phrase given a specific source-language phrase of in-
terest, as required by our EBMT system after find-
ing a match between the input and the training data
(Brown, 2004).
1.1 Baseline Algorithm
Our baseline algorithm is based on maximizing the
probability of bi-directional translations of individ-
ual words between a selected n-gram in the source
language and every possible n-gram in the corre-
sponding paired target language sentence. No posi-
tional preference assumptions are made, nor are any
length preservation assumptions made. That is, an
n-gram may translate to an m-gram, for any val-
ues of n or m bounded by the source and target
sentence lengths, respectively. Finally a smooth-
ing factor is used to avoid singularities (i.e. avoid-
ing zero-probabilities for unknown words, or words
never translated before in a way consistent with the
dictionary).
87
Given a source-language sentence
S1 : s0,s1,...,si,...,si+k,...,sn (1)
in the bilingual corpus, where si,...,si+k is a phrase
of interest, and the corresponding target language
sentence S2 is
S2 : t0,t1,...,tj,...,tj+l,...,tm (2)
the values of j and l are to be determined.
Then the segment we try to obtain is the target
fragment ˆFT with the highest probability of all pos-
sible fragments of S2 to be a mutual translation with
the given source fragment, or
ˆFT = argmax{F
T} (p(si,...,si+k $tj,...,tj+l))(3)
All possible segments can be checked in O(m2)
time, where m is the target language length, because
we will check m 1-word segments, m 1 two-word
segments, and so on. If we bound the target language
n-grams to a maximal length k, then the complexity
is linear, i.e. O(km).
The score of the best possible alignment is com-
puted as follows: Let LT be the Target Language
Vocabulary, s a source word, ti be target segment
words, and V = fti 2fLTgji  1gthe translation
word set of s,
We define the translation relation probability
p(Tr(s)2ft0,t1,...,tkg) as follows:
1. p(Tr(s) 2 ft0,t1,...,tkg) = max(p(tijs))
for all ti 2 ft0,t1,...,tkg when ftijti 2
ft0,t1,...,tkggis not empty.
2. p(Tr(s)2ft0,t1,...,tkg) = 0 otherwise.
Then the score of the best alignment is
S ˆFT = max
{FT}
SFT (4)
where the score can be written as two components
SFT = P1 P2 (5)
which can be further specified as
P1 =
parenleftBigg kproductdisplay
m=0
max (p (Tr(si+m)2ftj...j+lg) ,epsilon1)
parenrightBigg 1k+1
(6)
P2 =
parenleftBigg lproductdisplay
n=0
max (p (Tr(tj+n)2fsi...i+kg) ,epsilon1)
parenrightBigg 1l+1
(7)
where epsilon1 is a very small probability used as a smooth-
ing value.
1.2 Length Penalty
The ratio between source and target segment (n-
gram) lengths should be comparable to the ratio be-
tween the lengths of the source and target sentences,
though certainly variation is possible. Therefore, we
add a penalty function to the alignment probability
that increases with the discrepancy between the two
ratios.
Let the length of the source language segment be
i and the length of a target language segment under
consideration be j. Given a source language sen-
tence length of n (in the corpus sentence containing
the fragment) and its corresponding target language
length of m. The expected target segment length is
then given by ˆj = i mn . Further defining an allow-
able difference AD, our implementation calculates
the length penalty LP as follows, with the value of
the exponent determined empirically:
LPFT = min


parenleftBigg
jj ˆjj
AD
parenrightBigg4
, 1

 (8)
The score for a segment including the penalty func-
tion is then:
SFT  SFT  (1 LPFT ) (9)
Note that, as intended, the score is forced to 0 when
the length differencejj ˆjj> AD.
1.3 Distortion Penalty
For closely-related language pairs which tend to
have similar word orders, we introduce a distortion
penalty to penalize the alignment score of any can-
didate target fragment which is out of the expected
position range. First, we calculate CE, the expected
center of the candidate target fragment using CFS ,
the center of the source fragment and the ratio of
target- to source-sentence length.
CE = CFS  mn (10)
88
Then we calculate an allowed distance limit of the
center Dallowed using a constant distance limit value
DL and the ratio of actual target sentence length to
average target sentence length.
Dallowed = DL mm
average
(11)
Let Dactual be the actual distance difference be-
tween the candidate target fragment’s center and the
expected center, and set
SFT  



0, ifDactual  Dallowed
SFT
(Dactual−Dallowed+1)2 , otherwise
(12)
Furthermore, we think that we can apply this
penalty to language pairs which have lower word-
order similarities than e.g. French-English. Because
there might exist certain positional relationships be-
tween such language pairs, if we can calculate the
expected position using each language’s sentence
structure, we can apply a distortion penalty to the
candidate alignments.
1.4 Anchor Context
If the adjacent words of the source fragment and the
candidate target fragment are translations of each
other, we expect that this alignment is more likely
to be correct. We boost SFT with the anchor context
alignment score SACp,
SACp = P(si−1 $tj−1) P(si+k $tj+l) (13)
SFT  (SFT )λ (SACp)1−λ (14)
Empirically, we found this combination gives the
best score for French-English when λ = 0.6 and
for Romanian-English when λ = 0.8, and leads to
better results than the similar formula
SFT  λ SFT + (1 λ) SACp (15)
2 Experimental Design
In previous work (Kim et al., 2005), we tested our
alignment method on a set of French-English sen-
tence pairs taken from the Canadian Hansard corpus
and on a set of English-Chinese sentence pairs, and
compared the results to human alignments. For the
present workshop, we chose to use the Romanian-
English data which had been made available.
Due to a lack of time prior to the period of the
shared task, we merely re-used the parameters which
had been tuned for French-English, rather than tun-
ing the alignment parameters specifically for the de-
velopment data.
SPA was run under three experimental conditions.
In the first, labeled “SPA (c)” in Tables 1 and 2, SPA
was instructed to examine only contiguous target
phrases as potential alignments for a given source
phrase. In the second, labeled “SPA (n)”, a noncon-
tiguous target alignment consisting of two contigu-
ous segments with a gap between them was permit-
ted in addition to contiguous target alignments. The
third condition (“SPA (h)”) examined the impact of
a small amount of manual alignment information on
the selection of contiguous alignments. Unlike the
first two conditions, the presence of additional data
beyond the training corpus forces SPA(h) into the
Unlimited Resources track.
We had a native Romanian speaker hand-align
204 sentence pairs from the training corpus, and
extracted 732 distinct translation pairs from those
alignments, of which 450 were already present in
the automatically-generated dictionaries. The new
translation pairs were added to the dictionaries for
the SPA(h) condition and the translation probabili-
ties for the existing pairs were increased to reflect
the increased confidence in their correctness. Had
more time been available, we would have investi-
gated more sophisticated means of integrating the
human knowledge into the translation dictionaries.
3 Results and Conclusions
Table 1 compares the performance of SPA on what
is now the development data against the submissions
with the best AER values reported by (Mihalcea
and Pedersen, 2003) for the participants in the 2003
workshop, including CMU, MITRE, RALI, Univer-
sity of Alberta, and XRCE 1. As SPA generates only
SURE alignments, the values in Table 1 are SURE
alignments under the NO-NULL-Align scoring con-
dition for all systems except Fourday, which did not
generate SURE alignments.
Despite the fact that SPA was designed specifi-
cally for phrase-to-phrase alignments rather than the
1Citations for individual participants’ papers have been
omitted for space reasons; all appear in the same proceedings.
89
Method Prec% Rec% F1% AER
SPA (c) 64.47 62.68 63.56 36.44
SPA (n) 64.38 62.70 63.53 36.47
SPA (h) 64.61 62.55 63.56 36.44
Fourday 52.83 42.86 47.33 52.67
UMD.RE.2 58.29 49.99 53.82 46.61
BiBr 70.65 55.75 62.32 41.39
Ralign 92.00 45.06 60.49 35.24
XRCEnolm 82.65 62.44 71.14 28.86
Table 1: Romanian-English alignment results (De-
velopment Set, NO-NULL-Align)
word-to-word alignments needed for the shared task
and was not tuned for this corpus, its performance is
competitive with the best of the systems previously
used for the shared task. We thus decided to submit
runs for the official 2005 evaluation, whose resulting
scores are shown in Table 2.
On the development set, noncontiguous align-
ments resulted in slightly lower precision than con-
tiguous alignments, which was not unexpected, but
recall does not increase enough to improve F1 or
AER. The modified dictionaries improved preci-
sion slightly, as anticipated, but lowered recall suffi-
ciently to have no net effect on F1 or AER.
The evaluation set proved to be very similar in dif-
ficulty to the development data, resulting in scores
that were very close to those achieved on the dev-test
set. Noncontiguous alignments again proved to have
a very small negative effect on AER resulting from
reduced precision, but this time the altered dictionar-
ies for SPA(h) resulted in a substantial reduction in
recall, considerably harming overall performance.
After the shared task was complete, we performed
some tuning of the alignment parameters for the
Romanian-English development test set, and found
that the French-English-tuned parameters were close
to optimal in performance. The AER on the develop-
ment test set for the SPA(c) contiguous alignments
condition decreased from 36.44% to 36.11% after
the re-tuning.
4 Future Work
Enhancements in the extraction of word-to-word
alignments from what is fundamentally a phrase-to-
phrase alignment algorithm could probably further
Method Prec% Recall% F1% AER%
SPA (c) 64.96 61.34 63.10 36.90
SPA (n) 64.91 61.34 63.07 36.93
SPA (h) 64.60 60.54 62.50 37.50
Table 2: Evaluation results (NO-NULL-Align)
improve results on the Romanian-English data. We
also intend to investigate principled, seamless inte-
gration of manual alignments and dictionaries with
probabilistic ones, since the ad hoc method proved
detrimental. Finally, a more detailed performance
analysis is in order, to determine whether the close
balance of precision and recall is inherent in the bidi-
rectionality of the algorithm or merely coincidence.
5 Acknowledgements
We would like to thank Lucian Vlad Lita for provid-
ing manual alignments.

References
Ralf D. Brown. 2004. A Modified Burrows-Wheeler
Transform for Highly-Scalable Example-Based Trans-
lation. In Machine Translation: From Real Users
to Research, Proceedings of the 6th Conference of
the Association for Machine Translation in the Amer-
icas (AMTA-2004), volume 3265 of Lecture Notes
in Artificial Intelligence, pages 27–36. Springer Ver-
lag, September-October. http://www.cs.cmu.-
edu/˜ralf/papers.html.
Jae Dong Kim, Ralf D. Brown, Peter J. Jansen, and
Jaime G. Carbonell. 2005. Symmetric Probabilistic
Alignment for Example-Based Translation. In Pro-
ceedings of the Tenth Workshop of the European Asso-
cation for Machine Translation (EAMT-05), May. (to
appear).
Daniel Marcu and William Wong. 2002. A Phrase-
Based, Joint Probability Model for Statistical Machine
Translation. In Proceedings of the Conference on
Empirical Methods in Natural Language Processing
(EMNLP-2002), July. http://www.isi.edu/-
˜marcu/papers.html.
Rada Mihalcea and Ted Pedersen. 2003. An Evalua-
tion Exercise for Word Alignment. In Proceedings of
the HLT-NAACL 2003 Workshop: Building and Using
Parallel Texts: Data Driven Machine Translation and
Beyond, pages 1–10. Association for Computational
Linguistics, May.
