Proceedings of the Workshop on Statistical Machine Translation, pages 142–145,
New York City, June 2006. c©2006 Association for Computational Linguistics
TALP Phrase-based statistical translation system for European language
pairs
Marta R. Costa-juss a
Patrik Lambert
Jos·e B. Mari no
Josep M. Crego
Maxim Khalilov
Jos·e A. R. Fonollosa
Department of Signal Theory and Communications
TALP Research Center (UPC)
Barcelona 08034, Spain
(mruiz,jmcrego,agispert,lambert,khalilov,canton,adrian, rbanchs)@gps.tsc.upc.edu
Adri a de Gispert
Rafael E. Banchs
Abstract
This paper reports translation results for
the  Exploiting Parallel Texts for Statis-
tical Machine Translation (HLT-NAACL
Workshop on Parallel Texts 2006). We
have studied different techniques to im-
prove the standard Phrase-Based transla-
tion system. Mainly we introduce two re-
ordering approaches and add morphologi-
cal information.
1 Introduction
Nowadays most Statistical Machine Translation
(SMT) systems use phrases as translation units. In
addition, the decision rule is commonly modelled
through a log-linear maximum entropy framework
which is based on several feature functions (in-
cluding the translation model), hm. Each feature
function models the probability that a sentence e in
the target language is a translation of a given sen-
tence f in the source language. The weights, λi,
of each feature function are typically optimized to
maximize a scoring function. It has the advantage
that additional features functions can be easily in-
tegrated in the overall system.
This paper describes a Phrase-Based system
whose baseline is similar to the system in Costa-
juss a and Fonollosa (2005). Here we introduce
two reordering approaches and add morphological
information. Translation results for all six trans-
lation directions proposed in the shared task are
presented and discussed. More speci cally, four
different languages are considered: English (en),
Spanish (es), French (fr) and German (de); and
both translation directions are considered for the
pairs: EnEs, EnFr, and EnDe. The paper is orga-
nized as follows: Section 2 describes the system;
0This work has been supported by the European Union
under grant FP6-506738 (TC-STAR project) and the TALP
Research Center (under a TALP-UPC-Recerca grant).
Section 3 presents the shared task results; and,  -
nally, in Section 4, we conclude.
2 System Description
This section describes the system procedure fol-
lowed for the data provided.
2.1 Alignment
Given a bilingual corpus, we use GIZA++ (Och,
2003) as word alignment core algorithm. During
word alignment, we use 50 classes per language
estimated by ’mkcls’, a freely-available tool along
with GIZA++. Before aligning we work with low-
ercase text (which leads to an Alignment Error
Rate reduction) and we recover truecase after the
alignment is done.
In addition, the alignment (in speci c pairs of
languages) was improved using two strategies:
Full verb forms The morphology of the verbs
usually differs in each language. Therefore, it is
interesting to classify the verbs in order to address
the rich variety of verbal forms. Each verb is re-
duced into its base form and reduced POS tag as
explained in (de Gispert, 2005). This transforma-
tion is only done for the alignment, and its goal
is to simplify the work of the word alignment im-
proving its quality.
Block reordering (br) The difference in word
order between two languages is one of the most
signi cant sources of error in SMT. Related works
either deal with reordering in general as (Kanthak
et al., 2005) or deal with local reordering as (Till-
mann and Ney, 2003). We report a local reorder-
ing technique, which is implemented as a pre-
processing stage, with two applications: (1) to im-
prove only alignment quality, and (2) to improve
alignment quality and to infer reordering in trans-
lation. Here, we present a short explanation of the
algorithm, for further details see Costa-juss a and
Fonollosa (2006).
142
Figure 1: Example of an Alignment Block, i.e. a
pair of consecutive blocks whose target translation
is swapped
This reordering strategy is intended to infer the
most probable reordering for sequences of words,
which are referred to as blocks, in order to mono-
tonize current data alignments and generalize re-
ordering for unseen pairs of blocks.
Given a word alignment, we identify those pairs
of consecutive source blocks whose translation is
swapped, i.e. those blocks which, if swapped,
generate a correct monotone translation. Figure 1
shows an example of these pairs (hereinafter called
Alignment Blocks).
Then, the list of Alignment Blocks (LAB) is
processed in order to decide whether two consec-
utive blocks have to be reordered or not. By using
the classi cation algorithm, see the Appendix, we
divide the LAB in groups (Gn,n = 1 ... N). In-
side the same group, we allow new internal com-
bination in order to generalize the reordering to
unseen pairs of blocks (i.e. new Alignment Blocks
are created). Based on this information, the source
side of the bilingual corpora are reordered.
In case of applying the reordering technique for
purpose (1), we modify only the source training
corpora to realign and then we recover the origi-
nal order of the training corpora. In case of using
Block Reordering for purpose (2), we modify all
the source corpora (both training and test), and we
use the new training corpora to realign and build
the  nal translation system.
2.2 Phrase Extraction
Given a sentence pair and a corresponding word
alignment, phrases are extracted following the cri-
terion in Och and Ney (2004). A phrase (or
bilingual phrase) is any pair of m source words
and n target words that satis es two basic con-
straints: words are consecutive along both sides
of the bilingual phrase, and no word on either side
of the phrase is aligned to a word out of the phrase.
We limit the maximum size of any given phrase to
7. The huge increase in computational and storage
cost of including longer phrases does not provide
a signi cant improvement in quality (Koehn et al.,
2003) as the probability of reappearance of larger
phrases decreases.
2.3 Feature functions
Conditional and posterior probability (cp, pp)
Given the collected phrase pairs, we estimate the
phrase translation probability distribution by rela-
tive frequency in both directions.
The target language model (lm) consists of an
n-gram model, in which the probability of a trans-
lation hypothesis is approximated by the product
of word n-gram probabilities. As default language
model feature, we use a standard word-based 5-
gram language model generated with Kneser-Ney
smoothing and interpolation of higher and lower
order n-grams (Stolcke, 2002).
The POS target language model (tpos) con-
sists of an N-gram language model estimated over
the same target-side of the training corpus but us-
ing POS tags instead of raw words.
The forward and backwards lexicon mod-
els (ibm1, ibm1−1) provide lexicon translation
probabilities for each phrase based on the word
IBM model 1 probabilities. For computing the
forward lexicon model, IBM model 1 probabili-
ties from GIZA++ source-to-target alignments are
used. In the case of the backwards lexicon model,
target-to-source alignments are used instead.
The word bonus model (wb) introduces a sen-
tence length bonus in order to compensate the sys-
tem preference for short output sentences.
The phrase bonus model (pb) introduces a con-
stant bonus per produced phrase.
2.4 Decoding
The search engine for this translation system is de-
scribed in Crego et al. (2005) which takes into ac-
count the features described above.
Using reordering in the decoder (rgraph) A
highly constrained reordered search is performed
by means of a set of reordering patterns (linguisti-
cally motivated rewrite patterns) which are used to
143
extend the monotone search graph with additional
arcs. See the details in Crego et al. (2006).
2.5 Optimization
It is based on a simplex method (Nelder and
Mead, 1965). This algorithm adjusts the log-
linear weights in order to maximize a non-linear
combination of translation BLEU and NIST: 10∗
log10((BLEU ∗100) + 1) + NIST. The max-
imization is done over the provided development
set for each of the six translation directions under
consideration. We have experimented an improve-
ment in the coherence between all the automatic
 gures by integrating two of these  gures in the
optimization function.
3 Shared Task Results
3.1 Data
The data provided for this shared task corresponds
to a subset of the of cial transcriptions of the
European Parliament Plenary Sessions, and it
is available through the shared task website at:
http://www.statmt.org/wmt06/shared-task/.
The development set used to tune the system
consists of a subset (500  rst sentences) of the
of cial development set made available for the
Shared Task.
We carried out a morphological analysis of the
data. The English POS-tagging has been carried
out using freely available TNT tagger (Brants,
2000). In the Spanish case, we have used the
Freeling (Carreras et al., 2004) analysis tool
which generates the POS-tagging for each input
word.
3.2 Systems con gurations
The baseline system is the same for all tasks and
includes the following features functions: cp, pp,
lm, ibm1, ibm1−1, wb, pb. The POStag target
language model has been used in those tasks for
which the tagger was available. Table 1 shows the
reordering con guration used for each task.
The Block Reordering (application 2) has been
used when the source language belongs to the Ro-
manic family. The length of the block is lim-
ited to 1 (i.e. it allows the swapping of single
words). The main reason is that speci c errors are
solved in the tasks from a Romanic language to
a Germanic language (as the common reorder of
Noun + Adjective that turns into Adjective +
Noun). Although the Block Reordering approach
Task Reordering Con guration
Es2En br2
En2Es br1 + rgraph
Fr2En br2
En2Fr br1 + rgraph
De2En -
En2De -
Table 1: Additional reordering models for each
task: br1 (br2) stands for Block Reordering ap-
plication 1 (application 2); and rgraph refers to
the reordering integrated in the decoder
does not depend on the task, we have not done
the corresponding experiments to observe its ef-
 ciency in all the pairs used in this evaluation.
The rgraph has been applied in those cases
where: we do not use br2 (there is no sense in
applying them simultaneously); and we have the
tagger for the source language model available.
In the case of the pair GeEn, we have not exper-
imented any reordering, we left the application of
both reordering approaches as future work.
3.3 Discussion
Table 2 presents the BLEU scores evaluated on the
test set (using TRUECASE) for each con guration.
The of cial results were slightly better because a
lowercase evaluation was used, see (Koehn and
Monz, 2006).
For both, Es2En and Fr2En tasks, br helps
slightly. The improvement of the approach de-
pends on the quality of the alignment. The better
alignments allow to extract higher quality Align-
ment Blocks (Costa-juss a and Fonollosa, 2006).
The En2Es task is improved when adding both
br1 and rgraph. Similarly, the En2Fr task seems to
perform fairly well when using the rgraph. In this
case, the improvement of the approach depends on
the quality of the alignment patterns (Crego et al.,
2006). However, it has the advantage of delay-
ing the  nal decision of reordering to the overall
search, where all models are used to take a fully
informed decision.
Finally, the tpos does not help much when trans-
lating to English. It is not surprising because it was
used in order to improve the gender and number
agreement, and in English there is no need. How-
ever, in the direction to Spanish, the tpos added
to the corresponding reordering helps more as the
Spanish language has gender and number agree-
ment.
144
Task Baseline +tpos +rc +tpos+rc
Es2En 29.08 29.08 29.89 29.98
En2Es 27.73 27.66 28.79 28.99
Fr2En 27.05 27.06 27.43 27.23
En2Fr 26.16 - 27.80 -
De2En 21.59 21.33 - -
En2De 15.20 - - -
Table 2: Results evaluated using TRUECASE on
the test set for each con guration: rc stands for
Reordering Con guration and refers to Table 1.
The bold results were the con gurations submit-
ted.
4 Conclusions
Reordering is important when using a Phrase-
Based system. Although local reordering is sup-
posed to be included in the phrase structure, per-
forming local reordering improves the translation
quality. In fact, local reordering, provided by the
reordering approaches, allows for those general-
izations which phrases could not achieve. Re-
ordering in the DeEn task is left as further work.

References
T. Brants. 2000. Tnt - a statistical part-of-speech tag-
ger. Proceedings of the Sixth Applied Natural Lan-
guage Processing.
X. Carreras, I. Chao, L. Padr·o, and M. Padr·o. 2004.
Freeling: An open-source suite of language analyz-
ers. 4th Int. Conf. on Language Resources and Eval-
uation, LREC’04.
M. R. Costa-juss a and J.A.R. Fonollosa. 2005. Im-
proving the phrase-based statistical translation by
modifying phrase extraction and including new fea-
tures. Proceedings of the ACL Workshop on Build-
ing and Using Parallel Texts: Data-Driven Machine
Translation and Beyond.
M. R. Costa-juss a and J.A.R. Fonollosa. 2006. Using
reordering in statistical machine translation based on
alignment block classi cation. Internal Report.
J.M. Crego, J. Mari no, and A. de Gispert. 2005.
An Ngram-based statistical machine translation de-
coder. Proc. of the 9th Int. Conf. on Spoken Lan-
guage Processing, ICSLP’05.
J. M. Crego, A. de Gispert, P. Lambert, M. R.
Costa-juss a, M. Khalilov, J. Mari no, J. A. Fonol-
losa, and R. Banchs. 2006. Ngram-based smt
system enhanced with reordering patterns. HLT-
NAACL06 Workshop on Building and Using Paral-
lel Texts: Data-Driven Machine Translation and Be-
yond, June.
A. de Gispert. 2005. Phrase linguistic classi cation for
improving statistical machine translation. ACL 2005
Students Workshop, June.
S. Kanthak, D. Vilar, E. Matusov, R. Zens, and H.
Ney. 2005. Novel reordering approaches in phrase-
based statistical machine translation. Proceedings
of the ACL Workshop on Building and Using Par-
allel Texts: Data-Driven Machine Translation and
Beyond, pages 167 174, June.
P. Koehn and C. Monz. 2006. Manual and automatic
evaluation of machine translation between european
languages. June.
P. Koehn, F.J. Och, and D. Marcu. 2003. Statistical
phrase-based translation. Proc. of the Human Lan-
guage Technology Conference, HLT-NAACL’2003,
May.
J.A. Nelder and R. Mead. 1965. A simplex method
for function minimization. The Computer Journal,
7:308 313.
F.J. Och and H. Ney. 2004. The alignment template
approach to statistical machine translation. Compu-
tational Linguistics, 30(4):417 449, December.
F.J. Och. 2003. Giza++ software. http://www-
i6.informatik.rwth-aachen.de/ och/ soft-
ware/giza++.html.
A. Stolcke. 2002. Srilm - an extensible language mod-
eling toolkit. Proc. of the 7th Int. Conf. on Spoken
Language Processing, ICSLP’02, September.
C. Tillmann and H. Ney. 2003. Word reordering and
a dynamic programming beam search algorithm for
statistical machine translation. Computational Lin-
guistics, 29(1):97 133, March.
