A Web-based Demonstrator of a Multi-lingual Phrase-based
Translation System
Roldano Cattoni, Nicola Bertoldi, Mauro Cettolo, Boxing Chen and Marcello Federico
ITC-irst - Centro per la Ricerca Scientifica e Tecnologica
38050 Povo - Trento, Italy
{surname}@itc.it
Abstract
This paper describes a multi-lingual
phrase-based Statistical Machine Transla-
tion system accessible by means of a Web
page. The user can issue translation re-
quests from Arabic, Chinese or Spanish
into English. The same phrase-based sta-
tistical technology is employed to realize
the three supported language-pairs. New
language-pairs can be easily added to the
demonstrator. The Web-based interface al-
lows the use of the translation system to
any computer connected to the Internet.
1 Introduction
At this time, Statistical Machine Translation
(SMT) has empirically proven to be the most
competitive approach in international competi-
tions like the NIST Evaluation Campaigns1 and
the International Workshops on Spoken Language
Translation (IWSLT-20042 and IWSLT-20053).
In this paper we describe our multi-lingual
phrase-based Statistical Machine Translation sys-
tem which can be accessed by means of a Web
page. Section 2 presents the general log-linear
framework to SMT and gives an overview of
our phrase-based SMT system. In section 3
the software architecture of the demo is out-
lined. Section 4 focuses on the currently supported
language-pairs: Arabic-to-English, Chinese-to-
English and Spanish-to-English. In section 5 the
Web-based interface of the demo is described.
1http://www.nist.gov/speech/tests/mt/
2http://www.slt.atr.jp/IWSLT2004/
3http://www.is.cs.cmu.edu/iwslt2005/
2 SMT System Description
2.1 Log-Linear Model
Given a string f in the source language, the goal of
the statistical machine translation is to select the
string e in the target language which maximizes
the posterior distribution Pr(e | f). By introduc-
ing the hidden word alignment variable a, the fol-
lowing approximate optimization criterion can be
applied for that purpose:
e∗ = argmaxe Pr(e | f)
= argmaxe summationdisplay
a
Pr(e,a | f)
≈ argmaxe,a Pr(e,a | f)
Exploiting the maximum entropy (Berger et
al., 1996) framework, the conditional distribu-
tion Pr(e,a | f) can be determined through
suitable real valued functions (called features)
hr(e,f,a),r = 1...R, and takes the parametric
form:
pλ(e,a | f) ∝ exp{
Rsummationdisplay
r=1
λrhr(e,f,a)}
The ITC-irst system (Chen et al., 2005) is
based on a log-linear model which extends the
original IBM Model 4 (Brown et al., 1993)
to phrases (Koehn et al., 2003; Federico and
Bertoldi, 2005). In particular, target strings e are
built from sequences of phrases ˜e1 ... ˜el. For each
target phrase ˜e the corresponding source phrase
within the source string is identified through three
random quantities: the fertility φ, which estab-
lishes its length; the permutation pii, which sets
its first position; the tablet ˜f, which tells its word
string. Notice that target phrases might have fer-
tility equal to zero, hence they do not translate any
91
source word. Moreover, uncovered source posi-
tions are associated to a special target word (null)
according to specific fertility and permutation ran-
dom variables.
The resulting log-linear model applies eight fea-
ture functions whose parameters are either esti-
mated from data (e.g. target language models,
phrase-based lexicon models) or empirically fixed
(e.g. permutation models). While feature func-
tions exploit statistics extracted from monolingual
or word-aligned texts from the training data, the
scaling factors λ of the log-linear model are esti-
mated on the development data by applying a min-
imum error training procedure (Och, 2004).
2.2 Decoding Strategy
The translation of an input string is performed by
the SMT system in two steps. In the first pass a
beam search algorithm (decoder) computes a word
graph of translation hypotheses. Hence, either
the best translation hypothesis is directly extracted
from the word graph and output, or an N-best list
of translations is computed (Tran et al., 1996). The
N-best translations are then re-ranked by applying
additional features and the top ranking translation
is finally output.
The decoder exploits dynamic programming,
that is the optimal solution is computed by expand-
ing and recombining previously computed partial
theories. A theory is described by its state which is
the only information needed for its expansion. Ex-
panded theories sharing the same state are recom-
bined, that is only the best scoring one is stored
for further expansions. In order to output a word
graph of translations, backpointers to all expanded
theories are mantained, too.
To cope with the large number of generated the-
ories some approximations are introduced during
the search: less promising theories are pruned off
(beam search) and a new source position is se-
lected by limiting the number of vacant positions
on the left-hand and the distance from the left most
vacant position (re-ordering constraints).
2.3 Phrase extraction and model training
Training of the phrase-based translation model
requires a parallel corpus provided with word-
alignments in both directions, i.e. from source
to target positions, and viceversa. This pre-
processing step can be accomplished by applying
the GIZA++ toolkit (Och and Ney, 2003) that pro-
vides Viterbi alignments based on IBM Model-4.
Starting from the parallel training corpus, pro-
vided with direct and inverted alignments, the so-
called union alignment (Och and Ney, 2003) is
computed.
Phrase-pairs are extracted from each sentence pair
which correspond to sub-intervals of the source
and target positions, J and I, such that the union
alignment links all positions of J into I and all
positions of I into J. In general, phrases are ex-
tracted with maximum length in the source and tar-
get defined by the parameters Jmax and Imax. All
such phrase-pairs are efficiently computed by an
algorithm with complexity O(lImaxJ2max) (Cet-
tolo et al., 2005).
Given all phrase-pairs extracted from the train-
ing corpus, lexicon probabilities and fertility prob-
abilities are estimated.
Target language models (LMs) used by the de-
coder and rescoring modules are, respectively,
estimated from 3-gram and 4-gram statistics
by applying the modified Kneser-Ney smoothing
method (Goodman and Chen, 1998). LMs are es-
timated with an in-house software toolkit which
also provides a compact binary representation of
the LM which is used by the decoder.
3 Demo Architecture
Figure 1 shows the two-layer architecture of the
demo. At the bottom lie the programs that provide
the actual translation services: for each language-
pair a wrapper coordinates the activity of a special-
ized pre-processing tool and a MT decoder. The
translation programs run on a grid-based cluster
of high-end PCs to optimize the processing speed.
All the wrappers communicate with the MT front-
end whose main task is to forward translation re-
quests to the appropriate language-pair wrapper
and to report an error in case of wrong requests
(e.g. unsupported language-pair). It is worth
noticing here that a new language-pair can be eas-
ily added to the system with a minimal interven-
tion on the code of the MT front-end.
At the top of the architecture are the programs
that provide the interface with the user. This layer
is separated from the translation layer (hosted by
internal machines only) by means of a firewall.
The user interface is implemented as a Web page
in which a translation request (a source sentence
and a language-pair) is input by means of an
HTML form. The cgi script invocated by the form
manages the interaction with the MT front-end.
92
Web Page
(form)
scriptCGI
lang 1wrapper
prepro−
cessing
MT
decoder
prepro−
cessing
MT
decoder
wrapperlang 2
prepro−
cessing
MT
decoder
wrapperlang N...
MTfront−end
firewall
external host
internal hosts
fast machines
Figure 1: Architecture of the demo. For each
language-pair a set of programs (in particular the
MT decoder) provides the translation service. The
request issued by the user on the Web page is
sent by the cgi script to the MT front-end. The
translation is then performed on the appropriate
language-pair service and the output sent back to
the Web browser.
When a user issues a translation request after
filling the form fields, the cgi script sends the re-
quest to the MT front-end and waits for its reply.
The input sentence is then forwarded to the wrap-
per of the appropriate language-pair. After a pre-
processing step, the actual translation is performed
by the specific MT decoder. The output in the tar-
get language is then sent back to the user’s Web
browser through the chain in the reverse order.
From a technical point of view, the inter-process
communication is realized by means of standard
TCP-IP sockets. As far as the encoding of texts is
concerned, all the languages are encoded in UTF-
8: this allows to manage the processing phase in
an uniform way and to render graphically different
character sets.
4 The supported language-pairs
Although there is no theoretical limit to the num-
ber of supported language-pairs, the current ver-
sion of the demo provides translations to English
from three source languages: Arabic, Chinese and
Spanish. For demonstration purpose, three differ-
ent application domains are covered too.
Arabic-to-English (Tourism)
The Arabic-to-English system has been trained
with the data provided by the International Work-
shop on Spoken Language Translation 2005 The
context is that of the Basic Traveling Expres-
sion Corpus (BTEC) task (Takezawa et al., 2002).
BTEC is a multilingual speech corpus which con-
tains sentences coming from phrase books for
tourists. Training set includes 20k sentences con-
taining 159K Arabic and 182K English running
words; vocabulary size is 18K for Arabic, 7K for
English.
Chinese-to-English (Newswire)
The Chinese-to-English system has been trained
with the data provided by the NIST MT Evaluation
Campaign 2005 , large-data condition. In this case
parallel data are mainly news-wires provided by
news agencies. Training set includes 71M Chinese
and 77M English running words; vocabulary size
is 157K for Chinese, 214K for English.
Spanish-to-English (European Parliament)
The Spanish-to-English system has been trained
with the data provided by the Evaluation Cam-
paign 2005 of the European integrated project TC-
STAR4. The context is that of the speeches of
the European Parliament Plenary sessions (EPPS)
from April 1996 to October 2004. Training set for
the Final Text Edition transcriptions includes 31M
Spanish and 30M English running words; vocabu-
lary size is 140K for Spanish, 94K for English.
5 The Web-based Interface
Figure 2 shows a snapshot of the Web-based in-
terface of the demo – the URL has been removed
to make this submission anonymous. In the upper
part of the page the user provides the two informa-
tion required for the translation: the source sen-
tence can be input in a 80x5 textarea html struc-
ture, while the language-pair can be selected by
means of a set a radio-buttons. The user can reset
the input area or send the translation request by
means of standard reset and submit buttons. Some
examples of bilingual sentences are provided in
the lower part of the page.
4http://www.tc-star.org
93
Figure 2: A snapshot of the Web-based interface.
The user provides the sentence to be translated
in the desired language-pair. Some examples of
bilingual sentences are also available to the user.
The output of a translation request is simple: the
requested source sentence, the translation in the
target language and the selected language-pair are
presented to the user. Figure 3 shows an example
of an Arabic sentence translated into English.
We plan to extend the interface with the pos-
sibility for the user to ask additional information
about the translation – e.g. the number of explored
theories or the score of the first-best translation.
6 Acknowledgements
This work has been funded by the European Union
under the integrated project TC-STAR - Technol-
ogy and Corpora for Speech to Speech Translation
- (IST-2002-FP6-506738, http://www.tc-star.org).
References
A.L. Berger, S.A. Della Pietra, and V.J. Della Pietra.
1996. A Maximum Entropy Approach to Natural
Language Processing. Computational Linguistics,
22(1):39–71.
P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and
R. L. Mercer. 1993. The Mathematics of Statistical
Machine Translation: Parameter Estimation. Com-
putational Linguistics, 19(2):263–313.
Figure 3: Example of an Arabic sentence trans-
lated into English.
Mauro Cettolo, Marcello Federico, Nicola Bertoldi,
Roldano Cattoni, and Boxing Chen. 2005. A look
inside the itc-irst smt system. In Proceedings of the
10th Machine Translation Summit, pages 451–457,
Phuket, Thailand, September.
B. Chen, R. Cattoni, N. Bertoldi, M. Cettolo, and
M. Federico. 2005. The ITC-irst SMT System for
IWSLT-2005. In Proceedings of the IWSLT 2005,
Pittsburgh, USA.
M. Federico and N. Bertoldi. 2005. A Word-to-Phrase
Statistical Translation Model. ACM Transactions on
Speech and Language Processing. to appear.
J. Goodman and S. Chen. 1998. An empirical study of
smoothing techniques for language modeling. Tech-
nical Report TR-10-98, Harvard University, August.
P. Koehn, F. J. Och, and D. Marcu. 2003. Statistical
phrase-based translation. In Proceedings of HLT-
NAACl 2003, pages 127–133, Edmonton, Canada.
F. J. Och and H. Ney. 2003. A systematic comparison
of various statistical alignment models. Computa-
tional Linguistics, 29(1):19–51.
F.J. Och. 2004. Minimum Error Rate Training in
Statistical Machine Translation. In Proceedings of
ACL, Sapporo, Japan.
T. Takezawa, E. Sumita, F. Sugaya, H. Yamamoto, and
S. Yamamoto. 2002. Toward a Broad-Coverage
Bilingual Corpus for Speech Translation of Travel
Conversations in the Real World. In Proceedings of
3rd LREC, pages 147–152, Las Palmas, Spain.
B. H. Tran, F. Seide, and V. Steinbiss. 1996. A Word
Graph based N-Best Search in Continuous Speech
Recognition. In Proceedings of ICLSP, Philadel-
phia, PA, USA.
94
