WebDIPLOMAT: A Web-Based Interactive Machine Translation 
System 
Christopher Hogan and Robert Frederking
Language Technologies Institute
Pittsburgh, Pennsylvania, USA
chogan@e-lingo.com, ref@cs.cmu.edu
Abstract 
We have implemented an interactive, Web-based, chat-style machine
translation system, supporting speech recognition and synthesis,
local- or third-party correction of speech recognition and machine
translation output, and online learning. The underlying client-server
architecture, implemented in Java™, provides remote, distributed
computation for the translation and speech subsystems. We further
describe our Web-based user interfaces, which can easily produce
different useful configurations.
1 Introduction
The World Wide Web (Berners-Lee, 1989) seems to be an ideal
environment for machine translation: it is easily accessible around
the world using freely-available, easy-to-use tools which are
available to persons speaking a myriad of languages, all of whom would
like to be able to communicate with one another without language
barriers. It is therefore not too surprising that a few companies
have attempted to make machine translation available in this medium
(AltaVista, 1999; FreeTranslation, 1999; InterTran, 1999). The primary
use identified for these translators has been that of translating Web
pages or amusing oneself with the inadequacies of machine translation
(Yang and Lange, 1998). What these systems cannot be used for is
real-time, speech-to-speech communication with translation.
Real-time communication over the Internet has more properly been the
domain of "chat" protocols: primarily Internet Relay Chat (IRC)
(Oikarinen and Reed, 1993), and similar instant messaging protocols
developed commercially (America Online Inc., 2000; Microsoft Corp.,
2000; ICQ Inc., 1999). While some portals have been developed to
permit access to chat using the Web (iTRiBE Inc., 1996), the primary
point of access seems to be chat-specific client software. Although
chat defines protocols and provides infrastructure, it is limited in
the kind of data that it can transport, and client software is tightly
focussed on the text domain. Such limitations have not, however,
prevented researchers from experimenting with the possibilities of
incorporating machine translation or speech into the chat experience
(Lenzo, 1998; Seligman et al., 1998). The outcome of these experiments
has been to show that commercial machine translation systems may be
reasonably integrated into the chat room, and that commercial speech
software can be connected to existing chat software to provide the
desired experience.
We have taken a different road. It has been noted (Seligman, 1997;
Frederking et al., 2000) that broad-coverage machine translation and
speech recognition cannot now be useful unless users can interact with
the system to improve results. While Seligman et al. (1998) were able
to effect user editing of speech recognition by editing text before
submitting it for translation, they were unable to do the same for the
translation system, primarily due to limitations of commercial
software. Additional limitations are encountered in the communication
medium: chat is not amenable to non-text interaction with translation
agents, and commercial chat software does not, in any case, support
such interaction.
To deal with these limitations, we have developed a fully interactive,
Web-based, chat-style translation system, supporting speech
recognition and synthesis, local- or third-party correction of speech
recognition and machine translation, and online learning, which can be
used with nothing more than a Web browser and some simple add-ons. All
intensive processing, including translation and speech recognition, is
performed at central servers, permitting access for those with limited
computational resources. In addition, the modular design of the system
and interface permits computational tasks to be easily distributed and
different dialog configurations to be explored.
2 Interface Design 
The design of the WebDIPLOMAT system is intended to facilitate the
following kind of interaction (numbers correspond to Figure 1):
1. Speech from the user is recognized and displayed in an editing
window, where it may be edited by respeaking or using the keyboard.
2. When text is acceptable to the user, it is submitted for
translation and transfer to the other party.
3. Text to be translated is optionally presented to a human expert,
who is able to translate, correct and teach the system a correct
translation.
4. Upon machine translation of the text, or acceptance by the expert,
a translation is delivered to the other party and synthesized.
5. Both sides of the conversation are tracked automatically for all
users, and displayed on their interfaces.

Figure 1: User-level perspective on information flow.
See text for explanation of labels.

Although the above is the original vision for the system, other
configurations are easily imagined. Configurations with more than two
participants, or where one of the users is also simultaneously an
expert, are straightforwardly handled. Internationalization of the
interfaces, for use in different locales, is also easily handled. Many
changes of this nature are handled by easy modifications to the HTML
code for given Web pages. More complicated tasks may be accomplished
by modifications of underlying code.
In order to produce the above configuration, the current system
implements two user interfaces (UIs): the Client UI, which provides
speech and text input capabilities to the primary end-users of the
system; and the Editor UI, which provides translation editing
capabilities to a human translation expert. In the rest of this
section, we describe in detail certain unique aspects of each
interface.
2.1 Client User Interface 
In addition to speech-input and editing capabilities, the Client UI is
able to track the entire dialog as it progresses. Because the Central
Communications Server (cf. §3.1) forwards every message to all
connected clients, every component of the system can be aware of how
the dialog turn is proceeding. In the Client UI, this capability is
used to provide a running transcript of the conversation as it occurs.
By noting the identifiers on messages (cf. §3.4), the UI can assign
appropriate labels to each of the following: our original utterance,
translation of our utterance, other person's utterance, translation of
their utterance. In addition, we use knowledge about the status of the
dialog to prevent the user from sending several utterances before the
other party has responded.
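
As an illustration only, the labeling and turn-taking behaviour
described above might be sketched as follows. The class name
DialogTranscript, the Kind enum, and the idea that each message carries
a sender name are assumptions of this sketch; the actual format of the
WebDIPLOMAT message identifiers is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of transcript labeling and turn-taking in a Client UI. */
final class DialogTranscript {
    /** Kinds of message a client might see; these names are assumptions. */
    enum Kind { ORIGINAL, TRANSLATION }

    private final String localUser;         // identifier of the user at this client
    private final List<String> lines = new ArrayList<>();
    private boolean awaitingReply = false;  // true after we speak, until the other party answers

    DialogTranscript(String localUser) {
        this.localUser = localUser;
    }

    /** Records one forwarded message and returns the label chosen for it. */
    String record(String sender, Kind kind, String text) {
        boolean ours = sender.equals(localUser);
        String label;
        if (ours) {
            label = (kind == Kind.ORIGINAL) ? "Our utterance" : "Translation of our utterance";
        } else {
            label = (kind == Kind.ORIGINAL) ? "Their utterance" : "Translation of their utterance";
        }
        if (kind == Kind.ORIGINAL) {
            // Our own utterance closes our turn; an utterance from the other party reopens it.
            awaitingReply = ours;
        }
        lines.add(label + ": " + text);
        return label;
    }

    /** The user may send only if the other party has responded since our last utterance. */
    boolean maySend() {
        return !awaitingReply;
    }

    /** Returns a copy of the running transcript. */
    List<String> lines() {
        return new ArrayList<>(lines);
    }
}
```

Because every client sees every message, each client can keep such a
transcript locally without any extra bookkeeping on the server.
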
2.2 Editor User Interface 
The Editor UI provides tools which make it possible for a human expert
to edit translations produced by the machine translator before they
are sent to the users. As mentioned earlier, the editing step is
optional, and is intended to improve the quality of translations. The
Editor UI may be configured so that either of the two users, or a
remote third party, can act as editor. Our motivations for providing
an editing capability are twofold:
• Although our MT system (cf. §3.2) does not always produce the
correct answer, the correct answer is usually available among the
possibilities it considers.
• Our MT system provides for online updates of its knowledge base,
which allows for translations to improve over time.
In order to take advantage of these capabilities, we have designed two
editing tools, the chart editor and always-active learning, that
enable a human expert to rapidly produce an accurate translation and
to store that translation in the MT knowledge base for future use.
As discussed in §3.2, our MT system may produce more than one
translation for each part of the input, from which it attempts to
select the best translation. The entire set of translations is
available to the WebDIPLOMAT system, and is used in the chart editor.
By double-clicking on words in the translation, the human editor is
presented a popup-menu of alternative translations beginning at a
particular location in the sentence (see Figure 2). When one of the
alternatives is selected, it replaces the original word or words. In
this way, a sentence may be rapidly edited to an acceptable state.

Figure 2: Popup Chart Editor (panels: Original English, "My name is
John"; Edited French)

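
The popup behaviour can be pictured with a small data-structure sketch.
Everything named here (ChartEditorSketch, Edge, alternativesAt, select,
render) is hypothetical and chosen for illustration; the sketch assumes
that each candidate translation records the span of input words it
covers, and it does not attempt to re-translate input words left
uncovered when a shorter alternative replaces a longer segment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the data behind a popup chart editor. */
final class ChartEditorSketch {
    /** One candidate translation of the input words in [start, end). */
    static final class Edge {
        final int start;
        final int end;
        final String text;

        Edge(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Popup contents: every chart edge that begins at the given input position. */
    static List<Edge> alternativesAt(List<Edge> chart, int start) {
        List<Edge> out = new ArrayList<>();
        for (Edge e : chart) {
            if (e.start == start) {
                out.add(e);
            }
        }
        return out;
    }

    /**
     * Returns a new segment list in which the chosen edge replaces every
     * current segment that overlaps it. Segments stay in input order.
     */
    static List<Edge> select(List<Edge> current, Edge chosen) {
        List<Edge> out = new ArrayList<>();
        boolean inserted = false;
        for (Edge e : current) {
            boolean overlaps = e.start < chosen.end && chosen.start < e.end;
            if (overlaps) {
                if (!inserted) {
                    out.add(chosen);
                    inserted = true;
                }
            } else {
                if (!inserted && e.start >= chosen.end) {
                    out.add(chosen);
                    inserted = true;
                }
                out.add(e);
            }
        }
        if (!inserted) {
            out.add(chosen);
        }
        return out;
    }

    /** Joins the segment texts into the sentence shown to the editor. */
    static String render(List<Edge> segments) {
        StringBuilder sb = new StringBuilder();
        for (Edge e : segments) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.text);
        }
        return sb.toString();
    }
}
```

Selecting an alternative that spans several input words replaces all of
the segments it overlaps, which matches the "word or words" wording
above.
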
In order to reduce development time, our MT system can be used in a
rapid-deployment style: after a minimal knowledge base is constructed,
the system is put into use with a human expert supervising, so that
domain-relevant data may be elicited quickly. In order to support
this, all utterances are considered for learning. When the editor
presses the 'Accept/Learn' button, the original utterance and its
translation are examined to determine if they are suitable for
learning. Currently all utterances for which the forward translation
has been edited are submitted for learning, although other criteria
may also be entertained. More detail about online learning may be
found in §3.2.
Although the Editor UI is primarily intended for use by a translation
expert, it will sometimes also be used by those who are not as expert.
For this situation, we have introduced a backtranslation capability
which retranslates the edited forward translation into the language of
the input. Although imperfect, backtranslation can often give the user
an idea of whether the forward translation was substantially correct.
3 System Design
In this section, we describe the computational architecture underlying
the WebDIPLOMAT system.
3.1 Architecture
The underlying architecture of the WebDIPLOMAT system is shown in
Figure 3. The system is organized around three servers:
• The Web Server serves HTML pages to clients. We used an unmodified
version of the Apache HTTP Server (Apache Software Foundation, 1999).
• The Speech Recognizer(s) perform speech recognition for clients.
• The Central Communications Server allows communication between
clients. Encapsulated objects sent to this server are forwarded to all
connected clients. With the exception of speech and HTTP, all
communications between clients use this server.
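
The forwarding behaviour of the Central Communications Server can be
pictured with an in-memory sketch. This is not the WebDIPLOMAT server:
BroadcastHubSketch is a hypothetical name, clients are modelled as
plain callbacks rather than network connections, and the sketch assumes
that the sender also receives its own message, which the description
above does not settle.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for a server that forwards messages to all clients. */
final class BroadcastHubSketch<M> {
    // CopyOnWriteArrayList lets clients connect or disconnect while a send is in progress.
    private final List<Consumer<M>> clients = new CopyOnWriteArrayList<>();

    /** Registers a client callback. */
    void connect(Consumer<M> client) {
        clients.add(client);
    }

    /** Removes a previously registered client callback. */
    void disconnect(Consumer<M> client) {
        clients.remove(client);
    }

    /** Forwards the message to every connected client (assumed to include the sender). */
    void send(M message) {
        for (Consumer<M> client : clients) {
            client.accept(message);
        }
    }
}
```

With this design the hub needs no knowledge of message contents; each
client decides for itself which forwarded messages are relevant.
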
The servers are designed to be small, and are intended to coexist on
one machine.¹ Currently, however, the speech server includes a full
speech recognizer, and therefore consumes a greater amount of
resources than the other servers.
¹ This is necessary due to security restrictions on Java™ Applets.
Most processing is intended to be performed by clients, which have no
locality requirements, and may therefore be distributed across
machines and networks as necessary. The User and Editor Clients were
described in §§2.1 and 2.2. We will now examine the most important
processing mechanisms, including machine translation and speech
recognition/synthesis.
3.2 Machine Translation
For Machine Translation, we rely on the Panlite Multi-Engine Machine
Translation (MEMT) Server (Frederking and Brown, 1996). This system,
which is outlined in Figure 4, makes use of several translation
engines at once, combining their output with a statistical language
model (Brown and Frederking, 1995). Each translation engine makes use
of a different translation technology, and produces multiple, possibly
overlapping, translations for every part of the input that it can
translate. All of the translations produced by the various engines are
placed in a chart data structure (Kay, 1967; Winograd, 1983), indexed
by their position in the input utterance. A statistical language model
is used, together with scores provided by the translation engines, to
determine the optimal path through the set of translated segments,
which information is also stored in the chart. Upon completion of
translation, the chart data structure is made available for use by the
rest of the WebDIPLOMAT system.
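
The search for an optimal path through the chart can be sketched as a
simple dynamic program. This is a simplified illustration and not the
MEMT algorithm itself: it assumes that each chart edge already carries
a single combined score (standing in for the engine score and language
model contribution), that scores add along a path, and that higher is
better. The names ChartPathSketch, Edge, and bestPath are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: best-scoring path through a chart of translated segments. */
final class ChartPathSketch {
    /** A translated segment covering input words [start, end) with an assumed combined score. */
    static final class Edge {
        final int start;
        final int end;
        final String text;
        final double score;

        Edge(int start, int end, String text, double score) {
            this.start = start;
            this.end = end;
            this.text = text;
            this.score = score;
        }
    }

    /**
     * Returns the highest-scoring sequence of contiguous edges covering
     * input positions [0, length), or an empty list if no such cover exists.
     */
    static List<Edge> bestPath(List<Edge> chart, int length) {
        List<Edge> path = new ArrayList<>();
        if (length <= 0) {
            return path;
        }
        double[] best = new double[length + 1];  // best[i]: best score covering [0, i)
        Edge[] back = new Edge[length + 1];      // back[i]: last edge on that best cover
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int pos = 1; pos <= length; pos++) {
            for (Edge e : chart) {
                boolean usable = e.end == pos && e.start >= 0 && e.start < e.end;
                if (!usable || best[e.start] == Double.NEGATIVE_INFINITY) {
                    continue;
                }
                double candidate = best[e.start] + e.score;
                if (candidate > best[pos]) {
                    best[pos] = candidate;
                    back[pos] = e;
                }
            }
        }
        if (back[length] == null) {
            return path;  // the chart cannot cover the whole input
        }
        for (int pos = length; pos > 0; pos = back[pos].start) {
            path.add(back[pos]);
        }
        Collections.reverse(path);
        return path;
    }
}
```

Keeping the losing edges in the chart, rather than discarding them
after the search, is what would let an editing tool later offer them as
alternatives.
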

Figure 3: Server Architecture (components: MEMT Server, MT Interface,
Speech Recognizer(s), Synthesizer Interface, Central Server, Web
Server; connected over the Internet to the Speech Plugin, User Client
and Speech Synth. of Users 1 and 2, and to Editor Clients 1 and 2)

Figure 4: Multi-Engine Machine Translation Architecture (components:
Source Language, Morphological Analyzer, Transfer-Based MT,
Example-Based MT, Knowledge-Based MT, Expansion slot, Statistical
Modeller, User Interface, Target Language)

Currently, we employ Lexical Transfer and Example Based Machine
Translation (EBMT) engines (Nagao, 1984; Brown, 1996). Lexical
Transfer uses bilingual dictionaries and phrasal glossaries to provide
phrase-for-phrase translations, while EBMT uses a fuzzy matching step
to produce translations from a bilingual corpus of matched sentence
pairs.
Because the knowledge bases for these techniques are simple, they both
support online augmentation. As mentioned in §2.2, the Editor UI
attempts to learn from utterances that have been edited. Pairs of
utterances submitted for learning to the translator are placed in a
Lexical Transfer glossary if fewer than six words long, and in an EBMT
corpus if two words or longer; pairs of two to five words therefore
enter both. Higher scores are given to these newly created resources,
so that they are preferred.
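
The routing rule for learned pairs can be written down directly. The
sketch below is illustrative: LearningRouterSketch and its members are
hypothetical names, the knowledge bases are modelled as plain lists,
and the word count is assumed to be taken on the source utterance by
whitespace splitting, a detail the text leaves open. The scoring
preference for new entries is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of routing edited utterance pairs to the two knowledge bases. */
final class LearningRouterSketch {
    /** A source utterance with its accepted translation. */
    static final class Pair {
        final String source;
        final String target;

        Pair(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    final List<Pair> glossary = new ArrayList<>();    // stands in for the Lexical Transfer glossary
    final List<Pair> ebmtCorpus = new ArrayList<>();  // stands in for the EBMT corpus

    /** Counts whitespace-separated words (an assumption of this sketch). */
    static int wordCount(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Submits a pair for learning only if the editor changed the machine output.
     * Returns true if the pair was stored in at least one knowledge base.
     */
    boolean submit(String source, String machineOutput, String editedOutput) {
        if (machineOutput.equals(editedOutput)) {
            return false;  // unedited translations are not learned
        }
        int words = wordCount(source);
        if (words == 0) {
            return false;
        }
        Pair pair = new Pair(source, editedOutput);
        if (words < 6) {
            glossary.add(pair);    // fewer than six words: glossary
        }
        if (words >= 2) {
            ebmtCorpus.add(pair);  // two words or longer: EBMT corpus
        }
        return true;
    }
}
```
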
The MT server is interfaced to the Central Server through MT interface
clients, which handle, inter alia, character set conversions, support
for learning and conversion of MT output into an internal object
representation usable by other clients. They also ensure that outgoing
translations are stamped with correct identifiers (cf. §3.4), relative
to the incoming text, so that translations are directed to the
appropriate clients.
3.3 Speech Recognition and Synthesis
In the current system, speech recognition is handled as a private
communication between a browser plug-in, running on the user's
machine, and a speech recognition server, and is not routed through
the central server. Speech is streamed over the network to the server,
which performs the recognition, and returns the results as a text
string. This configuration permits most of the computational resources
to be offloaded from the client machine onto powerful remote servers.
The speech may be streamed over the network as-is, or it may be
lightly preprocessed into a feature stream for use over
lower-bandwidth connections. The recognized text is returned directly
to the user client for editing and validation by the user before being
sent for translation. Our speech server is a previously implemented
design (Issar, 1997) based on the Sphinx II speech recognizer (Huang
et al., 1992). As mentioned earlier, the speech server and recognizer
are not currently designed to run in a distributed fashion.
Unlike speech recognition, which is handled by the User Client, speech
synthesis does not require human interaction, and can therefore be
connected directly to the central server. Currently, Synthesizer
Interfaces unpackage internal representations and send utterances to
be synthesized on a speech synthesizer running locally on the user's
machine. Future plans call for speech to be synthesized at a central
location and transported across the network in standard audio formats.
3.4 Implementation 
All components of the WebDIPLOMAT system except the speech components
and Web Server were implemented in Java™ (Gosling et al., 1996),
including the Central Server. Messages between clients are implemented
as a Java class Capsule, containing a String identifier and any number
of data Objects. Object serialization permits simple implementation of
message streams. User Interface clients are developed as Applets,
which are embedded in HTML pages served by the Web Server.
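
A message class of the kind described can be sketched with standard
Java object serialization. The class below is named CapsuleSketch to
make clear that it is not the original Capsule source; its fields,
constructor, and helper methods are assumptions, and the byte-array
round trip stands in for the network streams a real deployment would
use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a message with a String identifier and any number of data objects. */
final class CapsuleSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String identifier;
    private final ArrayList<Serializable> data;

    CapsuleSketch(String identifier, Serializable... data) {
        this.identifier = identifier;
        this.data = new ArrayList<>(Arrays.asList(data));
    }

    String identifier() {
        return identifier;
    }

    /** Returns a copy of the payload objects. */
    List<Serializable> data() {
        return new ArrayList<>(data);
    }

    /** Serializes this capsule with the standard ObjectOutputStream. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reconstructs a capsule from bytes produced by toBytes(). */
    static CapsuleSketch fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (CapsuleSketch) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize capsule", e);
        }
    }
}
```

In a networked setting the same writeObject and readObject calls would
wrap a socket's streams instead of byte arrays, and incoming data
should only be deserialized from trusted peers.
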
4 Future Work and Conclusion 
The most significant change we would like to make to the current
system is the way that speech is handled. We firmly believe that the
best speech input device is the one people are already familiar with,
namely the telephone. A revised system would allow users to call
specific phone numbers (connected to the central server) in order to
access the system, which would then recognize and synthesize speech
over the telephone line while still using web-based interfaces. This,
of course, takes us closer to the grand AI Challenge of the
translating telephone (OAIAE, 1996; Kurzweil, 1999; Frederking et al.,
1999). We contend that by using interactive machine translation, the
goal of a broad-domain translating telephone can be more easily
brought to fruition.

References 

AltaVista. 1999. Babel Fish: A SYSTRAN translation system.
http://babelfish.altavista.com/.

America Online Inc. 2000. AOL Instant Messenger(SM).
http://www.aol.com/aim/home.html.

The Apache Software Foundation. 1999. The Apache HTTP Server Project.
http://www.apache.org.

Tim Berners-Lee. 1989. Information management: A proposal.
http://www.w3.org/History/1989/proposal.html, March. CERN.

Ralf Brown and Robert Frederking. 1995. Applying statistical English
language modeling to symbolic machine translation. In Proceedings of
the Sixth International Conference on Theoretical and Methodological
Issues in Machine Translation (TMI-95), pages 221-239.

Ralf Brown. 1996. Example-based machine translation in the Pangloss
system. In Proceedings of the 16th International Conference on
Computational Linguistics (COLING-96).

Robert Frederking and Ralf Brown. 1996. The Pangloss-Lite machine
translation system. In Proceedings of the Conference of the
Association for Machine Translation in the Americas (AMTA).

Robert Frederking, Christopher Hogan, and Alexander Rudnicky. 1999. A
new approach to the translating telephone. In Proceedings of the
Machine Translation Summit VII: MT in the Great Translation Era,
Singapore, September.

Robert Frederking, Alexander Rudnicky, Christopher Hogan, and Kevin
Lenzo. 2000. Interactive speech translation in the DIPLOMAT project.
MT Journal. To appear.

FreeTranslation. 1999. FreeTranslation: A Transparent Language
translation system. http://www.freetranslation.com/.

James Gosling, Bill Joy, and Guy L. Steele, Jr. 1996. The Java™
Language Specification. Addison-Wesley Publishing Co.

Xuedong Huang, Fileno Alleva, Hsiao-Wuen Hon, Mei-Yuh Hwang, and
Ronald Rosenfeld. 1992. The SPHINX-II speech recognition system: An
overview. Technical Report CMU-CS-92-112, Carnegie Mellon University
School of Computer Science.

ICQ Inc. 1999. ICQ IRC Services. http://www.icq.com/.

InterTran. 1999. An InterTran translation system.
http://www.airsho.com/transLator3.htm.

Sunil Issar. 1997. A speech interface for forms on WWW. In Proceedings
of the 5th European Conference on Speech Communication and Technology,
September.

iTRiBE Inc. 1996. jIRC. http://virtual.itribe.net/jirc/.

Martin Kay. 1967. Experiments with a powerful parser. In Proceedings
of the 2nd International COLING, August.

Ray Kurzweil. 1999. The Age of Spiritual Machines: When Computers
Exceed Human Intelligence. Viking Press.

Kevin Lenzo. 1998. Personal communication.

Microsoft Corp. 2000. MSN™ Messenger Service.
http://messenger.msn.com/.

M. Nagao. 1984. A framework of a mechanical translation between
Japanese and English by analogy principle. In A. Elithorn and R.
Banerji, editors, Artificial and Human Intelligence. NATO
Publications.

Office of Artificial Intelligence Analysis and Evaluation OAIAE. 1996.
Artificial intelligence - An executive overview.
http://www.ai.usma.edu:8080/overview/cover.html.

Jarkko Oikarinen and Darren Reed. 1993. Internet relay chat protocol.
ftp://ftp.demon.co.uk/pub/doc/rfc/rfc1738.txt, Request for Comments
1459, Network Working Group.

Mark Seligman, Mary Flanagan, and Sophie Toole. 1998. Dictated input
for broad-coverage speech translation. In Clare Voss and Flo Reeder,
editors, Workshop on Embedded MT Systems: Design, Construction, and
Evaluation of Systems with an MT Component, Langhorne, Pennsylvania,
October. AMTA.

Mark Seligman. 1997. Six issues in speech translation. In Steven
Krauwer et al., editors, Spoken Language Translation Workshop, pages
83-89, Madrid, July.

Terry Winograd. 1983. Language as a Cognitive Process. Volume 1:
Syntax. Addison-Wesley.

Jin Yang and Elke D. Lange. 1998. SYSTRAN on AltaVista: A user study
on real-time machine translation on the Internet. In David Farwell et
al., editors, Proceedings of the Third Conference of the Association
for Machine Translation in the Americas (AMTA '98), pages 275-285,
Langhorne, Pennsylvania, October. Springer-Verlag.
