Spoken-Language Translation Method Using Examples 
Hitoshi IIDA, Eiichiro SUMITA and Osamu FURUSE 
ATR Interpreting Telecommunications Research Laboratories 
2-2 Hikaridai 
Seika-cho, Kyoto 619-02, JAPAN 
{iida, sumita, furuse}@itl.atr.co.jp 
1 Introduction 
Conventional approaches to machine translation 
are mostly concerned with written text, such as 
technical documents. This paper addresses the 
problem of spoken-language translation and ex- 
plains the method and its capability to handle 
spoken language. 
2 Seven requirements for 
spoken-language translation 
The following new design features are critical for 
success in spoken-language translation: 
1. Incremental processing 
Incremental processing is required so as to 
handle fragmental phrases or incomplete ut- 
terances and to realize a real-time response. 
This has a very close relation with item 5 be- 
low. 
2. Handling spoken language 
Fragmental phrases, isolated phrases, a gra- 
dient of case role changing, complex topical- 
ization, metonymical phrases, idiomatic ex- 
pressions for etiquette, and inconsistent ex- 
pressions in one utterance are main charac- 
teristics of spoken language. They strongly 
depend on dialogue situations. 
3. Handling euphemistic expressions 
Under the influence of social position or situ- 
ation, euphemistic expressions appear in var- 
ious scenes in various forms. 
4. Deterministic processing 
Neither pre-editing nor post-editing can be 
relied on in a speech translation system. In- 
teractive disambiguation by speakers does 
not necessarily converge a correct interpre- 
tation. 
5. Sufficient speed to avoid to break com- 
munication 
As an interpreter intervenes between speak- 
ers, real-time response is required to keep 
smooth turn taking. 
. 
. 
High-quality translation 
This is necessary in order to ensure correct 
information exchange between speakers. 
Recovering from speech recognition er- 
rors 
There are various aspects to recovering from 
speech recognition errors, for example in 
correcting phoneme sequences, syllable se- 
quences, word sequences (including com- 
pound words and collocations). 
3 Meeting the seven requirements 
3.1 Incremental processing 
This is an essential technology if one is to build 
an incremental translation system like a simulta- 
neous interpreter, and the proper way to grasp 
a chunk of a translation unit corresponding to 
some chunk in a target language is to extend 
'constituent boundary parsing' to bottom-up-type 
parsing \[Furuse96\]. 
3.2 Recovering from errors 
A certain recovery method is now under consid- 
eration: a re-entrizing model for phoneme candi- 
dates by means of searching the correct phonemes 
using modification depending on recognition er- 
ror characteristics in an example-based framewbrk 
\[Wakita95\]. This approach provides a recovery ef- 
fect in handling phoneme or syllable sequences, 
and the effect depends on the particular speakers 
because of individual error characteristics. 
3.3 Requirements covered by 
EBMT/TDMT 
The remaining requirements are handled effec- 
tively by an example-based approach as explained 
here. 
In NLP systems, especially for spoken language, 
many possibile syntactic structures are produced. 
It is an important and difficult process to choose 
the most plausibile structure. Conventional ap- 
proachs, such as knowledge-based one, cannot eas- 
ily handle continuous phenomena: gradation of 
case role changing; derivation of a metonymical 
1074 
relation; and relationship between a topicalized 
word and the main predicate. 
We have proposed Example-Based 
Machine 3¥anslation (EBMT) to deal with these 
difliculties\[Sumita92-a\]. The EBMT method pre- 
pares a large number of translation examples; the 
translation example that most closely matches the 
input expression is retrieved; and tile example is 
nfimicked. 
When applying F, BM'F to sentence transla- 
tion, the sentence must be analyzed by matching 
transaltion patterns of phrases \[Furuse94\]. This 
model is in a sense "driven by transfer", and 
we call it Transfer-Driven Machine %anslation 
(TDMT). 
3.3.1 Handling spoken language 
Spoken language includes many phenomena; 
here, howew'.r, we concentrate on the following 
ones: 
(1) "wa" is a Japanese topic marker and, in gen- 
eral, this marker can t)e replaced by other 
case particles. But some usages cannot be 
identified as to case role because of grada- 
tion of case role changing. Moreover, if there 
are double topic markers in a sentence, they 
cannot I)e replaced by other particles 1. The 
first sentence in our Japanese-to-English (JE) 
translation "snapshot" (Figure 1), for exam-. 
ple, is properly translated in our TI)MT pro- 
totype system. 
(i) "Chikatetsu-wa ichiban-chikai eki-wa 
doko desu-ka." 
('subway-topiealized,' 'the near- 
est,' 'station-topicalized,' 'where,' 'be- 
question') 
(2) Two sentences are mixed in one utterance. 
The tirst is pended, then inunedaitely the sec- 
ond sentence starts without conjunction. 
(ii) "Shiharai-wa ginkou-fllrikomi-o o-machi- 
shite-oriInasu." 
('payment-topicalized,' 'bank-transfer- 
objective,' 'wait-for-polite-modest') 
a.a.= Handling euphemistic expressions 
(1) There are various types of expressions for 
politeness, modesty, and euphemism. Such 
expressions are used depending on social 
roles. The fourth sentence in our Japanese- 
to-Korean (JK) translation snapshot (Figure 
2) is a sample of this type, which is properly 
dealt with by TI)MT. 
(iii) "Yoyaku-wo 
kakunin-sasete-itadaki-masu." 
1In this paper, sample Japanese sentences are writ- 
ten alphabetically and surrounded by double quotes, 
and the corresponding English words with usage mod- 
ifiers follow in parenthesis. 
('reservation-objective,' 
'confirm-modest') 
(iv) "Go-dengon-wo 
o-t ut ae-moushiage-masu ." 
('message-polite-objective,' 
'inform-honorific') 
3.3.3 Deterministic processing 
ConventionM MT methods provide multiple 
translation candidates but no information to use 
in selecting among them, or else just the first pos- 
sible sentence that is generated. 
On the contrary, EBMT generates all the possi- 
ble candidates combining suitable phrases. It also 
provides proper scores to each candidate using a 
similarity calculation. The scores realize "deter- 
ministic" translation. 
3.3.4 Speed 
\[Furuse96\] has improved a matching mechanism 
over translation patterns. By accepting input 
in left-to-right order and dealing with best-only 
substructures, the explosion of structural ambi- 
guity is restrained and an efficient translation of 
a lengthy input sentence can be achieved, l)re - 
liminary experimentation has shown that average 
translation times are reduced from 1.15 seconds 
to 0.55 seconds for input of 10 words in length 
and from 10.87 seconds to 2.04 seconds for in- 
put of 20 words in length. The incorporation 
of incremental morphological analysis and gener- 
ation \[Akamine95\] into the new-version TDMT, 
is promising for achieving incremental (simulta- 
neous) translation for a practical spoken-language 
translation system. 
If instantaneous response is required, the rest 
dominant process is retrieval of the closest transla- 
tion patterns from bulk collection. It is effectively 
solved by using a massively parallel algorithms 
and machines \[Sumita95-a, Snmita95-b, Oi93\]. 
3.3.5 Quality 
First, a well-known difficult problem in 
Japanese to English translation was selected as 
a test. The Japanese noun phrase of the form 
"noun + NO + noun" using the Japanese adnom- 
inM particle "NO" is an expression whose meaning 
is continuous. A translation success rate of about 
80% has been demonstrated in a Jacknife test 
\[Sumita92-a\]. Also, for other Japanese and En- 
glish phrases, similar effectiveness in target word 
selection and structural dsiambiguation has been 
demonstrated\[Sumita92-b\]. 
We have evaluated a experimental TDMT sys- 
tem, with 825 model sentences about conference 
registration. These sentences cover basic expres- 
sions in an inquiry dialogue. The success rate is 
71% for a test data set consisting of 1,050 unseen 
sentences in the same domain. 
1075 
target : "T would like to arrive at Las Vegas by nine o' clock at night" 
target : "If you get on the bus at nine fifteen, you ~ll arrive by e~ht o' clock at night" 
source : "AB~-CT~ ~" target : "At eight 0' clock 2" 
source : "L; ~ ~f~, ~?: ~ ~d--~r.'\]ff~;0~7~ AT~-n" 
target : "Well, it takes eleven hours approximately, r~ht ?" 
source : % ~ ~,~'~ U ~-~'6-i--~l%"ffl\[;b ~ U ~-" 
target : "No . there is the time difference and it will take t~elve hours" 
Figure 1: JE translation snapshot by TDMT 
source : "C-~%T'~Pe~,gtCZSf~'8~9-~ ~'' (Hi is it possible to make hotel reservation from here?) 
target : "~1~ ~ ~ ~ + ~@~1~?" 
source : "C~fr(~-C'~ ~/~-c~'C-~,,~a)ZIat~*~Z~X-c'I~ bT~ ~ ~:~i~ ~" (OK, what we do is to give you all the ~nformation you need and then 
~e ask you to go ahead and make the call yourself. ) 
target : ~ ,~ 7~ul\] o~o~ ~ ~o~ ~ ~ ~x\]~¢~.\]~h" 
source : "b+~ U~d~69~JU~/~U~< tZ~" 
(OL l'm looking for a central locatioi~ if possible. ) 
target : "~L~ ~ ~\]~ ~J~ ~ ~xj~\]~ .. 
(Not too expensive, and it shouldn ~ t take too long to get to the major sights from there. ) 
target : "e~H17~ ~ HI~I ~ ~ ~l~ ~17~ ~gJL" 
Figure 2: JK translation snapshot by TDMT 
4 JE 8¢ JK prototype systems 
The TDMT system is being expanded so as to 
handle travel arrangement dialogues including the 
topics of hotel reservation, room services, trou- 
bleshooting during hotel stays, various informa- 
tion queries, and various travel arrangements. At 
present the JE system has about a 5,000-word 
vocabulary and a transfer knowledge from 2,000 
training sentences. The JK system is half this 
size. While some modules, such as morphologi- 
cal analysis and generation, are language-specific, 
the transfer module is a common part of every lan- 
guage pair. Through JE and JK implementation, 
we believe that the translation of every language 
pair can be achieved in the same framework using 
TDMT. On the other hand, it has turned out that 
the linguistic distance between source and target 
languages reflects the variety of target expression 
patterns in the transfer knowledge. Table 1 shows 
the number of target expression patterns corre- 
sponding a Japanese particles in JE and JK. These 
numbers are counted from the current TDMT sys- 
tem's transfer knowledge, and the numbers of ex- 
amples are token numbers (i.e., not including du- 
plications). 
5 Discussion 
5.1 Integration of Speech and Language 
A mechanism for spontaneous speech translation 
must be consistent with a mechanism for handling 
associative knowledge, such as translation usage 
examples and word co-occurrence information for 
rnemory-b~ed processing, and with a mechanism 
for logical structure analysis according to detailed 
rules for each processing phase in the Transfer- 
Driven MT processing. Under the process, a study 
should be carried out on building a stochastic lan- 
guage model using both syntactic and semantic 
information for speech understanding. 
5.2 Related Research 
On the other hand, some studies hope to build 
spoken language translation systems using a cer- 
tain interlingua method. A semantic parser 
is a typical example of this method. In par- 
ticular, "semantic pattern based parsing" in 
JANUS, CMU's speech to speech translation sys- 
tem \[Woszczyna93, Levin95\] uses frame based se- 
mantics with a semantic phrase grammar and 
the operation of the parser is viewed as "phrase 
spotting." Another one is MIT's multilingual 
1076 
'Fable 1: Japanese particle translation in JE and ,IK translation 
Japanese 
Pattern 
X w(J Y 
Xga Y 
Xno Y 
XoY 
Xni Y 
Xde Y 
JP 
Example Target patterns 
224 30 
140 15 
226 36 
147 15 
154 22 
120 25 
Example Target patterns 
66 1 
40 1 
88 2 
41 1 
55 5 
33 5 
GALAXY: a human-language interface to on- 
line travel information \[Goddean94\]. The system 
makes use of 'semantic frame representation' so 
as to paraphrase a recognized speech input utter- 
ance into a concrete and simple expression that 
contbrms with one of the system's internal repre- 
sentations and makes the utterance meaning easy 
to handle. Itowever, in extracting the meaning 
of an inlmt sentence, many default values are re- 
quired so as to execute heuristic inferences. The 
inference is too powerful in explaining a speaker's 
intention and the propositional content of the ut- 
terance by one key word or phrase. Such a method 
may work well in a certain domain, but less scala- 
bility may be revealed when making a larger pro- 
totype system. 
VERBMOBIL is a typical translation system 
for face-to:face dialogue \[Wahlster93\]. This sys- 
tem adopts English as a dialogue language for 
human-machine interface and makes use of DRT- 
based semantic representation units. 
6 Conclusion 
'\['DMT has been proposed as a general technique 
for spoken-language translation. We have ap- 
plied TDMT to two language pairs, i.e., Japanese- 
English, and Japanese-Korean, as a first step to- 
ward multi-lingual translation. Also, we are plan- 
ning to integrate speech recognition with TI)M'F 
for achieving effective and efficient speech trans- 
lation. 
References 
\[Akamine95\] Akamine, S. and l!'uruse, O.: 
Einiehi-taiwabun-hon'yaku niokeru zenshinteki- 
nihongobml-seisei (incremental generation of 
Japanese Sentence in English to Japanese Dia- 
logue Translation), in ProF. of 1 st NLP convetion, 
pp.281-284 (1995), (in Japanese). 
\[Furuse94\] Furuse, O. and Iida, H. : Constituent 
Boundary Parsing for EBMT, in ProF. of COL- 
ING'94, pp. 105-111 (1994). 
\[Furuse96\] Furnse, O. and Iida, H. : Incremental 
Translation Utilizing Constituent Boundary Pat- 
tern, in ProF. of COLING'96 (1996). 
\[Goddeau94\] Goddeau, D., et al. : GALAXY: 
A IIUMAN-LANGUAGE INTERFACE TO ON- 
LINE 'I'RAVI!3; INFORMATION, in Poc. of IC: 
SLP94, pp.707-710 (1994). 
\[lida93\] 1ida, H. : Prospects for Adwmced Spo- 
ken Dialogue Processing, \[EICE TRANS. INF. 
and SYST., VOL. E-76-D, No.l, pp. 2-8 (1993). 
\[Levin95\] Levin, L. , et al. : Using Context 
in Machine Translation of Spoken Language, in 
ProF. of TMI-95, pp. 173-187 (1995). 
\[Nagao84\] Nagao, M. : A Framework of a Ma- 
chine Translation between Japanese and English 
by Analogy Principle, in Artitieial and Human In- 
telligence, eds. A. Elithorn and R. Banerji, North- 
llolhmd, pp. 173-180 (1984) . 
\[Oi93\] Oi, K. et al. : Toward Massively Paral- 
lel Spoken Language Translation, in Proe. of the 
Workshop on Parallel Processing for AI, IJCAI'93, 
pp. 36-39 (1993). 
\[Sumita92-a\] Surnita, E. and Iida, II. : 
Example-Based Transfer of Japanese Adnominal 
Particles into English, IEICE TRANS. INF. and 
SYST., VOL. E-75-1), No.4, pp. 585-594 (1992). 
\[Smnita92-b\] Sumita, E. and Iida, 11. : 
Example-Based NLP Techniques- A Case Study 
of Machine Translation - , Statistically-Based 
NLP Techniques- Papers from the 1992 Work- 
shop, Technical Report W'92-01, AAAI Press (1992). 
\[Sumita95-a\] Sumita, E. and Iida, tt. : Itetero- 
geneous Computing for Example-Based Transla- 
tion of Spoken Language, in Proe. of TMI-95, pp. 
273-286 (1995). 
\[Sumita95-b\] Sumita, g. and Iida, H. : Hetero~ 
geneous Computing for Example-Based Transla- 
tion of Spoken Language, in ProF. of TMI-95, pp. 
273-286 (1995). 
\[Wahlster93\] Wahlster, W. : Verbmobil: Trans- 
lation of Face-To-Face Dialogs, in ProF. of MT-- 
Sumnfit IV, pp. 127-135(1993). 
\[Wakita95\] Wakita, Y. et al. : Phoneme Can- 
didate Re-entry Modeling Using Recognition Er- 
ror Characteristics over Multiple HMM States, in 
ProF. of ESCA Workshop on Spoken Dialogue 
Systems, pp. 73-76 (1995). 
\[Woszczyna93\] Woszczyna, M., et al. : REC- 
CENT ADVANCES IN JANUS: A SPEECH 
TRANSLATION SYSTEM, in ProF. of EU- 
ROSPEECH'93, pp. 1295-1298 (1993). 
1077 
