Automatic Interpretation System Integrating
Free-style Sentence Translation and Parallel Text Based Translation
Takahiro Ikeda Shinichi Ando Kenji Satoh Akitoshi Okumura Takao Watanabe
Multimedia Res. Labs. NEC Labs.
4-1-1 Miyazaki, Miyamae-ku, Kawasaki, Kanagawa 216
t-ikeda@di.jp.nec.com, s-ando@cw.jp.nec.com, k-satoh@da.jp.nec.com,
a-okumura@bx.jp.nec.com, t-watanabe@ay.jp.nec.com
Abstract
This paper proposes an automatic in-
terpretation system that integrates free-
style sentence translation and parallel text
based translation. Free-style sentence
translation accepts natural language sen-
tences and translates them by machine
translation. Parallel text based translation
provides a proper translation for a sen-
tence in the parallel text by referring to a
corresponding translation of the sentence
and supplements free-style sentence trans-
lation. We developed a prototype of an au-
tomatic interpretation system for Japanese
overseas travelers with parallel text based
translation using 9206 parallel bilingual
sentences prepared in task-oriented man-
ner. Evaluation results show that the par-
allel text based translation covers 72% of
typical utterances for overseas travel and
the user can easily find an appropriate sen-
tence from a natural utterance for 64% of
typical traveler’s tasks. This indicates that
the user can benefit from reliable transla-
tion based on parallel text for fundamental
utterances necessary for overseas travel.
1 Introduction
A speech-to-speech translation system must inte-
grate at least three components — speech recogni-
tion, machine translation, and speech synthesis. In
practice, no component always outputs the correct
result, and an error in one component often leads
the system as a whole to produce an incorrect result,
even in a limited domain. Clearly, we need ways to
compensate for speech-to-speech translation systems
that cannot reliably produce a correct result.
Although some robust methods that make the er-
roneous results of other components acceptable have
been proposed (Wakita et al., 1997; Furuse et al.,
1998), there is no guarantee that the final output
from a system will be appropriate even with these
methods. To deal with this problem, we have taken a
more practical approach to developing an automatic
interpretation system where the user can obtain a
correct result instead of having to apply additional
operations and judgment.
In actual use of a speech-to-speech translation
system, an error in the speech-recognition or speech-
synthesis components is not a large problem if the
system has a screen that displays each result. The
user of the system can correct errors in the recogni-
tion result on the screen, and can communicate by
showing the other person the translated sentence on
the screen.
On the other hand, an error in the machine-
translation component is critical because a user who
is not familiar with the target language is unlikely
to notice the error in some cases. When a nonsensi-
cal sentence is generated by machine translation, the
user may realize that the listener does not understand
the translated sentence. However, when a plausible
sentence that means something diﬀerent from the in-
tended meaning is generated by the machine trans-
lation, the user may incorrectly assume that the ut-
terance was properly communicated. Consequently,
the user can seldom be sure that the listener cor-
rectly understood the intended meaning when using
a speech-to-speech translation system. A conversation
could continue for some time before it became
apparent that the two sides misunderstood what the
other was saying.

[Proceedings of the Workshop on Speech-to-Speech Translation:
Algorithms and Systems, Philadelphia, July 2002, pp. 85–92.
Association for Computational Linguistics.]
Moreover, if the user realizes that there is an er-
ror in the machine translation, correcting it will be
diﬃcult. Without knowing the source of the error,
the user cannot modify the input to obtain a correct
result.
These error problems severely limit the usability
of speech-to-speech translation.
In this paper, we propose an automatic interpreta-
tion system that integrates free-style sentence trans-
lation and parallel text based translation. In this sys-
tem, free-style sentence translation accepts natural
language sentences and translates them by machine
translation without guaranteeing the quality of the
translation. On the other hand, parallel text based
translation uses parallel bilingual sentences regis-
tered in the system and translates a registered sen-
tence by referring to the corresponding translation.
Although this translation process limits the input to
registered sentences, it is a robust means of han-
dling input with recognition errors and consistently
provides a correct translation. We integrated these
two types of translation to realize a robust transla-
tion system where the two types of translation com-
pensate for the shortcomings of each other.
For appropriate integration of free-style sentence
translation and parallel text based translation, we
had to consider three main points.
1. User interface: How best to present the two
functions to the user?
2. Content of registered sentences: How many ut-
terances should be covered by registered sen-
tences?
3. Retrieval system: What methods of searching
among the registered sentences should be pro-
vided to the user?
In this paper, we discuss these three points with
respect to a translation system for Japanese travelers
in the overseas travel domain. We construct a model
of the integration of free-style sentence translation
and parallel text based translation in Section 2. We
describe a prototype system based on the model in
Section 3 and evaluate it in Section 4. Related work
on translation systems utilizing parallel text are dis-
cussed in Section 5, and we conclude in Section 6.
2 The Integration Model
2.1 User Interface
Although parallel text based translation provides a
correct result, the registered parallel bilingual sen-
tences cannot cover all possible utterances by the
user in the target domain. Free-style sentence translation,
by contrast, accepts free-style input sentences
but provides no guarantee as to the quality of
its results.
For many routine situations, users will clearly
benefit from using parallel text based translation.
In such cases, the system will probably include a
sentence that totally or partially fits what they want
to say. To ensure high translation reliability, users
should use free-style sentence translation only for
utterances not covered by the registered sentences.
However, users usually will not know what sen-
tences are registered in the system and will have to
search for an appropriate sentence before they can
use parallel text based translation. In some cases, the
user will be forced to use free-style sentence trans-
lation if unable to find an appropriate sentence.
A seamless user interface that allows the user to
easily switch between free-style sentence transla-
tion and parallel text based translation is therefore
needed in a system integrating these two forms of
translation. Two conditions in particular had to be
met to make the system easy to use.
1. The user should be able to use an input sen-
tence seamlessly as both a source sentence for
free-style sentence translation and a key sen-
tence for registered sentence retrieval.
2. The user should be able to use each sentence in-
cluded in the results of the registered sentence
retrieval and the input sentence as a source
sentence for translation. (The former would
be used for parallel text based translation, and
the latter would be used for free-style sentence
translation.)
2.2 Content of Registered Sentences
Registered sentences must cover the utterances nec-
essary for accomplishing typical tasks in the target
domain to provide correct translation for minimal
communication. In a translation system for overseas
travelers, some typical tasks are changing money,
checking in at a hotel, and ordering at a restaurant.
We adopted a three-tier model that consists of
scenes, tasks, and subtasks to prepare a suﬃcient set
Table 1: Examples of scenes, tasks, subtasks, and templates of sentences

Scene      | Task     | Subtask                             | Template of sentence
Hotel      | Check-in | Checking in                         | I'd like to check in, please.
Hotel      | Check-in | Requesting a type of room           | I'd like a room with _an ocean view_.
Restaurant | Order    | Requesting how your steak is cooked | _Medium_, please.
Restaurant | Order    | Asking what they recommend          | What do you recommend for _appetizers_?
of necessary sentences to be registered in the sys-
tem. A scene comprised a place or situation that
corresponds to where a traveler is likely to be (e.g., a
hotel) and a problem that could arise. We made a list
of typical travelers’ tasks that would be necessary in
various travel scenes, divided each task into smaller
primitive tasks (subtasks), and assigned a sentence
template to each subtask based on the model.
In general, more than one round of conversation
is necessary to accomplish each task. We assumed
that a task would consist of smaller subtasks, each
of which would correspond to one round of conver-
sation that consisted simply of an utterance from a
traveler to a respondent and a response from the re-
spondent to the traveler. For example, the task of
checking in to a hotel consists of subtasks such as
giving your name, confirming your departure date,
and so on. Each subtask should be the smallest unit
of a task because users cannot use a registered sen-
tence eﬀectively if it includes more than what they
want to say.
In this way, only one sentence template is needed
for each subtask with regard to an utterance from a
traveler to a respondent. For example, we can assign
a sentence template of “I’d like to have ....” to the
subtask of ordering a dish in a restaurant. We can
provide a suﬃcient number of sentences by enabling
the user to fill in the part denoted as “...” (referred to
as a slot) with words applicable to the situation.
Table 1 shows examples of scenes, tasks, subtasks,
and sentence templates. A slot is the part of a
template to be filled in, and we define a list of
words individually for each slot.
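The one-template-per-subtask scheme with slot expansion can be sketched as follows. The template string, the "..." slot marker, and the word list are illustrative stand-ins, not the system's actual data.

```python
# Sketch of slot expansion for sentence templates; the template text
# and slot word list are illustrative, not the system's actual data.

def expand_template(template, slot_words):
    """One subtask gets one template; the '...' slot, when present,
    is filled with each applicable word to yield concrete sentences."""
    if "..." not in template:
        return [template]
    return [template.replace("...", word) for word in slot_words]

# E.g., the ordering subtask from Section 2.2:
dishes = ["a steak", "a salad"]
sentences = expand_template("I'd like to have ....", dishes)
# sentences[0] == "I'd like to have a steak."
```

Templates without a slot, such as "I'd like to check in, please.", pass through unchanged, which matches the observation that only one sentence template is needed per subtask.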
For each task, both the utterances from a traveler
to a respondent and the responses from a respon-
dent to a traveler are significant. Responses should
also be supported by parallel text based translation to
ensure reliable communication. However, inputting
the response and retrieving a registered sentence that
matches it will be diﬃcult and time consuming for
the respondent who is unlikely to be familiar with
the translation system.
We use a system that presents a menu of responses
for the respondent to choose from. The system keeps
typical responses in parallel bilingual form for each
registered sentence that the traveler can use and dis-
plays these as candidate responses when the traveler
uses the sentence. The system then shows the trav-
eler the translation of the response selected by the
respondent.
This approach enables travelers to obtain a reli-
able response and also enables respondents to easily
select an appropriate response.
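The linkage between traveler sentences and shared response-candidate sets might be organized along these lines; all identifiers, sample sentences, and responses here are hypothetical, standing in for the system's 1185 bilingual response templates.

```python
# Sketch of sharing response-candidate sets among traveler sentences
# (hypothetical data; the real system stores response candidates in
# parallel bilingual form per registered sentence).

# A named set of (English, Japanese) responses, reusable by several
# traveler sentences.
response_sets = {
    "yes_no": [("Yes.", "Hai."), ("No.", "Iie.")],
}

# Each traveler sentence is linked to the set of responses it can elicit.
sentence_to_responses = {
    "Do you have a room available?": "yes_no",
    "Can I pay by credit card?": "yes_no",
}

def candidates_for(sentence):
    """Bilingual responses the respondent can choose from."""
    return response_sets[sentence_to_responses[sentence]]
```

Sharing one set between several traveler sentences is what keeps the number of response templates well below the number of registered sentences.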
2.3 Retrieval System
The retrieval system to search for a registered sen-
tence that we use is based on a combination of three
conditions — the natural language sentence, scene,
and action.
Registered sentence retrieval based on a natural
language sentence is essential for seamless integra-
tion of free-style sentence translation and parallel
text based translation. We used a simple keyword-
based retrieval system for registered sentence re-
trieval. This system extracts keywords from an in-
putted natural language sentence, searches for sen-
tences including the keywords, and presents the re-
sults ranked mainly by the number of keywords in-
cluded in each sentence.
The system retrieves every sentence that includes at
least one of the keywords, to reduce the chance of an
appropriate sentence not being retrieved. We overcame
the increased retrieval noise in the result by applying
an additional retrieval system to search for registered
sentences in terms of the scene and action.
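A minimal sketch of this keyword-based retrieval follows, assuming a simple whitespace tokenizer and an illustrative stopword list; the system's actual keyword extraction is not specified in the paper.

```python
# Minimal sketch of keyword-based registered-sentence retrieval:
# extract keywords, collect every sentence sharing at least one
# keyword with the input, and rank by the number of shared keywords.
# The tokenizer and stopword list are illustrative assumptions.

STOPWORDS = {"a", "an", "the", "is", "to", "i", "please", "this", "from"}

def keywords(sentence):
    """Crude keyword extraction: lowercased words minus stopwords."""
    return {w.strip(".,?!").lower() for w in sentence.split()} - STOPWORDS

def retrieve(query, registered):
    """Registered sentences sharing >= 1 keyword with the query, best first."""
    q = keywords(query)
    scored = [(len(q & keywords(s)), s) for s in registered]
    return [s for n, s in sorted(scored, key=lambda t: -t[0]) if n >= 1]
```

Ranking by shared-keyword count puts close matches first while the "at least one keyword" threshold keeps recall high, at the cost of the retrieval noise the scene and action filters are then used to suppress.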
Each registered sentence to be retrieved for trans-
lation corresponds to a set of a scene, a task, and
a subtask as described in the previous section. A
scene represents a place or a situation where the user
wishes to accomplish the task and the subtask. A
task and a subtask represent a user’s actions. This
means that the user’s utterance is related to the user’s
intention regarding where (scene) the user wants to
do something (action).
We use the additional retrieval system in situa-
tions where the user has to search for sentences from
[Diagram: the Automatic Speech-to-speech Translation System, comprising
Speech Recognition, Machine Translation, Registered Sentence Retrieval,
Registered Sentence Translation, a Registered Sentence Database, and
Speech Synthesis; English input “How much is it?” yields Japanese output
“Sore wa ikura desuka.”]
Figure 1: The configuration of our prototype system
the points of view of scene and action.
1) Search by scene
The number of scenes where travelers are likely to
have a conversation is limited, and these scenes can
be systematically classified by place, such as an
airport, a hotel, or a restaurant.
We provide a directory-type search system that
can be used to search for sentences by scene. We
built up the travel-scene directory tree and assigned
sentences to the leaf nodes of the tree. When the
user selects a scene in the tree, sentences belonging
to that scene are presented to the user. The selected
scene does not change until the user selects another
scene in this search system since the user generally
will not move to a diﬀerent scene while talking.
2) Search by action
Since it is diﬃcult to represent actions with key-
words and a traveler’s range of probable actions
in overseas travel is limited, we also provided a
directory-type search system to search for sentences
by action. We constructed a directory tree of traveler
actions, and the user can obtain the sentences used
for an action by selecting the action from the tree.
By inputting a natural language sentence and se-
lecting a scene and an action, the user can obtain
sentences that include the keywords extracted from
the input sentence and that match the selected scene
and action. When the user selects a diﬀerent scene
or action, the system again searches through the reg-
istered sentences using the new condition regarding
the scene or action along with the original condition
that was not changed. This enables the user to dy-
namically adjust the search conditions.
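Combining the keyword condition with the scene and action conditions could look like the sketch below, where each registered entry carries its scene and action labels; the field names and sample data are illustrative, not the system's schema.

```python
# Sketch of combining keyword search with the scene and action
# directories: selecting a scene or action re-filters the keyword
# matches, and clearing it widens them again (illustrative data).

def search(entries, query_keywords, scene=None, action=None):
    """entries: dicts with 'text', 'keywords', 'scene', 'action'.
    Keyword matches are kept only if they satisfy whichever
    directory conditions are currently set."""
    hits = []
    for e in entries:
        shared = len(query_keywords & e["keywords"])
        if shared == 0:
            continue
        if scene is not None and e["scene"] != scene:
            continue
        if action is not None and e["action"] != action:
            continue
        hits.append((shared, e["text"]))
    return [text for _, text in sorted(hits, key=lambda h: -h[0])]

entries = [
    {"text": "I'd like some small change.", "keywords": {"small", "change"},
     "scene": "Telephone / Mail / Bank", "action": "Making a request"},
    {"text": "Keep the change.", "keywords": {"change"},
     "scene": "Restaurant", "action": "Making a request"},
]
```

Because the scene and action arguments are independent of the keyword set, changing one condition re-runs the search with the others left intact, which is the dynamic adjustment described above.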
Table 2: Top layer nodes of the scene directory
Scenes
Using Interpretation Machine
Basic Expressions
On the airplane
Airport
Hotel
Restaurant
Shopping
Transportation
Rent a car / Driving
Sightseeing / Entertainment
Telephone / Mail / Bank
Property loss / Incident / Accident
Sickness / Injury
Table 3: Action directory
Actions
Making a request
Asking for permission
Asking a question
Complaining
Explaining
Greeting
3 Prototype System
We have integrated free-style sentence translation
(Watanabe et al., 2000) and parallel text based trans-
lation based on the model described in the previous
section and built a new prototype system. Here, we
describe the system configuration, the contents of
the registered sentences in the system, and the scene
and action directories. We also explain how the user
operates the system interface.
3.1 System Configuration
The prototype system consists of six components
— speech recognition, machine translation, regis-
tered sentence retrieval, parallel text based transla-
tion, registered sentence database, and speech synthesis
(Figure 1). We have utilized the speech recognition,
machine translation, and speech synthesis described
in (Watanabe et al., 2000). The registered sentence
retrieval component searches the registered sentence
database using the system input. The parallel text
based translation component produces a translation
of the registered sentence selected by the user from
the search results provided by the registered sen-
tence retrieval component. The system input can
be used as the machine-translation target and as a
search key for registered sentence retrieval accord-
ing to the user’s instruction.
Figure 2: A sentence inputted to the automatic inter-
pretation system
Figure 3: The result of registered sentence retrieval
for the sentence in Figure 2
3.2 Registered Sentences
We first listed a traveler’s typical tasks in eleven
scenes where travelers often have to speak to people
and then made a list of typical subtasks by analyzing
the process necessary to accomplish each task. Next,
we composed a sentence template for each subtask
and a list of typical words that could be inserted into
each slot of the templates. We have composed 2590
templates, which can be used to generate 7410 sen-
tences with the slot word-lists, and have installed
these in the system.
We have also composed 1185 templates, which
can be used to generate 1796 sentences through
slot word expansion, as response candidates for the
respondent. Sharing a set of response candidates
among several sentences for the traveler decreases
the total number of response templates needed. A
set of response candidates is linked to every sentence
for the traveler to which the respondent can respond.
Figure 4: English translation of the first registered
sentence in Figure 3
Figure 5: English translation of the input sentence in
Figure 2
3.3 Scene and Action Directories
For each of the eleven scenes, we listed the relevant
tasks in a two-layered tree with 70 leaf nodes to cre-
ate a scene directory. Table 2 shows the top layer
nodes of the scene directory.
We used only the six actions listed in Table 3 for
the action directory and constructed a one-layered
tree since it is diﬃcult for the user to select the action
if actions are classified in detail.
3.4 User Interface
Figure 2 shows the display screen of the prototype
system. In this example, the user inputs the Japanese
sentence “Kono hoteru kara kūkō ni iku basu wa
arimasuka. (Is there a bus going to the airport from
this hotel?)” by speaking. The result of the speech
recognition is displayed in the input window at the
center of the screen.
When the user clicks the “kensaku jikkō (search)”
button on the screen, the system searches among
the registered sentences using the input sentence as
a key and displays the search result under the input
window (Figure 3). The sky-blue background in the
window indicates the sentence selected as the target
for translation. The user can select another sentence,
or the input sentence itself, by clicking it.

Figure 6: The result of registered sentence retrieval
for the sentence “Kozeni o irete kudasai. (I’d like
some small change.)”

Figure 7: The screened result for the scene “Denwa
· Yūbin · Ginkō (Telephone / Mail / Bank)”
When the user clicks the “honyaku (translate)”
button after selecting the first registered sentence
in Figure 3, the system retrieves an English trans-
lation of the sentence registered with the Japanese
sentence, displays it (Figure 4), and reads it through
the speech synthesis.
If the user cannot find an appropriate sentence
in the search results, the user can resort to free-
style sentence translation. When the user clicks the
“honyaku (translate)” button after selecting the input
sentence, the system translates it into English
through the machine translation, displays it (Figure
5), and reads it through the speech synthesis.

Figure 8: English translation of the registered sentence
“Kore wa chūmon to chigaimasu. (This is not
what I ordered.)”

Figure 9: Japanese translation of the first response
in Figure 8
In this way, the user can use free-style sentence
translation and parallel text based translation seam-
lessly for the same input sentence.
Next, we explain how a user can narrow down the
search result by using the directory.
Figure 6 shows the system display when the user
inputs the Japanese sentence “Kozeni o irete kuda-
sai. (I’d like some small change.)” by speaking and
searches for a matching registered sentence. The
search result is displayed in the lower central win-
dow. In this case, no appropriate sentence appears
among the higher ranking sentences.
In Figure 6, the scene directory is displayed in the
left part of the window. When the user selects the
scene “Denwa · Yūbin · Ginkō (Telephone / Mail
/ Bank)”, the search result is narrowed down to the
sentences associated with the “Telephone / Mail /
Bank” scene (Figure 7). The registered sentence
“Kozeni o mazete itadakemasuka. (I’d like some
small change.)” is then displayed second among the
results, and the user can use this sentence for
translation.
The user can similarly narrow down the result
with the action directory. If necessary, the user can
also use a combination of the scene and the action
directories.
We next explain how a respondent can respond by
selecting from among the response candidates regis-
tered in the system.
Figure 8 shows the screen of the system when the
user selects the registered Japanese sentence “Kore
wa chūmon to chigaimasu. (This is not what I ordered.)”
and translates it. The English translation is
displayed in the upper central window of the screen,
and the response candidates are listed in the lower
central window.
When the respondent selects the first response
from these and clicks the “Trans” button, the system
displays the Japanese translation of the response (Figure
9) and reads it through the speech synthesis.
In this way, the respondent can easily respond to
the traveler by selecting a response from among the
provided candidates when the traveler uses a regis-
tered sentence. The traveler can thus fully under-
stand the response.
4 Evaluation
In this section, we evaluate the prototype system
with respect to the extent that typical traveler utter-
ances are covered by the registered sentences and
whether the user can easily find a registered sentence
that matches a natural utterance.
4.1 Coverage provided by the Registered
Sentences
The system can provide correct translation if the in-
put sentence can be matched to a registered sen-
tence; otherwise, it can only provide a translation
through machine translation whose quality is uncer-
tain. To determine the proportion of commonly used
sentences for which the system would provide a cor-
rect translation, we evaluated the coverage provided
by the registered sentences for sentences randomly
extracted from travel conversation corpora.
Table 4 displays the evaluation results. In the table,
“closed set” denotes the result for sentences extracted
from the corpora we referred to when developing
the registered sentences, and “open set” denotes
the result for sentences extracted from corpora
not referred to when developing the registered
sentences. The registered sentences covered 72.6% of
the sentences in the closed set and 52.9% of the
sentences in the open set.

Table 4: Coverage provided by the registered sentences
for travel conversation corpora

           | Total | Covered
Closed set |  358  | 260 (72.6%)
Open set   |  308  | 163 (52.9%)

Table 5: Number of subtasks for which the proper
registered sentence was retrieved

Total subtasks | Retrieved
     116       | 74 (63.8%)
The coverage of almost 73% for the closed set
suggests that roughly 27% of the sentences in the
corpora are not used for typical travel conversation.
If the open set includes an equal proportion of atyp-
ical sentences, the registered sentences cover about
72% of the typical sentences used for travel in the
open set.
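The 72% estimate can be checked with a quick back-of-envelope computation under the stated assumption that the open set contains the same proportion of atypical sentences as the closed set:

```python
# Back-of-envelope check of the open-set coverage estimate, assuming
# the open set has the same proportion of atypical sentences as the
# closed set (counts taken from Table 4).

closed_total, closed_covered = 358, 260
open_total, open_covered = 308, 163

typical_rate = closed_covered / closed_total       # fraction deemed typical, ~0.726
open_typical = open_total * typical_rate           # estimated typical open-set sentences
coverage_of_typical = open_covered / open_typical  # ~0.73
```

The result, about 0.73, is in line with the roughly 72% figure quoted above.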
We therefore believe that our prototype system
can provide reliable translation for a minimum set
of utterances necessary for overseas travel.
4.2 Basic Performance of the Registered
Sentence Retrieval
Since registered sentences are retrieved mainly by
using sentences inputted with natural language, a
poorly performing retrieval system may prevent the
user from finding an appropriate sentence that is registered
in the system. To determine whether the sys-
tem can reliably retrieve an appropriate sentence
from the utterance the user first thinks of, we ran-
domly picked 116 subtasks for which registered sen-
tences had been developed, had experimental sub-
jects compose a natural sentence that could be used
to accomplish each subtask, and evaluated whether
the retrieval result when using a composed sentence
as a key included the registered sentence.
Table 5 shows the evaluation results. The user
could find a registered sentence corresponding to the
natural utterance with our retrieval system for 63.8%
of the traveler’s subtasks when a sentence for the
subtask was registered in the system.
Although this retrieval performance is not sufficient,
especially once the degradation caused by recognition
errors in the input sentence is taken into account,
we expect to improve it by adding expressions
synonymous with those in each sentence template to
the index used for registered sentence retrieval.
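The planned synonym expansion of the retrieval index might be sketched as follows; the synonym table and keyword sets are illustrative.

```python
# Sketch of the planned index improvement: register each sentence
# under synonyms of its keywords as well, so a differently worded
# utterance still retrieves it (the synonym table is illustrative).

SYNONYMS = {"change": {"coins"}, "restroom": {"bathroom", "toilet"}}

def build_index(keyword_sets):
    """Inverted index: keyword (and each of its synonyms) -> sentence ids."""
    index = {}
    for i, kws in enumerate(keyword_sets):
        for kw in kws:
            for term in {kw} | SYNONYMS.get(kw, set()):
                index.setdefault(term, set()).add(i)
    return index

index = build_index([{"small", "change"}, {"restroom"}])
# index["coins"] == {0}: a query mentioning "coins" now finds sentence 0.
```

The expansion happens entirely at indexing time, so the keyword-based retrieval itself needs no change.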
5 Related Work
Example-based machine translation has been pro-
posed as a method for translating with parallel bilin-
gual sentences (Nagao, 1984; Shirai et al., 1997; Fu-
ruse et al., 1994). An example-based machine trans-
lation system retrieves example sentences similar to
the input sentence, and translates by appropriately
assembling and adapting the retrieved sentence to
the input.
On the other hand, the parallel text based transla-
tion that we apply uses a translation that corresponds
to each sentence as it is. Our system thus enables
robust overall translation by allowing users to select
the most appropriate sentence for the situation.
The technique of translation memory also lets users
apply parallel bilingual sentences and is widely used
in commercial systems (Falcone, 2000). Previous
translation results are stored and reused to provide
similar translations for new input. This technique
is mainly used for document translation to improve
the eﬃciency of translation and to ensure the consis-
tency in translation results.
On the other hand, the parallel text based transla-
tion that is used in our system was designed to al-
low seamless cooperation with real-time speech-to-
speech translation. It is also equipped in advance
with sentences specially composed for likely
conversations.
6 Conclusion
We have developed an automatic interpretation sys-
tem by integrating free-style sentence translation
and parallel text based translation. The system pro-
vides a predefined always-correct translation when
the user can use a registered sentence. Users can
easily switch between the two forms of translation if
necessary.
Our prototype of the automatic interpretation sys-
tem for Japanese overseas travelers includes 9206
task-oriented parallel bilingual sentences for trav-
elers. The composed sentences cover 72% of the
sentences typically used during overseas travel. The
system also provides predefined response candidates
that a respondent can use when answering the trav-
eler.
Registered sentences are searched for in response
to natural language sentences input by the user. The
user can narrow down the search results by specify-
ing a scene and an action. We found that users could
find a registered sentence that corresponded to a nat-
ural utterance for 64% of traveler’s subtasks if a sen-
tence for the subtask was registered in the system.
Our next step is to improve the performance of
the registered sentence retrieval. We plan to add ex-
pressions synonymous with those in each sentence
template to the index used for registered sentence
retrieval. We also plan to restrict the search space by
estimating the scene from the dialogue context.

References
Suzanne Falcone. 2000. More translation memory tools
(not many more, but good ones). Translation Journal,
4(2).
Osamu Furuse, Eiichiro Sumita, and Hitoshi Iida. 1994.
Transfer-driven machine translation utilizing empirical
knowledge. Transactions of IPSJ, 35(3):414–425.
In Japanese.
Osamu Furuse, Setsuo Yamada, and Kazuhide Ya-
mamoto. 1998. Splitting long or ill-formed input for
robust spoken-language translation. In Proceedings of
COLING-ACL’98, pages 421–427.
Makoto Nagao. 1984. A framework of a mechanical
translation between Japanese and English by analogy
principle. In Alick Elithorn and Ranan Banerji, ed-
itors, Artificial and Human Intelligence, pages 173–
180. North-Holland.
Satoshi Shirai, Francis Bond, and Yamato Takahashi.
1997. A hybrid rule and example based method for
machine translation. In Proceedings of NLPRS-97,
pages 49–54.
Takao Watanabe, Akitoshi Okumura, Shinsuke Sakai,
Kiyoshi Yamabana, Shinichi Doi, and Ken Hanazawa.
2000. An automatic interpretation system for travel
conversation. In Proceedings of ICSLP 2000.
Yumi Wakita, Jun Kawai, and Hitoshi Iida. 1997. Correct
parts extraction from speech recognition results
using semantic distance calculation, and its application
to speech translation. In Proceedings of ACL/EACL-97
Workshop on Spoken Language Translation, pages 24–
31.
