Word Selection for EBMT based on Monolingual Similarity and Translation
Confidence
Eiji Aramaki
DDDE
, Sadao Kurohashi
DDDE
, Hideki Kashioka
DE
and Hideki Tanaka
DE
DD
Graduate School of Information Science and Tech. University of Tokyo
Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
CUaramaki, kuroCV@kc.t.u-tokyo.ac.jp
DE
ATR Spoken Language Translation Research Laboratories
2-2 Hikaridai, Seika, Soraku, Kyoto 619-0288, Japan
CUhideki.kashioka, hideki.tanakaCV@atr.co.jp
Abstract
We propose a method of constructing an
example-based machine translation (EBMT)
system that exploits a content-aligned bilingual
corpus. First, the sentences and phrases in the
corpus are aligned across the two languages,
and the pairs with high translation confidence
are selected and stored in the translation mem-
ory. Then, for a given input sentences, the
system searches for fitting examples based on
both the monolingual similarity and the transla-
tion confidence of the pair, and the obtained re-
sults are then combined to generate the transla-
tion. Our experiments on translation selection
showed the accuracy of 85% demonstrating the
basic feasibility of our approach.
1 Introduction
The basic idea of example-based machine translation, or
EBMT, is that translation examples similar to a part of
an input sentence are retrieved and combined to produce
a translation(Nagao, 1984). In order to make a practi-
cal MT system based on this approach, a large number
of translation examples with structural correspondences
are required. This naturally presupposes high-accuracy
parsers and well-aligned large bilingual corpora.
Over the last decade, the accuracy of the parsers im-
proved significantly. The availability of well-aligned
bilingual corpora, however, has not increased despite our
expectations. In reality, the number of bilingual cor-
pora that share the same content, such as newspapers and
broadcast news, has increased steadily. We call this type
of corpus a content-aligned corpus. With these observa-
tions, we started a research project that covered all as-
pects of constructing EBMT systems starting from using
Figure 1: Translation Example (TE).
a content-aligned corpus, i.e., a bilingual broadcast news
corpus.
First, the sentences and phrases in the corpus are
aligned across the two languages, and the pairs with high
translation confidence are selected and stored in the trans-
lation memory. Then, translation examples are retrieved
based on both the monolingual similarity and the trans-
lation confidence of the pair. Finally, these examples are
combined to generate the translation.
This paper is organized as follows. The next sec-
tion presents how to build the translation memory from
a content-aligned corpus. Section 3 describes our EBMT
system, paying special attention to the selection of trans-
lation examples. Section 4 reports experimental results
of word selection, Section 5 describes related works, and
Section 6 gives our conclusions.
* Underlined phrases and sentences have no parallel expressions in the other language.
Figure 2: NHK News Corpus.
2 Building Translation Memory
In EBMT, an input sentence can hardly be translated by
a single translation example, except when an input is ex-
tremely short or is a typical domain-dependent sentence.
Therefore, two or more translation examples are used to
translate parts of the input and are then combined to gen-
erate a whole translation. Syntactic information is useful
for composing example fragments.
In this paper, we call a structurally aligned bilingual
sentence pair a translation example or TE (Figure 1).
This section presents our method for building TEs from a
content-aligned corpus.
Since the bilingual corpus used in our project does not
contain literal translations, automatic parsing and align-
ment inevitably contain errors. Therefore, we selected
highly likely TEs to make a translation memory.
2.1 NHK News Corpus
We used a bilingual news corpus compiled by the NHK
broadcasting service (NHK News Corpus), which con-
sists of about 40,000 Japanese-English article pairs cov-
ering a five-year period. The average number of Japanese
sentences in an article is 5.2, and that of English sentence
is 7.4. Table 2 shows an example of an article pair.
As shown in Table 2, an English article is not a literal
translation of a Japanese article, although their contents
are almost parallel.
2.2 Sentence Alignment
We used a DP matching for bilingual sentence alignment,
where we allow the matching of 1-to-1, 1-to-2, 1-to-3, 2-
to-1 and 2-to-2 Japanese and English sentence pairs. This
matching covered 84% of the following evaluation set.
We selected 96 article pairs for the evaluation of sentence
and phrase alignment, and we call this the evaluation set.
We use the following score for matching, which is based
on a ratio of corresponding content words (WCR: content
Word Corresponding Ratio).
WCR BP
CF
CS
CF
CY
B7CF
CT
BN (1)
where CF
CY
is the number of Japanese content words in a
unit, CF
CT
is the number of English content words, and CF
CS
is the number of content words whose translation is also
in the unit, which is found by translation dictionaries}
We used the EDR electronic dictionary, EDICT,
ENAMDICT, the ANCHOR translation dictionary, and
Figure 3: Handling of Remaining Phrases.
Figure 4: WCR and Precision.
the EIJIRO translation dictionary. These dictionaries
have about two million entries in total.
On the evaluation data, the precision of the sentence
alignment (defined as follows) was 60.7%.
precision BP
# of correct system outputs
# of system outputs
(2)
Among types of a corresponding unit, the precision of
1-to-1 correspondence was the best, at 77.5%. Since a 1-
to-1 correspondence is suitable for the following phrase
alignment, we decided to use only the 1-to-1 correspon-
dence results.
2.3 Phrase Alignment
The 1-to-1 sentence pairs obtained in the previous sec-
tion are then aligned at phrase level by the method based
on (Aramaki et al., 2001). The method consists of the
following pre-process and two aligning steps.
Pre-process: Conversion to phrasal dependency struc-
tures.
First, the phrasal dependency structures of the sen-
tence pair are estimated. The English parser returns
a word-based phrase structure, which is merged into
a phrase sequence by the following rules and con-
verted into a dependency structure by lifting up head
phrases.
Table 1: Number of TEs.
Corpus WCR #ofTEs
0.3–0.4 18290
NHK News 0.4–0.5 6975
0.5– 2314
White Paper — 2225
SENSEVAL — 6920
1. Function words are grouped with the following
content word.
2. Adjoining nouns are grouped into one phrase.
3. Auxiliary verbs are grouped with the following
verb.
The Japanese parser outputs the phrasal dependency
structure of an input, and that is used as is. We used
The Japanese parser KNP (Kurohashi and Nagao,
1994) and The English nl-parser (Charniak, 2000).
Step 1: Estimation of basic phrasal correspondences.
We started with the word-level alignment to get the
basic phrasal alignment. We used translation dictio-
naries for this process. The word sense ambiguity
in the dictionaries is resolved with a heuristics that
the most plausible correspondence is near other cor-
respondences.
Step 2: Expansion of phrasal correspondences.
Finally, the remaining phrases, which were not han-
dled in the step 1, are merged into a neighboring
phrase correspondence or are used to establish a new
correspondence, depending on the surrounding ex-
isting correspondences. Figure 3 shows an example
of a new correspondence established by a structural
pattern.
These procedures can detect the phrasal alignments in
a pair of sentences as shown in Figure 1.
For phrase alignment evaluation, we selected all of the
145 sentence pairs that had 1-to-1 correspondences form
the evaluation set and gave correct content word corre-
spondences to these pairs. The phrase correspondences
detected by the system were judged correct when the cor-
respondences include the manually given content word
correspondences.
Based on this criterion, the precision of phrase align-
ment was 50%. Then, we found a correlation between
the phrase alignment precision and WCR of parallel sen-
tences as shown in Figure 4. Furthermore, the precision
of sentence alignment and WCR also have a correlation.
Since their performances nearly reaches their limits when
WCR is 0.3, we decided to use parallel sentences whose
WCR is 0.3 or greater as TEs.
Figure 5: Example of Translation.
2.4 Building Translation Memory
As explained in the preceding sections, among sentence-
aligned and phrase-aligned NHK News articles, TEs with
a 1-to-1 sentence correspondence and whose WCR is 0.3
or greater are registered in the translation memory. Table
1 shows the number of TEs for each WCR range.
In addition, the Bilingual White Paper and Translation
Memory of SENSEVAL2 (Kurohashi, 2001) were also
phrase-aligned and registered in the translation memory.
Sentence alignments are already given for these corpora.
Since their parallelism are fairly high and the accuracies
of their phrase alignments are more than 70%, we utilized
all phrase-aligned sentence pairs as TEs (Table 1).
3 EBMT System
Our EBMT system translates a Japanese sentence into
English. A Japanese input sentence is parsed and trans-
formed into a phrase-based dependency structure. Then,
for each phrase, an appropriate TE is retrieved from the
translation memory that is most suitable for translating
Figure 6: Selection of a TE.
the phrase (and its neighboring phrases). Finally, the En-
glish expressions of the TEs are combined to produce the
final English translation (Figure 5).
This section describes our EBMT system, mainly the
TE selection part.
3.1 Basic Idea of TE Selection
The basic idea of TE selection is shown in Figure 6.
When a part of the input sentence and a part of the TE
source language sentence have an equal expression, the
part of the input sentence is called I and the part of the
TE source language sentence is called S. A part of the TE
target language corresponding to S is called T. The pair S
and T is called fragment of TE (FTE).
I, S and T have to meet the following conditions, as a
natural consequence of the fact that S-T is used for trans-
lating I.
1. I, S and T are each structurally connected phrases.
2. I is equal to S except for function words at the
boundaries.
3. S corresponds to T completely, that is, all phrases in
S and T are aligned.
It might be the case that for an I, two or more FTEs that
meet the above conditions exist in the translation mem-
ory. Our method takes into account the following rela-
tions among I-S-T to select the best FTE:
1. The largest pair of I and S.
2. The similarity between the surroundings of I and
these of S.
3. The confidence of alignment between S and T.
The following sections concretely present how to cal-
culate these criteria. For simplicity of explanation, we
call a set of phrasal correspondences between S and T,
EQ; that neighboring EQ, CONTEXT; that between S and
T, ALIGN (Figure 6).
3.2 Monolingual Similarity between Japanese
Expressions
The equality between I and S is a sum of the equality
score of each phrase correspondence in EQ, which is cal-
culated as follows:
EQUALB4CXB5BP
C8
CB
CRD3D2D8
A2BE
AZ
CRD3D2D8
B7BCBMBEA2
C8
CB
CUD9D2CR
A2BE
AZ
CUD9D2CR
BN (3)
where AZ
CRD3D2D8
is the number of content words in the phrase
correspondence, AZ
CUD9D2CR
is the number of function words,
CB
CRD3D2D8
is the equality between content words, and CB
CUD9D2CR
is the equality between function words. CB
CRD3D2D8
and CB
CUD9D2CR
are given in Table 2.
1
Usually, the equality score between I and S is equal to
the number of phrases in I (the number of phrase corre-
spondences in EQ), but sometimes these are slightly dif-
ferent, depending on the conjugation type and function
words.
1
All constant values in Table 2 and formulas were decided
based on preliminary experiments.
On the other hand, the similarity between the surround-
ings of I and those of S is a sum of the similarity score of
each phrase correspondence in CONTEXT, which is cal-
culated as follows:
SIMB4CXB5BP
AQ
CG
CB
CRD3D2D8
A2BE
AZ
CRD3D2D8
B7BCBMBEA2
C8
CB
CUD9D2CR
A2BE
AZ
CUD9D2CR
AR
A2CB
CRD3D2D2CTCRD8
BM
(4)
Basically the calculation of SIM and EQUAL is the
same, except that SIM considers the relation type between
the phrase in I and its outer phrase by CB
CRD3D2D2CTCRD8
. When
the relation is the same, the influence of the surrounding
phrases must be large, so CB
CRD3D2D2CTCRD8
is set to 1.0; when the
relation is not the same, CB
CRD3D2D2CTCRD8
is set to 0.5. The rela-
tions between phases are estimated by the function word
or conjugation type of the dependent phrase.
The monolingual similarity between Japanese expres-
sions I and S is calculated as follows:
CG
CXBEBXC9
EQUALB4CXB5B7
CG
CXBEBVC7C6CCBXCGCC
SIMB4CXB5BM (5)
3.3 Translation Confidence of Japanese-to-English
Alignment
The translation confidence of phrase alignment between
S and T is the sum of the confidence score of each phrase
correspondence in ALIGN, CONF(CX) in Table 2, and it is
weighted by the WCR of the parallel sentences.
As a final measure, the score of I-S-T is calculated as
follows:
D2
CG
CXBEBXC9
EQUALB4CXB5B7
CG
CXBEBVC7C6CCBXCGCC
SIMB4CXB5
D3
A2
D2
CG
CXBEBTC4C1BZC6
CONFB4CXB5
D3
A2WCRBM (6)
3.4 Search Algorithm of FTE
For each phrase (P) in an input sentence, the most plausi-
ble FTE is retrieved by the following algorithm:
1. FTEs are retrieved from the translation memory, in
which a Japanese phrase matches P, and it is aligned
to an English phrase. (that is, these are FTEs that
meet the basic conditions for translation in Section
3.1).
2. For each FTE obtained in the previous step, it is
checked whether the surrounding phrase of P and
that of FTE are the same or similar, phrase by
phrase, and the largest I-S-T that meets the basic
conditions is detected.
Table 2: Parameters for Similarity and Confidence Calculation.
1.1 exact match
1.0 stem match
CB
CRD3D2D8
0.5 A2 CB
D2D8D8
+ 0.3 thesaurus match
0.3 POS match
0 otherwise
* CB
D2D8D8
is a similarity calculated based on NTT thesaurus(Ikehara et al., 1997) (max = 1).
1.1 exact match
CB
CUD9D2CR
1.0 stem match
0 otherwise
1.0 all content words in alignment CX correspond to each other in dic
CONF(CX) 0.8 some content words in alignment CX correspond to each other in dic
0.5 otherwise
3. The score of each I-S-T is calculated, and the best
I-S-T (S-T is the FTE) is selected as the FTE for P.
As a result of detecting FTEs for phrases in the input,
two FTEs starting from the different phrase might over-
lap each other. In such a case, we employed a greedy
search algorithm that adopts the higher score FTE one
by one; therefore, each previously adopted FTE is only
partly used for translation.
On the other hand, when no FTE is obtained for an in-
put phrase, a translation dictionary is utilized (when the
phrase contains two or more content words, the longest
matching strategy is used for dictionary look-up). When
two or more possible translations are given from the dic-
tionary, the most frequent phrase/word in the NHK News
Corpus is adopted.
Figure 5 shows examples of FTEs detected by our
method.
2
3.5 Generating a Target Sentence
The English expressions in the selected FTEs are com-
bined, and the English dependency structure is con-
structed. The dependency relations in FTEs are pre-
served, and the relation between the two FTEs is esti-
mated based on the relation of the input sentences. Figure
5 shows an example of a combined English dependency
structure.
When a surface expression is generated from its depen-
dency structure, its word order must be selected properly.
This can be done by preserving the word order in FTEs
and by ordering FTEs by a set of rules governing both the
dependency relation and the word-order.
The module for controlling conjugation, determiner,
and singular/plural is not yet implemented in our current
MT system.
2
As the bottom example in Figure 5 shows, EBMT can eas-
ily handle head-switching translation by using an FTE that con-
tains all of the head-switching phenomena in it.
4 Experiments
For evaluation, we selected 50 sentence pairs from the
NHK News Corpus that were not used for the translation
memory. Their source (Japanese) sentences were trans-
lated by our EBMT system, and the selected FTEs were
evaluated by hand, referring to the target (English) sen-
tences.
A phrase by phrase evaluation was done to judge
whether the English expression of the selected FTE was
good or bad. The accuracy was 85.0%.
In order to investigate the effectiveness of each com-
ponent of FTE selection, we compared the following four
methods:
1. EQCONTEXTALIGN: The proposed method.
2. EQALIGN: FTE score is calculated as follows, with-
out the CONTEXT similarity:
CG
CXBEBXC9
EQUAL(CX) A2
CG
CXBEBTC4C1BZC6
CONF(CX)A2 WCRBM (7)
3. EQCONTEXT: FTE score is calculated as follows,
without the ALIGN confidence:
CG
CXBEBXC9
EQUAL(CX) B7
CG
CXBEBVC7C6CCBXCGCC
SIM(CX)BM (8)
4. DICONLY: Word selection is based only on dictio-
naries and frequency in the corpus.
The accuracy of each method is shown in Table 3,
and the results indicate that the proposed method, EQ-
CONTEXTALIGN, is the best, that is, using context sim-
ilarity and align confidence works effectively. Figure 7
Figure 7: Word Selection by EQCONTEXTALIGN and DICONLY.
Table 3: Experimental Results.
Good Bad Accuracy
EQCONTEXTALIGN 268 47 85.0%
(246) (35) (87.5%)
EQALIGN 254 61 80.6 %
(233) (48) (82.9%)
EQCONTEXT 234 80 74.2%
(213) (68) (75.8%)
DICONLY 232 83 73.6%
* Values in brackets indicate the accuracy only for FTEs,
excluding cases in which the dictionary was used as a
backup.
shows examples of EQCONTEXTALIGN and DICONLY.
EQCONTEXTALIGN usually selects appropriate words,
compared to DICONLY.
When there are no plausible translation examples in the
translation memory, the system selects a low-similarity or
low-confidence FTE. However we believe this problem
will be resolved as the number of translation examples
increases, since the News Corpus is increasing day by
day.
5 Related Work
The idea of example based machine translation systems
was first proposed by (Nagao, 1984), and preliminary
systems that appeared about ten years (Sato and Na-
gao, 1990; Sadler and Vendelmans, 1990; Maruyama and
Watanabe, 1992; Furuse and Iida, 1994) showed the basic
feasibility of the idea.
Recent studies have focused on the practical aspects
of EBMT, and this technology has even been applied
to some restricted domains. The work in (Richardson
et al., 2001; Menezes and Richardson, 2001) addressed
the problem of technical manual translation in several
languages, and the work of (Imamura, 2002) dealt with
dialogues translation in the travel arrangement domain.
These works select the translation example pairs based
solely on the source language similarity. We believe this
is partly due to the high parallelism found in their cor-
pora.
Our work targets a more general corpus of wider cover-
age, i.e., the broadcast news collection. Generally avail-
able corpora like the one we use tend to be more freely
translated and suffer from lower parallelism. This com-
pelled us to use the criterion of translation confidence,
together with the criterion of monolingual similarity used
in the previous works. As we showed in this paper, this
metric succeeded in meeting our expectations.
6 Conclusion
In this paper, we described operations of the entire EBMT
process while using a content-aligned corpus, i.e., the
NHK Broadcast Corpus. In this process, one of the key
problems is how to select plausible translation examples.
We proposed a new method to select translation exam-
ples based on source language similarity and translation
confidence. In the word selection task, the performance
is highly accurate.
Acknowledgements
This work was supported in part by the 21st Century COE
program “Information Science and Technology Strate-
gic Core” at University of Tokyo and by a contract with
the Telecommunications Advancement Organization of
Japan, entitled “A study of speech dialogue translation
technology based on a large corpus”.

References
Eiji Aramaki, Sadao Kurohashi, Satoshi Sato, and Hideo
Watanabe. 2001. Finding translation correspondences
from parallel parsed corpus for example-based transla-
tion. In Proceedings of MT Summit VIII, pages 27–32.
Eugene Charniak. 2000. A maximum-entropy-inspired
parser. In In Proceedings of NAACL 2000, pages 132–
139.
Osamu Furuse and Hitoshi Iida. 1994. Constituent
boundary parsing for example-based machine transla-
tion. In Proceedings of the 15th COLING, pages 105–
111.
Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio
Yokoo, Hiromi Nakaiwa, Kentarou Ogura, and Yoshi-
fumi Oyama Yoshihiko Hayashi, editors. 1997.
Japanese Lexicon. Iwanami Publishing.
Kenji Imamura. 2002. Application of translation knowl-
edgeacquired by hierarchical phrase alignment for
pattern-based mt. In Proceedings of TMI-2002, pages
74–84.
Sadao Kurohashi and Makoto Nagao. 1994. A syntactic
analysis method of long Japanese sentences based on
the detection of conjunctive structures. Computational
Linguistics, 20(4).
Sadao Kurohashi. 2001. Senseval2 Japanese translation
task. In Proceedings of SENSEVAL2, pages 37–40.
Hiroshi Maruyama and Hideo Watanabe. 1992. The
cover search algorithm for example-based translation.
In Proceedings of TMI-1992, pages 173–184.
Arul Menezes and Stephen D. Richardson. 2001. A best-
first alignment algorithm for automatic extraction of
transfer mappings from bilingual corpora. In Proceed-
ings of the ACL 2001 Workshop on Data-Driven Meth-
ods in Machine Translation, pages 39–46.
Makoto Nagao. 1984. A framework of a mechanical
translation between Japanese and english by analogy
principle. In In Artificial and Human Intelligence,
pages 173–180.
Stephen D. Richardson, William B. Dolan, Arul
Menezes, and Monica Corston-Oliver. 2001. Over-
coming the customization bottleneck using example-
based mt. In Proceedings of the ACL 2001 Work-
shop on Data-Driven Methods in Machine Translation,
pages 9–16.
V. Sadler and R. Vendelmans. 1990. Pilot implementa-
tion of a bilingual knowledge bank. In Proeedings of
the 13th COLING, pages 449–451.
Satoshi Sato and Makoto Nagao. 1990. Toward memory-
based translation. In Proceedings of the 13th COLING,
pages 247–252.
