Automatic Construction of a Transfer Dictionary
Considering Directionality
Kyonghee Paik, Satoshi Shirai and Hiromi Nakaiwa
fkyonghee.paik,hiromi.nakaiwag@atr.jp * sat@fw.ipsj.or.jp
ATR Spoken Language Translation Laboratories
2-2-2, Keihanna Science City Kyoto, Japan 619-0288
 NTT Advanced Technology Corporation
12-1, Ekimaehoncho, Kawasaki-ku, Kawasaki-shi, Japan 210-0007
Abstract
In this paper, we show how to construct a
transfer dictionary automatically. Dictionary
construction, one of the most di cult tasks
in developing a machine translation system, is
expensive. To avoid this problem, we investi-
gate how we build a dictionary using existing
linguistic resources. Our algorithm can be ap-
plied to any language pairs, but for the present
we focus on building a Korean-to-Japanese
dictionary using English as a pivot. We
attempt three ways of automatic construction
to corroborate the e ect of the directionality
of dictionaries. First, we introduce \one-time
look up"method using a Korean-to-English and
a Japanese-to-English dictionary. Second, we
show a method using \overlapping constraint"
with a Korean-to-English dictionary and an
English-to-Japanese dictionary. Third, we con-
sider another alternative method rarely used
for building a dictionary: an English-to-Korean
dictionary and English-to-Japanese dictionary.
We found that the  rst method is the most
e ective and the best result can be obtained
from combining the three methods.
1 Introduction
There are many ways of dictionary building.
For machine translation, a bilingual transfer
dictionary is a most important resource. An in-
teresting approach is the Papillon Project that
focuses on building a multilingual lexical data
base to construct large, detailed and principled
dictionaries (Boitet et al., 2002). The main
source of multilingual dictionaries is monolin-
gual dictionaries. Each monolingual dictionary
is connected to interlingual links. To make
this possible, we need many contributors, ex-
 Some of this research was done while at ATR.
perts and the donated data. One of the stud-
ies related to the Papillon Project tried to link
the words using de nitions between English and
French, but the method can be extended to
other language pairs (Lafourcade, 2002). Other
research that focuses on the automatic build-
ing of bilingual dictionaries include Tanaka and
Umemura (1994), Shirai and Yamamoto (2001),
Shirai et al. (2001), Bond et al. (2001), and
Paik et al. (2001).
Our main concern is automatically building
a bilingual dictionary, especially with di erent
combinations of dictionaries. None of the re-
search on building dictionaries seriously consid-
ers the characteristics of dictionaries. A dic-
tionary has a peculiar characteristic according
to its directionality. For example, we use a
Japanese-to-English (henceforth, J)E) dictio-
nary mainly used by Japanese often when they
write or speak in English. Naturally, in this sit-
uation, a Japanese person knows the meaning
of the Japanese word that s/he wants to trans-
late into English. Therefore, an explanation for
the word is not necessary, except for the words
whose concept is hard to translate with a single
word. Part-of-speech (henceforth POS) infor-
mation is also secondary for a Japanese person
when looking up the meaning of the correspond-
ing equivalent to the Japanese word.
On the other hand, an English-to-Japanese
(henceforth E)J) dictionary is basically used
from a Japanese point of view to discover the
meaning of an English word, how it is used and
so on. Therefore, explanatory descriptions, ex-
ample sentences, and such grammatical infor-
mation as POS are all important. As shown in
(2), a long explanation is used to describe the
meaning of tango, its POS and such grammat-
ical information as singular or plural. Also, an
E)J dictionary includes the word in plenty of
examples, comparing to a J)E dictionary. The
following examples clearly show the di erence.
(1) J)E: a0a2a1a4a3 : a5 dancea6 the tango a5a8a7a7a7 sa6
(2) E)J: tan a9 go /(n. pl a7a7a7 s)
a0a10a1a10a3 :a.
a11a10a12a2a13a2a14a16a15a18a17a20a19a22a21a24a23a26a25a10a27a10a28a18a23a24a29a31a30 ..etc.
(trans. tango \a dance of Central African abo-
riginals,...etc.")b. a32a22a23a24a33 (trans. \its music")Vi
a0a2a1a2a3a2a34a36a35a38a37 (\to dance the tango").
In this paper, we evaluate the e ects that occur
when we use di erent combinations of dictio-
naries and merge them in di erent ways.
2 Conventional Methods and
Problems
The basic method of generating a bilingual
dictionary through an intermediate language
was proposed by Tanaka and Umemura
(1994). They automatically constructed a
Japanese-French dictionary with English as an
intermediate language and manually checked
the extracted results. In this sense, their
method is not completely automatic. They
looked up English translations for Japanese
words, and then French translations of these
English translations. Then, for each French
word, they looked up all of its English trans-
lations. After that, they counted the number
of shared English translations (one-time
inverse consultation). This was extended to
\two-time inverse consultation". They looked
up all the Japanese translations of all the
English translations of a given French word
and counted how many times the Japanese
word appears. They reported that \comparing
the generated dictionary with published dic-
tionaries showed that data obtained are useful
for revising and supplementing the vocabulary
of existing dictionaries." Their method shows
the basic method of building a dictionary using
English as an intermediate language. We ap-
plied and extended their method in automatic
dictionary building especially considering the
directionality of dictionaries.
Tanaka and Umemura (1994) used four dic-
tionaries in two directions (J)E, E)J, F)E
and E)F). They  rst harmonized the dictio-
naries by combining the J)E and E)J into
a single J,E and the F)E and E)F into a
harmonized F,E dictionary. We followed their
basic method without harmonizing the dictio-
naries to emphasize the in uence of directional-
ity.
In general, foreign word entries in a bilingual
dictionary attempt to cover the entire vocabu-
lary of the foreign language. However, foreign
words that do not correspond to one’s mother
tongue are not recorded in a bilingual dictio-
nary from one’s mother tongue to the foreign
language (Hartmann, 1983). A long explana-
tory phrase is replaced with a word that often
does not perfectly correspond to the original.
On the other hand, most of the index words
from a foreign language to a mother tongue in-
clude many expository de nitions or explana-
tions that focus on usage. Such syntactic infor-
mation as POS and number as well as exam-
ple sentences are rich compared with a dictio-
nary from mother tongue to a foreign language.
These characteristics should be considered when
building a dictionary automatically.
Bond et al. (2001) showed how semantic
classes can be used along with an intermediate
language to create a Japanese-to-Malay dictio-
nary. They used semantic classes to rank trans-
lation equivalents so that word pairs with com-
patible semantic classes are chosen automati-
cally as well as using English to link pairs. How-
ever, we cannot use this method for languages
with poor language resources, in this case se-
mantic ontology. Paik et al. (2001) improved
the method to generate a Korean-to-Japanese
(henceforth K)J) dictionary using multi-pivot
criterion. They showed that it is useful to build
dictionaries using appropriate multi-pivots. In
this case, English is the intermediate language
and shared Chinese characters between Korean
and Japanese are used as pivots.
However, none of the above methods con-
sidered the directionality of the dictionaries in
their experiments. We ran three experiments to
emphasize the e ects of directionality.1 There
are many approaches to building a dictionary.
But our focus will be on the generality of build-
ing any pair of dictionaries automatically using
English as a pivot. In addition, we want to con-
 rm various directionalities between a mother
tongue and a foreign language.
1The  rst two experiments were reported in Shirai
and Yamamoto (2001) and Shirai et al. (2001). We
present new evaluations in this paper.
3 Proposed Method
We introduce three ways of constructing a K)J
dictionary. First, we construct a K)J dic-
tionary using a K)E dictionary and a J)E.
Second, we show another way of constructing
a K)J dictionary using an K)E dictionary
and an E)J dictionary. Third, we use a novel
way of dictionary building using an E)K and
E)J to build a K)J dictionary. However, our
method is not limited to building a K)J dic-
tionary but can be extended to any other lan-
guage pairs so long as X-to-English or English-
to-X dictionaries exist. These three methods
will cope with making dictionaries using any
combination.
We assume that the following conditions hold
when building a bilingual dictionary: (1) Both
the source language and the target language
cannot be understood (to build a dictionary
of unknown language pairs); (2) Various lex-
ical information of the intermediate language
(English) is accessible. (3) Limited information
about the source and target language may be
accessible.
3.1 Lexical Resources
Our method can be extended to any other
language pairs if there are X-to-English and
English-to-X dictionaries. It means that there
are four possible combinations such as i)
X-to-English and Y-to-English, ii) X-to-English
and English-to-Y, iii) English-to-X and Y-to-
English and iv) English-to-X and English-to-Y
to build a X-to-Y dictionary. We tested i), ii)
and iv) in this paper and we used the following
dictionaries in our experiment:
Type # Entries Dictionary
J)E 28,310 New Anchor 2
E)J 52,369 Super Anchor3
K)E 50,826 Yahoo K)E 4
E)K 84,758 Yahoo E)K 4
3.2 Linking K)E and J)E
Our method is based upon a one-time in-
verse consultation of Tanaka and Umemura
(1994)( See Section 2.) to judge the word cor-
respondences of Korean and Japanese.
Lexical Resources used here is a K)E dic-
tionary (50,826 entries) and a J)E dictionary
2(Yamagishi et al., 1997) 3 (Yamagishi and Gunji,
1991) 4 http://kr.engdic.yahoo.com
(28,310 entries). There is a big di erence in the
number of entries between the two dictionaries.
This will a ect the total number of extracted
words.
For Evaluation, we use a similarity score S1
for a Japanese word j and a Korean word k is
given in Equation (1), where E(w) is the set of
English translations of w. This is equivalent to
the Dice coe cient. The extracted word pairs
and the score are evaluated by a human to keep
the accuracy at approximately 90%.
S1(j;k) = 2  jE(j) \ E(k)jjE(j)j + jE(k)j (1)
The most successful case is when all the En-
glish words in the middle are shared by K)E
and J)E. Figure 1 shows how the link is real-
ized and the similarity scores are shown in Table
1. The similarity score shows how many English
words are shared by the two dictionaries: the
higher the score, the higher possibility of suc-
cessful linking. However, as Table 1 shows, we
have to sort out the inappropriately matched
pairs by comparing the S1 score of equation (1)
against a threshold  . The threshold allows us
to exclude unfavorable results. For example,
for words having one shared English translation
equivalent, we have to discard the group (3) in
Table 1.
When the words translated from English
match completely, the accuracy is high. And if
the number of shared English translated words
(jE(J) \ E(K)j) is high, then we get a high
possibility of accurate matching of Korean and
Japanese. However, accuracy deteriorates when
the number of the shared English translated
words (shown by the threshold) decreases as
in (2) and (3) of Table 1. We solved this
problem by varying the threshold according
to the number of shared English equivalents.
The value of the threshold  was determined
experimentally to achieve an accuracy rate of
90%.
Result: Linking through English gives a to-
tal of 175,618 Korean-Japanese combinations.
To make these combinations, 28,479 entries out
of 50,826 from the K)E dictionary and 17,687
entries out of 28,310 from the J)E dictionary
are used. As a result, we can extract 25,703 es-
timated good matches with an accuracy of 90%.
Korean English Japanese
check
a0a2a1a4a3
a5 cheque
a6a8a7a8a9
examine a10a12a11a8a13a15a14
a16
a17a19a18
a20 prevent
a21a23a22a8a13a15a14
prevent from
Figure 1: Linking through English translation equivalents (K)E, J)E)
Shared Eng.  Korean ) English Japanese ) English
(1) 2 1.000 (a24a26a25a28a27a29 check;cheque) (a30a32a31a32a33 check;cheque)
2 1.000 (a34a35a23a36a37 check;cheque) (a30a32a31a38a33 check;cheque)
(2) 1 .667 (a24a26a25a28a27a29 check;cheque) (a39a41a40 check)
(3) 1 .500 (a24a26a25a28a27a29 check;cheque) (a42a44a43a38a45a47a46 check;examine)
1 .400 (a24a26a25a28a27
a29 check;cheque) (
a48a50a49a38a45a47a46 prevent from;prevent;check)
1 .333 (a34a35a23a36a37 check;cheque) (a51a12a52a53a46 leave;deposit;check;entrust)
Table 1: Example of linking through English translations
Shared Eng5 Extracted  Good matches
7 1 0 1
6 1 0 1
5 16 0 16
4 165 0 165
3 1,325 0.4 1,206
2 12,037 0.5 7,401
1 161,863 0.667 16,790
Total 175,408 25,580
Table 2: Matching words by K)E +J)E
3.3 Linking K)E and E)J
Method: We investigated how to improve
the extraction rate of equivalent pairs using
an overlapping constraint method here.
To extract Korean-Japanese word pairs, we
searched consecutively through a K)E dictio-
nary and then an E)J dictionary. We take
English sets corresponding to Korean words
from a Korean-English dictionary and Japanese
translation sets for each English words from an
E)J dictionary. The overlap similarity score
S2 for a Japanese word j and a Korean word
k is given in Equation (2), where E(w) is the
set of English translations of w and J(E) is the
bag of Japanese translations of all translations
of E.
S2(j;k) = jjj;j 2 J(E(k)); (2)
After that, we test the narrowing down of trans-
lation pairs by extracting the overlapped words
in the Japanese translation sets. See Figure 2.
Lexical Resources: We used a K)E dictio-
nary (50,826 entries), the same as the one used
in section 3.2 and a E)J dictionary (52,369 en-
tries). Compared to the resources used in our
 rst method, the number of entries are well bal-
anced.
Evaluation: After extracting the over-
lapped words in the Japanese translation sets,
the words were evaluated by humans. The
main evaluation was to check the correlation
between the overlaps and the matches of
Korean and Japanese word pairs. Table 3
shows the overlapped number of shared English
words and the number of index words of the
K)E dictionary.
Overlaps Num of entries in K)E
4 or more 1,286
3 3,097
2 13,309
1-to-1 match 1,315
Subtotal 19,007
Other match 8,832
No Match 22,987
Total 50,826
Table 3: The number of entries in K)E dictio-
nary according to overlapped English words
Result: Entries with a 1-to-1 match have
jE(K)j = jE(J)j = 1. These are generally good
matches (90%). If more than two overlaps oc-
cur, then the accuracy matching rate is as high
as 84.0%. It means that the number of useful
entries is the sum of the 1-to-1 matches and 2 or
more overlaps: 19,007 (37.4% of the K)E en-
tries) with 87% accuracy. However, using K)E
and E)J there is a problem of polysemy in En-
glish words. For example, clean has two di er-
ent POSs, adjective and verb in a K)E dictio-
Korean English Japanese Overlaps
a0a2a1 1
clean a3a5a4a7a6a9a8 1
a3a5a4a7a6a11a10a23a13a15a14 1
neatly a3a5a4a7a6a11a10 2
a12a14a13a16a15
a17a18a20a19a22a21
a9a24a23a7a25
a1 1
tidily a3a27a26a9a28a30a29 2
a31a11a32
a29 2
cleanly - 0
Figure 2: Overlapping Translation equivalents (K)E, E)J)
nary. Unfortunately, this information cannot be
used e ectively due to the lack of POS in K)E
when linking them to a E)K dictionary. On
the other hand, clean using E)J can be trans-
lated into either a3a33a4a30a6a34a8 , an adjective or a3a35a4
a6a11a10a53a13 a14 , a verb. This makes the range of over-
lap score widely distributed as shown in Figure
2. This is the reason using K)E and E)J is
not as good as using K)E and J)E. We will
discuss this more in section 4.
3.4 Linking E)K and E)J
As we have discussed in earlier sections, the
characteristics of dictionaries di er according to
their directionality. In this section, we intro-
duce a novel method of matching translation
equivalents of Korean and Japanese. From the
Korean speaker’s point of view, the E)K dic-
tionary covers all English words, includes ex-
planatory equivalents, and example sentences
showing usage. The same thing is true for the
E)J dictionary from a Japanese speaker’s point
of view. In this respect, we expect that the
result of extraction is not as e ective as the
other combinations such as K)E +J)E and
K)E +E)J. On the other hand, we think that
there must be other ways to exploit explanatory
equivalents and example sentences.
Method: First, we linked all the Korean
and Japanese words if there is any shared En-
glish words. Then, we sorted them according to
POSs to avoid the polysemous problem of POS.
The left hand side of Figure 3 shows how we
link Korean and Japanese pairs.
Lexical Resources: We used a E)K dic-
tionary (84,758 entries) and a E)J dictionary
(52,369 entries). Both of the dictionaries have
many more entries than the ones used in the
previous two methods.
Evaluation: We use similarity score S3 in
Equation (3) as a threshold which is used to
extract good matches.
S3(k;j) = jK(E(k) \ E(j))j + jJ(E(k) \ E(j))jjE(k) \ E(j)j
(3)
K(W): bag of Korean translations of set W
J(W): bag of Japanese translations of set W
E(w): set of English translations of word w
jK(E)j means the number of Korean trans-
lation equivalents, andjJ(E)j means the num-
ber of Japanese translation equivalents. The
sum of the numbers is divided by the number
of intermediate English words. It is used to re-
duce the polysemous problem of English words.
It is because it is hard to decide which trans-
lation is appropriate, if an English word has
too many translation equivalents in Korean and
Japanese. The value of threshold (S3) is shown
in Table 4. We vary the threshold according
to N = jE(j) \ E(k)j to maximize the number
of successful matches experimentally. N repre-
sents the number of intermediate English words.
For N=1, we only count one-to-one matches,
which means one Korean and one Japanese are
matched through only one English. The follow-
ing are examples of being counted when N is
1-to-1: e.g. a36a38a37a14a39 a21a41a40a43a42a44a46a45 a21 -autosuggestion(n.)- a47a33a48a27a49
a50 ,
a51
a42
a52a54a53
a55 (
a56
a57a58 )
a59 -billiard(a.)-a60a11a61a30a3a35a62 , etc. We may
lose many matching pairs by this threshold, but
the accuracy rate for 1-to-1 is very high (96.5%).
To save other matches when N=1, we need to
examine further. In our experiment, a63 a21a64a19a66a65 a56a67 , a68
a69a71a70
a6 is rejected because lovely has two Korean
translations and two Japanese translations; the
match a63
a21a64a19a66a65
a56
a67 ,
a68
a69a24a70
a6 is not 1-to-1. We post-
pone this part to further research.
N Extracted Matched Good S3 Extracted Matched Good
24-6 438 422 96.3% any 438 422 96.3%
5 313 301 96.2%  35 302 293 97.0%
4 790 698 88.3%  25 661 601 90.9%
3 2,432 1,960 80.7%  10 634 586 92.4%
2 12,862 (6,784) (52.8%)  10 3,613 (3,150) (87.2%)
 1[-to-1] 4,712 (4,547) (96.5%) 2 4,712 (4,547) (96.5%)
21,547 (14,712) (68.3%) 10,360 (9,599) (92.7%)
Table 4: Summary of matching words by E)K and E)J
N: Number of total English translation equivalents
 : We only count word pairs under the condition of 1-to-1 match.
Korean English Japanese Examples N S3 Matches
a63
a21a64a19a66a65
a56
a67
a68
a69a71a70
a6
lovely (a.) a0 a70 a6 a63 a21a22a19a66a65 a56a67 ,a68 a69a71a70 a6 1 (2+2)/1=4.0 N
a19
a37a2a1
a3a4a6a5
a37 a56
a67  ne (a.)
a0a8a7a24a8 a63
a21a22a19a66a65
a56
a67 ,
a0
a70
a6 1 (2+2)/1=4.0 N
beautiful(a.) a13a10a9 a69a71a70 a6 a19 a37a2a1a3a4a6a5 a37 a56a67 ,a0 a70 a6 4 (9+11)/4=5.0 Y
a11
a12
a56
a67 a13a8a14
a8
a19
a37a2a1
a3a4a6a5
a37 a56
a67 ,
a0a8a7a24a8 2 (5+7)/2=6.0 Y
a15
a16a58a18a17a20a19
a52a22a21a24a23
a25 fair (a.)
a15
a16a58a18a17a20a19
a52a22a21a24a23
a25 ,
a26a27a4a28a27 1 (3+4)/1=7.0 N
a29
a30a31
a56
a3a32
a26a27a4a10a27
Figure 3: An example of matching E)K and E)J
Result: Table 4 shows the extracted 21,564
pairs of Korean and Japanese words. On av-
erage, 14,712 pairs match with a 68.3% suc-
cess rate. The numbers in parentheses are esti-
mated.
As expected, by setting this threshold we get
fewer extracted words such as 10,360 words as
shown in Table 4. However, the accuracy of the
matched word pairs averages 92.7%.
Comparison: To compare the three meth-
ods, we randomly chose 100 Korean words from
a K)J dictionary6 which could be matched
through all three methods. The number of
extracted matches was 28 using K)E and
J)E, 34 using K)E and E)J, and 13 using
E)K and E)J. For K)E and E)J method,
21 out of 34 K)J pairs were found only in
K)E and E)J method but not in K)E and
J)E method. Among the 21 new K)J word
pairs, only one pair is an error (not a good
match). One new pair was found in E)K and
E)J method. Therefore, combining all three
methods gave 49 (28+20+1) di erent K)J
pairs, a better result than any single method.
These results are shown in Table 5. Clearly
6We used Korean-Japanese dictionary
(Shogakukan: 1993) for the sampling that includes
110,000 entries, many of which are used infrequently.
the dictionaries used greatly a ect the number
of matches. The number of matches could be
improved by considering English derived forms
(e.g. matching con rmation with con rm).
K)E +J)E K)E +E)J E)K +E)J
Total 28 34 13
Good 28 33 10
Error 0 1 3
Table 5: Comparison of the Proposed Methods
4 Discussion
We have shown the results of di erent match-
ing metrics for di erent dictionary directions.
Directionality is an important matter for
building dictionaries automatically. In a K)E
(or J)E) dictionary an index word contains
non-conjugated forms whereas an index word
in E)K (or E)J) dictionary contains POS
and conjugated forms. Therefore we expect the
combination of K)E and J)E to be better
than K)E and E)J since we can avoid the
mismatch of POS.
On the other hand, a dictionary E)K or
E)J contains less uniform information such
as long expository terms, grammatical explana-
tions and example sentences. Especially, POS
is far more detailed than the dictionaries of the
other direction. These all contribute to fewer
good matching words.
As for the better result using K)E and J)E,
we cannot overlook language similarity: Korean
and Japanese are very similar with respect to
their vocabularies and grammars. This must
have result in sharing relatively more appropri-
ate English translations and further matching
more appropriate Korean and Japanese trans-
lation equivalents.
In the combination of K)E and E)J, the
common English translations are reduced due to
the characteristics of K)E and E)J. A K)E
dictionary from the Korean speaker’s point of
view tends to have relatively simple English
equivalents and normally POS is not shown. On
the other hand, an E)J dictionary shows such
complicated equivalents as explanation of the
entry a, a piece of translation equivalent b and
grammatical information as shown in (2) in Sec-
tion 1. Therefore, it is natural that the match-
ing rate is far less than the combination of K)E
and J)E. Considering the size of dictionaries
used in K)E and J)E (estimated maximum
matches: 28,310 K)J pairs) and the one used in
K)E and E)J (estimated maximum matches:
50,826 K)J pairs), we extrapolate from Table 5
that the method using K)E and J)E is better
than the method using K)E and E)J.
We concluded that: K)E + J)E outper-
forms K)E + E)J which outperforms E)K
+ E)J. The following brie y summarizes the
three methods.
 K)E + J)E:
{ Equal characteristics of the dictionaries
{ The meaning of the registered words tends to
be translated to a typical, core meaning in
English
{ Synergy e ect: Korean and Japanese are very
similar, leading to more matching.
 K)E + E)J:
{ The combination of di erent characteristics
of dictionaries makes automatic matching less
successful.
{ A core meaning is extended to a peripheral
meaning at the stage of looking up E)J. (See
Figure 2.)
 E)K + E)J:
{ There are far fewer matches.
{ We can take advantage of example sentences,
expository terms, and explanations to extract
functional words.
{ We can improve accuracy by including En-
glish POS data.
Even though we expected that the combina-
tion of dictionaries between E)K and E)J will
not provide a good result, it is worthwhile to
know limits. After analyzing all of the result,
we found that there is the e ect of dictionary
directionality. Also, we con rm that if we can
use all the methods and combine them, we will
get the best result since the output of the three
dictionary combinations do not completely over-
lap.
Future Work
Our goal is not restricted to making a Korean-
Japanese dictionary, but can be extended to any
language pair. We assume that we do not know
the source and target languages so well that it
is not easy to match just the content words. In-
stead, we need to match automatically any kind
of entries, even such functional words as parti-
cles, su xes and pre xes. We think that it is
best to extract these functional words by tak-
ing advantage of the characteristics of the E)K
and E)J dictionaries. For example, one of the
merits of using E)K and E)J is that we can
get conjugated forms such as the Korean adjec-
tive a0 a1a3a2a4a5a7a6 a1a9a8a10 which matches the English adjec-
tive beautiful; it is normally not registered in
a K)E dictionary because a0 a1a3a2a4a5a7a6 a1a9a8a10 is an ad-
jective conjugated form of the root a0 a1a3a2a4a5a12a11a14a13a15 a6 a1 .
Only the root forms are registered in an X-to-
English dictionary. Also for verbs, we can get
non  nite forms using E)K and E)J dictionar-
ies. As index word, the non-conjugated forms
are registered in a J)E dictionary such as a3a3a3
a4a4a4 a6a6a6a17a16a16a16 meaning beautiful or clean. However, by
using E)J, we can get conjugated forms such
as a3a3a3a5a4a4a4a7a6a6a6a11a10a10a10 , a3a3a3a20a4a4a4 a6a6a6 a8a8a8 and so forth. Registering
all conjugated forms in a dictionary simpli es
the development of a machine translation sys-
tem and further second language acquisition.
The direction from English-to-X contains a
lot of example sentences. So far, the idea of us-
ing example sentences and idiomatic phrases for
dictionary construction has not been adopted.
To check the possibility of extracting functional
words, we extracted example sentences and id-
iomatic phrases from E)J and E)K dictionar-
ies based upon the number of shared English
words and look into the feasibility of using them
to extract functional words.
We extracted a total of 1,033 paraphrasing
sentence pairs between Korean and Japanese
with  ve or more shared English words. Among
them, 465 sentences (45%) matched all the En-
glish exactly (=), and 373 sentences (36.1%) al-
most ( ) matched. We give examples below:
= (10) "as for me, give me liberty or give me
death." a0a2a29 a70a2a1a4a3 a47a6a5a8a7a10a9 a69 a4a71a8a12a11a33a4 a9a10a13a71a28
a16a15a14a17a16a2a7a19a18
a70
a16 .
"as for me, give me liberty or give me
death." a20 a37a22a21a24a23a26a25a27a23a29a28a17a32 a36a38a37a31a30a32 a39 a37 a19 a37a33a20 a21a35a34a33a36a37 a29a38a39 a56a3a4 a56a3a40 a51 a42a41a43a42 a37 .
 (8) "he is taller than any other boy in the
class." a44
a3a46a45a15a47a49a48
a62a12a16 a4 a25a51a50a49a52a54a53a55a7a10a56 a6 .
"Tom is taller than any other boy in his
class." a57a30a4 a56a3a32a59a58a61a60a37 a21a24a23 a45a63a62a6a64a55 a53a55a19a65a66 a5 a37a68a67a66a70a69 a21 a39 a37 a3a5 a5 a37 .
(extracted from E)K and E)J)
The numbers in parentheses in the above ex-
amples represent how many English words are
shared between E)K and E)J. Using these
paraphrasing sentences we will examine the ef-
fective way of extracting functional words.
Finally we would like to apply our method to
open source dictionaries, in particular EDICT
(J)E, Breen (1995)) and engdic (E)K, Paik
and Bond (2003)). This would make the results
available to everyone, so that they can be used
in comparative evaluation or further research.
5 Conclusion
We have shown three major combination of dic-
tionaries to build dictionaries. These methods
can be applied to any pairs of language; we used
a K)E dictionary, a J)E, an E)K dictionary
and an E)J to build a K)J dictionary using
English as a pivot.
We applied three di erent methods accord-
ing to di erent combination of dictionaries.
First, a one-time look up method (Tanaka and
Umemura, 1994) is tried using K)E and J)E.
Second, an overlapping constraint method in
one direction is applied using K)E and E)J.
Finally, a novel combination for building a
dictionary is attempted using E)K and E)J.
We found that the best result is obtained
by the  rst method. However, by combining
all methods we can extract far more entries
since the results from the three method do not
overlap. Our result shows that 60% of word
pairs in the second method are not found in the
 rst or the third method. For the third method
(using E)K and E)J), we could not extract
as many matched pairs, but it is potentially
useful for extracting conjugated forms and
functional words.
Acknowledgments
This research was supported in part by the Ministry of
Public Management, Home A airs, Posts and Telecom-
munications. We would also like to thank Francis Bond
for his comments and discussion.

References
Christian Boitet, Mathieu Mangeot, and Gilles
S erasset. 2002. The Papillon Project: cooper-
atively building a multilingual lexical data-base
to derive open source dictionaries and lexicons.
The 2nd Workshop NLPXML-2002, pages 93{96,
Taipei, Taiwan.
Francis Bond, Ruhaida Binti Sulong, Takefumi Ya-
mazaki, and Kentaro Ogura. 2001. Design and
construction of a machine-tractable Japanese-
Malay dictionary. In MT Summit VIII, pages 53{
58, Santiago de Compostela, Spain.
Jim Breen. 1995. Building an electronic Japanese-
English dictionary. Japanese Studies Association
of Australia Conference.
Reinhard Rudolf-Karl Hartmann. 1983. Lexicogra-
phy: Principles and Practice. Academic Press.
Mathieu Lafourcade. 2002. Automatically populat-
ing acception lexical database through bilingual
dictionaries and conceptual vectors. In Papillon
2002 Seminar(CD-Rom), Tokyo, Japan.
Kyonghee Paik and Francis Bond. 2003. Enhancing
an English and Korean dictionary. In Papillon-
2003, pages CD{rom paper, Sapporo, Japan.
Kyonghee Paik, Francis Bond, and Satoshi Shi-
rai. 2001. Using multiple pivots to align Korean
and Japanese lexical resources. In NLPRS-2001,
pages 63{70, Tokyo, Japan.
Satoshi Shirai and Kazuhide Yamamoto. 2001.
Linking English words in two bilingual dictionar-
ies to generate another language pair dictionary.
In ICCPOL-2001, pages 174{179, Seoul.
Satoshi Shirai, Kazuhide Yamamoto, and Kyonghee
Paik. 2001. Overlapping constraints of two step
selection to generate a transfer dictionary. In
ICSP-2001, pages 731{736, Taejon, Korea.
Kumiko Tanaka and Kyoji Umemura. 1994. Con-
struction of a bilingual dictionary intermediated
by a third language. In COLING-94, pages 297{
303, Kyoto.
Katsuei Yamagishi and Toshio Gunji, editors. 1991.
The New Anchor Japanese-English dictionary.
Gakken.
Katsuei Yamagishi, Tokumi Kodama, and Chiaki
Kaise, editors. 1997. Super Anchor English-
Japanese dictionary. Gakken.
