Effect of Domain-Specific Corpus
in Compositional Translation Estimation for Technical Terms
Masatsugu Tonoike†, Mitsuhiro Kida†,
Takehito Utsuro†
†Graduate School of Informatics,
Kyoto University
Yoshida-Honmachi, Sakyo-ku,
Kyoto 606-8501 Japan
(tonoike,kida,takagi,sasaki,
utsuro)@pine.kuee.kyoto-u.ac.jp
Toshihiro Takagi†, Yasuhiro Sasaki†,
and Satoshi Sato‡
‡Graduate School of Engineering,
Nagoya University
Furo-cho, Chikusa-ku,
Nagoya 464-8603 JAPAN
ssato@nuee.nagoya-u.ac.jp
Abstract
This paper studies issues on compiling
a bilingual lexicon for technical terms.
In the task of estimating bilingual term
correspondences of technical terms, it
is usually quite difficult to find an exist-
ing corpus for the domain of such tech-
nical terms. In this paper, we take an
approach of collecting a corpus for the
domain of such technical terms from
the Web. As a method of translation
estimation for technical terms, we pro-
pose a compositional translation esti-
mation technique. Through experimen-
tal evaluation, we show that the do-
main/topic specific corpus contributes
to improving the performance of the
compositional translation estimation.
1 Introduction
This paper studies issues on compiling a bilingual
lexicon for technical terms. So far, several tech-
niques of estimating bilingual term correspon-
dences from a parallel/comparable corpus have
been studied (Matsumoto and Utsuro, 2000). For
example, in the case of estimation from compa-
rable corpora, (Fung and Yee, 1998; Rapp, 1999)
proposed standard techniques of estimating bilin-
gual term correspondences from comparable cor-
pora. In their techniques, contextual similarity
between a source language term and its transla-
tion candidate is measured across the languages,
and all the translation candidates are re-ranked ac-
cording to the contextual similarities. However,
collecting terms
of specific
domain/topic
(language S )
X
S
U
(# of translations
is one)
compiled bilingual lexicon
process data
collecting
corpus
(language T )
domain/topic
specific
corpus
(language T )
sample terms
of specific 
domain/topic
(language S )
X
ST
U
, X
ST
M
,Y
ST
estimating bilingual term
correspondences
language pair (S,T )
term set
(language S )
X
T
U
(lang. T )
translation set
(language T )
web
(language S )
web
(language S )
existing
bilingual lexicon
X
S
M
(# of translations
is more than one)
Y
S
(# of translations
is zero)
web
(language T )
web
(language T )
looking up
bilingual lexicon
validating
translation
candidates
Figure 1: Compilation of a Domain/Topic Spe-
cific Bilingual Lexicon
there are limited number of parallel/comparable
corpora that are available for the purpose of es-
timating bilingual term correspondences. There-
fore, even if one wants to apply those existing
techniques to the task of estimating bilingual term
correspondences of technical terms, it is usually
quite difficult to find an existing corpus for the
domain of such technical terms.
Considering such a situation, we take an ap-
proach of collecting a corpus for the domain of
such technical terms from the Web. In this ap-
proach, in order to compile a bilingual lexicon
for technical terms, the following two issues have
to be addressed: collecting technical terms to be
listed as the headwords of a bilingual lexicon, and
estimating translation of those technical terms.
Among those two issues, this paper focuses on the
second issue of translation estimation of technical
terms, and proposes a method for translation es-
timation for technical terms using a domain/topic
specific corpus collected from the Web.
More specifically, the overall framework of
114
compiling a bilingual lexicon from the Web can
be illustrated as in Figure 1. Suppose that we have
sample terms of a specific domain/topic, techni-
cal terms to be listed as the headwords of a bilin-
gual lexicon are collected from the Web by the re-
lated term collection method of (Sato and Sasaki,
2003). Those collected technical terms can be di-
vided into three subsets according to the number
of translation candidates they have in an existing
bilingual lexicon, i.e., the subset X
U
S
of terms for
which the number of translations in the existing
bilingual lexicon is one, the subset X
M
S
of terms
for which the number of translations is more than
one, and the subset Y
S
of terms which are not
found in the existing bilingual lexicon. (Hence-
forth, the union X
U
S
∪ X
M
S
is denoted as X
S
.)
The translation estimation task here is to estimate
translations for the terms of X
M
S
and Y
S
. For the
terms of X
M
S
, it is required to select an appro-
priate translation from the translation candidates
found in the existing bilingual lexicon. For ex-
ample, as a translation of the Japanese technical
term “����”, which belongs to the logic cir-
cuit field, the term “register” should be selected
but not the term “regista” of the football field. On
the other hand, for the terms of Y
S
, it is required
to generate and validate translation candidates. In
this paper, for the above two tasks, we use a do-
main/topic specific corpus. Each term of X
U
S
has
the only one translation in the existing bilingual
lexicon. The set of the translations of terms of
X
U
S
is denoted as X
U
T
. Then, the domain/topic
specific corpus is collected from the Web using
the terms in the set X
U
T
. A new bilingual lexicon
is compiled from the result of translation estima-
tion for the terms of X
M
S
and Y
S
, as well as the
translation pairs which consist of the terms of X
U
S
and their translations found in the existing bilin-
gual lexicon.
For each term of X
M
S
, from the translation can-
didates found in the existing bilingual lexicon, we
select the one which appears most frequently in
the domain/topic specific corpus. The experimen-
tal result of this translation selection process is de-
scribed in Section 5.2.
As a method of translation genera-
tion/validation for technical terms, we propose a
compositional translation estimation technique.
Compositional translation estimation of a term
can be done through the process of composi-
tionally generating translation candidates of the
term by concatenating the translation of the
constituents of the term. Here, those translation
candidates are validated using the domain/topic
specific corpus.
In order to assess the applicability of the com-
positional translation estimation technique, we
randomly pick up 667 Japanese and English tech-
nical term translation pairs of 10 domains from
existing technical term bilingual lexicons. We
then manually examine their compositionality,
and find out that 88% of them are actually com-
positional, which is a very encouraging result.
Based on this assessment, this paper proposes a
method of compositional translation estimation
for technical terms, and through experimental
evaluation, shows that the domain/topic specific
corpus contributes to improving the performance
of compositional translation estimation.
2 Collecting a Domain/Topic Specific
Corpus
When collecting a domain/topic specific corpus of
the language T , for each technical term x
U
T
in the
set X
U
T
, we collect the top 100 pages with search
engine queries including x
U
T
. Our search engine
queries are designed so that documents which de-
scribe the technical term x
U
T
is to be ranked high.
For example, an online glossary is one of such
documents. Note that queries in English and those
in Japanese do not correspond. When collect-
ing a Japanese corpus, the search engine “goo”
1
is used. Specific queries used here are phrases
with topic-marking postpositional particles such
as “x
U
T
qx”, “x
U
T
qMO”, “x
U
T
x”, and an ad-
nominal phrase “x
U
T
w”, and “x
U
T
”. When col-
lecting a English corpus, the search engine “Ya-
hoo!”
2
is used. Specific queries used here are “x
U
T
AND what’s”, “x
U
T
AND glossary”, and “x
U
T
”.
3 Compositional Translation Estimation
for Technical Terms
3.1 Overview
An example of compositional translation estima-
tion for the Japanese technical term “ ;��
�
1
http://www.goo.ne.jp/
2
http://www.yahoo.com/
115
• application(1)
•practical(0.3)
• applied(1.6)
•action(1)
•activity(1)
• behavior(1)
• analysis(1)
•diagnosis(1)
• assay(0.3)
• behavior analysis(10)
B�B� Compositional generation 
of translation candidate
• applied behavior analysis(17.6)
• application behavior analysis(11)
• applied behavior diagnosis(1)
B�B� Decompose source term into constituents  
B�B� Translate constituents into target language      
process
!� -
G �a
!� -
G�b
Generated translation candidates
>" (1.6;
 1;
 1)+(1.6;
 10)
• application(1)
•practical(0.3)
• applied(1.6)
Figure 2: Compositional Translation Estimation
for the Japanese Technical Term “ ;��
�
s”

s” is shown in Figure 2. First, the Japanese tech-
nical term “ ;��
�
s” is decomposed into
its constituents by consulting an existing bilin-
gual lexicon and retrieving Japanese headwords.
3
In this case, the result of this decomposition can
be given as in the cases “a” and “b” (in Fig-
ure 2). Then, each constituent is translated into
the target language. A confidence score is as-
signed to the translation of each constituent. Fi-
nally, translation candidates are generated by con-
catenating the translation of those constituents
without changing word order. The confidence
score of translation candidates are defined as the
product of the confidence scores of each con-
stituent. Here, when validating those translation
candidates using the domain/topic specific cor-
pus, those which are not observed in the corpus
are not regarded as candidates.
3.2 Compiling Bilingual Constituents
Lexicons
This section describes how to compile bilingual
constituents lexicons from the translation pairs of
the existing bilingual lexicon Eijiro. The under-
lying idea of augmenting the existing bilingual
lexicon with bilingual constituents lexicons is il-
lustrated with the example of Figure 3. Suppose
that the existing bilingual lexicon does not in-
clude the translation pair “applied : ;”, while
it includes many compound translation pairs with
the first English word as “applied” and the first
3
Here, as an existing bilingual lexicon, we use Ei-
jiro(http://www.alc.co.jp/) and bilingual constituents lexi-
cons compiled from the translation pairs of Eijiro (details
to be described in the next section).
a19 a16
applied mathematics : ;
:�
applied science : ;J�
applied robot : ;����
.
.
. frequency
⇓y�
applied : ;:40
a18 a17
Figure 3: Example of Estimating Bilingual Con-
stituents Translation Pair (Prefix)
Table 1: Numbers of Entries and Translation Pairs
in the Lexicons
lexicon
# of entries # of translation
English Japanese pairs
Eijiro 1,292,117 1,228,750 1,671,230
P
2
232,716 200,633 258,211
B
P
38,353 38,546 112,586
B
S
22,281 20,627 71,429
Eijiro : existing bilingual lexicon
P
2
: entries of Eijiro with two constituents
in both languages
B
P
: bilingual constituents lexicon (prefix)
B
S
: bilingual constituents lexicon (suffix)
Japanese word “ ;”.
4
In such a case, we align
those translation pairs and estimate a bilingual
constituent translation pair, which is to be col-
lected into a bilingual constituents lexicon.
More specifically, from the existing bilingual
lexicon, we first collect translation pairs whose
English terms and Japanese terms consist of two
constituents into another lexicon P
2
. We compile
“bilingual constituents lexicon (prefix)” from the
first constituents of the translation pairs in P
2
and
compile “bilingual constituents lexicon (suffix)”
from their second constituents. The numbers of
entries in each language and those of translation
pairs in those lexicons are shown in Table 1.
In the result of our assessment, only 27% of the
667 translation pairs mentioned in Section 1 can
be compositionally generated using Eijiro, while
the rate increases up to 49% using both Eijiro and
“bilingual constituents lexicons”.
5
4
Japanese entries are supposed to be segmented into a
sequence of words by the morphological analyzer JUMAN
(http://www.kc.t.u-tokyo.ac.jp/nl-resource/juman.html)
5
In our rough estimation, the upper bound of this rate
is about 80%. Improvement from 49% to 80% could be
achieved by extending the bilingual constituents lexicons
and by introducing constituent reordering rules with preposi-
tions into the process of compositional translation candidate
generation.
116
3.3 Score of Translation Pairs in the
Lexicons
This section introduces a confidence score of
translation pairs in the various lexicons presented
in the previous section. Here, we suppose that
the translation pair 〈s, t〉 of terms s and t is used
when estimating translation from the language of
the term s to that of the term t. First, in this pa-
per, we assume that translation pairs follow cer-
tain preference rules and can be ordered as below:
1. Translation pairs 〈s, t〉 in the existing bilin-
gual lexicon Eijiro, where the term s consists
of two or more constituents.
2. Translation pairs in the bilingual constituents
lexicons whose frequencies in P
2
are high.
3. Translation pairs 〈s, t〉 in the existing bilin-
gual lexicon Eijiro, where the term s consists
of exactly one constituent.
4. Translation pairs in the bilingual constituents
lexicons whose frequencies in P
2
are not
high.
As the definition of the confidence score
q(〈s, t〉) of a translation pair 〈s, t〉, in this paper,
we use the following:
q(〈s, t〉)=
⎧
⎪
⎨
⎪
⎩
10
(compo(s)−1)
(〈s, t〉 in Eijiro)
log
10
f
p
(〈s, t〉) (〈s, t〉 in B
P
)
log
10
f
s
(〈s, t〉) (〈s, t〉 in B
S
)
(1)
where compo(s) denotes the word (in English) or
morpheme (in Japanese) count of s, f
p
(〈s, t〉) the
frequency of 〈s, t〉 as the first constituent in P
2
,
and f
s
(〈s, t〉) the frequency of 〈s, t〉 as the second
constituent in P
2
.
6
3.4 Score of Translation Candidates
Suppose that a translation candidate y
t
is gener-
ated from translation pairs 〈s
1
,t
1
〉,···,〈s
n
,t
n
〉
by concatenating t
1
,···,t
n
as y
t
= t
1
···t
n
.
Here, in this paper, we define the confidence score
of y
t
as the product of the confidence scores of the
6
It is necessary to empirically examine whether this def-
inition of the confidence score is optimal or not. However,
according to our rough qualitative examination, the results
of the confidence scoring seem stable when without a do-
main/topic specific corpus, even with minor tuning by incor-
porating certain parameters into the score.
collecting terms
of specific
domain/topic
(language S )
X
S
U
(# of translations
is one)
compiled bilingual lexicon
process data
collecting
corpus
(language T )
domain/topic
specific
corpus
(language T )
sample terms
of specific 
domain/topic
(language S )
X
ST
U
, X
ST
M
,Y
ST
estimating bilingual term
correspondences
language pair (S,T )
term set
(language S )
X
T
U
(lang. T )
translation set
(language T )
web
(language S )
web
(language S )
existing
bilingual lexicon
X
S
M
(# of translations
is more than one)
Y
S
(# of translations
is zero)
web
(language T )
web
(language T )
looking up
bilingual lexicon
validating
translation
candidates
Figure 4: Experimental Evaluation of Translation
Estimation for Technical Terms with/without the
Domain/Topic Specific Corpus (taken from Fig-
ure 1)
constituent translation pairs 〈s
1
,t
1
〉,···,〈s
n
,t
n
〉.
Q(y
t
)=
n
productdisplay
i=1
q(〈s
i
,t
i
〉) (2)
If a translation candidate is generated from
more than one sequence of translation pairs, the
score of the translation candidate is defined as the
sum of the score of each sequence.
4 Translation Candidate Validation
using a Domain/Topic Specific Corpus
It is not clear whether translation candidates
which are generated by the method described in
Section 3 are valid as English or Japanese terms,
and it is not also clear whether they belong to the
domain/topic. So using a domain/topic specific
corpus collected by the method described in Sec-
tion 2, we examine whether the translation candi-
dates are valid as English or Japanese terms and
whether they belong to the domain/topic. In our
validation method, given a ranked list of trans-
lation candidates, each translation candidate is
checked whether it is observed in the corpus, and
one which is not observed in the corpus is re-
moved from the list.
5 Experiments and Evaluation
5.1 Translation Pairs for Evaluation
In our experimental evaluation, within the frame-
work of compiling a bilingual lexicon for tech-
nical terms, we evaluate the translation estima-
tion part which is indicated with bold line in Fig-
117
Table 2: Number of Translation Pairs for Evaluation
dictionaries categories |X
S
| |Y
S
|
S = English S = Japanese
|X
U
S
| |X
M
S
| C(S) |X
U
S
| |X
M
S
| C(S)
Electromagnetics 58 33 36 22 82% 32 26 76%
McGraw-Hill Electrical engineering 52 45 34 18 67% 25 27 64%
Optics 54 31 42 12 65% 22 32 65%
Iwanami
Programming language 55 29 37 18 86% 38 17 100%
Programming 53 29 29 24 86% 29 24 79%
Dictionary of
(Computer) 100 100 91 9 46% 69 31 56%
Computer
Anatomical Terms 100 100 91 9 86% 33 67 39%
Dictionary of Disease 100 100 91 9 74% 53 47 51%
250,000 Chemicals and Drugs 100 100 94 6 58% 74 26 51%
medical terms Physical Science and Statistics 100 100 88 12 64% 58 42 55%
Total 772 667 633 139 68% 433 339 57%
McGraw-Hill : Dictionary of Scientific and Technical Terms
Iwanami : Encyclopedic Dictionary of Computer Science
C(S) : for Y
S
, the rate of including correct translations within the collected domain/topic specific corpus
ure 4. In the evaluation of this paper, we sim-
ply skip the evaluation of the process of collecting
technical terms to be listed as the headwords of a
bilingual lexicon. In order to evaluate the transla-
tion estimation part, from ten categories of exist-
ing Japanese-English technical term dictionaries
listed in Table 2, terms are randomly picked up
for each of the set X
U
S
, X
M
S
, and Y
S
. (Here, as
the terms of Y
S
, these which consist of the only
one word or morpheme are excluded.) As de-
scribed in Section 1, the terms of X
U
T
(the set
of the translations for the terms of X
U
S
) is used
for collecting a domain/topic specific corpus from
the Web. Translation estimation evaluation is to
be done against the set X
M
S
and Y
S
. For each of
the ten categories, Table 2 shows the sizes of X
U
S
,
X
M
S
and Y
S
, and for Y
S
, the rate of including cor-
rect translation within the collected domain/topic
specific corpus, respectively.
5.2 Translation Selection from Existing
Bilingual Lexicon
For the terms of X
M
S
, the selected translations are
judged by a human. The correct rates are 69%
from English to Japanese on the average and 75%
from Japanese to English on the average.
5.3 Compositional Translation Estimation
for Technical Terms without the
Domain/Topic Specific Corpus
Without the domain specific corpus, the cor-
rect rate of the first ranked translation candidate
is 19% on the average (both from English to
Japanese and from Japanese to English). The
rate of including correct candidate within top 10
is 40% from English to Japanese and 43% from
Japanese to English on the average. The rate of
compositionally generating correct translation us-
ing both Eijiro and the bilingual constituents lex-
icons (n = ∞) is about 50% on the average (both
from English to Japanese and from Japanese to
English).
5.4 Compositional Translation Estimation
for Technical Terms with the
Domain/Topic Specific Corpus
With domain specific corpus, on the average, the
correct rate of the first ranked translation candi-
date improved by 8% from English to Japanese
and by 2% from Japanese to English. However,
the rate of including correct candidate within top
10 decreased by 7% from English to Japanese,
and by 14% from Japanese to English. This is be-
cause correct translation does not exist in the cor-
pus for 32% (from English to Japanese) or 43%
(from Japanese to English) of the 667 translation
pairs for evaluation.
For about 35% (from English to Japanese) or
30% (from Japanese to English) of the 667 trans-
lation pairs for evaluation, correct translation does
exist in the corpus and can be generated through
the compositional translation estimation process.
For those 35% or 30% translation pairs, Fig-
ure 5 compares the correct rate of the first ranked
translation pairs between with/without the do-
main/topic specific corpus. The correct rates in-
crease by 34∼37% with the domain/topic specific
corpus. This result supports the claim that the do-
118
:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:�:|
:�:�:�:�:�:�:�:�:�:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:�:w:�:�:�:�:�:�:�:�:�:�:�
:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:�
:�:w:�:�:�:�:�:�
:�:�
:�:�:�:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:�:w:�:�:�:�:�
:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:w:�:�:�:w:�:�:�
:�:�:�:�:�:�:�:�:w:�:�:�:�:�:�:�
:�:�:�:�:�:�:�
:�:�:�:�:�
:�:�:w:�
:�:�:�:w
:�:�:w:�:�:�
:�:�:w:�:�:�
:�:�:�:w:�:�:�:�
:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:w::|:�
:�:�:�:�:�:�:w:�:�:�:�:�:�
:�:�:�:�:w:�:�:�:�:�:�
(a) English to Japanese
:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:|
:�:�:�:|
:�:�:�:�:�:�:�:�:�:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:�:w:�:�:�:�:�:�:�:�:�:�:�
:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:�
:�:w:�:�:�:�:�:�
:�:�
:�:�:�:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:�:w:�:�:�:�:�
:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:w:�:�:�:w:�:�:�
:�:�:�:�:�:�:�:�:w:�:�:�:�:�:�:�
:�:�:�:�:�:�:�
:�:�:�:�:�
:�:�:w:�
:�:�:�:w
:�:�:w:�:�:�
:�:�:w:�:�:�
:�:�:�:w:�:�:�:�
:�:�:�:�:�:�:�
:�:�:�:�:�:�:�:�:�:w::|:�
:�:�:�:�:�:�:w:�:�:�:�:�:�
:�:�:�:�:w:�:�:�:�:�:�
(b) Japanese to English
Figure 5: Evaluation against the Translation Pairs
whose Correct Translation Exist in the Corpus
and can be Generated Compositionally
main/topic specific corpus is effective in transla-
tion estimation of technical terms.
6 Related Works
As a related work, (Fujii and Ishikawa, 2001)
proposed a technique of compositional estima-
tion of bilingual term correspondences for the
purpose of cross-language information retrieval.
In (Fujii and Ishikawa, 2001), a bilingual con-
stituents lexicon is compiled from the translation
pairs included in an existing bilingual lexicon in
the same way as our proposed method. One of the
major differences of the technique of (Fujii and
Ishikawa, 2001) and the one proposed in this pa-
per is that in (Fujii and Ishikawa, 2001), instead of
the domain/topic specific corpus, they use a cor-
pus of the collection of the technical papers, each
of which is published by one of the 65 Japanese
associations for various technical domains. An-
other important difference is that in (Fujii and
Ishikawa, 2001), they evaluate only the perfor-
mance of cross-language information retrieval but
not that of translation estimation.
(Cao and Li, 2002) proposed a method of com-
positional translation estimation for compounds.
In the proposed method of (Cao and Li, 2002),
translation candidates of a term are composition-
ally generated by concatenating the translation
of the constituents of the term and are re-ranked
by measuring contextual similarity against the
source language term. One of the major differ-
ences of the technique of (Cao and Li, 2002) and
the one proposed in this paper is that in (Cao and
Li, 2002), they do not use the domain/topic spe-
cific corpus.
7 Conclusion
This paper proposed a method of compositional
translation estimation for technical terms using
the domain/topic specific corpus, and through
the experimental evaluation, showed that the do-
main/topic specific corpus contributes to improv-
ing the performance of compositional translation
estimation.
Future works include the followings: first, in
order to improve the proposed method with re-
spect to its coverage, for example, it is desir-
able to extend the bilingual constituents lexicons
and to introduce constituent reordering rules with
prepositions into the process of compositional
translation candidate generation. Second, we are
planning to introduce a mechanism of re-ranking
translation candidates based on the frequencies of
technical terms in the domain/topic specific cor-
pus.
References
Y. Cao and H. Li. 2002. Base noun phrase translation using
Web data and the EM algorithm. In Proc. 19th COLING,
pages 127–133.
Atsushi Fujii and Tetsuya Ishikawa. 2001. Japanese/english
cross-language information retrieval: Exploration of
query translation and transliteration. Computers and the
Humanities, 35(4):389–420.
P. Fung and L. Y. Yee. 1998. An IR approach for translating
new words from nonparallel, comparable texts. In Proc.
17th COLING and 36th ACL, pages 414–420.
Y. Matsumoto and T. Utsuro. 2000. Lexical knowledge ac-
quisition. In R. Dale, H. Moisl, and H. Somers, editors,
Handbook of Natural Language Processing, chapter 24,
pages 563–610. Marcel Dekker Inc.
R. Rapp. 1999. Automatic identification of word transla-
tions from unrelated English and German corpora. In
Proc. 37th ACL, pages 519–526.
S. Sato and Y. Sasaki. 2003. Automatic collection of related
terms from the web. In Proc. 41st ACL, pages 121–124.
119
