Two Methods for Learning ALT-J/E Translation Rules
from Examples and a Semantic Hierarchy
Hussein Almuallim
Info. and Computer Science Dept.
King Fahd University of
Petroleum and Minerals
Dhahran 31261, Saudi Arabia
Yasuhiro Akiba, Takefumi Yamazaki,
Akio Yokoo, Shigeo Kaneda
NTT Communication Science Labs.
1-2356, Take, Yokosuka-shi
Kanagawa-ken 238-03
Japan
Abstract
This paper presents our work towards the automatic
acquisition of translation rules from Japanese-English
translation examples for NTT's ALT-J/E machine
translation system. We apply two machine learning
algorithms: Haussler's algorithm for learning internal
disjunctive concepts and Quinlan's ID3 algorithm.
Experimental results show that our approach yields
rules that are highly accurate compared to the manually
created rules.
1 Introduction 
A critical issue in AI research is to overcome the
knowledge acquisition bottleneck in knowledge-based
systems. As a knowledge base is expanded, adding
more knowledge and fixing previous erroneous knowledge
become increasingly costly. Moreover, maintaining
the integrity of large knowledge bases has proven
to be a very challenging task.
A widely proposed approach to deal with the
knowledge acquisition bottleneck is to employ some
learning mechanism to extract the desired knowledge
automatically or semi-automatically from actual
cases or examples [Buchanan & Wilkins 1993].
The validity of this approach is becoming more evident
as various machine-learning-based knowledge
acquisition tools for real-world domains are being
reported [Kim & Moldovan 1993, Porter et al. 1990,
Sato 1991a, Sato 1991b, Utsuro et al. 1992,
Wilkins 1990].
ALT-J/E, which is an experimental Japanese-English
translation system developed at Nippon Telegraph
and Telephone Corporation (NTT), is one example
of a large knowledge-based system in which
solutions to the knowledge acquisition bottleneck are
definitely needed. One major component of this system
is its huge collection of translation rules. Each
of these rules associates a Japanese sentence pattern
with an appropriate English pattern. To translate
a Japanese sentence into English, ALT-J/E looks for
the rule whose Japanese pattern matches the sentence
best, and then uses the English pattern of that rule
for translation.
So far, ALT-J/E translation rules have been composed
manually by extensively trained human experts.
To qualify for this job, an expert must not only
master both English and Japanese but also be very
familiar with various components of the system. Each
time the rules are expanded or altered, the new set
of rules must then be "debugged" using a collection
of test cases. Usually, several iterations are needed to
arrive at translation rules of acceptable quality.
Creating new translation rules as well as refining
existing ones has proven to be extremely difficult
and time-consuming because these tasks require
considering a huge space of possible combinations (rules
in ALT-J/E are expressed in terms of as many as
3000 "semantic categories"). The high costs involved
make the manual creation of ALT-J/E's translation
rules impractical. Indeed, in spite of the vast amount
of resources spent on building the current rules of
ALT-J/E, faults in these rules are still detected from
time to time, making system maintenance a continuous
requirement.
The aim of this work is to make ALT-J/E's translation
rules less costly and more reliable through the use
of inductive machine learning techniques. Careful
examination of the manual process which has been
followed so far by ALT-J/E's experts for building
translation rules reveals that most of the effort is spent on
figuring out the condition part of the rules (that is,
the Japanese patterns). Therefore, we propose the
use of inductive machine learning algorithms to learn
these conditions from examples of Japanese sentences
and their English translations. Under this machine
learning approach, the user is relieved from exploring
the huge space of alternatives s/he has to consider
when constructing translation rules manually
from scratch, a job which only extensively trained
experts can perform. The task is now turned into
a search for some reasonable rules that explain the
given training examples, where the search is handled
automatically by a learning algorithm. This not only
saves the user's time, but also makes it unnecessary
for the user to be an expert of the ALT-J/E system.
Moreover, this approach significantly reduces
the "subjectivity" of the rules since the intervention
of human experts is minimized. This is particularly
important because the immense number of translation
rules (currently over 10,000) requires employing
a team of experts over an extended period of time.
Two learning methods are investigated in this paper.
Experiments show that the rules learned by
these methods are very close to the rules manually
composed by human experts. In most cases, given
a reasonable number of training examples, the
employed methods are able to find rules that are more
than 90% accurate when compared to the manually
composed rules.
The rest of this document is organized as follows.
We begin in Section 2 with a brief overview of the
ALT-J/E Japanese-English translation system. In Section
3, we discuss some of the problems that arise when the
translation rules of ALT-J/E are composed manually
by human experts. Then, we propose in Section 4
an alternative approach based on machine learning
techniques. In Section 5, we describe the inductive
learning methods used, followed by an experimental
evaluation of these methods in Section 6. Finally,
concluding remarks are stated in Section 7.
2 ALT-J/E: A Brief Overview 
ALT-J/E, the Automatic Language Translator:
Japanese to English, is one of the most advanced
and well-recognized systems for translating Japanese
to English. It is the largest such system in terms of
the amount of knowledge it comprises. In this work,
we are concerned with the following components of
the ALT-J/E system:
1. The Semantic Hierarchy,
2. The Semantic Dictionary, and
3. The Translation Rules.
We briefly describe each of these components below.
For more details about the ALT-J/E system,
we refer the reader to [Ikehara et al. 1989,
Ikehara et al. 1990, Ikehara et al. 1991].
As shown in Figure 1, the Semantic Hierarchy
is a sort of concept thesaurus represented as a tree
structure in which each node is called a semantic
category, or a category for simplicity. Edges in this
structure represent "is-a" relations among the categories.
For example, "Agents" and "People" (see Figure 1)
are both categories. The edge between these two
categories indicates that any instance of "People" is also
an instance of "Agents". The current version of
ALT-J/E's Semantic Hierarchy is 12 levels deep and has
about 3000 nodes. The Semantic Dictionary maps
each Japanese noun to its appropriate semantic
categories. For example, the Semantic Dictionary states
that the noun 鶏 (niwatori), which means "chicken"
or "hen" in English, is an instance of the categories
"Meat" and "Birds".
The Translation Rules in ALT-J/E associate
Japanese patterns with English patterns. Currently,
ALT-J/E uses roughly 10,000 of these rules.¹ As
Figure 2 shows, each translation rule has a Japanese
pattern as its left-hand side and an English pattern as
its right-hand side. For example, the first rule in this
figure basically says that if the Japanese verb in a
sentence is 焼く (yaku), its subject is an instance of
"People", and its object is an instance of "Bread" or
"Cake", then the following English pattern is to be
used:
Subject "bake" Object.
Note that in this case the Japanese verb 焼く (yaku)
is translated into the English verb "bake". This same
Japanese verb can also be translated into the English
verbs "roast", "broil", "cremate" or "burn", depending
on the context. These cases are handled by the
four other rules given in Figure 2.
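The Semantic Hierarchy and the Semantic Dictionary that these rules rely on can be pictured with a small sketch. The Python fragment below is only an illustration under stated assumptions: the `PARENT` edges (other than "People" under "Agents") and the helper names `ancestors_or_self` and `is_instance` are hypothetical and are not taken from ALT-J/E itself.

```python
# Toy fragment of a semantic hierarchy, stored as child -> parent ("is-a") edges.
# Only the edge People -> Agents comes from the text; the rest is illustrative.
PARENT = {
    "People": "Agents",
    "Agents": "Concrete",
    "Birds": "Animals",
    "Animals": "Concrete",
    "Meat": "Food",
    "Food": "Concrete",
    "Concrete": "Anything",
}

# Toy semantic dictionary: noun -> categories it is directly an instance of.
SEMANTIC_DICTIONARY = {"niwatori": {"Meat", "Birds"}}


def ancestors_or_self(category):
    """Return the category followed by all of its ancestors, bottom-up."""
    result = [category]
    while category in PARENT:
        category = PARENT[category]
        result.append(category)
    return result


def is_instance(noun, category):
    """True if the noun belongs to `category` directly or through is-a edges."""
    return any(category in ancestors_or_self(c)
               for c in SEMANTIC_DICTIONARY.get(noun, ()))


print(is_instance("niwatori", "Birds"))    # direct category
print(is_instance("niwatori", "Animals"))  # inherited through "Birds"
print(is_instance("niwatori", "Agents"))   # not on any is-a path of this noun
```

Storing only parent links keeps the instance test a simple upward walk, which is cheap for a tree that is about 12 levels deep.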
Translation rules are meant only to handle basic
sentences that contain just a single Japanese verb.
Such sentences are called "simple sentences."² To
translate a complex sentence, ALT-J/E does various
kinds of pre- and post-processing. Roughly speaking,
the given complex sentence is first broken into
a collection of simple sentences in the pre-processing
phase. Then, the English translations of these are
combined together in the post-processing phase to
give the final translation of the complex sentence.
To translate a simple sentence, ALT-J/E looks for
the most appropriate translation rule to use. Based
on the verb of the sentence, the system considers as
candidates all those translation rules that have this
verb on their left-hand side. The English pattern
of the rule whose Japanese pattern matches the
sentence best is then used to generate the desired English
translation.
As shown in Figure 2, the Japanese patterns are
expressed using the variables N1, N2, ..., etc., which
represent various components of a Japanese sentence,
such as the subject, the object, etc.³ The "degree
of matching" between a Japanese pattern and a
sentence is based on how well the values of these
variables for the given sentence match those categories
required by the Japanese pattern. The Semantic
Dictionary is used during the matching process to
determine whether or not a given noun is an instance of a
certain category.
¹ In fact, ALT-J/E has three different kinds of translation
rules: (i) the semantic pattern transfer rules (roughly 10,000
rules), (ii) the idiomatic expression transfer rules (about 5,000
rules), and (iii) the general transfer rules. We use the term
"Translation Rules" here to refer to the semantic pattern
transfer rules. These form the majority of the rules, and they are
the most frequently used by ALT-J/E.
² The term "simple sentence" is a direct translation of 単文
(tanbun) in Japanese.
³ To be precise, Japanese sentences are usually parsed into
a set of components (called ガ格, ヲ格, ニ格, etc.) that are
quite different from those used in English. Using "subject" and
"object" here is only meant to ease the discussion for English
readers.
Figure 1: The upper levels of the Semantic Hierarchy in ALT-J/E.
IF                                              THEN
  J-Verb = "焼く (yaku)"                          Subj = N1
  N1 (Subj) ≺ "People"                            E-Verb = "bake"
  N2 (Obj) ≺ "Bread" or "Cake"                    Obj = N2
IF                                              THEN
  J-Verb = "焼く (yaku)"                          Subj = N1
  N1 (Subj) ≺ "People"                            E-Verb = "roast"
  N2 (Obj) ≺ "Meat"                               Obj = N2
IF                                              THEN
  J-Verb = "焼く (yaku)"                          Subj = N1
  N1 (Subj) ≺ "People"                            E-Verb = "broil"
  N2 (Obj) ≺ "Fish" or "Seafood"                  Obj = N2
IF                                              THEN
  J-Verb = "焼く (yaku)"                          Subj = N1
  N1 (Subj) ≺ "Agents"                            E-Verb = "cremate"
  N2 (Obj) ≺ "People" or "Animals"                Obj = N2
IF                                              THEN
  J-Verb = "焼く (yaku)"                          Subj = N1
  N1 (Subj) ≺ "Agents" or "Machines"              E-Verb = "burn"
  N2 (Obj) ≺ "Places" or "Objects" or "Locations" Obj = N2
Figure 2: Translation rules for the Japanese verb 焼く (yaku). These
rules are composed manually by human experts. "≺" indicates "an instance of".
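The matching step can be illustrated with a small sketch. The text does not give the exact definition of the "degree of matching", so the score used here (the number of rule conditions a sentence satisfies) is an assumption made purely for illustration, as are the `PARENT` edges and the function names. Only two rules for 焼く (yaku) are included.

```python
# Illustrative is-a edges (child -> parent); not ALT-J/E's actual hierarchy.
PARENT = {"Bread": "Food", "Cake": "Food", "Meat": "Food", "People": "Agents"}

# Each rule: (required subject categories, required object categories, English verb).
RULES = [
    ({"People"}, {"Bread", "Cake"}, "bake"),
    ({"People"}, {"Meat"}, "roast"),
]


def closure(categories):
    """Expand a set of categories with all of their ancestors."""
    seen = set()
    for c in categories:
        while c is not None and c not in seen:
            seen.add(c)
            c = PARENT.get(c)
    return seen


def match_degree(rule, subj_cats, obj_cats):
    """Assumed score: how many of the two conditions the sentence satisfies."""
    req_subj, req_obj, _ = rule
    subj_ok = bool(req_subj & closure(subj_cats))
    obj_ok = bool(req_obj & closure(obj_cats))
    return int(subj_ok) + int(obj_ok)


def choose_verb(subj_cats, obj_cats):
    """Pick the English verb of the best-matching rule (first rule wins ties)."""
    best = max(RULES, key=lambda r: match_degree(r, subj_cats, obj_cats))
    return best[2]


print(choose_verb({"People"}, {"Cake"}))
print(choose_verb({"People"}, {"Meat"}))
```

A real system would need a finer-grained score and a policy for sentences that no rule matches fully; both are left out of this sketch.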
3 Shortcomings of the Manual Approach
Translation rules in the ALT-J/E system have so far
been composed manually by human experts. However,
due to the high cost-per-rule, and because of the
huge number of translation rules needed for ALT-J/E
to carry out a reasonable translation job, the manual
approach has been concluded by the developers of
ALT-J/E to be impractical. In particular, the following
problems have been reported:
• Building and maintaining the translation rules
require a great deal of expertise. To qualify for
this task, skillful experts are required not only to
master both Japanese and English, but also to
be fully familiar with ALT-J/E's large Semantic
Hierarchy and to understand the overall process
of the system. Such qualifications are costly and
involve extensive training.
• In spite of the vast amount of resources spent
on building the current rules of ALT-J/E by human
experts, faults are still detected from time
to time, making the maintenance of the system
a continuous requirement.
• The translation rules are not quite concrete and
vary depending on the expert. Rules constructed
by one expert are not easy for another expert to
understand and modify. This makes the maintenance
process more difficult and makes it hard
to substitute an expert by another.
• An important objective is to build specialized
versions of ALT-J/E to be used in specific
application domains. The manual approach is
obviously unrealistic since it involves more training
of the human experts with respect to the target
application domain, and because this process has
to be repeated for every new domain.
• One of the problems facing the designers of
ALT-J/E is the refinement of the Semantic Hierarchy.
Whenever this structure is altered, the translation
rules must also be revised to reflect the
change. Such revision is extremely troublesome
and error-prone if it is done manually.
4 A Machine Learning Approach
The problems we have just listed regarding the manual
construction of ALT-J/E's translation rules are
largely solved if the process can be automated. An
attractive approach to this problem is to resort to
inductive machine learning techniques to extract the
desired translation rules from examples of Japanese
sentences and their English translations. At the current
stage, however, learning translation rules fully
automatically from examples alone seems to be too
challenging. A more realistic goal is to minimize
rather than to totally eliminate the intervention of
human experts in the rule acquisition process. Thus,
our current objective is to concentrate on automating
the most difficult and time-consuming parts of the
manual procedure.
The goal of the present work is to learn what we call
"partial translation rules". A partial translation rule
consists of the left-hand side along with the English
verb of the right-hand side of a translation rule. In
other words, the only difference between a translation
rule and a partial translation rule is that the latter
has only an English verb rather than a full English
pattern as its right-hand side.
Constructing a partial translation rule is the most
difficult part of constructing a translation rule. Indeed,
turning a partial rule into a complete one is a
relatively easy task that can be done by a human
operator with moderate knowledge of English and
Japanese.
5 Learning Task and Methods 
In this work, we investigate two different inductive
learning algorithms. Before talking about these
algorithms, we will first make the learning task more
precise, and shed some light on the difficulties that
distinguish it from other previously studied learning
tasks.
5.1 The Learning Task
The job of a learning algorithm in our setting is
to construct partial translation rules. For a given
Japanese verb J-verb and a possible English translation
E-verb_i of that verb, the algorithm has to find
the appropriate condition(s) that should hold in the
context in order to map J-verb to E-verb_i.
As an example, consider the Japanese verb 使う
(tsukau). This verb corresponds to the English verbs
"use", "spend" and "employ". The choice among
these English verbs depends mostly on the object of
the sentence. For example, if the object is an instance
of "Asset" or "Time", then "spend" is appropriate.
Thus, a rough rule for mapping 使う (tsukau)
to "spend" may look like
IF [ J-VERB = 使う, OBJECT ≺ "Asset" or "Time" ]
THEN E-VERB = spend.
We seek to learn rules of this kind from examples
of Japanese sentences and their English translations,
such as the following pair:
( 王女が金を使う, The princess spends money ).
After parsing (which is carried out by ALT-J/E's
parser), the above example gives the following pair:
( [ J-VERB = 使う, SUBJECT = oujyo,
OBJECT = kane ], E-VERB = spend ).
By looking up the Semantic Dictionary of ALT-J/E,
the possible semantic categories for oujyo are "Noble
Person", "Daughter" and "Female", and those
for kane are "Asset", "Metal", "Day" and "Medal".
Thus, this example is finally given to the learning
algorithm in the following form:
( [ SUBJECT ≺ { Noble Person, Daughter, Female },
OBJECT ≺ { Asset, Metal, Day, Medal } ],
E-VERB = spend ),
where N ≺ S indicates that the sentence component
N is an instance of each category s ∈ S. The general
format of the training examples is as follows:
( [ N_1 ≺ {a_1, a_2, ...},
N_2 ≺ {b_1, b_2, ...}, ...,                    (1)
N_n ≺ {c_1, c_2, ...} ], E-verb )
where each N_i represents a component of the sentence
(subject, object, etc.), and each a_i, b_i, and c_i is a
semantic category.
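The construction of a training example in the format of Eq. (1) can be sketched as follows. This is a minimal illustration, assuming the parser output is available as a mapping from sentence components to nouns; the names `SEMANTIC_DICTIONARY` and `make_training_example` are hypothetical, and the dictionary holds only the two entries discussed in the text.

```python
# Toy semantic dictionary with the two nouns from the example in the text.
SEMANTIC_DICTIONARY = {
    "oujyo": {"Noble Person", "Daughter", "Female"},
    "kane": {"Asset", "Metal", "Day", "Medal"},
}


def make_training_example(parsed, e_verb):
    """Turn a parsed sentence into an (ambiguous) training example.

    parsed: dict mapping a sentence component to its noun.
    Returns (dict mapping each component to its set of possible
    semantic categories, English verb).
    """
    conditions = {component: set(SEMANTIC_DICTIONARY[noun])
                  for component, noun in parsed.items()}
    return conditions, e_verb


example = make_training_example({"SUBJECT": "oujyo", "OBJECT": "kane"}, "spend")
print(example)
```

Note that every category listed in the dictionary is kept; nothing at this stage decides which sense of the noun was intended, which is exactly the ambiguity discussed next.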
From the viewpoint of machine learning research,
the above learning task is interesting/challenging
from two perspectives:
• Huge amount of background knowledge:
To be appropriate for our learning task, the
learning algorithm must effectively utilize
ALT-J/E's large Semantic Hierarchy. This requirement
of being capable of exploiting such a huge
amount of background knowledge disqualifies
most of the known inductive learning algorithms
from directly being used in our domain.
• Ambiguity of the training examples: Unlike
most known learning domains, the training
examples in our setting (as given in Eq. (1)) are
ambiguous in the sense that each of the variables
(SUBJECT, OBJECT, etc.) is assigned multiple
values rather than a single value. Focusing on
the relevant values (that is, the values that
contributed to the choice of the English verb) is an
extra challenge to the learner in our domain.
To deal with the above learning problem, we
investigated two approaches. One is based on a
theoretical algorithm introduced by Haussler for learning
internal disjunctive concepts, and the other on the
well-known ID3 algorithm of Quinlan.
5.2 Haussler's algorithm for learning
internal disjunctive expressions
In our first approach, we represent the conditions of
the learned partial translation rules as internal
disjunctive expressions, and employ an algorithm given
by Haussler for learning concepts expressed in this
syntax. Haussler's algorithm enjoys many advantages.
First, it has been analytically proven to be
quite efficient both in terms of time and the number
of examples needed for learning. Second, the
algorithm is capable of explicitly utilizing the background
knowledge represented by the Semantic Hierarchy.
Moreover, the language used by human experts
to construct ALT-J/E's rules is quite similar to internal
disjunctive expressions, suggesting the appropriateness
of this algorithm's bias. Haussler's algorithm,
on the other hand, suffers the important shortcoming
(within our setting) that it is not capable of learning
from ambiguous examples. In order to be able to
use the algorithm for our task, the ambiguity has to
be explicitly removed from all the training examples.
Of course, this approach is not desirable because it
requires some intervention by a human expert and
because there are no guarantees that disambiguation
is done in a perfect manner.
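To make the notion of an internal disjunctive expression concrete, the sketch below evaluates one against an unambiguous example. It shows only how such an expression is interpreted over a hierarchy; it is not Haussler's learning algorithm. The `PARENT` edges, the category "Tool", and the function names are assumptions made for illustration.

```python
# Illustrative is-a edges (child -> parent); not ALT-J/E's actual hierarchy.
PARENT = {"Asset": "Abstract", "Time": "Abstract", "Tool": "Concrete"}


def covers(category, value):
    """True if `value` equals `category` or lies below it in the hierarchy."""
    while value is not None:
        if value == category:
            return True
        value = PARENT.get(value)
    return False


def satisfies(expression, example):
    """Evaluate an internal disjunctive expression.

    expression: dict component -> set of allowed categories. The categories
    of one component form a disjunction; the components are conjoined.
    example: dict component -> single category (ambiguity already removed).
    """
    return all(any(covers(c, example[component]) for c in allowed)
               for component, allowed in expression.items())


# Condition of the rough rule for "spend": OBJECT is "Asset" or "Time".
spend_condition = {"OBJECT": {"Asset", "Time"}}
print(satisfies(spend_condition, {"SUBJECT": "People", "OBJECT": "Asset"}))
print(satisfies(spend_condition, {"SUBJECT": "People", "OBJECT": "Tool"}))
```

The requirement that each component carries a single category is what forces the explicit disambiguation step mentioned above.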
5.3 Quinlan's ID3
Our second approach is based on the ID3 algorithm
introduced by Quinlan in [Quinlan 1986]. As it is,
ID3 is not able to utilize the background knowledge of
our domain, nor is it capable of dealing with ambiguous
training examples of the form given by Eq. (1). It
is clearly inappropriate to treat N_1, N_2, ... as
multi-valued variables, which is the most common way of
using ID3. This is because of the huge number of values
these variables can take, and also because we need
to exploit the background knowledge represented by
the Semantic Hierarchy.
To be able to use ID3 in our domain, we transform
the training examples into a new representation
that can be handled by ID3. The transformation we
propose is done in a way such that the relevant
information from the Semantic Hierarchy is included
in the newly represented examples, and, at the same
time, these newly represented examples still reflect
the ambiguity present in the original examples.
Our transformation method is described as follows:
Let A be the set of all the categories that appeared in
the training examples, and their ancestors. For every
c ∈ A, we define a binary feature as a test of the form
Is N_i an instance of c?
For a training example
( [ N_1 ≺ S_1, ..., N_i ≺ S_i, ..., N_n ≺ S_n ], E-Verb ),
we let the outcome of the above test be true if and
only if there exists some s ∈ S_i such that c is an
ancestor of s in the Semantic Hierarchy, or c is s itself.
Using these features, we convert each of the training
examples into a new pair (V, E-Verb) where V is a
vector of bits, each representing the outcome of the
corresponding feature for the given training example.
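The transformation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `PARENT` edges are invented for the example, and the names `build_features` and `to_bits` are hypothetical. A feature is a pair (component, category), and its bit is 1 when some category in the component's set equals that category or lies below it.

```python
# Illustrative is-a edges (child -> parent); not ALT-J/E's actual hierarchy.
PARENT = {
    "Noble Person": "People", "Daughter": "People", "Female": "People",
    "People": "Agents", "Asset": "Abstract", "Medal": "Objects",
}


def ancestors_or_self(category):
    """Return the set holding the category and all of its ancestors."""
    out = {category}
    while category in PARENT:
        category = PARENT[category]
        out.add(category)
    return out


def build_features(examples):
    """Build one test (component, c) per component and per category c in A,
    where A holds every category seen in the examples plus all ancestors."""
    A, components = set(), set()
    for conditions, _ in examples:
        for component, categories in conditions.items():
            components.add(component)
            for s in categories:
                A |= ancestors_or_self(s)
    return sorted((component, c) for component in components for c in A)


def to_bits(conditions, features):
    """Bit is 1 iff some s in the component's set has c as itself or an ancestor."""
    return [int(any(c in ancestors_or_self(s)
                    for s in conditions.get(component, ())))
            for component, c in features]


examples = [
    ({"SUBJECT": {"Noble Person", "Daughter", "Female"},
      "OBJECT": {"Asset", "Metal", "Day", "Medal"}}, "spend"),
]
features = build_features(examples)
for conditions, e_verb in examples:
    print(to_bits(conditions, features), e_verb)
```

Because every category in an ambiguous set switches on its own bits, the ambiguity is preserved in the vector rather than resolved, and the ancestor bits carry the hierarchy's knowledge to any feature-based learner.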
Given the above definition of the binary features,
the new pairs (V, E-Verb) include all the necessary
background knowledge obtained from the Semantic
Hierarchy, and also reflect the ambiguity of the
original training examples. In other words, the above
transformation can be seen as "compiling" the
information of the original ambiguous training examples
along with the necessary parts of the Semantic
Hierarchy into a format that is ready to be processed by
ID3 (or in fact, by many other feature-based learning
algorithms).
Note that if we create a feature for every semantic
category c and every sentence component N_i, then
the total number of features will become infeasibly
large (many thousands). However, what we need is
only to consider those categories that appeared in the
training data, and their ancestors (the set A above).
In our experiments, this results in a reasonable number
of features (one to two hundred). This is because
the number of examples is limited and also because
of the rather "tilted" distribution of what categories
can naturally appear as a certain component of a
sentence for a given verb. (E.g., the object of the verb
飲む (nomu), which roughly means "to drink", cannot
be just anything!)
The most important advantage of the above approach
is that it can be applied to ambiguous training
examples as they are, without the need to remove
the ambiguity explicitly as we did with Haussler's
algorithm. Another advantage of using ID3 is that we
do not need to break our learning task into binary
class learning problems since ID3 is capable of learning
multi-class concepts.
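Once the examples are bit vectors, ID3 can treat each bit as an ordinary binary attribute and the English verbs as class labels. The sketch below shows only the attribute-selection step of ID3, the information gain criterion from [Quinlan 1986], on made-up data; the rows, labels, and function names are assumptions for illustration and the full recursive tree construction is omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, j):
    """Entropy reduction obtained by splitting the data on binary feature j."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[j] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


# Made-up bit vectors and multi-class labels (English verbs).
rows = [[1, 0], [1, 1], [0, 1], [0, 0]]
labels = ["spend", "spend", "use", "employ"]

best = max(range(len(rows[0])), key=lambda j: information_gain(rows, labels, j))
print(best, information_gain(rows, labels, best))
```

The labels need no binarization: entropy is computed over all verbs at once, which is why a single tree can separate "use", "spend" and "employ".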
6 Experimental Work 
The goal of the experiments reported here is to
evaluate the quality of the partial translation rules learned
by the two learning methods we have just described.
The comparison includes the following three settings:
1. Using Haussler's algorithm to learn from training
examples after removing the ambiguity.
2. Using ID3 to learn from training examples after
removing the ambiguity and performing the
transformation given in Subsection 5.3.
3. Using ID3 to learn from training examples after
performing the transformation given in Subsection
5.3, but without removing the ambiguity.
In a sense, the first setting represents the best we can
do in the absence of the ambiguity since Haussler's
algorithm does a good job in exploiting the background
knowledge from the Semantic Hierarchy. Comparing
Setting 2 with Setting 1 tells us how successful our
transformation of the training examples is in letting
ID3 make use of the available background knowledge.
Finally, comparing Setting 3 with Setting 2 tells us
how successful our transformation is in letting ID3
learn directly from ambiguous training examples.
The experiments were done for six different
Japanese verbs. Table 1 shows a list of these verbs,
along with the number of training examples used, and
the accuracy levels obtained by each method. In the
table, "Haussler", "ID3 NA" and "ID3 A" denote
Setting 1, Setting 2 and Setting 3, respectively. The
accuracy was estimated using the leave-one-out
cross-validation method⁴, and assuming that the rules
composed manually by human experts are perfect (that
is, we are measuring how close the learned rules are
to those composed manually).
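Leave-one-out cross-validation can be sketched as follows. The learner used here is a deliberately trivial stand-in (it always predicts the most common verb in its training set) so that the block stays self-contained; it is an assumption for illustration and is not one of the two algorithms evaluated in this paper. The data and function names are likewise hypothetical.

```python
from collections import Counter


def majority_learner(train):
    """Stand-in learner (an assumption): always predicts the most common label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def leave_one_out_accuracy(examples, learn):
    """Hold out each example once, train on the rest, and report the
    percentage of held-out examples that are classified correctly."""
    correct = 0
    for i, (x, y) in enumerate(examples):
        model = learn(examples[:i] + examples[i + 1:])
        correct += int(model(x) == y)
    return 100.0 * correct / len(examples)


data = [([1, 0], "spend"), ([1, 1], "spend"), ([0, 1], "spend"), ([0, 0], "use")]
print(leave_one_out_accuracy(data, majority_learner))
```

Any function that maps a training list to a classifier can be passed as `learn`, so the same loop would serve for comparing different learners on the same examples.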
The performance levels of both Haussler's algorithm
and ID3 when learning from unambiguous examples
are quite similar in spite of the fact that each
algorithm implements a different bias and has a
completely different way of exploiting the background
knowledge. Comparing the performance of ID3 in
the two cases of learning from ambiguous and
unambiguous examples, ambiguity is not harmful to ID3's
performance in most cases. In fact, for some of the
verbs, the performance is even better when ambiguity
is present. This suggests that the approach we have
chosen to deal with ambiguity is effective for our task,
and that explicit removal of ambiguity is not an
attractive strategy since it is not easy to do, and since
it does not greatly improve the accuracy anyway.
The most important point here is that the observed
accuracy of both the ID3 algorithm and Haussler's
algorithm is satisfactorily high overall in spite of the
limited number of the training examples used. Such
a high level of accuracy strongly indicates that the
use of these algorithms will provide significant aid in
the construction of ALT-J/E's translation rules.
⁴ Examples are excluded from the training set one at a time.
The rule learned from the rest of the examples is then used to
predict the class of the removed example. This was repeated
for all the examples, and the percentage of correct classification
is reported.
Table 1: Experimental results on six Japanese verbs. Numbers show the accuracy percent, estimated using
the leave-one-out cross-validation method. ID3 NA indicates using ID3 with the ambiguity removed from
the training examples. ID3 A indicates using ID3 to learn from ambiguous training examples.
Japanese Verb     English Verbs                        No. of Exs.   Accuracy %
                                                                     Haussler   ID3 NA   ID3 A
使う (tsukau)     use, spend, employ                   80            85         93       91
飲む (nomu)       drink, take, eat, accept             42            90         98       93
行う (okonau)     conduct, play, hold                  33            94         88       88
応じる (oujiru)   answer, enter, meet                  30            90         87       90
焼く (yaku)       burn, bake, roast, broil, cremate    27            93         89       93
解く (toku)       solve, undo, dispel                  29            100        100      97
Average Accuracy                                                     92.0       92.5     92.0
7 Conclusion
This paper reported our work towards the acquisition
of Japanese-English translation rules through the
use of inductive machine learning techniques. Two
approaches were investigated. The first approach is
based on a theoretically-founded algorithm given by
Haussler for learning internal disjunctive concepts.
This algorithm has the advantage that it is tailored to
utilize background knowledge of the kind available in
our domain. We found, however, no obvious way to
make this algorithm learn directly from ambiguous
training examples, and thus, ambiguity was explicitly
removed from the training examples in order to
use this algorithm. Our second approach is based on
the ID3 algorithm. As it is, ID3 is not able to utilize
the background knowledge of our domain, nor is
it capable of dealing with ambiguous training examples.
We gave, however, an easy way to "compile"
the relevant background knowledge along with the
ambiguous training examples into a modified set of
training examples on which we were able to directly
run ID3. Experiments comparing these approaches
showed that the rules learned using the second approach
with the ambiguity present in the training examples
are almost as accurate as those obtained from
ambiguity-free examples using Haussler's algorithm.
Overall, our experiments showed that using machine
learning techniques yields rules that are highly
accurate compared to the manually created rules.
These results suggest that exploiting the reported
inductive learning techniques will significantly accelerate
the construction process of ALT-J/E's translation
rules. Currently, the reported learning approaches are
being included in a semi-automatic knowledge acquisition
tool to be used in the actual development of
the ALT-J/E system.
Acknowledgement: We wish to thank Dr. S.
Ikehara for his continuous encouragement. This work
was done while the first author was spending a
post-doctoral year at NTT. He also thanks King Fahd
University of Petroleum and Minerals, Saudi Arabia, for
their support.
References 
[Buchanan & Wilkins 1993] Buchanan, B. G. and
Wilkins, D. C. (Eds.), Readings in Knowledge
Acquisition and Learning, Morgan Kaufmann, 1993.
[Haussler 1988] Haussler, D., "Quantifying inductive
bias: AI learning algorithms and Valiant's learning
framework", Artificial Intelligence, 36(2), 177-221,
1988.
[Ikehara et al. 1989] Ikehara, S., Miyazaki, M., Shirai,
S. and Yokoo, A., "An Approach to Machine
Translation Method based on Constructive Process
Theory", Review of ECL, vol. 37, No. 1, 39-44,
1989.
[Ikehara et al. 1990] Ikehara, S., Shirai, S., Yokoo, A.
and Nakaiwa, H., "Toward an MT System without
Pre-Editing - Effects of New Methods in
ALT-J/E", Proc. of MT Summit-3, 1990.
[Ikehara et al. 1991] Ikehara, S., Yokoo, A. and
Miyazaki, M., "Semantic Analysis Dictionaries for
Machine Translation" (in Japanese), IEICE Report,
NLC 91-19, Institute of Electronics, Information
and Communication Engineers, Japan, 1991.
[Kim & Moldovan 1993] Kim, J. T. and Moldovan,
D. I., "Acquisition of Semantic Patterns for
Information Extraction from Corpora", Proc. of
CAIA-93, 171-176, 1993.
[Porter et al. 1990] Porter, B. W., Bareiss, R. and
Holte, R. C., "Concept learning and heuristic
classification in weak-theory domains", Artificial
Intelligence, 45(3):229-263, 1990.
[Quinlan 1986] Quinlan, J. R., "Induction of Decision
Trees", Machine Learning, 1(1):81-106, 1986.
[Sato 1991a] Sato, S., "MBT1: Example-Based Word
Selection" (in Japanese), Journal of Japanese
Society for Artificial Intelligence, vol. 6, No. 4, 1991.
[Sato 1991b] Sato, S., "MBT2: A Method for Combining
Fragments of Examples in Example-Based
Translation" (in Japanese), Journal of Japanese
Society for Artificial Intelligence, vol. 6, No. 6, 1991.
[Utsuro et al. 1992] Utsuro, T., Matsumoto, Y. and
Nagao, M., "Lexical Knowledge Acquisition from
Bilingual Corpora", Proc. of the 14th International
Conference on Computational Linguistics,
581-587, 1992.
[Wilkins 1990] Wilkins, D. C., "Knowledge base refinement
as improving an incomplete and incorrect domain
theory", in Kodratoff, Y. and Michalski, R. S.
(Eds.), Machine Learning: An Artificial Intelligence
Approach, vol. III, pp. 493-511, Morgan Kaufmann
Publishers, 1990.
