<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2211"> <Title>Pattern-Based Machine Translation</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Translation Patterns </SectionTitle> <Paragraph position="0"> A translation pattern is defined as a pair of CFG rules, and zero or more syntactic head and link constraints for nonterminal symbols. For example, the English-French translation pattern³ NP:1 miss:V:2 NP:3 → S:2   S:2 ← NP:3 manquer:V:2 à NP:1 essentially describes a synchronized pair consisting of a left-hand-side English CFG rule (called a source rule)</Paragraph> </Section> <Section position="5" start_page="0" end_page="1155" type="metho"> <SectionTitle> NP V NP → S </SectionTitle> <Paragraph position="0"> and a right-hand-side French CFG rule (called a target rule)</Paragraph> <Paragraph position="2"> accompanied by the following constraints: 1. Head constraints: The nonterminal symbol V in the source rule must have the verb &quot;miss&quot; as a syntactic head. The symbol V in the target rule must have the verb &quot;manquer&quot; as a syntactic head. The head of symbol S in the source (target) rule is identical to the head of symbol V in the source (target) rule, as they are co-indexed. Head constraints can be specified in either or both sides of the patterns.</Paragraph> <Paragraph position="3"> 2. Link constraints: Nonterminal symbols in source and target CFG rules are linked if they are given the same index &quot;:i&quot;. 
Thus, the first NP (NP:1) in the source rule corresponds to the second NP (NP:1) in the target rule, the Vs in both rules correspond to each other, and the second NP (NP:3) in the source rule corresponds to the first NP (NP:3) in the target rule.</Paragraph> <Paragraph position="4"> STAG (Shieber and Schabes, 1990) (Synchronized TAG) for each member of the TAG (Tree Adjoining Grammar) family. ²Recently, Tree Insertion Grammar (Schabes and Waters, 1995) has been introduced to show a similar possibility. Our approach, however, is more inclined toward the CFG formalism.</Paragraph> <Paragraph position="5"> ³And its inflectional variants. We will discuss agreement issues later, in the &quot;Extended Formalism&quot; section.</Paragraph> <Paragraph position="6"> The source and target rules, that is, the CFG rules with no constraints, are called the CFG skeleton of the patterns. The notion of a syntactic head is similar to that used in unification grammars, although the heads in our patterns are simply encoded as character strings rather than as complex feature structures. A head is typically introduced⁵ in preterminal rules such as leave → V   V ← partir where two verbs, &quot;leave&quot; and &quot;partir,&quot; are associated with the heads of the nonterminal symbol V. This is equivalently expressed as leave:1 → V:1   V:1 ← partir:1 which is physically implemented as an entry of a lexicon. A set T of translation patterns is said to accept an input s iff there is a derivation sequence Q for s using the source CFG skeletons of T, and every head constraint associated with the CFG skeletons in Q is satisfied. Similarly, T is said to translate s iff there is a synchronized derivation sequence Q for s such that T accepts s, and every head and link constraint associated with the source and target CFG skeletons in Q is satisfied. 
The derivation Q then produces a translation t as the resulting sequence of terminal symbols included in the target CFG skeletons in Q. Translation of an input string s essentially consists of the following three steps: * Parsing s by using the source CFG skeletons * Propagating link constraints from source to target CFG skeletons to build a target CFG derivation sequence * Generating t from the target CFG derivation sequence The third step is trivial, as in the case of STAG translation. Some immediate results follow from the above definitions (Takeda, 1996). 1. Let a CFG grammar G be a set of source CFG skeletons in T. Then, T accepts a context-free language (CFL), denoted by L(T), such that L(T) ⊆ L(G).</Paragraph> <Paragraph position="7"> 2. Let a CFG grammar H be a subset of source CFG skeletons in T such that a source CFG skeleton k is in H iff k has no head constraints associated with it. Then, T accepts a superset of the language L(H), that is, L(H) ⊆ L(T). 3. L(T) is a proper subset of L(G) if, for example, there exists a pattern p (∈ T) with a source CFG rule X → X1 ··· Xk such that⁶ (a) p has a head constraint h:Xi for some nonterminal symbol Xi (i = 1,2,...,k).</Paragraph> <Paragraph position="8"> (b) T has a derivation sequence X → ... → w such that X is associated with a head g (h ≠ g), and T has no sequence of nonterminal symbols that derives exactly the same set of strings as X does.</Paragraph> <Paragraph position="9"> ⁵A nonterminal symbol X in a source or target CFG rule X → X1 ··· Xk can only be constrained to have one of the heads in the RHS X1 ··· Xk. Thus, monotonicity of head constraints holds throughout the parsing process.</Paragraph> <Paragraph position="10"> ⁶This is not a necessary condition for L(T) ⊂ L(G). 
It is provable that for any set T of patterns, there exists a (weakly) equivalent CFG grammar F, with possibly exponentially more grammar rules, such that L(T) = L(F). A decision problem of two CFLs, L(T) ⊂ L(G), is solvable iff L(F) = L(G) is solvable. This includes an undecidable problem, L(F) = Σ*. Therefore, we can conclude that L(T) ⊂ L(G) is undecidable. Similar discussions can be found in the literature on Generalized Phrase Structure Grammar (Gazdar et al., 1985).</Paragraph> <Paragraph position="11"> Although our &quot;patterns&quot; have no more descriptive power than CFG, they can provide considerably better descriptions of the domain of locality than ordinary CFG rules. For example, be:V:1 year:NP:2 old → VP:1   VP:1 ← avoir:V:1 an:NP:2 can handle such NP pairs as &quot;one year&quot; and &quot;un an,&quot; and &quot;more than two years&quot; and &quot;plus que deux ans,&quot; which would have to be covered by a large number of plain CFG rules. TAGs, on the other hand, are known to be &quot;mildly context-sensitive&quot; grammars, and they can capture a wider range of syntactic dependencies, such as cross-serial dependencies. The computational complexity of parsing for TAGs, however, is O(|G|n^6), which is far greater than that of CFG parsing. Moreover, defining a new STAG rule is not as easy for the users as just adding an entry into a dictionary, because each STAG rule has to be specified as a pair of syntactic tree structures. Our patterns, on the other hand, can be specified as easily as to leave * = de quitter *   to be year:* old = d'avoir an:* by the users. Here, the wildcard &quot;*&quot; stands for an NP by default. The prepositions &quot;to&quot; and &quot;de&quot; are merely used to specify that these patterns are for VPs, and they are removed when compiled into internal forms so that these patterns are applicable to finite as well as infinitive forms. 
Similarly, &quot;to be&quot; is used to show that the phrase is a be-verb and its complement. The wildcards can be constrained with a head, as in &quot;year:*&quot; and &quot;an:*&quot;. In addition, they can be associated with an explicit nonterminal symbol such as &quot;V:*&quot; or &quot;ADJP:*&quot; (e.g., &quot;leave:V:*&quot;). By defining a few such notations, these patterns can be successfully converted into the formal representations defined above. The notations are so simple that even a novice PC user should have no trouble in writing our patterns, as if he or she were making a vocabulary list for English or French exams.</Paragraph> </Section> <Section position="6" start_page="1155" end_page="1156" type="metho"> <SectionTitle> 3 Pattern-Based Translation Algorithm </SectionTitle> <Paragraph position="0"> A parsing algorithm for translation patterns can be any of the known CFG parsing algorithms, including CKY and Earley algorithms. It should be first noted, however, that CFG could produce exponentially ambiguous parses for some input, in which case we can only apply heuristic or stochastic measurement to select the most promising parse.</Paragraph> <Paragraph position="1"> It is known that an Earley-based parsing algorithm can be made to run in O(|G|Kn^3) rather than O(|G|^2 n^3) (Maruyama, 1993; Graham et al., 1980), where K is the number of distinct nonterminal symbols in the grammar G. We can expect a very efficient parser for our patterns.⁷ The input string can also be scanned to reduce the number of relevant grammar rules before parsing.⁸ The combined process is also known as offline parsing in LTAG.</Paragraph> <Paragraph position="2"> Handling ambiguous parses is a difficult task. The basic strategy for choosing a candidate parse during Earley-based parsing is as follows: 1. 
Prefer a pattern p with a source CFG skeleton X → X1 ··· Xk over any other pattern q such that the source CFG skeleton of q is X → X1 ··· Xk, and such that Xi in p has a head constraint h if q has h:Xi (i = 1,...,k). The pattern p is said to be more specific than q. This relation is similar to a subsumption relationship (Pollard and Sag, 1987).</Paragraph> <Paragraph position="3"> ⁷Schabes and Waters (Schabes and Waters, 1995) also discuss several techniques for optimizing parsing algorithms.</Paragraph> <Paragraph position="4"> ⁸Such scanning is essential for some languages with no explicit word boundaries (such as Japanese and Chinese).</Paragraph> <Paragraph position="5"> 2. Prefer a pattern p with a source CFG skeleton over one with fewer terminal symbols than p.</Paragraph> <Paragraph position="6"> 3. Prefer a pattern p that does not violate any head constraint over one that violates a head constraint. 4. Prefer the shortest derivation sequence for each input substring. A pattern for a larger domain of locality tends to give a shorter derivation sequence. Thus, our strategy favors lexicalized (or head-constrained) and collocational patterns, which is exactly what we are going to achieve with pattern-based MT. Selection of patterns in the derivation sequence accompanies the construction of a target derivation sequence. Link constraints are propagated from source to target derivation trees. This is basically a bottom-up procedure. Since the number M of distinct pairs (X,w), for a nonterminal symbol (or a chart) X and a subsequence w of input string s, is bounded by Kn^2, there are at most Kn^3 possible triples (X,w,h), such that h is a head of X. Thus, we can compute the m-best choice of translation candidates in O(|T|Kn^4) time. 
Here, K is the number of distinct nonterminal symbols in T, and n is the size of the input string.</Paragraph> <Paragraph position="7"> The reader should note critical differences between lexicalized grammar rules (in the sense of LTAG) and translation patterns when they are used for MT.</Paragraph> <Paragraph position="8"> Firstly, a pattern is not necessarily lexicalized. An economical way of organizing translation patterns is to include non-lexicalized patterns as &quot;default&quot; translation rules. For example, the pattern</Paragraph> <Paragraph position="10"> is always preferred over the default rule, because of our preference strategy. Similarly, the pattern please VP:1 → VP:1   VP:1 ← VP:1, s'il vous plaît should be preferred over a lexicalized pattern, if any, ADVP:1 xxx:VP:2 → VP:2   VP:2 ← ADVP:1 yyy:VP:2 Secondly, lexicalization might considerably increase the size of STAG grammars (in particular, compositional grammar rules such as ADJP NP → NP) when a large number of lexical items are associated with them. Since it is not unusual for a noun in a source language to have several counterparts in a target language, the number of tree-pairs in STAG would grow much larger than that of source LTAG trees. Although in LTAG the grammar rules are differentiated from their physical objects (&quot;parser rules&quot;), and &quot;structure sharing&quot; (Vijay-Shanker and Schabes, 1992) is proposed, this ambiguity remains in the parser rules, too.</Paragraph> <Paragraph position="11"> Thirdly, a translation pattern can omit the tree structure of a collocation, leaving it as just a sequence of terminal symbols. 
For example, See you later, NP:1 → S   S ← Au revoir, NP:1 is perfectly acceptable as a translation pattern.</Paragraph> </Section> <Section position="7" start_page="1156" end_page="1157" type="metho"> <SectionTitle> 4 Extended Formalism </SectionTitle> <Paragraph position="0"> Syntactic dependencies in natural language sentences are so subtle that many powerful grammar formalisms have been proposed to account for them. The adequacy of CFG for describing natural language syntax has long been questioned, and unification grammars, among others, have been used to build a precise theory of the computational aspects of syntactic dependencies, which are described by the notion of unification and by feature structures.</Paragraph> <Paragraph position="1"> Translation patterns can also be extended by means of unification and feature structures. Such extensions must be carefully applied so that they do not sacrifice the efficiency of parsing and generation algorithms. Shieber and Schabes briefly discuss the issue (Shieber and Schabes, 1990). We can also extend translation patterns as follows: Each nonterminal node in a pattern can be associated with a fixed-length vector of binary features.</Paragraph> <Paragraph position="2"> This will enable us to specify such syntactic dependencies as agreement and subcategorization in patterns. Unification of binary features, however, is much simpler: unification of a feature-value pair succeeds only when the pair is either (0,0) or (1,1). Since the feature vector has a fixed length, unification of two feature vectors is performed in constant time. For example, the patterns</Paragraph> <Paragraph position="4"> are unifiable with transitive and intransitive verbs, respectively. 
We can also distinguish local and head features, as postulated in HPSG. Verb subcategorization is then encoded as</Paragraph> <Paragraph position="6"> where &quot;-OBJ&quot; is a local feature for head VPs in LHSs, while &quot;+OBJ&quot; is a head feature for VPs in the RHSs.</Paragraph> <Paragraph position="7"> Unification of a head feature with +OBJ succeeds when it is not bound.</Paragraph> <Paragraph position="8"> Another extension is to associate weights with patterns. It is then possible to rank the matching patterns according to a linear ordering of the weights rather than the pairwise partial ordering of patterns described in the previous section. Numeric weights for patterns are extremely useful as a means of assigning higher priorities to user-defined patterns.</Paragraph> <Paragraph position="9"> The final extension of translation patterns is integration of examples, or bilingual corpora, into our framework. It consists of the following steps. Let T be a set of translation patterns, B a bilingual corpus, and (s,t) a pair of source and target sentences. 1. If T can translate s into t, do nothing.</Paragraph> <Paragraph position="10"> 2. If T can translate s into t' (t ≠ t'), do the following: (a) If there is a paired derivation sequence Q of (s,t) in T, create a new pattern p' for a pattern p used in Q such that every nonterminal symbol X in p with no head constraint is associated with h:X in p', where the head h is instantiated in X of p. Add p' to T if it is not already there.</Paragraph> <Paragraph position="11"> (b) If there is no such paired derivation sequence, add the pair (s,t) to T as a translation pattern.</Paragraph> <Paragraph position="12"> 3. If T cannot translate s, add the pair (s,t) to T as a 
translation pattern.</Paragraph> <Paragraph position="13"> The simplest way of integrating the corpus B into T is just to consider the sentence pair (s,t) as a translation pattern. Some additional steps are necessary to achieve higher MT accuracy for a slightly wider range of sentences than those included in B. However, the degree of improvement in MT accuracy that can be achieved with this learning mechanism is open to question, since the addition of translation patterns does not necessarily guarantee a monotonic improvement in MT accuracy.</Paragraph> </Section> <Section position="8" start_page="1157" end_page="1157" type="metho"> <SectionTitle> 5 Implementation </SectionTitle> <Paragraph position="0"> Our experimental implementation of a pattern-based MT system consists of about 500 default-translation patterns, about 2400 idiomatic and collocational patterns, and about 60,000 lexical items for English-to-Japanese translation. A sample run of the prototype system is shown in Figure 1. It shows one of the derivation sequences for the input sentence John should hear from Mary about the news if he returns home.</Paragraph> <Paragraph position="1"> Each line in the derivation sequence shows an English source CFG rule of a pattern used for the derivation.</Paragraph> <Paragraph position="2"> For example, the first line</Paragraph> <Paragraph position="4"> in the derivation sequence shows that two nonterminal symbols, S1 and PUNCT, form a sentence S, that S is co-indexed with S1, and that S1 must have a finite form feature +cFIN. The current instance of S has four features, finite, present (cPRES), with-subject (cSUBJ), and with-auxiliary-verb (cAUX), and it spans the word positions 0 to 13. We can also find several head-constrained patterns there. 
For example,</Paragraph> <Paragraph position="6"> is a pattern for translating &quot;return:V home:NP&quot;. The default V+NP translation pattern will assign a wrong Japanese case marker for this phrase.</Paragraph> <Paragraph position="7"> Our prototype took about 9 sec (elapsed time) to translate this input sentence and produce seven alternative translations. The derivation shown in the figure was the first (i.e., the best), and generates a correct translation. Therefore, collocational patterns and default patterns have been appropriately combined under our preference strategy.</Paragraph> </Section> </Paper>