<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1067">
  <Title>Word Completion: A First Step Toward Target-Text Mediated IMT</Title>
  <Section position="4" start_page="394" end_page="394" type="metho">
    <SectionTitle>
2 Word Completion
</SectionTitle>
    <Paragraph position="0"> Our scenm'io for wor(1 completion SUl)t)oses that a translator works on some designated segment of the source text (of attproxinmtely sentence size), and elaborates its (;ranslation from left to right.</Paragraph>
    <Paragraph position="1"> As each target-text character is tylted , a t)rot)osed (:omttletion for tit(; currenl; word is (tisttlayed; if this is (',orreet, the translator ntay a(',cept it att(l l)cgin typing the next word. Although inore elaborate comi)lel;ion schemes are imaginable, in(:luding ones that involve the use, of alternate hyI)othesos or 1)revisions for morl)hologieal repair, we have ot)ted against these for the time t)eing because they necessitate st)eeial commands whose benetit in terms of characters saved would t)e diilicult to estimate.</Paragraph>
    <Paragraph position="2"> The heart of ore&amp;quot; system is a comltletion engine for English to t~'ench translation which finds the best completion for a \[,Y=eneh word prefix given the current English source text segment utnler translation, attd the words which precede the prefix in the corresponding l~Y=eneh target text segment.</Paragraph>
    <Paragraph position="3"> It comprises two main components: an cvaluatot which assigns scores to completion hypotheses, and a generator which produces a list of hyp(tthesos that match the current prefix and picks the one with the highest score.</Paragraph>
    <Paragraph position="4"> 1This idea is similm to existing work on tyl)ing ae(:elerators for the disabled (Demasco and McCoy, 1992), but our methods differ signitieantly in many aspects, chief among which is the use of bilingual context.</Paragraph>
  </Section>
  <Section position="5" start_page="394" end_page="395" type="metho">
    <SectionTitle>
3 Hypothesis Evaluation
</SectionTitle>
    <Paragraph position="0"> Each score produced by the evaluator is an estilnate of p(tl{, ,st, the probability of a target.language word t given a preceding target text t, anti a source text s. For etticiency, this distribution is modeled as a silnple linear combination of SOl)re'ate tn'edietions fl'om tit(; target text and the sottree text:</Paragraph>
    <Paragraph position="2"> The vahte of /~ was chosen so as to maximize e, otnpletion lterforInanee over a test text; (see s(!ction 5).</Paragraph>
    <Section position="1" start_page="394" end_page="394" type="sub_section">
      <SectionTitle>
3.1 Target-Text Based Prediction
</SectionTitle>
      <Paragraph position="0"> The target-text based prediction p(tlt) comes tY=om an interpolated trigranl language model for l%:ench, of the type commonly used in speech recognition (Jelinek, 11!190). It; was trained on 47M words fiom the Canadian Hansard Corpus, with 750/o used to make relative-fl'equency I)arameter estintates and 25% used to reestimate interpolation coefticients.</Paragraph>
    </Section>
    <Section position="2" start_page="394" end_page="395" type="sub_section">
      <SectionTitle>
3.2 Source-Text Based Prediction
</SectionTitle>
      <Paragraph position="0"> The source text prediction p(t\[s) comes fl'om a statistical model of English-to-l,Y=ench translation which is based on the IBM translation models 1 and 2 (Brown el; al., 1993). Model 1 is a Hid.den Markov Model (HMM) of the target language whose states correspond to source text tokens (see figure l), with the addition of one special null state to account for target text words that have no strong direct correlation to any word in the source text. The output distribution of any state tie the set of probabilities with which it generates target words) deitends only on the correspondit~g source text word, and all next-state transition distributions are uniform. Model 2 is similar to model 1 except that states are attgmented with a target-token t)osition cotnponent, attd transition probabilities depend on both source and target token positions, '2 with the topographical constraint that a state's target-token t)ositioll component must always match the current actual position. Because of the restricted form of the state transition UAlong with source and target text lengths in l/town et al's fornmlation, lint these are constant for arty particular HMM. The results 1)resented in this palter are optimistic in that the target text lengl;h was assumed to be known in advance, which of course is unrealistic. IIowever, (Dagan et al., 1993) have shown that knowledge of target-text length is not crucial to the model's i)ertbrmanee.</Paragraph>
      <Paragraph position="1">  J' ai d' autres cxcmplcs d' autres pays  cxamples from many other countries might generate the French sentence shown. The state-transition probabilities (horizontal arrows) are all 1/9 for model 1, and depend on the next state for model 2, eg p((froms, 6} I') = a(516). The output probabilities (vertical arrows) depend on the words involved, eg p(d' I {from~, 6}) = p(d' I from ). matrices for these models, they have the prop-erty that- unlike HMM's in general they generate target-language words independently. The probability of generating hypothesis t at position</Paragraph>
      <Paragraph position="3"> where sj is the jth source text token (so is a null token), p(tlsj) is a word-for-word translation probability, and a(jli ) is a position alignment probability (equal to 1/( M + 1) for inodel 1).</Paragraph>
      <Paragraph position="4"> We introduced a simple enhancement to the IBM models designed to extend their coverage, and make them more compact. It is based on the observation that there are (at least) three classes of English forms which most often translate into Fk'ench either verbatim or via a predictable transformation: proper nouns, numbers, and special atphanuineric codes such as C-~5. We found that we could reliably detect such &amp;quot;invariant&amp;quot; forms in an English source text using a statistical tagger to identify proper nouns, and regular expressions to match immbers and codes, along with a filter for frequent names like United States that do not translate verbatim into French and Immbers like 10 that tend to get translated into a fairly wide variety of forms.</Paragraph>
      <Paragraph position="5"> When the translation models were trained, invariant tokens in each source text segment were replaced by special tags specific to each class (different invariants occuring in the same segment were assigned serial numbers to distinguish them); any instances of these tokens found in the corresponding target text segment were also replace(\] by the appropriate tag. This strategy reduced the nmnber of parameters in the inodels by about 15%.</Paragraph>
      <Paragraph position="6"> When ewfluating hypotheses, a siufilar replacement operation is carried out and the translation probabilities of paired invariants are obtained from those of the tags to which they map.</Paragraph>
      <Paragraph position="7"> Parameters for the translation models were reestimated fl'om the Hansard corpus, automatically aligned to the sentence level using the method described in (Simard et al., 1992), with non one-to-one aliglmmnts arid sentences longer than 50 words filtered out; the ret~fine(l material consisted of 36M English words and 37M Fren(:h words.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="395" end_page="396" type="metho">
    <SectionTitle>
4 Hypothesis Generation
</SectionTitle>
    <Paragraph position="0"> The main challenge in generating hypotheses is 1;o balance the opposing requirements of completion accuracy and speed the former tends to increase, and tile latter to decrease with tile nmnber of hypotheses considered. We took a number of steps in art effort to achieve a good compromise.</Paragraph>
    <Section position="1" start_page="395" end_page="395" type="sub_section">
      <SectionTitle>
4.1 Active and Passive Vocabularies
</SectionTitle>
      <Paragraph position="0"> A well-established corollary to Zipf's law holds that a minority of words account for a majority of tokens in text. To capitalize on this, our system's French vocabulary is divided into two parts: a small active component whose contents are always used for generation, and a much larger passive part which comes into play only when the active vocabulary contains no extensions to the (:urrent t)refix.</Paragraph>
      <Paragraph position="1"> Space requirements for the passive vocabulary were minimized by storing it as a special trie in which conlnlon Srl\[~cix patterns are represented only once, and variable-length coding techniques are used for structural information. This allows us to maintain a large dictionary containing over 380,000 forms entirely in memory, using about 475k bytes.</Paragraph>
      <Paragraph position="2"> The active vocabulary is also represented as a trie. For efficiency, explicit lists of hypotheses at'(; not generated; instead, evaluation is performed during a reeursive search over the portion of the trie below the current coinpletion prefix. Repeat searches when the prefix is extended by one character are ol)viated in inost situations by memo\]zing the results of tile original search with a bestchild pointer in each trie node (see figure 2).</Paragraph>
    </Section>
    <Section position="2" start_page="395" end_page="396" type="sub_section">
      <SectionTitle>
4.2 Dynamic Vocabulary
</SectionTitle>
      <Paragraph position="0"> To set the contents of the active vocabulary, we borrowed the idea of a dynamic vocabulary from (Brousseau et al., 1995). This involves using</Paragraph>
    </Section>
    <Section position="3" start_page="396" end_page="396" type="sub_section">
      <SectionTitle>
4.3 Case Handling
</SectionTitle>
      <Paragraph position="0"> The l;re.~l;nlcnl: of \](;l;lx;r ca.s(; L'-; ;~ 1Mcky l)r()/)i(!ni for hyi)oi;l/(;sis general;ion mid one tlta.l; cil~llll()l; })(; ig;llor(;(\] \]11 kill inl;(~ra.('.l;iv(; al)l)li(:id;ion. IVlt)st; words can al)pc, ar in ;L llllllli)(!l' ()\[' (liffer(;nl; ('.;/os(!-v;rli;l,Iil, \['orllt,~ gbll(l \[;h(~l'(! ~%1'c, llO sit\[It)l(', ;)dl(1 ;d)solul,(~ l/l\[(',,q l;lii~t spec, i\[i7 which iS al)t)rot)ria,l;c i/l a t);u'l;ic, ula, r (:onlx'.xl;. To (x)pe wii,h I;his ,sil:ll;d;i()n~ w(; axh)l)ix;d a. \]i(;urisl;i(: sl,ra,1;(%y I)as(;(t on an idealiz(;(l nl()(l(;I of Fr(;n(:ti case c()iiv(~,ll{;iOllS in which w(irds m'e dirt(led into l;wo (:lass(;s: (:lass ;I words m'(; those which are normally wrii:l:en in low(we;me; (:lass 2 words are t, hose H/t(',h ;;~S \])rOl)(',r ltoittls whi(;h lior- null nmlly 1;~dce a, Sl)e(:i;fl case lml;Ix;rn (',onl;a,inin/r ~d; \](\]~-IoS/; ()IIC llt)\])(~I'CktS(~ c, tmra,c,l;er. Class 1 words gO, If. ('r;m; ('.ai)italiz('d hyt)ol;}i.(;s(!,~ ;i J: Lhe \[)el,;innillg o\[' gt S(',Ill~(',it(;(', O1' wh(',ll l;h(; (;Ollll)\](',I;iOll l)r('fix is (;;~l)ita,liz(,d; llI)I)(;rt:;tso hyi)othos(;,q when the (:oml)le null ch~racl;er the l,rmisl;~l;or will tiave l;o tyl)c. The fililt. |l;WO lll(~;tsur(!s ;IAo inlxmdcd Ix) ~t)prt)xiimd,t~ i,hc IIIIlHI)CF o\[&amp;quot; \]C/eysl;rokes s;tved within words. The firsl; ooSSIllliCS l\]ud; l, he, lirlmslal;()r TlS(',S ~ Slmcial COllil\[igblid&gt; ('.OSl,iug OIIC k(;ysi;rok(', lx) ~r(:c(;i)t; ;L pro t)()S;-I\[, r|~h(\] s(',COlI(t 0~S,SlllllCS |;h0ol; ~t(;(;(',t)l;};~ll(:(~ COllsisl;s merely ill l;yping l;h(', chm'~mlx;r whit:h follows i,he word either a st)~me or a punci;ua, I;ion \[ll}~l'k. 3 Complel;ions m'e free in this i~ccoIlltl;hlt,~&gt;  any spaces or punctuation characters in handtyped prefixes are assessed a one-keystroke escape penalty.</Paragraph>
      <Paragraph position="1"> Figure 4 shows the performance of the system for various values of the trigram coefficient A. A noteworthy feature of this graph is that interpolation improves performance over the pure trigram by only about 3%. This is due in large part to the fact that the translation model has already made a contribution in non-linear fashion through the dynamic vocabulary, which excludes many hypotheses that might otherwise have misled the language model.</Paragraph>
      <Paragraph position="2"> Another interesting characteristic of the data is the discrepancy between the number of correctly anticipated characters and those in completed suffixes. Investigation revealed the bulk of this to be attributable to morphological error. In order to give the system a better chance of getting inflections right, we modified the behaviour of the hypothesis generator so that it would never produce the same best candidate more than once for a single token; in other words, when the translator duplicates the first character of a proposal, the system infers that the proposal is wrong and changes it. As shown in table 1, completion performance improves substantially as a result. Figure 5 contains a detailed record of a completion session that points up one further deficiency in the system: it proposes punctuation hypotheses too often. We found that simply suppressing punctuation in the generator led to another small increment in keystroke savings, as indicated in table 1.</Paragraph>
      <Paragraph position="3"> letters are not normally followed by either spaces or punctuation. We assume the system can detect these and automatically suppress the character used to effect the completion.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>