Automatic Detection of Omissions in Translations 
I. Dan Melamed 
Department of Computer ~md Information Science 
University of Pcnnsylwnia 
Philadelphia, PA, \] 91104, U.S.A. 
melamed@unagi, cis. upenn, edu 
Abstract 
ADOMIT is an algorithln for 
Automatic Detection of OMissions 
in Translations. The algorithm re- 
lies solely on geometric analysis of 
bitext maps and uses no linguis- 
tic information. This property al- 
lows it to deal equally well with 
omissions that do not correspond 
to linguistic units, such as might re- 
sult ti'om word-processing mishaps. 
ADOMIT has proven itself by dis- 
covering many errors in a hand- 
constructed gold standard for eval- 
uating bitext mapping algorithms. 
Quantitative evaluation on simu- 
lated omissions showed that, even 
with today's poor bitext mapping 
technology, ADOMIT is a valuable 
quality control tool for translators 
and translation bureaus. 
1 Introduction 
Omissions in translations arise in several ways. A 
tired translator can accidentally skip a sentence 
or a paragraph in a large text. Pressing a wrong 
key can cause a word processing system to delete 
several lines without warning. Such anomalies 
can usnally be detected by carefnl proof-reading. 
However, price competition is forcing translation 
bureaus to cut down on this labor-intensive prac- 
tice. An automatic method of detecting omissions 
(:an be a great help in maintaining translation 
quality. 
ADOMIT is an algorithm for Automatic Detec- 
tion of OMissions in Translations. ADOMIT rests 
on principles of geometry, and uses no linguis- 
tic information. This property allows it to deal 
equally well with omissions that do not correspond 
to linguistic units, such as might result from word- 
processing mishaps. ADOMIT is limited only by 
the quality of the available bitext map. 
The paper begins by describing the geometric 
properties of bitext maps. These properties en- 
able the Basic Method for detecting omissions. 
Section 5 suggests how the omission detection 
technique can be embodied in a translators' tool. 
The main challenge to perfect omission detection 
is noise in bitext maps, which is characterized 
in Section 6. ADOMIT is a more robust varia- 
tion of the Basic Method. Section 7 explains how 
ADOMIT filters out some of the noise in bitext 
maps. Section 8 demonstrates AI)OMIT's perfor- 
mance aim its value as a quality control tool. 
2 Bitext Maps 
Any algorithm for detecting omissions in a trans- 
lation must use a process of eliminatiorl: It; must 
first decide which segments of the original text 
have corresponding segments in the translation. 
This decision requires a detailed description of 
the correspondence between units of the origi- 
nal text; and milts of the translation. To un(ler- 
stand such correspondence, think of the original 
text and the translation as a single bitext (Hat 
ris, 1988). A description of the correspondence 
between the two halves of the bitext is called a 
bitext map. At least two methods for finding 
bitext maps have been described in tile literature 
(Church, 1993; Melamed, 1996). Both methods 
output a sequence of corresponding character po- 
sitions in the two texts. The novelty of' the omis- 
sion detection method presented in this paper Dies 
in analyzing these correspondence points geomet- 
rically. 
A text and its translation can form the axes of 
a rectangular bitext space, as in Figure 1. The 
height and width of the rectangle correspond to 
the lengths of the two texts, in characters. The 
lower leg corner of ttle rectangle represents the 
texts' beginnings. The upper right corner rep- 
resents the texts' ends. If we know other cor- 
responding character positions between the two 
texts, we can plot them as points in the bitext 
space. The bitext map is the real-valued fnnc- 
lion obtained by interpolating successive points 
in the bitext space. The bitext map between two 
texts that are translations of each other (mutual 
translations) will be injective (one to one). 
Bitext maps have another property that is 
crucial lbr detecting omissions in translations. 
There is a very high correlation between the 
lengths of mutual translations ('p = .991) 
(Gale & Church, 1991). This implies that the 
slope of segments of the bitext map flmction tlne- 
tuates very little. The slope of any segment of the 
764 
mall will, in probal)ility, be very close tO the ratio 
of the lengths of l, lm two texts. \[n <)ther words, 
the slop\[; of ma.p segments has vel'y low val'ia/lge. 
3 The Basic Method 
Omissions in translations give rise to distinctive 
l>atterns in \[>itext maps, as illustrated in l!'igure I. 
'l'he nearly horizontal l)art of the 1)itext inal> in 
..Q 
8 D. 
~3 
o = '-- 
II fie :>, 
i 
.................. " ..... "=4 ............... 
Region A Region O Region B 
x = character position in text 1 
I,'igure 1: Au omission, iu bitex! space. Rcgiou.s 
A aud H co'rmspond lo rcgion.s a and b, respec- 
tively, l~cflion 0 has uo corresponding regiou ou 
the vertical azis. 
region 0 takes up almost no part o\[' the vertical 
axis. This region represents a section of the text 
on tit<', horizontal axis that has no corresponding 
section in the text on the ve.rtieal axis (,he very 
definition of an onlission. The slol>e betw<'en the 
end points of the region is unusually low. A n omis- 
sion in the text on the horizonl:al axis would man- 
liest itself ms a nearly verti<;al region in the bitext 
space. These ItIlllslla\[ slope <:onditions are the key 
to <letecting omissions. 
(-liven a noisc-fl:ee bitext map, omissions are 
easy to detect. First, a I)itext space is constructed 
by placing the original t<;xt on the y-axis, and 
the translation on the x-.axis. Second, the known 
points of correspondence are l>lotted in the l>itext 
sl>ace, l+,a<:h ad, iacent pair or points t)<)un<ls a seg- 
ment of (,he bitext map. Any segment whose sh>l/e 
is unusually low ix a likely omission. Ttds notion 
can I>e made precise by specifying a sloI>e angle 
threshoht l. So, third, segt-/|ents with slope angle 
a < t are flagged a.s omitted segments. 
4 Noise-Free Bitext Maps 
The only way to ensure tliat a bitext map in noise- 
fl:ee is to construct one by hand. Simard et al. 
(1992) hand-aligned corresponding sentences in 
two excerpts of tile Canadian Ihmsards (parlia- 
mentary debate transcripts available in English 
and French). l,'or historical reasons, these l>i- 
texts are named "easy" and "hard" ill the liter- 
ature, q'hc sentence-based alignments were con- 
verted to character-based aligmnei~ts l>y no(,iug 
the corresponding character positions at the end 
of ca.oh pair of aligned sentences. 'rhe result was 
two hand-constructed bitext maps. Several re- 
sear<:hers have used these \[>articular bilcxt ntaps 
;is a gold standard f(>r evahiating hitext mal>l>itlg 
and aligmneut algorithms (Simard el; al., 1992; 
(\]hutch, 1993; I)agan et al., 19!)3; Melamed, 19!)6). 
Surprisingly, AI)OMIT f'ouu<l lnany errors in 
these hand-aligned/>itexts, both in the alignment 
and in the original translation. AI)OM1T pro-- 
cessed both halves of both I>itexts using slol>e an- 
gle thresholds From 5 ° to 200 , in increments of 
5 °. For ea<'h run, AI)OMIT produced a list ()f the 
t>itext malls segm<mts whose slope angles were t>e 
low the speci\[ied threshold /,. The output for the 
French hall7 o1" the "easy" bitexl,, with t -: 15 °, 
consisted of the following 10 items: 
29175) to (26917, 29176) 
45647) to (42179, 45648) 
47794) to (44236, 47795) 
(26869, 
(42075, 
(44172, 
(211071 
(211725 
(319179 
(436118 
(453064 
(504626 
(658098 
230935) to (211379 
231714) to (211795 
348672) to (319207 
479850) to (436163 
499175) to (453116 
556847) to (504663 
726197) to (658225 
231007) 
231715) 
348673) 
479857) 
499176) 
556848) 
726198) 
Each ordered pair is a co-ordinate in the hitexL 
space; each pair of co-ordinates delimits one emiL- 
ted se.gmenL \]i;xamination of these L0 pairs o\[' 
C,}lara(-tcl? ra\[lgeS ill the bitext revealed Lhat 
• 4 omitted segments pointed to omissions in 
the original translation, 
• d omitted segments poitH,ed to aligmnent er- 
roFs~ 
• 1 omitted segment pointed to an omission 
which apparently caused an Mignment error 
(i.e. the segment contained ouc of each), 
• \[ omitted segment pointed to a piece of texl; 
that was accidentally repeated in the original, 
bu(, only translated once. 
With t = I0 °, !) o\[" the I0 segments b~ the 
list still came up; 8 out of 10 remained wit;it 
/. = 5 °. Similar errors were discovered in 
tile other half of the "easy" bitext, and in the 
"hard" bitext, including one omission of Jnore 
than 450 characters. Other segrne.nts appeared 
in the list For ~ > 150 . None of the other seg- 
ments were. outright omissions or misalignments. 
Howew'x, all of them corresponded to non-literal 
translations or paraphrases. For instance, with 
t = 20 °, A I)()MI'F discovered an instance of "Why 
is the governlnent doing this?" (;ratlslatcd as 
"Pourquoi?" 
765 
The hand-aligned bitexts were also used to 
measure ADOMIT's recall. The human align- 
ers marked omissions in the originM transla- 
tion by 1-0 alignments (Gale & Church, 1991; 
lsabelle, 1995). ADOMIT did not; use this in- 
formation; the algorithm has no notion of a line 
of text. However, a simple cross-check showed 
that ADOMIT found all of the omissions. The 
README file distributed with the bitcxts ad- 
mitted that the "human aligners weren?t infalli- 
ble" and predicted "probably no more than five 
or so" alignment errors. ADOMIT corroborated 
this prediction by finding exactly five alignment 
errors. AI)OMIT's recall on both kinds of er- 
rors implies that when tile ten troublesome seg- 
ments were hand-corrected in the "easy" bitext, 
the result was very likely the world's first noise- 
free bitext map. 
5 A Translators' Tool 
As any translator knows, many omissions are 
intentional. Translations are seldom word for 
word. Metaphors and idioms usually cannot 
be translated literally; so, paraphrasing is com- 
mon. Sometimes, a paraphrased translation is 
nmch shorter or much longer than the original• 
Segments of the bitext map that represent such 
translations will have slope characteristics sin> 
ilar to omissions, even though the translations 
nmy be perfectly valid. These cases are termed 
intended omissions to distinguish them fl:om 
omission errors. To be useful, the omission detec- 
tion algorithm must be able to tell the difference 
between intended and unintended omissions. 
Fortnnately, the two kinds of omissions have 
very different length distributions. Intended 
omissions are seldom longer than a few words, 
whereas accidental omissions are often on the of 
der of a sentence or more. So, an easy automatic 
way to separate the accidental omissions from the 
intended omissions is to sort; all the omitted seg- 
ments from longest to shortest. The longer acci- 
dental omissions will float to the top of the sorted 
list;. 
Translators can search for omissions after they 
finish a translation, just like other writers run 
spelling checkers, after they finish writing. A 
translator who wants to correct omission errors 
can find them by scanning the sorted list of omit- 
ted segments Dora the top, and examining the rel- 
evant regions of the bitext. Each time the list 
points to an accidental omission, the translator 
('an make the appropriate correction in the trans- 
lation. If the translation is reasonably complete, 
the accidental omissions will quickly stop appear- 
ing in the list and the correction process can stop. 
Only the smallest errors of omission will remain. 
6 The Problem of Noisy Maps 
The results of l!\]xperiment ~l demonstrate 
ADOMI'F's t)otential. Ilowever, such stellar per- 
formance is only possible with a nearly per- 
fect bitext map. Snch bitext maps rarely exist 
outside the laboratory; today's 1lest autonmtic 
methods for finding tlitext maps are far fl'om 
perfect (Church, 1993; l)agan et ah, 1993; 
Melamed, 1996). At least two kinds of map er- 
rors can interfere with omission detection. One 
kind results in Sl)urious omitted segments, while 
the other hides real omissions. 
I!'igure 2 shows how erroneous points in a bitext 
map can be indistinguishable from omitted seg- 
ments. When such errors occur in the map, // 
"true" bitext map ~ ~° 
erroneous \ 
o'mal ; "" - - .~-,¢ 
segment 
Figure 2: An undeleciable error in lhe bitea:t map. 
A real omission could resull in lhe same map pal- 
lern as lhese erroneous poinls. 
ADOMIT cannot help but announce an omission 
where there isn't one. This kind of map error is 
the main obstacle to the algorithru's precision. 
The other kind of map error is the main obsta- 
cle to tile algorithm's recall. A typical manifes- 
tation is illustrated in Figure 1. The map points 
in Region O contradict the injective property of 
bitext maps. Most of the points in Region O are 
probably noise, because they map many positions 
on the x-axis to just a few positions on the y- 
axis. Such spurious points break up large omitted 
segments into seqnences of small ones. When the 
omitted segments are sorted by length for presen- 
tation to the translator, the fragmented omitted 
segments will sink to the bottom of the list along 
with segments that correspond to small intended 
omissions. The translator is likely to stop scan- 
ning the sorted list of omissions before reaching 
them. 
766 
7 ADOMIT 
A I)OMIT alhwiates the fragmentation problem 
by finding and ignoring extralleOllS lna t) points. 
A COul)le of (hefinitions hell) to exl)lMn the tech- 
nique. Recall that omitte(l segments are (lefine(I 
with respect to a chosen slope angle threshold l: 
Ally segment of the bitext map with slope angle 
less than t is an omitted segment. An omitted 
segtn(mt that contains extraneous t)oint,s ('an be 
ehara('terized as a sequence of mininml omitted 
segments, intersl)ersed with one or more, intcrfer- 
lug segments. A miniinal omitt('.d s(',gm(,.nt ix 
an onfitted segment between two adjaecnt points 
in the bitext map. A maximal omitte(1 seg- 
m(:nt is an ondtted segment that is not a proper 
subsegmc'nt of another omitted segtlmnt. Inter- 
ferlng segnmnts are std)segtuents of maximal 
omitted segments with a slope m~gle at)()v(', Lit(: 
chosen threshold. IntertL'ring segments are always 
delinfite.d by extraneous Inap l)oinl;s. If il, were 
not for interfering segments, the fragmenl, ation 
problem could be solved I)y simply (;oneatenating 
a(lja(-ent minimal omitted segrne.ts, Using these 
definitions, the. prol)leHt of re(:otmtru(;tiug maxi- 
mal omitted segme.nts can be stated as follows: 
Which sequences of mimmal omitted segments re-. 
suited fi'om fragmentation of a maximal omitted 
s('.gment? 
A maxintal omitted segmeut Hnlsl; \]la, vea slope 
angle t)elow the chosen threshohl t. S% the \[)rob- 
h;m (:an be solved I)y considering each I)air of inin- 
imal omitted SeglllelltS, to Se':e if the. slope angle 
l)etween the starting point of the first and th(; end 
point el the secolM is less than 1. This brute \['oree 
solution requires ~:q)l)roximately ½"n, 2 comparisons. 
Since a large bitext may have tens of thousan(ts 
of minimal omitted segments, a faster method is 
desirable. 
Theorem 1 suggests a fast algorithm to search 
\['or pairs of mini trial omitted segments th at arc \['ar- 
thest al)art, and that may have resulted ffo,l I'rag 
m('.nt~tion of a maximal omitted segment. 'Fhe 
theorem is illustrated in Figure 3. tt and 7' are 
mn(unonics for "t)ottom" and "top." 
Th('.orein 1 Leg A be lh.e array of all minimal 
omitlcd segments, sorled by/lhe horizonlal posilion 
of the left end poinl. Lel H be a line in the bile.~l 
space, whose slope equals lhc slope of the main 
diagonal, such thai all lhe seqm.en:s in A lie above 
tl. l, el ,s be lhc left eudpoiut fff a se, gm, r'nl in A. 
tel :\[~ be a ray sla'rting at ,s with a slope angle equal 
lo the chose',, lhrcshohl I. Let i be Ihc i~ler,sc('lio'a, 
oJB and 'i ~. Let b bc the point o'. 11 with the same 
horizonlal posilion as s. No'w, a mamim.al omitted 
segm, enl starling at .~ musl end at so'me poi'.l c iu 
lhe triangle A.s'ib. 
Proof Sketch: s is deJiucd as lhe left end 
poiul, so e must be lo lhc righl of s. By dcfini- 
lion of B, e must be abovc H. If c were above "~', 
c.I 
o 
g_ 
?, 
i i J~-- 
di main ', 'e 
J (pq~ to main diagonal) 
/ FI _ 
x = character position in text 1 
l"igm'c 3: At,, cJ\]icicnt search jot 'maximal omilted 
scgmenl,s. The array of minimal omilled segm( ul,s 
lies above line 17,. Any scqueucc of .segmenls shtl'l- 
ing al s, such lhal lhe slope angle of lhe whole se- 
quence is less than l, musl end al some poinl (: in 
lhe lriangle Asib, 
then lhe slope angle of segmenI ,st would be ,qrea:cr 
lhan the slope angle of 7' = l, so Se co,hl not be 
an omilled segment. El 
A I)OMI'I' exploits Theorem l as follows. Each 
minimal omitted segtueut z h~ A is considered i. 
turn. Starting at z, AI)OM\[T searches the ar 
ray A for the last (i.e. righl, most)segtrtent whose 
right cml point e is in the triaugh'. A~sgb. Usually, 
this segment -will bc z itself, in which case the 
single mininml omitted segment is deemed a, max- 
imal omii,tex\[ segment. When e is not (-)ll tile s.%l\[ic 
minimM omitted segmen\[, as s, AI)OM I'1' centare 
nares all Cite segments between s and c to form a 
ma×imal omitted segment. The search starting 
from SeglllOllt 7, (;all stop &8 SOOll ~ts it elleOllllt;(ws 
a segment with a right end point higher than i, 
For us('I'ul vahms of t, ea(:h search will Sl)a.tl only 
a handful of ean(lidate (rod points. \])roccssing l;\[m 
entire array A i. this .umner produces the desh:ed 
set of maximal omitt(',d seg\[nellts very quickly. 
8 Evaluation 
'\[b accurately evaluate a system for detec(ing 
omissions in tra,nslations, it is uecessary to use 
a lfitext with ma,ny omissions, whose locatio.s 
are known in advance. For perfect validity, the 
omissions should be those of a real translat, or, 
working ou a real translation, detected by a per 
fc('t i)roof-rcader. \[\]nfortunately, first drafts of 
tra.sh,,io.s that had bee,, subj.d, ed to (:ar,~r.l ,',~ 
vision were not readily available. Therefore, the 
ewdual,ion proee.eded by simulation. The adva, ll 
rage of a simulation was complete control oww the 
lengths and relative positions of omissions. This 
is important because the noise in a bitext map 
is mort likely 1,o obscure a short otnissio, dlan a 
long one. 
767 
The simulated omissions' lengths were chosen 
to represent the lengths of typical sentences and 
paragraphs in real texts. A corpus of 61479 
Le Monde paragraphs yielded a median French 
paragraph length of 553 characters. 1 had no cor- 
pus of French sentences, so I estimated the me- 
dian French sentence length less directly. A corpus 
of 43747 Wall ,5'trent Jo,~rnal sentences yielded a 
median English sentence length of 126 characters. 
This number was multiplied by 1.103, the ratio of 
text lengths in the "easy" bitext, to yield a me- 
dian French sentence length of 139. Of course, 
the lengths of sentences and paragraphs in other 
text genres will vary. The nledian lengths of sen- 
tences and paragraphs in this paper are 114 and 
730 characters, respectively. Longer omissions arc 
easier to detect. 
The placement of silnulated omissions in the 
text was governed by the assumption that transla- 
tors' errors of omission occur independently fl:oIn 
one another. This assumption implied that it 
was reasonable to scatter the simulated omissions 
in the text using any meinoryless distribution. 
Such a distribution simplitied the experimental de- 
sign, because performance on a fixed number of 
omissions in one text would be the same as per- 
refinance on the same number of omissions scat- 
tered among multiple texts. As a result, the bitext 
mapping algorithm had to be run only once per 
parameter set, instead of separately for each of the 
100 omissions in that parameter set. 
A useflll evaluation of any omission detection 
algorithm must take. the human factor into ac- 
count. A translator is unlikely to slog through 
a long series of false omissions to make sure thai; 
there are no more true omissions in the transla- 
tion. Several consecutive false onfissions will de- 
ter the translator from searching any further. On 
average, the more consecutive fMse omissions it 
takes for a translator to give up, the more true 
omissions they will tind. Thus, recall is highly cor- 
related with the amount of patience that a trans- 
lator has. Translator patience is one of the inde- 
pendent w~riables in this experiment, quantified in 
terms of the nmnber of consecutive false omissions 
that the translator will tolerate. 
Separate evMuations were carried out for the 
Basic Method and for AI)OMIT, and each method 
was evMuated separately on the two different 
omission lengths. The 2x2 design necessitated 
ibm: repetitions of the following steps: 
1. 100 segments of the given length were deleted 
from the 1,¥eneh hMf of the bitext. 'Phe posi- 
tion of each simulated omission was randomly 
generated fl:om a unilbrm distribution, except 
that, to simplify subsequent evaluation, the 
omissions were spaced at least 1000 charac- 
ters apart. 
2. A hand-constructed bitext map was used 
to tlnd the segments in the English half of 
the bitext that corresponded to the deleted 
French segments. For the purposes of the 
simulation, these English segments served as 
the "true" omitted segments. 
3. The SIMI{. bitext mapping algorithm 
(Melamed, 1996) was used to find a map 
between the original English text and 
the French text; containing the simulated 
omissions. Note that SIMI{ cnn be used with 
or without a translation lexicon. Use of a 
translation lexicon results in more accurate 
bitext maps, which make omission detection 
easier. However, wide-coverage translation 
lexicons are rarely awfilable. +tb make the 
evMuation more representative, SIMR was 
run without this resource. 
4. The bitext map resulting froln Step 3 was 
fed into the Basic Method for detecting 
omissions. The omitted segments flagged by 
the Basic Method were sorted in order of de- 
creasing length. 
5. Each omitted segment in the output from 
Step 4 was compared to the list of true 
omitted segments from Step 2. If any 
of the true omitted segments overlapped 
the flagged omitted segment, the "true 
omissions" counter was incremented. Other- 
wise, the "false omissions" counter was incre- 
mented. An example of the resulting pattern 
of increments is shown in Figure 4. 
6. The pattern of increments was further ana- 
lyzed to find the first point at which the "\['a\]se 
omissions" counter was incremented 3 times 
in a row. The wflue of the "true on;fissions" 
counter at that point represented the recall 
achieved by translators who give up after 3 
consecutive false omissions. To measure the 
recall that would be achieved by more patient 
translators, the "true omissions" counter was 
also recorded at the first occurrence of 4 and 
5 consecutive false omissions. 
7. Steps 1 to 6 were repeated 10 times, in order 
to measure 95% confidence intervMs. 
The low slope angle thresholds used in Section 4 
are suboptimal in the presence of lna 1) noise, be- 
cause much of the noise results in segments of very 
low slope. The optimum value t -- 37 o was deter- 
mined using a separate development bitext. With 
t frozen at the optimum value, recMl was measured 
on the corrected "easy" bitext. 
Figures 5 and 6 plot the mean recall scores 
R)r translators with different degrees of patience. 
A I)OM\]T outperformed the Basic Method by up 
to 48 percentage points. AI)OMIT is also more 
robust, as indicated by its shorter confidence in- 
tervals. Figure 6 shows that ADOMIT can hel l ) 
translators catch more thall 90% of all paragraph- 
size olnissions, and more than one half of all 
sentence-size onfissions. 
768 
0 
\[- 
0 
<D 
100 \[ .... \[| 
90+ I 8O 70 
60 
50 
40 
30 
20 
10 
0 
0 
~L~ ...... 
10 20 30 40 50 60 70 80 90 O0 
true omissions 
Figure 4: An example of lhe order of "h"uc" 
and "false" omissions when sorlcd by lcntflh. 
ltorizonlal runs correspond 1o conscculive "h'ue" 
omissions in lhe oulpul; 'vcrlical runs correspond 
lo consecutive "false" omissions. In Ibis <cam- 
ple, lhe firsl run of more than. 3 "faLse" omissions 
occurs only after 87 "true" omissions. 
1 O0 -! 
II 
o~ 
m E 
90 i ................................. 
80 553:dharacter omissions 
70 ......... !39:dharacter Qmission{ " 6o !,, 
5o 711 ;:v:-:. .. :: 
40 
30 .......... 
20 - • ~ - ., __i 
5 4 3 
consecutive false emissions tolerated by translator 
l?igtlre 5: Mean Basic M elhod recall scor'cs with 
950X confidence intervals fin' simulaled translators 
with varying degrees of patience. 
AD()MH' is only limited by the quality of the 
input bitext map. 'l'he severity of this limits- 
Lion is yet t;o be det;ermined. This paper evalu-- 
a~,od AI)OM1T on a pair of buig,tages for which 
SIMR (;nil reliably produce good bitext maps 
(Melamed, 1996). SIMR will soon be tested on 
other language pairs. ADOMIT will become e.ve.n 
more useful as better bitext nml)ping technology 
becon\]es available. 
9 Conclusion 
AI)OMIT is the first pul)lished aul, oin~(,ic 
method for detecting omissions in translations. 
A I)OMIT's performance is limited only by the ac- 
curacy of the input bitcxt real). Given an accurate 
bitc'xt map, AI)OM IT can reliably dcte('l; even tim 
smallest errors of omission. Even with today's 
poor bitext mapping technology, ADOMIT lit,(Is a. 
large enough proportion of typical omissions to be 
of great practicaJ benefit. The t,e(:hnique is easy 
to implement and easy to integrate into a transla- 
100 
90 
"- 80 II 
.c 70 
~ 60 
50 g 
o) E 
553-dharacter omissions 
! 39:gharacter omissions 
I i ..... 
: "',.... 
40 "'""i 
30 : 
20 - ' - ~ - ' - 
5 4 3 
consecutive false omissions tolerated by translator 
l,'igure 6: Mean A I)OMIT' recall scores with 95~ 
confidence intervals for simuhttcd lranslalors wilh 
varying degrees of palicncc. 
tor's routine. AI)OMIT is a valuable qu,dity con- 
trol tool for tra.nslators and translation bureatts. 
Acknowledgements 
This research began while 1 was a visitor 
a~ the (\]eill, re d'hmovatiotJ en Tcctutologics dc 
l'lnformatioil in LavM, C, anda. The problem of 
omission detection wa.s suggested to mc by 1,31-. 
liott Macklovitch. I am grateful to the following 
people tot: commc, nting on earlier drafts: Pierre 
Isabelh;, Miekey Chandrasekar, Mike Collins, 
Mil,ch M;trCltS, Adwait llaJ, na/mrkhi, B. Srinivas, 
and two a, nonylnotls reviewers. My work was par- 
daily fumhxl by AI¢,O grant I)AAI,03-89 (;0031 
PLUME aml by AI{PA grants N00014-90-J-18fi3 
a.ml N(i6(\]0194(7 6043. 
References 
K. W. Church, "Char_Mign: A Program for Align- 
ing ParalM Texts at the Character Level," 31sl 
Annual Mcclin9 of lhc Association for" ComFula- 
lional Ling'uistics, (;oluml)us, OIl, 119!)3. 
1. 1)~tgan, I{. (?hutch & W. Gale, "l{ol)t,sl, 
Word Alignment for Machine A ided 'l'ranslation," 
Workshop on Very Large (7orpora, available from 
the A(JI,, 1993. 
W. (\]ale & K. W. Clmrch, "A Ih;ogral~l for 
Aligning Sentctwes in Bilingual Corpora," 291h 
Annual Meeling of lhe Association \]or ComFula- 
lional Linguislics, l~erk('ley, (;A, 1991. 
B. Ilarris, "Bi-'l'cxt, a New Concept in Tr~msla- 
/,ion Theory," Language Monthly //54, \[988. 
P. lsabcllc, personal communication, 1995. 
Ge rnel~tc Approach to 1. D. Melamed, "A o " 
Mapping 13il;cxL Correspondence," Coufc.rc',ce o, 
Empirical Methods in Nalural Language Process- 
ing, Philadelphia, U.S.A, 1996. 
M. Simard, G. I!'. Foster & I'. lsa\])elle, "Us- 
ing Cognates to Align Sentences in Bilingual Cor- 
pora," l"o'urth lntcrnalioual (;o,Ocrc'nce on Thc- 
orclical and Methodological lssnes in Machine 
Translalion, Montreal, (Xumda, 1992. 
769 
