Improving SMT quality with morpho-syntactic analysis
Sonja Nießen and Hermann Ney
Lehrstuhl für Informatik VI
Computer Science Department
RWTH University of Technology Aachen
D-52056 Aachen, Germany
Email: niessen@informatik.rwth-aachen.de
Abstract
In the framework of statistical machine translation (SMT),
correspondences between the words in the source and the target
language are learned from bilingual corpora on the basis of so-called
alignment models. Many of the statistical systems use little or no
linguistic knowledge to structure the underlying models. In this paper
we argue that training data is typically not large enough to
sufficiently represent the range of different phenomena in natural
languages and that SMT can take advantage of the explicit introduction
of some knowledge about the languages under consideration. The
improvement of the translation results is demonstrated on two
different German-English corpora.
1 Introduction
In this paper, we address the question of how morphological and
syntactic analysis can help statistical machine translation (SMT). In
our approach, we introduce several transformations to the source
string (in our experiments the source language is German) to
demonstrate how linguistic knowledge can improve translation results,
especially in the cases where the token-type ratio (number of training
words versus number of vocabulary entries) is unfavorable.
After reviewing the statistical approach to machine translation, we
first explain our motivation for examining additional knowledge
sources. We then present our approach in detail. Experimental results
on two bilingual German-English tasks are reported, namely the
VERBMOBIL and the EUTRANS task. Finally, we give an outlook on our
future work.
2 Statistical Machine Translation
The goal of the translation process in statistical machine translation
can be formulated as follows: A source language string
$f_1^J = f_1 \ldots f_J$ is to be translated into a target language
string $e_1^I = e_1 \ldots e_I$. In the experiments reported in this
paper, the source language is German and the target language is
English. Every English string is considered as a possible translation
for the input. If we assign a probability $Pr(e_1^I \mid f_1^J)$ to
each pair of strings $(e_1^I, f_1^J)$, then according to Bayes'
decision rule, we have to choose the English string that maximizes the
product of the English language model $Pr(e_1^I)$ and the string
translation model $Pr(f_1^J \mid e_1^I)$.
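Written out, the decision rule described in this paragraph takes the following form. The symbol $\hat{e}_1^I$ for the chosen translation is notation introduced here for illustration; the second equality uses the fact that $Pr(f_1^J)$ does not depend on the English string.

```latex
\begin{align*}
\hat{e}_1^I
  &= \operatorname*{arg\,max}_{e_1^I} \; Pr(e_1^I \mid f_1^J) \\
  &= \operatorname*{arg\,max}_{e_1^I} \; Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I)
\end{align*}
```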
Many existing systems for SMT (Wang and Waibel, 1997; Nießen et al.,
1998; Och and Weber, 1998) make use of a special way of structuring
the string translation model (Brown et al., 1993): The correspondence
between the words in the source and the target string is described by
alignments that assign one target word position to each source word
position. The probability of a certain English word to occur in the
target string is assumed to depend basically only on the source word
aligned to it. It is clear that this assumption is not always valid
for the translation of natural languages. It turns out that even those
approaches that relax the word-by-word assumption like (Och et al.,
1999) have problems with many phenomena typical of natural languages
in general and German in particular like
• idiomatic expressions;
• compound words that have to be translated
by more than one word;
• long range dependencies like prefixes of
verbs placed at the end of the sentence;
• ambiguous words with different meanings
dependent on the context.
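The alignment-based structuring of the string translation model mentioned before this list can be sketched as follows. The alignment variable $a_1^J$ (with $a_j$ the target position assigned to source position $j$) and the particular factorization into an alignment probability $p(a_j \mid j, I, J)$ and a lexicon probability $p(f_j \mid e_{a_j})$ are a simplified illustration in the spirit of Brown et al. (1993), not necessarily the exact models used in the experiments.

```latex
\begin{align*}
Pr(f_1^J \mid e_1^I)
  &= \sum_{a_1^J} Pr(f_1^J, a_1^J \mid e_1^I) \\
Pr(f_1^J, a_1^J \mid e_1^I)
  &\approx \prod_{j=1}^{J} p(a_j \mid j, I, J) \cdot p(f_j \mid e_{a_j})
\end{align*}
```

In such a factorization every source word is explained by exactly one target word, which is why the phenomena in the list are hard to capture.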
The parameters of the statistical knowledge sources mentioned above
are trained on bilingual corpora. Bearing in mind that more than 40%
of the word forms have only been seen once in training (see Tables 1
and 4), it is obvious that the phenomena listed above can hardly be
learned adequately from the data and that the explicit introduction of
linguistic knowledge is expected to improve translation quality.
The overall architecture of the statistical translation approach is
depicted in Figure 1. In this figure we already anticipate the fact
that we will transform the source strings in a certain manner. If
necessary we can also apply the inverse of these transformations on
the produced output strings. In Section 3 we explain in detail which
kinds of transformations we apply.
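The corpus statistics referred to here (number of tokens, number of types, token-type ratio and share of word forms seen only once) can be computed along the following lines. This is a minimal sketch: whitespace tokenization and the function name `corpus_statistics` are assumptions made for illustration, and the singleton share is taken relative to the vocabulary, as the phrase "40% of the word forms" suggests.

```python
from collections import Counter


def corpus_statistics(sentences):
    """Return token/type counts, token-type ratio and singleton share.

    `sentences` is an iterable of strings; tokens are obtained by
    whitespace splitting (an assumption of this sketch).
    """
    counts = Counter(token for sentence in sentences for token in sentence.split())
    n_tokens = sum(counts.values())
    n_types = len(counts)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Running words per vocabulary entry; higher means better coverage.
        "token_type_ratio": n_tokens / n_types if n_types else 0.0,
        # Fraction of vocabulary entries observed exactly once.
        "singleton_share": n_singletons / n_types if n_types else 0.0,
    }


stats = corpus_statistics(["ich fahre morgen los", "ich fahre heute"])
```

On the two toy sentences, seven running words are spread over five word forms, three of which occur only once.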
[Figure 1: Architecture of the translation approach based on Bayes'
decision rule. Source Language Text -> Transformation -> Global
Search (maximize $Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I)$ over $e_1^I$)
-> Target Language Text. Knowledge sources used by the search: Lexicon
Model, Alignment Model, Language Model.]
3 Analysis and Transformation of the Input
As already pointed out, we used the method of transforming the input
string in our experiments. The advantage of this approach is that
existing training and search procedures did not have to be adapted to
new models incorporating the information under consideration. On the
other hand, it would be more elegant to leave the decision between
different readings, for instance, to the overall decision process in
search. The transformation method however is more adequate for the
preliminary identification of those phenomena relevant for improving
the translation results.
3.1 Analysis
We used GERTWOL, a German Morphological Analyser (Haapalainen and
Majorin, 1995) and the Constraint Grammar Parser for German GERCG for
lexical analysis and morphological and syntactic disambiguation. For a
description of the Constraint Grammar approach we refer the reader to
(Karlsson, 1990). Some preprocessing was necessary to meet the input
format requirements of the tools. In the cases where the tools
returned more than one reading, either simple heuristics based on
domain specific preference rules were applied or a more general,
non-ambiguous analysis was used.
In the following subsections we list some transformations we have
tested.
3.2 Separated German Verbprefixes
Some verbs in German consist of a main part and a detachable prefix
which can be shifted to the end of the clause, e.g. "losfahren" ("to
leave") in the sentence "Ich fahre morgen los.". We extracted all word
forms of separable verbs from the training corpus. The resulting list
contains entries of the form prefix|main. The entry "los|fahre"
indicates, for example, that the prefix "los" can be detached from the
word form "fahre". In all clauses containing a word matching a main
part and a word matching the corresponding prefix part occurring at
the end of the clause, the prefix is prepended to the beginning of the
main part, as in "Ich losfahre morgen."
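The prefix transformation just described can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the input is assumed to be a single, already tokenized clause, the prefix|main list is represented as a set of `(prefix, main)` pairs, and the function name and punctuation set are hypothetical.

```python
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def prepend_separated_prefix(tokens, separable_forms):
    """Move a clause-final verb prefix in front of its main part.

    tokens:          list of tokens of one clause
    separable_forms: set of (prefix, main) pairs in lowercase,
                     e.g. {("los", "fahre")}
    """
    # Find the last token that is not punctuation: the prefix candidate.
    end = len(tokens) - 1
    while end >= 0 and tokens[end] in PUNCTUATION:
        end -= 1
    if end < 1:
        return list(tokens)

    prefix = tokens[end].lower()
    for i in range(end):
        if (prefix, tokens[i].lower()) in separable_forms:
            merged = list(tokens)
            merged[i] = prefix + tokens[i]  # "los" + "fahre" -> "losfahre"
            del merged[end]                 # drop the detached prefix
            return merged
    return list(tokens)


clause = "Ich fahre morgen los .".split()
transformed = prepend_separated_prefix(clause, {("los", "fahre")})
```

Clauses without a matching prefix|main pair are returned unchanged, so the function can be applied to every clause of the corpus.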
3.3 German Compound Words
German compound words pose special problems to the robustness of a
translation method, because the word itself must be represented in the
training data: the occurrence of each of the components is not enough.
The word "Früchtetee" for example cannot be translated although its
components "Früchte" and "Tee" appear in the training set of EUTRANS.
Besides, even if the compound occurs in training, the training
algorithm may not be capable of translating it properly as two words
(in the mentioned case the words "fruit" and "tea") due to the word
alignment assumption mentioned in Section 2. We therefore split the
compound words into their components.
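As a rough illustration of compound splitting, the following sketch splits an unknown word into two parts when both parts are known. Note that this vocabulary lookup is a stand-in chosen for brevity: the experiments rely on the morphological analysis described in Section 3.1, and the function name, the minimum part length and the restriction to two components are assumptions of this example.

```python
def split_compound(word, vocabulary, min_part_len=3):
    """Split `word` into two known components, if possible.

    vocabulary: set of known word forms in lowercase
    Returns a list with one element (no split) or two elements.
    """
    lower = word.lower()
    if lower in vocabulary:
        return [word]  # known words are left untouched
    for cut in range(min_part_len, len(lower) - min_part_len + 1):
        head, tail = lower[:cut], lower[cut:]
        if head in vocabulary and tail in vocabulary:
            # Keep the original casing of the head; capitalize the tail
            # if the compound itself was capitalized (German nouns).
            tail_out = word[cut:]
            if word[0].isupper():
                tail_out = tail_out.capitalize()
            return [word[:cut], tail_out]
    return [word]


parts = split_compound("Früchtetee", {"früchte", "tee"})
```

With "früchte" and "tee" in the vocabulary, the unseen compound from the example above is decomposed into its two components.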
3.4 Annotation with POS Tags
One way of helping the disambiguation of ambiguous words is to
annotate them with their part of speech (POS) information. We chose
the following very frequent short words that often caused errors in
translation for VERBMOBIL:
"aber" can be adverb or conjunction.
"zu" can be adverb, preposition, separated verb prefix or infinitive
marker.
"der", "die" and "das" can be definite articles or pronouns.
The difficulties due to these ambiguities are illustrated by the
following examples: The sentence "Das würde mir sehr gut passen." is
often translated by "The would suit me very well." instead of "That
would suit me very well." and "Das war zu schnell." is translated by
"That was to fast." instead of "That was too fast.".
We appended the POS tag in training and test corpus for the VERBMOBIL
task (see 4.1).
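Appending the POS tag to the selected ambiguous words could look roughly like this. The sketch assumes that a tagger or parser has already produced `(word, tag)` pairs; the tag names, the underscore separator and the function name are hypothetical and only serve the illustration.

```python
AMBIGUOUS_WORDS = {"aber", "zu", "der", "die", "das"}


def annotate_with_pos(tagged_tokens, ambiguous=AMBIGUOUS_WORDS, separator="_"):
    """Append the POS tag to ambiguous words and leave all others as is.

    tagged_tokens: list of (word, tag) pairs from an upstream analysis
    """
    return [
        f"{word}{separator}{tag}" if word.lower() in ambiguous else word
        for word, tag in tagged_tokens
    ]


tagged = [("Das", "PRON"), ("war", "V"), ("zu", "ADV"), ("schnell", "ADJ"), (".", "PUNCT")]
annotated = annotate_with_pos(tagged)
```

After annotation, the pronoun and article readings of "das" become distinct vocabulary entries, so that separate translations can be learned for them.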
3.5 Merging Phrases
Some multi-word phrases as a whole represent a distinct syntactic role
in the sentence. The phrase "irgend etwas" ("anything") for example
may form either an indefinite determiner or an indefinite pronoun.
Like 21 other multi-word phrases "irgend-etwas" is merged in order to
form one single vocabulary entry.
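Merging a fixed list of multi-word phrases into single vocabulary entries can be sketched as a left-to-right scan that prefers the longest match. The phrase list format (tuples of lowercase tokens) and the function name are assumptions of this example; the hyphen as joining character follows the "irgend-etwas" example above.

```python
def merge_phrases(tokens, phrases, joiner="-"):
    """Merge every occurrence of a listed phrase into one token.

    phrases: set of tuples of lowercase tokens, e.g. {("irgend", "etwas")}
    """
    max_len = max((len(p) for p in phrases), default=0)
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase first.
        for n in range(max_len, 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if len(window) == n and window in phrases:
                merged.append(joiner.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts at position i: copy the token.
            merged.append(tokens[i])
            i += 1
    return merged


sentence = "Haben Sie irgend etwas frei ?".split()
merged_sentence = merge_phrases(sentence, {("irgend", "etwas")})
```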
3.6 Treatment of Unseen Words
For statistical machine translation it is difficult to handle words
not seen in training. For unknown proper names, it is normally correct
to place the word unchanged into the translation. We have been working
on the treatment of unknown words of other types. As already mentioned
in Section 3.3, the splitting of compound words can reduce the number
of unknown German words.
In addition, we have examined methods of replacing a word fullform by
a more abstract word form and checking whether this form is known and
can be translated. The translation of the simplified word form is
generally not the precise translation of the original one, but
sometimes the intended semantics is conveyed, e.g.:
"kaltes" is an adjective in the singular neuter form and can be
transformed to the less specific form "kalt" ("cold").
"Jahre" ("years") can be replaced by the singular form "Jahr".
"beneidest" ("to envy" in second person singular): if the infinitive
form "beneiden" is not known, it might help just to remove the
leading particle "be".
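The back-off to more abstract word forms can be sketched as a simple lookup cascade. In this illustration the candidate forms are given as a hand-written table; in practice they would come from the morphological analysis, and both the table layout and the function name are hypothetical.

```python
def generalize_unknown(word, known_vocabulary, candidate_forms):
    """Replace an unknown word by the first known, more abstract form.

    known_vocabulary: set of word forms seen in training
    candidate_forms:  dict mapping a full form to an ordered list of
                      more general forms (most specific first)
    """
    if word in known_vocabulary:
        return word
    for form in candidate_forms.get(word, []):
        if form in known_vocabulary:
            return form
    # Nothing known: keep the word, which is then passed through
    # unchanged, as is usually appropriate for proper names.
    return word


known = {"kalt", "Jahr", "neiden"}
candidates = {
    "kaltes": ["kalt"],
    "Jahre": ["Jahr"],
    "beneidest": ["beneiden", "neiden"],
}
replaced = [generalize_unknown(w, known, candidates)
            for w in ["kaltes", "Jahre", "beneidest", "Aachen"]]
```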
4 Translation Results
We use the SSER (subjective sentence error rate) (Nießen et al., 2000)
as evaluation criterion: Each translated sentence is judged by a human
examiner according to an error scale from 0.0 (semantically and
syntactically correct) to 1.0 (completely wrong).
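For illustration, a corpus-level figure can be obtained from the per-sentence judgments as follows. How the scores are aggregated is not spelled out in this section; the sketch assumes a plain average over all test sentences, reported in percent, and the function name is hypothetical.

```python
def subjective_sentence_error_rate(scores):
    """Average per-sentence error scores (0.0 to 1.0) and return percent."""
    if not scores:
        raise ValueError("at least one judged sentence is required")
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
    return 100.0 * sum(scores) / len(scores)


example_sser = subjective_sentence_error_rate([0.0, 0.5, 1.0, 0.5])
```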
4.1 Translation Results for VERBMOBIL
The VERBMOBIL corpus consists of spontaneously spoken dialogs in the
appointment scheduling domain (Wahlster, 1993). German sentences are
translated into English. The output of the speech recognizer (for
example the single-best hypothesis) is used as input to the
translation modules. For research purposes the original text spoken by
the users can be presented to the translation system to evaluate the
MT component separately from the recognizer.
The training set consists of 45 680 sentence pairs. Testing was
carried out on a separate set of 147 sentences that do not contain any
unseen words. In Table 1 the characteristics of the training sets are
summarized for the original corpus and after the application of the
described transformations on the German part of the corpus. The table
shows that on this corpus the splitting of compounds improves the
token-type ratio from 59.7 to 65.2, but the number of singletons
(words seen only once in training) does not go down by more than 2.8%.
The other transformations (prepending separated verb prefixes "pref";
annotation with POS tags "pos"; merging of phrases "merge") do not
affect these corpus statistics much.
Table 1: Corpus statistics: VERBMOBIL training ("baseline" = no
preprocessing).

          preprocessing     no. of tokens   no. of types   singletons
English                           465 143          4 382        37.6%
German    baseline                437 968          7 335        44.8%
          verb prefixes           435 686          7 370        44.3%
          split compounds         442 938          6 794        42.0%
          pos                     437 972          7 344        44.8%
          pos+merge               437 330          7 363        44.7%
          pos+merge+pref          435 055          7 397        44.2%

The translation performance results are given in Table 2 for
translation of text and in Table 3 for translation of the single-best
hypothesis given by a speech recognizer (accuracy 69%). For both
cases, translation on text and on speech input, splitting compound
words does not improve translation quality, but it is not harmful
either. The treatment of separable prefixes helps, as does annotating
some words with part of speech information. Merging of phrases does
not improve the quality much further. The best translations were
achieved with the combination of POS-annotation, phrase merging and
prepending separated verb prefixes. This holds for both translation of
text and of speech input.
Table 2: Results on VERBMOBIL text input.

preprocessing      SSER [%]
baseline               20.3
verb prefixes          19.4
split compounds        20.3
pos                    19.7
pos+merge              19.5
pos+merge+pref         18.0
The fact that these hard-coded transformations are not only helpful on
text input, but also on speech input is quite encouraging. As an
example makes clear, this cannot be taken for granted: The test
sentence "Dann fahren wir dann los." is recognized as "Dann fahren wir
dann uns." and the fact that separable verbs do not occur in their
separated form in the training data is unfavorable in this case. The
figures show that in general the speech recognizer output contains
enough information for helpful preprocessing.
Table 3: Results on VERBMOBIL speech input.

preprocessing      SSER [%]
baseline               43.4
verb prefixes          41.8
split compounds        43.1
split+pref             42.3
pos+merge+pref         41.1
4.2 Translation Results for EUTRANS
The EUTRANS corpus consists of different types of German-English texts
belonging to the tourism domain: web pages of hotels, touristic
brochures and business correspondence. The string translation and
language model parameters were trained on 27 028 sentence pairs. The
200 test sentences contain 150 words never seen in training.
Table 4 summarizes the corpus statistics of the training set for the
original corpus, after splitting of compound words and after
additional prepending of separated verb prefixes ("split+prefixes").
The splitting of compounds improves the token-type ratio from 8.6 to
12.3 and the number of words seen only once in training reduces by
8.9%.
Table 4: Corpus statistics: EUTRANS.

          preprocessing     no. of tokens   no. of types   singletons
English                           562 264         33 823        47.1%
German    baseline                499 217         58 317        58.9%
          split compounds         535 505         43 405        50.0%
          split+prefixes          534 676         43 407        49.8%
The number of words in the test sentences never seen in training
reduces from 150 to 81 by compound splitting and can further be
reduced to 69 by replacing the unknown word forms by more general
forms. 80 unknown words are encountered when verb prefixes are treated
in addition to compound splitting.
Experiments for POS-annotation have not been performed on this corpus
because no small set of ambiguous words causing many of the
translation errors on this task can be identified: Compared to the
VERBMOBIL task, this corpus is less homogeneous. Merging of phrases
did not help much on VERBMOBIL and is therefore not tested here.
Table 5 shows that the splitting of compound words yields an
improvement in the subjective sentence error rate of 4.5% and the
treatment of unknown words ("unk") improves the translation quality by
an additional 1%. Treating separable verb prefixes in addition to
splitting compounds gives the best result so far with an improvement
of 7.1% absolute compared to the baseline.
Table 5: Results on EUTRANS.

preprocessing      SSER [%]
baseline               57.4
split compounds        52.9
split+unk              51.8
split+prefixes         50.3
5 Conclusion and Future Work
In this paper, we have presented some methods of providing
morphological and syntactic information for improving the performance
of statistical machine translation. First experiments prove their
general applicability to realistic and complex tasks such as
spontaneously spoken dialogs.
We are planning to integrate the approach into the search process. We
are also working on language models and translation models that use
morphological categories for smoothing in the case of unseen events.
Acknowledgement. This work was partly supported by the German Federal
Ministry of Education, Science, Research and Technology under the
Contract Number 01 IV 701 T4 (VERBMOBIL) and as part of the EUTRANS
project by the European Community (ESPRIT project number 30268).
The authors would like to thank Gregor Leusch for his support in
implementation.

References 

P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer.
1993. Mathematics of Statistical Machine Translation: Parameter
Estimation. Computational Linguistics, 19(2):263-311.

Mariikka Haapalainen and Ari Majorin. 1995. GERTWOL und
Morphologische Disambiguierung für das Deutsche. URL:
www.lingsoft.fi/doc/gercg/NODALIDA-poster.html.

Fred Karlsson. 1990. Constraint Grammar as a Framework for Parsing
Running Text. In Proceedings of the 13th International Conference on
Computational Linguistics, volume 3, pages 168-173, Helsinki, Finland.

Sonja Nießen, Stephan Vogel, Hermann Ney, and Christoph Tillmann.
1998. A DP based Search Algorithm for Statistical Machine Translation.
In Proceedings of the 36th Annual Conference of the Association for
Computational Linguistics and the 17th International Conference on
Computational Linguistics, pages 960-967, Montréal, P.Q., Canada,
August.

Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney. 2000.
An Evaluation Tool for Machine Translation: Fast Evaluation for MT
Research. In Proceedings of the 2nd International Conference on
Language Resources and Evaluation, pages 39-45, Athens, Greece, May.

Franz Josef Och and Hans Weber. 1998. Improving Statistical Natural
Language Translation with Categories and Rules. In Proceedings of the
36th Annual Conference of the Association for Computational
Linguistics and the 17th International Conference on Computational
Linguistics, pages 985-989, Montréal, P.Q., Canada, August.

Franz Josef Och, Christoph Tillmann, and Hermann Ney. 1999. Improved
Alignment Models for Statistical Machine Translation. In Proceedings
of the Conference on Empirical Methods in Natural Language Processing
and Very Large Corpora, pages 20-28, University of Maryland, College
Park, Maryland, June.

Wolfgang Wahlster. 1993. Verbmobil: Translation of Face-to-Face
Dialogs. In Proceedings of the MT Summit IV, pages 127-135, Kobe,
Japan.

Ye-Yi Wang and Alex Waibel. 1997. Decoding Algorithm in Statistical
Translation. In Proceedings of the ACL/EACL '97, Madrid, Spain, pages
366-372, July.
