Treating 'Free Word Order' in M mhine Translation 
17,a,lf STEINBEI-/.GEII, 
UMIST - Centre for Comput;ational Linguistics 
Ma.uchesl;er, UK, r~flf(/;~ccl.umist.a,c.uk 
0 Abstract 
In free word order languages, every s('.ni;cn(:(~ 
is cml)eddcd in its specific contexL The order 
of consl;ituenLs is d('tcrmincd l)y the categories 
theme, theme and contrastive J'oc'us. This pa- 
per shows \[low to recognisc mid to transhti,e 
these cat;egorles autom;~t;ically on ;~ serH,e.n- 
tial basis, so that SCilI;ClIC(.' ('ml)edding; can I)c' 
achieved wit;houL having i;o tel'e,' 1,o l,h(', co,> 
t;cxt. Tradition,dly neglected moditier ch~sses 
are fully covered by the proposc'd mc'Lhod. 
1 Introduction 
Most languages known as free word order hm- 
gv, ages are in facl; languages wil;ll parl, ially J)'ee 
wonl order (\]';ngcN(amp el, al. 1992), or rather 
free phrase order (Schi;ufelc 1991). A difli- 
culty linl(cd to the form;d de.scription of i,}w.sc 
languages is that instead o17 a (:ompletc lack 
of ordering rules many sul)l, le rcsi;ricl, ions alj- 
ply. A large arrlounl, of wor(t or(lcr v;triai;ions 
;trc gratnmi~t;ical in isoh~t<.d sct,tc,,:es, but co,,- 
text restricts the numl)er o1" s(K|uen(:es wllicli 
are possible and ilal;l~ll:al. Ill l,}lis s(.'llSC 1 Sell- 
fences air(', era, bedded in their context. A slw.- 
cific context calls for a certain word order, an(l 
the word order of a given senl, ence rcflect, s il,s 
conl;exl;. 
: \]n 1;his p;,pe,', we present ,'ccerit sugges- 
Lions on how 1;o treat, free lJhras(, order in N;d,- 
ural L~nguagc Processing (N\],P), and \[)?(?sent 
an alternative' solution t,o the problem. The 
id&~ is to use a the7natieall~-t.#ged, or fle:,:i- 
hie, canonical form (CI")for generation, and 
an algorithm to recognise the. re, levant cat- 
cgorics Ihe,te, "theme ;rod cont?'asZhm ./be'us 
du,'iilg ;m;dy.sis. 'l'his ,ncthod has })cc'n im- 
plcmctited successfully itl the u,tiflc;d, ion and 
constrainl.-basc'd M~vchine Tr;mslal,i(m sysl;em 
CAT2 (Sharp 71989, Sl;cinlmrgcr 1992a). I1, it> 
(:ludcs i;he ordering of ,nodi tiers, which arc l, ra- 
dil;ionally lcfI. oul; {,, wo,'d or(It,' desc,'il)l, iot, 
(C.o,,lo,,/l,;vet,s 71!1!)2). All stat;e, nc,~ts in (,his 
pN)cr concern writi,(m hmguage, ;~s spoken h.l- 
gu;tgjc is more \]i\])er;d wil, h rcsl)eCl; to ordering. 
2 The Data 
Wc shall stm'l. I)y prescni,ing some da.l,~ which 
illustral,cs the prol:~lc.ms re.hd,c.d to word ordc.r 
l, rc';d;inen(; in N\],P. Many ordering wn'iations 
arc possible (1 ;~ - \] e, 2a, 21)), Iml, seine of them 
arc, less nal, tn';d (\]c), and oI, h(.'rs arc ('.v('.n un- 
gramma.tical (2% 2d). 71('. is only accepl;ad)lc if 
I,}le pc','sotml protioun ich is heavily sl;rcsscd, 
indicatc'd here in capiLa.ls.' 
\[at Mori;en2, i we?de ich Jim vielleichl.lv, besuchen. 
Tomorrow will I him l,'obabl:ll visit 
lh h:h wei'de, ihn vielh!ic.hl.lu ,,K)rg(mu,i bes,,(:hen. 
1 will Dim probnbly Iomorrow uisil 
lc h:h we?de ihn i,,o,'geiluu vielhfichl;le beslichen. 
I 'will Mm lomorrow probably visil 
\[d VielMchI.,., we,'de ich ihn ,noi'ge,~e,~ l)es,,che,,. 
Probably will 1 him lomo,'row visil 
le ? Moi'gen:t2 we.,'de ihn vMleichtl.., ICI1 besuchen. 
7'omor~vw 'will him. prob~tbly visil 
2a I",r fuhr dennoch~o ebenFalls:m nach Mi'mdmn. 
lie dro-ve never'th~less alsp to Munich 
1The use of the index nmnbers will be explaim'.d in 
see.rich 5. 
69 
2b l)ennochun fllhr er ebenfallsa.~ nach Miinchen. 
Nevertheless drove he also to Munich 
2c * Er fldar ebenfallsa5 dennoch~u naeh Mfinchen. 
He drove also nevertheless to Munich 
2d * Ebenfallsa5 fuhr er dennoeh~,0 nach Miinchen. 
Also drove he nevertheless to Munich 
Depending on the context,, different word 
orders are eii, her required or, at the very lcasi,, 
they are more natural l,han others. Although 
in 3 and 4 the context is represented by ques- 
t;ions, it is not normally limited to I;hese. 3a, 
which is i;he most natural answer to 3, is very 
unnatural, if not ungrammatical, in 4. Al- 
though not all contexts restrict the order of 
consti/;uents as drastically as 3 and 4, it is a 
general rule for German an<l similar languages 
that sentences are more natural if they are 
properly embedded in their contexts: 
3 ~vVen erwartete (lie Frau wit, dew Nudelholz? 
Whom wailed-for the woman with the rolliuq pin 
3a Die I~au erwartete wit <lem Nudelholz ihren 
MANN. 
The woman wailed-for with the rolling pin her 
h,tsband 
3t) ? Die Frau erwartet;e ihren MANN wit dew 
Nudelholz. 
7'he woman waited-for her husband with lhe 
rolling pin 
4 MIt; wms erwartete die l'¥au ihren Mann? 
With-what waited-for the womml her h,tsband 
4a Die Frau erwartete ihren Mann mlt dam NUdel- 
holz. 
The woman waited-for her husband with the 
rolling pin 
4t) ?? Die Frau erwartete ,,,it dew NOdelholz ihre.,~ 
Mann. 
The woman waited-for wilh the rollin.g pin her 
husband 
It is generally acknowledged thai; the com- 
bination <>f several factors determines the or- 
der of eonstituenl, s in German and similar lan- 
guages, in Steinberger (1994)~ eleven princi- 
ples acting on the pragmatic, semantic and 
syntactic levels are listed, each of which can 
be reformulated as one or several linear prece- 
dence (LP) rules. The :factors comprise of 
t:he tendencies to order ele.ments according to 
the theme-rheme st, ructure and/or I,d the fimc- 
tional sentence perspective. Furthermore, t;hcy 
concern verb ben<ling, animacy, heaviness, the 
importarl<:e of semantic roles for phrase order- 
ing, and others. A disl;inct feature of the order- 
ing r<.'gularil, ies is that none of l, he feel,ors can 
be formulated as an absolute l,P rule, which 
makes word order description dimcull; to deal 
with in NI,P. In receni; years several proposi- 
tions were made to deal with this phenomenon 
in either analysis or general, ion, or both. 
3 Recent Suggestions on 
Treating Free Phrase 
Order 
Uszkoreit (1,087) suggests overcoming t;lle lack 
of absohg;e, rules by using disjuncl, iorts of I,P 
rules. The idea is that if at least one IA ~ 
rule sanctions a sequence of constituenl;s, the 
sentence is grammatical. The mode\[ thus ex- 
presses competence, ral;her than performance, 
as it either accepl.s or rejects a senl, ence, with- 
out maldng a judgement on accel%ability dif- 
ferences as in 1. 
Anothe.r idea put forward by \]);rbach 
(1993) accoum;s h)r grades of acceptability. Er- 
bach assume, s thai; the order of verb comple- 
ments ideally is according to an obliqueness 
hierarchy, and thai; each deviation from this of 
der decreases the acceptability of t.he sentence 
I:)y a factor of 0.8. 'l'wo divergences result in 
an a(:(:el)tabilii,y s(:orc of 0.64 (0.8 * 0.8), e(;c. 
Problems we see linked to this approach are 
I, he use of l;hc obliqueness hierarchy, which lim- 
its l, he preference mechanism to complements, 
and the fact, that every diw~'rsion decreases the 
score invariably, without considering the vary- 
ing effec(; of differen(, wn'iaI, ions. 
A proposal which (;akes into account the 
different importance, or weight, of preference 
rules, is presenl;cd in .lacobs (71988). Jacobs 
assigns each of his preference rules a specific 
numerical weight. If a rule applies in a giwm 
sentence, its value is added to the acceptability 
score of the sentCfl(X',, if it i's violated, its value 
is subtracted. The higher the final score, I;he 
more nal, ural, or the ~bettev' tim senl, ence is. 
70 
Idea.lly, air competing preference; rules are sltt- 
isficd. 'l.'hc coinplic;tl, ion we see wi(,h (;his al>- 
proach is t, tl~U; some stricLly or(ic;red sequ(;llces 
inter\[ere wil, h the calculation of accc~ptal)ility. 
Some of thorn concern the ordoring of (xm- 
ers (AbtSmmgspar(,ilwln; Thurma:r 1989) and 
other modifier subgroups (Stcinl:)ergc,r \] 99,t). 
Some o:f (;ho criticism could be overcome 
by changing l;I'le different propositions slightly. 
l%r instoan(:e, Erbach's (19.<)3) sugg(:si;ion to 
add prc'h:rence to fc;aturc'.-bascd h>ru:alisms 
could be combined with \[Jszl(orcit's prcfere)lc(: 
rules. An idea i,o solve i;he prol)le.ms linkc(l 
to ,/acobs' weighing mcchlmism would be i,o 
combine it, wiLh absolu(,e I,t ) rules, in orcter (;o 
avoid ungramma(;ical s('.qtl('.rlces. \] IOW(:V('r, We; 
want to suggest another method, based on our 
findings concerning na(,ural, marked and uu- 
gra)rmuU, icaJ word orclc;r, and mM:ing :is(' of 
the categories (;herr:e, rheme, and ('onl, ras(,ivc'. 
focus (henceforth simply called foc'.,s'). 
4 The New Model 
In our approach (of. Steinbergcr 3!)9,1), wc 
have diftk'.rent whys or dealing with \[rcc phri~se 
order irl analysis a.nd genc'rltl, iou. In analysis 
(of. section 6), g~i'a2"tllfla.l'S have to allow most, 
orderings, its blxrely any phrlxse order can be 
completely excluded. ()in(:(; it struct;ure is as- 
signed to an input sent;once, we sugg('.s(, that 
thematic, rhen~a(,ic and contrasl,ive\]y \['ocussed 
elements be identified by using our insighi, s 
COllC(;rlliSlg (;he re(;ognil, ion of (;l~(:sc c;ti,cTjories. 
This in:format, ion concerning \[u)ictionaJ seli.- 
ix'rice perspective can mid should I)(' conveyed 
in (;he l,~u'gc't langmtg(, of the. txanslal, ion. 
With respect, to getlera%ion ((;f. SC(:(;iOll 
5), accept~bIe, orderings are dcfined l>y a sin-. 
gle comprehensive line:at preccdc:nc:e (I,P) rule 
which not only assigns stric:~ prioril;ic'.s t,o syn:- 
bols t~tgge.d h>r syl:t~acLic ca%egory ((,..g. N I'or 
noinin,tl;ive NP> SIT for sil, m~tive c:oInplcmc:n(,, 
M for modifier), but; also for the (,hematic cat- 
c'gories theme, rh, emc and conlr'a.~li've J)Jcu.s'. l(, 
is crucial thai; t, hc' relative or(M:ing of sylUa(:- 
tic symbols can be varied by wxrying (;heir re- 
spective lhemalic m~u'kings. 'Hilt \],P rule idso 
assigns prioril;ic:s to syntacl, ic c:;U,(;g;ories which 
are not thernal, ica.lly marked. Thus,; i~ synbtc-- 
l;i(: elemeii(; is assigiled a. dc'f~ml(, position il" no 
I, hcmld,ic in\[orma,(,ion is a.vailable, bill; is move(1 
out of this default l)osi/,ion ir (;hc?l::<la.(;ic inl'of 
real, ion is presc;llt. \[\[i /,his way, a siiigl<' rill('. 
i'epre.qen(~s it fixed canonical \[orm for unmarked 
(:lemenl;s and at the slmm time perrnil;s widely 
varying (though no(; l;ruly J)'cc) orderings h)l' 
thematically tam'Iced cases. 
(\]chervil,ion and analysis ~c:corcling Ix) l, his 
":ne(,lio<l will be preseili,e(l ill rnore cletiUl now. 
5 Generation 
\;Ve argue in Six,hllwrgcr (199,1) that the use 
()r a, corlipr(',hc'lisiv(; I~P rule> as lJr(,.q(',llL(;d ill 
(,he 1)re.viol:s se.ction, is itn eiPicienl, way of gOll- 
criU, ing s(:'li(,(;ll(;(:s whic.h not only a.r(; c.orrect, 
in sonw contexts bul, wllicll conlply wll, h t, heh' 
coni,('xtultl rcsla'ic.tions. This flcxihlo Ot).l;p:ll; is 
achieved I).y using l;he l;}iro.(; (,hematJc catx'gories 
Zheme, 'd~c'me mid c.o,~h'a,~ti'lse j'ocu.% which Cml 
capl;ur(', cOnll)lelncni,s a.s wdl i~s iTiodil\]c;rs i'c;-. 
alised by all piira.saJ cal,c,gories. 'l'ablo I showx 
such a (J\]" \[or (7('rlll&lt. 
The table is to l>e read \[roni ler~ 1,o right 
&Ii(\[ l'rOill Lop (,o boiiLolll, The iel, ix,rs N, A, 
I)> (~ rel)rcse, nl, I, hc; \['olar C&SOS ll~OlT/itla, Live,, a,(;- 
cuslcl, ive, daCive, a.rld genitive. PO sLands for 
p:'eposil, ional ob.iec(;, and SIT, Dill. a.nd EXP 
\[or situative, direcl, iona.l m~d expmlsive con> 
l)l(:mcrll;s. Nonl a.n(l Adj a,ro l:orllirl~tl mid a,(l- 
jectiva\[ c:onlpi('lllenl, s, M reprCselll, s I;he (livcrse 
gjroul)s of modili(,rs. The. f('a.l, llrc +/-d r(;l'(~l'S 1,o 
d(:flnil;eness, -I l-i< I,o &tiiiil&cy> ~g(7 l,o SlllJl)orl; 
vcrl) consl, rucl,ions, and I,h(' index iitliill)crs l,c) 
M indica.l,e t, il(' rc'lid, ive order o\[ mo(li\[i('rs (Mi 
lJrecodes MT, lind so oil). Tile index rnlrnl:)crs 
are lmse.d on Ilobcrg's classification (1981 ). If 
el(:mcnl;s cmmot cooccm:, (;hey i~re sep~u'itl;e(l 
by a sla,sl: (/), it,  oppo,,<,ct (;o by it,,,ow (<). 
'1'\[:(' CI" h:llpos(;s linear order Oll &l: llti- 
orclered set of itrgunic:ni, s mid modifiers. When 
(;he mmlysis of the source language fails to 
rccognisc (,\]icrn('.> i'\[iclli(? itil(i \['ocus> it defaul(, 
oi'<l(:r is gc,.li(:rate(l. Al(,ilough no C l" sc,:luenc(,. 
can produco good senter~ce.s in MI coni;exl;s (c\[. 
3 and ,1), (;Iw dehml(, ()l'cl(:l' is suiliiU>l(: in it large 
31YIOtllll; O\[ conl;exl;,q. 
71 
Np,.o,JN+a+b < (A<D/Nom/Adi)p,.o,~ < I IILMI, < N+d_~,/N_~+~ < 
< (N~o,,/N+a+~)+fo~/(A<D),,~o,~+to~,.~ < (A<D)-I-~+<~ < G,,,. .... < N_d_~ < (A<l))+,e_~ < 
< M~,.~o,,~(~_~s ) < M,it(,~,,)_.m) < mneo(.ll) <\[ Mmod(,12-.m) < 
< POp,.o,~ < (a<I))_~t_, < 17'().j.d_,.~ < PO+a_~, < P()-,~+,~ < P()-,~-~ < (~I,~o,,~ < 
< < 
< SIT/DIR/I,;XP < (Nom/Adj)_,,,.o,~ < (N/A/D/G/I'O)svc, 
Table 1: ~Thematieally-tagged' Canonical Form for German 
Before showing some example sentences 
generated by ~his CF~ we have to mention 
one particularity of German, which is that the 
verb is in second position in declarative ma- 
trix clauses (verb-second, or V2 position), and 
in final position in subordinate clauses (verb- 
final, or VI e position). Nearly any element can 
take the one position preceding the verb in V'2, 
,-ailed the. Vo~J'd~t ("p,'e-(verbal)field'). Nor- 
mally a thematic element is placed into the 
Vorfeld. According to IIoberg's (1.981) analy- 
sis of the Mannheimer l)'uden Korpus, in 63% 
of al\[ V2 sentences the nominative complement 
(sub jet6) takes this place. A convenient way of 
seeing it is that all elements fol|ow the. verb in 
V2 position according to tile CF, and that one 
(thematic) element is moved int;o the Vorfeld 
position. We suggest that if the analysis of the 
source language fails to recognise the theme of 
the sentence, the subject takes this place. 
In our model, most elements can cith(,r I)e 
thematic, rhematic, or neutral (i.e. unnmrked 
with respect to theme and theme). Sent(?nce 
variations as different as shown in the exam- 
ples 5a to 5d can be generated using tim canon- 
ical form presented above, depending on t;he 
parameterisation of the features theme, rheme 
and focus for the different constituents. The 
order of elements in 5a corresponds to the de- 
faull, order. Itowever, the same order would be 
general;ed if the personal pronoun was marked 
as being thematic, and/or if the adverb gest.ern 
was rhematic. We put the information -t-theme 
in 5a to 5e in brackets to indicate that this 
feaLm'e is not a requirement to generate I,he 
respect;ire word orders. The relaJ;ive order el' 
the adverb and the accusative NP in 51) dif- 
hn's fi'om the one in 5a, becaus(" I, he object den 
Mann is rhematic. In 5c and 5(I, 9estcr'n and 
den Mam~ arc thematic, respectively, in ad-- 
dition t;o this, the persorm.l pronoun in 5(l is 
marked as being stressed contrastiw'.ly. We 
used eapii;al letters 1,o express the obligatory 
h~cus, ll; is easy to think of more phrase order' 
combinations caused by further parameterisa- 
lions. 
5a lch(+o~,,~.) habe den Mann ge.sl,ern~(+,.h~,,~) 
gesehen. (A+a+~ -Mun ) 
I have lhe man 9eslerdaj/see~ 
51) Ich(+0~e,,~e) babe gest;e.rn2(i den Mallllq.,.he,n e gese- 
\[lell. 
1 have yeslerday Ilze mm~ seen 
5(: (',OSlx~.rl126+theme hill)(? i¢;11 deal Mallll(_brheme ) 
gesehen. 
Yesterday have 1 th( man seen 
5d Den M ann+the,,~e hal)e, ge.s t e r n :~_I_ t h ~m e. 
\[(21\[+\]o~u~ gesehen. 
The ma'll, have yesterday I seem 
Modi tiers shou l<l 1)e classified according to 
ltoberg's (71981) d4 modifier position classes, 
which partly coincide with the common seman- 
t, ic classi\[ications, and partly not. Ilobcrg's 
modifier indexes are l;he r(.'sult of the stal;istical 
veril~cai,ion of lintel s intuitive classes (1970). 
As modifiers do not alw~ys follow in l, he same 
order, ltoberg chose a classl fication which lead 
to least deviations between her cla.ssiflcation 
72 
and i;he order in the corpus used (Mannheimer 
Duden Korpus). The following sentet~ces ex- 
emplify the order of Lhe CI ~ for modifiers: 
6a~ Ich babe deshalb.,2 gestern.,a mit Wolf.v2 fin'nge.- 
sehen. 
I have therefore yester'day wilh Wolf watched-to 
6b lch habe deshalb.22 ntit ~,~701f4.2 gest.ernu~+,.h~,,,e 
fern gesehen. 
I have therefore with Wolf yesterday watched-iv 
7 \])amals2~+th~m~ bin ich l,'rauen ohnehin9 ofl,37 
iiberstiirzt~a davongelmffen. 
Then am I women anyway often o'verhastyly ran- 
away (Then, I often ran away fl'om women over- 
hastily anyway) 
l)ue to the procedure descril)ed iu this sec- 
tion, ungrammatical sentences such ~s 2c and 
2(1 c~m be. ~voided successhflly. 
6 Analysis 
~l'he generation of contextultlly embedded sen- 
tenees is based on the succoss\[lll analysis of 
l;heme ~md rheme constiLuen~s. U'he recognio 
Lion of contrasLive sl;ress is even more impor- 
l, anl,. A basic fa(:l; l, hat can be used h)r the m~- 
tomlttie recognition of these cal, egories is i;lt~L 
not only the conl, ext determines the orde.ring 
of constituents in m~ eml)edded seng(.'nc(', but; 
also ~ given se.ntence carries inforrn;tl,ion on 
Clte contexL to which it 1)elongs. When Cler- 
rna.n n~l, ive spe~tkers see (,he sentence 3;~/dl)~ 
for instance, Lhey h&ve ~t st;tong feeling a|toul; 
the context; in which it occurs. It is very liko.ly 
thai; 1;11(; NP ihr'en Mann is stressed, ll; is e.ither 
rhenutLic, or it c~trries contra.stive focus, le is 
even more restricted. 'Che personM pronoun 
ich must be contrastively stressed (I ~tzyselfam 
the person who visits him). in every conl,ext 
requiring another stress, le is ungr~mmu~tica\], 
\]I; is l,hus possible to extr~tcL inform~tlfion on 
the context of ~ given seTiLen(;e, wil, hout halving 
~ccess to the prec(;ding se.nLences. 
Analysis grammars must, allow mosl; con- 
stituent order w~ri;~t;ions, its the number of 
phrase orders theft c~m be excluded is very lim- 
ited. q'he diiDrence with generation gr~m~mm's 
is l;h~tL it is suttqcient to generate one 'goo(t' 
phr~tse order for e;~ch context, whereas in ana.1- 
ysis all possible vari~ttions h~ve to \])e ~dlowe.(t. 
For this red,son, ~he CF is of no use~ for ~mal- 
ysis. hlstea.d, mlMysis grammars should Mlow 
~dl gramm~tic,M orders ~md ide.niAfy /,hem~tic, 
rhemal;ic mld focussed I)hri~ses. 
In our :tlgovithm, the number of possi- 
l)le thenms lind rhe~mes is limited to on(: con- 
stituenl, cinch, as l, his is sufficient Co generate 
l, he. w~ria.tions in 5 to 7. Firstly, focus should 
1)e identified, a.nd ~l'{;er this theme ~n(l rhenm. 
Some pe.rmul;~l, ions are. only possible if one 
consLitucnl; is stressed conLrastively. These 
construcl;ions include l;he V-orJ'eld posit;ion of 
some i;yl)i(:idly rhern~t, ic elernerfl, s (8, 9), l,he 
right, movemenl; of (:onstil;uerlts which h~we a. 
strong t.ender~c:y I;o (,he left (of. 1('. mt(l 5(l 
altove), ~md ol.hers (SI, einberger 1.99,1). 
8 Nach li'l{.ANKreich+/ ...... ist Vahd ge.flogen. 
To l,'raT~ee is Vah/~ Jle'm (Vah.g flew to l+a'uce) 
9 l!finen INder+io,:,,., Iml, Anne geheir~lxd;. 
Au Indian has ATJne married (,'ln~e has married 
an htdian) 
In i, he nexl; step, i, he theme category is 
iderfl, ified. \]';v(ery element i~l, the I)eginrting of 
the chmse is marked i~s ~ the.me if i(, has not 
1)e.en idenLified as ~ focus in Lhe preceding sl;ep 
(J0, 11): 
10 I)mlmls+tu~,,~ le/)t,e. \[lendrix noch. 
Then lived llendri," still (llendrix was still alive 
the.u) 
I1 lch glauhe,.dal3 'l'ina+u,~,,~ ofl, koe.hl;. 
1 believe lhal "l'i,~a often cooks 
Simil~,' t,o lla.jig:ovd, el, a.l.'s (1993) sug- 
gestion for I);nglish, and I,o Mi~Lsul)a.r~ el, al.'s 
(1993) for .la.l)amese , tim h~sL (-ollsLiLuent of 
the senl, ence will l)e re('ognised its rherru~tic, its 
rllemes Lend to occur sc'ntence-fina~lly (cf. 5;~ 
and 61)). Our approach differs from tllkii~:ovA 
et a.l.'s, howe.ver, in theft we prohibit some ele.- 
ments from 1)eing rho.m~tie. In Germ~m, 1;hese 
inhere.rH, ly nou-rhemi~t,ic eleme.rM;s include per- 
sonM pronouns, as we'll as a limited set of 
too(lifters such as 'wohl in 12. Although some 
modifier groups tend to be potential rhcmes, 
m~d ot, hers do n()t, mosL modifiers muM, b(: 
coded individually in thel dictionary (Slx'An- 
I)erger, 1994). Not('. I;h~l;' if inherently non- 
r\]lem&tic elemenLs occur seml;e~n(:e-ihmlly, it, is 
Z3 
likely that either the verb in V2 position, or 
the Vorfeld element, carry heavy stress (12a vs. 
12b). 
12a Er LAS+/o~,,~ den Artikel iiber Worl.stelhmg 
(lann wohl-rheme • 
He read the article on word-order then presum- 
ably 
12t} ?? Er las den ArTlkel iiber Wortsl, elhmg (hmn 
wohl-,.heme • 
Haji(:ov~ et el. (1993) suggest that verbs 
are generally marked as rhemes, except if they 
have very general lexical meaning (su(:h as be, 
have, happen, carT'y oul, become). As our main 
concern is word order, and German verb pie{:(> 
ment is restricted by rules which do not al- 
low variation, our algorithm does not allow 
the recognition of verbs as rhemes. In 12, no 
constituent wou\[d be recognised as being the- 
matic. 
Not all languages express theme, rheme 
and focus as distinctly by word order vari- 
ation as German does. Either they rely on 
1;he context to find out which constituents 
(have to) carry stress, or they use other \]Tleans 
such as clefting, pseudo-clefting, topicatisa- 
tion, dislocat;ion, voice, impersonal construc- 
{.ions, partMes, and morphological as well as 
lexical means (Foley/Van Valin 1988). How- 
ever, even in English, which is often r(.'h;rr(,(l 
to as a, fixed wor(\[ order language, irlforma{,io,i 
on theme an{l rheme can be extracted auto- 
matically (Hajiaov£ el;. al. \]993; .qt(;i,~l)('rg{.'r 
1992a). To which (tegr{'c this information is 
conveyed in other languages, and 1\]y which 
means, must be subject to a language pair- 
specific investigation. The extraction of infor- 
mation on theme, rheme an(t focus is more im- 
portant when translating from one \[rce phrase 
order la.nguage int, o another, than when trans- 
lating into a fixed-word order language. }low- 
ever, there are independent reasons for recog- 
nislng the sentence focus, namely the. correb> 
{ion between stress on the one hand, and scope 
of negation (Payne 1985) and of degree modi- 
tiers (Steinberger 1992\[)) on the other. 
7 Ambiguity Resolution 
Findings on natural, less natural an(1 ungram- 
matical word order w~riations can also be used 
to iruprove sentence analysis with respect go 
some cases of ambiguity resolution. In the 
case of Tl3, chef' can l)e recognised as denoting 
earlier (e.her2(;), as the homonymous adverb 
(ehers, "ra~her") must not \[)e negated. Fur- 
flmrmore., some cases of unlikely PP attach- 
ment can he nearly excluded. In ld, the I}P 
expressing local;ion (vor der \]lank) is unlikely 
to be a sentence modifier, as this would result 
in (:on{restive focussing of the personal pro- 
noun ihn. This can be seen in 15, where the 
PP car\]not 1)e an ad,iunct l,o the preceding NP, 
b(,(:mlse the Nil ) is realised as a pronoun. The 
PP in 14 is thus more likely to be an adjunct to 
{,he nomi n al;i ve N P (ler M ann (TI 4 a) t h an a sen- 
ten(:e modifier (1,tb). The genera.l principle is 
that focussing constru(;t;ions a.re relatively un- 
lil(ely to occur ill written text, and therefore 
one should avoid the an~dysis involving focus 
when another analysis is possible. This is the 
case when the analysis of the PP as an adjunct 
results in a sentence without toni, restive stress. 
13a Er sollte ni('ht eherus kommen. (not earlier) 
lle should not earlier come (lie should not come 
earlier) 
1% * EP sollte nlcht ehers kommen. (rather) 
He shouhl 7~ol ralh.er come 
14 l)eshalb hat (let Mmm vor der Bank ihn geseh{.m. 
7'her{fore has the man iu-ffout-of the bank him 
sc~..u (7'her~'Jbr~' Ihe man in /toni 4' lhe bauk has 
seen him) 
Ida l)(~shal/) hal; der Mann vor der Bank ilm gesehen. 
ldl) ? I)eshalb hat der Mann vor der Bank IIIN ig- 
llorierl,. 
15 ?? Deshalb hat er vor der Bank IIIN gesehen. 
Therefore }las he iu-fi'o'al-of lhe bank him seen 
8 Conclusion 
The order of constituents in free phrase or- 
der languages is det, crmined by a set of :fac- 
tors which const, itute tendencies rather than 
clear-cut rules. The fact; thi~t most;, hut not all, 
constituent orders are possible, an(1 that some 
74 
orders are more n,~tura\[ than others poses a 
considerable problem for NI,P. 
In this paper, we presented a method t,o 
deal with these problems from the analysis and 
l;he generation point of view. Concerning anal- 
ysis, the znMn idc~ is (.hat single sentences re- 
flect the theme-rheme structure irnpos(,d l)y 
the context, so that thematic, rhcmatic and 
(contrastivcly) focussed constituents can often 
I)e recognised, in generation, wc can convey 
gills knowledge, by diN'.ring word order depend- 
ing on the context. This is achieved by using a 
c~monical form which includes l;he flea:ible cal,- 
cgories l heine, theme and conZraslive focus. 
A major a(twmtage ()vet: methods sug- 
gested in the past is that ~cceptz~bilit, y differ- 
ences between sentences can be dealt with, and 
thai: even modifier sequences, which are tra- 
ditionally left ou(; in word order descripLion, 
can be handled. Wrong const,il;uent; or(lets are 
avoided, because the order of t,h<' major part; 
of the sentence is fi×ed, and only sir@c' con- 
stituents move to the theme and theme posi- 
tions. 'Fhc difficulty arising from the unclear 
l)orderline between free and fixed phrase or- 
der, which is typical of most free phrase order 
htngua.qes, is dealt with successfully. 

Bibliography 

Conhm, Smnall Pin-Ngern and Martha Evens 
(1992). Can Comlmters \]landle Adverbs?. In: Coliuq 
Proceedings, 1192-1196, Nantes 

Engel~ Uh'ich (1970). Regeln zur Wortsl.elhmg. In: 
l"orsch'ungsberichte des lnstituls fib' deutsche Sprachc 
5, 7-148, Ma.nnhelrn 

Engelkamp~ au.(lith, Gregor Erba('h and Ilans 
Uszkoreit (1992). llandling l,inear l~recedence Con- 
straints by Unification. In: A UL Proceedings, 201-208, 
Newark 

Erbach~ Gregor (1993). Using Preference Vahms 
in Typed l,~ature Structures I;o li',×p\]olt Non-Absoh~te 
Consl,raints for Disambiguat;ion. In: llarald 'Frost 
(Ed.), 173-186 

Foley and Van Valin (1985). Information l>a(:kag- 
ing in the Clause. In: q'irnol,hy Shopen (E<I.), 282-36.'t 

IlajiSowl, Ewe, Petr Stall and liana Skomnalovai 
(1.993). Identifying Topic and Focus by an Automatic 
Procedure. EACL Proceedings, 178-182, ULrcc.ht 

Holmrg, Ursula (1981). Die Wortstellqng in der 
geschriebenen deutschen Gegenwartssprache, M iinchen 

.lacobs, ;loachi:m (1988). I'r(',bh, nte der freien WON,- 
st;elhlng im I)eul;sche.n. In: S'prache "und l'r'agmalik - 
Arbeilsberichle, 8-37, l,und 

Matsubara, Tsutomu, Itidet:oshi Nagai, Teigo 
Nakamm'a .rod Hirosato Nomura (1993). 

Stochas\[,ic Model for Focus and it, s Application I;o I)i-- 
alogue Gcnerat.ion. In: NLI'I~S Pvoceediugs , 402-405, 
Fulcuoka 

l~ayne, John IL (1985). Negation. In: 'Fi,nothy 
Shopen (Ed.), 197-242 

Sc.hihffele~ Stew,.n (1991). A Nol.c on I, he 
Tern, 'S('raml)ling'. In: Nal.ural Language and 
Lin.quislic Theory, volume. 9-.2, 365-368, l)or- 
d rechf,/Boston/London 

Sharp, Randall (1989). CA'1'2 - A Formalism for 
Mull, ilingual Machine Translal;ion. Proceedings of the 
ht lernalional Seminar on Ma chbJ e Tr'anslation, 'l'blisi, 
Georgia (USSR) 

Shol)mt , Timothy (Ed.) (1.985). Language. Ty- 
pology and Synl.acl, h: l)escrlpt, lon, Vohtme 1: (3ause 
Slrucl'lwe, Cambridge 

Steinl)erg(.w~ Ralf (1992a). l~eschreil)ung der Ad- 
verbsl, elhmg inl dmltschc,i und englischen Sal;z im llin- 
I)lick auf Masdfilmlle 0bersetzung. Eurolra.D Work- 
i,lg Papers No. 23, IAI, Saarbriicken 

St(;iId)(:l'g(:r, Ralf (19921)). l)er Skopus von (;rad 
partil(eln: Seine i)berset.zm~g trod seine lrnple, nlen 
tiertmg im Masc.hinelhm {)lmrsetzungssystenl CAT2 
l'/urolra-D Workin9 l'apers No. 24, IAI, Saarbriicken 

Stcildmrgcr, Ralf(1994). A StHdy of Word Order 
Variation in Ge.rman, wil,h Special I{eferencc to Modi- 
tier Placenmnt. Phl) Thesis, Uniw.~rsity of Manchester 

Tlmrnmlr, Maria (1989). Modalpartikcln mid ihrc 
Koirlbinat;ionen, 'l'iil)inge.n 

"l'rost;, llarahl (Ed.) (1993). Feal,m'e Formalisms 
and Linguisl.ic Aml)iguit,y, Chi(:hesl;er 

Uszkoreit;, Ilans (:1987). Word order and con- 
st,il,uent sl.rlwlurc in Gcrluatl. CS'Ll Lecture Notes No. 
8: Stanford 
