Unscrambling English word order 
Allan Ramsay L: Helen Seville 
Centre for Computational Linguistics 
UMIST, PO Box 88, Manchester M60 1QD, England 
allan/he:teng@cc:t, umist, ac. uk 
Abstract 
We propose a treatment of 'extraposition' which 
allows items to be assimilated directly even when 
they at)pear far from their canonical positions. This 
treatnmnt supports analyses of a number of phenom- 
ena which are otherwise hard to describe. The ap- 
1)roach requires a generalisation of standard chart 
i)arsing techniques. 
1 Extraposition in English 
It is widely accei)ted that sentences such as 
1 I saw the girl wh, o your tnvther said h,e fancied. 
2 The soup was OK, but th, e main course I th, ought 
was awful. 
involvc items ('who', 'the mai'a, eoursc') being found 
far away from their normal i)ositions (as the com- 
plement of 'fancied' and the subject of 'was a@d'). 
It seems likely that; the modifiers 'in the parle' and 
"with, all my heart' in 
3 l>n the park I met Arthur. 
4 I bclievcd with, all my heart th, at sh, c loved me. 
arc also 'out of position', since you would normally 
expect VP-modi(ying PPs of this kind to appear im- 
mediately to the right of the modified VP (so that 
the canonical versions of these sentences would have 
been '1 met Arthur in the park' and 'I believed that 
site loved me with all my heart.'). There arc vari- 
ous reasons for moving things around in this way- 
moving 'who' to the left; in (1) provides an easy way 
of l)icking out the boundary of the relative clause; 
moving 'the main course' and 'in the park' in (2) 
and (3) puts them into tlmmatically/informationally 
more prominent positions; and moving the senten- 
tial complement 'that she lovcd me' to the right in 
(4) reduces the attachment ambiguity that arises in 
the alternative form. 
This is all well-known, and is treated in most 
grammatical frameworks by hallucinating an item in 
the canonical position, and then rememl)ering that 
halhlcination uI) to (;tie 1)oint at which the out-of- 
place item is encountered. Exactly how the halhlci- 
nation is remelnbered varies fron~ one framework to 
another, with Ulfification grammars generally carry- 
ing intbrmation about it on a category-valued fea- 
ture (usually called slash,). The main problem with 
this al)l)roach is that it is difficult to control the sit- 
uations in which 'traces' of this kind get proposed. 
(Johnson and Kay, 1994) suggest using ~sponsors' in 
order to license the introduction of traces, where a 
st)onsor is some item of the required kind that has 
already 1)een found, and which is hence potentially 
going to cancel with the trace. 
If your parser works ti'om left,right then this 
will work for items which have been left-shifted, but 
clearly it cmmot work for right-shifted items, since 
the sI)onsor will not have t)een found at the time 
when it is needed. Thus we cannot use a sI)onsor to 
justify hallucinatillg an S-comp ti)r 'believed' in (4), 
or for the heavy-NP-shifts in 
5 He 
6 Ite 
h, ouse. 
gave up his job. 
built on that .spot th, e most appallingly ugly 
In any casc, the notion that some item has been 
left- or right-shifted fails to account for cases of 'in- 
traposition': 
7 I bclievc Betty is a fool. 
8 Betty, I belicvc, is a fool. 
9 Betty is, I believe, a fool. 
It is at least 1)lausible that (8) and (9) are variants 
oil (7). They're made out of the same words, they 
have the same truth conditions: the only trouble is 
that part of the sentence seems to be in the wrong 
t)laee. 
This is analogous to the situation in (2) and (3), 
where items were moved to the fi'ont to make them 
more prominent. It seems as though in (;tie current 
case the words 'I believe' have been shifted into the 
middle, and 1)arcnthesised, to nmke them less promi- 
nent. We will show how to deal with this by adal)ting 
663 
13 Charles danced with Eliza, but Diana he kissed. 
mid when we say that tim PP that results from satu- 
rating 'on' modifies a VP to its left we are referring 
to cases like (12), not to 
14 On the mat the cat sat. 
(or even 'On the mat sat the cat. ', where tile expec- 
tation that the subject will appear to the left of the 
verb has also been violated.) 
It seems as though we need the standard FRCP to 
cope with the canonical cases; the weakened FRCP' 
to cope with cases where the phrase occurs in some 
unexpected position; and something else to constrain 
the unexpected positions which are actually possible. 
Tile constraints on what can be moved around 
take two tbrnls. Firstly, we have say whether some- 
thing can be nloved at all, which we do by intro- 
ducing a polar-valued feature called moved: items 
which appear away from their canonical positions 
are marked moved(rioht) or moved(left), depend- 
ing on the direction in which they have been shifted. 
Arguments and targets which aren't allowed to move 
will be marked -moved by tile item that subcate- 
gorises for them. 
Secondly, we have to specify where those items 
that can move are allowed to get to. We do this 
by using linear precedence rules (LP-rules), most of 
which place constraints on immediate local subtrees. 
Thus we can say things like 
{A, B, C} : +wh@A&-wh.@B 
-+ start@A < start@B 
to capture tile fact that if A and B are local subtrees 
of C, then if A is WH-marked and B is not then 
A must precede B (A's start must be before B's). 
Note that the signature of the rule mentions C, even 
though in this case the body does not. 
The facts about extraposition are captured by 
rules which specify the circumstances under which a 
local subtree can (or must)be +moved. Tile key con- 
straints for trees representing structures with verbal 
heads are as follow: 
{ A, B, C} : -wh@A& -aux@C&A = snbject@C 
- moved@A 
(if A is the subject of C, where A is not WH-marked 
and C is not mi auxiliary, then A may not be moved) 
{A, B, C} : movcd@A = right 
~X(X E dtrs@C 
& start@core@C < start@X) 
start@X < start(~A) 
(if A has been right-shifted, then C had better have 
some other daughter X between C's head and A. 
Tile flfll rule says that C must be heavier than X, 
where we take 'C is heavier than X' to mean that C 
covers more words than X, so that tlfis rule covers 
(4), (5) and (6)). 
There are a nmnber of other such rules, of which the 
most complex relates to 'ttmt'-clauses (denoted by 
-~cornp). The description of such clauses comes in 
two parts, one to say that -~vh phrases may not be 
extracted and one to say that +wh phrases must be 
extracted. 
(i) {A, B, C} :C C clause 
& +comp@C & +compact@C 
---} - wh@B 
(ii) {A, B, C} : C C clause 
& +cornp@C & -compact@C 
--~ + wh@B 
The first part of this says that if combining A and 
B produces a 'that'-clause C, then if C is +compact 
(so nothing has been extracted from it) then it, had 
better be -wh (in other words, nothing properly 
inside it can be +wh). If on tile other hand C is 
-compact then there must be solnething extracted 
from it, in which case the item which has been ex- 
tracted must mark it as +wh,. These rules cover the 
(un)aeceptability of 
15 I know that she loves me. 
16 ~ I know me that she loves. 
17 who I know that sh, e loves. 
18 * I know that who she loves. 
5 Intraposition 
The rules in Section 4 provide a reasonable account 
of simple extraposition (both left and right) from 
clauses. We now return to 
7 I believe Betty is a fool. 
8 Betty, I believe, is a fool. 
9 Betty is, I believe, a fool. 
Suppose we use FRCP', with no LP-rules, to analyse 
(8). We will get, mnong other things, the phrases 
and part phrases shown in Fig. 1 (tile commas are 
treated as lexical items, so that 'Betty' starts at 0, 
the first comma at 1, and 'I' at 2, and so on). 
The first couple of steps are straightforward: 'a 
fool' results fl'om combining 'a' and 'fool'. It; has no 
holes in it, its extreme start and end are the stone as 
its compact start and end. Then 'is a fool' results 
froin combining 'is' and 'a fool', and again all the 
pieces are in the right place, so the extreme start 
and end are the santo as the compact start and end 
and the phrase is +compact. 
At step 3, 'Betty' is integrated as the subject of 
'is a fool'. The result starts at 0, since that's where 
666 
l)hrase start end xstart xend 
1 a fool 6 8 6 8 
2 is n tbol 5 8 5 8 
3 Betty is a. fool 5 8 0 8 
4 believe Betty is a fool 3 4 0 8 
5 I 1)elieve Betty is a %ol 2 4 0 8 
6 ,I believe Betty is a tbol 0 4 0 8 
7 ,I believe Betty is a fool, 0 8 0 8 
Figure 1: Analysis of (8) 
(:Oral)act 
+ 
+ 
+ 
'Betty' starts, and is -compact, since it does not 
include all the intervening words. 
At 4 this -compact sentence l)ecomes the com- 
plement of 'believe'. The result is again -compact, 
since it fails to include the word q' or the two com- 
mas which at)pear 1)etween its start and end l)oints. 
The compact core is now the word 'believe', so the 
comi)act start and en(t are 3 and d. 
.At; 5, the VP 'believe Betty is a fool' combines 
with 'I' to produce 'I believe Betty is a fool'. The 
two commas then combine with this phrase, mark- 
ing it as being parenthe.tieal and, when the second 
comma is included, finally marking it as +compact. 
Similar structures would be created during the 
t)rocessing of (9), with the only difl'erence l)eing that 
'is a fool' would 1)e the first -corn, pact phrase found. 
Apart fl:om that the analysis of (9) would tm identi- 
cal to the analysis of (8). 
'i~hcre m'e two problems with this ai)l)roa(:h to sen- 
Ix'ames of this kind: (i) 1)ecause we obtain identical 
syntactic analyses of (7), (8) mid (9), then any (:Oln- 
t)ositional semantics will assign all three the same 
interpretation. This is not entirely wrong: I cannot 
fairly say a.ny of these sentences unless I do believe 
that Betty is a tbol. But it is also clearly not entirely 
right, since it misses the diflbrence in emi)hasis. We 
will not discuss this any further here. (ii) be(:mlse we 
are not applying the LP-rules, we get rather a large 
number of mmlyses. Without LP-rules, we gel; a sin- 
gle analysis of '1 believe Bctty is a fool', having con- 
structed 23 t)artial and complete, edges. 1,br 'Betty, 
I believe, is a fool' we get three analyses (including 
the correct one) having constructed 1.01 edges. Most 
of t, hese m'ise fl'om the t)resence of the commas, since 
we have to allow for the possibility that each of these 
(:ommas is either an ot)ening or closing l)racket, or a 
conjmmtion in a comma-sei)arated list of coltiunets. 
Others arise fl'om the fact that we have removed 
all the LP-rules, so that we are treating English as 
having completely Dee word order. Case marking 
still provides some constraints on what can combine 
with what, so that in the current case 'I' is the only 
possible sul)ject for 'believe' and 'Betty' is the only 
possible subject for 'is'. If we had been dealing with 
19 Betty, Fred believes, is a fool 
then we would have had six analyses froln 107 edges, 
with the new ones arising because we had assigne(1 
~Fred' as the subject of %" and 'Betty' as the sul)- 
jec~ of 'believes'. 
Clearly we need to reinstate the general LP-rule.% 
whilst allowing for the cases we are interested in. 
These cases are characterised in two ways: (i) some 
word that requires a sentence has oc(-urre(1 in ~ con- 
text where a :st)lit' sentence is available, and (ii) this 
word is adjaeeilt to a 1)arenthetical comlna. The 
statement of this rule is rather long-winded, but the 
result is to provide a single analysis of (8) from 66 
edges, and a single analysis of (9) from 70 edges. 
6 ~more X than Y' 
Most (:ases of extral)osition in English involve sen- 
tences, lint there are a numl)er of other 1)henomena 
where items seem to have 1)een shifted around. Con- 
sider for instance the following examples: 
20 Geo~ye ate more than six peaches. 
21 Harriet ate more peaches than pears. 
22 \[a'n, ate more pcaehcs than ,\]ulia.n. 
In (20), "more' than si:r' looks like a complex deter- 
miner. How many peaches did George eat? More 
than six. The easiest way to analyse this is by 
assmning theft 'more' subcategoriscs for a 'than- 
phrase'. 
In (21) and (22), however, the than-phrase seems 
to have become di@)inted. It still seems as though 
'more.' heads a complex determiner, since (22) would 
support the answer 'more than Julian' to the ques- 
tion 'How many peachcs did Ian cat?' a 
We therefore introduce lexical entries tbr 'more' 
and 'tha'n' which look roughly as follows: 
athough (21) does not seem to support ~morc than pears' 
as an mmwer to 'How many peaches did Harriet cal'?'. The 
problem seems l;o be that NP complements to 'than' are actu- 
ally elliptical (see below), and it seems to be harder to recover 
the elllpsed sentence 'More than she ate pears' than to recover 
~More than Julian ate peaches' or era, ore than Julian ate'. 
667 
sign 
/ n°m°°,' / fl J J / sy'l / 
L Jill Lfoo Iwh< JM 
subcat/\[in \[phon 'than'\]l\ II 
semantics... 
sign 
phon 'than' 
I onfoot 
syn \[ \[... 
\[foot\[wit <>\] 
subcat (X} 
semantics... 
The entry for 'more' says that it will make a specifier 
if it finds a satmated phrase headed by 'than'. The 
entry for 'than' says that it will make a phrase of 
the required type so long as it finds sonm argument 
X. We know very little about X. In (20) it is a 
number, in (21) and (22) it appears to be an NP. In 
fact, as (Puhnan, 1987) has shown, the best way to 
tlfink about these examples is by regarding them as 
elliptical fbr the sentences 
23 Harriet ate more peaches than site ate pears. 
24 1an ate more peaches than Julian ate peach, es. 
Other kinds of elliptical phrase are permitted, as in 
25 Keith ate more peaches than Lucy did. 
or even 
26 Martha ate more ripe titan unripe pcach, es. 4 
We theretbre allow arbitrary phrases as tile argu- 
ment to 'than'. All we need now are the LP-rules 
describing when arguments of 'than' should be ex- 
traposed. These simply say that if you are combin- 
ing the determiner 'more' with a 'than'-phrase, then 
if the sole daughter of the 'than'-phrase is a nmnber 
then it must not be shifted, and if it is not then it 
must be right-shifted. 
(i) {d, B, C} : A E det&phon@A = 'more' 
& cat@B = than& dtrs@B = <D> 
& (DEnnmorDeadj) 
4Note that in this case the argument of 'than' is not dis- 
placed. 
--> - moved@ B 
(ii) {A, B, C} :A E det&phon@A = 'more' 
cat¢}B = than 
& not(dtrs@B = <D> 
& (D < .:am or D < adj)) 
--~ moved@B = right 
With these LP-rules, we get approl)riate structural 
analyses for (20) (25). We do not, however, cur- 
rently have a treatment of ellipsis. We therefore 
cannot provide sensible semantic analyses of (21) 
and (22), since we cannot determine what sentences 
'peaches' and 'Julian' are elliptical for (imagine, for 
instance, trying to decide whether 'Eagles eat more 
spar~vws than crows' meant 'Eagles eat more spar- 
rvws than crows cats sparrows' or 'Eagles eat more 
sparrows than eagles eat crows'). 
If the structure of 'more peaches th, an pears' in- 
volves a displaced 'than'-phrase, then it seems very 
plausible that the stone is true for 
27 Nick wrote a more elegant program th, an Olive. 
28 Peter wrote a more elegant prvgram th, an th, at. 
This is given further supt)ort by the acceptability of 
examples like 
29 A progrum more elegant than that would be hard 
to find. 
where tile 'than'-I)hrase is adjacent to the modified 
adjective 'elegant' rather than to the noun 'program' 
which is modified by the whole phrase 'more elegant 
than that'. 
Frustratingly, it just does not seem possible to 
reuse the lexical entry above for 'more' to cope with 
these cases. In (20) (25), 'more' made a deternfiner 
when supplied with an apl)ropriate 'than'-phrase. 
For (27)-(29) it needs to make somettfing whicll will 
combine with an adjective/adverb to produce an in- 
tensified version of the original. We therefore need 
tile tbllowing entry: 
)lion ~more ~ 
subcat sign < > J| 
semantics... 
668 
This needs a 'than'-phrase to saturate it, and once it 
is saturated it will combine with an adj (adjective or 
a(tverb) to nmke a new adj. There are two questions 
to be answered: should such a complex adj attpear to 
the left; or right of its target, and should the 'than'- 
phrase be extraposed or not? 
(28) and (29) show that these questions are in- 
timately connected. If the 'than'-phrase is right- 
shifted, then the resulting modifier aptmars to the 
left of its target (28); if it is not, then the moditier 
appears to the right (29). This is exactly what is pre- 
dicte(1 by (Willimns, 1981)'s suggestion that head- 
final moditiers generally appear to the left of their 
targets ( 'a quietly sleeping man') whereas non-head- 
final ones apt)ear to the right ('a 'llt(t'lt sleeping qui- 
etly'). All we need to do is to make right-shifting of 
the 'than'-l)hrase optional, and to invoke Williams' 
rule., using the coral)act core of the modifier. Thus 
the compact modifier 'more elegant th, an thai,' fl'om 
(2{I) is not head final, since the whole thing is cora- 
l)act but the head, 'elwant' , is i1ot; the last word; the 
non-colnt)aet one 'move ch;gant ... than that' from 
(28) is head final, since this time 'elegant' is the 
last, word in the (:ompact core 'more elwant'. Hetme 
'more clegant th, an that' tbllows its targe, t a.nd 'more 
clwant ... than th, at' precedes it. No new LP-rules 
are required, and 110 challg(}s 1;(1 th(} gellel:al rllle for 
locating moditiers are required. 
7 Conclusions 
\¥e have shown how retrieving disl)laeed items di- 
rectly, rather than t)ositing a trace of some kind 
and then eancellillg it against an appriate iteln 
when one turns up, can I)rovide treatlllellts of left- 
and right-extraposition which display the advan- 
tages that (Johnson and Kay, 1994) obtain for left- 
extraposition. This approach to extraposition can 
be extended to deal with 'intraposition' and to cases 
where items have been extracted ti'om non-clausal 
items. In order to avoid overgeneration, we needed 
to introduce a set of LP-rules which are applied 
as phrases are constructed in order to ensure, that 
items have not been shifted to unacceptable posi- 
tions. The extra computation required for checking 
the LP-rules has no effect on l;he comI)lexity of the 
parsing process, since they simply add a constant 
(and t~irly small) extra set of steps each time a new 
edge is proposed. As a rough tmrformance guide, the 
gralnInar generates five analyses for 
30 Ite built on that site a more unattractive house 
than the eric which he built in Greenwich. 
on the basis of 237 edges (the different global anal- 
yses m'ise from the attachment ambiguities for the 
wn:ious modifiers), and takes 4.1 seconds to <1o so 
(compiled Sicstus on a Pentiunl 350). This sentence 
contains a right-shifted NP, which itself contains a 
'more ... than ...' construction and also a relative 
clause with a left-shifted WII-pronoun, and hence 
could be expected to cause problems for al)l)roaches 
using si)onsors , while 
8 Betty, I believe, is a fool. 
takes 0.27 seconds. Tit('. worst case comple×ity 
analysis for this kind of approach is fairly awflfl 
(O(l y) X 22(N-I)) where 1 ~ is the number of unsatu- 
rated edges in the initial chart and N is the length 
of the sentence (Ramsay, in press)). In practice the 
LP-rules provide sufficient constraints on the gener- 
ation of non-coral)act phrases for pertbrmalme to be 
generally acceptable on sentences of about twenty 
words. 

References 

M Johnson and M Kay. 1994. Parsing and empty 
nodes. Computational Linguistics, 20(2):289 300. 

R M Kal)lan. 1973. A general syntactic processor. 
In R.. Rustin, editor, Natural  proeessin.q, 
pages 193 241, New York. Algorithmics Press. 

M Kay. 1973. The MINI) system. In R. Rustin, 
editor, Naturnl Language Processing, pages 155 
188, New York. Algorithmics Press. 

C J Pollard and I A Sag. 1988. An I'nfovmation 
Based Approach to Syntax and Semantics: Vol 
1 Fundamentals. CSLI lecture notes 113, Chicago 
University Press, Chi(:ago. 

C J l?ollard and I A Sag. 1994. Head-driven Ph~nsc 
Str'uct'nre G~nmmar. (~hi(',ago University Press, 
Chicago. 

S G l'uhnan. 1987. l~vents and VP-modifiers. in 
B.G.T. Lowden, editor, Proceedings of the Alvcy 
Sponsored Workshop On Formal Semantics in 
Natural Languagc P'roecssiug, Colchester. Univer- 
sity of Essex. 

A M Ramsay. in press. Parsing with discontinuous 
phrases. Nat'mnl Language Engincering. 

I A Sag and T Wasow. 1999. Syntactic theory: a 
formal introduction. CSLI, Stratford, Ca. 

E Williams. 1981. On the notions 'lexically related' 
att(t 'head of a word'. Ling'uistic Inquiry, 12:254 
274. 

M M Wood. 1993. Categorial Grammars. Rout- 
ledge, London. 
