Prepositional Phrase Attachment Through A Hybrid 
Disambiguation Model 
Haodong Wu and Teiji Furugori 
Department of Computer Science 
University of Electro-Communications 
1-5-1, Chofugaoka,Chofu, Tokyo 182, ,JAPAN 
{wu, furugori}Ophaeton, cs. uec. ac. j p 
Abstract 
Prepositional phrase attachment is a ma- 
jor cause of stru(:tural alnbiguity in nat- 
ural language. Recent work has been 
dependent on corpus-based approaches 
to deal with this problem. However, 
corpus-based approaches suffer from the 
sparse-data problem. To cope with this 
problem, we introduce a hybrid method 
of integrating corpus-based approach 
with knowledge-based techniques, using 
a wide-variety of information that comes 
from annotated corpora and a machine- 
readable dictionary. When the occur- 
rence frequency on the corpora is low, we 
use preference rules to determine PP at- 
tachment based on clues from conceptual 
information. An experiment has proven 
that our hybrid method is both effective 
and applicable in practice. 
1 Introduction 
The resolution of prepositional phrase attachment 
ambiguity is a difficult problem in NLP. There 
have been many proposals to attack this prob- 
lem. Traditional proposMs are mainly based on 
knowledge-based techniques which heavily depend 
on empirical knowledge encoded in handcrafted 
rules and domain knowledge in knowledge base: 
they are therefore not scalable. Recent work has 
turned to corpus-based or statistical approaches 
(e.g. Hindle and Rooth 1993; Ratnaparkhi, Rey- 
nar and Roukos 1994, Brill and Resnik 1994, 
Collins and Brooks 1995). Unlike traditional pro- 
posals, corpus-based approaches need not to pre- 
pare a large amount of handcrafted rules, they 
have therefore the merit of being scalable or easy 
to transfer to new domains. However, corpus- 
based approaches shffer fi'om the notorious sparse- 
data problem: estimations based on low occur- 
renee frequencies are very unreliable and often re- 
sult in bad performances in disambiguation. To 
cope with this problem, Brill and Resnik (1994) 
use word classes from Word-Net noun hierarchy 
to (:luster words into semantic classes. Collins and 
Brooks (1995) on the other hand use morpholog- 
ical analysis t)oth on test and tr~fining data. Un- 
fortunately, all these smoothing methods are not 
efficient enough to make a significant improvement 
on perforlnancc. 
Instead of using pure statistical approaches 
stated above, wc propose a hybrid approach to at- 
tack PP attachment problem. We employ corpus- 
based likelihood analysis to choose most-likely at- 
tachment. Where the occurrence frequency is too 
low to make a reliable choice, wc turn to use con- 
ceptual infornlation froln a machine-readable dic- 
tionary to to make decision on PP attachments. 
We use this disambiguation method to buihl a dis- 
ambiguation module in PFTE system, l 
In what follows we first outline the idea of us- 
ing hybrid information to sui)ply preferences for 
resolving ambiguous PP attachment. We then 
describe how this information is used in disam- 
biguating PP attachment. We put the hybrid ap- 
proach in an disambiguation algorithm. Finally, 
we show an experiment and its result. 
2 Using Multiple Information in 
Disambiguation 
Like other work, we use fonr head words to make 
decision on PP attachment: the main verb v, the 
head noun (nl) ahead of the preposition (p), and 
the head noun (n2) of the object of the preposi- 
tion. In the later discussion, the four head words 
are referred to as a quadrul)le (v nl p n2). 
Analyzing the strategies human beings employ 
in PP attachment disambiguation, we f(mnd that 
a wide-variety of information supplies important 
clues for disambiguation. It includes presupposi- 
tions, syntactic and lexical cues, collocations, syn- 
tactic and semantic restrictions, features of head 
words, conceptual relationships, and world knowl- 
edge. We use clues that are general and reliable 
1PFTE stands for Parser for Free Text of English. 
PFTE system is a versatile parsing system in develop- 
ment which (:overs a wide range of phenomena in lexi- 
cal, syntactic, semantic dimensions. It is designed as a 
linguistic tool for at)plications in text understanding, 
database generation fi'om text and computer-based 
language learning. 
1070 
so that they make the computation efficient and 
extensible. The information or clues we use are 
the following: 
1. Syntactic or lexical cues. If nl is same as n2, 
for exalnple, often nlTPP is a fixed t)hr;~se 
su(:h as .step by step. 
2. Co-oee'wr'rences. The (;o-o(:(:llrrences of tri- 
ples and pairs in (v nl p n2) colne frmn an- 
notated eorl)ora (Se(:tion 4). 
3. Syntactic and semantic features. Features of 
v or nl n2 sometimes in(licate the "corre(:t" 
attachment. For examt)le,if v is a movement, 
p is to and n2 is a t)lace or direction, the PP 
teuds to be attached to the verb. 
4. Conceptual relationships 1)etween v and n2, 
or between nl and n2. These relationships, 
which reflect the role-expections of the pre- 
1)osition, sut)l)ly important chics for disambi- 
guation. For example, in the sentence Peter 
broke the window by a ,stone, we are sure that 
the PP by a stone is att~u'hed to broke/v by 
knowing that stone~n2 is an instrument for 
broke/v. 
V~fe use co-occurrence informatioi~ in corl)us- 
t)ased (lis;mfl)iguation and other information in 
rule-b,~sed disambiguation. Later, we will discuss 
how to ac(tuire above information and use it in 
disambiguation. 
3 Estimation based on Corpora 
In this section, we consider two kinds of PP at- 
tachment in our corlms-t)ased al)l)roaeh , nalnely, 
attachment to verb phrase (VP atta('lmmnt) and 
to nmm i)hrase (NP attachment). Here, we use 
two ammtated corpora: EDR English Corpus 2 
and Susanne Corpus a to SUpl)ly training data. 
Both of theln (-(retain tagged syntactic structure 
for each sentence in thein. That is, each PP in the 
corl)ora has 1)een attached to an unique l)hrase. 
RA(v,nl,p,n2), a score fi'om 0 to 1, ix defined 
as a value of counts of VP attachments divided 
by the total of occurrences of (v,nl,1),n2) in the 
training data. 4 
RA(v,nl,p,n2) = f(,,vl,,,,,I,v,,ce) f(v,nl ,p,n2) 
,~ f ( vPlv ,,~ L ,p,u2) - /(,,vl ..... I,v,-2)+y(,wl,,,,,1,v,,,u) (1) 
In (1), the symbol f denotes frequency of a par- 
ti('ular tuple in the training data. For exami)le, 
2FDR English Corpus, conq)iled by Japan Ehx'- 
tronic Dictionary Research Institute, Ltd, eontalllS 
160,000 sentences with annotated nmrphologie, syn- 
tactic m,d semantic information. 
aSusaxme Corpus,cOral)ileal \])y Oeoffre.y Saml~so:n , 
is an amtotated corpus coml)risiltg about 130,000 
words of written American English text. 
'lWe assulue that only two kinds of PP atta(:h- 
merits: VP or NP attachment in the training data. 
f(vl) I share,apartment,with,friend) is the numl)er 
-of~.imes the quadruple (share, apartlnent, with, 
friend) is seelt with a VP attachment. Thus, 
we could choose a attaefiment actor(ling to RA 
score= if RA>0.5 choose VP attachment, other- 
wise choose NP attachment. 
Most of quadruples in test data are not in the 
training data, however. We thus turn to collect 
triples of (vd),nl),(nl,p,n2),(v,nl,l)) and 1)airs of 
(v,t)),(nl,p),(l),n2) like Collins and Brooks (1995) 
did, and coinpute RA score by (2) and (3). 
RA(v,nl,p,n2) = 
f(vVl,,p,n2)+f(,~p\[,~l .p,n2)+f(op\[,,,ul 4') 
f(v,p,n2)T f(nl,p,n2)+ f( v,n,p) (2) 
or, 
RA(v,nl,1),n2 ) = 
f ( vvl v ,p) + f ( ,:vl,~ ~ ,P )+ f ( "ph),n 2 ) 
f(v,p)T f(nl ,p)+ f(p,n2) (3) 
To avoi(l using very low frequen(:ies, we set 
two thr('sholds for each one above. For triple- 
combimttion, the c(mdition is: 
fl, riple(v,Itl,1),n2 ) ~ 2, and 
12*RA(v,nl,p,n2)-ll * h)g(ftrip>(v,nl,p,n2)) < 0.5 
here, 
flriple(v,Ill,p,n2) = f(v,1),n2)+f(nl,t),n2)+f(v,nl,1) ) 
For 1)airs-(:oml)ination, the condition is: 
fpair(v,nl,p,n2) > 4, and 
12*RA(v,nl,p,n2)-I I * log(fp,,i,.(v,ul,p,n2)) < 0.5 
here, 
fpair(v,nl,1),n2 ) = f(v,p)+f(nl,p)+f(p,n2) 
With the first threshohl in ca,oh case, we can 
avoid using low frequency tul)les; with the second 
one in each case, we throw away the RA score 
which is close to 0.5 ~Ls tlfis wdue is rather unsta- 
bh,. 
4 Conceptual Information and 
Preference Rules 
As we use only "relial)le" data from corl)ora to 
make decision on PP atta('hlnellt based (m RA 
score, many PPs' attachlnents may be left unde- 
termined due to sparse. (l;tta. We deal these unde- 
te.rlnined PPs with a rule-based approach. Here 
we use preference rules to determine PP attach- 
ments 1)y judging features of head words and con- 
ceptual relationships among them. Tl,is informa- 
tion comes from a machine-readable dictionary 
EDR dictionary, s 
SEDII electronic dietionm'y consists of a set 
of machine-readable dictionaries which includes 
Japanese and English word dictionary, Japanese and 
English co-occurrence dictionary, concept dictionary, 
and Jal)anese < > English l)ilingual dletio- 
nary(EI)R 1993). 
1071 
4.1 Features and Concept Classes 
We cluster words (verbs or nouns) ~hi~h have 
s~une feature or syntactical function into a (:()n- 
cel)t class. For examI)le, we classify verbs into 
active and passive, and ontologicM cbusses of men- 
tal, movement, etc. Similarly, we group nouns into 
place, time, state, direction, etc. 
We extract eoncel)t (:lass from concept classifi- 
cation in EDR Concept Dictionary} ~ 
4.2 Conceptual Relationship 
Conceptual relationships between v and n2, or be- 
tween nl and n2 predict PP attaehnlent quite well 
in many eases. We use EDR concept dictionary to 
acquire the concel)tual relationship between two 
concet)ts. For examt)le, given the two concet)ts of 
open and key, the dictionary will tell us that there 
may be a implement relationship 1)etween them, 
means that key may be act its an instrument for 
the action open. 
4.3 Preference Rules 
We introduce 1)reference rules to encode syntactic 
and lexical clues, as well a~s clues from conceptual 
information to determine PP attachments. We 
divide these rules into two categories: a rule whi('tl 
(:nit be applied to most of 1)rel)ositions is cMled 
global rule; a rule tying to a particular prel)osition, 
on the other hand, is called local rule. Four global 
rules used in our disambiguatioi: module are listed 
in Table 1. 
1. lexical(passivized(v) + PP) AND 
prep ¢ 'by' - > vp_attach(PP) 
2. nl : n2 - > vi)_attaeh(nl + PP) 
3. (prep # 'of' AND prep # 'for') AND 
(time(n2) OIl date(n2)) - > Vl)_attaeh(PP) 
4. lexicM(Adjeetive + PP) - > adjp_attach(PP) 
Table 1: Global rules 
Local rules use (:oncel)tual inforlnation to deter- 
mine PP attachlnent. In Table 2, we show sample 
h)cal rules for preposition with. 
with-rules: 
iml)lement(v, :)2) - > Vl)_~tttach(Pl )) 
(a-ol)jeet(nl, n2) ()R possessor(nl, 112)) 
AND NOT(implen:ent(v, n2)) - > 
np_~tttach(PP) 
Default - > vi)_attach(PP) 
Table 2: Sample local rules 
On the left hand of each rule, a one-atonl pre(t- 
Concet)t Dictionm'y consists of al)out 400,000 con: 
cepts, where, fbr eolleet)t classification, related con- 
eepts are orgmfized in hierm'chieM ar('hitecture and a 
concept in h)wer level inherits the f~atures from its 
upper level concepts. 
icate Oil the left hand presents tt subclass of con- 
cept ill the eon(:ept hierarchy (e.g. tilne(n2)), and 
a two-atom 1)redicate describes the COlWei)t rela- 
tion between two at(nns (e.g. implennult(v,n2)). 
Since local rules emph)y the senses of head 
words (termed as concepts), we shouhl 1)roject 
each of v, ul and n2 used by rules into one or 
several coi|cepts which denote(s) "correct" word 
senses before apl)lying local rules. The process is 
described in (Wu and Furugori 1995). 
5 Disambiguation Module 
For each sentence with aml)igu<)us PP (both ill 
syntaeti(' and semantie;d level), PETE system will 
produ<'e ;t structure with unattached PP(s), and 
call the disambiguation 1nodule to resolve ambigu- 
ous PP(s). The algorithm used in the nn)(hfle is 
shown beh)w : 
\[ALGORITHM\] 
Phase 1. (disambiguation using gh)bal rules): 
Try global rules one 1)y one. If a rule succeeds, use 
it to decide the attachment, and exit. 
Phase 2. (statistiesd)ased dismnbiguation): 
RA(v,nld),n2 ) = -1 (initial value) 
ftriple(v,nl,1),lt2 ) : f(v,p,:t2)+f(nl,1),n2)Tf(v,nl,p) 
fpair(v,Ill,t),n2 ) = f(v,1))+f(nl,1))+f(p,ll2 ) 
if ftriph,(v,nl,i),n2 ) > 2, then 
ll A(vdil,1),n2) = .f(vvl~,,r,.2)+.r( ~,~,1,, 1 ,v,-~)q.f(,,v\[ ,,,~, :' a,) f( v,v,n 2)-b f(~ t ,p,~ 2).4- f( v ,.. 4>) 
if \[2*RA(v,n 1,1),n2 )- 11 *log(fi.riple( v ,n1,1),n2 ) ) <0.5 
then RA(v,nl,p,n2) =-1 
if RA(v,nl,p,n2)<0 and fpair(v,nl,l),l~2)>4,then 
I1A(v,nl,I),n2) = f(.vplv,p)+ f(vp\[,, 1,p)T f(,,plp,,,2} f(v,p) + f( n I,p)+ f(p,7~ 2) 
if \[2*RA(v,nl,p,n2)-I I *log(fpair(v,lll,p,n2))<0.5 
then RA(v,nl,i),n2) = -1 
if I{A(v,nld),n2 ) > 0, then { 
if RA(v,nl,1),n2)<0.5, then choose NP attachment 
otherwis(, choose VP attachment 
exit. } 
Phase 3. (concept-based disalnl)iguation): 
1) Project each of v, nl, n2 into its COIteel)t sets. 
2) Try the rules related to the prel)osition , if only 
one rule is applicable, use it to decide the attach- 
ment, and then exit. 
Phase 4. (attachment 1)y default): 
if f(p) > 0, then { 
if ~ < 0.5, then choose NP attachment; f(~,) 
otherwise choose VP attachment} 
otherwise choose NP attachment. 
This algorithm differs from the previous one de.- 
scribed ill (Wu and Furugori 1995) in which prefer- 
ence rules were applied 1)efol'e statistical comput- 
ing. We have changed the order for the following 
reasons: an experinlent has proven that using the 
\].072 
data of qua(lrul)les and triples, as well as tut)les 
with high occurrences i,s good enough in success 
rate (Sec Tal)lc 3). and statistic models ha,ve a 
ground m~themlttical 1)asis. 
6 Experiment and Evaluation 
We did an exl~eriment to test our lnethod. First, 
we prcl)are(l test data of 3043 ambiguous PPs in 
texts randomly taken from a (:Olnl)uter manual, a. 
graalllnlar book and Japan Time.s. 
Phase 
~lobal rules 
_lriplcs 
_Dairs 
local rides 
Total Number 
507 
564 
1093 
662 
Number Correcl 
487 
518 
931 
557 
Stlccess rate, 
96. l% 
91.8% 
85.3% 
84.1% 
others 2 l 7 151 69.6% 
Total 3043 2644 86.9% 
'Fable 3: Results of the test in PP attachment 
The results are shown ill Table 3. We success- 
fully disalnbiguated 86.9% of th(, test data. To 
reduce sl)ars(' data. 1)roblenl and deal wilh unde- 
fined wor(ls in the dictiolm.ry, we use a l)roc(!dure 
simihtr to th;tt of Collins and Brook 11995) to pro- 
(:ess head words both in training data and in test 
datm Tile 1)ro(:c(lure is shown as follows: 
• All 4-digit lmmbers itre tel)laced with 'date'. 
* All verbs are rel)l~u:ed with their stems ill low- 
or (:as(~S. 
e Nouns starting with it calfital letter are re- 
placed with 'lmme'. 
• Personal 1)ronouns in the n2 field are r(,lfiaced 
with 'perso\]t'. 
As the result, we a(:quired all ac(:urate rate of 
87.5% (TM)le 4), an improvemellt of 0.6% on the 
1)r('.violls OlI(L 
Phase Total Number Number Correct Success rote 
~global rules 507 487 96.1% 
ll~es 659 601 90.9% 
_Amirs 1134 965 84,9% 
local rules 628 527 83.9% 
others 115 81 70.4% 
Total 3043 2661 87.5% 
Table 4: Restflts with processing head words 
The result is rather good, COlnt)aral)h' to the 
l)erformance of all "averag('." hlIl\[la, ll looking at 
(v,nl,p,n2) alone (al)out 85% to 90% according to 
Hindle ~md R ooth 1993, Collins and Brooks 1995). 
We attribute this result to the hyl)rid apl)roach we 
used, in which preferences with higher rdiabilities 
are used 1)rior to other on('s in the disalnl)iguation 
l)rocess. We found that two thresholds are very 
hell)ful in iml/roving the result. If we set the first 
threshohl as 0 ~md throw away the second thresh- 
old, then l.he success rates ill tril)le-('onfl)ination 
will \])(K:(,llt(' 89.1% (-1.8%), a,nd 81.2% (-3.7%) in 
l)aJr-(:ombilmtion. Moreover, using h)('al rules to 
tackle unattached PPs by statistical model is also 
hellfful in improving the overall su('cess rat(, since 
loom rules in l)hase 3 work nmch b(,tter than de- 
fault (h,(:ision in Phase 4. 
7 Conclusion 
Pure statisticM models for disalnl)iguation tltsks 
Sll~'('l' fl'Olll sparse-data 1)robh'nL ~V(' ltot('(l that 
even when ai)plying smooth t('chniques such as se- 
nuultic sinfilarity or (:lustering, it is hard to avoid 
malting poor est;ilnat.iol~s Oil low OCCltrr(ulces ill 
corpora. ()nqine dictionaries wlfich contain rich 
semantic or concel)tual information ml W be of help 
in improving the perforlnan('e. ()ur exl)erimcnt 
shows tha, t the hybrid al)proach we taken is both 
effectiv(, and a.1)l)li('able ill practice. 
References 
Brill, E. and l{('snik, P. 1994. A rul('-l)i~sed al)- 
pro~Lch to 1)relmsitional phrlme at;t~tchnwnt dis- 
ambigua.tion. In Proc. of th, e, 15th, Coling, 
1198 1204. 
Collins, M. and Brooks, J. 1995. l)rel/ositional 
llhrase attachment through a backed-off lnodel. 
h ttl,: / /xxx.lanl.gov / c,ni,-lg/ 9506021. 
DMfigr('n, K. and McDowell, a. 1986. Using con> 
monsense knowledge /o (lisaml)iguate l/reposi - 
tionM t)hrase modifiers. In Pr'oc. of the 5th, 
AAAI, 589-593. 
Japan Ehwtronic Dictionary Research institute, 
Ltd. 1993. ED\]I electronic dictionary sl)eciti- 
cations guide. 
Jensen, K. and Binot, J. 1987. 1)isambiguating 
prepositional phrase attachments by using on- 
line dictionary definition. In Computational 
Linyuistica. 1313-4) : 251-260. 
Hindlc, D. and Rooth, M. 1993. Structural mn- 
I)iguity mM lexical rel~tions. In Computational 
Linguistics, 1911): 103-120. 
Luk, A. K. 1995. Statistical sense disand)iguation 
with relatively sInall corllora using dictionary 
delinitiol~S. In Proc. of the ,'\](h'd A CL Meeting, 
181-188. 
Whittelnore, G.; Ferrara, K.; and l~runner, H. 
1990. Empirical study of predictive powers of 
siml)le attachlnent sdtelll('s for t)ost-modiliers 
prepositionM phrases. In Proc. of th, c 28th 
A CL Meeting, 23-30. 
Wu, H., Takeshi, 1. and Furugori, T. 1995. A pre- 
ferential al)proach for disambigmtting ilrelmsi -. 
tional phrase modifiers. In Proc. of the 3th. Na- 
tural La'ng'u, age Processing I)acific Rim Sympo- 
,siUm, 745-751. 
1073 
