A Rule-Based Approach to Prepositional Phrase Attachment 
Disambiguation 
Eric Brill Philip Resnik* 
Spoken Language Systems Group Sun Microsystems l,aboratories, In(:. 
Laboratory for Colnputer Science, M.I.T. Chelmsford, MA 01824 ~1195 U.S.A 
(\]ambridge, Ma. 02139 U.S.A philip.resnil;(~east.sun.com 
brill~goldilocks.lcs.mit.edu 
Abstract 
I:n this paper, we describe a new corpus-based ap- 
proach to prepositional phrase attachment disam- 
biguation, and present results colnparing peffo> 
mange of this algorithm with other corpus-based 
approaches to this problem. 
Introduction 
Prel)ositioual phrase attachment disambiguation 
is a difficult problem. Take, for example, the sen- 
rouge: 
(l) Buy a ear \[p,o with a steering wheel\]. 
We would guess that the correct interpretation is 
that one should buy cars that come with steer- 
ing wheels, and not that one should use a steering 
wheel as barter for purchasing a car. \]n this case, 
we are helped by our world knowledge about auto- 
mobiles and automobile parts, and about typical 
methods of barter, which we can draw upon to cor- 
rectly disambignate the sentence. Beyond possibly 
needing such rich semantic or conceptual int'ornla- 
tion, Altmann and Steedman (AS88) show that 
there a,re certain cases where a discourse model 
is needed to correctly disambiguate prepositional 
phrase atta.chment. 
However, while there are certainly cases of an> 
biguity that seem to need some deep knowledge, 
either linguistic or conceptual, one might ask whag 
sort of performance could 1oe achieved by a sys- 
tem thai uses somewhat superficial knowledge au- 
*Parts of this work done a.t the Computer and hP 
lbrmation Science Department, University of Penn- 
sylvania were supported by by DARPA and AFOSR 
jointly under grant No. AFOSR-90-0066, and by ARO 
grant No. DAAL 03-89-C0031 PR\[ (first author) and 
by an IBM gradmtte fellowship (second author). This 
work was also supported at MIT by ARPA under Con- 
tract N000t4-89-J-la32= monitored through the Office 
of Naval resear<:h (lirst a.uthor). 
tomatically ~xtracted from a large corpus. Recent 
work has shown thai; this approach holds promise (H\]~,91, HR93). 
hi this paper we describe a new rule-based ap- 
proach to prepositional phrase attachment, disam- 
biguation. A set of silnple rules is learned au- 
tomatically to try to prediet proper attachment 
based on any of a number of possible contextual 
giles. 
Baseline 
llindle and Rooth (IIR91, 1\[17{93) describe 
corpus-based approach to disambiguating between 
prepositional phrase attachlnent to the main verb 
and to the object nonn phrase (such as in the ex- 
ample sentence above). They first point out that 
simple attachment strategies snch as right associa- 
tion (Kim73) and miuimal a.tbtchment (Fra78) do 
not work well i,l practice' (see (WFB90)). They 
then suggest using lexical preference, estimated 
from a large corpus of text, as a method of re- 
solving attachment ambiguity, a technique the}' 
call "lexical association." From a large corpus of 
pursed text, they first find all nonn phrase heads, 
and then record the verb (if' any) that precedes the 
head, and the preposition (if any) that follows it, 
as well as some other syntactic inforlnation about 
the sentence. An algorithm is then specified 1,o try 
to extract attachment information h'om this table 
of co-occurrences. I!'or instance, a table entry is 
cousidered a definite instance of the prepositional 
phrase attaching to the noun if: 
'\['he noun phrase occm:s in a context where 
no verb could license the prepositional phrase, 
specifically if the noun phrase is in a subjeet 
or other pre-verbal position. 
They specify seven different procedures for decid- 
ing whether a table entry is au instance of no 
attachment, sure noun attach, sm:e verb attach, 
or all ambiguous attach. Using these procedures, 
they are able to extract frequency information, 
1198 
counting t, he numl)e,r of times a ptu:ticular verb 
or ncmn a.ppe~u:s with a pal:tieuh~r l~reposition. 
These frequen(;ies serve a.s training d~t;a for 
the statistical model they use to predict correct 
i~ttachmenL To dismnbigu;~te sentence (l), they 
would compute the likelihood of the preposition 
with giwm the verb buy, {rod eolltrast that with 
the likelihood of that preposition given I:he liOttll 
whed. 
()he, problem wit;h this ,~pproa~ch is tll~tt it 
is limited in what rel~tionships are examined to 
make mi ~d;tachment decision. Simply extending 
t\[indle and l{,ooth's model to allow R)r relalion- 
ships such as tlml~ I)e.tweell the verb and the' ob- 
ject o\[' the preposition would i:esult ill too large 
a. parameter spa.ce, given ~my realistic quantity of 
traiuing data. Another prol)lem of the method, 
shared by ma.ny statistical approaches, is that 
the. model ~(:quired (Inring training is rel)reser~ted 
in a huge, t~d)le of probabilities, pl:ecludiug any 
stra.ightf'orward analysis of its workings. 
'l~-ansformation-Based Error-Driven 
Learning 
Tra, nS\]bl'm~d;ion-lmsed errol:-dHven learlting is ~ 
sin@e learning a.lgorithm tlmt has t)eeu applied 
to a. number of natural la.ngm,ge prol)ie.ms, includ- 
Jllg l)a.t't O\[' speech tagging and syuta.cl, ic l)m:sing 
(1h:i92, \]h:i93a, Bri!)gb, Bri9d). Figure :1 illus- 
trates the learning l)l:OCC'SS, l:irsL, tlll;21nlola, ted 
text; is l)assed through the initial-st;ate mmota- 
tot. 'l'lw~ initial-stat, e area)tater can range in com- 
plexity from quite trivial (e.g. assigning rmtdom 
strll(:ttll:C) to quit, e sophistica.ted (e.g. assigning 
the output of a. I{nowledge-based ;/l/llot;~l, tol' that 
was created by hand). Ouce text has beeu passed 
through the iuitia.l-state almOl, at.or, it. is then (;ore- 
pared to the h'ugh,, as indicated ill a luamlally an- 
nota,teA eorl)llS , and transformations are le~u'ned 
that can be applied to the oul, put of the iuitial 
state remora, tot t;o make it, better resemble the 
:ruffs. 
So far, ouly ~ greedy search al)proach has been 
used: at eaeh itera.tion o\[' learning, t.he tra nsfo> 
nl~tion is found whose application results in the 
greatest iml)rovenmnt; tha.t transfk)rmation is then 
added to the ordered trmlsforlmLtiou list and the 
corpus is upd~d.ed by a.pplying the. learned trans 
formation. (See, (I{,Mg,\[) for a detailed discussiou 
of this algorithm in the context of machiue, le, aru-- 
iug issues.) 
Ottce 3,11 ordered list; of transform~tions is 
learned, new text, can be mmotated hy first aI> 
plying the initial state ~mnotator to it and then 
applying each o\[' the traaM'ormations, iu order. 
UNANNOTATI{D \] 
"I'I~X'I' 
1NH'\[AI, l 
STATE 
ANNOTATliD TEXT TI~.I j'\['l l 
,~e,N El( ~-~ RUI ,I-S 
Figure \[: Transfonm~tion-I~ased Error.-Driven 
l,earlfiUg. 
rlh:ansformation-B ased Prepositional 
Phrase Attachment 
We will now show how transformation-based e.rrol> 
driwm IGmfing can be used to resolve prep(~si- 
tiered phrase at, tachment ambiguity. The l)reposi- 
tioiml phrase a.tt~Munent |ea.riter learns tra.nsfor-- 
Ill~ttiollS \[¥onl a C,)l:l>tls O\[ 4-tuples of the \['orm (v 
I11 I\] 1|9), where v is ~1 w;rl), nl is the head of its 
objecl, llolni \]phrase, i ) is the \])l'epositioll, and 11:2 
is the head of the noun phrase, governed by the 
prel)c, sition (for e,-:anq~le, sce/v :1~' bo:q/,l o,/p 
the h711/~2). 1,'or all sentences that conlbrm to this 
pattern in the Penn Treeb~mk W{dl St, l:eet 3ourlml 
corpns (MSM93), such a 4-tuplc was formed, attd 
each :l-tuple was paired with the at~aehnteut de- 
cision used in the Treebauk parse) '\['here were 
12,766 4q;ul)les in all, which were randomly split 
into 12,206 trnining s**mples and 500 test samples. 
\[n this e×periment (as in (\[II~,9\], I\]l{93)), tim at- 
tachment choice For l)repositional i)hrases was I)e- 
I,ween the oh.iecl~ mmn and l,he matrix verb. \[n the 
initial sl,~te mmotator, all prepositional phrases 
I \])at.terns were extra.clxxl usJ.ng tgrep, a. tree-based 
grep program written by Rich Pito. '\]'\]te 4-tuples were 
cxtract;ed autom~tk:ally, a.ud mista.kes were not. m~vn 
tta.lly pruned out. 
1199 
are attached to the object, noun. 2 This is tile at- 
tachment predicted by right association (Kim73). 
The allowable transforlnations are described 
by the following templates: 
• Change the attachment location from X to Y if: 
- nlisW 
- n2 is W 
- visW 
-- p is W 
- nl is W1 and n2 is W2 
- nl isWl andvisW2 
Here "from X to Y" can be either "from nl to v" 
or "from v to nl," W (W1, W2, etc.) can be any 
word, and the ellipsis indicates that the complete 
set of transformations permits matching on any 
combination of values for v, nl, p, and n2, with 
the exception of patterns that specify vahms for all 
four. For example, one allowable transformation 
would be 
Change the attachment location from nl to v 
if p is "until". 
Learning proceeds as follows. First, the train- 
ing set is processed according to the start state 
annotator, in this case attaching all prepositional 
phrases low (attached to nl). Then, in essence, 
each possible transtbrmation is scored by apply- 
ing it to the corpus and cornputing the reduction 
(or increase) in error rate. in reality, the search 
is data driven, and so the vast majority of al- 
lowable transformations are not examined. The 
best-scoring transformation then becomes the first 
transformation in the learned list. It is applied to 
the training corpus, and learning continues on the 
modified corpus. This process is iterated until no 
rule can he found that reduces the error rate. 
In the experiment, a tol, al of 471 transfor- 
mations were learned -- Figure 3 shows the first 
twenty. 3 Initial accuracy on the test set is 64.0% 
when prepositional phrases are always attached to 
the object noun. After applying the transforma- 
tions, accuracy increases to 80.8%. Figure 2 shows 
a plot of test-set accuracy as a function of the 
nulnber of training instances. It is interesting to 
note that the accuracy curve has not yet, reached a 
2If it is the case that attaching to the verb would 
be a better start state in some corpora, this decision 
could be parameterized. 
ZIn transformation #8, word token amount appears 
because it was used as the head noun for noun phrases 
representing percentage amounts, e.g. "5%." The rule 
captures the very regular appearance in the Penn Tree- 
bank Wall Street Journal corpus of parses like Sales for 
the yea," \[v'P rose \[Np5Yo\]\[pP in fiscal 1988\]\]. 
Accuracy 
81.00 r l 
80.00 !! 
79,00 t 
77.00 !--R ..... / --F ..... %oo!1 / 
I 
74:001 .... __t .... _ _ 
73.00 
j - 
72.00 ll i___/__. 7,®!>2 - 
70.00 
69.00 
68.00 
67.00 
64.00 
0.00 5.00 
I q 
! 
i 
t 
T!aining Size x 103 
10.00 
Figure 2: Accuracy as a function of l;raining corpus 
size (no word class information). 
plateau, suggesting that more training data wonld 
lead to further improvements. 
Adding Word Class Information 
In the above experiment, all trans\[brmations are. 
triggered hy words or groups of words, and it is 
surprising that good performance is achieved even 
in spite of the inevitable sparse data problems. 
There are a number of ways to address the sparse 
data problem. One of the obvious ways, mapping 
words to part of speech, seerns unlikely to help. h> 
stead, semanl, ic class information is an attracLive 
alternative. 
We incorporated the idea of using semantic ino 
tbrmation in the lbllowing way. Using the Word~ 
Net noun hierarchy (Milg0), each noun in the 
ffa{ning and test corpus was associated with a set 
containing the noun itself ph.ts the name of every 
semantic class that noun appears in (if any). 4 The 
transformation template is modified so that in ad- 
dition to asking if a nmm matches some word W, 
4Class names corresponded to unique "synonynl 
set" identifiers within the WordNet noun database. 
A noun "appears in" a class if it falls within the hy- 
ponym (IS-A) tree below that class. In the experiments 
reported here we used WordNet version :l.2. 
1200 
1 
2 
4 
5 
(3 
7 
8 
9 
10 
II 
12 
:13 
\]4 
15 
\[6 
17 
\[8 
119 
2()_ 
Change Att{:~ehment 
Location 
l"r~m~ To (;oMition 
N1 V P is at 
N\] \/ P is as 
N1 V I ) is iulo 
N:I \/ P is ,l}'om 
N:I V P is with 
N\] V N2 is year 
N 1 V P is by 
I? is i~ and 
N I V NI ix amounl 
N \[ \/ \]' is lhrough 
NI V \]) is d'urb~g 
NI V V ix p,ul 
N1 V N2 is mou.lk 
N\[ V 1' is ulldcr 
NJ V 1 ) is after 
V is have and 
N1 V I' is b~ 
N:\[ V P is wilk.oul 
V NI P is of 
V is buy and 
N1 \/ P is for 
N:I V P is beJbl"( 
V is have and 
NI V P is o~ 
x/ 
L( 
V 
v ~ 
V 
v / 
V 
V 
,/ 
Figure 3: The \[irst 20 transforntat;ions learned tbr 
preposil;ional phrase ~ttachme, n|;. 
it: (~an a/so ask if" it is a~ member of some class C. s 
This al)proaeh I;o data. sparseness is similar to tllat 
of (l{,es93b, li, l\[93), where {~ method ix proposed 
for using WordNet in conjunction with a corpus 
to ohtain class-based statisl, ie,q. ()lit' method here 
is ltlllC\]l simpler, however, in I;hat we a.re only us- 
ing Boolean values to indieal;e whel;her ~ word can 
be a member of' a class, rather than esl, imating ~ 
filll se{, of joint probabilities involving (:lasses. 
Since the tr;ulsformation-based al)l/roach with 
classes cCm gener~dize ill a way that the approach 
without classes is ml~l)le to, we woldd expect f'cwer 
l;ransf'ormal;ions to be necessary, l!;xperimeaH, ally, 
this is indeed the case. In a second experiment;, 
l;raining a.ml testing were era:tied out on the same 
samples as i, the previous experiment, bul; I;his 
time using the ext, ende, d tra nslbrmation t(;ml)la.tes 
for word classes. A total of 266 transformations 
were learned. Applying l.hese transt'ormai.ions to 
the test set l'eslllted in a.n accuracy of' 81.8%. 
\[n figure 4 we show tile lirst 20 tra.nsform{~l, ions 
lem'ned using ilOllll classes. Class descriptions arc 
surrounded by square bracl{ets. (; 'Phe first; grans- 
Ibrmation st~l.cs thai. if" N2 is a. nomt I, hal; describes 
time (i.e. ix a. member of WordNet class that in- 
cludes tim nouns "y(;ar," "month," "week," and 
others), thell the preltositiomd phrase should be 
al;tache(\[ t,() the w;rb, since, tim(; is \]nlMl more 
likely Io modify a yet'It (e.g. le,vc lh(: re(cling iu 
an hour) thaJl a, lloun. 
This exlw, riment also demonstrates how rely 
\[~¢~l;ul:e-based lexicon or word classiflcat, ion scheme 
cau triviaJly be incorlJorated into the learner, by 
exLencling l;ransfot'nlal,iolls to allow thent to make 
l'efel'eAlc(? |;o it WOl:(\[ gilt\[ {lily O\[' its features. 
\],valuation against Other 
Algorithms 
In (lIl~91, HR93), tra.inittg is done on a 
superset el' sentence types ttsed ill train- 
ing the transforlJ~atiolFbased learner. The 
transformation-based learner is I, rained on sen- 
tences containing v, n\[ and p, whereas the algo- 
rithm describe.d by llindle and I~,ooth ca.n zdso use 
sentences (;ontailfing only v and p, (n' only nl and 
i1. \[11 their lmper, they tra.in on ow~r 200,000 sen- 
Lettces with prel)ositions f'rotn the Associated Press 
(APt newswire, trod I;hey quote a.n accuracy of 78- 
80% on AP test &~ta.. 
~' For reasons of ~: u n- time c\[lk:icn(:y, transfonmLl, ions 
tmddng re\['crence 1:o tile classes of both nl a,nd n2 were 
IlOI; p(~l?lXiitl, tR(I. 
GI;or expository purposes, the u.iqm'. WordNet 
id('.ntilicrs luwe been replaced by words Lh~LL describe 
the cont, cnt of the class. 
1207 
(~lml~.ge \] 
Attachment, / 
Location / 
# li'rom t 'Fo \[ Condition 
1 N1 V N2 is \[time\] 
2 N1 V P is al 
3 N1 V P is as 
4 N1 V P is into 
5 N 1 V P is from 
6 N1 V 1 ) is wilh 
7 N1 V P is of 
P is in and 
NI is 
8 N 1 V \[measure, quanlily, amou~l\] 
P is by all.el 
9 N1 V N2 is \[abslraclion\] 
I 0 NI V P is lhro'ugh 
1) is in and 
N I is 
11 NI V \[group,group.in.g\] 
12 V N 1 V is be 
13 NI V V is pul 
14 NI V P is under 
P is i~ and 
N\] is 
15 N1 V \[written co.mmlu~ication\] 
16 N1 V l ) is wilhoul 
17 N1 V P is during 
18 N 1 V 
19 NI V 
20 N1. V 
l ) is on and 
Nt is \[U~.ing\] 
P is after 
V is buy and 
P is for 
Figure 4: The first 20 transformations learned 
for prepositional phrase attachment, using noun 
classes. 
of 
Method Accuracy Transforms 
t-Scores 70.4- 75.8 
'\]¥anstbrma~ions 80.8 471 
Trans\['ormati ons 
(no N2) 79.2 418 
Transformations 
(classes) 8:1.8 26(5 
Figure 5: Comparing l{esults in PP Attachment. 
In order to compare the two approaches, we 
reimplemen:ed the ~flgorithm fi'om (IIR.91) and 
tested it using the same training and test set 
used for the above experiments. Doing so re- 
sull;ed in an attachment accuracy of 70.4%. Next, 
the training set was expanded to include not only 
the cases o\[' ambiguous attachment \]Fonnd in the 
parsed Wall Street Journal corpus, as before, but 
also all the unambiguous prepositional phrase at- 
tachments tbnnd in the corpus, as well (contiml- 
ing to exclnde the tesl, set, of course). Accuracy 
improved to 75.8% r using the larger training set, 
still significantly lower than accuracy obtained us-- 
lag tam tl:ansformal;ion-based approach. The t.ech- 
nique described in (Res93b, 1{1t93), which com- 
bined Hindle and Rooth's lexical association tech- 
nique with a WordNet-based conceptual associa- 
tion measure, resulted in an accuracy of 76.0%, 
also lower than the results obtained using trans- 
formations. 
Since llindle and Rooth's approach does 
not make reference to n2, we re-ran the 
transformation-learner disalk)wing all transforma- 
tions that make reference ~o n2. Doing so resulted 
in an accuracy of 79.2%. See figure 5 h)r a sun> 
mary of results. 
It is possihle Lo compare; the results described 
here with a somewhat similar approach devel-. 
oped independently by Ratnaparkhi and I/,oukos 
(l{R94), since they also used training and test datt~ 
drawn from the Penn Treebank's Wall Street Jour- 
nal corpus. Instead of' using mammlly coustructed 
lexical classes, they nse word classes arrived at via 
mutmd information clustering in a training corpus 
(BDd+92), resulting in a representation in which 
each word is represented by a sequence of bits. 
As in the experiments here, their statistical model 
also makes use of a 4-tuple context (v, c<l, p, n2), 
and can use the identit.ies of the words, class inl'or- 
marion (tbr them, wdues of any of the class bits), 
rThe difference between these results ~nd tile result 
they quoted is likely due to a much bLrger training set 
used in their origimd experiments. 
1202 
or both Mnds of ild'ormation as eotll;extual fea- 
tlll?eS riley {lescril)e a search process use(\[ to 
{letePn6\]m what, sul)set of the available ill\['or,~Ht- 
lion will Im used in the model. (\]iv{;\]\] a eh{}ice 
of features, they train ;t prol}abi/islie model For 
I)r(Sitclcoutext), and in {.esl.ing choose Site :-: v 
oP Site = nl a~ccordi\]lg I;o which has {he higher 
eomlitional probal)i\]ity. 
t~,atnal)~Pkhi and Roukos rel}ort an aecuraey 
oi' 81.6% using bot, h word and class iui'orma, tion 
on Wall SI;re.et 3ourna\] text,, using a t:raining COl:- 
pus twice as la, rgc as that used in ouP experiments. 
They also report that a (leeision tree mode/ eon- 
st\];u(:t~d using the same features m,d I,i;aining data 
ac\[lieve{I I)erformanee of 77.71~, (}n t\[:e same I.est 
set, 
A llUll ll)el' o\[' other reseaPehers have exl)lored 
eorlms-I)ased approaches I;o l)repositional phrase 
attaehmet,t disaml)iguation tM~t n\]~d{c use of word 
classes, l"or example, Weisehed{q cl al. (WAIH91) 
and Basili el al. (BI}V91) bol,\]l deseril)e the 
use oflnanually coustrueted, donmhv Sl){~eitic word 
classes together with cori}us-tmsed si,t~tisties in of 
d{2r to resolve i)rel)ositional 1)hrase a.t, taehlllellt &Ill-. 
I}iguity. I{e(;a.llSe these papers deseril)e results ol)- 
tained on different corpora, however, it is (lifIicull; 
to II~,:'tl,:.{; a. 1)(;r\['(}rllla, iic{! COl\[lD~/l:iSOll, 
Conclusions 
The. tPansl'ormation-hased approach to resolving 
prepositional phl:ase disanlbiguation has a mlmt)er 
of advaiH;ages over (}l,\]ler ;i.l)l)roatehes. \[11 a (\]irect 
eoml);u:ison with lexical association, higher ble(;ll- 
vaey is achieved using words alolm (wen though 
attachment inf\}rnlation is captured i*l a relatively 
small numl)er of simple, rea(lable rules, as opl)osed 
to a. large lllllll\])eF Of lexical co-oeetlrreltee l)l'o\])a -- 
I)ilities. 
\]u addition, we have shown how the 
l;raus\['orln~Lion-based learner can casity be e×.- 
tended to incorporate word-class i/fformatiou. 
This resulted in a slight; increase in 1)erformanee, 
but, more notal)\]y it resulted in a reduct;ion hy 
roughly half in the l;ota\[ mnnl)er of transfor- 
mation rules needed. And in (:outrast to ap- 
pro~ches using class--based prol)abilistic models 
(BPV91, Res93e, WAI~ F91) or classes derived vi;~ 
statistical clusl.ering methods (1~.R94), t:his {ech- 
llique pro(hlees a, I:HIO set that (:al}l;ltr{es eolteepl:~lal 
geueralizal;ions couciseIy a.ml ill \]mman-rea{Ial}\]e 
for I n. 
F/\]rthel:lllOl:e, iuso\['ar as (:oHq)a, risolls e&ll I)o 
ina(h- alllOllg separa, Le exl'}el:llllel/l;s llsilt~ Wail 
Street Jour\]ml training aml test data ((llRgl), 
reiml)l('meute(l as reI)oPted above; (l{es93e, 
1t1193); (IH1.94)), the rule-based approach de.. 
scribed here achieves better perl'orlttaucc, using ml 
algol:ithm tlmt is eoncel}tually quite Mml)le am/iu 
l)l'~l.(;tiea\] teFlttS extretuely easy to ilnplenlel~t, s 
A more genera\] point 
ix tha.t the transl'orm~d,ion-based ;~l)l}roateh is eas- 
ily a(lapl,ed t;o situations in which some learning 
1"rein a (:orpus is desiral)le, 1}ui, hand-construetc{I 
l}l:ior knowledge is also available. Existing knowl- 
e{lge, such as structural strategies or even a priori 
h;xieal l}references, (;all 1)e incorl)orated into I;he 
start state annotator, so theft the learning ~dgo. 
I:ithm begins with n,ore refiued input. And knowu 
exceptious {:au 1)e handh'(l transparently simply hy 
adding add\]: \[onal rules to tim set thai; is learned, 
IlSillg tile sallle representatio\]l. 
A disadwmtage of the al)l)roach is that it re- 
quires supervised training that is, a representa- 
tive set of "true" c~ses t'FOlll which Co learn. Ilow- 
ever, this l)eeomes less of a probh'.m as atmotated 
eorl}ora beeolne increasingly available, and sug- 
gests the comhination o1:' supexvised and uusuper 
vised methods as a.u ilfl;eresth G ave\]me \['or \['urther 
rese;ire\] \[. 
References 
\[AS88\] (~. All, mann and M. Steedmau. Inter- 
action with context during hmnan sen- 
ten(u', I)ro<'essing. Co.qnitio~, ;}0:191 
238, 1988. 
\[l;DdF92\] I}eter I?. \]~Powa, Vhleell~ J. l)ella 
\]}ietr;~, l)eter V. {leSouza, 3enni\['er (L 
I,ai, and Robert I,. Mereer. (21ass. 
based n-gr+url models of natural \[aw- 
gua.ge. Compulational l, ingui.slic.% 
18(d):467 480, December 1(,)92. 
\[BPV91\] H. Basili, M. Pazienza, and P. Velardi. 
Combining NLI } ~md statistica.l teelP 
niques for lexical aequisition. \]:n Pro- 
ccedings of the AAA 1 Fall 5'ymposhtm. 
on Probabilistie Approaches to Natural 
Language, Cambridge, MassaehusetLs, 
Octobew 1!)9 I. 
\[Ih:i92\] F,. trill. A simple rule-bused part of 
speech t~Gge\]:. In t'voeecding~ of lhP 
Third UoT@'rence on Applied Natu- 
ral Lan.guagc Processing, A (..'g, Trent;o, 
ltaly, 1992. 
\[Bri93a\] I';. trill. Automatic grammar m- 
duel;ion and parsing fi:ee text: A 
t r arts \['orlnation-1}ased al~l)roaeh. I\]1 
Proceedings o,f tit{; 31sl Mceling of lhe 
Association of Compulational 1,inguis- 
tics, Columbus, Oh., 1993. 
8()ur code is being made pul}licly ~waihdAe. (?on- 
tact {.Its'. aJ\]thors \[br inl)-}l:ltlaJ, ioll Oll how to obtain it. 
1203 
\[Bri93b\] 
\[Bri94\] 
\[l!'ra78\] 
\[Hi{9 ~I\] 
\[n,~,ga\] 
\[1(im73\] 
\[Mil90\] 
\[MSM93\] 
\[Res93a\] 
\[Res93b\] 
\[l{,es93c\] 
\[ltnga\] 
E. Brill. A Corpus-Based Approach lo 
Language l, cmming. Phi) thesis, De- 
partment of' Computer and lnfbrrna- 
lion Science, University of Pennsylva- 
nia, 1993. 
E. Brill. Some advances in rule-based 
part of speech tagging. In Proceed- 
ings of lhe 7'welflh National Co,@r- 
once on Artificial \]nlclligence (AAAI- 
94) , Seattle, Wa., 1994. 
I,. Frazier. 0)~ comprehending sen- 
tences: synlaelic parsi~Lq sleategies. 
PhD thesis, University of Connecticut, 
1978. 
1). Hindle and M. f{,ooth. Structural 
ambiguity and lexicai relations. In 
Proccedin~ls of the ~f,J~l~ Annual Mee# 
in9 of lhe Associa*ion for" Computa- 
tional l,i~Lquisties, Berkeley, Ca., 1991. 
I). Iiindle and M. l{ooth. Structura.1 
ambiguity and lexical relations. Com- 
putational Li~Lq'uislies, 19(l):103 120, 
1993. 
J. Kimball. Seven principles of surface 
structure parsing in natm'al language. 
Co q~ilimh 2, 1973. 
G. Miller. Wordnet: an on-line lexi- 
cal cla.l~abase, hflernational ,lour~al of 
Lezieography, 3(4), 1990. 
M. Marcus, 
B. Santorini, and M. Marcinkiewicz. 
Building a large annotat, ed corpus of 
l!;nglish: the Petm 'Freebank. Compu- 
talional Linguistics, 19(2), 1993. 
P. l{esnik. Selection a.~d lnforma- 
lion: A Chtss-lhtsed Approach Io Lezi- 
eal t{elatio~ships. PhD thesis, Unive> 
sity of Pennsylvania, December 1993. 
(Institute for H,esearch in Cognitive 
Science report IRCS-9'3-42). 
P. l{,esnik. Semantic classes and syn- 
tactic ambiguity. In Proceedings of/he 
A ~PA Workshop on I~urna~ Language 
Technology. Morgan Kamfinan, \[993. 
P. H,esnik. Semantic classes and syn- 
tactic ambiguity. AftPA Workshop on 
Ihman Language q?echnology, Mz~rch 
1993, Princeton. 
P. Resnik and M. Hearst. Syntactic 
ambiguity and conceptual relations. In 
K. Church, editor, Pr'oceedings of lhe 
ACL Workshop on Very Large Cor- 
pora, pages 58 64, .June 1993. 
\[R, M94\] 
\[mv~q 
\[WAB + 9 :l\] 
\[WFB90\] 
L. i{amshaw and M. Marcus. Explo> 
ing the statistical derivation of trans- 
formational rule sequences for part-of- 
speech tagging. In a. Klavans and 
P. l{esnik, editors, The Balanci~g Ac*: 
Proceedings of lhe AC.L Workshop on 
Combining Symbolic and 5'tatislical 
Approaches to Language, New Mexico 
State University, July 1994. 
A. Ratnaparkhi and S. Roukos. A 
maximum entropy model \[br prepo- 
sitional phrase attachment. In Pro- 
ceedings of lhe ARPA Workshop on 
\]\[uma~ Language Technology, Plains- 
I,oro, N J, March 1994. 
IC Weischedel, D. Ayuso, R.. Bobrow, 
S. Boisen, R. lngria, and J. Palmucei. 
Partial parsing: a report of work in 
progress. In Proceedings of the l,b'urth 
DA.RPA 5'pecch and Nalural Language 
Workshop, February .199 l, 1991. 
G. Whi~temore, K. Ferrara, and 
H. Brunner. Empirical study of 
predictive powers of simple attach- 
lnent schemes for post-modifier prepo- 
sitional phrases. In .Procecdi~gs of the 
281h Annual Meeting of the Associ- 
ation for Comlmtalional Linguistics, 
1990. 
1204 
