<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2195">
  <Title>A Rule-Based Approach to Prepositional Phrase Attachment Disambiguation</Title>
  <Section position="1" start_page="0" end_page="1202" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> I:n this paper, we describe a new corpus-based approach to prepositional phrase attachment disambiguation, and present results colnparing peffo&gt; mange of this algorithm with other corpus-based approaches to this problem.</Paragraph>
    <Paragraph position="1"> Introduction Prel)ositioual phrase attachment disambiguation is a difficult problem. Take, for example, the senrouge: null (l) Buy a ear \[p,o with a steering wheel\]. We would guess that the correct interpretation is that one should buy cars that come with steering wheels, and not that one should use a steering wheel as barter for purchasing a car. \]n this case, we are helped by our world knowledge about automobiles and automobile parts, and about typical methods of barter, which we can draw upon to correctly disambignate the sentence. Beyond possibly needing such rich semantic or conceptual int'ornlation, Altmann and Steedman (AS88) show that there a,re certain cases where a discourse model is needed to correctly disambiguate prepositional phrase atta.chment.</Paragraph>
    <Paragraph position="2"> However, while there are certainly cases of an&gt; biguity that seem to need some deep knowledge, either linguistic or conceptual, one might ask whag sort of performance could 1oe achieved by a system thai uses somewhat superficial knowledge au*Parts of this work done a.t the Computer and hP lbrmation Science Department, University of Pennsylvania were supported by by DARPA and AFOSR jointly under grant No. AFOSR-90-0066, and by ARO grant No. DAAL 03-89-C0031 PR\[ (first author) and by an IBM gradmtte fellowship (second author). This work was also supported at MIT by ARPA under Contract N000t4-89-J-la32= monitored through the Office of Naval resear&lt;:h (lirst a.uthor).</Paragraph>
    <Paragraph position="3"> tomatically ~xtracted from a large corpus. Recent work has shown thai; this approach holds promise (H\]~,91, HR93). hi this paper we describe a new rule-based approach to prepositional phrase attachment, disambiguation. A set of silnple rules is learned automatically to try to prediet proper attachment based on any of a number of possible contextual giles.</Paragraph>
    <Paragraph position="4"> Baseline llindle and Rooth (IIR91, 1\[17{93) describe corpus-based approach to disambiguating between prepositional phrase attachlnent to the main verb and to the object nonn phrase (such as in the example sentence above). They first point out that simple attachment strategies snch as right association (Kim73) and miuimal a.tbtchment (Fra78) do not work well i,l practice' (see (WFB90)). They then suggest using lexical preference, estimated from a large corpus of text, as a method of resolving attachment ambiguity, a technique the}' call &amp;quot;lexical association.&amp;quot; From a large corpus of pursed text, they first find all nonn phrase heads, and then record the verb (if' any) that precedes the head, and the preposition (if any) that follows it, as well as some other syntactic inforlnation about the sentence. An algorithm is then specified 1,o try to extract attachment information h'om this table of co-occurrences. I!'or instance, a table entry is cousidered a definite instance of the prepositional phrase attaching to the noun if: '\['he noun phrase occm:s in a context where no verb could license the prepositional phrase, specifically if the noun phrase is in a subjeet or other pre-verbal position.</Paragraph>
    <Paragraph position="5"> They specify seven different procedures for deciding whether a table entry is au instance of no attachment, sure noun attach, sm:e verb attach, or all ambiguous attach. Using these procedures, they are able to extract frequency information,  counting t, he numl)e,r of times a ptu:ticular verb or ncmn a.ppe~u:s with a pal:tieuh~r l~reposition. These frequen(;ies serve a.s training d~t;a for the statistical model they use to predict correct i~ttachmenL To dismnbigu;~te sentence (l), they would compute the likelihood of the preposition with giwm the verb buy, {rod eolltrast that with the likelihood of that preposition given I:he liOttll whed.</Paragraph>
    <Paragraph position="6"> ()he, problem wit;h this ,~pproa~ch is tll~tt it is limited in what rel~tionships are examined to make mi ~d;tachment decision. Simply extending t\[indle and l{,ooth's model to allow R)r relalionships such as tlml~ I)e.tweell the verb and the' object o\[' the preposition would i:esult ill too large a. parameter spa.ce, given ~my realistic quantity of traiuing data. Another prol)lem of the method, shared by ma.ny statistical approaches, is that the. model ~(:quired (Inring training is rel)reser~ted in a huge, t~d)le of probabilities, pl:ecludiug any stra.ightf'orward analysis of its workings.</Paragraph>
    <Paragraph position="7"> 'l~-ansformation-Based Error-Driven Learning Tra, nS\]bl'm~d;ion-lmsed errol:-dHven learlting is ~ sin@e learning a.lgorithm tlmt has t)eeu applied to a. number of natural la.ngm,ge prol)ie.ms, includ-Jllg l)a.t't O\[' speech tagging and syuta.cl, ic l)m:sing (1h:i92, \]h:i93a, Bri!)gb, Bri9d). Figure :1 illustrates the learning l)l:OCC'SS, l:irsL, tlll;21nlola, ted text; is l)assed through the initial-st;ate mmotatot. 'l'lw~ initial-stat, e area)tater can range in complexity from quite trivial (e.g. assigning rmtdom strll(:ttll:C) to quit, e sophistica.ted (e.g. assigning the output of a. I{nowledge-based ;/l/llot;~l, tol' that was created by hand). Ouce text has beeu passed through the iuitia.l-state almOl, at.or, it. is then (;orepared to the h'ugh,, as indicated ill a luamlally annota,teA eorl)llS , and transformations are le~u'ned that can be applied to the oul, put of the iuitial state remora, tot t;o make it, better resemble the :ruffs.</Paragraph>
    <Paragraph position="8"> So far, ouly ~ greedy search al)proach has been used: at eaeh itera.tion o\[' learning, t.he tra nsfo&gt; nl~tion is found whose application results in the greatest iml)rovenmnt; tha.t transfk)rmation is then added to the ordered trmlsforlmLtiou list and the corpus is upd~d.ed by a.pplying the. learned trans formation. (See, (I{,Mg,\[) for a detailed discussiou of this algorithm in the context of machiue, le, aru-iug issues.) Ottce 3,11 ordered list; of transform~tions is learned, new text, can be mmotated hy first aI&gt; plying the initial state ~mnotator to it and then applying each o\[' the traaM'ormations, iu order.</Paragraph>
    <Section position="1" start_page="1198" end_page="1199" type="sub_section">
      <SectionTitle>
Transformation-Based Prepositional Phrase Attachment
</SectionTitle>
      <Paragraph position="0"> We will now show how transformation-based e.rrol&gt; driwm IGmfing can be used to resolve prep(~sitiered phrase at, tachment ambiguity. The l)repositioiml phrase a.tt~Munent |ea.riter learns tra.nsfor--Ill~ttiollS \[Y=onl a C,)l:l&gt;tls O\[ 4-tuples of the \['orm (v I11 I\] 1|9), where v is ~1 w;rl), nl is the head of its objecl, llolni \]phrase, i ) is the \])l'epositioll, and 11:2 is the head of the noun phrase, governed by the prel)c, sition (for e,-:anq~le, sce/v :1~' bo:q/,l o,/p the h711/~2). 1,'or all sentences that conlbrm to this pattern in the Penn Treeb~mk W{dl St, l:eet 3ourlml corpns (MSM93), such a 4-tuplc was formed, attd each :l-tuple was paired with the at~aehnteut decision used in the Treebauk parse) '\['here were 12,766 4q;ul)les in all, which were randomly split into 12,206 trnining s**mples and 500 test samples.</Paragraph>
      <Paragraph position="1"> \[n this experiment (as in (\[II~,9\], I\]l{93)), tim attachment choice For l)repositional i)hrases was I)e-I,ween the oh.iecl~ mmn and l,he matrix verb. \[n the initial sl,~te mmotator, all prepositional phrases I \])at.terns were extra.clxxl usJ.ng tgrep, a. tree-based grep program written by Rich Pito. '\]'\]te 4-tuples were cxtract;ed autom~tk:ally, a.ud mista.kes were not. m~vn tta.lly pruned out.</Paragraph>
      <Paragraph position="2">  are attached to the object, noun. 2 This is tile attachment predicted by right association (Kim73).</Paragraph>
      <Paragraph position="3"> The allowable transforlnations are described by the following templates:  * Change the attachment location from X to Y if: - nlisW - n2 is W - visW -- p is W - nl is W1 and n2 is W2 - nl isWl andvisW2  Here &amp;quot;from X to Y&amp;quot; can be either &amp;quot;from nl to v&amp;quot; or &amp;quot;from v to nl,&amp;quot; W (W1, W2, etc.) can be any word, and the ellipsis indicates that the complete set of transformations permits matching on any combination of values for v, nl, p, and n2, with the exception of patterns that specify vahms for all four. For example, one allowable transformation would be Change the attachment location from nl to v if p is &amp;quot;until&amp;quot;.</Paragraph>
      <Paragraph position="4"> Learning proceeds as follows. First, the training set is processed according to the start state annotator, in this case attaching all prepositional phrases low (attached to nl). Then, in essence, each possible transtbrmation is scored by applying it to the corpus and cornputing the reduction (or increase) in error rate. in reality, the search is data driven, and so the vast majority of allowable transformations are not examined. The best-scoring transformation then becomes the first transformation in the learned list. It is applied to the training corpus, and learning continues on the modified corpus. This process is iterated until no rule can he found that reduces the error rate.</Paragraph>
      <Paragraph position="5"> In the experiment, a tol, al of 471 transformations were learned -- Figure 3 shows the first twenty. 3 Initial accuracy on the test set is 64.0% when prepositional phrases are always attached to the object noun. After applying the transformations, accuracy increases to 80.8%. Figure 2 shows a plot of test-set accuracy as a function of the nulnber of training instances. It is interesting to note that the accuracy curve has not yet, reached a 2If it is the case that attaching to the verb would be a better start state in some corpora, this decision could be parameterized.</Paragraph>
      <Paragraph position="6"> ZIn transformation #8, word token amount appears because it was used as the head noun for noun phrases representing percentage amounts, e.g. &amp;quot;5%.&amp;quot; The rule captures the very regular appearance in the Penn Tree-bank Wall Street Journal corpus of parses like Sales for the yea,&amp;quot; \[v'P rose \[Np5Yo\]\[pP in fiscal 1988\]\].  size (no word class information).</Paragraph>
      <Paragraph position="7"> plateau, suggesting that more training data wonld lead to further improvements.</Paragraph>
    </Section>
    <Section position="2" start_page="1199" end_page="1202" type="sub_section">
      <SectionTitle>
Adding Word Class Information
</SectionTitle>
      <Paragraph position="0"> In the above experiment, all trans\[brmations are.</Paragraph>
      <Paragraph position="1"> triggered hy words or groups of words, and it is surprising that good performance is achieved even in spite of the inevitable sparse data problems.</Paragraph>
      <Paragraph position="2"> There are a number of ways to address the sparse data problem. One of the obvious ways, mapping words to part of speech, seerns unlikely to help. h&gt; stead, semanl, ic class information is an attracLive alternative.</Paragraph>
      <Paragraph position="3"> We incorporated the idea of using semantic ino tbrmation in the lbllowing way. Using the Word~ Net noun hierarchy (Milg0), each noun in the ffa{ning and test corpus was associated with a set containing the noun itself ph.ts the name of every semantic class that noun appears in (if any). 4 The transformation template is modified so that in addition to asking if a nmm matches some word W, 4Class names corresponded to unique &amp;quot;synonynl set&amp;quot; identifiers within the WordNet noun database. A noun &amp;quot;appears in&amp;quot; a class if it falls within the hyponym (IS-A) tree below that class. In the experiments reported here we used WordNet version :l.2.</Paragraph>
      <Paragraph position="5"> preposil;ional phrase ~ttachme, n|;.</Paragraph>
      <Paragraph position="6"> it: (~an a/so ask if&amp;quot; it is a~ member of some class C. s This al)proaeh I;o data. sparseness is similar to tllat of (l{,es93b, li, l\[93), where {~ method ix proposed for using WordNet in conjunction with a corpus to ohtain class-based statisl, ie,q. ()lit' method here is ltlllC\]l simpler, however, in I;hat we a.re only using Boolean values to indieal;e whel;her ~ word can be a member of' a class, rather than esl, imating ~ filll se{, of joint probabilities involving (:lasses.</Paragraph>
      <Paragraph position="7"> Since the tr;ulsformation-based al)l/roach with classes cCm gener~dize ill a way that the approach without classes is ml~l)le to, we woldd expect f'cwer l;ransf'ormal;ions to be necessary, l!;xperimeaH, ally, this is indeed the case. In a second experiment;, l;raining a.ml testing were era:tied out on the same samples as i, the previous experiment, bul; I;his time using the ext, ende, d tra nslbrmation t(;ml)la.tes for word classes. A total of 266 transformations were learned. Applying l.hese transt'ormai.ions to the test set l'eslllted in a.n accuracy of' 81.8%.</Paragraph>
      <Paragraph position="8"> \[n figure 4 we show tile lirst 20 tra.nsform{~l, ions lem'ned using ilOllll classes. Class descriptions arc surrounded by square bracl{ets. (; 'Phe first; grans-Ibrmation st~l.cs thai. if&amp;quot; N2 is a. nomt I, hal; describes time (i.e. ix a. member of WordNet class that includes tim nouns &amp;quot;y(;ar,&amp;quot; &amp;quot;month,&amp;quot; &amp;quot;week,&amp;quot; and others), thell the preltositiomd phrase should be al;tache(\[ t,() the w;rb, since, tim(; is \]nlMl more likely Io modify a yet'It (e.g. le,vc lh(: re(cling iu an hour) thaJl a, lloun.</Paragraph>
      <Paragraph position="9"> This exlw, riment also demonstrates how rely \[~C/~l;ul:e-based lexicon or word classiflcat, ion scheme cau triviaJly be incorlJorated into the learner, by exLencling l;ransfot'nlal,iolls to allow thent to make l'efel'eAlc(? |;o it WOl:(\[ gilt\[ {lily O\[' its features. \],valuation against Other Algorithms In (lIl~91, HR93), tra.inittg is done on a superset el' sentence types ttsed ill training the transforlJ~atiolFbased learner. The transformation-based learner is I, rained on sentences containing v, n\[ and p, whereas the algorithm describe.d by llindle and I~,ooth ca.n zdso use sentences (;ontailfing only v and p, (n' only nl and i1. \[11 their lmper, they tra.in on ow~r 200,000 sen-Lettces with prel)ositions f'rotn the Associated Press (APt newswire, trod I;hey quote a.n accuracy of 7880% on AP test &amp;~ta..</Paragraph>
      <Paragraph position="10"> ~' For reasons of ~: u n- time c\[lk:icn(:y, transfonmLl, ions tmddng re\['crence 1:o tile classes of both nl a,nd n2 were IlOI; p(~l?lXiitl, tR(I.</Paragraph>
      <Paragraph position="11"> GI;or expository purposes, the u.iqm'. WordNet id('.ntilicrs luwe been replaced by words Lh~LL describe the cont, cnt of the class.</Paragraph>
      <Paragraph position="13"> In order to compare the two approaches, we reimplemen:ed the ~flgorithm fi'om (IIR.91) and tested it using the same training and test set used for the above experiments. Doing so resull;ed in an attachment accuracy of 70.4%. Next, the training set was expanded to include not only the cases o\[' ambiguous attachment \]Fonnd in the parsed Wall Street Journal corpus, as before, but also all the unambiguous prepositional phrase attachments tbnnd in the corpus, as well (contimling to exclnde the tesl, set, of course). Accuracy improved to 75.8% r using the larger training set, still significantly lower than accuracy obtained us-lag tam tl:ansformal;ion-based approach. The t.echnique described in (Res93b, 1{1t93), which combined Hindle and Rooth's lexical association technique with a WordNet-based conceptual association measure, resulted in an accuracy of 76.0%, also lower than the results obtained using transformations. null Since llindle and Rooth's approach does not make reference to n2, we re-ran the transformation-learner disalk)wing all transformations that make reference ~o n2. Doing so resulted in an accuracy of 79.2%. See figure 5 h)r a sun&gt; mary of results.</Paragraph>
      <Paragraph position="14"> It is possihle Lo compare; the results described here with a somewhat similar approach devel-.</Paragraph>
      <Paragraph position="15"> oped independently by Ratnaparkhi and I/,oukos (l{R94), since they also used training and test datt~ drawn from the Penn Treebank's Wall Street Journal corpus. Instead of' using mammlly coustructed lexical classes, they nse word classes arrived at via mutmd information clustering in a training corpus (BDd+92), resulting in a representation in which each word is represented by a sequence of bits.</Paragraph>
      <Paragraph position="16"> As in the experiments here, their statistical model also makes use of a 4-tuple context (v, c&lt;l, p, n2), and can use the identit.ies of the words, class inl'ormarion (tbr them, wdues of any of the class bits), rThe difference between these results ~nd tile result they quoted is likely due to a much bLrger training set used in their origimd experiments.</Paragraph>
      <Paragraph position="17">  or both Mnds of ild'ormation as eotll;extual featlll?eS riley {lescril)e a search process use(\[ to {letePn6\]m what, sul)set of the available ill\['or,~Htlion will Im used in the model. (\]iv{;\]\] a eh{}ice of features, they train ;t prol}abi/islie model For I)r(Sitclcoutext), and in {.esl.ing choose Site :-: v oP Site = nl a~ccordi\]lg I;o which has {he higher eomlitional probal)i\]ity.</Paragraph>
      <Paragraph position="18"> t~,atnal)~Pkhi and Roukos rel}ort an aecuraey oi' 81.6% using bot, h word and class iui'orma, tion on Wall SI;re.et 3ourna\] text,, using a t:raining COl:pus twice as la, rgc as that used in ouP experiments. They also report that a (leeision tree mode/ eonst\];u(:t~d using the same features m,d I,i;aining data ac\[lieve{I I)erformanee of 77.71~, (}n t\[:e same I.est set, A llUll ll)el' o\[' other reseaPehers have exl)lored eorlms-I)ased approaches I;o l)repositional phrase attaehmet,t disaml)iguation tM~t n\]~d{c use of word classes, l&amp;quot;or example, Weisehed{q cl al. (WAIH91) and Basili el al. (BI}V91) bol,\]l deseril)e the use oflnanually coustrueted, donmhv Sl){~eitic word classes together with cori}us-tmsed si,t~tisties in of d{2r to resolve i)rel)ositional 1)hrase a.t, taehlllellt &amp;Ill-. I}iguity. I{e(;a.llSe these papers deseril)e results ol)tained on different corpora, however, it is (lifIicull; to II~,:'tl,:.{; a. 1)(;r\['(}rllla, iic{! COl\[lD~/l:iSOll, Conclusions The. tPansl'ormation-hased approach to resolving prepositional phl:ase disanlbiguation has a mlmt)er of advaiH;ages over (}l,\]ler ;i.l)l)roatehes. \[11 a (\]irect eoml);u:ison with lexical association, higher ble(;llvaey is achieved using words alolm (wen though attachment inf\}rnlation is captured i*l a relatively small numl)er of simple, rea(lable rules, as opl)osed to a. large lllllll\])eF Of lexical co-oeetlrreltee l)l'o\])a -I)ilities. null \]u addition, we have shown how the l;raus\['orln~Lion-based learner can casity be ex.tended to incorporate word-class i/fformatiou.</Paragraph>
      <Paragraph position="19"> This resulted in a slight; increase in 1)erformanee, but, more notal)\]y it resulted in a reduct;ion hy roughly half in the l;ota\[ mnnl)er of transformation rules needed. And in (:outrast to appro~ches using class--based prol)abilistic models (BPV91, Res93e, WAI~ F91) or classes derived vi;~ statistical clusl.ering methods (1~.R94), t:his {echllique pro(hlees a, I:HIO set that (:al}l;ltr{es eolteepl:~lal geueralizal;ions couciseIy a.ml ill \]mman-rea{Ial}\]e for I n.</Paragraph>
      <Paragraph position="20"> F/\]rthel:lllOl:e, iuso\['ar as (:oHq)a, risolls e&amp;ll I)o ina(h- alllOllg separa, Le exl'}el:llllel/l;s llsilt~ Wail Street Jour\]ml training aml test data ((llRgl), reiml)l('meute(l as reI)oPted above; (l{es93e, 1t1193); (IH1.94)), the rule-based approach de..</Paragraph>
      <Paragraph position="21"> scribed here achieves better perl'orlttaucc, using ml algol:ithm tlmt is eoncel}tually quite Mml)le am/iu l)l'~l.(;tiea\] teFlttS extretuely easy to ilnplenlel~t, s A more genera\] point ix tha.t the transl'orm~d,ion-based ;~l)l}roateh is easily a(lapl,ed t;o situations in which some learning 1&amp;quot;rein a (:orpus is desiral)le, 1}ui, hand-construetc{I l}l:ior knowledge is also available. Existing knowle{lge, such as structural strategies or even a priori h;xieal l}references, (;all 1)e incorl)orated into I;he start state annotator, so theft the learning ~dgo.</Paragraph>
      <Paragraph position="22"> I:ithm begins with n,ore refiued input. And knowu exceptious {:au 1)e handh'(l transparently simply hy adding add\]: \[onal rules to tim set thai; is learned, IlSillg tile sallle representatio\]l.</Paragraph>
      <Paragraph position="23"> A disadwmtage of the al)l)roach is that it requires supervised training that is, a representative set of &amp;quot;true&amp;quot; c~ses t'FOlll which Co learn. Ilowever, this l)eeomes less of a probh'.m as atmotated eorl}ora beeolne increasingly available, and suggests the comhination o1:' supexvised and uusuper vised methods as a.u ilfl;eresth G ave\]me \['or \['urther rese;ire\] \[.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>