<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1105"> <Title>Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution</Title> <Section position="5" start_page="835" end_page="835" type="metho"> <SectionTitle> 2 Prepositional Phrase Attachment </SectionTitle> <Paragraph position="0"> A long-standing challenge for syntactic parsers is the attachment decision for prepositional phrases. In a configuration where a verb takes a noun complement that is followed by a PP, the problem arises of whether the PP attaches to the noun or to the verb.</Paragraph> <Paragraph position="1"> Consider the following contrastive pair of sentences: (1) Peter spent millions of dollars. (noun) (2) Peter spent time with his family. (verb) In the first example, the PP of dollars attaches to the noun millions, while in the second the PP with his family attaches to the verb spent.</Paragraph> <Paragraph position="2"> Past work on PP-attachment has often cast these associations as the quadruple (v,n1,p,n2), where v is the verb, n1 is the head of the direct object, p is the preposition (the head of the PP) and n2 is the head of the NP inside the PP. For example, the quadruple for (2) is (spent, time, with, family).</Paragraph>
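To make the representation concrete, here is a minimal Python sketch (ours, not the paper's) encoding the two contrastive sentences above as labeled quadruples; the class and field names are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class PPExample:
    """One PP-attachment instance, cast as the quadruple (v, n1, p, n2)."""
    v: str           # the verb
    n1: str          # head of the direct object
    p: str           # the preposition (head of the PP)
    n2: str          # head of the NP inside the PP
    attachment: str  # gold label: "noun" or "verb"

# Sentences (1) and (2) from above.
examples = [
    PPExample("spent", "millions", "of", "dollars", "noun"),
    PPExample("spent", "time", "with", "family", "verb"),
]
```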
</Section> <Section position="6" start_page="835" end_page="838" type="metho"> <SectionTitle> 2.1 Related Work </SectionTitle> <Paragraph position="0"> Early work on PP-attachment ambiguity resolution relied on syntactic (e.g., &quot;minimal attachment&quot; and &quot;right association&quot;) and pragmatic considerations. Most recent work can be divided into supervised and unsupervised approaches. Supervised approaches tend to make use of semantic classes or thesauri in order to deal with data sparseness problems. Brill and Resnik (1994) used the supervised transformation-based learning method and lexical and conceptual classes derived from WordNet, achieving 82% precision on 500 randomly selected examples. Ratnaparkhi et al. (1994) created a benchmark dataset of 27,937 quadruples (v,n1,p,n2), extracted from the Wall Street Journal. They found human performance on this task to be 88%. Using this dataset, they trained a maximum entropy model and a binary hierarchy of word classes derived by mutual information, achieving 81.6% precision. Collins and Brooks (1995) used a supervised back-off model to achieve 84.5% precision on the Ratnaparkhi test set. Stetina and Nagao (1997) used a supervised method with a decision tree and WordNet classes to achieve 88.1% precision on the same test set. Toutanova et al. (2004) used a supervised method that makes use of morphological and syntactic analysis and WordNet synsets, yielding 87.5% accuracy.</Paragraph> <Paragraph position="1"> In the unsupervised approaches, the attachment decision depends largely on co-occurrence statistics drawn from text collections. The pioneering work in this area was that of Hindle and Rooth (1993).</Paragraph> <Paragraph position="2"> Using a partially parsed corpus, they calculated and compared lexical associations over subsets of the tuple (v,n1,p), ignoring n2, and achieved 80% precision at 80% recall.</Paragraph> <Paragraph position="3"> More recently, Ratnaparkhi (1998) developed an unsupervised method that collects statistics from text annotated with part-of-speech tags and morphological base forms. An extraction heuristic is used to identify unambiguous attachment decisions; for example, the algorithm can assume a noun attachment if there is no verb within k words to the left of the preposition in a given sentence, among other conditions. This extraction heuristic uncovered 910K unique tuples of the form (v,p,n2) and (n,p,n2), although the results are very noisy, suggesting the correct attachment only about 69% of the time. The tuples are used as training data for classifiers, the best of which achieves 81.9% precision. Using simple combinations of web-based n-grams, Lapata and Keller (2005) achieve lower results, in the low 70's.</Paragraph> <Paragraph position="4"> Using a different collection consisting of German PP-attachment decisions, Volk (2000) used the web to obtain n-gram counts. He compared Pr(p|n1) to Pr(p|v), where Pr(p|x) = #(x,p)/#(x) and x can be n1 or v. The bigram frequencies #(x,p) were obtained using the Altavista NEAR operator.</Paragraph> <Paragraph position="5"> The method was able to make a decision on 58% of the examples with a precision of 75% (baseline 63%). Volk (2001) then improved on these results by comparing Pr(p,n2|n1) to Pr(p,n2|v). Using inflected forms, he achieved P=75% and R=85%.</Paragraph> <Paragraph position="6"> Calvo and Gelbukh (2003) experimented with a variation of this, using exact phrases instead of the NEAR operator. For example, to disambiguate Veo al gato con un telescopio ('I see the cat with a telescope'), they compared frequencies for phrases such as &quot;ver con telescopio&quot; ('see with telescope') and &quot;gato con telescopio&quot; ('cat with telescope'). They tested this idea on 181 randomly chosen Spanish disambiguation examples, achieving 89.5% recall with a precision of 91.97%.</Paragraph> <Section position="1" start_page="836" end_page="837" type="sub_section"> <SectionTitle> 2.2 Models and Features </SectionTitle> <Paragraph position="0"> We computed two co-occurrence models: (i) Pr(p|n1) vs. Pr(p|v); and (ii) Pr(p,n2|n1) vs. Pr(p,n2|v). Each of these was computed in two different ways: using Pr (probabilities) and # (frequencies). We estimate the n-gram counts using exact-phrase queries (with inflections, derived from WordNet 2.0) against the MSN Search engine. We also allow for determiners, where appropriate, e.g., between the preposition and the noun when querying for #(p,n2), and we add up the frequencies for all possible variations. Web frequencies were reliable enough not to need smoothing for (i), but for (ii), smoothing using the technique described in Hindle and Rooth (1993) led to better recall. We also tried backing off from (ii) to (i), as well as back-off plus smoothing, but found no improvements over smoothing alone. We found n-gram counts to be unreliable when pronouns rather than nouns appear in the test set, and disabled them in those cases; such examples can still be handled by paraphrases or surface features (see below).</Paragraph>
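As a rough illustration of how such a model can be realized, the following sketch shows the frequency variant of model (ii). It assumes a hypothetical hit_count function standing in for the search-engine API, and stubs out the inflection/determiner expansion; it is not the authors' implementation:

```python
def hit_count(phrase: str) -> int:
    """Hypothetical stand-in for an exact-phrase web query (e.g., via MSN Search)."""
    raise NotImplementedError

def variations(phrase: str):
    """All surface variations of a phrase. Stubbed here: the real system
    derives inflections from WordNet 2.0 and optionally adds determiners."""
    yield phrase

def count(phrase: str) -> int:
    # Sum the frequencies of all surface variations, as described above.
    return sum(hit_count(v) for v in variations(phrase))

def attach_freq_ii(v: str, n1: str, p: str, n2: str):
    """Frequency variant of model (ii): compare #(n1,p,n2) with #(v,p,n2)."""
    noun_score = count(f"{n1} {p} {n2}")  # e.g., "time with family"
    verb_score = count(f"{v} {p} {n2}")   # e.g., "spent with family"
    if noun_score == verb_score:
        return None  # undecided; left to smoothing, paraphrases, or surface features
    return "noun" if noun_score > verb_score else "verb"
```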
<Paragraph position="1"> Authors sometimes (consciously or not) disambiguate the words they write by using surface-level markers to suggest the correct meaning. We have found that exploiting these markers, when they occur, can prove very helpful for making disambiguation decisions, and the enormous size of web search engine indexes makes it possible to find such markers frequently enough for them to be useful. For example, John opened the door with a key is a difficult verb-attachment example because doors, keys, and opening are all semantically related. To determine whether this should be a verb or a noun attachment, we search for cues that indicate which of these terms tend to associate most closely. If we see parentheses used as follows: &quot;open the door (with a key)&quot;, this suggests a verb attachment, since the parentheses signal that &quot;with a key&quot; acts as its own unit. Similarly, hyphens, colons, capitalization, and other punctuation can help signal disambiguation decisions. For Jean ate spaghetti with sauce, if we see &quot;eat: spaghetti with sauce&quot;, this suggests a noun attachment.</Paragraph> <Paragraph position="2"> Table 1 illustrates a wide variety of surface features, along with the attachment decisions they are assumed to suggest (events of frequency 1 have been ignored). The surface features for PP-attachment have low recall: for most examples, no surface features can be extracted.</Paragraph> <Paragraph position="3"> We gather the needed statistics by issuing queries to web search engines. Unfortunately, search engines usually ignore punctuation characters, which prevents querying directly for terms containing hyphens, brackets, etc. We therefore collect these numbers indirectly by issuing exact-phrase queries and then post-processing the top 1,000 resulting summaries, looking for the surface features of interest. (We often obtain more than 1,000 summaries per example because we usually issue multiple queries per surface pattern, varying inflections and the inclusion of determiners.) We use Google for both the surface feature and paraphrase extractions (described below).</Paragraph>
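The following sketch shows the shape of this snippet post-processing for two of the marker positions (punctuation after v, which keeps &quot;n1 p n2&quot; together and so suggests noun attachment, and punctuation after n1, which splits the PP from n1 and so suggests verb attachment). The fetch_snippets function is a hypothetical stand-in for the search-engine call; the real system checks many more markers and discards events of frequency 1:

```python
import re

def fetch_snippets(query: str) -> list[str]:
    """Hypothetical stand-in: top ~1,000 result summaries for an exact-phrase query."""
    raise NotImplementedError

MARKERS = r"[-/,:;.?!()]"  # hyphen, slash, comma, colon, etc.

def surface_vote(v: str, n1: str, p: str, n2: str):
    noun_hits = verb_hits = 0
    v_, n1_, p_ = map(re.escape, (v, n1, p))
    for snippet in fetch_snippets(f'"{v} {n1} {p} {n2}"'):
        s = snippet.lower()
        # e.g., "open: door with a key" -> noun attachment
        if re.search(rf"{v_}\s*{MARKERS}\s*(?:the\s+)?{n1_}\s+{p_}\b", s):
            noun_hits += 1
        # e.g., "open door - with a key" -> verb attachment
        if re.search(rf"{v_}\s+(?:the\s+)?{n1_}\s*{MARKERS}\s*{p_}\b", s):
            verb_hits += 1
    if noun_hits == verb_hits:
        return None  # no decision from surface features
    return "noun" if noun_hits > verb_hits else "verb"
```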
<Paragraph position="4"> The second way we extend the use of web counts is by paraphrasing the relation of interest and seeing if it can be found in its alternative form, which suggests the correct attachment decision. We use the following patterns, along with their associated attachment predictions:
(1) v n2 n1 (noun)
(2) v p n2 n1 (verb)
(3) p n2 * v n1 (verb)
(4) n1 p n2 v (noun)
(5) v pronoun p n2 (verb)
(6) be n1 p n2 (noun)
The idea behind Pattern (1) is to determine whether &quot;n1 p n2&quot; can be expressed as a noun compound; if this happens sufficiently often, we can predict a noun attachment. For example, meet/v demands/n1 from/p customers/n2 becomes meet/v the customers/n2 demands/n1.</Paragraph> <Paragraph position="5"> Note that the pattern could wrongly target ditransitive verbs: e.g., it could turn gave/v an apple/n1 to/p him/n2 into gave/v him/n2 an apple/n1. To prevent this, we do not allow a determiner before n1, but we do require one before n2. In addition, we disallow the pattern if the preposition is to, and we require both n1 and n2 to be nouns (as opposed to numbers, percents, pronouns, determiners, etc.).</Paragraph> <Paragraph position="6"> Pattern (2) predicts a verb attachment. It presupposes that &quot;p n2&quot; is an indirect object of the verb v and tries to switch it with the direct object n1; e.g., had/v a program/n1 in/p place/n2 would be transformed into had/v in/p place/n2 a program/n1. We require n1 to be preceded by a determiner (to prevent &quot;n2 n1&quot; from forming a noun compound).</Paragraph> <Paragraph position="7"> Pattern (3) looks for constructions in which the PP has moved in front of the verb, e.g., to/p him/n2 I gave/v an apple/n1. The symbol * indicates a wildcard position where we allow up to three intervening words. Pattern (4) looks for constructions in which the PP has moved in front of the verb together with n1. It would transform shaken/v confidence/n1 in/p markets/n2 into confidence/n1 in/p markets/n2 shaken/v.</Paragraph> <Paragraph position="8"> Pattern (5) is motivated by the observation that if n1 is a pronoun, this suggests a verb attachment (Hindle and Rooth, 1993). (A separate feature checks whether n1 is a pronoun.) The pattern substitutes n1 with a dative pronoun (we allow him and her); e.g., it will convert put/v a client/n1 at/p odds/n2 into put/v him at/p odds/n2.</Paragraph> <Paragraph position="9"> Pattern (6) is motivated by the observation that the verb to be is typically used with a noun attachment. (A separate feature checks whether v is a form of the verb to be.) The pattern substitutes v with is and are; e.g., it will turn eat/v spaghetti/n1 with/p sauce/n2 into is spaghetti/n1 with/p sauce/n2.</Paragraph> <Paragraph position="10"> These patterns all allow for determiners where appropriate, unless explicitly stated otherwise. For a given example, a prediction is made if at least one instance of the pattern has been found.</Paragraph>
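To show how such paraphrase probes can be instantiated, here is a sketch for patterns (1), (2), (5), and (6); patterns (3) and (4) need the wildcard machinery omitted here. It reuses the hypothetical hit_count stand-in from the earlier sketch and simplifies the pattern restrictions (no part-of-speech checks, a fixed determiner, no inflections):

```python
def paraphrase_probes(v: str, n1: str, p: str, n2: str):
    """Exact-phrase probes for paraphrase patterns (1), (2), (5), and (6),
    paired with the attachment each predicts. Illustrative only."""
    if p != "to":  # pattern (1) is disallowed when the preposition is "to"
        yield f"{v} the {n2} {n1}", "noun"  # (1) v n2 n1: determiner before n2 only
    yield f"{v} {p} {n2} the {n1}", "verb"  # (2) v p n2 n1: determiner before n1
    yield f"{v} him {p} {n2}", "verb"       # (5) v pronoun p n2: dative pronoun for n1
    yield f"is {n1} {p} {n2}", "noun"       # (6) be n1 p n2: v replaced by a form of "be"

def paraphrase_predict(v, n1, p, n2, hit_count):
    """Fire a prediction if at least one probe is attested on the web.
    (The full method scores noun- and verb-pattern matches separately.)"""
    for query, prediction in paraphrase_probes(v, n1, p, n2):
        if hit_count(f'"{query}"') > 0:
            return prediction
    return None

# e.g., paraphrase_probes("had", "program", "in", "place") includes
# "had in place the program" -> verb, mirroring the example above.
```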
</Section> <Section position="2" start_page="837" end_page="838" type="sub_section"> <SectionTitle> 2.3 Evaluation </SectionTitle> <Paragraph position="0"> For the evaluation, we used the test part (3,097 examples) of the benchmark dataset by Ratnaparkhi et al. (1994). We used all 3,097 test examples in order to make our results directly comparable.</Paragraph> <Paragraph position="1"> Unfortunately, there are numerous errors in the test set. There are 149 examples in which a bare determiner is labeled as n1 or n2 rather than the actual head noun. Supervised algorithms can compensate for this problem by learning from the training set that &quot;the&quot; can act as a noun in this collection, but unsupervised algorithms cannot.</Paragraph> <Paragraph position="2"> In addition, there are around 230 examples in which the nouns contain special symbols such as %, slash, &, and ', which are lost when querying against a search engine. This poses a problem for our algorithm but is not a problem with the test set itself. The results are shown in Table 2. Following Ratnaparkhi (1998), we predict a noun attachment if the preposition is of (a very reliable heuristic). The table shows the performance for each feature in isolation (excluding examples whose preposition is of). The surface features are represented by a single score in Table 2: for a given example, we sum up separately the number of noun- and verb-attachment pattern matches, and assign the attachment with the larger number of matches.</Paragraph> <Paragraph position="3"> We combine the bold rows of Table 2 in a majority vote (assigning noun attachment to all instances whose preposition is of), obtaining P=85.01%, R=91.77%. To get 100% recall, we assign all undecided cases to verb attachment (since the majority of the remaining non-of instances attach to the verb), yielding P=83.63%, R=100%. We show 0.95-level confidence intervals for the precision, computed by a general method based on constant chi-square boundaries (Fleiss, 1981).</Paragraph> <Paragraph position="4"> A test for statistical significance reveals that our results are as strong as those of the leading unsupervised approach on this collection (Pantel and Lin, 2000). Unlike that work, we do not require a collocation database, a thesaurus, a dependency parser, or a large domain-dependent text corpus, which makes our approach easier to implement and to extend to other languages.</Paragraph>

Table 1: Surface features, illustrated on the example open the door with a key, with the attachment each feature predicts. Precision and recall shown are across all examples, not just the door example shown.

Example                   Predicts   P (%)    R (%)
open Door with a key      noun      100.00    0.13
(open) door with a key    noun       66.67    0.28
open (door with a key)    noun       71.43    0.97
open - door with a key    noun       69.70    1.52
open / door with a key    noun       60.00    0.46
open, door with a key     noun       65.77    5.11
open: door with a key     noun       64.71    1.57
open; door with a key     noun       60.00    0.23
open. door with a key     noun       64.13    4.24
open? door with a key     noun       83.33    0.55
open! door with a key     noun       66.67    0.14
open door With a Key      verb        0.00    0.00
(open door) with a key    verb       50.00    0.09
open door (with a key)    verb       73.58    2.44
open door - with a key    verb       68.18    2.03
open door / with a key    verb      100.00    0.14
open door, with a key     verb       58.44    7.09
open door: with a key     verb       70.59    0.78
open door; with a key     verb       75.00    0.18
open door. with a key     verb       60.77    5.99
open door! with a key     verb      100.00    0.18

</Section> </Section> </Paper>