<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2112">
  <Title>How bad is the problem of PP-attachment? A comparison of English, German and Swedish</Title>
  <Section position="4" start_page="0" end_page="81" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"> (Hindle and Rooth, 1993) did not have access to a large treebank. Therefore they proposed an unsupervised method for resolving PP attachment ambiguities. And they evaluated their method against 880 English triples verb-noun-preposition (V-N-P) which they had extracted from randomly selected, ambiguously located PPs in a corpus. For example, the sentence &amp;quot;Timex had requested duty-free treatment for many types of watches&amp;quot; results in the V-N-P triple (request, treatment, for). These triples were manually annotated by both authors with either noun or verb attachment based on the complete sentence context. Interestingly, 586 of these triples (67%) were judged as noun attachments and only 33% as verb attachments. And (Hindle and Rooth, 1993) reported on 80% attachment accuracy, an improvement of 13% over the baseline (i.e. guessing noun attachment in all  cases).</Paragraph>
    <Paragraph position="1"> A year later (Ratnaparkhi et al., 1994) published a supervised approach to the PP attachment problem. They had extracted quadruples V-N-P-N1 (plus the accompanying attachment decision) from both an IBM computer manuals treebank (about 9000 tuples) and from the Wall Street Journal (WSJ) section of the Penn treebank (about 24,000 tuples). The latter tuple set has been reused by subsequent research, so let us focus on this one.2 (Ratnaparkhi et al., 1994) used 20,801 tuples for training and 3097 tuples for evaluation. They reported on 81.6% correct attachments.</Paragraph>
    <Paragraph position="2"> But have they solved the same problem as (Hindle and Rooth, 1993)? What was the initial bias towards noun attachment in their data? It turns out that their training set (the 20,801 tuples) contains only 52% noun attachments, while their test set (the 3097 tuples) contains 59% noun attachments.</Paragraph>
    <Paragraph position="3"> The difference in noun attachments between these two sets is striking, but (Ratnaparkhi et al., 1994) do not discuss this (and we also do not have an explanation for this). But it makes obvious that (Ratnaparkhi et al., 1994) were tackling a problem different from (Hindle and Rooth, 1993) given the fact that their baseline was at 59% guessing noun attachment (rather than 67% in the Hindle and Rooth experiments).3 Of course, the baseline is not a direct indicator of the difficulty of the disambiguation task. We may construct (artificial) cases with low baselines and a simple distribution of PP attachment tendencies. For example, we may construct the case that a language has 100 different prepositions, where 50 prepositions always introduce noun attachments, and the other 50 prepositions always require verb attachments. If we also assume that both groups occur with the same frequency, we have a 50% baseline but still a trivial disambiguation task.</Paragraph>
    <Paragraph position="4"> But in reality the baseline puts the disambiguation result into perspective. If, for instance, the baseline is 60% and the disambiguation result is 80% correct attachments, then we will claim that our disambiguation procedure is useful. Whereas  e.g. by (Collins and Brooks, 1995; Stetina and Nagao, 1997), used the Ratnaparkhi data sets and thus allowed for good comparability.</Paragraph>
    <Paragraph position="5"> if we have a baseline of 80% and the disambiguation result is 75%, then the procedure can be discarded. null So what are the baselines reported for other languages? And is it possible to use the same extraction mechanisms for V-N-P-N tuples in order to come to comparable baselines? We did an in-depth study on German PP attachment (Volk, 2001). We compiled our own treebank by annotating 3000 sentences from the weekly computer journal ComputerZeitung. We had first annotated a larger number of subsequent sentences with Part-of-Speech tags, and based on these PoS tags, we selected 3000 sentences that contained at least one full verb plus the sequence of a noun followed by a preposition. After annotating the 3000 sentences with complete syntax trees we used a Prolog program to extract V-N-P-N tuples with the accompanying attachment decisions. This lead to 4562 tuples out of which 61% were marked as noun attachments. We used the same procedure to extract tuples from the first 10,000 sentences of the NEGRA treebank. This resulted in 6064 tuples with 56% noun attachment (for a detailed overview see (Volk, 2001) p. 86). Again we observe a substantial difference in the baseline. When our student J&amp;quot;orgen Aasa worked on replicating our German experiments for Swedish, he used a Swedish treebank from the 1980s for the extraction of test data. He extracted V-N-P-N tuples from SynTag, a treebank with 5100 newspaper sentences built by (J&amp;quot;arborg, 1986). And Aasa was able to extract 2893 tuples out of which 73.8% were marked as noun attachments (Aasa, 2004) (p. 25). This was a surprisingly high figure, and we wondered whether this indicated a tendency in Swedish to avoid the PP in the ambiguous position unless it was to be attached to the noun. But again the extraction process was done with a special purpose extraction program whose correctness was hard to verify.</Paragraph>
  </Section>
class="xml-element"></Paper>