An Unsupervised Model for Statistically Determining Coordinate Phrase Attachment

3 Training Data Extraction

A statistical learning model must train from unambiguous data. In annotated corpora, ambiguous data are made unambiguous through classifications made by human annotators. In unannotated corpora, the data themselves must be unambiguous. Therefore, while this model disambiguates CPs of the form (n1 p n2 cc n3), it trains from implicitly unambiguous CPs of the form (n cc n). For example:

    dog and cat

Because there are only two nouns in the unambiguous CP, we must redefine its components. The first noun will be referred to as n1; it is analogous to n1 and n2 in the ambiguous CP. The second, terminal noun will be referred to as n3; it is analogous to the third noun in the ambiguous CP. Hence n1 = dog, cc = and, n3 = cat. In addition to the unambiguous CPs, the model also uses any noun that follows a cc. Such nouns are classified as ncc.

Unambiguous CPs are extracted according to the following heuristic (a code sketch follows at the end of this section):

* wn-x is the leftmost noun (n1) if:
  - it is the first noun to occur within 4 words to the left of cc;
  - no preposition occurs between this noun and cc;
  - no preposition occurs within 4 words to the left of this noun.
* wn+x is the rightmost noun (n3) if:
  - it is the first noun to occur within 4 words to the right of cc;
  - no preposition occurs between cc and this noun.

The first noun to occur within 4 words to the right of cc is always extracted. This is ncc. Such nouns are also used in the statistical model.

We extracted 119629 unambiguous CPs and 325261 nccs from the unannotated 1988 Wall Street Journal. First, the raw text was fed into the part-of-speech tagger described in [AR96].[1] This was then passed to a simple chunker as used in [AR98], implemented with two small regular expressions that replace simple noun phrases and quantifier phrases with their head words.

[1] Because this tagger trained on annotated data, one may argue that the model presented here is not purely unsupervised.

For example, we process the sentence below as follows:

    Several firms have also launched business subsidiaries and consulting arms specializing in trade, lobbying and other areas.

First, it is annotated with parts of speech:

    [POS-tagged sentence not preserved in this extraction]

From there, it is passed to the chunker, yielding:

    [chunked sentence not preserved in this extraction]

Noun phrase heads of ambiguous and unambiguous CPs are then extracted according to the heuristic, giving:

    subsidiaries and arms
    and areas

where the extracted unambiguous CP is {n1 = subsidiaries, cc = and, n3 = arms}, and areas is extracted as an ncc because, although it is not part of an unambiguous CP, it occurs within four words after a conjunction.
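To make the windowing rules concrete, here is a minimal sketch of the extraction heuristic in Python. The tag tests, the function names, and the shortened, already-chunked example sentence are illustrative assumptions; the paper does not specify its implementation at this level.

    # Sketch of the Section 3 heuristic. A sentence is a list of
    # (word, tag) pairs, as produced by a POS tagger and chunker.
    # NOUN_TAGS and the preposition test are simplifying assumptions.

    NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

    def is_noun(tag):
        return tag in NOUN_TAGS

    def is_prep(tag):
        return tag == "IN"

    def extract(tagged):
        """Return (cps, nccs): unambiguous CPs as (n1, cc, n3) triples,
        and every ncc noun (first noun within 4 words right of a cc)."""
        cps, nccs = [], []
        for i, (cc, tag) in enumerate(tagged):
            if tag != "CC":
                continue

            # ncc: the first noun within 4 words to the right of cc
            # is always extracted.
            right = tagged[i + 1:i + 5]
            j = next((j for j, (_, t) in enumerate(right) if is_noun(t)), None)
            if j is None:
                continue
            n3 = right[j][0]
            nccs.append(n3)

            # n3 can close an unambiguous CP only if no preposition
            # occurs between cc and it.
            if any(is_prep(t) for _, t in right[:j]):
                continue

            # n1: the first noun within 4 words to the left of cc,
            # scanning outward from cc ...
            lo = max(0, i - 4)
            k = next((k for k in range(i - 1, lo - 1, -1)
                      if is_noun(tagged[k][1])), None)
            if k is None:
                continue
            # ... with no preposition between it and cc, and no
            # preposition within 4 words to its left.
            if any(is_prep(t) for _, t in tagged[k + 1:i]):
                continue
            if any(is_prep(t) for _, t in tagged[max(0, k - 4):k]):
                continue

            cps.append((tagged[k][0], cc, n3))
        return cps, nccs

    # Shortened, pre-chunked version of the worked example (tags assumed):
    sent = [("subsidiaries", "NNS"), ("and", "CC"), ("arms", "NNS"),
            ("specializing", "VBG"), ("in", "IN"), ("trade", "NN"),
            (",", ","), ("lobbying", "NN"), ("and", "CC"),
            ("other", "JJ"), ("areas", "NNS")]
    print(extract(sent))
    # -> ([('subsidiaries', 'and', 'arms')], ['arms', 'areas'])

Note how the second conjunction yields only an ncc (areas): the candidate n1 (lobbying) is rejected because the preposition "in" occurs within 4 words to its left.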
4 The Statistical Model

First, we can factor p(a, n1, n2, n3) as follows:

    p(a, n1, n2, n3) = p(n1) p(n2) p(a | n1, n2) p(n3 | a, n1, n2)

The terms p(n1) and p(n2) are independent of the attachment and need not be computed. The other two terms are more problematic. Because the training phrases are unambiguous and of the form (n1 cc n3), the n1 and n2 of the CP in question never appear together in the training data. To compensate, we use the following heuristic, as in [AR98]. Let the random variable φ range over {true, false} and let it denote the presence or absence of any n3 that unambiguously attaches to the n1 or n2 in question. If φ = true when any n3 unambiguously attaches to n1, then p(φ = true | n1) is the conditional probability that a particular n1 occurs with an unambiguously attached n3. Now p(a | n1, n2) can be approximated as:

    p(a = H | n1, n2) ≈ p(true | n1) / Z(n1, n2)
    p(a = L | n1, n2) ≈ p(true | n2) / Z(n1, n2)

where the normalization factor Z(n1, n2) = p(true | n1) + p(true | n2). The reasoning behind this approximation is that the tendency of a CP to attach high (low) is related to the tendency of the n1 (n2) in question to appear in an unambiguous CP in the training data.

We approximate p(n3 | a, n1, n2) as follows:

    p(n3 | a = H, n1, n2) ≈ p(n3 | true, n1)
    p(n3 | a = L, n1, n2) ≈ p(n3 | true, n2)

The reasoning behind this approximation is that when generating n3 given high (low) attachment, the only counts from the training data that matter are those which unambiguously attach to n1 (n2), i.e., φ = true. Word statistics from the extracted CPs are used to formulate these probabilities.

4.1 Generate φ

The conditional probabilities p(true | n1) and p(true | n2) denote the probability that a noun will appear attached unambiguously to some n3. These probabilities are estimated as:

    p(true | n1) = f(n1, true) / f(n1)
    p(true | n2) = f(n2, true) / f(n2)

where f(n2, true) is the number of times n2 appears in an unambiguously attached CP in the training data and f(n2) is the number of times this noun has appeared as either n1, n3, or ncc in the training data.

4.2 Generate n3

The terms p(n3 | n1, true) and p(n3 | n2, true) denote the probabilities that the noun n3 appears attached unambiguously to n1 and n2, respectively. Bigram counts are used to compute these as follows:

    p(n3 | n1, true) = f(n1, n3) / Σ_{n ∈ N} f(n1, n)
    p(n3 | n2, true) = f(n2, n3) / Σ_{n ∈ N} f(n2, n)

where f(x, y) is the number of times x and y occur together in an unambiguous CP, and N is the set of all n3s and nccs that occur in the training data.
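Putting Sections 4.1 and 4.2 together, the following is a minimal sketch of how the extracted counts could drive an attachment decision, reusing extract() from the Section 3 sketch. The dictionary layout, the zero probability for unseen events, and the tie-break toward low attachment are our assumptions, not the paper's.

    from collections import defaultdict

    # Count tables over the training data (Section 3). Assumed layout:
    #   f_true[n]    - times n occurred as the n1 of an unambiguous CP
    #   f_any[n]     - times n occurred as n1, n3, or ncc
    #   f_pair[n][m] - times n and m occurred together in an unambiguous CP
    f_true = defaultdict(int)
    f_any = defaultdict(int)
    f_pair = defaultdict(lambda: defaultdict(int))

    def train(cps, nccs):
        """Accumulate counts from (n1, cc, n3) triples and ncc nouns.
        Each CP's n3 is assumed to appear among the nccs as well (as
        extract() produces it), so its f_any count comes from there."""
        for n1, _, n3 in cps:
            f_true[n1] += 1
            f_any[n1] += 1
            f_pair[n1][n3] += 1
        for n in nccs:
            f_any[n] += 1

    def p_true(n):
        """p(true | n) = f(n, true) / f(n), per Section 4.1."""
        return f_true[n] / f_any[n] if f_any[n] else 0.0

    def p_n3(n3, n):
        """p(n3 | true, n): bigram estimate per Section 4.2; summing the
        observed partners of n equals summing f(n, .) over N, since
        unseen pairs contribute zero."""
        total = sum(f_pair[n].values())
        return f_pair[n][n3] / total if total else 0.0

    def attach(n1, n2, n3):
        """Disambiguate an ambiguous CP (n1 p n2 cc n3): compare the high
        and low scores; Z(n1, n2) is common to both and cancels."""
        score_h = p_true(n1) * p_n3(n3, n1)   # a = H: attach to n1
        score_l = p_true(n2) * p_n3(n3, n2)   # a = L: attach to n2
        return "H" if score_h > score_l else "L"

    # Hypothetical usage, with corpus-wide counts from the extractor:
    # for sentence in tagged_corpus:
    #     train(*extract(sentence))
    # print(attach("n1_word", "n2_word", "n3_word"))

Dropping the shared normalization factor Z(n1, n2) is safe here because the decision only compares the two attachment scores; it would matter only if calibrated probabilities were needed.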