<?xml version="1.0" standalone="yes"?>
<Paper uid="P95-1007">
  <Title>Corpus Statistics Meet the Noun Compound: Some Empirical Results</Title>
  <Section position="3" start_page="48" end_page="49" type="intro">
    <SectionTitle>
2 Method
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="48" end_page="48" type="sub_section">
      <SectionTitle>
2.1 Extracting a Test Set
</SectionTitle>
      <Paragraph position="0"> A test set of syntactically ambiguous noun compounds was extracted from our 8 million word Grolier's encyclopedia corpus in the following way. Because the corpus is not tagged or parsed, a somewhat conservative strategy of looking for unambiguous sequences of nouns was used. To distinguish nouns from other words, the University of Pennsylvania morphological analyser (described in Karp et al., 1992) was used to generate the set of words that can only be used as nouns (I shall henceforth call this set $\mathcal{N}$). All consecutive sequences of these words were extracted, and the three-word sequences were used to form the test set. For reasons made clear below, only sequences consisting entirely of words from Roget's thesaurus were retained, giving a total of 308 test triples. These triples were manually analysed using as context the entire article in which they appeared.</Paragraph>
      <Paragraph position="1"> In some cases, the sequence was not a noun compound (nouns can appear adjacent to one another across various constituent boundaries) and was marked as an error. Other compounds exhibited what Hindle and Rooth (1993) have termed SEMANTIC INDETERMINACY, where the two possible bracketings cannot be distinguished in the context. The remaining compounds were assigned either a left-branching or right-branching analysis. Table 1 shows the number of each kind and an example of each.</Paragraph>
      <Paragraph position="2"> Accuracy figures in all the results reported below were computed using only those 244 compounds which received a parse.</Paragraph>
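      <Paragraph> The extraction step described in this subsection can be sketched in Python. This is a minimal illustration under stated assumptions, not the original implementation: the function name extract_noun_triples, the whitespace tokenisation, and the toy word sets are all invented for the example, and "three-word sequences" is read as maximal runs of exactly three noun-only words.

```python
from typing import Iterable, Iterator


def extract_noun_triples(
    tokens: Iterable[str],
    noun_only: set[str],
    thesaurus: set[str],
) -> Iterator[tuple[str, str, str]]:
    """Yield maximal runs of exactly three noun-only words found in the thesaurus."""
    run: list[str] = []
    # A trailing sentinel (None) flushes the final run.
    for token in [*tokens, None]:
        if token is not None and token in noun_only:
            run.append(token)
            continue
        # The run has ended: keep it only if it is exactly three words long
        # and every word is covered by the thesaurus.
        if len(run) == 3 and all(w in thesaurus for w in run):
            yield (run[0], run[1], run[2])
        run = []


# Toy data, invented for illustration only (hypothetical word sets).
NOUN_ONLY = {"health", "care", "reform", "computer", "science", "department", "budget"}
THESAURUS = {"health", "care", "reform", "computer", "science", "department"}
TEXT = "the health care reform passed while computer science department budget talks stalled"
triples = list(extract_noun_triples(TEXT.split(), NOUN_ONLY, THESAURUS))
```

In this toy run only "health care reform" is kept; the four-word run "computer science department budget" is discarded because it is not exactly three words long.</Paragraph>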
    </Section>
    <Section position="2" start_page="48" end_page="48" type="sub_section">
      <SectionTitle>
2.2 Conceptual Association
</SectionTitle>
      <Paragraph position="0"> One problem with applying lexical association to noun compounds is the enormous number of parameters required, one for every possible pair of nouns. Not only does this require a vast amount of memory space, it creates a severe data sparseness problem since we require at least some data about each parameter. Resnik and Hearst (1993) coined the term CONCEPTUAL ASSOCIATION to refer to association values computed between groups of words. By assuming that all words within a group behave similarly, the parameter space can be built in terms of the groups rather than in terms of the words.</Paragraph>
      <Paragraph position="1"> In this study, conceptual association is used with groups consisting of all categories from the 1911 version of Roget's thesaurus. Given two thesaurus categories $t_1$ and $t_2$, there is a parameter which represents the degree of acceptability of the structure $[n_1 n_2]$, where $n_1$ is a noun appearing in $t_1$ and $n_2$ appears in $t_2$. By the assumption that words within a group behave similarly, this is constant given the two categories. Following Lauer and Dras (1994) we can formally write this parameter as $\Pr(t_1 \to t_2)$, where the event $t_1 \to t_2$ denotes the modification of a noun in $t_2$ by a noun in $t_1$.</Paragraph>
    </Section>
    <Section position="3" start_page="48" end_page="49" type="sub_section">
      <SectionTitle>
2.3 Training
</SectionTitle>
      <Paragraph position="0"> To ensure that the test set is disjoint from the training data, all occurrences of the test noun compounds have been removed from the training corpus.</Paragraph>
      <Paragraph position="1"> Two types of training scheme are explored in this study, both unsupervised. The first employs a pattern that follows Pustejovsky (1993) in counting the occurrences of subcomponents. A training instance is any sequence of four words $w_1 w_2 w_3 w_4$ where $w_1, w_4 \notin \mathcal{N}$ and $w_2, w_3 \in \mathcal{N}$. Let $\mathrm{count}_p(n_1, n_2)$ be the number of times a sequence $w_1 n_1 n_2 w_4$ occurs in the training corpus with $w_1, w_4 \notin \mathcal{N}$.</Paragraph>
      <Paragraph position="2"> The second type uses a window to collect training instances by observing how often a pair of nouns co-occur within some fixed number of words. In this study, a variety of window sizes are used. For $n \geq 2$, let $\mathrm{count}_n(n_1, n_2)$ be the number of times a sequence $n_1 w_1 \ldots w_i n_2$ occurs in the training corpus where $i \leq n - 2$. Note that windowed counts are asymmetric. In the case of a window two words wide, this yields the mutual information metric proposed by Liberman and Sproat (1992).</Paragraph>
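      <Paragraph> The two counting schemes can be illustrated with a short Python sketch. The function names pattern_counts and window_counts, the toy sentence, and the toy noun set are assumptions made for this example. The sketch also assumes that a window n words wide starting at the first noun admits at most n - 2 intervening words of any kind, and that four-word patterns must lie wholly inside the token list.

```python
from collections import Counter


def pattern_counts(tokens: list[str], nouns: set[str]) -> Counter:
    """Count pairs (n1, n2) occurring as w1 n1 n2 w4 with w1, w4 outside the noun set."""
    counts: Counter = Counter()
    for w1, n1, n2, w4 in zip(tokens, tokens[1:], tokens[2:], tokens[3:]):
        if w1 not in nouns and w4 not in nouns and n1 in nouns and n2 in nouns:
            counts[(n1, n2)] += 1
    return counts


def window_counts(tokens: list[str], nouns: set[str], n: int) -> Counter:
    """Count ordered pairs (n1, n2) where n2 follows n1 with at most n - 2 words between."""
    counts: Counter = Counter()
    for a, n1 in enumerate(tokens):
        if n1 not in nouns:
            continue
        # Positions a+1 .. a+n-1 lie inside a window n words wide starting at n1.
        for n2 in tokens[a + 1 : a + n]:
            if n2 in nouns:
                counts[(n1, n2)] += 1
    return counts


# Toy data, invented for illustration only.
NOUNS = {"kidney", "stone", "removal"}
TOKENS = "the kidney stone removal of a kidney stone was".split()

by_pattern = pattern_counts(TOKENS, NOUNS)
by_window_2 = window_counts(TOKENS, NOUNS, 2)
by_window_3 = window_counts(TOKENS, NOUNS, 3)
```

Because only the noun on the left opens a window, the pair (kidney, stone) is counted while (stone, kidney) is not, which is the asymmetry noted above.</Paragraph>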
      <Paragraph position="3"> Using each of these different training schemes to arrive at appropriate counts it is then possible to estimate the parameters. Since these are expressed in terms of categories rather than words, it is necessary to combine the counts of words to arrive at estimates. In all cases the estimates used are:</Paragraph>
      <Paragraph position="5"> Here ambig(w) is the number of categories in which w appears. It has the effect of dividing the evidence from a training instance across all possible categories for the words. The normaliser ensures that all parameters for a head noun sum to unity.</Paragraph>
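      <Paragraph> The estimation step can be sketched as follows. The sketch assumes the estimate has the form $\Pr(t_1 \to t_2) = \frac{1}{\eta_{t_2}} \sum_{w_1 \in t_1,\, w_2 \in t_2} \frac{\mathrm{count}(w_1, w_2)}{\mathrm{ambig}(w_1)\,\mathrm{ambig}(w_2)}$, with the normaliser $\eta_{t_2}$ chosen so that the estimates for each head category sum to one. This form is inferred from the description of ambig(w) and the normaliser, and the symbol $\eta_{t_2}$, the function name estimate_parameters, and the toy categories are hypothetical.

```python
from collections import defaultdict


def estimate_parameters(
    counts: dict[tuple[str, str], int],
    categories: dict[str, set[str]],
) -> dict[tuple[str, str], float]:
    """Return Pr(t1 -> t2) estimates keyed by (t1, t2)."""
    raw: dict[tuple[str, str], float] = defaultdict(float)
    for (w1, w2), c in counts.items():
        cats1, cats2 = categories.get(w1, set()), categories.get(w2, set())
        if not cats1 or not cats2:
            continue  # words outside the thesaurus contribute nothing
        # ambig(w) is the number of categories containing w, so each
        # category pair receives an equal share of the evidence.
        share = c / (len(cats1) * len(cats2))
        for t1 in cats1:
            for t2 in cats2:
                raw[(t1, t2)] += share
    # Normalise so that the estimates for each head category t2 sum to one.
    totals: dict[str, float] = defaultdict(float)
    for (_, t2), value in raw.items():
        totals[t2] += value
    return {(t1, t2): value / totals[t2] for (t1, t2), value in raw.items()}


# Toy data, invented for illustration only.
CATEGORIES = {"kidney": {"ORGAN"}, "stone": {"MINERAL", "FRUIT"}, "river": {"WATER"}}
COUNTS = {("kidney", "stone"): 4, ("river", "stone"): 2}
params = estimate_parameters(COUNTS, CATEGORIES)
```

Here "stone" is ambiguous between two categories, so each of its training instances contributes half a count to each; after normalisation the estimates with MINERAL as head are 2/3 for ORGAN and 1/3 for WATER.</Paragraph>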
    </Section>
    <Section position="4" start_page="49" end_page="49" type="sub_section">
      <SectionTitle>
2.4 Analysing the Test Set
</SectionTitle>
      <Paragraph position="0"> Given the high-level descriptions in section 1.3 it remains only to formalise the decision process used to analyse a noun compound. Each test compound presents a set of possible analyses and the goal is to choose which analysis is most likely. For three-word compounds it suffices to compute the ratio of two probabilities, that of a left-branching analysis and that of a right-branching one. If this ratio is greater than unity, then the left-branching analysis is chosen. When it is less than unity, a right-branching analysis is chosen.[5] If the ratio is exactly unity, the analyser guesses left-branching, although this is fairly rare for conceptual association as shown by the experimental results below.</Paragraph>
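      <Paragraph> The decision rule can be written down directly. The following Python sketch uses a hypothetical function name, choose_bracketing, and compares the two estimates rather than forming their ratio. This is an implementation choice made here to avoid dividing by zero: it reproduces the rule above and, under that assumption, also covers the cases where one or both estimates are zero.

```python
def choose_bracketing(p_left: float, p_right: float) -> str:
    """Pick an analysis from the two (unnormalised) probability estimates."""
    if p_left > p_right:
        # Ratio greater than unity, or only the right estimate is zero.
        return "left"
    if p_right > p_left:
        # Ratio less than unity, or only the left estimate is zero.
        return "right"
    # Ratio exactly unity, or both estimates zero: guess left-branching.
    return "left"
```

For example, choose_bracketing(0.0, 0.3) selects the right-branching analysis, and choose_bracketing(0.0, 0.0) falls back to the left-branching guess.</Paragraph>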
      <Paragraph position="1"> For the adjacency model, when the given compound is $w_1 w_2 w_3$, we can estimate this ratio as:</Paragraph>
      <Paragraph position="3"> For the dependency model, the ratio is:</Paragraph>
      <Paragraph position="5"> In both cases, we sum over all possible categories for the words in the compound. Because the dependency model equations have two factors, they are affected more severely by data sparseness. If the probability estimate for $\Pr(t_2 \to t_3)$ is zero for all possible categories $t_2$ and $t_3$ then both the numerator and the denominator will be zero. This will conceal any preference given by the parameters involving $t_1$. In such cases, we observe that the test instance itself provides the information that the event $t_2 \to t_3$ can occur and we recalculate the ratio using $\Pr(t_2 \to t_3) = k$ for all possible categories $t_2, t_3$, where $k$ is any non-zero constant. However, no correction is made to the probability estimates for $\Pr(t_1 \to t_2)$ and $\Pr(t_1 \to t_3)$ for unseen cases, thus putting the dependency model on an equal footing with the adjacency model above.</Paragraph>
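      <Paragraph> A sketch of how the two ratios might be computed is given below. It assumes the adjacency ratio compares $\sum \Pr(t_1 \to t_2)$ with $\sum \Pr(t_2 \to t_3)$ and the dependency ratio compares $\sum \Pr(t_1 \to t_2)\Pr(t_2 \to t_3)$ with $\sum \Pr(t_1 \to t_3)\Pr(t_2 \to t_3)$, each sum ranging over the categories of the words involved. These forms are reconstructions consistent with the description above, in which $\Pr(t_2 \to t_3)$ appears in both the numerator and the denominator of the dependency ratio. The function names, the toy parameter tables, and the choice to sum the adjacency terms over category pairs only are assumptions of this example.

```python
from itertools import product

Params = dict[tuple[str, str], float]


def adjacency_scores(
    cats1: set[str], cats2: set[str], cats3: set[str], pr: Params
) -> tuple[float, float]:
    """Return (left, right) sums for the assumed adjacency ratio."""
    left = sum(pr.get((t1, t2), 0.0) for t1, t2 in product(cats1, cats2))
    right = sum(pr.get((t2, t3), 0.0) for t2, t3 in product(cats2, cats3))
    return left, right


def dependency_scores(
    cats1: set[str], cats2: set[str], cats3: set[str], pr: Params, k: float = 1.0
) -> tuple[float, float]:
    """Return (left, right) sums for the assumed dependency ratio."""
    # If Pr(t2 -> t3) is zero for every category pair, substitute the
    # non-zero constant k so the preference carried by t1 is not hidden.
    unseen = all(pr.get((t2, t3), 0.0) == 0.0 for t2, t3 in product(cats2, cats3))
    left = right = 0.0
    for t1, t2, t3 in product(cats1, cats2, cats3):
        p23 = k if unseen else pr.get((t2, t3), 0.0)
        left += pr.get((t1, t2), 0.0) * p23
        right += pr.get((t1, t3), 0.0) * p23
    return left, right


# Toy parameter tables, invented for illustration only.
PR_SEEN = {("A", "B"): 0.4, ("B", "C"): 0.1, ("A", "C"): 0.2}
PR_UNSEEN = {("A", "B"): 0.1, ("A", "C"): 0.3}

adj_seen = adjacency_scores({"A"}, {"B"}, {"C"}, PR_SEEN)
dep_seen = dependency_scores({"A"}, {"B"}, {"C"}, PR_SEEN)
dep_unseen = dependency_scores({"A"}, {"B"}, {"C"}, PR_UNSEEN)
```

With the second toy table, no estimate exists for the pair (B, C), so the constant k stands in for it and the dependency comparison is decided by the parameters involving the first word alone.</Paragraph>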
      <Paragraph position="6"> The equations presented above for the dependency model differ from those developed in Lauer and Dras (1994) in one way. There, an additional weighting factor (of 2.0) is used to favour a left-branching analysis. This arises because their construction is based on the dependency model which predicts that left-branching analyses should occur twice as often.</Paragraph>
      <Paragraph position="7"> Also, the work reported in Lauer and Dras (1994) uses simplistic estimates of the probability of a word given its thesaurus category. The equations above assume these probabilities are uniformly constant.</Paragraph>
      <Paragraph position="8"> Section 3.2 below shows the result of making these two additions to the method.</Paragraph>
      <Paragraph position="9"> [5] If either probability estimate is zero, the other analysis is chosen. If both are zero the analysis is made as if the ratio were exactly unity.</Paragraph>
    </Section>
  </Section>
</Paper>