<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0631">
  <Title>An Iterative Approach to Estimating Frequencies over a Semantic Hierarchy</Title>
  <Section position="4" start_page="258" end_page="259" type="metho">
    <SectionTitle>
2 The Input Data and Semantic Hierarchy
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="258" end_page="259" type="sub_section">
      <SectionTitle>
Semantic Hierarchy
</SectionTitle>
      <Paragraph position="0"> The input data used to estimate frequencies and probabilities over the semantic hierarchy has been obtained from the shallow parser described in Briscoe and Carroll (1997). The data consists of a multiset of 'co-occurrence triples', each triple consisting of a noun lemma, verb lemma, and argument position.</Paragraph>
      <Paragraph position="1"> We refer to the data as follows: let the universe of verbs, argument positions and nouns that can appear in the input data be denoted V = {v1, ..., vkV}, R = {r1, ..., rkR} and N = {n1, ..., nkN}, respectively. Note that in our treatment of selectional restrictions, we do not attempt to distinguish between alternative senses of verbs. We also assume that each instance of a noun in the data refers to one, and only one, concept.</Paragraph>
      <Paragraph position="2"> We use the noun hypernym taxonomy of WordNet, version 1.6, as our semantic hierarchy. 4 Let C = {c1, ..., ckC} be the set of concepts in WordNet. There are approximately 66,000 different concepts. A concept is represented in WordNet by a 'synonym set' (or 'synset'), which is a set of synonymous words which can be used to denote that concept. For example, the concept 'nut', as in a crazy person, is represented by the following synset: {crackpot, crank, nut, nutcase, fruitcake, screwball}.</Paragraph>
      <Paragraph position="3"> Let syn(c) ⊆ N be the synset for the concept c, and let cn(n) = { c | n ∈ syn(c) } be the set of concepts that can be denoted by the noun n. The fact that some nouns are ambiguous means that the synsets are not necessarily disjoint.</Paragraph>
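The relation between syn(c) and cn(n) can be sketched as follows; the concept names and synset contents here are a hand-made toy inventory, not real WordNet data.

```python
# A minimal sketch of syn(c) and cn(n) over a hand-made synset table;
# the concept names and synsets below are illustrative, not real WordNet.
SYN = {
    "crackpot.concept": {"crackpot", "crank", "nut", "nutcase",
                         "fruitcake", "screwball"},
    "edible_nut.concept": {"nut", "edible_nut"},
    "beverage.concept": {"beverage", "drink"},
}

def cn(noun, syn=SYN):
    """cn(n) = { c | n in syn(c) }: every concept the noun can denote."""
    return {c for c, words in syn.items() if noun in words}
```

Because "nut" appears in two synsets, cn("nut") contains two concepts, illustrating that the synsets are not disjoint.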
      <Paragraph position="4"> 4There are other taxonomies in WordNet, but we only use the noun taxonomy. Hence, from now on, when we talk of concepts in WordNet, we mean concepts in the noun taxonomy only.</Paragraph>
      <Paragraph position="5">  The hierarchy has the structure of a directed acyclic graph, 5 with the isa ⊆ C × C relation connecting nodes in the graph, where (c′, c) ∈ isa implies c′ is a kind of c. Let isa* ⊆ C × C be the transitive, reflexive closure of isa; and let c̄ = { c′ | (c′, c) ∈ isa* } be the set consisting of the concept c and all of its hyponyms. The set &lt;food&gt;̄ contains all the concepts which are kinds of food, including &lt;food&gt;.</Paragraph>
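The bar operator can be sketched as a traversal of the isa graph; the DAG below is a toy fragment, not the real WordNet noun taxonomy.

```python
# A toy sketch of the bar operator: bar(c) is c together with all of its
# hyponyms under the reflexive, transitive closure of isa. The DAG below
# is illustrative, not the actual WordNet hierarchy.
ISA = {  # (child, parent): child is a kind of parent
    ("wine", "beverage"), ("water", "beverage"),
    ("beverage", "food"), ("bread", "food"), ("food", "entity"),
}

def bar(c, isa=ISA):
    """Return { c2 | (c2, c) in isa* }, i.e. c and all of its hyponyms."""
    children = {}
    for child, parent in isa:
        children.setdefault(parent, set()).add(child)
    out, stack = set(), [c]
    while stack:
        d = stack.pop()
        if d not in out:
            out.add(d)
            stack.extend(children.get(d, ()))
    return out
```

Here bar("food") collects every kind of food including "food" itself, while a leaf concept such as "wine" is its own bar set.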
      <Paragraph position="6"> Note that words in our data can appear in synsets anywhere in the hierarchy. Even concepts such as &lt;entity&gt;, which appear near the root of the hierarchy, have synsets containing words which may appear in the data. The synset for &lt;entity&gt; is {entity, something}, and the words entity and something may well appear in the argument positions of verbs in the corpus. Furthermore, for a concept c, we distinguish between the set of words that can be used to denote c (the synset of c), and the set of words that can be used to denote concepts in c̄. 6</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="259" end_page="259" type="metho">
    <SectionTitle>
3 The Measure of Association
</SectionTitle>
    <Paragraph position="0"> We measure the association between argument positions of verbs and sets of concepts using the association norm (Abe and Li, 1996). 7 For C ⊆ C, v ∈ V and r ∈ R, the association norm is defined as follows: A(C, v, r) = p(C | v, r) / p(C | r). For example, the association between the object position of eat and the set of concepts denoting kinds of food is expressed as follows: A(&lt;food&gt;̄, eat, object). 6We depart from standard terminology by referring to this second set as the synsets of c̄. Note that, for</Paragraph>
    <Paragraph position="1"> 7This work restricts itself to verbs, but can be extended to other kinds of predicates that take nouns as arguments, such as adjectives.</Paragraph>
    <Paragraph position="2"> C ⊆ C, p(C | v, r) is just the probability of the disjunction of the concepts in C; that is, p(C | v, r) = Σ_{c ∈ C} p(c | v, r).</Paragraph>
    <Paragraph position="4"> In order to see how p(c | v, r) relates to the input data, note that given a concept c, verb v and argument position r, a noun can be generated according to the distribution p(n | c, v, r), where</Paragraph>
    <Paragraph position="6"> Note that for c ∉ cn(n), p(n | c, v, r) = 0.</Paragraph>
    <Paragraph position="7"> The association norm (and similar measures such as the mutual information score) has been criticised (Dunning, 1993) because such scores can be greatly over-estimated when frequency counts are low. This problem is overcome to some extent in the scheme presented below since, generally speaking, we only calculate the association norms for concepts that have accumulated a significant count.</Paragraph>
    <Paragraph position="8"> The association norm can be estimated using maximum likelihood estimates of the probabilities as follows.</Paragraph>
    <Paragraph position="10"/>
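The maximum-likelihood estimate can be sketched as below; this is a hedged illustration that assumes a table of sense-level counts freq[(c, v, r)] is already available, whereas in the paper these counts must themselves be estimated because the data is not sense disambiguated.

```python
# A sketch of the maximum-likelihood estimate of the association norm
# A(C, v, r) = p(C | v, r) / p(C | r), assuming sense-level counts
# freq[(c, v, r)] are given (an assumption; the paper estimates them).
def association(C, v, r, freq):
    f_vr = sum(k for (c, v2, r2), k in freq.items() if (v2, r2) == (v, r))
    f_r = sum(k for (c, v2, r2), k in freq.items() if r2 == r)
    f_Cvr = sum(k for (c, v2, r2), k in freq.items()
                if c in C and (v2, r2) == (v, r))
    f_Cr = sum(k for (c, v2, r2), k in freq.items() if c in C and r2 == r)
    # p(C | v, r) is estimated by f_Cvr / f_vr, and p(C | r) by f_Cr / f_r
    return (f_Cvr / f_vr) / (f_Cr / f_r)
```

An association above 1 indicates that the concept set fills the argument position more often than its overall frequency in that position would predict.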
  </Section>
  <Section position="6" start_page="259" end_page="261" type="metho">
    <SectionTitle>
4 Estimating Frequencies
</SectionTitle>
    <Paragraph position="0"> Let freq(c, v, r), for a particular c, v and r, be the number of (n, v, r) triples in the data in which n is being used to denote c, and let freq(v, r) be the number of times verb v appears with something in position r in the data; then the relevant maximum likelihood estimates, for c ∈ C, v ∈ V, r ∈ R, are as follows.</Paragraph>
    <Paragraph position="2"> Since we do not have sense disambiguated data, we cannot obtain freq(c, v, r) by simply counting senses. The standard approach is to estimate freq(c, v, r) by distributing the count for each noun n in syn(c) evenly among all senses of the noun as follows: freq(c, v, r) = Σ_{n ∈ syn(c)} freq(n, v, r) / |cn(n)|,</Paragraph>
    <Paragraph position="4"> where freq(n, v, r) is the number of times the triple (n, v, r) appears in the data, and |cn(n)| is the cardinality of cn(n).</Paragraph>
    <Paragraph position="5"> Although this approach can give inaccurate estimates, the counts given to the incorrect senses will disperse randomly throughout the hierarchy as noise, and by accumulating counts up the hierarchy we will tend to gather counts from the correct senses of related words (Yarowsky, 1992; Resnik, 1993). To see why, consider two instances of possible triples in the data, drink wine and drink water. (This example is adapted from Resnik (1993).) The word water is a member of seven synsets in WordNet 1.6, and wine is a member of two synsets. Thus each sense of water will be incremented by 0.14 counts, and each sense of wine will be incremented by 0.5 counts. Now although the incorrect senses of these words will receive counts, those concepts in the hierarchy which dominate more than one of the senses, such as &lt;beverage&gt;, will accumulate more substantial counts.</Paragraph>
    <Paragraph position="6"> However, although counts tend to accumulate in the right places, counts can be greatly underestimated. In the previous example, freq(&lt;beverage&gt;,drink, object) is incremented by only 0.64 counts from the two data instances, rather than the correct value of 2.</Paragraph>
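The even-split estimate and the wine/water example above can be sketched as follows; the sense inventory is illustrative (wine with two senses, water with seven, mirroring the counts in the text), not the actual WordNet 1.6 synsets.

```python
# A sketch of the even-split estimate: each (n, v, r) triple adds
# 1/|cn(n)| to every sense of n. Sense names below are illustrative.
CN = {
    "wine": ["wine.beverage", "wine.color"],
    "water": ["water.beverage", "water.body", "water.urine",
              "water.element", "water.facility", "water.sea", "water.tears"],
}

def even_split(triples, cn=CN):
    freq = {}
    for n, v, r in triples:
        for c in cn[n]:
            freq[(c, v, r)] = freq.get((c, v, r), 0.0) + 1.0 / len(cn[n])
    return freq

data = [("wine", "drink", "object"), ("water", "drink", "object")]
f0 = even_split(data)
# The two beverage senses together receive only 1/2 + 1/7 (about 0.64)
# of the true count of 2:
beverage = (f0[("wine.beverage", "drink", "object")]
            + f0[("water.beverage", "drink", "object")])
```

This reproduces the underestimation noted in the text: the count accumulating at the beverage concept is roughly 0.64 rather than 2.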
    <Paragraph position="7"> The approach explored here is to use the accumulated counts in the following re-estimation procedure. Given some verb v and position r, for each concept c we have the following initial estimate, in which the counts for a noun are distributed evenly among all of its senses: freq0(c, v, r) = Σ_{n ∈ syn(c)} freq(n, v, r) / |cn(n)|.</Paragraph>
    <Paragraph position="9"> Given the assumption that counts from the related senses of words that can fill position r of verb v will accumulate at hypernyms of c, let top(c, v, r) be the hypernym of c (or possibly c itself) that most accurately represents this set of related senses. In other words, top(c, v, r) will be an approximation of the set of concepts related to c that fill position r of verb v. Rather than splitting the counts for a noun n evenly among each of its senses c ∈ cn(n), we distribute the counts for n on the basis of the accumulated counts at top(c, v, r) for each c ∈ cn(n). In the next section we discuss a method for finding top(c, v, r), but first we complete the description of how the re-estimation process uses the accumulated counts at top(c, v, r).</Paragraph>
    <Paragraph position="10"> Given a concept c, verb v and position r, in the following formula we use [c, v, r] to denote the set of concepts top(c, v, r)̄. The re-estimated frequency freqm+1(c, v, r) is given as follows.</Paragraph>
    <Paragraph position="11"> freqm+1(c, v, r) = Σ_{n ∈ syn(c)} freq(n, v, r) · freqm([c, v, r], v, r) / Σ_{c′ ∈ cn(n)} freqm([c′, v, r], v, r), where freqm(C, v, r), for a set of concepts C, denotes the sum of freqm(c′′, v, r) over c′′ ∈ C.</Paragraph>
    <Paragraph position="13"> Note that only nouns n in syn(c) contribute to the count for c. The count freq(n, v, r) is split among all concepts in cn(n), in proportion to the counts accumulated at top(c′, v, r) for each sense c′ ∈ cn(n).</Paragraph>
    <Paragraph position="15"/>
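One re-estimation step can be sketched as below. This is a hedged illustration: the mapping `acc`, standing in for the counts accumulated at top(c, v, r), is assumed given, and all names are illustrative rather than taken from the paper.

```python
# A hedged sketch of one re-estimation step: the count for a noun n is
# split among its senses in proportion to the counts accumulated at
# top(c, v, r), instead of evenly. `acc` is an assumed input here.
def reestimate(noun_freq, cn, acc):
    """noun_freq: {(n, v, r): count}; cn: {n: list of senses};
    acc: {(c, v, r): count accumulated at top(c, v, r)}."""
    nxt = {}
    for (n, v, r), count in noun_freq.items():
        weights = {c: acc.get((c, v, r), 0.0) for c in cn[n]}
        total = sum(weights.values())
        for c, w in weights.items():
            # fall back to an even split if nothing has accumulated yet
            share = w / total if total else 1.0 / len(cn[n])
            nxt[(c, v, r)] = nxt.get((c, v, r), 0.0) + count * share
    return nxt
```

With accumulated counts of 3 and 1 for two senses of wine, a noun count of 4 is redistributed as 3 and 1 rather than 2 and 2.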
  </Section>
  <Section position="7" start_page="261" end_page="262" type="metho">
    <SectionTitle>
5 Determining top(c,v,r)
</SectionTitle>
    <Paragraph position="0"> The technique for calculating top(c, v, r) is based on the assumption that a hypernym c′ of c is too high in the hierarchy to be top(c, v, r) if the children of c′ are not sufficiently homogeneous with respect to v and r. A set of concepts, C, is taken to be homogeneous with respect to a given v ∈ V and r ∈ R, if p(v | c̄, r) has a similar value for each c ∈ C. Note that this is equivalent to comparing association norms since A(c̄, v, r) = p(c̄ | v, r) / p(c̄ | r) = p(v | c̄, r) / p(v | r),</Paragraph>
    <Paragraph position="2"> and, as we are considering homogeneity for a given verb and argument position, p(v | r) is a constant.</Paragraph>
    <Paragraph position="3"> To determine whether a set of concepts is homogeneous, we apply a χ2 test to a contingency table of frequency counts. Table 1 shows frequencies for the children of &lt;nutriment&gt; in the object position of eat, and the figures in brackets are the expected values, based on the marginal totals in the table.</Paragraph>
    <Paragraph position="4"> Notice that we use the freq0 counts in the table. A more precise method, which we intend to explore, would involve creating a new table for each freqm, m &gt; 0, and recalculating top(c, v, r) after each iteration. A more significant problem with this approach is that by considering p(v | c̄, r), we are not taking into account the possibility that some concepts are associated with more verbs than others. In further work, we plan to consider alternative ways of comparing levels of association. The null hypothesis of the test is that p(v | c̄, r) is the same for each c in the table. For example, in Table 1 the null hypothesis is that for every concept c that is a child of &lt;nutriment&gt;, the probability of some concept c′ ∈ c̄ being eaten, given that it is the object of some verb, is the same. For the experiments described in Section 6, we used 0.05 as the level of significance. Further work will investigate the effect that different levels of significance have on the estimated frequencies. The χ2 statistic corresponding to Table 1 is 4.8. We use the log-likelihood χ2 statistic, rather than Pearson's χ2 statistic, as this is thought to be more appropriate when the counts in the contingency table are low (Dunning, 1993). 8 For a significance level of 0.05, with 4 degrees of freedom, the critical value is 9.49 (Howell, 1997). Thus in this case, the null hypothesis (that the children of &lt;nutriment&gt; are homogeneous with respect to eat) would not be rejected.</Paragraph>
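The homogeneity check can be sketched as a log-likelihood chi-squared (G) statistic over a two-column contingency table, one row per child concept: the count with the verb in question and the count with all other verbs. The table layout is an assumption about the shape of Table 1; the 0.05 critical value for 4 degrees of freedom is the 9.49 quoted in the text.

```python
from math import log

# A sketch of the homogeneity test: the log-likelihood chi-squared (G)
# statistic for a two-column table of observed counts, compared against
# the critical value quoted in the text (9.49 at 0.05 with 4 df).
def g_statistic(table):
    """table: list of (count_with_v, count_with_other_verbs) rows."""
    col = [sum(row[j] for row in table) for j in (0, 1)]
    grand = sum(col)
    g = 0.0
    for row in table:
        rowsum = sum(row)
        for j in (0, 1):
            if row[j] > 0:
                expected = rowsum * col[j] / grand
                g += 2.0 * row[j] * log(row[j] / expected)
    return g

CRITICAL_05_DF4 = 9.49  # Howell (1997), as cited in the text
```

Children are judged homogeneous when the statistic does not exceed the critical value: a table whose rows have identical proportions yields G = 0, while a strongly skewed table yields a large G and the null hypothesis is rejected.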
    <Paragraph position="5"> Given a verb v and position r, we compute top(c, v, r) for each c by determining the homogeneity of the children of the hypernyms of c. Initially, we let top(c, v, r) be the concept c itself. We work from c up the hierarchy, reassigning top(c, v, r) to be successive hypernyms of c, until we reach a hypernym whose children are not sufficiently homogeneous. In situations where a concept has more than one parent, we consider the parent which results in the lowest χ2 value, as this indicates the highest level of homogeneity.</Paragraph>
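The climb described above can be sketched as follows. This is a hedged outline: `parent_of` and `is_homogeneous` are assumed given (the latter wrapping the chi-squared test), and the multiple-parent case, where the parent with the lowest χ2 value is chosen, is omitted for brevity.

```python
# A sketch of the search for top(c, v, r): starting from c, keep moving
# to the parent while the parent's children remain homogeneous, and stop
# at the last homogeneous level. `parent_of` and `is_homogeneous` are
# assumed inputs; the multiple-parent case is omitted.
def find_top(c, parent_of, is_homogeneous):
    top = c
    parent = parent_of.get(top)
    while parent is not None and is_homogeneous(parent):
        top = parent
        parent = parent_of.get(top)
    return top
```

For instance, if only the children of a "beverage" node look homogeneous for the verb in question, the climb from a water sense stops at "beverage" rather than continuing up to "food".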
  </Section>
</Paper>