<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0209">
  <Title>Selectional Preference and Sense Disambiguation</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Selectional Preference as
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Statistical Association
</SectionTitle>
      <Paragraph position="0"> The treatment of selectional preference used here is that proposed by Resnik (1993a; 1996), combining statistical and knowledge-based methods. The basis of the approach is a probabilistic model capturing the co-occurrence behavior of predicates and conceptual classes in the taxonomy. The intuition is illustrated in Figure 1. The prior distribution PrR(c) captures the probability of a class occurring as the argument in predicate-argument relation R, regardless of the identity of the predicate. For example, given the verb-subject relationship, the prior probability for (person) tends to be significantly higher than the prior probability for (insect). However, once the identity of the predicate is taken into account, the probabilities can change -- if the verb is buzz, then the probability for (insect) Can be expected to be higher than its prior, and (person) will likely be lower. In probabilistic terms, it is the difference between this conditional or posterior distribution and the prior distribution that determines selectional preference.</Paragraph>
      <Paragraph position="1"> Information theory provides an appropriate way to quantify the difference between the prior and posterior distributions, in the form of relative entropy (Kullback and Leibler, 1951). The model defines the selectional preference strength of a predicate as: *</Paragraph>
      <Paragraph position="3"> Intuitively, SR(p) measures how much information, in bits, predicate p provides about the conceptual class of its argument. The better Pr(c) approximates Pr(cip), the leas influence p is having on its argument, and therefore the less strong its selectional preference.</Paragraph>
      <Paragraph position="4"> Given this definition, a natural way to characterize the &amp;quot;semantic fit&amp;quot; of a particular class as the argument to a predicate is by its relative contribution to the overall selectional preference strength. In particular, classes that fit very well can be expected to have higher posterior probabilities, compared to their priors, as is the case for (insect) in Figure 1.</Paragraph>
      <Paragraph position="5"> Formally, selectional association is defined as: Am(p, c) -- 1 Pr(c\[p) Pr(c\[p) log Pr(c) &amp;quot; This model of selectional preference has turned out to make reasonable predictions about human judgments of argument plausibility obtained by psycholinguistic methods (Resnik, 1993a). Closely related proposals have been applied in syntactic disambiguation (Resnik, 1993b; Lauer, 1994) and to automatic acquisition of more KatzFodoresque selection restrictions in the form of weighted disjunctions (Ribas, 1994). The selectional association has also been used recently to explore apparent cases of syntactic optionality (Paola Merlo, personal communication). null</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Estimation Issues
</SectionTitle>
    <Paragraph position="0"> If taxonomic classes were labeled explicitly in a training corpus, estimation of probabilities in the model would be fairly straightforward. But since text corpora contain words, not classes, it is necessary to treat each occurrence of a word in an argument position as if it might represent any of the conceptual classes to which it belongs, and assign frequency counts accordingly. At present, this is done by distributing the &amp;quot;credit&amp;quot; for an observation uniformly across all the conceptual classes containing an observed argument. Formally, given a predicate-argument relationship R (for example, the verb-object relationship), a predicate p, and a conceptual class c, ~'~ count (p, w) freqR(p,c) ~ ~ ~ ' tvEc where countR(p, w) is the number of times word w was observed as the argument of p with respect to R, and classes(w) is the number of taxonomic classes to which w belongs. Given the frequencies, probabilities are currently estimated using maximum likelihood; the use of word classes is itself a form of smoothing (cf. Pereira et al. (1993)). I This estimation method is similar to that used by Yarowsky (1992) for Roget's thesaurus categories, and works for similar reasons. As an example, consider two instances of the verb-object relationship in a training corpus, drink coffee and drink wine. Coffee has 2 senses in the WordNet 1.4 noun taxonomy, and belongs to 13 classes in all, and wine has 2 senses and belongs to a total of 16 classes. This means that the observed countverb_obj(drink , coffee) = 1 will be distributed by adding 1-~ to the joint frequency with drink for each of the 13 classes containing coffee. Similarly, the joint frequency with drink will be incremented by ~ for each of the 16 classes containing wine. Crucially, although each of the two words is ambiguous, only those taxonomic classes containing both words -- e.g., (beverage) -receive credit for both observed instances. In general, because different words are ambiguous in different ways, credit tends to accumulate in the taxonomy only in those classes for which there is real evidence of co-occurrence; the rest tends to disperse unsystematically, resulting primarily in noise. Thus, despite the absence of class annotation in the training text, it is still possible to arrive at a usable estimate of class-based probabilities.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="53" type="metho">
    <SectionTitle>
4 An Unsupervised Method for
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="53" type="sub_section">
      <SectionTitle>
Sense Disambiguation
</SectionTitle>
      <Paragraph position="0"> Table 1 presents a selected sample of Resnik's (1993a) comparison with argument plausibility judgments made by human subjects. What is most interesting here is the way in which strongly selecting 1Word w is typically the head of a noun phrase, which could lead the model astray -- for example, toy soldiers behave differently from soldiers (McCawley, 1968). In principle, addressing this issue requires that noun phrases be mapped to taxonomic classes based on their compositional interpretation; however, such complications rarely axise in practice.</Paragraph>
      <Paragraph position="1">  verbs &amp;quot;choose&amp;quot; the sense of their arguments. For example, letter has 3 senses in WordNet, 2 and belongs to 19 classes in all. In order to approximate its plausibility as the object of wrfle, the selectional association with wrote was computed for all I9 classes, and the highest value returned ~ in this case, (writing) (&amp;quot;anything expressed in letters; reading matter&amp;quot;). Since only one sense of letter has this class as an ancestor, this method of determining argument plausibility has, in essence, performed sense disambiguation as a side effect.</Paragraph>
      <Paragraph position="2"> This observation suggests the following simple algorithm for disambignation by selectional preference. Let n be a noun that stands in relationship R to predicate p, and let {sl, ..., st} be its possible senses. For i from 1 to h, compute: C, = {clc is an ancestor ofsi} as = max AR(p,c) cEC~ and assign as as the score for sense st. The simplest way to use the resulting scores, following Miller et al. (1994), is as follows: if n has only one sense, select it; otherwise select the sense st for which at is greatest, breaking ties by random choice.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>