File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/01/w01-0703_metho.xml
Size: 9,006 bytes
Last Modified: 2025-10-06 14:07:41
<?xml version="1.0" standalone="yes"?> <Paper uid="W01-0703"> <Title>Learning class-to-class selectional preferences</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Selectional preference learning </SectionTitle> <Paragraph position="0"> Selectional preferences try to capture the fact that linguistic elements prefer arguments of a certain semantic class, e.g. a verb like 'eat' prefers as object edible things, and as subject animate entities, as in, (1) &quot;She was eating an apple&quot;. Selectional preferences get more complex than it might seem: (2) &quot;The acid ate the metal&quot;, (3) &quot;This car eats a lot of gas&quot;, (4) &quot;We ate our savings&quot;, etc.</Paragraph> <Paragraph position="1"> Corpus-based approaches for selectional preference learning extract a number of (e.g.</Paragraph> <Paragraph position="2"> verb/subject) relations from large corpora and use an algorithm to generalize from the set of nouns for each verb separately. Usually, nouns are generalized using classes (concepts) from a lexical knowledge base (e.g. WordNet).</Paragraph> <Paragraph position="3"> Resnik (1992, 1997) defines an information-theoretic measure of the association between a verb and nominal WordNet classes: selectional association. He uses verb-argument pairs from Brown. Evaluation is performed applying intuition and WSD. Our measure follows in part from his formalization.</Paragraph> <Paragraph position="4"> Abe and Li (1995) follow a similar approach, but they employ a different information-theoretic measure (the minimum description length principle) to select the set of concepts in a hierarchy that generalize best the selectional preferences for a verb. The argument pairs are extracted from the WSJ corpus, and evaluation is performed using intuition and PP-attachment resolution.</Paragraph> <Paragraph position="5"> Stetina et al. (1998) extract word-arg-word triples for all possible combinations, and use a measure of &quot;relational probability&quot; based on frequency and similarity. They provide an algorithm to disambiguate all words in a sentence. It is directly applied to WSD with good results.</Paragraph> </Section> <Section position="4" start_page="0" end_page="1" type="metho"> <SectionTitle> 3 Our approach </SectionTitle> <Paragraph position="0"> The model explored in this paper emerges as a result of the following observations: * Distinguishing verb senses can be useful.</Paragraph> <Paragraph position="1"> The examples for eat above are taken from WordNet, and each corresponds to a different word sense : example (1) is from the &quot;take in solid food&quot; sense of eat, (2) from the &quot;cause to rust&quot; sense, and examples (3) and (4) from the &quot;use up&quot; sense.</Paragraph> <Paragraph position="2"> * If the word senses of a set of verbs are similar (e.g. word senses of ingestion verbs like eat, devour, ingest, etc.) they can have related selectional preferences, and we can generalize and say that a class of verbs has a particular selectional preference.</Paragraph> <Paragraph position="3"> Our formalization thus distinguishes among verb senses, that is, we treat each verb sense as a A note is in order to introduce the terminology used in the paper. We use concept and class indistinguishably, and they refer to the so-called synsets in WordNet. Concepts in WordNet are represented as sets of synonyms, e.g. <food, nutrient>. A word sense in WordNet is a word-concept pairing, e.g. given the concepts a=<chicken, poulet, volaille> and b=<wimp, chicken, crybaby> we can say that chicken has at least two word senses, the pair chickena and the pair chicken-b. In fact the former is sense 1 of chicken, and the later is sense 3 of chicken. For the sake of simplicity we also talk about <chicken, poulet, volaille> being a word sense of chicken.</Paragraph> <Paragraph position="4"> different unit that has a particular selectional preference. From the selectional preferences of single verb word senses, we also infer selectional preferences for classes of verbs.</Paragraph> <Paragraph position="5"> Contrary to other methods (e.g. Li and Abe's), we don't try to find the classes which generalize best the selectional preferences. All possibilities, even the very low probability ones, are stored.</Paragraph> <Paragraph position="6"> The method stands as follows: we collect [noun-word-sense relation verb-word-sense] triples from Semcor, where the relation is either subject or object. As word senses refer to concepts, we also collect the triple for each possible combination of concepts that subsume the word senses in the triple. Direct frequencies and estimates of frequencies for classes are then used to compute probabilities for the triples.</Paragraph> <Paragraph position="7"> These probabilities could be used to disambiguate either nouns, verbs or both at the same time. For the time being, we have chosen to disambiguate nouns only, and therefore we compute the probability for a nominal concept, given that it is the subject/object of a particular verb. Note that when disambiguating we ignore the particular sense in which the governing verb occurs.</Paragraph> </Section> <Section position="5" start_page="1" end_page="2" type="metho"> <SectionTitle> 4 Formalization </SectionTitle> <Paragraph position="0"> As mentioned in the previous sections we are interested in modelling the probability of a nominal concept given that it is the subject/object of a particular verb:</Paragraph> <Paragraph position="2"> Before providing the formalization for our approach we present a model based on words and a model based on nominal-classes. Our class-to-class model is an extension of the second . The estimation of the frequencies of classes are presented in the following section. Notation: v stands for a verb, cn (cv) stand for nominal</Paragraph> <Paragraph position="4"> ) stands for the concept linked to the i-th sense of the given noun (verb), rel could be any grammatical relation (in our case object or subject), [?] stands for the subsumption relation, fr stands for frequency and rf ^ .for the estimation of the frequencies of classes.</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.1 Word-to-word model: eat chicken </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> At this stage we do not use information of class subsumption. The probability of the first sense of chicken being an object of eat depends on how often does the concept linked to chicken appear as object of the word eat, divided by the number of occurrences of eat with an object.</Paragraph> <Paragraph position="4"> , as we count occurrences of concepts rather than word senses. This means that synonyms also count, e.g. poulet as synonyms of the first sense of chicken.</Paragraph> </Section> <Section position="2" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.2 word-to-class model: </SectionTitle> <Paragraph position="0"> eat <food, nutrient> The probability of eat chicken depends on the probabilities of the concepts subsumed by and subsuming chicken being objects of eat. For instance, if chicken never appears as an object of eat, but other word senses under <food, nutrient> do, the probability of chicken</Paragraph> <Paragraph position="2"> given the more general concept times the probability of the more general concept being a subject/object of the verb is added. The first probability is estimated dividing the class frequencies of cn i with the class frequencies of the more general concept. The second probability is estimated as in 4.1.</Paragraph> </Section> <Section position="3" start_page="2" end_page="2" type="sub_section"> <SectionTitle> 4.3 class-to-class model: </SectionTitle> <Paragraph position="0"> <ingest, take in, ...> <food, nutrient> The probability of eat chicken depends on the probabilities of all concepts above chicken being objects of all concepts above the possible senses of eat. For instance, if devour never appeared on the training corpus, the model could infer its selectional preference from that of its</Paragraph> <Paragraph position="2"/> <Paragraph position="4"> superclass <ingest, take in, ...>. As the verb can be polysemous, the sense with maximum probability is selected.</Paragraph> <Paragraph position="5"> Formula (4) shows that the maximum probability for the possible senses (cv j ) of the verb is taken. For each possible verb concept (cv) and noun concept (cn) subsuming the target</Paragraph> <Paragraph position="7"> ), the probability of the target concept given the subsuming concept (this is done twice, once for the verb, once for the noun) times the probability the nominal concept being subject/object of the verbal concept is added.</Paragraph> </Section> </Section> class="xml-element"></Paper>