<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0808"> <Title>Word Sense Disambiguation for Acquisition of Selectional Preferences</Title> <Section position="3" start_page="52" end_page="52" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="52" end_page="52" type="sub_section"> <SectionTitle> 2.1 Selectional preference acquisition </SectionTitle> <Paragraph position="0"> The approaches to selectional preference acquisition most closely related to this work are those of Resnik (1993b, 1993a), Ribas (1994, 1995), and Li and Abe (Li & Abe, 1995; Abe & Li, 1996). (I shall refer to the work in (Li & Abe, 1995) and (Abe & Li, 1996) as &quot;Li and Abe&quot; throughout, since the two pieces of work relate to each other and both involve the same two authors.)</Paragraph> <Paragraph position="1"> All use a class-based approach, taking the WordNet hypernym hierarchy (Beckwith, Fellbaum, Gross, & Miller, 1991) as the noun classification and representing selectional preferences as sets of disjoint noun classes (classes not related as descendants of one another) within the hierarchy. The key to obtaining good selectional preferences is obtaining classes at an appropriate level of generalisation. These researchers also use variations on the association score given in equation 1, the log of which gives the measure from information theory known as mutual information.</Paragraph> <Paragraph position="2"> This measures the &quot;relatedness&quot; between two words, or, in the class-based work on selectional preferences, between a class (c) and the predicate (v).</Paragraph> <Paragraph position="3"> A(v, c) = p(c|v) / p(c) (1)</Paragraph> <Paragraph position="4"> In comparison to the conditional distribution p(c|v) of the predicate (v) and noun class (c), the association score takes into account the marginal distribution of the noun class, so that higher values are not obtained simply because the noun happens to occur more frequently irrespective of context. 
The conditional distribution might, for example, bias a class containing &quot;PEOPLE&quot; as the direct object of &quot;fly&quot; in comparison to the class of &quot;BIRDS&quot; simply because the &quot;PEOPLE&quot; class occurs more frequently in the corpus to start with.</Paragraph> <Paragraph position="5"> Resnik and Ribas both search for disjoint classes with the highest score. Since smaller classes will fit the predicate better, and will hence have a higher association value, they scale the mutual information value by the conditional distribution, giving the association score in equation 2. The conditional distribution will be larger for larger classes, and so in this way they hope to obtain an appropriate level of generalisation.</Paragraph> <Paragraph position="6"> A(v, c) = p(c|v) log [p(c|v) / p(c)] (2) The work described here uses the approach of Li and Abe, who, rather than modifying the association score, use a principle of data compression from information theory to find the appropriate level of generalisation. This principle is known as the Minimum Description Length (MDL) principle. In their approach selectional preferences are represented as a set of classes, or a &quot;tree cut&quot; across the hierarchy, which dominates all the leaf nodes exhaustively and disjointly. The tree cut features in a model, termed an &quot;association tree cut model&quot; (ATCM), which identifies an association score for each of the classes in the cut. In the MDL principle the best model is taken to be the one that minimises the sum of:</Paragraph> </Section> </Section> </Paper>
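The contrast the section draws between the plain conditional distribution p(c|v) and the association scores of equations 1 and 2 can be illustrated with a small numeric sketch. The class names and counts below are invented for illustration; they are not from the paper's data. The sketch shows how a frequent class like PEOPLE can dominate p(c|v) while the association score corrects for its high marginal frequency.

```python
import math

# Hypothetical corpus counts (illustrative only): how often each noun
# class occurs as the direct object of the verb "fly", and how often it
# occurs in the corpus overall.
counts_with_verb = {"BIRD": 30, "PERSON": 50, "ARTIFACT": 20}
counts_overall = {"BIRD": 200, "PERSON": 2000, "ARTIFACT": 1800}

total_with_verb = sum(counts_with_verb.values())
total_overall = sum(counts_overall.values())

def association(c):
    """Equation 1: A(v, c) = p(c|v) / p(c).

    The log of this ratio is the (pointwise) mutual information between
    the class and the predicate."""
    p_c_given_v = counts_with_verb[c] / total_with_verb
    p_c = counts_overall[c] / total_overall
    return p_c_given_v / p_c

def scaled_association(c):
    """Equation 2 (Resnik/Ribas variant): p(c|v) * log(p(c|v) / p(c)).

    Scaling the mutual information by the conditional distribution favours
    larger classes, aiming at an appropriate level of generalisation."""
    p_c_given_v = counts_with_verb[c] / total_with_verb
    p_c = counts_overall[c] / total_overall
    return p_c_given_v * math.log(p_c_given_v / p_c)

for c in counts_with_verb:
    print(c, round(association(c), 3), round(scaled_association(c), 3))
```

With these made-up counts, PERSON has the highest conditional probability p(c|v) = 0.5, yet its association score is 1 (its log, the mutual information, is 0) because it is equally frequent overall, while the rarer BIRD class scores highest under both equations.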