<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0209"> <Title>Selectional Preference and Sense Disambiguation</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> It has long been observed that selectional constraints and word sense disambiguation are closely linked. Indeed, the exemplar for sense disambiguation in most computational settings (e.g., see Allen's (1995) discussion) is Katz and Fodor's (1964) use of Boolean selection restrictions to constrain semantic interpretation. For example, Mthough burgundy can be interpreted as either a color or a beverage, only the latter sense is available in the context of Mary drank burgundy, because the verb drink specifies the selection restriction +LIQUID for its direct objects.</Paragraph> <Paragraph position="1"> Problems with this approach arise, however, as soon as the domain of interest becomes too large or too rich to specify semantic features and selection restrictions accurately by hand. This paper concerns the use of selectional constraints for automatic sense disambiguation in such broad-coverage settings. The approach combines statistical and knowledge-based methods, but unlike many recent corpus-based approaches to sense disambiguation (Y=arowsky, 1993; Bruce and Wiebe, 1994; Miller et al., 1994), it takes as its starting point the assumption that sense-annotated training text is not available. Motivating this assumption is not only the limited availability of such text at present, but skepticism that the situation will change any time soon. In marked contrast to annotated training material for part-of-speech tagging, (a) there is no coarse-level set of sense distinctions widely agreed upon (whereas part-of-speech tag sets tend to differ in the details); (b) sense annotation has a comparatively high error rate (Miller, personal communication, reports an upper bound for human annotators of around 90% for ambiguous cases, using a non-blind evaluation method that may make even this estimate overly optimistic); and (c) no fully automatic method provides high enough quality output to support the &quot;annotate automatically, correct manually&quot; methodology used to provide high volume annotation by data providers like the Penn Treebank project (Marcus et al., 1993).</Paragraph> </Section> class="xml-element"></Paper>