<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0201">
<Title>Getting Serious about Word Sense Disambiguation</Title>
<Section position="2" start_page="0" end_page="0" type="abstr">
<SectionTitle>Abstract</SectionTitle>
<Paragraph position="0">Recent advances in large-scale, broad-coverage part-of-speech tagging and syntactic parsing have been achieved in no small part due to the availability of large amounts of online, human-annotated corpora. In this paper, I argue that a large, human sense-tagged corpus is also critical to achieving broad-coverage, high-accuracy word sense disambiguation, where the sense distinctions are at the level of a good desk-top dictionary such as WORDNET. Using the sense-tagged corpus of 192,800 word occurrences reported in (Ng and Lee, 1996), I examine the effect of the number of training examples on the accuracy of an exemplar-based classifier versus the baseline, most-frequent-sense classifier. I also estimate the amount of human sense-tagged corpus and the manual annotation effort needed to build a large-scale, broad-coverage word sense disambiguation program that can significantly outperform the most-frequent-sense classifier.</Paragraph>
<Paragraph position="1">Finally, I suggest that intelligent example selection techniques may significantly reduce the amount of sense-tagged corpus needed, and offer this research problem as a fruitful area for word sense disambiguation research.</Paragraph>
</Section>
</Paper>
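
The abstract contrasts an exemplar-based classifier with the most-frequent-sense baseline. The sketch below is only a rough Python illustration of those two ideas, not the paper's actual system (whose features and learning method follow Ng and Lee, 1996); the function names and the toy "interest" data are hypothetical.

from collections import Counter

def most_frequent_sense(train_senses):
    """Baseline: always predict the sense seen most often in training."""
    return Counter(train_senses).most_common(1)[0][0]

def knn_predict(context, train_contexts, train_senses, k=3):
    """Exemplar-based classifier (simplified): vote among the k training
    examples whose context-word sets overlap most with the test context."""
    context = set(context)
    ranked = sorted(
        range(len(train_contexts)),
        key=lambda i: len(context & set(train_contexts[i])),
        reverse=True,
    )
    votes = Counter(train_senses[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy usage: disambiguating "interest" (financial vs. attention sense).
train_contexts = [
    ["bank", "rate", "loan"],      # financial sense
    ["pay", "rate", "mortgage"],   # financial sense
    ["hobby", "music", "strong"],  # attention sense
]
train_senses = ["financial", "financial", "attention"]

print(most_frequent_sense(train_senses))                            # financial
print(knn_predict(["loan", "rate"], train_contexts, train_senses))  # financial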