<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1177"> <Title>Automatic Identification of Infrequent Word Senses</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> We have proposed and evaluated a method that identifies senses that are rare in a given corpus. This method uses a ranking of senses derived automatically from raw text using both distributional similarity methods and a measure of semantic similarity, such as those available in the WordNet similarity package. When using rankings derived from a thesaurus automatically acquired from the BNC, we have demonstrated that this technique produces promising results in removing unused senses from both SemCor and the SENSEVAL-2 English all-words task corpus. Moreover, the senses removed erroneously from SemCor were less frequent than average.</Paragraph> <Paragraph position="1"> A major benefit of this method is that it tailors a generic resource such as WordNet to domain-specific text, and we have demonstrated this using two domain-specific corpora and an evaluation using semi-automatically created domain labels (Magnini and Cavaglià, 2000).</Paragraph> <Paragraph position="2"> There is scope for experimentation with other WordNet similarity scores. In earlier experiments we noted that the lesk measure produced quite good results, although it is considerably less efficient than jcn because it compares sense definitions at run time. One major advantage of lesk is its applicability to other parts of speech (PoS). The lesk measure can be used when ranking adjectives and adverbs as well as nouns and verbs (which can also be ranked using jcn).
Another advantage of the lesk measure is that it is applicable to lexical resources that lack WordNet's hierarchical structure but do have definitions associated with word senses.</Paragraph> <Paragraph position="3"> This paper deals only with nouns; however, we have recently investigated the ranking method for an unsupervised predominant sense heuristic for WSD for other PoS (McCarthy et al., 2004b). We plan to use the ranking method to identify prevalent and infrequent senses in domain-specific text and to use this as a resource for WSD and lexical acquisition.</Paragraph> </Section> </Paper>