<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2228">
<Title>Word Sense Disambiguation using Optimised Combinations of Knowledge Sources</Title>
<Section position="8" start_page="1400" end_page="1400" type="evalu">
<SectionTitle>
6 Results
</SectionTitle>
<Paragraph position="0"> To date we have tested our system on only a portion of the text we derived from SEMCOR, which consisted of 2021 words tagged with LDOCE senses (and 12,208 words in total). The 2021 word occurrences are made up of 1068 different types, with an average polysemy of 7.65. As a baseline against which to compare results we computed the percentage of words that are correctly tagged if we choose the first sense for each, which gives 49.8% correct disambiguation.</Paragraph>
<Paragraph position="1"> We trained a decision list using 1821 of the occurrences (containing 1000 different types) and held back the remaining 200 (129 types) for evaluation. When the decision list was applied to the held-back data, we found that 70% of the first senses were correctly assigned. We also found that the system identified one of the correct senses 83.4% of the time. Assuming that our tagger would perform at a similar level over all content words in our corpus if test data were available (and we have no evidence to the contrary), this figure equates to 92.8% correct tagging over all words in the text, since in our corpus 42% of word tokens are ambiguous in LDOCE (a worked reconstruction of this estimate is given near the end of this section).</Paragraph>
<Paragraph position="2"> Comparative evaluation is generally difficult in word sense disambiguation because of variation in approaches and evaluation corpora. However, it is fair to compare our work against other approaches that have attempted to disambiguate all content words in a text against some standard lexical resource, such as (Cowie et al., 1992), (Harley and Glennon, 1997), (McRoy, 1992), (Veronis and Ide, 1990) and (Mahesh et al., 1997). Neither McRoy nor Veronis and Ide provides a quantitative evaluation of their system, so our performance cannot easily be compared with theirs. Mahesh et al. claim high levels of sense-tagging accuracy (about 89%), but their results are not directly comparable with ours since they explicitly reject the conventional markup-training-test methodology used here. Cowie et al. used LDOCE, so we can compare results over the same set of senses. Harley and Glennon used the Cambridge International Dictionary of English, a comparable resource containing lexical information and levels of semantic distinction similar to LDOCE. Our result of 83% compares well with these two systems, which report 47% and 73% correct disambiguation at their most detailed level of semantic distinction. Our result is also higher than both systems at their coarsest level of distinction (72% and 78%). These results are summarised in Table 1.</Paragraph>
<Paragraph position="3"> In order to compare the contributions of the separate taggers we implemented a simple voting system.</Paragraph>
<Paragraph position="4"> By comparing the results obtained from the voting system with those from the decision list we get some idea of the advantage gained by optimising the combination of knowledge sources. The voting system provided 59% correct disambiguation at identifying the first of the possible senses, which is little better than each knowledge source used separately (see Table 2). This provides a clear indication that there is a considerable benefit to be gained from combining disambiguation evidence in an optimal way.</Paragraph>
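The paper does not spell out how the simple voting system combines the individual taggers' outputs. Purely as an illustration of the kind of unweighted combination that the optimised decision list is being compared against, the sketch below (in Python) assumes one vote per knowledge source, with ties broken in favour of the first, most frequent LDOCE sense; the function and variable names are hypothetical.

from collections import Counter

def combine_by_voting(candidate_senses, source_predictions):
    """Unweighted voting over per-knowledge-source sense predictions.

    candidate_senses   -- LDOCE sense identifiers for the ambiguous word,
                          with the first (most frequent) sense listed first.
    source_predictions -- one predicted sense per knowledge source.
    """
    votes = Counter(p for p in source_predictions if p in candidate_senses)
    if not votes:
        return candidate_senses[0]   # no usable votes: back off to the first sense
    top = max(votes.values())
    # among equally voted senses, prefer the one listed earliest
    return next(s for s in candidate_senses if votes.get(s, 0) == top)

# e.g. three knowledge sources, two of which prefer sense "2"
assert combine_by_voting(["1", "2", "3"], ["2", "3", "2"]) == "2"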
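To make explicit the step from accuracy on ambiguous tokens to the all-words figure of 92.8% quoted earlier, the following reconstruction assumes the reported figures of 42% ambiguous word tokens and 83.4% accuracy on those tokens, and counts unambiguous tokens as trivially correct; since the published figures are rounded, the reconstruction gives approximately 93% rather than exactly 92.8%.

\[
\mathrm{Acc}_{\text{all words}} \;\approx\; (1 - 0.42)\times 1 \;+\; 0.42 \times 0.834 \;=\; 0.58 + 0.35 \;\approx\; 0.93
\]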
<Paragraph position="5"> In future work we plan to investigate whether these apparently orthogonal, independent sources of information are in fact independent.</Paragraph>
</Section>
</Paper>