File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/01/j01-3001_intro.xml
Size: 4,571 bytes
Last Modified: 2025-10-06 14:01:11
<?xml version="1.0" standalone="yes"?> <Paper uid="J01-3001"> <Title>The Interaction of Knowledge Sources in Word Sense Disambiguation</Title> <Section position="4" start_page="327" end_page="328" type="intro"> <SectionTitle> 2 The Brill tagger uses the 48-tag set from the Penn Tree Bank (Marcus, Santorini, and Marcinkiewicz </SectionTitle> <Paragraph position="0"> 1993), while LDOCE uses a set of 17 more general tags. Brill's tagger has a reported error rate of around 3%, although we found that mapping the Penn TreeBank tags used by Brill onto the simpler LDOCE tag set led to a lower error rate.</Paragraph> <Paragraph position="1"> 3 In the 3rd Edition of LDOCE the publishers claim that the senses are indeed ordered by frequency, although they make no such claim in the 1st Edition used here. However, Guo (1989) found evidence that there is a correspondence between the order in which senses are listed and the frequency of occurrence in the 1st Edition.</Paragraph> <Paragraph position="2"> Stevenson and Wilks Interaction of Knowledge Sources in WSD Partial disambiguation (by part of speech): If there is more than one possible homograph with the correct part of speech but some have been removed from consideration, that word has been partially disambiguated by part of speech.</Paragraph> <Paragraph position="3"> No disambiguation (by part of speech): If all the homographs of a word have the same part of speech, which is then assigned by the tagger, then none can be removed and no disambiguation has been carried out.</Paragraph> <Paragraph position="4"> Part-of-speech error: It is possible for the part-of-speech tagger to assign an incorrect part of speech, leading to the correct homograph being removed from consideration. It is worth mentioning that this situation has two possible outcomes: first, some homographs, with incorrect parts of speech, may remain; or second, all homographs may have been removed from consideration. 
In Table 3, the column labelled Count shows the number of words in our five articles that fall into each of the four categories. The relative performance of the baseline method (choosing the first sense) and the reported algorithm (removing homographs using part-of-speech tags) is shown in the rightmost two columns. The figures in brackets indicate the percentage of polyhomographic words correctly disambiguated by each method on a per-class basis. It can be seen that the majority of the polyhomographic words (297 of 342) fall into the &quot;Full disambiguation&quot; category, all of which are correctly disambiguated by the method reported here. When no disambiguation is carried out, the algorithm described simply chooses the first sense, and so the results are the same for both methods. The only condition under which choosing the first sense is more effective than using part-of-speech information is when the part-of-speech tagger makes an error and all the homographs with the correct part of speech are removed from consideration. In most cases this means that the correct homograph cannot be chosen; however, in a small number of cases, this is equivalent to choosing the most frequent sense, since if all possible homographs have been removed from consideration, the algorithm reverts to the simpler heuristic of choosing the word's first homograph. 4 Although this result may seem intuitively obvious, there have, we believe, been no other attempts to quantify the benefit to be gained from the application of a part-of-speech tagger in WSD (see Wilks and Stevenson 1998a). 
The method described here is effective in removing incorrect senses from consideration, thereby reducing the search space when combined with other WSD methods.</Paragraph> <Paragraph position="5"> In the experiments reported in this section we made use of the particular structure of LDOCE, which assigns each sense to a homograph from which its part-of-speech information is inherited. However, there is no reason to believe that the method reported here is limited to lexicons with this structure. In fact, this approach can be applied to any lexicon that assigns part-of-speech information to senses, although it would not always be possible to evaluate at the homograph level as we do here.</Paragraph> <Paragraph position="6"> In the remainder of this paper we go on to describe a sense tagger that assigns senses from LDOCE using a combination of classifiers. The set of senses considered by the classifiers is first filtered using part-of-speech tags.</Paragraph> </Section> </Paper>
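The part-of-speech filter and its fallback to the first homograph, as described in this section, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Homograph` record, the function names, and the toy tags are hypothetical, and the rule used for "full disambiguation" (exactly one homograph remains and it is the correct one) is inferred, since that category is not defined in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Homograph:
    """Hypothetical stand-in for a dictionary homograph entry."""
    number: int  # listing order in the dictionary (1 = first)
    pos: str     # coarse part-of-speech tag (toy values, not the LDOCE tag set)


def filter_by_pos(homographs: List[Homograph], tagged_pos: str) -> List[Homograph]:
    """Keep only the homographs whose part of speech matches the tagger's output."""
    return [h for h in homographs if h.pos == tagged_pos]


def choose_homograph(homographs: List[Homograph], tagged_pos: str) -> Homograph:
    """Pick the first remaining homograph; if none remain, revert to the first listed."""
    remaining = filter_by_pos(homographs, tagged_pos)
    return remaining[0] if remaining else homographs[0]


def classify(homographs: List[Homograph], tagged_pos: str, correct: Homograph) -> str:
    """Assign a word to one of the four categories discussed in the text."""
    remaining = filter_by_pos(homographs, tagged_pos)
    if correct not in remaining:
        # The tagger's error removed the correct homograph from consideration.
        return "part-of-speech error"
    if len(remaining) == len(homographs):
        # Every homograph shares the assigned tag, so nothing was removed.
        return "no disambiguation"
    if len(remaining) == 1:
        # Assumed definition: only the correct homograph is left.
        return "full disambiguation"
    return "partial disambiguation"


# Toy entry with three homographs (invented for illustration, not LDOCE data).
entries = [Homograph(1, "n"), Homograph(2, "v"), Homograph(3, "n")]
chosen = choose_homograph(entries, "v")
category = classify(entries, "v", correct=entries[1])
```

Keeping the filter separate from the choice step reflects the point made above that the filtered set can be handed to other WSD methods instead of being resolved by the first-homograph heuristic.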