File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/c96-1005_metho.xml
Size: 18,832 bytes
Last Modified: 2025-10-06 14:14:06
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1005"> <Title>Word Sense Disambiguation using Conceptual Density</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Conceptual Density and Word </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Sense Disambiguation </SectionTitle> <Paragraph position="0"> Conceptual distance tries to provide a basis for measuring closeness in meaning among words, taking as reference a structured hierarchical net. Conceptual distance between two concepts is defined in [Rada et al. 89] as the length of the shortest path that connects the concepts in a hierarchical semantic net. In a similar approach, [Sussna 93] employs the notion of conceptual distance between network nodes in order to improve precision during document indexing. [Resnik 95] captures semantic similarity (closely related to conceptual distance) by means of the information content of the concepts in a hierarchical net. In general, these approaches focus on nouns.</Paragraph> <Paragraph position="1"> The measure of conceptual distance among concepts we are looking for should be sensitive to: * the length of the shortest path that connects the concepts involved.</Paragraph> <Paragraph position="2"> * the depth in the hierarchy: concepts in a deeper part of the hierarchy should be ranked closer.</Paragraph> <Paragraph position="3"> * the density of concepts in the hierarchy: concepts in a dense part of the hierarchy are relatively closer than those in a sparser region.</Paragraph> <Paragraph position="4"> * the measure should be independent of the number of concepts we are measuring.</Paragraph> <Paragraph position="5"> We have experimented with several formulas that follow the four criteria presented above.
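As a point of reference for these criteria, the plain shortest-path conceptual distance of [Rada et al. 89] can be sketched in Python as a breadth-first search over the IS-A links. This is a minimal illustration, not code from any of the cited works: the toy hierarchy and the names HYPERNYM, build_graph and conceptual_distance are hypothetical. A pure path length of this kind covers only the first criterion; it ignores depth and density, and depends on how many concepts are compared.

```python
from collections import deque

# Hypothetical toy IS-A hierarchy (child -> parent); not real WordNet synsets.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def build_graph(hypernym):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernym.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def conceptual_distance(a, b, hypernym=HYPERNYM):
    """Number of edges on the shortest path between concepts a and b.

    Breadth-first search over the hierarchy; returns None when the
    concepts are not connected.
    """
    if a == b:
        return 0
    graph = build_graph(hypernym)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(conceptual_distance("dog", "wolf"))  # 2
print(conceptual_distance("dog", "cat"))   # 4
```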
The experiments reported here were performed using the Conceptual Density formula [Agirre &amp; Rigau 95], which compares areas of subhierarchies.</Paragraph> <Paragraph position="6"> To illustrate how Conceptual Density can help to disambiguate a word, consider figure 1: the word W has four senses and several context words. Each sense of the words belongs to a subhierarchy of WordNet. The dots in the subhierarchies represent the senses of either the word to be disambiguated (W) or the words in the context. Conceptual Density will yield the highest density for the subhierarchy containing the most senses of those words, relative to the total number of senses in the subhierarchy. The sense of W contained in the subhierarchy with the highest Conceptual Density will be chosen as the sense disambiguating W in the given context. In figure 1, sense2 would be chosen.</Paragraph> <Paragraph position="7"> Given a concept c at the top of a subhierarchy, and given nhyp (mean number of hyponyms per node), the Conceptual Density for c when its subhierarchy contains a number m (marks) of senses of the words to disambiguate is given by the formula below:</Paragraph> <Paragraph position="9"> Formula 1 shows a parameter that was computed experimentally. The 0.20 tries to smooth the exponential i, as m ranges between 1 and the total number of senses in WordNet.
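Formula 1 itself is missing from this text. The sketch below assumes the form CD(c, m) = (sum over i from 0 to m-1 of nhyp^(i^0.20)) / descendants_c, where descendants_c is the number of concepts in the subhierarchy rooted at c. This is consistent with the description above (an exponential in i smoothed by the 0.20 exponent, with m marks below c) but is an assumption, not a recovered copy of the authors' formula; the function and argument names are illustrative. Each extra mark adds a slowly growing term, while the denominator penalises large subhierarchies, which is how dense regions are favoured.

```python
def conceptual_density(m, nhyp, descendants, alpha=0.20):
    """Conceptual Density of a concept c whose subhierarchy holds m marks.

    ASSUMED form of formula 1 (the formula is missing from this text):
        CD(c, m) = sum(nhyp ** (i ** alpha) for i in 0..m-1) / descendants_c
    m           -- senses of the window nouns found below c (marks)
    nhyp        -- mean number of hyponyms per node
    descendants -- number of concepts in the subhierarchy rooted at c
    alpha       -- smoothing exponent; about 0.20 worked best in the experiments
    """
    if descendants == 0:
        raise ValueError("a subhierarchy contains at least its root concept")
    # 0 ** alpha == 0.0, so the first term is nhyp ** 0 == 1.
    area = sum(nhyp ** (i ** alpha) for i in range(m))
    return area / descendants


# Same marks and nhyp: the smaller (denser) subhierarchy scores higher.
dense = conceptual_density(3, nhyp=2, descendants=7)
sparse = conceptual_density(3, nhyp=2, descendants=15)
print(dense > sparse)  # True
```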
Several values were tried for the parameter, and the best performance was consistently attained when the parameter was near 0.20.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="17" type="metho"> <SectionTitle> 3 The Disambiguation Algorithm </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="17" type="sub_section"> <SectionTitle> Using Conceptual Density </SectionTitle> <Paragraph position="0"> Given a window size, the program moves the window one noun at a time from the beginning of the document towards its end, disambiguating in each step the noun in the middle of the window and considering the other nouns in the window as context. Non-noun words are not taken into account.</Paragraph> <Paragraph position="1"> The algorithm to disambiguate a given noun w in the middle of a window of nouns W (cf. figure 2) roughly proceeds as follows:</Paragraph> <Paragraph position="3"> First, the algorithm represents in a lattice the nouns present in the window, their senses and hypernyms (step 1). Then, the program computes the Conceptual Density of each concept in WordNet according to the senses it contains in its subhierarchy (step 2). It selects the concept c with the highest Conceptual Density (step 3) and selects the senses below it as the correct senses for the respective words (step 4).</Paragraph> <Paragraph position="4"> The algorithm then computes the density for the remaining senses in the lattice, and continues to disambiguate the nouns left in W (back to steps 2, 3 and 4).
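The sliding window and steps 1 to 5 can be sketched in Python as below. This is a simplified illustration under several stated assumptions: the lattice is given as a precomputed map from each sense to its hypernym chain, marks are counted per sense, only concepts covering senses of at least two pending nouns are considered, and "no further disambiguation is possible" is taken to mean that no such concept remains. The toy data and all names (ANCESTORS, SENSES, DESCENDANTS, sliding_windows, disambiguate) are hypothetical, and the density function uses the same assumed form of formula 1.

```python
# Hypothetical toy lattice: every sense is listed with its hypernym chain
# (the sense itself first, then its ancestors up to the root).
ANCESTORS = {
    "bass_1": ["bass_1", "instrument", "entity"],
    "bass_2": ["bass_2", "fish", "entity"],
    "guitar_1": ["guitar_1", "instrument", "entity"],
    "drum_1": ["drum_1", "instrument", "entity"],
    "drum_2": ["drum_2", "container", "entity"],
}
SENSES = {"bass": ["bass_1", "bass_2"], "guitar": ["guitar_1"],
          "drum": ["drum_1", "drum_2"]}
# Size of each subhierarchy (concepts below and including the concept).
DESCENDANTS = {"entity": 9, "instrument": 4, "fish": 2, "container": 2,
               "bass_1": 1, "bass_2": 1, "guitar_1": 1, "drum_1": 1, "drum_2": 1}


def conceptual_density(m, nhyp, descendants, alpha=0.20):
    # ASSUMED form of formula 1: sum of nhyp ** (i ** alpha), i = 0..m-1,
    # divided by the size of the subhierarchy.
    return sum(nhyp ** (i ** alpha) for i in range(m)) / descendants


def sliding_windows(nouns, size):
    """Yield (target, window) pairs, moving one noun at a time.

    The target sits in the middle; near the document edges the window
    is simply truncated (an assumption, the text does not say).
    """
    half = size // 2
    for i, target in enumerate(nouns):
        yield target, nouns[max(0, i - half):i + half + 1]


def disambiguate(target, window, senses, ancestors, descendants, nhyp):
    """Return the senses left for target after steps 1-5 (sketch)."""
    # Step 1: the lattice is given by ancestors; keep the open senses per noun.
    pending = {w: list(senses[w]) for w in window}
    while True:
        # Step 2: count marks (senses) below every concept in the lattice.
        marks, nouns_below = {}, {}
        for noun, noun_senses in pending.items():
            for sense in noun_senses:
                for concept in ancestors[sense]:
                    marks[concept] = marks.get(concept, 0) + 1
                    nouns_below.setdefault(concept, set()).add(noun)
        # Assumption: only concepts spanning two or more pending nouns help.
        candidates = [c for c in marks if len(nouns_below[c]) > 1]
        if not candidates:
            break  # no further disambiguation is possible
        # Step 3: concept with the highest Conceptual Density.
        best = max(candidates, key=lambda c: conceptual_density(
            marks[c], nhyp, descendants[c]))
        # Step 4: the senses below best are chosen for the nouns they belong to.
        for noun in nouns_below[best]:
            chosen = [s for s in pending[noun] if best in ancestors[s]]
            if noun == target:
                return chosen
            del pending[noun]
    # Step 5: whatever is left for the target (several senses = partial outcome).
    return pending[target]


for noun, window in sliding_windows(["bass", "guitar", "drum"], 3):
    print(noun, disambiguate(noun, window, SENSES, ANCESTORS, DESCENDANTS, nhyp=2))
```

In this toy lattice the small "instrument" subhierarchy wins over the root in every window, so each noun receives its instrument sense; with only the target in the window, no concept spans two nouns and all of its senses are returned, which the experiments count as a failure.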
When no further disambiguation is possible, the senses left for w are processed and the result is presented (step 5).</Paragraph> <Paragraph position="5"> Besides completely disambiguating a word or failing to do so, in some cases the disambiguation algorithm returns several possible senses for a word.</Paragraph> <Paragraph position="6"> In the experiments we considered these partial outcomes as failure to disambiguate.</Paragraph> </Section> </Section> <Section position="5" start_page="17" end_page="17" type="metho"> <SectionTitle> 4 The Experiments </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="17" end_page="17" type="sub_section"> <SectionTitle> 4.1 The texts </SectionTitle> <Paragraph position="0"> We selected four texts from SemCor at random: br-a01 (where a stands for the genre &quot;Press: Reportage&quot;), br-b20 (b for &quot;Press: Editorial&quot;), br-j09 (j means &quot;Learned: Science&quot;) and br-r05 (r for &quot;Humour&quot;). Table 1 shows some statistics for each text.</Paragraph> <Paragraph position="1"> Table 1 columns: text | words | nouns | nouns | monosemous. An average of 11% of all nouns in these four texts were not found in WordNet. According to these data, the proportion of monosemous nouns in these texts (32% on average) is higher than that calculated for the open-class words from the whole of SemCor (27.2% according to [Miller et al. 94]).</Paragraph> <Paragraph position="2"> For our experiments, these texts play the role of both input files (without semantic tags) and (tagged) test files. When they are treated as input files, we throw away all non-noun words, only leaving the lemmas of the nouns present in WordNet.</Paragraph> </Section> </Section> <Section position="6" start_page="17" end_page="55" type="metho"> <SectionTitle> 4.2 Results and evaluation </SectionTitle> <Paragraph position="0"> One of the goals of the experiments was to decide among different variants of the Conceptual Density formula.
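The scoring convention stated in section 3 (partial outcomes count as failures), together with the precision and recall measures used for the results that follow and the analytical guessing baseline, can be sketched in Python as below. The function names and the data layout (a dict from noun occurrence to returned senses) are hypothetical, and the baseline assumes that a random guess picks uniformly among a noun's WordNet senses.

```python
def score(outputs, gold):
    """Precision, recall and coverage over polysemous nouns (sketch).

    outputs -- noun occurrence -> list of senses returned by the algorithm
    gold    -- noun occurrence -> correct sense (polysemous nouns only)
    Only a single returned sense counts as an answer; partial outcomes
    (several senses) and failures (no sense) count as unanswered.
    """
    answered = correct = 0
    for occurrence, sense in gold.items():
        returned = outputs.get(occurrence, [])
        if len(returned) == 1:
            answered += 1
            if returned[0] == sense:
                correct += 1
    precision = correct / answered if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    coverage = answered / len(gold) if gold else 0.0
    return precision, recall, coverage


def guessing_baseline(sense_counts):
    """Expected precision of picking one sense uniformly at random.

    sense_counts -- number of WordNet senses of each polysemous noun.
    Every noun gets an answer, so precision equals recall here.
    """
    return sum(1 / k for k in sense_counts) / len(sense_counts)


outputs = {1: ["a"], 2: ["b", "c"], 3: ["d"], 4: []}
gold = {1: "a", 2: "b", 3: "x", 4: "y"}
print(score(outputs, gold))       # (0.5, 0.25, 0.5)
print(guessing_baseline([2, 4]))  # 0.375
```

Occurrence 2 returns two senses, so it is treated like occurrence 4 (no answer) even though the correct sense is among those returned.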
Results are given as averages over the four files. Partial disambiguation is treated as failure to disambiguate. Precision (that is, the percentage of actual answers which were correct) and recall (that is, the percentage of possible answers which were correct) are given in terms of polysemous nouns only. Graphs are drawn against the size of the context (footnote 3).</Paragraph> <Paragraph position="1"> * meronymy does not improve performance as expected. A priori, the more relations are taken into account (i.e. meronymic relations, in addition to the hypo/hypernymy relation), the better density would capture semantic relatedness, and therefore better results could be expected.</Paragraph> <Paragraph position="2"> The experiments (see figure 3) showed that there is not much difference: adding meronymic information does not improve precision, and raises coverage by only about 3%. Nevertheless, in the rest of the results reported below, meronymy and hypernymy were used.</Paragraph> <Paragraph position="3"> * global nhyp is as good as local nhyp.</Paragraph> <Paragraph position="4"> The average number of hyponyms, or nhyp (cf.</Paragraph> <Paragraph position="5"> formula 1), can be approximated in two ways. If an independent nhyp is computed for every concept in WordNet, we call it local nhyp. If instead a single nhyp is computed over the whole hierarchy, we have global nhyp.</Paragraph> <Paragraph position="6"> While local nhyp is the actual average for a given concept, global nhyp gives only an estimation. The results (cf. figure 4) show that local nhyp performs only slightly better. Therefore global nhyp is favoured and was used in subsequent experiments.</Paragraph> <Paragraph position="7"> * context size: different behaviour for each text. One could assume that the more context there is, the better the disambiguation results would be. Our experiments show that each file from SemCor has a different behaviour (cf.
figure 5): while br-b20 shows clear improvement for bigger window sizes, br-r05 reaches a local maximum at a window size of 10, etc.</Paragraph> <Paragraph position="8"> As each text is structured as a list of sentences, lacking any indication of headings, sections, paragraph endings, text changes, etc., the program gathers the context without knowing whether the nouns actually occur in coherent pieces of text. This could account for the fact that in br-r05, composed mainly of short pieces of dialogue, the best results are for window size 10, the average size of these dialogue pieces. Likewise, the results for br-a01, which contains short journalistic texts, are best for window sizes from 15 to 25, decreasing significantly for size 30.</Paragraph> <Paragraph position="9"> In addition, the actual nature of each text is certainly an important factor, difficult to measure, which could on its own account for the different behaviour. In order to give an overall view of the performance, we consider the average behaviour.</Paragraph> <Paragraph position="10"> * file vs. sense. WordNet groups noun senses in 24 lexicographer's files. The algorithm assigns a noun both a specific sense and a file label. Both file matches and sense matches are interesting to count. While the sense level gives a fine-grained measure of the algorithm, the file level gives an indication of the performance if we were interested in a coarser level of disambiguation. The granularity of the sense distinctions made in [Hearst 91], [Yarowsky 92] and [Gale et al. 93], also called homographs in [Guthrie et al. 93], can be compared to that of the file level in WordNet.</Paragraph> <Paragraph position="11"> For instance, in [Yarowsky 92] two homographs of the noun BASS are considered, one characterised as MUSIC and the other as ANIMAL, INSECT. In WordNet, the 6 senses of BASS related to music appear in the following files: ARTIFACT, ATTRIBUTE, COMMUNICATION and PERSON.
The 3 senses related to animals appear in the files ANIMAL and FOOD. This means that while the homograph level in [Yarowsky 92] distinguishes two sets of senses, the file level in WordNet distinguishes six sets of senses, which is still finer in granularity.</Paragraph> <Paragraph position="12"> Figure 6 shows that, as expected, file-level matches attain better performance (71.2% overall and 53.9% for polysemous nouns) than sense-level matches.</Paragraph> <Paragraph position="13"> * evaluation of the results. Figure 7 shows that, overall, coverage over polysemous nouns increases significantly with the window size, without losing precision. Coverage tends to stabilise near 80%, with little improvement for window sizes bigger than 20.</Paragraph> <Paragraph position="14"> The figure also shows the guessing baseline, given by selecting senses at random. This baseline was first calculated analytically and later checked experimentally. We also compare the performance of our algorithm with that of the &quot;most frequent&quot; heuristic. The frequency counts for each sense were collected using the rest of SemCor, and then applied to the four texts. While the precision is similar to that of our algorithm, the coverage is 8% worse.</Paragraph> <Paragraph position="15"> The preceding graphs were relative to the polysemous nouns only. When monosemous nouns are included, precision rises, as shown in table 2, from 43% to 64.5%, and coverage increases from 79.6% to 86.2%.</Paragraph> <Section position="1" start_page="55" end_page="55" type="sub_section"> <SectionTitle> 4.3 Comparison with other works </SectionTitle> <Paragraph position="0"> The raw results presented here seem poor when compared to those shown in [Hearst 91], [Gale et al. 93] and [Yarowsky 92]. We think that several factors make the comparison difficult.
Most of those works focus on a selected set of a few words, generally with a couple of senses of very different meaning (coarse-grained distinctions), and for which their algorithms could gather enough evidence. In contrast, we tested our method with all the nouns in a subset of an unrestricted public domain corpus (more than 9,000 words), making fine-grained distinctions among all the senses in WordNet.</Paragraph> <Paragraph position="1"> An approach that uses hierarchical knowledge is that of [Resnik 95], which additionally uses the information content of each concept gathered from corpora. Unfortunately he applies his method to a different task, that of disambiguating sets of related nouns. The evaluation is done on a set of related nouns from Roget's Thesaurus tagged by hand. The fact that some senses were discarded because the human judge deemed them unreliable makes comparison even more difficult.</Paragraph> <Paragraph position="2"> In order to compare our approach, we decided to implement [Yarowsky 92] and [Sussna 93] and test them on our texts. We had to adapt [Yarowsky 92] to work with WordNet: his method relies on cooccurrence data gathered on Roget's Thesaurus semantic categories, whereas in our experiment we use saliency values (footnote 4) based on the lexicographic file tags in SemCor. The results for a window size of 50 nouns are those shown in table 3 (footnote 5). The precision attained by our algorithm is higher. To compare figures better, consider the results in table 4, where the coverage of our algorithm was easily extended using the version presented below, increasing recall to 70.1%. Among the methods based on Conceptual Distance, [Sussna 93] is the most similar to ours. Sussna disambiguates several documents from a public corpus using WordNet. The test set was tagged by hand, allowing more than one correct sense for a single word.
The method he uses has to overcome a combinatorial explosion (footnote 6) by controlling the size of the window and &quot;freezing&quot; the senses for all the nouns preceding the noun to be disambiguated. In order to freeze the winning sense, Sussna's algorithm is forced to make a unique choice. When Conceptual Distance is not able to choose a single sense, the algorithm chooses one at random.</Paragraph> <Paragraph position="3"> Conceptual Density overcomes the combinatorial explosion by extending the notion of conceptual distance from a pair of words to n words, and can therefore yield more than one correct sense for a word. For comparison, we altered our algorithm to also make random choices when unable to choose a single sense.</Paragraph> <Paragraph position="4"> We applied the algorithm Sussna considers best, discarding the factors that do not affect performance significantly (footnote 7), and obtained the results in table 4. A more thorough comparison with these methods would be desirable, but is not possible in this paper for the sake of conciseness.</Paragraph> <Paragraph position="5"> Footnote 6: ... constraint for the first 10 nouns (the optimal window size according to his experiments) of file br-r05 had to deal with more than 200,000 synset pairs.</Paragraph> <Paragraph position="6"> ... might be only one of a number of complementary pieces of evidence for the plausibility of a certain word sense. Furthermore, WordNet 1.4 is not a complete lexical database (the current version is 1.5).</Paragraph> <Paragraph position="7"> * Tune the sense distinctions to the level best suited for the application. On the one hand, the sense distinctions made by WordNet 1.4 are not always satisfactory. On the other hand, our algorithm is not designed to work on the file level: if the sense level is unable to distinguish between two senses, the file level also fails, even if both senses are from the same file.
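A minimal sketch of what working on the file level could mean: when the sense level leaves several candidate senses for a noun, the file-level answer is still determined if all candidates share one lexicographer file. The function name, sense ids and file assignments below are hypothetical and do not reproduce real WordNet entries.

```python
def file_level_answer(candidates, lexfile):
    """Lexicographer file shared by all candidate senses, or None.

    candidates -- senses still open for a noun after sense-level disambiguation
    lexfile    -- sense -> lexicographer file label
    """
    files = {lexfile[sense] for sense in candidates}
    if len(files) == 1:
        return next(iter(files))
    return None  # no candidates, or candidates spread over several files


# Hypothetical sense ids and file labels, for illustration only.
LEXFILE = {"s1": "FOOD", "s2": "FOOD", "s3": "ANIMAL"}
print(file_level_answer(["s1", "s2"], LEXFILE))  # FOOD
print(file_level_answer(["s1", "s3"], LEXFILE))  # None
```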
If the senses were collapsed at the file level, the coverage and precision of the algorithm at the file level might be even better.</Paragraph> </Section> </Section> <Section position="7" start_page="55" end_page="55" type="metho"> <SectionTitle> 5 Further Work </SectionTitle> <Paragraph position="0"> We would like to have included in this paper a study on whether or not there is a correlation between correct and erroneous sense assignments and the degree of Conceptual Density, that is, the actual figure yielded by formula 1. If this were the case, the error rate could be further decreased by setting a certain threshold for the Conceptual Density values of winning senses. We would also like to evaluate the usefulness of partial disambiguation: decrease of ambiguity, number of times the correct sense is among the chosen ones, etc.</Paragraph> <Paragraph position="1"> There are some factors that could raise the performance of our algorithm: * Work on coherent chunks of text.</Paragraph> <Paragraph position="2"> Unfortunately, any information about discourse structure is absent in SemCor, apart from sentence endings. The performance would benefit from the fact that sentences from unrelated topics would not be considered in the disambiguation window.</Paragraph> <Paragraph position="3"> * Extend and improve the semantic data.</Paragraph> <Paragraph position="4"> WordNet provides synonymy, hypernymy and meronymy relations for nouns, but other relations are missing. For instance, WordNet lacks cross-categorial semantic relations, which could be very useful to extend the notion of Conceptual Density of nouns to Conceptual Density of words. Apart from extending the disambiguation to verbs, adjectives and adverbs, cross-categorial relations would allow the relations among senses to be captured better and provide firmer grounds for disambiguating.</Paragraph> <Paragraph position="5"> These other relations could be extracted from other knowledge sources, either corpus-based or MRD-based.
If those relations could be given on WordNet senses, Conceptual Density could profit from them. It is our belief, following the ideas of [McRoy 92], that full-fledged lexical ambiguity resolution should combine several information sources. Conceptual Density ... Footnote 7: Initial mutual constraint size is 10 and window size is 41. Meronymic links are also considered. All the links have the same weight.</Paragraph> </Section> </Paper>