<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2119"> <Title>Word Sense Disambiguation using lexical cohesion in the context</Title> <Section position="7" start_page="932" end_page="934" type="evalu"> <SectionTitle> 5 Results </SectionTitle> <Paragraph position="0"> We evaluate the six heuristics on the English lexical sample of SENSEVAL-2, in which each target word has been POS-tagged in the training part. With the absence of taxonomy of adjectives in WordNet we only extract all 29 nouns and all 29 verbs from a total of 73 lexical targets, and then we subcategorize the test dataset into 1754 noun instances and 1806 verb instances. Since the sample of SENSEVAL-2 is manually sense-tagged with the sense number of WordNet 1.7 and our metrics are based on its version 2.0, we translate the sample and answer format into 2.0 in accordance with the system output format.</Paragraph> <Paragraph position="1"> Finally, we find that each noun target has 5.3 senses on average and each verb target 16.4 senses. Hence the baseline of random selection of senses is the reciprocal of each average sense number, i.e. separately 18.9 percent for nouns and 6 percent for verbs.</Paragraph> <Paragraph position="2"> In addition, SENSEVAL-2 provides a scoring software with 3 levels of schemes, i.e. finegrained, coarse-grained and mixed-grained to produce precision and recall rates to evaluate the participating systems. According to the SENSEVAL scoring system, as we always give at least one answer, the precision is identical to the recall under the separate noun and verb datasets.</Paragraph> <Paragraph position="3"> So we just evaluate our systems in light of accuracy. We tested the heuristics with fine-grained precision, which required the exact match of the key to each instance.</Paragraph> <Section position="1" start_page="932" end_page="934" type="sub_section"> <SectionTitle> 5.1 Context </SectionTitle> <Paragraph position="0"> Without any knowledge of domain, frequency and pragmatics to guess, word context is the only way of labeling the real meaning of word. Basically a bag of context words (after morphological analyzing and filtering stop-words) or the fine-grained ones (syntactic role, selection preference etc.) can provide cues for the target. We propose to merely use a bag of words to feed into each heuristic in case of losing any valuable information in the disambiguation, and preventing from any interference of other clues except the semantic hierarchy of WordNet.</Paragraph> <Paragraph position="1"> The size of the context is not a definitive factor in WSD, Yarowsky (1993) suggested the size of 3 or 4 words for the local ambiguity and 20/50 words for topic ambiguity. He also employed Roget's Thesaurus in 100 words of window to implement WSD (Yarowsky, 1992). To investigate the role of local context and topic context we vary the size of window from one word distance away to the target (left and right) until 100 words away in nouns or 60 in verbs, until there are no increases in the context of each instance.</Paragraph> <Paragraph position="2"> different size of context in SENSEVAL 2 Noun and verb disambiguation results are respectively displayed in Figure 1 and 2. Since the performance curves of the heuristics turned into flat and stable (the average standard deviations of the six curves of nouns and verbs is around 0.02 level before 60 and 20, after that approxi- null mately 0.001 level), optimal performance is reached at 60 context words for nouns and 20 words for verbs. 
<Paragraph position="3"> Although our metrics can measure the similarity between nouns and verbs through the derivationally related forms of verbs (though not through the derived verbs of nouns, owing to the shallowness of the verb taxonomy in WordNet), we still cannot rely on WordNet alone, which focuses on the paradigmatic relations of words, to cover the full complexity of words' contextual behaviour.</Paragraph> <Paragraph position="4"> Since the word association norm captures both syntagmatic and pragmatic relations between words, we transform the context words of the target into their associated words, retrieved from the EAT, to augment the performance of the lexical hub.</Paragraph> <Paragraph position="5"> The EAT contains two word lists: one takes each head word as a stimulus, then collects and ranks all response words by frequency of subject consensus; the other is in the reverse order, with each response as a head word followed by the stimuli that elicited it. We denote the stimulus-to-response set of a word as SR and the response-to-stimulus set as RS; we further write SRANDRS for the intersection of SR and RS, and SRORRS for their union. For each context word we then retrieve its corresponding words from each list and calculate the similarity between the target and these words together with the original context words.</Paragraph> <Paragraph position="6"> As a result, the original context space of each target is transformed into an enriched context space under SR, RS, SRANDRS or SRORRS.</Paragraph> <Paragraph position="7"> We take the window sizes established above (60 context words for nouns and 20 for verbs) as the reference points for the transformed-context experiment.</Paragraph> <Paragraph position="8"> After the transformations, the noun and verb results of SENSEVAL-2 in the transformed context spaces are shown in Figures 3 and 4 respectively.</Paragraph> <Paragraph position="9"> In their evaluation of different WordNet-based similarity techniques, Pedersen et al. (2003) implemented two variants of Lesk's method, extended gloss overlaps (P&L_extend) and gloss vectors (P&L_vector), and evaluated them on the English lexical sample of SENSEVAL-2.</Paragraph> <Paragraph position="10"> The best edge-counting-based metric they measured was that of Jiang and Conrath (1997) (J&C).</Paragraph> <Paragraph position="11"> Accordingly, without the EAT transformation, we compare the results of HWL and HSL (denoted HWL_Context and HSL_Context) with the above methods, taking their optimal values. The results are illustrated in Figure 5, which compares them with other similarity metrics and unsupervised systems in SENSEVAL-2. We also list three baselines for unsupervised systems (Kilgarriff and Rosenzweig, 2000): Baseline Random (randomly selecting one sense of the target), Baseline Lesk (overlap between the examples and definitions of each sense of the target and the context words), and its reduced version, Baseline Lesk Def (definitions only).</Paragraph>
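Before turning to these comparisons, the sketch below illustrates the four EAT context transformations defined above (SR, RS, SRANDRS, SRORRS). The dictionary layout and all names (expand_context, eat_sr, eat_rs) are assumptions; the sketch only shows the set operations involved, with the original context words always retained.

```python
from typing import Dict, List, Set

# eat_sr: stimulus word -> set of its response words (ranked list collapsed to a set)
# eat_rs: response word -> set of the stimulus words that elicited it
def expand_context(context: List[str],
                   eat_sr: Dict[str, Set[str]],
                   eat_rs: Dict[str, Set[str]],
                   mode: str = "SRORRS") -> Set[str]:
    """Transform a bag of context words into the enriched context space.

    mode is one of SR, RS, SRANDRS (intersection) or SRORRS (union).
    """
    expanded: Set[str] = set(context)   # keep the original context words
    for word in context:
        sr = eat_sr.get(word, set())
        rs = eat_rs.get(word, set())
        if mode == "SR":
            expanded |= sr
        elif mode == "RS":
            expanded |= rs
        elif mode == "SRANDRS":
            expanded |= sr & rs          # words that are both response and stimulus
        elif mode == "SRORRS":
            expanded |= sr | rs          # union of both association directions
    return expanded
```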
<Paragraph position="12"> We further compare HWL and HSL under the SRORRS transformation of the EAT (denoted HWL_SRORRS and HSL_SRORRS) with other unsupervised systems that employ no SENSEVAL-2 training material, namely: * IIT 1 and IIT 2: extended the WordNet gloss of each sense of the target with the glosses of its superordinate and subordinate nodes; no back-off policy. * DIMAP: employed both WordNet and the New Oxford Dictionary of English, backing off to the first sense when tied scores occurred. * UNED-LS-U: for each sense of the target, enriched the sense description with its first five hyponyms and with a dictionary built from 3200 books from Project Gutenberg; backed off to the first sense and discarded senses accounting for less than 10 percent of the files in SemCor.</Paragraph> </Section> </Section> <Section position="8" start_page="934" end_page="935" type="evalu"> <SectionTitle> 7 Conclusion and discussion </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="934" end_page="934" type="sub_section"> <SectionTitle> 7.1 Local context and topic context </SectionTitle> <Paragraph position="0"> From the analysis of the standard deviation of precision at different window sizes in Figures 1 and 2, we can conclude that the optimum size for HSN to HSS was ±10 words for nouns, reflecting a sensitivity to local context only, whilst HWL and HSL improved significantly up to ±60 words, reflecting a sensitivity to topical context. In the case of verbs, HSA showed little significant context sensitivity; HSN showed some positive sensitivity to local context, but enlarging the window beyond ±5 had a negative effect; and HSM and HSS to HSL showed some sensitivity to broader topical context, though this plateaued at around ±20 to ±30 words.</Paragraph> </Section> <Section position="2" start_page="934" end_page="934" type="sub_section"> <SectionTitle> 7.2 The analysis of different heuristics </SectionTitle> <Paragraph position="0"> HWL and HSL were clearly superior for both the noun and verb tasks. HSL's advantage was the greater of the two, and its performance was more comparable across the two tasks, with the difference scarcely reaching significance. These observations remain true with the addition of the EAT information. After the EAT transformations for nouns, HSL and HWL no longer differ significantly in performance, forming a single group with relatively higher precision, whilst the other heuristics cluster into another group with lower precision, reflecting a negative effect of the EAT. In the verb case, HWL and HSL, HSM and HSS, and HSN and HSA form three significantly different groups in terms of precision, reflecting the poor performance of both normalized heuristics (HSN and HSA) and a significant improvement of HWL from the EAT data.</Paragraph> <Paragraph position="1"> All of this implies that in the lexical hub for WSD, the correct meaning of a word should hold as many links as possible with a relatively large number of context words. These links can be at the level of word form (HWL) or word sense (HSL). HSL achieved the highest precision for both nouns and verbs.</Paragraph> </Section>
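The significance groupings above, like the EAT comparisons in the next subsection, rest on paired two-sample t-tests over matched precision scores. A minimal sketch with SciPy follows; the precision values are hypothetical placeholders, not the paper's data.

```python
from scipy import stats

# Hypothetical paired precision scores for one heuristic on the same
# set of targets, before (plain context) and after an EAT transformation.
precision_context = [0.41, 0.44, 0.39, 0.47, 0.43]
precision_srorrs  = [0.45, 0.47, 0.42, 0.50, 0.46]

# Paired (related-samples) t-test: each pair comes from the same target.
t_stat, p_value = stats.ttest_rel(precision_srorrs, precision_context)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("improvement is significant at the 95 percent confidence level")
```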
<Section position="3" start_page="934" end_page="934" type="sub_section"> <SectionTitle> 7.3 The interaction of EAT in WSD </SectionTitle> <Paragraph position="0"> For noun sense disambiguation, a paired two-sample t-test for means showed that the RS and SRORRS transformations significantly improve the disambiguation precision of HWL and HSL (p < 0.05, at the 95 percent confidence level). For verb disambiguation, all four EAT transformations are significantly better than the plain-context case for HWL and HSL (p < 0.05, at the 95 percent confidence level).</Paragraph> <Paragraph position="1"> This demonstrates that both the syntagmatic relations and the other domain information in the EAT can help discriminate word senses. With the transformation of the target's surrounding context, the similarity metrics can compare the likeness of nouns and verbs, even though we can already exploit the derived forms of words in WordNet to facilitate that comparison.</Paragraph> </Section> <Section position="4" start_page="934" end_page="935" type="sub_section"> <SectionTitle> 7.4 Comparison with other methods </SectionTitle> <Paragraph position="0"> The lexical hub reached comparatively high precision for both nouns (45.8%) and verbs (35.6%), in contrast with the other similarity-based methods and the unsupervised systems in SENSEVAL-2. Note that we adopt no back-off policy, such as the most frequent sense used by UNED-LS-U and DIMAP.</Paragraph> <Paragraph position="1"> Although the noun and verb similarity metrics in this paper are based on edge counting, without the aid of any corpus frequency information, they performed very well on the WSD task relative to other information-based metrics and definition-matching methods. In the verb case especially, our metric significantly outperformed the other metrics.</Paragraph> </Section> </Section> </Paper>