<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1318"> <Title>Automatic WordNet mapping using word sense disambiguation*</Title> <Section position="3" start_page="144" end_page="145" type="metho"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"> In this section, we evaluate the performance of each of the six heuristics as well as the combination method. To evaluate the performance of WordNet mapping, the candidate synsets of 3260 senses of Korean words in a bilingual dictionary were manually classified as linking or discarding.</Paragraph> <Paragraph position="1"> We define 'precision' as the proportion of correctly linked senses of Korean words to all the linked senses of Korean words in a test set. We also define 'coverage' as the proportion of linked senses of Korean words to all the senses of Korean words in a test set.</Paragraph> <Paragraph position="2"> Table 1 contains the results for each heuristic evaluated individually against the manually classified data. The test set here consists of the 3260 manually classified senses.</Paragraph> <Paragraph position="3"> In general, each heuristic performs poorly on its own, but always better than the random choice baseline. The heuristic with the best precision is the maximum similarity heuristic, but it applied to only 59.51% of the 3260 senses of Korean words. Each heuristic outperforms random mapping with statistical significance at the 99% level. Decision tree based combination: We performed 10-fold cross validation to evaluate the performance of the combination of all the heuristics using the decision tree: we split the data into ten parts, reserved one part as a validation set, trained the decision tree on the other nine parts and then evaluated it on the reserved part. This process was repeated nine more times, using each of the other nine parts in turn as the validation set. Table 2 shows the results of the other attempts to combine all the heuristics. 
Summing simply adds up the scores of all the heuristics; the candidate synset with the highest total score is then selected. Logistic regression, as described in (Hosmer and Lemeshow, 1989), is a popular technique for binary classification. It applies an inverse logit function, is fitted with the iteratively reweighted least squares algorithm, and determines the weight of each heuristic.</Paragraph> <Paragraph position="4"> By combining the heuristics using summing, we obtained an improvement of 9% over the maximum similarity heuristic (heuristic 1), while maintaining a coverage of 100%. The decision tree is able to correctly map 93.59% of the senses of Korean words in the bilingual dictionary, while maintaining a coverage of 77.12%.</Paragraph> <Paragraph position="5"> Applying the decision tree to combine all the heuristics for all Korean words in the bilingual dictionary, we obtain a preliminary version of the Korean WordNet containing 21654 senses of 17696 Korean nouns with an accuracy of 93.59% (±0.84% with 99% confidence).</Paragraph> </Section> <Section position="4" start_page="145" end_page="145" type="metho"> <SectionTitle> 5 Related works </SectionTitle> <Paragraph position="0"> Several attempts have been made to automatically produce multilingual ontologies.</Paragraph> <Paragraph position="1"> (Knight & Luk 1994) focuses on the construction of Sensus, a large knowledge base for supporting the Pangloss Machine Translation system, merging ontologies (ONTOS and UpperModel) and WordNet with monolingual and bilingual dictionaries. (Okumura & Hovy 1994) describes a semi-automatic method for associating a Japanese lexicon with an ontology using a Japanese/English bilingual dictionary as a 'bridge'. Several lexical resources and techniques are combined in (Atserias et al., 1997) to map Spanish words from a bilingual dictionary to WordNet. 
In (Farreres et al., 1998), the use of a taxonomic structure derived from a monolingual MRD is proposed as an aid to the mapping process.</Paragraph> <Paragraph position="2"> Our research differs from these works in that it uses a bilingual dictionary to build a monolingual thesaurus based on existing popular lexical resources, and in that it uses a combination of multiple unsupervised WSD heuristics.</Paragraph> </Section> </Paper>