File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/w06-1663_metho.xml
Size: 21,737 bytes
Last Modified: 2025-10-06 14:10:46
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1663"> <Title>Quality Assessment of Large Scale Knowledge Resources</Title> <Section position="4" start_page="534" end_page="536" type="metho"> <SectionTitle> 2 Large Scale Knowledge Resources </SectionTitle> <Paragraph position="0"> This study covers a wide range of large-scale knowledge resources: WordNet (WN) (Fellbaum, 1998), eXtended WordNet (Mihalcea and Moldovan, 2001), large collections of semantic preferences acquired from SemCor (Agirre and Martinez, 2001; Agirre and Martinez, 2002) or acquired from the BNC (McCarthy, 2001), large-scale Topic Signatures for each synset acquired from the web (Agirre and de la Calle, 2004) or acquired from the BNC (Cuadros et al., 2005).</Paragraph> <Paragraph position="1"> However, although these resources have been derived using different WN versions, the research community has the technology for the automatic alignment of wordnets (Daud'e et al., 2003). This technology provides a mapping among synsets of different WN versions, maintaining the compatibility to all the knowledge resources which use a particular WN version as a sense repository.</Paragraph> <Paragraph position="2"> Furthermore, this technology allows to port the knowledge associated to a particular WN version to the rest of WN versions already connected.</Paragraph> <Paragraph position="3"> Using this technology, most of these resources are integrated into a common resource called Multilingual Central Repository (MCR) (Atserias et al., 2004). In particular, all WordNet versions, eXtended WordNet, and the semantic preferences acquired from SemCor and BNC.</Paragraph> <Section position="1" start_page="534" end_page="535" type="sub_section"> <SectionTitle> 2.1 Multilingual Central Repository </SectionTitle> <Paragraph position="0"> The Multilingual Central Repository (MCR)2 follows the model proposed by the EuroWordNet project. EuroWordNet (Vossen, 1998) is a multi-lingual lexical database with wordnets for several European languages, which are structured as the Princeton WordNet. The Princeton WordNet contains information about nouns, verbs, adjectives and adverbs in English and is organized around the notion of a synset. A synset is a set of words with the same part-of-speech that can be interchanged in a certain context. For example, <party, political party> form a synset because they can be used to refer to the same concept. A synset is often further described by a gloss, in this case: &quot;an organization to gain political power&quot;. Finally, synsets can be related to each other by semantic relations, such as hyponymy (between specific and more general concepts), meronymy (between parts and wholes), cause, etc.</Paragraph> <Paragraph position="1"> The current version of the MCR (Atserias et al., 2004) is a result of the 5th Framework MEANING project. The MCR integrates into the same EuroWordNet framework wordnets from five different languages (together with four English Word-Net versions). The MCR also integrates WordNet Domains (Magnini and Cavagli`a, 2000) and new versions of the Base Concepts and Top Concept Ontology. The final version of the MCR contains 1,642,389 semantic relations between synsets, most of them acquired by automatic means. This represents almost one order of magnitude larger than the Princeton WordNet (204,074 unique semantic relations in WordNet 2.0). 
<Paragraph position="3"> Using this technology, most of these resources have been integrated into a common resource called the Multilingual Central Repository (MCR) (Atserias et al., 2004): in particular, all WordNet versions, eXtended WordNet, and the semantic preferences acquired from SemCor and the BNC.</Paragraph> <Section position="1" start_page="534" end_page="535" type="sub_section"> <SectionTitle> 2.1 Multilingual Central Repository </SectionTitle> <Paragraph position="0"> The Multilingual Central Repository (MCR)2 follows the model proposed by the EuroWordNet project. EuroWordNet (Vossen, 1998) is a multilingual lexical database with wordnets for several European languages, structured along the lines of the Princeton WordNet. The Princeton WordNet contains information about nouns, verbs, adjectives and adverbs in English and is organized around the notion of a synset. A synset is a set of words with the same part-of-speech that can be interchanged in a certain context. For example, <party, political party> form a synset because they can be used to refer to the same concept. A synset is often further described by a gloss, in this case: &quot;an organization to gain political power&quot;. Finally, synsets can be related to each other by semantic relations, such as hyponymy (between specific and more general concepts), meronymy (between parts and wholes), cause, etc.</Paragraph> <Paragraph position="1"> The current version of the MCR (Atserias et al., 2004) is a result of the 5th Framework MEANING project. The MCR integrates into the same EuroWordNet framework wordnets from five different languages (together with four English WordNet versions). The MCR also integrates WordNet Domains (Magnini and Cavaglià, 2000) and new versions of the Base Concepts and the Top Concept Ontology. The final version of the MCR contains 1,642,389 semantic relations between synsets, most of them acquired by automatic means; this is almost one order of magnitude larger than the Princeton WordNet (204,074 unique semantic relations in WordNet 2.0). Table 1 summarizes the main sources of semantic relations integrated into the MCR.</Paragraph>

Table 1 (partial; only these rows are recoverable here, and they do not sum to the total):
Selectional Preferences from the BNC: 707,618
New relations from Princeton WN2.0: 42,212
Gold relations from eXtended WN: 17,185
Silver relations from eXtended WN: 239,249
Normal relations from eXtended WN: 294,488
Total: 1,642,389

<Paragraph position="2"> Table 2 shows the number of semantic relations between synset pairs in the MCR and their overlaps. Note that most of the relations between synset pairs in the MCR are unique.</Paragraph> <Paragraph position="3"> Hereinafter we will refer to each semantic resource as follows: * WN (Fellbaum, 1998): This knowledge resource uses the direct relations encoded in WordNet 1.6 or 2.0. We also tested WN-2 (using relations at distance 1 and 2) and WN-3 (using relations at distance 1 to 3). * XWN (Mihalcea and Moldovan, 2001): This knowledge resource uses the direct relations encoded in eXtended WordNet.</Paragraph> <Paragraph position="4"> * XWN+WN: This knowledge resource uses the direct relations included in WN and XWN.</Paragraph> <Paragraph position="5"> * spBNC (McCarthy, 2001): This knowledge resource contains the selectional preferences acquired from the BNC.</Paragraph> <Paragraph position="6"> * spSemCor (Agirre and Martinez, 2001; Agirre and Martinez, 2002): This knowledge resource contains the selectional preferences acquired from SemCor.</Paragraph> <Paragraph position="7"> * spBNC+spSemCor: This knowledge resource uses the selectional preferences acquired from the BNC and SemCor.</Paragraph> <Paragraph position="8"> * MCR (Atserias et al., 2004): This knowledge resource uses the direct relations included in the MCR.</Paragraph> </Section> <Section position="2" start_page="535" end_page="536" type="sub_section"> <SectionTitle> 2.2 Automatically retrieved Topic Signatures </SectionTitle> <Paragraph position="0"> Topic Signatures (TS) are word vectors related to a particular topic (Lin and Hovy, 2000). Topic Signatures are built by retrieving the context words of a target topic from large volumes of text; in our case, we consider word senses as topics. Basically, the acquisition of TS consists of (A) acquiring the best possible corpus examples for a particular word sense (usually characterizing each word sense as a query and searching the corpus for the examples that best match it), and then (B) building the TS by deriving, from the selected corpora, the context words that best represent the word sense.</Paragraph> <Paragraph position="1"> For this study, we use the large-scale Topic Signatures acquired from the web (Agirre and de la Calle, 2004) and those acquired from the BNC (Cuadros et al., 2005).</Paragraph>
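Step (A) above amounts to building a boolean query per word sense from its monosemous relatives, in the spirit of the queryA/queryW strategies detailed below. A minimal sketch, assuming NLTK's WordNet interface (the original work queried WordNet 1.6 directly) and simplifying the set of relatives:

```python
# Minimal sketch of building a monosemous-relatives query for one
# WordNet synset. For brevity only synonyms, direct hypernyms and
# direct hyponyms are collected; queryW as described below also adds
# siblings and hyponyms at distance 2 and 3.
from nltk.corpus import wordnet as wn

def monosemous_relatives(synset, strict):
    """Collect relative words of a synset that are monosemous.

    strict=False is in the spirit of queryA: keep nouns with a single
    noun sense (they may still have verb/adjective/adverb senses).
    strict=True is in the spirit of queryW: keep only words with a
    single sense across all parts of speech.
    """
    words = set()
    for s in [synset] + synset.hypernyms() + synset.hyponyms():
        for lemma in s.lemmas():
            word = lemma.name()  # NLTK joins multiwords with '_'
            senses = wn.synsets(word) if strict else wn.synsets(word, pos=wn.NOUN)
            if len(senses) == 1:
                words.add(word)
    return words

party = wn.synsets("party", pos=wn.NOUN)[0]
print(" OR ".join(sorted(monosemous_relatives(party, strict=True))))
```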
<Paragraph position="2"> * TSWEB3: Inspired by the work of Leacock et al. (1998), these Topic Signatures were constructed using monosemous relatives from WordNet (synonyms, hypernyms, direct and indirect hyponyms, and siblings), querying Google and retrieving up to one thousand snippets per query (that is, per word sense). In particular, the method was as follows: - Organizing the examples retrieved from the web into collections, one collection per word sense.</Paragraph> <Paragraph position="3"> - Extracting the words and their frequencies for each collection.</Paragraph> <Paragraph position="4"> - Comparing these frequencies with those of the other word senses using TFIDF (see formula 1).</Paragraph> <Paragraph position="5"> - Gathering, in an ordered list, the words with a distinctive frequency for one of the collections; this list constitutes the Topic Signature for the respective word sense.</Paragraph> <Paragraph position="6"> This constitutes the largest available semantic resource, with around 100 million relations (between synsets and words).</Paragraph> <Paragraph position="7"> * TSBNC: These Topic Signatures have been constructed using ExRetriever4, a flexible tool for performing sense queries on large corpora. - This tool characterizes each sense of a word as a specific query using a declarative language.</Paragraph> <Paragraph position="8"> - This is done automatically, using a particular query construction strategy defined a priori and information from a knowledge base.</Paragraph> <Paragraph position="9"> In this study, ExRetriever has been evaluated using the BNC as corpus, WN as knowledge base, and TFIDF as weighting scheme (as shown in formula 1) (Agirre and de la Calle, 2004)5.</Paragraph> <Paragraph position="11"> Where w stands for a context word, wf for the word frequency, C for a collection (all the corpus gathered for a particular word sense), and Cf for the collection frequency.</Paragraph>
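The body of formula 1 is not reproduced here. The sketch below gives a standard TFIDF weighting consistent with the variable definitions above; the symbol N (the total number of collections, one per word sense) is an assumption, and the original formula may normalize differently.

```latex
% A standard TFIDF weighting over sense collections; a sketch, not
% necessarily the exact formula 1 of the paper. N, the total number
% of collections, is an assumed symbol.
\begin{equation}
  \mathrm{TFIDF}(w, C) \;=\; \frac{wf_{w,C}}{\max_{w'} wf_{w',C}}
  \;\times\; \log \frac{N}{Cf_{w}}
\end{equation}
```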
<Paragraph position="12"> In this study we consider two different query strategies: * Monosemous A (queryA): (OR monosemous-words). That is, the union set of all synonym, hyponym and hypernym words of a WordNet synset which are monosemous as nouns (these words can have other senses as verbs, adjectives or adverbs). * Monosemous W (queryW): (OR monosemous-words). That is, the union set of all words appearing as synonyms, direct hyponyms, hypernyms, indirect hyponyms (distance 2 and 3) and siblings. In this case, the nouns collected are strictly monosemous, having no other senses as verbs, adjectives or adverbs.</Paragraph> <Paragraph position="13"> While TSWEB uses the query construction strategy queryW, ExRetriever uses both.</Paragraph> </Section> </Section> <Section position="5" start_page="536" end_page="539" type="metho"> <SectionTitle> 3 Indirect Evaluation on Word Sense Disambiguation </SectionTitle> <Paragraph position="0"> In order to measure the quality of the knowledge resources described in the previous section, we performed an indirect evaluation by using all these resources as Topic Signatures (TS): that is, as word vectors with weights associated with a particular synset, obtained by collecting the word senses appearing in the synsets directly related to it6. This simple representation tries to be as neutral as possible with respect to the evaluation framework.</Paragraph> <Paragraph position="1"> All knowledge resources are indirectly evaluated on a WSD task: in particular, the noun set of the Senseval-3 English Lexical Sample task, which consists of 20 nouns. All performances are evaluated on the test data using the fine-grained scoring system provided by the organizers.</Paragraph> <Paragraph position="3"> Furthermore, trying to be as neutral as possible with respect to the semantic resources studied, we systematically applied the same disambiguation method to all of them. Recall that our main goal is to establish a fair comparison of the knowledge resources rather than to provide the best disambiguation technique for a particular semantic knowledge base.</Paragraph> <Paragraph position="4"> A common WSD method has been applied to all knowledge resources: a simple word-overlap count (or weighting) between the Topic Signature and the test example7. The occurrence evaluation measure counts the number of overlapping words, and the weight evaluation measure adds up the weights of the overlapping words. The synset with the highest overlapping word count (or weight) is selected for a particular test example. However, for TSWEB and TSBNC the best results have been obtained using occurrences (the weights are only used to order the words of the vector). Finally, we should remark that the results are not skewed (for instance, for resolving ties) by the most frequent sense in WN or any other statistically predicted knowledge.</Paragraph>
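A minimal sketch of this overlap-based scoring rule; names and tokenization are illustrative, since the paper does not specify preprocessing details:

```python
# Minimal sketch of the overlap-based WSD scoring described above.

def score_synsets(context_words, topic_signatures, use_weights=False):
    """Return the best synset for a test example.

    topic_signatures maps each candidate synset to a {word: weight}
    Topic Signature. With use_weights=False the score is the number
    of overlapping words (occurrences); otherwise it is the sum of
    the weights of the overlapping words.
    """
    context = set(context_words)
    scores = {}
    for synset, signature in topic_signatures.items():
        overlap = context & set(signature)
        scores[synset] = (sum(signature[w] for w in overlap)
                          if use_weights else len(overlap))
    # Ties are not broken by sense frequency or any other prior.
    return max(scores, key=scores.get) if scores else None

ts = {
    "party#n#1": {"political": 2.1, "election": 1.7, "leader": 0.9},
    "party#n#2": {"birthday": 1.5, "guests": 1.2, "cake": 0.8},
}
example = "the nationalist party exercised little political power".split()
print(score_synsets(example, ts))  # -> party#n#1
```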
<Paragraph position="5"> Figure 3 presents an example of a Topic Signature for the first sense of the noun party, from TSWEB (using queryW and the web) and from TSBNC (using queryA and the BNC). Although both automatically acquired TS seem closely related to the first sense of the noun party, they have no words in common.</Paragraph> <Paragraph position="6"> As an example, table 4 shows a test example of Senseval-3 corresponding to the first sense of the noun party. The words in bold are those that also appear in TSBNC-queryA; several important words in the text also appear in the TS.</Paragraph> <SectionTitle> 4 Evaluating the quality of knowledge resources </SectionTitle> <Paragraph position="7"> In order to establish a clear picture of the current state of the art of publicly available wide-coverage knowledge resources, we also consider a number of basic baselines.</Paragraph> <Paragraph position="8"> Figure 3 (caption fragment): BNC(queryA) with TFIDF (24 out of 9069 total words). Table 4 (test example): <instance id=&quot;party.n.bnc.00008131&quot; docsrc=&quot;BNC&quot;> <context> Up to the late 1960s , catholic nationalists were split between two main political groupings . There was the Nationalist Party , a weak organization for which local priests had to provide some kind of legitimation . As a <head>party</head> , it really only exercised a modicum of power in relation to the Stormont administration . Then there were the republican parties who focused their attention on Westminster elections . The disorganized nature of catholic nationalist politics was only turned round with the emergence of the civil rights movement of 1968 and the subsequent forming of the SDLP in 1970 . </context> </instance></Paragraph> <Section position="1" start_page="537" end_page="537" type="sub_section"> <SectionTitle> 4.1 Baselines </SectionTitle> <Paragraph position="0"> We have designed several baselines in order to establish a relative comparison of the performance of each semantic resource: * RANDOM: For each target word, this method selects a random sense. This baseline can be considered a lower bound.</Paragraph> <Paragraph position="1"> * WordNet MFS (WN-MFS): This method selects the most frequent sense of the target word (the first sense in WordNet).</Paragraph> <Paragraph position="2"> * TRAIN-MFS: This method selects the most frequent sense of the target word in the training corpus.</Paragraph> <Paragraph position="3"> * Train Topic Signatures (TRAIN): This baseline uses the training corpus to directly build a Topic Signature for each word sense, using the TFIDF measure. Note that in this case the baseline can be considered an upper bound of our evaluation framework. Table 5 presents the F1 measure (the harmonic mean of recall and precision) of the different baselines. In this table, TRAIN has been calculated with a fixed vector size of 450 words. As expected, the RANDOM baseline obtains the poorest result, while the most frequent sense of WordNet (WN-MFS) is very close to the most frequent sense of the training corpus (TRAIN-MFS); both, however, are far below the Topic Signatures acquired from the training corpus (TRAIN).</Paragraph> </Section> <Section position="2" start_page="537" end_page="538" type="sub_section"> <SectionTitle> 4.2 Performance of the knowledge resources </SectionTitle> <Paragraph position="0"> Table 6 presents the performance of each knowledge resource uploaded into the MCR and the average size of its vectors; the best results for precision, recall and F1 appear in bold. The lowest result is obtained by the knowledge directly gathered from WN, mainly because of its poor coverage (recall of 17.6 and F1 of 25.6). Its performance improves using words at distance 1 and 2 (F1 of 33.3), but decreases using words at distance 1, 2 and 3 (F1 of 30.4). The best precision is obtained by WN (46.7), but the best performance is achieved by the combined knowledge of MCR-spBNC8 (recall of 42.9 and F1 of 44.1), a recall 18.5 points higher than that of WN. That is, the knowledge integrated into the MCR (WordNet, eXtended WordNet and the selectional preferences acquired from SemCor), although partly derived by automatic means, performs much better in terms of recall and F1 than the knowledge currently present in WN alone (with a small decrease in precision). It also seems that the knowledge from spBNC always degrades the performance of the combinations it participates in9.</Paragraph> <Paragraph position="2"> Regarding the baselines, all knowledge resources integrated into the MCR surpass RANDOM, but none reaches WN-MFS, TRAIN-MFS or TRAIN.</Paragraph> <Paragraph position="3"> Figure 1 plots the F1 results of the fine-grained evaluation on the nominal part of the English lexical sample of Senseval-3 for the baselines (including the upper and lower bounds), the knowledge bases integrated into the MCR, and the best-performing Topic Signatures acquired from the web and the BNC, evaluated individually and in combination with others. The figure presents F1 (Y-axis) as a function of the size of the word vectors (X-axis)10. In order to evaluate the quality of each knowledge resource in more depth, we also provide evaluations of the combined outcomes of several knowledge resources. The combinations are performed following a very simple voting method: first, for each knowledge resource, the scores obtained for each word sense are normalized; then, for each word sense, the normalized scores are added up, and the word sense with the highest score is selected.</Paragraph>
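A minimal sketch of this voting scheme; the exact normalization is not specified in the text, so sum-to-one normalization per resource is an assumption:

```python
# Minimal sketch of the combination-by-voting scheme described above.
# Dividing each resource's scores by their sum is our assumption.

def combine_resources(scores_per_resource):
    """Combine per-resource sense scores by normalized voting.

    scores_per_resource is a list of {sense: raw_score} dicts, one
    per knowledge resource. Each dict is normalized independently,
    the normalized scores are summed per sense, and the sense with
    the highest combined score is selected.
    """
    combined = {}
    for scores in scores_per_resource:
        total = sum(scores.values())
        if total == 0:
            continue  # this resource abstains for this example
        for sense, score in scores.items():
            combined[sense] = combined.get(sense, 0.0) + score / total
    return max(combined, key=combined.get) if combined else None

tsweb = {"party#n#1": 12, "party#n#2": 3}      # e.g. overlap counts
mcr = {"party#n#1": 0.4, "party#n#2": 0.6}     # e.g. overlap weights
print(combine_resources([tsweb, mcr]))          # -> party#n#1
```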
<Paragraph position="4"> Regarding Topic Signatures, as expected, the knowledge gathered from the web (TSWEB) is in general superior to that acquired from the BNC, whether using queryA or queryW (TSBNC-queryA and TSBNC-queryW). Interestingly, the performance of TSBNC-queryA when using the first two hundred words of the TS is slightly better than that of queryW (using either the web or the BNC).</Paragraph> 9All selectional preferences acquired from SemCor or the BNC have been considered, including those with a very low confidence score.</Paragraph> 10Only the size of the TS is varied, for TSWEB and TSBNC.</Paragraph> <Paragraph position="5"> Although TSBNC-queryA and TSBNC-queryW perform very similarly, the two knowledge resources contain different knowledge. This shows when combining the outcomes of these two knowledge resources with TSWEB.</Paragraph> <Paragraph position="6"> While no improvement is obtained when combining the knowledge acquired from the web and from the BNC with the same acquisition method (queryW), the combination of TSWEB and TSBNC-queryA (TSWEB+ExRetA) obtains better F1 results than TSWEB alone (TSBNC-queryA has some knowledge not included in TSWEB).</Paragraph> <Paragraph position="7"> Surprisingly, the knowledge integrated into the MCR (MCR-spBNC) surpasses the knowledge from the Topic Signatures acquired from the web or the BNC, using queryA, queryW or their combinations.</Paragraph> <Paragraph position="8"> Furthermore, the combination of TSWEB and MCR-spBNC (TSWEB+MCR-spBNC) outperforms both resources individually, indicating that the two knowledge bases contain complementary information. The maximum is achieved with TS vectors of at most 700 words (with 49.3% precision, 49.2% recall and 49.2% F1). In fact, the resulting combination is very close to the most frequent sense baselines. This indicates that the resulting large-scale knowledge base almost encodes the knowledge necessary to behave as a most-frequent-sense tagger.</Paragraph> </Section> <Section position="3" start_page="538" end_page="539" type="sub_section"> <SectionTitle> 4.3 Senseval-3 system performances </SectionTitle> <Paragraph position="0"> For the sake of comparison, tables 7 and 8 present the F1 measures of the fine-grained results for nouns of the Senseval-3 lexical sample task for the best and worst unsupervised and supervised systems, respectively. We also include in these tables some of the baselines and the best-performing combination of knowledge resources (including TSWEB and MCR-spBNC)11. Regarding the knowledge resources evaluated in this study, the best combination (including TSWEB and MCR-spBNC) achieves an F1 measure much better than those of several supervised and unsupervised systems, and close to the most frequent sense of WordNet (WN-MFS) and the most frequent sense of the training corpora (TRAIN-MFS).</Paragraph> <Paragraph position="1"> 11Although we maintain the classification of the organizers, the system s3 wsdiit used the training data.</Paragraph> <Paragraph position="2"> We must recall that the main goal of this research is to establish a clear and neutral view of the relative quality of the available knowledge resources, not to provide the best WSD algorithm using these resources.
Obviously, much more sophisticated WSD systems using these resources could be devised.</Paragraph> </Section> </Section> <Section position="6" start_page="539" end_page="540" type="metho"> <SectionTitle> 5 Quality Assessment </SectionTitle> <Paragraph position="0"> Summarizing, this study provides empirical evidence for the relative quality of publicly available large-scale knowledge resources. The relative quality has been measured indirectly, in terms of precision and recall, on a WSD task.</Paragraph> <Paragraph position="1"> The study empirically demonstrates that automatically acquired knowledge bases clearly surpass, in terms of both precision and recall, the knowledge manually encoded in WordNet (using relations expanded to one, two or three levels). Surprisingly, the knowledge contained in the MCR (WordNet, eXtended WordNet, and the selectional preferences acquired automatically from SemCor) is of better quality than the automatically acquired Topic Signatures. In fact, the knowledge resulting from the combination of all these large-scale resources outperforms each resource individually, indicating that these knowledge bases contain complementary information. Finally, we should remark that the resulting combination is very close to the most frequent sense classifiers.</Paragraph> <Paragraph position="2"> Regarding the automatic acquisition of large-scale Topic Signatures, it seems that those acquired from the web are slightly better than those acquired from smaller corpora (for instance, the BNC). It also seems that queryW performs better than queryA, but that the two methods produce complementary knowledge.</Paragraph> <Paragraph position="3"> Finally, it seems that the weights are not useful for measuring the strength of a vote (they are only useful for ordering the words in the Topic Signature).</Paragraph> </Section> </Paper>