File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/w00-0802_metho.xml
Size: 27,755 bytes
Last Modified: 2025-10-06 14:07:25
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0802"> <Title>Sense clusters for Information Retrieval: Evidence from Semcor and the EuroWordNet InterLingual Index</Title> <Section position="4" start_page="10" end_page="14" type="metho"> <SectionTitle> PLANT ORGAN </SectionTitle> <Paragraph position="0"> The plant/food rule successfully relates senses 2 and 1, while for Information Retrieval the interesting cluster is for senses 2 and 3 (both botanical terms).</Paragraph> <Paragraph position="1"> Our hypothesis is, therefore, that we cannot assume general clustering criteria; different NLP applications require different clustering criteria that are difficult to reconcile in a single clustering approach. Our work on clustering is centered on identifying sense distinctions that could be relevant from an Information Retrieval and Cross-Language Information Retrieval point of view.</Paragraph> <Paragraph position="2"> The next section describes a clustering strategy suited to the Information Retrieval criterion: cluster senses if they tend to co-occur in the same Semcor documents.</Paragraph> <Paragraph position="3"> In Section 3, we study a different clustering criterion, based on equivalent translations for two or more senses in other wordnets from the EuroWordNet database. This is a direct criterion to cluster senses in Machine Translation or Cross-Language Text Retrieval. Then we measure the overlap between both criteria, to conclude that the EWN InterLingual Index is also a valuable source of evidence for Information Retrieval clusters.</Paragraph> <Paragraph position="4"> 2 Cluster evidence from Semcor One of our goals within the EuroWordNet and ITEM projects was to provide sense clusterings for WordNet (and, in general, for the EuroWordNet InterLingual Index (Gonzalo et al., 1999)) that leave only the sense distinctions in wordnets that indicate different (semantic) indexing units for Information Retrieval. 
Our first lexicographic examination of WordNet sense distinctions and clusterings following criteria based on the wordnet hierarchy did not produce clear criteria to classify senses semi-automatically according to this requirement. As we mentioned before, the clusters applied on the EWN InterLingual Index, which relied solely on hierarchical information in WordNet, produced a slight decrease in retrieval performance in an experiment using ILI records as indexing units.</Paragraph> <Paragraph position="5"> Thus, we decided to stick to our only clear-cut criterion: cluster senses if they are likely to co-occur in the same document. The fact that the same sense combination occurs in several semantically tagged documents should provide strong evidence for clustering. Fortunately, we had the Semcor corpus of semantically tagged documents to start with.</Paragraph> <Paragraph position="6"> For example, the first two senses of &quot;breath&quot; co-occur in several Semcor documents: Breath 1. (the air that is inhaled or exhaled in respiration) 2. (the act of exhaling) This co-occurrence indicates that this sense distinction does not help to discriminate different document contexts. While in this particular example there is a clear relation between senses (sense 1 is involved in the action specified in sense 2), it seems extremely difficult to find general clustering techniques based on the WordNet hierarchy to capture all potential IR clusters.</Paragraph> <Paragraph position="7"> We have scanned Semcor in search of sets of (two or more) senses that co-occur frequently enough. In practice, we started with a threshold of at least 2 documents (out of 171) with the co-occurring senses in a similar distribution. We did not use the original Semcor files, but the IR-Semcor partition (Gonzalo et al., 1998) that splits multi-text documents into coherent retrieval chunks. 
We completed this list of candidates to cluster with pairs of senses that only co-occur once but belong to any &quot;cousin&quot; combination (Peters et al., 1998). Finally, we obtained 507 sets of sense combinations (producing more than 650 sense pairs) for which Semcor provides positive evidence for clustering. A manual verification of 50 of these clusters showed that more than 70% of them were useful. We also noticed that, as the threshold (the number of documents in which the senses co-occur) is raised, the error rate decreases quickly.</Paragraph> <Paragraph position="8"> Then we worked with this set of positive IR clusters, trying to identify a set of common features that could be used to cluster the rest of WN/EWN senses. However, it seems extremely difficult to find any single criterion common to all clusters. For instance, if we consider a) number of variants in common between the synsets corresponding to the candidate senses; b) number of words in common between the glosses; and c) common hypernyms, we find that any combination of values for these three features is likely to be found among the set of clusters inferred from Semcor.</Paragraph> <Paragraph position="9"> For example: fact 1. a piece of information about circumstances that exist or events that have occurred; &quot;first you must collect all the facts of the case&quot; 2. a statement or assertion of verified information about something that is the case or has happened; &quot;he supported his argument with an impressive array of facts&quot; Number of documents in which they co-occur: 13 a) number of variants in common: 1 out of 1 b) (content) words in common between glosses: yes c) common hypernyms: no door 1. door - (a swinging or sliding barrier that will close the entrance to a room or building; &quot;he knocked on the door&quot;; &quot;he slammed the door as he left&quot;) 2. 
doorway, door, entree, entry, portal, room access - (the space in a wall through which you enter or leave a room or building; the space that a door can close; &quot;he stuck his head in the doorway&quot;) Number of documents in which they co-occur: 11 a) number of variants in common: 1 out of 6 b) (content) words in common between glosses: yes (also XPOS: enter/entrance) c) common hypernyms: yes way 1. manner, mode, style, way, fashion - (a manner of performance; &quot;a manner of living&quot;; &quot;a way of life&quot;) 2. means, way - (how a result is obtained or an end is achieved; &quot;a means of communication&quot;; &quot;the true way to success&quot;) Number of documents in which they co-occur: 9 a) number of variants in common: 1 out of 6 b) (content) words in common between glosses: no c) common hypernyms: no The next logical step is to use this positive evidence, combined with negative co-occurrence evidence, in training some machine learning system that can successfully capture the regularities hidden from our manual inspection. In principle, a binary classification task would be easy to capture by decision trees or similar techniques.</Paragraph> <Paragraph position="10"> Therefore, we have also extracted from Semcor combinations of senses that appear frequently enough in Semcor, but never appear together in the same document. The threshold was set at a minimum of 8 occurrences of each sense in Semcor, resulting in more than 500 negative clusters. A manual verification of 50 of these negative clusters showed that about 80% of them were acceptable for Information Retrieval as senses that should be distinguished. Together with the positive evidence, we have more than 1100 training cases for a binary classifier. 
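As an illustration only, and not the procedure actually run on Semcor, the extraction of positive and negative evidence described above can be sketched in Python. The input format (one counter of sense tags per document), the function name cluster_evidence and the toy counts are assumptions introduced for this sketch; the requirement of a similar distribution and the additions based on &quot;cousin&quot; combinations are not modelled.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cluster_evidence(docs, min_docs=2, min_occurrences=8):
    """Split same-word sense pairs into positive and negative evidence.

    docs maps a document id to a Counter of (word, sense_number) tags.
    This input format is an assumption made for the sketch.
    """
    docs_with_pair = Counter()    # (word, a, b) -> documents containing both senses
    occurrences = Counter()       # (word, sense) -> occurrences in the whole corpus
    senses_of = defaultdict(set)  # word -> senses seen anywhere in the corpus

    for tags in docs.values():
        senses_in_doc = defaultdict(set)
        for (word, sense), count in tags.items():
            occurrences[(word, sense)] += count
            senses_of[word].add(sense)
            senses_in_doc[word].add(sense)
        for word, senses in senses_in_doc.items():
            for a, b in combinations(sorted(senses), 2):
                docs_with_pair[(word, a, b)] += 1

    # Positive evidence: the two senses share at least min_docs documents.
    positive = {pair for pair, n in docs_with_pair.items() if n >= min_docs}

    # Negative evidence: both senses are frequent but never share a document.
    negative = set()
    for word, senses in senses_of.items():
        for a, b in combinations(sorted(senses), 2):
            frequent = min(occurrences[(word, a)], occurrences[(word, b)]) >= min_occurrences
            if frequent and docs_with_pair[(word, a, b)] == 0:
                negative.add((word, a, b))
    return positive, negative


# Toy corpus (invented counts, for illustration only).
docs = {
    "d1": Counter({("breath", 1): 5, ("breath", 2): 1, ("spring", 1): 4}),
    "d2": Counter({("breath", 1): 4, ("breath", 2): 2, ("spring", 3): 8}),
    "d3": Counter({("spring", 1): 4}),
}
positive, negative = cluster_evidence(docs)
```

On this toy input, the two senses of breath form a positive pair, while senses 1 and 3 of spring, each occurring 8 times but never in the same document, form a negative pair.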
Our plan is to apply this classifier to the whole EWN InterLingual Index, and then perform precision/recall tests in the environment described in (Gonzalo et al., 1998; Gonzalo et al., 1999).</Paragraph> <Paragraph position="11"> 3 Cluster evidence from the ILI When translated into a target language, sense distinctions of a word may be lexicalized as different words. For instance, the English term spring is translated into Spanish as primavera in its &quot;season&quot; sense, as muelle in its &quot;metal device&quot; sense, or as fuente in its &quot;fountain&quot; sense. For an English-Spanish Machine Translation system, it is crucial to distinguish these three senses of spring. But it is also common for two or more senses of a word to be translated into the same word in one or more languages. For instance, child as &quot;human offspring (son or daughter) of any age&quot; and child as &quot;young male person&quot; are both translated into &quot;niño&quot; in Spanish, into &quot;enfant&quot; in French, and into &quot;kind&quot; in German. We will use the term &quot;parallel polysemy&quot; to refer to this situation in the rest of this article.</Paragraph> <Paragraph position="12"> Obviously, a Machine Translation system does not need to distinguish these two senses. But it is also tempting to hypothesize that the existence of parallel polysemy in two or more target languages may indicate that the two senses are close enough to be clustered in more applications. Indeed, in (Resnik and Yarowsky, 1999) this criterion is proposed to determine which word senses should be retained or discarded in a testbed for automatic Word Sense Disambiguation. In particular, our goal has been to test whether two or more senses of a word are likely to be clustered, for IR purposes, if they have parallel polysemy in a certain number of languages via the EuroWordNet InterLingual Index. 
If the answer is positive, then the InterLingual Index, with eight languages interconnected, would be a rich source of information to provide IR clusters. In EWN, each monolingual database is linked, via Cross-Language equivalence relations, to the InterLingual Index (ILI), which is the superset of all concepts occurring in all languages. The ILI permits finding equivalent synsets between any pair of languages included in the database. For instance, senses 1 and 2 of child are translated into Spanish, French and German as follows: Child child 1 → {child, kid} - (a human offspring (son or daughter) of any age; &quot;they had three children&quot;; &quot;they were able to send their kids to college&quot;) child 2 → {male child, boy, child} - (a young male person; &quot;the baby was a boy&quot;; &quot;she made the boy brush his teeth every night&quot;) Spanish: {child, kid} EQ-SYNONYM {niño, crío, menor} {male child, boy, child} EQ-SYNONYM {niño} French: {child, kid} EQ-SYNONYM {enfant, mineur} {male child, boy, child} EQ-SYNONYM {enfant} German: {child, kid} EQ-SYNONYM {kind} {male child, boy, child} EQ-SYNONYM {kind, spross} Note that child 1 and child 2 have parallel translations in all three languages: Spanish (niño), French (enfant) and German (kind). In this case, this criterion successfully detects a pair of senses that could be clustered for Information Retrieval purposes.</Paragraph> <Paragraph position="13"> In order to test the general validity of this criterion, we have followed these steps: * Select a set of nouns for a full manual study.</Paragraph> <Paragraph position="14"> We have chosen the set of 22 nouns used in the first SENSEVAL competition (Kilgarriff and Palmer, 2000). This set satisfied our requirements of size (small enough for an exhaustive manual revision), reasonable degree of polysemy, and lack of bias for our testing purposes (the criteria used to select these 22 nouns were obviously independent of our experiment). 
We had to reduce the original set to 20 nouns (corresponding to 73 EWN senses), as the other two nouns were polysemous in the Hector database used for SENSEVAL, but monosemous in WordNet 1.5 and EuroWordNet. As target languages we chose Spanish, French, Dutch and German.</Paragraph> <Paragraph position="15"> * Extract the candidate senses that satisfy the parallel polysemy criterion, in three variants: - Experiment 1: sets of senses that have parallel translations in at least two out of the four target languages.</Paragraph> <Paragraph position="16"> - Experiment 2: sets of senses that have parallel translations in at least one out of the four target languages. This is a softer constraint that produces a superset of the sense clusters obtained in Experiment 1.</Paragraph> <Paragraph position="17"> - Experiment 3: sets of senses whose synsets are mapped into the same target synset for at least one of the target languages. This criterion cannot be tested on plain multilingual dictionaries, only on EWN-like semantic databases.</Paragraph> <Paragraph position="18"> * Check manually whether the clusters produced in Experiments 1-3 are valid for Information Retrieval. At this step, the validity of clusters was checked by a human judge. Unfortunately, we have not yet had the chance to attest the validity of these judgments using more judges and extracting inter-annotator agreement rates. We could compare annotations only on a small fraction of cases (15 sense pairs), which we used to make the criterion &quot;valid for IR&quot; precise enough for reliable annotation. The results are reported in sections 3.2-3.4 for the different experiments.</Paragraph> <Paragraph position="19"> * Identify all possible lexicographic reasons behind a parallel polysemy, taking advantage of the previous study. 
This is reported in the next section.</Paragraph> <Paragraph position="20"> * Check how many clusters obtained from Semcor also satisfy the parallel translation criterion, to have an idea of the overlap between both (section 3.5).</Paragraph> <Paragraph position="21"> * Finally, study whether the results have a dependency on possible incompleteness or inadequacy of the InterLingual Index (section 3.6).</Paragraph> <Section position="1" start_page="13" end_page="14" type="sub_section"> <SectionTitle> 3.1 Typology of parallel polysemy </SectionTitle> <Paragraph position="0"> Parallel polysemy can also be a sign of some systematic relation between the senses. As stated in (Seto, 1996), &quot;(..) There often is a one-to-one correspondence between different languages in their lexicalization behaviour towards metonymy, in other words, metonymically related word senses are often translated by the same word in other languages&quot;. But the reasons for parallel polysemy are not limited only to systematic polysemy. In the case of the EWN database, we have distinguished the following causes: 1. There is a series of mechanisms of meaning extension, if not universal, at least common to several languages: (a) Generalization/specialization. For example, the following two senses for band: English: band; French: groupe; German: Band, Musikgruppe 1. Instrumentalists not including string players 2. A group of musicians playing popular music for dancing Sense 1 is a specialization of Sense 2, and this pattern is repeated in French and German.</Paragraph> <Paragraph position="1"> (b) Metonymic relations. Some of them form already well-known systematic polysemy patterns. As for applicability to IR, we should be able to discriminate regular polysemy rules that provide valid IR clusters from those that contain senses that cannot be interpreted simultaneously within the same document. Examples include: English: glass; Spanish: vaso 1. container 2. 
quantity which is a valid IR cluster, and which should be distinguished for IR. (c) Metaphors. This kind of semantic relation usually does not produce good IR clusters, because senses related by means of metaphor usually belong to different semantic fields and, consequently, tend to occur in distinct documents. For example: English: giant; Spanish: coloso; French: colosse; Dutch: kolossus 1. a person of exceptional importance and reputation 2. someone who is abnormally large (d) Semantic calque or loan translation. A (probably metaphorical) sense extension is copied in other languages. It can also produce undesirable clusters for IR, because the original relation between the two senses involved can be based on a metaphor. For example: English: window; Spanish: ventana; Dutch: venster.</Paragraph> <Paragraph position="2"> 1. an opening in the wall of a building to admit light and air 2. a rectangular part of a computer screen that is a display different from the rest of the screen The original computer sense for window is also adopted in Spanish and Dutch</Paragraph> <Paragraph position="3"> for the corresponding words ventana and venster.</Paragraph> <Paragraph position="4"> 2. On certain occasions, the particularities of how the wordnets have been built semi-automatically lead to a mimesis of the WN 1.5 senses and, consequently, to parallel polysemy in several languages. These sense distinctions are not incorrect, but perhaps would be different if every monolingual wordnet had been constructed without WN 1.5 as a reference for semi-automatic extraction of semantic relations. An example: Behaviour: 1. Manner of acting or conducting oneself (Spanish: comportamiento, conducta; French: comportement, conduite) 2. 
(psychology) the aggregate of the responses or reaction or movements made by an organism in any situation (Spanish: comportamiento, conducta; French: comportement)</Paragraph> </Section> </Section> <Section position="5" start_page="14" end_page="17" type="metho"> <SectionTitle> 3. Behavioural attributes </SectionTitle> <Paragraph position="0"> (Spanish: comportamiento, conducta; French: comportement) The question is what classes of parallel polysemy are dominant in EWN, and then whether parallel polysemy can be taken as a strong indication of a potential IR cluster. A preliminary answer to this question is reported in the next sections.</Paragraph> <Section position="1" start_page="14" end_page="15" type="sub_section"> <SectionTitle> 3.2 Experiment 1 </SectionTitle> <Paragraph position="0"> Here we selected all sense combinations, in our test set of 20 English nouns, that had parallel translations in at least two of the four target languages considered (Spanish, French, Dutch and German).</Paragraph> <Paragraph position="1"> We found 10 clusters: 6 were appropriate for Information Retrieval, 3 were judged inappropriate, and one was due to an error in the database: Valid IR clusters band 1,2: something elongated, worn around the body or one of the limbs / a strip or stripe of a contrasting color or material (mapped into two different synsets in Spanish and French) band 2,5: a strip or stripe of a contrasting color or material / a stripe of a contrasting color (mapped into different synsets in Spanish and French; only one translation into Dutch.) 
band 8,9: instrumentalists not including string players / a group of musicians playing popular music for dancing (linked to the same synset in German and in Dutch) behaviour 1,2,3: manner of acting or conducting oneself / (psychology) the aggregate of the responses or reaction or movements made by an organism in any situation / behavioural attributes (two senses are sisters, and in general the distinction is not easy to understand; in two cases the Dutch synset is the same, and there is no Dutch translation for the other. In Spanish there are three synsets that mimic the English ones).</Paragraph> <Paragraph position="2"> bet 1,2: act of gambling / money risked (metonymy relation, translated into different synsets in Spanish and French. One or both translations missing for the other languages) excess 3,4: surplusage / overabundance (different synsets in Spanish and French, one or both translations missing in the other languages).</Paragraph> <Paragraph position="3"> Inappropriate clusters giant 5,6: a person of exceptional importance / someone who is abnormally large (metaphoric relation; linked to the same synset in Dutch, and to different synsets in Spanish and French) giant 5,7: a person of exceptional importance / a very large person (metaphoric relation; linked to different synsets in Dutch and German) rabbit 1,2: mammal / meat (systematic polysemy; linked to different synsets in Spanish, German and French).</Paragraph> <Paragraph position="4"> Erroneous cluster steer 1,2: castrated bull / hint, indication of potential opportunity. Both are translated into &quot;buey&quot; in Spanish and into &quot;stierkalf&quot; in Dutch. Only the &quot;castrated bull&quot; → &quot;buey&quot; link is appropriate. 
</Paragraph> </Section> <Section position="2" start_page="15" end_page="15" type="sub_section"> <SectionTitle> 3.3 Experiment 2 </SectionTitle> <Paragraph position="0"> If we take all clusters that have a parallel translation in at least one target language (rather than two target languages as in Experiment 1), we obtain a larger set of 27 clusters. The 17 new clusters have the following distribution: * 9 valid clusters, such as bother 1,2 (something that causes trouble / angry disturbance).</Paragraph> <Paragraph position="1"> * 3 inappropriate clusters that relate homonyms, such as band 2,7 (strip or stripe of a contrasting color or material / unofficial association of people).</Paragraph> <Paragraph position="2"> * 4 inappropriate clusters that group metonymically related senses, such as sanction 2,3 (penalty / authorization).</Paragraph> <Paragraph position="3"> * 1 inappropriate cluster based on a metaphor: steering 2,3 (act of steering and holding the course / guiding, guidance) Overall, we have 15 valid clusters, 11 inappropriate, and one error. The percentage of useful predictions is 56%, only slightly worse than the 60% (6 out of 10) obtained under the tighter constraint of Experiment 1. It is worth noticing that: 1. The parallel translation criterion obtained 27 potential clusters for 20 nouns, nearly one and a half clusters per noun. The criterion is very productive! 2. The percentage of incorrect clusters (41%) is high enough to suggest that parallel polysemy cannot be taken as a golden rule to cluster close senses, at least with the languages studied. Indeed, 3 of the negative cases were homonyms, that is, totally unrelated senses. Perhaps the general WSD clustering criterion proposed in (Resnik and Yarowsky, 1999) needs to be revised for a specific application such as IR. For instance, they argue that clusters based on parallel polysemy &quot;would eliminate many distinctions that are arguably better treated as regular polysemy&quot;. 
But we have seen that regular polysemy may lead to sense distinctions that are important to keep in an Information Retrieval application. On the other hand, the results reported in (Resnik and Yarowsky, 1999) suggest that we would obtain better clusters if the parallel polysemy criterion were tested on more distant languages, such as Japanese or Basque to test English sense distinctions.</Paragraph> </Section> <Section position="3" start_page="15" end_page="16" type="sub_section"> <SectionTitle> 3.4 Experiment 3 </SectionTitle> <Paragraph position="0"> In this experiment, which cannot be done with a multilingual dictionary, we looked for sense distinctions that are translated into the same synset for some target language. This is direct evidence of sense relatedness (both senses point to the same concept in the target language), although the relation may be complex (for instance, one of the two senses might be translated as an EQ-HYPONYM).</Paragraph> <Paragraph position="1"> We found 9 clusters satisfying the criterion, all of them for links to the Dutch wordnet. 5 sense combinations are valid IR clusters. Three combinations turned out to be inappropriate for the needs of IR (accident 1,2: chance / misfortune; steering 2,3: the act of steering and holding the course / guiding, guidance; giant 5,6: a person of exceptional importance / someone who is abnormally large). Finally, the erroneous cluster for steer 1 (castrated bull) and steer 2 (hint, an indication of potential opportunity) reappeared. The results for the three experiments are summarized in Table 1. 
It seems that the parallel polysemy criterion on the ILI can be a very rich source of information to cluster senses for IR, but it is also obvious that it needs to be refined or manually revised to obtain high quality clusters.</Paragraph> </Section> <Section position="4" start_page="16" end_page="16" type="sub_section"> <SectionTitle> 3.5 Overlapping of criteria from Semcor to ILI </SectionTitle> <Paragraph position="0"> To complete the evidence for correlation between Semcor-based clusters and ILI-based clusters, we studied two subsets of Semcor-based clusters to check if they matched the parallel polysemy criteria on the ILI. The first set consisted of the 11 sense combinations with a co-occurrence frequency above 7 in Semcor. 10 out of 11 (91%) also satisfy the most restrictive criterion used in Experiment 1, again indicating a strong correlation between both criteria. Then we augmented the set of sense combinations to 50, with co-occurrence frequencies above 2. This time, 27 clusters matched the criterion in Experiment 2 (54%). As the evidence for Semcor clustering decreases, the criterion of parallel translations is also less reliable, again indicating a correlation between both.</Paragraph> </Section> <Section position="5" start_page="16" end_page="17" type="sub_section"> <SectionTitle> 3.6 Adequacy of the ILI to get translation clusters </SectionTitle> <Paragraph position="0"> Clustering methods based on the criterion of parallel translation depend, to a great extent, on the adequacy and quality of the lexical resources used. How many ILI clusters would we have obtained from an EWN database with total coverage and no errors? Our experiments, though limited, are a first indication of the utility of EWN for this task: * Analyzing 73 WN senses corresponding to 20 nouns used in SENSEVAL, we found 2 erroneous equivalence links in the Spanish and Dutch wordnets. 
Taking into account that EWN was built by semi-automatic means, this seems a low error rate.</Paragraph> <Paragraph position="1"> Only 16 senses out of 73 have equivalence links in the 4 selected wordnets. 19 senses have equivalence links in 3 languages, 21 senses in 2 languages, 9 in only one language and 6 have no equivalence links in any of the selected wordnets. The lack of equivalence links can sometimes be explained by the lack of lexicalized terms for a certain WN concept. For example, float 2 (a drink with ice-cream floating in it) is not lexicalized in Spanish, so we should not expect an equivalence link for this sense in the Spanish wordnet. In many other cases, though, the lack of equivalence links is due to incompleteness in the database.</Paragraph> <Paragraph position="2"> Each monolingual wordnet reflects, to a large extent, the kind of Machine-Readable resources used to build it. The Spanish wordnet was built mainly from bilingual dictionaries and therefore is closer to the WN 1.5 structure. The French wordnet departed from an ontology-like database, and thus some non-lexicalized expressions are still present (for instance, float 2 has soda_avec_un_boule_de_glace as French equivalent). The Dutch wordnet departed from a lexical database rich in semantic information, thus it departs more from the WordNet structure, has a richer connectivity and complex links into the InterLingual Index, etc. Cross-Language equivalence relations are not, therefore, totally homogeneous in EWN.</Paragraph> <Paragraph position="3"> Overall, however, the ILI seems perfectly suitable for automatic applications regarding multilingual sense mappings. In particular, the fine-grainedness of WordNet and EuroWordNet, in spite of its lack of popularity among NLP researchers, may be an advantage for NLP applications, as it may suit different clusterings for different application requirements.</Paragraph> </Section> </Section> </Paper>
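The parallel translation criterion of Experiments 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure run on the EWN database: the dictionary layout, the function names parallel_languages and candidate_clusters, and the sense labels for spring are hypothetical. The translations for child are the ones listed in Section 3; Experiment 3, which compares target synsets rather than words, is not covered.

```python
from itertools import combinations


def parallel_languages(translations, sense_a, sense_b):
    """Return the languages in which both senses share at least one translation."""
    shared = []
    for language, by_sense in translations.items():
        words_a = by_sense.get(sense_a, set())
        words_b = by_sense.get(sense_b, set())
        if words_a.intersection(words_b):
            shared.append(language)
    return shared


def candidate_clusters(translations, senses, min_languages):
    """Sense pairs with parallel translations in at least min_languages languages."""
    return [
        (a, b)
        for a, b in combinations(senses, 2)
        if len(parallel_languages(translations, a, b)) >= min_languages
    ]


# Hypothetical layout: language -> sense label -> set of translations.
# The labels "spring 1" and "spring 2" are invented for this example.
translations = {
    "Spanish": {
        "child 1": {"niño", "crío", "menor"},
        "child 2": {"niño"},
        "spring 1": {"primavera"},
        "spring 2": {"muelle"},
    },
    "French": {"child 1": {"enfant", "mineur"}, "child 2": {"enfant"}},
    "German": {"child 1": {"kind"}, "child 2": {"kind", "spross"}},
}

# Experiment 1 style threshold: at least two languages.
experiment_1 = candidate_clusters(translations, ["child 1", "child 2"], min_languages=2)
# Experiment 2 style threshold: at least one language.
experiment_2 = candidate_clusters(translations, ["spring 1", "spring 2"], min_languages=1)
```

With these toy data, the two senses of child are proposed as a cluster under the stricter threshold, while the two senses of spring are not proposed even under the softer one, since Spanish lexicalizes them differently.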