<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3811">
  <Title>Synonym Extraction Using a Semantic Distance on a Dictionary</Title>
  <Section position="3" start_page="65" end_page="67" type="metho">
    <SectionTitle>
3 Synonym extraction
</SectionTitle>
    <Paragraph position="0"> We used for the experiment the XML tagged MRD Tresor de la Langue Francaise informatise (TLFi) from ATILF (http://atilf.atilf.fr/), a large French dictionary with 54,280 articles, 92,997 entries and 271,166 definitions. The extraction of synonyms has been carried out only for nouns, verbs and adjectives. The basic assumption is that words with semantically close definitions are likely to be synonyms. We then designed a oriented graph that brings closer definitions that contain the same words, especially when these words occur in the beginning. We selected the noun, verb and adjective definitions from the dictionary and created a record for each of them with the information relevant to the building of the graph: the word or expression being defined (henceforth, definiendum); its grammatical category; the hierarchical position of the defined (sub-)sense in the article; the definition proper (henceforth definiens).</Paragraph>
    <Paragraph position="1"> Definitions are made of 2 members: a definiendum and a definiens and we strongly distinguish these 2 types of objects in the graph. They are represented by 2 types of nodes: a-type nodes for the words being defined and for their sub-senses; o-type nodes for the words that occur in definiens.</Paragraph>
    <Paragraph position="2"> For instance, the noun nostalgie 'nostalgia' has 6 defined sub-senses numbered A.1, A.2, B., C., C. - and D.:  NOSTALGIE, subst. fem.</Paragraph>
    <Paragraph position="3"> A. 1. Etat de tristesse [...] 2. Trouble psychique [...] B. Regret melancolique [...] desir d'un retour dans le passe.</Paragraph>
    <Paragraph position="4"> C. Regret melancolique [...] desir insatisfait.</Paragraph>
    <Paragraph position="5"> - Sentiment d'impuissance [...]  D. Etat de melancolie [...] The 6 sub-senses yield 6 a-nodes in the graph plus one for the article entry: a.S.nostalgie article entry</Paragraph>
    <Paragraph position="7"> A-node tags have 4 fields: the node type (namely a); its grammatical category (S for nouns, V for verbs and A for adjectives); the lemma that correponds to the definiendum; a representation of the hierarchical position of the sub-sense in the dictionary article. For instance, the A. 2. sub-sense of nostalgie corresponds to the hierarchical position 1_2.</Paragraph>
    <Paragraph position="8"> O-nodes represent the types that occur in definiens.1 A second example can be used to present them. The adjective jonceux 'rushy' has two sub-senses 'resembling rushes' and 'populated with rushes': Jonceux, -euse, a) Qui ressemble au jonc.</Paragraph>
    <Paragraph position="9"> b) Peuple de joncs.</Paragraph>
    <Paragraph position="10"> Actually, TLFi definitions are POS-tagged and lemmatized: null</Paragraph>
    <Paragraph position="12"> All the types that occur in definiens are represented, including the function words (pronouns, determiners...) and the punctuation. Function words play an important role in the graph because they bring closer the words that belong to the same semantical referential classes (e.g. the adjectives of resemblance), that is words that are likely to be synonyms. Their role is also reinforced by the manner edges are weighted.</Paragraph>
    <Paragraph position="13"> A large number of TLFi definitions concerns phrases and locutions. However, these definitions have been removed from the graph because: * their tokens are not identified in the definiens; * their grammatical categories are not given in the articles and are difficult to calculate; * many lexicalized phrases are not sub-senses of the article entry.</Paragraph>
    <Paragraph position="14"> O-node tags have 3 fields: the node type (namely o); the grammatical category of the word; its lemma.</Paragraph>
    <Paragraph position="15"> The oriented graph built for the experiment then contains one a-node for each entry and each entry sub-sense (i.e. each definiendum) and one o-node for each type that occurs in a definition (i.e. in a definiens). These nodes are connected as follows:  1. The graph is reflexive; 2. Sub-senses are connected to the words of their definiens and vice versa (e.g. there is an edge betweena.A.jonceux.1ando.Pro.qui, and another one between o.Pro.qui and a.A.jonceux.1).</Paragraph>
    <Paragraph position="16"> 3. Each a-node is connected to the a-nodes  of the immediately lower hierarchical level but there is no edge between an a-node and the a-nodes of higher hierarchical levels (e.g. a.S.nostalgie is connected to a.S.nostalgie.1_1, a.S.nostalgie.1_2, a.S.nostalgie.2, a.S.nostalgie.3 and a.S.nostalgie.4, but none of the sub-senses is connected to the entry).</Paragraph>
    <Paragraph position="17">  4. Each o-node is connected to the a-node that represents its entry, but there is no edge between the a-node representing an entry and the corresponding o-node (e.g. there is an edge between o.A.jonceux and a.A.jonceux, but none between a.A.jonceux and o.A.jonceux).</Paragraph>
    <Paragraph position="18"> All edge weights are 1 with the exception of the edges representing the 9 first words of each definiens. For these words, the edge weight takes into account their position in the definiens. The weight of the edge that represent the first token is 10; it is 9 for the second word; and so on down to  These characteristics are illustrated by the fragment of the graph representing the entry jonceux in table</Paragraph>
  </Section>
  <Section position="4" start_page="67" end_page="69" type="metho">
    <SectionTitle>
4 Experiment and results
</SectionTitle>
    <Paragraph position="0"> Once the graph built, we used Prox to compute a semantic similarity between the nodes. We first turned the matrix G that represent the graph into a Markovian matrix [ ^G] as described in section 2 and then computed [ ^G]5, that correspond to 5-steps paths in the Markovian graph.4 For a given word, we have extracted as candidate synonyms the a-nodes (i) of the same category as the word (ii) that are the closest to the o-node representing that word in the dictionary definitions. Moreover, only the first a-node of each entry is considered. For instance, the candidate synonyms of the verb accumuler 'accumulate' are the a-nodes representing verbs (i.e. their tags begin in a.V) that are the closer to the o.V.accumuler node.</Paragraph>
    <Paragraph position="1">  The three first groups take advantage of the fact that synonyms of the definiendum are often used in definiens.</Paragraph>
    <Paragraph position="2"> The question of the evaluation of the extraction of synonyms is a difficult one, as was already mentioned in the introduction. We have at our disposal several thesauri for French, with various coverages (from about 2000 pairs of synonyms, to 140,000), and a lot of discrepancies.5 If we compare the thesaurus with each other and restrict the comparison to their common lexicon for fairness, we still have a lot of differences. The best f-score is never above 60%, and it raises the question of the proper gold standard to begin with. This is all the more distressing as the dictionary we used has a larger lexicon than all the thesaurus considered together (roughly twice as much). As our main purpose is to build a set of synonyms from the TLF to go beyond the available thesaurus, we have no other way but to have lexicographers look at the result and judge the quality of candidate synonyms. Before imposing this workload on our lexicographer colleagues, we took a sample of 50 verbs and 50 nouns, and evaluated the first ten candidates for each, using the ranking method presented above, and a simpler version with equal weights and no distinction between sense levels or node types. The basic version of the graph also excludes nodes with too many neighbours, such as &amp;quot;etre&amp;quot; (be), &amp;quot;avoir&amp;quot; (have), &amp;quot;chose&amp;quot; (thing), etc. ). Two of the authors separately evaluated the candidates, with the synonyms from the existing thesauri  much more liberal than the other about synonymy, but most synonyms accepted by the first were accepted by the second judge (precision of 0.85).6 We also considered a few baselines inspired by the method. Obviously a lot of synonyms appear in the definition of a word, and words in a definition tend to be consider close to the entry they appear in. So we tried two different baselines to estimate this bias, and how our method improves or not from this.</Paragraph>
    <Paragraph position="3"> The first baseline considers as synonyms of a word all the words of the same category (verbs or nouns in each case) that appear in a definition of the word, and all the entry the word appear in. Then we selected ten words at random among this base.</Paragraph>
    <Paragraph position="4"> The second baseline was similar, but restricted to the first word appearing in a definition of another word.</Paragraph>
    <Paragraph position="5"> Again we took ten words at random in this set if it was larger than ten, and all of them otherwise.</Paragraph>
    <Paragraph position="6"> We show the results of precision for the first candidate ranked by prox, the first 5, and the first 10 (always excluding the word itself). In the case of the two baselines, results for the first ten are a bit  both verbs and nouns, which only moderately satisfactory. misleading, since the average numbers of candidates proposed by these methods were respectively 8 and 6 for verbs and 9 and 5.6 for nouns (Table 2). Also, nouns had an average of 5.8 synonyms in the existing thesauri (when what was considered was the min between 10 and the number of synonyms), and verbs had an average of 8.9.</Paragraph>
    <Paragraph position="7"> We can see that both baselines outperforms weighted prox on the existing thesaurus for verbs, and that the simpler prox is similar to baseline 2 (first word only). For nouns, results are close between B2 and the two proxs. It is to be noted that a lot of uncommon words appear as candidates, as they are related with very few words, and a lot of these do not appear in the existing thesauri.</Paragraph>
    <Paragraph position="8"> By looking precisely at each candidate (see judges' scores), we can see that both baselines are slightly improved (and still close to one another), but are now beaten by both prox for the first and the first 5 words. There is a big difference between the two judges, so Judge 2 has better scores than Judge 1 for the baselines, but in each case, prox was better. It could be troubling to see how good the second base-line is for the first 10 candidates, but one must remember this baseline actually proposes 6 candidates on average (when prox was always at 10), making it actually nothing more than a variation on the 5  candidate baseline, to which it should be compared in all fairness (and we see that prox is much better there). The difference between the two versions of prox shows that a basic version is better for verbs and the more elaborate one is better for nouns, with overall better results for verbs than for nouns.</Paragraph>
    <Paragraph position="9"> One could wonder why there was some many more candidates marked as synonyms by both judges, compared to the original compilation of thesaurus.</Paragraph>
    <Paragraph position="10"> Mainly, it seemed to us that it can be accounted for by a lot of infrequent words, or old senses of words absent for more restricted dictionaries. We are currently investigating this matter. It could also be that our sample picked out a lot of not so frequent words since they outnumber frequent words in such a large dictionary as the TLF. An indication is the average frequency of words in a corpus of ten years of the journal &amp;quot;Le Monde&amp;quot;. The 50 words picked out in our sample have an average frequency of 2000 occurrences, while when we consider all our about 430 candidates for synonymy, the average frequency is 5300.</Paragraph>
    <Paragraph position="11"> The main conclusion to draw here is that our method is able to recover a lot of synonyms that are in the definition of words, and some in definitions not directly related, which seems to be an improvement on previous attempts from dictionaries. There is some arbitrariness in the method that should be further investigated (the length of the random walk for instance), but we believe the parameters are rather intuitive wrt to graph concepts. We also have an assessment of the quality of the method, even though it is still on a sample. The precision seems fair on the first ten candidates, enough to be used in a semi-automatic way, coupled with a lexicographic analysis. null</Paragraph>
  </Section>
  <Section position="5" start_page="69" end_page="70" type="metho">
    <SectionTitle>
5 Related work
</SectionTitle>
    <Paragraph position="0"> Among the methods proposed to collect synonymy information, two families can be distinguished according to the input they consider. Either a general dictionary is used (or more than one (Wu and Zhou, 2003)), or a corpus of unconstrained texts from which lexical distributions are computed (simple collocations or syntactic dependencies) (Lin, 1998; Freitag et al., 2005) . The approach of (Barzilay and McKeown, 2001) uses a related kind of resource: multiple translations of the same text, with additional constraints on availability, and problems of text alignment, for only a third of the results being synonyms (when compared to Wordnet).</Paragraph>
    <Paragraph position="1"> A measure of similarity is almost always used to rank possible candidates. In the case of distributional approaches, similarity if determined from the appearance in similar contexts (Lin, 1998); in the case of dictionary-based methods, lexical relations are deduced from the links between words expressed in definitions of entries.</Paragraph>
    <Paragraph position="2"> Approaches that rely on distributional data have two major drawbacks: they need a lot of data, generally syntactically parsed sentences, that is not always available for a given language (English is an exception), and they do not discriminate well among lexical relations (mainly hyponyms, antonyms, hypernyms) (Weeds et al., 2004) . Dictionary-based  approaches address the first problem since dictionaries are readily available for a lot of language, even electronically, and this is the raison d'etre of our effort. As we have seen here, it is not an obvious task to sort related terms with respect to synonymy, hypernymy, etc, just as with distribution approaches.</Paragraph>
    <Paragraph position="3"> A lot of work has been done to extract lexical relations from the definitions taken in isolation (mostly for ontology building), see recently (Nichols et al., 2005), with a syntactic/semantic parse, with usually results around 60% of precision (that can be compared with the same baseline we used, all words in the definition with the same category), on dictionaries with very small definitions (and thus a higher proportions of synonyms and hypernyms). Estimating the recall of such methods have not been done. Using dictionaries as network of lexical items or senses has been quite popular for word sense disambiguation (Veronis and Ide, 1990; H.Kozima and Furugori, 1993; Niwa and Nitta, 1994) before losing ground to statistical approaches, even though (Gaume et al., 2004; Mihalcea et al., 2004) tried a revival of such methods. Both (Ho and Cedrick, 2004) and (Blondel et al., 2004) build a graph of lexical items from a dictionary in a manner similar to ours.</Paragraph>
    <Paragraph position="4"> In the first case, the method used to compute similarity between two concepts (or words) is restricted to neighbors, in the graph, of the two concepts; in the second case, only directly related words are considered as potential candidates for synonymy: for two words to be considered synonyms, one has to appear in the definition of another. In both cases, only 6 or 7 words have been used as a test of synonymy, with a validation provided by the authors with &amp;quot;related terms&amp;quot; (an unclear notion) considered correct. The similarity measure itself was evaluated on a set of related terms from (Miller and Charles, 1991), as in (Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003), with seemingly good results, but semantically related terms is a very different notion (&amp;quot;car&amp;quot; and &amp;quot;tire&amp;quot; for instance are semantically related terms, and thus considered similar).</Paragraph>
    <Paragraph position="5"> We do not know of any dictionary-based graph approach which have been given a larger evaluation of its results. Parsing definitions in isolation prevents a complete coverage (we estimated that only 30% of synonyms pairs in the TLF can be found from definitions). null As for distributional approaches, (Barzilay and McKeown, 2001) gets a very high precision (around 90%) on valid paraphrases as judged by humans, among which 35% are synonymy relations in Wordnet, 32% are hypernyms, 18% are coordinate terms.</Paragraph>
    <Paragraph position="6"> Discriminating among the paraphrases types is not addressed. Other approaches usually consider either given sets of synonyms among which one is to be chosen (for a translation for instance) (Edmonds and Hirst, 2002) or must choose a synonym word against unrelated terms in the context of a synonymy test (Freitag et al., 2005), a seemingly easier task than actually proposing synonyms. (Lin, 1998) proposes a different methodology for evaluation of candidate synonyms, by comparing similarity measures of the terms he provides with the similarity measures between them in Wordnet, using various semantic distances. This makes for very complex evaluation procedures without an intuitive interpretation, and there is no assessment of the quality of the automated thesaurus. null</Paragraph>
  </Section>
</Paper>