<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1003">
  <Title>SS</Title>
  <Section position="4" start_page="18" end_page="22" type="metho">
    <SectionTitle>
3 Multilingual WN Service
</SectionTitle>
    <Paragraph position="0"> In the previous section we illustrated the general architecture of LeXFlow. We also showed how a Lexical Workflow Type can be implemented to enrich existing lexicons that belong to the same language but realize different models of lexicon encoding. In this section we take a cross-lingual perspective on lexicon integration. We present a module that also addresses lexicon augmentation, or enrichment, but focuses on the mutual enrichment of two wordnets in different languages residing at different sites.</Paragraph>
    <Paragraph position="1"> This module, named "Multilingual WN Service", is responsible for the automatic cross-lingual fertilization of lexicons with a WordNet-like structure. Put very simply, the idea behind this module is that a monolingual wordnet can be enriched by accessing the semantic information encoded in the corresponding entries of other monolingual wordnets.</Paragraph>
    <Paragraph position="2"> Since each entry in the monolingual lexicons is linked to the Interlingual Index (ILI, cf. Section 3.1), a synset of WN(A) is indirectly linked to a synset of another wordnet, WN(B). Based on this correspondence, a synset(A) can be enriched by importing the relations that the corresponding synset(B) holds with other synsets(B), and vice versa. The enrichment of WN(A) will also do more than import the relations found in WN(B): it will propose target synsets in language A based on those found in language B.</Paragraph>
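The indirect link through the ILI can be illustrated with a minimal Python sketch. It assumes each wordnet is reduced to a map from synset IDs to the ILI records they are linked to. The IDs and the helper names `invert` and `corresponding_synsets` are hypothetical and are not part of the described system.

```python
# Hypothetical toy data: synset id -> set of ILI records the synset is linked to.
WN_A_TO_ILI = {"itA:1": {"ili:road"}, "itA:2": {"ili:bend"}}
WN_B_TO_ILI = {"zhB:7": {"ili:road"}, "zhB:9": {"ili:bend"}}


def invert(to_ili):
    """Index each ILI record by the synsets linked to it."""
    index = {}
    for synset_id, ili_records in to_ili.items():
        for ili in ili_records:
            index.setdefault(ili, set()).add(synset_id)
    return index


def corresponding_synsets(synset_a, wn_a_to_ili, wn_b_to_ili):
    """Return the WN(B) synsets reachable from synset_a through a shared ILI record."""
    ili_to_b = invert(wn_b_to_ili)
    found = set()
    for ili in wn_a_to_ili.get(synset_a, set()):
        found |= ili_to_b.get(ili, set())
    return found


print(sorted(corresponding_synsets("itA:1", WN_A_TO_ILI, WN_B_TO_ILI)))  # ['zhB:7']
```

In this sketch, the ILI acts purely as a join key. Neither wordnet needs to know the other's synset identifiers.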
    <Paragraph position="3"> The various WN lexicons reside on distributed servers and can be queried through web services. Within the general LeXFlow architecture, the Multilingual WN Service can be seen as an additional external software agent. It can be added to the augmentation workflow or included in other types of lexical flows. For instance, it can be used not only to enrich a monolingual lexicon but also to bootstrap a bilingual lexicon.</Paragraph>
    <Section position="1" start_page="19" end_page="20" type="sub_section">
      <SectionTitle>
3.1 Linking Lexicons through the ILI
</SectionTitle>
      <Paragraph position="0"> The entire mechanism of the Multilingual WN Service exploits the Interlingual Index (Peters et al., 1998), an unstructured version of WordNet used in EuroWordNet (Vossen et al., 1998) to link wordnets of different languages. Each synset in a language-specific wordnet is linked to at least one record of the ILI by means of a set of equivalence relations. The most important of these is EQ_SYNONYM, which expresses a total, perfect equivalence between two synsets.</Paragraph>
      <Paragraph position="1"> Figure 6 describes the schema of a WN lexical entry. Under the root "synset" we find both internal relations ("synset relations") and ILI Relations, which link to ILI synsets.</Paragraph>
      <Paragraph position="2"> Figure 3 shows the role played by the ILI as a set of pivot nodes that link concepts belonging to different wordnets.</Paragraph>
      <Paragraph position="3"> In the Multilingual WN Service, only equivalence relations of type EQ_SYNONYM and EQ_NEAR_SYNONYM have been taken into account. They are the relations used to represent translations of concepts, and they are also the most frequently used (for example, in IWN they account for about 60% of the encoded equivalence relations). The EQ_SYNONYM relation realizes a one-to-one mapping between the language-specific synset and the ILI. By their nature, EQ_NEAR_SYNONYM relations may instead be encoded several times, linking a single language-specific synset to more than one ILI record. Figure 4 represents the relevant combinations of equivalence relations that can realize the mapping between synsets belonging to two languages. In all four cases, a synset "a" is linked via an ILI record to a synset "b". A specific procedure, however, calculates a different "plausibility score" for each situation. The procedure relies on different rates assigned to the two equivalence relations: rate "1" for EQ_NEAR_SYNONYM and rate "0" for EQ_SYNONYM. In this way, the four cases can be distinguished by assigning them weights of "0", "1", "1" and "2", respectively.</Paragraph>
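The weighting can be sketched in a few lines of Python. The sketch assumes, as the values 0, 1, 1 and 2 suggest, that the weight of a mapping is the sum of the rates of the two equivalence links involved. The function name is hypothetical, and the order in which Figure 4 lists the four cases is not known here.

```python
# Rates from the text: EQ_SYNONYM -> 0, EQ_NEAR_SYNONYM -> 1.
RATES = {"EQ_SYNONYM": 0, "EQ_NEAR_SYNONYM": 1}


def plausibility_weight(rel_a, rel_b):
    """Weight of an a -> ILI -> b mapping, given the equivalence relations linking
    synset a and synset b to the shared ILI record (assumed: sum of the two rates)."""
    return RATES[rel_a] + RATES[rel_b]


for rel_a in RATES:
    for rel_b in RATES:
        print(rel_a, rel_b, plausibility_weight(rel_a, rel_b))
# EQ_SYNONYM EQ_SYNONYM 0
# EQ_SYNONYM EQ_NEAR_SYNONYM 1
# EQ_NEAR_SYNONYM EQ_SYNONYM 1
# EQ_NEAR_SYNONYM EQ_NEAR_SYNONYM 2
```

Under this reading, a lower weight marks a mapping that relies on fewer near-synonymy links.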
      <Paragraph position="4">[Figure caption fragment: "… between two Lexicons A and B and the ILI."]</Paragraph>
      <Paragraph position="5"> The ILI is a powerful yet simple method for linking concepts across the many lexicons of the WordNet family. Unfortunately, no version of the ILI can be considered a standard, and the various lexicons often use different versions of WordNet as their ILI. This problem is handled at the web-service level by incorporating the conversion tables provided by Daudé et al. (2001). In this way, the user who accesses the system does not have to take the different WN versions into account, because the system resolves them itself. This is why the version of the ILI is a parameter of the query to the web service (see Section 3.2).</Paragraph>
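As an illustration of version conversion at the web-service level, the sketch below looks up an ILI identifier in a conversion table and keeps a probability for each candidate. The table contents, identifiers and function name are hypothetical. The actual tables of Daudé et al. (2001) have their own format, which is not reproduced here.

```python
# Hypothetical conversion table:
# (source version, target version) -> source ILI id -> [(target ILI id, probability)]
CONVERSION = {
    ("1.5", "1.6"): {
        "ili15:road": [("ili16:road", 0.9), ("ili16:route", 0.1)],
    },
}


def convert_ili(ili_id, from_version, to_version, tables=CONVERSION):
    """Map an ILI id between WordNet versions, keeping a probability per candidate."""
    if from_version == to_version:
        return [(ili_id, 1.0)]  # same version: identity mapping, fully certain
    table = tables.get((from_version, to_version), {})
    return table.get(ili_id, [])  # unknown id or version pair: no candidates


print(convert_ili("ili15:road", "1.5", "1.6"))
# [('ili16:road', 0.9), ('ili16:route', 0.1)]
print(convert_ili("ili16:road", "1.6", "1.6"))
# [('ili16:road', 1.0)]
```

Returning every candidate together with its probability, rather than a single best match, keeps the uncertainty of the conversion visible to later steps.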
    </Section>
    <Section position="2" start_page="20" end_page="21" type="sub_section">
      <SectionTitle>
3.2 Description of the Procedure
</SectionTitle>
      <Paragraph position="0"> On the basis of ILI linking, a synset can be enriched by importing the relations contained in the corresponding synsets belonging to another wordnet.</Paragraph>
      <Paragraph position="1"> In the procedure adopted, the enrichment is performed on a synset-by-synset basis. A given synset is selected from a wordnet resource, say WN(A), and the cross-lingual module identifies the corresponding ILI synset from the information encoded in that synset. It then sends a query to the WN(B) web service, providing the ID of the ILI synset together with the ILI version of the starting WN. The WN(B) web service returns the synset(s) corresponding to the WN(A) synset, together with reliability scores. If WN(B) is based on a different ILI version, it can carry out the mapping between ILI versions (for instance, by querying the ILI mapping web service). For example, the Chinese and the Italian wordnets considered in our case study use versions 1.6 and 1.5, respectively. The conversion between different WN versions may not be accurate, however, so the mapping is always proposed with a probability score.</Paragraph>
      <Paragraph position="2"> The cross-lingual module then analyzes the synset relations encoded in the WN(B) synset and, for each of them, creates a new synset relation for the WN(A) synset.</Paragraph>
      <Paragraph position="3"> If the queried wordnets do not use the same set of synset relations, the module must handle the mapping between the different relation sets. In our case study no mapping was needed, since the two sets were completely equivalent.</Paragraph>
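When the relation inventories differ, the mapping step could look like the following sketch. It translates relation names through a lookup table and drops relations that have no counterpart in WN(A). The relation names and the policy of dropping unmapped relations are assumptions. Passing no table models the case study, where the two sets coincide.

```python
def map_relations(relations_b, relation_map=None):
    """Translate (relation, target) pairs from WN(B)'s relation set into WN(A)'s.

    relation_map=None means the two sets are identical, so nothing is translated.
    Otherwise, pairs whose relation has no counterpart in WN(A) are dropped.
    """
    if relation_map is None:
        return list(relations_b)
    return [(relation_map[rel], target) for rel, target in relations_b if rel in relation_map]


# Hypothetical relation names and synset ids.
example = [("has_mero_part", "b:12"), ("b_only_relation", "b:13")]
print(map_relations(example, {"has_mero_part": "HAS_MERO_PART"}))
# [('HAS_MERO_PART', 'b:12')]
print(map_relations(example))
# [('has_mero_part', 'b:12'), ('b_only_relation', 'b:13')]
```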
      <Paragraph position="4"> Each new relation is obtained by substituting the target WN(B) synset with the corresponding WN(A) synset. That synset is again found by querying back the WN(A) web service, with all of these steps going through the ILI. The procedure is formally defined by the following formula: [formula not preserved in this version]
Every local wordnet has to provide a web service API with the following methods:
1. GetWeightedSynsetsByIli(ILIid, ILIversion)
2. GetSynsetById(synsetID)
3. GetSynsetsByLemma(lemma)
The synsets returned by each method must be formatted in XML following the schema depicted in Figure 6.
Figure 6. Schema of Wordnet Synsets Returned by WN Web Services.</Paragraph>
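The three methods listed above can be expressed as an abstract Python interface, with a toy in-memory implementation standing in for a remote service. This is a sketch under several assumptions. The method and parameter names follow the text, but the `Synset` record replaces the XML of Figure 6. The score returned by `GetWeightedSynsetsByIli` is assumed to be simply the rate of each synset's own ILI link. ILI version conversion is left out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

# Rates from Section 3.1 (EQ_SYNONYM -> 0, EQ_NEAR_SYNONYM -> 1).
RATES = {"EQ_SYNONYM": 0, "EQ_NEAR_SYNONYM": 1}


@dataclass
class Synset:
    """Simplified synset record; the real services return XML (Figure 6 schema)."""
    synset_id: str
    lemmas: list
    ili: dict = field(default_factory=dict)        # ILI id -> equivalence relation
    relations: list = field(default_factory=list)  # (relation name, target synset id)


class WordnetService(ABC):
    """The three methods every local wordnet web service must expose."""

    @abstractmethod
    def GetWeightedSynsetsByIli(self, ILIid, ILIversion): ...

    @abstractmethod
    def GetSynsetById(self, synsetID): ...

    @abstractmethod
    def GetSynsetsByLemma(self, lemma): ...


class InMemoryWordnet(WordnetService):
    """Hypothetical local stand-in for a remote WN web service."""

    def __init__(self, synsets, ili_version):
        self.synsets = {s.synset_id: s for s in synsets}
        self.ili_version = ili_version

    def GetWeightedSynsetsByIli(self, ILIid, ILIversion):
        # A real service would convert ILIid between versions (Section 3.1).
        if ILIversion != self.ili_version:
            raise NotImplementedError("ILI version conversion is not sketched here")
        # Assumed score: the rate of each synset's own link to ILIid.
        return [(s, RATES[s.ili[ILIid]]) for s in self.synsets.values() if ILIid in s.ili]

    def GetSynsetById(self, synsetID):
        return self.synsets.get(synsetID)

    def GetSynsetsByLemma(self, lemma):
        return [s for s in self.synsets.values() if lemma in s.lemmas]


iwn = InMemoryWordnet([Synset("iwn:1", ["passaggio", "strada", "via"],
                              {"road,route": "EQ_SYNONYM"})], ili_version="1.5")
print([(s.synset_id, w) for s, w in iwn.GetWeightedSynsetsByIli("road,route", "1.5")])
# [('iwn:1', 0)]
```

Keeping the interface abstract lets the cross-lingual module call a local stub and a remote web service in the same way.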
      <Paragraph position="5"> The scores returned by the method "GetWeightedSynsetsByIli" are used by our module to calculate the reliability rating of each newly proposed relation.</Paragraph>
    </Section>
    <Section position="3" start_page="21" end_page="21" type="sub_section">
      <SectionTitle>
3.3 A Case Study: Cross-fertilization between Italian and Chinese Wordnets
</SectionTitle>
      <Paragraph position="1"> We explore this idea with a case study involving ItalWordNet (Roventini et al., 2003) and the Academia Sinica Bilingual Ontological Wordnet (Sinica BOW, Huang et al., 2004).</Paragraph>
      <Paragraph position="2"> The BOW integrates three resources: WordNet, the English-Chinese Translation Equivalents Database (ECTED), and SUMO (Suggested Upper Merged Ontology). By integrating these three key resources, Sinica BOW functions both as an English-Chinese bilingual wordnet and as bilingual lexical access to SUMO. Sinica BOW currently has two bilingual versions, corresponding to WordNet 1.6 and 1.7. Based on these bootstrapped versions, a Chinese Wordnet (CWN, Huang et al., 2005) is under construction, with handcrafted senses and lexical semantic relations. For the current experiment, we used the version linked to WordNet 1.6.</Paragraph>
      <Paragraph position="3"> ItalWordNet was realized as an extension of the Italian component of EuroWordNet. It comprises a general component of about 50,000 synsets, plus terminological wordnets linked to the generic wordnet by means of a specific set of relations. Each synset of ItalWordNet is linked to the Interlingual Index (ILI).</Paragraph>
      <Paragraph position="4"> The two lexicons refer to different versions of the ILI (1.5 for IWN and 1.6 for BOW), so a mapping between the two versions must be provided. On the other hand, no mapping is needed for the synset relations, since both lexicons adopt the same set. To evaluate the cross-lingual module, we developed two web services for managing a subset of the two resources. Figure 7 shows a very simple example in which our procedure discovers and proposes a new meronymy relation for the Italian synset {passaggio, strada, via}. This synset is equivalent to the ILI record "road, route", which is in turn linked to the BOW synset "Dao Lu, Dao, Lu" (dao_lu, dao, lu) (Figure 7, A). The Chinese synset has a meronymy relation with the synset "Shi Zi Lu Kou" (wan) (B). This last synset is equivalent to the ILI record "bend, crook, turn", which is in turn linked to the ItalWordNet synset "curvatura, svolta, curva" (C). The procedure therefore proposes a new candidate meronymy relation between the two ItalWordNet synsets (D).</Paragraph>
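The walk-through above (steps A to D) can be replayed with a minimal sketch of the enrichment procedure, run on toy data modelled on this example. Several details are assumptions made for illustration: the synset IDs, the relation name `has_meronym`, and the way weights are combined into a rating (a plain sum of link rates). ILI identifiers are also taken to be already converted to a common version. This is not the authors' implementation.

```python
# Hypothetical toy data mirroring Figure 7; ILI ids assumed already version-aligned.
RATES = {"EQ_SYNONYM": 0, "EQ_NEAR_SYNONYM": 1}
IWN = {
    "iwn:strada": {"ili": {"road,route": "EQ_SYNONYM"}, "rels": []},
    "iwn:curva": {"ili": {"bend,crook,turn": "EQ_SYNONYM"}, "rels": []},
}
BOW = {
    "bow:dao_lu": {"ili": {"road,route": "EQ_SYNONYM"},
                   "rels": [("has_meronym", "bow:wan")]},
    "bow:wan": {"ili": {"bend,crook,turn": "EQ_SYNONYM"}, "rels": []},
}


def by_ili(wordnet, ili_id):
    """(synset id, rate) pairs for the synsets of `wordnet` linked to ili_id."""
    return [(sid, RATES[s["ili"][ili_id]]) for sid, s in wordnet.items() if ili_id in s["ili"]]


def enrich(synset_a, wn_a, wn_b):
    """Propose relations for synset_a by importing them from wn_b through the ILI."""
    proposals = []
    for ili_src, rel_src in wn_a[synset_a]["ili"].items():
        for sid_b, rate_b in by_ili(wn_b, ili_src):              # (A) A -> ILI -> B
            for relation, target_b in wn_b[sid_b]["rels"]:       # (B) relation in B
                for ili_tgt, rel_tgt in wn_b[target_b]["ili"].items():
                    for target_a, rate_t in by_ili(wn_a, ili_tgt):  # (C) back to A
                        if (relation, target_a) in wn_a[synset_a]["rels"]:
                            continue  # already encoded in WN(A)
                        # Assumed rating: sum of the four link rates (lower is better).
                        weight = RATES[rel_src] + rate_b + RATES[rel_tgt] + rate_t
                        proposals.append((synset_a, relation, target_a, weight))  # (D)
    return proposals


print(enrich("iwn:strada", IWN, BOW))
# [('iwn:strada', 'has_meronym', 'iwn:curva', 0)]
```

The sketch returns candidates with a rating instead of writing them into WN(A). This matches the text's framing of the output as proposed relations rather than accepted ones.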
    </Section>
    <Section position="4" start_page="21" end_page="22" type="sub_section">
      <SectionTitle>
3.4 Considerations and Lessons Learned
</SectionTitle>
      <Paragraph position="0"> Given the diversity of the languages for which wordnets exist, it is difficult to implement an operational standard across all typologically different languages. Work on enriching and merging multilingual resources presupposes that all the resources involved are encoded with the same standard. However, even with the best efforts of the NLP community, only a small number of language resources are encoded in any given standard. In the current work, we presuppose a de facto standard, i.e., a shared and conventionalized architecture, namely that of WordNet.</Paragraph>
      <Paragraph position="1"> Since the WordNet framework is both conventionalized and widely followed, our system can rely on it without resorting to a more substantial and comprehensive standard. By contrast, when integrating lexicons with different underlying linguistic models, the availability of MILE (Calzolari et al., 2003) was an essential prerequisite of our work. Nevertheless, even within the same model, a certain degree of standardization is required, at least at the format level.</Paragraph>
      <Paragraph position="2"> From a more general point of view, and even from the perspective of a limited experiment such as the one described in this paper, realizing the new vision of distributed and interoperable language resources depends on at least two prerequisites. On the one hand, language resources need to be available over the web. On the other hand, the language resource community will have to reconsider current distribution policies and investigate the possibility of developing an "Open Source" concept for LRs.</Paragraph>
    </Section>
  </Section>
</Paper>