File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-1003_intro.xml

Size: 11,730 bytes

Last Modified: 2025-10-06 14:03:52

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1003">
  <Title>SS</Title>
  <Section position="3" start_page="0" end_page="18" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets. The development of this application is intended as a case study and a test-bed for exploring the needs and requirements posed by the challenge of semi-automatic integration and enrichment of practical, large-scale multilingual lexicons for use in computer applications. While a number of lexicons already exist, few of them are practically useful, either because they are not sufficiently broad or because they do not provide the necessary level of detail.</Paragraph>
    <Paragraph position="1"> Moreover, multilingual language resources are not as widely available and are very costly to construct: the work process for manual development of new lexical resources or for tailoring existing ones is too expensive in terms of effort and time to be practically attractive.</Paragraph>
    <Paragraph position="2"> The need for ever-growing lexical resources for effective multilingual content processing has urged the language resource community to call for a radical change in the perspective of language resource creation and maintenance and for the design of a &quot;new generation&quot; of LRs: from static, closed and locally developed resources to shared and distributed language services, based on open content interoperability standards. This has often been called a &quot;change in paradigm&quot; (in the sense of Kuhn, see Calzolari and Soria, 2005; Calzolari 2006). Leaving aside the tantalizing task of building on-site resources, the new paradigm depicts a scenario where lexical resources are built cooperatively, as the result of controlled co-operation among different agents, adopting the paradigm of accumulation of knowledge so successful in more mature disciplines, such as biology and physics (Calzolari, 2006).</Paragraph>
    <Paragraph position="3"> According to this view (or, better, this vision), different lexical resources reside in distributed places and can be not only accessed but also choreographed by agents presiding over the actions that can be executed on them. This implies the ability to build on each other's achievements, to merge results, and to make them accessible to various systems and applications.</Paragraph>
    <Paragraph position="4"> At the same time, there is another argument in favor of distributed lexical resources: language resources, lexicons included, are inherently distributed because of the diversity of languages spread over the world. It is only natural for language resources to be developed and maintained in their native environment. Since language evolves and changes over time, it is not possible to describe the current state of the language away from where the language is spoken.</Paragraph>
    <Paragraph position="5"> Lastly, the vast range of diversity of languages also makes it impossible to have one single universal centralized resource, or even a centralized repository of resources.</Paragraph>
    <Paragraph position="6"> Although the paradigm of distributed and interoperable lexical resources has been widely discussed and invoked, comparatively little has been done to develop new methods and techniques for its practical realization. Some initial steps have been taken to design frameworks enabling inter-lexica access, search, integration and operability. An example is the Lexus tool (Kemps-Snijders et al., 2006), based on the Lexical Markup Framework (Romary et al., 2006), which goes in the direction of managing the exchange of data among large-scale lexical resources. A similar tool, but more tailored to the collaborative creation of lexicons for endangered languages, is SHAWEL (Gulrajani and Harrison, 2002). However, the general impression is that little has been done towards the development of new methods and techniques for attaining concrete interoperability among lexical resources.</Paragraph>
    <Paragraph position="7"> Admittedly, this is a long-term scenario requiring the contribution of many different actors and initiatives (among which we only mention standardisation, distribution and international cooperation). Nevertheless, the intent of our project is to help fill this gap by exploring in a controlled way the requirements and implications posed by new-generation multilingual lexical resources. The paper is organized as follows: section 2 describes the general architectural design of our project; section 3 describes the module taking care of cross-lingual integration of lexical resources, and also presents a case study involving an Italian and a Chinese lexicon. Finally, section 4 presents our considerations and lessons learned on the basis of this exploratory testing.</Paragraph>
    <Paragraph position="8"> 2 An Architecture for Integrating Lexical Resources LeXFlow (Soria et al., 2006) was developed with the long-term goal of lexical resource interoperability in mind. In a sense, LeXFlow is intended as a proof of concept that attempts to make the vision of an infrastructure for access to and sharing of linguistic resources more tangible.</Paragraph>
    <Paragraph position="9"> LeXFlow is an adaptation to computational lexicons of XFlow, a cooperative web application for the management of document workflows (DW, Marchetti et al., 2005). A DW can be seen as a process of cooperative authoring where a document can be the goal of the process or just a side effect of the cooperation. Through a DW, a document life-cycle is tracked and supervised, continually providing control over the actions leading to document compilation. In this environment a document travels among agents who essentially carry out a receive-process-send pipeline.</Paragraph>
    <Paragraph position="10"> There are two types of agents: external agents are human or software actors performing activities that depend on the particular Document Workflow Type (DWT); internal agents are software actors providing general-purpose activities useful for many DWTs and, for this reason, implemented directly in the system. Internal agents perform general functions such as creating/converting a document belonging to a particular DW, populating it with some initial data, duplicating a document to be sent to multiple agents, splitting a document and sending portions of information to different agents, merging duplicated documents coming from multiple agents, aggregating fragments, and finally terminating operations over the document. External agents basically execute some processing using the document content and possibly other data; for instance, accessing an external database or launching an application.</Paragraph>
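    <Paragraph> The receive-process-send cycle and the duplicate/merge behaviour of internal agents can be sketched as follows. This is a minimal illustration only: the names Document, duplicate, merge, make_agent and run_parallel are hypothetical and are not taken from XFlow, whose actual interfaces are not described here.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names below are hypothetical illustrations, not the XFlow API.


@dataclass
class Document:
    doc_id: str
    content: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # agents that handled the document


Agent = Callable[[Document], Document]


def duplicate(doc: Document, n: int) -> list:
    """Internal agent: copy a document so it can be sent to n agents."""
    return [Document(doc.doc_id, dict(doc.content), list(doc.history))
            for _ in range(n)]


def merge(copies: list) -> Document:
    """Internal agent: merge duplicated documents coming back from several agents."""
    merged = Document(copies[0].doc_id)
    for copy in copies:
        merged.content.update(copy.content)
        for name in copy.history:
            if name not in merged.history:
                merged.history.append(name)
    return merged


def make_agent(name: str, key: str, value: str) -> Agent:
    """External agent stub: receive a document, process it, send it on."""
    def agent(doc: Document) -> Document:
        doc.content[key] = value   # process
        doc.history.append(name)   # keep the life-cycle traceable
        return doc                 # send
    return agent


def run_parallel(doc: Document, agents: list) -> Document:
    """Duplicate the document, let each agent work on one copy, then merge."""
    copies = duplicate(doc, len(agents))
    return merge([agent(copy) for agent, copy in zip(agents, copies)])


result = run_parallel(
    Document("entry-1"),
    [make_agent("encoder", "pos", "noun"),
     make_agent("annotator", "gloss", "a domestic animal")],
)
print(result.content, result.history)
```

In this sketch the merge step simply takes the union of the fields filled in by each agent; a real system would also need a policy for conflicting values, which the description above does not specify.</Paragraph>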
    <Paragraph position="11"> LeXFlow was created by tailoring XFlow to the management of lexical entries; in doing so, we have assumed that each lexical entry can be modelled as a document instance, whose behaviour can be formally specified by means of a lexical workflow type (LWT). An LWT describes the life-cycle of a lexical entry, the agents allowed to act over it, the actions to be performed by the agents, and the order in which the actions are to be executed. Embracing the view of cooperative workflows, agents can have different rights or views over the same entry: this nicely suits the needs of lexicographic work, where we can define different roles (such as encoder, annotator, validator) that can be played by either human or software agents. Other software modules can be inserted in the flow, such as an automatic acquirer of information from corpora or from the web. Moreover, since it derives from a tool designed for the cooperation of agents, LeXFlow makes it possible to manage workflows in which the different agents reside in distributed places.</Paragraph>
    <Paragraph position="12"> LeXFlow thus inherits from XFlow the general design and architecture, and can be considered a specialized version of it, obtained through the design of specific Lexical Workflow Types and the plug-in of dedicated external software agents. In the next section we briefly illustrate a particular Lexical Workflow Type and the external software agents developed for the purpose of integrating different lexicons belonging to the same language. Since it allows the independent and coordinated sharing of actions over portions of lexicons, LeXFlow naturally lends itself as a tool for the management of distributed lexical resources.</Paragraph>
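    <Paragraph> To make the notion of a lexical workflow type more concrete, the sketch below encodes an LWT as an ordered list of steps, each naming an action and the role entitled to perform it, and rejects actions attempted out of order or by the wrong role. The encoding, the class names Step and EntryWorkflow, and the three-step example are assumptions made for illustration; the source does not specify how LWTs are actually represented.

```python
from dataclasses import dataclass

# Hypothetical encoding of a lexical workflow type (LWT); not LeXFlow's format.


@dataclass(frozen=True)
class Step:
    action: str  # what is done to the lexical entry
    role: str    # which role is allowed to do it


# The LWT fixes both the actions and the order in which they are executed.
EXAMPLE_LWT = (
    Step("encode", "encoder"),
    Step("annotate", "annotator"),
    Step("validate", "validator"),
)


class EntryWorkflow:
    """Tracks one lexical entry through the life-cycle defined by an LWT."""

    def __init__(self, lemma: str, lwt: tuple):
        self.lemma = lemma
        self.lwt = lwt
        self.position = 0  # index of the next expected step
        self.log = []      # (action, role) pairs performed so far

    @property
    def finished(self) -> bool:
        return self.position == len(self.lwt)

    def perform(self, action: str, role: str) -> None:
        if self.finished:
            raise RuntimeError("workflow already terminated")
        expected = self.lwt[self.position]
        if (action, role) != (expected.action, expected.role):
            raise PermissionError(
                f"{role!r} may not {action!r} now; "
                f"expected {expected.role!r} to {expected.action!r}"
            )
        self.log.append((action, role))
        self.position += 1


flow = EntryWorkflow("cane", EXAMPLE_LWT)
flow.perform("encode", "encoder")
flow.perform("annotate", "annotator")
flow.perform("validate", "validator")
print(flow.finished, flow.log)
```

Whether a role is played by a human or by a software agent is irrelevant to this check, which mirrors the point made above that roles can be filled by either kind of agent.</Paragraph>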
    <Paragraph position="13"> Due to its versatility, LeXFlow is both a general framework where ideas on automatic lexical resource integration can be tested and an infrastructure for proving new methods for cooperation among lexicon experts.</Paragraph>
    <Section position="1" start_page="18" end_page="18" type="sub_section">
      <SectionTitle>
2.1 Using LeXFlow for Lexicon Enrichment
</SectionTitle>
      <Paragraph position="0"> In previous work (Soria et al., 2006), the LeXFlow framework has been tested for integration of lexicons with differently conceived lexical architectures and diverging formats. It was shown how interoperability is possible between two Italian lexicons from the SIMPLE and WordNet families, respectively, namely the SIMPLE/CLIPS (Ruimy et al., 2003) and ItalWordNet (Roventini et al., 2003) lexicons.</Paragraph>
      <Paragraph position="1"> In particular, a Lexical Workflow Type was designed where the two different monolingual semantic lexicons interact by reciprocally enriching each other and, moreover, integrate information coming from corpora. This LWT, called &quot;lexicon augmentation&quot;, explicitly addresses dynamic augmentation of semantic lexicons. In this scenario, an entry of a lexicon A is enriched in basically two steps. First, by virtue of being mapped onto a corresponding entry belonging to a lexicon B, the entry of A inherits the semantic relations available in the mapped entry of B. Second, by resorting to an automatic application that acquires information about semantic relations from corpora, the acquired relations are integrated into the entry and proposed to the human encoder.</Paragraph>
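      <Paragraph> The two enrichment steps can be sketched as below. This is an illustrative approximation, not the implemented workflow: the function augment_entry, the dictionary layout, the relation labels and the Italian sample data are all invented for the example, and relations from both lexicons are assumed to have already been converted to a shared vocabulary.

```python
# Hypothetical sketch of the two-step "lexicon augmentation" idea.
# Relations are (relation_type, target_lemma) pairs in an assumed shared vocabulary.


def augment_entry(entry_a, lexicon_b, mapping, corpus_relations):
    """Return (inherited, candidates) for one entry of lexicon A.

    inherited:  relations found in the mapped entry of lexicon B but not in A.
    candidates: corpus-acquired relations to be proposed to the human encoder.
    """
    known = set(entry_a["relations"])

    # Step 1: inherit the relations of the entry that A's entry is mapped onto.
    inherited = []
    mapped_id = mapping.get(entry_a["id"])
    if mapped_id is not None:
        for relation in lexicon_b[mapped_id]["relations"]:
            if relation not in known:
                inherited.append(relation)
                known.add(relation)

    # Step 2: add relations acquired from corpora, skipping anything already known.
    candidates = [relation
                  for relation in corpus_relations.get(entry_a["lemma"], [])
                  if relation not in known]
    return inherited, candidates


# Invented sample data.
entry_a = {"id": "A:ruota", "lemma": "ruota",
           "relations": [("hyperonym", "oggetto")]}
lexicon_b = {"B:ruota": {"relations": [("hyperonym", "oggetto"),
                                       ("part_of", "veicolo")]}}
mapping = {"A:ruota": "B:ruota"}
corpus_relations = {"ruota": [("part_of", "veicolo"),
                              ("part_of", "bicicletta")]}

inherited, candidates = augment_entry(entry_a, lexicon_b, mapping, corpus_relations)
print(inherited)
print(candidates)
```

The function leaves the original entry untouched and only reports what could be added, so that the final decision stays with the human encoder.</Paragraph>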
      <Paragraph position="2"> An overall picture of the flow is shown in Figure 1, illustrating the different agents participating in the flow. Rectangles represent human actors acting on the entries, while the other shapes symbolize software agents: ovals are internal agents and octagons external ones. The two external agents involved in this flow are the &quot;relation calculator&quot; and the &quot;corpora extractor&quot;. The first is responsible for the mapping between the sets of semantic relations used by the different lexicons. The &quot;corpora extractor&quot; module invokes an application that acquires information about part-of relations by identifying syntactic constructions in a vast Italian corpus. It then takes care of creating the appropriate candidate semantic relations for each lemma that is proposed by the application.</Paragraph>
      <Paragraph position="3"> [Figure 1: Lexicon Augmentation Workflow Type.]</Paragraph>
      <Paragraph position="4"> A prototype of LeXFlow has been implemented with extensive use of XML technologies (XML Schema, XSLT, XPath, XForms, SVG) and open-source tools (Cocoon, Tomcat, MySQL). It is a web-based application where human agents interact with the system through an XForms browser that displays the document to be processed as a web form, whereas software agents interact with the system via web services.</Paragraph>
    </Section>
  </Section>
</Paper>