<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1204">
  <Title>in the Ocean. Transatlantic Standards for Multilingual Lexicons (with an eye to Machine Translation). In Proceedings of</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 EU chair: N. O. Bernsen; US chair: M. Liberman.
4 EU chair: M. King; US chair: E. Hovy.
</SectionTitle>
    <Paragraph position="0"> methodology, with experts from both the EU and US, acting as a catalyst in order to pool concrete results coming from major international/national/industrial projects.</Paragraph>
    <Paragraph position="1"> Relevant common practices or upcoming standards are being used where appropriate as input to EAGLES/ISLE work. Numerous theories, approaches, and systems are being considered, since any recommendation for harmonisation must take into account the needs and nature of the different major contemporary approaches.</Paragraph>
    <Paragraph position="2"> Results are widely disseminated, after due validation in collaboration with EU and US HLT R&amp;D projects, National projects, and industry.</Paragraph>
    <Paragraph position="3"> In the following we concentrate on the</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Computational Lexicon Working Group
</SectionTitle>
      <Paragraph position="0"> (CLWG), describing its specific methodology and its goal of establishing a general, consensual, standardized environment for the development and integration of multilingual resources. The general vision is to enhance the sharing and reusability of multilingual lexical resources by promoting the definition of a common parlance for the community of multilingual HLT and computational lexicon developers. The CLWG pursues this goal by proposing a general schema for the encoding of multilingual lexical information, the MILE (Multilingual ISLE Lexical Entry). The MILE is intended as a meta-entry, acting as a common representational layer for multilingual lexical resources.</Paragraph>
      <Paragraph position="1"> We describe the preliminary proposals of guidelines for the MILE, highlighting some methodological principles applied in previous EAGLES.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Computational Lexicon
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Working Group
</SectionTitle>
      <Paragraph position="0"> Existing EAGLES results in the Lexicon and Corpus areas are currently adopted by an impressive number of European - and recently also National - projects and have become the "de facto standard" for LR in Europe. This is a very good measure of the impact - and of the need - of such a standardisation initiative in the HLT sector. To mention just a few key examples: • the LE PAROLE/SIMPLE resources (morphological/syntactic/semantic lexicons and corpora for 12 EU languages) (Zampolli, 1997) (Ruimy et al., 1998) (Lenci et al., 1999) (Bel et al., 2000) rely on EAGLES results (Sanfilippo et al., 1996) (Sanfilippo et al., 1999), and are now being enlarged to real-size lexicons through many National Projects, thus building a really large infrastructural platform of harmonised lexicons in Europe, sharing the same model; • the ELRA Validation Manuals for Lexicons (Underwood and Navarreta, 1997) and Corpora (Burnard et al., 1997) are based on EAGLES guidelines; • morpho-syntactic encoding of lexicons and tagging of corpora in a very large number of EU, international and national projects - and for more than 20 languages - is conformant to EAGLES recommendations (Monachini and Calzolari, 1996) (Monachini and Calzolari, 1999) (Leech and Wilson, 1996).</Paragraph>
      <Paragraph position="1"> Standards must emerge from state-of-the-art developments. The process of standardisation, although by its own nature not intrinsically innovative, must - and actually does - proceed shoulder to shoulder with the most advanced research. Since ISLE involves many bodies active in EU-US NLP and speech projects, close collaboration with these projects is assured and, significantly, free manpower has been contributed by the projects, as a sign both of their commitment and of the crucial importance they place on reusability issues.</Paragraph>
      <Paragraph position="2"> Lexical semantics has always represented a sort of wild frontier in the investigation of natural language. In fact, the number of open issues in lexical semantics at the representational, architectural and content levels might induce a negative, and actually unjustified, attitude towards the possibility of designing standards in this difficult territory. On the contrary, standardisation must be conceived as singling out those areas in the open field of lexical semantics that already show a clear and high degree of stability. This stability is often hidden behind a number of formal differences or representational variants, which prevent the exploitation and enhancement of what is common and of the already consolidated achievements.</Paragraph>
      <Paragraph position="3"> With no intent of imposing any constraints on investigation and experimentation, the ISLE CLWG rather aims at selecting mature areas and results in computational lexical semantics and in multilingual lexicons, which can be regarded as stabilised achievements and thus be used as the basis for future research. Therefore, consolidation of a standards proposal must be viewed, by necessity, as a slow process comprising, after the phase of putting forward proposals, a cyclical phase involving ISLE external groups and projects with: i) careful evaluation and testing of recommendations in concrete applications; ii) application, if appropriate, to a large number of European languages; iii) feedback on and readjustment of the proposals until a stable platform is reached; iv) dissemination and promotion of consensual recommendations.</Paragraph>
      <Paragraph position="4"> The process of standard definition undertaken by CLWG represents an essential interface between advanced research in the field of multilingual lexical semantics, and the practical task of developing resources for HLT systems and applications. It is through this interface that the crucial trade-off between research practice and applicative needs will actually be achieved.</Paragraph>
      <Paragraph position="5"> In what follows we briefly describe the two-step strategy adopted in the journey towards standards design: first, a survey of existing multilingual resources in both the European and American research and industrial scenarios; second, an ongoing phase aimed at identifying hot areas in the domain of multilingual lexical resources which call for, and de facto are amenable to, a process of standardisation.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Preliminary Step: the Survey Phase
</SectionTitle>
      <Paragraph position="0"> Following the well-established EAGLES methodology, the first priority was to carry out a wide-ranging survey of bilingual/multilingual (or semantic monolingual) lexicons, so as to reach a fair level of coverage of existing lexical resources. With respect to this target, one of the first objectives is to discover and list the (maximal) set of (granular) basic notions needed to describe the multilingual level. The Survey of existing lexicons (Calzolari, Grishman and Palmer, 2001) has been accompanied by the analysis of the requirements of a few multilingual applications, and by the parallel analysis of typical cross-lingually complex phenomena 5.</Paragraph>
      <Paragraph position="1"> The main issue is how best to state the translation correspondences among entries in the multilingual lexicon. The passage from source language (SL) to target language (TL) makes it necessary to express very complex and articulated transfer conditions, which have to take into account phenomena as difficult and pervasive as argument switching, multi-word expressions, collocational patterns, etc.</Paragraph>
      <Paragraph position="2"> The function of an entry in a multilingual lexicon is to supply enough information to allow the system to identify a distinct sense of a word or phrase in SL, in many different contexts, and reliably associate each context with the most appropriate translation. The first step is to determine which of all the information that can be associated with SL lexical entries is most relevant to a particular task. We decided to focus the survey work and the subsequent recommendations on two broad categories of application: Machine Translation and Cross-Language Information Retrieval. They have partially different and complementary needs, and can be considered to represent the requirements of other application types. It is in fact necessary to ensure that any guidelines meet the requirements of industrial applications and that they are implementable.</Paragraph>
      <Paragraph position="3"> In the Survey, some Korean and Japanese examples were present in the case study dedicated to relevant cross-linguistic phenomena (e.g. sense distinctions according to variation in syntactic frames/semantic type/</Paragraph>
    </Section>
  </Section>
</Paper>