<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0513">
  <Title>Multilinguality and Reversibility in Computational Semantic Lexicons</Title>
  <Section position="1" start_page="0" end_page="51" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> In this paper, we address the issue of generating multilingual computational semantic lexicons from analysis lexicons, showing the necessity of relying on a conceptual lexicon. We first discuss the type of information which should be found in NLP lexicons, whatever their use (analysis, generation, speech, robotics). We claim that we should take advantage of the existing large-scale analysis lexicons and use them as the starting point in the process of building large-scale generation lexicons, by first reversing them and then enhancing them.</Paragraph>
    <Paragraph position="1"> This implies having access to a conceptual lexicon, which will serve as a pivot point between the analysis and the generation lexicons. We implemented the work reported here for Spanish and English MT projects, within the knowledge-based paradigm. From a theoretical point of view, regenerating the source text with the reversed analysis lexicon enabled us to address issues as diverse as: evaluating analysis lexicons; testing the semantic analyser; evaluating which information should be added to the generation lexicon; and testing the grain-size of the pivot point between analysis and generation.</Paragraph>
    <Paragraph position="2"> Introduction There is no consensus on the type of lexicons which should be used for generators. It seems to depend on the type of generator. It also seems to depend on the kind of application involved: monolingual generation, multilingual generation, machine translation; generation of sentences vs. texts vs. speech; or generation from raw data vs. from conceptual representations built with generation in mind.</Paragraph>
    <Paragraph position="3"> Once one has an application in mind, there are three main approaches one can adopt to build the lexicon: lexicographic: very attractive for NLP applications at first sight, as it provides a useful description of the vocabulary; entries are distinguished on the basis of multiple senses and subcategorisations. But in practice, this approach complicates the process of lexical disambiguation for parsing and lexical choice in generation by an unjustified proliferation of entries. (We would like to thank, in the Mikrokosmos team, Tom Herndon, Jeff Longwel, Oscar Cossio and Javier Ochoa.)</Paragraph>
    <Paragraph position="4"> statistical: very attractive for NLP applications as it seems to replace knowledge-based approaches and therefore to supplant the need for human acquisition of large-scale semantic lexicons, which is a very time-consuming task. However, the limits of statistical approaches have been pointed out by [Smadja, 1993]. Moreover, some phenomena, such as event ellipsis (EE), cannot be handled by a purely statistical approach or by a lexicographic approach, as their recovery necessitates a semantic treatment ([Viegas and Nirenburg, 1995a]), which is, in fact, handled by a computational linguistic approach making use of semantics.</Paragraph>
    <Paragraph position="5"> computational linguistic: the main advantage of this approach is that it is usually theoretically grounded, and is domain- and application-independent. Moreover, it can handle phenomena which are out of the reach of other approaches and yet are necessary to enhance lexical choice in generation. For instance, the EE triggered by enjoy as in i) I enjoyed the salmon very much, must be modeled with a semantic representation so that its recovery can be taken care of as in ii) I enjoyed eating .... This is part of lexical choice, as one can choose to realise the synthetic version or the analytic version of the EE, as exemplified in i) and ii) respectively (cf. [Viegas and Nirenburg, 1995a]).</Paragraph>
    <Paragraph position="6"> However necessary, adopting a linguistic approach is difficult: its main drawback is that building the lexicon is time consuming. In the next section, we address how to bypass this drawback.</Paragraph>
    <Paragraph position="7"> A Multi-purpose Knowledge Base Since building computational semantic lexicons is a very time-consuming task, we should aim at lexicons which conform to the three following conditions: a - multi-lingual: French, English, Japanese, Russian, Spanish, etc. (format of the lexicon); b - multi-[...]: [...]tic information for natural language processing, phonological information, essentially for speech recognition and production (structure of the lexicons); c - multi-use: so that they can be used for analysis, generation (mono/multi-lingual), MT, or speech processing (reversibility of the lexicons). The way we organised and structured our lexicons directly follows these conditions.</Paragraph>
    <Paragraph position="8"> Large-scale computational generation lexicons carrying semantic information are indeed not that common, the obvious reason being that acquiring semantic information is a difficult and time-consuming task. However, it is by no means an unattainable task, if we structure and organise our analysis lexicons in such a way that the information they contain can best be used for building generation lexicons. Acquiring a large-scale lexicon is very expensive, which is why building lexicons that are reusable for other domains or applications is recommended. It is well known in computational lexical semantics that a sense enumeration approach based only on subcategorisation differences is computationally expensive and unrealistic from a theoretical viewpoint.</Paragraph>
    <Paragraph position="9"> Our lexicons are composed of superentries, where each entry consists of a list of words, stored there independently of their part of speech (the verb and noun forms of walk are under the same superentry), as described at length in [Onyshkevych and Nirenburg, 1994].</Paragraph>
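    <Paragraph> A superentry of this kind can be pictured as one word form that holds all of its entries, whatever their part of speech. The following is a minimal sketch in Python, not the actual lexicon format: the class names Entry and Superentry, the category labels "V" and "N", and the concept name WALK are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One sense of a word: a part of speech plus an ontology concept."""
    pos: str      # category label such as "V" or "N" (assumed labels)
    concept: str  # name of an ontology concept (assumed label)


@dataclass
class Superentry:
    """All entries sharing one word form, regardless of part of speech."""
    word: str
    entries: list = field(default_factory=list)

    def add(self, pos, concept):
        self.entries.append(Entry(pos, concept))

    def parts_of_speech(self):
        # Sorted so that the result is deterministic.
        return sorted({entry.pos for entry in self.entries})


# The verb and noun forms of "walk" are stored under the same superentry.
walk = Superentry("walk")
walk.add("V", "WALK")  # hypothetical concept name
walk.add("N", "WALK")  # same concept, different category
print(walk.parts_of_speech())  # prints ['N', 'V']
```

    Keeping both categories under one word form means that a lookup by word returns every sense at once, which is the property the reversal step relies on later.</Paragraph>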
    <Paragraph position="10"> Reversing an Analysis Lexicon Before addressing the issue of reversing the analysis lexicon, we first want to show how we acquired a large-scale analysis lexicon.</Paragraph>
    <Paragraph position="11"> Acquisition of the Analysis Lexicon We acquired a Spanish semantic lexicon of about 40,000 word meanings, for an MT project, described in [Beale et al., 1995]. We automated the task of acquisition as much as possible by providing the lexicographers with access to on-line dictionaries, on-line corpora, and software giving easy access to all this on-line information (see [Viegas and Nirenburg, 1995b] for the task of acquisition). Our interfaces have been designed with respect to users' needs, and continue to evolve as needed.</Paragraph>
    <Paragraph position="12"> We give below the example of the partial entry compañía in Spanish, with two different meanings, represented by the following concepts in our world model (or ontology): CORPORATION, INTERACT-SOCIALLY. One important point to notice here is our transcategorial approach. There is no one-to-one mapping between semantic categories or concepts and lexical items, and some EVENTS (such as INTERACT-SOCIALLY here) can be lexicalised as nouns, such as compañía (Figure 1), or as verbs.</Paragraph>
    <Paragraph position="13"> Let us now consider some of the entries for the Spanish verb adquirir, with the following corresponding semantics: ACQUIRE, LEARN, displayed in Figure 2.</Paragraph>
    <Paragraph position="14"> The sub-entries for adquirir have different selectional restrictions for the theme: OBJECT for ACQUIRE and INFORMATION for LEARN. We have acquired about 1/5 of our lexicon semi-automatically and have developed a morpho-semantic acquisition program, which has allowed us to acquire the remaining 4/5 entirely automatically, creating in the end a large-scale lexicon of about 40,000 word senses. The main advantage of our approach is that it enabled us to economically multiply the size of the lexicon. The main drawback is that the en- [...] using the reversed lexicon to regenerate the entry as explained below.</Paragraph>
    <Paragraph position="15"> The Reversed Lexicon The algorithm to "reverse" the analysis lexicon (AL) to produce the generation lexicon (GL) mainly involves rearranging, modifying, deleting, and adding certain items. We focus below on the zones which have been reversed, namely: SYN (subcategorisation information) and SEM (providing the semantic information with associated selectional restrictions), as shown in [...]. Our transcategorial approach to sense discrimination is a good basis for paraphrasing: the concept ACQUIRE from the ontology can be lexicalised in our Spanish lexicon at least as adquirir, obtener, conseguir (verbs), adquisición, obtención, enriquecimiento (nouns), codicioso (adjective). We only show partial entries for the superentry of the concept ACQUIRE, as shown in Figure 4.</Paragraph>
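    <Paragraph> The rearranging part of this reversal can be sketched as re-indexing a word-keyed lexicon by concept, so that the concept becomes the pivot. This is only an illustration under stated assumptions, not the algorithm actually used: the dictionary layout, the function name reverse_lexicon, and the selectional restrictions given for obtener and adquisición are invented for the example, and the modifying, deleting and adding steps are left out.

```python
from collections import defaultdict

# Toy analysis lexicon: word -> list of senses. Each sense has a SYN zone
# (reduced here to a category label) and a SEM zone (a concept plus the
# selectional restriction on its theme). This layout is an assumption.
ANALYSIS_LEXICON = {
    "adquirir": [
        {"SYN": "V", "SEM": {"concept": "ACQUIRE", "theme": "OBJECT"}},
        {"SYN": "V", "SEM": {"concept": "LEARN", "theme": "INFORMATION"}},
    ],
    # The restrictions for the next two words are assumed, not sourced.
    "obtener": [
        {"SYN": "V", "SEM": {"concept": "ACQUIRE", "theme": "OBJECT"}},
    ],
    "adquisición": [
        {"SYN": "N", "SEM": {"concept": "ACQUIRE", "theme": "OBJECT"}},
    ],
}


def reverse_lexicon(analysis_lexicon):
    """Re-index a word-keyed lexicon by concept (the pivot)."""
    generation = defaultdict(list)
    for word, senses in analysis_lexicon.items():
        for sense in senses:
            sem = sense["SEM"]
            generation[sem["concept"]].append(
                {"word": word, "SYN": sense["SYN"], "theme": sem["theme"]}
            )
    return dict(generation)


GENERATION_LEXICON = reverse_lexicon(ANALYSIS_LEXICON)
acquire_words = [item["word"] for item in GENERATION_LEXICON["ACQUIRE"]]
print(acquire_words)  # prints ['adquirir', 'obtener', 'adquisición']
```

    Because verbs and nouns end up under the same concept key, a generator looking up ACQUIRE sees candidates of several categories at once, which is what makes the transcategorial approach useful for paraphrasing.</Paragraph>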
    <Paragraph position="16"> We are now in the phase of enhancing the reversed lexicon to produce the Spanish and English generation lexicons: namely, we are encoding information which is specific to the process of generation and which can be omitted from an analysis lexicon, such as word order (in Adj-noun constructions), and collocational information acquired semi-automatically from corpora (hacer una adquisición).</Paragraph>
    <Paragraph position="17"> Moreover, with this technique, we can produce multilingual generation lexicons by lexicalising the concepts of the reversed lexicons in different languages; this ensures that we will have a lexical item or phrase available for lexicalisation.</Paragraph>
    <Paragraph position="18"> Another advantage of reversing an analysis lexicon is that it can be used to regenerate the same text that was parsed, giving some insight into the issue of the pivot point between parsing and generation and, as a result, into what the best input for generation is.</Paragraph>
    <Paragraph position="19"> Advantages of a Reversed Lexicon A reversed lexicon has advantages beyond its practical use in generation. We have identified, and in some cases begun work on, the following areas: Evaluation of semantic analysis. With a reversed lexicon that is based on the original analysis lexicon, it is possible to take the output semantic representations from the analyser and submit them to a text generator. The output surface structures can then be compared to the input text. Apart from this, evaluation of semantic analyses can be difficult because it involves reading and understanding complex meaning representations.</Paragraph>
    <Section position="1" start_page="51" end_page="51" type="sub_section">
      <SectionTitle>
Evaluating Text Meaning Representation language
</SectionTitle>
      <Paragraph position="0"> For example, the granularity of semantic representation can be studied. Is the representation precise enough to correctly translate all meaning components, or is a specific source term mapped into a generalised one from which the original meaning cannot be recovered? This will be especially helpful in a multilingual environment where meaning components might be bundled differently.</Paragraph>
      <Paragraph position="1"> Testing lexicon entries. We have developed a suite of tools to help in testing the analysis lexicon, to ensure the high quality of our large-scale lexicon. These tools range in complexity from checking the placement of parentheses to automatically creating sentences to test individual lexicon entries. For the latter, having a reversed lexicon available is extremely helpful. For example, a simple lexicon entry for the English word read might look like:</Paragraph>
      <Paragraph position="3"/>
    </Section>
  </Section>
</Paper>