<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2007">
  <Title>Using an incremental robust parser to automatically generate semantic UNL graphs</Title>
  <Section position="3" start_page="0" end_page="1" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> UNL is a multilingual personal networking communication project initiated by the United Nations University, based in Tokyo. The representation of an utterance in the UNL interlingua is a hypergraph whose nodes bear universal words (interlingual acceptions) with semantic attributes and whose arcs denote semantic relations. Any natural language utterance can be enconverted (encoded) into a UNL expression that can then be used as a pivot in a variety of applications (multilingual information retrieval, automatic translation, etc.).</Paragraph>
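To make the node-and-arc structure concrete, here is a minimal Python sketch of a UNL expression. It is a simplification under stated assumptions. The class names (`Node`, `Arc`, `UNLGraph`) are hypothetical. Universal words are bare strings without the restriction lists that real UWs carry. Scopes are omitted; these are the compound nodes grouping a subgraph that make UNL graphs hypergraphs. The relation labels `agt` (agent) and `obj` (object) and the attributes `@entry`, `@past` and `@def` follow common UNL usage. The one-relation-per-line rendering only approximates the textual UNL notation.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of a UNL expression: UWs are bare strings
# (no restriction lists) and scopes (compound nodes) are not modelled.


@dataclass
class Node:
    """A node bearing a universal word (UW) and its semantic attributes."""
    uw: str
    attributes: list = field(default_factory=list)

    def render(self) -> str:
        # Attributes are appended to the UW with dots, e.g. "eat.@entry.@past".
        return self.uw + "".join("." + a for a in self.attributes)


@dataclass
class Arc:
    """A directed arc labelled with a semantic relation between two nodes."""
    relation: str
    source: Node
    target: Node

    def render(self) -> str:
        # A binary relation written as rel(source, target).
        return f"{self.relation}({self.source.render()}, {self.target.render()})"


@dataclass
class UNLGraph:
    """A UNL expression viewed as a list of labelled arcs between UW nodes."""
    arcs: list = field(default_factory=list)

    def add(self, relation: str, source: Node, target: Node) -> None:
        self.arcs.append(Arc(relation, source, target))

    def render(self) -> str:
        # One relation per line, approximating the textual UNL notation.
        return "\n".join(arc.render() for arc in self.arcs)


# "John ate the apple": eat is the entry node, John the agent, apple the object.
eat = Node("eat", ["@entry", "@past"])
john = Node("John")
apple = Node("apple", ["@def"])

graph = UNLGraph()
graph.add("agt", eat, john)
graph.add("obj", eat, apple)
print(graph.render())
# agt(eat.@entry.@past, John)
# obj(eat.@entry.@past, apple.@def)
```

Both arcs share the same `eat` node object. That sharing turns the list of relations into a connected graph rather than a set of unrelated pairs.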
    <Paragraph position="1"> Enconverting into UNL is thus the process by which a UNL expression is generated from the analysis of a natural language utterance. This process can follow different strategies, ranging from fully automatic to fully manual enconversion.</Paragraph>
    <Paragraph position="2"> Within the UNL project, a number of software tools exist for different languages, mainly dictionaries and deconverters (for French (Serasset and Boitet, 2000), for Tamil (Dhanabalan and Geeta, 2003), etc.). However, there are only a few tools for enconversion (for German (Hong and Streiter, 1999), for Spanish (http://www.unl.fi.upm.es), etc.).</Paragraph>
    <Paragraph position="3"> As none of them is a fully automatic enconverter, these systems have not yet proved suitable for processing large amounts of heterogeneous data.</Paragraph>
    <Paragraph position="4"> For French, an enconverter is currently under development that relies on the Ariane-G5 platform (Boitet et al., 1982), an environment for multilingual machine translation, to analyze the natural language input. However, this approach has several drawbacks.</Paragraph>
    <Paragraph position="5"> First, it can only process inputs of up to 200-250 words. Second, its output contains all the possible complete linguistic analyses of a sentence (multiple syntactic and logico-semantic trees), so an interactive disambiguation step is needed to select the analysis passed to the enconverter. Such a step is not a drawback in itself (it is indeed very useful in the context of automatic translation); the problem lies rather in disambiguating huge numbers of analyses efficiently, in a reasonable time. Finally, the system is not yet multi-platform (it currently runs only on Macintosh), and the interface procedures with Ariane-G5 are not yet very efficient (work is under way to address this issue).</Paragraph>
    <Paragraph position="6"> To overcome these difficulties and develop a French enconverter that can generate UNL expressions for large collections of raw corpora, we propose to use the outputs of an existing incremental parser that has already proved robust and efficient on huge amounts of data.</Paragraph>
    <Paragraph position="7"> This article is organized as follows: after introducing the UNL language and giving some details on how it represents knowledge in a language-neutral way, we present XIP, the incremental parser we use as the central tool for enconversion. Then we describe the mechanism for transforming XIP's outputs into UNL expressions, and finally we discuss a preliminary evaluation of the enconverter and our perspectives.</Paragraph>
  </Section>
</Paper>