Using an incremental robust parser to automatically generate
semantic UNL graphs
Nuria Gala
GETA-CLIPS-IMAG
385 av. de la Biblioth`eque, BP 53
F-38041 Grenoble cedex 9, France
nuria.gala@imag.fr
Abstract
The UNL project (Universal Networking Lan-
guage) proposes a standard for encoding the
meaning of natural language utterances as se-
mantic hypergraphs, intended to be used as
pivot in multilingual information and commu-
nication systems. Several deconverters permit
to automatically translate UNL utterances into
natural languages. However, a rough enconver-
tion from natural language texts to UNL expres-
sions is usually done interactively with editors
specially designed for the UNL project or by
hand (which is very time-consuming and diﬃ-
cult to extrapolate to huge amounts of data).
In this paper, we address the issue of using an
existing incremental robust parser as main re-
source to enconverting French utterances into
UNL expressions.
1 Introduction
UNL is a project of multilingual personal
networking communication initiated by the Uni-
versity of United Nations based in Tokyo. The
representation of an utterance in the UNL inter-
lingua is a hypergraph where nodes bear uni-
versal words (interlingual acceptions) with se-
mantic attributes and arcs denote semantic re-
lations. Any natural language utterance can be
enconverted (encoded) into a UNL expression
thatcanthenbeusedasapivotinavarietyof
possible applications (multilingual information
retrieval, automatic translation, etc.).
Enconverting into UNL is thus to be unders-
tood as the process by which a UNL expression
is generated from the analysis of a natural lan-
guage utterance. This process can be carried
out by diﬀerent strategies, ranging from fully
automatic to fully human enconverting.
Within the UNL project, a number of soft-
ware tools exist for diﬀerent languages, mainly
dictionnaries and deconverters (for French
(Serasset and Boitet, 2000), for Tamil (Dhan-
abalan and Geeta, 2003), etc.). However, there
are a few tools for enconversion (for German
(Hong and Streiter, 1999), for Spanish
1
,etc.).
As they are not full automatic enconverters,
these systems have not yet proved to be suitable
for dealing with huge amounts of heterogeneous
data.
For French, there is currently a version un-
der development of an enconverter that uses the
Ariane-G5 platform (Boitet et al., 1982), an en-
vironment for multilingual machine translation,
for the analysis of the natural language input.
However, this approach has several drawbacks.
First, the size of the linguistic input that it can
process is limited to 200-250 words. Second, the
output produced contains all the possible com-
plete linguistic analysis for a sentence (multi-
ple syntactic and logico-semantic trees). This
implies an interactive disambiguation step to
choose the appropriate linguistic analysis for the
enconverter. Such an interactive disambigua-
tion step is not a drawback in itself (it is in-
deed very useful in the context of automatic
translation). The problem rather comes from
an eﬃcient disambiguation of huge amounts of
analysis in a reasonable time. Finally, the sys-
tem is not yet multi-platform (the program cur-
rently runs only on Macintosh) and the connec-
ting procedures with Ariane-G5 are not very ef-
ficient at this time (eﬀorts are currently being
done to address this issue).
To cope with all these diﬃculties and to de-
velop a French enconverter that can generate
UNL expressions for large collections of raw cor-
pora, we propose to use the ouputs produced
by an existing incremental parser which has al-
ready proved robust and eﬃcient for parsing
huge amounts of data.
This article is organized as follows: after
introducing the UNL language and giving some
details on how it represents knowledge in a
language-neutral way, we present XIP, an in-
1
http://www.unl.fi.upm.es
cremental parser, that we will use as the central
tool for the enconversion. Then, we describe the
mechanism for transforming XIP’s outputs into
UNL expressions and finally we discuss a pre-
liminary evaluation of the enconverter and our
perspectives.
2 The Universal Networking
Language (UNL)
2.1 The language
UNL is an artificial language that describes
semantic networks. Sentence information is
represented by hypergraphs having universal
words (UWs) as nodes and relations as arcs. A
hypergraph can also be represented as a set of
directed binary relations, between UWs in the
sentence. Linguistic information is encoded by
means of the UWs, the relations that exist bet-
ween them and the attributes that are associ-
ated with them.
2.2 Universal Words
Universal Words represent simple or compound
concepts. They denote interlingual acceptions
(word senses) for a given lemma.
An entry in the dictionnary of Universal
Words contains, as illustrated in Figure 1, a
head word (the French lemma ”membre” in this
example) followed by a list of morpho-syntactic
constraints. The last part of the entry contains
the UW itself: a character string (an English-
language lemma) between double quotes, which
usually contains a list of semantic constraints
in brackets.
[membre] { CAT(CATN),GNR(MAS,FEM),N(NC) }
"associate(icl>member)";
[membre] { CAT(CATN),GNR(MAS),N(NC) }
"member(icl>human)";
[membre] { CAT(CATN),GNR(MAS) }
"member(icl>part)";
[membre] { CAT(CATN),GNR(MAS),N(NC) }
"member";
Figure 1: Semantic ambiguity in Universal
Words.
When present, the list of semantic constraints
describes conceptual restrictions. For example,
the first three entries in Figure 1 define three
diﬀerent acceptions while the last one provides
only the lemma and is thus more general.
2.3 Relations
Binary relations are the building blocks for UNL
expressions. They link together two UWs in
a linguistic utterance and have labels that de-
pend on the roles the UWs play in the sentence.
A UNL relation is represented by a headword
(the label of the semantic relation) followed by a
bracketed expression containing the UWs. The
UWs are separated by a comma and decorated
with diﬀerent kinds of linguistic information.
Figure 2 shows the UNL enconversion for the
following French sentence:
“Lors de la 29e session de la Conf´erence
g´en´erale de l’Unesco, les 186 Etats membres
ont ratifi´e`a l’unanimit´eceprojet.”
2
agt(ratify(agt>thing,obj>thing).@entry.@past,
state(icl>nation).@def.@pl)
mod(state(icl>nation).@def.@pl,
member(mod<thing))
qua(state(icl>nation).@def.@pl,186)
man(ratify(agt>thing,obj>thing).@entry.@past,
unanimously(icl>how))
obj(ratify(agt>thing,obj>thing).@entry.@past,
project(icl>plan(icl>thing)))
mod(project(icl>plan(icl>thing)),this)
tim(ratify(agt>thing,obj>thing).@entry.@past,
session(icl>meeting).@def)
mod(session(icl>meeting).@def,29.@ordinal)
mod(session(icl>meeting).@def,
conference(icl>meeting).@def)
mod(conference(icl>meeting).@def,
general(mod<thing))
mod(conference(icl>meeting).@def,UNESCO.@def)
Figure 2: UNL expressions.
The UNL expressions in Figure 2 encode re-
lations such as agt (agent), qua (quantifier),
mod (modifier), tim (instant time), man (man-
ner) and obj (object).
As can be seen on the figure, the information
given by a UNL relation may be very semanti-
cally precise : for example, the notion of “time”
is composed of six labels, corresponding to an
instant time (tim), an initial time (tmf), a final
time (tmt), a period (dur), a sequence (seq)or
a simultaneous action (coo).
2
In their 29th General Conference, the 186 member
states of the Unesco ratified their unanimous support of
the project.
The couple of UWs present in a relation have
diﬀerent kinds of attributes : morphological in-
formation (def, pl, etc.), information about
tense (past), etc.
2.4 Representation of UNL graphs
The list of UNL relations for a linguistic ut-
terance is represented by a UNL hypergraph (a
graph where a node is simple or recursively con-
tains a hypergraph). The arcs bear semantic re-
lation labels and the nodes are UWs with their
attributes as showed in Figure 3.
29 .@ordinal186member(mod>thing)
   .@entry.@past
ratify(agt>thing,obj>thing)
UNESCO
.@def
conference(icl>meeting)
   .@def
state(icl>nation)
  .@def.@pl
unanimously(icl>how) project(icl>plan
 (icl>thing))   .@def
session(icl>meeting)
this
general(mod<thing)
mod
mod
mod mod
objman
timagt
mod
qua
mod
Figure 3: UNL Hypergraph.
UNL hypergraphs must contain one special
node, called the entry of the graph (usually the
finite verb). This information is encoded with
the label entry in the list of UNL relations re-
presenting the corresponding hypergraph.
3 An incremental robust parser
3.1 Overview of the parser
XIP (Ait-Mokhtar et al., 2002; Hagege and
Roux, 2002) is a rule-based platform for buil-
ding robust incremental parsers. It is deve-
lopped at the Xerox Research Centre Europe
(XRCE) and shares the same computation-
nal paradigm as the PNLPL approach (Jensen,
1992) and the FDGP approach (Tapanainen
and Jarvinen, 1997).
At present, various grammars for XIP have
been built for English and French. The diﬀerent
phases of linguistic processing are organized in-
crementally : syntactic analysis is done by first
chunking (Abney, 1991) a morphosyntactic an-
notated input text and then extracting func-
tionnal dependencies (links between the words).
The aim of the system is to produce a list of
syntactic dependencies which may be later used
in applications such as information retrieval, se-
mantic disambiguation, coreference resolution,
etc.
3.2 Incremental approach
A XIP parser, like the French parser (that we
will call XIPF hereafter), is composed of diﬀe-
rent modules that transform and process incre-
mentally the linguistic information given as in-
put. XIPF contains three main modules: one
for morphological disambiguation (disambigua-
tion of POS tags depending on contextual in-
formation), another one for chunking (marking
structural groups) and a last one for dependency
calculus (identifying links between words).
Each module may have a number of gram-
mars which are applied one after the other
depending on the linguistic complexity of the
phenomena present. For example, for French,
the identification of verbal phrases comes after
the identification of nominal phrases. The dif-
ferent rules in the grammars also apply incre-
mentally. They are organized in levels so that
they apply sequentially to enrich stepwise the
linguistic analysis. This strategy favors linguis-
tic precision over recall.
3.3 Data representation
Within the XIP formalism, information is re-
presented by means of syntactic trees with ter-
minal nodes or sequences of constituant nodes
(such as nominal phrases (NPs), finite verbal
phrases (FVs), etc.). The maximal node for each
tree (sentence) is a virtual node called GROUPE.
All nodes, lexical (membre)ornot(NP),
have a list of features associated with them
and describing precise features : typogra-
phical (capital letter [maj:+]), lexical (proper
noun [proper:+]), morphological (number
[plu:+]), syntactic (subcategorization with the
preposition “a” [sfa:+]) or semantic (time
[tim:+]).
Since the complete linguistic information of
anodeisalwayspresent,evenifitisnotdis-
played in the output, it is simple to manipulate
at any time during the analysis. Therefore, the
possibility of taking into account diﬀerent kinds
of features at any step of the analysis is a con-
siderable advantage when building a semantic
application (the enconversion into UNL expres-
sions).
Indeed, semantic information can be enriched
by adding new particular features when neces-
sary (a feature title has been added to be ap-
plied in titles).
3.4 XIPF output
The final result of the parser (a list of syntac-
tic dependencies) is obtained from the linguistic
processing done by the diﬀerent modules. Fi-
gure 4 shows the XIPF analysis for the French
sentence given as example in section 2.3.
SUBJ NOUN(ratifi´e,Etats)
VARG NOUN DIR(ratifi´e,projet)
VMOD LEFT NOUN INDIR(ratifi´e,Lors
de,session)
VMOD POSIT1 ADV(ratifi´e,`a, l’unanimit´e)
NMOD POSIT1 RIGHT ADJ(Conf´erence,g´en´erale)
NMOD POSIT1 LEFT ADJ NOUN(session,29e)
NMOD POSIT1 NOUN(session,de,Conf´erence)
NMOD POSIT1 NOUN(Conf´erence,de,Unesco)
NN(Etats,membres)
DETERM DEF NOUN DET(la,session)
DETERM DEF NOUN DET(la,Conf´erence)
DETERM DEF NOUN DET(l’,Unesco)
DETERM DEF NOUN DET(les,Etats)
DETERM DEM NOUN DET(ce,projet)
DETERM NUM NOUN(186,Etats)
AUXIL(ratifi´e,ont)
0>GROUPE{SC{PP{Lors de NP{la AP{29e}
session}} PP{de NP{la Conf´erence}}
AP{g´en´erale} PP{de NP{l’ Unesco}} ,NP{les
186 Etats} NP{membres} FV{ont ratifi´e}} `a
l’unanimit´eNP{ce projet} .}
Figure 4: XIPF output.
For this sentence, the parser extracts re-
lations such as subject (SUBJ), verbal subca-
tegorization (VARG), verbal and nominal mo-
dification (VMOD, NMOD and NN), determination
(DETERM) and verbal auxiliary (AUXIL). The
head of the dependency appears as the first ele-
ment except in the case of a determination re-
lation.
Relations usually have a list of morpho-
syntactic features associated with them : the
POS tag of the word linked to the head (NOUN in
a SUBJ relation, ADJ in a NMOD, etc.), morpholo-
gical precisions (NUM, DEM) or syntactic features
(the position of the adjective, POSIT1, RIGHT,
etc.).
The process of dependency extraction is de-
terministic: the most plausible relation accor-
ding to the system is extracted. The only excep-
tion is that of prepositionnal attachment (VMOD
and NMOD): the linguistic information that the
parser has is not enough to handle structural
ambiguities. In this case, all possible relations
appear in the result.
3.5 Parser evaluation
Parsers built with the XIP engine (XIPF) are
able to process about 2.000 words/s using 10 Mo
of memory footprint (only grammars, without
lexicons)
3
.
As for linguistic performance, an evalu-
ation of XIPF subject and object (VARG)
dependencies, conducted on French newspa-
pers (Ait-Mokhtar et al., 2001), showed the
following precision (P) and recall (R) rates:
SUBJECT, P = 93,45 %, R = 89,36 %; OBJECT,
P = 90,62 %, R = 86,56 %.
Another evaluation carried out with XIPF+
(Gala, 2003), a second French parser contai-
ning more specialized grammars to handle com-
plex phenomena such as punctuation, lists, ti-
tles etc., using varied raw corpora from diﬀerent
types and domains
4
gives P = 94 % for subject
even in sentences being or containing lists, enu-
merations etc. and P = 93 % and R = 89,6 %
for key words in titles (CLE relation).
4 A French UNL enconverter
4.1 Overview
The principal motivation to create a French en-
converter is to easily obtain huge amounts of
UNL enconverted corpora which can be subse-
quently used in other applications (for example,
multilingual information retrieval). To achieve
this objective, one of the main requirements was
also the reusability of existing robust linguistic
resources.
The choice of a XIP parser was motivated by
several reasons. First, its robustness permits to
deal with huge amounts of text (a result is al-
ways produced whatever the complexity of the
input). Second, its modular architecture facili-
tates the articulation of diﬀerent ressources (it
is easy to enrich the parser with new lexicons
and grammars and to desactivate a particular
module when necessary). Finally, the flexibi-
lity of the formalism permits to enrich the rules
and the features with no harm. We have pre-
fered XIPF+ over the standard XIPF because
of its broader linguistic coverage.
The French UNL enconverter is thus a proces-
sor that automatically transforms annotations
3
Obtained with a Pentum III 1 GHz.
4
About 108.000 words extracted from the Web (end
2000) concerning general newspaper (Le Monde)aswell
as specialized domains such as economics (journal Les
Echos), science (medecine, physics), law (project of law),
etc.
provided by the XIPF+ parser into UNL ex-
pressions.
4.2 Remarks on terminology
To avoid ambiguity, we use the term “depen-
dency” to indicate XIPF+ syntactic links of the
form D(x,y) or D(x,y,z),asshowninFigure
4, and the term “feature” to indicate linguis-
tic information provided by the parser. XIPF+
provides twelve types of dependencies and more
than two hundred and fifty features, of the types
described in section 3.3 (typographical, mor-
phological, etc.).
As for UNL, we use the term “rela-
tion” to denote a semantic link of the form
label(UW1.attributes,UW2.attributes),as
shown in Figure 2, while an “attribute” corre-
sponds to a UNL annotation. Such an annota-
tion appears to the right of a UW and adds par-
ticular linguistic information. The UNL forma-
lism provides about fourty relations and eighty
attributes of diﬀerent types.
4.3 Generation of UNL expressions
The first step of the enconvertion consists in
identifying the information provided by XIPF+
that will be translated into UNL relations.
There are three kinds of mapping rules perfor-
ming this task, depending on the input and the
result of the transformation: a dependency gi-
ving an attribute, a dependency giving a rela-
tion, a feature giving a relation.
4.3.1 Dependency to attribute.
The first kind of mapping rules transforms a
XIPF+ dependency into a UNL attribute. An
example is that of the relation CLE (the head of
a title). Within UNL it becomes @title and it
is included as an attribute of the UNL relation
containing the head word of the title.
The following example describes a title, its
analysis with XIPF+ and its UNL encoversion:
Le Forum Universel des Cultures
5
4.3.2 Dependency to relation.
The second kind of mapping rules transforms
a XIPF+ dependency into a UNL relation. In
some cases, this transformation is not straight-
forward since a number of lexical and semantic
features are to be taken into account (and they
are not always provided by the parser). This
5
The Universal Forum of Cultures
CLE(Forum)
NMOD POSIT1 RIGHT ADJ(Forum,Universel)
NMOD POSIT1 NOUN INDIR(Forum,des,Cultures)
DETERM CLOSED DEF NOUN DET(Le,Forum)
<title> 0>GROUPE{NP{Le Forum} AP{Universel}
PP{des NP{Cultures}} </title>
mod(forum.@def.@entry.@title,
universal(mod<thing))
mod(forum.@def.@entry.@title,
culture(icl>abstract thing).@def.@pl)
Figure 5: Example of dependency to attribute
transformation.
is the case of dependencies with the verb to be
and generally with all verbs denoting a state.
While in the UNL formalism the verb to be
is considered a copula and does not appear
in the semantic representation, the parser
produces the syntactic dependencies in which
the verb participates and marks the fact of
being a copula by means of features ([copula]
as lexical feature and SPRED -predicative- as
syntactic feature) as illustrated on the example
below :
Le Forum est Universel.
6
SUBJ(^etre,Forum)
VARG ADJ SPRED(^etre,Universel)
aoj(Universal,Forum)
Figure 6: Example of dependencies involving
the verb to be and their corresponding UNL re-
lation.
In this case, an aoj relation shows the link
between the noun in the subject and the adjec-
tive. The parser’s feature permitting the iden-
tification of a copula is thus crucial in order to
map precisely a SUBJ and a VARG into an agt
and a obj or into a single aoj.
Table 1 gives a summary of the principal
transformations performed by this second kind
of mapping rules (as it is shown, in the case
of modification, two types of XIPF+ relations
produce a UNL mod):
6
The Forum is Universal.
SUBJ(X[be-],Y) agt(X,Y)
VARG(X[be-],Y) obj(X,Y)
SUBJ(X[be+],Y)
VARG(X[be+],Z) aoj(Z,Y)
NMOD(X,Y) or NN(X,Y) mod(X,Y)
Table 1: XIPF+ dependencies producing UNL
relations.
4.3.3 Feature to relation.
The last type of mapping rules identifies parti-
cular information encoded as features within the
parser’s output and transforms them into UNL
relations with the appropriate words. This is
the case for the notions of quantification and
time.
Regarding quantification, this feature, en-
coded within the dependency DETERM,istrans-
formed to produce a qua UNL relation between
a determiner and a noun.
As for the relations involving the notion of
time, the feature time encoded by XIPF+ is
too general. Therefore, it is not possible to
produce the semantically precise UNL relations
expressing variations of the concept of time
(duration, final time, sequence, etc.). In this
case,wehavechosentocreateanintermediate
UNL relation named time in order to keep this
semantic information.
NMOD(X,Y,Z)
DETERM(W[quant+],Z) qua(Z,W)
NMOD(X,Y,Z[time+]) time(X,Z)
Table 2: XIPF+ features producing UNL rela-
tions.
4.4 Accessing the UW base
After identifying the UNL relations, the encon-
verter retrieves the UWs corresponding to each
French word in a relation. UWs are contained
into a UW database of 37.901 French lemmas.
The major diﬃculty here concerns ambiguity,
that is, accessing the right acception, since the
database usually contains a list of UWs for a
given lemma. The ambiguity can be semantic,
when a French lemma corresponds to a single
English lemma with diﬀerent acceptions (cf
Figure 1) or lexical, when a French lemma
corresponds to several English lemmas. Here
is an example of lexical ambiguity with the
pronoun il (”he” or ”it” in English) :
[il] { CAT(CATR) } "he(icl>human)";
[il] { CAT(CATR) } "it(icl>nonhuman)";
Figure 7: Lexical ambiguity in Universal Words.
To this date, as the lexico-semantic infor-
mation provided by the parser is not enough
to choose the appropriate UW, the enconverter
takes the most general acception (that is, the
word sense without a constraint list –the last
entry in the list showed in Figure 1). When all
acceptions of an entry have such list of cons-
traints, the enconverter chooses the first one.
4.5 Enrichment with lexical
information
The final step of the enconversion enriches the
rough UNL expressions produced (UNL labels
with simplified UWs) with more complete mor-
phological information. A set of rules is thus
specialized in translating diﬀerent linguistic fea-
tures from the parser into UNL descriptors com-
pleting the words in a relation.
Some of this morphological information can
also be extracted from the UW base (gender).
However, we have preferred to extract a maxi-
mum of information from the parser because it
produces a contextual analysis of the words ap-
pearing in a linguistic utterance.
The features which enrich the UNL output
concern definiteness (@def or @indef), num-
ber (@sg or @pl) and tense (@past, @present,
@fut). A few labels (@ordinal, @complete ...)
are absent on the XIPF+ output and therefore
not automatically enconverted in the UNL out-
put. Finally, the attribute @entry is systemati-
cally added to UWs head of their sentence (the
verb): agt, varg, aoj,etc.
5 Evaluation
A complete evaluation of a UNL enconverter
should take into account the following possible
kinds of errors:
• graphs with wrong linguistic information
(semantic relations, attributes, etc.),
• missing information (incomplete graph due
to missing relations, incomplete decora-
tions, etc.),
• graphs with wrong UWs (wrong acception
or wrong lemma).
Since in this article we want to emphasize the
use of an incremental robust parser for creating
an enconverter, we evaluated errors concerning
semantic relations
7
, thus the first and the se-
cond points which correspond, respectively, to
classic evaluation metrics of precision and recall
rates.
The enconverter was tested against the first
50 manually enconverted UNL graphs (1.059
words) from a corpus of legal text. The a-
verage length of the sentences was about 21
words (21,18). The semantic relations evalu-
ated in this preliminary experiment (322 UNL
expressions) were agt (44), obj (57) and mod
(221).
Table 3 gives the results obtained for the
evaluation of this first version of the encon-
verter :
Relation Precision Recall
agt(X,Y) 57 % 80 %
obj(X,Y) 51 % 48 %
mod(X,Y) 58 % 86 %
Table 3: Results of the evaluation of the first
version of the enconverter.
For agents, most errors come from syntac-
tic subjects correctly identified by the parser
but presenting semantic features that should
had been taken into account to create aoj rela-
tions. To give an example, in the sentence “La
culture acquiert des formes diﬀ´erentes (...)”
8
,
the parser extracts correctly the dependency
subj(acquire,culture) although it is seman-
tically encoded as aoj(acquire,culture) in
UNL because the verb “acquire” is considered
in this utterance as a verb of state.
In the case of objects, errors on precision
concern wrong scope of coordination as well
as objects being a whole sentence. As for
recall, there are several constructions which
may be considered obj from a semantic point
of view but that the parser identifies as
modifiers due to their surface construction
with a preposition. For example, “source
de creativit´e”
9
is analyzed by the parser as
mod(source,de,cr´eativit´e) although UNL
encodes obj(source,creativity). Likewise,
7
The presence or absence of the diﬀerent attributes
was not evaluated.
8
Culture acquires diﬀerent forms (...).
9
Source of creativity.
“vers l’acc´es de la diversit´eculturelle”
10
is en-
coded in UNL as obj(towards,access),akind
of relation that the parser does not extract.
A final remark concerns modifiers. As said
before, the parser is not deterministic in mark-
ing modifiers: all possible combinations be-
tween a head word and its dependents are ex-
tracted. That is the main reason why precision
is low and recall is high.
The average of all these figures gives a global
evaluation of the enconverter corresponding to
P = 56 % and R = 69 %.
6 Discussion
At this stage of the project, there are a number
of conclusions we can draw from the preceding
evaluation.
The first one is that the results are rather en-
couraging in terms of a first rough enconvertion
from syntactic XIPF+ information to UNL ex-
pressions (agt, obj and mod). However, we are
aware that certain cases present considerable
diﬃculties. For example, in addition to the ex-
amples presented in the evaluation for verbs of
state, subjects with a semantic feature of “pa-
tient” are to be enconverted as obj and not as
subj (unfortunately the semantic information
needed for this transformation is not yet avail-
able within the parser). Thus in “La r´eunion
continuera jusqu’`acesoir.”
11
the parser ex-
tracts a subj(continuer,r´eunion) that might
be enconverted as obj(continue,meeting) in
UNL. All these kinds of complex transforma-
tions including particular semantic features are
at this point an important bottleneck for the
enconverter.
The second conclusion coming from the eva-
luation (even if not quantitatively analyzed) is
that the choice of the UW remains a critical
point, as the enconverter has not the possibi-
lity of choosing the correct acception giving a
configuration. One possibility to consider might
be to introduce interactivity with a human to
choose the correct UW. The second possibility
is related to the improval of the parser: we can
consider adding more linguistic information, in
the form of semantic classes or semantic fea-
tures, in order to be able to disambiguate. Ha-
ving enriched the parser with these semantic
features, another possibility to improve the en-
converter might be to consider statistical infor-
mation about collocations.
10
Towards accessing cultural diversity.
11
The meeting will continue until this evening.
Finally, we are conscious that there would
still remain several aspects which would demand
to be improved within the parser itself : prepo-
sitionnal attachment disambiguation, scope of
coordination, complex coreference, etc. Parti-
cular strategies may be adapted to handle such
diﬃculties individually (using statistical infor-
mation, interactive disambiguation, etc.).
7Conclusion
In this paper we have presented a mechanism
for automatically producing UNL expressions
using the ouput of a robust parser. After des-
cribing the UNL formalism and presenting an
incremental parser able to accurately process
huge amounts of data, we have shown how one
can transform the linguistic information pro-
vided by the parser into UNL expressions. We
have also presented a first evaluation in an at-
tempt to try to assess the performance of the
enconverter.
Our results show that there are still several
crucial problems that we need to solve. Ho-
wever, taking into account that this is prelimi-
nary work, the results already obtained are en-
couraging and confirm the possibility of using
the reliable linguistic information automatically
obtained from an incremental robust parser to
create a UNL semantic enconverter for huge
amounts of data.
Acknowledgements
The author wants to express her gratitude to
E. Blanc and A. Max, as well as to the three
anonymous reviewers, for their suggestions and
comments on a first draft of the paper.

References
S. Abney. 1991. Parsing by chunks. In
Principle-Based Parsing,editedbyR.
Berwick, S. Abney and C. Tenny, 257–278.
Kluwer Academic Publishers. Boston.
S. A¨ıt-Mokhtar, J. P. Chanod, and C. Roux.
2001. A multi-input dependency parser. In
Proceedings of International Workshop on
Parsing Technologies, IWPT-2001, 201-204.
Beijing, China.
S. A¨ıt-Mokhtar, J. P. Chanod, and C. Roux.
2002. Robustness beyond shallowness : In-
cremental Deep Parsing. Special Issue of the
Natural Language Engineering Journal on
Robust Methods in Analysis of Natural Lan-
guage Data, vol. 8(3), 121–144, Cambridge
University Press.
C. Boitet, P. Guillaume, and M. Qu´ezel-
Ambrunaz. 1982. ARIANE-78, an integrated
environment for automated translation and
human revision. In Proceedings of Conference
on Computational Linguistics, COLING-82,
19–27, Prague.
T. Dhanabalan, and T. V. Geetha. 2003. UNL
Deconverter for Tamil. In International Con-
ference on the Convergence of Knowledge,
Culture, Language and Information Tech-
nologies. Alexandria, Egypt.
N. Gala. Un mod`ele d’analyseur syntaxique
robuste fond´e sur la modularit´eetlalexi-
calisation de ses grammaires. Th`ese de doc-
torat,Universit´e de Paris-Sud, UFR scien-
tifique d’Orsay, France.
C. Hag`ege, and C. Roux. 2002. A Robust
and Flexible Platform for Dependency Ex-
traction. In Proceedings of 3rd Conference on
Language Ressources and Evaluation, LREC-
2002, 520–523, Las Palmas de Gran Canaria,
Spain.
M. Hong, and O. Streiter. 1999. Overcom-
ing the language barriers in the Web : the
UNL-Approach. In Multilingual Corpora :
encoding, structuring, analysis. 11th Annual
Meeting of the German Society for Compu-
tational Linguistics and Language Technolo-
gies.. Frankfurt, Germany.
K. Jensen. 1992. PEG: the PLNLP English
Grammar. In Natural Language Processing :
the PLNLP approach, edited by K. Jensen, G.
Heidorn and S. Richardson. Kluwer Academic
Publishers. Boston.
G. Serasset, and C. Boitet. 2000. On UNL as
the future “html of the linguistic content”
and the reuse of existing NLP components
in UNL-related applications with the example
of a UNL-French deconverter. In Proceedings
of Conference on Computational Linguistics,
COLING-2000. Saarbruecken.
P. Tapanainen, and T. Jarvinen. 1997. A non-
projective dependency parser. In Proceedings
of Conference on Applied Natural Language
Processing, ANLP-97. Washington.
