Online Generic Editing of Heterogeneous Dictionary Entries in
Papillon Project
Mathieu MANGEOT
Unit Terjemahan Melalui Komputer
Universiti Sains Malaysia,
11800, Pulau Pinang
Malaysia
mathieu@mangeot.org
David THEVENIN
National Institute of Informatics
Hitotsubashi 2-1-2-1913 Chiyoda-ku
JP-101-8430 Tokyo
Japan
thevenin@nii.ac.jp
Abstract
The Papillon project is a collaborative
project to establish a multilingual dictio-
nary on the Web. This project started 4
years ago with French and Japanese. The
partners are now also working on English,
Chinese, Lao, Malay, Thai and Vietnamese.
It aims to apply the LINUX cooperative
construction paradigm to establish a broad-
coverage multilingual dictionary. Users can
contribute directly on the server by adding
newdataorcorrecting existingerrors. Their
contributions are stored in the user space
until checked by a specialist before being
fully integrated into the database. The re-
sulting data is then publicly available and
freely distributable. An essential condition
for the success of the project is to find a
handy solution for all the participants to be
able to contribute online by editing dictio-
nary entries.In this paper, we describe our
solution for an online generic editor of dic-
tionary entries based on the description of
their structure.
1 Introduction
The Papillon Project (S´erasset and Mangeot,
2001) is a cooperative project for a multilin-
gual dictionary on the Web with the following
languages: English, Chinese, French, Japanese,
Lao, Malay, Thai and Vietnamese. The dic-
tionary structure makes it very simple to add
a new language at any time. It aims to ap-
ply the LINUX construction paradigm to estab-
lish a multilingual usage dictionary with broad-
coverage.
This project is based on the participation of
voluntary contributors. In order to be really
attractive, this project must imperatively find
a convenient solution so that contributors can
easily edit the dictionary entries. Since Papillon
dictionary is available on a web server and the
contributors are located all around the world,
the obvious solution is to implement an editor
available online. Unfortunately, the existing so-
lutions (HTML forms, java applets) have impor-
tant limitations. Thus, we propose an entirely
generic solution that can adapt very easily not
only the interfaces to the various entry struc-
tures needing to be edited but also to the user
needs and competences.
Firstly, we outline the issue addressed in this
paper; and draw up an overview of the existing
methods for dictionary entry edition. A presen-
tation of the chosen method follows detailing
its integration in the Papillon server. Finally,
we show an example of the online edition of a
dictionary entry.
2 Addressed Issue and Requirements
In this paper, the addressed issue is how to
edit online dictionary entries with heteroge-
neous structures.
2.1 Online Edition
In order to build a multilingual dictionary that
covers a lot of languages, we need large compe-
tences in those languages. It may be possible
to find an expert with enough knowledge of 3
or 4 languages but when that number reaches
10 languages (like now), it is almost impossi-
ble. Thus, we need contributors from all over
the world.
Furthermore, in order to avoid pollution of
the database, we plan a two-step integration of
the contributions in the database. When a con-
tributor finishes a new contribution, it is stored
into his/her private user space until it is revised
by a specialist and integrated into the database.
Then, each data needs to be revised although
the revisers may not work in the same place of
the initial contributors.
Thus, the first requirement for the editor is
to work online on the Web.
2.2 Heterogeneous Entry Structures
The Papillon platform is built for generic pur-
poses. Thus, it can manipulate not only the
Papillon dictionary but also any kind of dictio-
nary encoded in XML (Mangeot, 2002). The
lexical data is organized in 3 layers:
• Limbo contains dictionaries in their origi-
nal format and structure;
• Purgatory contains dictionaries in their
original format but encoded in XML;
• Paradise contains the target dictionary, in
our case Papillon dictionary.
The Purgatory data can be reused for building
the Paradise dictionary.
We would like then to be able to edit different
dictionaries structures from Paradise but also
from Purgatory. Furthermore, being Papillon
a research project, entry structures may evolve
during the life of the project, since they are not
fixed from the beginning.
Hence, the second requirement is that the ed-
itor must deal with heterogeneous and evolving
entry structures.
2.3 Extra Requirements
Previous requirements must be fulfilled, whilst
the following ones are optional.
The contributors will have various compe-
tences and use the editor for different purposes
(a specialist in speech may add the pronuncia-
tion, a linguist may enter grammatical informa-
tion, a translator would like to add interlingual
links, and a reviewer will check the existing con-
tributions, etc.).
The second optional requirement concerns
the adaptation to the user platform. The
increasing number of smart mobile phones
and PDAs makes real the following scenarios:
adding an interlingual link with a mobile phone,
adding small parts of information with a PDA
and revising the whole entry with a workstation.
It would then be very convenient if the editor
could adapt itself both to the user and to the
platform.
2.4 Final Aim
Guided by these requirements, our final aim is
to generate, as much automatically as possible,
online interfaces for editing dictionary entries.
It has to be taken into account the fact that en-
try structures are heterogeneous and may vary
and to try to adapt as much as possible these in-
terfaces to the different kinds of users and plat-
forms.
3 Overview of Existing Editing
Methods
3.1 Local and Ad Hoc
The best way to implement a most comfort-
able editor for the users is to implement an ad-
hoc application like the one developed for the
NADIA-DEC project: DECID (S´erasset, 1997).
It was conceived to edit entries for the ECD
(Mel’ˇcuk et al., 1984889296). The Papillon mi-
crostructure is based on a simplification of this
structure. We were indeed very interested by
such software. It is very convenient - for exam-
ple - for editing complex lexical functions.
But several drawbacks made it impossible to
use in our project. First, the editor was de-
veloped ad hoc for a particular entry structure.
If we want to change that structure, we must
reimplement changes in the editor.
Second, the editor is platform-dependent
(here written and compiled for MacOs). The
users have to work locally and cannot contribute
online.
3.2 Distributed and Democratic
This solution implemented for the construc-
tion of the French-UNL dictionary (S´erasset and
Mangeot, 1998) project is called ”democratic”
because it uses common and widespread appli-
cations (works on Windows and MacOs) such
as Microsoft Word.
The first step is to prepare pre-existing data
on the server (implemented here in Macintosh
Common Lisp). Then, the data is converted
into rtf by using a different Word style for each
part of information (the style ”headword” for
the headword, the style ”pos” for the part-of-
speech, etc.) and exported. The clients can
open the resulting rtf files locally with their
Word and edit the entries. Finally, the Word
rtf files are reintegrated into the database via a
reverse conversion program.
This solution leads to the construction of
20,000 entries with 50,000 word senses. It was
considered as a very convenient method, never-
theless, two important drawbacks prevented us
to reuse this solution. The first is that in or-
der to convert easily from the database to rtf
and vice-versa, the dictionary entry structure
cannot be too complex. Furthermore, when the
user edits the entry with Word, it is very dif-
ficult to control the syntax of the entry, even
if some Word macros can partially remedy this
problem.
The second is the communication between the
users and the database. The Word files have
to be sent to the users, for example via email.
It introduces inevitably some delay. Further-
more, during the time when the file is stored
on the user machine, no other user can edit the
contents of the file. It was also observed that
sometimes, users abandon their job and forget
to send their files back to the server.
3.3 Online and HTML Forms
In order to work online, we should then use ei-
ther HTML forms, or a Java applet. The use of
HTML forms is interesting at a first glance, be-
cause the implementation is fast and all HTML
browsers can use HTML forms.
On the other hand, the simplicity of the forms
leads to important limitations. The only exist-
ing interactors are: buttons, textboxes, pop-up
menus, and checkboxes.
JavaScripts offer the possibility to enrich the
interactors by verifying for example the content
of a textbox, etc. However, very often they raise
compatibility problems and only some browsers
can interpret them correctly. Thus, we will
avoid them as much as possible.
One of the major drawbacks of this solution
is our need to modify the source code of the
HTML form each time we want to modify the
entry structure. We also need to write as many
HTML forms as there are different entry struc-
tures.
3.4 Online and Java Applets
In order to remedy the limitations of the HTML
forms and to continue to work online, there is
the possibility to use a java applet that will be
executed on the client side. Theoretically, it
is possible to develop an ad hoc editor for any
complicated structure, like the 3.1 solution.
Nevertheless, the problems linked to the use
of a java applet are numerous: the client ma-
chine must have java installed, and it must be
the same java version of the applet. Further-
more, the execution is made on the client ma-
chine, which can be problematic for not very
powerful machines. Moreover, nowadays there
is a strong decrease of java applets usage on the
Web mainly due to the previous compatibility
problems.
3.5 Conclusion
As a result, none of these existing solutions can
fully fulfil our requirements: online edition and
heterogeneous entry structures. We might then
use other approaches that are more generic like
the ones used in interface conception in order
to build our editor. In the remainder of this
paper, we will detail how we used an interface
generation module in Papillon server in order to
generate semi-automatically editing interfaces.
4 Using an Interface Generation
Module
This Papillon module has to generate graphic
user interfaces for consulting and editing dic-
tionary entries. We base our approach on
the work done on Plasticity of User interfaces
(Thevenin and Coutaz, 1999) and the tool ART-
Studio (Calvary et al., 2001). They propose
frameworks and mechanisms to generate semi-
automatically graphic user interfaces for differ-
ent targets. Below we present the design frame-
work and models used.
4.1 Framework for the UI generation
Our approach (Calvary et al., 2002) is based
on four-generation steps (Figure 1). The first
is a manual design for producing initial mod-
els. It includes the application description with
the data, tasks and instances models, and the
description of the context of use. This latter
generally includes the platform where the inter-
action is done, the user who interacts and the
environment where the user is. In our case we
do not describe the environment, since it is too
difficult and not really pertinent for Papillon.
From there, we are able to generate the Abstract
User Interface (AUI). This is a platform inde-
pendent UI. It represents the basic structure of
the dialogue between a user and a computer. In
the third step, we generate the Concrete User
Interface (CUI) based on the Abstract User In-
terface (AUI). It is an instantiation of the AUI
for a given platform. Once the interactor (wid-
get) and the navigation in UI have been chosen,
it is a prototype of the executable UI. The last
stage is the generation of Final User Interface
(FUI). This is the same as concrete user inter-
face (CUI) but it can be executed.
We will now focus on some models that de-
scribe the application.
4.2 Application Models: Data & Task
The Data model describes the concepts that the
user manipulates in any context of use. When
considering plasticity issues, the data model
should cover all usage contexts, envisioned for
the interactive system. By doing so, designers
obtain a global reusable reference model that
can be specialized according to user needs or
Task Concept
Instance
Abstract UI
Concrete
 UI
Concrete
 UI
Final
 UI
Final
 UI
Platform
User
Environment
Platform
User
Environment
Initial description
Transit description
Final description
Figure 1: Multitarget Generation Framework
C_entry
C_head 
word
C_pos
C_list
examples
C_example
I_head 
word
PopUp Menu
I_pos
example1
example2
example3
List•
I_list
examples
I_entry
TextBox
I_
examples
Legend:
Concept
Instance
Link to the interactor used by the concept
Link to a child concept
Link to the instance
TextBox
Figure 2: Data Model Structure
more generally to context of use. A similar de-
sign rationale holds for tasks modeling. For the
Papillon project, the description of data model
corresponds to the XML Schema description
of dictionary and request manipulation. The
tasks’ model is the set of all tasks that will be
implemented independently of the type of user.
It includes modification of the lexical database
and visualization of dictionaries.
As showed on Figure 2, the model of concepts
will drive the choice of interactors and the struc-
ture of the interface.
4.3 Instance Model
It describes instances of the concepts manipu-
lated by the user interface and the dependence
graph between them. For example there is the
concept ”Entry” and one of its instances ”sci-
entifique”. (cf. Figure 3).
I_head
word
I_pos
I_examples 
list
I_entry
I_example I_example
scientifique adj
journées
scientifiques
journal
scientifique
<entry><hv>scientifique</hv>
<pos>adj</pos>
<ex>journées scientifiques</ex>
<ex>journal scientifique</ex></entry>
<ex>journées scientifiques</ex>
<ex>journal scientifique</ex>
Legend:
Instance
Link to the instance value
Link to a child instance
Figure 3: Relation Between the XML Entry and
its Corresponding Instance
This model is described at design time, be-
fore generation, and linked with the task model
(a task uses a set of instances). Each instance
will be effectively created at run-time with data
coming from the Papillon database.
4.4 Platform and Interactors Models
A platform is described by interaction capacity
(for example, screen size, mouse or pen, key-
board, speech recognition, etc.). These capaci-
ties will influence the choice of interactors, pre-
sentation layouts or the navigation in the user
interface.
Associated to the platform there are the inter-
actors (widgets) proposed by the graphic tools-
box of the targeted language (for example Swing
or AWT for Java). In this project interac-
tors are coming from HMTL Forms (textBox,
comboBox, popup menu, button, checkBox, ra-
dioButton) and HTML tags. We also had to
build more complex interactors by a combina-
tion of HTML Forms and HTML Tags.
4.5 User Model
Previous research has shown the difficulty to de-
scribe the cognitive aspects of user behavior.
Therefore, we will simplify by defining differ-
ent user classes (tourist, student, business man,
etc.). Each class will be consisting of a set of de-
sign preferences. Depending on the target class,
the generator will use appropriate design rules.
The model is not yet implemented; it is im-
plicitly used in the data & task models. We
defined different views of data according to the
target:
• all data is rendered for the workstation
editing interface for lexicographers,
• only headword and grammatical class are
rendered and examples are browsable on
the mobile phone interface for a ”normal”
dictionary user.
4.6 Concrete User Interface Model
This model, based on an independent user in-
terface language, describes the graphic user in-
terface, as the final device will render it. It is
target-dependent.
4.7 Final User Interface
From the CUI model, the generator produces
a final interface that will be executed by the
targeted device, and links it with the Papillon
database. In our case we produce:
• HTML code for the workstation,
Figure 4: Generated GUI
• Tiny XHTML code for AU mobile phones,
• and CGI links for the communication with
the database.
Figure 4 shows a simple example of a final
generated UI.
5 Integrating the Module in
Papillon Server
5.1 Implementation
The Papillon server is based on Enhydra, a web
server of Java dynamic objects. The data is
stored as XML objects into an SQL database:
PostgresQL.
ARTStudio tool is entirely written in Java.
For its integration into the Papillon/Enhydra
server, we created a java archive for the codes
to stay independent.
The Papillon/Enhydra server can store java
objects during a user session. When the user
connects to the Papillon server with a browser,
a session is created and the user is identified
thanks to a cookie. When the user opens the
dictionary entry editor, the java objects needed
for the editor will be kept until the end of the
session.
5.2 A Working Session
When the editor is launched, the models cor-
responding to the entry structure are loaded.
Then, if an entry is given as a parameter (edit-
ing an existing entry), the entry template is in-
stantiated with the data contained in that entry.
If no entry is given, the template is instantiated
with an empty entry. Finally, the instantiated
models and entry templates are stored into the
session data and the result is displayed embed-
ded in an HTML form, through a Web page
(Figure 4).
Then, after a user modification (e.g. adding
an item to the examples list), the HTML form
entry
head
word
pos example example
scientifique adj
journées
scientifiques
journal
scientifique
Legend:
XML
Element
Link to a child element
Link to the element valuetextual content
Figure 5: Abstract View of an Entry
sends the data to the server via a CGI mecha-
nism. The server updates the models and tem-
plate stored in the session data and sends back
the modified result in the HTML page.
At the end of the session, the modified entry is
extracted from the session data and then stored
as a contribution in the database.
6 An Editing Example
6.1 A Dictionary Entry
Figure 5 shows an abstract view of a simple dic-
tionary entry. It is the entry ”scientifique” (sci-
entific) of a French monolingual dictionary. The
entry has been simplified on purpose. The en-
tries are stored as XML text into the database.
6.2 Entry Structure
The generation of the graphic interface is mostly
based on the dictionary microstructure. In the
Papillon project, we describe them with XML
schemata. We chose XML schemata instead of
DTDs because they allow for a more precise
description of the data structure and handled
types. For example, it is possible to describe the
textual content of an XML element as a closed
value list. In this example, the French part-of-
speech type is a closed list of ”nom”, ”verb”,
and ”adj”.
Figure 6 is an abstract view of the structure
corresponding to the previous French monolin-
gual dictionary entry.
6.3 Entry Displayed in the Editor
The dictionary entry of Figure 5 is displayed in
the HTML editor as in Figure 4. In the follow-
ing one (Figure 7), an example has been added
in the list by pushing the + button.
6.4 A More Complex Entry
In the following figure (Figure 8), we show the
entry �y� (taberu, to eat) of the Papillon
Japanese monolingual volume. The entry struc-
ture comes from the DiCo structure (Polgu`ere,
entry
head 
word
pos example
Occurences: 0 to ∞Occurence: 1Occurence: 1
Legend:
XML 
Element
Type
Link to a child element
Link to the element type
text
list:
 nom
 verbe
 adj
text
Figure 6: Structure of an Entry
Figure 7: Entry Displayed in the Editor
2000), a light simplification of the ECD by
Mel’ˇcuk & al.
Two interesting points may be highlighted.
You can note that not only the content of the
entry is in Japanese, but also the text labels
of the information. For example, the first one,
��W� (midashigo) means headword. The
interface generator is multitarget: it generates
the whole HTML content. It is then possible to
redefine the labels for each language.
The second point is the complexity of the en-
try structure. There is a list of lexical functions.
Each lexical function consists of a name and a
list of valgroups (group of values), and in turn,
eachvalgroupconsists ofalistofvalues. Finally,
each value is a textbox. The lists are nested the
one in the other one and it is possible to use the
lists + and - operators at any level.
7 Evaluation
7.1 Preamble
This paper focuses on one particular function-
ality of the Papillon platform: the generic edi-
tor. Its purpose is not to present the building of
Papillon dictionary or the progress of Papillon
Project as a whole. The evaluation will then
focus on the editor.
Figure 8: Papillon Entry Displayed in the Edi-
tor
The contribution phase on Papillon project
has not begun yet. Thus, for the moment, very
few users tested the editor. We have not yet
enough data to evaluate seriously the usability
of the interface. Then, the evaluation will be
driven on the technical aspects of the editor.
In order to evaluate the genericity and the
usability of the editor, we generated interfaces
for two other dictionary structures: the GDEF
Estonian-French dictionary and the WaDoku-
JiTen Japanese-German dictionary.
7.2 Edition of the GDEF dictionary
The GDEF project (Big Estonian-French Dic-
tionary) is managed by Antoine Chalvin from
INALCO, Paris. The dictionary microstructure
is radically different from the Papillon dictio-
nary as you will see in Figure 9 compared to
figure 8. You may notice the 6 levels of recur-
sion embedded in the entry structure.
It took about one week to write the interface
description files for the new dictionary structure
in order to generate properly a complete inter-
face for the GDEF dictionary.
7.3 Edition of the WaDokuJiTen
The WaDokuJiTen project is managed by Ul-
rich Apel, now invited researcher at NII, Tokyo.
The dictionary is originally stored in a File-
Maker database. It has more than 200,000 en-
tries. It took four days to export integrate the
dictionary in Papillon platform and to write the
Figure 9: GDEF Entry Displayed in the Editor
Figure 10: WaDokuJiTen Entry Displayed in
the Editor
files needed for the generation of the editor in-
terface. The integration was done in 4 steps:
export the dictionary from FileMaker into an
XML file, tag the implicit structure with a perl
script, write the metadata files and upload the
dictionary on the Papillon server.
The dictionary microstructure is simpler than
the previousone(seefigure 10). Ittookonlytwo
days to write the files needed for the generation
of the editor interface.
8 Conclusion
The implementation of ARTStudio and Papil-
lon plateform started separately four years ago.
The development of the HTML generation mod-
ule in ARTStudio and its integration into Papil-
lon platform took about a year from the first
specifications and the installation of a com-
plete and functional version on the Papillon
server. The collaboration between a specialist of
computational lexicography and a specialist of
the adaptability of interfaces has produced very
original and interesting work. Furthermore, the
evaluation and the feedback received from the
users is very positive. Now, we want to further
pursue this work following several paths.
First of all, only a specialist can use the ex-
isting interface for Papillon entry since it is too
complex for a beginner. We plan to generate dif-
ferent interface types adapted to the varied user
needs and competences. Thanks to the modu-
larity of the editor, we need only to describe the
tasks and instance models corresponding to the
desired interface.
For the moment, the interface generation is
not fully automatic; some of the model descrip-
tions used by the editor have to be written ”by
hand”. This is why we are working now on
automating the whole generation process and
the implementation of graphical editors allow-
ing users to post-edit or modify a generated in-
terface description.

References

Ga¨elle Calvary, Jo¨elle Coutaz, and David Thevenin. 2001.
Unifying reference framework for the development of plas-
tic user interfaces. In EHCI’01, IFIP WG2.7 (13.2) Work-
ing Conference, pages 173–192, Toronto, Canada.

Ga¨elle Calvary, Jo¨elle Coutaz, David Thevenin, Quentin Lim-
bourg, Nathalie Souchon, Laurent Bouillon, and Jean
Vanderdonckt. 2002. Plasticity of user interfaces: A re-
vised reference framework. In Proc. TAMODIA 2002,
pages 127–134, Bucharest, Romania. INFOREC Publish-
ing House.

Mathieu Mangeot. 2002. An xml markup language frame-
work for lexical databases environments: the dictionary
markup language. In International Standards of Termi-
nology and Language Resources Management, pages 37–
44, Las Palmas, Spain, May.

Igor Mel’ˇcuk, Nadia Arbatchewsky-Jumarie, L´eo Eltnisky,
Lidija Iordanskaja, Ad`ele Lessard, Suzanne Mantha, and
Alain Polgu`ere. 1984,88,92,96. DEC : Dictionnaire expli-
catif et combinatoire du fran¸cais contemporain, recherches
lexico-s´emantiques I,II,III et IV. Presses de l’universit´e de
Montr´eal, Canada.

Alain Polgu`ere. 2000. Towards a theoretically-motivated gen-
eral public dictionary of semantic derivations and col-
locations for french. In Proceeding of EURALEX’2000,
Stuttgart, pages 517–527.

Gilles S´erasset and Mathieu Mangeot. 1998. L’´edition lexi-
cographique dans un syst`eme g´en´erique de gestion de bases
lexicales multilingues. In NLP-IA, volume 1, pages 110–
116, Moncton, Canada.

Gilles S´erasset and Mathieu Mangeot. 2001. Papillon lexical
database project: Monolingual dictionaries and interlin-
gual links. In NLPRS-2001, pages 119–125, Tokyo, 27-30
November.

Gilles S´erasset. 1997. Le projet nadia-dec : vers un diction-
naire explicatif et combinatoire informatis´e ? In LTT’97,
volume 1, pages 149–160, Tunis, Septembre. Actualit´e sci-
entifique, AUPELF-UREF.

David Thevenin and Jo¨elle Coutaz. 1999. Plasticity of user
interfaces: Framework and research agenda. In Interact’99
Seventh IFIP Conference on Human-Computer Interac-
tion, volume 1, pages 110–117, Edinburgh, Scotland.
