<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1148"> <Title>Online Generic Editing of Heterogeneous Dictionary Entries in Papillon Project</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2.1 Online Edition </SectionTitle> <Paragraph position="0"> In order to build a multilingual dictionary that covers many languages, we need strong competence in each of those languages. It may be possible to find an expert with sufficient knowledge of 3 or 4 languages, but when the number reaches 10 languages (as it does now), this is almost impossible. Thus, we need contributors from all over the world.</Paragraph> <Paragraph position="1"> Furthermore, in order to avoid polluting the database, we plan a two-step integration of the contributions. When a contributor finishes a new contribution, it is stored in his/her private user space until it is revised by a specialist and integrated into the database. Every contribution therefore needs to be revised, although the revisers may not work in the same place as the initial contributors.</Paragraph> <Paragraph position="2"> Thus, the first requirement for the editor is to work online on the Web.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Heterogeneous Entry Structures </SectionTitle> <Paragraph position="0"> The Papillon platform is built for generic purposes. Thus, it can manipulate not only the Papillon dictionary but also any kind of dictionary encoded in XML (Mangeot, 2002). 
The lexical data is organized in 3 layers: * Limbo contains dictionaries in their original format and structure; * Purgatory contains dictionaries in their original structure but encoded in XML; * Paradise contains the target dictionary, in our case the Papillon dictionary.</Paragraph> <Paragraph position="1"> The Purgatory data can be reused for building the Paradise dictionary.</Paragraph> <Paragraph position="2"> We would then like to be able to edit different dictionary structures, from Paradise but also from Purgatory. Furthermore, since Papillon is a research project, entry structures are not fixed from the beginning and may evolve during the life of the project.</Paragraph> <Paragraph position="3"> Hence, the second requirement is that the editor must deal with heterogeneous and evolving entry structures.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Extra Requirements </SectionTitle> <Paragraph position="0"> The previous requirements must be fulfilled, whilst the following ones are optional.</Paragraph> <Paragraph position="1"> The first optional requirement concerns the adaptation to the user. The contributors will have various competences and will use the editor for different purposes (a specialist in speech may add the pronunciation, a linguist may enter grammatical information, a translator may add interlingual links, a reviewer will check the existing contributions, etc.).</Paragraph> <Paragraph position="2"> The second optional requirement concerns the adaptation to the user platform. 
The increasing number of smart mobile phones and PDAs makes the following scenarios realistic: adding an interlingual link with a mobile phone, adding small pieces of information with a PDA, and revising the whole entry on a workstation.</Paragraph> <Paragraph position="3"> It would then be very convenient if the editor could adapt itself both to the user and to the platform.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.4 Final Aim </SectionTitle> <Paragraph position="0"> Guided by these requirements, our final aim is to generate, as automatically as possible, online interfaces for editing dictionary entries.</Paragraph> <Paragraph position="1"> We must take into account the fact that entry structures are heterogeneous and may vary, and try to adapt these interfaces as much as possible to the different kinds of users and platforms.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Overview of Existing Editing Methods </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Local and Ad Hoc </SectionTitle> <Paragraph position="0"> The best way to provide the most comfortable editor for the users is to implement an ad hoc application like the one developed for the NADIA-DEC project: DECID (Sérasset, 1997).</Paragraph> <Paragraph position="1"> It was conceived to edit entries for the ECD (Mel'čuk et al., 1984, 88, 92, 96). The Papillon microstructure is based on a simplification of this structure. We were therefore very interested in such software. It is very convenient, for example, for editing complex lexical functions.</Paragraph> <Paragraph position="2"> But several drawbacks made it impossible to use in our project. 
First, the editor was developed ad hoc for a particular entry structure.</Paragraph> <Paragraph position="3"> If we want to change that structure, we must reimplement the corresponding parts of the editor.</Paragraph> <Paragraph position="4"> Second, the editor is platform-dependent (here written and compiled for Mac OS). The users have to work locally and cannot contribute online.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Distributed and Democratic </SectionTitle> <Paragraph position="0"> This solution, implemented for the construction of the French-UNL dictionary (Sérasset and Mangeot, 1998), is called &quot;democratic&quot; because it relies on common and widespread applications, such as Microsoft Word, that work on both Windows and Mac OS.</Paragraph> <Paragraph position="1"> The first step is to prepare pre-existing data on the server (implemented here in Macintosh Common Lisp). Then, the data is converted into RTF, using a different Word style for each piece of information (the style &quot;headword&quot; for the headword, the style &quot;pos&quot; for the part-of-speech, etc.), and exported. The clients can open the resulting RTF files locally with Word and edit the entries. Finally, the Word RTF files are reintegrated into the database via a reverse conversion program.</Paragraph> <Paragraph position="2"> This solution led to the construction of 20,000 entries with 50,000 word senses. It was considered a very convenient method; nevertheless, two important drawbacks prevented us from reusing it. The first is that, in order to convert easily from the database to RTF and vice versa, the dictionary entry structure cannot be too complex. 
Furthermore, when the user edits the entry with Word, it is very difficult to control the syntax of the entry, even if some Word macros can partially remedy this problem.</Paragraph> <Paragraph position="3"> The second is the communication between the users and the database. The Word files have to be sent to the users, for example via email.</Paragraph> <Paragraph position="4"> This inevitably introduces some delay. Furthermore, while a file is stored on a user's machine, no other user can edit its contents. It was also observed that users sometimes abandon their job and forget to send their files back to the server.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Online and HTML Forms </SectionTitle> <Paragraph position="0"> In order to work online, we should then use either HTML forms or a Java applet. The use of HTML forms is attractive at first glance, because the implementation is fast and all HTML browsers can handle HTML forms.</Paragraph> <Paragraph position="1"> On the other hand, the simplicity of the forms leads to important limitations. The only available interactors are buttons, textboxes, pop-up menus, and checkboxes.</Paragraph> <Paragraph position="2"> JavaScript offers the possibility of enriching the interactors, for example by verifying the content of a textbox. However, scripts very often raise compatibility problems, and only some browsers interpret them correctly. Thus, we will avoid them as much as possible.</Paragraph> <Paragraph position="3"> One of the major drawbacks of this solution is that we need to modify the source code of the HTML form each time we want to modify the entry structure. We also need to write as many HTML forms as there are different entry structures. 
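</Paragraph> <Paragraph position="4"> This drawback can be made concrete with a minimal sketch, given under stated assumptions and not taken from the Papillon implementation. A form is derived from a description of the entry structure instead of being written by hand, so that a change of structure does not require editing the form itself. The structure description ENTRY_STRUCTURE, the function build_form and the /edit action are hypothetical names introduced for illustration, and Python is used only for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure description: (element name, kind, closed value list).
# This tuple list is an assumption made to keep the sketch short; it is not
# the format used by the Papillon server.
ENTRY_STRUCTURE = [
    ("headword", "text", None),
    ("pos", "choice", ["nom", "verb", "adj"]),
    ("example", "text", None),
]


def build_form(structure, action="/edit"):
    """Return an HTML form (as a string) with one interactor per field."""
    form = ET.Element("form", {"method": "post", "action": action})
    for name, kind, choices in structure:
        label = ET.SubElement(form, "label")
        label.text = name
        if kind == "choice":
            # Closed value list: rendered as a pop-up menu.
            select = ET.SubElement(label, "select", {"name": name})
            for value in choices:
                option = ET.SubElement(select, "option", {"value": value})
                option.text = value
        else:
            # Free text: rendered as a textbox.
            ET.SubElement(label, "input", {"type": "text", "name": name})
    ET.SubElement(form, "input", {"type": "submit", "value": "Save"})
    # The "html" method serializes void elements such as input without an end tag.
    return ET.tostring(form, encoding="unicode", method="html")


form_html = build_form(ENTRY_STRUCTURE)
```

Adding a tuple to ENTRY_STRUCTURE adds the corresponding interactor to the generated form, while build_form itself is left unchanged. 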
</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Online and Java Applets </SectionTitle> <Paragraph position="0"> In order to remedy the limitations of HTML forms while continuing to work online, it is possible to use a Java applet executed on the client side. Theoretically, this makes it possible to develop an ad hoc editor for any complicated structure, as in the solution of Section 3.1.</Paragraph> <Paragraph position="1"> Nevertheless, the problems linked to the use of a Java applet are numerous: the client machine must have Java installed, in the same version as the one used by the applet. Furthermore, the execution takes place on the client machine, which can be problematic for machines that are not very powerful. Moreover, the use of Java applets on the Web has declined sharply, mainly because of these compatibility problems.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.5 Conclusion </SectionTitle> <Paragraph position="0"> As a result, none of these existing solutions fully meets our requirements: online edition and heterogeneous entry structures. We therefore turn to more generic approaches, such as those used in interface design, in order to build our editor. In the remainder of this paper, we detail how we used an interface generation module in the Papillon server in order to generate editing interfaces semi-automatically.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Using an Interface Generation Module </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> </SectionTitle> <Paragraph position="0"> This Papillon module has to generate graphic user interfaces for consulting and editing dictionary entries. 
We base our approach on the work done on the plasticity of user interfaces (Thevenin and Coutaz, 1999) and on the tool ARTStudio (Calvary et al., 2001). They propose frameworks and mechanisms to generate graphic user interfaces semi-automatically for different targets. Below we present the design framework and the models used.</Paragraph> <Paragraph position="1"> 4.1 Framework for the UI Generation. Our approach (Calvary et al., 2002) is based on four generation steps (Figure 1). The first is a manual design step that produces the initial models. It includes the application description, with the data, task and instance models, and the description of the context of use. The latter generally includes the platform on which the interaction takes place, the user who interacts, and the environment in which the user is. In our case we do not describe the environment, since it is too difficult to describe and not really pertinent for Papillon.</Paragraph> <Paragraph position="2"> From there, in the second step, we are able to generate the Abstract User Interface (AUI). This is a platform-independent UI. It represents the basic structure of the dialogue between a user and a computer. In the third step, we generate the Concrete User Interface (CUI) from the AUI. It is an instantiation of the AUI for a given platform. Once the interactors (widgets) and the navigation in the UI have been chosen, it is a prototype of the executable UI. The last step is the generation of the Final User Interface (FUI). This is the same as the CUI, but it can be executed.</Paragraph> <Paragraph position="3"> We will now focus on some models that describe the application.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Application Models: Data &amp; Task </SectionTitle> <Paragraph position="0"> The Data model describes the concepts that the user manipulates in any context of use. 
When considering plasticity issues, the data model should cover all usage contexts envisioned for the interactive system. By doing so, designers obtain a global, reusable reference model that can be specialized according to user needs or, more generally, to the context of use. A similar design rationale holds for task modeling. For the Papillon project, the data model corresponds to the XML Schema description of dictionary and request manipulation. The task model is the set of all tasks that will be implemented, independently of the type of user.</Paragraph> <Paragraph position="1"> It includes modification of the lexical database and visualization of dictionaries.</Paragraph> <Paragraph position="2"> As shown in Figure 2, the model of concepts will drive the choice of interactors and the structure of the interface.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Instance Model </SectionTitle> <Paragraph position="0"> It describes instances of the concepts manipulated by the user interface and the dependence graph between them. For example, there is the concept &quot;Entry&quot; and one of its instances, &quot;scientifique&quot; (cf. Figure 3).</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> its Corresponding Instance </SectionTitle> <Paragraph position="0"> This model is described at design time, before generation, and linked with the task model (a task uses a set of instances). Each instance is actually created at run-time with data coming from the Papillon database.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Platform and Interactors Models </SectionTitle> <Paragraph position="0"> A platform is described by its interaction capacities (for example, screen size, mouse or pen, keyboard, speech recognition, etc.). 
These capacities will influence the choice of interactors, the presentation layouts and the navigation in the user interface.</Paragraph> <Paragraph position="1"> Associated with the platform are the interactors (widgets) provided by the graphic toolbox of the targeted language (for example Swing or AWT for Java). In this project, the interactors come from HTML Forms (textBox, comboBox, popup menu, button, checkBox, radioButton) and HTML tags. We also had to build more complex interactors by combining HTML Forms and HTML tags.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.5 User Model </SectionTitle> <Paragraph position="0"> Previous research has shown how difficult it is to describe the cognitive aspects of user behavior.</Paragraph> <Paragraph position="1"> Therefore, we will simplify by defining different user classes (tourist, student, business man, etc.). Each class will consist of a set of design preferences. Depending on the target class, the generator will use appropriate design rules.</Paragraph> <Paragraph position="2"> The model is not yet implemented; it is implicitly used in the data &amp; task models. We defined different views of the data according to the target: * all data is rendered in the workstation editing interface for lexicographers, * only the headword and grammatical class are rendered, and examples are browsable, in the mobile phone interface for a &quot;normal&quot; dictionary user.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.6 Concrete User Interface Model </SectionTitle> <Paragraph position="0"> This model, based on an independent user interface language, describes the graphic user interface as the final device will render it. 
It is target-dependent.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.7 Final User Interface </SectionTitle> <Paragraph position="0"> From the CUI model, the generator produces a final interface that will be executed by the targeted device, and links it with the Papillon database. In our case we produce: * HTML code for the workstation, the database.</Paragraph> <Paragraph position="1"> Figure 4 shows a simple example of a final generated UI.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Integrating the Module in Papillon Server </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Implementation </SectionTitle> <Paragraph position="0"> The Papillon server is based on Enhydra, a web server of Java dynamic objects. The data is stored as XML objects in an SQL database: PostgreSQL.</Paragraph> <Paragraph position="1"> The ARTStudio tool is entirely written in Java. For its integration into the Papillon/Enhydra server, we packaged it as a Java archive so that the two code bases stay independent.</Paragraph> <Paragraph position="2"> The Papillon/Enhydra server can store Java objects during a user session. When the user connects to the Papillon server with a browser, a session is created and the user is identified by means of a cookie. 
When the user opens the dictionary entry editor, the Java objects needed for the editor are kept until the end of the session.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 A Working Session </SectionTitle> <Paragraph position="0"> When the editor is launched, the models corresponding to the entry structure are loaded.</Paragraph> <Paragraph position="1"> Then, if an entry is given as a parameter (editing an existing entry), the entry template is instantiated with the data contained in that entry.</Paragraph> <Paragraph position="2"> If no entry is given, the template is instantiated with an empty entry. Finally, the instantiated models and entry templates are stored in the session data and the result is displayed, embedded in an HTML form, in a Web page (Figure 4).</Paragraph> <Paragraph position="3"> Then, after a user modification (e.g. adding an item to the examples list), the HTML form sends the data to the server via a CGI mechanism. The server updates the models and template stored in the session data and sends back the modified result in the HTML page.</Paragraph> <Paragraph position="4"> At the end of the session, the modified entry is extracted from the session data and then stored as a contribution in the database.</Paragraph> </Section> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 An Editing Example </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.1 A Dictionary Entry </SectionTitle> <Paragraph position="0"> Figure 5 shows an abstract view of a simple dictionary entry. It is the entry &quot;scientifique&quot; (scientific) of a French monolingual dictionary. The entry has been simplified on purpose. 
The entries are stored as XML text in the database.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.2 Entry Structure </SectionTitle> <Paragraph position="0"> The generation of the graphic interface is mostly based on the dictionary microstructure. In the Papillon project, we describe microstructures with XML schemata. We chose XML schemata instead of DTDs because they allow for a more precise description of the data structure and of the types handled. For example, it is possible to describe the textual content of an XML element as a closed value list. In this example, the French part-of-speech type is a closed list of &quot;nom&quot;, &quot;verb&quot;, and &quot;adj&quot;.</Paragraph> <Paragraph position="1"> Figure 6 is an abstract view of the structure corresponding to the previous French monolingual dictionary entry.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.3 Entry Displayed in the Editor </SectionTitle> <Paragraph position="0"> The dictionary entry of Figure 5 is displayed in the HTML editor as in Figure 4. In the following figure (Figure 7), an example has been added to the list by pushing the + button.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.4 A More Complex Entry </SectionTitle> <Paragraph position="0"> In the following figure (Figure 8), we show the entry 食べる (taberu, to eat) of the Papillon Japanese monolingual volume. The entry structure comes from the DiCo structure (Polguère, 2000), a light simplification of the ECD by Mel'čuk et al.</Paragraph> <Paragraph position="1"> Two interesting points may be highlighted. The first is that not only the content of the entry is in Japanese, but also the text labels of the information. For example, the first label, 見出し語 (midashigo), means headword. The interface generator is multitarget: it generates the whole HTML content. 
It is then possible to redefine the labels for each language.</Paragraph> <Paragraph position="2"> The second point is the complexity of the entry structure. There is a list of lexical functions. Each lexical function consists of a name and a list of valgroups (groups of values), and in turn, each valgroup consists of a list of values. Finally, each value is a textbox. The lists are nested inside one another, and it is possible to use the list + and - operators at any level.</Paragraph> </Section> </Section> </Paper>