<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1110"> <Title>Frameworks, Implementation and Open Problems for the Collaborative Building of a Multilingual Lexical Database</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Many NLP systems are based on lexical data. The cost of developing such data is a major drawback of these systems.</Paragraph> <Paragraph position="1"> To cut these costs, we adopt a strategy inspired by &quot;open-source&quot; projects, allowing volunteers to collaborate in the creation of a multilingual lexical database.</Paragraph> <Paragraph position="2"> To this end, we had to specify and develop tools to manage a lexical database containing information complete and detailed enough to be usable in a wide range of applications.</Paragraph> <Paragraph position="3"> This paper presents our project and details the tools, frameworks and structures used to manage such a database. We also show some research problems that remain to be addressed in this context.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Summary </SectionTitle> <Paragraph position="0"> Linguistic knowledge remains an important component of many natural language processing (NLP) systems.
The cost of creating a dictionary is one of the major obstacles to the development of these systems.</Paragraph> <Paragraph position="1"> To reduce the cost of creating this lexical knowledge, we adopt a method inspired by &quot;open-source&quot; projects in order to create a multilingual lexical database.</Paragraph> <Paragraph position="2"> To this end, we specified and developed tools for managing a lexical database containing information complete and detailed enough to be used in many different applications.</Paragraph> <Paragraph position="3"> This article presents our project and details the tools, frameworks and structures used to manage this database.</Paragraph> <Paragraph position="4"> We also show some open research problems that we need to address in this context.</Paragraph> <Paragraph position="5"> Introduction Many NLP systems are based on lexical data. The cost of developing such data is a major drawback of these systems. Furthermore, existing lexical data have generally been developed for a specific purpose and cannot easily be reused in other applications.</Paragraph> <Paragraph position="6"> The Papillon project applies tools and methods to develop multipurpose, multilingual lexical data collaboratively on the Internet. This data is complete and detailed enough to eventually be used either by NLP systems (MT engines, for example) or by human users (language learners, translators...).</Paragraph> <Paragraph position="7"> After presenting the motivations of the Papillon project, we will show how existing data is managed. We will then describe the structure of the Papillon dictionary and the tools used to allow contributions from Internet volunteers.</Paragraph> </Section> </Section> </Paper>