<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1904"> <Title>Evaluation and Improvement of Cross-Lingual Question Answering Strategies</Title> <Section position="2" start_page="0" end_page="24" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> When a question is asked in a certain language on the Web, it can be interesting to look for the answer to the question in documents written in other languages in order to increase the number of documents returned. The CLEF evaluation campaign for cross-language question answering systems addresses this issue by encouraging the development of such systems.</Paragraph> <Paragraph position="1"> The objective of question answering systems is to return precise answers to natural-language questions, instead of the list of documents usually returned by a search engine. The opening to multilingualism of question answering systems raises issues both for the Information Retrieval and the Information Extraction points of view.</Paragraph> <Paragraph position="2"> This article presents a cross-language question answering system able to treat questions and documents either in French or in English. Two different strategies for shifting language are evaluated, and several possibilities of evolution are presented.</Paragraph> <Paragraph position="3"> 2 Presentation of our question answering system Our bilingual question answering system has participated in the CLEF 2005 evaluation campaign 1. The CLEF QA task aims at evaluating different question answering systems on a given set of questions, and a given corpus of documents, the questions and the documents being either in the same language (except English) or in two differents languages. Last year, our system participated in the French to English task, for which the questions are in French and the documents to search in English.</Paragraph> <Paragraph position="4"> This system is composed of several modules that are presented Figure 1. The first module analyses the questions, and tries to detect a few of their characteristics, that will enable us to find the answers in the documents. Then the collection is processed thanks to MG search engine 2. The documents returned are reindexed according to the presence of the question terms, and more precisely to the number and type of these terms ; next, a module recognizes the named entities, and the sentences from the documents are weighted according to the information on the question. Finally, different processes are applied depending on the expected answer type, in order to extract answers from the sentences.</Paragraph> <Paragraph position="5"> FIG. 1 - Architecture of our cross-language question answering system question translation and term-by-term translation. These approaches have been implemented and evaluated by many systems in the CLEF evaluations, which gives a wide state-of-the-art of this domain and of the possible cross-language strategies. null The first approach consists in translating the whole question into the target language, and then processing the question analysis in this target language. This approach is the most widely used, and has for example been chosen by the following systems : (Perret, 2004), (Jijkoun et al., 2004), (Neumann and Sacaleanu, 2005), (de Pablo-S'anchez et al., 2005), (Tanev et al., 2005). Among these systems, several have measured the performance loss between their monolingual and their bilingual systems. 
Among these systems, several have measured the performance loss between their monolingual and their bilingual systems. Thus, the English-French version of (Perret, 2004) shows an 11% performance loss (in absolute terms), dropping from 24.5% to 13.5% of correct answers. The English-Dutch version of the system of (Jijkoun et al., 2004) loses approximately 10% of correct answers, dropping from 45.5% to 35%. As for (de Pablo-Sánchez et al., 2005), they lose 6% of correct answers between their Spanish monolingual system and their English-Spanish bilingual system. (Hartrumpf, 2005) also conducted an experiment in which the questions were translated from English into German, and reports a performance drop of about 50%.</Paragraph>
<Paragraph position="6"> For their cross-language system, (Neumann and Sacaleanu, 2004) chose to use several machine translation tools and to gather the different translations into a &quot;bag of words&quot; that is used to expand queries. Synonyms are also added to the &quot;bag of words&quot;, and EuroWordNet (a multilingual database with wordnets for several European languages) is used to disambiguate. They lose rather few correct answers between their German monolingual system and their German-English bilingual system, which obtain respectively 25% and 23.5% of correct answers.</Paragraph>
<Paragraph position="7"> Translating the question raises two main problems: syntactically incorrect questions may be produced, and translation ambiguities may be resolved incorrectly. Moreover, unknown words, such as some proper names, are either not translated or translated incorrectly. We will later describe several possibilities for dealing with these problems, as well as our own solution.</Paragraph>
<Paragraph position="8"> Other systems, such as (Sutcliffe et al., 2005) or (Tanev et al., 2004), use a term-by-term translation. In this approach, the question is analyzed in the source language, and the information returned by the question analysis is then translated into the target language. (Tanev et al., 2004), who participated in the Bulgarian-English and Italian-English tasks in 2004, translate the question keywords by using bilingual dictionaries and MultiWordNet.</Paragraph>
<Paragraph position="9"> In order to limit the noise stemming from the different translations and to obtain better cohesion, they validate the translations in two large corpora, AQUAINT and TIPSTER. In 2004, this system obtained 22.5% of correct answers in the bilingual task and 28% in the monolingual task.</Paragraph>
<Paragraph position="10"> (Sutcliffe et al., 2005) combine two translation tools and a dictionary to translate phrases. Finally, (Laurent et al., 2005) also translate words or idioms, using English as a pivot language.</Paragraph>
<Paragraph position="11"> This system obtains 64% of correct answers for the French monolingual task, and 39.5% for the English-French bilingual task.</Paragraph>
</Section> </Paper>
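A minimal sketch of the term-by-term translation strategy described above: question keywords are translated with a bilingual dictionary, and candidate translations are validated against a large target-language corpus, in the spirit of (Tanev et al., 2004) with AQUAINT and TIPSTER. The dictionary and corpus_frequency interfaces, and all names below, are hypothetical placeholders rather than an actual system's API.
    # Illustrative sketch only: translate each question keyword term by term,
    # keeping for each one the dictionary candidate best attested in a large
    # target-language corpus; unknown words (e.g. proper names) are kept as is.
    def translate_keywords(keywords, dictionary, corpus_frequency):
        translated = []
        for term in keywords:
            candidates = dictionary.get(term, [])
            if not candidates:
                translated.append(term)   # no dictionary entry: keep the source term
            else:
                translated.append(max(candidates, key=corpus_frequency))
        return translated

    # Toy usage (illustrative data only):
    lexicon = {"avion": ["plane", "aircraft"], "vol": ["flight", "theft"]}
    freq = {"plane": 120, "aircraft": 80, "flight": 300, "theft": 40}
    print(translate_keywords(["avion", "vol", "Paris"], lexicon,
                             lambda t: freq.get(t, 0)))
    # prints ['plane', 'flight', 'Paris']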