<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1902"> <Title>The Affect of Machine Translation on the Performance of Arabic- English QA System</Title> <Section position="4" start_page="9" end_page="12" type="metho"> <SectionTitle> 3 Experimental Approach </SectionTitle> <Paragraph position="0"> To run this experiment, 199 questions were randomly compiled from the TREC QA track, namely from TREC-8, TREC-9, TREC-11, TREC-2003 and TREC-2004, to be run through AnswerFinder, the results of which are discussed in section 3.1. The selected 199 English TREC questions were translated into Arabic by one of the authors (who is an Arabic speaker), and then fed into Systran to translate them into English.</Paragraph> <Paragraph position="1"> The analysis of translation is discussed in detail in section 3.2.</Paragraph> <Section position="1" start_page="9" end_page="10" type="sub_section"> <SectionTitle> 3.1 Performance of AnswerFinder </SectionTitle> <Paragraph position="0"> The 199 questions were run over AnswerFinder; divided as follows: 92 factoid questions, 51 definition questions and 56 list questions. The answers were manually assessed following an assessment scheme similar to the answer categories in iCLEF 2004: * Correct: if the answer string is valid and supported by the snippets EACL 2006 Workshop on Multilingual Question Answering - MLQA06 * Non-exact: if the answer string is missing some information, but the full answer is found in the snippets.</Paragraph> <Paragraph position="1"> * Wrong: if the answer string and the snippets are missing important information or both the answer string and the snippets are wrong compared with the answer key.</Paragraph> <Paragraph position="2"> * No answer: if the system does not return any answer at all.</Paragraph> <Paragraph position="3"> Table 1 provides an overall view, the system correctly answered 42.6% of these questions, whereas 25.8% wrongly, 23.9% no answer and 8.1% non-exactly. Table 2 illustrates Answer-Finder' abilities to answer each type of these questions separately.</Paragraph> <Paragraph position="4"> formance-monolingual run To measure the performance of Answer-Finder, recall (ratio of relevant items retrieved to all relevant items in a collection) and precision (the ratio of relevant items retrieved to all retrieved items) were calculated. Thus, recall and precision and F-measure for AnswerFinder are, 0.51 and 0.76, 0.6 respectively.</Paragraph> </Section> <Section position="2" start_page="10" end_page="10" type="sub_section"> <SectionTitle> 3.2 Systran Translation </SectionTitle> <Paragraph position="0"> Most of the errors noticed during the translation process were of the following types: wrong transliteration, wrong word sense, wrong word order, and wrong pronoun translations. Table 3 lists Systran's translation errors to provide correct transliteration 45.7%, wrong word senses (key word) 31%, wrong word order 25%, and wrong translation of pronoun 13.5%.</Paragraph> <Paragraph position="1"> Below is a discussion of Systran' translation accuracy and the problems that occurred during translation of the TREC QA track questions.</Paragraph> </Section> <Section position="3" start_page="10" end_page="11" type="sub_section"> <SectionTitle> Wrong Transliteration </SectionTitle> <Paragraph position="0"> Wrong transliteration is the most common error that encountered during translation. Transliteration is the process of replacing words in the source language with their phonetic equivalent in the target language. 
3.2 Systran Translation

Most of the errors noticed during the translation process were of the following types: wrong transliteration, wrong word sense, wrong word order and wrong pronoun translation. Table 3 gives the rate of each error type: wrong transliteration 45.7%, wrong word sense (key word) 31%, wrong word order 25% and wrong pronoun translation 13.5%. Below is a discussion of Systran's translation accuracy and the problems that occurred during translation of the TREC QA track questions.

Wrong Transliteration

Wrong transliteration was the most common error encountered during translation. Transliteration is the process of replacing words in the source language with their phonetic equivalents in the target language. Al-Onaizan and Knight (2002) state that transliterating names from Arabic into English is a non-trivial task because of the differences between the two sound and writing systems. There is also no one-to-one correspondence between Arabic sounds and English sounds: for example, English P and B are both mapped to the single Arabic letter "b", and two distinct Arabic "h" letters are both mapped to the English letter H.

Transliteration errors mainly involve proper names, arising when the MT system fails to recognize a proper name and translates it instead of transliterating it. Systran produced 91 questions (45.7%) with wrong transliterations. It translated some names literally, especially those with a descriptive meaning. Table 4 provides an example of such a case: "Aga" was wrongly transliterated, and "Khan" was translated to "betray" when it should have been transliterated. The same can be seen in Table 5, where "Hassan Rohani" was translated literally as "Spiritual goodness".
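To make the translate-versus-transliterate failure mode concrete, below is a minimal sketch in Python. It is not the authors' or Systran's pipeline; the phonetic map, name gazetteer and dictionary entry are toy stand-ins invented for illustration, with romanized Arabic in place of Arabic script.

    # Minimal sketch of the translate-vs-transliterate decision discussed
    # above. All resources here are toy examples, not Systran's.
    PHONETIC_MAP = {
        "kh": "kh",  # digraph first, so "Khan" is not split into k + h
        "b": "b", "h": "h", "a": "a", "n": "n",
        "s": "s", "r": "r", "o": "o", "i": "i",
    }

    KNOWN_NAMES = {"khan", "aga", "hassan", "rohani"}  # tiny toy gazetteer
    DICTIONARY = {"khan": "betray"}  # literal reading behind the Table 4 error


    def render(token, names=KNOWN_NAMES, dictionary=DICTIONARY):
        """Transliterate known proper names; translate everything else."""
        if token.lower() in names:
            out = []
            i = 0
            while i < len(token):
                digraph = token[i:i + 2].lower()
                if digraph in PHONETIC_MAP:
                    out.append(PHONETIC_MAP[digraph])
                    i += 2
                else:
                    out.append(PHONETIC_MAP.get(token[i].lower(), token[i]))
                    i += 1
            return "".join(out).capitalize()
        # A name missing from the gazetteer falls through to translation:
        # exactly the failure that turned "Khan" into "betray" (Table 4).
        return dictionary.get(token.lower(), token)


    print(render("Khan"))               # Khan   (transliterated correctly)
    print(render("Khan", names=set()))  # betray (the wrong-transliteration error)

A real system would need a far larger gazetteer plus a back-off for unseen names, which is precisely where names with descriptive meanings ("Hassan Rohani" as "Spiritual goodness") fall through.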
Wrong Word Sense

Wrong translation of a word can occur when a single word has different senses according to the context in which it is used. Word sense problems are more common in Arabic than in a language like English because Arabic short vowels, written as diacritics, are largely omitted in modern texts. Ali (2003) gives a good example that lets an English speaker grasp the complexity caused by dropping diacritics: suppose the vowels are dropped from an English word, leaving "sm"; the original word could have been "some", "same", "sum" or "semi".

Systran translated 63 of the 199 questions (31.2%) with at least one wrong word sense; 25% of these could have been resolved by adding diacritics. Table 6 illustrates an error resulting from Systran's failure to translate a term correctly even after diacritics had been added: the compound term for "psychology" was wrongly translated as "flag of breath". The Arabic form is a compound phrase, but Systran translated each word individually even after the diacritics were added.

[Table 6: Incorrect sense choice for the question "Who was the father of psychology?"]

Wrong Word Order

Word order errors occurred when the translated words appeared in an order that made no sense, producing grammatically ill-formed sentences that the QA system was unable to process. Systran translated 25% of the questions with wrong word order, which led to ungrammatical questions. Table 7 shows an example of wrong word order.

Wrong Pronoun

Systran frequently translated the Arabic third-person pronoun as "air" in place of "him" or "it", as shown in Table 8.

[Table 8: Pronoun error for the question "Who is Colin Powell?"]

Table 9 shows another pronoun problem: the clitic pronoun was translated as "her" instead of "it", which refers to "building" in that question.

It has been observed that Systran's translation errors exhibited clear regularities for certain questions, as might be expected from a rule-based system. As shown in Tables 2, 3, 4 and 7, the term "من" was translated as "from" (min) instead of "who" (man), a problem related to the recognition of diacritization, since without diacritics the two readings are written identically.

The evaluator also observed Systran's propensity to produce certain common wrong-sense translations that change the meaning of the questions; Table 10 lists some of these common wrong-sense translations.

[Table 10: Common wrong-sense translations, e.g. in "What are the names of the space shuttles?"]
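Because these errors are regular, they lend themselves to lightweight automatic post-editing of the MT output, an idea the conclusions return to. A minimal sketch follows, assuming a hand-collected rule table; both rules are invented examples modelled on the errors described above, not rules taken from the paper.

    import re

    # Hand-collected corrections for recurring MT errors on Arabic questions.
    # Each entry pairs a pattern over the English MT output with a fix; the
    # rules below are illustrative guesses, not rules drawn from the paper.
    POST_EDIT_RULES = [
        # "min" (from) chosen where "man" (who) was meant at the start
        # of a question: the regular error described above.
        (re.compile(r"^From\s+(is|was|are|were)\b", re.IGNORECASE), r"Who \1"),
        # The pronoun rendered as "air" instead of "he"/"it" (Table 8).
        (re.compile(r"^From\s+air\b", re.IGNORECASE), "Who is"),
    ]


    def post_edit(question):
        """Apply the correction rules, in order, to one translated question."""
        for pattern, replacement in POST_EDIT_RULES:
            question = pattern.sub(replacement, question)
        return question


    print(post_edit("From was the father of psychology?"))
    # -> Who was the father of psychology?
    print(post_edit("From air Colin Powell?"))
    # -> Who is Colin Powell?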
3.3 The Effectiveness of AnswerFinder Combined with Systran's Translation

After the 199 questions had been translated with Systran, they were passed to AnswerFinder. Figure 2 compares the system's ability to answer the original and the translated questions: AnswerFinder initially answered 42.6% of the questions correctly, but on the translated questions its accuracy in returning correct answers dropped to 10.2%. Its failure to return any answer increased by 35.5 percentage points (from 23.9% to 59.4%); in addition, non-exact answers decreased by 6.6 points, while wrong answers increased by 3.6 points (from 25.4% to 28.9%).

AnswerFinder was able to answer 23 of the 199 translated questions. Of these 23 questions, 12 had been translated correctly and 11 exhibited some translation errors. Looking closely at the 12 correctly translated questions (shown in Table 11), nine are factoid questions, one is a definition question and two are list questions. Of the other 11 questions that exhibited some translation errors (shown in Table 12), nine are factoid and two are list questions. Our explanation for Systran translating factoid questions better than definition and list questions is that they contained fewer pronouns, whereas the definition and list questions included many proper names.

In total, Systran significantly reduced AnswerFinder's ability to return correct answers, by 32.4 percentage points. Table 13 shows recall, precision and F-measure before and after translation: recall dropped from 0.51 to 0.12, precision fell from 0.76 to 0.41, and accordingly the F-measure dropped from 0.6 to 0.2. Altogether, in the multilingual retrieval task, precision and recall are 40.6% and 30%, respectively, below those of the monolingual retrieval task.

4 Conclusions

Systran was used to translate 199 TREC questions from Arabic into English, and the quality of its translations has been scrutinized throughout this paper. Several types of translation error appeared: wrong transliteration, wrong word sense, wrong word order and wrong pronoun translation. The translated questions were then fed into AnswerFinder, with a large impact on its accuracy in returning correct answers: AnswerFinder was seriously affected by the relatively poor output of Systran, and its effectiveness was degraded by 32.4 percentage points. This conclusion confirms the findings of Rosso et al. (2005), obtained with a different QA system, different test sets and a different machine translation system; they likewise concluded that translating queries from Arabic into English reduced the accuracy of their QA system by more than 30%.

We recommend using multiple MT systems to obtain a wider range of translations to choose from, since a correct translation is more likely to appear among several MT systems than from a single one. However, it is essential to note that in some cases the MT systems may all disagree in providing a correct translation, or they may agree on a wrong one.

It should also be borne in mind that some keywords are naturally more important than others, so in a question-answering setting it matters more that they be translated correctly. Other keywords are less important, and some, owing to incorrect analysis of the English question sentence by the Question Analysis module, may even degrade translation and question-answering performance.

We believe there are ways to avoid the MT errors discussed previously (wrong transliteration, wrong word sense, wrong word order and wrong pronoun translation). Below are some suggestions:

* One solution is to add pre- or post-processing steps to the question translation process that minimize the effects of mistranslation by automatically correcting regular errors with regular expressions, as in the post-editing sketch in Section 3.2.

* Another possible solution is to build an interactive MT system that offers users more than one candidate translation so that they can pick the most accurate one; we believe this would be a great help in resolving word sense problems. This option is more suitable for expert users of a language. A minimal sketch of the underlying multi-engine idea follows at the end of this section.

In this paper, we have presented the errors associated with machine translation, which indicate that the current state of MT is not reliable enough for cross-language QA. Much work has been done on machine translation for CLIR; however, the evaluation often focuses on retrieval effectiveness rather than translation correctness.
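As a closing illustration of the multiple-MT recommendation above, here is a minimal sketch of how candidate translations from several engines might be compared. The engines are plain callables, and the scoring heuristic (majority vote, then presence of a recognized question word) is an invented stand-in rather than a method proposed in the paper; as noted above, any such vote still fails when all engines agree on a wrong translation.

    from collections import Counter
    from typing import Callable, List

    QUESTION_WORDS = {"who", "what", "when", "where", "which", "how", "why"}


    def pick_translation(question_ar: str,
                         engines: List[Callable[[str], str]]) -> str:
        """Translate with every engine and return the most promising candidate."""
        candidates = [engine(question_ar) for engine in engines]
        votes = Counter(candidates)

        def score(candidate: str) -> tuple:
            words = candidate.split()
            first = words[0].lower() if words else ""
            # Prefer candidates proposed by more engines; break ties in favour
            # of ones that keep a recognizable English wh-word, since a lost
            # wh-word (e.g. "from" for "who") is fatal for downstream QA.
            return (votes[candidate], first in QUESTION_WORDS)

        return max(candidates, key=score)


    # Toy engines with invented outputs, standing in for real MT systems.
    engine_a = lambda q: "From air Colin Powell?"
    engine_b = lambda q: "Who is Colin Powell?"
    engine_c = lambda q: "Who is Colin Powell?"

    print(pick_translation("Arabic question here", [engine_a, engine_b, engine_c]))
    # -> Who is Colin Powell?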