<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1504"> <Title>Machine Translation as a testbed for multilingual analysis</Title> <Section position="5" start_page="3" end_page="3" type="metho"> <SectionTitle> 4 Examples from FE and SE </SectionTitle> <Paragraph position="0"> In this section we discuss specific examples to illustrate how results from MT evaluation help us to test and develop the analysis system.</Paragraph> <Paragraph position="1"> 4.1 FE translation: the Hansard corpus The evaluation we are discussing in this section was performed in January 2002, at the beginning of our effort on the Hansard corpus. The evaluation was performed on a corpus of 250 sentences, of which 55.6% (139 sentences) were assigned a score of 2 or lower, 30.4% (76 sentences) were assigned a score greater than 2 but not greater than 3, and 14% (35 sentences) were assigned a score greater than 3.</Paragraph> <Paragraph position="2"> Examination of French sentences receiving low-score translations led to the identification of some classes of analysis problems, such as the following: - mis-identification of vocatives - clefts not represented correctly - mis-analysis of ce qui / ce que free relatives - bad representation of complex inversion (pronoun-doubling of inverted subject) - no treatment of reflexives - fitted parses (i.e., not spanning the sentence) Most of the problematic structures are characteristic of spoken language as opposed to more formal, written styles (vocatives, clefts, direct questions), and had not been encountered in our previous work, which had involved mostly translation of technical manuals. Other problems (free relatives, reflexives) are analysis issues that we had not yet addressed. Fitted parses are parses that do not span the whole sentence, but are pieced together by the parser from partial parses; fitted parses usually result in poor translations.</Paragraph> <Paragraph position="3"> Examples of translations together with their score are given in Table I. The source sentences are the French sentences, the reference sentence is the human translation to which the translation is compared by the evaluators, and the translation is the output of MSR-MT. Each of the three categories considered above is illustrated by an example.</Paragraph> <Paragraph position="4"> Sentence (2) (with a score of 1.5) is a direct question with complex inversion and the doubled subject typical of that construction. In the LF for (2), les ministres des finances is analyzed as a modifier, because the verb reunir already has a subject, the pronoun ils 'they'. There are a couple of additional problems with this sentence: si is analyzed as the adverb meaning 'so' instead of as the conjunction meaning 'if', and a direct question is analyzed as a complement clause; the sketch and LF analyses of this sentence are given in the Appendix.. The MSR-MT translation of this sentence has a very low score, reflecting the severity of the analysis problems.</Paragraph> <Paragraph position="5"> The two other sentences, on the other hand, do not have analysis problems: the poor translation of (3) (score 2.16) is caused by bad alignment (droit translates as right instead of law), and the translation of (4) (score 3) is not completely fluent, but this is due to an English generation problem, rather than to a French analysis problem. This last sentence is the most correct with appropriate lexical items and has the highest score of the three. 
<Paragraph> Of the 139 sentences with a score of 2 or lower, 73% were due to analysis problems and 24% to alignment problems. Most of the rest had bugs related to the learned dictionary. There were also a few cases of very free translations, where the reference translation was very far from the French sentence and our translation, based on the source sentence, was therefore penalized.</Paragraph>
<Paragraph position="6"> These figures show that, at this stage of development of our system, most of the problems in translation come from analysis. Translation can be improved by tackling the analysis problems exhibited by the lowest-scoring sentences, and, conversely, analysis issues can be discovered by looking at the sentences with the lowest translation scores.</Paragraph>
<Paragraph position="7"> The next section gives examples of issues with the SE system, which is more mature than the FE system.</Paragraph>
<Section position="1" start_page="3" end_page="3" type="sub_section">
<SectionTitle> 4.2 SE translation: Technical manuals </SectionTitle>
<Paragraph position="0"> An evaluation of the Spanish-English MT system was also performed in January 2002, after work on that system had been progressing for approximately a year and a half. The SE system was developed and tested using a corpus of sentences from Microsoft technical manuals. A set of 600 unseen sentences was used for the evaluation.</Paragraph>
<Paragraph position="1"> Of the 600 sentences, 251 (42%) received a score from 3 to 4, 186 (31%) a score greater than 2 but less than 3, and the remaining 163 (27%) a score of 2 or lower. Of these 163 lowest-scoring sentences, 50% (82 sentences) had analysis problems, and 17% (29 sentences) had fitted parses. A few of the fitted parses, 7 of the 29, resulted from faulty input, e.g., input that contained unusual characters or punctuation, typos, or sentence fragments.</Paragraph>
<Paragraph position="2"> Typical analysis problems that led to poor translations in the SE system include the following:
- incorrect analysis of arguments in relative clauses, especially those with a single argument (and a possible non-overt subject)
- failure to identify the referent of the clitic le (i.e., usted 'you') in imperative sentences in LF
- mis-analysis of Spanish reflexive or se constructions in LF
- incorrect syntactic analysis of homographs
- incorrect analysis of coordination
- mis-identification of non-overt or controlled subjects
- fitted parses
Table II contains sample sentences from the SE evaluation. Each row gives the Spanish source sentence, the reference sentence, the translation produced by the MT system, and the score assigned to the translation by the human evaluators.</Paragraph>
<Paragraph position="3">
Table I. Sample sentences from the FE evaluation.
(2) Source: Si tel n'était pas le cas, pourquoi les ministres des Finances des provinces se seraient-ils réunis hier pour essayer de s'entendre sur un programme commun à soumettre au ministre des Finances?
Reference: If that were not the case, why were the finance ministers of the provinces coalescing yesterday to try and come up with a joint program to bring to the finance minister?
Translation: Not was the case that they have the ministers met why yesterday Finances of the provinces trying to agree on a common program to bring Finances for the minister this so like?
Score: 1.5
(3) Source: Nous ne pouvons pas appuyer cette motion après que le Bloc québécois ait refusé de reconnaître la primauté du droit et de la démocratie pour tous.
Reference: We cannot support this motion after seeing the Bloc Quebecois refuse to recognize the rule of law and the principle of democracy for all.
Translation: We cannot support this motion after the Bloc Quebecois has refused to recognize the rule of the right and democracy for all.
Score: 2.16
(4) Source: En tant que membre de l'opposition officielle, je continuerai d'exercer des pressions sur le gouvernement pour qu'il tienne ses promesses à cet égard.
Reference: As a member of the official opposition I will continue to pressure the government to fulfil its promises in this regard.
Translation: As member of the official opposition, I will continue to exercise pressures on the government for it to keep its promises in this regard.
Score: 3
</Paragraph>
<Paragraph position="4">
Table II. Sample sentences from the SE evaluation.
(5) Source: Este procedimiento sólo es aplicable si está ejecutando una versión de idioma de Windows 2000 que no coincida con el idioma en el que desee escribir.
Reference: This procedure applies only if you are running a language version of Windows 2000 that doesn't match the language you want to type.
Translation: This procedure only applies if you are running a Windows 2000 language version that does not match the language that you want to type.
Score: 3.8
(6) Source: Repita este proceso hasta que haya eliminado todos los componentes de red desde las propiedades de Red, haga clic en Aceptar y, a continuación, haga clic en Sí cuando se le pregunte si desea reiniciar el equipo.
Reference: Repeat this process until you have deleted all of the network components from Network properties, click OK, and then click Yes when you are prompted to restart your computer.
Translation: Repeat this process until you have deleted all of the network components from the Network properties, you click OK, and you click Yes then when asking that to restart the computer is wanted for him.
(7) Source: En el siguiente ejemplo se muestra el nombre de la presentación que se está ejecutando en la ventana de presentación con diapositivas uno.
Reference: The following example displays the name of the presentation that's currently running in slide show window one.
Translation: In the following example, the display name that is being run in the slide show window is displayed I join.
</Paragraph>
<Paragraph position="19"> In the evaluation process, human evaluators compared the MT translation to the reference sentence, in the manner described in Section 4.1.</Paragraph>
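<Paragraph> The prose does not spell out how per-sentence scores are computed, but fractional values such as 2.16 and 3.8 are consistent with averaging several evaluators' ratings on the 1-4 scale. The sketch below encodes that reading, together with the triage step this section describes (examining the lowest-scoring sentences to find analysis problems). The averaging is our assumption, not something the paper states, and all names are hypothetical.
```python
def sentence_score(ratings: list[int]) -> float:
    """Assumed scoring rule: the mean of the individual evaluators' 1-4
    ratings for one translation. This is a hedged reconstruction; the
    paper only reports the resulting per-sentence scores."""
    return sum(ratings) / len(ratings)

def lowest_scoring(scored: list[tuple[str, float]], cutoff: float = 2.0) -> list[str]:
    """Return sentences at or below the cutoff, worst first: the set the
    evaluation workflow inspects for analysis problems."""
    return [sent for sent, score in sorted(scored, key=lambda p: p[1])
            if score <= cutoff]

# Example: six ratings of [2, 2, 3, 2, 2, 2] average to 2.1666..., close to
# the 2.16 reported for example (3) above.
```
</Paragraph>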
<Paragraph> Example (5), with a score of 3.8, illustrates that the human evaluators considered the translation 'a Windows 2000 language version' slightly worse than 'a language version of Windows 2000' for una versión de idioma de Windows 2000; however, the difference is too slight to be considered an analysis problem.</Paragraph>
<Paragraph position="20"> Example (6) illustrates the failure to identify usted 'you' (understood as the subject of the imperative) as the referent of the pronominal clitic le; as mentioned above, this is a common source of bad SE translations. The last example, (7), is a sentence with a fitted parse due to misanalysis of a word as its homograph: uno is analyzed as the first person singular present form of the verb unir 'join' instead of as the noun uno 'one'; the LF of this sentence is given in the Appendix.</Paragraph>
</Section>
<Section position="2" start_page="3" end_page="3" type="sub_section">
<SectionTitle> 4.3 Discussion </SectionTitle>
<Paragraph position="0"> The examples discussed in this section are typical: the sentences for which MSR-MT produces better translations tend to be the ones with fewer analysis errors, while those that are misanalyzed tend to be mistranslated.</Paragraph>
<Paragraph position="1"> In this way, evaluation of MT output serves as one way to prioritize analysis problems; that is, to decide which of the many analysis problems lead to the most serious translation failures. For example, the poor quality of the translation of (2) highlights the need for an improved analysis of complex inversion in the French grammar, which will need to be incorporated into the sketch and/or LF components. Similarly, the poor translation of (7) indicates the need to deal better with homographs in the Spanish morphological or sketch component.</Paragraph>
<Paragraph position="2"> More generally, the analysis of FE and SE translation problems has led to the lists of analysis problems given in Sections 4.1 and 4.2, respectively. Analysis problems identified in this way then become priorities for grammar/LF development.</Paragraph>
</Section>
</Section>
</Paper>