<?xml version="1.0" standalone="yes"?> <Paper uid="J03-3004"> <Title>© 2003 Association for Computational Linguistics wEBMT: Developing and Validating an Example-Based Machine Translation System Using the World Wide Web</Title> <Section position="5" start_page="453" end_page="455" type="concl"> <SectionTitle> 6. Conclusions and Further Work </SectionTitle> <Paragraph position="0"> We have presented an EBMT system based on the marker hypothesis that uses post hoc validation and correction via the Web.</Paragraph> <Paragraph position="1"> Over 218,000 NPs and VPs were extracted automatically from the Penn-II Treebank using just 59 of its 29,000 rule types. These phrases were then translated automatically by three on-line MT systems. These translations gave rise to a number of automatically constructed linguistic resources: (1) the original <source,target> phrasal translation pairs, (2) the marker lexicon, (3) the generalized lexicon, and (4) the word-level lexicon. (Thanks are due to one of the anonymous reviewers for pointing out that our wEBMT system, seeded with input from multiple translation systems, with a postvalidation process via the Web (amounting to an n-gram target language model), in effect forms a multiengine MT system as described by Frederking and Nirenburg (1994), Frederking et al. (1994), and Hogan and Frederking (1998).)</Paragraph> <Paragraph position="2"> When the system is confronted with new input, these knowledge sources are searched in turn for matching chunks, and the target language chunks are combined to create translation candidates.</Paragraph> <Paragraph position="3"> We presented a number of experiments that showed how the system fared when confronted with NPs and sentences. For the test set of 500 NPs, we obtained translations in 96% of cases, with 77.8% of the 500 NPs being translated correctly. 
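The turn-by-turn search of the four knowledge sources can be sketched as follows; the dictionary layout, function names, and space-joined recombination are illustrative assumptions, not the implementation described in the paper.

```python
def translate_chunks(chunks, knowledge_sources):
    """Search each knowledge source in turn for a target-language match
    per source chunk, then combine the matches into one candidate.

    `knowledge_sources` is an ordered list of dicts standing in for the
    phrasal pairs, marker lexicon, generalized lexicon, and word-level
    lexicon; this layout is an assumption for illustration only.
    """
    targets = []
    for chunk in chunks:
        match = None
        for source in knowledge_sources:
            match = source.get(chunk)
            if match is not None:
                break  # first (most specific) source wins
        if match is None:
            return None  # no translation candidate for this chunk
        targets.append(match)
    return " ".join(targets)
```

Ordering the sources from most to least specific means a phrasal match always takes precedence over a word-level insertion.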
For sentences, we obtained translations in 92% of cases, with a completely correct translation obtained 81.5% of the time. Translation quality improved both when chunks from different on-line systems were used and when the system's memories were seeded with both third-person singular and third-person plural forms. For both NPs and sentences, we obtained intelligible translations in over 96% of cases. In most cases, the &quot;best&quot; translation was ranked in the top 10 translations output by the system and was always ranked in the top 1% of translation candidates. This facilitates the task of any translator interacting with our system who needs to search for the &quot;best&quot; translation among the alternatives provided.</Paragraph> <Paragraph position="4"> We calculated the net gain of using wEBMT compared to the three on-line MT systems. In some cases, an improvement of 50% was seen when EBMT was used.</Paragraph> <Paragraph position="5"> As a consequence of the methodology chosen, we were able to perform a detailed evaluation of the strengths and weaknesses of the three Web-based systems used in our research, with Logomedia clearly outranking the other systems used. Nevertheless, adding chunks from the other two on-line MT systems improves both coverage and translation quality. In sum, therefore, the best results are obtained when chunks from all three on-line systems are used, and the system's databases are seeded with translations from these systems for both third-person singular and third-person plural versions of sentences.</Paragraph> <Paragraph position="6"> In addition, prior to the system's outputting the best-ranked translation candidate, morphological variants of certain components in the translation are searched for via the Web in order to confirm it as the final output translation or to propose a corrected alternative. 
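This post hoc validation step can be sketched as a frequency comparison over morphological variants. The `hit_count` callable below stands in for a Web search-engine hit-count query; it, and the tie-breaking policy of keeping the original candidate, are assumptions of this sketch rather than the paper's exact procedure.

```python
def validate_via_web(candidate, variants, hit_count):
    """Confirm a candidate translation or propose a corrected alternative.

    `variants` are morphological variants of the candidate (e.g. differing
    in subject-verb or determiner-noun agreement); `hit_count` is any
    callable from string to int, standing in for a Web hit-count query.
    The original candidate is kept unless a variant is strictly more
    frequent.
    """
    best, best_hits = candidate, hit_count(candidate)
    for variant in variants:
        hits = hit_count(variant)
        if hits > best_hits:
            best, best_hits = variant, hits
    return best
```

A corpus-frequency dictionary works in place of a live Web query for testing, since only the relative counts matter.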
Currently we validate our translations only with regard to subject head noun-head verb agreement and determiner-noun agreement, but we plan to extend this validation to cover more cases of boundary friction. We demonstrated that considerable improvements can be made to the translations derived by the system by submitting them to the Web for validation and correction.</Paragraph> <Paragraph position="7"> A number of issues for further work present themselves. The decision to take only those Penn-II rules occurring 1,000 or more times was completely arbitrary, and it might be useful to include some strings corresponding to the less frequently occurring structures in our database. Similarly, it would be a good idea to extend our word-level lexicon by including more entries using rules in which the right-hand side contains a single non-terminal.</Paragraph> <Paragraph position="8"> Furthermore, the quality of the output was not taken into consideration when selecting the on-line MT systems from which all our system resources are derived, so that any results obtained may be further improved by selecting a &quot;better&quot; MT system that permits batch processing.</Paragraph> <Paragraph position="9"> We could expect a significant improvement in the results obtained if we were to import the original sentences and their translations into a sentential database. Although we insert dummy subject pronouns to derive appropriate finite verb forms, we do not maintain these translation pairs as a resource for subsequent consultation and retrieval. Although the chance of finding an exact match at sentential level is very low, it will increase as more sentence pairs are added to the database, especially if we restrict the domain of applicability of (a version of) our system to a particular sublanguage area. 
However, the major improvement that can be expected is in the segmentation process: Given that verbs are not a closed class, any verb will be contained within (part of) a chunk pertaining to its subject NP. That is, although subject-verb agreement poses a considerable problem to our system given the choice of original input material, this particular instance of boundary friction will disappear if we segment our translation pairs at the sentential rather than at the phrasal level.</Paragraph> <Paragraph position="10"> In addition, we want to evaluate our system further with respect to larger data sets. Manual evaluation is costly, both in terms of time and effort required. Accordingly, in future work we plan to use automatic evaluation methodologies such as sentence error rate or word error rate. These are very harsh metrics: Consider the example in (54), extracted from the Canadian Hansards: (54) Again this was voted down by the Liberal majority == Malheureusement, encore une fois, la majorité libérale l'a rejeté.</Paragraph> <Paragraph position="11"> Automatic evaluation measures presuppose the existence of an &quot;oracle&quot; (i.e., &quot;correct&quot;) translation produced by a human, such as here. Translations derived by the MT system to be evaluated are then compared against the human translation. In the example in (54), the human has inserted malheureusement although there is no sign of unfortunately in the English source. If the perfect translation Encore une fois, la majorité libérale l'a rejeté were produced by an MT system, therefore, it would be penalized, as the human translation is always considered to be the &quot;correct&quot; translation. We have obtained a number of translation memories from two major computer companies, as well as a large amount of monolingual data from the same domain, with which we plan to test our system using automatic evaluation metrics in future work. 
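The word error rate metric mentioned above is standardly computed as the word-level edit distance between hypothesis and reference, divided by the reference length; a minimal sketch follows (a real evaluation would average this over a test corpus).

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length.

    Standard Levenshtein formulation (substitutions, insertions,
    deletions all cost 1), computed with a rolling dynamic-programming
    row over the hypothesis words.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

This sketch also illustrates the harshness discussed above: a hypothesis identical to the reference except for one omitted word such as malheureusement still incurs one deletion, so an adequate translation is penalized whenever the human reference adds material absent from the source.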
This will also enable us to test our EBMT methodology against other language pairs, which may present new challenges for the segmentation method employed.</Paragraph> <Paragraph position="12"> Finally, we plan to prioritize the lexical resources produced, so that more weight is given to translations derived from the phrasal and marker lexicons than to those derived via word insertion from the word-level lexicon and the generalized templates.</Paragraph> <Paragraph position="13"> In sum, we have demonstrated that, using a &quot;linguistics-lite&quot; approach based on the marker hypothesis, with a large number of phrases extracted automatically from a very small number of the rules in the Penn Treebank, many new reusable linguistic resources can be derived automatically and utilized in an EBMT system capable of translating new input with quite reasonable rates of success. We have demonstrated that a net gain may be achieved by using EBMT over on-line MT systems. We have also shown that the Web can be used to validate and correct candidate translations prior to their being output.</Paragraph> </Section> </Paper>