<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2029"> <Title>Exploiting Variant Corpora for Machine Translation</Title> <Section position="6" start_page="115" end_page="115" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> This paper proposed using variant corpora to improve the translation quality of a multi-engine approach to machine translation. The element MT engines were used to translate the same input, and the best translation was selected according to statistical models. A test of the significance of differences between the statistical scores that judge the translation quality of a given hypothesis was used to identify the model that best fits the input sentence, and the corresponding translation hypothesis was selected as the translation output.</Paragraph> <Paragraph position="1"> The proposed method was evaluated on the CE translation task of the IWSLT 2005 workshop. The results showed that the proposed method, achieving a BLEU score of 0.5765, outperformed not only all element MT engines (gaining 3.6% in BLEU score) but also a selection method using a larger corpus obtained by merging all variant corpora (gaining 11.2% in BLEU score), owing to less ambiguity in the utilized models. In addition, the proposed method outperformed the best MT system (C-STAR data track) of the IWSLT 2005 workshop, gaining 4.8% in BLEU score.</Paragraph> <Paragraph position="2"> Further investigations should analyze the characteristics of the variant corpora in more detail and focus on the automatic identification of specific linguistic phenomena that could help measure how well an input sentence is covered by a specific model. This would allow us to select the most adequate variant beforehand, thus reducing computational costs and improving system performance. 
This would also enable us to cluster very large corpora according to specific linguistic phenomena, thus breaking down the full training corpus into consistent subsets that are easier to manage and that could produce better results.</Paragraph> </Section> </Paper>