<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0824"> <Title>RALI: SMT shared task system description</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Machine translation is now mature enough that a statistical translation system can be built automatically, with little effort, from nothing more than a parallel corpus. This is possible thanks to the dissemination of valuable software packages.</Paragraph> <Paragraph position="1"> The performance of such a system may, however, vary greatly from one language pair to another. Indeed, there is no free lunch for system developers: a black-box approach can sometimes be good enough for some applications (we can certainly accomplish translation gisting with the French-English and Spanish-English systems we developed during this exercise), but using the output of such a system for, say, quality translation is another kettle of fish (especially in our case, given the Finnish-English system we ended up with).</Paragraph> <Paragraph position="2"> We devoted two weeks to the SMT shared task, whose aim was precisely to see how well systems can perform across different language families.</Paragraph> <Paragraph position="3"> We began with a core system, described in the next section, from which we obtained baseline performance that we then tried to improve upon.</Paragraph> <Paragraph position="4"> Since the French-English and Spanish-English systems produced output that was comprehensible enough, we focused on the two languages whose translations were noticeably worse: German and Finnish. For German, we tried reordering words to mimic English word order, and we tried splitting compound words. This is described in section 4.
For the Finnish-English pair, we tried to decompose Finnish words into smaller substrings (see section 5).</Paragraph> <Paragraph position="5"> In parallel, we tried to smooth a phrase-based model (PBM) using WORDNET.</Paragraph> <Paragraph position="6"> We report on this experiment in section 3. In section 6, we describe the final configuration of the systems we used to submit translations, along with their official results as computed by the organizers. Finally, we conclude our two weeks of effort in section 7.</Paragraph> </Section> </Paper>