<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1409"> <Title>Building a Statistical Machine Translation System from Scratch: How Much Bang for the Buck Can We Expect?</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Crises and disasters frequently attract international attention to regions of the world that have previously been largely ignored by the international community.</Paragraph> <Paragraph position="1"> While it is possible to stock up on emergency relief supplies and, for the worst case, weapons, regardless of where exactly they are eventually going to be used, this cannot be done with multilingual information processing technology. This technology will often have to be developed after the fact in a quick response to the given situation. Multilingual data resources for statistical approaches, such as parallel corpora, may not always be available.</Paragraph> <Paragraph position="2"> In the fall of 2000, we decided to put the current state of the art to the test with respect to the rapid construction of a machine translation system from scratch. Within one month, we would hire translators; translate as much text as possible; and train a statistical MT system on the data thus created. The language of choice was Tamil, which is spoken in Sri Lanka and in the southern part of India. Tamil is a head-last language with a very rich morphology and therefore quite different from English.</Paragraph> </Section> </Paper>