File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/n06-1003_abstr.xml
Size: 1,097 bytes
Last Modified: 2025-10-06 13:44:49
<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1003"> <Title>Improved Statistical Machine Translation Using Paraphrases</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown.</Paragraph> <Paragraph position="1"> We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases.</Paragraph> <Paragraph position="2"> Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.</Paragraph> </Section> </Paper>