File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/06/n06-1003_abstr.xml

Size: 1,097 bytes

Last Modified: 2025-10-06 13:44:49

<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1003">
  <Title>Improved Statistical Machine Translation Using Paraphrases</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown.</Paragraph>
    <Paragraph position="1"> We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases.</Paragraph>
    <Paragraph position="2"> Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.</Paragraph>
  </Section>
</Paper>