<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1003">
  <Title>Improved Statistical Machine Translation Using Paraphrases</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> As with many other statistical natural language processing tasks, statistical machine translation (Brown et al., 1993) produces high-quality results when ample training data is available. This is problematic for so-called &quot;low density&quot; language pairs, which do not have very large parallel corpora. For example, when words occur infrequently in a parallel corpus, parameter estimates for word-level alignments can be inaccurate, which can in turn lead to inaccurate phrase translations. Limited amounts of training data can further lead to a problem of low coverage, in that many phrases encountered at run-time are not observed in the training data and therefore their translations will not be learned.</Paragraph>
    <Paragraph position="1"> Here we address the problem of unknown phrases.</Paragraph>
    <Paragraph position="2"> Specifically, we show that upon encountering an unknown source phrase, we can substitute a paraphrase for it and then proceed using the translation of that paraphrase. We derive these paraphrases from resources that are external to the parallel corpus on which the translation model is trained, and we are able to exploit (potentially more abundant) parallel corpora from other language pairs to do so.</Paragraph>
    <Paragraph position="3"> In this paper we: * Define a method for incorporating paraphrases of unseen source phrases into the statistical machine translation process.</Paragraph>
    <Paragraph position="4"> * Show that by translating paraphrases we achieve a marked improvement in coverage and translation quality, especially in the case of unknown words, which to date have been left untranslated. * Argue that while we observe an improvement in Bleu score, this metric is particularly poorly suited to measuring the sort of improvements that we achieve.</Paragraph>
    <Paragraph position="5"> * Present an alternative methodology for targeted manual evaluation that may be useful in other research projects.</Paragraph>
  </Section>
</Paper>