<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1021">
  <Title>Anoop Sarkar</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Log-linear Models for Statistical MT
</SectionTitle>
    <Paragraph position="0"> The goal is the translation of a text given in some source language into a target language. We are given a source ('Chinese') sentence $f = f_1^J = f_1, \ldots, f_j, \ldots, f_J$, which is to be translated into a target ('English') sentence $e = e_1^I = e_1, \ldots, e_i, \ldots, e_I$. Among all possible target sentences, we will choose the sentence with the highest probability:</Paragraph>
    <Paragraph position="1"> $$\hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \left\{ \Pr(e_1^I \mid f_1^J) \right\} \qquad (1)$$</Paragraph>
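    <Paragraph position="2"> As an alternative to the often used source-channel approach (Brown et al., 1993), we directly model the posterior probability $\Pr(e_1^I \mid f_1^J)$ (Och and Ney, 2002) using a log-linear combination of feature functions. In this framework, we have a set of $M$ feature functions $h_m(e_1^I, f_1^J)$, $m = 1, \ldots, M$. For each feature function, there exists a model parameter $\lambda_m$, $m = 1, \ldots, M$. The direct translation probability is given by:</Paragraph>
    <Paragraph position="3"> $$\Pr(e_1^I \mid f_1^J) = \frac{\exp\left[\sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)\right]}{\sum_{{e'}_1^{I'}} \exp\left[\sum_{m=1}^{M} \lambda_m h_m({e'}_1^{I'}, f_1^J)\right]} \qquad (2)$$</Paragraph>
    <Paragraph position="4"> We obtain the following decision rule:</Paragraph>
    <Paragraph position="5"> $$\hat{e}_1^I = \operatorname*{argmax}_{e_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J) \right\} \qquad (3)$$</Paragraph>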
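    <Paragraph> The log-linear model and its decision rule can be illustrated with a minimal Python sketch. It assumes the sum over all target sentences is approximated by a small, fixed candidate list, which is a simplification and not something stated in the text. The candidate strings, feature values, and weights are hypothetical toy data, and the function names are chosen only for this illustration.

```python
import math


def linear_scores(features, weights):
    """Compute sum_m lambda_m * h_m(e, f) for every candidate."""
    return [sum(w * h for w, h in zip(weights, feats)) for feats in features]


def loglinear_posterior(features, weights):
    """Normalize exp(score) over the candidate list (assumed finite here)."""
    scores = linear_scores(features, weights)
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def decide(candidates, features, weights):
    """Decision rule: return the candidate with the highest linear score."""
    scores = linear_scores(features, weights)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


# Hypothetical toy data: three candidates, two feature functions each.
candidates = ["the house is small", "the house is little", "small the house is"]
features = [[-2.0, -1.5], [-2.5, -1.0], [-4.0, -1.2]]
weights = [1.0, 0.5]

posterior = loglinear_posterior(features, weights)
choice = decide(candidates, features, weights)
```

    Because the normalization term is the same for every candidate, the candidate with the highest posterior is also the one with the highest unnormalized linear score, which is why the decision rule does not need the denominator.</Paragraph>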
    <Paragraph position="6"> The standard criterion for training such a log-linear model is to maximize the probability of the parallel training corpus consisting of $S$ sentence pairs $\{(f_s, e_s) : s = 1, \ldots, S\}$. However, this does not guarantee optimal performance on the metric of translation quality by which our system will ultimately be evaluated. For this reason, we optimize the parameters directly against the BLEU metric on held-out data. This is a more difficult optimization problem, as the search space is no longer convex.</Paragraph>
    <Paragraph position="7"> [Figure caption fragment, beginning lost in extraction] ... its English translation into alignment templates.</Paragraph>
    <Paragraph position="8"> However, certain properties of the BLEU metric can be exploited to speed up search, as described in detail by Och (2003). We use this method of optimizing feature weights throughout this paper.</Paragraph>
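    <Paragraph> The idea of tuning feature weights directly against an evaluation metric on held-out data can be sketched as follows. This is not the method of Och (2003): it swaps in a plain exhaustive grid search, and it replaces BLEU with a hypothetical per-candidate quality value that is simply averaged over sentences. The n-best lists, feature values, quality values, and grid are all invented for illustration.

```python
import itertools


def rerank(nbest, weights):
    """Return the index of the highest-scoring candidate in one n-best list."""
    scores = [sum(w * h for w, h in zip(weights, feats)) for feats, _ in nbest]
    return max(range(len(nbest)), key=scores.__getitem__)


def corpus_quality(dev_set, weights):
    """Average the (stand-in) quality of the candidates chosen under weights."""
    picked = [nbest[rerank(nbest, weights)][1] for nbest in dev_set]
    return sum(picked) / len(picked)


def grid_search(dev_set, grid, num_features):
    """Try every weight vector on the grid and keep the best-scoring one."""
    return max(
        itertools.product(grid, repeat=num_features),
        key=lambda w: corpus_quality(dev_set, w),
    )


# Hypothetical held-out data: each candidate is (feature vector, quality).
dev_set = [
    [([-1.0, -3.0], 0.2), ([-2.0, -1.0], 0.9)],
    [([-1.5, -2.0], 0.4), ([-1.0, -2.5], 0.7)],
]
grid = [0.0, 0.5, 1.0, 1.5]

best_weights = grid_search(dev_set, grid, num_features=2)
best_quality = corpus_quality(dev_set, best_weights)
```

    The selected candidate changes only when the weights cross a point where two candidates tie, so the objective is piecewise constant in the weights. That is one way to see why gradient-based training does not apply directly and a search procedure is needed.</Paragraph>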
  </Section>
</Paper>