<?xml version="1.0" standalone="yes"?> <Paper uid="W06-3103"> <Title>Morpho-syntactic Arabic Preprocessing for Arabic-to-English Statistical Machine Translation</Title> <Section position="3" start_page="15" end_page="15" type="intro"> <SectionTitle> 2 Baseline SMT System </SectionTitle> <Paragraph position="0"> In statistical machine translation, we are given a source language sentence fJ1 = f1 ...fj ...fJ, which is to be translated into a target language sentence eI1 = e1 ...ei ...eI. Among all possible target language sentences, we will choose the sentence with the highest probability:</Paragraph> <Paragraph position="2"> The posterior probability Pr(eI1|fJ1 ) is modeled directly using a log-linear combination of several models (Och and Ney, 2002): The denominator represents a normalization factor that depends only on the source sentence fJ1 . Therefore, we can omit it during the search process. As a decision rule, we obtain:</Paragraph> <Paragraph position="4"> This approach is a generalization of the source-channel approach (Brown et al., 1990). It has the advantage that additional models h(*) can be easily integrated into the overall system. The model scaling factors lM1 are trained with respect to the final translation quality measured by an error criterion (Och, 2003).</Paragraph> <Paragraph position="5"> We use a state-of-the-art phrase-based translation system including the following models: an n-gram language model, a phrase translation model and a word-based lexicon model. The latter two models are used for both directions: p(f|e) and p(e|f). Additionally, we use a word penalty and a phrase penalty. More details about the baseline system can be found in (Zens and Ney, 2004; Zens et al., 2005).</Paragraph> </Section> class="xml-element"></Paper>