<?xml version="1.0" standalone="yes"?> <Paper uid="J03-1002"> <Title>(c) 2003 Association for Computational Linguistics A Systematic Comparison of Various Statistical Alignment Models</Title> <Section position="4" start_page="22" end_page="23" type="intro"> <SectionTitle> 2. Review of Alignment Models </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="22" end_page="23" type="sub_section"> <SectionTitle> 2.1 General Approaches </SectionTitle> <Paragraph position="0"> We distinguish between two general approaches to computing word alignments: statistical alignment models and heuristic models. In the following, we describe both types of models and compare them from a theoretical viewpoint.</Paragraph> <Paragraph position="1"> The notational convention we employ is as follows. We use the symbol Pr(·) to denote general probability distributions with (almost) no specific assumptions. In contrast, for model-based probability distributions, we use the generic symbol p(·).</Paragraph> <Paragraph position="2"> In statistical alignment models Pr(f_1^J, a_1^J | e_1^I), a "hidden" alignment a = a_1^J is introduced that describes a mapping from a source position j to a target position a_j. The relationship between the translation model and the alignment model is given by Pr(f_1^J | e_1^I) = Σ_{a_1^J} Pr(f_1^J, a_1^J | e_1^I). The alignment may include the case a_j = 0 with the "empty" word e_0 to account for source words that are not aligned with any target word. In general, the statistical model depends on a set of unknown parameters θ that is learned from training data. To express the dependence of the model on the parameter set, we use the following notation: Pr(f_1^J, a_1^J | e_1^I) = p_θ(f_1^J, a_1^J | e_1^I). The art of statistical modeling is to develop specific statistical models that capture the relevant properties of the considered problem domain. 
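The marginalization of the translation probability over hidden alignments can be made concrete with a small numeric sketch. The following toy example assumes a Model-1-style joint with a uniform alignment prior; the lexicon t(f|e) and the sentence pair are invented for illustration and are not from the paper:

```python
# Minimal sketch of the marginalization Pr(f | e) = sum over alignments a of
# Pr(f, a | e), using a toy Model-1-style joint. The lexicon t(f|e), the
# sentence pair, and the uniform alignment prior are all invented.
from itertools import product
from math import prod, isclose

e = ["NULL", "the", "house"]   # target string; position 0 is the "empty" word e_0
f = ["la", "maison"]           # source string

# hypothetical lexicon probabilities t(f_j | e_i)
t = {("la", "the"): 0.7, ("la", "house"): 0.1, ("la", "NULL"): 0.2,
     ("maison", "house"): 0.8, ("maison", "the"): 0.1, ("maison", "NULL"): 0.1}

def p_joint(f, a, e):
    # p(f, a | e) = prod_j t(f_j | e_{a_j}) / (I + 1)^J with a uniform alignment prior
    J = len(f)
    return prod(t[(f[j], e[a[j]])] for j in range(J)) / len(e) ** J

# marginal translation probability: sum the joint over all (I + 1)^J alignments
p_marginal = sum(p_joint(f, a, e) for a in product(range(len(e)), repeat=len(f)))

# Model 1's structure lets the same sum factorize: prod_j (1/(I+1)) * sum_i t(f_j | e_i)
closed = prod(sum(t[(fj, ei)] for ei in e) / len(e) for fj in f)
assert isclose(p_marginal, closed)
print(p_marginal)
```

The brute-force enumeration and the factorized form agree; for richer models (fertility, distortion) the sum no longer factorizes, which is why approximate methods become necessary.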
In our case, the statistical alignment model has to describe the relationship between a source language string and a target language string adequately.</Paragraph> <Paragraph position="3"> To train the unknown parameters θ, we are given a parallel training corpus consisting of S sentence pairs {(f_s, e_s) : s = 1, ..., S}. The parameters are determined by maximizing the likelihood on this training corpus: θ̂ = argmax_θ Π_{s=1}^S Σ_a p_θ(f_s, a | e_s). Typically, for the kinds of models we describe here, the expectation maximization (EM) algorithm (Dempster, Laird, and Rubin 1977) or some approximate EM algorithm is used to perform this maximization. To avoid a common misunderstanding, however, note that the use of the EM algorithm is not essential for the statistical approach, but only a useful tool for solving this parameter estimation problem. Although for a given sentence pair there is a large number of alignments, we can always find a best alignment: â_1^J = argmax_{a_1^J} p_θ̂(f_1^J, a_1^J | e_1^I).</Paragraph> </Section> </Section> </Paper>
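The EM training loop and the Viterbi-style best alignment described above can be sketched end to end. This is a heavily simplified illustration in the spirit of IBM Model 1 (lexicon probabilities only; no empty word, fertility, or distortion), and the three-sentence toy corpus is invented, not from the paper:

```python
# Sketch of EM training for a word lexicon t(f|e) in the spirit of IBM
# Model 1, followed by a Viterbi-style best alignment. Heavily simplified;
# the toy corpus is invented for illustration.
from collections import defaultdict

corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]

# uniform initialization of t(f|e)
f_vocab = {w for fs, _ in corpus for w in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                       # EM iterations
    count = defaultdict(float)            # expected counts c(f, e)
    total = defaultdict(float)            # expected counts c(e)
    for fs, es in corpus:
        for fw in fs:                     # E-step: posterior alignment probabilities
            z = sum(t[(fw, ew)] for ew in es)
            for ew in es:
                c = t[(fw, ew)] / z
                count[(fw, ew)] += c
                total[ew] += c
    for (fw, ew) in count:                # M-step: renormalize expected counts
        t[(fw, ew)] = count[(fw, ew)] / total[ew]

# best alignment for the first pair: a_j = argmax_i t(f_j | e_i)
fs, es = corpus[0]
best = [max(range(len(es)), key=lambda i: t[(fs[j], es[i])]) for j in range(len(fs))]
print(best)   # "das" and "haus" align to "the" and "house" respectively
```

Even on this tiny corpus the expected-count iterations disambiguate the lexicon: "buch" gravitates to "book" because "das" is already explained by "the", which mirrors how EM resolves alignment ambiguity at scale.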