<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1002"> <Title>Going Beyond AER: An Extensive Analysis of Word Alignments and Their Impact on MT</Title> <Section position="3" start_page="0" end_page="9" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Word alignments are a by-product of statistical machine translation (MT) and play a crucial role in MT performance. In recent years, researchers have proposed several algorithms to generate word alignments. However, evaluating word alignments is difficult because even humans have difficulty performing this task.</Paragraph> <Paragraph position="1"> The state-of-the-art evaluation metric, alignment error rate (AER), attempts to balance the precision and recall scores at the level of alignment links (Och and Ney, 2000). Other metrics assess the impact of alignments externally, e.g., different alignments are tested by comparing the corresponding MT outputs using automated evaluation metrics (e.g., BLEU (Papineni et al., 2002) or METEOR (Banerjee and Lavie, 2005)).</Paragraph> <Paragraph position="2"> However, these studies showed that AER and BLEU do not correlate well (Callison-Burch et al., 2004; Goutte et al., 2004; Ittycheriah and Roukos, 2005). Despite significant AER improvements achieved by several researchers, the improvements in BLEU scores are insignificant or, at best, small. This paper demonstrates the difficulty in assessing whether alignment quality makes a difference in MT performance. We describe the impact of certain alignment characteristics on MT performance but also identify several alignment-related factors that impact MT performance regardless of the quality of the initial alignments. 
In so doing, we begin to answer long-standing questions about the value of alignment in the context of MT.</Paragraph> <Paragraph position="3"> We first evaluate five different word alignments intrinsically, using: (1) community-standard metrics: precision, recall, and AER; and (2) a new measure called consistent phrase error rate (CPER). Next, we observe the impact of different alignments on MT performance. We present BLEU scores on a phrase-based MT system, Pharaoh (Koehn, 2004), using five different alignments to extract phrases. We investigate the impact of different settings for phrase extraction, lexical weighting, maximum phrase length, and training data. Finally, we present a quantitative analysis of which phrases are chosen during the actual decoding process and show how the distribution of the phrases differs from one alignment to another.</Paragraph> <Paragraph position="4"> Our experiments show that precision-oriented alignments yield better phrases for MT than recall-oriented alignments. Specifically, they cover a higher percentage of our test sets and result in fewer untranslated words and selection of longer phrases during decoding.</Paragraph> <Paragraph position="5"> The next section describes work related to our alignment evaluation approach. Following this, we outline different intrinsic evaluation measures of alignment and propose a new measure to evaluate word alignments within a phrase-based MT framework. We then present several experiments to measure the impact of different word alignments on a phrase-based MT system, and investigate how different alignments change phrase selection in the same MT system.</Paragraph> </Section> </Paper>