<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2122"> <Title>Inducing Word Alignments with Bilexical Synchronous Trees</Title> <Section position="3" start_page="0" end_page="953" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A major dif culty in statistical machine translation is the trade-off between representational power and computational complexity. Real-world corpora for language pairs such as Chinese-English have complex reordering relationships that are not captured by current phrase-based MT systems, despite their state-of-the-art performance measured in competitive evaluations. Synchronous grammar formalisms that are capable of modeling such complex relationships while maintaining the context-free property in each language have been proposed for many years, (Aho and Ullman, 1972; Wu, 1997; Yamada and Knight, 2001; Melamed, 2003; Chiang, 2005), but have not been scaled to large corpora and long sentences until recently.</Paragraph> <Paragraph position="1"> In Synchronous Context Free Grammars, there are two sources of complexity, grammar branching factor and lexicalization. In this paper we focus on the second issue, constraining the grammar to the binary-branching Inversion Transduction Grammar of Wu (1997). Lexicalization seems likely to help models predict alignment patterns between languages, and has been proposed by Melamed (2003) and implemented by Alshawi et al. (2000) and Zhang and Gildea (2005). However, each piece of lexical information considered by a model multiplies the number of states of dynamic programming algorithms for inference, meaning that we must choose how to lexicalize very carefully to control complexity.</Paragraph> <Paragraph position="2"> In this paper we compare two approaches to lexicalization, both of which incorporate bilexical probabilities. One model uses bilexical probabilities across languages, while the other uses bilexical probabilities within one language. We compare results on word-level alignment, and investigate the implications of the choice of lexicalization on the speci cs of our alignment algorithms. The new model, which bilexicalizes within languages, allows us to use the hook trick (Eisner and Satta, 1999) and therefore reduces complexity. We describe the application of the hook trick to estimation with Expectation Maximization (EM). Despite the theoretical bene ts of the hook trick, it is not widely used in statistical monolingual parsers, because the savings do not exceed those obtained with simple pruning. We speculate that the advantages may be greater in an EM setting, where parameters to guide pruning are not (initially) available.</Paragraph> <Paragraph position="3"> In order to better understand the model, we analyze its performance in terms of both agreement with human-annotated alignments, and agreement with the dependencies produced by monolingual parsers. We nd that within-language bilexicalization does not improve alignment over cross-language bilexicalization, but does improve recovery of dependencies. We nd that the hook trick signi cantly speeds training, even in the presence of pruning.</Paragraph> <Paragraph position="4"> Section 2 describes the generative model. The hook trick for EM is explained in Section 3. In Section 4, we evaluate the model in terms of alignment error rate and dependency error rate. We conclude with discussions in Section 5.</Paragraph> </Section> class="xml-element"></Paper>