<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1060"> <Title>Syntax-Based Alignment: Supervised or Unsupervised?</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Statistical approaches to machine translation, pioneered by Brown et al. (1990), estimate parameters for a probabilistic model of word-to-word correspondences and word reorderings directly from large corpora of parallel bilingual text. In recent years, a number of syntactically motivated approaches to statistical machine translation have been proposed. These approaches assign a parallel tree structure to the two sides of each sentence pair, and model the translation process with reordering operations defined on the tree structure. The tree-based approach allows us to represent the fact that syntactic constituents tend to move as a unit, as well as systematic differences in word order in the grammars of the two languages. Furthermore, the tree structure allows us to make probabilistic independence assumptions that result in polynomial-time algorithms for estimating a translation model from parallel training data, and for finding the highest-probability translation given a new sentence.</Paragraph> <Paragraph position="1"> Wu (1997) modeled the reordering process with binary-branching trees, where each production could be either in the same or in reverse order going from source to target language. The trees of Wu's Inversion Transduction Grammar were derived by synchronously parsing a parallel corpus, using a grammar with lexical translation probabilities at the leaves and a simple grammar with a single nonterminal providing the tree structure. While this grammar did not represent traditional syntactic categories such as verb phrases and noun phrases, it served to restrict the word-level alignments considered by the system to those allowable by reordering operations on binary trees. 
This restriction corresponds to intuitions about the alignments that could be produced by systematic differences between the two languages' grammars, and allows for a polynomial-time algorithm for finding the highest-probability alignment, and for re-estimation of the lexical translation and grammar probabilities using the Expectation Maximization algorithm.</Paragraph> <Paragraph position="2"> Yamada and Knight (2001) present an algorithm for estimating probabilistic parameters for a similar model which represents translation as a sequence of reordering operations over children of nodes in a syntactic tree, using automatic parser output for the initial tree structures. This gives the translation model more information about the structure of the source language, and further constrains the reorderings to match not just a possible bracketing as in Wu (1997), but the specific bracketing of the parse tree provided.</Paragraph> <Paragraph position="3"> In this paper, we make a direct comparison of a syntactically unsupervised alignment model, based on Wu (1997), with a syntactically supervised model, based on Yamada and Knight (2001).</Paragraph> <Paragraph position="4"> We use the term syntactically supervised to indicate that the syntactic structure in one language is given to the training procedure. It is important to note, however, that both algorithms are unsupervised in that they are not provided with any hand-aligned training data. Rather, they both use Expectation Maximization to find an alignment model by iteratively improving the likelihood assigned to unaligned parallel sentences. Our evaluation is in terms of agreement with word-level alignments created by bilingual human annotators. We describe each of the models used in more detail in the next two sections, including the clone operation of Gildea (2003). 
The reader who is familiar with these models may proceed directly to our experiments in Section 4, and further discussion in Section 5.</Paragraph> </Section> </Paper>