<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1060">
  <Title>Syntax-Based Alignment: Supervised or Unsupervised?</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> The Inversion Transduction Grammar significantly outperforms the syntactically supervised tree-to-string model of Yamada and Knight (2001). The tree-to-string and IBM models are roughly equivalent. Adding the cloning operation improves tree-to-string results by 2% precision and recall. It is particularly significant that the ITG gets higher recall than the other models, when it is the only model entirely limited to one-to-one alignments, bounding the maximum recall it can achieve.</Paragraph>
    <Paragraph position="1"> Our French-English experiments show only small differences between the various systems. Overall, performance on French-English is much better than for Chinese-English. French-English has less re-ordering overall, as shown by the percentage of productions in the viterbi ITG parses that are inverted: 14% for French-English in comparison to 23% for Chinese-English.</Paragraph>
    <Paragraph position="2"> One possible explanation for our results is parser error. While we describe our system as &amp;quot;syntacti- null the form of the annotation of the Wall Street Journal treebank on which the parser is trained, rather than parses for our parallel training corpus. In particular, the text we are parsing has a different vocabulary and style of prose from the WSJ treebank, and often the fluency of the English translations leaves something to be desired. While both corpora consist of newswire text, a typical WSJ sentence Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.</Paragraph>
    <Paragraph position="3"> contrasts dramatically with In the past when education on opposing Communists and on resisting Russia was stressed, retaking the mainland and unifying China became a slogan for the authoritarian system, which made the unification under the martial law a tool for oppressing the Taiwan people.</Paragraph>
    <Paragraph position="4"> a typical sentence from our corpus.</Paragraph>
    <Paragraph position="5"> While we did not have human-annotated gold-standard parses for our training data, we did have human annotated parses for the Chinese side of our test data, which was taken from the Penn Chinese Treebank (Xue et al., 2002). We trained a second tree-to-string model in the opposite direction, using Chinese trees and English strings. The Chinese training data was parsed with the Bikel (2002) parser, and used the Chinese Treebank parses for our test data. Results are shown in Table 3. Because the ITG is a symmetric, generative model, the ITG results in Table 3 are identical to those in Table 1. While the experiment does not show a significant improvement, it is possible that better parses for the training data might be equally important.</Paragraph>
    <Paragraph position="6"> Even when the automatic parser output is correct, the tree structure of the two languages may not correspond. Dorr (1994) categorizes sources of syntactic divergence between languages, and Fox (2002) analyzed a parallel French-English corpus, quantifying how often parse dependencies cross when projecting an English tree onto a French string. Even in this closely related language pair with generally similar word order, crossed dependencies were caused by such common occurrences as adverb modification of a verb, or the correspondence of &amp;quot;not&amp;quot; to &amp;quot;ne pas&amp;quot;. Galley et al. (2004) extract translation rules from a large parsed parallel corpus that extend in scope to tree fragments beyond a single node; we believe that adding such larger-scale operations to the translation model is likely to significantly improve the performance of syntactically supervised alignment.</Paragraph>
    <Paragraph position="7"> The syntactically supervised model has been found to outperform the IBM word-level alignment models of Brown et al. (1993) for translation by Yamada and Knight (2002). An evaluation for the alignment task, measuring agreement with human judges, also found the syntax-based model to out-perform the IBM models. However, a relatively small corpus was used to train both models (2121 Japanese-English sentence pairs), and the evaluations were performed on the same data for training, meaning that one or both models might be significantly overfitting.</Paragraph>
    <Paragraph position="8"> Zens and Ney (2003) provide a thorough analysis of alignment constraints from the perspective of decoding algorithms. They train the models of Wu  (1997) as well as Brown et al. (1993). Decoding, meaning exact computation of the highest probability translation given a foreign sentence, is not possible in polynomial time for the IBM models, and in practice decoders search through the space of hypothesis translations using a set of additional, hard alignment constraints. Zens and Ney (2003) compute the viterbi alignments for German-English and French-English sentences pairs using IBM Model 5, and then measure how many of the resulting alignments fall within the hard constraints of both Wu (1997) and Berger et al. (1996). They find higher coverage for an extended version of ITG than for the IBM decoding constraint for both language pairs, with the unmodified ITG implementation covering about the same amount of German-English data as IBM, and significantly less French-English data. These results show promise for ITG as a basis for efficient decoding, but do not address which model best aligns the original training data, as IBMderived alignments were taken as the gold standard, rather than human alignments. We believe that our results show that syntactically-motivated models are a promising general approach to training translation models as well to searching through the resulting probability space.</Paragraph>
    <Paragraph position="9"> Computational complexity is an issue for the tree-based models presented here. While training the IBM models with the GIZA++ software takes minutes, the tree-based EM takes hours. With our C implementation, one iteration of the syntactically supervised model takes 50 CPU hours, which can be parallelized across machines. Our tree-based models are estimated with complete EM, while the training procedure for the IBM models samples from a number of likely alignments when accumulating expected counts. Because not every alignment is legal with the tree-based models, the technique of sampling by choosing likely alignments according to a simpler model is not straightforward. Nonetheless, we feel that training times can be improved with the right pruning and sampling techniques, as will be necessary to train on the much larger amounts data now available, and on longer sentences.</Paragraph>
  </Section>
class="xml-element"></Paper>