<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1628">
<Title>A Discriminative Model for Tree-to-Tree Translation</Title>
<Section position="8" start_page="238" end_page="239" type="evalu">
<SectionTitle>
7 Experiments
</SectionTitle>
<Paragraph position="0"> We applied the approach to translation from German to English, using the Europarl corpus (Koehn, 2005) as our training data. This corpus contains over 750,000 training sentences; from it we extracted over 441,000 training examples for the AEP model, using the method described in section 4. We reserved 35,000 of these training examples as development data for the model. We used a set of features derived from those described in section 5.2. This set was optimized on the development data through experimentation with several different feature subsets.</Paragraph>
<Paragraph position="1"> Modifiers within German clauses were translated using the phrase-based model of Koehn et al. (2003). We first generated n-best lists for each modifier. We then built a reranking model (see section 6) to choose between the elements in the n-best lists. The reranker was trained using around 800 labeled examples from a development set.</Paragraph>
<Paragraph position="2"> The test data for the experiments consisted of 2,000 sentences and was the same test set as that used by Collins et al. (2005). We used the model of Koehn et al. (2003) as a baseline for our experiments. The AEP-driven model was used to translate all test-set sentences in which every clause of the German parse tree contained at least one verb and there was no embedding of clauses; 1,335 sentences met these criteria. The remaining 665 sentences were translated with the baseline system. This set of 2,000 translations had a BLEU score of 23.96. The baseline system alone achieved a BLEU score of 25.26 on the same set of 2,000 test sentences. We also obtained judgments from two human annotators on 100 randomly drawn sentences on which the baseline and AEP-based outputs differed. For each example, the annotator viewed the reference translation together with the two systems' translations presented in a random order. Annotator 1 judged 62 translations to be equal in quality, 16 to be better under the AEP-based system, and 22 to be better under the baseline system. Annotator 2 judged 37 translations to be equal in quality, 32 to be better under the baseline system, and 31 to be better under the AEP-based system.</Paragraph>
</Section>
</Paper>