<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1095"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 755-762, Vancouver, October 2005. c(c)2005 Association for Computational Linguistics Translating with non-contiguous phrases</Title> <Section position="7" start_page="758" end_page="760" type="evalu"> <SectionTitle> 6 Evaluation </SectionTitle> <Paragraph position="0"> We have conducted a number of experiments to evaluate the potential of our approach. We were particularly interested in assessing the impact of non-contiguous bi-phrases on translation quality, as well as comparing the different bi-phrase library contruction strategies evoked in Section 2.1.</Paragraph> <Section position="1" start_page="758" end_page="759" type="sub_section"> <SectionTitle> 3It can be seen that, as the set of possible translations for </SectionTitle> <Paragraph position="0"> S stabilizes, we eventually reach a point where the procedure converges to a maximum. In practice, however, we can usually stop much earlier.</Paragraph> </Section> <Section position="2" start_page="759" end_page="759" type="sub_section"> <SectionTitle> 6.1 Experimental Setting </SectionTitle> <Paragraph position="0"> All our experiments focused exclusively on French to English translation, and were conducted using the Aligned Hansards of the 36th Parliament of Canada, provided by the Natural Language Group of the USC Information Sciences Institute, and edited by Ulrich Germann. From this data, we extracted three distinct subcorpora, which we refer to as the bi-phrase-building set, the training set and the test set. These were extracted from the so-called training, test-1 and test-2 portions of the Aligned Hansard, respectively. Because of efficiency issues, we limited ourselves to source-language sentences of 30 words or less. More details on the evaluation data is presented in Table 14.</Paragraph> </Section> <Section position="3" start_page="759" end_page="759" type="sub_section"> <SectionTitle> 6.2 Bi-phrase Libraries </SectionTitle> <Paragraph position="0"> From the bi-phrase-building set, we built a number of libraries. A first family of libraries was based on a word alignment &quot;A&quot;, produced using the Refined method described in (Och and Ney, 2003) (combination of two IBM-Viterbi alignments): we call these the A libraries. A second family of libraries was built using alignments &quot;B&quot; produced with the method in (Goutte et al., 2004): these are the B libraries. The most notable difference between these two alignments is that B contains &quot;native&quot; non-contiguous bi-phrases, while A doesn't.</Paragraph> <Paragraph position="1"> Some libraries were built by simply extracting the cepts from the alignments of the bi-phrase-building corpus: these are the A1 and B1 libraries, and variants. Other libraries were obtained by combining cepts that co-occur within the same pair of sentences, to produce &quot;composite&quot; bi-phrases. For instance, the A2 libraries contain combinations of 1 or 2 cepts from alignment A; B3 contains combinations of 1, 2 or 3 cepts, etc.</Paragraph> <Paragraph position="2"> Some libraries were built using a &quot;gap-size&quot; filter. For instance library A2-g3 contains those bi-phrases obtained by combining 1 or 2 cepts from alignment A, and in which neither the source nor the target phrase contains more than 3 gaps. 
Finally, all libraries were subjected to the same two filtering procedures: the first excludes all bi-phrases that occur only once in the training corpus; the second, for any given source-language phrase, retains only the 20 most frequent target-language equivalents. While the first of these filters typically eliminates a large number of entries, the second affects only the most frequent source phrases, as most phrases have fewer than 20 translations.

6.3 Experiments

The parameters of the model were optimized independently for each bi-phrase library. In all cases, we performed only 2 iterations of the training procedure, then measured the performance of the system on the test set in terms of the NIST and BLEU scores against one reference translation. As a point of comparison, we also trained an IBM-4 translation model with the GIZA++ toolkit (Och and Ney, 2000), using the combined bi-phrase-building and training sets, and translated the test set using the ReWrite decoder (Germann et al., 2001) [5].

[5] Both ReWrite and our own system relied on a trigram language model trained on the English half of the bi-phrase-building set.
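The scoring setup can be sketched as follows, using NLTK's BLEU and NIST implementations as stand-ins for whatever scoring scripts were actually used (the paper does not say); whitespace tokenization is likewise an assumption.

    from nltk.translate.bleu_score import corpus_bleu
    from nltk.translate.nist_score import corpus_nist

    def score(hypotheses, references):
        """Corpus-level BLEU and NIST against a single reference
        translation per segment, as in the paper's evaluation."""
        hyps = [h.split() for h in hypotheses]
        # One reference per segment, hence the inner singleton lists.
        refs = [[r.split()] for r in references]
        return {
            "BLEU": corpus_bleu(refs, hyps),
            "NIST": corpus_nist(refs, hyps, n=5),
        }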
Table 2 describes the various libraries that were used for our experiments, and the results obtained for each.

The top part of the table presents the results for the A libraries. As can be seen, library A1 achieves approximately the same score as the baseline system; this is expected, since this library is essentially made up of one-to-one alignments computed using IBM-4 translation models. Adding contiguous bi-phrases obtained by combining pairs of alignments does gain us some mileage (+0.1 NIST) [6]. Again, this is consistent with results observed with other systems (Tillmann and Xia, 2003). However, the addition of non-contiguous bi-phrases (A2-g3) does not seem to help.

[6] While the differences in scores in these and other experiments are relatively small, we believe them to be significant, as they have been confirmed systematically in other experiments and, in our experience, by visual inspection of the translations.

The middle part of Table 2 presents analogous results for the corresponding B libraries, plus the B1-g0 library, which contains only those cepts from the B alignment that are contiguous. Interestingly, in the experiments reported in (Goutte et al., 2004), alignment method B did not compare favorably to A under the widely used Alignment Error Rate (AER) metric. Yet the B1-g0 library performs better than the analogous A1 library on the translation task. This suggests that AER may not be an appropriate metric for measuring the potential of an alignment for phrase-based translation.

Adding non-contiguous bi-phrases allows another small gain. Again, this is interesting, as it suggests that "native" non-contiguous bi-phrases, i.e. those obtained directly as cepts in the B alignment, are indeed useful for the translation task.

Surprisingly, however, combining cepts from the B alignment to produce contiguous bi-phrases (B2-g0) does not turn out to be fruitful. Why this is so is not obvious, and more experiments would certainly be required to establish whether this tendency persists with larger combinations (B3-g0, B4-g0, ...). Composite non-contiguous bi-phrases produced with the B alignments (B2-g3) seem to bring improvements over "basic" bi-phrases (B1), but it is not clear whether these are significant.

Visual examination of the B1 library reveals that many non-contiguous bi-phrases contain long-spanning phrases (i.e. phrases containing long sequences of gaps). To verify whether or not these were really useful, we tested a series of B1 libraries with different gap-size filters. It must be noted that, because of the final histogram filtering we apply to libraries (retaining only the 20 most frequent translations of any source phrase), library B1-g1 is not a strict subset of B1-g2. Therefore, filtering on gap size usually represents a trade-off between more frequent long-spanning bi-phrases and less frequent short-spanning ones.

The results of these experiments appear in the lower part of Table 2. While the differences in score are small, it seems that concentrating on bi-phrases with 3 gaps or less affords the best compromise.

For small libraries such as those under consideration here, this sort of filtering may not be very important. However, for higher-order libraries (B2, B3, etc.) it becomes crucial, because it makes it possible to control the exponential growth of the libraries.
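To illustrate why the gap-size and histogram filters interact, here is a hedged sketch of the two filtering procedures from Section 6.2 applied after a gap-size cutoff; the function name and the data layout are invented for illustration.

    from collections import Counter

    def filter_library(biphrase_counts, max_gaps, top_k=20):
        """Drop singleton bi-phrases, then keep the `top_k` most
        frequent target phrases per source phrase, after a gap-size
        cutoff. `biphrase_counts` maps
        (src_phrase, tgt_phrase, src_gaps, tgt_gaps) tuples to corpus
        counts -- an assumed representation, for illustration only."""
        by_source = {}
        for (src, tgt, g_src, g_tgt), n in biphrase_counts.items():
            if n < 2:                         # filter 1: singletons out
                continue
            if max(g_src, g_tgt) > max_gaps:  # gap-size cutoff (g1, g2, ...)
                continue
            by_source.setdefault(src, Counter())[tgt] = n
        # filter 2: histogram pruning per source phrase
        return {src: dict(cnt.most_common(top_k))
                for src, cnt in by_source.items()}

Because the histogram pruning runs after the gap-size cutoff, raising max_gaps can push short-spanning entries out of a source phrase's top-20 list; this is exactly why B1-g1 is not a strict subset of B1-g2.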