<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1627"> <Title>Efficient Search for Inversion Transduction Grammar</Title>
<Section position="7" start_page="228" end_page="230" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle>
<Paragraph position="0"> We tested the performance of our heuristics for alignment on a Chinese-English newswire corpus.</Paragraph>
<Paragraph position="1"> Probabilities for the ITG model were trained using Expectation Maximization on a corpus of 18,773 sentence pairs with a total of 276,113 Chinese words and 315,415 English words. For EM training, we limited the data to sentences of no more than 25 words in either language. Here we present timing results for finding the Viterbi alignment of longer sentences using this fixed translation model with different heuristics. We compute alignments on a total of 117 test sentences, which are broken down by length as shown in Table 1.</Paragraph>
<Paragraph position="2"> Results are presented both in terms of time and of the number of arcs added to the chart before the optimal parse is found. Full refers to exhaustive parsing, that is, building a complete chart with all n^4 arcs. Uniform refers to a best-first parsing strategy that expands the arcs with the highest inside probability at each step, but does not incorporate an estimate of the outside probability. Ibm1encn denotes our heuristic based on IBM Model 1, applied to translations from English to Chinese, while ibm1sym applies the Model 1 heuristic in both translation directions and takes the minimum. The factor by which times were decreased was found to be roughly constant across sentences of different lengths. The alignment times for the entire test set are shown in Table 2; the best heuristic is 3.9 times faster than exhaustive dynamic programming.</Paragraph>
[Figure 5: On the left, decoding times with and without the hook trick, as well as results using our heuristic (labeled A*) and with beam pruning (which no longer produces optimal results). On the right, the relationship between computation time and BLEU score as the pruning threshold is varied for both A* search and bottom-up CYK parsing.]
<Paragraph position="3"> We ran our ITG decoding experiments on the LDC 2002 MT evaluation data set for translation of Chinese newswire sentences into English. The evaluation data set has 10 human translation references for each sentence. There are a total of 371 Chinese sentences of no more than 20 words in the data set. These sentences are the test set for our different versions of ITG decoders using both a bigram language model and a trigram language model. We evaluate the translation results by comparing them against the reference translations using the BLEU metric. The word-for-word translation probabilities are from the translation model of IBM Model 4 trained on a 160-million-word English-Chinese parallel corpus using GIZA++.</Paragraph>
<Paragraph position="4"> The language model is trained on a 30-million-word English corpus. The rule probabilities for ITG are from the same training as in the alignment experiments described above.</Paragraph>
<Paragraph position="5"> We compared the BLEU scores of the A* decoder and the ITG decoder that uses beam ratio pruning at each stage of bottom-up parsing. In the case of bigram-integrated decoding, the best 2 translations for each input word are put into the bag of output words. In the case of trigram-integrated decoding, the top 5 candidate words are chosen. The A* decoder is guaranteed to find the Viterbi translation that maximizes the product of n-gram probabilities, translation probabilities (including insertions and deletions), and inversion rule probabilities by choosing the right words and the right word order subject to the ITG constraint.</Paragraph>
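To make the comparison of heuristics above concrete, the following Python sketch shows an agenda-driven (best-first) parser in which each arc is prioritized by its inside log-probability plus an optional outside estimate: zero for the uniform strategy, an IBM Model 1 bound computed from English to Chinese for ibm1encn, and the minimum of both directions for ibm1sym. The arc representation and the particular form of the Model 1 bound (each outside word scored by its best translation entry) are illustrative assumptions, not the paper's exact implementation.

# A minimal sketch of the agenda-driven search compared above (uniform vs.
# ibm1encn vs. ibm1sym).  The arc fields (i, j, u, v, inside) and the exact
# form of the Model 1 outside bound are assumptions for illustration only.
import heapq
import itertools
from math import log

def ibm1_bound(t_table, outside_words, other_side_words):
    # Log-space upper bound for the words outside the span: score each one by
    # its best IBM Model 1 translation probability t_table[w][x] (or NULL).
    total = 0.0
    for w in outside_words:
        candidates = t_table.get(w, {})
        best = max([candidates.get(x, 1e-12) for x in other_side_words]
                   + [candidates.get(None, 1e-12)])
        total += log(best)
    return total

def make_heuristic(mode, t_ec, t_ce, chinese, english):
    # Returns h(arc): 0 for 'uniform', the English-to-Chinese Model 1 bound
    # for 'ibm1encn', and the minimum of both directions for 'ibm1sym'.
    def h(arc):
        if mode == "uniform":
            return 0.0
        e_out = english[:arc.u] + english[arc.v:]
        h_ec = ibm1_bound(t_ec, e_out, chinese)
        if mode == "ibm1encn":
            return h_ec
        c_out = chinese[:arc.i] + chinese[arc.j:]
        return min(h_ec, ibm1_bound(t_ce, c_out, english))
    return h

def best_first_parse(initial_arcs, expand, is_goal, h):
    # Pop the arc with the highest inside + h score; with an admissible h the
    # first goal arc popped is the optimal (Viterbi) parse.  For decoding, the
    # inside score would also include n-gram and inversion-rule terms.
    tie = itertools.count()
    agenda = [(-(a.inside + h(a)), next(tie), a) for a in initial_arcs]
    heapq.heapify(agenda)
    while agenda:
        _, _, arc = heapq.heappop(agenda)
        if is_goal(arc):
            return arc
        for new_arc in expand(arc):
            heapq.heappush(agenda, (-(new_arc.inside + h(new_arc)), next(tie), new_arc))
    return None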
<Paragraph position="6"> Figure 5 (left) demonstrates the speedup obtained through the hook trick, the heuristic, and pruning, all based on A* search. Table 3 shows the improvement in BLEU score after applying the A* algorithm to find the optimal translation under the model. Figure 5 (right) investigates the relationship between the search effort and BLEU score for A* and bottom-up CYK parsing, both with pruning. Pruning for A* works in such a way that we never explore a low-probability hypothesis falling outside a certain beam ratio of the best hypothesis within the bucket of X[i, j, *, *], where * means any word. Table 4 shows results for trigram-integrated decoding. However, due to time constraints, we have not explored the time/performance tradeoff as we did for bigram decoding.</Paragraph>
<Paragraph position="7"> The number of combinations in the table is the average number of hyperedges explored during search, which is proportional to the total number of computation steps.</Paragraph>
</Section> </Paper>
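As a closing illustration of the beam-ratio pruning described above, the sketch below keeps, within each bucket X[i, j, *, *], only the hypotheses whose probability is within a fixed ratio of the bucket's best. The hypothesis fields (i, j, prob) are placeholders; only the ratio test itself is taken from the text.

# Sketch of beam-ratio pruning over buckets X[i, j, *, *]: hypotheses sharing
# the same source span (i, j) compete regardless of their boundary words, and
# anything below beam_ratio times the bucket's best probability is discarded.
from collections import defaultdict

def beam_prune(hypotheses, beam_ratio):
    buckets = defaultdict(list)
    for hyp in hypotheses:
        buckets[(hyp.i, hyp.j)].append(hyp)       # group into X[i, j, *, *]
    survivors = []
    for bucket in buckets.values():
        best = max(h.prob for h in bucket)        # best hypothesis in the bucket
        threshold = beam_ratio * best
        survivors.extend(h for h in bucket if h.prob >= threshold)
    return survivors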