<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2221"> <Title>Modeling with Structures in Statistical Machine Translation</Title> <Section position="7" start_page="1360" end_page="1362" type="evalu"> <SectionTitle> 5 Evaluation and Discussion </SectionTitle> <Paragraph position="0"> We used the Janus English/German scheduling corpus (Suhm et al., 1995) to train our phrase-based alignment model. Around 30,000 parallel sentences (400,000 words altogether for both languages) were used for training. The same data were used to train Simplified Model 2 and IBM Model 3 for comparison. A correct translation gets one credit, an okay translation gets 1/2 credit, and an incorrect one gets no credit. Since the IBM Model 3 decoder is too slow, its performance was not measured on the entire test set.</Paragraph> <Paragraph position="1"> The probability mass is more scattered in the structure-based model, reflecting the fact that English and German have different phrase orders. The word-based model, on the other hand, tends to align a target word with source words at similar positions, which results in many incorrect alignments and leaves the word translation probability t spread over many unrelated target words, as shown in the next subsection.</Paragraph> <Section position="1" start_page="1361" end_page="1361" type="sub_section"> <SectionTitle> 5.3 Model Complexity </SectionTitle> <Paragraph position="0"> A preprocessor split German compound nouns. Words that occurred only once were treated as unknown words. This resulted in a lexicon of 1372 English and 2202 German words. (Example word classes: [I he she itself], [have propose remember hate ...], [eleventh thirteenth ...], [after before around], [one two three ...].) 
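The singleton-to-unknown preprocessing step described above can be sketched as follows (a minimal illustration, not the authors' code; the `<unk>` token name and the `replace_singletons` helper are assumptions):

```python
from collections import Counter

def replace_singletons(corpus, unk="<unk>"):
    """Replace words that occur exactly once in the corpus with an
    unknown-word token, shrinking the lexicon as described above."""
    counts = Counter(w for sent in corpus for w in sent)
    return [[w if counts[w] > 1 else unk for w in sent] for sent in corpus]

corpus = [["we", "can", "meet", "on", "tuesday"],
          ["we", "can", "meet", "on", "wednesday"]]
# "tuesday" and "wednesday" occur only once each and become "<unk>";
# all other words occur twice and are kept.
cleaned = replace_singletons(corpus)
```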
The English/German lexicons were classified into 250 classes in each language, and 560 English phrases were constructed over these classes with the grammar inference algorithm described earlier.</Paragraph> <Paragraph position="1"> We limited the maximum sentence length to 20 words (15 phrases) and the maximum fertility of non-null words to 3.</Paragraph> </Section> <Section position="2" start_page="1361" end_page="1361" type="sub_section"> <SectionTitle> 5.1 Translation Accuracy </SectionTitle> <Paragraph position="0"> Table 1 shows the end-to-end translation performance. The structure-based model achieved an error reduction of around 12.5% over the word-based alignment models.</Paragraph> </Section> <Section position="3" start_page="1361" end_page="1362" type="sub_section"> <SectionTitle> 5.2 Word Order and Phrase Alignment </SectionTitle> <Paragraph position="0"> Table 2 shows the alignment distribution for the first German word/phrase in Simplified Model 2 and the structure-based model. The structure-based model has 3,081,617 free parameters, an increase of about 2% over the 3,022,373 free parameters of Simplified Model 2.</Paragraph> <Paragraph position="1"> This small increase does not cause over-fitting, as the performance on the test data suggests.</Paragraph> <Paragraph position="2"> On the other hand, the structure-based model is more accurate. This can be illustrated with the translation probability distribution of the English word 'I'. Table 3 shows the possible translations of 'I' with probability greater than 0.01. It is clear that the structure-based model &quot;focuses&quot; better on the correct translations. It is interesting to note that the German translations in Simplified Model 2 often appear at the beginning of a sentence, the position where 'I' often appears in English sentences. 
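The relative figures quoted above (an error reduction of around 12.5% in Section 5.1 and a parameter increase of about 2%) are plain relative changes; a hedged sketch (the `relative_change` helper is ours, and the error rates in the comment are illustrative, not the paper's data):

```python
def relative_change(old, new):
    """Relative change of `new` with respect to `old`."""
    return (new - old) / old

# Parameter counts quoted above: an increase of about 2%.
print(round(relative_change(3_022_373, 3_081_617), 4))  # -> 0.0196

# Illustrative error rates only: dropping from 40% to 35% error
# is a 12.5% relative error reduction.
print(round(relative_change(0.40, 0.35), 3))  # -> -0.125
```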
It is the biased word-based alignments that pull unrelated words together and increase the translation uncertainty.</Paragraph> <Paragraph position="3"> We define the average translation entropy as \sum_{i=0}^{m} P(e_i) \sum_{j=1}^{n} -t(g_j | e_i) \log t(g_j | e_i).</Paragraph> <Paragraph position="4"> The second distribution in Table 2 reflects the higher possibility of phrase reordering in translation.</Paragraph> <Paragraph position="5"> The word translation probability t is more uncertain in the word-based alignment model because the biased alignment distribution forced associations between unrelated English/German words.</Paragraph> <Paragraph position="6"> (m and n are the English and German lexicon sizes.) It is a direct measurement of word translation uncertainty. The average translation entropy is 3.01 bits per source word in Simplified Model 2, 2.68 in Model 3, and 2.50 in the structure-based model. Information-theoretically, therefore, the complexity of the word-based alignment models is higher than that of the structure-based model.</Paragraph> </Section> </Section> </Paper>
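The average translation entropy defined above can be computed as in this sketch (the distributions are toy examples of ours, not the paper's models; log base 2 gives bits per source word):

```python
import math

def avg_translation_entropy(p_e, t):
    """Average translation entropy in bits:
    sum_i P(e_i) * sum_j -t(g_j|e_i) * log2 t(g_j|e_i)."""
    total = 0.0
    for e, pe in p_e.items():
        h = -sum(p * math.log2(p) for p in t[e].values() if p > 0)
        total += pe * h
    return total

# Toy example: a "focused" word and a "scattered" word.
p_e = {"I": 0.5, "have": 0.5}
t = {"I": {"ich": 1.0},                    # entropy 0 bits
     "have": {"habe": 0.5, "haben": 0.5}}  # entropy 1 bit
print(avg_translation_entropy(p_e, t))  # -> 0.5
```

A sharper t distribution lowers this number, which is how the 2.50-bit structure-based model comes out below the 3.01-bit Simplified Model 2.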