<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1039">
<Title>Chunk-based Statistical Translation</Title>
<Section position="6" start_page="4" end_page="4" type="evalu">
<SectionTitle>4 Experiments</SectionTitle>
<Paragraph position="0">The corpus for this experiment was extracted from the Basic Travel Expression Corpus (BTEC), a collection of conversational travel phrases in Japanese and English (Takezawa et al., 2002), as shown in Table 1. The corpus was split into three parts: 152,169 sentences for training, 4,846 sentences for testing, and the remaining 10,148 sentences for parameter tuning, such as setting the termination criteria for the training iterations and tuning the decoder parameters.</Paragraph>
<Paragraph position="1">Three translation systems were tested for comparison: model4: Word-alignment-based translation model, IBM Model 4, with a beam search decoder.</Paragraph>
<Paragraph position="2">chunk3: Chunk-based translation model, with the maximum allowed chunk size limited to 3.</Paragraph>
<Paragraph position="3">chunk3+: chunk3 with example-based chunk candidate generation.</Paragraph>
<Paragraph position="4">Figure 5 shows some examples of Viterbi chunking and chunk alignment under chunk3.</Paragraph>
<Paragraph position="5">Translations were carried out on 510 sentences selected randomly from the test set and evaluated against 16 reference translations according to the following criteria.</Paragraph>
<Paragraph position="6">WER: Word-error-rate, which penalizes the edit distance against the reference translations.</Paragraph>
<Paragraph position="7">PER: Position-independent word-error-rate, which penalizes word mismatches without considering positional disfluencies.</Paragraph>
<Paragraph position="9">BLEU: BLEU score, which computes the ratio of n-grams in the translation results that are found in the reference translations (Papineni et al., 2002).</Paragraph>
<Paragraph position="10">SE: Subjective evaluation ranks ranging from A to D (A: Perfect, B: Fair, C: Acceptable, D: Nonsense), judged by native speakers.</Paragraph>
<Paragraph position="11">Table 2 summarizes the evaluation of Japanese-to-English translations, and Figure 6 presents some results produced by model4 and chunk3+.</Paragraph>
<Paragraph position="12">As Table 2 indicates, chunk3 performs better than model4 on the objective measures, although the two score almost equally in the subjective evaluation. With the help of example-based decoding, chunk3+ was evaluated as the best of the three systems.</Paragraph>
</Section>
</Paper>
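
The WER and PER criteria above are easy to state informally but fiddly to implement. The following is a minimal Python sketch of both under the paper's multi-reference setup. Whitespace tokenization, scoring against the closest reference, and the bag-of-words PER formulation are illustrative assumptions, not details taken from the paper.

from collections import Counter


def edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insertion/deletion/substitution, cost 1)."""
    dp = list(range(len(ref) + 1))  # dp[j] = distance between hyp[:i] and ref[:j]
    for i, h in enumerate(hyp, 1):
        prev, dp[0] = dp[0], i
        for j, r in enumerate(ref, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,           # delete the hypothesis word
                dp[j - 1] + 1,       # insert the reference word
                prev + (h != r),     # substitute (free if the words match)
            )
    return dp[-1]


def wer(hyp: str, refs: list[str]) -> float:
    """Word-error-rate: edit distance to the closest reference, normalized
    by that reference's length (one common multi-reference convention)."""
    h = hyp.split()
    return min(edit_distance(h, r.split()) / len(r.split()) for r in refs)


def per(hyp: str, refs: list[str]) -> float:
    """Position-independent WER: compares bags of words, ignoring order."""
    h = Counter(hyp.split())
    best = float("inf")
    for ref in refs:
        r = Counter(ref.split())
        # words unmatched even when any reordering is allowed
        mismatches = max(sum((h - r).values()), sum((r - h).values()))
        best = min(best, mismatches / sum(r.values()))
    return best


if __name__ == "__main__":
    refs = ["i have the number of my passport"]
    hyp = "i have my passport number"
    print(wer(hyp, refs))  # 4 edits / 7 reference words, ~0.571
    print(per(hyp, refs))  # 2 unmatched reference words / 7, ~0.286

The example illustrates why the two measures diverge: the hypothesis contains all the content words but in a different order, so PER is much lower than WER.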
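In the same spirit, a hedged sketch of the BLEU computation cited above (Papineni et al., 2002): clipped n-gram precision combined by a geometric mean over n = 1..4, with a brevity penalty. Using a single reference and no smoothing are simplifications for illustration; the experiments here used 16 references, in which case each n-gram count would be clipped by its maximum count over all references.

import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Single-reference BLEU: geometric mean of clipped n-gram precisions
    for n = 1..max_n, multiplied by a brevity penalty."""
    h, r = hyp.split(), ref.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
        total = sum(h_ngrams.values())
        # clip each hypothesis n-gram count by its count in the reference
        matched = sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
        if total == 0 or matched == 0:
            return 0.0  # degenerate case; real implementations smooth instead
        log_precisions.append(math.log(matched / total))
    # penalize hypotheses shorter than the reference
    brevity = min(0.0, 1.0 - len(r) / len(h))
    return math.exp(brevity + sum(log_precisions) / max_n)


if __name__ == "__main__":
    score = bleu("could you tell me the number of my passport",
                 "i have the number of my passport")
    print(score)  # ~0.45: long shared tail, mismatched opening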