<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0827">
  <Title>Improving Phrase-Based Statistical Translation by modifying phrase extraction and including several features</Title>
  <Section position="9" start_page="151" end_page="152" type="evalu">
    <SectionTitle>
5 Evaluation framework
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="151" end_page="151" type="sub_section">
      <SectionTitle>
5.1 Corpus Statistics
</SectionTitle>
      <Paragraph position="0"> Experiments were performed to study the effect of our modifications in the phrases. The training material covers the transcriptions from April 1996 to September 2004. This material has been distributed by the European Parlament. In our experiments, we have used the distribution of RWTH of Aachen under the project of TC-STAR 1. The test material was used in the first evaluation of the project in March 2005. In our case, we have used the development divided in two sets. This material corresponds to the transcriptions of the sessions from October the 21st to October the 28th. It has been distributed by ELDA2. Results are reported for Spanish-to-English translations.</Paragraph>
    </Section>
    <Section position="2" start_page="151" end_page="152" type="sub_section">
      <SectionTitle>
5.2 Experiments
</SectionTitle>
      <Paragraph position="0"> The decoder used for the presented translation system is reported in [2]. This decoder is called MARIE and it takes into account simultaneously all the 7 features functions described above. It implements a beam-search strategy.</Paragraph>
      <Paragraph position="1"> As evaluation criteria we use: the Word Error Rate (WER), the BLEU score [15] and the NIST score [3].</Paragraph>
      <Paragraph position="2"> As follows we report the results for several experiments that show the performance of: the baseline, adding the posterior probability, IBM Model 1 and IBM1[?]1, and, finally, the modification of the phrases extraction.</Paragraph>
      <Paragraph position="3"> Optimisation. Significant improvements can be obtained by tuning the parameters of the features adequately. In the complet system we have 7 parameters to tune: the relatives frecuencies P(f|e) and P(e|f), IBM Model 1 and its inverse, the word penalty, the phrase penalty and the weight of the language model. We applied the widely used algorithm SIMPLEX to optimise [9]. In Table 2 (line 5th), we see the final results.</Paragraph>
      <Paragraph position="4"> Baseline. We report the results of the baseline.</Paragraph>
      <Paragraph position="5"> We use the union alignment and we extract the BP of length 3. As default language model feature, we use the standard trigram with smoothing Kneser-Ney and interpolation. Also we tune the parameters (only two parameters) with the SIMPLEX algorithm (see Table 2).</Paragraph>
      <Paragraph position="6"> Posterior probability. Table 2 shows the effect of using the posterior probability: P(e|f). We use all the features but the P(e|f) and we optimise the parameters. We see the results without this feature decrease around 1.1 points both in BLEU and WER (see line 2rd and 5th in Table 2).</Paragraph>
      <Paragraph position="7"> IBM Model 1. We do the same as in the paragraph above, we do not consider the IBM Model 1 and the IBM1[?]1. Under these conditions, the translation's quality decreases around 1.3 points both in BLEU and WER (see line 3th and 5th in  Modification of the Phrase Extraction. Finally, we made an experiment without modification of the phrases' length. We can see the comparison between: (1) the phrases of fixed maximum length of 3; and (2) including phrases with a maximum length of 5 which can not be generated by smaller phrases. We can see it in Table 2 (lines 4th and 5th). We observe that there is no much difference between the number of phrases, so this approach does not require more resources. However, we get slightly better scores.</Paragraph>
    </Section>
    <Section position="3" start_page="152" end_page="152" type="sub_section">
      <SectionTitle>
5.3 Shared Task
</SectionTitle>
      <Paragraph position="0"> This section explains the participation of &amp;quot;Exploiting Parallel Texts for Statistical Machine Translation&amp;quot;. We used the EuroParl data provided for this shared task [4]. A word-to-word alignment was performed in both directions as explained in section 2. The phrase-based translation system which has been considered implements a total of 7 features (already explained in section 4). Notice that the language model has been trained with the training provided in the shared task. However, the optimization in the parameters has not been repeated, and we used the parameters obtained in the sub-section above. We have obtained the results in the</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>