<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0814">
<Title>ISI's Participation in the Romanian-English Alignment Task</Title>
<Section position="4" start_page="0" end_page="91" type="intro">
<SectionTitle> 2 Baseline </SectionTitle>
<Paragraph position="0"> We trained Model 4 twice: first using Romanian as the source language, then using English as the source language. For each direction, we ran 5 iterations of Model 1, 5 iterations of the HMM model, and 3 iterations of Model 4.</Paragraph>
<Paragraph position="1"> For the distortion calculations of Model 4, we removed the dependencies on Romanian and English word classes. We applied the &quot;union&quot;, &quot;intersection&quot;, and &quot;refined&quot; symmetrization heuristics (Och and Ney, 2003) to the final alignments output by training, and also evaluated the two final alignments directly.</Paragraph>
<Paragraph position="2"> We aimed for a strong baseline. GIZA++ has many free parameters that cannot be estimated by Maximum Likelihood training. Rather than using the defaults, we used settings that produce good AER results on French/English bitext. We also optimized p0 on the 2003 test set (using AER) rather than setting it by likelihood training. Turning off the extensions to GIZA++ and training p0 as in (Brown et al., 1993) produces a substantial increase in AER.</Paragraph>
</Section>
</Paper>