<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0814">
<Title>ISI's Participation in the Romanian-English Alignment Task</Title>
<Section position="4" start_page="0" end_page="91" type="intro">
<SectionTitle> 2 Baseline </SectionTitle>
<Paragraph position="0"> We trained Model 4 twice: first using Romanian as the source language, then using English as the source language. For each direction, we ran 5 iterations of Model 1, 5 iterations of the HMM model, and 3 iterations of Model 4.</Paragraph>
<Paragraph position="1"> For the distortion calculations of Model 4, we removed the dependencies on Romanian and English word classes. We applied the &quot;union&quot;, &quot;intersection&quot;, and &quot;refined&quot; symmetrization heuristics (Och and Ney, 2003) to the final alignments output by training, and also evaluated the two final alignments directly.</Paragraph>
<Paragraph position="2"> We aimed for a strong baseline. GIZA++ has many free parameters that cannot be estimated by Maximum Likelihood training. Rather than using the defaults, we used settings that produce good AER results on French/English bitext. We also optimized p0 on the 2003 test set (using AER) rather than setting it by likelihood training. Turning off the extensions to GIZA++ and training p0 as in (Brown et al., 1993) produces a substantial increase in AER.</Paragraph>
</Section>
</Paper>