<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0812"> <Title>Improved HMM Alignment Models for Languages with Scarce Resources</Title> <Section position="4" start_page="84" end_page="85" type="metho"> <SectionTitle> 3 Results with the Workshop Data </SectionTitle> <Paragraph position="0"> In our experiments, the dependency parse and parts of speech are produced by Minipar (Lin, 1998). This parser has been used in a substantially different alignment model (Cherry and Lin, 2003). Since we only had parses for English, we did not use tree distortion in the application of P(e|f), which is needed for symmetrization.</Paragraph> <Paragraph position="1"> The parameter settings that we used in aligning the workshop data are presented in Table 1. Although our prior work with English and French indicated that intersection was the best method for symmetrization, we found in development that this varied depending on the characteristics of the corpus and the type of annotation (in particular, whether the annotation set included probable alignments). The results are summarized in Table 2. It shows results with our HMM model using both Equations 2 and 4 as the distortion model, which represent the limited and unlimited resource tracks, respectively. It also includes a comparison with IBM Model 4, for which we use a training sequence of IBM Model 1 (5 iterations), HMM (6 iterations), and IBM Model 4 (5 iterations). This sequence performed well in an evaluation of the IBM Models (Och and Ney, 2003).</Paragraph> <Paragraph position="2"> For comparative purposes, we show results of applying both P(f|e) and P(e|f) prior to symmetrization, along with the results of symmetrization. Comparison of the asymmetric and symmetric results largely supports the hypothesis presented in Section 2.3: our system generally produces much better recall than IBM Model 4, while offering competitive precision. 
Our symmetrized results usually produced higher recall and precision, and a lower alignment error rate.</Paragraph> <Paragraph position="3"> We found that the largest gain in performance came from the improved initialization. The combined distortion model (Equation 4), which provided a small benefit over the surface distortion model (Equation 2) on the development set, performed slightly worse on the test set. We found that the dependencies on C(e_{a_{i-1}}) and T(e_{a_{i-1}}) were harmful to the P(f|e) alignment for Inuktitut, so we did not submit results for the unlimited resources configuration. However, we found that alignment was generally difficult for all models on this particular task, perhaps due to the agglutinative nature of Inuktitut.</Paragraph> </Section> </Paper>
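The intersection symmetrization discussed in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each directional alignment is a set of (source index, target index) link pairs, and keeps only the links that both directional models agree on, which typically favors precision over recall.

```python
def symmetrize_intersect(f2e, e2f):
    """Intersection symmetrization of two asymmetric word alignments.

    f2e: set of (i, j) links from the P(f|e) alignment (f position i -> e position j)
    e2f: set of (j, i) links from the P(e|f) alignment, stored in its own direction
    Returns the links proposed by both directional models.
    """
    return {(i, j) for (i, j) in f2e if (j, i) in e2f}


# Hypothetical toy alignments for illustration only:
f2e = {(0, 0), (1, 2), (2, 1)}
e2f = {(0, 0), (2, 1), (3, 3)}
print(sorted(symmetrize_intersect(f2e, e2f)))  # prints [(0, 0), (1, 2)]
```

Union symmetrization would instead keep every link proposed by either direction, trading precision for recall; which choice works best varies with the corpus and annotation style, as the development results above indicate.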