<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1066">
  <Title>Improving IBM Word-Alignment Model 1</Title>
  <Section position="10" start_page="3" end_page="5" type="concl">
    <SectionTitle>
9 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have demonstrated that it is possible to improve the performance of Model 1 in terms of alignment error by about 30%, simply by changing the way its parameters are estimated. Almost half this improvement is obtained with a simple heuristic model that does not require EM re-estimation.</Paragraph>
    <Paragraph position="1"> It is interesting to contrast our heuristic model with the heuristic models used by Och and Ney (2003) as baselines in their comparative study of alignment models. The major difference between our model and theirs is that they base theirs on the Dice coefficient, which is computed by the formula dice(e, f) = 2 C(e, f) / (C(e) + C(f)), where C(e, f) is the number of aligned sentence pairs in which e and f co-occur and C(e) and C(f) are their individual occurrence counts, while we use the log-likelihood-ratio statistic defined in Section 6. Och and Ney find that the standard version of Model 1 produces more accurate alignments after only one iteration of EM than either of the heuristic models they consider, while we find that our heuristic model outperforms the standard version of Model 1, even with an optimal number of iterations of EM.</Paragraph>
    <Paragraph position="2"> While the Dice coefficient is simple and intuitive--the value is 0 for words never found together, and 1 for words always found together--it lacks an important property of the LLR statistic, namely that scores for rare words are discounted; thus it does not address the over-fitting problem for rare words.</Paragraph>
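The contrast between the two association measures can be illustrated with a small sketch (not from the paper itself; the function names and the 2x2-table form of the LLR statistic are standard formulations, assumed here rather than taken from Section 6):

```python
import math

def dice(c_ef, c_e, c_f):
    # Dice coefficient: 2*C(e,f) / (C(e) + C(f)).
    # A pair seen together once out of one occurrence each scores the
    # same (1.0) as a pair seen together 100 times out of 100.
    return 2.0 * c_ef / (c_e + c_f)

def llr(c_ef, c_e, c_f, n):
    # G^2 log-likelihood-ratio statistic over the 2x2 co-occurrence
    # table of (e present / absent) x (f present / absent), with n
    # the total number of aligned sentence pairs.
    k = [[c_ef,         c_e - c_ef],
         [c_f - c_ef,   n - c_e - c_f + c_ef]]
    rows = [k[0][0] + k[0][1], k[1][0] + k[1][1]]
    cols = [k[0][0] + k[1][0], k[0][1] + k[1][1]]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            if k[i][j] > 0:
                # observed count times log(observed / expected)
                g2 += k[i][j] * math.log(k[i][j] * n / (rows[i] * cols[j]))
    return 2.0 * g2
```

Here dice(1, 1, 1) == dice(100, 100, 100) == 1.0, but the LLR score for the singleton pair is far smaller than for the frequent pair, which is exactly the discounting of rare-word evidence described above.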
    <Paragraph position="3"> The list of applications of IBM word-alignment Model 1 given in Section 1 should be sufficient to convince anyone of the relevance of improving the model. However, it is not clear that AER as defined by Och and Ney (2003) is always the appropriate way to evaluate the quality of the model, since the Viterbi word alignment that AER is based on is seldom used in applications of Model 1.</Paragraph>
    <Paragraph position="4">  Moreover, it is notable that while the versions of Model 1 having the lowest AER have dramatically higher precision than the standard version, they also have quite a bit lower recall. If AER does not reflect the optimal balance between precision and recall for a particular application, then optimizing AER may not produce the best task-based performance for that application. Thus the next step in this research must be to test whether the improvements in AER we have demonstrated for Model 1 lead to improvements on task-based performance measures.</Paragraph>
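The precision/recall tension behind this caveat follows directly from how Och and Ney (2003) define AER over sure links S, possible links P (with S a subset of P), and hypothesized links A. A minimal sketch of the standard definitions (the function and variable names are illustrative, not from the paper):

```python
def alignment_scores(a, s, p):
    # a: hypothesized alignment links, s: sure links, p: possible links.
    # Links are (source_pos, target_pos) pairs; S must be a subset of P.
    a, s = set(a), set(s)
    p = set(p) | s
    precision = len(a & p) / len(a)
    recall = len(a & s) / len(s)
    # AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    aer = 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    return precision, recall, aer
```

Because only sure links enter the recall term while any possible link counts toward precision, a sparse, high-precision alignment can drive AER down even as recall drops; whether that trade-off is the right one is precisely the application-dependent question raised above.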
  </Section>
</Paper>