<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0315">
  <Title>Efficient Optimization for Bilingual Sentence Alignment Based on Linear Regression</Title>
  <Section position="7" start_page="11" end_page="11" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we have demonstrated ways to efficiently optimize a sentence alignment module so that it automatically selects aligned sentence pairs of high translation quality. This alignment score optimization requires (a) a small number of human subjects, each of whom annotates a set of about 100 sentence pairs for translation quality; and (b) a set of alignment scores, based on perplexity and sentence length ratio, from which to learn to predict the human scores.</Paragraph>
    <Paragraph position="1"> Using the predictions learned by linear regression, the alignment program can choose the best sentence pair candidates to include in the training data for re-estimating the SMT system.</Paragraph>
    <Paragraph position="2"> Our experiments showed that, for the Chinese-English language pair, perplexity based on the reverse word pair conditional probability p(e|f) (PP-2) gives the most reliable prediction among the five models proposed in this paper; the regression model, which combines those five models, gives the best correlation between human scores and automatic predictions. Our approach needs only a fairly limited number of human-labeled sentence pairs, and is an efficient optimization of the sentence</Paragraph>
  </Section>
</Paper>