File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/03/w03-1001_abstr.xml
Size: 1,103 bytes
Last Modified: 2025-10-06 13:43:06
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1001"> <Title>A Projection Extension Algorithm for Statistical Machine Translation</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> In this paper, we describe a phrase-based unigram model for statistical machine translation that uses a much simpler set of model parameters than similar phrase-based models. The units of translation are blocks - pairs of phrases. During decoding, we use a block unigram model and a word-based trigram language model. During training, the blocks are learned from source interval projections using an underlying high-precision word alignment.</Paragraph> <Paragraph position="1"> The system performance is significantly increased by applying a novel block extension algorithm using an additional high-recall word alignment. The blocks are further filtered using unigram-count selection criteria. The system has been successfully tested on a Chinese-English and an Arabic-English translation task.</Paragraph> </Section> </Paper>