File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-1626_intro.xml
Size: 1,507 bytes
Last Modified: 2025-10-06 14:04:01
<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1626">
<Title>Distributed Language Modeling for N-best List Re-ranking</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0">Statistical language modeling has been widely used in natural language processing applications such as Automatic Speech Recognition (ASR), Statistical Machine Translation (SMT) (Brown et al., 1993), and Information Retrieval (IR) (Ponte and Croft, 1998).</Paragraph>
<Paragraph position="1">Conventional n-gram language modeling counts the frequency of all the n-grams in a corpus and calculates the conditional probability of a word given its history of $n-1$ words, $P(w_i \mid w_{i-n+1}^{i-1})$. As the corpus size increases, building a high-order language model offline becomes very expensive, if it remains possible at all (Goodman, 2000).</Paragraph>
<Paragraph position="2">In this paper, we describe a new approach to language modeling based on a distributed computing paradigm. Distributed language modeling can make use of arbitrarily large training corpora and provides a natural way to perform language model adaptation. We applied the distributed LM to the task of re-ranking the N-best list in statistical machine translation and achieved significantly better translation quality as measured by the BLEU metric (Papineni et al., 2001).</Paragraph>
</Section>
</Paper>
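
As a concrete illustration of the n-gram estimation described in the introduction, the following is a minimal Python sketch (not from the paper): it counts n-grams in a toy corpus and computes the maximum-likelihood estimate of P(w_i | w_{i-n+1}^{i-1}). The function names and the toy corpus are illustrative assumptions.

from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def conditional_prob(tokens, word, history):
    """Maximum-likelihood estimate of P(word | history) from raw counts.

    history is a tuple of the preceding n-1 words; returns 0.0 if the
    history never occurs (a real LM would smooth or back off here).
    """
    n = len(history) + 1
    full = ngram_counts(tokens, n)[history + (word,)]
    ctx = ngram_counts(tokens, n - 1)[history]
    return full / ctx if ctx else 0.0

corpus = "the cat sat on the mat the cat sat".split()
# Prints 1.0: in this toy corpus, 'the cat' is always followed by 'sat'.
print(conditional_prob(corpus, "sat", ("the", "cat")))

Offline construction of a high-order model amounts to materializing these counts for every n-gram in the corpus, which is exactly the step that becomes prohibitive as the corpus grows.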
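
The N-best re-ranking step can be sketched in the same spirit. Everything below is a hedged assumption rather than the paper's actual method: the lm_logprob callable stands in for a client of the distributed LM, and the linear interpolation with a fixed lm_weight is one plausible way to combine its score with the decoder's baseline score.

def rerank_nbest(nbest, lm_logprob, lm_weight=0.5):
    """Re-rank an N-best list by interpolating the decoder's baseline
    score with a language-model log-probability.

    nbest: list of (hypothesis_tokens, baseline_score) pairs.
    lm_logprob: callable scoring a token list, e.g. a client that sums
        n-gram log-probabilities served by remote machines.
    Returns hypotheses sorted best-first under the combined score.
    """
    scored = [(hyp, (1 - lm_weight) * base + lm_weight * lm_logprob(hyp))
              for hyp, base in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical usage: a toy LM that simply penalizes longer hypotheses.
toy_lm = lambda hyp: -0.2 * len(hyp)
nbest = [(["the", "cat", "sat"], -2.0), (["a", "cat", "sat", "down"], -1.9)]
best_hyp, best_score = rerank_nbest(nbest, toy_lm)[0]
print(" ".join(best_hyp), best_score)  # the cat sat -1.3

In practice the interpolation weight would be tuned on held-out data against the evaluation metric (here, BLEU), not fixed by hand as in this sketch.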