
<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3255">
  <Title>Efficient Decoding for Statistical Machine Translation with a Fully Expanded WFST Model</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recently, research on statistical machine translation has grown along with the increase in computational power as well as the amount of bilingual corpora.</Paragraph>
    <Paragraph position="1"> The basic idea of modeling machine translation was proposed by Brown et al. (1993), who assumed that machine translation can be modeled on noisy channels. The source language is encoded from a target language by a noisy channel, and translation is performed as a decoding process from source language to target language.</Paragraph>
    <Paragraph position="2"> Knight (1999) showed that the translation problem defined by Brown et al. (1993) is NPcomplete. Therefore, with this model it is almost impossible to search for optimal solutions in the decoding process. Several studies have proposed methods for searching suboptimal solutions.</Paragraph>
    <Paragraph position="3"> Berger et al. (1996) and Och et al. (2001) proposed such depth-first search methods as stack decoders. Wand and Waibel (1997) and Tillmann and Ney (2003) proposed breadth-first search methods, i.e. beam search. Germann (2001) and Watanabe and Sumita (2003) proposed greedy type decoding methods. In all of these search algorithms, better representation of the statistical model in systems can improve the search efficiency.</Paragraph>
    <Paragraph position="4"> For model representation, a search method based on weighted finite-state transducer (WFST) (Mohri et al., 2002) has achieved great success in the speech recognition field. The basic idea is that each statistical model is represented by a WFST and they are composed beforehand; the composed model is optimized by WFST operations such as determinization and minimization. This fully expanded model permits efficient searches. Our motivation is to apply this approach to machine translation. However, WFST optimization operations such as determinization are nearly impossible to apply to WFSTs in machine translation because they are much more ambiguous than speech recognition. To reduce the ambiguity, we propose a WFST optimization method that considers the statistics of hypotheses while decoding. null Some approaches have applied WFST to statistical machine translation. Knight and Al-Onaizan (1998) proposed the representation of IBM model 3 with WFSTs; Bangalore and Riccardi (2001) studied WFST models in call-routing tasks, and Kumar and Byrne (2003) modeled phrase-based translation by WFSTs. All of these studies mainly focused on the representation of each submodel used in machine translation. However, few studies have focued on the integration of each WFST submodel to improve the decoding efficiency of machine translation.</Paragraph>
    <Paragraph position="5"> To this end, we propose a method that expands all of the submodels into a composition model, reducing the ambiguity of the expanded model by the statistics of hypotheses while decoding. First, we explain the translation model (Brown et al., 1993; Knight and Al-Onaizan, 1998) that we used as a base for our decoding research. Second, our proposed method is introduced. Finally, experimental results show that our proposed method drastically improves decoding efficiency.</Paragraph>
  </Section>
class="xml-element"></Paper>