<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1405">
<Title>Improving a general-purpose Statistical Translation Engine by Terminological lexicons</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0"> Statistical machine translation (SMT) became known to the linguistic community mainly as a result of the seminal work of Brown et al. (1993b). Since then, many researchers have invested effort in designing better models than those proposed in that article, and several exciting new ways of attacking the problem have been suggested.</Paragraph>
<Paragraph position="1"> For instance, Vogel et al. (1996) succeeded in overcoming the independence assumption made by the IBM models by introducing first-order Hidden Markov alignment models. Och et al. (1999) described an elegant way of integrating automatically acquired probabilistic templates into the translation process, and Nießen and Ney (2001) did the same for morphological information.</Paragraph>
<Paragraph position="2"> Radically different statistical models have also been proposed. Foster (2000) investigated maximum entropy models as an alternative to the so-called noisy-channel approach. Very recently, Yamada and Knight (2001) described a model in which the noisy channel takes as input a parsed sentence rather than a plain sequence of words.</Paragraph>
<Paragraph position="3"> While many of these studies include intensive evaluation sections, it is not always easy to determine exactly how well statistical translation can do on a given task. On a specific spoken-language translation task, Wang (1998) provided evidence that SMT compared favorably to a symbolic translation system; but, as the author mentions, the comparison was not entirely fair.</Paragraph>
<Paragraph position="4"> We do not know of any studies that describe extensive experiments evaluating the adequacy of SMT in a real translation environment. We prefer not to commit ourselves to defining what a real translation task is; instead, we adopt the conservative point of view that a viable translation engine (statistical or not) is one that copes with texts that may be very different in nature from those used to train it.</Paragraph>
<Paragraph position="5"> This fairly general definition suggests that adaptivity is a cornerstone of a successful SMT engine. Curiously enough, we are not aware of much work on adaptive SMT, despite the tremendous amount of work done on adaptive statistical language modeling.</Paragraph>
<Paragraph position="6"> In this paper, we propose to evaluate how a statistical engine behaves when translating a very domain-specific text that is far different from the corpus used to train both our translation and language models. We first describe our translation engine. In section 3, we quantify and analyse the performance deterioration of an SMT engine trained on a broad-based corpus (the Hansard) when it is used to translate a domain-specific text (in this study, a manual for military snipers). In section 4, we suggest a simple but natural way of improving a broad-based SMT engine, namely by opening the engine to available terminological resources. In section 5, we report on the improvement we observed by implementing the proposed approach. Finally, in section 6, we discuss other approaches that we feel can lead to more robust translation.</Paragraph>
</Section>
</Paper>