<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1405">
  <Title>Improving a general-purpose Statistical Translation Engine by Terminological lexicons</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> In this study, we have shown that translating texts in specific domains with a general-purpose statistical engine is difficult. This suggests the need to implement an adaptive strategy. Among the possible scenarios, we have shown that opening the engine to terminological resources is a natural and efficient way of softening the decoder.</Paragraph>
    <Paragraph position="1"> In a similar vein, Marcu (2001) investigated how to combine the Example-Based Machine Translation (EBMT) and SMT approaches. The author automatically derived from the Hansard corpus what he calls a translation memory: in practice, a collection of pairs of source and target word sequences that stand in a translation relation according to the Viterbi alignment computed with an IBM Model 4 that was also trained on the Hansard corpus. This collection of phrases was then merged with a greedy statistical decoder to improve the overall performance of the system.</Paragraph>
    <Paragraph position="2"> What this study suggests is that translation memories collected from a given corpus can improve the performance of a statistical engine trained on the same corpus, which is in itself an interesting result. A very similar study, but with weaker results, is described in (Langlais et al., 2000), in the framework of the TransType project. Besides the different metrics the authors used, the discrepancy in performance between these two studies may be explained by the nature of the test corpora used. The test corpus in the latter study was more representative of a real translation task, while the test corpus that Marcu used was a set of around 500 French sentences of no more than 10 words.</Paragraph>
    <Paragraph position="3"> Our present study is close in spirit to these last two, except that we do not attack the problem of automatically acquiring bilingual lexicons; instead, we consider it a part of the translator's task to provide such lexicons. Actually, we feel this may be one of the only ways a user has of retaining some control over the engine's output, a fact that professional translators seem to appreciate (Langlais et al., 2001).</Paragraph>
    <Paragraph position="4"> As a final remark, we want to stress that we see the present study as a first step toward the eventual unification of EBMT and SMT, and in this respect we agree with (Marcu, 2001). Potentially, of course, EBMT can offer much more than just a simple list of equivalences, like those we used in this study. However, the basic approach we describe here still holds, as long as we can extend the notion of constraint used in this study to include non-consecutive sequences of words. This is a problem we plan to investigate in future research.</Paragraph>
  </Section>
</Paper>