File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/01/p01-1004_intro.xml

Size: 4,064 bytes

Last Modified: 2025-10-06 14:01:12

<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1004">
  <Title>Low-cost, High-performance Translation Retrieval: Dumber is Better</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Translation memories (TMs) are a list of translation records (source language strings paired with a unique target language translation), which the TM system accesses in suggesting a list of target language (L2) translation candidates [...] a match with the overall L1 input, or the input is partitioned into coherent segments, and individual translations retrieved for each (Sato and Nagao, 1990; Nirenburg et al., 1993); this is the first step toward generating a customised translation for the input. With stand-alone TM systems, on the other hand, the system selects an arbitrary number of translation candidates falling within a certain empirical corridor of similarity with the overall input string, and simply outputs these for manual manipulation by the user in fashioning the final translation.</Paragraph>
    <Paragraph position="1"> A key assumption surrounding the bulk of past TR research has been that the greater the match stringency/linguistic awareness of the retrieval mechanism, the greater the final retrieval accuracy will become. Naturally, any appreciation in retrieval complexity comes at a price in terms of computational overhead. We thus follow the lead of Baldwin and Tanaka (2000) in asking the question: what is the empirical effect on retrieval performance of different match approaches? Here, retrieval performance is defined as the combination of retrieval speed and accuracy, with the ideal method offering fast response times at high accuracy. In this paper, we choose to focus on retrieval performance within a Japanese-English TR context. One key area of interest with Japanese is the effect that segmentation has on retrieval performance. As Japanese is a non-segmenting language (does not explicitly delimit words orthographically), we can take the brute-force approach in treating each string as a sequence of characters (character-based indexing), or alternatively call upon segmentation technology in partitioning each string into words (word-based indexing). Orthogonal to this is the question of sensitivity to segment order. That is, should our match mechanism treat each string as an unorganised multiset of terms (the bag-of-words approach), or attempt to find the match that best preserves the original segment order in the input (the segment order-sensitive approach)? We tackle this issue by implementing a sample of representative bag-of-words and segment order-sensitive methods and testing the retrieval performance of each. As a third orthogonal parameter, we consider the effects of segment contiguity. That is, do matches over contiguous segments provide closer overall translation correspondence than matches over displaced segments? Segment contiguity is either explicitly modelled within the string match mechanism, or provided as an add-in in the form of segment N-grams.</Paragraph>
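The three parameters introduced above (segmentation, segment order, segment contiguity) can be made concrete with a small sketch. This is an illustration only and not the authors' implementation: the Dice coefficient and the longest-common-subsequence score are stand-ins chosen for brevity, whitespace splitting stands in for a real Japanese word segmenter, and all function names and the example strings are hypothetical.

```python
def char_segments(text):
    """Character-based indexing: every non-space character is one segment."""
    return [ch for ch in text if not ch.isspace()]


def word_segments(text):
    """Word-based indexing stand-in (assumption): split on whitespace.
    Real Japanese text would need a dedicated segmenter."""
    return text.split()


def ngrams(segments, n):
    """Segment N-grams: all contiguous runs of n segments, as tuples."""
    return [tuple(segments[i:i + n]) for i in range(len(segments) - n + 1)]


def bag_similarity(a, b):
    """Bag-of-words style score (assumed metric): Dice coefficient
    over the two multisets of terms, ignoring term order."""
    if not a and not b:
        return 1.0
    counts_a, counts_b = {}, {}
    for term in a:
        counts_a[term] = counts_a.get(term, 0) + 1
    for term in b:
        counts_b[term] = counts_b.get(term, 0) + 1
    overlap = sum(min(c, counts_b.get(t, 0)) for t, c in counts_a.items())
    return 2.0 * overlap / (len(a) + len(b))


def order_similarity(a, b):
    """Segment order-sensitive score (assumed metric): twice the length
    of the longest common subsequence over the summed lengths."""
    if not a and not b:
        return 1.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return 2.0 * prev[-1] / (len(a) + len(b))


if __name__ == "__main__":
    # Made-up strings with the same characters in reversed order.
    s1, s2 = char_segments("夏の雨"), char_segments("雨の夏")
    print(bag_similarity(s1, s2))                        # order ignored
    print(order_similarity(s1, s2))                      # order respected
    print(bag_similarity(ngrams(s1, 2), ngrams(s2, 2)))  # contiguity via bigrams
```

On the two example strings, the unigram bag score treats the inputs as identical, while both the order-sensitive score and the bigram bag score penalise the reordering. The bigram case shows how segment N-grams can add contiguity information to an otherwise order-blind method.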
    <Paragraph position="2"> To preempt the major findings of this paper, over a series of experiments we find that character-based indexing is consistently superior to word-based indexing. Furthermore, the bag-of-words methods we test are equivalent in retrieval accuracy to the more expensive segment order-sensitive methods, but superior in retrieval speed. Finally, segment contiguity models provide benefits in terms of both retrieval accuracy and retrieval speed, particularly when coupled with character-based indexing. We thus provide clear evidence that high-performance TR is achievable with naive methods, and more so that such methods outperform more intricate, expensive methods. That is, the dumber the retrieval mechanism, the better.</Paragraph>
    <Paragraph position="3"> Below, we review the orthogonal parameters of segmentation, segment order and segment contiguity (§ 2). We then present a range of both bag-of-words and segment order-sensitive string comparison methods (§ 3) and detail the evaluation methodology (§ 4). Finally, we evaluate the different methods in a Japanese-English TR context</Paragraph>
  </Section>
</Paper>