File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/01/p01-1004_intro.xml
Size: 4,064 bytes
Last Modified: 2025-10-06 14:01:12
<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1004"> <Title>Low-cost, High-performance Translation Retrieval: Dumber is Better</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Translation memories (TMs) are a list of translation records (source language strings paired with a unique target language translation), which the TM system accesses in suggesting a list of target language (L2) translation candidates [...] a match with the overall L1 input, or the input is partitioned into coherent segments, and individual translations retrieved for each (Sato and Nagao, 1990; Nirenburg et al., 1993); this is the first step toward generating a customised translation for the input. With stand-alone TM systems, on the other hand, the system selects an arbitrary number of translation candidates falling within a certain empirical corridor of similarity with the overall input string, and simply outputs these for manual manipulation by the user in fashioning the final translation.</Paragraph> <Paragraph position="1"> A key assumption surrounding the bulk of past TR research has been that the greater the match stringency/linguistic awareness of the retrieval mechanism, the greater the final retrieval accuracy will become. Naturally, any appreciation in retrieval complexity comes at a price in terms of computational overhead. We thus follow the lead of Baldwin and Tanaka (2000) in asking the question: what is the empirical effect on retrieval performance of different match approaches? Here, retrieval performance is defined as the combination of retrieval speed and accuracy, with the ideal method offering fast response times at high accuracy. In this paper, we choose to focus on retrieval performance within a Japanese-English TR context. One key area of interest with Japanese is the effect that segmentation has on retrieval performance. 
As Japanese is a non-segmenting language (does not explicitly delimit words orthographically), we can take the brute-force approach in treating each string as a sequence of characters (character-based indexing), or alternatively call upon segmentation technology in partitioning each string into words (word-based indexing). Orthogonal to this is the question of sensitivity to segment order. That is, should our match mechanism treat each string as an unorganised multiset of terms (the bag-of-words approach), or attempt to find the match that best preserves the original segment order in the input (the segment order-sensitive approach)? We tackle this issue by implementing a sample of representative bag-of-words and segment order-sensitive methods and testing the retrieval performance of each. As a third orthogonal parameter, we consider the effects of segment contiguity. That is, do matches over contiguous segments provide closer overall translation correspondence than matches over displaced segments? Segment contiguity is either explicitly modelled within the string match mechanism, or provided as an add-in in the form of segment N-grams.</Paragraph> <Paragraph position="2"> To preempt the major findings of this paper, over a series of experiments we find that character-based indexing is consistently superior to word-based indexing. Furthermore, the bag-of-words methods we test are equivalent in retrieval accuracy to the more expensive segment order-sensitive methods, but superior in retrieval speed. Finally, segment contiguity models provide benefits in terms of both retrieval accuracy and retrieval speed, particularly when coupled with character-based indexing. We thus provide clear evidence that high-performance TR is achievable with naive methods, and more so that such methods outperform more intricate, expensive methods. That is, the dumber the retrieval mechanism, the better.</Paragraph> <Paragraph position="3"> Below, we review the orthogonal parameters of segmentation, segment order and segment contiguity (§ 2). 
We then present a range of both bag-of-words and segment order-sensitive string comparison methods (§ 3) and detail the evaluation methodology (§ 4). Finally, we evaluate the different methods in a Japanese-English TR context [...]</Paragraph> </Section> </Paper>
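The three parameters named in the introduction (segmentation, segment order and segment contiguity) can be made concrete with a small sketch. This is an illustration only, not the authors' implementation: the function names are hypothetical, and the two similarity measures (a Dice coefficient over term multisets for the bag-of-words case, a Dice-normalised longest common subsequence for the order-sensitive case) are assumptions chosen for brevity rather than the methods evaluated in the paper.

```python
from collections import Counter


def char_segments(text):
    """Character-based indexing: every non-space character is one segment."""
    return [ch for ch in text if not ch.isspace()]


def ngrams(segments, n):
    """Segment N-grams: each contiguous run of n segments becomes one term."""
    return [tuple(segments[i:i + n]) for i in range(len(segments) - n + 1)]


def bag_similarity(a, b):
    """Order-insensitive match: Dice coefficient over multisets of terms."""
    if not a and not b:
        return 1.0
    # Counter & Counter keeps the minimum count of each shared term.
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2 * overlap / (len(a) + len(b))


def order_similarity(a, b):
    """Order-sensitive match: longest common subsequence, Dice-normalised."""
    if not a and not b:
        return 1.0
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the table
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return 2 * prev[-1] / (len(a) + len(b))
```

For the strings "abc" and "cba", `bag_similarity` over character segments is 1.0, since the multisets are identical, while `order_similarity` is 1/3, since only one character can be aligned in order. Applying `bag_similarity` to character bigrams from `ngrams(..., 2)` gives 0.0 for the same pair, which shows how N-grams add contiguity information to an otherwise order-blind comparison. A word-based variant would only need a different segmenter in place of `char_segments`.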