File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/01/p01-1004_evalu.xml
Size: 11,145 bytes
Last Modified: 2025-10-06 13:58:44
<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1004"> <Title>Low-cost, High-performance Translation Retrieval: Dumber is Better</Title> <Section position="5" start_page="1" end_page="1" type="evalu"> <SectionTitle> 5 Results and Supporting Evidence </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.1 Basic evaluation </SectionTitle> <Paragraph position="0"> In this section, we test our five string comparison methods over the construction machinery corpus, under both character- and word-based indexing, and with each of unigrams, bigrams and mixed unigrams/bigrams. The retrieval accuracies and times for the different string comparison methods are presented in Figs. 1 and 2, respectively. In both figures, &quot;VSM&quot; refers to the vector space model, &quot;TINT&quot; to token intersection, &quot;3opD&quot; to 3-operation edit distance, &quot;3opS&quot; to 3-operation edit similarity, and &quot;WSC&quot; to weighted sequential correspondence; the bag-of-words methods are labelled in italics and the segment order-sensitive methods in bold. Results for the three N-gram models are presented separately, and within each model the data is sectioned off by string comparison method.</Paragraph> <Paragraph position="1"> Weighted sequential correspondence was tested … Based on the above results, we judge bigrams to be the best segment contiguity model for character-based indexing, and mixed unigrams/bigrams to be the best for word-based indexing. … results over different datasets. Finally, we present a brief qualitative explanation for the overall results.</Paragraph> </Section> <Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.2 The effects of segmentation and lexical normalisation </SectionTitle> <Paragraph position="0"> Above, we observed that segmentation consistently brought about a degradation in translation retrieval for the given dataset. Automated segmentation inevitably leads to errors, which could impinge on the accuracy of word-based indexing.
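The contrast between the two indexing paradigms can be sketched as follows. This is a minimal illustration, not the system used in these experiments. Character-based indexing derives its segments from the raw string, so no segmenter is involved. Word-based indexing takes its segments from a segmenter's token list, so any segmentation error propagates directly into the segment set. The function names (`ngrams`, `segments`, `char_segments`, `word_segments`) and the model labels are hypothetical.

```python
# Minimal sketch with hypothetical helper names: segment descriptions under
# character-based and word-based indexing, for the three N-gram models.

def ngrams(units, n):
    """Contiguous n-grams over a sequence of units (characters or words)."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def segments(units, model):
    """Segment description for the unigram, bigram or mixed uni/bigram model."""
    if model == "unigram":
        return ngrams(units, 1)
    if model == "bigram":
        return ngrams(units, 2)
    if model == "mixed":
        return ngrams(units, 1) + ngrams(units, 2)
    raise ValueError(f"unknown model: {model}")


def char_segments(text, model):
    """Character-based indexing: no segmenter, the string is split into characters."""
    return segments(list(text), model)


def word_segments(tokens, model):
    """Word-based indexing: segments come from a segmenter's token list,
    so a segmentation error changes the segment set directly."""
    return segments(list(tokens), model)
```

For instance, `char_segments("ブレーキ", "bigram")` yields the same three bigrams however a segmenter would have split the word. By contrast, `word_segments` over a mis-split token list such as `["ブレー", "キ"]` no longer contains a unigram for ブレーキ.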
Alternatively, the performance drop could simply be an artefact of our particular choice of segmentation module, namely ChaSen.</Paragraph> <Paragraph position="1"> First, we used JUMAN to segment the construction machinery corpus, and evaluated the resultant dataset in exactly the same manner as for the ChaSen output. Similarly, we ran a development version of ALTJAWS over the same corpus to produce two datasets, the first simply segmented and the second both segmented and lexically normalised. By lexical normalisation, we mean that each word is converted to its canonical form. The main segment types that normalisation has an effect on are verbs and adjectives (conju…) … are presented in Fig. 3, juxtaposed against the retrieval accuracies for character-based indexing (bigrams) and also ChaSen (mixed unigrams/bigrams).</Paragraph> </Section> <Section position="3" start_page="1" end_page="1" type="sub_section"> <SectionTitle> … </SectionTitle> <Paragraph position="0"> Looking first to the results for JUMAN, there is a gain in accuracy over ChaSen for all string comparison methods. With ALTJAWS, also, a consistent gain in performance is evident with simple segmentation, the degree of which is significantly higher than for JUMAN. The addition of lexical normalisation enhances this effect marginally. Notice that character-based indexing (based on character bigrams) holds a clear advantage over the best of the word-based indexing results for all string comparison methods.</Paragraph> <Paragraph position="1"> Based on the above, we can state that the choice of segmentation system does have a modest impact on retrieval accuracy, but that the effects of lexical normalisation are highly localised. In the following, we look to quantify the relationship between retrieval and segmentation accuracy. One slight complication in evaluating the output of the three systems is that they adopt incongruent models of conjugation. We thus made allowance for variation in the analysis of verb and adjective complexes, and focused on the segmentation of noun complexes.
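Segment precision and recall can be computed in several ways. A common span-based formulation, assumed here rather than taken from the evaluation itself, treats each segment as a pair of character offsets. A system segment counts as correct only if a gold segment has exactly the same offsets. The sketch also counts segment types, the consistency measure discussed in this subsection. Unlike the evaluation, which was restricted to noun complexes, it scores all segments. The function names are hypothetical, and each system/gold pair of token lists is assumed to spell out the same underlying string.

```python
# Minimal sketch with hypothetical names: span-based segment precision/recall
# and a segment type count. Assumes each system/gold pair of token lists
# spells out the same underlying string.

def spans(tokens):
    """Map a token list to the set of (start, end) character offsets."""
    result, pos = set(), 0
    for tok in tokens:
        result.add((pos, pos + len(tok)))
        pos += len(tok)
    return result


def segment_precision_recall(system_corpus, gold_corpus):
    """Corpus-level precision and recall over exactly matching spans."""
    correct = n_system = n_gold = 0
    for sys_tokens, gold_tokens in zip(system_corpus, gold_corpus):
        sys_spans, gold_spans = spans(sys_tokens), spans(gold_tokens)
        correct += len(sys_spans.intersection(gold_spans))
        n_system += len(sys_spans)
        n_gold += len(gold_spans)
    return correct / n_system, correct / n_gold


def segment_type_count(system_corpus):
    """Number of distinct segment types in a system's output."""
    return len({tok for sentence in system_corpus for tok in sentence})
```

For a gold analysis `["ab", "c"]` and a system output `["a", "b", "c"]`, only the span for `c` matches. Precision is therefore 1/3 and recall 1/2.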
A performance breakdown for ChaSen (CS), JUMAN (JM) and ALTJAWS (AJ) is presented in Tab. 1. ALTJAWS was found to outperform the remaining two systems in terms of segment precision, while ChaSen and JUMAN performed at exactly the same level of segment precision. Looking next to segment recall, ChaSen significantly outperformed both ALTJAWS and JUMAN. The source of almost all errors in recall, and roughly half of the errors in precision, for both ChaSen and … English. ALTJAWS, on the other hand, was remarkably successful at segmenting katakana word sequences, achieving a segment precision of 100% and segment recall approaching 99%. This is thought to have been the main cause of the disparity in retrieval accuracy between the three systems, compounded by the fact that most katakana sequences were key technical terms. To gain insight into consistency in the case of error, we further calculated the total number of segment types in the output. We expected to find a core set of correctly analysed segments, of relatively constant size across the different systems, plus an unpredictable component of segment errors, of variable size. The system generating the fewest segment types can thus be said to be the most consistent.</Paragraph> <Paragraph position="2"> Based on the segment type counts in Tab. 1, ALTJAWS errs more consistently than the remaining two systems, and there is very little to separate ChaSen and JUMAN. This is thought to have had some impact on the inflated retrieval accuracy for ALTJAWS. To summarise, there would seem to be a direct correlation between segmentation accuracy and retrieval performance, with segmentation accuracy on key terms (katakana sequences) having a particularly strong effect on translation retrieval.</Paragraph> <Paragraph position="3"> In this respect, ALTJAWS is superior to both ChaSen and JUMAN for the target domain. Additionally, complementing segmentation with lexical normalisation would seem to produce meagre performance gains.
Lastly, despite the slight gains in word-based indexing with the different segmentation systems, word-based indexing is still significantly inferior to character-based indexing.</Paragraph> </Section> <Section position="4" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 5.3 Scalability of performance </SectionTitle> <Paragraph position="0"> All results to date have arisen from evaluation over a single dataset of fixed size. In order to validate the basic findings from above and observe how increases in data size affect retrieval performance, we next ran the string comparison methods over differing-sized subsets of the JEIDA corpus. We simulate TMs of differing size by randomly splitting the JEIDA corpus into ten partitions. We then run the various methods first over partition 1, then over the combined partitions 1 and 2, and so on until all ten partitions are combined together into the full corpus. We tested all string comparison methods other than weighted sequential correspondence over the ten subsets of the JEIDA corpus; weighted sequential correspondence was excluded from evaluation due to its overall sub-standard retrieval performance. The translation accuracies for the different methods are presented in Fig. 4, with each string comparison method tested under character bigrams (&quot;2-gram -seg&quot;) and mixed word unigrams/bigrams (&quot;1/2-gram +seg&quot;) as above. The results for token intersection … to make more subtle choices as to the final translation candidate.</Paragraph> <Paragraph position="1"> One key trend in Fig. 4 is the superiority of character- over word-based indexing for each of the three string comparison methods, by a margin that remains relatively constant as the TM size grows.
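The incremental evaluation scheme can be sketched as follows. The text states only that the JEIDA corpus was split randomly into ten partitions and combined cumulatively. The seeded shuffle, the round-robin split and the function name `cumulative_subsets` are assumptions for illustration.

```python
import random


def cumulative_subsets(corpus, k=10, seed=0):
    """Hypothetical helper: randomly split `corpus` into k partitions and
    return the k cumulative subsets (partition 1, partitions 1-2, ...,
    partitions 1-k, i.e. the full corpus)."""
    items = list(corpus)
    # Seeded shuffle: an assumption, chosen only for reproducibility.
    random.Random(seed).shuffle(items)
    # Round-robin assignment keeps partition sizes within one item of each other.
    partitions = [items[i::k] for i in range(k)]
    subsets, combined = [], []
    for part in partitions:
        # Build a new list each step so earlier subsets are left unchanged.
        combined = combined + part
        subsets.append(combined)
    return subsets
```

Each of the ten returned subsets would then serve as the TM for one evaluation run. The last subset is the full corpus.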
Also of interest is the finding that there is very little to distinguish bag-of-words from segment order-sensitive methods in terms of retrieval accuracy in their respective best configurations.</Paragraph> <Paragraph position="2"> As with the original dataset, 3-operation edit similarity was the strongest performer, narrowly edging out (character bigram-based) VSM for first place, with 3-operation edit distance lagging well behind. Next, we consider the mean unit retrieval times for each method under the two indexing paradigms. Times are presented in Fig. 5, plotted once again on a logarithmic scale in order to fit the full fan-out of retrieval times onto a single graph. VSM and 3-operation edit distance were the most consistent performers. Both maintained retrieval speeds in line with those for the original dataset, at around or under 1.0 (i.e. the same retrieval time per input as 3-operation edit distance run over word unigrams for the construction machinery dataset). Most importantly, only minor increases in retrieval speed were evident as the TM size increased, and these were then reversed for the larger datasets. All three string comparison methods displayed this convex shape, although the final running time for 3-operation edit similarity under character- and word-based indexing … order-sensitive methods. We are still no closer, however, to determining why this should be the case. Here, we seek to provide an explanation for these intriguing results.</Paragraph> <Paragraph position="3"> Comparing first character- and word-based indexing, we found that the disparity in retrieval accuracy was largely related to the scoring of katakana words, which are significantly longer (in characters) than native Japanese words. For the construction machinery dataset as analysed with ChaSen, for example, the average character length of katakana words is 3.62, as compared to 2.05 overall. Under word-based indexing, all words are treated equally and character length does not enter into calculations.
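The length effect can be made concrete. A word of L characters contains L - 1 character bigrams. By the averages quoted above, a katakana word therefore contributes on average 3.62 - 1 = 2.62 within-word bigrams, against 2.05 - 1 = 1.05 for words overall, roughly 2.5 times as many. The sketch scores overlap in a token-intersection style; this is a simplified stand-in, not the exact TINT formula. It counts only bigrams inside words and ignores those spanning word boundaries. The example words and function names are hypothetical.

```python
from collections import Counter


def within_word_bigrams(word):
    """Character bigrams lying entirely inside one word (L chars -> L - 1 bigrams)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def segment_bag(words, char_based):
    """Multiset of segments: word unigrams, or within-word character bigrams."""
    if char_based:
        return Counter(bg for w in words for bg in within_word_bigrams(w))
    return Counter(words)


def overlap(input_words, trec_words, char_based):
    """Simplified token-intersection score: size of the multiset intersection."""
    a = segment_bag(input_words, char_based)
    b = segment_bag(trec_words, char_based)
    return sum(min(a[seg], b[seg]) for seg in a)


# Hypothetical example: trec_a shares one katakana word with the input,
# trec_b shares three short non-katakana words.
input_words = ["エンジン", "を", "交換", "する"]
trec_a = ["エンジン", "が", "故障"]
trec_b = ["部品", "を", "交換", "する"]
```

With these examples, word-based overlap prefers `trec_b` (3 against 1). Character-bigram overlap prefers `trec_a`, which shares only the katakana word エンジン (3 against 2).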
Thus a katakana word is treated identically to any other word type. Under character-based indexing, on the other hand, the longer the word, the more segments it generates, and a single matching katakana sequence thus tends to contribute more heavily to the final score than other words. Effectively, therefore, katakana sequences receive a higher score than kanji and other sequences, producing a preference for TRecs which incorporate the same katakana sequences as the input. As noted above, katakana sequences generally represent key technical terms, and such weighting thus tends to be beneficial to retrieval accuracy.</Paragraph> <Paragraph position="4"> We next examine the reason for the high correlation in retrieval accuracy between bag-of-words and segment order-sensitive methods in their optimum configurations (i.e. when coupled with character bigrams). Essentially, the probability of a given segment set permuting in different string contexts diminishes as the number of co-occurring segments increases. That is, for a given string pair, the greater the segment overlap between them (relative to the overall string lengths), the lower the probability that those segments are going to occur in different orderings. This is particularly the case when local segment contiguity is modelled within the segment description, as occurs for the character bigram and mixed word uni/bigram models. For high-scoring matches, therefore, segment order sensitivity becomes largely superfluous, and the slight edge in retrieval accuracy for segment order-sensitive methods tends to come from mid-scoring matches, in the vicinity of the translation utility threshold.</Paragraph> </Section> </Section> </Paper>