<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1060">
  <Title>Rapidly Retargetable Interactive Translingual Retrieval</Title>
  <Section position="6" start_page="1" end_page="3" type="evalu">
    <SectionTitle>
4. RESULTS
</SectionTitle>
    <Paragraph position="0"> We present results both for component-level performance of our language-independentretargeting modules and an assessmentof the overall retargeting process.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.1 Component-level Evaluation
</SectionTitle>
      <Paragraph position="0"> We applied our retargeting approach and retrieval enhancement techniquesdescribedabove in the context of the first Cross-Language Evaluation Forum's (CLEF) multilingual task. We used the English language forms of the queries to retrieve English, French, German, and Italian documents. Below we present comparative performance measuresfor two of the main processingcomponentsdescribed above - statistical stemming backoff translation - applied to the English-French cross-languagesegment of the CLEF task. The post-translation document expansion component was applied to the smaller Topic Detection and Tracking (TDT-3) collection to improve retrieval of</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="3" type="sub_section">
      <SectionTitle>
Mandarin documents using English.
</SectionTitle>
      <Paragraph position="0"> Our baseline run was conducted as follows. We translated the 44;;000 documents from the 1994 issues of Le Monde. We used the English-French bilingual term list downloaded from the Web at http://www.freedict.com. We then inverted the term list to form a 35,000 term French-English translation resource. We performed the necessary document and term list normalization; in this case, removing accentsfrom document surface forms to enable matching with the un-accentedterm list entries, converting case, and splitting clitic contractions, such as l'horlage, on punctuation. We trained the statistical stemming rules on a sample of the bilingual term list and document collection and applied these rules in stemming backoff. Our default condition was run with top-2 balanced translation using the Brown corpus as a source of target language unigram frequency information. Translated documents were then indexed with  of 4-stage backoff translation with statistical stemming.</Paragraph>
      <Paragraph position="1"> the InQuery (version 3.1p1) system, using the kstem stemmer for English stemming and InQuery's default English stopword list. Long queries were formed by concatenatingthe title, description, and narrative fields of the original query specification. The resulting word sequence was enclosed in an InQuery #sum operator, indicating unweighted sum.</Paragraph>
      <Paragraph position="2"> Our figure of merit for the evaluations below is mean (uninterpolated) average precision computed using trec eval  across the 34 topics in the CLEF evaluation for which relevant French documents are known.</Paragraph>
      <Paragraph position="3">  We first contrast the above baseline system with the effectiveness of an otherwise identical run without the stemming backoff component. Terms in the documents are thus only translated if there is an exact match between the surface form in the document and a surface form in the bilingual term list. We find that mean average precision for unstemmed translation is 0.19 as compared with 0.2919 for our baseline system including stemming backoff based on trained rules. This difference is significant at p&lt;0:05, by paired t-test, two-tailed. The per-query effectiveness is illustrated in Figure 1. Backoff translation improves translation coverage while retaining relatively high precision of matching in contrast to unstemmed effectiveness. null Backoff translation improves cross-languageinformation retrieval effectiveness by improving translation coverage of the terms in the document collection. Using the statistical stemmer, by-token coverage of document terms increased by 7coverage. The different stages of the four-stage backoff process contributed as illustrated in 1. The majority of terms match in the Stage 1 exact match, accounting for 70% of the term instances in the documents. The remaining stages each accountfor between 0.5% and 3% of the document terms, while 20% of document term instances remain untranslatable. However, this relatively small increase in coverage results in the highly significant improvement in retrieval effectiveness above.</Paragraph>
      <Paragraph position="4">  Here we contrast top-2 balanced translation with top-1 translation. We retain statistical stemming backoff for the top-1 translation. We replace each French documentterm with the highest ranked English translation by target languageunigram frequencyin the Brown Corpus as detailed above, retaining the original French term when no translation is found in the bilingual term list. We achieve a mean average precision of 0.2532 in contrast with the baseline condition. This difference is significant atp&lt;0:01by paired t-test, two-tailed. We can effectively incorporate additional translations using top-2 balanced translation without degrading performance by introducing significant additional noise. A query-by-query contrast is presented in Figure 2.</Paragraph>
      <Paragraph position="5">  We evaluatedpost-translation documentexpansionusing the Topic Detection and Tracking (TDT-3) collection. For this evaluation, we used the TDT-1999 topic detection task evaluation framework, but  Available at ftp://ftp.cs.cornell.edu/pub/smart/.</Paragraph>
      <Paragraph position="6"> because out focus in this paper is on ranked retrieval effectiveness we report mean uninterpolated averageprecision rather than the topicweighted detection cost measure typically reported in TDT. In the topic detection task, the system is presented with one or more exemplar stories from the training epoch--a form of query-by-example-and must determine whether each story in the evaluation epoch addresses either the same seminal event or activity or some directly related event or activity. This is generally thought to be a somewhat narrower formulation than the more widely used notion of topical relevance, but it seems to be well suited to query-by-example evaluations. The TDT-1999 tracking task was multilingual, searching stories in both English and Mandarin Chinese, and multi-modal, involving both newswire text and broadcast news audio. We focus on the cross-language spoken document retrieval component of the tracking task, using English exemplars to identify on-topic stories in Mandarin Chinese broadcast news audio. We compare top-1 translation of the Mandarin Chinese stories with and without post-translation document expansion.</Paragraph>
      <Paragraph position="7">  We used the earlier TDT-2 English newswire text collection as our side collection for expansion. We perform topic tracking on 60 topics with 4 exemplarseach. Here, we report the mean average precision on the 55 topics for which there are on-topic Mandarin audio stories. The mean uninterpolated averageprecision for retrieval of unexpandeddocumentsis 0.36 while post-translation document expansion raises this figure to 0.41. This difference is significant at p&lt;0:01by paired t-test, two-tailed. The contrast is illustrated in Figure 3. Interestingly, when we tried this with French, we noted that expansion tended to select terms from the few foreign-language documents that happened to be present in our expansion collection. We have not yet explored that effect in detail, but this observation suggests that the document expansion may be sensitive to the characteristics of the expansioncollection that are not immediately apparent.</Paragraph>
    </Section>
    <Section position="3" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
4.2 The Learning Curve
</SectionTitle>
      <Paragraph position="0"> We havefound that retargeting can be accomplishedquite quickly (a day without document expansion, three days for TREC-sized collections with document expansion), but only if the required infrastructure is in place. Adapting a system that was developed initially for Chinese to handle French documents required several weeks, with most of that effort invested in development of four-stage back-off translation and statistical stemming. Further adapting the system to handle German documents revealed the importance of compound splitting, a problem that we will ultimately need to address by incorporating a more general segmentationstrategy than we used initially for Chinese. In extending the system to Italian we have found that although our statistical stemmer presently performs poorly in that language, we can achieve quite credible results even with a fairly small (17,313 term) bilingual term list using a freely available Muscat stemmer (which exist for ten languages). So although it is possible in concept to retarget to a new language in just a few days, extending the system typically takes us between one and three weeks because we are still climbing the learning curve.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>