<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1075">
  <Title>The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval</Title>
  <Section position="4" start_page="593" end_page="593" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="593" end_page="593" type="sub_section">
      <SectionTitle>
2.1 Effect of Translation Resources
</SectionTitle>
      <Paragraph position="0"> Previous studies have explored the effect of translation resources such as bilingual wordlists or parallel corpora on CLIR performance.</Paragraph>
      <Paragraph position="1"> Xu and Weischedel (2000) measured CLIR performance as a function of bilingual dictionary size. Their English-Chinese CLIR experiments on the TREC 5&amp;6 Chinese collections showed that initial retrieval performance increased sharply with lexicon size but did not improve further once the lexicon exceeded 20,000 terms. Demner-Fushman and Oard (2003) identified eight types of terms that affect retrieval effectiveness in CLIR applications through their coverage by general-purpose bilingual term lists. They reported results from an evaluation of the coverage of 35 bilingual term lists in a news retrieval application. Retrieval effectiveness was found to be strongly influenced by term list size for lists that contain between 3,000 and 30,000 unique terms per language.</Paragraph>
      <Paragraph position="2"> Franz et al. (2001) investigated CLIR performance as a function of training corpus size for three different training corpora and observed that performance increased approximately logarithmically with corpus size for all three corpora. Kraaij (2001) compared three types of translation resources for bilingual retrieval based on query translation: a bilingual machine-readable dictionary, a statistical dictionary based on a parallel web corpus, and the Babelfish MT service. He concluded that the mean average precision of a run was proportional to the lexical coverage. McNamee and Mayfield (2002) examined the effectiveness of query expansion techniques using parallel corpora and bilingual wordlists of varying quality. They confirmed that retrieval performance dropped off as the lexical coverage of the translation resources decreased, and that the relationship was approximately linear.</Paragraph>
      <Paragraph position="3"> Previous research has mainly studied the effectiveness of bilingual wordlists or parallel corpora from two aspects: size and lexical coverage. Kraaij (2001) examined the effectiveness of an MT system, but likewise from the perspective of lexical coverage. Why has so little research analyzed the effect of MT translation quality on CLIR performance? One possible reason is the difficulty of controlling the translation quality of an MT system in the way that has been done for bilingual wordlists or parallel corpora. MT systems are usually used as black boxes in CLIR applications. It is also not clear how to degrade MT software, because MT systems are usually optimized for grammatically correct sentences rather than word-by-word translation.</Paragraph>
    </Section>
    <Section position="2" start_page="593" end_page="593" type="sub_section">
      <SectionTitle>
2.2 MT-Based Query Translation
</SectionTitle>
      <Paragraph position="0"> MT-based query translation is perhaps the most straightforward approach to CLIR. Compared with dictionary-based or corpus-based methods, the advantage of MT-based query translation is that technologies integrated in MT systems, such as syntactic and semantic analysis, can help improve translation accuracy (Jones et al., 1999). For a long time, however, fewer experiments were reported with MT-based methods than with dictionary-based or corpus-based methods. The main reasons include: (1) high-quality MT systems are not easy to obtain; (2) MT systems are not available for some language pairs; (3) queries are usually short, or even just isolated terms, which limits the effectiveness of MT-based methods. Nevertheless, recent research on CLIR shows a trend toward adopting MT-based query translation. At the fifth NTCIR workshop, almost all the groups participating in the Bilingual CLIR and Multilingual CLIR tasks adopted the query translation method using MT systems or machine-readable dictionaries (Kishida et al., 2005). Recent research also shows that MT-based query translation can achieve performance comparable to that of other methods (Kishida et al., 2005; Nunzio et al., 2005). Given that more and more MT systems are being used in CLIR, it is important to analyze carefully how the performance of an MT system may influence retrieval effectiveness.</Paragraph>
    </Section>
  </Section>
</Paper>