<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1099">
  <Title>Query Translation by Text Categorization</Title>
  <Section position="5" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Results and Discussion
</SectionTitle>
    <Paragraph position="0"> Evaluations are computed by retrieving the first 1000 documents for each query. In Figure 1, we provide the average precision of each CLIR run as a function of the threshold value. The average precision peaks when three MeSH terms are selected per query (0.1925), but selecting only two terms is nearly as effective (0.19). In contrast, selecting only the top-ranked term is not sufficient (average precision is below 0.145), and adding more than three terms gradually degrades precision, so that with 25 terms it falls below 0.15. Table 3 compares the results to the baseline, i.e. the score of the monolingual information retrieval system (MLIR). The relative score (CLIR ratio) of the system that selects only three terms (THR-3) is 80%, which should be contrasted with the score obtained by the MT system (59.7%). In the same table, we observe that using a linear function to compute the number of terms to select (THR-F) yields only a very modest improvement over the best-performing static value (82.2% vs. 80%): a dynamic threshold is not really more effective than translating only the top three MeSH concepts. This limited gain may be due to the fact that OHSUMED queries are all of roughly similar length. In contrast, we could expect that querying with very short (one-word) and very long queries (querying by documents) would justify the use of a length-dependent threshold.</Paragraph>
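    <Paragraph> The two thresholding strategies compared above can be sketched as follows. This is a minimal illustration, not the system's actual code: the function names, and the slope and intercept of the linear rule, are hypothetical placeholders, since the coefficients used for THR-F are not given in this section.

```python
def select_terms(ranked_terms, n):
    """Static threshold (THR-n): keep the n best-ranked MeSH terms."""
    return list(ranked_terms[:n])


def linear_threshold(query_length, slope=0.5, intercept=1.0):
    """Length-dependent threshold in the spirit of THR-F.

    slope and intercept are illustrative assumptions, not values
    reported in the paper. At least one term is always selected.
    """
    return max(1, round(slope * query_length + intercept))


# Ranked categorizer output, most similar term first.
ranked = ["anemia", "anemia, iron-deficiency", "anemia, neonatal", "iron"]

static_selection = select_terms(ranked, 3)
dynamic_selection = select_terms(ranked, linear_threshold(query_length=4))
```

    With a static threshold of three, the sketch keeps the three best-ranked terms whatever the query length, which corresponds to the THR-3 setting. The dynamic variant only differs when query lengths vary enough to change the computed threshold.</Paragraph>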
    <Paragraph position="1"> In a last experiment, we try to combine the two translation strategies: the translation provided by selecting three terms is simply added to the translation provided by the MT system.</Paragraph>
    <Paragraph position="2"> In Table 3, a significant improvement (THR-3 + MT = 91.8%) is observed as compared to each single strategy.</Paragraph>
    <Paragraph position="3"> This seems to confirm that at least some of the words that are not translated, or not properly translated, by the text categorizer are well translated by the commercial system.</Paragraph>
    <Paragraph position="4"> For example, if we consider a French query such as "anémie - anémie ferriprive, quel examen est le meilleur" (OHSUMED ID = 97: "anemia - iron deficiency anemia, which test is best"), the ranked list of English MeSH terms returned by the categorizer is (most similar terms first, with N = 3): anemia; anemia, iron-deficiency; anemia, neonatal.</Paragraph>
    <Paragraph position="5"> We also observe that an important word like test is missing from the list of terms, while conversely a less relevant term like anemia, neonatal is provided. Now, if we consider the translation supplied by MT, the above query becomes "weaken - weakens ferriprive, which examination is the best": although this translation is far from perfect, it is interesting to note that part of the sense expressed by the word test in the English query can be found in words such as examination and best. Further, most of the erroneously translated content (weaken - ferriprive) is very unlikely to affect document retrieval for this query: ferriprive, as a French word, will be ignored, while weaken carries only marginal content.</Paragraph>
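    <Paragraph> The combination step for this query can be sketched as a plain concatenation, following the statement that the selected terms are simply added to the MT output. The function name and the whitespace joining are assumptions made for illustration; how the retrieval engine tokenizes or weights the combined query is not specified here.

```python
def combine_translations(mt_translation, mesh_terms):
    """Append the selected MeSH terms to the MT output as one query string."""
    return " ".join([mt_translation] + list(mesh_terms))


# Values taken from the OHSUMED query 97 example discussed above.
mt_output = "weaken - weakens ferriprive, which examination is the best"
top_terms = ["anemia", "anemia, iron-deficiency", "anemia, neonatal"]

combined_query = combine_translations(mt_output, top_terms)
```

    In the combined query, examination and best come from the MT output, while anemia and iron-deficiency come from the categorizer, so each strategy supplies content the other one misses.</Paragraph>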
    <Paragraph position="6"> Volk et al. (2002) work with a related collection but use German queries; they observe that morphological analysis is effective and report a CLIR ratio above 80% (MLIR = 0.3543; CLIR = 0.2955). Directly related to our experiments, Eichmann et al. (1998) use the same benchmarks and similar terminological resources, but rely on a word-by-word transfer lexicon constructed from the UMLS. The average precision of their system using French queries is 0.1493, which results in a CLIR ratio of 62%. Because we use the same benchmarks and resources and because our monolingual baselines are quite similar, the methodological difference must be underlined: while Eichmann et al. rely on a word-to-word transfer lexicon, our system aims at breaking the bag-of-words limitation by translating multiword terms. Finally, we also observe that the combined system is able to take advantage of existing multilingual vocabulary without assuming any prior terminological knowledge from the user, so that the usual problems associated with controlled vocabularies (cf. the introduction) are jointly solved in the proposed architecture.</Paragraph>
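    <Paragraph> The CLIR ratios quoted above are the cross-lingual average precision expressed as a percentage of the monolingual one. The sketch below reproduces that arithmetic from the figures given in this section. The function names are ours, and the monolingual baselines it derives are back-calculations from rounded percentages, so they are approximations rather than reported values.

```python
def clir_ratio(clir_ap, mlir_ap):
    """CLIR average precision as a percentage of the monolingual baseline."""
    return 100.0 * clir_ap / mlir_ap


def implied_baseline(clir_ap, ratio_percent):
    """Monolingual average precision implied by a CLIR score and its ratio."""
    return clir_ap / (ratio_percent / 100.0)


# Volk et al. (2002): MLIR = 0.3543, CLIR = 0.2955.
volk_ratio = clir_ratio(0.2955, 0.3543)

# Baselines implied by the rounded ratios (approximate by construction).
eichmann_baseline = implied_baseline(0.1493, 62.0)
thr3_baseline = implied_baseline(0.1925, 80.0)

print(f"Volk et al. CLIR ratio: {volk_ratio:.1f}%")
print(f"Implied MLIR baselines: {eichmann_baseline:.3f} vs {thr3_baseline:.3f}")
```

    Under these assumptions both implied baselines come out at about 0.24, which is consistent with the remark that the two monolingual baselines are quite similar.</Paragraph>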
  </Section>
</Paper>