Cross-Language Multimedia Information Retrieval
 
The Cross-Language Multimedia Retrieval Application
 
This paper offers several original contributions to the literature on
cross-language information retrieval. First, the choice of application is
novel, and significant because it simplifies the language problem enough to
make it tractable. Because the objects retrieved are images and not text, they
are instantly comprehensible to the user regardless of language issues. This
fact makes it possible for users to perform a relevance assessment without the
need for any kind of translation. More important, users themselves can select
objects of interest, without recourse to translation. The images are, in fact,
associated with caption information, but, even in the monolingual system, few
users ever even view the captions. It should be noted that most of the images in
PictureQuest are utilized for advertising and publishing, rather than for news
applications. Users of history and news photos do tend to check the captions,
and often users in publishing will view the captions. For advertising, however,
what the image itself conveys is far more important than the circumstances under
which it was created.

Another significant contribution of this paper is the inclusion of a variety of
machine translation systems. None of the systems tested is a high-end machine
translation system: all are freely available on the Web.

Another key feature of this paper is the careful selection of an accuracy
measure appropriate to the circumstances of the application. The standard
measure, percent of monolingual performance achieved, is used, with a firm focus
on precision. In this application, users are able to evaluate only what they
see, and generally have no idea what else is present in the collection. As a
result, precision is of far more interest to customers than recall. Recall is,
however, of interest to image suppliers, and in any case it would not be prudent
to optimize for precision without taking into account the recall tradeoff.

The PictureQuest application avoids several of the major stumbling blocks that
stand in the way of high-accuracy cross-language retrieval. Ballesteros and
Croft (1997) note several pitfalls common to cross-language information
retrieval:
 
1. The dictionary may not contain specialized vocabulary (particularly bilingual
   dictionaries).
2. Dictionary translations are inherently ambiguous and add extraneous terms to
   the query.
3. Failure to translate multi-term concepts as phrases reduces effectiveness.
 
In the PictureQuest application, these pitfalls are minimized because the 
queries are short, not paragraph-long descriptions as in TREC (see, e.g., 
Voorhees and Harman 1999). This would be a problem for a statistical approach, 
since the queries present little context, but, since we are not relying on 
context (because reducing ambiguity is not our top priority) it makes our task 
simpler. Assuming that the translation program keeps multi-term concepts intact, 
or at least that it preserves the modifier-head structure, we can successfully 
match phrases. The captions (i.e. the documents to be retrieved) are mostly in 
sentences, and their phrases are intact. The phrase recognizer identifies 
meaningful phrases (e.g. fire engine) and handles them as a unit. The pattern 
matcher recognizes core noun phrases and makes it more likely that they will 
match correctly. Word choice can be a major issue as well for cross-language
retrieval systems. Some ambiguity problems can be resolved through the use of a
part-of-speech tagger on the captions. As Resnik and Yarowsky (in press) 
observe, part-of-speech tagging considerably reduces the word sense 
disambiguation problem. However, some ambiguity remains. For example, the 
decision to translate a word as car, automobile, or vehicle, may dramatically 
affect retrieval accuracy. The PictureQuest system uses a semantic net based on
WordNet (Fellbaum 1998) to expand terms.
Thus a query for car or automobile will retrieve essentially identical results; 
vehicle will be less accurate but will still retrieve many of the same images. 
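
The effect of weighted expansion on car, automobile, and vehicle can be
illustrated with a minimal sketch. The tiny semantic net, the function name
expand_term, and the weights (1.0 for synonyms, 0.5 for neighboring hypernyms
and hyponyms) are assumptions chosen for illustration; they are not the
PictureQuest net or its tuned weights.

```python
# Toy semantic net (hypothetical): two synsets plus one hypernym link.
SYNSETS = {
    "car.n": {"car", "automobile"},
    "vehicle.n": {"vehicle"},
}
HYPERNYM = {"car.n": "vehicle.n"}  # car.n is a kind of vehicle.n


def expand_term(term, related_weight=0.5):
    """Return {expanded term: weight} for a single query term."""
    weights = {term: 1.0}
    for synset, members in SYNSETS.items():
        if term not in members:
            continue
        # Synonyms count as full matches.
        for word in members:
            weights[word] = 1.0
        # Hypernym and hyponym synsets count as weaker matches.
        neighbors = []
        if synset in HYPERNYM:
            neighbors.append(HYPERNYM[synset])
        neighbors.extend(
            child for child, parent in HYPERNYM.items() if parent == synset
        )
        for neighbor in neighbors:
            for word in SYNSETS[neighbor]:
                weights[word] = max(weights.get(word, 0.0), related_weight)
    return weights


print(expand_term("car"))
print(expand_term("automobile"))
print(expand_term("vehicle"))
```

In this sketch car and automobile expand to the same weighted term set, while
vehicle reaches the same terms only at the lower weight, which mirrors the
behavior described above.
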
So while word choice may be a significant consideration for a system like that 
of Jang et al., 1999, its impact on PictureQuest is minimal. The use of WordNet 
as an aid to information retrieval is controversial, and some studies indicate 
it is more hindrance than help (e.g. Voorhees 1993, 1994, Smeaton, Kelledy and 
O'Donnell 1995). WordNet uses extremely fine-grained distinctions, which can 
interfere with precision even in monolingual information retrieval. In a 
cross-language application, the additional senses can add confounding
mistranslations. If, on the other hand, WordNet expansion is constrained, the 
correct translation may be missed, lowering recall. In the PictureQuest 
application, we have tuned WordNet expansion levels and the corresponding 
weights attached to them so that WordNet serves to increase recall with minimal 
impact on precision (Flank 2000). This tuned expansion appears to be beneficial 
in the cross-language application as well. Gilarranz, Gonzalo and Verdejo (1997)
point out that, for cross-language information retrieval, some precision is lost
in any case, and WordNet is more likely to enhance cross-linguistic than 
monolingual applications. In fact, Smeaton and Quigley (1996) conclude that 
WordNet is indeed helpful in image retrieval, in particular because image 
captions are too short for statistical analysis to be useful. This insight is 
what led us to develop a proprietary image retrieval engine in the first place: 
fine-grained linguistic analysis is more useful than a statistical approach in a
caption averaging some thirty words. (Our typical captions are longer than those
reported in Smeaton and Quigley 1996.)

3 Translation Methodology
 
We performed preliminary testing using two translation methodologies. For the 
initial tests, we chose European languages: French, Spanish, and German.
Certainly this choice simplifies the translation problem, but in our case it 
also reflects the most pressing business need for translation. For the French, 
Spanish, and German tests, we used Systran as provided by AltaVista (Babelfish); 
we also tested several other Web translation programs. We used native speakers 
to craft queries and then translated those queries either manually or 
automatically and submitted them to PictureQuest. The resulting image set was 
evaluated for precision and, in a limited fashion, for recall. The second 
translation methodology employed was direct dictionary translation, tested only 
for Spanish. We used the same queries for this test. Using an on-line 
Spanish-English dictionary, we selected, for each word, the top (top-frequency) 
translation. We then submitted this word-by-word translation to PictureQuest.
(Unlike AltaVista, this method spell-corrected letters entered without the
necessary diacritics.) Evaluation proceeded in the same manner. The word-by-word
method introduces a weakness in phrase recognition: any phrase recognition 
capabilities in the retrieval system are defeated if phrases are not retained in 
the input. We can assume that the non-English-speaking user will, however,
recognize phrases in her or his own language, and look them up as phrases where
possible. Thus we can expect at least those multiword
phrases that have a dictionary entry to be correctly understood. We still do 
lose the noun phrase recognition capabilities in the retrieval system, further 
confounded by the fact that in Spanish adjectives follow the nouns they modify. 
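
The word-by-word method, with phrases looked up as units where the dictionary
has an entry for them, can be sketched as follows. The dictionary contents and
the function name translate_query are hypothetical; only the hombre de negocios
entry and its two renderings (businessman, man of businesses) come from this
paper. The first listed translation stands in for the top-frequency one.

```python
# Hypothetical bilingual dictionary: each entry lists translations,
# most frequent first. Multiword entries are looked up as phrases.
DICTIONARY = {
    "hombre": ["man", "mankind"],
    "de": ["of", "from"],
    "negocios": ["businesses", "deals"],
    "hombre de negocios": ["businessman"],
}


def translate_query(query, dictionary, max_phrase_len=3):
    """Translate left to right, preferring the longest dictionary phrase."""
    tokens = query.lower().split()
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for length in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + length])
            if candidate in dictionary:
                output.append(dictionary[candidate][0])  # top translation only
                i += length
                break
        else:
            # No entry at all: pass the source word through unchanged.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


# With phrase lookup, and with lookup restricted to single words.
print(translate_query("hombre de negocios", DICTIONARY))
print(translate_query("hombre de negocios", DICTIONARY, max_phrase_len=1))
```

Restricting the lookup to single words reproduces the loss of the phrase that
the strict word-by-word method risks.
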
In the hombre de negocios example in the data below, both AltaVista and
Langenscheidt correctly identify the phrase as multiword, and translate it as
businessman rather than man of businesses.

The use of phrase recognition has been shown to be helpful, and, optimally, we
would like to include it. Hull and Grefenstette 1996 showed the upper bound of
the improvements possible by using lexicalized phrases. Every phrase that
appeared was added to the dictionary, and that tactic did aid retrieval. Both
statistical co-occurrence and syntactic phrases are also possible approaches.
Unfortunately, the extra-system approach we take here relies heavily on the
external machine translation to preserve phrases intact. If AltaVista (or, in
the case of Langenscheidt, the user) recognizes a phrase and translates it as a
unit, the translation is better and retrieval is likely to be better. If,
however, the translation mistakenly misses a phrase, retrieval quality is likely
to be worse. As for compositional noun phrases, if the translation preserves
normal word order, then the PictureQuest-internal noun phrase recognition will
take effect. That is, if jeune fille translates as young girl, then PictureQuest
will understand that young is an adjective modifying girl. In the more difficult
case, if the translation preserves the correct order in translating la selva
africana, i.e. the African jungle, then noun phrase recognition will work. If,
however, it comes out as the jungle African, then retrieval will be worse. In
the architecture described here, fixing this problem requires access to the
internals of the machine translation program.

4 Evaluation
 
Evaluating precision and recall on a large corpus is a difficult task. We used 
the evaluation methods detailed in Flank 1998. Precision was evaluated using a 
crossing measure, whereby any image ranked higher than a better match was 
penalized. Recall per se was measured only with respect to a defined subset of 
the images. Ranking incorporates some recall measures into the precision score, 
since images ranked too low are a recall problem, and images ranked too high are
a precision problem. If there are three good matches, and the third shows up as 
#4, the bogus #3 is a precision problem, and the too-low #4 is a recall problem. 
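
The example above can be made concrete with a short sketch that lists the
crossings in a ranked result set. The function name find_crossings and the
numeric relevance grades are assumptions for illustration; the scoring formula
actually used is the one in Flank 1998 and is not reproduced here.

```python
def find_crossings(judgments):
    """Return (higher_rank, lower_rank) pairs where a worse image outranks a better one.

    judgments[i] is the relevance grade of the image at rank i + 1
    (a larger grade means a better match).
    """
    crossings = []
    for i, upper in enumerate(judgments):
        for j in range(i + 1, len(judgments)):
            if upper < judgments[j]:
                crossings.append((i + 1, j + 1))
    return crossings


# Three good matches (1) with a bad image (0) at rank 3.
print(find_crossings([1, 1, 0, 1]))  # [(3, 4)]
```
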
For evaluation of the overall cross-language retrieval performance, we simply
measured the ratio between the cross-language and monolingual retrieval accuracy
(C/M%). This is standard; see, for example, Jang et al. 1999. Table 1
illustrates the percentage of monolingual retrieval performance we achieved for 
the translation tests performed. In this instance, we take the precision 
performance of the human-translated queries and normalize it to 100%, and adjust 
the other translation modalities relative to the human baseline.
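
The normalization is simple arithmetic. A minimal sketch follows, using the
Spanish raw precision figures reported in the appendix; the function name
cm_percent is an assumption, and rounding to the nearest whole percent is
assumed because that is how the figures are reported.

```python
def cm_percent(cross_language_precision, monolingual_precision):
    """Cross-language precision as a percentage of the human-translated baseline."""
    return round(100 * cross_language_precision / monolingual_precision)


# Spanish raw precision (%) as reported in the appendix.
human_baseline = 90
raw_precision = {"Human": 90, "AltaVista": 53, "Langenscheidt": 63}

for system, precision in raw_precision.items():
    print(system, cm_percent(precision, human_baseline))
```
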
Language                                       Raw Precision (%)   C/M
French (Human)                                 80                  100
French (AltaVista)                             86                  100
French (Transparent Language)
French (Intertran)                             44
Spanish (Human)                                90                  100
Spanish (AltaVista)                            53                  59
Spanish (Langenscheidt Bilingual Dictionary)   63                  70
German (Human)                                 80                  100
German (AltaVista)                             54                  68
 
Several other factors make the PictureQuest application a particularly good
application for machine translation technology. Unlike document translation,
there is no need to match every word in the description; useful images may be
retrieved even if a word or two is lost. There are no discourse issues at all:
searches never use anaphora, and no one cares if the translated query sounds
good or not. In addition, the fact that the objects being retrieved were images
greatly simplified the endeavor. Under normal circumstances, developing a
user-friendly interface is a major challenge. Users with only limited (or
nonexistent) reading knowledge of the language of the documents need a way to
determine, first, which ones are useful, and second, what they say. In the
PictureQuest application, however, the retrieved assets are images. Users can
instantly assess which images meet their needs.

In conclusion, it appears that simple on-line translation of queries can support
effective cross-language information retrieval, for certain applications. We
showed how an image retrieval application eliminates some of the problems of
cross-language retrieval, and how carefully tuned WordNet expansion simplifies
word choice issues. We used a variety of machine translation systems, none of
them high-end and all of them free, and nonetheless achieved commercially viable
results.

5 Appendix: Data

Source Example Score 100 0 100 100 90 (2 of 20 bad) wearing red is lost 75 (5
of 20 bad) 100

Human men repairing road AV men repairing wagon Lang. man repair road Human
woman wearing red shopping in store AV woman dressed red buying in one tends
Lang. woman clothe red buy in shop
 
Human cars driving on highway AV cars handling by freeway Lang. cart handle for 
expressway
 
the
 
the 80' (4 of 20 bad) the 0
 
Human   lions hunting in the African forest   80 (1 of 5 bad)
AV      lions hunting in the African forest   80 (1 of 5 bad)
Lang.   lion hunt in the jungle               45 (11 of 20 bad)
 
Human juggler using colorful balls AV Lang.
 
67 (1 of 3 bad) juggler with using balls of 50 (4 of 8 colors bad) juggler by 
means of use (0; 1 ball colour should be there)
 
Source Example Score Human blonde children playing 90(#3 with marbles should be 
#1; remainder of top 20
 
Langenscheidt
 
Langenscheidt, word-by-word: 63% (70% normalized)

For the Langenscheidt word-by-word, we used the bilingual dictionary to
translate each word separately as if we knew no English at all, and always took
the first translation. We made the following adjustments:

1. Left out "una," since Langenscheidt mapped it to "unir" rather than to either
   a or one
 
ok)
 AV Lang. blond children playing with marbles young fair play by means of marble 
90 (2 of 20 bad) 50 (1 of 2 bad)
 
Human buying power spending power AV
 
Lang.
 AV Lang.
 
purchasing power
 
45 (11 of 20 bad) 100
 
2. Translated "e" as and instead of e

successful businessman in office   60 (8 of 20 bad)
successful businessman in office   60 (8 of 20 bad)
 
French
 
Human translations, tested on PictureQuest: 80%
AltaVista: 86% (100% normalized)
 
Human mother and daughter baking bread in the kitchen AV Lang. mother and 
daughter [horneando-removed] bread in the kitchen mother and child bake bread in 
the kitchen
 
100 (but no full matches) 30 (14 of 20 bad) 100 (but no full matches) 100 0 100
 
Human AV Lang.
 
old age and loneliness oldness and solitude old age and loneliness
 
[French examples originally drawn from
http://humanities.uchicago.edu/ARTFL/projects/academie/1835.searchform.html:
French-French]

Source     Example                 Score
Human      signs of the zodiac     100
AV         signs of the zodiac     100
TrLang     sign zodiaque           0
IntrTran   [signes] any zodiac     100
Human      fish in water           30 (14 of 20 bad)
AV         fish in water           30 (14 of 20 bad)
TrLang     fish in water           30 (14 of 20 bad)
IntrTran   fish at water           30 (14 of 20 bad)
 
Spanish
 
Human translations, tested on PictureQuest: 90% (normalize to 100%)
AltaVista: 53% (59% normalized)
Langenscheidt, word-by-word: 63% (70% normalized)
 
AltaVista
 
For AltaVista, we left out the words that AltaVista didn't translate.
 
Source
 
Example
 painfulearaches Painful earaches the painful ear evil the [manx] [doreille]' 
distressing
 
Score
 
Human AV TrLang
 
Source     Example                      Score
AV         Pleasures of the body        100
TrLang     the pleasures of the body    100
IntrTran   the delight any body         0
 
IntrTran
 
to take a rabbit by the 65 (7 of 20 ears bad) To take a rabbit by the 65 (7 of 
20
 
Source     Example                       Score
Human      a girl eats fruit             100
AV         a girl eats fruit             100
TrLang     a girl eats fruit             100
IntrTran   a girl am eating any fruit    65 (7 of 20 bad)
 
ears take a rabbit by the ears
 
bad) 65 (7 of 20 bad) capture a bunny by the 80 (1 of 5 ears bad)
 
German
 
Human translations, tested on PictureQuest: 80% (100% normalized)
AltaVista: 54% (68% normalized)

Source Human AV Human AV Example boys golf course golf course artificial
paradise artificial paradiese Score 95 95 100 0 95 95 90 0 25 (#17
 should
 
Human AV TrLang IntrTran
 
cat which lives in wood Cat which lives in wood cat that lives in wood
 
45 (11 of 20 bad) 45 (11 of 20 bad) 65 (7 of 20
 bad)
 
cat thanksgiving lives at 70 (6 of 20 the forest bad) to leave a house
 
Human
 
60 (8 of 20 bad) AV To leave a house 60 (8 of 20 bad) TrLang to go out of a 
house 95 (1 of 20 bad) IntrTran come out dune' dwelling 90 (18 of 20 house bad)
 
Human   solar energy for automobiles
AV      solar energy for auto
Human   hiking through the forest
AV      migrations by the forest
Human   an elephant in a zoo
 
Human AV TrLang IntrTran
 
carpenter's tool Instrument of carpenter instrument of carpenter implement any 
carpenter
 
95 (1 of 20 bad) 100 100 35 (13 of 20 bad) 100 100 100 0 100
 
AV
 
elephant in the zoo
 
AV Human AV TrLang IntrTran Human to play the violin to play of the violin to 
play the violin gamble any violin pleasures of the body
 
desoxyribonucleic acid the synthesis of the Desoxynribonukleinsaeure black cars 
black auto playing together young together play
 
Human AV Human
 
Source Human AV Human AV
 
Example women in blue Ladies in blue woman at work Ladies on work
 
Score 65 75 65 40
 
Acknowledgements
 
I am grateful to Doug Oard for comments on an earlier version of this paper.
 
References

Ballesteros, Lisa, and W. Bruce Croft. 1997. "Phrasal Translation and Query Expansion Techniques for Cross-Language Information Retrieval," in AAAI Spring Symposium on Cross-Language Text and Speech Retrieval, Stanford University, Palo Alto, California, March 24-26, 1997. 

Fellbaum, Christiane, ed., 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
 
Flank, Sharon. 2000. "Does WordNet Improve Multimedia Information Retrieval?" Working paper 

Flank, Sharon. 1998. "A Layered Approach to NLP Based Information Retrieval," in Proceedings of COLING-ACL, 36th Annual Meeting of the Association for Computational Linguistics, Montreal, Canada, 10-14 August 1998.
 
Gilarranz, Julio, Julio Gonzalo and Felisa Verdejo. 1997. "An Approach to Conceptual Text Retrieval Using the EuroWordNet Multilingual Semantic Database," in AAAI Spring Symposium on Cross-Language Text and Speech Retrieval, Stanford University, Palo Alto, California, March 24-26, 1997. (http://www.clis.umd.edu/dlrg/filter/sss/papers)

Grefenstette, Gregory, ed., 1998. Cross-Language Information Retrieval. Norwell, MA: Kluwer. 

Hull, David A. and Gregory Grefenstette, 1996. "Experiments in Multilingual Information Retrieval," in Proceedings of the 19th International Conference on Research and Development in Information Retrieval (SIGIR96) Zurich, Switzerland. 

Jang, Myung-Gil, Sung Hyon Myaeng, and Se Young Park, 1999. "Using Mutual Information to Resolve Query Translation Ambiguities and Query Term Weighting," in Proceedings of 37th Annual Meeting of the Association for Computational Linguistics, College Park, Maryland. 

McCarley, J. Scott, 1999. "Should We Translate the Documents or the Queries in Cross-Language Information Retrieval?" 

Resnik, Philip and Yarowsky, David, in press. "Distinguishing Systems and Distinguishing Sense: New Evaluation Methods for Word Sense Disambiguation," Natural Language Engineering. 

Smeaton, Alan F., F. Kelledy and R. O'Donnell, 1995. "TREC-4 Experiments at Dublin City University: Thresholding Posting Lists, Query Expansion with WordNet and POS Tagging of Spanish," in Donna K. Harman (ed.) NIST Special Publication 500-236: The Fourth Text REtrieval Conference (TREC-4), Gaithersburg, MD, USA: Department of Commerce, National Institute of Standards and Technology. (http://trec.nist.gov/pubs/trec4/t4_proceedings.html) 

Smeaton, Alan F. and I. Quigley, 1996. "Experiments on Using Semantic Distances Between Words in Image Caption Retrieval," in Proceedings of the 19th International Conference on Research and Development in Information Retrieval (SIGIR96) Zurich, Switzerland. 

Voorhees, Ellen M. 1994. "Query Expansion Using Lexical-Semantic Relations," in Proceedings of the 17th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 61-70. 

Voorhees, Ellen M. 1993. "Using WordNet to Disambiguate Word Senses for Text Retrieval," in Proceedings of the 16th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 171-180. 

Voorhees, Ellen M. and Donna K. Harman, editors, 1999. The 7th Text Retrieval Conference (TREC-7).
