<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1312"> <Title>Cross-lingual Information Retrieval using Hidden Markov Models</Title> <Section position="3" start_page="0" end_page="95" type="intro"> <SectionTitle> 2 HMM for Mono-Lingual Retrieval </SectionTitle> <Paragraph position="0"> Following Miller et al. (1999), the IR system ranks documents according to the probability that a document D is relevant given the query Q, $P(D \text{ is } R \mid Q)$. By Bayes' rule, because $P(Q)$ is constant for a given query and we initially assume a uniform a priori probability that a document is relevant, ranking documents according to $P(Q \mid D \text{ is } R)$ is the same as ranking them according to $P(D \text{ is } R \mid Q)$. The approach therefore estimates the probability that a query Q is generated, given that the document D is relevant. (A glossary of symbols used appears below.) We use x to represent the language (e.g.</Paragraph> <Paragraph position="1"> English) for which retrieval is carried out.</Paragraph> <Paragraph position="2"> According to that model of monolingual retrieval, it can be shown that $$P(Q \mid D \text{ is } R) = \prod_{W \in Q} \bigl( a\,P(W \mid G_x) + (1 - a)\,P(W \mid D) \bigr),$$ where each W is a query word in Q. Miller et al. estimated the probabilities as follows: * The transition probability a is 0.7, estimated using the EM algorithm (Rabiner, 1989) on the TREC4 ad-hoc query set.</Paragraph> <Paragraph position="3"> * $P(W \mid G_x) = \frac{\text{number of occurrences of } W \text{ in } C_x}{\text{length of } C_x}$, which is the general language probability for word W in language x.</Paragraph> <Paragraph position="4"> * $P(W \mid D) = \frac{\text{number of occurrences of } W \text{ in } D}{\text{length of } D}$ In principle, any large corpus $C_x$ that is representative of language x can be used in computing the general language probabilities. In practice, the collection to be searched is used for that purpose. The length of a collection is the sum of the document lengths.</Paragraph> </Section></Paper>