<?xml version="1.0" standalone="yes"?>
<Paper uid="X98-1018">
  <Title>DYNAMIC DATA FUSION</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
DEFINITIONS AND RESEARCH QUESTIONS
</SectionTitle>
    <Paragraph position="0"> We define a q,ery as a natural language expression of a user's need. For sonic query and some collection of documents, it is possible for a human to attribute the relevance of the document to the query. A retrieral system is a machine that accepts a query and full texts of documents, and produces, for each document, a relevance score for the query-document pair. A measure of the effectiveness of a retrieval system for a query and a collection is precision, the proportion of the N documents with the highest relevance scores that are relevant (in our study, N is 5, 10, or 30).</Paragraph>
    <Paragraph position="1"> Using multiple retrieval systems produces multiple retrieval scores for a query-document pair.</Paragraph>
    <Paragraph position="2"> A fitsion fimction accepts these scores as its inputs, and produces a single relevance score as its output for the query-document pair. A staticfiisionfimction has only the relevance scores for a single query-document pair as its inputs. A dynamic filsion fimction can have more inputs.</Paragraph>
    <Paragraph position="3"> We are concerned with the following two questions: If we allow each query its own static fusion function, can we achieve higher precision than if we force all queries to have the same static fusion function? . If we can achieve higher precision by allowing each query its own static fusion function, then what inputs or tkmtures would enable us to construct a dynamic fusion function that adjusts to the query, the documents retrieved by the retrieval systems, and the distribution of scores produced by the retrieval systems'?</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE DATA
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="6" start_page="0" end_page="124" type="metho">
    <SectionTitle>
Judgments
</SectionTitle>
    <Paragraph position="0"> We used 247 queries, including TREC I-6 training queries, and queries developed bv business analysts for TcxtWisc's internal use. We appticd these queries to the TREC Wall Street Journal collection fronl (1986-1992). For the TREC queries.</Paragraph>
    <Paragraph position="1"> we used only TREC relevance judgments. Relevance  judgments for the TextWise queries were initially made on a 5-point scale, which we mapped to the binary judgments used by TREC.</Paragraph>
    <Paragraph position="2"> Several of the retrieval systems described below used a document segmentation scheme to split compound documents into their components, resulting in a collection size of 222,525. For these systems, retrieval scores were calculated separately for the components of compound documents, and then merged by taking the maximum component score, thus mapping back to the original document space of 173,252.</Paragraph>
    <Section position="1" start_page="123" end_page="124" type="sub_section">
      <SectionTitle>
Retrieval Systems
</SectionTitle>
      <Paragraph position="0"> We used five retrieval systems to generate relevance scores for query-document pairs: Fuzzy Boolean (FB). This system translates a query into a Boolean expression in which the terminals are single terms, compound nominals, and proper nouns; instantiates the terminals in the expression with the document's tfidfweights; and applies fuzzy Boolean semantics to resolve the instantiated expression into a scalar relevance score.</Paragraph>
      <Paragraph position="1"> Probabilistic (PRB). This system applies a match formula that sums term frequencies of query terms in the document, weighted by terms' inverse document frequencies, and adjusts tor document length. We applied this formula to a vocabulary of single terms. Subiect Field Code (SFC). This system applies a vector similarity metric to query and document representations in TextWise's Subject Field Code space to obtain relevance scores.</Paragraph>
      <Paragraph position="2"> N-gram (NG3). This system applies a vector similarity metric to query and document representations obtained by counting the occurrences of 3-letter sequences (after squeezing out blanks, newlines, and other non-alphabetic characters I.</Paragraph>
      <Paragraph position="3"> Latent Semantic Indexing (LSI). This system obtains query and document representations by applying a translation matrix to single terms (excluding compound nominals and proper nouns). We obtained the translation matrix by singular value decomposition of a matrix of (.idf weights for single terms from a 1/3 sample of the Wall Street Journal.</Paragraph>
      <Paragraph position="4"> We used a vector similarity metric to obtain relevance scores.</Paragraph>
      <Paragraph position="5"> Query and Document Representations We used the following procedures to process the queries and documents into tbrms that enabled application of matching formulae to produce relevance scores: Document Segmentation. We used either the original document segmentation from the TREC data or a more aggressive segmentation that split compound documents into their components.</Paragraph>
      <Paragraph position="6"> Stop Word Removal. For all but one retrieval system, we removed stopwords.</Paragraph>
      <Paragraph position="7"> Stemming. For the various retrieval systems, we used the Xerox stemmer, the Stone stemmer, or we obtained word roots as a byproduct of constructing trigrams.</Paragraph>
      <Paragraph position="8"> Phrase Reco,~nition. For some retrieval systems, we used a set of part-of-speech-based rules to detect and aggregate sequences of tokens into compound nominal phrases.</Paragraph>
      <Paragraph position="9"> Proper Nouns. For some retrieval systems, we detected proper nouns, and normalized multiple expressions of the same proper noun entity to a canonical form.</Paragraph>
      <Paragraph position="10"> Term Weit~htin~. In documents, weights represented the frequency of terms in the document, conditioned by the number of documents in which the terms  appeared ( tf id~.</Paragraph>
      <Paragraph position="11"> Dimension Reduction. We used single words to translate into weightings in a 900-dimensional feature space using TextWise's Subject Field Coder (SFC), or into a 167-dimensional feature space using Latent Semantic Indexing (LSI).</Paragraph>
      <Paragraph position="12"> Table 1 summarizes the query representations, document representations, and matching semantics used by the five matchers.</Paragraph>
      <Paragraph position="13"> Dynamic Fusion Function Input Features In addition to the five relevance score inputs to the dynamic fusion function, we used the following inputs: Query Features Several items of information might be available about the query independently of any particular retrieval approach or its representation of the query, the documents, or their similarity: Query Length (QLEN). The number of tokens in the natural language query.</Paragraph>
      <Paragraph position="14"> Query Terms' Specificity (QTSP). The average inverse document frequency (IDF) of the quartile of the query's terms with the highest IDF's.</Paragraph>
      <Paragraph position="15"> Number of Proper Nouns (QNPN).</Paragraph>
      <Paragraph position="16"> Number of Compound Nominals (QNCN).</Paragraph>
      <Paragraph position="17"> Query Terms&amp;quot; Synonymy (QTSY). Over all terms in the query, the average of the number of words in the svnset for the correct sense of the query term in WordNet. WordNet is a semantic knowledge base that distinguishes words by their senses, and groups word:senses that are synonymous to each other into synsets.</Paragraph>
      <Paragraph position="18"> Query Terms' Polyscmv (QTPL), Over all terms in query, the average number of senses for the query term in WordNet,</Paragraph>
    </Section>
    <Section position="2" start_page="124" end_page="124" type="sub_section">
      <SectionTitle>
Document Features
</SectionTitle>
      <Paragraph position="0"> There is currently one document feature, instantiated separately for each query, for each retrieval system S: Length of Top-Ranked Documents Retrieved by System (DLEN\[S\]). This is the average of the number of tokens in the top 5 documents scored by</Paragraph>
    </Section>
    <Section position="3" start_page="124" end_page="124" type="sub_section">
      <SectionTitle>
Score Distributions
</SectionTitle>
      <Paragraph position="0"> The following features are instantiated once for each retrieval system S, for each query: Maximum Score Assigned by Approach (SMAX\[S\]).</Paragraph>
      <Paragraph position="1"> Variance of Scores Assigned by Approach (SVAR\[S1).</Paragraph>
      <Paragraph position="2"> The lbllowing input to the dynamic fusion function is instantiated once for each pair of retrieval systems S~ and S,: Correlation of Ranks Assi,~ned to Documents by Two Approaches (SCOR\[St, Sz\]..~ For documents ranked in the top 1,000 by any of the retrieval systems for the query, the correlation of the documents' ranks in systems S~ and $2.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="124" end_page="126" type="metho">
    <SectionTitle>
RESEARCH QUESTION 1: OPPORTUNITY
FOR IMPROVING RETRIEVAL
</SectionTitle>
    <Paragraph position="0"> For a sample of 50 queries from our 297, we lbund, separately for each query, an optimal static fusion function. We then lound the single optimal static fusion function that gave the best precision over all 50 queries. Table 2 shows the precision for the 50 queries using the 5 retrieval systems  separately, using a single overall static fusion function, and using 50 (possibly) different query-specific static functions.</Paragraph>
    <Paragraph position="1"> At first glance, our results suggest that allowing query-specific fusion functions substantially improves retrieval. For instance, by using query-specific static fusion functions, we achieved precision at 5 of .5960, compared to .3840 when applying the same static fusion function to all queries. However, this comparison is overly optimistic, since it allows query-specific fusion functions to be trained and evaluated on exactly the same data, while forcing the overall fusion function to be trained on a large set of data, but then evaluated on a small subset of that data. To provide a more pessimistic comparison, we partitioned the data for our 50 queries into equallysized training and test sets. We trained each query-specific fusion function on the query's training data, and evaluated it on the test data. (Although our goal is to improve retrospective retrieval, this arrangement resembles the TREC routing scenario.) Table 3 shows a considerably weaker, but still appreciable improvement due to using query-specific fusion functions. For instance, we achieved precision at 5 of .4160 when allowing each query its own static fusion function, compared to .3400 when forcing all queries to use the same function.</Paragraph>
    <Paragraph position="2"> dimensions are the relevance scores from the set of matchers. We constructed a fused score for a test document by summing the relevance judgments for the test document's K nearest training documents (where K was 5, 10, 15 or 20). We tried weighting the sums by an inverse function of the distance between the test document and the training document. We also tried scaling the dimensions' contribution to the distance metric with a weight reflecting the corresponding matcher's precision.</Paragraph>
    <Paragraph position="3"> To our surprise, none of these experiments produced K-NN-based fusion functions that performed consistently better than a linear fusion function. On closer inspection, it appears that at least part of the poor performance of K-NN as a fusion function can be attributed to instances in which the probability distribution of relevance for the training documents for the query did not resemble the probability distribution of relevance for all the documents in the query. In this sort of situation, the linear model appears to be more robust than K-NN.</Paragraph>
    <Paragraph position="4"> It may be that a more careful selection of the training set would result in more reasonable performance from K-NN-based fusion functions.</Paragraph>
    <Paragraph position="5"> For the linear fusion function, we found the optimal vector of coefficients by selecting the coefficients that produce the greatest precision at 5  We constrained our fusion functions to be weighted linear combinations of the five retrieval scores lk)r a query-document pair. We considered the possibility of more complex non-linear fusion models through exploration of K-Ncarest Neighbor (K-NN) classifiers. (The use of K-NN tbr selecting a single retrieval system has been documented in \[5\]. By contrast, we sought to use K-NN to fuse relevance scores.) In this approach, training documents and their rclevance judgments populatcd a space whose (the proportion of the five top-ranked documents that are relevantl. To date, we have lk~und the optimal vector using an exhaustive search over the set of vectors whose elements are non-negative, evenly divisible by 0.1. and whose elements sum to 1.0.</Paragraph>
    <Paragraph position="6"> (We had tried using logistic regression to find the coefficients, but the coefficients we found in this manner yielded considerably lower precision than those we found using the exhaustive search method.)  In sum, it appears that for our selection of retrieval systems, there is a potential for improving retrieval through query-specific fusion.</Paragraph>
    <Paragraph position="7"> One way to exploit this opportunity is to use initially-retrieved documents to adjust the weights of the single overall static fusion function, as in \[3\]. Although we tried several ways of updating fusion function coefficients with relevance feedback, we were unable to exploit any of the apparent potential to improve retrieval performance in this way.</Paragraph>
    <Paragraph position="8"> distribution of the retrieval systems' retrieval scores for the query enumerated above. We are currently working on building such a dynamic fusion function.</Paragraph>
    <Section position="1" start_page="126" end_page="126" type="sub_section">
      <SectionTitle>
Dynamic Fusion Function Architecture
</SectionTitle>
      <Paragraph position="0"> We chose to implement the dynamic fusion function as a hybrid of a &amp;quot;mixture expert&amp;quot; and the static linear fusion models used in Research Question I. The mixture expert attempts to predict the best coefficients to use for the linear fusion function.</Paragraph>
      <Paragraph position="1"> Figure I shows the relationship of the mixture expert</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="126" end_page="127" type="metho">
    <SectionTitle>
RESEARCH QUESTION 2: THE DYNAMIC
FUSION FUNCTION
</SectionTitle>
    <Paragraph position="0"> So far, optimal fusion coefficients for a query have been determined using full knowledge of the relevance of the documents for the query. In the retrospective retrieval setting, these relevance judgments will not be available beforehand, and thus cannot be used to adjust the fusion model to the query. For the retrospective setting, we seek to construct a dynamic fusion function that can adjust the way it fuses the five systems' relevance scores for a query-document pair using additional inputs. These inputs include the Icatures of the query, features of the retrieved documents, and features of the joint to the linear fusion model and the individual retrieval systems.</Paragraph>
    <Section position="1" start_page="126" end_page="127" type="sub_section">
      <SectionTitle>
Training and Evaluation
</SectionTitle>
      <Paragraph position="0"> We use the remaining 197 queries lor training.</Paragraph>
      <Paragraph position="1"> For these queries, we have used all the documents to find coefficient vectors for optimal linear static fusion models. These coefficient vectors constitute the &amp;quot;target&amp;quot; outputs the mixture expert will be trained to reproduce.</Paragraph>
      <Paragraph position="2"> We also fit a single linear static fusion function to the 197 training queries, again using all the data from those queries. The performance of this static fusion function on all of the documents for the 50 test  queries constitutes the baseline for the second research question. To answer this research question, we will compare the performance of the dynamic fusion function for the 50 test queries to this baseline.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>