<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4024">
  <Title>Direct Maximization of Average Precision by Hill-Climbing, with a Comparison to a Maximum Entropy Approach</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The Weight Search Algorithm
</SectionTitle>
    <Paragraph position="0"> The general behavior of the weight search algorithm is similar to the maximum entropy modeling described in Section 2 given a document corpus and a term vector, it seeks to maximize average precision by choosing a weight vector that orders the documents optimally. Unlike the maximum entropy approach, the weight search algorithm hill-climbs directly on average precision.</Paragraph>
    <Paragraph position="1"> The core of the algorithm is an exhaustive search of a single direction in weight space. Although each direction is continuous and unbounded, we show that the search can be performed with a nite amount of computation.</Paragraph>
    <Paragraph position="2"> This technique arises from a natural geometric interpretation of changes in document ordering and how they affect average precision.</Paragraph>
    <Paragraph position="3"> At the top level, the algorithm operates by cycling through different directions in weight space, performing an exhaustive search for a maximum in each direction, until convergence is reached. Although a global maximum is found in each direction, the algorithm relies on a greedy assumption of unimodality and, as with the maximum entropy model, is not guaranteed to nd a global maximum in the multi-dimensional space.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Framework
</SectionTitle>
      <Paragraph position="0"> This section formalizes the notion of weight space and what it means to search for maximum average precision within it.</Paragraph>
      <Paragraph position="1"> Queries in information retrieval can be treated as vectors of terms t1; t2; ; tN . Each term is, as the name suggests, an individual word or phrase that might occur in the document corpus. Every term ti has a weight i determining its importance relative to the other terms of the query. These weights form a weight vector = h 1 2 Ni. Further, given a document corpus , for each document dj 2 we have a value vector j = h j1 j2 jNi, where each value ji 2 &lt; gives some measure of term ti within document dj typically the frequency of occurrence or a function thereof. In the case of the standard tf-idf formula, ji is the term frequency and i the inverse document frequency. null If the document corpus and set of terms is held xed, the average precision calculation can be considered a function f : &lt;N ! [0; 1] mapping to a single average precision value. Finding the weight vectors in this  by the maximum entropy model for TREC topic 307.</Paragraph>
      <Paragraph position="2"> context is then the familiar problem of nding maxima in an N-dimensional landscape.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Powell's algorithm
</SectionTitle>
      <Paragraph position="0"> One general approach to this problem of searching a multi-dimensional space is to decompose the problem into a series of iterated searches along single directions within the space. Perhaps the most basic technique, credited to Powell, is simply a round-robin-style iteration along a set of unchanging direction vectors, until convergence is reached (Press et al., 1992, pp. 412-420). This is the approach used in this study.</Paragraph>
      <Paragraph position="1"> Formally, the procedure is as follows. You are given a set of direction vectors !1; !2; ; !N and a starting point 0. First move 0 to the maximum along !1 and call this 1, i.e. 1 = 0 + 1!1 for some scalar 1.</Paragraph>
      <Paragraph position="2"> Next move 1 to the maximum along !2 and call this 2, and so on, until the nal point N . Finally, replace 0 with N and repeat the entire process, starting again with !1. Do this until some convergence criterion is met.</Paragraph>
      <Paragraph position="3"> This procedure has no guaranteed rate of convergence, although more sophisticated versions of Powell's algorithm do. In practice this has not been a problem.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Exhaustively searching a single direction
</SectionTitle>
      <Paragraph position="0"> Powell's algorithm can make use of any one-dimensional search technique. Rather than applying a completely general hill-climbing search, however, in the case where document scores are calculated by a linear equation on the terms, i.e.</Paragraph>
      <Paragraph position="2"> as they are in the tf-idf formula, we can exhaustively search in a single direction of the weight space in an ef cient manner. This potentially yields better solutions and potentially converges more quickly than a general hill-climbing heuristic.</Paragraph>
      <Paragraph position="3">  The insight behind the algorithm is as follows. Given a direction ! in weight space and a starting point , the score of each document is a linear function of the scale along ! from :</Paragraph>
      <Paragraph position="5"> i.e. document di's score, plotted against , is a line with slope ! i and y-intercept j.</Paragraph>
      <Paragraph position="6"> Consider the graph of lines for all documents, such as the example in Figure 2. Each vertical slice of the graph, at some point on the x axis, represents the order of the documents when = ; speci cally, the order of the documents is given by the order of the intersections of the lines with the vertical line at x = .</Paragraph>
      <Paragraph position="7"> Now consider the set of intersections of the document lines. Given two documents dr and ds, their intersection, if it exists, lies at point rs = ( xrs; yrs) where</Paragraph>
      <Paragraph position="9"> (Note that this is unde ned if ! r = ! s, i.e., if the document lines are parallel.) Let be the set of all such document intersection points for a given direction, document set and term vector. Note that more than two lines may intersect at the same point, and that two intersections may share the same x component while having different y components.</Paragraph>
      <Paragraph position="10"> Now consider the set x, de ned as the projection of onto the x axis, i.e. x = f j 9 2 s.t. x = g.</Paragraph>
      <Paragraph position="11"> The points in x represent precisely those values of where two or more documents are tied in score. Therefore, the document ordering changes at and only at these points of intersection; in other words, the points in x partition the range of into at most M(M 1)=2+1 regions, where M is the total number of documents. Within a given region, document ordering is invariant and hence average precision is constant. As we can calculate the boundaries of, and the document ordering and average precision within, each region, we now have a way of nding the maximum across the entire space by evaluating a nite number of regions. Each of the O(M 2) regions requires an O(M log M) sort, yielding a total computational bound of O(M 3 log M).</Paragraph>
      <Paragraph position="12"> In fact, we can further reduce the computation by exploiting the fact that the change in document ordering between any two regions is known and is typically small.</Paragraph>
      <Paragraph position="13"> The weight search algorithm functions in this manner. It sorts the documents completely to determine the ordering in the left-most region. Then, it traverses the regions from left to right and updates the document ordering in each, which does not require a sort. Average precision can be incrementally updated based on the document ordering changes. This reduces the computational bound to O(M2 log M), the requirement for the initial sort of the O(M2) intersection points.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Experiment Setup
</SectionTitle>
    <Paragraph position="0"> In order to compare the results of the weight search algorithm to those of the maximum entropy model, we employed the same experiment setup. We ran on 15 topics, which were manually selected from the TREC 6, 7, and 8 collections (Voorhees and Harman, 2000), with the objective of creating a representative subset. The document sets were divided into randomly selected training, validation and test splits , comprising 25%, 25%, and 50%, respectively, of the complete set.</Paragraph>
    <Paragraph position="1"> For each query, a set of candidate terms was selected based on mutual information between (binary) term occurrence and document relevance. From this set, terms were chosen individually to be included in the query, and coef cients for all terms were calculated using L-BFGS, a quasi-Newton unconstrained optimization algorithm (Zhu et al., 1994).</Paragraph>
    <Paragraph position="2"> For experimenting with the weight search algorithm, we investigated queries of length 1 through 20 for each topic, so each topic involved 20 experiments. The rst term weight was xed at 1.0. The single-term queries did not require a weight search, as the weight of a single term does not affect the average precision score. For the remaining 19 experiments for each topic, the direction vectors ! were chosen such that the algorithm searched a single term weight at a time. For example, a query with  weight search algorithm. Each line represents a topic.</Paragraph>
    <Paragraph position="3"> i terms used the i 1 directions</Paragraph>
    <Paragraph position="5"> ...</Paragraph>
    <Paragraph position="6"> !i;i 1 = h0 0 0 0 1i: The two-term query for a topic started the search from the point 2;0 = h1 0i, and each successive experiment for that topic was initialized with the starting point 0 equal to the nal point in the previous iteration, concatenated with a 0. The value vectors j used in all experiments were Okapi tf scores.</Paragraph>
  </Section>
class="xml-element"></Paper>