<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3801">
  <Title>A Graphical Framework for Contextual Search and Name Disambiguation in Email</Title>
  <Section position="3" start_page="1" end_page="2" type="metho">
    <SectionTitle>
3 Graph Similarity
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
3.1 Edge weights
</SectionTitle>
      <Paragraph position="0"> Similarity between two nodes is defined by a lazy walk process, and a walk on the graph is controlled by a small set of parameters Th. To walk away from a node x, one first picks an edge label lscript; then, given lscript, one picks a node y such that x lscript[?]- y. We assume that the probability of picking the label lscript depends only on the type T(x) of the node x, i.e., that the outgoing probability from node x of following an edge type lscript is:</Paragraph>
      <Paragraph position="2"> Let STi be the set of possible labels for an edge leaving a node of type Ti. We require that the weights over all outgoing edge types given the source node type form a probability distribution, i.e., that</Paragraph>
      <Paragraph position="4"> In this paper, we will assume that once lscript is picked, y is chosen uniformly from the set of all y such that x lscript[?]- y. That is, the weight of an edge of type l connecting source node x to node y is:</Paragraph>
      <Paragraph position="6"> This assumption could easily be generalized, however: for instance, for the type T(x) = file and  source type edge type target type file sent-from person sent-from-email email-address sent-to person sent-to-email email-address date-of date has-subject-term term has-term term person sent-from inv. file  lscript = has-term, weights for terms y such that x lscript[?]- y might be distributed according to an appropriate language model (Croft and Lafferty, 2003).</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.2 Graph walks
</SectionTitle>
      <Paragraph position="0"> Conceptually, the edge weights above define the probability of moving from a node x to some other node y. At each step in a lazy graph walk, there is also some probability g of staying at x. Putting these together, and denoting byMxy the probability of being at node y at time t + 1 given that one is at x at time t in the walk, we define</Paragraph>
      <Paragraph position="2"> If we associate nodes with integers, and makeM a matrix indexed by nodes, then a walk of k steps can then be defined by matrix multiplication: specifically, if V0 is some initial probability distribution over nodes, then the distribution after a k-step walk is proportional to Vk = V0Mk. Larger values of g increase the weight given to shorter paths between x and y. In the experiments reported here, we consider small values of k, and this computation is carried out directly using sparse-matrix multiplication methods.1 If V0 gives probability 1 to some node x0 1We have also explored an alternative approach based on sampling; this method scales better but introduces some additional variance into the procedure, which is undesirable for experimentation. null and probability 0 to all other nodes, then the value given to y in Vk can be interpreted as a similarity measure between x and y.</Paragraph>
      <Paragraph position="3"> In our framework, a query is an initial distribution Vq over nodes, plus a desired output type Tout, and the answer is a list of nodes y of type Tout, ranked by their score in the distribution Vk. For instance, for an ordinary ad hoc document retrieval query (like &amp;quot;economic impact of recycling tires&amp;quot;) would be an appropriate distribution Vq over query terms, with Tout = file. Replacing Tout with person would find the person most related to the query-e.g., an email contact heavily associated with the retread economics. Replacing Vq with a point distribution over a particular document would find the people most closely associated with the given document. null</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
3.3 Relation to TF-IDF
</SectionTitle>
      <Paragraph position="0"> It is interesting to view this framework in comparison to more traditional IR methods. Suppose we restrict ourselves to two types, terms and files, and allow only in-file edges. Now consider an initial query distribution Vq which is uniform over the two terms &amp;quot;the aardvark&amp;quot;. A one-step matrix multiplication will result in a distribution V1, which includes file nodes. The common term &amp;quot;the&amp;quot; will spread its probability mass into small fractions over many file nodes, while the unusual term &amp;quot;aardvark&amp;quot; will spread its weight over only a few files: hence the effect will be similar to use of an IDF weighting scheme.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="2" end_page="3" type="metho">
    <SectionTitle>
4 Learning
</SectionTitle>
    <Paragraph position="0"> As suggested by the comments above, this graph framework could be used for many types of tasks, and it is unlikely that a single set of parameter values will be best for all tasks. It is thus important to consider the problem of learning how to better rank graph nodes.</Paragraph>
    <Paragraph position="1"> Previous researchers have described schemes for adjusting the parameters th using gradient descentlike methods (Diligenti et al., 2005; Nie et al., 2005). In this paper, we suggest an alternative approach of learning to re-order an initial ranking. This reranking approach has been used in the past for metasearch (Cohen et al., 1999) and also several natural- null language related tasks (Collins and Koo, 2005). The advantage of reranking over parameter tuning is that the learned classifier can take advantage of &amp;quot;global&amp;quot; features that are not easily used in walk.</Paragraph>
    <Paragraph position="2"> Note that node reranking, while can be used as an alternative to weight manipulation, it is better viewed as a complementary approach, as the techniques can be naturally combined by first tuning the parameters th, and then reranking the result using a classifier which exploits non-local features. This hybrid approach has been used successfully in the past on tasks like parsing (Collins and Koo, 2005).</Paragraph>
    <Paragraph position="3"> We here give a short overview of the reranking approach, that is described in detail elsewhere (Collins and Koo, 2005). The reranking algorithm is provided with a training set containing n examples. Example i (for 1 [?] i [?] n) includes a ranked list of li nodes. Let wij be the jth node for example i, and let p(wij) be the probability assigned to wij by the graph walk. A candidate node wij is represented through m features, which are computed by m feature functions f1,...,fm. We will require that the features be binary; this restriction allows a closed form parameter update. The ranking function for node x is defined as:</Paragraph>
    <Paragraph position="5"> value parameters. Given a new test example, the output of the model is the given node list re-ranked by F(x, -a).</Paragraph>
    <Paragraph position="6"> To learn the parameter weights -a, we use a boosting method (Collins and Koo, 2005), which minimizes the following loss function on the training data:</Paragraph>
    <Paragraph position="8"> where xi,1 is, without loss of generality, a correct target node. The weights for the function are learned with a boosting-like method, where in each iteration the feature fk that has the most impact on the loss function is chosen, and ak is modified. Closed form formulas exist for calculating the optimal additive updates and the impact per feature (Schapire and Singer, 1999).</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML