<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1071">
  <Title>Learning from Relevant Documents in Large Scale Routing Retrieval</Title>
  <Section position="5" start_page="360" end_page="361" type="evalu">
    <SectionTitle>
4. EXPERIMENTS AND
DISCUSSION OF RESULTS
</SectionTitle>
    <Paragraph position="0"> For testing our various strategies of subdocument selection for training, we performed experiments exactly as those of TREC2 routing: Topics 51-100 retrieving on the 1 GB of documents on Disk3 of the TREC collection. Topics 51-100 have relevant document information from Disk l&amp;2 totaling 2 GB.</Paragraph>
    <Paragraph position="1"> There are altogether 16400 relevant documents averaging out to 328 per query. During our processing however, a small percentage of the relevants are lost, so that we in effect use only 16114 relevants that get segmented into 57751 subdocuments. This averages to about 1155 units per query. For the ranking strategies of Section 3.2, we have created a separate subcollection consisting only of the 57751 training relevants but using Disk l&amp;2 term statistics, and ranking for the first 2000 of each query is done. Various subsets of these ranked training documents are then used for weight learning for the query-term side of the network, with term expansion level K=40 terms as the standard. For some cases we also did term expansion of K=80.</Paragraph>
    <Paragraph position="2"> After freezing these trained edge weights, Disk3 subdocuments are linked in and routing retrievals are done. Results using the 'total number of relevants retrieved' (at 1000 retrieved cutoff) and 'average precision over all recall points' as measures of effectiveness, as well as the number of training units used, are summarized in Table 1. Some of the detailed precision-recall values are given in Table 2. The overall conclusion from these results is that for this TREC-2 routing experiment, where a large number of relevant documents of different sizes and quality is available, it is possible to define good subsets of the documents or portions of them for training.</Paragraph>
    <Paragraph position="3"> From Table 1 and using the average precision (av-p) measure for comparison, it appears that the simple strategy (b) of just using short, 'nonbreak' max=l relevant documents gives one of the best results, achieving av-p at K=40 expansion level of 0.4050, about 6.7% better than the 0.3795 of our baseline strategy (a) which uses all the relevant units.</Paragraph>
    <Paragraph position="4"> Moreover it is very efficient, requiring only 5235 units which is less than 10% of the total 57751 relevant subdocuments available and about 1/3 of the 16114 documents. Using longer documents that break into two and six units (max=2 and 6) successively leads to slightly worse results as well as more work (15103 and 32312 subdocuments). Thus, it appears that longer documents carry with it more noise as discussed in the Introduction. Just using the first subdocument of every relevant (c) performs quite well, with av-p of 0.4001. Since the FR collection has many documents of thousands of words long, it is difficult to imagine that signal parts are all in the first subdocuments. A casual scan however shows that some FR documents, such as FR88107-0009 and FR88119-0018, carry a summary at the beginning.</Paragraph>
    <Paragraph position="5"> Moreover, FR documents constitute only a minority of the training relevants. Thus the first subdocuments apparently carry sufficient signals of documents for training in this experiment. Last subdocuments (results not shown) do not perform as well as first.</Paragraph>
    <Paragraph position="6"> One of the best results is fmax=2 achieving av-p of 0.4047 as good as 'nonbreak' max=l method and using 10,169 training units.</Paragraph>
    <Paragraph position="7"> Surprisingly, using the best ranking bestnx=30, 100, 300, 2000 subdocuments (e) gives 0.3790, 0.3993, 0.3999 and 0.3877 average precision respectively,  peaking around bestnx=300 but does not give better performance than (b,c,d) strategies. For bestnx=30, employing only 1500 subdocuments apparently is not sufficient, and training may be limited to subdocuments resembling the original query.</Paragraph>
    <Paragraph position="8"> bestnx=100 uses 4945 units similar to max=l but with av-p about 1.5% worse, while bestnx=300 uses 13712 which is slightly less than first and performs about the same. In general, bestn results (not shown) are slightly less than those of bestnx as expected. Using the topnx=l subdocument of every relevant (If) achieves 0.4082, the best numerically. In (f) we have le, ss than 16114 units for training because we only rank the top 2000 for each query, and so some subdocuments ranking below 2000 are not accounted for. It appears that including other overall relevants can help improve performance.</Paragraph>
    <Paragraph position="9"> Strategies (g,h) of combining sets of subdocuments do not seem to lead to more improved results.</Paragraph>
    <Paragraph position="10"> Using the relevants retrieved (r-r) as a measure, it appears that larger training set sizes between 10000 to 16000 are needed to achieve good recall. For example, max=l and bestnx=100 employs about 5000 units for training and have r-r of 7646 and 7605.</Paragraph>
    <Paragraph position="11"> bestnx=300, max=2, first and topnx=l have r-r values of 7703, 7783, 7805 and 7833, and training set sizes of: 13712, 15103, 16114 and 15702. fmax=2 achieves good r-r of 7827 with a training size of 10169. fmax=3 (results not shown) is inferior. For this collection, the best strategies of selecting subdocuments for training appears to be either fmax=2 with av-p/r-r values of 0.4047/7827 or topnx=l with 0.4082/7833. fmax=2 has the advantage that a ranking is not done and the training set is smaller. The detailed recall-precision values in Table 3 also shows that fmax=2 gives better precision at the low recall region. It appears that using document properties to select training documents in this routing experiment is both effective and efficient.</Paragraph>
  </Section>
class="xml-element"></Paper>