<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1070">
  <Title>OVERVIEW OF THE SECOND TEXT RETRIEVAL CONFERENCE (TREC-2)</Title>
  <Section position="2" start_page="0" end_page="351" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> In November of 1992 the first Text REtrieval Conference (TREC-1) was held at NIST (Harman 1993). This conference, co-sponsored by ARPA and NIST, brought together information retrieval researchers to discuss their system results on the new TIPSTER test collection. This was the first time that such groups had ever compared results on the same data using the same evaluation methods, and represented a breakthrough in cross-system evaluation in information retrieval. It was also the first time that most of these groups had tackled such a large test collection and required a major effort by all groups to scale up their retrieval techniques.</Paragraph>
    <Paragraph position="1"> Since TREC is designed to evaluate system performance both in a routing (filtering or profiling) mode, and in an adhoc mode, both functions were tested. The test design was based on traditional information retrieval models, involving documents, &amp;quot;user&amp;quot; questions, and the &amp;quot;right answers&amp;quot; (Harman 1994a). Participants were first sent two disks of documents (about 2 gigabytes of data) and a training set of 100 questions or topics. They were also sent lists of documents in the two disks that were considered the &amp;quot;right answers&amp;quot; or relevant documents for each of the 100 topics. The participants were asked to train their systems on this data, and at some point to signal their readiness for testing by submitting their system queries for a specific fifty of the topics. The routing test consisted of each group running new test documents against those 50 queries. The adhoc test consisted of running a new set of 50 topics against the old document set (the original 2 disks). In each case, the results of the retrieval systems were submitted to NIST for evaluation.</Paragraph>
    <Paragraph position="2"> The documents in the test collection are from various types of text, covering different writing styles and different information domains. They include information from the Wall Street Journal, the San Jose Mercury News, the AP Newswire, and artcles from the Computer Select disks. The documents were uniformly formatted into an SGML-Iike structure for easy handling by the TREC participants. null  The topics used in the test collection are in the form of &amp;quot;user need&amp;quot; statements rather than more traditional queries. They are designed to mimic a real user's need, and were written by people who are actual users of a retrieval system. Although the subject domain of the topics is diverse, some consideration was given to the documents to be searched.</Paragraph>
    <Paragraph position="3"> The relevance judgments or &amp;quot;right answers&amp;quot; were made using a sampling method, with the sample constructed by taking the top 100 documents retrieved by each participating system for a given topic and merging them into a pool for manual relevance assessment. This is a valid sampling method since all the systems used ranked retrieval methods, with those documents most likely to be relevant returned first. All systems were then evaluated against the common set of relevant documents, i.e. the total number of relevant documents found by all the systems combined.</Paragraph>
    <Paragraph position="4"> How well did the systems do with this test collection? Whereas the TREC-1 conference demonstrated a wide range of different approaches to the retrieval of text from large document collections, the results could be viewed only as very preliminary. Not only were the deadlines for results were very tight, but the huge scale-up in the size of the document collection required major work from all groups in rebuilding their systems. Much of this work was simply a system engineering task: finding reasonable data structures to use, getting indexing routines to be efficient enough to finish indexing the data, finding enough storage to handle the large inverted files and other structures, etc. Still, the results showed that the systems did the task well, and that automatic construction of queries from the topics did as well as, or better than, manual construction of queries.</Paragraph>
    <Paragraph position="5"> The second TREC conference (TREC-2) occurred in August of 1993, less than 10 months after the first conference. In addition to most of the TREC-1 groups, nine new groups took part, bringing the total number of participating groups to 31.</Paragraph>
  </Section>
</Paper>