<?xml version="1.0" standalone="yes"?>
<Paper uid="X93-1007">
  <Title>DOCUMENT DETECTION SUMMARY OF RESULTS</Title>
  <Section position="8" start_page="41" end_page="46" type="concl">
    <SectionTitle>
7. CONCLUSIONS
</SectionTitle>
    <Paragraph position="0"> What are some of the conclusions that can be drawn from the many experiments performed in the TIPSTER and TREC evaluations, and equally important, what is the lasting value of this two-year project? First, the statistical techniques (using non-Boolean methods without any formal query language) that were used on the smaller test collections DO scale up. The simplest example of this is the consistently high performance of the Cornell SMART system in TREC. This very basic system relies on the vector-space model and on carefully crafted term weighting to produce its strong results. A more complex example of the successful use of statistical techniques is the University of Massachusetts INQUERY system, which uses the more sophisticated inference network approach to achieve its high performance. This system has been very successful throughout the TIPSTER project, and has achieved this success using variations on its original design rather than having to completely revise its techniques.</Paragraph>
    <Paragraph position="1"> Second, the results obtained by the best systems in TIPSTER and TREC are at a level of performance that is generally accepted to be superior to the best current Boolean retrieval system. More importantly, this performance is achieved from simple natural language input, allowing consistently superior retrieval performance without exhaustive training or experience. These systems are clearly ready to be tested in fully operational environments.</Paragraph>
    <Paragraph position="2"> Third, the use of a large test collection has shown some unexpected results. Techniques that should have brought improvements have not done so. The use of phrases instead of single terms has not resulted in significant improvements; the use of proximity or paragraph-level retrieval has not shown especially good results; and the use of more complex NLP techniques has not worked well yet. Conversely, techniques that have not been successful before, such as using types of automatic thesauri for topic expansion, have had unexpected success. These unexpected results using a large test collection are reopening research on old discarded ideas and starting research in new areas. It is much too early to draw firm conclusions on any of these techniques. Often, poor performance that is attributed to one problem may be the result of a lack of balance in parameter adjustment; for example, the lack of improvement from phrases may be caused by the difficulty in balancing the weights of these phrases against the weights of single terms.</Paragraph>
    <Paragraph position="3"> What is the lasting value of the document detection half of the TIPSTER phase I project? The first contribution, in my opinion, has been the development of a large test collection and the wide acceptance of its use via the TREC conferences. The lack of a large test collection has been a major barrier in the field of information retrieval, and its removal allows an expansion of research by many groups world-wide.</Paragraph>
    <Paragraph position="4"> The second lasting value is the demonstration of the feasibility of using non-Boolean, statistically-based retrieval systems both in the ARPA community and in the broader commercial sector. Not only have well-established small-scale research groups braved the scaling effort, but at least four new commercial products have used the TIPSTER/TREC program as a launching pad.</Paragraph>
    <Paragraph position="5"> The TIPSTER program has caused the establishment of two major new retrieval research groups; both Syracuse University and HNC Inc. have built systems during the TIPSTER project that are approaching the power of the best of the TIPSTER/TREC systems. Additionally, many of the TREC systems come either from groups new to the information retrieval research arena or from older groups expanding their small programs to tackle this major retrieval experiment.</Paragraph>
    <Paragraph position="6"> The final lasting value of the TIPSTER project has been the joining of the NLP community and the information retrieval (IR) community in the project. This has led to high expectations for combining these disjoint technologies in phase II and has helped cement the important collaboration of two diverse groups of researchers.</Paragraph>
    <Paragraph position="7"> These three lasting contributions are not only of value individually, but will lead to a resurgence of research in the information retrieval area. The combination of the large test collection, the growing demand for improved retrieval products, and the increased collaboration between the NLP and IR communities will result in new techniques that will finally achieve the breakthrough in performance that is TIPSTER's goal.</Paragraph>
  </Section>
</Paper>