<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1014">
  <Title>INFORMATION RETRIEVAL USING ROBUST NATURAL LANGUAGE PROCESSING</Title>
  <Section position="11" start_page="108" end_page="110" type="concl">
    <SectionTitle>
SUMMARY OF RESULTS
</SectionTitle>
    <Paragraph position="0"> The preliminary series of experiments with the CACM-3204 collection of computer science abstracts showed a consistent improvement in performance: the average precision increased from 32.8% to 37.1% (a 13% increase), while the normalized recall went from 74.3% to 84.5% (a 14% increase), in comparison with the statistics of the base NIST system. This improvement is a combined effect of the new stemmer, compound terms, term selection in queries, and query expansion using filtered similarity relations. The choice of similarity relation filter has been found critical in improving retrieval precision through query expansion. It should also be pointed out that only about 1.5% of all similarity relations originally generated from CACM-3204 were found processing texts without any internal document structure. 14 The filter was most effective at o = 0.57.</Paragraph>
    <Paragraph position="1">  more specific term).</Paragraph>
    <Paragraph position="2"> admissible after filtering, contributing only 1.2 expansion on average per query. It is quite evident significantly larger corpora are required to produce more dramatic results. 15 ~6 A detailed summary is given in Table 3 below.</Paragraph>
    <Paragraph position="3"> These results, while quite modest by IR stundards, are significant for another reason as well. They were obtained without any manual intervention into the database or queries, and without using any other ts KL Kwok (private communication) has suggested that the low percentage of admissible relations might be similar to the phenomenon of 'tight dusters' which while meaningful are so few that their impact is small.</Paragraph>
    <Paragraph position="4"> :s A sufficiently large text corpus is 20 million words or more. This has been paRially confirmed by experiments performed at the University of Massachussetts (B. Croft, private comrnunicadon). null  information about the database except for the text of the documents (i.e., not even the hand generated key-word fields enclosed with most documents were used). Lewis and Croft (1990), and Croft et al. (1991) report results similar to ours but they take advantage of Computer Reviews categories manually assigned to some documents. The purpose of this research is to explore the potential of automated NLP in dealing with large scale IR problems, and not necessarily to obtain the best possible results on any particular data collection. One of our goals is to point a feasible direction for integrating NLP into the traditional IR.</Paragraph>
  </Section>
class="xml-element"></Paper>