File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/98/x98-1019_concl.xml

Size: 2,241 bytes

Last Modified: 2025-10-06 13:58:21

<?xml version="1.0" standalone="yes"?>
<Paper uid="X98-1019">
  <Title>IMPROVING ENGLISH AND CHINESE AD-HOC RETRIEVAL: TIPSTER TEXT PHASE 3 FINAL REPORT</Title>
  <Section position="8" start_page="136" end_page="136" type="concl">
    <SectionTitle>
6. CONCLUSION
</SectionTitle>
    <Paragraph position="0"> A 2-stage retrieval strategy with pseudo-feedback often returns better ad-hoc results than 1-stage alone.</Paragraph>
    <Paragraph position="1"> We have further investigated term, phrasal and topical concept level evidence methods for improving retrieval accuracy in this situation. We showed that five term-level methods together are effective in improving ad-hoc short-query results by some 20 to 40% in the TREC5 &amp; 6 experiments. A particularly useful technique is collection enrichment, which simply adds domain-related external collections to a target collection to help improve 2nd stage retrieval downstream. It brings substantial improvements in many cases and does not hurt much in others. It works for long and short queries in both English and Chinese IR.</Paragraph>
    <Paragraph position="2"> With long queries we showed that using linguistic phrases to match within document windows as further evidence to re-rank retrieval output can lead to some small improvements. We also studied re-ranking of output documents based on topical concept level evidence using document clustering, but the effort has so far not been successful.</Paragraph>
    <Paragraph position="3"> Contrary to expectations, word segmentation is not crucial for Chinese IR. Simple bigram indexing, or short-word indexing combined with character indexing, can produce very good results.</Paragraph>
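    <Paragraph> As an illustration of the bigram option, the following is a minimal sketch that assumes "bigrams" means overlapping pairs of adjacent characters, with single characters kept as additional index terms. The function names char_bigrams and index_terms are hypothetical and are not taken from the system described in this report.

```python
def char_bigrams(text):
    """Return overlapping character bigrams of a string, ignoring whitespace.

    Assumption: bigrams are formed from adjacent characters with no
    word segmentation step; this is a sketch, not the authors' code.
    """
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def index_terms(text):
    """Return single characters followed by overlapping bigrams as index terms."""
    chars = [c for c in text if not c.isspace()]
    return chars + char_bigrams(text)


if __name__ == "__main__":
    # A four-character string yields three overlapping bigrams.
    print(char_bigrams("信息检索"))
    print(index_terms("信息检索"))
```

    No lexicon is needed for this representation, which is why it can serve as a substitute for word segmentation.</Paragraph>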
    <Paragraph position="4"> A manual stoplist is also unnecessary; one only needs to screen out high frequency statistical stopwords.</Paragraph>
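    <Paragraph> One way to screen out high frequency terms statistically can be sketched as follows. The sketch assumes that "high frequency" is measured by document frequency and that a fixed ratio is used as the cut-off; both the max_df_ratio parameter and the function name statistical_stopwords are hypothetical, and the report does not state the criterion actually used.

```python
from collections import Counter


def statistical_stopwords(docs, max_df_ratio=0.5):
    """Return terms that occur in more than max_df_ratio of the documents.

    docs is a list of token lists. The threshold is an illustrative
    assumption, not a value taken from the report.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document.
        doc_freq.update(set(tokens))
    limit = max_df_ratio * len(docs)
    return {term for term, count in doc_freq.items() if count > limit}


if __name__ == "__main__":
    docs = [
        ["the", "cat"],
        ["the", "dog"],
        ["the", "cat", "sat"],
        ["a", "bird"],
    ]
    print(statistical_stopwords(docs))
```

    The same function applies unchanged to character or bigram tokens, so no manually compiled list is required for either language.</Paragraph>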
    <Paragraph position="5"> Best results are obtained by combining retrievals using multiple representations.</Paragraph>
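    <Paragraph> Combining retrievals can be sketched as a weighted sum of per-document scores from several runs, for example one run per representation. This is only an assumed combination scheme: the report does not specify the formula here, and the function name combine_retrievals and the default equal weights are hypothetical.

```python
def combine_retrievals(runs, weights=None):
    """Merge several {doc_id: score} runs by a weighted sum of scores.

    Assumption: scores from different runs are comparable (for example,
    already normalized). Documents missing from a run contribute zero.
    """
    if not runs:
        return []
    if weights is None:
        weights = [1.0 / len(runs)] * len(runs)
    combined = {}
    for run, weight in zip(runs, weights):
        for doc_id, score in run.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    bigram_run = {"d1": 0.9, "d2": 0.2}
    short_word_run = {"d2": 0.8, "d3": 0.4}
    for doc_id, score in combine_retrievals([bigram_run, short_word_run]):
        print(doc_id, round(score, 3))
```

    A document retrieved under both representations accumulates evidence from each, which is the intuition behind combining them.</Paragraph>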
    <Paragraph position="6"> For the future, it will be interesting to see whether phrasal evidence can be employed for Chinese IR, and to study how to improve its usefulness. Topical clustering for enhancing retrieval and display, and for data reduction in general, is also an important issue for large scale IR.</Paragraph>
  </Section>
</Paper>