<?xml version="1.0" standalone="yes"?>
<Paper uid="X96-1030">
  <Title>Retrieving Records from a Gigabyte of text on a</Title>
  <Section position="8" start_page="146" end_page="147" type="concl">
    <SectionTitle>
CONCLUSIONS
</SectionTitle>
    <Paragraph position="0"> We presented in some detail our natural language information retrieval system consisting of an advanced NLP module and a 'pure' statistical core engine. While many problems remain to be resolved, including the question of adequacy of term-based representation of document content, we attempted to demonstrate that the architecture described here is nonetheless viable. In particular, we demonstrated that natural language processing can now be done on a fairly large scale and that its speed and robustness have improved to the point where it can be applied to real IR problems. We suggest, with some caution until more experiments are run, that natural language processing can be very effective in creating appropriate search queries from a user's initial specifications, which are frequently imprecise or vague. An encouraging sign is the sharp increase in precision near the top of the ranking. This indicates a higher than average concentration of relevant documents in the first 10-20 documents retrieved, which can be leveraged for further gains in performance via an automatic feedback process. This should be our focus in TREC-5.</Paragraph>
    <Paragraph position="1"> Runs: (1) base - statistical terms only, no expansion; (2) xbase - massive query expansion, no phrases; (3) nyuge1 - phrases, names, with massive expansion up to 500 terms; (4) nyuge2 - expansion limited to 200 terms per query.</Paragraph>
    <Paragraph position="2"> Runs: (1) abase - statistical terms only; (2) aloc - automatic phrases and names, locality N=20; (3) mbase - queries manually expanded, no phrases; (4) mloc - manual phrases, locality N=20; (5) iloc - interactive phrases, locality N=20.</Paragraph>
    <Paragraph position="3"> At the same time, it is important to keep in mind that the NLP techniques that meet our performance requirements (or at least are believed to be approaching these requirements) are still fairly unsophisticated in their ability to handle natural language text. In particular, advanced processing involving conceptual structuring, logical forms, etc., is still computationally beyond reach. It may be assumed that these advanced techniques will prove even more effective, since they address the problem of representation-level limits; however, the experimental evidence is sparse and necessarily limited to rather small-scale tests.</Paragraph>
  </Section>
</Paper>