<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1109">
  <Title>From Information Retrieval to Information Extraction</Title>
  <Section position="4" start_page="85" end_page="85" type="intro">
    <SectionTitle>
2 IR and NLP
</SectionTitle>
    <Paragraph position="0"> Our key interest in this work was to provide a system that allows users to get answers, not just documents or sub-documents. We have not addressed whether these techniques would also be useful for more traditional IR, in the sense of finding the most relevant document for a particular query. There is some potential, since there are extra options to refine or expand a query, e.g. using sortal constraints such as company and location, and restrictive constraints such as subject_of or same sentence. Since the linguistic constraints are under user control, the query is more likely to be accurate than in systems where linguistic constraints are derived from a natural language query (though at the expense of usability). The system was designed to deal with multiple-answer queries such as "which protein interacts with TAF-2?". This differs somewhat from the TREC question answering track (TREC, 2000), where the emphasis is on questions which have a single answer, and systems attempt to provide the most relevant sub-document. To attempt the TREC task we would need to extend the system with a relevance weighting mechanism, and provide further techniques for query expansion.</Paragraph>
    <Paragraph position="1"> We would then expect the system to do well, since Srihari and Li (1999) show that the use of sortal constraints such as company, location and time, plus constraints such as same sentence, gives good results, even with a relatively simple ranking mechanism.</Paragraph>
    <Paragraph position="2"> How much extra cost is involved in using linguistic information? There is obviously some initial cost in parsing and analysing the texts; however, this can largely be hidden by preprocessing the documents. In addition, there is a space cost in having a much larger index: this is necessary since we are keeping more information about the document's structure. Finally, there is the cost of evaluating more complex constraints. The cost of using sortal constraints is negligible: we can index them in exactly the same way as words. However, relational constraints such as same sentence do introduce extra processing. The figures we present at the end of this paper show that this first implementation of the system is fast enough to be usable for some real applications (and very fast by Information Extraction standards), but is not yet in the same league as standard IR engines.</Paragraph>
  </Section>
</Paper>