<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2205"> <Title>FIRST RETRIEVED DOCUMENT (TITLE): Arab Hijackers' Demands Similar To Those of Hostage-Takers in Lebanon SUMMARIZER TOPIC: Evidence of Iranian support for Lebanese hostage takers</Title> <Section position="2" start_page="1262" end_page="1262" type="metho"> <SectionTitle> PARTIALLY EXPANDED SEARCH STATEMENT: </SectionTitle> <Paragraph position="0"> < title > Evidence of Iranian support for Lebanese hostage takers.</Paragraph> <Paragraph position="1"> < desc > Document will give data linking Iran to groups in Lebanon which seize and hold Western hostages. < expd > Mugniyeh, 36, is a key figure in the security apparatus of Hezbollah, or Party of God, an Iranian-backed Shiite movement believed to be the umbrella for factions holding most of the 22 foreign hostages in Lebanon.</Paragraph> <Paragraph position="2"> Overview of the NLIR System The Natural Language Information Retrieval System (NLIR) has been designed as a series of parallel text processing and indexing &quot;streams&quot;. Each stream constitutes an alternative representation of the database, obtained using a different combination of natural language processing steps. The purpose of NLP processing is to obtain a more accurate content representation than one based on words alone, which will in turn lead to improved performance.</Paragraph> <Paragraph position="3"> The following term extraction steps correspond to some of the streams used in our system: 1. Stopword elimination: Words on a stoplist that includes all closed-class words (determiners, prepositions, etc.) are removed. 2. Morphological stemming: Words are normalized across morphological variants using a lexicon-based stemmer.</Paragraph> <Paragraph position="4"> 3. Phrase extraction: Shallow text processing techniques, including part-of-speech tagging, phrase boundary detection, and word co-occurrence metrics, are used to identify relatively stable groups of words, e.g., joint venture.</Paragraph> <Paragraph position="5"> 4. 
Phrase normalization: Documents are processed with a syntactic parser, and &quot;Head+Modifier&quot; pairs are extracted in order to normalize across syntactic variants and reduce them to a common &quot;concept&quot;, e.g., weapon+proliferate.</Paragraph> <Paragraph position="6"> 5. Proper name extraction: Names of people, locations, organizations, etc. are identified.</Paragraph> <Paragraph position="7"> Search queries, after appropriate processing, are run against each stream, i.e., a phrase query against the phrase stream, a name query against the name stream, etc. The results are obtained by merging the ranked lists of documents returned from searching all streams. This allows for an easy combination of alternative retrieval methods, creating a metasearch strategy which maximizes the contribution of each stream. Different information retrieval systems can be used as indexing and search engines for each stream. In the experiments described here we used Cornell's SMART (version 11) (Buckley et al., 1995).</Paragraph> </Section> <Section position="3" start_page="1262" end_page="1263" type="metho"> <SectionTitle> TREC Evaluation Results </SectionTitle> <Paragraph position="0"> Table 1 lists selected runs performed with the NLIR system on the TREC-6 database using 50 queries (TREC topics) numbered 301 through 350. The expanded query runs are contrasted with runs obtained using the original TREC topics with NLIR as well as Cornell's SMART (version 11), which serves here as a benchmark. The first two columns are automatic runs, which means that there was no human intervention in the process at any time. Since query expansion requires a human decision on summary selection, these runs (columns 3 and 4) are classified as &quot;manual&quot;, although most of the process is automatic. 
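The phrase normalization step above can be sketched in code. This is an illustrative reconstruction, not the paper's implementation: the parser is assumed to have already produced (head, modifier) pairs, and the tiny stem table stands in for the lexicon-based stemmer of step 2. All names here are hypothetical.

```python
# Sketch of "Head+Modifier" phrase normalization (step 4), assuming a
# syntactic parser has already extracted (head, modifier) pairs.
# The STEM table below is a toy stand-in for a lexicon-based stemmer.
STEM = {
    "weapons": "weapon",
    "proliferation": "proliferate",
    "proliferates": "proliferate",
}

def stem(word: str) -> str:
    """Normalize a word across morphological variants (lexicon lookup)."""
    return STEM.get(word.lower(), word.lower())

def normalize_pair(head: str, modifier: str) -> str:
    """Reduce a syntactic (head, modifier) pair to a canonical concept key,
    e.g., ('proliferation', 'weapons') -> 'weapon+proliferate'."""
    return f"{stem(modifier)}+{stem(head)}"
```

Under this sketch, syntactic variants such as "proliferation of weapons" and "weapons proliferate" map to the same concept key, which is the normalization the stream relies on.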
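The per-stream search and rank-list merging described in the system overview can be sketched as follows. The paper does not specify its merging formula, so this uses a simple weighted score summation (a CombSUM-style fusion) purely for illustration; function and parameter names are assumptions.

```python
from collections import defaultdict

def merge_streams(ranked_lists, weights=None):
    """Fuse per-stream ranked lists of (doc_id, score) into one ranking
    by weighted score summation. An illustrative CombSUM-style choice;
    the actual NLIR merging strategy is not detailed in this excerpt."""
    weights = weights or [1.0] * len(ranked_lists)
    fused = defaultdict(float)
    for w, results in zip(weights, ranked_lists):
        for doc_id, score in results:
            fused[doc_id] += w * score
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

For example, a document retrieved with moderate scores by both the phrase stream and the name stream can outrank a document retrieved highly by only one stream, which is how each stream contributes to the final ranking.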
As can be seen, query expansion produces an impressive improvement in precision at all levels. Recall figures are shown at 1000 retrieved documents.</Paragraph> <Paragraph position="1"> Query expansion appears to produce consistently high gains not only for different sets of queries but also for different systems: we asked other groups participating in TREC to run searches using our expanded queries, and they reported similarly large improvements.</Paragraph> <Paragraph position="2"> Finally, we may note that NLP-based indexing also has a positive effect on overall performance, but the improvements are relatively modest, particularly on the expanded queries. A similar effect of reduced effectiveness of linguistic indexing has also been reported in connection with improved term weighting techniques.</Paragraph> <Paragraph position="3"> Conclusions We have developed a method to derive quick-read summaries from news-like texts using a number of shallow NLP and simple quantitative techniques. The summary is assembled out of passages extracted from the original text, based on a pre-determined DMS template. This approach has produced a very efficient and robust summarizer for news-like texts. We used the summarizer, via the QET interface, to build effective search queries for an information retrieval system. This has been demonstrated to produce dramatic performance improvements in TREC evaluations. We believe that this query expansion approach will also prove useful in searching very large databases where obtaining a full index may be impractical or impossible, and accurate sampling will become critical.</Paragraph> </Section> <Section position="4" start_page="1263" end_page="1263" type="metho"> <SectionTitle> Acknowledgements We thank Chris Buckley for </SectionTitle> <Paragraph position="0"> helping us to understand the inner workings of SMART, and also for providing the SMART system results used here. 
This paper is based upon work supported in part by the Defense Advanced Research Projects Agency under Tipster Phase-3 Contract 97F157200-000.</Paragraph> </Section> </Paper>