<?xml version="1.0" standalone="yes"?>
<Paper uid="X98-1020">
  <Title>Arab Hijackers' Demands Similar To Those of Hostage-Takers in Lebanon SUMMARIZER TOPIC: Evidence of Iranian support for Lebanese hostage takers</Title>
  <Section position="4" start_page="143" end_page="143" type="metho">
    <SectionTitle>
2. SUMMARIZATION-BASED AUTOMATIC TOPIC EXPANSION I
</SectionTitle>
    <Paragraph position="0"> This single-stream automatic Inquery run was produced with automatically expanded topics. The plain-stem stream and the syntactic noun-phrase stream were combined and converted into a single Inquery-syntax representation (tokens and quoted strings).</Paragraph>
    <Paragraph position="1"> 3. SUMMARIZATION-BASED AUTOMATIC TOPIC EXPANSION II. This multi-stream automatic run was produced using SMART rather than Inquery. Automatically expanded queries were NL processed using GE NLToolset.</Paragraph>
    <Section position="1" start_page="143" end_page="143" type="sub_section">
      <SectionTitle>
Helsinki NLP System
</SectionTitle>
      <Paragraph position="0"> We used Helsinki's Functional Dependency Grammar (FDG) includes the EngCG-2 tagger and dependency syntax which links phrase heads to their modifiers and verbs to their complements and adjuncts. null FDG was applied to the whole corpus, with the output passed to the stream extractor. The streams were generated as follows:</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="143" end_page="145" type="metho">
    <SectionTitle>
* SIMPLE STREAMS
</SectionTitle>
    <Paragraph position="0"> 1. STEM: just stemmed words, stopwords removed.</Paragraph>
    <Paragraph position="1"> 2. NAME: all proper names 3. AAN: simple noun phrases with attributives. Basically adjective-noun sequences minus some exceptions.</Paragraph>
    <Paragraph position="2"> * DIRECT DEPENDENCY STREAMS 1. sv: subject-verb pairs where the subject is a noun phrase.</Paragraph>
    <Paragraph position="3"> 2. vo: verb-complement pairs. The complement includes objects and some object-like adverbial classes.</Paragraph>
    <Paragraph position="4"> * INDIRECT DEPENDENCY STREAMS 1. NOFN: N1 ... o\] ... N2 pairs, where N1 and N2 are heads of simple noun phrases.</Paragraph>
    <Paragraph position="5"> 2. sc: subject-complement pairs where the complement modifies the subject, e.g., flowers grow wild - wild+flower.</Paragraph>
    <Paragraph position="6">  In this section we discuss a semi-interactive approach to information retrieval which consists of two tasks performed in a sequence. First, the system assists the searcher in building a comprehensive statement of information need, using automatically generated topical summaries of sample documents. Second, the detailed statement of information need is automatically processed by a series of natural language processing routines in order to derive an optimal search query for a statistical information retrieval system. We investigate the role of automated document summarization in building effective search statements.</Paragraph>
    <Paragraph position="7"> In the opening section of this paper we argued that the quality of the initial search topic, or user's information need statement is the ultimate factor in the performance of an information retrieval system. This means that the query must provide a sufficiently accurate description of what constitutes the relevant information, as well as how to distinguish this from related but not relevant information. We also pointed out that today's NLP techniques are not advanced enough to deal effectively with semantics and meaning, and instead they rely on syntactic and other surface forms to derive representations of content.</Paragraph>
    <Paragraph position="8"> In order to overcome these limitations, many IR systems allow varying degrees of user interaction that facilitates query optimization and calibration to closer match user's information seeking goals. A popular technique here is relevance feedback, where the user or the system judges the relevance of a sample of results returned from an initial search, and the query is subsequently rebuilt to reflect this information. Automatic relevance feedback techniques can lead to a very close mapping of known relevant documents, however, they also tend to overfit, which in turn reduces their ability of finding new documents on the same subject. Therefore, a serious challenge for information retrieval is to devise methods for building better queries, or in assisting user to do SO.</Paragraph>
    <Paragraph position="9"> Building effective search topics We have been experimenting with manual and automatic natural language query (or topic, in TREC parlance) building techniques. This differs from most query modification techniques used in IR in that our method is to reformulate the user's statement of information need rather than the search system's internal representation of it, as relevance feed-back does. Our goal is to devise a method of full-text expansion that would allow for creating exhaustive search topics such that: (1) the performance of any system using the expanded topics would be significantly better than when the system is run using the original topics, and (2) the method of topic expansion could eventually be automated or semi-automated so as to be useful to a non-expert user. Note that the first of the above requirements effectively calls for a free text, unstructured, but highly precise and exhaustive description of user's search statement. The preliminary results from TREC evaluations show that such an approach is indeed very effective.</Paragraph>
    <Paragraph position="10"> One way to view query expansion is to make the user query resemble more closely the documents it is expected to retrieve. This may include both content, as well as some other aspects such as composition, style, language type, etc. If the query is indeed made to resemble a &amp;quot;typical&amp;quot; relevant document, then suddenly everything about this query becomes a valid search criterion: words, collocations, phrases, various relationships, etc. Unfortunately, an average search query does not look anything like this, most of the time. It is more likely to be a statement specifying the semantic criteria of relevance. This means that except for the semantic or conceptual resemblance (which we cannot model very well as yet) much of the appearance of the query (which we can model reasonably well) may be, and often is, quite misleading for search purposes. Where can we get the right queries? In today's information retrieval, query expansion usually is typically limited to adding, deleting or  re-weighting of terms. For example, content terms from documents judged relevant are added to the query while weights of all terms are adjusted in order to reflect the relevance information. An alternative to term-only expansion is a full-text expansion described in (Strzalkowski et al. 1997). In this approach, search topics are expanded by pasting in entire sentences, paragraphs, and other sequences directly from any text document. To make this process efficient, an initial search is performed with the unexpanded queries and the top N (10-30) returned documents are used for query expansion. These documents, irrespective of their overall relevancy to the search topic, are scanned for passages containing concepts referred to in the query. The resulting expanded queries undergo further text processing steps, before the search is run again. We need to note that the expansion material was found in both relevant and non-relevant documents, benefiting the final query all the same. In fact, the presence of such text in otherwise non-relevant documents underscores the inherent limitations of distribution-based term reweighting used in relevance feedback. Summarization-based topic expansion We used our automatic text summarizer to derive query-specific summaries of documents returned from the first round of retrieval. The summaries were usually 1 or 2 consecutive paragraphs selected from the original document text. The initial purpose was to show to the user, by the way of a quick-read abstract, why a document has been retrieved. If the summary appeared relevant and moreover captured some important aspect of relevant information, then the user had an option to paste it into the query, thus increasing the chances of a more successful subsequent search. Note again that it wasn't important if the summarized documents were themselves relevant, although they usually were.</Paragraph>
    <Paragraph position="11"> The topic expansion interaction proceeds as follows: null 1. The initial natural language statement of information need is submitted to SMART-based NLIR retrieval engine via a Query Expansion Tool (QET) interface. The statement is converted into an internal search query and run against the TREC database. 2 2. NLIR returns top N (=30) documents from the database that match the search query.</Paragraph>
    <Paragraph position="12"> 2TREC-6 database consisted of approx. 2 GBytes of documents from Associated Press newswire, Wall Street Journal, Financial Times, Federal Register, FBIS and other sources (Harman &amp; Voorhees 1998).</Paragraph>
    <Paragraph position="13"> . The user determines a topic for the summarizer.</Paragraph>
    <Paragraph position="14"> By default, it is the title field of the initial search statement (see below).</Paragraph>
    <Paragraph position="15"> 4. The summarizer is invoked to automatically summarize each of the N documents with respect to the selected topic.</Paragraph>
    <Paragraph position="16"> 5. The user reviews the summaries (spending approx. 5-15 seconds per summary) and de-selects these that are not relevant to the search statement. null 6. All remaining summaries are automatically attached to the search statement.</Paragraph>
    <Paragraph position="17"> . The expanded search statement is passed through a series of natural language processing steps and then submitted for the final retrieval.</Paragraph>
    <Section position="1" start_page="145" end_page="145" type="sub_section">
      <SectionTitle>
Implementation and evaluation
</SectionTitle>
      <Paragraph position="0"> We have developed an automatic text summarizer as part of our Tipster Phase III contract. This work is described in a separate paper included in this volume. null We have included the summarizer as a helper application within the user interface to the natural language information retrieval system. In this application, the summarizer is used to derive query-related summaries of documents returned from database search. The summarization method used here is the same as for generic summaries described thus far, with the following exceptions: 1. The passage-search &amp;quot;query&amp;quot; is derived from the user's document search query rather than from the document title.</Paragraph>
      <Paragraph position="1"> 2. The distance of a passage from the beginning of the document is not considered towards its summary-worthiness.</Paragraph>
      <Paragraph position="2"> The topical summaries are read by the users to quickly decide their relevance to the search topic and, if desired, to expand the initial information search statement in order to produce a significantly more effective query. The following example shows a topical (query-guided summary) and compares it to the generic summary (we abbreviate SGML for brevity).</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="145" end_page="145" type="metho">
    <SectionTitle>
INITIAL SEARCH STATEMENT:
</SectionTitle>
    <Paragraph position="0"> &lt; title &gt; Evidence of Iranian support for Lebanese hostage takers.</Paragraph>
    <Paragraph position="1"> &lt; desc &gt; Document will give data linking Iran to groups in Lebanon which seize and hold Western hostages.</Paragraph>
  </Section>
  <Section position="7" start_page="145" end_page="146" type="metho">
    <SectionTitle>
FIRST RETRIEVED DOCUMENT (TITLE):
</SectionTitle>
    <Paragraph position="0"> Mugniyeh, 36, is a key figure in the security apparatus of Hezbollah, or Party of God, an Iranian-backed Shiite movement believed to be the umbrella for factions holding most of the 22 foreign hostages in Lebanon.</Paragraph>
  </Section>
  <Section position="8" start_page="146" end_page="146" type="metho">
    <SectionTitle>
GENERIC SUMMARY (for comparison):
</SectionTitle>
    <Paragraph position="0"> The demand made by hijackers of a Kuwaiti jet is the same as that made by Moslems holding Americans hostage in Lebanon - freedom for 17 pro-Iranian extremists jailed in Kuwait for bombing U.S. and French embassies there in 1983.</Paragraph>
  </Section>
  <Section position="9" start_page="146" end_page="146" type="metho">
    <SectionTitle>
PARTIALLY EXPANDED SEARCH STATEMENT:
</SectionTitle>
    <Paragraph position="0"> &lt; title &gt; Evidence of Iranian support for Lebanese hostage takers.</Paragraph>
    <Paragraph position="1"> &lt; desc &gt; Document will give data linking Iran to groups in Lebanon which seize and hold Western hostages. &lt; expd &gt; Mugniyeh, 36, is a key figure in the security apparatus of Hezbollah, or Party of God, an Iranian-backed Shiite movement believed to be the umbrella for factions holding most of the 22 foreign hostages in Lebanon.</Paragraph>
  </Section>
  <Section position="10" start_page="146" end_page="146" type="metho">
    <SectionTitle>
TREC Evaluation Results
</SectionTitle>
    <Paragraph position="0"> Table 3 lists selected runs performed with the NLIR system on TREC-6 database using 50 queries (TREC topics) numbered 301 through 350. The expanded query runs are contrasted with runs obtained using TREC original topics using NLIR as well as Cornell's SMART (version 11) which serves here as a benchmark. The first two columns are automatic runs, which means that there was no human intervention in the process at any time. Since query expansion requires human decision on summary selection, these runs (columns 3 and 4) are classified as &amp;quot;manual&amp;quot;, although most of the process is automatic. As can be seen, query expansion produces an impressive improvement in precision at all levels. Recall figures are shown at 1000 retrieved documents.</Paragraph>
    <Paragraph position="1"> Query expansion appears to produce consistently high gains not only for different sets of queries but also for different systems: we asked other groups participating in TREC to run search using our expanded queries, and they reported similarly large improvements.</Paragraph>
    <Paragraph position="2"> Finally, we may note that NLP-based indexing has also a positive effect on overall performance, but the improvements are relatively modest, particularly on the expanded queries. A similar effect of reduced effectiveness of linguistic indexing has been reported also in connection with improved term weighting techniques.</Paragraph>
  </Section>
class="xml-element"></Paper>