<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1027"> <Title>An Empirical Study of Information Synthesis Tasks</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> A classical Information Retrieval (IR) system helps the user find relevant documents in a given text collection. On most occasions, however, this is only the first step towards fulfilling an information need.</Paragraph>
<Paragraph position="1"> The next steps consist of extracting, organizing, and relating the relevant pieces of information in order to obtain a comprehensive, non-redundant report that satisfies the information need.</Paragraph>
<Paragraph position="2"> In this paper, we refer to this process as Information Synthesis. It is normally understood as an (intellectually challenging) human task, and perhaps the Google Answers service is the best general-purpose illustration of how it works. In this service, users submit complex queries that cannot be answered simply by inspecting the first two or three documents returned by a search engine. These are two real, representative examples:
a) I'm looking for information concerning the history of text
b) Provide an analysis on the future of web browsers, if any.</Paragraph>
<Paragraph position="3"> Answers to such complex information needs are provided by experts who commonly search the Internet, select the best sources, and assemble the most relevant pieces of information into a report, organizing the most important facts and providing additional web hyperlinks for further reading. In Google Answers, this Information Synthesis task is understood as a human task for which a search engine only provides the starting point. Our mid-term goal is to develop computer assistants that help users accomplish Information Synthesis tasks.</Paragraph>
<Paragraph position="4"> From a Computational Linguistics point of view, Information Synthesis can be seen as a kind of topic-oriented, informative multi-document summarization, where the goal is to produce a single text as a compressed version of a set of documents with a minimum loss of relevant information. Unlike indicative summaries (which help to determine whether a document is relevant to a particular topic), informative summaries must be useful for answering, for instance, factual questions about the topic. In the remainder of the paper, we use the term &quot;reports&quot; to refer to the summaries produced in an Information Synthesis task, in order to distinguish them from other kinds of summaries.</Paragraph>
<Paragraph position="5"> Topic-oriented multi-document summarization has already been studied in other evaluation initiatives that provide testbeds for comparing alternative approaches (Over, 2003; Goldstein et al., 2000; Radev et al., 2000). Unfortunately, those studies have been restricted to very small summaries (around 100 words) and small document sets (10-20 documents). These are relevant summarization tasks, but they are hardly representative of the Information Synthesis problem we focus on here.</Paragraph>
<Paragraph position="6"> The first goal of our work has therefore been to create a suitable testbed that permits qualitative and quantitative studies of the Information Synthesis task.
Section 2 describes the creation of such a testbed, which includes 72 reports manually generated by nine different subjects for eight complex topics, with 100 relevant documents per topic.</Paragraph>
<Paragraph position="7"> Using this testbed, our second goal has been to compare alternative similarity metrics for the Information Synthesis task. A good similarity metric provides a way of evaluating Information Synthesis systems (by comparing their output with manually generated reports), and it should also shed some light on the common properties of manually generated reports. Our working hypothesis is that the best metric is the one that best distinguishes manually generated reports from automatically generated ones.</Paragraph>
<Paragraph position="8"> We have compared several similarity metrics, including a few baseline measures (based on document, sentence, and vocabulary overlap) and ROUGE (Lin and Hovy, 2003), a state-of-the-art measure for evaluating summarization systems. We also introduce a proximity measure based on key-concept overlap, which turns out to be substantially better than ROUGE for a relevant class of topics.</Paragraph>
<Paragraph position="9"> Section 3 describes these metrics and the experimental design used to compare them; Section 4 analyzes the outcome of the experiment, and Section 5 discusses related work. Finally, Section 6 draws the main conclusions of this work.</Paragraph> </Section> </Paper>
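
As a rough illustration of the kind of overlap-based measures compared in this paper, the following Python sketch computes a vocabulary-overlap similarity and a key-concept-overlap similarity between two reports. This is a minimal, hypothetical sketch under stated assumptions, not the paper's implementation: the tokenizer, the Jaccard formulation, and the key_concepts input (e.g. a manually selected term list per topic) are all assumptions introduced here for illustration.

# Hypothetical sketch of two report-similarity measures in the spirit of the
# baselines discussed above; not the paper's actual implementation.
import re


def tokenize(text):
    # Illustrative lowercase word tokenizer (the paper does not specify one).
    return re.findall(r"[a-z]+", text.lower())


def vocabulary_overlap(report_a, report_b):
    # Baseline similarity: Jaccard overlap between the two reports' vocabularies.
    vocab_a, vocab_b = set(tokenize(report_a)), set(tokenize(report_b))
    if not vocab_a or not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)


def key_concept_overlap(report_a, report_b, key_concepts):
    # Similarity restricted to a list of key concepts for the topic
    # (an assumed input, e.g. manually selected terms): Jaccard overlap
    # between the sets of key concepts each report mentions.
    def concepts_in(report):
        tokens = set(tokenize(report))
        return {c for c in key_concepts if set(tokenize(c)) <= tokens}

    found_a, found_b = concepts_in(report_a), concepts_in(report_b)
    if not found_a and not found_b:
        return 0.0
    return len(found_a & found_b) / len(found_a | found_b)


if __name__ == "__main__":
    manual = "Residues of the pesticide were found in groundwater across the region."
    automatic = "Groundwater samples in the region contained pesticide residues."
    concepts = ["pesticide", "groundwater", "residues"]
    print(vocabulary_overlap(manual, automatic))
    print(key_concept_overlap(manual, automatic, concepts))

The design choice in the second measure, restricting the comparison to a fixed set of topic key concepts rather than the full vocabulary, reflects the intuition behind the key-concept-overlap metric described above; the exact way key concepts are obtained and matched in the paper may differ.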