<?xml version="1.0" standalone="yes"?>
<Paper uid="X98-1001">
  <Title>The TIPSTER Text Program Overview</Title>
  <Section position="2" start_page="0" end_page="3" type="metho">
    <SectionTitle>
PHASE I ACCOMPLISHMENTS
</SectionTitle>
    <Paragraph position="0"> The focus of TIPSTER Phase I was to advance the state of the art in two text-processing technologies, Document Detection and Information Extraction. Document Detection included two subtasks: routing (running static queries against a text stream) and ad hoc retrieval (running ad hoc queries against archival data). Information Extraction is a technology in which pre-specified types of information are located within free text, extracted, and placed in structured forms such as templates, which can be treated as pseudo-databases.</Paragraph>
    <Paragraph position="1"> The algorithm development in detection and extraction during Phase I resulted in improvements in the technologies. As a result of TIPSTER advances, detection users had:
      * Improved recall (the system retrieves more of the relevant documents available in the applicable document collection)
      * Improved precision (the system returns to the user a higher percentage of relevant documents in the &quot;hits&quot; list, meaning that the user will read fewer documents in order to find the one he wants)
      * Ranked retrievals (the user reviews documents statistically ranked according to how well they matched the query, thus improving the chances that the most relevant documents will be at the top of the hits list)
      * Query expansion (the system would attempt to automatically expand queries to draw in more relevant documents by using concept-based tools such as thesauri)
      * Automatic query generation (the system uses a natural language description of the subject supplied by the user, including example text, to generate queries)
      The TIPSTER Program continued Government sponsorship of information extraction research. The extraction efforts, begun with the DARPA-sponsored MUC in 1987, sought to reduce the burden of tasks largely characterized by manual procedures and large resource investments both in terms of people and corresponding dollars. As a result of extraction algorithm development in Phase I, systems could be</Paragraph>
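    <Paragraph position="2"> The recall and precision definitions above can be made concrete with a small calculation. The following is a minimal sketch in Python, not code from the TIPSTER Program or its evaluations; the function name precision_recall and the document identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document IDs in the system's hits list (hypothetical IDs).
    relevant:  document IDs judged relevant in the collection.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Relevant documents that the system actually returned.
    found = retrieved_set.intersection(relevant_set)
    # Precision: share of the hits list that is relevant.
    precision = len(found) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of all relevant documents that were retrieved.
    recall = len(found) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Illustrative data only: four documents returned, three relevant overall.
hits_list = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]

precision, recall = precision_recall(hits_list, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

    In this example two of the four returned documents are relevant, so precision is 2/4, and two of the three relevant documents were found, so recall is 2/3.</Paragraph>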
  </Section>
  <Section position="3" start_page="3" end_page="3" type="metho">
    <SectionTitle>
PHASE II ACCOMPLISHMENTS
</SectionTitle>
    <Paragraph position="0"> The Government continued its sponsorship of information technology in TIPSTER Phase II. The participating agencies defined a two-tiered program of continued algorithm development and transfer of technology into demonstration projects. The Government, industry and academia continued their close cooperation and, based on Phase I experiences, crafted a four-part program. (See Figure 1.) While continuing the traditional focus on advanced research and metrics-based evaluation, the program supported the development of a common software architecture and applications of the technologies to help solve operational problems.</Paragraph>
    <Paragraph position="1"> While advances were continuing in the technology areas, there was a growing need for interoperability among the diverse systems. The impetus for the architecture came from an analysis of the designs of Phase I systems and analysis of operational scenarios that indicated the complementary nature of detection and extraction operations. The Government sponsors wanted an architecture that would support both technology</Paragraph>
  </Section>
  <Section position="4" start_page="3" end_page="3" type="metho">
    <SectionTitle>
TECHNOLOGY RESEARCH
</SectionTitle>
    <Paragraph position="0"> Research on the two underlying technology areas:
      - Document detection (and the more general category of information retrieval)
      - Text extraction
      For Phase III of the program, 15 research contracts were executed.</Paragraph>
  </Section>
  <Section position="5" start_page="3" end_page="3" type="metho">
    <SectionTitle>
TIPSTER ARCHITECTURE
</SectionTitle>
    <Paragraph position="0"> A framework to enable sharing and interchangeability of software modules developed under the TIPSTER program. The architecture documents provided standards for basic software components and specifications for interfaces between two components.</Paragraph>
  </Section>
  <Section position="6" start_page="3" end_page="3" type="metho">
    <SectionTitle>
METRIC-BASED EVALUATIONS
</SectionTitle>
    <Paragraph position="0"> Development of evaluation methodologies and creation of data collections, with ground truth, to serve as the testbed for software systems in the three technology areas. The evaluation forums were:
      - Text Retrieval Conference (TREC)
      - Message Understanding Conference (MUC)
      - Multilingual Entity Task (MET)</Paragraph>
  </Section>
  <Section position="7" start_page="3" end_page="4" type="metho">
    <SectionTitle>
DEMONSTRATION PROJECTS
</SectionTitle>
    <Paragraph position="0"> Projects funded independently by various partner agencies to evaluate the technologies for transfer into the workplace.</Paragraph>
    <Paragraph position="1"> A total of 15 projects were executed in Phase II and several continued into Phase III.</Paragraph>
    <Paragraph position="2">  areas. An architecture working group (AWG), consisting of TIPSTER Phase II R&amp;D contractors, was formed to address the issues of developing a common, open architecture. This architecture would provide the framework for interoperability between detection and extraction systems and for plug-and-play flexibility. The AWG, with the support of an independent System Engineering/Configuration Management contractor, drafted initial versions of a TIPSTER architecture.</Paragraph>
    <Paragraph position="3"> To test the feasibility of applying the TIPSTER-developed algorithms to operational environments, individual Government agencies sponsored separate development projects. Each project built a demonstration system based on the architecture and modules developed in the R&amp;D tier. Some 15 systems were developed and tested in operational environments at several Government agencies as a result of this effort. Identified needs for architecture and algorithm improvements or additional research were fed back to the R&amp;D projects.</Paragraph>
    <Paragraph position="4"> The research and development efforts of the TIPSTER Program Phase II included improvements to algorithms and research into combining the results of diverse extraction and detection techniques. There were improvements in detection recall and precision. Automatic query generation and relevance ranking spread beyond the TIPSTER Program and began to be common features in commercial search engines. Extraction technology advanced to the point that for at least one task, named entity extraction, machine performance was nearly the same as human performance. Extraction robustness had improved enough that operational users could test systems on real-world problems and determine that automatic population of databases was indeed possible with this technology.</Paragraph>
    <Paragraph position="5"> The program continued its primary sponsorship of both the Message Understanding Conferences and the Text Retrieval Conferences.</Paragraph>
    <Paragraph position="6"> This sponsorship was based on the belief that these forums for evaluation are essential to technology advances, synergistic interactions of conference participants and the continued success in TIPSTER research and development. The combined efforts of TIPSTER Phase I and Phase II effectively set the scene for the third--and final--phase of the</Paragraph>
  </Section>
</Paper>