<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1031"> <Title>ISSUES AND METHODOLOGY FOR TEMPLATE DESIGN FOR INFORMATION EXTRACTION</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> The goal of Information Extraction tasks is to identify, categorize, classify, relate, and normalize specific information of interest found in free text, and to make that information available to a back-end database, data fusion, or other application. A data structure referred to as a template is typically used for capturing such information, particularly in cases where the amount and complexity of information are substantial. The design of the template for such an application (or exercise) thus defines the task itself and therefore crucially affects the success of the Information Extraction attempt.</Paragraph> <Paragraph position="1"> This paper discusses template structure and the methodological issues that arise in the template design process, within the context of a discussion of the design process itself; it is based on the template design process for TIPSTER/MUC5 and certain subsequent Information Extraction exercises. The first section of this paper addresses the selection of the appropriate data representation (text annotation vs. flat template representation vs. object-oriented template). The second section outlines a set of high-level design considerations (desiderata) that have emerged; these desiderata feed into the discussion of design elements and a procedural review of the design process (design iterations, use of linguistic analysis tools, etc.).</Paragraph> </Section> </Paper>