<?xml version="1.0" standalone="yes"?> <Paper uid="X96-1016"> <Title>APPENDIX C: SGML TAG LISTING SGML Tag Description</Title> <Section position="3" start_page="0" end_page="61" type="intro"> <SectionTitle> 2. CAPABILITIES </SectionTitle> <Paragraph position="0"> ADEPT was conceived as a vehicle for capabilities that alleviate problems currently faced by OIR. ADEPT tags documents in a uniform fashion, using Standard Generalized Markup Language (SGML) according to OIR standards. ADEPT provides a friendly user interface that enables Data Administrators to easily extend the system to tag new document formats and to resolve problems with existing document formats.</Paragraph> <Paragraph position="1"> Data Processing and Extraction: ADEPT processes both well-formed and ill-formed data, accepting raw documents and parsing them to identify source-dependent fields that delineate specific important information. Some of these strings will be normalized.</Paragraph> <Paragraph position="2"> The field names, field values, and their normalized forms are stored as annotations along with the document in a TIPSTER-compliant document manager. An SGML tag, defined by OIR, is associated with each annotation. The SGML tags delineate predefined document segments, such as title, publication date, main body text, etc. If ADEPT correctly captures all the fields for a document's format, an SGML-encoded document is transmitted to the ROSE System for information dissemination.</Paragraph> <Paragraph position="3"> Problem Detection and Diagnosis: ADEPT recognizes problems in the input documents and offers deep diagnostics and suggestions to the Data Administrator for fixing those problems. Although new sources, format changes, and erroneous or ill-behaved data can cause processing errors, ADEPT identifies these problem occurrences and generates diagnostics that describe the nature of the problem, such as where it occurred and why it did not match. 
From the diagnostics, the Data Administrator can easily determine whether the problem is due to an error (anomaly) in the data or a change in format.</Paragraph> <Paragraph position="4"> Error Handling and Document Viewing: ADEPT maintains a problem queue and provides GUI windows to help the Data Administrator both evaluate the source of problems (data error or new/changed format) and resolve them. The GUI enables a Data Administrator to see the original document, the output SGML template, and the fields from which the SGML tags were generated. A Data Administrator can manually change the value of a tag and resubmit resolved document(s) for reprocessing by the system.</Paragraph> <Paragraph position="5"> System Adaptation: ADEPT enables Data Administrators to manually adapt the system's configuration (mapping templates) to meet new or changed formats.</Paragraph> <Paragraph position="6"> Through a combination of menus, customized panels, and cut-and-paste operations, the Data Administrator can specify the instructions to be used by ADEPT to parse and extract data from incoming documents.</Paragraph> </Section> </Paper>