File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/92/m92-1027_intro.xml
Size: 3,160 bytes
Last Modified: 2025-10-06 14:05:18
<?xml version="1.0" standalone="yes"?> <Paper uid="M92-1027"> <Title>HUGHES RESEARCH LABORATORIES: DESCRIPTION OF THE TRAINABLE TEXT SKIMMER USED FOR MUC-4</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> The objective of the Hughes Trainable Text Skimmer (TTS) Project is to create text skimming software that: (1) can be easily re-configured for new applications, (2) improves its performance with use, and (3) is fast enough to process several megabytes of text per day. The TTS-MUC4 system is our second full-scale prototype. It is an adaptation of the TTS-MUC3 system [1] [2], which constituted our first full-scale text skimming prototype.</Paragraph> <Paragraph position="1"> TTS-MUC3 utilized a previously constructed text database facility and a pattern matcher used for shallow parsing. Its modular process model integrated the results of case memory retrieval over sentences from multiple stories, extracting the date and location of incidents, and computing cross-reference information for various slots.</Paragraph> <Paragraph position="2"> One calendar month and approximately three person months were spent developing TTS for MUC-3.</Paragraph> <Paragraph position="3"> TTS-MUC3 demonstrated that a pattern classification approach was promising for performing text skimming. TTS-MUC4 is similar to TTS-MUC3, with a few minor changes. First, the K-Nearest Neighbor classifier used in TTS-MUC3 was replaced in TTS-MUC4 with a Bayesian classifier that includes specialized classifiers for each slot. Thus, for :INCIDENT-TYPE the set of features present in an entire sentence was used, but for :HUM-TGT-NAME the features just before and after a candidate were used.</Paragraph> <Paragraph position="4"> Second, in the new prototype, code was added to extract information to fill the new and revised slots of the MUC-4 templates. 
Third, additional filters were developed to improve the precision of the values of the template fillers.</Paragraph> <Paragraph position="5"> Like our first prototype, TTS-MUC4 incorporates semi-automated lexicon generation and almost fully automated phrase pattern generation. Two calendar months and approximately 2.5 person months were spent on enhancing the TTS-MUC3 system to create TTS-MUC4.</Paragraph> <Paragraph position="6"> As with TTS-MUC3, all the modules in TTS-MUC4 are domain independent. All the modules except the date and location extraction modules are trained prior to skimming. In addition, the location extraction module requires a location database, including the specification of which locations are contained within others.</Paragraph> <Paragraph position="7"> The goal of the TTS project is to develop a text skimmer that can be used in a variety of applications. By relying on statistical information processing and keeping the amount of domain-dependent information to a minimum, it is hoped that this system can be easily ported to a variety of tasks, such as analysis of finance-related wire-service stories.</Paragraph> </Section></Paper>