<?xml version="1.0" standalone="yes"?>
<Paper uid="M92-1027">
  <Title>HUGHES RESEARCH LABORATORIES: DESCRIPTION OF THE TRAINABLE TEXT SKIMMER USED FOR MUC-4</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> The objective of the Hughes Trainable Text Skimmer (TTS) Project is to create text skimming software that: (1) can be easily reconfigured for new applications, (2) improves its performance with use, and (3) is fast enough to process several megabytes of text per day. The TTS-MUC4 system is our second full-scale prototype. It is an adaptation of the TTS-MUC3 system [1] [2], which constituted our first full-scale text skimming prototype.</Paragraph>
    <Paragraph position="1"> TTS-MUC3 used a previously constructed text database facility and a pattern matcher for shallow parsing. Its modular process model integrated the results of case memory retrieval over sentences from multiple stories, extraction of the date and location of incidents, and computation of cross-reference information for various slots.</Paragraph>
    <Paragraph position="2"> One calendar month and approximately three person-months were spent developing TTS for MUC-3.</Paragraph>
    <Paragraph position="3"> TTS-MUC3 demonstrated that a pattern classification approach was promising for performing text skimming. TTS-MUC4 is similar to TTS-MUC3, with a few minor changes. First, the K-Nearest Neighbor classifier used in TTS-MUC3 was replaced in TTS-MUC4 with a Bayesian classifier that comprises a specialized classifier for each slot. As a result, the features differ by slot: for :INCIDENT-TYPE, the set of features present in an entire sentence was used, whereas for :HUM-TGT-NAME, only the features just before and after a candidate were used.</Paragraph>
    <Paragraph position="4"> Second, code was added to the new prototype to extract information for filling the new and revised slots of the MUC-4 templates. Third, additional filters were developed to improve the precision of the template filler values.</Paragraph>
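    <Paragraph> The slot-specific classification described above can be illustrated with a minimal Python sketch. It assumes a naive Bayes model with add-one smoothing, which is only one common kind of Bayesian classifier; the formulation actually used in TTS-MUC4 is not stated here. The function and class names, the two-token context window, the labels, and the toy training sentences are all hypothetical.
<![CDATA[
```python
import math
from collections import Counter, defaultdict


def sentence_features(tokens):
    """Whole-sentence features, as described for :INCIDENT-TYPE."""
    return set(tokens)


def context_features(tokens, candidate_index, window=2):
    """Tokens just before and after a candidate, as described for :HUM-TGT-NAME.

    The window size of 2 is an assumption made for this sketch.
    """
    before = tokens[max(0, candidate_index - window):candidate_index]
    after = tokens[candidate_index + 1:candidate_index + 1 + window]
    return {"prev=" + t for t in before} | {"next=" + t for t in after}


class NaiveBayes:
    """Naive Bayes over sets of features, with add-one smoothing (assumed)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        # Each example is a (feature_set, label) pair.
        for features, label in examples:
            self.label_counts[label] += 1
            for feature in features:
                self.feature_counts[label][feature] += 1
                self.vocabulary.add(feature)

    def classify(self, features):
        # Return the label with the highest log posterior, or None if untrained.
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for feature in features:
                score += math.log((self.feature_counts[label][feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# One specialized classifier per slot, each fed by its own feature extractor.
classifiers = {":INCIDENT-TYPE": NaiveBayes(), ":HUM-TGT-NAME": NaiveBayes()}

# Toy, invented training data for illustration only.
classifiers[":INCIDENT-TYPE"].train([
    (sentence_features("guerrillas bombed the embassy".split()), "BOMBING"),
    (sentence_features("rebels kidnapped the mayor".split()), "KIDNAPPING"),
])
classifiers[":HUM-TGT-NAME"].train([
    (context_features("the mayor Perez was kidnapped".split(), 2), "TARGET"),
    (context_features("the rebel Gomez attacked soldiers".split(), 2), "NOT-TARGET"),
])

sentence = "terrorists bombed a bridge".split()
print(classifiers[":INCIDENT-TYPE"].classify(sentence_features(sentence)))  # prints BOMBING
```
]]>
    Keeping one classifier object per slot lets each slot use whichever feature extractor suits it, without changing the classifier code itself.</Paragraph>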
    <Paragraph position="5"> Like our first prototype, TTS-MUC4 incorporates semi-automated lexicon generation and almost fully automated phrase pattern generation. Two calendar months and approximately 2.5 person-months were spent enhancing the TTS-MUC3 system to create TTS-MUC4.</Paragraph>
    <Paragraph position="6"> As with TTS-MUC3, all the modules in TTS-MUC4 are domain independent. All the modules except the date and location extraction modules are trained prior to skimming. In addition, the location extraction module requires a location database, including the specification of which locations are contained within others.</Paragraph>
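    <Paragraph> A location database of the kind mentioned above, recording which locations are contained within others, could be represented as in the following Python sketch. The table layout, the function names, and the sample entries are assumptions chosen for illustration and are not taken from the TTS database. Using containment to keep only the most specific location mentioned is likewise an assumed use.
<![CDATA[
```python
# Hypothetical containment table: each location maps to the location
# that directly contains it. Entries are illustrative only.
CONTAINED_IN = {
    "MEDELLIN": "ANTIOQUIA",
    "ANTIOQUIA": "COLOMBIA",
    "BOGOTA": "COLOMBIA",
    "SAN SALVADOR": "EL SALVADOR",
}


def containing_chain(location, table=CONTAINED_IN):
    """Return every location that contains `location`, nearest first."""
    chain = []
    seen = {location}  # guards against accidental cycles in the table
    current = table.get(location)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = table.get(current)
    return chain


def is_within(inner, outer, table=CONTAINED_IN):
    """True if `inner` lies inside `outer`, directly or transitively."""
    return outer in containing_chain(inner, table)


def most_specific(locations, table=CONTAINED_IN):
    """Drop any location that contains another location in the list."""
    return [
        loc for loc in locations
        if not any(is_within(other, loc, table) for other in locations)
    ]


print(containing_chain("MEDELLIN"))             # prints ['ANTIOQUIA', 'COLOMBIA']
print(most_specific(["COLOMBIA", "MEDELLIN"]))  # prints ['MEDELLIN']
```
]]>
    Storing only the direct container of each location keeps the table small; transitive containment is recovered by walking the chain.</Paragraph>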
    <Paragraph position="7"> The goal of the TTS project is to develop a text skimmer that can be used in a variety of applications. By relying on statistical information processing and keeping the amount of domain-dependent information to a minimum, it is hoped that this system can be easily ported to a variety of tasks, such as analysis of finance-related wire-service stories.</Paragraph>
  </Section>
</Paper>