<?xml version="1.0" standalone="yes"?>
<Paper uid="M91-1020">
  <Title>Figure 6: TST1-MUC3-0099 Input to CAUCUS</Title>
  <Section position="2" start_page="0" end_page="129" type="intro">
    <SectionTitle>
BACKGROUND
</SectionTitle>
    <Paragraph position="0"> ADS has been developing an approach to text processing, called CODEX, COntext directed Data EXtraction), that couples a concept-based, probabilistic keyword pattern matcher, called RUBRIC, with a probabilistic, gen eralized graph composition chart parser, called CAUCUS . We configure these two key technologies together for a variety of natural language sorting and gisting applications to provide greater depth of analysis (higher precision) tha n keyword-based techniques alone, as well as higher throughput and greater breadth of coverage than parsing technique s alone.</Paragraph>
    <Paragraph position="1"> In a typical text data extraction application, a complete syntactic, semantic, and pragmatic analysis of relevant text segments is required in order to achieve the precision necessary to reduce the level of human interaction require d for reliable system performance . This is the role of CAUCUS . Unfortunately, both the knowledge base developmen t and the computational costs are currently too high to apply this technique indiscriminately to all of the text entering a typical data extraction system, where many documents may contain no relevant information, and potentially large seg ments of relevant texts are also uninteresting . Coupling the keyword pattern matcher with the parser appropriately min imizes the size of the required parsing knowledge bases as well as the amount of text that needs to be interpreted i n detail. In future research we expect to show that RUBRIC can improve CAUCUS' performance even further by alterin g a priori confidences in various choices considered by the parser, based on the most probable concepts instantiated b y the keyword processing .</Paragraph>
    <Paragraph position="2"> Prior to MUC-3 and apart from the development of RUBRIC for its original IR function, approximately fiv e staff years have gone into developing the CODEX system and CAUCUS knowledge bases used in MUC-3 . A prototype implementation of CODEX has been demonstrated in a military message domain, and an operational version o f that system has been partially deployed . The operational system, written in C to run efficiently on a Mac II, as a run time system only, could not be used for MUC-3 because it was not finished and because it was not designed to handl e the knowledge engineering tasks . Thus, the MUC-3 system evolved from the prototype implementation of the militar y message handling system. Originally written in Allegro Common Lisp to run on a Sun3 workstation, the CAUCU S module of this system was designed to accommodate knowledge engineering and NLP research, rather than processin g efficiency. Unfortunately, its inefficient use of memory had to be remedied in order to compile a 10,000 word lexicon and parse an average MUC-3 sentence within a reasonable timeframe. Though we did eventually solve this problem , we did not do so in time to get the reengineered CAUCUS to produce output for the official MUC-3 testing . Our MUC-3 results thus reflect only the output of a Profiler configured to find relevant text for the parser to analyze . It is important to note that the output generated for MUC-3 official testing strongly reflects our Profiler/Analyzer strategy ; we could have extracted more template slot fillers from the Profiler output than we did, but we did not because we intende d to have the Analyzer produce these slot fillers .</Paragraph>
  </Section>
class="xml-element"></Paper>