<?xml version="1.0" standalone="yes"?> <Paper uid="X98-1014"> <Title>ALGORITHMS THAT LEARN TO EXTRACT INFORMATION - BBN: TIPSTER PHASE III</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> We believe that trained statistical models offer significant advantages for information extraction tasks. In this report on BBN's research under the TIPSTER III program, we describe several research efforts that produced fully trained systems whose extraction performance came close to the highest levels achieved by carefully optimized systems based on hand-written rules.</Paragraph> <Paragraph position="1"> SIFT, the first system described, extracts entities and relations from text. At the sentence level, it combines syntactic and semantic knowledge in a novel way, applying the significant recent progress in statistical parsing to information extraction. Knowledge of English syntax extracted from the Penn Treebank is automatically combined with semantically annotated training material from the target domain, which identifies how the entities and relations of interest in that domain are signaled in text. At the message level, the local entities and relations identified within each sentence are merged, and cross-sentence relations are identified using an additional trained model. The resulting system achieved the second-best score among the systems participating in the MUC-7 evaluation.</Paragraph> <Paragraph position="2"> The second system described here is the IdentiFinder™ system for locating named entities. It is a fully trained, HMM-based model that learns from examples the contextual clues that help identify names in text.</Paragraph> </Section> </Paper>