<?xml version="1.0" standalone="yes"?> <Paper uid="M93-1019"> <Title>SRI: Description of the JV-FASTUS System Used for MUC-5</Title> <Section position="1" start_page="0" end_page="221" type="abstr"> <SectionTitle> INTRODUCTION AND BACKGROUND SRI International developed an information extraction system called FASTUS', a permuted acronym </SectionTitle> <Paragraph position="0"> standing for &quot;Finite State Automata-based Text Understanding System.&quot; The choice of acronym is somewhat misleading, however, because FASTUS is a system for information extraction, not text understanding.</Paragraph> <Paragraph position="1"> The former problem is much simpler and more tractable, characterized by a relatively straightforward specification of information to be extracted from the text, only a fraction of which is relevant to the extraction task, and with the author's underlying goals and nuances of meaning of little interest. In contrast, a text understanding task is to recover all of the information in a text, including that which is only implicit in what is actually written. All the richness of natural language becomes fair game, including metaphor, metonymy, discourse structure, and the recognition of the author's underlying intentions, and the full interplay between language and world knowledge becomes central to the task.</Paragraph> <Paragraph position="2"> Text understanding is extremely difficult, and presents a number of research problems that have not yet been adequately solved. On the other hand, the relative simplicity of the information extraction task means that the full complexity of natural language need not be confronted head-on. In fact, much simpler mechanisms can be successfully employed to solve the more constrained problem, and in a computationally efficient and conceptually elegant way. 
It was this insight that led to the development of FASTUS for extracting information from articles about terrorism in Latin America for the MUC-4 evaluation [2] [1].</Paragraph> <Paragraph position="3"> In contrast to natural-language processing systems designed for text understanding applications, FASTUS does not do a complete syntactic and semantic analysis of each sentence. Instead, sentences are processed by a sequence of nondeterministic finite-state transducers. The output of each level of transducers becomes the input to the next level. Each level of processing produces some new linguistic structure, and discards some information that is irrelevant to the information extraction task. The nondeterminism of the transducers makes it possible to produce local analyses of fragments of the input that can be combined into a complete analysis. There is no need to determine the complete structure of each sentence when such an effort has little payoff for the task at hand. The nondeterminism can also be exploited to produce competing analyses of portions of the text. These alternatives can be compared, and the best analysis can be selected for processing at subsequent levels, reducing the combinatoric complexity of the subsequent levels.</Paragraph> <Paragraph position="4"> ' FASTUS is a trademark of SRI International.</Paragraph> <Paragraph position="5"> When the transducer for the final level enters a final state, the result is a &quot;raw template&quot; that is unified with other raw templates from the current and previous sentences. At the final stage, a postprocessor transforms the raw templates into the form required by the specifications of the task.</Paragraph> <Paragraph position="6"> The basic architecture of the MUC-5 FASTUS system has evolved from the MUC-4 FASTUS system in only minor ways. The primary difference is the addition of a user interface to facilitate rapid development of the system in a new domain. 
When we developed the MUC-4 FASTUS system, we had extensive experience working in the terrorist domain for the MUC-3 TACITUS system. Before this year, two open questions existed: (1) Does the FASTUS system provide the basic tools necessary to develop a new information extraction system from scratch in a short period of time? (2) Does the FASTUS approach succeed with languages significantly different from English? We believe that our MUC-5 experience enables us to answer both of these questions with a confident &quot;yes.&quot; In the following discussion, we will refer to the English Joint Venture FASTUS system as EJV-FASTUS, and the Japanese system as JJV-FASTUS.</Paragraph> </Section> </Paper>