<?xml version="1.0" standalone="yes"?> <Paper uid="H86-1002"> <Title>System Development Corporation -- A Burroughs Company prepared by</Title> <Section position="2" start_page="0" end_page="11" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> We are engaged in the development of systems capable of analyzing short narrative messages dealing with a limited domain and extracting the information contained in the narrative. These systems are initially being applied to messages describing equipment failure. This work is a joint effort of New York University and the System Development Corp. for the Strategic Computing Program. Our aim is to create a system reliable enough for use in an operational environment. This is a formidable task, both because the texts are unedited (and so contain various errors) and because the complexity of any real domain precludes us from assembling a &quot;complete&quot; collection of the relationships and domain knowledge relevant to understanding texts in the domain.</Paragraph> <Paragraph position="1"> A number of laboratory prototypes have been developed for the analysis of short narratives. None of the systems we know about, however, is reliable enough for use in an operational environment (the possible exceptions are expectation-driven systems, which simply ignore anything deviating from their built-in expectations). Typical success rates reported are that 75-80% of sentences are correctly analyzed, and that many erroneous analyses pass the system undetected; this is not acceptable for most applications. We see the central task of the work to be described below as the construction of a substantially more reliable system for narrative analysis.</Paragraph> <Paragraph position="2"> Our basic approach to increasing reliability will be to bring to bear on the analysis task as many different types of constraints as possible.
These include constraints related to syntax, semantics, domain knowledge, and discourse structure. To capture the detailed knowledge about the domain that is needed for correct message analysis, we are initially limiting ourselves to messages about one particular piece of equipment (the &quot;starting air compressor&quot;); if we are successful in this narrow domain, we intend to apply the system to a broader domain.</Paragraph> <Paragraph position="3"> The risk with having a rich set of constraints is that many of the sentences will violate one constraint or another. These violations may arise from problems in the messages or in the knowledge base. On the one hand, the messages frequently contain typographical or grammatical errors (in addition to the systematic use of fragments, which can be accounted for by our grammar). On the other hand, it is unlikely that we will be able to build a &quot;complete&quot; model of domain knowledge; gaps in the knowledge base will lead to constraint violations for some sentences. To cope with these violations, we intend to develop a &quot;forgiving&quot; or flexible analyzer which will find a best analysis (one violating the fewest constraints) if no &quot;perfect&quot; analysis is possible. One aspect of this is the use of syntactic and semantic information on an equal footing in assembling an analysis, so that neither a syntactic nor a semantic error would, by itself, block an analysis.</Paragraph> </Section> </Paper>