XML Viewer - c82-1065

File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/82/c82-1065_metho.xml
Size: 20,812 bytes
Last Modified: 2025-10-06 14:11:29
<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-1065">
  <Title>NATURAL-LANGUAGE-ACCESS SYSTEMS AND THE ORGANIZATION AND USE OF INFORMATION</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
OVERALL RESEARCH OBJECTIVES
</SectionTitle>
    <Paragraph position="0"> This paper describes a program of research intended to clarify how people, working as scientists and professionals on problems in thelr respective areas of expertise, actually use information to solve those problems. 1 The strategy we are pursuing entails the construction of computer-based systems in which the users can access different kinds of information through dialogue interactions In ordinary conversational language (see Walker, 19gl; 1982)o To carry out this program requires an extension of computational capabilities for processing and understanding natural-language requests, for representing the Information content of data and text files, and for relating the analyzed requests to the representations. Studying how people use these systems to formulate and test hypotheses and to make decisions can serve as a guide to system modification, leading not only to improvements in performance but also to effective techniques for organizing knowledge about the problem domain and incorporating it within the system. To the extent that this Iterative process is successful, the systems should increasingly come to model the behavior oF the users.</Paragraph>
    <Paragraph position="1"> The systems we are developing fall somewhere in the middle of a continuum of facilities that are relevant to the organization and use of information. The two ends of the continuum are the information retrieval systems of information science and the knowledge-based expert systems of artificial intelligence. Considered in idealized form, both ends represent static states. The information retrieval systems provide access to factual data, the raw materials of a technical domain.</Paragraph>
    <Paragraph position="2"> The expert systems embody digested knowledge that Is consensually validated as germane to that area of inquiry. In contrast, the &amp;quot;systems for experts&amp;quot; that we are creating constitute a milieu in which specialists in a particular field can I The preparation of this paper was supported in part by grants from the National Cancer Institute (No. I ROI CA26655), the National Library of Medicine (No. I ROI LM03611), and SRI International Internal Research and Development funds.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
408 D.E. WALKER
</SectionTitle>
    <Paragraph position="0"> explore the range of information available in orde; to test hypotheses or develop new insights, the results of which will eventually become part of that field's knowledge base. In that respect, our systems reflect the dynamic instabilities and uncertainties of the continuing search for new ideas and new answers--the core of the knowledge-synthesis-and-interpretation process.</Paragraph>
    <Paragraph position="1"> it is important that systems for experts actually be used by people who are specialists in an area. The capabilities we are developing are predicated on the expectation that only the person who actually needs the information is able to evaluate its adequacy. Two considerations are worth noting here. First, we recognize that these needs may not be well formulated at the beginning. A person may recognize that he or she needs to know something about an area but not be able to specify it precisely. 2 It is for this reason that our systems provide for dialogue interactions; they make possible, on the basis of an assessment of the retrieved information, a progressive refinement of the search specification or the hypothesis to be tested. Only if the user is a person able to evaluate the adequacy of the results can such a dialogue be sustained.</Paragraph>
    <Paragraph position="2"> Second, because of these differences in needs, the materials in the database will be interpreted in different ways. That is, the information is not intrinsic in the data; for example, a document's relevance for a user does not have to bear any necessary relation to the relevance its contents had for the person who generated it. Actually, scientists and professionals typically have complex problems for which there are no ready-made answers. Both the problems and the information necessary for their solution are dynamic. To be useful, a system must support the reorganization and reinterpretation of its &amp;quot;facts.'&amp;quot; It must also allow the results of these actions to be added to the database because of their value for subsequent users. As noted above, the incorporation of consensually validated information as knowledge is a key element in expert systems. The systems for expert._____~s that we are discussing here certainly must contribute to this process.</Paragraph>
    <Paragraph position="3"> However, it is essential to recognize that, even in expert systems, the underlying knowledge structures are subject to change.</Paragraph>
    <Paragraph position="4"> The following sections describe our current efforts= which explore some of the capabilities required to achieve these objectives. The first, Providing Natural-Language Access to Data, focuses on the utility of a natural-language interface for retrieving formatted information from a database by a person familiar with the subject matter, but not the structure of the file itself. The second, Representing the Information Content of Texts, concentrates on the development of procedures for analyzing propositional content so that natural-language requests can effect selective retrieval of relevant passages. The third, Facilitating Generalized Access to Information, is directed toward establishing a more general system structure wlthln which a group of people working in a related area can annotate and evaluate information sources in relation to their needs, store the commentaries so that they can be accessed by others, and communicate the results of their research.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROVIDING NATURAL-LANGUAGE ACCESS TO DATA
</SectionTitle>
    <Paragraph position="0"> The MEDINQUIRY project (being conducted cooperatively with research groups at the University of California at San Francigco, the University of Pennsylvania, and the National Library of Me~ielne) is concerned with providing natural-language access to clinical databases. ~ The MEDINQUIRY system (Epsteln and Walker, 1978; Epstein,  2 The characterization by Belkin and hls colleagues of an information need as an &amp;quot;anomalous state of knowledge&amp;quot; reflects this insight (Belkin, 1978; 1980). 3 This work is supported by a grant from the National Cancer Institute. &amp;quot;I am indebted to Scott Blols and Richard Sagebiel and their colleagues at UCSF, Wallace Clark and his colleagues at the University of Pennsylvania, and Martin Epstein at the NLM, in addition to Robert Amsler and BII Lewis at SRI for their contributions to this project.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
NATURAL-LANGUAGE-ACCESS SYSTEMS AND INFORMATION USE 409
</SectionTitle>
    <Paragraph position="0"> 1980) is designed to support both clinical research and patient management by physicians studying the prognosis for chronic dlseases. 4 The project database, currently being established at SRI, will eventually store information on over 150 attributes from approximately 1500 records of patleuts with malignant melanoma, a skin cancer with an unusually high mortality rate.</Paragraph>
    <Paragraph position="1"> MEDINQUIRY enables the physician to enter requests in English that retrieve specified data for particular patients or for groups of patients who share certain characteristics, that induce a variety of calculations, that enable browsing through the database, that support identifying and exploring relationships among patient attributes, and that relate information in the database to prognosis and outcome. The system consists of a natural language processor based on LIFER (Hendrix, 1977), a database access module~ the database itself, which contains information from patient records, and a response generator. When the user types in a request, the natural-language processor attempts to analyze it using general knowledge about English and knowledge specific to melanoma, that are contained in a set of grammatical rules defining the language accepted by the system. The grammar was developed on the basis of a comprehensive review of the literature on melanoma~ an analysis of the database, and discussions with melanoma experts. For requests that are analyzed successfully, the user is presented with a paraphrase to show how the system has interpreted it. For requests that cannot be analyzed, an attempt is made to explain the difficulty. Once the request has been analyzed grammatically, a set of functions is applied to create a logical statement of its content in a formal query language. That query is applied to the database, and the requested data are retrieved and returned to the natural-language processor.</Paragraph>
    <Paragraph position="2"> There, they are reorganized in a form that corresponds to the language of the original request, and the result is presented to the user.</Paragraph>
    <Paragraph position="3"> ME,INQUIRY supports dialogue interactions; the user can follow a llne of inquiry to test a particular hypothesis by entering a series of requests that are sequentially interdependent. Phrases can be used as well as complete sentences, the meaning of a given phrase being established on the basls of the analysis of the prior request. The user can actually define new constructs at word, phrase, and sentence levels that generalize to allow interpreting a set of related constructions. For every user session MEDINQUIRY automatically records a transcript that provides a complete record of requests entered and responses made.</Paragraph>
    <Paragraph position="4"> This facility proved to be extremely helpful for evaluating problems encountered during system development; we plan to use it extensively in our studies of hypothesis formation and testing by physicians.</Paragraph>
    <Paragraph position="5"> The primary medical objectives of the project include investigating the natural history of melanoma, studying dlffere~ces between the patient populatlbns in California and Pennsylvania, and developing individual rlsk-predictlon methods.</Paragraph>
    <Paragraph position="6"> These goals entail the acquisition and management of large volumes of data. To study a particular aspect of the dlsease--and exclude the effects of others--It is necessary to stratify and form arbitrary classes of data elements and then examine their interactions. The development and testing of hypotheses entail several different levels of analysis. Material from the medical records of patients constitutes the basic data. Included are the primary clinical observations and the results of laboratory tests and histopathologlcal studies. The physician needs to identify and aggregate the critical variables and to determine how they relate to high level concepts llke &amp;quot;stage of disease&amp;quot; and &amp;quot;high risk primary.&amp;quot; The judgments of a particular physician, so labeled, can be entered into the database so that others can assess their utility.</Paragraph>
    <Paragraph position="7"> MEDINQUIRY is operational, and we are In the process of entering data from patient records. When a sufficient amount of material is available, the physicians on the project will begin to access the database systematically. Then, we will begin the 4 MEDINQUIRY, written in INTERLISP, is installed on DEC 2060 computers at SRI and at the National Library of Medicine in Betbesda, Maryland.</Paragraph>
    <Paragraph position="8"> 410 D.E. WALKER real process of evaluation, using those observations to guide refinements in the system design.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
REPRESENTING THE INFORMATION CONTENT OF TEXTS
</SectionTitle>
    <Paragraph position="0"> Increasing amounts of medical information in text form are becoming available for computer-based search and retrieval. However, the existing key-word-based procedures for locating a particular passage in a document are both awkward to use and grossly insensitive. To enable more efficient access by physicians and other health professionals, we are developing capabilities that allow a person5to search a textual data base more effectively through natural-language dialogues.</Paragraph>
    <Paragraph position="1"> The initial database for our research Is a computerized monograph containing current knowledge about hepatitis, the Hepatitis Knowledge Base being developed at the National Library of Medicine (Bernsteln et al., 1980). We are encoding (primarily by hand but wlth computer assistance) a &amp;quot;text structure&amp;quot; for the document that consists of logical representations summarizing the Information content of individual passages together wlth a specification of the hierarchical relationships among the passages. The logical representations are expressed in a formal language in which canonical predicates are useo.</Paragraph>
    <Paragraph position="2"> In the text access system we are developing, a request is analyzed in two major phases (I) A grammatical analysis determines the structure of the sentence, which is then translated into its logical form. For this purpose, we are usin~ DIALOGIC, a natural-language-understanding system developed at SRI. v (2) Inferences drawn from a knowledge store are used to solve discourse problems posed by the request and to translate it into the canonical predicates in which the text structure is expressed. The result is th@n matched against the text structure to identify relevant passages for retrieval. l The knowledge store is of particular interest, because it allows the text access system to deal with requests that go beyond its canonical vocabulary. It is not enough to represent Just the vocabulary in the monograph itself, for a physician cannot be expected to be restricted to that set of terms. The user will generally be approaching the document from the broader point of view of medicine as a whole and without knowing precisely what is included in the text. Consequently, we need to Incorporate knowledge about the larger domain within which a request is being formulated, Here, too, it is not sufficient simply to relate the medical knowledge of the user to the actual contents of the monograph, because requests will often concern aspects of the disease that are not mentioned in the particular text, but would be in a more comprehensive &amp;quot;possible'&amp;quot; hepatitis knowledge base. For example, requests can concern aspects of hepatitis that are not yet known or are no longer believed, and have therefore been deleted from subsequent versions of the HKB. Requests may use a vocabulary that is not in the document, and may mention events, specific interactions, and exceptions for which the existing monograph has only indirect information. The existing texts on hepatitis are really only one part of a larger set of texts that did or could exist; the requester is, in effect, addressing a request to this larger set rather than to the actual document.</Paragraph>
    <Paragraph position="3"> 5 This work is supported by a grant from the National Library of Medicine. It is reported in Walker and Hobbs (1981) and in Hobbs, Walker, and Amsler (1982), a paper that is included in the proceedings of this conference.</Paragraph>
    <Paragraph position="4"> 6 DIALOGIC is described in Grosz et al. (1982), which is included in the proceedings of this conference; the grammar for the system is discussed in Robinson (1982).</Paragraph>
    <Paragraph position="5"> 7 Hobbs (1980) provides a more detailed description of the DIANA system, which is used to perform the Infereuclng.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
NATURAL-LANGUAGE-ACCESS SYSTEMS AND INFORMATION USE 411
</SectionTitle>
    <Paragraph position="0"> Our work on representing the lnformatlon content of texts ls still at an early stage of development. However, the problems we are addressing are critical for the development of capabilities that can support people who organize and use information.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
FACILITATING GENERALIZED ACCESS TO INFORMATION
</SectionTitle>
    <Paragraph position="0"> Polytext is a new system concept for text retrieval being developed in cooperation with Hans Karlgren and the Kval Institute for Information Science In Stockholm, Sweden (Karlgren and Walker, 1980). 8 Responding to inadequacies in current information retrieval technology, Polytext provides the following capabilities: dialogue facilities aid the user in formulating and refining requests and in evaluating the relevance of both intermediate and final results; the successive experiences users have with the data are accumulated in the database and avallable to others; alternative algorithms and strategies (human as well as computer) for processing texts and representing the information they contain are accommodated-and the metatextual commentary they provide is explicitly ident\[fled as to source. The central feature of the system is the notion of &amp;quot;messaging&amp;quot;: all data elements In the system are considered to be messages--as are the alternative representations of the content of each data element, the requests addressed to the system, the evaluations of the relevance of each request to the data retrieved, and other communlcatlons among users. The structural features of each such message--in particular, a toplc/comment relation patterned after contemporary linguistic usage (Kiefer, 1980)--provlde the basis for linking it appropriately to other messages. When a user's request is processed, pointers are provided both to relevant items In the primary source text and to the results of previous requests that appear related. It Ks possible to examine the rationale for the relationships adduced and to identify their origin.</Paragraph>
    <Paragraph position="1"> After establishing the design features for Polytext, we recognized the need for a project of this magnitude to proceed by well-deflned steps. Accordingly, the next phase of our research was the production of a demonstration model to verify some of the basic concepts (Loef 1980). For this initial work, we selected a short legal document containing rules for arbitrating disputes that arise in connection with contracts. We developed three ways of providing access to the text, using, respectively, (I) index terms, (2) the hierarchical structure of the text, and (3) an analysis of the predlcate-argument, or propositional, structure of the text to derive a more detailed model of the information It contains. For each approach, we provided the appropriate interface to a LIFER grammar, so that it was actually possible to enter English queries and to retrieve the appropriate passage as a response.</Paragraph>
    <Paragraph position="2"> We are attempting to keep the basic Polytext software as simple as posslble.</Paragraph>
    <Paragraph position="3"> Therefore, intelligent and short-llved modules are kept outside as programs uslng the system rather than as parts of it. Thus text analyzers (machine or human) may take one message at a time, interpret it, and report the result as a new message, which has the analyzed message as its topic and the recoding In some meta-language as its comment. The lexicon for the system would itself be :,toted in message form, and the programs could use other messages for Information in the course of their analyses, p In the context of the research program, Polytext constitutes an environment in which the range of Issues associated with the organization and use of information can begin to be evaluated. It will be essential t~ incorporate sophisticated capabilities for dialogue Interactlon and content representatlon--of the kind being developed in the other two projects--but we have at least begun to establish a flexible system structure that can ac~mmodate many users and accumulate tbei# collective experiences.</Paragraph>
    <Paragraph position="4"> 8 This work is supported by SRI Independent Research and Development funds and by several Swedish sources. I am deeply indebted to Hans Karlgren for his inspiration.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML