<?xml version="1.0" standalone="yes"?>
<Paper uid="J79-1023">
  <Title>NATURAL LANGUAGE COMMUNICATION IN A PERSONAL INFORMATION RETRIEVAL SYSTEM</Title>
  <Section position="2" start_page="0" end_page="2" type="metho">
    <SectionTitle>
NATURAL LANGUAGE COMMUNICATION
IN A PERSONAL
INFORMATION RETRIEVAL SYSTEM
</SectionTitle>
    <Paragraph position="0"> This paper is based on a doctoral dissertation by the first author.</Paragraph>
    <Paragraph position="1"> Support from the National Science Foundation under Grant No. DCR71-02038 is gratefully acknowledged. Those wishing more complete details about system commands and implementation should write the second author for a User's Manual. Now at Southern Railway System, 125 Spring Street, S.W., Atlanta, Georgia 30303.</Paragraph>
  </Section>
  <Section position="3" start_page="2" end_page="2" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> Natural language combines nouns and adjectives into noun phrases, and links phrases by means of prepositions to form complex descriptions of objects and topics. AUTONOTE2, a file-oriented retrieval system, allows the user to employ such descriptions to characterize the items of information he wishes to store and retrieve. In addition, the system constructs a network representation of the user's subject matter, using syntactic analysis to derive dependency structures from his descriptions. The dependency information, expressed as subordinate and coordinate linkages among the phrases, is represented by a tree of nodes, with simple phrases at the terminal branches. The PARSER uses the network to disambiguate descriptions, querying the user only about residual ambiguities.</Paragraph>
    <Paragraph position="1"> Associated with the PARSER is a network LOCATOR, which determines whether a current user description refers to an existing topic at some level in the network. The LOCATOR also builds a table specifying the changes, if any, to be made in the network in order to represent the topic inferred from the current input description. For example, if the user's description contains one or more simple phrases (hereafter referred to as active) directly describing at least one existing node in the network, the description as a whole quite likely references an existing network topic. To locate it, the PARSER first determines the focus phrase, the active phrase at the highest dependency level. The nodes directly described by the focus phrase are used to generate candidate topics. These then are matched against the remaining active phrases obtained from the description to determine the most likely referent.</Paragraph>
    <Paragraph position="2"> Many of the procedures employed in description and representation also are used in network-mediated retrieval. The user may initiate retrieval with a FIND command, supplying a description as argument. The resultant phrase table is passed along to the network LOCATOR, which returns a node number to the FIND processor. The FIND processor constructs a set of item numbers by extracting the textual references from the node. The system then checks for upward pointers from the node. If there are structurally related topics, the FIND processor so informs the user. Note that by virtue of network mediation of retrieval, if a user description is imprecise or incorrect, the system may be able to direct the user to relevant related topics.</Paragraph>
    <Paragraph position="3"> When the system queries the user about a topic, for example to determine the intent of a description, the topic node number is passed to a SPEAKER component. A phrasal description of the node is returned. To minimize redundant communication, a level indicator may be set according to the level of detail in the user's description. For example, if the user describes an item as RESULTS OF THE EXPERIMENT and the system must ask if he is referring to SMITH'S EXPERIMENT ON</Paragraph>
  </Section>
  <Section position="4" start_page="2" end_page="2" type="metho">
    <SectionTitle>
SHORT TERM MEMORY OF WHITE RATS,
</SectionTitle>
    <Paragraph position="0"> the resulting query would be: ARE YOU REFERRING TO SMITH'S EXPERIMENT ON MEMORY? Construction of a description from the network takes place in two stages.</Paragraph>
    <Paragraph position="1"> The first stage steps through the network recursively, collecting the simple phrases that directly or indirectly describe the specified node. The level indicator blocks collection of simple phrases below the specified level. The second stage is carried out by a recursive algorithm that operates on the tabled simple phrases and their interrelations to construct the phrasal description.</Paragraph>
    <Paragraph position="2"> The last major component of the system handles network modification and reorganization. This enables the user to add or remove references and phrases, and to modify, delete, or reorganize his topic structure. A detailed case study comparing AUTONOTE2 with a good keyword-based retrieval system showed that for a coherent body of material, the communicative efficiency of AUTONOTE2, as measured by the ratio of the number of words conveyed to the number of words entered, was more than double that of the keyword-based system. Retrieval capability was enhanced considerably, and the representational network effectively distinguished among the many topics partially indexed by the same words. Furthermore, SPEAKER output of topics from the representational network proved a useful retrieval intermediary, greatly reducing the need for perusal of item texts.</Paragraph>
  </Section>
  <Section position="6" start_page="2" end_page="2" type="metho">
    <SectionTitle>
I. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> When two humans communicate, each party builds up a conceptual representation of the topics of discussion.</Paragraph>
    <Paragraph position="1"> Such representations are fundamental to human communicative efficiency.</Paragraph>
    <Paragraph position="2"> The listener's representation of the topics already discussed facilitates communication in that the speaker is spared the trouble of describing in complete detail those things to which he refers. Furthermore, the speaker can proceed to related topics without having to describe them in full.</Paragraph>
    <Paragraph position="3"> For example, a speaker who has been talking about the design of a particular experiment can safely move on to discuss the results of the experiment without specifying anew the experiment he has in mind. We use the term referential communication to indicate the process by which a speaker communicates a reference to some subject or topic to a listener. If, within the environment of a personal information system, we view the information universe as &quot;a collection of textual materials each</Paragraph>
  </Section>
  <Section position="7" start_page="2" end_page="2" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> pertinent to one or more topics,&quot; then one can readily construct an analogy. The user and system take on the roles of speaker and listener, respectively.</Paragraph>
    <Paragraph position="1"> The domain of discourse is a set of topic descriptions characterizing the user's textual materials. The user enters his materials and describes to the system the topic or topics to which they pertain. During this process, the system constructs its representation for the subjects the user has described, and associates each piece of text with its corresponding topic representations.</Paragraph>
    <Paragraph position="2"> This paper describes the design and implementation of a personal information storage and retrieval system based on the foregoing analogy with human referential communication. It presents a hierarchical network data structure for representing topic descriptions formulated within a phrasal description language.</Paragraph>
    <Paragraph position="3"> Called the representational network, this structure enables the system to move easily from one subject to other related ones. It provides a means for representing the user's working context, thereby enabling the user to describe his materials much more tersely than is possible in keyword-based systems. The system makes use of the syntactic dependencies among the words and phrases of descriptions in order to represent structural relationships among the user's topics. Consequently, the user imparts structure to the data base in a particularly natural way, eliminating much of the organization activity normally associated with keyword-based systems. Our central thesis is that the network-mediated techniques provide for more effective man-machine communication during the processes of description, organization, and retrieval within a personally generated information universe. The procedures used here differ substantially from the typical keyword indexing and retrieval mechanisms of other personal retrieval systems. The central objective is to provide the user with a framework for defining the important topics or informational objects he deals with, and to enable him to easily associate items in his data base with these entities. Rather than viewing the data base as a collection of items and associated index terms, the user deals with &quot;objects&quot; that are in some sense meaningful to him. Whether retrieving information or indexing new material, the user conveys references to the appropriate topics. This shift in the user's view of his information universe, coupled with the mechanisms we have developed for building up and referring to the topic framework, constitutes the substance of our approach to personal information storage and retrieval.</Paragraph>
  </Section>
  <Section position="8" start_page="2" end_page="2" type="metho">
    <SectionTitle>
II. THE AUTONOTE SYSTEM
</SectionTitle>
    <Paragraph position="0"> The system described here uses the AUTONOTE information storage and retrieval system (Reitman et al., 1969) as a base. AUTONOTE is an on-line retrieval system that runs as a user program under the Michigan Terminal System (MTS), a time-sharing system implemented on the IBM 370/168. The basic units of information stored in AUTONOTE are called items. The user may enter arbitrary textual materials into an item and may assign descriptors by which these materials can be retrieved. Retrieval requests take the form of single descriptors or combinations of descriptors connected by AND, OR, or NOT logical operators. Facilities are provided for deleting, replacing, linking, and hierarchically organizing text items.</Paragraph>
    <Paragraph position="1"> AUTONOTE makes extensive use of the MTS disk file system. MTS disk files (line files) may be read or written either sequentially or in an indexed fashion by specifying a line file number. AUTONOTE maintains two line files for each user's data base: one for storing textual materials and bookkeeping information, the other for storing a descriptor index. Each text item occupies a specific region of the line number range of the text file. The descriptor index, on the other hand, is accessed through an efficient hash coding algorithm that maps each descriptor into an index file line number.</Paragraph>
    <Paragraph position="2"> The descriptor index is organized as an inverted file, that is, each line in the index contains pointers to each of the text items assigned the descriptor for that line.</Paragraph>
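    <Paragraph> The index organization described above can be illustrated with a minimal sketch in Python. It is not the original implementation: the function index_line, the class DescriptorIndex, the CRC-based hash, and the figure of 1009 index lines are all assumptions made for illustration, since the source does not specify the hash coding algorithm or the file layout.

```python
import zlib

INDEX_LINES = 1009  # assumed size of the index file's line-number range


def index_line(descriptor):
    """Map a descriptor to an index-file line number (hypothetical hash)."""
    return zlib.crc32(descriptor.upper().encode("ascii")) % INDEX_LINES


class DescriptorIndex:
    """Inverted file: each index line points to the items assigned a descriptor."""

    def __init__(self):
        # line number -> {descriptor -> set of item numbers}; the inner dict
        # keeps descriptors apart when two of them hash to the same line.
        self.lines = {}

    def assign(self, item, descriptor):
        key = descriptor.upper()
        bucket = self.lines.setdefault(index_line(key), {})
        bucket.setdefault(key, set()).add(item)

    def items_for(self, descriptor):
        key = descriptor.upper()
        return set(self.lines.get(index_line(key), {}).get(key, set()))


index = DescriptorIndex()
index.assign(12, "RETRIEVAL")
index.assign(12, "SYSTEMS")
index.assign(40, "RETRIEVAL")

# RETRIEVAL AND SYSTEMS: intersect the two pointer sets.
both = index.items_for("RETRIEVAL").intersection(index.items_for("SYSTEMS"))
print(sorted(both))  # prints [12]
```

Because each index line already holds the complete pointer set for its descriptor, a logical combination of descriptors reduces to set operations on those pointer sets, and the text file is consulted only when items are actually displayed.</Paragraph>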
  </Section>
  <Section position="9" start_page="2" end_page="2" type="metho">
    <SectionTitle>
Basic AUTONOTE Commands
</SectionTitle>
    <Paragraph position="0"> Text entry. To enter a new text item, the user first types the command ENTER and the system responds with a numerical tag for the new item.</Paragraph>
    <Paragraph position="1"> The system then enters a &quot;text insertion mode&quot; and indicates its readiness to accept successive text lines with a question mark. After entering text, the user may return to &quot;command mode&quot; by entering a null line or an end-of-file indication. Should the user at any time wish to continue inserting text into the current item, he may re-enter text insertion mode via the INSERT command. Subsequent lines are placed below the most recent line for the current item in the text file.</Paragraph>
    <Paragraph position="2"> In command mode, the system prompts the user for input with a minus sign. The user may give each command in full or he may abbreviate by giving any initial substring of the command name.</Paragraph>
    <Paragraph position="3"> Descriptor entry. To associate one or more descriptors with the current text item, the user enters a list of words, beginning the input line with an at sign (@). Any character string up to 16 characters in length may be used as a descriptor. In addition to updating the descriptor index, the system also places the actual &quot;@-line&quot; in the text file in a subregion beneath the text of the current item.</Paragraph>
    <Paragraph position="4"> Retrieval. To display a particular text item the user may enter the command PRINT followed by the appropriate item number. Sequential blocks of items can also be specified in the PRINT command, e.g., PRINT 77...85.</Paragraph>
    <Paragraph position="5"> In most cases, however, the specific item number(s) will not be known.</Paragraph>
    <Paragraph position="6"> The LIST command accepts a descriptor or logical combination of descriptors as its argument and responds with a list of the item numbers that satisfy the query.</Paragraph>
    <Paragraph position="7"> The functions of the PRINT and LIST commands are combined in the RETRIEVE command.</Paragraph>
    <Paragraph position="8"> It also takes a descriptor specification as argument and causes each item in the resulting list to be PRINTed.</Paragraph>
    <Paragraph position="9"> Definitional facility. AUTONOTE also provides a definitional facility that allows the user to create sets of items referenced by arbitrary combinations of descriptors. For example, the command CREATE SIRS= INFORMATION AND RETRIEVAL AND SYSTEMS adds a new descriptor, SIRS, to the index that references each item having the words INFORMATION, RETRIEVAL, and SYSTEMS as descriptors. Any defined term may be used just as any other descriptor in retrieval requests; defined terms may also be used to define other new terms (e.g.,</Paragraph>
  </Section>
  <Section position="10" start_page="2" end_page="2" type="metho">
    <SectionTitle>
CREATE $OTHERSYSTEMS= SIRS NOT AUTONOTE) .
</SectionTitle>
    <Paragraph position="0"> The definitional facility is also invoked implicitly each time the user issues a retrieval query. The set of items referenced by the most recent LIST or RETRIEVE command, called the active set, is assigned the name $. Should the user wish to refine the results of the previous query, he has access to the active set. To facilitate this process, each time a missing descriptor is noted in a retrieval request the descriptor $ is inserted automatically by the system. For example, the command LIST NOT FORTRAN is interpreted as LIST $ NOT FORTRAN, i.e., the old active set of items is restricted to include only those not referenced by the descriptor FORTRAN. This operation, of course, redefines the active set.</Paragraph>
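    <Paragraph> The handling of the active set can be sketched as follows. This is an illustrative Python model under stated assumptions, not AUTONOTE code: the class Session and its methods are hypothetical names, queries are evaluated strictly left to right without operator precedence or parentheses, and NOT is treated as set difference, which is how the examples in the text read.

```python
OPERATORS = {
    "AND": set.intersection,
    "OR": set.union,
    "NOT": set.difference,  # A NOT B: items in A that lack descriptor B
}


class Session:
    def __init__(self, index):
        self.index = dict(index)   # descriptor -> set of item numbers
        self.index["$"] = set()    # the active set starts out empty

    def evaluate(self, query):
        tokens = query.split()
        if not tokens:
            return set()
        # A query that opens with an operator is missing its first
        # descriptor, so the active set "$" is supplied automatically.
        if tokens[0] in OPERATORS:
            tokens.insert(0, "$")
        result = set(self.index.get(tokens[0], set()))
        for position in range(1, len(tokens) - 1, 2):
            combine = OPERATORS[tokens[position]]
            operand = self.index.get(tokens[position + 1], set())
            result = combine(result, operand)
        return result

    def list_items(self, query):
        self.index["$"] = self.evaluate(query)   # every LIST redefines "$"
        return sorted(self.index["$"])

    def create(self, name, query):
        # A defined term is stored like any other descriptor.
        self.index[name] = self.evaluate(query)


session = Session({"FORTRAN": {1, 2}, "PARSING": {2, 3, 4}})
print(session.list_items("PARSING"))      # prints [2, 3, 4]
print(session.list_items("NOT FORTRAN"))  # read as "$ NOT FORTRAN": prints [3, 4]
```

Storing the active set under the ordinary name $ means that query refinement needs no special machinery: the implicit insertion rewrites the request, and the usual evaluation does the rest.</Paragraph>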
    <Paragraph position="1"> Item-item linkage. The ability to define associative links between any two text items is provided by the APPEND command. When an item is displayed, its associative links to other items may optionally be printed along with a user-specified comment indicating the nature of the association.</Paragraph>
    <Paragraph position="2"> Tutorial feature. Throughout the course of its development, AUTONOTE has been employed to collect, organize, and maintain up-to-date documentation of its capabilities, usage strategies, and so on. This information is stored in a publicly available data base. It includes brief descriptions of each of the commands, announcements of recent developments and system changes, and other instructive information. The AUTONOTE user may call upon this store of material by entering a HELP command. The user's data base is temporarily set aside and the public data base is attached to the system. The user may then retrieve instructive information in the same way that he operates with his own data base. To assist novice users, the system will optionally print instructions for accessing the HELP data base.</Paragraph>
    <Paragraph position="3"> Grouping. AUTONOTE provides a grouping facility which permits the user to organize text items in several useful ways.</Paragraph>
    <Paragraph position="4"> It enables the user to define a &quot;grouping item&quot; which references an arbitrarily ordered list of other items. This is done by entering into an item an &quot;@GROUP&quot; line listing those items. Since any item can represent a group, it is possible to form a complex hierarchical structure in this way.</Paragraph>
    <Paragraph position="5"> A grouping item can be viewed as a node of an inverted tree structure with downward branches to those items listed in its &quot;@GROUP&quot; line. A request to display a grouping item initiates recursive processing of the tree structure to identify the terminal and nonterminal items of the hierarchy. The user may request that only terminal or nonterminal items be displayed, or that the entire list of materials be printed. The organization of the HELP data base described above provides an excellent example of the power and flexibility of the grouping facility.</Paragraph>
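    <Paragraph> The recursive processing of a grouping structure can be sketched in Python as shown here. The function expand and the dictionary layout are assumptions made for illustration; the source does not describe how group membership is stored beyond the @-line convention.

```python
def expand(item, groups, seen=None):
    """Return (terminal, nonterminal) item lists reachable from `item`.

    `groups` maps a grouping item to the ordered list of items it names;
    any item absent from `groups` is a terminal (plain text) item.
    """
    seen = set() if seen is None else seen
    if item in seen:        # guard against a group that names itself
        return [], []
    seen.add(item)
    if item not in groups:
        return [item], []
    terminals, nonterminals = [], [item]
    for member in groups[item]:
        sub_terminals, sub_nonterminals = expand(member, groups, seen)
        terminals.extend(sub_terminals)
        nonterminals.extend(sub_nonterminals)
    return terminals, nonterminals


# Hypothetical hierarchy: item 100 groups 110 and 120, which group text items.
groups = {100: [110, 120], 110: [1, 2], 120: [3]}
terminals, nonterminals = expand(100, groups)
print(terminals)      # prints [1, 2, 3]
print(nonterminals)   # prints [100, 110, 120]
```

Returning the two lists separately corresponds to the user's choice of displaying only terminal items, only nonterminal items, or everything. In this sketch an item reachable through two groups is reported once, a simplification that the original system may not share.</Paragraph>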
    <Paragraph position="6"> The HELP text file contains at this writing approximately 150 items of documentation. Using the grouping convention, these are organized into five subgroups: (1) general information; (2) input and editing facilities; (3) output (retrieval) facilities; (4) organizational facilities; and (5) utility commands. There is one major item which groups all of these subgroups into a single tree structure. The top node of the structure is indexed by the descriptor USERS-MANUAL. As new facilities are incorporated into the system, their descriptions are entered into the manual structure, thus assuring that complete and up-to-date documentation is always available. At any time, the single command RETRIEVE USERS-MANUAL causes the entire updated data base to be displayed in organized form. Command modifiers. AUTONOTE includes a set of modifiers or option settings that control the execution of many commands. These include options that affect the format of displayed items, the expansion of grouping structures, the nature and extent of system feedback, etc.</Paragraph>
    <Paragraph position="7"> Each of the modifiers has a default value that is chosen to simplify use of the system by a novice.</Paragraph>
    <Paragraph position="8"> The more experienced user may alter the modifiers via the SET command to tailor the system to his own needs, usage patterns, and level of competence.</Paragraph>
    <Paragraph position="9"> AUTONOTE also provides a large number of auxiliary commands and facilities. A list of the major AUTONOTE commands, each accompanied by a brief description, is included in Linn (1972, Appendix A).</Paragraph>
  </Section>
  <Section position="11" start_page="2" end_page="2" type="metho">
    <SectionTitle>
AUTONOTE System Organization
</SectionTitle>
    <Paragraph position="0"> AUTONOTE has been designed as a modular system so that as new facilities become available they may be tested and later added with little or no reprogramming of the existing system. The majority of AUTONOTE commands are implemented as subroutines, each of which resides permanently in an MTS disk file. The basic system is organized around a central monitor that accepts user input and calls upon appropriate modules to service the user's requests. In addition to the monitor, the core resident system includes a dynamic loader, a disk file interface, and a set of frequently used utility routines. A number of primitive commands, text entry, and descriptor assignment are also handled by the resident system. As the user requests more complex services (LIST, RETRIEVE, or PRINT, for example), the monitor calls upon the dynamic loader to bring the appropriate modules into core storage.</Paragraph>
    <Paragraph position="1"> These routines then become a part of the resident system, remaining in core storage until the user explicitly requests their removal. An organizational diagram of the AUTONOTE system appears in Fig. 1.</Paragraph>
    <Paragraph position="2"> The modular design of AUTONOTE coupled with the dynamic loading facility offers two important benefits. From the user's viewpoint, he has access to the complete repertory of AUTONOTE services, yet he pays core storage charges only for those routines he actually uses during a given session.</Paragraph>
    <Paragraph position="3"> To the developers of the system, the modular framework facilitates the addition of new system components. The latter has been an important factor in the implementation of the AUTONOTE2 system. (Fig. 1 - AUTONOTE System Organization)</Paragraph>
    <Paragraph position="4"> III. OVERVIEW OF AUTONOTE2. The AUTONOTE2 system uses ideas (Reitman, 1965; Reitman et al., 1969) concerning the use of our &quot;knowledge of the world&quot; to disambiguate and fill in implied facts when conversing with one another. In particular, the system design is based upon the assumption that efficient human communication...</Paragraph>
    <Paragraph position="5"> &quot;depends upon the listener's ability to make inferences from prior information, from context, and from a knowledge of the speaker and the world. Communicating in this way, we risk occasional misunderstanding as the price for avoiding verbose, redundant messages largely consisting of material the listener already knows&quot; [Reitman et al., 1969].</Paragraph>
    <Paragraph position="6"> In our more restricted domain of discourse, we view the process of human referential communication as one guided by some form of internal representation of the various topics or referents discussed earlier. When a listener can be assumed to have such a representation, the speaker is spared the difficulty of describing in complete detail the things to which he refers.</Paragraph>
    <Paragraph position="7"> He need only give enough information to allow the referent to be discerned in full. Our goal then is to develop a representational scheme for our retrieval system that allows the user analogous communicative efficiencies.</Paragraph>
    <Paragraph position="8"> The Description Language. The first step in devising a representational framework was the formulation of a language for expressing topic descriptions to the system.</Paragraph>
    <Paragraph position="9"> Although an underlying factor in the design of AUTONOTE2 was to make communication with the system more &quot;natural,&quot; it should be noted that the emphasis of this research is not upon parsing or &quot;understanding&quot; natural language. Rather, our goal is to investigate the notions of topic representation and referential communication as a means for improving the user's ability to describe, organize, and retrieve his materials. Consequently, a minimal subset of noun phrases was chosen, minimal in the sense that it excludes most of the complexity of natural English, yet still retains a degree of descriptive richness sufficient to explore the underlying ideas of this study.</Paragraph>
    <Paragraph position="10"> Natural language enables us to combine nouns and adjectives into noun phrases and to interlink noun phrases via prepositions to form complex descriptions of objects in the real world. The AUTONOTE2 description language provides such a framework for composing topic references. A formal grammar for the language is given in Fig. 2 along with a few sample descriptions that illustrate the flexibility of expression achievable with the language. These grammatical rules are not in fact used explicitly by the system in actually parsing topic descriptions.</Paragraph>
    <Paragraph position="11"> The grammar is presented here only to specify precisely the set of descriptions acceptable to the system. (Notes to Fig. 2: (a) Modifiers and nouns are arbitrary character strings not recognized as articles or prepositions. When a number of consecutive &quot;words&quot; are encountered, the last is parsed as a noun and the preceding words as modifiers. (b) Possessive adjectives are treated as a special case of adjectival modification.)</Paragraph>
    <Paragraph position="12"> The actual AUTONOTE2 parser is heuristic-based, making use of previously analyzed phrases, noun-preposition co-occurrences, and a set of heuristics to guide the parsing process.</Paragraph>
    <Paragraph position="13"> In some instances, the user may even be asked for parsing assistance.</Paragraph>
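    <Paragraph> The word-classification rule stated in the notes to the grammar (anything that is not an article or preposition is a modifier or noun, and the last word of a consecutive run is the noun) can be sketched in Python as follows. The function simple_phrases and the particular article and preposition lists are assumptions for illustration. The sketch only segments a description into simple phrases; it does not attempt the harder step of deciding which phrase each prepositional phrase depends on, which is where the heuristics and the occasional query to the user come in.

```python
ARTICLES = {"THE", "A", "AN"}
PREPOSITIONS = {"OF", "ON", "BY", "ABOUT", "IN", "FOR", "WITH"}  # assumed list


def simple_phrases(description):
    """Split a description into (preposition, modifiers, noun) triples."""
    phrases = []
    preposition, run = None, []
    for word in description.upper().split():
        if word in ARTICLES:
            continue
        if word in PREPOSITIONS:
            if run:
                # The last word of the run is the noun, the rest modifiers.
                phrases.append((preposition, run[:-1], run[-1]))
            preposition, run = word, []
        else:
            run.append(word)
    if run:
        phrases.append((preposition, run[:-1], run[-1]))
    return phrases


for phrase in simple_phrases("Smith's experiment on short term memory of white rats"):
    print(phrase)
```

Applied to the example from the abstract, this yields three simple phrases headed by EXPERIMENT, MEMORY, and RATS, with the possessive SMITH'S handled as an ordinary modifier.</Paragraph>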
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
Representational Framework
</SectionTitle>
      <Paragraph position="0"> Central to the design of AUTONOTE2 is the idea of viewing the user's information universe as a collection of &quot;informational objects&quot; or topics, each having associated with it a number of text items.</Paragraph>
      <Paragraph position="1"> When the user wishes to describe a text item, we assume he has such a topic in mind.</Paragraph>
      <Paragraph position="2"> Using the phrasal language specified above, he composes a description of that topic and presents it to the system. AUTONOTE2 then constructs an internal representation of that topic. When a text item is described, the system must consult the representation to determine if the description (1) references an existing topic, (2) is related to an existing topic, or (3) defines a new topic. In any case, the ultimate goal is to associate the text item with a topic representation, possibly augmenting the representation in the process. Criteria for the Representation. Efficiency of communication. Efficient man-machine communication implies that the user should not in general have to formulate a complete description of a particular topic in order to convey a reference to it. The system should be capable of accepting and correctly interpreting incomplete references by filling in missing information. As an example, a topic fully described as THE PAPER BY SALTON ABOUT THE SMART SYSTEM might be referred to as THE PAPER, THE PAPER BY SALTON, THE PAPER ABOUT THE SMART SYSTEM, and so on. A description in the AUTONOTE2 language consists of a noun modified by adjectives and prepositional phrases. The words that modify any given term may themselves be modified in exactly the same way. In effect, each adjective and prepositional phrase functions as a phrase component that imparts greater detail to the overall description. In the example above, BY SALTON and ABOUT THE SYSTEM provide information about the paper; SMART specifies which system is meant.</Paragraph>
      <Paragraph position="3"> To facilitate efficient communication we require a representational framework that makes explicit the component phrases of each topic description. Given such a framework, we have a basis for comparing incomplete descriptions with the representation to determine possible topic referents.</Paragraph>
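      <Paragraph> One simple way to compare an incomplete description with stored component phrases is a subset test, sketched below in Python. This is a deliberately crude stand-in for the matching the paper goes on to describe: the topic table, the second topic, the flat phrase strings, and the function possible_referents are all hypothetical, and the real system works on dependency structures rather than flat sets.

```python
# Hypothetical topics, each stored as the set of its component phrases.
TOPICS = {
    1: {"PAPER", "BY SALTON", "ABOUT SYSTEM", "SMART SYSTEM"},
    2: {"PAPER", "BY JONES", "ABOUT INDEXING"},
}


def possible_referents(reference, topics=TOPICS):
    """Topics whose stored phrases include every phrase of the reference."""
    wanted = set(reference)
    return sorted(number for number, phrases in topics.items()
                  if wanted.issubset(phrases))


print(possible_referents({"PAPER"}))               # prints [1, 2]
print(possible_referents({"PAPER", "BY SALTON"}))  # prints [1]
```

The first call shows why terse references raise the problem of ambiguity: THE PAPER alone fits both topics, while one additional component phrase is enough to single out a referent.</Paragraph>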
      <Paragraph position="4"> Descriptive power.</Paragraph>
      <Paragraph position="5"> A system that makes use of syntax in the user's descriptor entries increases descriptive power in that it permits distinctions that, in general, will not be made in keyword-based retrieval systems. A description such as THE ORGANIZATION OF THE PAPER ABOUT MTS is semantically quite different from THE PAPER ABOUT THE ORGANIZATION OF MTS, despite the fact that both contain the same words. A system that takes into consideration the syntactic relationships that hold among the words ORGANIZATION, PAPER, and MTS can discriminate between the two.</Paragraph>
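      <Paragraph> The distinction drawn here can be made concrete with a short Python sketch that contrasts a keyword view of a description with a dependency view. The functions keywords and dependencies are hypothetical, the preposition list is assumed, and the attachment rule (each prepositional phrase depends on the noun immediately before it) is a simplification that happens to suit these two examples; it is not the AUTONOTE2 parser.

```python
ARTICLES = {"THE", "A", "AN"}
PREPOSITIONS = {"OF", "ABOUT", "BY", "ON"}  # assumed list


def keywords(description):
    """Keyword view: the unordered set of content words."""
    words = description.upper().split()
    return {w for w in words if w not in ARTICLES and w not in PREPOSITIONS}


def dependencies(description):
    """Dependency view: (head noun, preposition, dependent noun) triples."""
    links = set()
    head = preposition = noun = None
    for word in description.upper().split():
        if word in ARTICLES:
            continue
        if word in PREPOSITIONS:
            if head is not None and noun is not None:
                links.add((head, preposition, noun))
            head, preposition, noun = noun, word, None
        else:
            noun = word   # the last word of a run is taken as its noun
    if head is not None and noun is not None:
        links.add((head, preposition, noun))
    return links


a = "THE ORGANIZATION OF THE PAPER ABOUT MTS"
b = "THE PAPER ABOUT THE ORGANIZATION OF MTS"
print(keywords(a) == keywords(b))          # prints True
print(dependencies(a) == dependencies(b))  # prints False
```

A keyword index sees the two descriptions as identical, whereas the dependency triples differ: in the first the paper is about MTS, in the second the organization is of MTS.</Paragraph>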
      <Paragraph position="6"> The considerations outlined thus far lead quite naturally to some form of dependency representation for the user's topics.</Paragraph>
      <Paragraph position="7"> Essentially, a dependency representation for the AUTONOTE2 language would reflect the syntactic dependence of each adjective and prepositional phrase upon an appropriate noun. Such a framework provides the essential information for enhancing descriptive power and communicative efficiency as defined above.</Paragraph>
      <Paragraph position="8"> Hierarchical representations.</Paragraph>
      <Paragraph position="9"> We view a topic as a group of interconnected subtopics, each bearing on a central theme yet with varying levels of generality.</Paragraph>
      <Paragraph position="10"> To make this notion more concrete, consider a user of AUTONOTE2 putting down his thoughts and ideas for a book he is writing.</Paragraph>
      <Paragraph position="11"> He begins by entering some general material which he describes simply as &quot;THE BOOK ABOUT....&quot; At some later time he may enter an outline for the book, a list of reference materials he will use, publishing arrangements, etc. Still later, he will enter materials for the chapters of his book and perhaps outlines for each chapter. In time he will have defined a host of related descriptions. Fig. 3 gives a pictorial representation of the resultant complex &quot;topic.&quot; The representational scheme of AUTONOTE2 was designed with complex hierarchies such as this one in mind. In other words, we want to represent related topic descriptions via interconnections in a network. The essential idea is that such a network corresponds to a map of the organization of the associated textual materials--a map that should reflect important structural relationships among the materials from the user's viewpoint. A hierarchical representation of this kind is especially effective during retrieval. If the user requests materials dealing with his book, for example, the system can also inform him that he has more specific items dealing with the publishing arrangements, the component chapters, and so on. The notion of a representational network fits well with the dependency framework we require.</Paragraph>
      <Paragraph position="12"> The syntactic dependencies among the words and phrases of a description may be used to represent structural relationships among the user's topics.</Paragraph>
      <Paragraph position="13"> In the example above, the network connection between the</Paragraph>
    </Section>
  </Section>
  <Section position="12" start_page="2" end_page="2" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> [Fig. 3 - A Topic Hierarchy] "outline" and the "book" corresponds to the syntactic dependency of "book" upon "outline" in the description THE OUTLINE OF THE BOOK ABOUT. . . . Augmentation of the representation. In the previous discussion of communicative efficiency we were concerned with associating an incomplete description with its corresponding topic. In designing the representational framework we also had to consider the case in which a reference provides a more detailed description of an existing topic. In such instances we want to enrich the topic representation to include the additional information.</Paragraph>
    <Paragraph position="1"> Whether additional descriptive information is encountered in a subsequent item description or in a retrieval request, we want the system to incorporate it into its existing knowledge of the user's topics. This requires that the representation be structured in such a way that dynamic augmentation is easily accomplished.</Paragraph>
    <Paragraph position="2"> The representation of context. In providing a framework for interpreting terse, incomplete references we naturally are confronted with the problem of ambiguity. A description such as THE PAPER, or THE PAPER ABOUT MICROPROGRAMMING may in fact satisfy a large number of distinctly described topics. To deal with this problem we require some kind of contextual framework that enables the system to infer, where possible, the intent of a vague or ambiguous reference. A user who has been entering material for a paper he is writing should be able to describe a subsequent item as, say, THE OUTLINE OF THE PAPER, and have the system infer which paper he means.</Paragraph>
    <Paragraph position="3"> In general then, we want the representational framework to include information that identifies the "working context," i.e., those topics the user has referred to recently. System interrogation of the user. Presented with an ambiguous description "out of context," the system is faced with much the same dilemma a human listener would face. In such instances, we want the system to be capable of asking pertinent questions to resolve the ambiguous reference. This implies, of course, that the representation preserve sufficient information to enable the system to reconstruct descriptions of the user's topics.</Paragraph>
    <Paragraph position="4"> Overview of the AUTONOTE2 Implementation. Data structures. We have now presented the major design requirements for the representational framework. These preliminary criteria suggest a representation organized as a network of (possibly interconnected) dependency structures obtained from syntactic analysis of topic descriptions. The network data structures are discussed in section IV in terms of the representational criteria and also the computational requirements--how they are to be accessed, modified, and so on.</Paragraph>
    <Paragraph position="5"> The parser. The parsing of descriptions is guided by the state of the representation at the point they are entered. For this reason, the parsing algorithm is treated in section IV in conjunction with the representational data structures. The presentation includes detailed discussion of the parsing problems encountered and the heuristics employed in dealing with them.</Paragraph>
    <Paragraph position="6"> Network location. The function of the network locator is to analyze the parse tree to decide whether the description references an existing topic or defines a new one. Once this decision is made, it constructs a list of any network modifications required to represent the topic and its associated item reference. The network location algorithm is described in section V.</Paragraph>
    <Paragraph position="7"> Retrieval. The AUTONOTE2 retrieval component is invoked via a FIND command. The command takes a topic description as its argument.</Paragraph>
    <Paragraph position="8"> The FIND processor in turn calls upon both the parser and network locator, regaining control after the appropriate network topic has been identified.</Paragraph>
    <Paragraph position="9"> Text items directly associated with the topic then may be retrieved from the data base. Alternately, the retrieval component will move to structurally related topics in the representational network to collect additional item references for subsequent display.</Paragraph>
    <Paragraph position="10"> To reconstruct topic descriptions from the network, AUTONOTE2 includes a SPEAKER module. If the user's description is ambiguous, for example, the network locator may call for a display of the alternative topics. The FIND processor employs the SPEAKER to present descriptions of topics structurally related to the user's original query. The user also may invoke the SPEAKER explicitly, via a DESCRIBE command, to obtain descriptions of some subset of the topics in the representational network. The retrieval component, the SPEAKER, and the DESCRIBE command are treated in section VI.</Paragraph>
    <Paragraph position="11"> Network modification. The last major component of AUTONOTE2, the network modification processor, is described in section VII. It allows the user to delete topic representations, create new ones, and merge multiple topics into a single representation. It also enables the user to move through clusters of related topics in order to explore associations in the network.</Paragraph>
    <Paragraph position="12"> Auxiliary commands. Various auxiliary commands and facilities are given in Linn (1972, Appendix B). This appendix also includes some discussion of usage strategies for achieving the most effective use of AUTONOTE2.</Paragraph>
    <Paragraph position="13"> Fig. 4 depicts the organization of the AUTONOTE2 components within the AUTONOTE system. When the user wishes to describe a text item, we assume he has in mind some subject, topic, or informational object that can be characterized by a phrasal description. A description may convey a reference to a topic the user has dealt with earlier, or it may define a new one. The description is analyzed to determine a dependency tree--a structure that preserves the original words and phrases of the description and the syntactic dependencies among them.</Paragraph>
    <Paragraph position="14"> In constructing this tree, the parser incorporates primary units called simple phrases. A simple phrase may consist of a modifier and a noun (e.g., ACM CONFERENCE), or of a noun followed by a preposition and modifier (e.g., OUTLINE OF PAPER). The parser extracts these basic phrases from the original description and records the syntactic dependencies among them.</Paragraph>
    <Paragraph position="15"> A description such as THE OUTLINE OF THE PAPER ABOUT AUTONOTE2 FOR THE ACM CONFERENCE will be analyzed into four simple phrases: (1) THE OUTLINE OF THE PAPER, (2) THE PAPER ABOUT AUTONOTE2, (3) THE PAPER FOR THE CONFERENCE, and (4) THE ACM CONFERENCE. Each simple phrase consists of a subject noun and a modifier word. When two simple phrases have a common subject noun, we say they are coordinate simple phrases. When a modifier word of one simple phrase subsequently becomes the subject noun of another, we say the latter phrase is subordinate</Paragraph>
  </Section>
  <Section position="13" start_page="2" end_page="2" type="metho">
    <SectionTitle>
DESCRIPTION
PROCESSOR
</SectionTitle>
    <Paragraph position="0"> [Fig. 4 - The Organization of AUTONOTE2 within the AUTONOTE System] to the former. In the above example, THE PAPER ABOUT AUTONOTE2 and THE PAPER FOR THE CONFERENCE are coordinate simple phrases--both have PAPER as their subject noun. Both of these are subordinate to the phrase OUTLINE OF THE PAPER, in which PAPER appears as a modifier word. Additionally, the simple phrase THE ACM CONFERENCE is subordinate to THE PAPER FOR THE CONFERENCE.</Paragraph>
    <Paragraph position="1"> Subordinate phrases simply qualify the use of their subject words. For example, phrases subordinate to THE OUTLINE OF THE PAPER provide a more detailed description of the paper ("ABOUT AUTONOTE2" and "FOR THE CONFERENCE"); the phrase subordinate to THE PAPER FOR THE CONFERENCE further qualifies the conference.</Paragraph>
    <Paragraph position="2"> In effect, two kinds of dependency information are extracted by the parser. The first is the dependency of adjectives and prepositional phrases upon a noun. This information is reflected in the selection of the simple phrases themselves. Second, there are the dependency relationships among the simple phrases of the description. This information, expressed in terms of subordinate and coordinate linkages, may be represented by a tree structure consisting of nodes with simple phrases at the terminal branches. Fig. 5 gives the tree structure for the example. Simple phrases with an immediate linkage to a node are said to directly describe that node. Note that the two coordinate phrases from the example directly describe a common node, node B.</Paragraph>
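The coordinate and subordinate relations just defined can be sketched in Python. This is only an illustration: the tuple layout (subject noun, preposition, modifier word) and the function names are assumptions made here, not part of AUTONOTE2; the four simple phrases are those of the running example.

```python
from collections import defaultdict

# Each simple phrase is (subject noun, preposition or None, modifier word).
# This tuple layout is an assumption made for the sketch.
PHRASES = [
    ("OUTLINE", "OF", "PAPER"),
    ("PAPER", "ABOUT", "AUTONOTE2"),
    ("PAPER", "FOR", "CONFERENCE"),
    ("CONFERENCE", None, "ACM"),
]


def coordinate_groups(phrases):
    """Group simple phrases that share a subject noun (coordinate phrases)."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[phrase[0]].append(phrase)
    return dict(groups)


def subordinate_pairs(phrases):
    """Return (superior, subordinate) pairs: the modifier word of the
    first phrase is the subject noun of the second."""
    return [(p, q) for p in phrases for q in phrases if p[2] == q[0]]


groups = coordinate_groups(PHRASES)
pairs = subordinate_pairs(PHRASES)
```

Here the two PAPER phrases fall into one coordinate group, and both are paired as subordinates of OUTLINE OF PAPER, matching the description in the text.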
    <Paragraph position="3"> The subordinate relationship of the node B phrases to the node A phrase, and in turn, that of the node C phrase to the node B phrases is reflected by downward branches connecting those nodes.</Paragraph>
    <Paragraph position="4"> The resultant tree structure defines the representation of its corresponding topic.</Paragraph>
    <Paragraph position="5"> [Fig. 5 - Simple Phrase Dependency Structure] Representations of each of the user's topics are organized into a hierarchical data structure called the representational network. The representational network is composed of interconnected nodes, simple phrases, and words. When a description is mapped onto the network, the number of the associated text item is stored with the highest order node in the corresponding topic representation.</Paragraph>
    <Paragraph position="6"> Each node in the network may have up to four types of linkages: (1) pointers down to simple phrases that directly describe the node; (2) pointers down to subordinate nodes; (3) pointers up to superior nodes; and (4) pointers to textual materials associated with the node. Each simple phrase or single word is directly accessible as a unit in the network through hash coding procedures similar to those used in maintaining the AUTONOTE keyword index. Associated with each simple phrase are the linkages to the node(s) the phrase directly describes. In turn, each single word has associated pointers that lead the system to the simple phrases containing the word. Fig. 6 illustrates the network representation of the example.</Paragraph>
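A minimal in-memory sketch of these linkages follows. It is an assumption-laden illustration: plain Python dictionaries stand in for the hashed directories, and the class name, helper names, and item number 100 are hypothetical. Only the four linkage types and the example phrases come from the text.

```python
class Node:
    """One network node with the four linkage types listed above."""

    def __init__(self, label):
        self.label = label
        self.phrases = []       # (1) simple phrases directly describing the node
        self.subordinates = []  # (2) pointers down to subordinate nodes
        self.superiors = []     # (3) pointers up to superior nodes
        self.items = []         # (4) numbers of associated text items


def link(superior, subordinate):
    """Record a two-way linkage between a superior and a subordinate node."""
    superior.subordinates.append(subordinate)
    subordinate.superiors.append(superior)


def describe(node, phrase, phrase_index, word_index):
    """Attach a simple phrase to a node and index it by phrase and by word."""
    node.phrases.append(phrase)
    phrase_index.setdefault(phrase, []).append(node)
    for word in (phrase[0], phrase[2]):
        word_index.setdefault(word, set()).add(phrase)


phrase_index = {}  # simple phrase -> nodes it directly describes
word_index = {}    # word -> simple phrases containing the word

a, b, c = Node("A"), Node("B"), Node("C")
describe(a, ("OUTLINE", "OF", "PAPER"), phrase_index, word_index)
describe(b, ("PAPER", "ABOUT", "AUTONOTE2"), phrase_index, word_index)
describe(b, ("PAPER", "FOR", "CONFERENCE"), phrase_index, word_index)
describe(c, ("CONFERENCE", None, "ACM"), phrase_index, word_index)
link(a, b)
link(b, c)
a.items.append(100)  # hypothetical item number, stored at the highest order node
```

Starting from the single word PAPER, the word index leads to three simple phrases, and each phrase leads in turn to the node it directly describes.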
    <Paragraph position="7"> Once a topic is defined in the network, the user can refer to it using a word, a simple phrase, or a composition of simple phrases. For example, should the user later describe a new text item as, say, OUTLINE OF THE PAPER or OUTLINE OF THE PAPER ABOUT AUTONOTE2, the system will note that it already has a representation for the topic. The only change to the network in such cases is the addition of a new item reference linkage to the identified node (node 1). In general, the system attempts to relate each new item description to those it already "knows" about. For new topics, new nodes are allocated in the network. Should some subset of the simple phrases of a new description refer to an existing topic, the additional simple phrases are linked to that existing representation. For example, if in reference to the same paper the user describes another item as THE ABSTRACT OF THE PAPER ABOUT AUTONOTE2, the system would modify the network to that shown in Fig. 7.</Paragraph>
    <Paragraph position="8"> Design of the Network Data Structures. List structures. As noted above, the representational criteria dictate a hierarchical network-type organization, based upon dependency analyses of topic descriptions. List structures are particularly well suited for this kind of application. They provide a convenient representation for dependency trees and are especially appropriate for dealing with complex, evolving structures.</Paragraph>
    <Paragraph position="9"> In designing special purpose list structures for the representational network, we first specified the logical components of the structure and defined the interconnections among these primitives. Three logical components were formulated--simple phrases, nodes, and words. The following subsections present the major design considerations for each structural component.</Paragraph>
    <Paragraph position="10"> Simple phrases. Given our goal of communicative efficiency, we chose the simple phrase as a primary unit for the network. By analyzing a topic</Paragraph>
    <Paragraph position="12"> description into simple phrases we are in effect isolating possible "shorthand" references to the given topic. The representational data structures have been designed to allow a topic to be referenced through any of its component simple phrases.</Paragraph>
    <Paragraph position="13"> Simple phrases are formed from either adjectival or prepositional modification of a noun. Very often, an adjectival modification can be equivalently expressed by a prepositional phrase dependent upon the same noun (example: THE PAPER ABOUT AUTONOTE and THE AUTONOTE PAPER). In other instances the adjectival form may have multiple interpretations; THE SMITH ARTICLE could refer to an article by Smith or possibly an article about Smith. Some prepositions may be used synonymously in a particular context (THE PAPER ABOUT (ON) SHORT TERM MEMORY); others convey distinctly different meanings (THE MEMO TO THE COMMITTEE versus THE MEMO FROM THE COMMITTEE). We do not deal with these problems to the extent of providing a semantics for "understanding" natural language. However, the representational structure makes explicit the various possibilities, so that the system is able to generate plausible alternatives.</Paragraph>
    <Paragraph position="14"> We treat adjectival modification as a special case, as if the modifier and subject noun were related by an unspecified preposition. In terms of the data structure design, all simple phrases composed of the same two words are mapped into a larger unit, each subunit of which represents a particular instance of a simple phrase in a topic description. This arrangement assures that all information on simple phrases involving any two words is accessible collectively. This information will then be at hand to provide a basis for interpreting the alternative referents of each incoming simple phrase. For example, should the user make reference to THE SMITH PAPER and the system finds only PAPER ABOUT SMITH in the network, then that single alternative is chosen. On the other hand, if PAPER BY SMITH also is present, the system considers both possibilities.</Paragraph>
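The THE SMITH PAPER example can be sketched as a lookup on the two-word unit. The dictionary keyed by (modifier, subject) and the function name are assumptions made for illustration; they stand in for the larger unit into which all simple phrases built from the same two words are mapped.

```python
def interpretations(phrase_directory, modifier, subject):
    """Return the prepositions already stored for a modifier/subject pair.

    A single result resolves an adjectival reference at once; several
    results mean that every alternative has to be considered.
    """
    return sorted(phrase_directory.get((modifier, subject), set()))


# Only PAPER ABOUT SMITH is in the network.
directory = {("SMITH", "PAPER"): {"ABOUT"}}
only_about = interpretations(directory, "SMITH", "PAPER")

# PAPER BY SMITH is added later, so THE SMITH PAPER has two readings.
directory[("SMITH", "PAPER")].add("BY")
both = interpretations(directory, "SMITH", "PAPER")
```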
    <Paragraph position="15"> Network nodes.</Paragraph>
    <Paragraph position="16"> The next structural component of the representational network is the node. A node groups together a set of simple phrases that comprise the description of the node. The node also functions as a collector of item references. Each node in the representational network corresponds to a topic or concept pertinent to the items of textual material associated with it.</Paragraph>
    <Paragraph position="17"> The simple phrases that directly describe a node define the corresponding concept.</Paragraph>
    <Paragraph position="18"> Any given node may be linked to more general (lower) nodes, or to more specific (higher) nodes. For example, a node that represents a particular paper may be linked downward to another that describes a conference at which the paper was presented; it may also be linked to several higher order nodes corresponding to, say, a summary, an outline, and a review of the paper. As more and more items are described, additional topics may be tied into the same conference node. The ultimate result will be a highly interconnected set of concept nodes, each with its own set of associated textual materials.</Paragraph>
    <Paragraph position="19"> To achieve this kind of structural organization for the network, we make use of the dependency relationships in the user's descriptions: each node level corresponds to a syntactic dependency level. In terms of the example above, the adjectives and prepositional phrases modifying the noun "paper" are formed into simple phrases that will directly describe a common node. Simple phrases identifying the conference will describe a subordinate node due to the syntactic dependency of "conference" upon "paper" in a phrase of the form, PAPER AT THE...CONFERENCE. Superior nodes are assigned to the outline, the summary, and the review, reflecting the dependence of "paper" upon those nouns in appropriate descriptions.</Paragraph>
    <Paragraph position="20"> A node may be viewed as a collection of pointers to simple phrases, other nodes, and text items. All node linkages are two-way. Pointers down from a node to its simple phrases are required in order to reconstruct a description of the node. Pointers down to subordinate nodes are necessary for the same reason. Both upward and downward pointers to other nodes provide a means for moving from any topic to structurally related ones. Associated with each instance of a simple phrase is a pointer to the node where item references are stored. Finally, bookkeeping information stored with each text item includes pointers to each topic node with which the item is associated. Item-node linkages enable the system to provide the user with topic descriptions of any text item.</Paragraph>
    <Paragraph position="21"> Words. The representational structures considered thus far provide simple phrases as the sole means for accessing the nodes in the network. A less restrictive access mechanism also is required, for several important reasons. First, it would be unrealistic to assume that the user will always phrase references to a particular topic in exactly the same way. Second, single word descriptions play an important role in achieving our goal of communicative efficiency. Since we anticipate that users will make frequent use of single word references when working in the context of a particular topic, we want to provide a natural and convenient treatment of such descriptions. Finally, a phrasal description can convey a higher order categorization of an existing topic without containing a simple phrase for that topic. For example, THE</Paragraph>
  </Section>
  <Section position="14" start_page="2" end_page="2" type="metho">
    <SectionTitle>
REVIEWER'S COMMENTS ON THE PAPER may reference a paper mentioned earlier; yet
</SectionTitle>
    <Paragraph position="0"> it contains no simple phrases describing that paper.</Paragraph>
    <Paragraph position="1"> These considerations lead us to the third logical component of the network data structures, the single word. Essentially, each component word provides access to a series of pointers to simple phrases in which the word occurs.</Paragraph>
    <Paragraph position="2"> Word-to-phrase pointers are of two types: those indicating usage as subject noun; and those indicating modifier usage in a particular simple phrase. As we shall see later, this distinction is required in order to relate new simple phrases to existing topics at an appropriate node level.</Paragraph>
    <Paragraph position="3"> Having specified the three logical components and the linkages in the representational network, we now turn our attention to the storage implementation of these structures.</Paragraph>
    <Paragraph position="4"> Storage Implementation of the Network Structure. There are three directories needed to maintain the representational network, one for each of the components of the structure. All directory information must, of course, be saved in permanent storage between AUTONOTE2 sessions. Two design alternatives were considered for maintaining the network during execution of the program. The directories could be accessed and updated on disk, or they could be brought into core storage for the duration of the session. We adopted the former strategy for a number of reasons.</Paragraph>
    <Paragraph position="5"> First, AUTONOTE is highly oriented toward the use of disk file storage.</Paragraph>
    <Paragraph position="6"> Several file interface routines were available at the outset for conveniently storing and accessing information through the MTS file system. Second, as the network grows in complexity, it becomes increasingly unlikely that the user will reference the major portion of the network during any given session. By maintaining the network in disk files, the amount of core storage required is substantially reduced. Finally, the file approach greatly simplified the programming effort, especially in those system components that operate recursively on the list structured network. We will elaborate on this point further in section VI, which illustrates the simplification of recursive processes in AUTONOTE2.</Paragraph>
    <Paragraph position="7"> Rather than store all the directories in a single disk file, we chose to maintain each directory separately. This strategy preserves the logical distinction among the three types of directory information, and has also simplified the programming of the system. We now describe the organization of each of the directory files.</Paragraph>
    <Paragraph position="8"> The node directory. Each node in the representational network has a corresponding integral node number which is also the line number in the node directory file. As new node numbers are needed to represent new topics, the next sequentially numbered line in the node directory is assigned as the node number. Each node directory line contains four fields--one for bookkeeping information and three fields for the upward, downward, and item reference pointers for the node. The item reference region contains a list of integer item numbers. The upward pointer region also contains a list of integers that represent immediate linkages to superior nodes. The two types of downward pointers (to nodes and to phrases) are stored in a common region. Each node, simple phrase, and single word has a corresponding file line number in its respective directory file. In the case of nodes, the line number is simply the node number. In the case of words and simple phrases, the line number is the result of a hash coding process on a compact character representation of the word or phrase. Thus a "pointer" is actually a file line number. Downward pointers to nodes and phrases are distinguishable in the node directory on the basis of the magnitude of the line number.</Paragraph>
    <Paragraph position="9"> Since each of the three pointer fields is of fixed length, there is a maximum number of each type of pointer for a given node directory line.</Paragraph>
    <Paragraph position="10"> Each field consequently has an associated continuation pointer to a line where additional pointers are stored if necessary.</Paragraph>
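The continuation mechanism can be sketched as follows. This is a simplified illustration under stated assumptions: each directory line is reduced to a single pointer field (the actual node directory line has three), the field capacity of three pointers is hypothetical since the source gives no field length, and a Python list stands in for the disk file.

```python
CAPACITY = 3  # hypothetical field length; the source does not state one


def new_line():
    """An empty directory line: one pointer field plus its continuation pointer."""
    return {"pointers": [], "continuation": None}


def add_pointer(lines, line_no, pointer):
    """Store a pointer in the field on line_no, chaining to a continuation
    line whenever the fixed-length field is already full."""
    field = lines[line_no]
    while len(field["pointers"]) >= CAPACITY:
        if field["continuation"] is None:
            lines.append(new_line())
            field["continuation"] = len(lines) - 1
        field = lines[field["continuation"]]
    field["pointers"].append(pointer)


def all_pointers(lines, line_no):
    """Collect every pointer for a node by following continuation lines."""
    collected = []
    while line_no is not None:
        collected.extend(lines[line_no]["pointers"])
        line_no = lines[line_no]["continuation"]
    return collected


lines = [new_line()]  # line 0 plays the role of one node's directory line
for item_number in range(7):
    add_pointer(lines, 0, item_number)
```

Seven pointers with a capacity of three occupy the original line and two continuation lines, and reading them back follows the chain in order.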
    <Paragraph position="11"> The phrase directory.</Paragraph>
    <Paragraph position="12"> To locate the phrase directory line for a particular simple phrase, a hash coding function is applied to the character string formed by concatenating the modifier word, a slash, and the subject word. For example, the directory line for the simple phrase PAPER ABOUT AUTONOTE is the hash code for the string "AUTONOTE/PAPER." Since the hashing function operates only on the modifier and subject word, simple phrases formed from the same two words, but with differing (or no) prepositions, are mapped into the same directory line number.</Paragraph>
    <Paragraph position="13"> To distinguish among the various instances of the same two-word combination, the directory line for simple phrases consists of a series of pointer blocks. Each pointer block contains a code for the particular preposition used, some additional bookkeeping information, and a pointer to the node directly described by that occurrence of the simple phrase.</Paragraph>
    <Paragraph position="14"> The word directory. The word directory incorporates the same pointer block principle as the phrase directory. The pointer field of the block in this case is a pointer into the phrase directory. The preposition code field contains a binary flag indicating whether the particular word occurs as the subject noun or modifier word in the simple phrase specified by the pointer. Like phrases, each word directory line is accessed through an efficient hash coding algorithm.</Paragraph>
    <Paragraph position="15"> The word directory also maintains preposition usage information for each word. For example, the entry for MEMO may indicate that the word has occurred with the prepositions ON, ABOUT, TO, FROM, etc. This information is used to guide the parsing of descriptions.</Paragraph>
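The phrase directory lookup can be sketched as below. The source does not name its hashing function, so CRC-32 from the Python standard library is swapped in purely for illustration; the table size, the block layout, and the node numbers 1 and 2 are likewise hypothetical, and hash collisions between different word pairs are ignored.

```python
import zlib

TABLE_SIZE = 1009  # hypothetical number of lines in the phrase directory


def phrase_key(subject, modifier):
    """Form the hashed string: modifier word, slash, subject word."""
    return modifier + "/" + subject


def phrase_line(subject, modifier):
    """Map a simple phrase to a directory line; the preposition plays no part."""
    key = phrase_key(subject, modifier)
    return zlib.crc32(key.encode("ascii")) % TABLE_SIZE


directory = {}  # line number -> series of pointer blocks


def store(subject, preposition, modifier, node):
    """Add one pointer block (preposition code, node pointer) for a phrase."""
    block = {"preposition": preposition, "node": node}
    directory.setdefault(phrase_line(subject, modifier), []).append(block)


store("PAPER", "ABOUT", "AUTONOTE", 1)  # PAPER ABOUT AUTONOTE
store("PAPER", None, "AUTONOTE", 2)     # THE AUTONOTE PAPER, no preposition
blocks = directory[phrase_line("PAPER", "AUTONOTE")]
```

Both occurrences land on the same directory line and are told apart only by the preposition code in their pointer blocks.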
    <Paragraph position="16"> The organization of the three network directories is depicted in Fig. 8.</Paragraph>
    <Paragraph position="17"> The Representational Network: An Example. To help fix ideas, we now present a more detailed example that illustrates the structure of the representational network. Suppose the user describes Items 157, 158, and 159 as THE PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE. He enters materials on the organization of that paper into Items 201 and 202. A summary of the paper is placed in Item 230. The user also describes Item 270 as SMITH'S PAPER, and enters a summary of that paper into Item 312. A pictorial representation of the resultant portion of the network is given in Fig. 9, while the corresponding directory contents appear in Fig. 10. For simplicity, the simple phrase hash codes are represented by the alphabetic characters U through Z. (In subsequent diagrams, we also omit word-to-phrase linkages for simplicity.)  This section outlines our general approach to parsing topic descriptions. The parsing of prepositional phrases, consecutive modifiers, and possessive modifiers is considered.</Paragraph>
    <Paragraph position="18"> Prepositional phrases. Despite the apparent simplicity of the description language there are several nontrivial parsing problems.</Paragraph>
    <Paragraph position="19"> One of these is the difficulty in determining the noun referent of prepositional phrases. The determination of noun referents is partially a semantic problem rather than a purely syntactic one. Consider the following two descriptions:  For single words, there is one block for each phrase containing the word. For phrases, there is one block for each node that the phrase directly describes.</Paragraph>
    <Paragraph position="20">  In the first example, both prepositional phrases refer to the immediately preceding noun. In the second case, both refer back to the noun MEMO at the beginning of the string. Although neither of these examples is intuitively ambiguous, the parsing algorithm must consider each preceding noun as a possible referent of any given prepositional phrase. The AUTONOTE2 parser deals with this problem to a limited extent, by utilizing prepositional clues. For example, if the system finds that the noun MEMO can form a simple phrase with the prepositions ON, ABOUT, TO, and FROM, then phrases introduced by these prepositions will be associated with that noun. Such clues will not always yield a unique parsing, of course, as in the case of inherently ambiguous descriptions. THE PAPER FOR THE CONFERENCE ON GENETICS, for example, could refer to a paper on genetics to be delivered at a conference, or to a paper which is to be delivered at a conference on genetics.</Paragraph>
    <Paragraph position="21"> In such instances we rely upon the user to supply the referent noun upon request. In the example above, the system may prompt: DOES "ABOUT GENETICS" REFER TO PAPER OR CONFERENCE? Should the user reply CONFERENCE, the simple phrase CONFERENCE ON GENETICS will be added to the network.</Paragraph>
    <Paragraph position="22"> If at some later time, the parser is attempting to find a referent for the prepositional phrase ON GENETICS where CONFERENCE is one of the alternatives, it forms that simple phrase directly.</Paragraph>
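The use of prepositional clues can be sketched as follows. The function name, the argument layout, and the order in which the two kinds of evidence are consulted are assumptions made for illustration; the source states only that stored simple phrases and per-word preposition usage guide the choice, and that the user is asked when ambiguity remains.

```python
def referent_candidates(preceding_nouns, preposition, obj, usage, known_phrases):
    """Return the nouns a prepositional phrase may attach to.

    A single candidate settles the parse; several candidates mean the
    user has to be asked which noun is the referent.
    """
    # A simple phrase already in the network is formed directly.
    for noun in preceding_nouns:
        if (noun, preposition, obj) in known_phrases:
            return [noun]
    # Otherwise prefer nouns previously seen with this preposition.
    clued = [n for n in preceding_nouns if preposition in usage.get(n, set())]
    # With no clue at all, every preceding noun remains a possibility.
    return clued or list(preceding_nouns)


usage = {"MEMO": {"ON", "ABOUT", "TO", "FROM"}}
known = set()

memo = referent_candidates(["MEMO", "COMMITTEE"], "FROM", "SMITH", usage, known)
ambiguous = referent_candidates(["PAPER", "CONFERENCE"], "ON", "GENETICS", usage, known)

# After the user answers CONFERENCE, the phrase is stored and reused later.
known.add(("CONFERENCE", "ON", "GENETICS"))
resolved = referent_candidates(["PAPER", "CONFERENCE"], "ON", "GENETICS", usage, known)
```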
    <Paragraph position="23"> Consecutive modifiers. A parallel problem arises in determining the noun referents for a string of consecutive modifiers.</Paragraph>
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
Descriptions
</SectionTitle>
      <Paragraph position="0"> containing at most a single adjective for any particular noun are parsed in the obvious manner.</Paragraph>
      <Paragraph position="1"> A simple phrase is formed from each modifier and the noun following it. In the event a noun is preceded by two or more modifiers, the parser is confronted with a task similar to that of determining the referent of a prepositional phrase.</Paragraph>
      <Paragraph position="2"> The modifier occurring immediately before the noun is first processed as above.</Paragraph>
      <Paragraph position="3"> Each of the remaining modifiers, however, can modify any one of several words depending upon their "distance" from the head noun. Specifically, any such modifier can refer to either the head noun or any of the other modifiers following it.</Paragraph>
      <Paragraph position="4"> Consider the descriptions: A summary of p I A summary of personal inform In both of the cases above, INFORMATION modifies the modifier RETRIEVAL which in turn modifies the head noun SYSTEMS. Depending upon the user's intent, PERSONAL can modify either INFORMATION or SYSTEMS. The choice of modifier referents is an especially important problem when there are multiple parsings, each resulting in a different semantic interpretation. For example, LARGE COMPUTER CONFERENCE could refer to a conference on large computers, or a large conference on computers. Another important reason for our emphasis upon correctly identifying modifier referents concerns the use of paraphrasing. In the example, PERSONAL INFORMATION RETRIEVAL SYSTEMS, if we determine that INFORMATION modifies RETRIEVAL and PERSONAL modifies SYSTEMS, then the resultant topic can be paraphrased as (1) PERSONAL SYSTEMS FOR INFORMATION RETRIEVAL, or (2) PERSONAL SYSTEMS FOR THE RETRIEVAL OF INFORMATION. Depending upon context and the nature of other topics in the network, the following incomplete descriptions will in most cases identify the topic:</Paragraph>
    </Section>
  </Section>
  <Section position="15" start_page="2" end_page="2" type="metho">
    <SectionTitle>
1. SYSTEMS
2. SYSTEMS FOR RETRIEVAL (or RETRIEVAL SYSTEMS)
3. PERSONAL SYSTEMS
4. PERSONAL SYSTEMS FOR RETRIEVAL (or PERSONAL RETRIEVAL SYSTEMS)
5. SYSTEMS FOR INFORMATION RETRIEVAL
6. SYSTEMS FOR RETRIEVAL OF INFORMATION
</SectionTitle>
    <Paragraph position="0"> A different choice of modifier referents determines a correspondingly different set of paraphrases. If PERSONAL was intended to modify INFORMATION, we would have the paraphrase SYSTEMS FOR THE RETRIEVAL OF PERSONAL INFORMATION, with a corresponding list of incomplete references to the topic.</Paragraph>
    <Paragraph position="1"> As in the prepositional case, the choice of modifier referents is guided by the current state of the representational network. After processing the last modifier in the string, the parser positions itself at the preceding modifier and moves left in the input string until the first word in the modifier string is processed. In the above example, after associating RETRIEVAL with SYSTEMS, the parser next examines the modifier INFORMATION. A list of simple phrase candidates is formed. In this case, the list contains</Paragraph>
  </Section>
  <Section position="16" start_page="2" end_page="3" type="metho">
    <SectionTitle>
INFORMATION RETRIEVAL and INFORMATION SYSTEMS. If neither of the candidate phrases
</SectionTitle>
    <Paragraph position="0"> has been previously used, the system queries: WHAT DOES INFORMATION MODIFY? The user's reply is matched against the candidate referents and the appropriate simple phrase is formed.</Paragraph>
    <Paragraph position="1"> Possessive adjectives. Possessives are processed in much the same way as normal modifiers.</Paragraph>
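The right-to-left treatment of a modifier string can be sketched as below. The function names and the ask callback, which stands in for the WHAT DOES ... MODIFY? query, are assumptions made for illustration. The source does not say what happens when more than one candidate phrase has been used before; this sketch assumes the user is asked in that case as well.

```python
def modifier_candidates(words):
    """words holds the modifier string followed by the head noun.

    Returns (modifier, possible referents) pairs in processing order:
    the modifier next to the noun first, then leftward through the string.
    """
    pairs = [(words[-2], [words[-1]])]
    for i in range(len(words) - 3, -1, -1):
        pairs.append((words[i], words[i + 1:]))
    return pairs


def choose_referent(modifier, referents, known_pairs, ask):
    """Use the network when it singles out one referent, else ask the user."""
    if len(referents) == 1:
        return referents[0]
    used = [r for r in referents if (modifier, r) in known_pairs]
    if len(used) == 1:
        return used[0]
    return ask(modifier, used or referents)


def parse_modifiers(words, known_pairs, ask):
    """Pair every modifier in the string with the word it modifies."""
    return [(m, choose_referent(m, refs, known_pairs, ask))
            for m, refs in modifier_candidates(words)]


queried = []


def ask(modifier, options):
    """Stand-in for the user's reply; here it always picks the head noun."""
    queried.append(modifier)
    return options[-1]


words = ["PERSONAL", "INFORMATION", "RETRIEVAL", "SYSTEMS"]
known = {("INFORMATION", "RETRIEVAL")}  # INFORMATION RETRIEVAL was used before
parsed = parse_modifiers(words, known, ask)
```

With INFORMATION RETRIEVAL already in the network, only PERSONAL triggers a query, which mirrors the behaviour described in the text.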
    <Paragraph position="2"> The system recognizes the 's word stem and marks the root word as a possessive.</Paragraph>
    <Paragraph position="3"> The root word is later stored in the network directories along with a possessive flag.</Paragraph>
    <Paragraph position="4"> Thus the phrase SMITH'S PAPER is stored internally as SMITH/PAPER (possessive).</Paragraph>
    <Paragraph position="5"> The removal of the stem ensures that a subsequent simple phrase incorporating a preposition (PAPER BY SMITH) will hash to the same directory line, thus allowing the use of either prepositional or possessive forms in referencing topics.</Paragraph>
    <Paragraph position="6"> A particularly interesting case arises when a possessive occurs in a string of consecutive modifiers as in SMITH'S LATEST MEMORY EXPERIMENT. The string is first processed as described above; that is, a check is made to see if SMITH has been used in a simple phrase with LATEST, MEMORY, or EXPERIMENT. In the event that this yields no clues, the system then checks to see if SMITH was rendered as a possessive. Upon noting that it was, the parser carries out a heuristic that assumes that the possessive modifies the head noun, EXPERIMENT. The possessive heuristic can be fully stated as follows. A possessive occurring in a string of modifiers will be assumed to modify the head noun unless another possessive occurs between it and the head noun. In the latter case, the first possessive will be assumed to modify the second. This is similar to the possessive feature employed by the REL parser (Dostert &amp; Thompson, 1971).</Paragraph>
    <Paragraph position="7"> Thus in SMITH'S RESEARCH GROUP'S MEMORY EXPERIMENT, SMITH'S is assumed to modify GROUP, and GROUP'S is assumed to modify the head noun EXPERIMENT. The question now arises, why check the phrase directory first instead of applying the possessive heuristic immediately? To answer this, suppose a topic was originally described as THE RESULTS OF THE MEMORY EXPERIMENT BY SMITH and the user now attempts to refer to it as SMITH'S MEMORY EXPERIMENT RESULTS. If the possessive heuristic were applied immediately, the system would incorrectly form the simple phrase SMITH'S RESULTS, not SMITH'S EXPERIMENT. By checking the network first, the simple phrase EXPERIMENT BY SMITH will be detected and the system will parse the description appropriately.</Paragraph>
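The possessive heuristic stated above can be sketched as follows. This is only an illustration: the function name and the (word, flag) input format are assumptions, and the prior check of the phrase directory is deliberately left out.

```python
def possessive_referents(modifiers, head):
    """Apply the possessive heuristic to a modifier string.

    modifiers: list of (word, is_possessive) pairs, left to right.
    head: the head noun of the string.
    Returns a dict mapping each possessive to the word it modifies.
    """
    referents = {}
    for i, (word, is_possessive) in enumerate(modifiers):
        if not is_possessive:
            continue
        # A later possessive, if any, intervenes before the head noun.
        later = [w for w, p in modifiers[i + 1:] if p]
        referents[word] = later[0] if later else head
    return referents


# SMITH'S RESEARCH GROUP'S MEMORY EXPERIMENT
example = possessive_referents(
    [("SMITH", True), ("RESEARCH", False), ("GROUP", True), ("MEMORY", False)],
    "EXPERIMENT",
)
```

In the system described here this step would run only after the directory check has yielded no clues, which is what protects a case such as SMITH'S MEMORY EXPERIMENT RESULTS.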
    <Paragraph position="8"> Implementation of the Parser The ultimate goal of the parser is to determine the simple phrases of a topic description. The parsing algorithm is implemented as a two stage process. The first stage is a preliminary scan to ascertain that the string is in a form acceptable for analysis. The description is segmented into an ordered list of words, each of which is marked as either WORD, POSSESSIVE, ARTICLE, or PREPOSITION. The parser makes no distinction between nouns and modifiers until completing the scan. At this point, the last in a series of consecutive WORDS is marked as a NOUN; the preceding words are marked as MODIFIERS. Possessive modifiers are an exception as they can be recognized explicitly during the scan. A record of article usage is also kept, but the articles themselves are not placed on the word list.</Paragraph>
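A minimal sketch of the word-marking part of the preliminary scan follows. The article and preposition lists are small assumed samples, the acceptance test of the scanner is not modeled, and treating a word that precedes a possessive as a modifier is an assumption the text does not settle.

```python
ARTICLES = {"THE", "A", "AN"}
PREPOSITIONS = {"OF", "FOR", "ABOUT", "ON", "BY", "IN"}   # assumed sample


def scan(description):
    """Return (word list, article count) for a topic description."""
    words, article_count = [], 0
    for token in description.upper().split():
        if token in ARTICLES:
            article_count += 1                 # recorded, not put on the list
        elif token in PREPOSITIONS:
            words.append([token, "PREPOSITION"])
        elif token.endswith("'S"):
            words.append([token[:-2], "POSSESSIVE"])
        else:
            words.append([token, "WORD"])
    # After the scan: the last WORD of each run is the NOUN, the rest MODIFIERs.
    for i, entry in enumerate(words):
        if entry[1] != "WORD":
            continue
        run_continues = (i + 1 != len(words)
                         and words[i + 1][1] in ("WORD", "POSSESSIVE"))
        entry[1] = "MODIFIER" if run_continues else "NOUN"
    return [tuple(e) for e in words], article_count


word_list, articles = scan("THE PLANNED PAPER ABOUT AUTONOTE FOR THE CONFERENCE")
```

The noun and modifier marks are assigned in a second pass, mirroring the statement that the parser makes no such distinction until the scan is complete.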
    <Paragraph position="9"> The preliminary scan of the description can be viewed as a simple finite state process. Of course, to be completely formal, the recognizer would have to examine each input character. For convenience we will assume a five state automaton with inputs: WORD, POSSESSIVE, PREPOSITION, and ARTICLE. The state transition graph for the machine is given in Fig. 11. The machine starts in state S0, examines the next input and moves to a new state. If at the end of the input string, the machine is in state S called the final state, the in- or articles, an article between two words, a phrase beginning with a preposition, etc.</Paragraph>
    <Paragraph position="10"> The state transitions for the description BRUNER'S FIRST EXPERIMENT ON THE  Descriptions found acceptable by the scanner next undergo analysis by the second stage procedure.</Paragraph>
    <Paragraph position="11"> This algorithm steps through the word list and builds a table of simple phrases called the phrase table.</Paragraph>
    <Paragraph position="12"> Each entry in the phrase table includes (1) the internal character representation of the phrase for use in hash coding, (2) a numerical code for the preposition used, (3) the hash code (directory line number) for the phrase, (4) a list of nodes directly described by the phrase, and (5) a coordinate or subordinate link to another phrase table entry.</Paragraph>
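The five fields of a phrase table entry can be pictured as a small record. The sketch below is an assumed layout, not the original storage format; the field names, the preposition codes and the directory line numbers are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PhraseEntry:
    text: str                        # (1) internal character representation
    preposition_code: int            # (2) code for the preposition (0 = none, assumed)
    directory_line: int              # (3) hash code of the phrase
    nodes: List[int] = field(default_factory=list)   # (4) nodes directly described
    link_kind: Optional[str] = None  # (5) "coordinate" or "subordinate"
    link_to: Optional[int] = None    #     index of the linked table entry


# A hypothetical table for THE PLANNED PAPER ABOUT AUTONOTE FOR THE CONFERENCE.
table = [
    PhraseEntry("planned/paper", 0, 17),
    PhraseEntry("autonote/paper", 1, 42, link_kind="coordinate", link_to=0),
    PhraseEntry("conference/paper", 2, 58, link_kind="coordinate", link_to=1),
]
```

The node lists start empty here because, for a first description, no phrase yet describes any node in the network.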
    <Paragraph position="13"> We now illustrate the construction of the phrase table by following through several examples.</Paragraph>
    <Paragraph position="14"> The parser in operation. Let us assume that a user is running the system for the first time; consequently, the three network directories are initially empty. Item No. 1 is opened, some text is inserted, and the user describes it as THE PLANNED PAPER ABOUT AUTONOTE FOR THE CONFERENCE. The description successfully passes the preliminary scan and the word list is constructed. The parser then moves on to determine the simple phrases.</Paragraph>
    <Paragraph position="15"> The modifier PLANNED is first noted. Since it is followed immediately by a noun, the simple phrase PLANNED PAPER becomes the first entry in the phrase table. Next the prepositional phrase ABOUT AUTONOTE is encountered. Again there is only one possible noun referent. The phrase PAPER ABOUT AUTONOTE is entered into the table and marked as coordinate with the first entry. To determine the referent of FOR CONFERENCE, the system must consider two alternatives: AUTONOTE FOR CONFERENCE and PAPER FOR CONFERENCE.</Paragraph>
    <Paragraph position="16"> The network is interrogated to determine if either of the candidate phrases has been previously used. This test fails since the network is empty at this point.</Paragraph>
    <Paragraph position="17"> A check is then made in the word directory to determine if either AUTONOTE or PAPER has headed a simple phrase with the preposition FOR. This also fails so the system asks the user: DOES &quot;FOR CONFERENCE&quot; REFER TO PAPER? A yes response results in the addition of PAPER FOR CONFERENCE to the phrase table. Since the noun referent, PAPER, is the same as the previous phrase, the new entry is marked as coordinate with PAPER ABOUT AUTONOTE. The completed phrase table is given in Fig. 12.</Paragraph>
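The three-step choice of a prepositional referent traced above can be sketched as follows. The function and argument names are hypothetical, the two directories are reduced to plain sets, and returning the first matching candidate is an assumption, since the text does not say what happens when several candidates pass a test.

```python
def choose_referent(prep, obj, candidates, known_phrases, heads_with_prep, ask):
    """Pick the noun that a prepositional phrase refers to.

    candidates: possible head nouns, e.g. ["AUTONOTE", "PAPER"].
    known_phrases: set of (noun, prep, obj) simple phrases already used.
    heads_with_prep: set of (noun, prep) pairs seen in the word directory.
    ask: callback that queries the user and returns the chosen noun.
    """
    # 1. Has one of the candidate simple phrases been used before?
    for noun in candidates:
        if (noun, prep, obj) in known_phrases:
            return noun
    # 2. Has a candidate noun headed a simple phrase with this preposition?
    for noun in candidates:
        if (noun, prep) in heads_with_prep:
            return noun
    # 3. Otherwise the user is queried.
    return ask(prep, obj, candidates)


queries = []

def ask_user(prep, obj, candidates):
    queries.append((prep, obj))      # stands in for the console query
    return "PAPER"

first = choose_referent("FOR", "CONFERENCE", ["AUTONOTE", "PAPER"],
                        set(), set(), ask_user)
```

With empty directories the user is queried, as in the first session; once PAPER FOR CONFERENCE is on record the same call returns PAPER without a query.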
    <Paragraph position="18"> Fig. 12 - Sample Phrase Table: (1) autonote/paper, (2) planned/paper, (3) conference/paper. The phrase table is next passed to the network locator. We will assume it determines that the user is defining a new topic. Using the syntactic dependencies in the phrase table, the network locator assigns new node numbers to the phrases in the description. In this case, all three phrases are coordinate; each will directly describe node No. 1 in the network. In addition, a reference to Item No. 1 is stored with the topic node (see Fig. 13). The user next enters Text Item No. 2 describing it as THE ORGANIZATION OF</Paragraph>
  </Section>
  <Section position="17" start_page="3" end_page="3" type="metho">
    <SectionTitle>
THE PAPER FOR THE ACM CONFERENCE. The system proceeds as before until it
</SectionTitle>
    <Paragraph position="0"> encounters the prepositional phrase FOR THE CONFERENCE. It forms the two alternatives PAPER FOR CONFERENCE and ORGANIZATION FOR CONFERENCE. Upon interrogating the network, it finds that PAPER FOR CONFERENCE has been defined previously and accepts that candidate.</Paragraph>
    <Paragraph position="1"> ACM CONFERENCE is added to the phrase table and the parsing is complete (Fig.</Paragraph>
    <Paragraph position="2">  The network locator must then determine if the user is referring to the same paper or a new one. The operation of the network locator will be discussed in detail in the next chapter. Let us assume for now that the current description is indeed a reference to the same paper.</Paragraph>
    <Paragraph position="3"> The simple phrase PAPER FOR CONFERENCE is already in the network.</Paragraph>
    <Paragraph position="4"> The system must then decide what to do with THE ORGANIZATION OF PAPER, and with ACM CONFERENCE. Since the former is superior to the node 1 phrase, it is assigned to node 2 and a downward pointer from node 2 to node 1 is added. The phrase ACM CONFERENCE, on the other hand, is subordinate to a node 1 phrase. Thus it is assigned a new node number (node 3) and a pointer up from node 3 to node 1 is added. Phrase-to-node and node-to-node pointers are two way, thus corresponding pointers down from node 1 to node 3, and up from node 1 to node 2 are also added. The resultant network is illustrated in Fig. 15. This example points out an interesting feature of the AUTONOTE2 system. Although Item No. 1 was originally described as a paper for some unspecified conference, a subsequent reference to that paper has enriched its description.</Paragraph>
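The augmentation just described can be mirrored in a small sketch. The class and its dictionaries are assumptions made for illustration; only the node numbers, the phrases and the direction of the pointers come from the example.

```python
class Network:
    """Toy representational network with two-way node pointers."""

    def __init__(self):
        self.up = {}        # node -> set of nodes above it
        self.down = {}      # node -> set of nodes below it
        self.phrases = {}   # node -> simple phrases directly describing it

    def add_node(self, node, *phrases):
        self.up.setdefault(node, set())
        self.down.setdefault(node, set())
        self.phrases.setdefault(node, set()).update(phrases)

    def link(self, upper, lower):
        # Pointers are two way: one down from upper, one up from lower.
        self.down[upper].add(lower)
        self.up[lower].add(upper)


net = Network()
net.add_node(1, "PLANNED PAPER", "PAPER ABOUT AUTONOTE", "PAPER FOR CONFERENCE")
net.add_node(2, "ORGANIZATION OF PAPER")   # superior to the node 1 phrase
net.add_node(3, "ACM CONFERENCE")          # subordinate to a node 1 phrase
net.link(2, 1)
net.link(1, 3)
```

Storing both directions at link time is what lets later procedures climb from a phrase to more specific topics as easily as they descend to reconstruct a description.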
    <Paragraph position="5"> V. THE NETWORK LOCATOR The purpose of the network locator is to determine whether the user's description makes reference to an existing topic in the representational network. Its decision is based on the information in the phrase table and the current state of the network. Once the decision is made, the locator builds a table, called the links table, that specifies the changes to be made in the network to represent the description.</Paragraph>
    <Paragraph position="6"> In cases where the input description matches exactly some structure in the network, the links table will specify only the addition of an item reference. When the description defines a new topic, every phrase in the phrase table will be assigned a new node and links entries will be made for the proper node-node linkages.</Paragraph>
    <Paragraph position="7"> Fig. 15 - Network after Augmentation. In order to describe more precisely the operation of the network locator, let us assume the network has evolved to the state depicted in Fig. 16. Note that by starting at any node and tracing downward through the network, it is possible to reconstruct the description of the topic the node represents. The nodes in the network represent the following topics.</Paragraph>
    <Paragraph position="8">  Node 1. THE PLANNED PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE.</Paragraph>
    <Paragraph position="9"> Node 2. ORGANIZATION OF THE PLANNED PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE.</Paragraph>
    <Paragraph position="10"> Node 3. THE ACM CONFERENCE.</Paragraph>
    <Paragraph position="11"> Node 4. AN ABSTRACT OF THE FIRST PAPER ABOUT AUTONOTE.</Paragraph>
    <Paragraph position="12"> Node 5. THE FIRST PAPER ABOUT AUTONOTE.</Paragraph>
    <Paragraph position="13"> Node 6. THE REVIEWER'S COMMENTS ON THE PLANNED PAPER... Node 7. THE PROCEEDINGS OF THE ACM CONFERENCE.</Paragraph>
    <Paragraph position="14"> Node 8. TRAVEL ARRANGEMENTS FOR THE ACM CONFERENCE.</Paragraph>
    <Paragraph position="15">  To illustrate the network location procedures, we will now go through several subsequent references to topics already defined in the representation. Before passing the phrase table to the locator, the parser first checks to see if the description contains any active phrases, simple phrases that directly describe one or more nodes in the network. When the locator gets control, it checks an internal flag that indicates one of three conditions: the description contains one or more active phrases; the description contains no active phrases; or the description contains only a single word.</Paragraph>
    <Paragraph position="16"> As our first example, consider the subsequent item description:  the first of these, PAPER ABOUT AUTONOTE. From information in the phrase table, it sees that PAPER ABOUT AUTONOTE plays a role in two distinct topics represented by nodes 1 and 5. It then considers both of these alternatives, checking to see if the remaining phrases in the phrase table either directly or indirectly describe either of the two nodes. Since both phrases directly describe node 1, the locator assumes that it is the topic node of user reference. As an option, the user may request the locator to display its assumptions, in which case the system replies: I ASSUME YOU MEAN THE PLANNED PAPER ABOUT</Paragraph>
  </Section>
  <Section position="18" start_page="3" end_page="3" type="metho">
    <SectionTitle>
AUTONOTE FOR THE ACM CONFERENCE. Other than the addition of an item reference to
</SectionTitle>
    <Paragraph position="0"> node 1, no network changes are made in this example. Note that the user has efficiently made reference to the desired topic, relying on the system to fill in the gaps in his description. The system would proceed in much the same way in processing shorthand descriptions such as SUMMARY OF THE PAPER or REVIEWER'S</Paragraph>
  </Section>
  <Section position="19" start_page="3" end_page="3" type="metho">
    <SectionTitle>
COMMENTS ON THE PAPER, in each case assuming that the user is referring to the
</SectionTitle>
    <Paragraph position="0"> same paper about AUTONOTE. In previous discussion we have alluded to the use of contextual clues in deciding among the alternative referents of a vague or ambiguous description. Context in the AUTONOTE2 system takes the form of an access recency number (context number). Each time the user refers to some topic in the network, each of the component nodes is assigned the current context number. The current context number is incremented at the beginning of each AUTONOTE2 session and each time the user defines a new topic.</Paragraph>
    <Paragraph position="1"> Thus when deciding among alternative topic nodes, the system can readily determine which was referred to most recently. Another class of interesting cases are those in which the description consists of a single noun.</Paragraph>
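The context number mechanism lends itself to a short sketch. The class and method names are hypothetical, and returning nothing when the most recent candidates are tied is an assumption about how an unresolved choice would be signaled.

```python
class Context:
    """Toy access-recency bookkeeping for topic nodes."""

    def __init__(self):
        self.current = 0
        self.stamp = {}     # node -> context number of its latest reference

    def advance(self):
        # Called at the start of a session and when a new topic is defined.
        self.current += 1

    def refer(self, nodes):
        # Every component node of the referenced topic gets the current number.
        for node in nodes:
            self.stamp[node] = self.current

    def most_recent(self, candidates):
        newest = max(self.stamp.get(n, 0) for n in candidates)
        winners = [n for n in candidates if self.stamp.get(n, 0) == newest]
        return winners[0] if len(winners) == 1 else None


ctx = Context()
ctx.advance()
ctx.refer([5])          # an earlier topic
ctx.advance()
ctx.refer([1, 3])       # the topic referred to most recently
```

Here a choice between nodes 1 and 5 resolves to node 1, while nodes 1 and 3 carry the same number and cannot be separated by recency alone.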
    <Paragraph position="2"> If the current description is THE PAPER, for example, the system would use the word directory to locate those simple phrases where PAPER is the subject noun. Using the resultant list of simple phrases, a list of nodes directly described by these phrases is generated. In this case, this process generates two alternatives (node 5 and node 1). The system then functions as before, either choosing a node in context, or interrogating the user. The foregoing discussion has described our approach to network location. We now give a more detailed presentation of the algorithm. Case I: Active phrases in the description. Should the user's description contain one or more active phrases, there is a good possibility that it references an existing network topic. The first step in processing such a description is to determine the focus phrase, the active phrase at the highest dependency level. Note that the focus phrase may be subordinate to other (nonactive) phrases in the description. The basic idea is to use the nodes directly described by the focus phrase to get a set of candidate topics. Once these candidates are determined, they are matched against the remaining active phrases in the phrase table to determine the most likely referent.</Paragraph>
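A minimal sketch of this candidate-and-match idea follows. It assumes that a candidate's score is simply the number of active phrases describing it and that ties fall back on recency; the helper names and the lookup callback are hypothetical.

```python
def best_match(candidates, active_phrases, describes, recency):
    """Choose the candidate node that the description most likely references.

    describes(phrase, node): True if the phrase directly or indirectly
    describes the node.  recency: node -> context number.
    Returns (chosen node or None, table of match counts).
    """
    scores = {n: sum(1 for p in active_phrases if describes(p, n))
              for n in candidates}
    top = max(scores.values())
    tied = [n for n in candidates if scores[n] == top]
    if len(tied) == 1:
        return tied[0], scores
    # Equal counts: try to choose on the basis of context.
    newest = max(recency.get(n, 0) for n in tied)
    recent = [n for n in tied if recency.get(n, 0) == newest]
    return (recent[0] if len(recent) == 1 else None), scores


# PAPER ABOUT AUTONOTE describes nodes 1 and 5; PAPER FOR CONFERENCE only node 1.
facts = {("PAPER ABOUT AUTONOTE", 1), ("PAPER ABOUT AUTONOTE", 5),
         ("PAPER FOR CONFERENCE", 1)}
node, counts = best_match(
    [1, 5],
    ["PAPER ABOUT AUTONOTE", "PAPER FOR CONFERENCE"],
    lambda phrase, n: (phrase, n) in facts,
    {},
)
```

Returning the whole count table along with the winner leaves the caller free to query the user when no single node stands out.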
    <Paragraph position="3"> Before describing the matching process, let us first consider a few special cases. Suppose, for instance, that the focus phrase directly describes only one topic node and that any additional active phrases are also present in that topic representation. The presence (or absence) of non-active phrases in the description is, in this case, an important parameter. Any non-active phrases may serve to distinguish the description from the existing topic. On the other hand, they could very well represent additional description of the topic at hand. If the topic under consideration is recent, we first assume the latter case. In addition, when processing descriptions rendered for retrieval, the network locator naturally rules out the possibility that a new topic is being described and accepts the one at hand.</Paragraph>
    <Paragraph position="4"> The matching process. When the focus phrase directly describes two or more nodes, a network matching procedure is used to determine which of the associated topics the description references. The matching routine uses a list of the candidate nodes, a list of the active phrases in the description, and the current contents of the representational network. For each candidate node, the routine determines how many of the description's active phrases directly or indirectly describe that node. The matching routine returns a table of this information along with the number of the node, if any, that best matches the input description. The &quot;best&quot; node is the one that has the highest number of matching phrases. If two or more nodes have an equal number of matching phrases, an attempt is made to choose one of them on the basis of context. The simplified flow diagram appearing in Fig. 18 summarizes the decision procedure for the case in which the description contains one or more active phrases. (Fig. 18 legend: #up = the number of upward pointers to nodes from the focus phrase; ilevel = a user-specified interaction level; #nonact = number of non-active phrases in the description.)</Paragraph>
    <Paragraph position="6"> Note in particular that in case the description consists of only a single active phrase that directly describes a single node, the network locator assumes the node immediately. The rationale is that we anticipate the user will frequently make use of such terse descriptions in reference to previously defined topics. Recalling our earlier discussion of human referential communication, a speaker makes incomplete references with the assumption that the topic can be inferred by the listener; otherwise, he describes his subject more precisely to avoid being misunderstood. The network locator was designed with this in mind. That is, whenever a terse description references a single node directly, or in context, that node is taken as the referent. Case II: No active phrases. The first step in processing a description with no active phrases is to examine its component words, attempting to identify possible referents by utilizing the word and phrase directories. If no candidate nodes are generated by the procedure, the network locator assumes a new topic is being defined and allocates new nodes in the network. Non-active descriptions that reference existing topics fall into two categories. First, there are those that paraphrase some existing topic description. For example, a topic originally described as THE USE OF THESAURI IN THE SMART SYSTEM may subsequently be referred to as THESAURI TECHNIQUES IN SALTON'S SYSTEM. Second, the description may constitute a more specific classification of some topic. While working in the context of a particular paper, for example, the user may describe a new item as THE ORGANIZATION OF THE PAPER, where</Paragraph>
  </Section>
  <Section position="20" start_page="3" end_page="3" type="metho">
    <SectionTitle>
ORGANIZATION OF PAPER is a non-active phrase.
</SectionTitle>
    <Paragraph position="0"> The word search procedure involves the use of selected words from the description and the word directory to obtain a set of active phrases containing those words.</Paragraph>
    <Paragraph position="1"> From there, a set of nodes is obtained by collecting the upward pointers from those phrases in the phrase directory. The resultant set of nodes then is processed as a list of candidate topics just as if they had been obtained immediately from a description containing active phrases.</Paragraph>
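The two lookups of the word search can be sketched as below. Both directories are reduced to plain dictionaries, which is an assumption; the sample entries are hypothetical but consistent with the network of Fig. 16.

```python
def candidate_nodes(words, word_directory, phrase_directory):
    """Collect candidate topic nodes for the selected words.

    word_directory: word -> active phrases containing that word.
    phrase_directory: phrase -> nodes reached by its upward pointers.
    """
    nodes = []
    for word in words:
        for phrase in word_directory.get(word, []):
            for node in phrase_directory.get(phrase, []):
                if node not in nodes:      # keep first-seen order, no repeats
                    nodes.append(node)
    return nodes


word_directory = {"PAPER": ["PAPER ABOUT AUTONOTE", "PLANNED PAPER"]}
phrase_directory = {"PAPER ABOUT AUTONOTE": [1, 5], "PLANNED PAPER": [1]}
found = candidate_nodes(["PAPER"], word_directory, phrase_directory)
```

The resulting list plays the same role as the candidates obtained from a focus phrase, so the later selection steps need no special handling.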
    <Paragraph position="2"> An important consideration here is which of the words in the description to use in the search for candidate topics. In an early version of the system, we tried using each noun and modifier in turn. Although this approach was successful in many cases, it often resulted in an extremely lengthy list of alternatives. We also noted that words at the highest dependency level more often led to identification of the topic node than those words occurring at subordinate levels. For this reason, it was decided to curtail the word search, using</Paragraph>
  </Section>
  <Section position="21" start_page="3" end_page="3" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> only the words in the &quot;root phrase&quot; of the description. For the non-active description PAPER ON CLUSTERING IN THE SMART SYSTEM, the words PAPER and CLUSTERING would be used in the search for candidate nodes. As a user option, the system will expand the search to include the remaining words in the description.</Paragraph>
    <Paragraph position="1"> There are four stages in the search for candidate topics (see below).</Paragraph>
    <Paragraph position="2"> Stages 1 and 2 deal with the subject word of the root phrase; Stages 3 and 4 with the modifier word. In Stages 1 and 3, only those nodes directly described by phrases having the particular word in subject position are considered. In Stages 2 and 4, topic nodes with the word in modifier position are considered.</Paragraph>
    <Paragraph position="3">  After completion of each stage, if there were any nodes generated they are passed through the recency check. If no node is distinguished, the user is presented with a list of the alternatives. The user may then choose one of the topics or reply that none is the intended referent, in which case the next stage is tried.</Paragraph>
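The staged search can be summarized in a short sketch. The two position indexes and the selection callback are assumptions standing in for the directories, the recency check and the user query.

```python
def staged_search(subject, modifier, by_subject, by_modifier, pick):
    """Run the four search stages for a root phrase.

    by_subject / by_modifier: word -> nodes whose describing phrases hold
    that word in subject / modifier position.
    pick(nodes): the chosen node, or None if every alternative is rejected.
    Returns (stage, node), or (None, None) if no stage succeeds.
    """
    stages = [
        (1, by_subject.get(subject, [])),     # subject word as subject
        (2, by_modifier.get(subject, [])),    # subject word as modifier
        (3, by_subject.get(modifier, [])),    # modifier word as subject
        (4, by_modifier.get(modifier, [])),   # modifier word as modifier
    ]
    for stage, nodes in stages:
        if nodes:
            chosen = pick(nodes)
            if chosen is not None:
                return stage, chosen
    return None, None


# Root phrase PAPER ON CLUSTERING: subject PAPER, modifier CLUSTERING.
result = staged_search(
    "PAPER", "CLUSTERING",
    {"CLUSTERING": [9]},
    {"PAPER": [2]},
    lambda nodes: nodes[0] if nodes == [9] else None,
)
```

The stage number is returned with the node because the network changes that follow depend on the stage in which the topic was identified.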
    <Paragraph position="4"> If a node eventually is identified by this process, the locator must note the stage it is in, since each case implies a distinct links table. Fig. 19 gives an example of each case. The state of the network before processing the description is illustrated by solid lines; the network additions by dashed lines. Note that in the Stage 2 example, the user originally described a new topic simply as ORGANIZATION OF THE PAPER and then gave a more complete description of the paper. The description of the paper was also left pending in the Stage 4 example. In fact, Cases 2 and 4 can only occur in this situation since the presence of a simple phrase with paper as the subject noun would have been picked up earlier in either Stage 1 or 3.</Paragraph>
    <Paragraph position="5"> The stage in which topic identification is made also is important when processing retrieval descriptions. Recall that a principal advantage of the referential system is that it enriches its representation of the user's topics during retrieval. Whenever a retrieval description contains simple phrases not already present in the representation of the identified topic, they are added to the representation. However, if topic identification occurs in Stage 2, note that a simple phrase will be added at one level higher than the decided topic. If processing a retrieval description, the addition would be meaningless as it will not enrich the description of the identified node. For example, if the user has previously described a paper and later calls for the retrieval  to the &quot;paper&quot; node is of no value in later referencing the topic.</Paragraph>
    <Paragraph position="6"> In such cases, the network locator returns the located node to the retrieval processor and suppresses the network addition.</Paragraph>
    <Paragraph position="7"> To state this more generally, retrieval descriptions are employed to augment the representation only when the additional phrases constitute co-ordinate or subordinate description of the located topic.</Paragraph>
    <Paragraph position="8"> Although we have found the word searching procedure quite effective, its success ultimately depends upon a co-occurrence of some word in both the description and the representational network. A proposed extension to AUTONOTE2, as described in Linn (1972), would augment this procedure to include word stem and synonym processing.</Paragraph>
    <Paragraph position="9"> The major objection to this procedure is that as the network grows larger it generates too many candidate nodes and consequently more queries to the user. To alleviate this problem, we allow the user to cancel processing of the description any time he decides the system is having difficulty relating his description to the current representation. In addition, if the user is unsure how he previously described a particular topic, a facility is provided that allows him to obtain topic descriptions from a specified region of the network. Upon noting the topic he originally intended, he may then give a more precise description.</Paragraph>
    <Paragraph position="10"> Case III: One word descriptions. Descriptions consisting of a single noun are processed in much the same manner as non-active descriptions. The noun is treated as if it resulted from a deletion on a simple phrase.</Paragraph>
    <Paragraph position="11"> The word directory is first searched for simple phrases in which the word appears as the subject and, if necessary, the modifier. The nodes obtained from the phrase directory then are processed as described earlier.</Paragraph>
    <Paragraph position="12"> VI. NETWORK MEDIATED RETRIEVAL The previous sections dealt primarily with the process of item description, that is, the process of constructing a representation from descriptions of the user's textual materials. This section discusses the AUTONOTE2 procedures that retrieve information through the representational network.</Paragraph>
    <Section position="1" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
Retrieval via Descriptions
</SectionTitle>
      <Paragraph position="0"> Many of the procedures described earlier for item description and representation are used in retrieval. The user initiates retrieval by giving a FIND command, supplying a description as argument. Retrieval descriptions are first passed to the parser, and are therefore subject to exactly the same constraints as item descriptions. If the description is acceptable, the resultant phrase table is passed along to the network locator which ultimately returns a node number to the FIND processor.</Paragraph>
      <Paragraph position="1"> The FIND processor constructs a set of item numbers by extracting the textual references from the node returned by the network locator. The system then checks for upward pointers from the node, to more specifically described materials. If there are structurally related topics, the FIND processor so informs the user and asks if he would like to explore further. If so, the user is presented with descriptions of the higher order alternatives. Using the network depicted in Fig. 16, for example, consider the retrieval request</Paragraph>
    </Section>
  </Section>
  <Section position="22" start_page="3" end_page="3" type="metho">
    <SectionTitle>
FIND THE PLANNED PAPER ABOUT AUTONOTE. The network locator would determine
</SectionTitle>
    <Paragraph position="0"> that node 1 is the desired referent and return that fact to the FIND processor.</Paragraph>
    <Paragraph position="1"> After storing away the item references of node 1 the system would ask:</Paragraph>
  </Section>
  <Section position="23" start_page="3" end_page="3" type="metho">
    <SectionTitle>
DO YOU WANT:
A. THE ORGANIZATION OF THE PAPER
B. THE REVIEWER'S COMMENTS ON THE PAPER
</SectionTitle>
    <Paragraph position="0"> The user may respond with an appropriate letter indicating which topic he desires.</Paragraph>
    <Paragraph position="1"> If the topic selected also has higher order nodes, the process is repeated until the user terminates the search.</Paragraph>
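The exploration loop of the FIND processor might be sketched as follows. The function, its arguments and the callbacks are hypothetical, and accumulating the item references of every topic visited is an assumption; the text states only that the references of the located node are stored before the user is queried.

```python
def find(node, items, up, describe, choose):
    """Gather item references, letting the user climb to higher order topics.

    items: node -> item numbers stored with that node.
    up: node -> set of higher order nodes.
    describe(node): text shown to the user for an alternative.
    choose(options): index of the selected alternative, or None to stop.
    """
    found = set(items.get(node, ()))
    while up.get(node):
        options = sorted(up[node])
        choice = choose([describe(n) for n in options])
        if choice is None:                 # the user terminates the search
            break
        node = options[choice]
        found |= set(items.get(node, ()))
    return found


items = {1: [1, 3], 2: [2], 6: [7]}
up = {1: {2, 6}}
# The user picks the first alternative (node 2), which has no higher nodes.
references = find(1, items, up, lambda n: "TOPIC " + str(n), lambda opts: 0)
```

The loop ends either when the user declines to go further or when the selected topic has no higher order nodes.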
    <Paragraph position="2"> If the node returned by the network locator has no associated item references, the system searches upward in the network for a node with text item pointers. If a node is reached with multiple upward paths, the system stops and queries the user. For example, if a user has entered only an outline and some bibliographic references for a paper he is writing, then a retrieval description that maps onto the &quot;paper&quot; node would elicit a query such as:</Paragraph>
  </Section>
  <Section position="24" start_page="3" end_page="3" type="metho">
    <SectionTitle>
DO YOU WANT:
A. THE OUTLINE OF THE PAPER
</SectionTitle>
    <Paragraph position="0"> B. BIBLIOGRAPHIC REFERENCES FOR THE PAPER This example illustrates a distinct advantage of the referential system over simple keyword indexing. When the user's description is imprecise, AUTONOTE2 directs the user to related topic nodes with associated textual materials. Upon termination of the search, the resultant set of textual references is stored internally. Depending upon the user's option settings, a reference count and the set of item numbers then may be displayed on the user's console.</Paragraph>
    <Paragraph position="1"> The user may PRINT those particular items he wishes to see, or he may simply RETRIEVE the entire set. In dealing with groups of related items, network mediated retrieval has three major advantages over simple keyword-based techniques. First, the user need only make his descriptions more specific in order to &quot;zero in&quot; upon correspondingly specific textual materials. Second, the representational network enables the system to use the user's original description as a starting point in guiding him to structurally related topic nodes. Finally, the possibility of network exploration can help the user recall the structure of the materials represented in some portion of the network. This can be quite valuable after the user has spent an extended period working with other topics, or as the number of topics and their interconnections become large.</Paragraph>
    <Paragraph position="2"> After processing a retrieval request, the system determines if the user's description contained any prepositional phrases or adjectives not already present in the identified topic's representation. If so, the topic description is enriched accordingly. For example, if the representation of the located node is THE PAPER FOR THE ACM CONFERENCE, and the user referred to it by the retrieval description THE PAPER FOR THE FALL CONFERENCE, the system will augment its representation to include the simple phrase FALL CONFERENCE. This is an important aspect of AUTONOTE2. Whether descriptions are employed for the purpose of characterizing text items or retrieving them, the system continually updates its representation of the user's topics. In addition, this example illustrates how the system is able to establish a limited form of phrase synonymy. There will subsequently be a node in the network directly described by both ACM CONFERENCE and FALL CONFERENCE, and any topic associated with that node may later be referenced using either or both of the two simple phrases.</Paragraph>
    <Section position="1" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
Interrogating the Network
</SectionTitle>
      <Paragraph position="0"> As the network grows complex, the user must be able to question the system about the current representation.</Paragraph>
      <Paragraph position="1"> This capability may help him recall the structure of some set of related topics. Or, prior to formulating a new topic description, the user may wish to examine the representational network for possible related topics. Finally, periodic perusal of the network may strengthen the user's own conceptual representation of the various topics and their interrelationships.</Paragraph>
      <Paragraph position="2"> The DESCRIBE command retrieves topic descriptions from the representational network. It accepts a variety of arguments and first generates a set of topic nodes. Then, using the SPEAKER routine, it outputs a description of each node in the set. The various input forms include the following.</Paragraph>
      <Paragraph position="3"> DESCRIBE ITEM &lt;list&gt;. Each time the description processor adds a textual reference to a node, the node number is placed in a predetermined location in the text file region of the item. The DESCRIBE processor consequently has access to the desired set of associated node numbers. For any particular text item, the user may wish to know which topics it currently is associated with. Initially, when an item is first described by the user, the actual description line is placed in the data base beneath the text. To recall how he described an item originally, the user need only request that the item be printed (omitting the text if he chooses). But the original description may have been only a terse reference, in context, to a more fully described node. Furthermore, the description of that node may have been enriched or altered subsequent to the entry of the item.</Paragraph>
      <Paragraph position="4"> To obtain a full description of each topic presently associated with the item, regardless of how the item originally was described, the user employs DESCRIBE ITEM.</Paragraph>
      <Paragraph position="5"> DESCRIBE CURRENT [TOPIC]. A pointer to the node most recently referenced in the representation is maintained in the node directory. In response to this command, the DESCRIBE routine simply determines the node number and displays its description. The current node number is saved between AUTONOTE2 sessions; this command is often employed at the beginning of a session to remind the user of the previous working context.</Paragraph>
      <Paragraph position="6"> DESCRIBE TOPICS. This command causes every node in the network having associated item references to be described. Because of the voluminous output, it is most frequently employed in batch mode.</Paragraph>
      <Paragraph position="7"> DESCRIBE &lt;description&gt;. When the DESCRIBE routine encounters an argument that is not in one of the special forms discussed above, it treats the input as a phrasal description. Using the parser and network locator, an attempt is made to map the input into a unique topic node. If successful, a complete description of the node is presented to the user. Thus, if the user cannot recall precisely how he described some topic, he may supply an incomplete reference to obtain the topic description in full.</Paragraph>
      <Paragraph position="8"> The network locator functions somewhat differently when processing a description</Paragraph>
      <Paragraph position="9"> for the DESCRIBE command. If it is unable to discern a unique node using the matching procedure and context, a list of the alternatives is returned for subsequent display.</Paragraph>
      <Paragraph position="10"> The FULLY modifier. The user may request the display of a host of related topics by employing the FULLY modifier.</Paragraph>
      <Paragraph position="11"> Specifically, the user types DESCRIBE FULLY, followed by any of the argument forms discussed above. As before, this generates a node or set of nodes.</Paragraph>
      <Paragraph position="12"> When describing FULLY, each node is in turn expanded into a set of structurally related nodes also having associated textual references.</Paragraph>
      <Paragraph position="13"> As an example, consider again the network in Fig. 16. The user types DESCRIBE FULLY, THE PAPER ABOUT AUTONOTE. Assuming no choice is possible in context, the description is ambiguous, and the network locator returns nodes 1 and 5.</Paragraph>
      <Paragraph position="14"> The two nodes then are passed to a routine that displays an indented outline representing the structurally related topics reached by moving upward in the network. Each level of indentation represents a node level traversed in the network. In this example the following outline would be printed:</Paragraph>
    </Section>
  </Section>
  <Section position="25" start_page="3" end_page="3" type="metho">
    <SectionTitle>
A. THE PLANNED PAPER FOR THE ACM CONFERENCE
THE ORGANIZATION OF THE PAPER
THE REVIEWER'S COMMENTS ON THE PAPER
B. THE FIRST PAPER ABOUT AUTONOTE
THE ABSTRACT OF THE PAPER
</SectionTitle>
    <Paragraph position="0"> DESCRIBE STRUCTURES. This command functions as if FULLY was specified, displaying outlines of each topic cluster in the representational network. To accomplish this, the network is searched for nodes having no downward pointers to other nodes. Each such node corresponds to the lowest order node level in a particular cluster of related topics. When described FULLY, the effect is to reveal the structural outline of its associated cluster.</Paragraph>
    <Section position="1" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
The SPEAKER Component
</SectionTitle>
      <Paragraph position="0"> As we have seen, SPEAKER is invoked during many phases of AUTONOTE2's operation. The calling routine passes the SPEAKER a node number.</Paragraph>
      <Paragraph position="1"> A buffer containing a phrasal description of the node is returned. A second, optional input parameter specifies the level of detail desired in the resultant description. The level indicator corresponds to the number of node levels in the representation to be employed in formulating the description. The level indicator is particularly useful when the system must question the intent of a description. When querying the user during the network location process, for example, the system requests topic descriptions from the SPEAKER with the level indicator set according to the user's current preferred level of detail, as inferred from his most recent description. For example, if the user describes an item as RESULTS OF THE EXPERIMENT and the system must ask if he is referring to SMITH'S EXPERIMENT ON THE SHORT TERM MEMORY OF WHITE RATS, the resulting query would be ARE YOU REFERRING TO SMITH'S EXPERIMENT ON MEMORY? The process of constructing a description from the network takes place in two stages. The first stage steps through the network recursively, collecting the simple phrases that directly or indirectly describe the specified node. The level indicator, if applicable, blocks the collection of simple phrases below the specified level. During this stage, the SPEAKER constructs two tables of words, one for subject nouns and another for modifiers. Each entry in the subjects table is linked to a list of adjectives for that subject, and a list (called the modification chain) of prepositional modifications of the subject noun. For example, the subjects table entry for PAPER may have an adjective list containing PLANNED, and a modification chain consisting of (ABOUT) AUTONOTE and (FOR) CONFERENCE. Both of the lists are chained through the table of modifiers. Note that some words will appear in both the subject and modifier tables.
For example, PAPER may be in the modifier table as part of the modification chain of the word ORGANIZATION, and also in the subjects table with a modification chain of its own. The subjects table also maintains article usage information for each of its entries. Fig. 20 illustrates the subject and modifier tables constructed from a typical topic node. The second stage uses the two tables to construct the phrasal description.</Paragraph>
      <Paragraph position="2"> The process begins with the first word in the subjects table, in this case ORGANIZATION.</Paragraph>
      <Paragraph position="3"> If an article applies, it is added to the description buffer.</Paragraph>
      <Paragraph position="4"> traversed adding each adjective in turn to the buffer. In this case there are no adjectives so the current subject word (ORGANIZATION) is added to the buffer and the system continues with the modification chain. This leads to the second entry in the modifier table, (OF) PAPER. The preposition is then added to the buffer yielding THE ORGANIZATION OF. Next, a check is made in the subjects table to determine if the current modifier word (PAPER) is further described. Since there is an entry for PAPER, the current position in the modification chain for ORGANIZATION is placed on a push down stack (the goal stack) and the algorithm recurses on the word PAPER. After adding the article, the adjective (PLANNED), and the subject word (PAPER), the description buffer contains THE ORGANIZATION OF THE PLANNED PAPER. The system now begins processing the modification chain of PAPER. The first piece of the chain adds ABOUT AUTONOTE to the buffer. Note that there was no recursion on AUTONOTE because that word does not have a subjects table entry. The pointer to the next piece of the modification chain, (FOR) CONFERENCE, is then picked up from the link field of the AUTONOTE entry. After adding the preposition (FOR), the algorithm recurses on CONFERENCE, adding THE, ACM, and CONFERENCE in turn to the buffer. The goal stack is then popped in search of remaining modification chain pointers. The first &quot;pop&quot; restores the PAPER modification chain. Since there is no additional modification of the paper, the goal stack is popped again to restore the ORGANIZATION chain. We are at the end of this chain also, and thus the process terminates with the description buffer reading: THE ORGANIZATION</Paragraph>
    </Section>
  </Section>
  <Section position="26" start_page="3" end_page="3" type="metho">
    <SectionTitle>
OF THE PLANNED PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE.
</SectionTitle>
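    <Paragraph> The second stage traced above can be sketched in Python. This is a minimal illustration, not the AUTONOTE2 implementation: the dictionary layout of the subjects table, the name build_description, and the use of a Python list as the goal stack are all assumptions made for the example, and the tables are assumed to contain no cycles.

```python
# Hypothetical subjects table for the worked example; the field names are
# illustrative and are not taken from AUTONOTE2.
SUBJECTS = {
    "ORGANIZATION": {"article": "THE", "adjectives": [],
                     "chain": [("OF", "PAPER")]},
    "PAPER": {"article": "THE", "adjectives": ["PLANNED"],
              "chain": [("ABOUT", "AUTONOTE"), ("FOR", "CONFERENCE")]},
    "CONFERENCE": {"article": "THE", "adjectives": ["ACM"], "chain": []},
}


def build_description(subjects, first_word):
    """Walk the modification chains, using an explicit goal stack."""
    buffer = []      # description buffer, one word per element
    goal_stack = []  # suspended modification chains

    def start_subject(word):
        # Emit article, adjectives, and the subject noun; return its chain.
        entry = subjects[word]
        if entry["article"]:
            buffer.append(entry["article"])
        buffer.extend(entry["adjectives"])
        buffer.append(word)
        return iter(entry["chain"])

    chain = start_subject(first_word)
    while True:
        link = next(chain, None)
        if link is None:
            if not goal_stack:
                break                     # chain and goal stack exhausted
            chain = goal_stack.pop()      # resume the suspended chain
            continue
        preposition, modifier = link
        buffer.append(preposition)
        if modifier in subjects:
            goal_stack.append(chain)      # remember where to resume
            chain = start_subject(modifier)
        else:
            buffer.append(modifier)       # word is not further described
    return " ".join(buffer)


print(build_description(SUBJECTS, "ORGANIZATION"))
```

Run on the sample table, the sketch yields the same description as the walk-through: the chain for ORGANIZATION is suspended while PAPER is described, and the chain for PAPER is suspended while CONFERENCE is described.</Paragraph>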
    <Paragraph position="0"> SPEAKER heuristics. The addition of phrases to a topic in many cases could reduce the readability of its SPEAKER-generated description. For example, suppose a topic is first defined as SAMPLE DESCRIPTIONS FOR USE IN</Paragraph>
  </Section>
  <Section position="27" start_page="3" end_page="3" type="metho">
    <SectionTitle>
THE ACM PRESENTATION, and later is referred to as SAMPLE DESCRIPTIONS FOR USE
IN THE NSF PROPOSAL. Given only the algorithm just presented, the SPEAKER
</SectionTitle>
    <Paragraph position="0"> generated description would be SAMPLE DESCRIPTIONS FOR USE IN THE ACM PRESENTATION</Paragraph>
  </Section>
  <Section position="28" start_page="3" end_page="3" type="metho">
    <SectionTitle>
IN THE NSF PROPOSAL. To avoid such unreadable descriptions, whenever
</SectionTitle>
    <Paragraph position="0"> the modification chain for a subject noun contains two or more prepositional phrases headed by the same preposition, the SPEAKER sets off each phrase after the first with parentheses. The above example then becomes SAMPLE DESCRIPTIONS</Paragraph>
  </Section>
  <Section position="29" start_page="3" end_page="3" type="metho">
    <SectionTitle>
FOR USE IN THE ACM PRESENTATION (AND THE NSF PROPOSAL). Note that a
</SectionTitle>
    <Paragraph position="0"> description such as COMMENTS ON SMITH'S ARTICLE ON CLUSTERING is not processed in this manner since (ON) ARTICLE is in the modification chain of COMMENTS, while (ON) CLUSTERING is in the modification chain of the word ARTICLE. Note also that although parenthetical phrases are excluded from topic descriptions generated for the purpose of interrogating the user, when the user requests a description of a topic via the DESCRIBE command, the complete description is provided.</Paragraph>
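    <Paragraph> The same-preposition heuristic can be illustrated with a short Python sketch. The function name render_chain, the list-of-pairs form of the modification chain, and the include_parenthetical flag are assumptions made for this example. Replacing the repeated preposition with AND inside the parentheses is inferred from the single example given in the text and may not cover every case the SPEAKER handled.

```python
def render_chain(chain, include_parenthetical=True):
    """Render one subject noun's modification chain.

    chain is a list of (preposition, phrase) pairs. A phrase whose
    preposition has already appeared in this chain is set off in
    parentheses, or dropped when include_parenthetical is False (as
    when the system is interrogating the user).
    """
    seen_prepositions = set()
    parts = []
    for preposition, phrase in chain:
        if preposition in seen_prepositions:
            if include_parenthetical:
                parts.append("(AND " + phrase + ")")
        else:
            seen_prepositions.add(preposition)
            parts.append(preposition + " " + phrase)
    return " ".join(parts)


# Modification chain of USE in the example from the text.
use_chain = [("IN", "THE ACM PRESENTATION"), ("IN", "THE NSF PROPOSAL")]
print("SAMPLE DESCRIPTIONS FOR USE " + render_chain(use_chain))
print("SAMPLE DESCRIPTIONS FOR USE " + render_chain(use_chain, False))
```

Because the check is made per chain, COMMENTS ON SMITH'S ARTICLE ON CLUSTERING is unaffected: the two ON phrases belong to the chains of different words.</Paragraph>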
    <Paragraph position="1"> Simplification of list processing. It may be added here that our decision to maintain the representational network in disk file storage has greatly simplified the list processing in recursive algorithms such as the SPEAKER. The network can be envisioned as a complex list structure where the links are simply line file numbers. To illustrate this point, consider the recursive collection of simple phrases carried out in the first step of the SPEAKER.</Paragraph>
    <Paragraph position="2"> The main body of the routine collects the simple phrases that directly describe a node. If the node processed has downward links to subordinate nodes, they are placed on a push down stack.</Paragraph>
    <Paragraph position="3"> Next the stack is popped and the routine is called recursively to operate on a new node number. Thus all the concomitant problems of storage management that are normally present in list processing systems are avoided. Recursive deletion, discussed in the next section, similarly is simplified. To delete a portion of the list structure requires only the removal of a line from a directory file. Thus the process of &quot;garbage collection&quot; is both automatic and transparent to AUTONOTE2.</Paragraph>
    <Paragraph position="4"> VII. NETWORK MODIFICATION Procedures for modifying the representational network are required for several reasons. Should the system incorrectly parse a description, the user's ability to reference the associated topic will be impaired. The user may wish to alter the description of a topic to (a) make it more precise, (b) insure that it is not confused with similarly described topics, or (c) enable a topic to be referenced in more than one way. After initially describing a text item, the user may discover that the item should also be associated with other topics in the representation. Alternatively, he may decide that a text item should be dissociated from some topic. The user may wish to delete an obsolete topic from the representation altogether, or replace a description in its entirety by a more suitable one while maintaining the same list of associated textual references. Finally, when dealing with a group of structurally related topics, the user may wish to delete an entire structure, or certain components of a structure, from the network.</Paragraph>
    <Paragraph position="5"> We cannot expect a typical user to think in terms of list structures, nodes, linkages, etc. Thus we sought to provide a command language and feedback more or less independent of the internal data structures that implement the representation. In addition, care was taken to avoid the possibility of accidental damage to the representation stemming from misunderstanding or misapplication of the modification procedures.</Paragraph>
    <Paragraph position="6"> The resultant processor includes procedures for removing or adding item references to a topic, deleting topics, adding or removing simple phrases from the description of a topic, etc. Rather than require the user to identify the particular topic to be altered each time a modification is to be performed, primitives are implemented as local commands to a generalized modification processor.</Paragraph>
    <Paragraph position="7"> The modification processor is invoked by issuing a CHANGE command which accepts a phrasal description as its argument. A node in the network is established as the currently identified topic. The processor then prompts the user for modification instructions. After all modifications are completed, the user types DONE and control is returned to the regular command monitor. The CHANGE command also may be issued while in modification mode, thereby changing the current topic. Each of the local commands is discussed separately below, using the hypothetical representation depicted in Fig. 21 for illustration.</Paragraph>
    <Paragraph position="8"> Adding References and Phrases to the Network The ADD command associates additional text references with the current topic, and adds simple phrases to the topic's description.</Paragraph>
    <Paragraph position="9"> To add item references, the user types ADD ITEM[S] followed by a list of item numbers. This procedure is quite useful if the user has a large set of items that pertain to a particular topic.</Paragraph>
    <Paragraph position="10"> He simply identifies the topic and adds the list of references. (Fig. 21 - Sample Representation for Discussion of Network Modification.) If the supplied argument is a phrase, it is added to the current node.</Paragraph>
    <Paragraph position="11"> For example, if the current topic is THE PAPER FOR THE ACM CONFERENCE, the command ADD PAPER ABOUT AUTONOTE causes the prepositional phrase ABOUT AUTONOTE to become a part of the topic description.</Paragraph>
    <Paragraph position="12"> Adjectives may also be added to a description (example: ADD SMITH'S PAPER). If only a single word occurs as the argument, it is assumed to be an adjective which is to modify the current subject noun.</Paragraph>
    <Paragraph position="13"> Moving through the Network The MOVE command allows the user to change the current node pointer from its present position to structurally proximate topics without having to enter a description. For example, if currently located at the &quot;paper&quot; node, the command MOVE DOWN causes the ACM CONFERENCE to become the current topic. If the current topic is the ACM CONFERENCE, MOVE UP will produce three higher order topics. Each is saved, and the leftmost node becomes the current topic. Subsequently, the user may MOVE LEFT or RIGHT to the other topics. After a successful move, a brief description of the new topic is displayed.</Paragraph>
    <Section position="1" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
The Caching Facility
</SectionTitle>
      <Paragraph position="0"> The CACHE command stores item references for subsequent use.</Paragraph>
      <Paragraph position="1"> If the command is given with no argument, the set of text references associated with the current topic is added to an internal cache. The caching facility may be used to manipulate large sets of item references, for example, in transferring all item references from one topic to another.</Paragraph>
      <Paragraph position="2"> This may be accomplished by identifying the first topic and issuing a CACHE command.</Paragraph>
      <Paragraph position="3"> After identifying the second topic, the command ADD CACHE causes the set of cached items to be merged with those of the new topic.</Paragraph>
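      <Paragraph> The CACHE and ADD CACHE sequence just described amounts to two set operations. The Python sketch below is only an illustration: the class ModificationSession, its method names, and the representation of a topic as a set of item numbers are hypothetical and are not drawn from AUTONOTE2.

```python
class ModificationSession:
    """Toy model of the modification processor's cache (hypothetical names)."""

    def __init__(self, topics):
        self.topics = topics   # topic description -> set of item numbers
        self.current = None    # currently identified topic
        self.cache = set()     # internal cache of item references

    def change(self, topic):
        # Stands in for identifying a topic by description.
        self.current = topic

    def cache_current(self):
        # CACHE with no argument: add the current topic's references.
        self.cache.update(self.topics[self.current])

    def add_cache(self):
        # ADD CACHE: merge the cached references into the current topic.
        self.topics[self.current].update(self.cache)


session = ModificationSession({
    "THE PAPER FOR THE ACM CONFERENCE": {101, 102, 103},
    "THE PAPER ABOUT AUTONOTE": {103, 200},
})
session.change("THE PAPER FOR THE ACM CONFERENCE")
session.cache_current()
session.change("THE PAPER ABOUT AUTONOTE")
session.add_cache()
print(sorted(session.topics["THE PAPER ABOUT AUTONOTE"]))
```

In this sketch the first topic keeps its references after the merge, since the text describes only the CACHE and ADD CACHE steps.</Paragraph>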
    </Section>
    <Section position="2" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
Retrieval Commands
</SectionTitle>
      <Paragraph position="0"> The retrieval commands of the network modification processor are analogous to their counterparts in AUTONOTE. LIST outputs a list of the item numbers associated with the current topic. LIST CACHE displays the numbers of the items in the cache. PRINT outputs selected text items. All items associated with the current topic, or those in the cache, will be printed in response to RETRIEVE and RETRIEVE CACHE, respectively. By employing the IDENTIFY and MOVE commands, the user may explore the representation, LISTing the associated references for each topic. During the exploration, the CACHE command can be used to store selected references for later retrieval, or the user may choose to PRINT or RETRIEVE pertinent references as he goes. These procedures allow the retrieval set to be shaped interactively, and more selectively than is possible with the FIND command discussed earlier.</Paragraph>
      <Paragraph position="1"> Removing References and Phrases from the Network The REMOVE command accepts the same argument forms as the ADD command and simply performs the inverse operations. The argument ALL also is recognized, causing all item references to be removed from the current topic.</Paragraph>
    </Section>
    <Section position="3" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
Topic Deletion
</SectionTitle>
      <Paragraph position="0"> DELETE may be employed to remove obsolete topics from the representation, or as the first step in replacing a topic description with a more appropriate one. CREATE then may be used to enter the replacement topic into the network.</Paragraph>
      <Paragraph position="1"> The major problem in the design of the topic deletion algorithm can be stated as follows.</Paragraph>
      <Paragraph position="2"> When deleting a topic, under what circumstances are structurally related nodes to be deleted as well? Consider the following cases.</Paragraph>
      <Paragraph position="3"> In the hypothetical network, a request to delete the &quot;paper&quot; node involves a decision about deleting (a) the outline of the paper, and (b) Smith's comments on the paper.</Paragraph>
      <Paragraph position="4"> Note that if the paper node were deleted and the two higher order nodes were not, the higher order nodes would no longer be structurally related. In addition, their descriptions will still contain the word &quot;paper,&quot; but which paper no longer is specified. For these reasons, we concluded that a topic deletion should also entail the deletion of more specifically described topics. In many cases, this convention is an advantage, since the user can delete an entire structure by identifying and deleting a single lower order node.</Paragraph>
      <Paragraph position="5"> The considerations involved in dealing with lower order nodes are a bit more complex. Some lower order nodes serve only to augment the description of superior nodes. In THE EXPERIMENT ON BLIND RATS, there will be a subordinate node directly described as BLIND RATS. In this case, deletion of the EXPERIMENT node should include deletion of the subordinate node. On the other hand, deletion of the &quot;paper&quot; node in the previous example does not imply deletion of the ACM CONFERENCE topic. The ACM CONFERENCE node also plays a role in other topics.</Paragraph>
      <Paragraph position="6"> In the instances we have examined, the distinction between the two cases seems to be that &quot;unimportant&quot; nodes have neither textual references nor upward pointers to other topics.</Paragraph>
      <Paragraph position="7"> The deletion process employs a heuristic based upon this observation. When a subordinate node is deemed unimportant, it is deleted; otherwise, the user is asked to confirm its deletion.</Paragraph>
      <Paragraph position="8"> The deletion algorithm. First, a list of all nodes to be deleted is constructed. After the list is complete, the user is presented with a brief summary of the deletions to be made and is prompted for confirmation. The algorithm for constructing the deletions list is recursive. Two push-down stacks are employed: one for storing upward node paths yet to be explored, and one for saving downward paths. The procedure is most easily explained with an example. Suppose the user identifies the &quot;paper&quot; node and requests its deletion. The algorithm begins with node 2, first pushing down any upward pointers along with the node number (2) that was being processed when the pointers were added to the &quot;up stack.&quot; In this case the pairs (1,2) and (3,2) are pushed down. Next, the downward pointers are placed on the &quot;down stack&quot; along with the current node number. The current node number is then added to the deletions list. The current state of the pushdown stacks and the deletions list is now: The down stack is then popped and node 4 is established as the next node to be examined. After setting a flag indicating that we have just moved down a level in the network, the algorithm recurses on node 4. The system detects three upward pointers (to nodes 2, 5, and 6). It should now be apparent why we save the fact that node 4 was reached by moving down from node 2. When placing a node's upward pointers on the up stack, the node that led down to the current node must be ignored.</Paragraph>
      <Paragraph position="9"> Upon noting that node 4 has upward pointers in addition to node 2, the system checks to see if it has just moved down. In this case it has; consequently, node 4 is deemed &quot;important&quot; and the system asks DO YOU WANT THE ACM CONFERENCE DELETED? Assume the reply is NO. Since the ACM CONFERENCE node will remain, the system records that the linkage between nodes 2 and 4 must be severed. The algorithm then recurses without adding node 4 to the deletions list. Note also that the upward pointers from node 4 to nodes 5 and 6 are not placed on the up stack. The current state of the process is now:</Paragraph>
    </Section>
  </Section>
  <Section position="30" start_page="3" end_page="7" type="metho">
    <SectionTitle>
DOWN STACK DELETIONS LIST
</SectionTitle>
    <Paragraph position="0"> The attempt to pop the down stack fails, so the up stack is popped and a flag set to indicate upward movement (to node 3).</Paragraph>
    <Paragraph position="1"> Node 3 has no upward pointers but it has two downward pointers, to nodes 2 and 7. Node 2 is ignored because it led up to node 3. Node 7 is placed on the down stack and node 3 is added to the deletions list yielding: The down stack is popped and the algorithm recurses on node 7.</Paragraph>
    <Paragraph position="2"> Node 7 has neither text references, nor upward pointers (besides node 3). Consequently, it is deemed unimportant and is added to the deletions list. We now have: The down stack is empty so the up stack is popped and the system recurses on node 1. The node has no new upward or downward pointers so it is added to the deletions list. Both stacks are now empty and the algorithm terminates having collected nodes 2, 3, 7, and 1 for deletion.</Paragraph>
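    <Paragraph> The collection of the deletions list can be sketched in Python as follows. This is one reading of the walk-through, not the original code: the function collect_deletions, the dictionary form of the pointer tables, and the has_refs and confirm callbacks (the latter standing in for the question put to the user) are assumptions, and the pointers given for the example network are those implied by the text rather than copied from Fig. 21.

```python
def collect_deletions(start, up, down, has_refs, confirm):
    """Return (deletions, severed_links) for deleting the topic start.

    up and down map a node number to its upward and downward pointers,
    has_refs(node) reports whether text items reference the node, and
    confirm(node) answers whether an "important" node may be deleted.
    """
    up_stack, down_stack = [], []   # entries are (node, reached_from)
    deletions, severed = [], []
    marked = set()

    def examine(node, reached_from, moved_down):
        if node in marked:
            return
        # The node that led here is ignored when listing upward pointers.
        other_up = [n for n in up.get(node, []) if n != reached_from]
        if moved_down and (other_up or has_refs(node)):
            # "Important" subordinate node: delete only if the user agrees.
            if not confirm(node):
                severed.append((reached_from, node))
                return
        marked.add(node)
        for parent in other_up:
            up_stack.append((parent, node))
        for child in down.get(node, []):
            if child != reached_from:
                down_stack.append((child, node))
        deletions.append(node)

    examine(start, None, False)
    while down_stack or up_stack:
        if down_stack:                      # downward paths are tried first
            node, reached_from = down_stack.pop()
            examine(node, reached_from, True)
        else:
            node, reached_from = up_stack.pop()
            examine(node, reached_from, False)
    return deletions, severed


# Pointers implied by the worked example (node 2 is the "paper" node).
UP = {2: [1, 3], 4: [2, 5, 6], 7: [3]}
DOWN = {1: [2], 2: [4], 3: [2, 7], 5: [4], 6: [4]}

deleted, cut = collect_deletions(
    2, UP, DOWN,
    has_refs=lambda node: False,   # assumed: node 7 has no text references
    confirm=lambda node: False,    # the user answers NO for node 4
)
print(deleted, cut)
```

With these inputs the sketch collects nodes 2, 3, 7, and 1 in that order and records the link between nodes 2 and 4 as one to be severed, matching the example.</Paragraph>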
    <Paragraph position="3"> Carrying out the deletion involves several steps. First, any linkages between those nodes that are to be deleted and those that will remain are severed. These changes will have been detected and recorded during the recursive collection process. Next the system executes a REMOVE ALL for each node on the deletions list so that no associated text reference points to a non-existent topic. Then the system removes all pointers from simple phrases to the obsolete nodes. Finally, each deleted node is removed from the node directory file.</Paragraph>
    <Paragraph position="4"> Creating New Topic Representations CREATE enables the user to define a new topic for the representation. The command takes as argument a description which is processed in the normal way, except that no item references are associated with the topic.</Paragraph>
    <Paragraph position="5"> The new topic becomes the current node.</Paragraph>
    <Paragraph position="6"> ADD is used to associate any appropriate text references. During topic deletion, the system adds to the cache all text references previously associated with deleted topics.</Paragraph>
    <Paragraph position="7">  When processing a CREATE description, the network locator attempts to associate the description with an existing topic, for two reasons. If the description is to be a new topic but the network locator confuses it with another one, the user may want to alter the description. Second, this permits use of the CREATE command in adding to an existing description.</Paragraph>
    <Paragraph position="8"> VIII. A CASE STUDY OF SYSTEM PERFORMANCE The Inapplicability of Recall and Precision The most widely accepted methods for retrieval system evaluation are based upon recall and precision measures. As applied to the results of retrieval queries, precision is defined as the proportion of retrieved material that is deemed relevant to a query; recall is the ratio of relevant documents retrieved to the total relevant in the data base. But recall and precision cannot meaningfully be applied to the evaluation of AUTONOTE2. The AUTONOTE2 user describes each piece of textual material himself. Even within a large personal data base, the user will certainly recollect some of his topics and the key words and phrases that define them. Furthermore, subject to the user's own limitations in describing his materials, the topic framework of AUTONOTE2 implies &quot;perfect&quot; precision and recall once a particular topic is identified during retrieval.</Paragraph>
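    <Paragraph> For reference, the two measures as defined above can be written out in a few lines of Python. The function names and the item numbers are illustrative only; no such computation is reported in this study.

```python
def precision(retrieved, relevant):
    """Proportion of the retrieved items that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved.intersection(relevant)) / len(retrieved)


def recall(retrieved, relevant):
    """Proportion of the relevant items that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved.intersection(relevant)) / len(relevant)


# Made-up item numbers: four items retrieved, three relevant, two in common.
retrieved_items = {1, 2, 3, 4}
relevant_items = {2, 3, 5}
print(precision(retrieved_items, relevant_items))
print(recall(retrieved_items, relevant_items))
```

When the retrieved set is exactly the set of items the user attached to a topic, and relevance is judged by that same attachment, both functions return 1.0, which is the sense of "perfect" precision and recall used above.</Paragraph>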
    <Paragraph position="9"> A principal motivation for the AUTONOTE2 system was the desire to overcome the disadvantages of keyword indexing techniques, which force the user to translate ideas and concepts pertinent to a given document into discrete content indicators. In developing AUTONOTE2 we have sought to provide mechanisms for defining and efficiently referencing these concepts directly. An evaluation of AUTONOTE2 should therefore provide some comparisons of keyword indexing vs. indexing by topic. To achieve a direct comparison, protocols of both types of indexing activity with a common data base are required.</Paragraph>
    <Section position="1" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
The Sauvain Data Base
</SectionTitle>
      <Paragraph position="0"> The original AUTONOTE system was employed in a study (Sauvain, 1970) aimed at uncovering structural communication problems within a keyword-based system. The resulting data base is related primarily to Sauvain's dissertation research. It includes reading notes, bibliographic references, research ideas, expository material, and so on. The collection brings together a broad range of topics and ideas touching upon various aspects of computer science, information retrieval, man-machine interaction, and psychology.</Paragraph>
      <Paragraph position="1"> Copies of the item texts, the originally assigned keywords, and protocols of Sauvain's activities during data base indexing, organization, and retrieval were acquired. We then proceeded to re-index the collection with AUTONOTE2 topic descriptions. Each of the roughly 400 items in the data base was viewed and described in a sequential fashion; that is, there was no look-ahead or preplanning of topic phrasings to facilitate network structuring. Protocols were collected of all interaction with the system and the state of the network was recorded at periodic intervals. (For details, see Linn, 1972).</Paragraph>
    </Section>
    <Section position="2" start_page="7" end_page="7" type="sub_section">
      <SectionTitle>
Results
</SectionTitle>
      <Paragraph position="0"> For brevity, AUTONOTE2 reports of parsing assumptions are excluded.</Paragraph>
      <Paragraph position="1"> However, system responses that elicit a user reply are shown to provide a feeling for user interaction under AUTONOTE2.</Paragraph>
      <Paragraph position="2"> Indexing activity.</Paragraph>
      <Paragraph position="3"> The AUTONOTE2 protocols show a high degree of terse, efficient referencing of previously defined topics.</Paragraph>
      <Paragraph position="4"> The communicative efficiency was especially great in instances where several consecutive items were entered on a common topic. This situation frequently occurred when entering a set of reading notes on a particular paper or collection of papers. Typically, the first item in such a set of entries was assigned to one or more new topics. In describing the subsequent items, references to these topics often were conveyed by a single word or phrase, or by a null description (a description line consisting of only a slash is treated as a reference to the topic just mentioned).</Paragraph>
      <Paragraph position="5"> To illustrate, consider the materials dealing with various aspects of artificial intelligence. A total of 17 of these items contained notes taken at a 1968 conference at Case Western Reserve University. Of the AUTONOTE2 descriptions supplied for these items, three mention only the word CONFERENCE; five include the subphrase 1968 CONFERENCE; two include CWRU CONFERENCE; and seven make no explicit reference to the conference at all. Each of the items, however, was associated with a topic node linked in some way to the &quot;conference&quot; node. Furthermore, though none of the descriptions contain the words ARTIFICIAL or INTELLIGENCE, each of the associated items can be accessed in the network through the &quot;artificial intelligence&quot; node.</Paragraph>
      <Paragraph position="6"> In the AUTONOTE protocol for these materials, there was frequent use of descriptor abbreviations and other idiosyncratic tags (CWRUAICONF, AT, COGPSY, etc.). These suggest a strong desire to eliminate repeated entry of lengthy descriptors and phrases. The major drawback of this strategy, however, is that abbreviations (especially the more uncommon ones) are not as easily remembered as the words they represent. In addition, once an abbreviation has been used, the user must remember that he has done so in order to maintain consistent indexing. In contrast, there was little motivation for descriptor abbreviations under the AUTONOTE2 system. Once a lengthy phrase had been defined in the network there was generally no need to reference it again with a full description. The Sauvain study identified a clear need for mechanisms to assist the user in maintaining consistent indexing.</Paragraph>
      <Paragraph position="7"> The second type of need (how a descriptor has been used) frequently occurs when a text item is being entered. The user has some ideas for candidate descriptors, suspects that there has been prior usage of these words, and needs a way to check the prior usage to keep his indexing consistent. He also may want to look at prior usage contexts to get ideas about other descriptors to use, or to weed out candidates that look too general.</Paragraph>
      <Paragraph position="8"> The topical view of the data base under AUTONOTE2 eliminates part of this problem. When describing a new topic the AUTONOTE2 user need not be as concerned about prior word usage in other contexts. The representational network provides a means for discriminating among the various topics in which a particular word occurs.</Paragraph>
      <Paragraph position="9"> If, on the other hand, the user suspects that the item at hand is somehow related to a previously existing topic, there is an analogous need to interrogate the representational network for candidate topics.</Paragraph>
      <Paragraph position="10"> This capability is provided by the DESCRIBE command. There were, in fact, numerous instances in the AUTONOTE2 protocols of network interrogation prior to entering descriptions. An example is given in Fig. 22. In response to the user's description</Paragraph>
    </Section>
  </Section>
</Paper>