American Journal of Computational Linguistics Microfiche 23
Copyright 1975 by the Association for Computational Linguistics
AUTONOTE2: NETWORK MEDIATED
NATURAL LANGUAGE COMMUNICATION 
IN A PERSONAL 
INFORMATION RETRIEVAL SYSTEM 
William E. Linn, Jr.²
and
Walter Reitman
University of Michigan
Ann Arbor
¹ This paper is based on a doctoral dissertation by the first author.
Support from the National Science Foundation under Grant No. DCR71-02038 is
gratefully acknowledged. Those wishing more complete details about system
commands and implementation should write the second author for a User's Manual.
² Now at Southern Railway System, 125 Spring Street, S.W., Atlanta, Georgia 30303
ABSTRACT 
Natural language combines nouns and adjectives into noun phrases, and
links phrases by means of prepositions to form complex descriptions of
objects and topics. AUTONOTE2, a file-oriented retrieval system, allows
the user to employ such descriptions to characterize the items of information
he wishes to store and retrieve. In addition, the system constructs a
network representation of the user's subject matter, using syntactic
analysis to derive dependency structures from the descriptions. The
dependency information, expressed as subordinate and coordinate linkages
among the phrases, is represented by a tree of nodes, with simple phrases
at the terminal branches. The PARSER uses the network to disambiguate
descriptions, querying the user only about residual ambiguities.
Associated with the PARSER is a network LOCATOR, which determines
whether a current user description refers to an existing topic at some level
in the network. The LOCATOR also builds a table specifying the changes, if
any, to be made in the network in order to represent the topic inferred from
the current input description. For example, if the user's description
contains one or more simple phrases (hereafter referred to as active) directly
describing at least one existing node in the network, the description as a
whole quite likely references an existing network topic. To locate it, the
PARSER first determines the focus phrase, the active phrase at the highest
dependency level. The nodes directly described by the focus phrase are
used to generate candidate topics. These then are matched against the
remaining active phrases obtained from the description to determine the
most likely referent.
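The locating steps just described (active phrases, focus phrase, candidate generation, matching) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the `Node` class, the input format of (simple phrase, dependency level) pairs, and the scoring rule (count how many remaining active phrases appear anywhere beneath a candidate) are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    number: int
    direct: set[str]  # simple phrases that directly describe this node
    subordinates: list["Node"] = field(default_factory=list)


def subtree_phrases(node: Node) -> set[str]:
    """All simple phrases that describe `node` directly or through its subordinates."""
    found = set(node.direct)
    for sub in node.subordinates:
        found |= subtree_phrases(sub)
    return found


def locate(description: list[tuple[str, int]], nodes: list[Node]) -> list[Node]:
    """Return the best-matching candidate topics for a parsed description.

    `description` holds (simple phrase, dependency level) pairs, level 0 being
    the head. An empty result means no phrase is active, so the description
    probably defines a new topic; several results mean it is ambiguous.
    """
    # Active phrases: those directly describing at least one existing node.
    active = [(p, lvl) for p, lvl in description if any(p in n.direct for n in nodes)]
    if not active:
        return []
    # Focus phrase: the active phrase at the highest dependency level.
    focus = min(active, key=lambda pair: pair[1])[0]
    candidates = [n for n in nodes if focus in n.direct]
    # Hypothetical scoring: how many remaining active phrases each candidate accounts for.
    rest = [p for p, _ in active if p != focus]
    scores = [sum(p in subtree_phrases(n) for p in rest) for n in candidates]
    best = max(scores)
    return [n for n, s in zip(candidates, scores) if s == best]
```

With two PAPER topics in the network, the description THE PAPER alone returns both candidates (an ambiguity to put to the user), while THE PAPER BY SALTON narrows the result to the one topic whose subordinates include BY SALTON.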
Many of the procedures employed in description and representation also
are used in network-mediated retrieval. The user may initiate retrieval
with a FIND command, supplying a description as argument. The resultant
phrase table is passed along to the network LOCATOR, which returns a node
number to the FIND processor. The FIND processor constructs a set of item
numbers by extracting the textual references from the node. The system
then checks for upward pointers from the node. If there are structurally
related topics, the FIND processor so informs the user. Note that by virtue
of network mediation of retrieval, if a user description is imprecise or
incorrect, the system may be able to direct the user to relevant related
topics.
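The FIND processor's work after the LOCATOR returns a node number can be sketched as below. The `TopicNode` fields and the report wording are hypothetical; the sketch only shows the two steps the text names: extract the item references stored at the node, then follow its upward pointers to see whether structurally related topics exist.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    number: int
    references: set[int] = field(default_factory=set)  # text item numbers attached to this topic
    upward: list[int] = field(default_factory=list)    # upward pointers to structurally related topics


def find(node_number: int, network: dict[int, TopicNode]) -> tuple[list[int], list[int]]:
    """Return (item numbers for the located topic, numbers of related topics)."""
    node = network[node_number]
    items = sorted(node.references)
    related = [n for n in node.upward if n in network]
    return items, related


def find_report(node_number: int, network: dict[int, TopicNode]) -> str:
    """Format the result the way a FIND processor might report it (hypothetical wording)."""
    items, related = find(node_number, network)
    report = f"ITEMS: {items}"
    if related:
        report += f"; {len(related)} RELATED TOPIC(S) ALSO EXIST"
    return report
```

For a node holding items 7 and 12 with upward pointers to two other topics, `find_report` would yield `ITEMS: [7, 12]; 2 RELATED TOPIC(S) ALSO EXIST`.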
When the system queries the user about a topic, for example to determine
the intent of a description, the topic node number is passed to a
SPEAKER component. A phrasal description of the node is returned. To
minimize redundant communication, a level indicator may be set according to
the level of detail in the user's description. For example, if the user
describes an item as RESULTS OF THE EXPERIMENT and the system must ask if he
is referring to SMITH'S EXPERIMENT ON SHORT TERM MEMORY OF WHITE RATS,
the resulting query would be: ARE YOU REFERRING TO SMITH'S EXPERIMENT ON
WORP?
Construction of a description from the network takes place in two
stages. The first stage steps through the network recursively, collecting
the simple phrases that directly or indirectly describe the specified node.
The level indicator blocks collection of simple phrases below the specified
level. The second stage is carried out by a recursive algorithm that
operates on the tabled simple phrases and their interrelations to construct
the phrasal description.
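The two SPEAKER stages can be sketched as a pair of recursive functions: `collect` tables the simple phrases down to the level indicator, and `build` reassembles a phrase from that table. The `PhraseNode` structure and the row layout (parent row, preposition, node) are assumptions made for illustration; in particular, treating each noun with its modifiers as one simple phrase is a simplification, so this sketch does not claim to reproduce the exact query wording in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class PhraseNode:
    noun: str
    modifiers: list[str] = field(default_factory=list)
    dependents: list[tuple[str, "PhraseNode"]] = field(default_factory=list)  # (preposition, node)


def collect(node, max_level, level=0, parent=None, prep=None, table=None):
    """Stage 1: recursively table the simple phrases describing `node`.

    The level indicator `max_level` blocks collection below that level.
    Each row is (parent row index, preposition, node).
    """
    if table is None:
        table = []
    if level > max_level:
        return table
    row = len(table)
    table.append((parent, prep, node))
    for p, sub in node.dependents:
        collect(sub, max_level, level + 1, row, p, table)
    return table


def build(table, row=0):
    """Stage 2: recursively assemble a phrasal description from the table."""
    _, _, node = table[row]
    text = " ".join(node.modifiers + [node.noun])
    for child, (parent, prep, _) in enumerate(table):
        if parent == row:
            text += f" {prep} {build(table, child)}"
    return text


def describe(node, max_level):
    return build(collect(node, max_level))
```

With SMITH'S EXPERIMENT ON SHORT TERM MEMORY OF WHITE RATS stored as three nested nodes, a level indicator of 1 yields SMITH'S EXPERIMENT ON SHORT TERM MEMORY, and a level of 0 yields just SMITH'S EXPERIMENT.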
The last major component of the system handles network modification
and reorganization. This enables the user to add or remove references and
phrases, and to modify, delete, or reorganize his topic structure.
A detailed case study comparing AUTONOTE2 with a good keyword-based
retrieval system showed that, for a coherent body of material, the
communicative efficiency of AUTONOTE2, as measured by the ratio of the number
of words conveyed to the number of words entered, was more than double that of
the keyword-based system. Retrieval capability was enhanced considerably,
and the representational network effectively distinguished among the many
topics partially indexed by the same words. Furthermore, SPEAKER output of
topics from the representational network proved a useful retrieval
intermediary, greatly reducing the need for perusal of item texts.
TABLE OF CONTENTS 
II. THE AUTONOTE SYSTEM
   Basic AUTONOTE Commands
   AUTONOTE System Organization
III. OVERVIEW OF AUTONOTE2
   The Description Language
   Representational Framework
   Criteria for the Representation
   Overview of the AUTONOTE2 Implementation
   Design of the Network Data Structures
   Storage Implementation of the Network
   The Representational Network: An Example
   Parsing of Descriptions
VI. NETWORK-MEDIATED RETRIEVAL
   Retrieval via Descriptions
   Interrogating the Network
   The SPEAKER Component
VII. NETWORK MODIFICATION
   Adding References and Phrases to the Network
   Moving through the Network
   The Caching Facility
   Retrieval Commands
   Removing References and Phrases from the Network
   Topic Deletion
   Creating New Topic Representations
VIII. A CASE STUDY OF SYSTEM PERFORMANCE
   The Inapplicability of Recall and Precision
   The Sauvain Data Base
   Results
   Conclusion
REFERENCES
I. INTRODUCTION 
When two humans communicate, each party builds up a conceptual
representation of the topics of discussion. Such representations are
fundamental to human communicative efficiency. The listener's representation
of the topics already discussed facilitates communication in that the speaker
is spared the trouble of describing in complete detail those things to which
he refers. Furthermore, the speaker can proceed to related topics without
having to describe them in full. For example, a speaker who has been talking
about the design of a particular experiment can safely move on to discuss the
results of the experiment without specifying anew the experiment he has in mind.
We use the term referential communication to indicate the process by
which a speaker communicates a reference to some subject or topic to a
listener. If, within the environment of a personal information system, we
view the information universe as a collection of textual materials, each
pertinent to one or more "topics," then one can readily construct an analogy.
The user and system take on the roles of speaker and listener, respectively.
The domain of discourse is a set of topic descriptions characterizing the
user's textual materials. The user enters his materials and describes to
the system the topic or topics to which they pertain. During this process,
the system constructs its representation for the subjects the user has
described, and associates each piece of text with its corresponding topic
representations.
This paper describes the design and implementation of a personal
information storage and retrieval system based on the foregoing analogy with
human referential communication. It presents a hierarchical network data
structure for representing topic descriptions formulated within a phrasal
description language. Called the representational network, this structure
enables the system to move easily from one subject to other related ones.
It provides a means for representing the user's working context, thereby
enabling the user to describe his materials much more tersely than is
possible in keyword-based systems. The system makes use of the syntactic
dependencies among the words and phrases of descriptions in order to represent
structural relationships among the user's topics. Consequently, the user
imparts structure to the data base in a particularly natural way, eliminating
much of the organizational activity normally associated with keyword-based
systems. Our central thesis is that network-mediated techniques provide
for more effective man-machine communication during the processes of
description, organization, and retrieval within a personally generated
information universe.
The procedures used here differ substantially from the typical keyword
indexing and retrieval mechanisms of other personal retrieval systems. The
central objective is to provide the user with a framework for defining the
important topics or informational objects he deals with, and to enable him to
associate items in his data base easily with these entities. Rather than
viewing the data base as a collection of items and associated index terms,
the user deals with "objects" that are in some sense meaningful to him.
Whether retrieving information or indexing new material, the user conveys
references to the appropriate topics. This shift in the user's view of his
information universe, coupled with the mechanisms we have developed for
building up and referring to the topic framework, constitutes the substance
of our approach to personal information storage and retrieval.
II. THE AUTONOTE SYSTEM
The system described here uses the AUTONOTE information storage and
retrieval system (Reitman et al., 1969) as a base. AUTONOTE is an on-line
retrieval system that runs as a user program under the Michigan Terminal
System (MTS), a time-sharing system implemented on the IBM 370/168. The
basic units of information stored in AUTONOTE are called items. The user
may enter arbitrary textual materials into an item and may assign descriptors
by which these materials can be retrieved. Retrieval requests take the form
of single descriptors or combinations of descriptors connected by the AND, OR,
or NOT logical operators. Facilities are provided for deleting, replacing,
linking, and hierarchically organizing text items.
AUTONOTE makes extensive use of the MTS disk file system. MTS disk
files (line files) may be read or written either sequentially or in an indexed
fashion by specifying a line number. AUTONOTE maintains two line files
for each user's data base: one for storing textual materials and bookkeeping
information, the other for storing a descriptor index. Each text item
occupies a specific region of the line number range of the text file. The
descriptor index, on the other hand, is accessed through an efficient hash
coding algorithm that maps each descriptor into an index file line number.
The descriptor index is organized as an inverted file; that is, each line in
the index contains pointers to each of the text items assigned the descriptor
for that line.
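The hashed, inverted descriptor index can be sketched with a dictionary standing in for the index line file. The hash function, the `INDEX_LINES` range, and the upper-casing of descriptors are hypothetical (the text does not give AUTONOTE's actual hash coding algorithm); the 16-character limit is taken from the descriptor-entry rules described below.

```python
from collections import defaultdict

INDEX_LINES = 1009        # hypothetical size of the index file's line-number range
MAX_DESCRIPTOR_LENGTH = 16


def line_number(descriptor: str) -> int:
    """Hypothetical hash coding: map a descriptor to an index-file line number."""
    h = 0
    for ch in descriptor:
        h = (h * 31 + ord(ch)) % INDEX_LINES
    return h


class DescriptorIndex:
    """Inverted file: each index line lists the items assigned its descriptor(s)."""

    def __init__(self):
        # line number -> {descriptor: item numbers}; keying by descriptor within
        # a line keeps two descriptors that hash to the same line apart.
        self.lines = defaultdict(dict)

    def assign(self, item: int, *descriptors: str) -> None:
        for d in descriptors:
            d = d.upper()  # case folding is an assumption of this sketch
            if len(d) > MAX_DESCRIPTOR_LENGTH:
                raise ValueError(f"descriptor longer than {MAX_DESCRIPTOR_LENGTH} characters: {d}")
            self.lines[line_number(d)].setdefault(d, set()).add(item)

    def items(self, descriptor: str) -> set[int]:
        d = descriptor.upper()
        return set(self.lines.get(line_number(d), {}).get(d, set()))
```

The point of the inverted organization is that answering a one-descriptor query costs a single hashed line access, independent of how many text items the data base holds.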
Basic AUTONOTE Commands 
Text entry. To enter a new text item, the user first types the command
ENTER and the system responds with a numerical tag for the new item. The
system then enters a "text insertion mode" and indicates its readiness to
accept successive text lines with a question mark. After entering text, the
user may return to "command mode" by entering a null line or an end-of-file
indication. Should the user at any time wish to continue inserting text into
the current item, he may re-enter text insertion mode via the INSERT command.
Subsequent lines are placed below the most recent line for the current item
in the text file.
In command mode, the system prompts the user for input with a minus sign. 
The user may give each command in full or he may abbreviate by giving any 
initial substring of the command name. 
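Abbreviation by initial substring amounts to a prefix lookup in the command table. A minimal sketch, assuming a hypothetical table of the commands named in this section and an assumed tie-breaking rule (the first entry in table order wins), since the text does not say how AUTONOTE resolves a prefix shared by two commands:

```python
# Hypothetical command table in priority order; a prefix shared by several
# commands resolves to the earliest entry (an assumption, not AUTONOTE's rule).
COMMANDS = ["ENTER", "INSERT", "PRINT", "LIST", "RETRIEVE", "CREATE", "APPEND", "HELP", "SET"]


def resolve_command(word: str) -> str:
    """Expand an abbreviation given as any initial substring of a command name."""
    word = word.strip().upper()
    if not word:
        raise ValueError("empty command")
    for name in COMMANDS:
        if name.startswith(word):
            return name
    raise ValueError(f"unknown command: {word}")
```

In this particular table no two commands share a first letter, so even one-letter abbreviations such as `P` or `R` are unambiguous.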
Descriptor entry. To associate one or more descriptors with the current
text item, the user enters a list of words, beginning the input line with an
at sign (@). Any character string up to 16 characters in length may be used
as a descriptor. In addition to updating the descriptor index, the system
also places the actual "@-line" in the text file in a subregion beneath the
text of the current item.
Retrieval. To display a particular text item the user may enter the
command PRINT followed by the appropriate item number. Sequential blocks of
items can also be specified in the PRINT command, e.g., PRINT 77...85.
In most cases, however, the specific item number(s) will not be known.
The LIST command accepts a descriptor or logical combination of descriptors
as its argument and responds with a list of the item numbers that satisfy
the query.
The functions of the PRINT and LIST commands are combined in the
RETRIEVE command. It also takes a descriptor specification as argument and
causes each item in the resulting list to be PRINTed.
Definitional facility. AUTONOTE also provides a definitional facility
that allows the user to create sets of items referenced by arbitrary
combinations of descriptors. For example, the command CREATE SIRS= INFORMATION
AND RETRIEVAL AND SYSTEMS adds a new descriptor, SIRS, to the index that
references each item having the words INFORMATION, RETRIEVAL, and SYSTEMS as
descriptors. Any defined term may be used just as any other descriptor in
retrieval requests; defined terms may also be used to define other new terms
(e.g., CREATE $OTHERSYSTEMS= SIRS NOT AUTONOTE).
The definitional facility is also invoked implicitly each time the user
issues a retrieval query. The set of items referenced by the most recent
LIST or RETRIEVE command, called the active set, is assigned the name $.
Should the user wish to refine the results of the previous query, he has
access to the active set. To facilitate this process, each time a missing
descriptor is noted in a retrieval request, the descriptor $ is inserted
automatically by the system. For example, the command LIST NOT FORTRAN is
interpreted as LIST $ NOT FORTRAN, i.e., the old active set of items is
restricted to include only those not referenced by the descriptor FORTRAN.
This operation, of course, redefines the active set.
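The interplay of LIST, CREATE, and the active set $ can be sketched as a small query evaluator. Everything here is hypothetical: the `Session` class and method names are illustrative, queries are evaluated strictly left to right (the text does not state AUTONOTE's operator precedence), and `A NOT B` is read as "items in A not referenced by B", matching the FORTRAN example above.

```python
OPERATORS = {"AND", "OR", "NOT"}


class Session:
    """Sketch of LIST and CREATE with the active set $ (hypothetical names)."""

    def __init__(self, index: dict[str, set[int]]):
        self.index = {k.upper(): set(v) for k, v in index.items()}
        self.active: set[int] = set()  # the items currently named by $

    def _lookup(self, term: str) -> set[int]:
        if term == "$":
            return set(self.active)
        return set(self.index.get(term, set()))

    def _evaluate(self, expression: str) -> set[int]:
        tokens = expression.upper().split()
        if not tokens or tokens[0] in OPERATORS:
            tokens.insert(0, "$")  # missing leading descriptor: supply $
        if len(tokens) % 2 == 0:
            raise ValueError("incomplete query")
        result = self._lookup(tokens[0])
        # Strictly left to right; operator precedence is an assumption.
        for op, term in zip(tokens[1::2], tokens[2::2]):
            other = self._lookup(term)
            if op == "AND":
                result &= other
            elif op == "OR":
                result |= other
            elif op == "NOT":
                result -= other  # "A NOT B": items in A not referenced by B
            else:
                raise ValueError(f"expected AND, OR or NOT, found {op}")
        return result

    def list_items(self, expression: str) -> list[int]:
        """LIST: evaluate the query; the result becomes the new active set."""
        self.active = self._evaluate(expression)
        return sorted(self.active)

    def create(self, name: str, expression: str) -> None:
        """CREATE NAME= expression: add a defined descriptor to the index."""
        self.index[name.upper()] = self._evaluate(expression)
```

After `create("SIRS", "INFORMATION AND RETRIEVAL AND SYSTEMS")` and `list_items("SIRS")`, a follow-up `list_items("NOT FORTRAN")` is evaluated as `$ NOT FORTRAN` and narrows the previous result, exactly the refinement pattern described in the text.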
Item-item linkage. The ability to define associative links between
any two text items is provided by the APPEND command. When an item is
displayed, its associative links to other items may optionally be printed
along with a user-specified comment indicating the nature of the association.
Tutorial feature. Throughout the course of its development, AUTONOTE
has been employed to collect, organize, and maintain up-to-date documentation
of its capabilities, usage strategies, and so on. This information is stored
in a publicly available data base. It includes brief descriptions of each
of the commands, announcements of recent developments and system changes, and
other instructive information. The AUTONOTE user may call upon this store of
material by entering a HELP command. The user's data base is temporarily set
aside and the public data base is attached to the system. The user may then
retrieve instructive information in the same way that he operates with his
own data base. To assist novice users, the system will optionally print
instructions for accessing the HELP data base.
Grouping. AUTONOTE provides a grouping facility which permits the user
to organize text items in several useful ways. It enables the user to define
a "grouping item" which references an arbitrarily ordered list of other items.
This is done by entering into an item an @-line of the form:
Since any item can represent a group, it is possible to form a complex
hierarchical structure in this way.
A grouping item can be viewed as a node of an inverted tree structure
with downward branches to those items listed in its "@GROUP" line. A request
to display a grouping item initiates recursive processing of the tree
structure to identify the terminal and nonterminal items of the hierarchy.
The user may request that only terminal or nonterminal items be displayed,
or that the entire list of materials be printed.
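The recursive display of a grouping hierarchy can be sketched as a depth-first walk. The exact @GROUP line syntax is not reproduced here; a hypothetical `groups` mapping from each grouping item to its ordered member items stands in for it. Because any item can represent a group, the sketch also guards against an item that (directly or indirectly) lists one of its own ancestors, which is an added precaution rather than documented AUTONOTE behavior.

```python
def expand(root: int, groups: dict[int, list[int]], show: str = "all") -> list[int]:
    """Depth-first walk of a grouping hierarchy.

    `groups` maps a grouping item to the ordered items on its @GROUP line;
    any item absent from `groups` is terminal. `show` is "all", "terminal",
    or "nonterminal".
    """
    if show not in ("all", "terminal", "nonterminal"):
        raise ValueError(f"unknown display option: {show}")
    order: list[int] = []
    visited: set[int] = set()

    def walk(item: int) -> None:
        if item in visited:  # guard against an item grouping one of its ancestors
            return
        visited.add(item)
        members = groups.get(item, [])
        kind = "nonterminal" if members else "terminal"
        if show in ("all", kind):
            order.append(item)
        for member in members:
            walk(member)

    walk(root)
    return order
```

For a manual item 100 grouping subgroups 10 and 20, the full walk lists the items in document order (group heading before its members), while the `"terminal"` option returns only the leaf items of documentation.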
The organization of the HELP data base described above provides an
excellent example of the power and flexibility of the grouping facility.
The HELP text file contains at this writing approximately 150 items of
documentation. Using the grouping convention, these are organized into
five subgroups: (1) general information; (2) input and editing facilities;
(3) output (retrieval) facilities; (4) organizational facilities; and (5)
utility commands. There is one major item which groups all of these
subgroups into a single tree structure. The top node of the structure is
indexed by the descriptor USERS-MANUAL. As new facilities are incorporated
into the system, their descriptions are entered into the manual structure,
thus assuring that complete and up-to-date documentation is always
available. At any time, the single command RETRIEVE USERS-MANUAL causes the
entire updated data base to be displayed in organized form.
Command modifiers. AUTONOTE includes a set of modifiers or option
settings that control the execution of many commands. These include options
that affect the format of displayed items, the expansion of grouping
structures, the nature and extent of system feedback, etc. Each of the
modifiers has a default value that is chosen to simplify use of the system by
a novice. The more experienced user may alter the modifiers via the SET
command to tailor the system to his own needs, usage patterns, and level of
competence.
AUTONOTE also provides a large number of auxiliary commands and facilities.
A list of the major AUTONOTE commands, each accompanied by a brief
description, is included in Linn (1972, Appendix A).
AUTONOTE System Organization
AUTONOTE has been designed as a modular system so that as new facilities
become available they may be tested and later added with little or no
reprogramming of the existing system. The majority of AUTONOTE commands
are implemented as subroutines, each of which resides permanently in an MTS
disk file. The basic system is organized around a central monitor that
accepts user input and calls upon appropriate modules to service the user's
requests. In addition to the monitor, the core-resident system includes a
dynamic loader, a disk file interface, and a set of frequently used utility
routines. A number of primitive commands, text entry, and descriptor
assignment are also handled by the resident system. As the user requests more
complex services (LIST, RETRIEVE, or PRINT, for example), the monitor calls
upon the dynamic loader to bring the appropriate modules into core storage.
These routines then become a part of the resident system, remaining in core
storage until the user explicitly requests their removal. An organizational
diagram of the AUTONOTE system appears in Fig. 1.
The modular design of AUTONOTE coupled with the dynamic loading facility
offers two important benefits. From the user's viewpoint, he has access to
the complete repertory of AUTONOTE services, yet he pays core storage
charges only for those routines he actually uses during a given session.
To the developers of the system, the modular framework facilitates the
addition of new system components. The latter has been an important factor
in the implementation of the AUTONOTE2 system.
Fig. 1 - AUTONOTE System Organization (diagram components: MONITOR (COMMAND INTERPRETER), DYNAMIC LOADER, STORAGE and INPUT/OUTPUT ROUTINES, PROCESSOR)
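The monitor and dynamic loader arrangement can be sketched as a dispatcher that keeps a few routines resident and loads the rest on first use, keeping them until explicitly removed. The `Monitor` class is hypothetical, and the `on_disk` table of loader functions merely stands in for subroutines stored in MTS disk files; nothing here reflects AUTONOTE's actual code.

```python
from typing import Callable

Handler = Callable[..., str]


class Monitor:
    """Dispatch commands to resident routines or dynamically loaded modules."""

    def __init__(self, resident: dict[str, Handler], on_disk: dict[str, Callable[[], Handler]]):
        self.resident = dict(resident)  # core-resident routines (e.g. text entry)
        self.on_disk = on_disk          # command -> loader; stands in for an MTS disk file
        self.loaded: dict[str, Handler] = {}
        self.loads = 0                  # how many times the loader was invoked

    def dispatch(self, command: str, *args) -> str:
        if command in self.resident:
            return self.resident[command](*args)
        if command not in self.loaded:
            self.loaded[command] = self.on_disk[command]()  # load on first use
            self.loads += 1
        return self.loaded[command](*args)

    def remove(self, command: str) -> None:
        """Explicit removal frees the module; a later use reloads it."""
        self.loaded.pop(command, None)
```

The design choice mirrors the cost argument above: a session that never issues LIST never pays to load it, and repeated LIST commands load the module only once.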
III. OVERVIEW OF AUTONOTE2
The AUTONOTE2 system uses ideas (Reitman, 1965; Reitman et al., 1969)
concerning the use of our "knowledge of the world" to disambiguate and fill
in implied facts when conversing with one another. In particular, the system
design is based upon the assumption that efficient human communication
"depends upon the listener's ability to make inferences from prior
information, from context, and from a knowledge of the speaker and the world.
Communicating in this way, we risk occasional misunderstanding as the price
for avoiding verbose, redundant messages largely consisting of material the
listener already knows" [Reitman et al., 1969].
In our more restricted domain of discourse, we view the process of human
referential communication as one guided by some form of internal
representation of the various topics or referents discussed earlier. When a
listener can be assumed to have such a representation, the speaker is spared
the difficulty of describing in complete detail the things to which he refers.
He need only give enough information to allow the referent to be discerned in
full. Our goal then is to develop a representational scheme for our retrieval
system that allows the user analogous communicative efficiencies.
The Description Language 
The first step in devising a representational framework was the
formulation of a language for expressing topic descriptions to the system.
Although an underlying factor in the design of AUTONOTE2 was to make
communication with the system "more natural," it should be noted that the
emphasis of this research is not upon parsing or "understanding" natural
language. Rather, our goal is to investigate the notions of topic
representation and referential communication as a means for improving the
user's ability to describe, organize, and retrieve his materials.
Consequently, a minimal subset of noun phrases was chosen: minimal in the
sense that it excludes most of the complexity of natural English, yet still
retains a degree of descriptive richness sufficient to explore the underlying
ideas of this study.
Natural language enables us to combine nouns and adjectives into noun
phrases and to interlink noun phrases via prepositions to form complex
descriptions of objects in the real world. The AUTONOTE2 description language
provides such a framework for composing topic references. A formal grammar
for the language is given in Fig. 2 along with a few sample descriptions that
illustrate the flexibility of expression achievable with the language. These
grammatical rules are not in fact used explicitly by the system in actually
parsing topic descriptions. The grammar is presented here only to specify
precisely the set of descriptions acceptable to the system. The actual
AUTONOTE2 parser is heuristic-based, making use of previously analyzed
phrases, noun-preposition co-occurrences, and a set of heuristics to guide
the parsing process. In some instances, the user may even be asked for
parsing assistance.
<description> ::= <noun-group> |
                  <noun-group> <preposition> <description>
<noun-group> ::= (<article>) (<modifier-group>)ᵇ <noun>ᵃ
<preposition> ::= about | to | from | in | on | etc.
<article> ::= a | an | the
(a) Grammar for the description language.
The paper about microprogramming in the proceedings of the fall joint computer conference
Notes on the organization of AUTONOTE2 for use in the presentation of the ACM paper
The use of recall precision measures in the evaluation of the SMART information retrieval system
Quotes from Peldman's 1969 paper for use in the introduction of the second chapter
(b) Sample descriptions.
Fig. 2 - The AUTONOTE2 Description Language
ᵃ Modifiers and nouns are arbitrary character strings not recognized as
articles or prepositions. When a number of consecutive "words" are encountered,
the last is parsed as a noun and the preceding words as modifiers.
ᵇ Possessive adjectives are treated as a special case of adjectival
modification.
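Because the grammar only specifies which descriptions are acceptable, a literal reading of it is easy to sketch; this is not the heuristic AUTONOTE2 parser. The preposition set is an assumption (the grammar ends its list with "etc."), and the dictionary layout of the result is hypothetical. The sketch splits at prepositions, takes the last word of each noun group as the noun per footnote a, and nests each remaining description to the right as the recursive rule reads.

```python
ARTICLES = {"A", "AN", "THE"}
# The grammar ends its preposition list with "etc."; this set is an assumption.
PREPOSITIONS = {"ABOUT", "TO", "FROM", "IN", "ON", "OF", "BY", "FOR"}


def noun_group(words: list[str]) -> dict:
    """<noun-group> ::= (<article>) (<modifier-group>) <noun>"""
    article = words[0] if words and words[0] in ARTICLES else None
    rest = words[1:] if article else words
    if not rest:
        raise ValueError("noun group has no noun")
    # The last word is the noun; any preceding words are modifiers.
    return {"article": article, "modifiers": rest[:-1], "noun": rest[-1]}


def parse(text: str) -> dict:
    """<description> ::= <noun-group> | <noun-group> <preposition> <description>"""
    groups: list[list[str]] = [[]]
    preps: list[str] = []
    for token in text.upper().split():
        if token in PREPOSITIONS:
            preps.append(token)
            groups.append([])
        else:
            groups[-1].append(token)
    # Build right to left, so each preposition introduces the whole
    # description that follows it, as the right-recursive rule reads.
    tree = noun_group(groups[-1])
    for words, prep in zip(reversed(groups[:-1]), reversed(preps)):
        node = noun_group(words)
        node["prep"], node["object"] = prep, tree
        tree = node
    return tree
```

Applied to the first sample description, this literal reading nests IN THE PROCEEDINGS under MICROPROGRAMMING, whereas the intended reading attaches it to PAPER. Resolving such attachments is exactly the job the text assigns to the heuristic parser and, failing that, to the user.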
Representational Framework 
Central to the design of AUTONOTE2 is the idea of viewing the user's
information universe as a collection of "informational objects" or topics,
each having associated with it a number of text items. When the user wishes
to describe a text item, we assume he has such a topic in mind. Using the
phrasal language specified above, he composes a description of that topic
and presents it to the system. AUTONOTE2 then constructs an internal
representation of that topic. When a text item is described, the system must
consult the representation to determine if the description (1) references an
existing topic, (2) is related to an existing topic, or (3) defines a new
topic. In any case, the ultimate goal is to associate the text item with a
topic representation, possibly augmenting the representation in the process.
Criteria for the Representation 
Efficiency of communication. Efficient man-machine communication
implies that the user should not in general have to formulate a complete
description of a particular topic in order to convey a reference to it. The
system should be capable of accepting and correctly interpreting incomplete
references by filling in missing information. As an example, a topic fully
described as THE PAPER BY SALTON ABOUT THE SMART SYSTEM might be referred to
as THE PAPER, THE PAPER BY SALTON, THE PAPER ABOUT THE SMART SYSTEM, and so on.
A description in the AUTONOTE2 language consists of a noun modified by
adjectives and prepositional phrases. The words that modify any given term
may themselves be modified in exactly the same way. In effect, each
adjective and prepositional phrase functions as a phrase component that
imparts greater detail to the overall description. In the example above, BY
SALTON and ABOUT THE SYSTEM provide information about the paper; SMART
specifies which system is meant.
To facilitate efficient communication we require a representational 
framework that makes explicit the component phrases of each topic description. 
Given such a framework, we have a basis for comparing incomplete descriptions 
with the representation to determine possible topic referents. 
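One way to make the component phrases explicit, and to compare an incomplete reference against a stored topic, is sketched below. The `Phrase` structure and the "MOD" relation label for adjectival modifiers are hypothetical. The matching rule is an illustrative reading of the criterion above: every component the reference mentions must appear, with the same relation, in the topic, and components the reference omits are simply filled in from the topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    noun: str
    # (relation, dependent) pairs; relation is a preposition or "MOD" for a modifier
    parts: tuple[tuple[str, "Phrase"], ...] = ()


def matches(reference: Phrase, topic: Phrase) -> bool:
    """True if `reference` is a (possibly incomplete) description of `topic`."""
    if reference.noun != topic.noun:
        return False
    return all(
        any(rel == t_rel and matches(sub, t_sub) for t_rel, t_sub in topic.parts)
        for rel, sub in reference.parts
    )
```

Under this rule THE PAPER, THE PAPER BY SALTON, and THE PAPER ABOUT THE SYSTEM all match the stored topic THE PAPER BY SALTON ABOUT THE SMART SYSTEM, while THE PAPER BY LINN does not.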
A system that makes use of syntax in the user's descriptor entries
increases descriptive power in that it permits distinctions that, in general,
will not be made in keyword-based retrieval systems. A description such as
THE ORGANIZATION OF THE PAPER ABOUT MTS is semantically quite different from
THE PAPER ABOUT THE ORGANIZATION OF MTS, despite the fact that both contain
the same words. A system that takes into consideration the syntactic
relationships that hold among the words ORGANIZATION, PAPER, and MTS can
discriminate between the two.
The considerations outlined thus far lead quite naturally to some form 
of dependency representation for the user's topics. 
Essentially, a depen- 
dency representation for the AUTONOTE2 language would reflect the syntactic 
dependence of each adjective and prepositional phrase upon an appropriate 
noun. Such a framework provides the essential information for enhancing 
descriptive power and communicative efficiency as defined above. 
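The discrimination argument can be checked directly: a keyword view of the two descriptions is identical, while their dependency relations are not. This sketch is only illustrative; it assumes one-word noun groups, an assumed preposition list, and attachment of each prepositional phrase to the noun immediately before it.

```python
ARTICLES = {"A", "AN", "THE"}
PREPOSITIONS = {"ABOUT", "OF", "BY", "IN", "ON", "TO", "FROM"}  # assumed list


def keywords(description: str) -> set[str]:
    """What a keyword index sees: the content words, with no structure."""
    return {w for w in description.upper().split() if w not in ARTICLES | PREPOSITIONS}


def dependencies(description: str) -> set[tuple[str, str, str]]:
    """(head, preposition, dependent) triples.

    Simplifying assumptions: every noun group is a single word, and each
    prepositional phrase attaches to the noun immediately before it.
    """
    words = [w for w in description.upper().split() if w not in ARTICLES]
    return {
        (words[i - 1], w, words[i + 1])
        for i, w in enumerate(words)
        if w in PREPOSITIONS and 0 < i < len(words) - 1
    }
```

Both descriptions reduce to the keyword set {ORGANIZATION, PAPER, MTS}, but the first yields ORGANIZATION-OF-PAPER and PAPER-ABOUT-MTS while the second yields PAPER-ABOUT-ORGANIZATION and ORGANIZATION-OF-MTS, which is the distinction a dependency representation preserves.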
Hierarchical representations. We view a topic as a group of
interconnected subtopics, each bearing on a central theme yet with varying
levels of generality. To make this notion more concrete, consider a user of
AUTONOTE2 putting down his thoughts and ideas for a book he is writing. He
begins by entering some general material which he describes simply as "THE
BOOK ABOUT ...." At some later time he may enter an outline for the book, a
list of reference materials he will use, publishing arrangements, etc. Still
later, he will enter materials for the chapters of his book and perhaps
outlines for each chapter. In time he will have defined a host of related
descriptions. Fig. 3 gives a pictorial representation of the resultant
complex "topic." The representational scheme of AUTONOTE2 was designed with
complex hierarchies such as this one in mind. In other words, we want to
represent related topic descriptions via interconnections in a network.
The essential idea is that such a network corresponds to a map of the
organization of the associated textual materials, a map that should reflect
important structural relationships among the materials from the user's
viewpoint. A hierarchical representation of this kind is especially effective
during retrieval. If the user requests materials dealing with his book, for
example, the system can also inform him that he has more specific items
dealing with the publishing arrangements, the component chapters, and so on.
The notion of a representational network fits well with the dependency
framework we require. The syntactic dependencies among the words and phrases
of a description may be used to represent structural relationships among the
user's topics. In the example above, the network connection between the
"outline" and the "book" corresponds to the syntactic dependency of "book"
upon "outline" in the description THE OUTLINE OF THE BOOK ABOUT ....
Fig. 3 - A Topic Hierarchy (diagram; surviving node labels include THE BOOK and OF CHAPTER 1)
Augmentation of the representation. In the previous discussion of
communicative efficiency we were concerned with associating an incomplete
description with its corresponding topic. In designing the representational
framework we also had to consider the case in which a reference provides a
more detailed description of an existing topic. In such instances we want to
enrich the topic representation to include the additional information.
Whether additional descriptive information is encountered in a subsequent
item description or in a retrieval request, we want the system to incorporate
it into its existing knowledge of the user's topics. This requires that the
representation be structured in such a way that dynamic augmentation is easily
accomplished.
The representation of context. In providing a framework for
interpreting terse, incomplete references we naturally are confronted with the
problem of ambiguity. A description such as THE PAPER, or THE PAPER ABOUT
MICROPROGRAMMING, may in fact satisfy a large number of distinctly described
topics. To deal with this problem we require some kind of contextual framework
that enables the system to infer, where possible, the intent of a vague or
ambiguous reference. A user who has been entering material for a paper he is
writing should be able to describe a subsequent item as, say, THE OUTLINE OF
THE PAPER, and have the system infer which paper he means. In general, then,
we want the representational framework to include information that identifies
the "working context," i.e., those topics the user has referred to recently.
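A working context of recently referenced topics can be sketched as a bounded recency list. The `WorkingContext` class, its default size, and the preference rule (choose the most recently referenced candidate; if none of the candidates is in context, return None so the system asks the user) are hypothetical illustrations of the requirement, not the AUTONOTE2 mechanism.

```python
from collections import deque
from typing import Optional


class WorkingContext:
    """Recently referenced topics, most recent last (a hypothetical model)."""

    def __init__(self, size: int = 10):
        self.recent: deque = deque(maxlen=size)  # oldest topics fall off the left

    def note(self, topic: str) -> None:
        """Record a reference, moving the topic to the most recent position."""
        if topic in self.recent:
            self.recent.remove(topic)
        self.recent.append(topic)

    def resolve(self, candidates: list[str]) -> Optional[str]:
        """Choose among candidate referents; None means the system must ask the user."""
        if len(candidates) == 1:
            return candidates[0]
        for topic in reversed(self.recent):
            if topic in candidates:
                return topic
        return None
```

If the user has just been entering material on THE PAPER BY SALTON, an ambiguous THE PAPER resolves to it; an ambiguity among topics that are all outside the context still falls back to questioning the user.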
System interrogation of the user. Presented with an ambiguous descrip- 
tion "out of context," the system is faced with much the same dilemma a human 
listener would face. In such instances, we want the system to be capable of 
asking pertinent questions to resolve the ambiguous reference. This implies, 
of course, that the representation preserve sufficient information to enable 
it to reconstruct descriptions of the user's topics. 
Overview of the AUTONOTE2 Implementation
Data structures. We have now presented the major design requirements 
for the representational framework. These preliminary criteria suggest a 
representation organized as a network of (possibly interconnected) dependency 
structures obtained from syntactic analysis of topic descriptions. The
network data structures are discussed in section IV in terms of the
representational criteria and also the computational requirements--how they are to be
accessed, modified, and so on. 
The parser. The parsing of descriptions is guided by the state of the 
representation at the point they are entered. For this reason, the parsing 
algorithm is treated in section IV in conjunction with the representational 
data structures. The presentation includes detailed discussion of the parsing
problems encountered and the heuristics employed in dealing with them. 
Network location. The function of the network locator is to analyze the
parse tree to decide whether the description references an existing topic or 
defines a new one. Once this decision is made, it constructs a list of any 
network modifications required to represent the topic and its associated item
reference. The network location algorithm is described in section V. 
Retrieval. The AUTONOTE2 retrieval component is invoked via a FIND 
command. The command takes a topic description as its argument. 
The FIND 
processor in turn calls upon both the parser and network locator, regaining 
control after the appropriate network topic has been identified. 
Text items 
directly associated with the topic then may be retrieved from the data base. 
Alternately, the retrieval component will move to structurally related topics 
in the representational network to collect additional item references for 
subsequent display. 
To reconstruct topic descriptions from the network, AUTONOTE2 includes
a SPEAKER module. If the user's description is ambiguous, for example, the
network locator may call for a display of the alternative topics. The FIND 
processor employs the SPEAKER to present descriptions of topics structurally 
related to the user's original query. The user also may invoke the SPEAKER 
explicitly, via a DESCRIBE command, to obtain descriptions of some subset of
the topics in the representational network. The retrieval component, the 
SPEAKER, and the DESCRIBE command are treated in section VI. 
Network modification. The last major component of AUTONOTE2, the
network modification processor, is described in section VII. It allows the
user to delete topic representations, create new ones, and merge multiple 
topics into a single representation. It also enables the user to move 
through clusters of related topics in order to explore associations in the 
network. 
Auxiliary commands. Various auxiliary commands and facilities are given
in Linn (1972, Appendix B). This appendix also includes some discussion of
usage strategies for achieving the most effective use of AUTONOTE2. 
Fig. 4 depicts the organization of the AUTONOTE2 components within the 
AUTONOTE system framework.
IV. 
THE PARSER AND THE REPRESENTATIONAL NETWORK 
Overview 
When the user wishes to describe a text item, we assume he has in mind
some subject, topic, or informational object that can be characterized by a 
phrasal description. A description may convey a reference to a topic the user
has dealt with earlier; or it may define a new one. The description is ana- 
lyzed to determine a dependency tree--a structure that preserves the original 
words and phrases of the description and the syntactic dependencies among 
them. 
In constructing this tree, the parser incorporates primary units called 
simple phrases. A simple phrase may consist of a modifier and a noun (e.g.,
ACM CONFERENCE), or of a noun followed by a preposition and modifier (e.g.,
OUTLINE OF PAPER). The parser extracts these basic phrases from the original 
description and records the syntactic dependencies among them. 
A description 
such as THE OUTLINE OF THE PAPER ABOUT AUTONOTE2 FOR THE ACM CONFERENCE will
be analyzed into four simple phrases: (1) THE OUTLINE OF THE PAPER, (2) THE 
PAPER ABOUT AUTONOTE2, (3) THE PAPER FOR THE CONFERENCE, and (4) THE ACM CON- 
FERENCE. Each simple phrase consists of a subject noun and a modifier word. 
When two simple phrases have a common subject noun, we say they are coordin- 
ate simple phrases. When a modifier word of one simple phrase subsequently 
becomes the subject noun of another, we say the latter phrase is subordinate
to the former. In the above example, THE PAPER ABOUT AUTONOTE2 and THE PAPER
FOR THE CONFERENCE are coordinate simple phrases--both have PAPER as their
subject noun. Both of these are subordinate to the phrase OUTLINE OF THE
PAPER, in which PAPER appears as a modifier word. Additionally, the simple
phrase THE ACM CONFERENCE is subordinate to THE PAPER FOR THE CONFERENCE.
Subordinate phrases simply qualify the use of their subject words. For ex-
ample, phrases subordinate to THE OUTLINE OF THE PAPER provide a more detailed
description of the paper ("ABOUT AUTONOTE2" and "FOR THE CONFERENCE"); the
phrase subordinate to THE PAPER FOR THE CONFERENCE further qualifies the con-
ference.

Fig. 4 - The Organization of AUTONOTE2 Within the AUTONOTE System (figure not reproduced; surviving label: DESCRIPTION PROCESSOR)
In effect, two kinds of dependency information are extracted by the 
parser. The first is the dependency of adjectives and prepositional phrases 
upon a noun. This information is reflected in the selection of the simple 
phrases themselves. Second, there are the dependency relationships among 
the simple phrases of the description. This information, expressed in terms
of subordinate and coordinate linkages, may be represented by a tree structure
consisting of nodes with simple phrases at the terminal branches. Fig. 5
gives the tree structure for the example. Simple phrases with an immediate
linkage to a node are said to directly describe that node. Note that the two
coordinate phrases from the example directly describe a common node, node B. 
The subordinate relationship of the node B phrases to the node A phrase, and 
in turn, that of the node C phrase to the node B phrases is reflected by down- 
ward branches connecting those nodes. 
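The coordinate and subordinate relations defined above are mechanical enough to sketch in code. The Python below is a minimal illustration, not AUTONOTE2's implementation: it assumes each simple phrase is a (subject, modifier, preposition) triple and that, within one description, each subject noun yields exactly one node. The names `SimplePhrase`, `coordinate`, `subordinate`, and `build_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SimplePhrase:
    subject: str          # subject noun, e.g. PAPER
    modifier: str         # modifier word, e.g. AUTONOTE2
    prep: Optional[str]   # None marks adjectival modification (ACM CONFERENCE)


def coordinate(a: SimplePhrase, b: SimplePhrase) -> bool:
    """Distinct phrases are coordinate when they share a subject noun."""
    return a != b and a.subject == b.subject


def subordinate(lower: SimplePhrase, upper: SimplePhrase) -> bool:
    """`lower` is subordinate to `upper` when upper's modifier is lower's subject."""
    return upper.modifier == lower.subject


def build_tree(phrases: List[SimplePhrase]) -> Dict[str, dict]:
    """Group coordinate phrases into one node per subject noun and record,
    for each node, the subject nouns of the nodes subordinate to it."""
    nodes: Dict[str, dict] = {}
    for p in phrases:
        nodes.setdefault(p.subject, {"phrases": [], "down": []})["phrases"].append(p)
    for node in nodes.values():
        for p in node["phrases"]:
            if p.modifier in nodes and p.modifier not in node["down"]:
                node["down"].append(p.modifier)
    return nodes


# THE OUTLINE OF THE PAPER ABOUT AUTONOTE2 FOR THE ACM CONFERENCE
example = [
    SimplePhrase("OUTLINE", "PAPER", "OF"),        # node A
    SimplePhrase("PAPER", "AUTONOTE2", "ABOUT"),   # node B
    SimplePhrase("PAPER", "CONFERENCE", "FOR"),    # node B
    SimplePhrase("CONFERENCE", "ACM", None),       # node C
]
tree = build_tree(example)
print(tree["OUTLINE"]["down"], tree["PAPER"]["down"])  # ['PAPER'] ['CONFERENCE']
```

The subject nouns OUTLINE, PAPER, and CONFERENCE play the roles of nodes A, B, and C in Fig. 5.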
The resultant tree structure defines the representation of its corresponding
topic.

Fig. 5 - Simple Phrase Dependency Structure (figure not reproduced)

Representations of each of the user's topics are organized into a
hierarchical data structure called the representational network. The repre-
sentational network is composed of interconnected nodes, simple phrases, and
words. When a description is mapped onto the network, the number of the
associated text item is stored with the highest order node in the corresponding
topic representation. 
Each node in the network may have up to four types of linkages: (1)
pointers down to simple phrases that directly describe the node; (2) pointers
down to subordinate nodes; (3) pointers up to superior nodes; and (4) poin-
ters to textual materials associated with the node. Each simple phrase or
single word is directly accessible as a unit in the network through hash cod-
ing procedures similar to those used in maintaining the AUTONOTE keyword in-
dex. Associated with each simple phrase are the linkages to the node(s) the
phrase directly describes. In turn, each single word has associated pointers
that lead the system to the simple phrases containing the word. Fig. 6 il-
lustrates the network representation of the example.
Once a topic is defined in the network, the user can refer to it using 
a word, a simple phrase, or a composition of simple phrases. For example,
should the user later describe a new text item as, say, OUTLINE OF THE PAPER
or OUTLINE OF THE PAPER ABOUT AUTONOTE2, the system will note that it al-
ready has a representation for the topic. The only change to the network in
such cases is the addition of a new item reference linkage to the identified
node (node 1). In general, the system attempts to relate each new item des- 
cription to those it already "knows" about. For new topics, new nodes are 
allocated in the network. Should some subset of the simple phrases of a new 
description refer to an existing topic, the additional simple phrases are 
linked to that existing representation. For example, if in reference to the
same paper the user describes another item as THE ABSTRACT OF THE PAPER ABOUT
AUTONOTE2, the system would modify the network to the form shown in Fig. 7.

Fig. 6 - Corresponding Representational Network Structure (figure not reproduced)
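The mapping of descriptions onto the network, and the reuse of existing nodes, can be sketched as follows. This is a simplified illustration under stated assumptions: a description arrives already parsed as a nested (phrases, sublevels) tree, a level is identified with an existing node whenever one of its phrases already describes that node (a much cruder test than the network LOCATOR of section V), and the class and method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Phrase = Tuple[str, Optional[str], str]  # (subject, preposition or None, modifier)


@dataclass
class Node:
    phrases: Set[Phrase] = field(default_factory=set)  # (1) phrases describing it
    down: Set[int] = field(default_factory=set)        # (2) subordinate nodes
    up: Set[int] = field(default_factory=set)          # (3) superior nodes
    items: List[int] = field(default_factory=list)     # (4) text item numbers


class Network:
    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.phrase_index: Dict[Phrase, Set[int]] = defaultdict(set)  # phrase -> nodes
        self.word_index: Dict[str, Set[Phrase]] = defaultdict(set)    # word -> phrases

    def store(self, level: tuple, item: int) -> int:
        """Map a (phrases, [sublevels]) tree onto the network bottom-up and
        attach the item number to the highest-order node."""
        top = self._store(level)
        self.nodes[top].items.append(item)
        return top

    def _store(self, level: tuple) -> int:
        phrases, sublevels = level
        children = [self._store(sub) for sub in sublevels]
        # Simplification: reuse the lowest-numbered node that any of these
        # phrases already describes; otherwise allocate the next node number.
        hits = sorted(n for p in phrases for n in self.phrase_index.get(p, ()))
        number = hits[0] if hits else len(self.nodes) + 1
        node = self.nodes.setdefault(number, Node())
        for p in phrases:
            node.phrases.add(p)
            self.phrase_index[p].add(number)
            self.word_index[p[0]].add(p)
            self.word_index[p[2]].add(p)
        for child in children:  # all node linkages are two-way
            node.down.add(child)
            self.nodes[child].up.add(number)
        return number
```

Storing THE OUTLINE OF THE PAPER ABOUT AUTONOTE2 FOR THE ACM CONFERENCE and then THE ABSTRACT OF THE PAPER ABOUT AUTONOTE2 leaves the paper node with two superior nodes, and a later OUTLINE OF THE PAPER only adds an item reference. Nodes here are numbered in creation order, bottom-up, so the numbers differ from those in the figures.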
Design of the Network Data Structures 
List structures. As noted above, the representational criteria dictate 
a hierarchical network-type organization, based upon dependency analyses of
topic descriptions. List structures are particularly well suited for this
kind of application. They provide a convenient representation for depen-
dency trees and are especially appropriate for dealing with complex, evol-
ving structures.
In designing special purpose list structures for the representational 
network, we first specified the logical components of the structure and de-
fined the interconnections among these primitives. Three logical components
were formulated--simple phrases, nodes, and words. The following subsections
present the major design considerations for each structural component. 
Simple phrases. Given our goal of communicative efficiency, we chose
the simple phrase as a primary unit for the network. By analyzing a topic
description into simple phrases we are in effect isolating possible
"shorthand" references to the given topic. The representational data structures
have been designed to allow a topic to be referenced through any of its com- 
ponent simple phrases. 
Simple phrases are formed from either adjectival or prepositional modi-
fication of a noun. Very often, an adjectival modification can be equiva-
lently expressed by a prepositional phrase dependent upon the same noun
(example: THE PAPER ABOUT AUTONOTE and THE AUTONOTE PAPER). In other in-
stances the adjectival form may have multiple interpretations; THE SMITH
ARTICLE could refer to an article by Smith or possibly an article about
Smith. Some prepositions may be used synonymously in a particular context
(THE PAPER ABOUT (ON) SHORT TERM MEMORY); others convey distinctly different
meanings (THE MEMO TO THE COMMITTEE versus THE MEMO FROM THE COMMITTEE). We
do not deal with these problems to the extent of providing a semantics for
"understanding" natural language. However, the representational structure
makes explicit the various possibilities, so that the system is able to gen-
erate plausible alternatives.

Fig. 7 - Network Representation of a Topic Hierarchy (figure not reproduced)
We treat adjectival modification as a special case, as if the modifier 
and subject noun were related by an unspecified preposition. In terms of 
the data structure design, all simple phrases composed of the same two words 
are mapped into a larger unit, each subunit of which represents a particular 
instance of a simple phrase in a topic description. This arrangement assures 
that all information on simple phrases involving any two words is accessible 
collectively. This information will then be at hand to provide a basis for
interpreting the alternative referents of each incoming simple phrase. For 
example, should the user make reference to THE SMITH PAPER and the system 
finds only PAPER ABOUT SMITH in the network, then that single alternative 
is chosen. On the other hand, if PAPER BY SMITH also is present, the system 
considers both possibilities. 
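A minimal sketch of the grouping just described, assuming a unit is keyed by the (modifier, subject) pair and that each instance records its preposition, with None standing for adjectival use. The class `PhraseUnits` is hypothetical, and the rule that an explicit preposition also matches adjectival instances is an assumption of the sketch rather than something the text states.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Instance = Tuple[Optional[str], int]  # (preposition or None, node it describes)


class PhraseUnits:
    """All simple phrases built from the same modifier and subject noun are
    kept together in one unit, with one entry per instance."""

    def __init__(self) -> None:
        self.units: Dict[Tuple[str, str], List[Instance]] = defaultdict(list)

    def record(self, modifier: str, subject: str, prep: Optional[str], node: int) -> None:
        self.units[(modifier, subject)].append((prep, node))

    def candidates(self, modifier: str, subject: str,
                   prep: Optional[str] = None) -> List[Instance]:
        """Instances an incoming simple phrase may refer to.

        An adjectival reference (prep None) acts as an unspecified preposition
        and matches every instance; an explicit preposition matches instances
        with that preposition or with none (an assumption of this sketch).
        """
        instances = self.units.get((modifier, subject), [])
        if prep is None:
            return list(instances)
        return [inst for inst in instances if inst[0] in (prep, None)]
```

With only PAPER ABOUT SMITH recorded, THE SMITH PAPER yields a single candidate; once PAPER BY SMITH is added, both come back for the system to consider.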
Network nodes. The next structural component of the representational
network is the node. A node groups together a set of simple phrases that
comprise the description of the node. The node also functions as a collector
of item references.
Each node in the representational network corresponds 
to a topic or concept pertinent to the items of textual material associated 
with it. 
The simple phrases that directly describe a node define the cor- 
responding concept. 
Any given node may be linked to more general (lower) 
nodes, or to more specific (higher) nodes. For example, a node that repre- 
sents a particular paper may be linked downward to another that describes 
a conference at which the paper was presented; it may also be linked to 
several higher order nodes corresponding to, say, a summary, an outline, and 
a review of the paper. As more and more items are described, additional 
topics may be tied into the same conference node. The ultimate result will 
be a highly interconnected set of concept nodes, each with its own set of 
associated textual materials. 
To achieve this kind of structural organization for the network, we 
make use of the dependency relationships in the user's descriptions: each 
node level corresponds to a syntactic dependency level. In terms of the 
example above, the adjectives and prepositional phrases modifying the noun 
"paper" are formed into simple phrases that will directly describe a common 
node. Simple phrases identifying the conference will describe a subordinate
node due to the syntactic dependency of "conference" upon "paper" in a phrase 
of the form, PAPER AT THE...CONFERENCE. Superior nodes are assigned to the 
outline, the summary, and the review, reflecting the dependence of "paper"
upon those nouns in appropriate descriptions. 
A node may be viewed as a collection of pointers to simple phrases, 
other nodes, and text items. All node linkages are two-way. Pointers down
from a node to its simple phrases are required in order to reconstruct a
description of the node. Pointers down to subordinate nodes are necessary for
the same reason. Both upward and downward pointers to other nodes provide a
means for moving from any topic to structurally related ones. Associated with
each instance of a simple phrase is a pointer to the node where item refer-
ences are stored. Finally, bookkeeping information stored with each text item
includes pointers to each topic node with which the item is associated. Item- 
node linkages enable the system to provide the user with topic descriptions of 
any text item. 
Words. The representational structures considered thus far provide sim- 
ple phrases as the sole means for accessing the nodes in the network. A less 
restrictive access mechanism also is required, for several important reasons.
First, it would be unrealistic to assume that the user will always phrase ref-
erences to a particular topic in exactly the same way. Second, single word
descriptions play an important role in achieving our goal of communicative 
efficiency. Since we anticipate that users will make frequent use of single 
word references when working in the context of a particular topic, we want to
provide a natural and convenient treatment of such descriptions. Finally, a 
phrasal description can convey a higher order categorization of an existing
topic without containing a simple phrase for that topic. For example, THE 
REVIEWER'S COMMENTS ON THE PAPER may reference a paper mentioned earlier; yet
it contains no simple phrases describing that paper. 
These considerations lead us to the third logical component of the network 
data structures, the single word. Essentially, each component word provides ac-
cess to a series of pointers to simple phrases in which the word occurs.
Word-to-phrase pointers are of two types: those indicating usage 
as subject noun; and those indicating modifier usage in a particular simple 
phrase. As we shall see later, this distinction is required in order to 
relate new simple phrases to existing topics at an appropriate node level. 
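The subject/modifier distinction in the word directory might look like this in a minimal sketch. The class name, the `sub`/`mod` flag values, and the use of a set per word are assumptions; the key string follows the modifier/slash/subject form that the phrase directory uses.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SUBJECT, MODIFIER = "sub", "mod"  # the binary usage flag stored with each block


class WordDirectory:
    """Hypothetical word directory: word -> set of (phrase key, usage flag)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_phrase(self, modifier: str, subject: str) -> str:
        key = f"{modifier}/{subject}"  # same MOD/SUBJ form as the phrase directory
        self.entries[subject].add((key, SUBJECT))
        self.entries[modifier].add((key, MODIFIER))
        return key

    def phrases(self, word: str, usage: str) -> List[str]:
        """Phrase keys in which `word` plays the given role, sorted for stability."""
        return sorted(k for k, flag in self.entries.get(word, ()) if flag == usage)
```

In THE REVIEWER'S COMMENTS ON THE PAPER, PAPER occurs only as a modifier; asking the directory for the phrases in which PAPER is the subject is one way the system could find an existing paper topic to link in one node level below.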
Having specified the three logical components and the linkages in the
representational network, we now turn our attention to the storage implemen-
tation of these structures.
Storage Implementation of the Network Structure 
There are three directories needed to maintain the representational net-
work, one for each of the components of the structure. All directory infor-
mation must, of course, be saved in permanent storage between AUTONOTE2 ses-
sions. Two design alternatives were considered for maintaining the network
during execution of the program. The directories could be accessed and up-
dated on disk, or they could be brought into core storage for the duration
of the session. We adopted the former strategy for a number of reasons. 
First, AUTONOTE is highly oriented toward the use of disk file storage. 
Several file interface routines were available at the outset for conveniently 
storing and accessing information through the MTS file system. Second, as 
the network grows in complexity, it becomes increasingly unlikely that the 
user will reference the major portion of the network during any given ses- 
sion. By maintaining the network in disk files, the amount of core storage 
required is substantially reduced. Finally, the file approach greatly 
simplified the programming effort, especially in those system components 
that operate recursively on the list structured network. We will elaborate 
on this point further in section VI, which illustrates the simplification of 
recursive processes in AUTONOTE2. 
Rather than store all the directories in a single disk file, we chose
to maintain each directory separately. This strategy preserves the logical 
distinction among the three types of directory information, and has also 
simplified the programming of the system. We now describe the organization
of each of the directory files. 
The node directory. Each node in the representational network has a 
corresponding integral node number which is also the line number in the node 
directory file. As new node numbers are needed to represent new topics, the 
next sequentially numbered line in the node directory is assigned as the node 
number. Each node directory line contains four fields--one for bookkeeping
information and three fields for the upward, downward, and item reference
pointers for the node. The item reference region contains a list of integer
item numbers. The upward pointer region also contains a list of integers
that represent immediate linkages to superior nodes. The two types of down-
ward pointers (to nodes and to phrases) are stored in a common region. Each
node, simple phrase, and single word has a corresponding file line number in
its respective directory file. In the case of nodes, the line number is
simply the node number. In the case of words and simple phrases, the line
number is the result of a hash coding process on a compact character represen-
tation of the word or phrase. Thus a "pointer" is actually a file line num-
ber. Downward pointers to nodes and phrases are distinguishable in the node
directory on the basis of the magnitude of the line number. 
Since each of the three pointer fields is 
of fixed length, there is a 
maximum number of each type of pointer for a given node directory line. 
Each field consequently has an associated continuation pointer to a line 
where additional pointers are stored if necessary. 
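A sketch of a node directory line with fixed-length pointer fields and continuation lines. The capacity of four pointers per field, the threshold that separates node pointers from phrase line numbers, and the choice to let continuation lines share the sequential numbering with nodes are all assumptions; the text fixes none of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FIELD_CAPACITY = 4    # hypothetical: pointers that fit in one fixed-length field
PHRASE_BASE = 10_000  # hypothetical: down pointers at or above this are phrase lines
REGIONS = ("up", "down", "items")


@dataclass
class NodeLine:
    up: List[int] = field(default_factory=list)     # superior nodes
    down: List[int] = field(default_factory=list)   # subordinate nodes and phrases
    items: List[int] = field(default_factory=list)  # item numbers
    continuation: Dict[str, Optional[int]] = field(
        default_factory=lambda: {region: None for region in REGIONS})


class NodeDirectory:
    def __init__(self) -> None:
        self.lines: Dict[int, NodeLine] = {}

    def new_line(self) -> int:
        """Assign the next sequentially numbered line."""
        number = len(self.lines) + 1
        self.lines[number] = NodeLine()
        return number

    def add(self, node: int, region: str, pointer: int) -> None:
        """Append a pointer, following or creating continuation lines as needed."""
        line = self.lines[node]
        while len(getattr(line, region)) >= FIELD_CAPACITY:
            nxt = line.continuation[region]
            if nxt is None:
                nxt = self.new_line()
                line.continuation[region] = nxt
            line = self.lines[nxt]
        getattr(line, region).append(pointer)

    def read(self, node: int, region: str) -> List[int]:
        """Collect a region's pointers across the line and its continuations."""
        pointers: List[int] = []
        line_no: Optional[int] = node
        while line_no is not None:
            line = self.lines[line_no]
            pointers.extend(getattr(line, region))
            line_no = line.continuation[region]
        return pointers


def is_phrase_pointer(pointer: int) -> bool:
    """Down pointers to nodes and to phrases are told apart by magnitude alone."""
    return pointer >= PHRASE_BASE
```

Because continuation lines take ordinary line numbers in this sketch, the next new node after an overflow skips the line used for the continuation.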
The phrase directory. 
To locate the phrase directory line for a par- 
ticular simple phrase, a hash coding function is applied to the character 
string formed by concatenating the modifier word, a slash, and the subject 
word. For example, the directory line for the simple phrase PAPER ABOUT 
AUTONOTE is the hashcode for the string "AUTONOTE/PAPER." 
Since the hashing 
function operates only on the modifier and subject word, simple phrases 
formed from the same two words, but with differing (or no) prepositions, are 
mapped into the same directory line number. 
To distinguish among the various instances of the same two-word combin- 
ation, the directory line for simple phrases consists of a series of pointer 
blocks. Each pointer block contains a code for the particular preposition 
used, some additional bookkeeping information, and a pointer to the node 
directly described by that occurrence of the simple phrase.
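A sketch of phrase directory lookup under two labeled substitutions: CRC-32 (`zlib.crc32`) replaces the unspecified hash function, and linear probing replaces the collision pointers. What does follow from the text is the key format and its consequence: phrases built from the same two words share one line, with one pointer block per instance. The table size and all names are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple

TABLE_SIZE = 997  # hypothetical number of lines in the phrase directory file
Block = Tuple[Optional[str], int]  # (preposition code or None, node described)


def phrase_key(modifier: str, subject: str) -> str:
    """Modifier, slash, subject: the preposition is deliberately left out."""
    return f"{modifier}/{subject}"


class PhraseDirectory:
    def __init__(self) -> None:
        self.lines: Dict[int, Tuple[str, List[Block]]] = {}  # line -> (key, blocks)

    def line_for(self, key: str) -> int:
        # CRC-32 stands in for the original hash function, and linear probing
        # stands in for its collision pointers; both are substitutions.
        line = zlib.crc32(key.encode()) % TABLE_SIZE + 1
        while line in self.lines and self.lines[line][0] != key:
            line = line % TABLE_SIZE + 1
        return line

    def add(self, modifier: str, subject: str, prep: Optional[str], node: int) -> int:
        key = phrase_key(modifier, subject)
        line = self.line_for(key)
        self.lines.setdefault(line, (key, []))[1].append((prep, node))
        return line

    def blocks(self, modifier: str, subject: str) -> List[Block]:
        line = self.line_for(phrase_key(modifier, subject))
        return self.lines.get(line, ("", []))[1]
```

PAPER ABOUT AUTONOTE and THE AUTONOTE PAPER land on the same line as two blocks, which is what lets the system see both readings together.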
The word directory. The word directory incorporates the same pointer 
block principle as the phrase directory. The pointer field of the block in 
this case is a pointer into the phrase directory. The preposition code field 
contains a binary flag indicating whether the particular word occurs as the
subject noun or modifier word in the simple phrase specified by the pointer. 
Like phrases, each word directory line is accessed through an efficient hash 
coding algorithm. 
The word directory also maintains preposition usage information for each
word. For example, the entry for MEMO may indicate that the word has 
occurred with the prepositions ON, ABOUT, TO, FROM, etc. This information 
is used to guide the parsing of descriptions. 
The organization of the three network directories is depicted in Fig. 8. 
The Representational Network: An Example
To help fix ideas, we now present a more detailed example that illus-
trates the structure of the representational network. Suppose the user
describes Items 157, 158, and 159 as THE PAPER ABOUT AUTONOTE FOR THE ACM 
CONFERENCE; he enters materials on the organization of that paper into
Items 201 and 202. A summary of the paper is placed in Item 230. The user 
also describes Item 270 as SMITH'S PAPER, and enters a summary of that paper 
into Item 312. A pictorial representation of the resultant portion of the 
network is given in Fig. 9, while the corresponding directory contents appear 
in Fig. 10. For simplicity, the simple phrase hash codes are represented by
the alphabetic characters U through Z. (In subsequent diagrams, we also omit
word-to-phrase linkages for simplicity.) 
Fig. 8 - Representational Network Directory Formats (figure not reproduced).
Panels: (a) Format for the word and phrase directories, in which each line
holds the internal form of the word or phrase, collision pointers used in
conjunction with the hash coding mechanism, a pointer to continuation lines,
and a series of fixed-length blocks in the format of (b); (b) Pointer block
format in the word and phrase directories; (c) Format for the node directory.
Recoverable field labels include preposition usage, preposition code, access
recency, number of upward links, upward link, number of downward links,
number of items, article codes, a preposition code for each downward pointer,
continuation line pointers, lists of upward pointers, item references, and
downward pointers, and space for future expansion. Figure notes: for single
words, there is one block for each phrase containing the word, and the
preposition code distinguishes words used as subjects from words used as
modifiers; for phrases, there is one block for each node that the phrase
directly describes.

Fig. 9 - A Complex Representation (figure not reproduced)

Fig. 10 - Corresponding Directory Contents (see Fig. 9)
(a) Word directory
    Word           No. Blocks   Blocks
    ORGANIZATION   1            U (sub)
    PAPER          5            U (mod), V (sub), W (sub), X (sub), Z (mod)
    CONFERENCE     2            W (mod), Y (sub)
    AUTONOTE       1            X (mod)
    ACM            1            Y (mod)
    SMITH (poss)   1            V (mod)
    SUMMARY        1            Z (sub)
(b) Phrase directory
    Line   Phrase               No. Blocks   Blocks
    U      paper/organization   1            Node 3 (of)
    V      smith/paper (poss)   1            Node 5 (adj)
    W      conference/paper     1            Node 1 (for)
    X      autonote/paper       1            Node 1 (about)
    Y      acm/conference       1            Node 2 (adj)
    Z      paper/summary        2            Node 4 (of), Node 6 (of)
(c) Node directory
    Line   Pointers Up   Pointers Down   Item References
    1      3, 4          2, W, X         #157, #158, #159
    2      1             Y               none
    3      none          1, U            #201, #202
    4      none          1, Z            #230
    5      6             V               #270
    6      none          5, Z            #312

Parsing of Descriptions
This section outlines our general approach to parsing topic descrip-
tions. The parsing of prepositional phrases, consecutive modifiers, and
possessive modifiers is considered.
Prepositional phrases. Despite the apparent simplicity of the descrip-
tion language there are several nontrivial parsing problems. One of these
is the difficulty in determining the noun referent of prepositional phrases.
The determination of noun referents is partially a semantic problem rather
than a purely syntactic one. Consider the following two descriptions:
[first example illegible in the source]
THE MEMO (FROM THE COMMITTEE) (TO THE CHAIRMAN).
In the first example, both prepositional phrases refer to the immedi-
ately preceding noun. In the second case, both refer back to the noun MEMO
at the beginning of the string. Although neither of these examples is in-
tuitively ambiguous, the parsing algorithm must consider each preceding noun
as a possible referent of any given prepositional phrase.
The AUTONOTE2 parser deals with this problem to a limited extent, by
utilizing prepositional clues. For example, if the system finds that the
noun MEMO can form a simple phrase with the prepositions ON, ABOUT, TO, and
FROM, then phrases introduced by these prepositions will be associated with
that noun. Such clues will not always yield a unique parsing, of course, as
in the case of inherently ambiguous descriptions. THE PAPER FOR THE CON-
FERENCE ON GENETICS, for example, could refer to a paper on genetics to be
delivered at a conference, or to a paper which is to be delivered at a con-
ference on genetics.
In such instances we rely upon the user to supply the referent noun
upon request. In the example above, the system may prompt: DOES "ON
GENETICS" REFER TO PAPER OR CONFERENCE? Should the user reply CONFERENCE,
the simple phrase CONFERENCE ON GENETICS will be added to the network.
If at some later time the parser is attempting to find a referent for the
prepositional phrase ON GENETICS where CONFERENCE is one of the alternatives,
it forms that simple phrase directly.
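The referent search for a prepositional phrase can be sketched as three steps: an existing simple phrase in the network, then the preposition usage recorded in the word directory, then a question to the user. The function and its parameters are hypothetical, and the `ask` callback stands in for the system's prompt.

```python
from typing import Callable, Dict, List, Set, Tuple

Known = Set[Tuple[str, str, str]]  # (noun, preposition, object) already in the network


def attach(nouns_before: List[str], prep: str, obj: str,
           usage: Dict[str, Set[str]], known: Known,
           ask: Callable[[List[str]], str]) -> str:
    """Pick the noun that the prepositional phrase PREP OBJ refers to."""
    # 1. A simple phrase already in the network settles the question directly.
    exact = [n for n in nouns_before if (n, prep, obj) in known]
    if len(exact) == 1:
        return exact[0]
    # 2. Otherwise use prepositional clues: nouns already seen with this preposition.
    clued = [n for n in nouns_before if prep in usage.get(n, set())]
    if len(clued) == 1:
        return clued[0]
    # 3. Residual ambiguity: ask the user to name the referent.
    return ask(exact or clued or list(nouns_before))
```

Assuming MEMO is recorded with ON, ABOUT, TO, and FROM and COMMITTEE with none of them, TO THE CHAIRMAN attaches to MEMO without a question, while ON GENETICS after PAPER and CONFERENCE triggers one until CONFERENCE ON GENETICS is in the network.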
Consecutive modifiers. A parallel problem arises in determining the 
noun referents for a string of consecutive modifiers. 
Descriptions 
containing at most a single adjective for any particular noun are parsed in
the obvious manner. 
A simple phrase is formed from each modifier and the 
noun following it. In the event a noun is preceded by two or more modifiers, 
the parser is confronted with a task similar to that of determining the 
referent of a prepositional phrase. 
The modifier occurring immediately be- 
fore the noun is first processed as above. 
Each of the remaining modifiers, 
however, can modify any one of several words depending upon their "distance" 
from the head noun. Specifically, any such modifier can refer to either the 
head noun or any of the other modifiers following it. 
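Consider the descriptions:
A SUMMARY OF PERSONAL INFORMATION RETRIEVAL SYSTEMS (PERSONAL modifying INFORMATION)
A SUMMARY OF PERSONAL INFORMATION RETRIEVAL SYSTEMS (PERSONAL modifying SYSTEMS)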
In both of the cases above, INFORMATION modifies the modifier RETRIEVAL which 
in turn modifies the head noun SYSTEMS. Depending upon the user's intent,
PERSONAL can modify either INFORMATION or SYSTEMS. The choice of modifier 
referents is an especially important problem when there are multiple parsings,
each resulting in a different semantic interpretation. For example, LARGE 
COMPUTER CONFERENCE could refer to a conference on large computers, or a
large conference on computers. Another important reason for our emphasis 
upon correctly identifying modifier referents concerns the use of para- 
phrasing. In the example, PERSONAL INFORMATION RETRIEVAL SYSTEMS, if we 
determine that INFORMATION modifies RETRIEVAL and PERSONAL modifies SYSTEMS, 
then the resultant topic can be paraphrased as (1) PERSONAL SYSTEMS FOR 
INFORMATION RETRIEVAL, or (2) PERSONAL SYSTEMS FOR THE RETRIEVAL OF INFORMA- 
TION. Depending upon context and the nature of other topics in the network, 
the following incomplete descriptions will in most cases identify the topic: 
1. SYSTEMS 
2. SYSTEMS FOR RETRIEVAL (or RETRIEVAL SYSTEMS) 
3. PERSONAL SYSTEMS
4. PERSONAL SYSTEMS FOR RETRIEVAL (or PERSONAL RETRIEVAL SYSTEMS)
5. SYSTEMS FOR INFORMATION RETRIEVAL
6. SYSTEMS FOR RETRIEVAL OF INFORMATION
A different choice of modifier referents determines a correspondingly dif-
ferent set of paraphrases. If PERSONAL was intended to modify INFORMATION,
we would have the paraphrase SYSTEMS FOR THE RETRIEVAL OF PERSONAL INFORMATION,
with a corresponding list of incomplete references to the topic. 
As in the prepositional case, the choice of modifier referents is guided 
by the current state of the representational network. After processing the 
last modifier in the string, the parser positions itself at the preceding 
modifier and moves left in the input string until the first word in the modi-
fier string is processed. In the above example, after associating RETRIEVAL
with SYSTEMS, the parser next examines the modifier INFORMATION. A list of 
simple phrase candidates is formed. In this case, the list contains INFORMA-
TION RETRIEVAL and INFORMATION SYSTEMS. If neither of the candidate phrases
has been previously used, the system queries: WHAT DOES INFORMATION MODIFY?
The user's reply is matched against the candidate referents and the appropriate
simple phrase is formed.
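A minimal sketch of the right-to-left procedure for consecutive modifiers. Here `known` stands in for the phrase directory as a set of (modifier, subject) pairs and `ask` for the WHAT DOES ... MODIFY? query; both names are hypothetical, and a modifier string containing the same word twice is not handled.

```python
from typing import Callable, Dict, List, Set, Tuple


def modifier_referents(words: List[str], known: Set[Tuple[str, str]],
                       ask: Callable[[str, List[str]], str]) -> Dict[str, str]:
    """Map each modifier in a MOD ... MOD NOUN string to the word it modifies."""
    *mods, head = words
    if not mods:
        return {}
    # The modifier immediately before the head noun always takes the noun.
    referent = {mods[-1]: head}
    # Then move leftward; each earlier modifier may take any word to its right.
    for i in range(len(mods) - 2, -1, -1):
        word = mods[i]
        candidates = mods[i + 1:] + [head]
        used = [c for c in candidates if (word, c) in known]
        referent[word] = used[0] if len(used) == 1 else ask(word, used or candidates)
    return referent
```

For PERSONAL INFORMATION RETRIEVAL SYSTEMS with INFORMATION RETRIEVAL already in the network, only PERSONAL needs a question, and the user's answer chooses between the two paraphrase sets described above.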
Possessive adjectives. Possessives are processed in much the same way as
normal modifiers. The system recognizes the 's suffix and marks the root
word as a possessive. The root word is later stored in the network directories
along with a possessive flag. Thus the phrase SMITH'S PAPER is stored intern-
ally as SMITH/PAPER (possessive). The removal of the suffix ensures that a sub-
sequent simple phrase incorporating a preposition (PAPER BY SMITH) will hash to
the same directory line, thus allowing the use of either prepositional or pos-
sessive forms in referencing topics.
A particularly interesting case arises when a possessive occurs in a string 
of consecutive modifiers as in SMITH'S LATEST MEMORY EXPERIMENT. The string is 
first processed as described above; that is, a check is made to see if SMITH 
has been used in a simple phrase with LATEST, MEMORY, or EXPERIMENT. In the 
event that this yields no clues, the system then checks to see if SMITH was 
rendered as a possessive. Upon noting that it was, the parser carries out a
heuristic that assumes that the possessive modifies the head noun, EXPERIMENT. 
The possessive heuristic can be fully stated as follows. A possessive 
occurring in a string of modifiers will be assumed to modify the head noun un-
less another possessive occurs between it and the head noun. In the latter
case, the first possessive will be assumed to modify the second. This is simi-
lar to the possessive feature employed by the REL parser (Dostert & Thompson,
1971).
Thus in SMITH'S RESEARCH GROUP'S MEMORY EXPERIMENT, SMITH'S is assumed 
to modify GROUP, and GROUP'S is assumed to modify the head noun EXPERIMENT. 
The question now arises, why check the phrase directory first instead of 
applying the possessive heuristic immediately? To answer this, suppose a topic 
was originally described as THE RESULTS OF THE MEMORY EXPERIMENT BY SMITH and
the user now attempts to refer to it as SMITH'S MEMORY EXPERIMENT RESULTS. If
the possessive heuristic were applied immediately, the system would incorrectly
form the simple phrase SMITH'S RESULTS, not SMITH'S EXPERIMENT. By checking
the network first, the simple phrase EXPERIMENT BY SMITH will be detected and
the system will parse the description appropriately.
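The order of checks for a possessive modifier, network first and heuristic second, can be sketched as below. The function name, the `known` set of (modifier, subject) pairs, and the boolean `possessive` list (one flag per word, as the preliminary scan would supply) are assumptions for illustration.

```python
from typing import List, Set, Tuple


def possessive_referent(words: List[str], i: int, possessive: List[bool],
                        known: Set[Tuple[str, str]]) -> str:
    """Referent of the possessive words[i] in a modifier string ending in the head noun.

    Because the 's is stripped, EXPERIMENT BY SMITH appears in `known`
    as ("SMITH", "EXPERIMENT").
    """
    following = words[i + 1:]
    # First consult the network, exactly as for ordinary modifiers.
    used = [w for w in following if (words[i], w) in known]
    if len(used) == 1:
        return used[0]
    # Possessive heuristic: the next possessive before the head noun, if any...
    for j in range(i + 1, len(words) - 1):
        if possessive[j]:
            return words[j]
    # ...otherwise the head noun itself.
    return words[-1]
```

Without the network check, SMITH'S MEMORY EXPERIMENT RESULTS would yield RESULTS; with EXPERIMENT BY SMITH on file, it yields EXPERIMENT.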
Implementation of the Parser 
The ultimate goal of the parser is to determine the simple phrases of a 
topic description. The parsing algorithm is implemented as a two stage pro- 
cess. The first stage is a preliminary scan to ascertain that the string is 
in a form acceptable for analysis. The description is segmented into an 
ordered list of words, each of which is marked as either WORD, POSSESSIVE, 
ARTICLE, or PREPOSITION. The parser makes no distinction between nouns and
modifiers until completing the scan. At this point, the last in a series of 
consecutive WORDS is marked as a NOUN; the preceding words are marked as MOD- 
IFIERS. Possessive modifiers are an exception as they can be recognized ex- 
plicitly during the scan. A record of article usage is also kept, but the 
articles themselves are not placed on the word list. 
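A minimal sketch of how such a word list might be built, assuming a small hard-coded article and preposition vocabulary (the `PREPOSITIONS` set is hypothetical; the system's actual lexicon is not described here) and assuming each recorded article is attached to the next noun, as in the "Noun (the)" entry of the BRUNER'S example that follows.

```python
ARTICLES = {"A", "AN", "THE"}
# Hypothetical, illustrative preposition list; not AUTONOTE2's lexicon.
PREPOSITIONS = {"ABOUT", "BY", "FOR", "IN", "OF", "ON", "TO", "WITH"}


def token_type(token):
    """First-pass marking: WORD, POSSESSIVE, ARTICLE or PREPOSITION."""
    if token in ARTICLES:
        return "ARTICLE"
    if token in PREPOSITIONS:
        return "PREPOSITION"
    if token.endswith("'S"):
        return "POSSESSIVE"
    return "WORD"


def word_list(description):
    """Return (word, type, article) triples; articles are recorded, not listed."""
    entries, article_before = [], {}
    for token in description.upper().split():
        kind = token_type(token)
        if kind == "ARTICLE":
            article_before[len(entries)] = token.lower()
        else:
            entries.append([token, kind])
    # After the scan: the last of a run of WORDs is the NOUN and the earlier
    # ones are MODIFIERs; possessives are marked as modifiers too.
    for i, entry in enumerate(entries):
        if entry[1] == "POSSESSIVE":
            entry[1] = "MODIFIER"
        elif entry[1] == "WORD":
            followed_by_word = i + 1 < len(entries) and entries[i + 1][1] == "WORD"
            entry[1] = "MODIFIER" if followed_by_word else "NOUN"
    # Attach each recorded article to the next noun on the list.
    result, pending = [], None
    for i, (word, kind) in enumerate(entries):
        pending = article_before.get(i, pending)
        if kind == "NOUN":
            result.append((word, kind, pending))
            pending = None
        else:
            result.append((word, kind, None))
    return result


for row in word_list("Bruner's first experiment on the conservation of liquids"):
    print(row)
```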
The preliminary scan of the description can be viewed as a simple finite-state
process. Of course, to be completely formal, the recognizer would have to
examine each input character. For convenience we will assume a five-state
automaton with inputs WORD, POSSESSIVE, PREPOSITION, and ARTICLE. The state
transition graph for the machine is given in Fig. 11. The machine starts in
state S0, examines the next input, and moves to a new state. If at the end of
the input string the machine is in state S1, called the final state, the input
is accepted; otherwise, the user is asked to rephrase. Note that in state S2
the machine has just encountered an article and is anticipating a "word."
If the machine is in state S0 upon completion, it has just recognized a
preposition and is expecting an object; thus, the string is rejected. State S4
is reached whenever a possessive is encountered. Since a possessive must have
an object noun, a "word" input is required to reach state S1. State S3 is a
trapping state; once entered, the machine remains in that state regardless of
the remaining input, and the description is consequently rejected. State S3
corresponds to various error conditions: two consecutive prepositions or
articles, an article between two words, a phrase beginning with a preposition,
etc.
The state transitions for the description BRUNER'S FIRST EXPERIMENT ON THE 
CONSERVATION OF LIQUIDS are given below along with the resultant word list. 
INPUT          TYPE          RESULTANT STATE
Bruner's       Possessive    S4
First          Word          S1
Experiment     Word          S1
On             Preposition   S0
The            Article       S2
Conservation   Word          S1
Of             Preposition   S0
Liquids        Word          S1 (Accept)

WORD           TYPE
Bruner's       Modifier
First          Modifier
Experiment     Noun
On             Preposition
Conservation   Noun (the)
Of             Preposition
Liquids        Noun
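The automaton can be sketched as a transition table. Fig. 11 is not reproduced here, so this table is our reconstruction from the prose and the BRUNER'S trace above: the error transitions named in the text (preposition at the start or after a preposition, an article after an article or between words) lead to the trap state S3, and transitions the text does not fix, such as a possessive directly after another possessive, are guesses.

```python
# S0: start, or just after a preposition; S1: final (just after a word);
# S2: just after an article; S3: trapping error state; S4: just after a
# possessive. Rows are our reconstruction, not a copy of Fig. 11.
TRANSITIONS = {
    "S0": {"WORD": "S1", "POSSESSIVE": "S4", "ARTICLE": "S2", "PREPOSITION": "S3"},
    "S1": {"WORD": "S1", "POSSESSIVE": "S4", "ARTICLE": "S3", "PREPOSITION": "S0"},
    "S2": {"WORD": "S1", "POSSESSIVE": "S4", "ARTICLE": "S3", "PREPOSITION": "S3"},
    "S4": {"WORD": "S1", "POSSESSIVE": "S4", "ARTICLE": "S3", "PREPOSITION": "S3"},
}


def scan(types):
    """Run the automaton; return (accepted, list of states entered)."""
    state, trace = "S0", []
    for t in types:
        # S3 has no row, so once it is entered it is never left.
        state = TRANSITIONS.get(state, {}).get(t, "S3")
        trace.append(state)
    return state == "S1", trace


example = ["POSSESSIVE", "WORD", "WORD", "PREPOSITION",
           "ARTICLE", "WORD", "PREPOSITION", "WORD"]
print(scan(example))  # accepted; the states match the RESULTANT STATE column
```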
Descriptions found acceptable by the scanner next undergo analysis by the
second-stage procedure. This algorithm steps through the word list and builds
a table of simple phrases called the phrase table. Each entry in the phrase
table includes (1) the internal character representation of the phrase for use
in hash coding, (2) a numerical code for the preposition used, (3) the hash
code (directory line number) for the phrase, (4) a list of nodes directly
described by the phrase, and (5) a coordinate or subordinate link to another
phrase table entry.
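A phrase table entry with the five fields listed above might be modelled as a small dataclass. This is a sketch under stated assumptions: the directory size, the preposition codes, the hash function, and the link targets in the sample table are all hypothetical, since the text does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DIRECTORY_LINES = 1009  # hypothetical size of the phrase directory
# Hypothetical numeric codes for prepositions; 0 means "no preposition".
PREPOSITION_CODES = {None: 0, "about": 1, "for": 2, "of": 3, "on": 4}


def directory_line(text: str) -> int:
    """Illustrative string hash; AUTONOTE2's actual hashing scheme is not given."""
    h = 0
    for ch in text:
        h = (h * 31 + ord(ch)) % DIRECTORY_LINES
    return h


@dataclass
class PhraseEntry:
    text: str                                       # (1) character form, "dependent/head"
    preposition: Optional[str] = None               # word behind field (2)
    nodes: List[int] = field(default_factory=list)  # (4) nodes directly described
    link: Optional[str] = None                      # (5) "coordinate" or "subordinate"
    link_to: Optional[int] = None                   # (5) index of the linked entry

    @property
    def preposition_code(self) -> int:              # (2)
        return PREPOSITION_CODES[self.preposition]

    @property
    def hash_code(self) -> int:                     # (3) directory line number
        return directory_line(self.text)


# A table shaped like Fig. 12; the link targets are chosen for illustration.
table = [
    PhraseEntry("autonote/paper", "about"),
    PhraseEntry("planned/paper", link="coordinate", link_to=0),
    PhraseEntry("conference/paper", "for", link="coordinate", link_to=0),
]
```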
We now illustrate the construction of the phrase table by following 
through several examples. 
The parser in operation. Let us assume that a user is running the system 
for the first time; consequently, the three network directories are initially 
empty. Item No. 1 is opened, some text is inserted, and the user describes it 
as THE PLANNED PAPER ABOUT AUTONOTE FOR THE CONFERENCE. The description
successfully passes the preliminary scan and the word list is constructed. The
parser then moves on to determine the simple phrases. 
The modifier PLANNED is first noted. Since it is followed immediately by 
a noun, the simple phrase PLANNED PAPER becomes the first entry in the phrase 
table. Next the prepositional phrase ABOUT AUTONOTE is encountered. Again 
there is only one possible noun referent. The phrase PAPER ABOUT AUTONOTE is 
entered into the table and marked as coordinate with the first entry. 
To determine the referent of FOR CONFERENCE, the system must consider two
alternatives: AUTONOTE FOR CONFERENCE and PAPER FOR CONFERENCE. The network is
interrogated to determine whether either of the candidate phrases has been
previously used. This test fails since the network is empty at this point. A
check is then made in the word directory to determine whether either AUTONOTE
or PAPER has headed a simple phrase with the preposition FOR. This also fails,
so the system asks the user: DOES "FOR CONFERENCE" REFER TO PAPER? A yes
response results in the addition of PAPER FOR CONFERENCE to the phrase table.
Since the noun referent, PAPER, is the same as in the previous phrase, the new
entry is marked as coordinate with PAPER ABOUT AUTONOTE. The completed phrase
table is given in Fig. 12.
(1) autonote/paper
(2) planned/paper
(3) conference/paper
Fig. 12 - Sample Phrase Table
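The three-step decision for a prepositional phrase's referent (network, then word directory, then the user) can be sketched as follows. The data layouts are assumptions: the network is reduced to a set of (head, preposition, object) triples, the word directory to a map from head nouns to the prepositions they have headed, and the user query to a yes/no callback. The order in which candidates are put to the user is also our assumption.

```python
def resolve_pp(heads, prep, obj, network, word_directory, ask):
    """Choose the noun that the prepositional phrase `prep obj` modifies.

    heads: candidate nouns to the left of the phrase, e.g. ["AUTONOTE", "PAPER"].
    network: set of (head, preposition, object) triples already in the network.
    word_directory: head noun -> set of prepositions it has previously headed.
    ask: yes/no callback standing in for the query to the user.
    """
    # 1. Has exactly one of the candidate phrases been used before?
    hits = [h for h in heads if (h, prep, obj) in network]
    if len(hits) == 1:
        return hits[0]
    # 2. Has exactly one candidate noun headed a phrase with this preposition?
    hits = [h for h in heads if prep in word_directory.get(h, set())]
    if len(hits) == 1:
        return hits[0]
    # 3. Residual ambiguity: query the user about each candidate in turn.
    for h in heads:
        if ask(f'DOES "{prep} {obj}" REFER TO {h}?'):
            return h
    return None


# Item No. 1: empty network, so the user settles the question.
print(resolve_pp(["AUTONOTE", "PAPER"], "FOR", "CONFERENCE", set(), {},
                 lambda q: q.endswith("PAPER?")))
# Item No. 2: PAPER FOR CONFERENCE is already in the network, so no query.
print(resolve_pp(["ORGANIZATION", "PAPER"], "FOR", "CONFERENCE",
                 {("PAPER", "FOR", "CONFERENCE")}, {}, lambda q: False))
```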
The phrase table is next passed to the network locator. We will assume it
determines that the user is defining a new topic. Using the syntactic
dependencies in the phrase table, the network locator assigns new node numbers to
the phrases in the description. In this case, all three phrases are
coordinate; each will directly describe node No. 1 in the network. In addition, a
reference to Item No. 1 is stored with the topic node (see Fig. 13).
The user next enters Text Item No. 2, describing it as THE ORGANIZATION OF
THE PAPER FOR THE ACM CONFERENCE. The system proceeds as before until it
encounters the prepositional phrase FOR THE CONFERENCE. It forms the two
alternatives PAPER FOR CONFERENCE and ORGANIZATION FOR CONFERENCE. Upon
interrogating the network, it finds that PAPER FOR CONFERENCE has been defined
previously and accepts that candidate. ACM CONFERENCE is added to the phrase
table and the parsing is complete (Fig. 14).
(1) paper/organization
(2) conference/paper
(3) acm/conference
Fig. 14 - Sample Phrase Table
The network locator must then determine whether the user is referring to the
same paper or a new one. The operation of the network locator will be
discussed in detail in the next section. Let us assume for now that the current
description is indeed a reference to the same paper. The simple phrase PAPER
FOR CONFERENCE is already in the network. The system must then decide what to
do with THE ORGANIZATION OF PAPER and with ACM CONFERENCE. Since the former
is superior to the node 1 phrase, it is assigned to node 2, and a downward
pointer from node 2 to node 1 is added. The phrase ACM CONFERENCE, on the
other hand, is subordinate to a node 1 phrase. Thus it is assigned a new node
number (node 3), and a pointer up from node 3 to node 1 is added. Phrase-to-node
and node-to-node pointers are two-way; thus corresponding pointers down
from node 1 to node 3 and up from node 1 to node 2 are also added. The
resultant network is illustrated in Fig. 15. This example points out an
interesting feature of the AUTONOTE2 system. Although Item No. 1 was originally
described as a paper for some unspecified conference, a subsequent reference to
that paper has enriched its description.
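The node assignments in this example can be sketched with a small network class that keeps every pointer in both directions. The class, its attribute names, and the "dependent/head" phrase spelling (borrowed from the phrase tables) are illustrative assumptions, not the system's storage layout.

```python
from collections import defaultdict


class Network:
    """Nodes, the simple phrases that directly describe them, and two-way
    pointers (hypothetical layout for illustration)."""

    def __init__(self):
        self.phrases = defaultdict(set)   # node -> phrases directly describing it
        self.nodes_of = defaultdict(set)  # phrase -> nodes it directly describes
        self.up = defaultdict(set)        # node -> higher-order nodes
        self.down = defaultdict(set)      # node -> lower-order nodes
        self._next = 1

    def new_node(self, *phrases):
        node, self._next = self._next, self._next + 1
        for p in phrases:                 # phrase-to-node pointers, both ways
            self.phrases[node].add(p)
            self.nodes_of[p].add(node)
        return node

    def link(self, upper, lower):
        """Add a downward pointer and its matching upward pointer."""
        self.down[upper].add(lower)
        self.up[lower].add(upper)


net = Network()
paper = net.new_node("planned/paper", "autonote/paper", "conference/paper")  # node 1
org = net.new_node("paper/organization")   # node 2: superior to a node 1 phrase
net.link(upper=org, lower=paper)
acm = net.new_node("acm/conference")       # node 3: subordinate to a node 1 phrase
net.link(upper=paper, lower=acm)
```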
V. THE NETWORK LOCATOR
The purpose of the network locator is to determine whether the user's
description makes reference to an existing topic in the representational network.
Its decision is based on the information in the phrase table and the current
state of the network. Once the decision is made, the locator builds a table,
called the links table, that specifies the changes to be made in the network
to represent the description.
In cases where the input description exactly matches some structure in
the network, the links table will specify only the addition of an item
reference. When the description defines a new topic, every phrase in the phrase
table will be assigned a new node and links entries will be made for the
proper node-node linkages.
Fig. 15 - Network after Augmentation 
In order to describe more precisely the operation of the network locator, 
let us assume the network has evolved to the state depicted in Fig. 16. Note 
that by starting at any node and tracing downward through the network, it is 
possible to reconstruct the description of the topic the node represents. The 
nodes in the network represent the following topics.
Node 1. THE PLANNED PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE.
Node 2. ORGANIZATION OF THE PLANNED PAPER ABOUT AUTONOTE FOR THE
ACM CONFERENCE.
Node 3. THE ACM CONFERENCE.
Node 4. AN ABSTRACT OF THE FIRST PAPER ABOUT AUTONOTE.
Node 5. THE FIRST PAPER ABOUT AUTONOTE.
Node 6. THE REVIEWER'S COMMENTS ON THE PLANNED PAPER ...
Node 7. THE PROCEEDINGS OF THE ACM CONFERENCE.
Node 8. TRAVEL ARRANGEMENTS FOR THE ACM CONFERENCE.
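Reconstructing a topic by tracing downward can be sketched as a traversal that gathers the simple phrases of a node and of every node below it. The encoding of Fig. 16 here is hypothetical: since the figure is not reproduced, the phrase assignments and downward links are inferred from the eight node descriptions above.

```python
# Hypothetical encoding of the Fig. 16 cluster, inferred from the node list.
PHRASES = {
    1: {"planned/paper", "autonote/paper", "conference/paper"},
    2: {"paper/organization"},
    3: {"acm/conference"},
    4: {"paper/abstract"},
    5: {"first/paper", "autonote/paper"},
    6: {"reviewer's/comments", "paper/comments"},
    7: {"conference/proceedings"},
    8: {"travel/arrangements", "conference/arrangements"},
}
DOWN = {1: {3}, 2: {1}, 4: {5}, 6: {1}, 7: {3}, 8: {3}}  # node -> lower nodes


def topic_phrases(node):
    """Collect every simple phrase reached by tracing downward from node."""
    collected, stack, seen = set(), [node], set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        collected |= PHRASES[n]
        stack.extend(DOWN.get(n, ()))
    return collected


print(sorted(topic_phrases(2)))  # the phrases behind the Node 2 description
```

Turning the collected phrases back into English word order is a separate step, not attempted here.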
To illustrate the network location procedures, we will now go through several
subsequent references to topics already defined in the representation.
Before passing the phrase table to the locator, the parser first checks to 
see if the description contains any active phrases, simple phrases that direc- 
tly describe one or more nodes in the network. When the locator gets control, 
it checks an internal flag that indicates one of three conditions: 
the des- 
cription contains one or more active phrases; the description contains no 
active phrases; or the description contains only a single word. 
As our first example, consider the subsequent item description: THE PAPER
ABOUT AUTONOTE FOR THE CONFERENCE. The phrase table is given in Fig. 17. The
locator notes that there are two active phrases and focuses its attention on
the first of these, PAPER ABOUT AUTONOTE. From information in the phrase table,
it sees that PAPER ABOUT AUTONOTE plays a role in two distinct topics,
represented by nodes 1 and 5. It then considers both of these alternatives,
checking to see whether the remaining phrases in the phrase table either directly
or indirectly describe either of the two nodes. Since both phrases directly
describe node 1, the locator assumes that it is the topic node of user reference.
As an option, the user may request the locator to display its assumptions, in
which case the system replies: I ASSUME YOU MEAN THE PLANNED PAPER ABOUT
AUTONOTE FOR THE ACM CONFERENCE. Other than the addition of an item reference to
node 1, no network changes are made in this example. Note that the user has
efficiently made reference to the desired topic, relying on the system to fill
in the gaps in his description. The system would proceed in much the same way
in processing shorthand descriptions such as SUMMARY OF THE PAPER or REVIEWER'S
COMMENTS ON THE PAPER, in each case assuming that the user is referring to the
same paper about AUTONOTE.

[Figure not reproduced; surviving node labels: ORGANIZATION, PROCEEDINGS, ARRANGEMENTS, PAPER about, CONFERENCE]
Fig. 16 - A Cluster of Related Topics

Phrase                 Article   Preposition   Links    Dependency
(1) autonote/paper     the       about         Node 1   root
(2) conference/paper   the       for           Node 1   co-ord 1
Fig. 17 - Sample Phrase Table

In previous discussion we have alluded to the use of contextual clues in
deciding among the alternative referents of a vague or ambiguous description.
Context in the AUTONOTE2 system takes the form of an access recency number
(context number). Each time the user refers to some topic in the network,
each of the component nodes is assigned the current context number. The
current context number is incremented at the beginning of each AUTONOTE2 session
and each time the user defines a new topic. Thus, when deciding among
alternative topic nodes, the system can readily determine which was referred to
most recently.
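The context-number bookkeeping can be sketched as a small class. The class and method names are hypothetical, and we assume the counter is incremented before a new topic's nodes are stamped; the text states only that both events increment it.

```python
class Context:
    """Access-recency (context number) bookkeeping, as a minimal sketch."""

    def __init__(self):
        self.current = 0
        self.last_used = {}   # node -> context number at its last reference

    def start_session(self):
        self.current += 1

    def define_topic(self, nodes):
        # Assumption: increment first, then stamp the new topic's nodes.
        self.current += 1
        self.refer(nodes)

    def refer(self, nodes):
        # Every component node of the referenced topic gets the current number.
        for n in nodes:
            self.last_used[n] = self.current

    def most_recent(self, candidates):
        """Return the unique most recently referenced candidate, else None."""
        ranked = sorted(candidates, key=lambda n: self.last_used.get(n, -1),
                        reverse=True)
        if len(ranked) > 1 and (self.last_used.get(ranked[0], -1)
                                == self.last_used.get(ranked[1], -1)):
            return None
        return ranked[0] if ranked else None


ctx = Context()
ctx.start_session()
ctx.define_topic([1, 3])
ctx.define_topic([5])
print(ctx.most_recent([1, 5]))  # node 5 was touched most recently
```

Because merely referring to a topic does not advance the counter, two topics touched under the same context number tie, and `most_recent` then returns None so the caller can fall back to asking the user.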
Another interesting class of cases comprises descriptions that consist of a
single noun. If the current description is THE PAPER, for example, the system
would use the word directory to locate those simple phrases in which PAPER is
the subject noun. Using the resultant list of simple phrases, a list of nodes
directly described by these phrases is generated. In this case, the process
generates two alternatives (node 5 and node 1). The system then functions as
before, either choosing a node in context or interrogating the user.
The foregoing discussion has described our approach to network location. 
We now give a more detailed presentation of the algorithm. 
Case I: Active phrases in the description. Should the user's description
contain one or more active phrases, there is a good possibility that it
references an existing network topic. The first step in processing such a
description is to determine the focus phrase, the active phrase at the highest
dependency level. Note that the focus phrase may be subordinate to other
(non-active) phrases in the description. The basic idea is to use the nodes
directly described by the focus phrase to get a set of candidate topics. Once
these candidates are determined, they are matched against the remaining active
phrases in the phrase table to determine the most likely referent.
Before describing the matching process, let us first consider a few
special cases. Suppose, for instance, that the focus phrase directly describes
only one topic node and that any additional active phrases are also
present in that topic representation. The presence (or absence) of non-active
phrases in the description is, in this case, an important parameter.
Any non-active phrases may serve to distinguish the description from the
existing topic. On the other hand, they could very well represent additional
description of the topic at hand. If the topic under consideration is recent, we
first assume the latter case. In addition, when processing descriptions
rendered for retrieval, the network locator naturally rules out the possibility
that a new topic is being described and accepts the one at hand.
The matching process. When the focus phrase directly describes two or
more nodes, a network matching procedure is used to determine which of the
associated topics the description references. The matching routine uses a
list of the candidate nodes, a list of the active phrases in the description,
and the current contents of the representational network. For each candidate
node, the routine determines how many of the description's active phrases
directly or indirectly describe that node. The matching routine returns a
table of this information along with the number of the node, if any, that best
matches the input description. The "best" node is the one that has the
highest number of matching phrases. If two or more nodes have an equal number of
matching phrases, an attempt is made to choose one of them on the basis of
context.
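A minimal sketch of the matching routine, assuming the network is given as plain dictionaries (node to phrases, node to lower nodes) and that context is supplied as a node-to-context-number map. "Indirectly describes" is taken to mean "describes some node reachable by following downward pointers", which is our reading of the text.

```python
def reachable(node, down):
    """All nodes at or below node, following downward pointers."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(down.get(n, ()))
    return seen


def match(candidates, active_phrases, phrases, down, recency=None):
    """Score candidates by matching active phrases; return (scores, best)."""
    scores = {}
    for node in candidates:
        below = reachable(node, down)
        scores[node] = sum(1 for p in active_phrases
                           if any(p in phrases.get(n, ()) for n in below))
    top = max(scores.values(), default=0)
    best = [n for n in candidates if scores[n] == top]
    # Tie: fall back on context (highest context number wins, if unique).
    if len(best) > 1 and recency:
        newest = max(recency.get(n, -1) for n in best)
        best = [n for n in best if recency.get(n, -1) == newest]
    return scores, (best[0] if len(best) == 1 else None)


PHRASES = {1: {"planned/paper", "autonote/paper", "conference/paper"},
           3: {"acm/conference"},
           5: {"first/paper", "autonote/paper"}}
DOWN = {1: {3}}
print(match([1, 5], ["autonote/paper", "conference/paper"], PHRASES, DOWN))
```

For THE PAPER ABOUT AUTONOTE FOR THE CONFERENCE, node 1 scores 2 and node 5 scores 1, so node 1 is returned, as in the walk-through earlier in this section.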
The simplified flow diagram appearing in Fig. 18 summarizes the decision
procedure for the case in which the description contains one or more active
phrases. Note in particular that if the description consists of only a
single active phrase that directly describes a single node, the network locator
assumes that node immediately. The rationale is that we anticipate the
user will frequently make use of such terse descriptions in reference to
previously defined topics. Recalling our earlier discussion of human referential
communication, a speaker makes incomplete references with the assumption that
the topic can be inferred by the listener; otherwise, he describes his subject
more precisely to avoid being misunderstood. The network locator was designed
with this in mind. That is, whenever a terse description references a single
node directly, or in context, that node is taken as the referent.

[Flow diagram not reproduced. Legend:
#up: the number of upward pointers to nodes from the focus phrase
ilevel: a user-specified interaction level
#nonact: the number of nonactive phrases in the description
findmode: a binary flag, "on" for retrieval descriptions]
Fig. 18 - Network Location Procedure for Descriptions Containing Active Phrases
Case II: No active phrases. The first step in processing a description
with no active phrases is to examine its component words, attempting to
identify possible referents by utilizing the word and phrase directories. If no
candidate nodes are generated by the procedure, the network locator assumes
a new topic is being defined and allocates new nodes in the network.
Non-active descriptions that reference existing topics fall into two
categories. First, there are those that paraphrase some existing topic
description. For example, a topic originally described as THE USE OF THESAURI IN THE
SMART SYSTEM may subsequently be referred to as THESAURI TECHNIQUES IN SALTON'S
SYSTEM. Second, the description may constitute a more specific classification
of some topic. While working in the context of a particular paper, for
example, the user may describe a new item as THE ORGANIZATION OF THE PAPER, where
ORGANIZATION OF PAPER is a non-active phrase.
The word search procedure involves the use of selected words from the
description and the word directory to obtain a set of active phrases containing
those words. From there, a set of nodes is obtained by collecting the upward
pointers from those phrases in the phrase directory. The resultant set of
nodes is then processed as a list of candidate topics, just as if they had been
obtained immediately from a description containing active phrases.
An important consideration here is which of the words in the description
to use in the search for candidate topics. In an early version of the system,
we tried using each noun and modifier in turn. Although this approach was
successful in many cases, it often resulted in an extremely lengthy list of
alternatives. We also noted that words at the highest dependency level more often
led to identification of the topic node than words occurring at subordinate
levels. For this reason, it was decided to curtail the word search, using
only the words in the "root phrase" of the description. For the non-active
description PAPER ON CLUSTERING IN THE SMART SYSTEM, the words PAPER and
CLUSTERING would be used in the search for candidate nodes. As a user option, the
system will expand the search to include the remaining words in the description.
There are four stages in the search for candidate topics (see below). 
Stages 1 and 2 deal with the subject word of the root phrase; Stages 3 and 4 
with the modifier word. In Stages 1 and 3, only those nodes directly described
by phrases having the particular word in subject position are considered. In 
Stages 2 and 4, topic nodes with the word in modifier position are considered. 
Stage   Role of word in the description   Role of word in the network
1       Subject                           Subject
2       Subject                           Modifier
3       Modifier                          Subject
4       Modifier                          Modifier
After completion of each stage, if there were any nodes generated they 
are passed through the recency check. If no node is distinguished, the user 
is presented with a list of the alternatives. The user may then choose one of 
the topics or reply that none is the intended referent, in which case the next 
stage is tried. 
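The four-stage search can be sketched as a loop over (stage, word, role) triples. Assumptions: the word directory maps a word to (phrase, role) pairs with role "subject" or "modifier", the phrase directory maps a phrase to the nodes it directly describes, and a single `choose` callback stands in for the recency check followed by the user query. All names are hypothetical.

```python
def word_search(root_subject, root_modifier, word_dir, phrase_dir, choose):
    """Return (stage, node) for the first accepted candidate, else (None, None).

    word_dir: word -> list of (phrase, role), role "subject" or "modifier".
    phrase_dir: phrase -> set of nodes it directly describes.
    choose: callback given (stage, sorted candidate nodes); returns a node,
            or None when the user rejects every alternative.
    """
    stages = [(1, root_subject, "subject"), (2, root_subject, "modifier"),
              (3, root_modifier, "subject"), (4, root_modifier, "modifier")]
    for stage, word, role in stages:
        if word is None:
            continue
        nodes = set()
        for phrase, r in word_dir.get(word, ()):
            if r == role:
                nodes |= phrase_dir.get(phrase, set())
        if nodes:
            picked = choose(stage, sorted(nodes))
            if picked is not None:
                return stage, picked   # the stage determines the links table
    return None, None


WORD_DIR = {"paper": [("autonote/paper", "subject"), ("planned/paper", "subject"),
                      ("paper/organization", "modifier")],
            "autonote": [("autonote/paper", "modifier")]}
PHRASE_DIR = {"autonote/paper": {1, 5}, "planned/paper": {1},
              "paper/organization": {2}}
# Root phrase ORGANIZATION OF PAPER: subject ORGANIZATION, modifier PAPER.
print(word_search("organization", "paper", WORD_DIR, PHRASE_DIR,
                  lambda stage, nodes: nodes[0]))
```

Here nothing is found for ORGANIZATION, so identification happens in Stage 3, where PAPER occurs as a subject in the network.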
If a node eventually is identified by this process, the locator must note
the stage it is in, since each case implies a distinct links table. Fig. 19
gives an example of each case. The state of the network before processing the
description is illustrated by solid lines; the network additions by dashed
lines. Note that in the Stage 2 example, the user originally described a new
topic simply as ORGANIZATION OF THE PAPER and then gave a more complete
description of the paper. The description of the paper was also left pending in
the Stage 4 example. In fact, Stages 2 and 4 can only occur in this situation,
since the presence of a simple phrase with PAPER as the subject noun would have
been picked up earlier in either Stage 1 or Stage 3.
The stage in which topic identification is made is also important when
processing retrieval descriptions. Recall that a principal advantage of the
referential system is that it enriches its representation of the user's topics
during retrieval. Whenever a retrieval description contains simple phrases not
already present in the representation of the identified topic, they are added
to the representation. However, if topic identification occurs in Stage 2,
note that a simple phrase will be added at one level higher than the decided
topic. When a retrieval description is being processed, such an addition would
be meaningless, as it would not enrich the description of the identified node.
For example, if the user has previously described a paper and later calls for
the retrieval of MATERIAL FOR THE PAPER, the addition of a higher-order node
pointing down to the "paper" node is of no value in later referencing the topic.
In such cases, the network locator returns the located node to the retrieval
processor and suppresses the network addition. To state this more generally,
retrieval descriptions are employed to augment the representation only when the
additional phrases constitute co-ordinate or subordinate description of the
located topic.

[Figure not reproduced. Panels:
(a) Stage 1. User input: The paper for the ACM conference.
(b) Stage 2. User input: The paper about Autonote for the conference.
(c) Stage 3. User input: The organization of the paper.
(d) Stage 4. User input: Summary of the paper about Autonote for the conference.]
Fig. 19 - Network Alterations Arising from a Successful Word Search in Each of the Four Stages
Although we have found the word searching procedure quite effective, its
success ultimately depends upon a co-occurrence of some word in both the
description and the representational network. A proposed extension to AUTONOTE2,
as described in Linn (1972), would augment this procedure to include word stem
and synonym processing.
The major objection to this procedure is that as the network grows larger
it generates too many candidate nodes and consequently more queries to the user.
To alleviate this problem, we allow the user to cancel processing of the
description any time he decides the system is having difficulty relating his
description to the current representation. In addition, if the user is unsure
how he previously described a particular topic, a facility is provided that
allows him to obtain topic descriptions from a specified region of the network.
Upon noting the topic he originally intended, he may then give a more precise
description.
Case III: One-word descriptions. Descriptions consisting of a single
noun are processed in much the same manner as non-active descriptions. The
noun is treated as if it resulted from a deletion on a simple phrase. The word
directory is first searched for simple phrases in which the word appears as the
subject and, if necessary, the modifier. The nodes obtained from the phrase
directory are then processed as described earlier.
VI. NETWORK MEDIATED RETRIEVAL 
The previous sections dealt primarily with the process of item descrip- 
tion, that is, the process of constructing a representation from descriptions 
of the user's textual materials. This section discusses the AUTONOTE2 proced- 
ures that retrieve information through the representational network. 
Retrieval via Descriptions 
Many of the procedures described earlier for item description and repre- 
sentation are used in retrieval. The user initiates retrieval by giving a FIND 
command, supplying a description as argument. Retrieval descriptions are first 
passed to the parser, and are therefore subject to exactly the same constraints 
as item descriptions. If the description is acceptable, the resultant phrase 
table is passed along to the network locator, which ultimately returns a node
number to the FIND processor. 
The FIND processor constructs a set of item numbers by extracting the
textual references from the node returned by the network locator. The system
then checks for upward pointers from the node to more specifically described
materials. If there are structurally related topics, the FIND processor so
informs the user and asks if he would like to explore further. If so, the
user is presented with descriptions of the higher-order alternatives. Using
the network depicted in Fig. 16, for example, consider the retrieval request
FIND THE PLANNED PAPER ABOUT AUTONOTE. The network locator would determine
that node 1 is the desired referent and return that fact to the FIND processor.
After storing away the item references of node 1, the system would ask:
DO YOU WANT:
A. THE ORGANIZATION OF THE PAPER
B. THE REVIEWER'S COMMENTS ON THE PAPER
The user may respond with an appropriate letter indicating which topic he
desires. If the topic selected also has higher-order nodes, the process is
repeated until the user terminates the search.
If the node returned by the network locator has no associated item refer- 
ences, the system searches upward in the network for a node with text item 
pointers. If a node is reached with multiple upward paths, the system stops 
and queries the user. For example, if a user has entered only an outline and 
some bibliographic references for a paper he is writing, then a retrieval des- 
cription that maps onto the "paper" node would elicit a query such as: 
DO YOU WANT: 
A. THE OUTLINE OF THE PAPER 
B. BIBLIOGRAPHIC REFERENCES FOR THE PAPER
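The upward search for text can be sketched as a loop that climbs while the current node has no item references and asks the user only at branch points. The dictionaries and the `ask` callback are hypothetical stand-ins for the node directory and the DO YOU WANT query, and the network is assumed to have no cycles.

```python
def find_items(node, items, up, ask):
    """Climb from node until reaching a node that carries item references.

    items: node -> set of item numbers; up: node -> set of higher-order nodes.
    ask: callback that picks one node from a list of alternatives, or None.
    """
    while not items.get(node):
        parents = sorted(up.get(node, ()))
        if not parents:
            return set()              # nothing above this node carries text
        # A single upward path is followed silently; several trigger a query.
        node = parents[0] if len(parents) == 1 else ask(parents)
        if node is None:
            return set()
    return set(items[node])


# "Paper" node 1 has no text; node 10 is its outline, node 11 its references.
ITEMS = {10: {3}, 11: {4, 5}, 20: {7}}
UP = {1: {10, 11}, 2: {1}, 30: {20}}
print(find_items(1, ITEMS, UP, lambda alternatives: alternatives[1]))
```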
This example illustrates a distinct advantage of the referential system
over simple keyword indexing. When the user's description is imprecise,
AUTONOTE2 directs the user to related topic nodes with associated textual materials.
Upon termination of the search, the resultant set of textual references is
stored internally. Depending upon the user's option settings, a reference
count and the set of item numbers may then be displayed on the user's console.
The user may PRINT those particular items he wishes to see, or he may simply
RETRIEVE the entire set.
In dealing with groups of related items, network mediated retrieval has
three major advantages over simple keyword-based techniques. First, the user
need only make his descriptions more specific in order to "zero in" upon
correspondingly specific textual materials. Second, the representational network
enables the system to use the user's original description as a starting point
in guiding him to structurally related topic nodes. Finally, the possibility
of network exploration can help the user recall the structure of the materials
represented in some portion of the network. This can be quite valuable after
the user has spent an extended period working with other topics, or as the
number of topics and their interconnections becomes large.
After processing a retrieval request, the system determines whether the user's
description contained any prepositional phrases or adjectives not already
present in the identified topic's representation. If so, the topic
description is enriched accordingly. For example, if the representation of the
located node is THE PAPER FOR THE ACM CONFERENCE, and the user referred to it
by the retrieval description THE PAPER FOR THE FALL CONFERENCE, the system
will augment its representation to include the simple phrase FALL CONFERENCE.
This is an important aspect of AUTONOTE2. Whether descriptions are employed
for the purpose of characterizing text items or retrieving them, the system
continually updates its representation of the user's topics. In addition,
this example illustrates how the system is able to establish a limited form of
phrase synonymy. There will subsequently be a node in the network directly
described by both ACM CONFERENCE and FALL CONFERENCE, and any topic associated
with that node may later be referenced using either or both of the two simple
phrases.
Interrogating the Network
As the network grows complex, the user must be able to question the sys- 
tem about the current representation. 
This capability may help him recall the 
structure of some set of related topics. Or, prior to formulating a new topic 
description, the user may wish to examine the representational network for 
possible related topics. Finally, periodic perusal of the network may 
strengthen the user's own conceptual representation of the various topics and 
their interrelationships. 
The DESCRIBE command retrieves topic descriptions from the representational
network. It accepts a variety of arguments and first generates a set
of topic nodes. Then, using the SPEAKER routine, it outputs a description of
each node in the set. The various input forms include the following.
DESCRIBE ITEM <list>. Each time the description processor adds a textual
reference to a node, the node number is placed in a predetermined location in
the text file region of the item. The DESCRIBE processor consequently has
access to the desired set of associated node numbers. For any particular text
item, the user may wish to know which topics it currently is associated with.
Initially, when an item is first described by the user, the actual description
line is placed in the data base beneath the text. To recall how he originally
described an item, the user need only request that the item be printed
(omitting the text if he chooses). But the original description may have been only
a terse reference, in context, to a more fully described node. Furthermore, the
description of that node may have been enriched or altered subsequent to the
entry of the item. To obtain a full description of each topic presently
associated with the item, regardless of how the item originally was described,
the user employs DESCRIBE ITEM.
DESCRIBE CURRENT [TOPIC]. A pointer to the node most recently referenced
in the representation is maintained in the node directory. In response to
this command, the DESCRIBE routine simply determines the node number and
displays its description. The current node number is saved between AUTONOTE2
sessions; this command is often employed at the beginning of a session to
remind the user of the previous working context.
DESCRIBE TOPICS. This command causes every node in the network having
associated item references to be described. Because of the voluminous output,
it is most frequently employed in batch mode.
DESCRIBE <description>. When the DESCRIBE routine encounters an argument
that is not in one of the special forms discussed above, it treats the input as
a phrasal description. Using the parser and network locator, an attempt is
made to map the input into a unique topic node. If successful, a complete
description of the node is presented to the user. Thus, if the user cannot
recall precisely how he described some topic, he may supply an incomplete
reference to obtain the topic description in full.
The network locator functions somewhat differently when processing a
description for the DESCRIBE command. If it is unable to discern a unique node
using the matching procedure and context, a list of the alternatives is
returned for subsequent display.
The FULLY modifier. The user may request the display of a host of related
topics by employing the FULLY modifier. Specifically, the user types DESCRIBE
FULLY, followed by any of the argument forms discussed above. As before, this
generates a node or set of nodes. When describing FULLY, each node is in turn
expanded into a set of structurally related nodes also having associated
textual references.
As an example, consider again the network in Fig. 16. The user types DES- 
CRIBE 'FULLY, THE PAPER ABOUT AUTONOTE. Assuming no choice is possible in con- 
text, the description is ambiguous, and the network locator returns nodes 1 
and 5. 
The two nodes themare passed to a routine that displays an indented out- 
line representing the structurally related topics reached by moving upward in 
the network. Each level of indentation represents a node level traversed in 
the network. In this example the following outline would be printed: 
A. THE PLANNED PAPER FOR THE ACM CONFERENCE
    THE ORGANIZATION OF THE PAPER
    THE REVIEWER'S COMMENTS ON THE PAPER
B. THE FIRST PAPER ABOUT AUTONOTE
    THE ABSTRACT OF THE PAPER
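A minimal Python sketch of the upward expansion behind DESCRIBE FULLY follows. It assumes a hypothetical in-memory layout in which each node number maps to its description text, its upward pointers, and its item references. Apart from nodes 1 and 5, the node numbers are invented, and the choice to traverse (but not print) higher-order nodes that lack item references is our assumption rather than something the text specifies.

```python
import string


def describe_fully(net, roots):
    """Return an indented outline of the topics reached by moving upward from each root.

    Hypothetical layout: net[n] = {"text": str, "up": [node numbers], "items": [item numbers]}.
    """
    lines = []

    def walk(node, depth):
        for parent in net[node]["up"]:
            if net[parent]["items"]:
                # Topics with text references are shown, one indent per level traversed.
                lines.append("    " * depth + net[parent]["text"])
                walk(parent, depth + 1)
            else:
                # Assumption: reference-free nodes are traversed but not displayed.
                walk(parent, depth)

    for letter, root in zip(string.ascii_uppercase, roots):
        lines.append(f"{letter}. {net[root]['text']}")
        walk(root, 1)
    return lines


# Hypothetical network fragment modelled on the outline above.
net = {
    1: {"text": "THE PLANNED PAPER FOR THE ACM CONFERENCE", "up": [2, 3], "items": [40]},
    2: {"text": "THE ORGANIZATION OF THE PAPER", "up": [], "items": [41]},
    3: {"text": "THE REVIEWER'S COMMENTS ON THE PAPER", "up": [], "items": [42]},
    5: {"text": "THE FIRST PAPER ABOUT AUTONOTE", "up": [6], "items": [43]},
    6: {"text": "THE ABSTRACT OF THE PAPER", "up": [], "items": [44]},
}

print("\n".join(describe_fully(net, [1, 5])))
```

Each recursive call adds one indent only when a displayed topic is passed, so the indentation counts node levels as the outline above does.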
DESCRIBE STRUCTURES. This command functions as if FULLY were specified,
displaying outlines of each topic cluster in the representational network. To
accomplish this, the network is searched for nodes having no downward pointers
to other nodes. Each such node corresponds to the lowest order node level in 
a particular cluster of related topics. When described FULLY, the effect is 
to reveal the structural outline of its associated cluster. 
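Finding the starting points for DESCRIBE STRUCTURES amounts to a scan for nodes with no downward pointers. A small sketch over a hypothetical dictionary of downward pointers; each node returned would then be described FULLY.

```python
def structure_roots(net):
    """Nodes with no downward pointers: the lowest-order nodes of the topic clusters."""
    return sorted(n for n, node in net.items() if not node["down"])


# Hypothetical network with two clusters, {1, 2, 3} and {4, 5}.
net = {
    1: {"down": []},      # lowest-order node of the first cluster
    2: {"down": [1]},
    3: {"down": [1, 2]},
    4: {"down": []},      # lowest-order node of the second cluster
    5: {"down": [4]},
}
print(structure_roots(net))  # [1, 4]
```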
The SPEAKER Component
As we have seen, SPEAKER is invoked during many phases of AUTONOTE2's 
operation. The calling routine passes the SPEAKER a node number, and a buffer
containing a phrasal description of the node is returned. A second, optional
input parameter specifies the level of detail desired in the resultant
description. The level indicator corresponds to the number of node levels in
the representation to be employed in formulating the description.
The level indicator is particularly useful when the system must question 
the intent of a description. When querying the user during the network loca- 
tion process, for example, the system requests topic descriptions from the 
SPEAKER with the level indicator set according to the user's current preferred 
level of detail, as inferred from his most recent description. For example, if
the user describes an item as RESULTS OF THE EXPERIMENT and the system must
ask if he is referring to SMITH'S EXPERIMENT ON THE SHORT TERM MEMORY OF WHITE
RATS, the resulting query would be ARE YOU REFERRING TO SMITH'S EXPERIMENT ON
MEMORY?
The process of constructing a description from the network takes place in 
two stages. The first stage steps through the network recursively, collecting 
the simple phrases that directly or indirectly describe the specified node. 
The level indicator, if applicable, blocks the collection of simple phrases 
below the specified level. During this stage, the SPEAKER constructs two 
tables of words, one for subject nouns and another for modifiers. Each entry 
in the subjects table is linked to a list of adjectives for that subject, and 
a list (called the modification chain) of prepositional modifications of the 
subject noun. For example, the subjects table entry for PAPER may have an
adjective list containing PLANNED, and a modification chain consisting of
(ABOUT) AUTONOTE and (FOR) CONFERENCE. Both of the lists are chained through
the table of modifiers. Note that some words will appear in both the subjects
and modifier tables. For example, PAPER may be in the modifier table as part
of the modification chain of the word ORGANIZATION, and also in the subjects
table with a modification chain of its own.
The subjects table also maintains article usage information for each of 
its entries. Fig. 20 illustrates the subject and modifier tables constructed 
from a typical topic node.
(a) Subjects table

    Subject Noun   Article   Adjective Chain   Modification Chain   Level
    organization   the       ...               (2)                  1
    paper          the       (1)               (3)                  3
    conference     the       (4)               ...                  3

(b) Modifier table

         Modifier Word   Preposition   Chain Link
    (1)  planned         adj           ...
    (2)  paper           of            ...
    (3)  autonote        about         (5)
    (4)  acm             adj           ...
    (5)  conference      for           ...

Fig. 20 - SPEAKER tables generated from the network representation of
ORGANIZATION OF THE PLANNED PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE.

The second stage is carried out by a recursive algorithm that operates on
the two tables to construct the phrasal description. The process begins with
the first word in the subjects table, in this case ORGANIZATION. If an article
applies, it is added to the description buffer. Next, the adjective chain is
traversed, adding each adjective in turn to the buffer. In this case there are
no adjectives, so the current subject word (ORGANIZATION) is added to the
buffer and the system continues with the modification chain. This leads to the
second entry in the modifier table, (OF) PAPER. The preposition is then added
to the buffer, yielding THE ORGANIZATION OF. Next, a check is made in the
subjects table to determine whether the current modifier word (PAPER) is
further described. Since there is an entry for PAPER, the current position in
the modification chain for ORGANIZATION is placed on a push down stack (the
goal stack) and the algorithm recurses on the word PAPER. After adding the
article, the adjective (PLANNED), and the subject word (PAPER), the description
buffer contains THE ORGANIZATION OF THE PLANNED PAPER. The system now begins
processing the modification chain of PAPER. The first piece of the chain adds
ABOUT AUTONOTE to the buffer. Note that there is no recursion on AUTONOTE
because that word does not have a subjects table entry. The pointer to the next
piece of the modification chain, (FOR) CONFERENCE, is then picked up from the
link field of the AUTONOTE entry. After adding the preposition (FOR), the
algorithm recurses on CONFERENCE, adding THE, ACM, and CONFERENCE in turn to the
buffer. The goal stack is then popped in search of remaining modification chain
pointers.
The first "pop" restores the PAPER modification chain. Since there is
no additional modification of the paper, the goal stack is popped again to
restore the ORGANIZATION chain. We are at the end of this chain also, and
thus the process terminates with the description buffer reading: THE
ORGANIZATION OF THE PLANNED PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE.
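The second stage can be sketched in Python using the Fig. 20 tables. This is a minimal sketch under stated assumptions, not the SPEAKER itself: both tables are modelled as dictionaries, the value "adj" in the preposition field marks an adjective entry, `None` stands for an empty chain or link field, and the Level column (used only in the first stage) is omitted. As in the walkthrough, the goal stack holds suspended positions in modification chains.

```python
# Fig. 20 tables. Subjects table: noun -> (article, head of adjective chain, head of modification chain).
SUBJECTS = {
    "ORGANIZATION": ("THE", None, 2),
    "PAPER": ("THE", 1, 3),
    "CONFERENCE": ("THE", 4, None),
}
# Modifier table: entry number -> (word, preposition or "adj", chain link).
MODIFIERS = {
    1: ("PLANNED", "adj", None),
    2: ("PAPER", "OF", None),
    3: ("AUTONOTE", "ABOUT", 5),
    4: ("ACM", "adj", None),
    5: ("CONFERENCE", "FOR", None),
}


def speak(subjects, modifiers, root):
    buffer = []      # the description buffer
    goal_stack = []  # suspended positions in modification chains

    def open_subject(noun):
        """Emit article, adjectives and noun; return the head of the noun's modification chain."""
        article, link, mods = subjects[noun]
        if article:
            buffer.append(article)
        while link is not None:  # walk the adjective chain
            word, _, link = modifiers[link]
            buffer.append(word)
        buffer.append(noun)
        return mods

    link = open_subject(root)
    while link is not None or goal_stack:
        if link is None:
            link = goal_stack.pop()  # resume a suspended chain
            continue
        word, preposition, next_link = modifiers[link]
        buffer.append(preposition)
        if word in subjects:
            # The modifier word is further described: save our place and descend into it.
            goal_stack.append(next_link)
            link = open_subject(word)
        else:
            buffer.append(word)
            link = next_link
    return " ".join(buffer)


print(speak(SUBJECTS, MODIFIERS, "ORGANIZATION"))
# THE ORGANIZATION OF THE PLANNED PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE
```

The two pops at the end of the run correspond to restoring the PAPER chain and then the ORGANIZATION chain, both of which turn out to be exhausted.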
SPEAKER heuristics. The addition of phrases to a topic in many cases 
could reduce the readability of its SPEAKER-generated description. For 
example, suppose a topic is first defined as SAMPLE DESCRIPTIONS FOR USE IN 
THE ACM PRESENTATION, and later is referred to as SAMPLE DESCRIPTIONS FOR USE 
IN THE NSF PROPOSAL. Given only the algorithm just presented, the description
generated by the SPEAKER would be SAMPLE DESCRIPTIONS FOR USE IN THE ACM
PRESENTATION IN THE NSF PROPOSAL. To avoid such unreadable descriptions,
whenever the modification chain for a subject noun contains two or more
prepositional phrases headed by the same preposition, the SPEAKER sets off each
phrase after the first with parentheses. The above example then becomes SAMPLE
DESCRIPTIONS FOR USE IN THE ACM PRESENTATION (AND THE NSF PROPOSAL). Note that a
description such as COMMENTS ON SMITH'S ARTICLE ON CLUSTERING is not processed
in this manner, since (ON) ARTICLE is in the modification chain of COMMENTS,
while (ON) CLUSTERING is in the modification chain of the word ARTICLE. Note
also that although parenthetical phrases are excluded from topic descriptions
generated for the purpose of interrogating the user, when the user requests a
description of a topic via the DESCRIBE command, the complete description is
provided.
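The parenthesizing heuristic can be sketched as follows. The function name is hypothetical, and two details go beyond the text and are assumptions: each repeated phrase is rendered as "(AND ...)" in chain order, and the `for_query` flag simply drops those phrases when a description is built to interrogate the user.

```python
def render_modifications(noun, chain, for_query=False):
    """Render a subject noun followed by its own modification chain.

    `chain` is a list of (preposition, object phrase) pairs for this noun only.
    A phrase whose preposition already occurred earlier in the same chain is set
    off in parentheses, or dropped entirely when the text is for a user query.
    """
    used = set()
    parts = [noun]
    for prep, obj in chain:
        if prep in used:
            if not for_query:
                parts.append(f"(AND {obj})")
        else:
            used.add(prep)
            parts.append(f"{prep} {obj}")
    return " ".join(parts)


chain = [("IN", "THE ACM PRESENTATION"), ("IN", "THE NSF PROPOSAL")]
print("SAMPLE DESCRIPTIONS FOR " + render_modifications("USE", chain))
# SAMPLE DESCRIPTIONS FOR USE IN THE ACM PRESENTATION (AND THE NSF PROPOSAL)
print("SAMPLE DESCRIPTIONS FOR " + render_modifications("USE", chain, for_query=True))
# SAMPLE DESCRIPTIONS FOR USE IN THE ACM PRESENTATION
```

Because the check is per chain, the two ON phrases of COMMENTS ON SMITH'S ARTICLE ON CLUSTERING never meet in one call and are left alone.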
Simplification of list processing. It may be added here that our decis- 
ion to maintain the representational network in disk file storage has greatly 
simplified the list processing in recursive algorithms such as the SPEAKER.
The network can be envisioned as a complex list structure where the links are 
simply line file numbers. To illustrate this point, consider the recursive 
collection of simple phrases carried out in the first step of the SPEAKER. 
The main body of the routine collects the simple phrases that directly describe 
a node. If the node processed has downward links to subordinate nodes, they 
are placed on a push down stack. 
Next the stack is popped and the routine is 
called recursively to operate on a new node number. Thus all the concomitant 
problems of storage management that are normally present in list processing 
systems are avoided. Recursive deletion, discussed in the next section, 
similarly is simplified. To delete a portion of the list structure requires 
only the removal of a line from a directory file. Thus the process of "garbage 
collection" is both automatic and transparent to AUTONOTE2. 
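The stack-driven collection described above can be sketched with an ordinary dictionary standing in for the node directory file, so that a link is just a key. This is a hypothetical sketch: the `collect_phrases` name, the directory layout, node numbers 10 to 13, and the way the sample topic is split into simple phrases are all assumptions. The optional `levels` argument models the SPEAKER's level indicator by not descending past the requested number of node levels.

```python
def collect_phrases(directory, start, levels=None):
    """Collect the simple phrases that directly or indirectly describe `start`.

    `directory` stands in for the node directory file: each node number maps to
    the simple phrases describing that node and its downward links.
    """
    phrases = []
    stack = [(start, 1)]  # push down stack of (node number, node level)
    seen = set()
    while stack:
        node, level = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        entry = directory[node]
        phrases.extend(entry["phrases"])
        if levels is None or level < levels:
            stack.extend((child, level + 1) for child in entry["down"])
    return phrases


# Hypothetical directory for ORGANIZATION OF THE PLANNED PAPER ABOUT AUTONOTE FOR THE ACM CONFERENCE.
directory = {
    10: {"phrases": ["ORGANIZATION OF PAPER"], "down": [11]},
    11: {"phrases": ["PAPER FOR CONFERENCE"], "down": [12, 13]},
    12: {"phrases": ["PLANNED PAPER ABOUT AUTONOTE"], "down": []},
    13: {"phrases": ["ACM CONFERENCE"], "down": []},
}
print(collect_phrases(directory, 10))
print(collect_phrases(directory, 10, levels=2))  # lower levels are blocked
```

No storage is allocated or freed for links here; as in the text, the "list structure" is only a table of numbered entries.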
VII. NETWORK MODIFICATION 
Procedures for modifying the representational network are required for 
several reasons. Should the system incorrectly parse a description, the user's 
ability to reference the associated topic will be impaired. The user may wish
to alter the description of a topic to (a) make it more precise, (b) ensure
that it is not confused with similarly described topics, or (c) enable a topic
to be referenced in more than one way. After initially describing a text item,
the user may discover that the item should also be associated with other topics 
in the representation. Alternatively, he may decide that a text item should 
be dissociated from some topic. The user may wish to delete an obsolete topic 
from the representation altogether, or replace a description in its entirety 
by a more suitable one while maintaining the same list of associated textual 
references. Finally, when dealing with a group of structurally related topics, 
the user may wish to delete an entire structure, or certain components of a 
structure, from the network. 
We cannot expect a typical user to think in terms of list structures, 
nodes, linkages, etc. Thus we sought to provide a command language and 
feedback more or less independent of the internal data structures that
implement the representation. In addition, care was taken to avoid the
possibility of accidental damage to the representation stemming from
misunderstanding or misapplication of the modification procedures.
The resultant processor includes procedures for removing or adding item 
references to a topic, deleting topics, adding or removing simple phrases from 
the description of a topic, etc. Rather than require the user to identify the 
particular topic to be altered each time a modification is to be performed, 
the primitives are implemented as local commands to a generalized modification
processor.
The modification processor is invoked by issuing a CHANGE command which 
accepts a phrasal description as its argument. A node in the network is
established as the currently identified topic. The processor then prompts the user
for modification instructions. After all modifications are completed, the 
user types DONE and control is returned to the regular command monitor. The 
CHANGE command also may be issued while in modification mode, thereby changing 
the current topic. Each of the local commands is discussed separately below, 
using the hypothetical representation depicted in Fig. 21 for illustration. 
Adding References and Phrases to the Network 
The ADD command associates additional text references with the current 
topic, and adds simple phrases to the topic's description. 
To add item refer- 
ences, the user types ADD ITEM[S] followed by a list of item numbers. This 
procedure is quite useful if the user has a large set of items that pertain to 
a particular topic. 
He simply identifies the topic and adds the list of refer- 
ences. 
Fig. 21 - Sample Representation for Discussion of Network Modification 
If the supplied argument is a phrase, it is added to the current node. 
For example, if the current topic is THE PAPER FOR THE ACM CONFERENCE, the 
command ADD PAPER ABOUT AUTONOTE causes the prepositional phrase ABOUT AUTO- 
NOTE to become a part of the topic description. 
Adjectives may also be added 
to a description (example: 
ADD SMITH'S PAPER). If only a single word occurs 
as the argument, it is assumed to be an adjective which is to modify the cur- 
rent subject noun. 
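The argument forms accepted by ADD can be sketched as a small classifier. The helper `classify_add` and its return tags are hypothetical; the assumption that item numbers may be separated by commas or spaces, and the special case for the CACHE argument (described under the caching facility), are ours. A phrase argument would still go through the parser and network locator to decide which simple phrases are new.

```python
def classify_add(argument):
    """Classify the argument of the local ADD command (hypothetical helper)."""
    words = argument.replace(",", " ").split()
    if not words:
        raise ValueError("ADD needs an argument")
    head = words[0].upper()
    if head in ("ITEM", "ITEMS"):
        # ADD ITEM[S] followed by a list of item numbers.
        return ("items", [int(w) for w in words[1:]])
    if head == "CACHE" and len(words) == 1:
        return ("cache", None)
    if len(words) == 1:
        # A single word is taken as an adjective modifying the current subject noun.
        return ("adjective", head)
    return ("phrase", " ".join(words).upper())


print(classify_add("ITEMS 12, 15, 31"))      # ('items', [12, 15, 31])
print(classify_add("PAPER ABOUT AUTONOTE"))  # ('phrase', 'PAPER ABOUT AUTONOTE')
```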
Moving through the Network 
The MOVE command allows the user to change the current node pointer from 
its present position to structurally proximate topics without having to enter 
a description. For example, if currently located at the "paper" node, the 
command MOVE DOWN causes the ACM CONFERENCE to become the current topic. If 
the current topic is the ACM CONFERENCE, MOVE UP will produce three 
higher order topics. Each is saved, and the leftmost node becomes the current 
topic. Subsequently, the user may MOVE LEFT or RIGHT to the other topics.
After a successful move, a brief description of the new topic is displayed.
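The MOVE behaviour can be sketched as a current-node pointer plus the list of topics saved by the last vertical move. The `Cursor` class and the network fragment are hypothetical; the pointers for nodes 2 and 4 follow the hypothetical network used in the deletion example later in this section. Treating MOVE DOWN with several subordinate nodes the same way as MOVE UP is our assumption.

```python
class Cursor:
    """Current-topic pointer for the MOVE command (hypothetical sketch)."""

    def __init__(self, net, current):
        self.net = net            # node number -> {"up": [...], "down": [...]}
        self.current = current
        self.choices = [current]  # topics saved by the last UP or DOWN move
        self.index = 0

    def move(self, direction):
        direction = direction.upper()
        if direction in ("UP", "DOWN"):
            targets = self.net[self.current][direction.lower()]
            if not targets:
                raise ValueError(f"cannot MOVE {direction} from node {self.current}")
            # All reachable topics are saved; the leftmost becomes current.
            self.choices, self.index = list(targets), 0
        elif direction in ("LEFT", "RIGHT"):
            new_index = self.index + (-1 if direction == "LEFT" else 1)
            if not 0 <= new_index < len(self.choices):
                raise ValueError(f"no topic to the {direction.lower()}")
            self.index = new_index
        else:
            raise ValueError(f"unknown direction {direction}")
        self.current = self.choices[self.index]
        return self.current


net = {
    2: {"up": [1, 3], "down": [4]},    # the "paper" node
    4: {"up": [2, 5, 6], "down": []},  # THE ACM CONFERENCE
}
cursor = Cursor(net, 2)
print(cursor.move("DOWN"))   # 4
print(cursor.move("UP"))     # 2, the leftmost of 2, 5 and 6
print(cursor.move("RIGHT"))  # 5
```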
The Caching Facility 
The CACHE command stores item references for subsequent use. 
If the com- 
mand is given with no argument, the set of text references associated with the 
current topic is added to an internal cache. The caching facility may be used
to manipulate large sets of item references, for example, in transferring all
item references from one topic to another.
This may be accomplished by identi- 
fying the first topic and issuing a CACHE command. 
After identifying the 
second topic, the command ADD CACHE causes the set of cached items to be 
merged with those of the new topic. 
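The transfer just described can be sketched with sets of item numbers. The `ModificationSession` class, its method names, and the string topic keys are hypothetical stand-ins for the local commands named in the comments; the sketch does not attempt to model how AUTONOTE2 stores references.

```python
class ModificationSession:
    """Hypothetical sketch of the item-reference side of the modification processor."""

    def __init__(self, references):
        self.references = references  # topic -> set of item numbers
        self.current = None
        self.cache = set()

    def change(self, topic):   # CHANGE <description>, once the topic is located
        self.current = topic

    def add_items(self, *items):  # ADD ITEMS n ...
        self.references.setdefault(self.current, set()).update(items)

    def cache_current(self):   # CACHE with no argument
        self.cache |= self.references.get(self.current, set())

    def add_cache(self):       # ADD CACHE: merge cached items into the current topic
        self.references.setdefault(self.current, set()).update(self.cache)

    def remove_all(self):      # REMOVE ALL
        self.references[self.current] = set()


refs = {"FIRST TOPIC": {3, 8, 21}, "SECOND TOPIC": {5}}
session = ModificationSession(refs)
session.change("FIRST TOPIC")
session.cache_current()
session.change("SECOND TOPIC")
session.add_cache()
print(sorted(refs["SECOND TOPIC"]))  # [3, 5, 8, 21]
```

If the references should also leave the first topic, a REMOVE ALL on that topic completes the move.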
Retrieval Commands 
The retrieval commands of the network modification processor are analogous
to their counterparts in AUTONOTE. LIST outputs a list of the item numbers
associated with the current topic. LIST CACHE displays the numbers of
the items in the cache. PRINT outputs selected text items. All items
associated with the current topic, or those in the cache, will be printed in
response to RETRIEVE and RETRIEVE CACHE, respectively.
By employing the IDENTIFY and MOVE commands, the user may explore the 
representation, LISTing the associated references for each topic. During the 
exploration, the CACHE command can be used to store selected references for 
later retrieval, or the user may choose to PRINT or RETRIEVE pertinent refer- 
ences as he goes. These procedures allow the retrieval set to be shaped inter- 
actively, and more selectively than is possible with the FIND command discussed 
earlier. 
Removing References and Phrases from the Network 
The REMOVE command accepts the same argument forms as the ADD command and 
simply performs the inverse operations. The argument ALL also is recognized, 
causing all item references to be removed from the current topic. 
Topic Deletion 
DELETE may be employed to remove obsolete topics from the representation, 
or as the first step in replacing a topic description with a more appropriate 
one. CREATE then may be used to enter the replacement topic into the network. 
The major problem in the design of the topic deletion algorithm can be 
stated as follows. 
When deleting a topic, under what circumstances are 
structurally related nodes to be deleted as well? 
Consider the following 
cases. 
In the hypothetical network, a request to delete the "paper" node involves
a decision about deleting (a) the outline of the paper, and (b) Smith's
comments on the paper. Note that if the paper node were deleted and the two
higher order nodes were not, the higher order nodes would no longer be
structurally related. In addition, their descriptions would still contain the
word "paper," but which paper would no longer be specified. For these reasons,
we concluded that a topic deletion should also entail the deletion of more
specifically described topics. In many cases, this convention is an advantage,
since the user can delete an entire structure by identifying and deleting a
single lower order node.
The considerations involved in dealing with lower order nodes are a bit 
more complex. Some lower order nodes serve only to augment the description 
of superior nodes. In THE EXPERIMENT ON BLIND RATS, there will be a subordin- 
ate node directly described as BLIND RATS. In this case, deletion of the EX- 
PERIMENT node should include deletion of the subordinate node. On the other 
hand, deletion of the "paper" node in the previous example does not imply 
deletion of the ACM CONFERENCE topic. The ACM CONFERENCE node also plays a
role in other topics. 
In the instances we have examined, the distinction between the two cases 
seems to be that "unimportant" nodes have neither textual references nor up- 
ward pointers to other topics. 
The deletion process employs a heuristic based 
upon this observation. When a subordinate node is deemed unimportant, it is 
deleted; otherwise, the user is asked to confirm its deletion.
The deletion algorithm. First, a list of all nodes to be deleted is con- 
structed. After the list is complete, the user is presented with a brief sum- 
mary of the deletions to be made and is prompted for confirmation. 
The algorithm for constructing the deletions list is recursive. Two pushdown
stacks are employed: one for storing upward node paths yet to be explored, and
one for saving downward paths. The procedure is most easily explained with an
example. Suppose the user identifies the "paper" node and requests its
deletion. The algorithm begins with node 2, first pushing down any upward
pointers along with the node number (2) that was being processed when the
pointers were added to the "up stack." In this case the pairs (1,2) and (3,2)
are pushed down. Next, the downward pointers are placed on the "down stack"
along with the current node number. The current node number is then added to
the deletions list. The current state of the pushdown stacks and the deletions
list is now:

    UP STACK     DOWN STACK     DELETIONS LIST
    (1,2)        (4,2)          2
    (3,2)

The down stack is then popped and node 4 is established as the next node
to be examined. After setting a flag indicating that we have just moved down
a level in the network, the algorithm recurses on node 4. The system detects
three upward pointers (to nodes 2, 5, and 6). It should now be apparent why
we save the fact that node 4 was reached by moving down from node 2. When
placing a node's upward pointers on the up stack, the node that led down to
the current node must be ignored.
Upon noting that node 4 has upward pointers in addition to node 2, the
system checks to see if it has just moved down. In this case it has;
consequently, node 4 is deemed "important" and the system asks DO YOU WANT THE
ACM CONFERENCE DELETED? Assume the reply is NO. Since the ACM CONFERENCE node
will remain, the system records that the linkage between nodes 2 and 4 must be
severed. The algorithm then recurses without adding node 4 to the deletions
list. Note also that the upward pointers from node 4 to nodes 5 and 6 are not
placed on the up stack. The current state of the process is now:

    UP STACK     DOWN STACK     DELETIONS LIST
    (1,2)        empty          2
    (3,2)
The attempt to pop the down stack fails, so the up stack is popped and a
flag is set to indicate upward movement (to node 3). Node 3 has no upward
pointers, but it has two downward pointers, to nodes 2 and 7. Node 2 is ignored
because it led up to node 3. Node 7 is placed on the down stack and node 3 is
added to the deletions list, yielding:

    UP STACK     DOWN STACK     DELETIONS LIST
    (1,2)        (7,3)          2
                                3

The down stack is popped and the algorithm recurses on node 7. Node 7 has
neither text references nor upward pointers (besides node 3). Consequently, it
is deemed unimportant and is added to the deletions list. We now have:

    UP STACK     DOWN STACK     DELETIONS LIST
    (1,2)        empty          2
                                3
                                7

The down stack is empty, so the up stack is popped and the system recurses
on node 1. The node has no new upward or downward pointers, so it is added to
the deletions list. Both stacks are now empty and the algorithm terminates,
having collected nodes 2, 3, 7, and 1 for deletion.
Carrying out the deletion involves several steps. First, any linkages between
those nodes that are to be deleted and those that will remain are severed.
These changes will have been detected and recorded during the recursive
collection process. Next the system executes a REMOVE ALL for each node on the
deletions list so that no associated text reference points to a non-existent
topic. Then the system removes all pointers from simple phrases to the
obsolete nodes. Finally, each deleted node is removed from the node directory
file.
Creating New Topic Representations
CREATE enables the user to define a new topic for the representation. The
command takes as argument a description which is processed in the normal way,
except that no item references are associated with the topic. The new topic
becomes the current node. ADD is used to associate any appropriate text
references. During topic deletion, the system adds to the cache all text
references previously associated with deleted topics. Consequently, ADD CACHE
will then associate those items with the new description.
When processing a CREATE description, the network locator attempts to
associate the description with an existing topic, for two reasons. First, if
the description is intended as a new topic but the network locator confuses it
with another one, the user may want to alter the description. Second, this
permits use of the CREATE command in adding to an existing description.
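The construction of the deletions list described earlier in this section can be sketched in Python. This is a minimal sketch, not the AUTONOTE2 code: the `Node` class, the `confirm` callback standing in for the DO YOU WANT ... DELETED? query, and the in-memory network are assumptions, and the network is reconstructed from the pointers named in the walkthrough because Fig. 21 is not reproduced here. The text does not say what follows a YES reply; the sketch assumes the node is then processed like any other deleted node. Only collection is shown, not the later REMOVE ALL and directory updates.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    up: list = field(default_factory=list)     # pointers to higher-order topics
    down: list = field(default_factory=list)   # pointers to subordinate nodes
    items: list = field(default_factory=list)  # associated text item numbers


def collect_deletions(net, start, confirm):
    """Return (deletions list, linkages to sever) for deleting `start`.

    `confirm(node)` is consulted only for important subordinate nodes.
    """
    up_stack, down_stack = [], []  # entries are (node, node we came from)
    deletions, severed = [], []
    pending = (start, None, False)  # (node, came_from, moved_down)
    while pending is not None:
        node, came_from, moved_down = pending
        n = net[node]
        other_up = [u for u in n.up if u != came_from]
        if node in deletions:
            pass  # already collected along another path
        elif moved_down and (n.items or other_up) and not confirm(node):
            # An important subordinate node is kept: only its link to us is cut,
            # and its other upward pointers are not explored.
            severed.append((came_from, node))
        else:
            up_stack.extend((u, node) for u in other_up)
            down_stack.extend((d, node) for d in n.down if d != came_from)
            deletions.append(node)
        if down_stack:
            pending = (*down_stack.pop(), True)
        elif up_stack:
            pending = (*up_stack.pop(), False)
        else:
            pending = None
    return deletions, severed


# Hypothetical network reconstructed from the walkthrough.
net = {
    1: Node(down=[2]),
    2: Node(up=[1, 3], down=[4]),  # the "paper" node
    3: Node(down=[2, 7]),
    4: Node(up=[2, 5, 6]),         # THE ACM CONFERENCE
    5: Node(down=[4]),
    6: Node(down=[4]),
    7: Node(up=[3]),               # no references, no other upward pointers
}
asked = []


def reply_no(node):
    asked.append(node)
    return False  # the user answers NO


print(collect_deletions(net, 2, reply_no))  # ([2, 3, 7, 1], [(2, 4)])
```

Nodes reached from the up stack are always collected, reflecting the rule that deleting a topic also deletes the more specifically described topics built on it.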
VIII. A CASE STUDY OF SYSTEM PERFORMANCE
The Inapplicability of Recall and Precision 
The most widely accepted methods for retrieval system evaluation are based
upon recall and precision measures. As applied to the results of retrieval 
queries, precision is defined as the proportion of retrieved material that is 
deemed relevant to a query; recall is the ratio of relevant documents retrieved 
to the total relevant in the data base. But recall and precision cannot mean- 
ingfully be applied to the evaluation of AUTONOTE2. The AUTONOTE2 user des- 
cribes each piece of textual material himself. Even within a large personal 
data base, the user will certainly recollect some of his topics and the key 
words and phrases that define them. Furthermore, subject to the user's own 
limitations in describing his materials, the topic framework of AUTONOTE2 im- 
plies "perfect" precision and recall once a particular topic is identified dur- 
ing retrieval. 
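Stated with sets, where A is the set of items retrieved for a query and R is the set of items in the data base relevant to it (our notation), the two measures are:

```latex
\mathrm{precision} = \frac{|A \cap R|}{|A|}, \qquad
\mathrm{recall} = \frac{|A \cap R|}{|R|}
```

Once an AUTONOTE2 topic is identified, A is exactly the set of items the user himself attached to that topic, so within the limits of his own descriptions both ratios equal 1 by construction.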
A principal motivation for the AUTONOTE2 system was the desire to overcome 
the disadvantages of keyword indexing techniques, which force the user to
translate ideas and concepts pertinent to a given document into discrete
content indicators.
In developing AUTONOTE2 we have sought to provide mechanisms for 
defining and efficiently referencing these concepts directly. An evaluation 
of AUTONOTE2 should therefore provide some comparisons of keyword indexing vs. 
indexing by topic. To achieve a direct comparison, protocols of both types of 
indexing activity with a common data base are required. 
The Sauvain Data Base 
The original AUTONOTE system was employed in a study (Sauvain, 1970) 
aimed at uncovering structural communication problems within a keyword-based 
system. The resulting data base is related primarily to Sauvain's dissertation
research. It includes reading notes, bibliographic references, research ideas, 
expository material, and so on. The collection brings together a broad range 
of topics and ideas touching upon various aspects of computer science, infor- 
mation retrieval, man-machine interaction, and psychology. 
Copies of the item texts, the originally assigned keywords, and protocols 
of Sauvain's activities during data base indexing, organization, and retrieval
were acquired. We then proceeded to re-index the collection with AUTONOTE2 
topic descriptions. Each of the roughly 400 items in the data base was viewed 
and described in a sequential fashion; that is, there was no look-ahead or pre- 
planning of topic phrasings to facilitate network structuring. Protocols were 
collected of all interaction with the system and the state of the network was 
recorded at periodic intervals. (For details, see Linn, 1972). 
Results 
For brevity, AUTONOTE2 reports of parsing assumptions are excluded. 
However, system responses that elicit a user reply are shown to provide a 
feeling for user interaction under AUTONOTE2.
Indexing activity. The AUTONOTE2 protocols show a high degree of terse,
efficient referencing of previously defined topics. The communicative
efficiency was especially great in instances where several consecutive items
were entered on a common topic. This situation frequently occurred when
entering a set of reading notes on a particular paper or collection of papers.
Typically, the first item in such a set of entries was assigned to one or more
new topics. In describing the subsequent items, references to these topics
often were conveyed by a single word or phrase, or by a null description (a
description line consisting of only a slash is treated as a reference to the
topic just mentioned).
To illustrate, consider the materials dealing with various aspects of 
artificial intelligence. A total of 17 of these items contained notes taken 
at a 1968 conference at Case Western Reserve University. Of the AUTONOTE2 
descriptions supplied for these items, three mention only the word CONFERENCE; 
five include the subphrase 1968 CONFERENCE; two include CWRU CONFERENCE; and 
seven make no explicit reference to the conference at all. Each of the items,
however, was associated with a topic node linked in some way to the
"conference" node. Furthermore, though none of the descriptions contain the
words ARTIFICIAL or INTELLIGENCE, each of the associated items can be accessed
in the network through the "artificial intelligence" node.
In the AUTONOTE protocol for these materials, there was frequent use of 
descriptor abbreviations and other idiosyncratic tags (CWRUAICONF, AT, COGPSY, 
etc.). These suggest a strong desire to eliminate repeated entry of lengthy
descriptors and phrases. The major drawback of this strategy, however, is
that abbreviations (especially the more uncommon ones) are not as easily
remembered as the words they represent. In addition, once an abbreviation has
been used, the user must remember that he has done so in order to maintain
consistent indexing. In contrast, there was little motivation for descriptor
abbreviations under the AUTONOTE2 system. Once a lengthy phrase had been
defined in the network there was generally no need to reference it again with
a full description.
The Sauvain study identified a clear need for mechanisms to assist the 
user in maintaining consistent indexing. 
The second type of need (how a descriptor has been used)
frequently occurs when a text item is being entered. The user 
has some ideas for candidate descriptors, suspects that there 
has been prior usage of these words, and needs a way to check 
the prior usage to keep his indexing consistent. He also may 
want to look at prior usage contexts to get ideas about other 
descriptors to use, or to weed out candidates that look too general. 
The topical view of the data base under AUTONOTE2 eliminates part of this
problem. When describing a new topic the AUTONOTE2 user need not be as
concerned about prior word usage in other contexts. The representational network
provides a means for discriminating among the various topics in which a par- 
ticular word occurs. 
If, on the other hand, the user suspects that the item at hand is somehow 
related to a previously existing topic, there is an analogous need to interro- 
gate the representational network for candidate topics. 
This capability is 
provided by the DESCRIBE command. There were, in fact, numerous instances in
the AUTONOTE2 protocols of network interrogation prior to entering
descriptions. An example is given in Fig. 22.

USER: OPEN
USER: /SALTON'S CCS COLLOQUIUM ON EVALUATION
NEW TOPIC ASSUMED
USER: RELOCATE
WHICH DO YOU MEAN:
A. THE COMPUTER EVALUATION OF INDEXING (AND TEXT PROCESSING)
B. EVALUATION OF CRT DISPLAY USAGE

Fig. 22 - Network interrogation during description entry

In response to the user's description, the system indicates that no
association will be made with a prior topic. The user recalls talking about
the evaluation of automatic indexing techniques earlier, so he requests the
system to search further by entering a RELOCATE command. Two candidates are
generated, one of which is the desired referent.
Under the keyword system, searching for candidate descriptors and usage
contexts was much more tedious. Typically, a RETRIEVE command was issued
calling for the display of all keyword lines of items indexed by a particular term
or logical combination of terms. In some cases a large number of items were 
accessed necessitating time consuming perusal of the data base. 
The discussion thus far should convey some feeling for the degree of com- 
municative efficiency achievable with AUTONOTE2. To provide a more precise 
indication of this aspect of system performance we calculated the ratio of 
content words conveyed to content words entered for three samples of data base 
items (articles and prepositions were excluded). The average number of AUTO- 
NOTE2 words entered and conveyed were compared with the average number of 
AUTONOTE keywords assigned within each sample (under the keyword system this 
ratio will always equal one). The three samples taken were (1) a random sample 
of 50 items, (2) 41 sequential items, and (3) all items dealing with some as- 
pect of artificial intelligence. 
The results of this tabulation are summarized in Fig. 23. In all three 
samples more than three content words were conveyed for every two entered, on 
the average. The conveyed-entered ratio was lowest for the random sample and 
highest for the artificial intelligence items. This is because most of the 
items dealing with artificial intelligence were entered with a rich global
context of topic nodes defined. The sequential items, on the other hand, were
related to several smaller, more localized topic structures. Consequently, many
more of these items were described in full. The random sample lacked a consis- 
tent contextual framework, and consequently had the least communicative effi- 
ciency on the average. 
Retrieval activity. We have seen that retrieval activity is an essential 
part of the indexing and organizational processes. The protocols show a fre- 
quent need to search for related material and item numbers in the keyword sys- 
tem and a corresponding need for network interrogation prior to description 
entry under AUTONOTE2. However, the AUTONOTE2 topic framework eliminates much
of the text file perusal so common in the AUTONOTE protocols. Each topic
description typically provides a clear indication of the content of its
associated items. Consequently, there was very little need to examine the text
file prior to describing new items or relating them to others in the data base.
A perusal of candidate topics generated by the DESCRIBE command was sufficient
in most cases.

Fig. 23 - A comparison of entered and conveyed content words
for three samples of data base items

                                               Random   Sequential   Artificial
                                               Sample   Sample       Intelligence
                                                                     Sample
No. of items in sample                           50       41           30
No. of keywords originally assigned             270      251          155
Avg. no. of keywords per item                   5.4      6.1          5.1
No. of content words entered                    256      233          109
Avg. no. of content words entered per item      5.1      5.7          3.6
No. of content words conveyed                   388      396          235
Avg. no. of content words conveyed per item     7.9      9.7          7.8
No. of words conveyed per word entered          1.5      1.7          2.2
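The last row of Fig. 23, and the statement that more than three content words were conveyed for every two entered, can be checked directly from the totals in the table. A short Python check; the dictionary layout and function name are ours.

```python
# Totals of content words entered and conveyed per sample, from Fig. 23.
samples = {
    "random": {"entered": 256, "conveyed": 388},
    "sequential": {"entered": 233, "conveyed": 396},
    "artificial intelligence": {"entered": 109, "conveyed": 235},
}


def conveyed_per_entered(sample):
    return sample["conveyed"] / sample["entered"]


for name, sample in samples.items():
    ratio = conveyed_per_entered(sample)
    # "More than three conveyed for every two entered" means a ratio above 1.5.
    print(f"{name:24s} {ratio:.2f}  above 1.5: {ratio > 1.5}")
```

Rounded to one decimal the ratios reproduce the 1.5, 1.7 and 2.2 of the table, and all three exceed 1.5.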
The most outstanding improvement during retrieval activity occurred in in- 
stances of very general queries. Under the keyword system a request for all 
items pertinent to, say, ARTIFICIAL INTELLIGENCE or PROBLEM SOLVING will access 
a large set of items. Queries of this kind were employed to peruse large seg- 
ments of the data base relevant to a general topic area--for example, in search 
of item candidates for a particular grouping. The important difference between 
the two systems in this situation is that AUTONOTE gives the user no indication 
of the subtopics within the general topic area. The AUTONOTE user essentially
has two alternatives: he may display each of the accessed items (optionally
suppressing text); or he may further restrict the set of items with an
additional set of descriptors. Both options have notable drawbacks. The
first entails time-consuming perusal of the data base. The second raises a 
more significant problem. Which descriptors should be used to restrict the 
size of the accessed set of items? Some descriptors may restrict the set too 
greatly, eliminating relevant material; others may discriminate very little or 
not at all. In the absence of system feedback, this discrimination process 
places a major burden on the user's memory. 
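The restriction problem can be made concrete with a toy keyword index. The Python sketch below assumes that restricting a query means intersection (every added descriptor must be assigned to an item). The index contents and the name `keyword_find` are hypothetical and are not AUTONOTE's actual commands or data.

```python
from typing import Dict, List, Set

# Hypothetical keyword index: item number -> descriptors assigned to it.
INDEX: Dict[int, Set[str]] = {
    1: {"ARTIFICIAL INTELLIGENCE", "CONFERENCE", "THEOREM PROVING"},
    2: {"ARTIFICIAL INTELLIGENCE", "CONFERENCE", "HEURISTIC SEARCH"},
    3: {"ARTIFICIAL INTELLIGENCE", "PSYCHOLOGY"},
    4: {"PROBLEM SOLVING", "HEURISTIC SEARCH"},
}


def keyword_find(*descriptors: str) -> List[int]:
    """Items indexed under every given descriptor (restriction as set intersection)."""
    wanted = set(descriptors)
    return sorted(item for item, assigned in INDEX.items() if wanted <= assigned)


if __name__ == "__main__":
    print(keyword_find("ARTIFICIAL INTELLIGENCE"))                     # [1, 2, 3]
    print(keyword_find("ARTIFICIAL INTELLIGENCE", "CONFERENCE"))       # [1, 2]
    print(keyword_find("ARTIFICIAL INTELLIGENCE", "PROBLEM SOLVING"))  # []
```

The three calls show the dilemma: the broad query returns every relevant item, one added descriptor narrows the set only modestly, and another eliminates everything. Nothing in the result tells the user in advance which outcome to expect.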
The AUTONOTE2 system, on the other hand, provides the user with meaningful
feedback in response to general queries. Consider, for example, the retrieval
protocol presented in Fig. 24. At each level in the representational network,
the user is given an opportunity to choose among several subtopics. This
example demonstrates a marked improvement over keyword indexing: the ability to
discriminate among subsets of material indexed under a common set of general
descriptors.
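The kind of feedback shown in Fig. 24 can be approximated by a tree of topics in which a query returns the highest-level matching topics and the user then descends one level at a time. This is a minimal Python sketch under that assumption. The `Topic` class, `find_topics`, `menu`, and plain substring matching are illustrative stand-ins, not the AUTONOTE2 PARSER or LOCATOR, and the PSYCHOLOGY parent node is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    """A node in a (hypothetical) representational network."""
    description: str
    subtopics: List["Topic"] = field(default_factory=list)


def find_topics(node: Topic, phrase: str) -> List[Topic]:
    """Return the highest-level topics below `node` whose description contains `phrase`.

    The search stops at a matching topic; its subtopics are offered only
    if the user chooses to explore it.
    """
    matches: List[Topic] = []
    for topic in node.subtopics:
        if phrase in topic.description:
            matches.append(topic)
        else:
            matches.extend(find_topics(topic, phrase))
    return matches


def menu(topics: List[Topic]) -> List[str]:
    """Label the choices A., B., C., ... in the style of the Fig. 24 dialog."""
    return [f"{chr(ord('A') + i)}. {t.description}" for i, t in enumerate(topics)]


# A small network built mostly from entries that appear in Fig. 24.
CONFERENCE = Topic("THE 1968 ARTIFICIAL INTELLIGENCE CONFERENCE AT CWRU", [
    Topic("ROBINSON'S TALK AT THE CONFERENCE ON THEOREM PROVING SYSTEMS"),
    Topic("SLAGLE'S DISCUSSION AT THE CONFERENCE ON HEURISTIC SEARCH PROGRAMS"),
])
ROOT = Topic("ROOT", [
    CONFERENCE,
    Topic("ARTIFICIAL INTELLIGENCE RESEARCH AT MIT"),
    Topic("PSYCHOLOGY", [  # hypothetical parent, added to show recursion
        Topic("RELATION OF ARTIFICIAL INTELLIGENCE TO PSYCHOLOGY"),
    ]),
])

if __name__ == "__main__":
    for line in menu(find_topics(ROOT, "ARTIFICIAL INTELLIGENCE")):
        print(line)
    # Choosing A and exploring it lists that topic's subtopics.
    for line in menu(CONFERENCE.subtopics):
        print(line)
```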
Conclusion 
An analysis of man-machine dialogs collected during the description of a
realistically diverse collection of textual materials has shown the
communicative ease and efficiency, and the descriptive power, attainable under
the referential system. The results indicate that the referential mechanisms
developed in this study constitute a viable alternative to keyword indexing
techniques as applied to personal information systems. The referential
approach offers four primary contributions toward the improvement of
man-machine communication; each corresponds to a particular kind of
facilitation during storage and retrieval activity.
USER: FIND ARTIFICIAL INTELLIGENCE
DO YOU WANT:
   A. THE 1968 ARTIFICIAL INTELLIGENCE CONFERENCE AT CWRU
   B. THE INFLUENCE OF ARTIFICIAL INTELLIGENCE ON COGNITIVE
      PSYCHOLOGY
   C. THE GENERAL APPROACH TO ARTIFICIAL INTELLIGENCE
   D. ARTIFICIAL INTELLIGENCE RESEARCH AT MIT
   E. RELATION OF ARTIFICIAL INTELLIGENCE TO PSYCHOLOGY
USER: A
1 ITEM SAVED. WANT TO EXPLORE?
USER: YES
DO YOU WANT:
   A. ML'S TALK AT THE CONFERENCE ON THE REPRESENTATION OF A
      PROBLEM SOLVING SYSTEM
   B. ROBINSON'S TALK AT THE CONFERENCE ON THEOREM PROVING SYSTEMS
   C. LIN'S PAPER AT THE CONFERENCE ON THE HEURISTIC SOLUTION OF
      LARGE COMBINATORIAL PROBLEMS
   D. BANERJI'S OVERVIEW OF GAME PLAYING PROGRAMS AT THE CONFERENCE
   E. SIMMONS' REVIEW OF QUESTION ANSWERING SYSTEMS AT THE CONFERENCE
   F. OTHER TALKS AT THE CONFERENCE
   G. SLAGLE'S DISCUSSION AT THE CONFERENCE ON HEURISTIC SEARCH
      PROGRAMS
   H. FIKES' PRESENTATION AT THE CONFERENCE OF AN ALGOL-LIKE
      LANGUAGE FOR PROBLEM SOLVING PROCEDURES
   I. FEIGENBAUM'S DISCUSSION AT THE CONFERENCE OF THE DENDRAL
   J. BANERJI'S

Fig. 24 - Topic discrimination during
retrieval activity
First, the concept of a representational network provides the user with a
particularly natural view of both the content and organization of his data
base. In essence, the user explicitly defines the important concepts and
topics within his own area of interest; he specifies structural relationships
among concepts and, in general, manipulates these informational objects during
all phases of problem solving activity. This topical view of the data base is,
in a very real sense, more "meaningful" to the user than the artificial view
inherent in keyword indexing systems.
Second, the developed techniques provide a unified treatment for both
indexing and organizational activity. The communication of structural
associations is achieved through exactly the same descriptive mechanisms used
in categorizing material. In effect, the topic serves as the focal point in
all aspects of communication with the system.
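What a unified treatment can mean in practice is sketched below. We assume that a topic is addressed by the chain of phrase descriptions leading to it from the root. A single hypothetical `describe` call then files an item (indexing) and, if needed, creates the topic and places it under its parent (organization). Exact string matching of phrases is a deliberate simplification; the system described here uses syntactic analysis of descriptions and queries the user about residual ambiguities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    description: str
    items: List[int] = field(default_factory=list)
    subtopics: List["Topic"] = field(default_factory=list)


def describe(root: Topic, path: List[str], item: int) -> Topic:
    """File `item` under the topic reached by the phrase chain `path`.

    Missing topics along the chain are created under their parents, so one
    call both indexes the item and records its place in the structure.
    """
    node = root
    for phrase in path:
        child = next((t for t in node.subtopics if t.description == phrase), None)
        if child is None:
            child = Topic(phrase)
            node.subtopics.append(child)
        node = child
    node.items.append(item)
    return node


if __name__ == "__main__":
    root = Topic("ROOT")
    conf = "THE 1968 ARTIFICIAL INTELLIGENCE CONFERENCE AT CWRU"
    # Item numbers 17 and 18 are hypothetical.
    describe(root, [conf, "ROBINSON'S TALK ON THEOREM PROVING SYSTEMS"], 17)
    describe(root, [conf, "SLAGLE'S DISCUSSION ON HEURISTIC SEARCH PROGRAMS"], 18)
    print(len(root.subtopics))  # the conference topic is shared by both items
    print([t.description for t in root.subtopics[0].subtopics])
```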
Third, retrieval capability is considerably enhanced by the discriminatory
power of the referential system. The representational network provides an
effective means for distinguishing among the many topics that may be partially
indexed under a common set of words. As noted earlier, this discriminatory
power is especially useful in providing meaningful user feedback in response to
general retrieval queries. Further, since each topic description serves to
identify the content of its associated items, the representational network may
be used as a retrieval intermediary. That is, the user can carry out retrieval
largely by exploring the topic structures in the network. This aspect of the
system greatly reduces the need for lengthy perusal of document texts in search
of desired materials.
Finally, the structural context provided by the network makes it possible for
the user to describe, organize, and retrieve materials with considerable
communicative efficiency. This is a fundamental aspect of the system design: to
provide a framework for interpreting terse, efficient, sometimes ambiguous
references to the topics in the information universe.
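Terse references work because they are read against the user's current position in the network. The Python sketch below illustrates one plausible policy, which is our assumption rather than the paper's algorithm. It searches the subtopics of the innermost context first, widens outward toward the root, and returns every match at the first level that has any. A single match resolves the reference; several matches become a question to the user. All names and the word-matching rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    description: str
    subtopics: List["Topic"] = field(default_factory=list)


def resolve(context: List[Topic], reference: str) -> List[Topic]:
    """Interpret a terse reference against the user's current position.

    `context` runs from the network root down to the topic the user is in.
    Levels are searched innermost first, and the first level with any match
    wins: one hit resolves the reference, several hits must be put to the user.
    """
    words = reference.split()
    for level in reversed(context):
        hits = [t for t in level.subtopics
                if all(w in t.description.split() for w in words)]
        if hits:
            return hits
    return []


# Hypothetical network fragment using entries from Fig. 24.
ROBINSON = Topic("ROBINSON'S TALK AT THE CONFERENCE ON THEOREM PROVING SYSTEMS")
SLAGLE = Topic("SLAGLE'S DISCUSSION AT THE CONFERENCE ON HEURISTIC SEARCH PROGRAMS")
CONFERENCE = Topic("THE 1968 ARTIFICIAL INTELLIGENCE CONFERENCE AT CWRU", [ROBINSON, SLAGLE])
MIT = Topic("ARTIFICIAL INTELLIGENCE RESEARCH AT MIT")
ROOT = Topic("ROOT", [CONFERENCE, MIT])

if __name__ == "__main__":
    here = [ROOT, CONFERENCE]  # the user is exploring the conference topic
    print([t.description for t in resolve(here, "ROBINSON'S TALK")])  # one match
    print([t.description for t in resolve(here, "MIT")])              # found one level up
    print(len(resolve(here, "AT THE CONFERENCE")))                    # 2: ask the user
```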
In light of the increasing availability of on-line computing facilities 
today, it seems reasonable to expect that personalized retrieval systems will 
play an expanding role in the computer support of individual research activity. 
It is hoped that this study will suggest new directions for the design of such
systems. 

REFERENCES 
Dostert, B. H., & Thompson, F. B. How features resolve syntactic ambiguity.
In J. Minker & S. Rosenfield (Eds.), Proceedings of the Symposium on
Information Storage and Retrieval, University of Maryland, April, 1971.
Linn, W. E. Jr. Man-machine referential communication in a personal informa- 
tion retrieval system. (Doctoral dissertation, The University of 
Michigan) Ann Arbor, Michigan: University Microfilms, 1972. No. 73-6867. 
Reitman, W. Cognition and thought. New York: Wiley , 1965. 
Reitman, W., Roberts, R. B., Sauvain, R. W., Wheeler, D. D., & Linn, W.
AUTONOTE: A personal information storage and retrieval system. Proceedings
of the 24th National Conference of the Association for Computing
Machinery. New York: Association for Computing Machinery, 1969.
Pp. 67-76.
Sauvain, R. W. Structural communication in a personal information storage and
retrieval system. (Doctoral dissertation, The University of Michigan)
Ann Arbor, Michigan: University Microfilms, 1970. No. 70-21782.
