<?xml version="1.0" standalone="yes"?>
<Paper uid="J79-1050">
  <Title>Association for Computational Linguistics</Title>
  <Section position="1" start_page="0" end_page="95" type="metho">
    <SectionTitle>
THE FINITE STRING
NEWSLETTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
TABLE OF CONTENTS
</SectionTitle>
    <Paragraph position="0"> An organization for a dictionary of word senses - Dick H. Fredericksen ... 2
Current Bibliography ... 24
Cahiers du groupe de travail Analyse et experimentation dans les sciences de l'homme par les methodes informatiques - E. Chouraqui and J. Virbel ... 93
Bibliography and subject index, current computing ... 94</Paragraph>
    <Paragraph position="1"> Directory of university computer science
Privacy, security, and the information processing industry - Dahl A. Gerberick ... 96</Paragraph>
  </Section>
  <Section position="2" start_page="95" end_page="95" type="metho">
    <SectionTitle>
AMERICAN JOURNAL OF COMPUTATIONAL LINGUISTICS is published
</SectionTitle>
    <Paragraph position="0"> by the Center for Applied Linguistics for the Association for Computational Linguistics.</Paragraph>
    <Paragraph position="1"> EDITOR: David G. Hays Professor of Linguistics, SUNY Buffalo</Paragraph>
  </Section>
  <Section position="3" start_page="95" end_page="95" type="metho">
    <SectionTitle>
EDITORIAL ASSISTANT: William Benzon
EDITORIAL ADDRESS: Twin Willows, Wanakah, New York 14075
MANAGING EDITOR: A. HOOD ROBERTS Deputy Director, Center for
Applied Linguistics
MANAGEMENT ASSISTANT: James Megginson
PRODUCTION AND SUBSCRIPTION ADDRESS: 1611 North Kent Street,
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="4" start_page="95" end_page="95" type="metho">
    <SectionTitle>
DICK H. FREDERICKSEN
IBM THOMAS J. WATSON RESEARCH CENTER
YORKTOWN HEIGHTS, NEW YORK 10598
</SectionTitle>
    <Paragraph position="0"> ABSTRACT: This paper describes a lexical organization in which &amp;quot;senses&amp;quot; are represented in their own right, along with &amp;quot;words&amp;quot; and &amp;quot;phrases&amp;quot;, by distinct data items. The objective of the scheme is to facilitate recognition and employment of synonyms and stock phrases by programs which process natural language. Besides presenting the proposed organization, the paper characterizes the lexical &amp;quot;senses&amp;quot; which result.</Paragraph>
    <Paragraph position="1">  1. Introduction.</Paragraph>
    <Paragraph position="2"> This paper describes an internal lexical organization which is particularly designed to capture the facts about synonymy. Besides recording the inclusion of each word in one or more synonym sets (identified with its various &amp;quot;senses&amp;quot;), the scheme attempts to distribute attributes perspicuously between &amp;quot;senses&amp;quot;, &amp;quot;wordings&amp;quot;, and the intersections of the two. In addition, there is provision to record multi-word idioms, stock phrases, and the like, and to include these as elements in synonym sets when appropriate.</Paragraph>
    <Paragraph position="3"> Briefly, &amp;quot;senses&amp;quot; are represented in their own right, along with &amp;quot;words&amp;quot; and &amp;quot;phrases&amp;quot;, by distinct data items. Each word or phrase is associated with a list of the &amp;quot;senses&amp;quot; which it can express; conversely, each &amp;quot;sense&amp;quot; is associated with a list of &amp;quot;alternative wordings&amp;quot;. Additionally, each word is associated with a list of phrases in which it occurs.</Paragraph>
    <Paragraph position="4"> Grammatical category, features, selection restrictions, and the like are applicable at three different levels: to words or phrases as such, to &amp;quot;senses&amp;quot; as such, or to particular usages of words or phrases (equivalently, to particular wordings of &amp;quot;senses&amp;quot;).</Paragraph>
    <Paragraph position="5"> An Organization for a Dictionary of Word Senses This lexical organization has been implemented at IBM Research, Yorktown Heights, N.Y., by a program -- not to be described here -- which builds such dictionaries in a very compact form, giving interactive assistance to the person making the entries. (For example, the program points out the possibility of merging &amp;quot;senses&amp;quot; whenever their wordings overlap and their attributes are compatible, and merges them if so directed.) There are suitable facilities for saving the results, retrieving them in various ways, and for altering such things as schemes of classification without scrapping previously prepared work.</Paragraph>
    <Paragraph position="6"> The ultimate intent is that the &amp;quot;dictionary of senses&amp;quot; should serve as the lexical component in a natural language fact-retrieval system. Pending its incorporation in that role, it will be used to amass and organize information on the semantic relations among words and phrases.</Paragraph>
    <Paragraph position="7"> The balance of this paper comes in two sections: Section 2 presents the proposed lexical data structures, and suggests how they are to be used. Included is a sketch of how various types of grammatical and semantic &amp;quot;attributes&amp;quot; fit into the scheme.</Paragraph>
    <Paragraph position="8"> Section 3 discusses the character of the &amp;quot;senses&amp;quot; encoded in the resulting dictionary. Reasons are advanced for regarding lexical &amp;quot;senses&amp;quot; as something far short of semantic primitives. At the same time, synonym sets are defended against the view that &amp;quot;true paraphrases are rare or nonexistent&amp;quot;.</Paragraph>
  </Section>
  <Section position="5" start_page="95" end_page="95" type="metho">
    <SectionTitle>
2. The Internal Representation.
</SectionTitle>
    <Paragraph position="0"> It will be our purpose in this section to say just enough about internal representation to lay bare the organizing principles of the lexicon. The focus is on architecture and motivations; details of field layouts, internal codes, etc. are not at issue here.</Paragraph>
    <Paragraph position="1"> To make the discussion concrete, suppose we are interested in the senses of the word &amp;quot;change&amp;quot;. Assuming that none of the words are unfamiliar, the following should put us in mind of two senses: change: 1. v alter; 2. n small coin.</Paragraph>
    <Paragraph position="2"> This, of course, is just a dictionary entry in the traditional format (though with synonyms offered in lieu of definitions). On the other hand, we might approach the same information from a different direction: starting with the two concepts, we might seek words to express them. It is difficult to picture this latter situation without assigning artificial labels to the concepts. Call them concepts 1 and 2, and suppose for a moment that there were a practical way to look the concepts up (without having thought of either word for either concept). Then the information to be retrieved might be envisioned this way: 1. v change, alter</Paragraph>
  </Section>
  <Section position="6" start_page="95" end_page="95" type="metho">
    <SectionTitle>
2. n change, small coin
</SectionTitle>
    <Paragraph position="0"> It is this duality of viewpoint -- that words have senses, while senses have wordings -- that our lexical representation must reflect.</Paragraph>
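The duality just described -- words listing their senses, senses listing their wordings -- can be sketched in a few lines of Python. The sense labels 1 and 2 follow the change/alter/small coin example above; the dictionaries and the inversion loop are illustrative, not part of the implementation this paper goes on to describe.

```python
# A minimal sketch of the two-way view: words map to sense labels,
# and sense labels map back to alternative wordings.
word_senses = {
    "change": [1, 2],
    "alter": [1],
    "small coin": [2],
}

# Invert the mapping to recover each sense's alternative wordings.
sense_wordings = {}
for word, senses in word_senses.items():
    for s in senses:
        sense_wordings.setdefault(s, []).append(word)

print(sense_wordings[1])  # → ['change', 'alter']
print(sense_wordings[2])  # → ['change', 'small coin']
```

Either direction can be read off directly; the representation developed below makes both directions explicit at once, rather than deriving one from the other.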
    <Paragraph position="1"> The starting point, then, is that words, phrases, and &amp;quot;senses&amp;quot; are separately represented. There are three principal types of data item, plus a standard connector:  1. A &amp;quot;Key Data Item&amp;quot; (KDI) represents a single word.</Paragraph>
    <Paragraph position="2"> 2. A &amp;quot;Phrase Data Item&amp;quot; (PDI) represents a string of two or more words which are to serve as a unit in some context..</Paragraph>
    <Paragraph position="3"> 3. A &amp;quot;Sense Data Item&amp;quot; (SDI) represents one distinct sense common to a set of words and/or phrases. In general, a word or phrase may be usable in more than one sense, while a given sense may have alternative (synonymous) wordings. Both these types of variability are recorded by making use of the next data item:</Paragraph>
  </Section>
  <Section position="7" start_page="95" end_page="95" type="metho">
    <SectionTitle>
4. A &amp;quot;Sense Link Element&amp;quot; (SLE) is a connective item, to be explained shortly.
</SectionTitle>
    <Paragraph position="0"> Three principal fields will engage our attention in each type of data item. Fig. 1 summarizes the fields for each type.</Paragraph>
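Fig. 1 itself is not reproduced here, but the fields discussed in this section can be sketched as Python classes. The class and field names are my paraphrase of the paper's prose description, not its actual record layout.

```python
# Schematic sketch of the four data item types; field names are
# paraphrases of the paper's description, not its field layout.
class KDI:                       # Key Data Item: a single word
    def __init__(self, word):
        self.word = word
        self.alt_senses = None   # head of "alternative senses" ring
        self.phrase_ring = None  # head of "phrase involvements" ring
        self.attrs = set()       # global attributes of the word itself

class PDI:                       # Phrase Data Item: a multi-word unit
    def __init__(self, words):
        self.words = words
        self.alt_senses = None   # head of "alternative senses" ring
        self.attrs = set()       # e.g. "idiom", "stock phrase", "definition"

class SDI:                       # Sense Data Item: one distinct sense
    def __init__(self):
        self.alt_wordings = None # head of "alternative wordings" ring
        self.sense_chain = None  # next SDI in the global chain of senses
        self.attrs = set()       # global sense attributes, e.g. part of speech

class SLE:                       # Sense Link Element: a standard connector
    def __init__(self):
        self.next_sense = None   # link along an "alternative senses" ring
        self.next_wording = None # link along an "alternative wordings" ring
        self.attrs = set()       # attributes of one usage of one word/phrase
```

The link fields are filled in by the ring structure described next.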
    <Paragraph position="1"> In each KDI or PDI, the &amp;quot;alternative senses&amp;quot; link is a pointer to the first SLE (Sense Link Element) in a chain of SLE's which represent the various senses of the word or phrase. The SLE's are chained via their own &amp;quot;alternative senses&amp;quot; links, and the final member points back to the KDI or PDI. Thus, we shall speak of such a chain as a ring -- specifically, an &amp;quot;alternative senses&amp;quot; ring. If no senses are on record for a particular word or phrase, the &amp;quot;alternative senses&amp;quot; link in the KDI or PDI is self-referent. Reciprocally, each SDI (Sense Data Item) contains an &amp;quot;alternative wordings&amp;quot; link. This leads to a chain of SLE's which represent more-or-less synonymous wordings that express the sense. These SLE's are chained through their own &amp;quot;alternative wordings&amp;quot; links, and again the chain is closed into a ring -- this time beginning and ending with the SDI.</Paragraph>
    <Paragraph position="2"> The structure that is shaping up may now be seen in Fig. 2. The crucial point is that each SLE represents the intersection between an &amp;quot;alternative senses&amp;quot; ring and an &amp;quot;alternative wordings&amp;quot; ring. From the standpoint of the word or phrase, it represents a particular sense; from the standpoint of the sense, it represents a particular wording. Starting from a KDI or PDI, one gets to the SDI for a particular sense by advancing along the &amp;quot;alternative senses&amp;quot; ring to the relevant SLE, then touring along the ring which connects the latter to the SDI (as one of the SDI's &amp;quot;alternative wordings&amp;quot;). Starting from an SDI, one gets to a particular wording by the reverse process. Since each &amp;quot;alternative senses&amp;quot; ring contains exactly one KDI or PDI, while each &amp;quot;alternative wordings&amp;quot; ring contains exactly one SDI, each SLE is tied to exactly one sense of one word or phrase. (Equivalently, it is tied to one wording of one sense.) The next point of interest is that &amp;quot;attribute&amp;quot; fields are present in all four types of data item -- even in the connectors (SLE's). The attributes which may be recorded in each, however, come from different bags.</Paragraph>
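The two traversals just described can be exercised on a runnable miniature of Fig. 2. The node layout and field names here are illustrative, not the paper's record format; and although &amp;quot;small coin&amp;quot; would strictly be a PDI, words and phrases are on the same footing with regard to sense connections, so one node type serves for the sketch.

```python
# A runnable miniature of the Fig. 2 rings (illustrative layout).
class Node:
    def __init__(self, kind, label=None):
        self.kind = kind          # "KDI", "SDI", or "SLE"
        self.label = label
        self.next_sense = None    # link along an "alternative senses" ring
        self.next_wording = None  # link along an "alternative wordings" ring

def senses_of(kdi):
    """Advance around the word's senses ring; for each SLE, tour its
    wordings ring to the unique SDI on that ring."""
    result, sle = [], kdi.next_sense
    while sle is not kdi:              # the ring closes back at the KDI
        node = sle.next_wording
        while node.kind != "SDI":
            node = node.next_wording
        result.append(node.label)
        sle = sle.next_sense
    return result

def wordings_of(sdi):
    """The reverse process: walk the wordings ring, touring each SLE's
    senses ring back to its unique KDI."""
    result, sle = [], sdi.next_wording
    while sle is not sdi:              # the ring closes back at the SDI
        node = sle.next_sense
        while node.kind != "KDI":
            node = node.next_sense
        result.append(node.label)
        sle = sle.next_wording
    return result

# Wire up Fig. 2: sense 1 = "alter" (verb), sense 2 = "small coin" (noun).
change, alter, coin = Node("KDI", "change"), Node("KDI", "alter"), Node("KDI", "small coin")
sense1, sense2 = Node("SDI", "v: alter"), Node("SDI", "n: small coin")
a, b, c, d = Node("SLE"), Node("SLE"), Node("SLE"), Node("SLE")

change.next_sense, a.next_sense, c.next_sense = a, c, change        # senses ring of "change"
alter.next_sense, b.next_sense = b, alter                           # senses ring of "alter"
coin.next_sense, d.next_sense = d, coin                             # senses ring of "small coin"
sense1.next_wording, a.next_wording, b.next_wording = a, b, sense1  # wordings ring of sense 1
sense2.next_wording, c.next_wording, d.next_wording = c, d, sense2  # wordings ring of sense 2

print(senses_of(change))    # → ['v: alter', 'n: small coin']
print(wordings_of(sense1))  # → ['change', 'alter']
```

Note that each SLE sits on exactly one ring of each kind, which is what lets the traversal recover "the SDI of this SLE" or "the KDI of this SLE" unambiguously.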
    <Paragraph position="3"> To begin with, the attributes found in an SDI characterize all the wordings of a given sense whenever the wordings are used in that sense. In Fig. 2, for example, sense &amp;quot;1&amp;quot; should be marked as a &amp;quot;verb&amp;quot; sense, while sense &amp;quot;2&amp;quot; is a &amp;quot;noun&amp;quot;. One would not wish to record the attribute &amp;quot;verb&amp;quot; in the KDI for the word &amp;quot;change&amp;quot;, for the KDI represents facts about the word itself, irrespective of sense, and &amp;quot;verb&amp;quot; does not hold for all uses of the word &amp;quot;change&amp;quot;. On the other hand, &amp;quot;verb&amp;quot; does characterize all wordings of sense &amp;quot;1&amp;quot;, whenever they're being employed to express that sense. It would furthermore apply to any additional wordings which we might think of, such as &amp;quot;modify&amp;quot;, provided they are really used in a synonymous way.</Paragraph>
    <Paragraph position="4"> As a matter of fact, it turns out that the traditional parts of speech -- noun, verb, adjective, preposition, etc. -- fit best in this scheme as global attributes of senses, recorded in the SDI's. Fig. 2: &amp;quot;Alternative Senses&amp;quot; and &amp;quot;Alternative Wordings&amp;quot; Rings. (The first sense has two wordings: &amp;quot;alter&amp;quot; and &amp;quot;change&amp;quot;. The second sense has wordings &amp;quot;change&amp;quot; and &amp;quot;small coin&amp;quot;. Two senses are recorded for &amp;quot;change&amp;quot;, and one sense each for &amp;quot;alter&amp;quot; and &amp;quot;small coin&amp;quot;.) A different sort of attribute may be recorded in a KDI, as a global feature of the word itself. For example, we may note of the word &amp;quot;change&amp;quot; that it is &amp;quot;regularly conjugated&amp;quot;. That is, when used as a verb, it forms the third person singular by adding &amp;quot;s&amp;quot;, and both past and past participle by adding &amp;quot;ed&amp;quot;. To be sure, this &amp;quot;global&amp;quot; attribute applies only to the &amp;quot;verb&amp;quot; senses of &amp;quot;change&amp;quot;; but a moment's reflection will confirm that &amp;quot;change&amp;quot; has more than one &amp;quot;verb&amp;quot; sense, and the regularity of its conjugation is common to all of them. Thus, it is useful to note this regularity as an attribute of the word itself. (Contrast this with the behavior of the word &amp;quot;can&amp;quot;, which is regular when it means &amp;quot;to pack in cans&amp;quot;, but irregular when it means &amp;quot;is able to&amp;quot;.) Various other attributes suggest themselves as global characterizers of the words themselves, to be recorded in the KDI's. For example, one might wish to note of &amp;quot;change&amp;quot; that it drops its final &amp;quot;e&amp;quot; when adding &amp;quot;ing&amp;quot; (this is the normal rule) but of &amp;quot;singe&amp;quot; that it doesn't.</Paragraph>
    <Paragraph position="5"> Still other attributes are appropriate when characterizing multi-word units (in PDI's). A string of words whose meaning is not evident from the mere juxtaposition of its constituents (such as &amp;quot;give up&amp;quot;) may be classified as an &amp;quot;idiom&amp;quot;. A string of words whose meaning could be figured out from the meanings of its constituents, but which occurs with enough frequency to warrant inclusion in the dictionary, might be classed as a &amp;quot;stock phrase&amp;quot;. (Example: &amp;quot;drop dead&amp;quot;.) A string like &amp;quot;perform in a subordinate role&amp;quot;, which one would not normally expect to encounter in its own right, might be classed as a &amp;quot;definition&amp;quot; (for a certain sense of the word &amp;quot;accompany&amp;quot;, difficult to reword except with a definition).</Paragraph>
    <Paragraph position="6"> Perhaps the most unexpected site for recording attributes is in the connective elements (SLE's).</Paragraph>
    <Paragraph position="7"> These are the logical place, though, to note features that apply to a specific sense of a word, without being global to either the sense or the word. Consider the following four sentences: On the way to the office, he stopped daydreaming.</Paragraph>
    <Paragraph position="8"> On the way to the office, he ceased daydreaming.</Paragraph>
    <Paragraph position="9"> On the way to the office, he ceased to daydream.</Paragraph>
    <Paragraph position="10"> versus: On the way to the office, he stopped to daydream.</Paragraph>
    <Paragraph position="11"> Suppose we choose to view this as a restriction upon the (surface) object of the verb: &amp;quot;stop&amp;quot;, when applied to an action, must take a gerund as its object; &amp;quot;cease&amp;quot; can take either a gerund or an infinitive. (It wouldn't affect the point being made if we said that &amp;quot;stop&amp;quot; inhibits a certain grammatical transformation en route to surface structure, while &amp;quot;cease&amp;quot; permits it.) Now, we wouldn't want to mark &amp;quot;gerund object only&amp;quot; as a global attribute of the sense, for we have just shown that &amp;quot;cease&amp;quot; and &amp;quot;stop&amp;quot;, two wordings of the sense, differ with respect to this restriction. On the other hand, it doesn't belong among the global attributes of the word &amp;quot;stop&amp;quot; as such, for &amp;quot;stop&amp;quot; has other verb senses, even transitive ones, to which the restriction is completely inapplicable. (Consider &amp;quot;stop a hole in the dike&amp;quot;, &amp;quot;stop a catastrophe&amp;quot;, etc.) That leaves the alternative we are suggesting: treat the restriction as an attribute of one particular usage of the word (equivalently, one particular wording of the sense).</Paragraph>
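The three attribute levels can be illustrated with the stop/cease example. The dictionary keys, attribute strings, and the union rule below are my sketch of how such markings might be consulted, not the paper's actual mechanism.

```python
# Three levels of attributes (illustrative strings and keys):
# word level (KDI), sense level (SDI), and usage level (SLE).
kdi_attrs = {"stop": {"regularly conjugated"},
             "cease": {"regularly conjugated"}}
sdi_attrs = {"sense:cease": {"verb"}}
sle_attrs = {("stop", "sense:cease"): {"gerund object only"},
             ("cease", "sense:cease"): set()}

def effective_attrs(word, sense):
    """Attributes applicable to one particular wording of one sense:
    the union of word-global, sense-global, and usage-specific markings."""
    return kdi_attrs[word] | sdi_attrs[sense] | sle_attrs[(word, sense)]

print(effective_attrs("stop", "sense:cease"))
print(effective_attrs("cease", "sense:cease"))
```

The restriction surfaces for "stop" but not for "cease", even though both share the same sense-level and (here) word-level attributes, which is exactly why it belongs on the connector.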
    <Paragraph position="12"> Besides having senses, individual words are involved in phrases, and this fact is also represented in our data structure. Fig. 3 shows the plan of attack. In the KDI for each word, there is a link connecting it to the PDI for the first phrase in which the word is known to occur, together with a number designating the position of the word (1st, 2nd, 3rd, etc.) in that phrase. In the PDI itself, there is a continuation link for each word of the phrase, together with its number in the next phrase. In the final PDI involving a given word, the link for that word points back to the KDI. Thus, independent of its &amp;quot;alternative senses&amp;quot; ring, each KDI may have a &amp;quot;phrase involvements&amp;quot; ring.</Paragraph>
    <Paragraph position="13"> This structure makes it possible to retrieve all the idioms, stock phrases, definitions, etc., in which a given word has made its appearance, anywhere in the dictionary. As the same structure is used to encode every multi-word unit, no occurrence of a word is ever lost sight of, and a phrase can be looked up via any of its constituent words.</Paragraph>
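A &amp;quot;phrase involvements&amp;quot; ring can be sketched in the same style. &amp;quot;give up&amp;quot; is the paper's example; the second phrase, "give ground", and the field names are my invention for illustration.

```python
# Sketch of a "phrase involvements" ring: each link carries the next
# phrase containing the word, plus the word's (1-based) position in it.
class PDI:
    def __init__(self, words):
        self.words = words
        self.cont = {}    # word -> (next PDI, or the KDI to close the ring,
                          #          and the word's position there)

class KDI:
    def __init__(self, word):
        self.word = word
        self.first = None # (first PDI containing the word, position there)

def phrases_of(kdi):
    """Collect every phrase in which the word occurs, by following its
    phrase involvements ring until it closes back at the KDI."""
    result = []
    node, pos = kdi.first
    while node is not kdi:
        result.append(" ".join(node.words))
        node, pos = node.cont[kdi.word]
    return result

give = KDI("give")
p1 = PDI(["give", "up"])
p2 = PDI(["give", "ground"])
give.first = (p1, 1)            # "give" is the 1st word of "give up"
p1.cont["give"] = (p2, 1)       # ... and the 1st word of "give ground"
p2.cont["give"] = (give, 0)     # ring closes back at the KDI

print(phrases_of(give))   # → ['give up', 'give ground']
```

Because every PDI carries one continuation link per constituent word, the same phrase sits on one ring per word, so it can be reached from any of its constituents.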
    <Paragraph position="14"> Of the fields to which Fig. 1 calls attention, we have discussed all but one. In the SDI for each &amp;quot;sense&amp;quot;, there is a &amp;quot;sense chain&amp;quot; link field. This links the SDI to its successor in a global chain of &amp;quot;senses&amp;quot;. Using this chain, it is possible to make an exhaustive, non-duplicative list of all the &amp;quot;senses&amp;quot; recorded in the dictionary. The listing program has only to proceed down the chain, retrieve from each SDI its attributes, decode them, then chase around the &amp;quot;alternative wordings&amp;quot; ring of the SDI and list the wordings alongside the attributes.</Paragraph>
    <Paragraph position="15"> One more feature of the internal representation deserves mention: the data items for words occur as &amp;quot;leaves&amp;quot; in a lexical tree (Fig. 4). That is, the KDI for a word can be looked up letter-by-letter, following a chain of pointers that correspond to successive letters. The chain ends at a KDI after following a substring sufficient to distinguish the word from the nearest thing like it in the dictionary. The lexical tree has the advantage that words can be looked up either at random or in sequence.</Paragraph>
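A lexical tree of this general kind can be sketched as a simple trie over the Fig. 4 vocabulary. One caveat: the sketch follows every word to its last letter, whereas the paper's tree stops at the shortest distinguishing prefix; the class and method names are mine.

```python
# A minimal lexical tree (trie). Unlike the paper's version, which stops
# at the shortest distinguishing prefix, this sketch spells words out fully.
class Trie:
    def __init__(self):
        self.children = {}
        self.kdi = None            # set when a complete word ends here

    def insert(self, word, kdi):
        node = self
        for ch in word:            # one pointer per successive letter
            node = node.children.setdefault(ch, Trie())
        node.kdi = kdi

    def lookup(self, word):
        """Random access: follow the word letter-by-letter to its KDI."""
        node = self
        for ch in word:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node.kdi

    def in_order(self, prefix=""):
        """Sequential access: yield all stored words alphabetically."""
        if self.kdi is not None:
            yield prefix
        for ch in sorted(self.children):
            yield from self.children[ch].in_order(prefix + ch)

t = Trie()
for w in ["a", "above", "abate", "monkey"]:   # the Fig. 4 vocabulary
    t.insert(w, kdi=w)   # a real dictionary would store a KDI record here

print(list(t.in_order()))   # → ['a', 'abate', 'above', 'monkey']
```

The two access paths correspond to the two retrieval modes the paper mentions: lookup in response to presented words, and volunteering the vocabulary in alphabetical order.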
    <Paragraph position="16"> Recapitulating, these are the essential features of the representation:  *1) &amp;quot;Senses&amp;quot; are represented separately from &amp;quot;wordings&amp;quot;, and the mutual connections between them are made explicit in both directions.</Paragraph>
    <Paragraph position="17"> *2) &amp;quot;Wordings&amp;quot; may be either single words or multi-word phrases. These are represented by distinct types of data item, and may be subject to distinct schemes of classification, but they are on the same footing with regard to &amp;quot;sense&amp;quot; connections. With each word is associated an exhaustive list of the phrases in which it occurs.</Paragraph>
    <Paragraph position="18"> (Fig. 4: For a dictionary containing only the words &amp;quot;a&amp;quot;, &amp;quot;above&amp;quot;, &amp;quot;abate&amp;quot;, and &amp;quot;monkey&amp;quot;, this would be the full tree. The path to each word is only as long as needed to distinguish it from the neighbor with which it shares the longest leading substring.) *3) Classifiers and features, drawn from appropriate sets, may be attributed separately to words, to phrases, to senses, or to particular senses of words or phrases (i.e., to particular wordings of senses).</Paragraph>
    <Paragraph position="19"> *4) The data items which represent senses are globally chained, and may be exhaustively listed.</Paragraph>
    <Paragraph position="20"> *5) The data items which represent words are accessible as &amp;quot;leaves&amp;quot; of a lexical tree; hence they may either be retrieved by lookup (in response to presentation of the words) or volunteered in alphabetical order. Given a commitment to represent a lexicon as suggested by points *1 through *5 above, various implementations would be possible. Alternative implementations of individual points (though not of the scheme as a whole) have in fact been described by other writers. The lexical tree (*5), for example, is no great novelty: Sydney M. Lamb and William H. Jacobsen describe implementation details of one such tree [5]. [10] also concerns a dictionary which uses this general style of organization for lookup. For that matter, the lexical tree is reminiscent of Feigenbaum's &amp;quot;discrimination tree&amp;quot; [1]. More interestingly, the separate representation of senses and wordings has been incorporated in other systems by R. F. Simmons ([11], [12]) and by Larry R. Harris [3]. This way of looking at matters led Harris to remark some of the same points that we have been stressing: that senses have alternative wordings just as words have alternative senses; that multi-word phrases might occur on the same footing as individual words in the expression of a sense; and (interestingly enough) that part-of-speech information really adheres to the &amp;quot;sense&amp;quot;, not to the &amp;quot;word&amp;quot;. Similarly, Simmons associates his &amp;quot;deep case&amp;quot; information with lexical nodes representing &amp;quot;wordsenses&amp;quot;, while words themselves are treated as &amp;quot;print image&amp;quot; attributes of the wordsenses.</Paragraph>
    <Paragraph position="21"> Harris's dictionary was only a minor component in a small-scale model of concept acquisition. No great number of either words or concepts was required to illustrate the principles at stake, so Harris programmed the dictionary as an array, with words represented by rows and &amp;quot;concepts&amp;quot; by columns. Elements of the array were merely frequencies, indicating the strength of association between each word and each concept.</Paragraph>
    <Paragraph position="22"> Needless to say, for a full-scale vocabulary of words and concepts, such an array is mostly empty; nobody would dream of expanding it in that form. From a programming standpoint, the only thinkable choice is some form of list structure. Having decided in principle to use &amp;quot;some form of list structure&amp;quot;, though, one might well ask: Why chains? Why rings? Why not just include in each Key Data Item a full list of pointers to the corresponding Sense Data Items, and vice-versa? The answer is simply one of convenience. It's easier to handle insertions and deletions when they don't require the movement of expanded items to new quarters, or the provision of &amp;quot;overflow&amp;quot; pointers. It's easier to reclaim freed storage when deleted items come in a handful of standard sizes. As for &amp;quot;rings&amp;quot;, they eliminate the need for two-way pointers, since one can break into a ring at any point and follow it to its source.</Paragraph>
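The contrast between Harris's dense word-by-concept array and a sparse list structure can be shown in a few lines. The words, concept labels, and frequencies below are invented for illustration; neither layout is the paper's own ring representation.

```python
# Dense vs. sparse association between words and concepts (invented data).
words = ["change", "alter", "small coin"]
concepts = ["c1", "c2"]

# Dense: one cell per (word, concept) pair -- mostly zero at full scale.
dense = [[1, 1],    # "change" is associated with both concepts
         [1, 0],    # "alter" only with c1
         [0, 1]]    # "small coin" only with c2

# Sparse: store only the nonzero associations, as nested dictionaries.
sparse = {}
for i, w in enumerate(words):
    for j, c in enumerate(concepts):
        if dense[i][j]:
            sparse.setdefault(w, {})[c] = dense[i][j]

print(sparse)
```

At three words the saving is trivial, but for a full vocabulary the dense array's storage grows as words times concepts while the sparse form grows only with the number of actual word-sense connections, which is the point of the paper's list structure.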
    <Paragraph position="23"> It should be noted that to make rings an attractive representation, the details of the material being represented must cooperate. In particular, the rings must not become too long, or the processing required to follow them becomes excessive. It happens that &amp;quot;alternative senses&amp;quot; rings and &amp;quot;alternative wordings&amp;quot; rings are typically short -- rarely more than a dozen links per ring. &amp;quot;Phrase involvement&amp;quot; rings, on the other hand, can become spectacularly long, especially for words like &amp;quot;a&amp;quot; and &amp;quot;to&amp;quot;. In practice, it's necessary to provide these rings with short-cut links. Any of these programming details could be altered, however, without abandoning the essence of the scheme, which is given in points *1 through *5 above.</Paragraph>
  </Section>
  <Section position="8" start_page="95" end_page="95" type="metho">
    <SectionTitle>
3. The Character of Lexical Senses.
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> Perhaps the first thing to get straight about the &amp;quot;senses&amp;quot; represented in this dictionary is what they are not. They are not &amp;quot;concepts&amp;quot;; they are not a set of &amp;quot;primitives&amp;quot; into which human experience can be decomposed. No conjecture is put forward here that any such collection of discrete, intrinsic concepts even exists, let alone that it might be finite.</Paragraph>
    <Paragraph position="3"> Rather, the &amp;quot;senses&amp;quot; of the dictionary are in the nature of fuzzy equivalence sets among words.</Paragraph>
    <Paragraph position="4"> (This is only a metaphor; we shall do more and more violence to the technical notion of an &amp;quot;equivalence set&amp;quot; as we proceed.) Each &amp;quot;sense&amp;quot; groups a set of words which, in a set of appropriate contexts, might be used more or less interchangeably. That the equivalence sets are fuzzy, one can convince oneself with but the briefest immersion in the materials of the language -- trying to decide whether particular words belong in particular groups or justify the creation of new groups.</Paragraph>
    <Paragraph position="5"> Consider, for example, the following set of words and phrases: (abandon, give up, surrender, relinquish, let go, desert, leave, forsake, abdicate) Clearly, there is a common theme that can run through all of these, given the right circumstances.</Paragraph>
    <Paragraph position="6"> It might be expressed as &amp;quot;reluctant parting from somebody or something&amp;quot;. This can be seen by coupling the verbs with various possible objects:
(abandon, give up, surrender) a town to the enemy
(abandon, give up) all hope
(give up, relinquish) one's claim to an estate
(give up, let go) our entire stock at a loss
(abandon, desert, leave) one's wife and children
(desert, forsake) a friend in need
(give up, abdicate) the throne
(abandon, desert) an exhausted mine
(forsake, give up) all others, keeping thee only to her/him
(abandon, desert, leave) the area threatened by the storm
Should we, then, declare this group of words to be a &amp;quot;sense&amp;quot;? There are difficulties. The various words carry nuances, which it may or may not be easy to ignore in a particular context. &amp;quot;Forsake&amp;quot;, for example, can suggest that there is something reprehensible about the action. It can also connote formal renunciation, and the above example from a marriage vow shows that the formality can be present without the reprehensibility. Nuances get in the way of interchangeability; it would sound strange to substitute &amp;quot;desert&amp;quot; into the marriage vow. Besides nuances, the individual words have conventional areas of application. One does not normally say that the doctors &amp;quot;deserted&amp;quot; all hope, or that an errant husband &amp;quot;surrendered&amp;quot; his wife and children. The minister officiating at a wedding would be considered daft if he adjured the bride and groom to &amp;quot;abdicate&amp;quot; all others, and a merchant would not advertise that he was &amp;quot;relinquishing&amp;quot; his entire stock at a loss. (Somehow, the latter situation calls for more pedestrian language.) 
At the opposite extreme, overawed by this lack of interchangeability, we might decide to respect the unique personality of each word, abolishing equivalence classes altogether. The inconvenience of such a cop-out is obvious: we then have to introduce some other mechanism for recognizing the equivalence of utterances that are intended synonymously, though they employ different words. But beyond being inconvenient, the exclusion of equivalence sets is a denial of linguistic facts -- just as bad, in its own way, as the naive attribution of unconditional synonymy.</Paragraph>
    <Paragraph position="7"> For it is a commonplace of everyone's experience that the speaker and the listener agree to ignore the nuances of words, whenever nuances get in the way of communication. A writer who has used the word &amp;quot;give up&amp;quot; eight times in five lines will surely cast about for some alternative ways of saying the same thing. If &amp;quot;relinquish&amp;quot; and &amp;quot;abandon&amp;quot; would normally be too flowery, or if &amp;quot;surrender&amp;quot; would in other circumstances call to mind an armistice ceremony in a railway wagon, that will not deter the writer from tossing in a few occurrences of those words -- once a context has been established that discourages the overtones. Nor will the reader understand matters any differently. It is as if writer and reader conspired: &amp;quot;We're fed up with that word, let's hear another.&amp;quot; Or, perhaps, the writer simply connives at jolting the reader awake with frequent changes of idiom, maybe even an occasional incongruity. In any case, synonymy is imposed upon the words, and this literary behavior merely exaggerates what people do habitually in common speech.</Paragraph>
    <Paragraph position="8"> Not only can words be stripped of nuances normally present; they can take on colorations suggested by the context. The suggestion of &amp;quot;reluctance&amp;quot; conveyed by all the verbs of our example can be inferred, in at least one case, from the setting alone; and in this case, a variety of more neutral verbs could be used synonymously: (part with, take leave of) our entire stock at a loss. One could even substitute the word &amp;quot;sell&amp;quot;, and it wouldn't change the meaning that was already built into the utterance. But to admit context-dependent synonymy of this degree is to stretch the &amp;quot;equivalence sets&amp;quot; to the point of uselessness.</Paragraph>
    <Paragraph position="9"> It comes to this: neither the grouping nor the separation of words can be fully justified. Grouping is nearly always conditional, and separation is often so. If one could anticipate all possible contexts in which a group of words could occur, one could perhaps enumerate all possible equivalence sets -- one for each combination of word group with a set of contexts making the words interchangeable. Anyone, however, can see the futility of that aspiration.</Paragraph>
    <Paragraph position="10"> In the end, one settles for messy compromises. Words are grouped if a largish set of contexts in which they are interchangeable springs readily to mind. They are separated (into perhaps overlapping groups) if the imagination readily suggests contexts in which their meanings differ &amp;quot;significantly&amp;quot; -- whatever &amp;quot;significantly&amp;quot; may mean. In doubtful cases, when words are grouped somewhat questionably, one promises oneself to add markings some day that will prevent misuse of the equivalence. When words are separated somewhat questionably, one promises oneself to add a mechanism some day that will recognize their relatedness.</Paragraph>
    <Paragraph position="11"> In the end, too, one assigns internal structure to the equivalence sets. That's the effect of assigning local attributes to the alternative wordings (&amp;quot;animate subject&amp;quot;, &amp;quot;object a vehicle&amp;quot;, etc.): constraints are imposed upon the interchangeability of the wordings. More radical structuring can be accomplished if, for example, one notes &amp;quot;government&amp;quot; as an alternative wording of the sense &amp;quot;govern, rule, control&amp;quot;, with the attribute &amp;quot;nominalization&amp;quot;.</Paragraph>
    <Paragraph position="12"> A trenchant discussion of such difficulties may be found in Kelly and Stone [4]. There the emphasis is upon disambiguation: given a word in a passage of text, they seek to identify (by selection from a fixed list of possibilities) the sense in which it is used. Building a computerized An Organization for a Dictionary of Word Senses dictionary for the purpose, they soon became concerned with the arbitariness and the proliferation of target &amp;quot;senses&amp;quot;, as taken from standard desk dictionaries. They argue, with persuasive examples, that what lexicographers conventionally distinguish as separate senses of a word are often just applications of the word's underlying concept to different contexts. To cover the various contexts, the underlying concept has to be stretched a little, by a process of metaphoric extension. This metaphoric process is beyond our present power to computerize, but for 'the long run looks indispensable for successful language processing. Meanwhile, the authors advocate a dictionary which records for each word as few discrete senses as practicable, combining into one scnsc all the usages which can reasonably be united by a common underlying thougl~t.</Paragraph>
    <Paragraph position="13"> It is interesting to re-examine Kelly and Stone's argument with a diffcrent task in n~inii: not tile disambiguation of one word, but the recognition of synonymy between two words. A n~etaphorical capability would be as useful for the one task as for the other, but in the case of synonynl rccognition, some of the considerations which have guided traditional lexicography remain pertinent. In particular, it is necessary to ask not merely whether the concepts overlap, but whether the one word may in fact be used in place of the other. As noted before, usage is restricted by conventional domaitls of application; for example, an &amp;quot;alteration&amp;quot; is conceptually both a &amp;quot;change&amp;quot; and a &amp;quot;modification&amp;quot;, but one wouldn't call it a change or a modification when painting a sign for a tailor's shap.</Paragraph>
    <Paragraph position="14"> The arbitrariness of the equivalence sets is not all that disqualilies then1 as &amp;quot;conceptual primitives&amp;quot;. There is a much deeper difficulty in the fact that practically all &amp;quot;senses&amp;quot; can be paraphrased in terms of other &amp;quot;senses&amp;quot;. Take, for example, the intransitive sense of &amp;quot;change&amp;quot; (as in &amp;quot;My, but you've changed!&amp;quot;). Surely, one would suppose, the concept of &amp;quot;change&amp;quot; must be primitive? Change of state is what well-nigh a third of all verbs are about.</Paragraph>
    <Paragraph position="15"> But if &amp;quot;change&amp;quot; is a &amp;quot;primitive&amp;quot;, it's a peculiar sort of &amp;quot;primitive&amp;quot;, for it can be paraphrased in a variety of ways: (change, become different, cease to be the same, assume new characteristics, make a transition into a new state) Note that the multi-word paraphrasals are not idioms; the individual words contribute their usual meaqings to concatenated meanings which express the concept &amp;quot;change&amp;quot;.</Paragraph>
    <Paragraph position="16"> An Organization for a Dictionary of Word Senses  But perhaps we were merely unlucky? Perhaps we chanced upon a concept which looked elemental but actually turned out to be complex. Maybe the real primitives are &amp;quot;become&amp;quot;, &amp;quot;be&amp;quot;, &amp;quot;cease&amp;quot;, &amp;quot;different&amp;quot;, &amp;quot;same&amp;quot;, etc. Let's dig into that possibility.</Paragraph>
    <Paragraph position="17"> What does it mean to &amp;quot;become X&amp;quot;, where X is an adjective? The meaning can bc \~ario\isly expressed: (become X, come to bq X, get to be X, get X, turn X, grow X, assume thc charrtctcristic X) That's a discouraging number of ways for a &amp;quot;primitive&amp;quot; to be re-expressible -- though if we choose to regard &amp;quot;come to be&amp;quot; and &amp;quot;get to be&amp;quot; as idiomatic concatcnr\tions of words, only onc of the alternatives makes use of other concepts to explain the one at hand.</Paragraph>
    <Paragraph position="18"> As for &amp;quot;different&amp;quot;, it implies a whole underlying anecdote about sortlebody making a contparisoi~, after first making a judgment about relevant things to compare. In the combination of the two concepts -- &amp;quot;become different&amp;quot; --, we furthermore drop mention of the objects being compared. It's simply understood that they are certain attributes of the subject at two points in time.</Paragraph>
    <Paragraph position="19"> It is tempting to invent ad-hoc &amp;quot;transformational&amp;quot; explanations for these phenomena. One might conjecture, for example, tliat &amp;quot;The man changed.&amp;quot; is a surface realization of four underlying sentences: (Man be X at time m. Man be Y at time n. X not equal Y, Time n greater-than time n~.) Thetrouble with explanations of this sort -- apart from the fact that they introduce growing complexity into the understanding of straightforward utterances -- is that they assign arbitrary primacy to some concepts at the expense of others. Why should &amp;quot;time n greater-than time m&amp;quot; be an assumed primitive? May we not equally well conjecture that &amp;quot;time n greater-than time m&amp;quot; is a surface realization of these?: (Time be m. Time change. Then time be n.) For that matter, why not view An Organization for a Dictibnary of Word Senses &amp;quot;Time elapsed. &amp;quot; as a surface form of this?: &amp;quot;At least one thing in the universe changed.&amp;quot; After all, what is &amp;quot;time&amp;quot; but a nominalized way of talking about the presence and partitioning of change? The difficulty, it would seem, lies in the very notion of context-independent &amp;quot;conceptual primitives&amp;quot;. The metaphor itself is at fault: it calls to mind a fixed set of dements, like thosc of which matter is composed, out of which all ideus must be compounded. But where concepts arc concerned, primitivity is a matter of focus, Shift the perspective a little, and new elenlcnts swim into view as fundamentals, while former simples become con~plex.</Paragraph>
    <Paragraph position="20"> A more promising metaphor is the analogy to a 'vector space. A set of basis vectors is, in a way, a set of &amp;quot;primitives&amp;quot; out of which all the entities in the space can be composed. These primitives have the appealing property that they are only primitive relative to one frame of reference. Rotatc your point of view, and what used to come natural as basis vectors are now at an angle; they become easier to express as sums of vectors that lie along new axes. That bears a resemblance to what we have seen,in tlhe case of lexical &amp;quot;primitives&amp;quot;.</Paragraph>
    <Paragraph position="21"> Thus far and no further may the analogy be pushed, however. The elements which span &amp;quot;conceptual space&amp;quot; can be no such uniform set of objects as those in a vettor space, while the rules of composition are coextensive with grammar -- at a minimum. Composition of concepts itself contributes to the meaning. (For that matter, it is arguable whether concepts are sufficiently separable to model them as discrete objects at all -- whether simple or composite.) Moreover as &amp;quot;conceptual space&amp;quot; must encompass all things thinkable, the rules of composition must themselves be part of the space. That is, the operators as much as the things operated upon lie within the space to be spanned.</Paragraph>
    <Paragraph position="22"> A seming counterexample to these remarks may be found in the &amp;quot;primitive ACT's&amp;quot; of conceptual dependency theory, as propounded by Schank, Goldman, Rieger, and Riesbeck (121, [7], [8], [9]). On a close reading, however, the &amp;quot;primitive ACT's&amp;quot; turn out to be verb paradigms -- powerful, semantically motivated generalizations about large classes of verbs. The names of these paradigms replace specific verbs as building blocks in the &amp;quot;conceptual&amp;quot; representation of an utterance. The An manbation f~r a Dictionary of Word Senses</Paragraph>
    <Paragraph position="24"> effect is to provide strong guidelines for the inference of unstated information, for the comparison of related utterances, for paraphrasal, etc.</Paragraph>
    <Paragraph position="25"> TO represent a particular verb in terms of these ACT's, however, it is necessary to augment each ACT with various substructures which detail the manner, the means, the type of actor or object, etc. No reduced set of representatives is as yet offered for the adverbs, nouns, adjectives, etc. in terms of which the &amp;quot;primitive ACT's&amp;quot; are qualified. If such additional condcnsntion werc attempted, the elaboration of a given utterance in terms of the full set of &amp;quot;primitives&amp;quot; might well ramify without practical end. In other words, reduction of the set of names L'or nodes (and labels for arcs) must be purchased at the expense of extending the number of them required to represent each utterance, In conceptual dependency representation, just as in the sernantic networks&amp;quot; of Quillinn [6], Simmons ([I I], [12]), Slocum, and others, reality ultimately appears as a shimmering web, every part of which trembles when any part of it is touched upon. Taken in its totality, the system -- as yet -- is entirely compatible with skepticism about a corrrprehensive set of &amp;quot;conceptual primitives&amp;quot; In any case, the verbal &amp;quot;senses&amp;quot; proposed here lie at a far lower level of generality than the &amp;quot;primitive ACT's&amp;quot; used in conceptual dependency theory. In terms of that theory, they come closest to the so-called &amp;quot;CONCEXICON entries&amp;quot; used by Goldman in realizing surface expressions 'of a concept from its conceptual representation [2]. Given a primitive ACT, Goldman narrows it down to a particular &amp;quot;CONCEXICON&amp;quot; entry by applying the tests in a discrin~ination tree to the rest of the structure in which the ACT appears.</Paragraph>
    <Paragraph position="26"> Our lexical &amp;quot;senses&amp;quot;, therefore, are lcft with a humbled role. If they span anything, it might best be thought of as &amp;quot;communication space&amp;quot;, not &amp;quot;conceptual space&amp;quot;. Even in this light, they arc a hugely redundant basis, and a not at all unique one. They form no inventory of the experiences being communicated about; &amp;quot;meaning&amp;quot; is still a step removed, still evoked rather than embodied by the elements of this basis.</Paragraph>
    <Paragraph position="27"> If we persist in calling these things &amp;quot;senses&amp;quot;, it is because that is the traditional term for what is brought to mind as the synonym sets of a given word are enumerated. The tie-in with meaning is tenuous, but the human user is able to supply it. There is at least this much justification for the term: synonym sets, more forcefully than words, direct attention to the points at which a tie-in must be made between the tokens of communication and the underlying representation of &amp;quot;world knowledge&amp;quot; An Organization for a Dictionary of Word Senses 2 1 In a full-fledged system for processing natural language, then, we must envision the &amp;quot;dictionary of senses&amp;quot; as a component stretching vertically across the &amp;quot;upper&amp;quot; layers. Its &amp;quot;sense data items&amp;quot; must link, ifi some way, to the deeper-lying data structures which encode &amp;quot;knowledge of the world&amp;quot; (the &amp;quot;pragmatic component&amp;quot;). The &amp;quot;key data items&amp;quot; and &amp;quot;phrase data items&amp;quot; register tokens to be expected or employed in &amp;quot;surfzrce&amp;quot; utterances. Global and local attributes recorded in the various data items guide parsing and interpretation. Where one takes it froin there depends upon thc linguistic approach to be used.</Paragraph>
    <Paragraph position="28"> An Organization for a Dictionary of Word Senses American Journal of Computational Linguistics Microfiche 50 : 24</Paragraph>
  </Section>
  <Section position="9" start_page="95" end_page="95" type="metho">
    <SectionTitle>
CURRENT BIBLIOGRAPHY
</SectionTitle>
    <Paragraph position="0"> Despite repeated predictions to the contrary, both the selection of material for this issue and the choice of subject categories are tentative. The Editor and his collaborators have found the reconstruction of intellectual and mechanical systems more onerous than they had expected.</Paragraph>
    <Paragraph position="1"> Completeness of coverage, especially for reports circulated privately, depends on the cooperation of authors. Summaries or articles to be summarized should be sent to the editorial office, Twin Willows, Wanakah, New York 14075.</Paragraph>
    <Paragraph position="2"> Many summaries are authors' abstracts, sometimes edited for clarity, brevity, or completeness. Where possible, an informative summary is provided.</Paragraph>
    <Paragraph position="3"> The Informatheque de linguistique de ilUniversite d'ottawa, Dermot Ronan F. Collis, Director, provides a portion of odr entries. AJCL gratefully acknowledges the assistance of J. Beck, B. Harris, and D. Castonguay.</Paragraph>
    <Paragraph position="4"> See the following framfor a list of subject headings with frame numbers.</Paragraph>
    <Paragraph position="5">  Brilish Journal for the Philosophy of Science 26:213-225, Seprrrnber I975 lSSN 0007-0882 Putnam argues for a satirical privacy for machines by asserting that, just us it makes no sense to ask John how he knows that he is in pain, so it makes no sense to ask a Turing Machine (TM) how it knows that it is in stnte n. When addressed to an abstract Tb1 the question is absurd, but not when addressed to a physically realized TM. Putiirlm equivocates about the notion of stnte, discussirig only abstract TM's when introducing the notion of state, but making an argument which is cohcrer~t only with respect ot a physicaliy renli~cd TM. There is thus, in effect, n cot)fusion between 'state' as of an ai1t0111;lt n rind 'stnte' as of a real Y machine which is executing a program which rcalizes that automaton--any of a iiun~ber of states of the machine might correspond to one state of the nutonlaton embodied in the progranl. Thus Putnam's argument fails. Clarke's criticisms of Putnam are rilisguided but instructive. A more serious notion of ir~achine privacy can be constructed by noting that it is impossible to inter the machine's real activitity deterrninately from the content of registers.</Paragraph>
  </Section>
  <Section position="10" start_page="95" end_page="95" type="metho">
    <SectionTitle>
GENERAL
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> The experimental problem solving environrnen t is one of formulating specifying, debugging and executing (algebraic) procedures interactively on a small processor. The speech recognition system is a real time, syntax directed, limited vocabulary, highly cost effective scheme specifically tailored to this environment. The data transformation operations of the language are verbally specified and the control flow is specified graphically as a two-dimensional directed graph. The semantics of the latter structure is independent of the time sequence of its input. An input restricted (conditional input) pseudo-finite state machine model is used for the continuous syntax checking of the input on an atomic token basis and for directing the speech recognizer.</Paragraph>
    <Paragraph position="3">  modules available to it. Technological developmel~ls may well niake it practical for iridividuals to have CNLlS (Cotrputerizcd Piatural Language Inforniution System) terminals in their homes. A typic4 user terrninal may include a microcornp~~ter an interactive TV, and an electric typewriter. In the computerization of Chinese, a simplification of the written characters is urged.</Paragraph>
    <Paragraph position="4">  Five types of Chinese language data processing systems are discussed. 1) Accept assembly code in English and hand-coded Chinese data and use an expanded subroutine library. 2) Accept assembly code and data, both in Chinese. by adding a pre-assembler and translators to the manufacturer-supplied assembler and linkage editor. 3) Accept a high-level Chinese programming language and Chinese data. 4) Those which use the Chinese-language-oriented postfix string as the machine language. 5) In which the high-level language itself is the machine language (i.e. one-level language). This type of data processing system has no intermediate language, no assembly language, no relocatable language, and no absolute language.</Paragraph>
    <Paragraph position="5">  Basic design philosophy of a Chinese-oriented cornpnter includes consideration of the idiosyncrasies of silch a computer, The following topics are d~sci~ssrd: 1) I11 tt'rnal coding of Chiriese characters, 2) Chirlese Input/Output devices, 3) Instruction Kcpertorie to hl;~nipulnte Chinese Characters, arid 4) Miscellaneous. The design approach toward such a Chinese-oriented cornpiiter is also comniented on.</Paragraph>
  </Section>
  <Section position="11" start_page="95" end_page="95" type="metho">
    <SectionTitle>
GENERAL
</SectionTitle>
    <Paragraph position="0"> Is Technology Ready for Chinese/Japanese Data Processing h. J. Crecnblott, and M. Y, Hsiao I BM Yougkkeepsie, New Yurk S. Gould, Ed., Proceedirlgs of the Firsr Internatior~nl S~mposiurn on Computers orrd Chinese Input /Ouipur Sysrems, Acndenia Sini'ca, 1.51-16 1  The technology for the computer processing of Chinese characters on a large scale is almost around the corner. The major bottleneck is the training required to key in 5,000 different Chinese characters quickly and correctly by either a set of keyboards or some cleverly combined coded form or keyboard design. Advances in LSI technology. mechanical or magnetic keys, CRT, etc., will all contribute to the realilation of a d3t3 processing system capable of handling ideographic languages. An automatic pattern recognition system was not chosen to represent the major future trend because its technical development is still b'eyond the level of practical large scale implementation.</Paragraph>
    <Paragraph position="1">  This book describes a theory of man/man or madmachine conversations and cognitive processes (with emphasis upon the dynamics of learning and teaching at an individual level) together with several special experimental methods and practical applications. Most of the illustrations and data supporting the argument stem from education, course design, and similar fields and the material is relevant to epistemology, subject matter organis:~tion, as well as such disciplines as pedagogy, computer aided instruction etc. Some experiments, however, deal with laboratory learning and the acquisition of perceptual motor skills, and an attempt is made to identify the theory and methods wrth inany standard paradigms in social and experimental psychology. An account of consciousness and self-reference is given in the  ...................... Strategies and Conscious Control .................. The Place of Value in a Judgment of Fact ................................ Overview References ...............................</Paragraph>
    <Paragraph position="2"> 4 FORM. FORMATION. AND TRANSFORMATION OF INTERNAL REPRESENTATIONS. Roger N . Shepard .................... 87 ............................ Some Central Issues 87</Paragraph>
  </Section>
  <Section position="12" start_page="95" end_page="291" type="metho">
    <SectionTitle>
SECTION I1
5 RETRIEVAL As A MEMORY MODIFIER: AN INTERPRETATION OF
</SectionTitle>
    <Paragraph position="0"> . . ..... NEGATIVE RECENCY AND RELATED PHENOMENA Robert A Bjorl  GENERAL 6 ENCODING. STORAGE. AND RET'RIEVAL OF ITEM INFORMATION. Bennet B Murdock. Jr . and . Rita E . Anderson ...................... 145 Encoding .............................. 153 Storage ................................ 158 Retrieval ..... .......................... 164 Retrieval at Short and Long Lags ..................... 185 References .............................. 192 7 WITHIN-INDIVIDUAL DIFFERENCES IN &amp;quot;COGNITIVE&amp;quot; PROCESSES. .............................</Paragraph>
    <Paragraph position="1"> Willian~ F . Battig  .......... Some Questions about Current Cognitive Research Practices .................. Processing Differences within Individuals ............................. Serial Learning ......................... Paired- Associate Learning ...................... Verbal-Discrimination Learning ........................... Free-Recall Learning ........................... General Conclusions ............................... References 8 CONSCIOUSNESS: RESPECTABLE. USEFUL. AND PROBABLY NECESSARY.</Paragraph>
  </Section>
  <Section position="13" start_page="291" end_page="291" type="metho">
    <SectionTitle>
12 THE CONSTRUCTION AND USE OF REPRESENTATIONS INVOLVING
</SectionTitle>
    <Paragraph position="0"> ............. . LINEAR ORDER. Tom Trabasso and Christine A Riley 381 ........... Problem Origins: the Development of Transitive Reasoning 382 ................... Training Results: Serial Position Effects 385 Testing Results: Memory and Inference Correlations .............. 386 ........................ Stochastic Retrieval Models 388 ............. What Occurs in Training: The Serial-Position Effect 392</Paragraph>
  </Section>
  <Section position="14" start_page="291" end_page="291" type="metho">
    <SectionTitle>
GENERAL: CHINESE
</SectionTitle>
    <Paragraph position="0"> Interactive Processing of Chinese Characters and Texts J. T. Tou, J. C. Tsay, and J, K. Yoo Center for In forrriatics Research, University of Florida, Gainsville S. Could, Ed., Proceedings of the First International Syrnposium on Computers and Chinese Inpur/Uurput Systems, Academia Sinica 1-28 The system provides a tool to teach pupils how to write Chinese ideographs, how to make proper pronunciations, and how to translate into a foreign language. and features dynamic display of characters. The system can also perform text-editing operations. Techniques for Chinese character representation, based on chain codes for stroke sequence, and dictionary generation, in which each character of subcharacter is represented as a subroutine in the dictionary, are introduced. Text-edi ti ng routines are discussed and the paper concludes With an illustrations of text-editing operations. The final edited text can be transcribed from the display scope for making hard copies. The system will be further developed for editing maps, for typesetting and for use as a Chinese typewriter.</Paragraph>
    <Paragraph position="1">  With adequate programming facilities at hand the phonetician would be able to make his table-top coniputer perform pratically everything that was done earlier by conventional equipment for analysis and registration. In addition, data may be stored for automatic processing, atid sequences of events, such as qunli tative or quan ti trltive variations of parameters or stimuli in experiments with human subjects, can be governed l~ccording to a pre-set program, or by incoming signals of random pulses. Topics considered: 1) the nature of available equipment, 2) programming, 3) speech analysis, 4) speech spnthesis, 5) the computer in experimental work, 6) dir\lcctology and phonology, 7) teaching phonetics.  1. M. Bcnnctt, and J, G. 1,invill Depar~nlenf u f EIectrical Engineering, Sran ford Universi~y, Slanford, California 94305  Journal of the Audio Engineering SociPly 23:713-721, November I975 Time-domain speech compression using the SDA (sample, discard, about) procedure at compression ratios of 0.25 to 0.75 is studied by means of a new analog speech processor and minicomputer algorithms. Fourier transform methods have been used to establish a correspondence between the quality of the reconstructed compressed speech waveforms and the subjective recognition of compressed speech. The result of two psychoacot~stic experiments indicate that 1) the interruption frequency should be equal to the pitch frequency of the voice waveform for optimum recognition of the compressed speech, and 2) smoothing of the discontinuities with electronic techniques significantly improves the recognition of the compressed speech. The optimum smoothing parameters, window width and characteristic function, are also obtained from this study.</Paragraph>
    <Paragraph position="2">  The loci of formant frequency patterns of vowels in many kinds of CVC contexts were represented in FI-F? space. The areas enclosing these loci were obtained for each vowel. The positions of vobels in connected speech lie inside an area surrounded by the isolated vowels because of the neutralization sf vowels io connected speech. The faster the speaking r3 c, the more the areas tend to concentrate deeper inside. Also, the areas of individual vowels overlap each other and the Caster the speaking rate, the more the overlapping areas increase. The overlapping areas were estimated in FI-F2 space and, to investigate the effect of F in F1;F2-FJ space. The distribution of F is nearly approximated by the normal densi ? y funct~on, because the effect on imbalance o? the vowel occurrence frequencies is not clearly observed in the frequency distribution of F3. Areas reflecting the bound of articulatory movement in the acoustic domain were obtained from the loci of formant frequencies represented in F1-FZ and FI-F3 spaces. We conclude with a comparison of the discussed areas and those obtained from the artlculatsry model by Lindblorn.</Paragraph>
    <Paragraph position="3"> Epoch Extraction of Voiced Speech 'r. V. Anathapadnlannbha, and B. Yegnanarayana Deparrrnenr of Eleckrical Communication Er~gineering, Indian Institute of Science, Bangolore 5600 12, India IEEE Transactions on Acousrics, Speech, and Signal Processing 23362-570, December 1975 A general theory of epoch extraction d overlapping nonidentical waveforms is presented and applied to outputs of models of voiced speech production (model 1, impirlse excitation of a two-resonator system; model 2, glottal wave excitation of a two-resonator system) and to actual speech data. Some typical glottal waveshapes are considered to explain their effect of the speech output. The points of excitation of the vocal tract can be precisely identified for continuous speech and it is possible to obtain accurate pitch information by this method even for high-prtched sounds.</Paragraph>
  </Section>
  <Section position="15" start_page="291" end_page="8540" type="metho">
    <SectionTitle>
PHONETICS-PHONOLOGY: RECOGNITION
</SectionTitle>
    <Paragraph position="0"> A high-quality pitch detector has been built in digital hardware and operates ill real time at a 10 kHz snrnpli~ig rate, The hardware is cnpahle of providing energy as well as pitch-period estim;ltes. l'he pitch arid energy cr~mputatioris ;Ire pcrfornied 100 tin~es/s (1.t3'., orice pcr 10 ms interval). 'T'he slgrari thn~ to estinlnte the pitch period uses cell ter clipping, infinite peak clipping, nrld n sirripl if ied nutocorrelation analysis. 'l'htl nnalysis is pc'rforriied on a 300 sample section of spccch which is botti center clipped iind irlfirlite peuh clipped, yielding a three-level speech signal where the levels are -1, O, arid +1 depending on the relation of the originill speech ample to the clipping threshold. Thus computation of the sutocorretation function of the clipped speech is easily inipler;-\en tsd in digital hardware using simple cori1bin3torial logic, i.e., an up-down counter can be used to conipute each correlation point.</Paragraph>
    <Paragraph position="1"> A Comparison of Three Methods of Extracting Resonance Information from</Paragraph>
    <Section position="1" start_page="291" end_page="8540" type="sub_section">
      <SectionTitle>
Predictor-Coeficient Coded Speech
Randall L. Cl~ristcnscn
</SectionTitle>
      <Paragraph position="0"> Naval Weapons Cenler, Chino. Lake, cbli~ornia 93555 William .I. Strong, and E. I'aul Palr~ier Department oj' Ph.vsics and Astronony, Brighnrn Young University, Prov~, Utah 84602 IEEE Transactions on Acoustics, Speech, and Signal Processing 24:s-14, February 1976 The methods: finding roots of the polynomial in the denominator of the transfer function using Newton iteration, picking peaks in the spectrum of the transfer function. and picking peaks in the negative of the second derivative of the spectrum. A relationship was found between the bandwidth of a resonance and the magnitude of the second derivative peak. Data. accumulated from a total of about two minutes of running speech from both female and male talkers, are presented illustrating the relative effectiveness of each method in locating resonances. The second-derivative method was shown to locate about 98 percent of the significant resonances while the simple peak-picking method located about 85 percent  lEEE Transactions of Computers 25:172-178, February 1976 Using a method for correcting garbled words based on Levenshtein distance and weighted Levenshtein distance we can correct substi trrtion errors, insertior~ errors, ar~d delection errors. According to the results of computer simulation on nearly 1000 high occurrence English words, higher error correcting rates can be achieved by this method thn~i any other method tried to date. Short words remain a problem; solving it will probably requite utilization of contextual information, Hardware realization of the method is possible, though coniplicated. PWNETICS-PHONOLOGY: RECOGNlTlCfN Speaker-ldentifying Features Based on Formant Tracks Ursula C. 
Goldstein Departnlent 01 El~ctrical Engineeriug and Computer Science and Researol Laboratory of E/earonics, Massachuseits lrrstilure of Technology, Cambridge, 02139 journal of [he Acoustic Society of America 59:176-182, January 1976 The formant structure of three dipthongs, four tense vowels, and tliree retroflex sounds was examined in detail for possible speaker-identifying features. Formant tracks were computed for each sound under investigation using covariance-type pitch-asynchronous linear prediction together with a root-finding algorithm. The interspeaker variability of about 200 measurements made on these formant tracks was compared initially with intrnspeaker variability through the palculation of F ratios. Those with average F ratios greater than 80 were evaluated further with a probability-of-error criterion. Features that are potentially most effective in identifying speakers are the minimum second-formant value in [ar], the maximum first-formant value in [ar], the maximum second-formant values of [o], and [ 11. and the minimum third-formant value of [ ] The individual differences apparent in these sounds presumably depend mort on speaker habits than on vocal-tract anatomy. The error bound predicted-for :i speaker identification procedure based on these five features in 0.24%. An identification experiment using only the best two features gave 12 errors out of SO identifications.</Paragraph>
      <Paragraph position="1">  Journal of the Acoustical Society of America 56:1296-, December 1975 Implicit in the use of linear prediction is the assumption that within each analysis frame the signal is stationary. The acoustic signal is assumed to be suitably approximated by a recursion which describes a linear time-invariant acoustic system composed of a concatenation of equal-length, constant-diameter nondissipative tubes. That is, associated with the coefficients (c ) in the recursion is a stylized articulatory configuration which remains fixed throughout the analysis interval. If we allow the coefficients to be functions of time rather than constants we can obtain a more realistic model in which the parameters of the model change continuously and automatically with articulation, rather than discontinuously at fixed intervals. The time-varying area function can be estimated by  The determination of an utterance's pitch contour utilizes simultaneous display (on a 10 ms section-by-section basis) of the low-pass filtered waveform, the autocorrelation of a 400-point segment of the wideband recording, and the cepstrum. For each of the separate displays (i.e., waveform, autocorrelation, and cepstrum) an independent estimate of the pitch period is made on an interactive basis with the computer, and the final pitch period decision is made by the user based on results of each of the measurements. Formal tests of the method were made in which four people were asked to use the method on three different utterances, and their results were then compared. During voiced regions, the standard deviation in the value of the pitch period was about 0.5 samples across the four people. The standard deviation of the location of the time at which voiced regions became unvoiced, and vice versa, was on the order of half a section duration, or 5 ms. The major limitation of the proposed method is that it requires about 30 min to analyze 1 s of speech.</Paragraph>
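One of the three displays described above is the autocorrelation of a short segment, whose dominant peak lag estimates the pitch period. A minimal sketch under assumed signal parameters (the sampling rate, fundamental, and search range are illustrative, not from the paper):

```python
# Sketch of pitch-period estimation from the autocorrelation of a segment.
# Signal, sample rate, and lag search range are illustrative assumptions.

import math

def autocorrelation(x):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(n)]

def pitch_period(x, min_lag, max_lag):
    """Return the lag (in samples) of the autocorrelation peak in [min_lag, max_lag]."""
    r = autocorrelation(x)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])

fs = 8000                      # assumed sampling rate, Hz
f0 = 100                       # assumed fundamental, Hz -> period of 80 samples
x = [math.sin(2 * math.pi * f0 * t / fs) for t in range(400)]  # 400-point segment
print(pitch_period(x, 40, 200))   # expect a lag near fs / f0 = 80
```

Restricting the lag range keeps the peak search away from the zero-lag energy term and from subharmonics, which is also why an interactive operator was useful in the original procedure.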
      <Paragraph position="2">  The system has three portions: pitch extraction, segmentation, and formant analysis. The pitch extractor uses an adaptive digital filter in the time domain, transforming the speech signal into a signal similar to the glottal waveform. Using the levels of the speech signal and the differenced signal as parameters in the time domain, the subsequent segmentation algorithm derives a signal parameter which describes the speed of articulatory movement. From this, the signal is divided into &amp;quot;stationary&amp;quot; and &amp;quot;transitional&amp;quot; segments; one stationary segment is associated with one phoneme. For the formant tracking procedure a subset of the pitch periods is selected by the segmentation algorithm and is transformed into the frequency domain. The formant tracking algorithm uses a maximum detection strategy and continuity criteria for adjacent spectra. After this step the total parameter set is offered to an adaptive universal pattern classifier which is trained by selected material before working. For stationary phonemes, the recognition rate is about 85 percent when training material and test material are uttered by the same speaker. The recognition rate is increased to about 90 percent when  Pauses are used to delimit utterances into segments. Linear regression analysis of pitch patterns allows a 4-way classification of slopes of lines: fast rising, rising, level, falling. These are the 4 Fundamental Pattern Features (FPF). A combination of 2 or 3 (of the 4) FPF's per segment of utterance is a pitch pattern (80 possible). An intonation pattern is a combination of pitch patterns. The position of the highest frequency value in the utterance is important. In comparing 2 utterances, if the high point occurs in different segments the intonations are contrastive even if the pitch patterns are the same.
Of the 80 possible pitch patterns, some must be recognized as cardinal patterns and some as cognate patterns to the cardinal patterns. Different sets of rules must be used for the &amp;quot;high&amp;quot; segment and the final segment of the utterance.</Paragraph>
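The Fundamental Pattern Features above come from fitting a regression line to each segment's pitch values and binning the slope four ways. A minimal sketch; the slope thresholds are illustrative assumptions, not the paper's values.

```python
# Sketch of the 4-way slope classification behind the Fundamental Pattern
# Features. Thresholds (level_band, fast_rise) are illustrative assumptions.

def slope(pitch):
    """Least-squares slope of pitch values against sample index."""
    n = len(pitch)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(pitch) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, pitch))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def fpf(pitch, level_band=0.5, fast_rise=3.0):
    s = slope(pitch)
    if s > fast_rise:
        return "fast rising"
    if s > level_band:
        return "rising"
    if s >= -level_band:
        return "level"
    return "falling"

print(fpf([120, 121, 120, 121, 120]))   # near-zero slope
print(fpf([100, 110, 120, 130, 140]))   # steep upward slope
```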
      <Paragraph position="3">  A clustering algorithm is mainly a two-stage process: 1) selection of a pairwise similarity measure between every two samples or objects in the data set; 2) the similarity measure is used in a sorting procedure whereby groups of similar samples are extracted. In a graph-theoretic clustering algorithm a graph is constructed for the given data and subgraphs G satisfying certain properties are obtained. The clustering algorithm features a flexible method of edge construction (k-nearest neighbor threshold method), which allows the grouping of samples to be more effective, and the generalized Frisch's labelling algorithm, which detects and removes the possible chaining effect in the data. The algorithm is applied to the recognition of nasal consonants.</Paragraph>
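The graph-theoretic scheme above can be illustrated by building a graph with mutual k-nearest-neighbor edges and reading clusters off its connected components. This is a sketch under assumed toy data; the generalized labelling step that removes chaining is omitted.

```python
# Sketch of graph-theoretic clustering: mutual k-NN edges, then connected
# components as clusters. Data and k are illustrative assumptions.

def knn_graph(points, k):
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    # keep an edge only if i and j are each other's k-nearest neighbors
    return {i: {j for j in neighbors[i] if i in neighbors[j]}
            for i in range(len(points))}

def components(graph):
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(graph[v] - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(components(knn_graph(pts, 2)))
```

Requiring edges to be mutual is one simple way to weaken the chaining effect that the paper's labelling algorithm addresses more generally.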
      <Paragraph position="4">  derived through the use of an automatic performance evaluation system. Over 3000 hand-labelled spectra were used. The most successful classification method involved a linearly mean-corrected minimum distance measure, on a 30-point spectral representation with a square (or cube) norm. Straight minimum distance is the worst performer. The question of appropriate point representation is really one of adequate information retention. The 50-point representation contains too many components above 3 kHz while 20- and 10-point representations contain insufficient information relative to the classes to be discriminated. The value of the norm exponent primarily relates to the weight given extrema in the norm kernel; a heavier weighting (2 or 3) should be placed on extrema.</Paragraph>
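The winning method above combines two simple ingredients: subtract each spectrum's mean level before comparing (so overall gain is ignored), and use a norm exponent of 2 or 3 so spectral extrema dominate the distance. A minimal sketch with assumed reference spectra:

```python
# Sketch of a linearly mean-corrected minimum-distance classifier with a
# variable norm exponent. Reference spectra and test vector are illustrative.

def mean_correct(v):
    """Remove the mean level so classification ignores overall gain."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def distance(a, b, p=2):
    """Sum of |a_i - b_i|**p; larger p weights extrema more heavily."""
    return sum(abs(x - y) ** p for x, y in zip(a, b))

def classify(spectrum, references, p=2):
    s = mean_correct(spectrum)
    return min(references,
               key=lambda label: distance(s, mean_correct(references[label]), p))

refs = {"a": [1, 5, 9, 5, 1], "i": [9, 5, 1, 5, 9]}
unknown = [2, 6, 10, 6, 2]          # "a" shifted up by a constant level
print(classify(unknown, refs, p=2))
```

With p = 1 this is a plain city-block distance; the abstract's finding is that raising p to 2 or 3, together with the mean correction, is what separates the best performer from straight minimum distance.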
      <Paragraph position="5">  The machine recognition of Mandarin monosyllables seems to be feasible at present. An integrated recognition procedure for monosyllable utterances has also been suggested, and some results are described. The basic syllable structure contains three major parts: initial, tone, and final. The initial contains only consonantal phonemes of four different categories: sonorant, plosive, fricative, aspirate. There are four tonemes in Mandarin Chinese; the pitch contours cover only the final part of the syllable. In the vowel part of a syllable, although seven phonemes are sufficient to describe all possible vowels, the final can also contain diphthongs and triphthongs composed of more than one vowel phoneme.  The system for producing consonants is a noise generator followed by a pole-zero resonator, while that for producing vowels is a quasi-periodic pulse generator with variable period followed by a resonator with three variable poles called formant frequencies, ranging from 0 to 3 kHz. The results of analysis, by means of sonagrams obtained from ten male voices, show that the 16 vowel phonemes can be classified into two classes, single and compounded vowel sounds. Some of the synthesized single vowels are very monotonic and can be recognized. Others, with third formant frequency slightly greater than 3 kHz, are not as clear, due to the fast decaying of high frequency components in the generated pulses.</Paragraph>
      <Paragraph position="6"> The compounded vowels are also synthesized by a step variation of formant frequency derived from its components. The result is also well recognizable.</Paragraph>
      <Paragraph position="7"> The sonogram analysis of a continuous Chinese speech shows that every word and its spelling phonemes are quite independent and separable, and are therfore very different from English speech.</Paragraph>
      <Paragraph position="8">  In design of I/O systems for human graphics it is necessary to simulate the activity of writing and not the graphic result of that activity. Orthographic rules are essentially sets of criteria for determining proper serial order of graphic signs, upon which further subsets of phonetic, semantic, graphic and other conventions are imposed. Chinese orthography is of the polyalternating polyvariable type in which a number of undefined subsets of graphic signs combine with each other in any of several possible juxtapositional modes according to rules not yet fully elucidated. But if the logography is not converted to an invariable series it cannot be input to, manipulated in, or output from a digital computer. The temporal series in which elements are composed into logographs is invariably serial, and therefore computer compatible. Graphemic synthesis is a procedure by which logographs are mechanically produced in a manner simulating the manual writing procedure. Since logographs are synthesized from a small finite set of component elements, there is no need to prestore logographs, but only the small grapheme set. Output in normal logography is achieved as needed only at the output end by reversal of the synthesizing process. It is possible to achieve a synthesis at somewhat less than the ideal level, pseudographemic synthesis, and this has been implemented in the SINCO system.</Paragraph>
      <Paragraph position="9"> WRITING: RECOGNITION Feature Extraction on a Finite Set of Binary Patterns l'aul 1'. \Vans, and William S. Hodgkiss, Jr.</Paragraph>
      <Paragraph position="10"> Departrr~ent of Electrical Engineering, Duke University, Durham, North Carolina S. Gould, Ed., Proceedings of the Ficst infernational Synzposium on Cornputers and Chinese Input /Output Systems, Academia Sinicu, 183-194 A &amp;quot;best&amp;quot; subset of mutilally orthogonal features which are minin~um in number but sufficient to discriminate a finite set of patterns is chosen from a much larger set of available features in a systematic and deterministic manner by a heuristic program based on the criterion of maximum separability. The unique code words for the finite set of binary patterns are established through a learning procedure derived from a theorem on necessary and sufficient conditions for mutual independence of these vectors over a binary field.</Paragraph>
      <Paragraph position="11">  stroke is ideally a line segment or a concatenation of several straight line segments. Each straight line segment has its starting point, direction and length. There is also a specified sequence among these segments. The specified sequence of line segments for a character is the same as the sequence of their starting points. Therefore, when a character is drawn on the tablet of the digitizer, the output paper tape containing the (x,y) coordinates of sampling points of each of its line segments presents these points in the proper order. The preprocessor produces a sequence of simplified straight line segments, each with a direction code and length, from the paper tape input, and sends the results to the classifier, which constructs a dictionary of characters which it then uses in the recognition process. The program achieved 95% recognition for a test sample of 300 Chinese characters.</Paragraph>
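The preprocessing step above reduces each stroke to (direction code, length) pairs. A minimal sketch of such an encoding; the 8-way direction quantization is an illustrative assumption, not necessarily the paper's code set.

```python
# Sketch of turning tablet sample points into straight segments, each with a
# direction code and a length. The 8-way coding is an illustrative assumption.

import math

def direction_code(p, q):
    """Quantize the direction of segment p->q into one of 8 codes (0 = east)."""
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    return round(angle / (math.pi / 4)) % 8

def encode_stroke(points):
    """One (direction, length) pair per consecutive pair of sample points."""
    return [(direction_code(p, q), math.dist(p, q))
            for p, q in zip(points, points[1:])]

stroke = [(0, 0), (4, 0), (4, 3)]      # a horizontal then a vertical segment
print(encode_stroke(stroke))
```

Because the tablet preserves writing order, the resulting code sequence is already in the specified stroke-segment order the classifier needs.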
      <Paragraph position="12">  A Chinese character may be called a kind of block picture. Each stroke seems to be a hieroglyph. Curves formed by several strokes or by a continuous stroke do not happen very often. Some characters differ only by a single stroke. If the Markovian processes mentioned in this paper are used, detailed recognition for each row and column is available and even a single stroke would not be missed, so increasing the degree of correct recognition is by no means a problem. A plane block picture may be divided into plane blocks while a solid one can be discussed by dividing it into solid blocks. The greater the number of layers, the higher the degree of correct recognition. The input pattern is divided into 20 layers for individual recognition. If every layer satisfies the condition, then recognition is complete. If one of the layers shows a great difference, then the pattern must be another picture.</Paragraph>
      <Paragraph position="13">  A set of features believed to be useful in classification and recognition, and which is deduced from topological properties and heuristic properties, is proposed. An encoding scheme offers a unique code word for each character (signature) of a dictionary of about 6,000 items. A three-stage machine recognition system, based upon the optimal multiple category classification principle, has been proposed to solve the problem of automatic reading of Chinese characters. A by-product of this research is the development of a topologically based lexicographical ordering for a useful Chinese dictionary. Finally, some recommendations concerning machine recognition of printed Chinese ideographs are made.</Paragraph>
      <Paragraph position="14">  In the FCL (FACOM Composition Language) System, information is punched on paper tape with a Kanji keyboard. The layout data, which sets out the form of the final printed matter as parameters, is punched with an alphanumeric keyboard, and these are applied to the FCL as input together with the text data. The results of the editing by the FCL are output to a cassette magnetic tape, transferred to the photo type-autosetter, and printed on film. At this stage a print for correction is completed and this print is placed in the correction processing cycle. The correction processing generates the corrected data concerning errors in the text and layout data, and this corrected data is again input to the FCL. The FCL saved file is utilized as the object of the correction processing. The FRAME program is the portion of the layout control system which defines paragraph groups which have the same character and shape.</Paragraph>
      <Paragraph position="15">  This Kanji printer consists of a character generator and a printer. The character generator has a small rotating image disc with 5,376 characters printed on it, and the character patterns are converted into video signals by a vidicon. The printer has an optical fiber tube and a photo-electrostatic recording element. Reproduced character patterns on the surface of the optical fiber tube are recorded on the dielectric coated paper by a photo-electrostatic element. This Kanji printer is capable of printing 100 characters a second, and is usable for any application of printing in Kanji.</Paragraph>
      <Paragraph position="16">  The storage unit (SU) provides a permanent filing cabinet for storing Chinese characters in any predetermined binary-coded form. The stored information is addressable and readable from external control. The SU contains a large-scale cellular array of read-only memories (ROM) with the associated address decoder, control logic, memory address/data registers and sense amplifier to enable readability. The output unit is used for printing or displaying decoded Chinese in ideographic form. It consists of a k2-segment character decoder, two buffer registers, and auxiliary display terminals (DT). The DT may take many forms, such as a multiple-head printer, D/A converter and storage CRT monitor, or a graphical display console made with an array of light emitting devices (LED).</Paragraph>
      <Paragraph position="17">  The source patterns are scanned with a vidicon camera and recorded on magnetic tape. The following procedures are used: 1) Noise elimination and smoothing of scanned patterns, 2) Line enhancement, 3) Matrix size compression, 4) Interactive refinement, 5) Automatic generation of Chinese character read-only memory (ROM) patterns. The obtained dot patterns are then translated into a paper tape for a numerical controller which in turn drives the wiring system for the read-only memory. Chinese character line printers, CRT displays, and other Chinese character output devices can be implemented by this Chinese character read-only memory.</Paragraph>
      <Paragraph position="18">  WRITING: TEXT INPUT: CHINESE A System Design for the Input of Chinese Characters through the Use of Phonetic and Orthographic Symbols.</Paragraph>
      <Paragraph position="19"> H. C. Li, S. P. Hu, C. L. Jen, H. Chou, S. Shan, and E. T. Chen  The following principles have been observed in system design: 1) Easy to Learn and Use, 2) Inexpensive to Implement, 3) Higher Input Rate, 4) Unique Code for Dictionary Search, 5) Facilitate Other Related Applications. The input of Chinese characters is through the use of phonetic and orthographical symbols. The total number of symbols needed to transcribe a single Chinese character varies from a lower limit of three to a maximum of eight. A single Chinese character requires a maximum of three phonetic symbols and one intonation notation to indicate pronunciation. By coupling one to four of fifteen orthographical symbols with the pronunciation symbols, each Chinese character can be uniquely transcribed into a set of symbols which indicates the pronunciation as well as the orthographical structure of the ideography. No new hardware is required for implementation. With some minor modifications, the keypunch machines and other typewriter-like input peripherals now available on the market can be used immediately.</Paragraph>
      <Paragraph position="20">  A typing keyboard is proposed which is arranged like an English teletypewriter, using an 8-unit code as in an ASCII code with even parity. Switching a lever key, you will be able to type Chinese or English. When you type Chinese, you need only push a key three times at most to complete the selection of a character. Then the character will come out by pushing the space bar once. If it is so arranged as to push a key three times for all characters, we shall be able to save the process of pushing the space bar for every character. The rules of decomposing Chinese characters are studied. The key layout of the keyboard is so arranged as to make recognition of key position easy. Four simple typing rules have been determined. There are only 21, out of the 3,000 characters which often appear in current newspapers, that share the same 3-keyed codes, and so they have been treated as exceptions.</Paragraph>
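The exception count quoted above comes from checking which characters collide under the three-keystroke coding. A minimal sketch of that check; the toy character-to-keys mapping is an illustrative assumption.

```python
# Sketch of counting keyed-code collisions: characters whose 3-key codes
# coincide must be treated as exceptions. The mapping here is illustrative.

from collections import Counter

def collisions(keyed_codes):
    """Return the codes typed identically by more than one character."""
    counts = Counter(keyed_codes.values())
    return {code: n for code, n in counts.items() if n > 1}

codes = {"人": "ABC", "入": "ABC", "大": "ABD", "天": "ABE"}
print(collisions(codes))
```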
      <Paragraph position="21">  Chinese text is coded into phrases separated by delimiters; the phrases can then often be decoded unambiguously to obtain the corresponding string of Chinese characters.</Paragraph>
      <Paragraph position="22"> The file structure for the PEACE system consists of a character file and a phrase file.</Paragraph>
      <Paragraph position="23"> Chinese characters are stored in the form of a composition rule. The phonetic encoding method has also proved to be quite satisfactory, especially for the generation and editing of Chinese texts, where more than 60% of the characters are embedded in phrases. Character encoding rates of the order of 30 characters per minute can be attained with this system.</Paragraph>
      <Paragraph position="24">  The following are assumed: 1) All Chinese ideographs are composed of one or more components, and thus may be classified by the pattern of these components. 2) Each of the components is in turn composed of one or more graphic elements. The total number of graphic elements is fairly limited. Ideographs may be divided into four major patterns: Horizontal, Vertical, Bordered, and Independent. After identification of an ideograph's pattern, a component's structure, and the basic elements, one can then perform the coding by following rules for: 1) Ideograph as a whole, 2) Components, 3) Elements, 4) Relationship or separation signs, 5) Coding sequence, 6) Component of bordered pattern, 7) Independent ideographs of components.</Paragraph>
      <Paragraph position="25">  A set of symbols has been derived by dividing up each character in a set of 4,600 Chinese characters. This set of symbols can represent practically all Chinese characters. These symbols can be used as the alphabets of the language, except that they offer no phonetic information on each character. The spelled-out Chinese is readable without ambiguity and referential aid. The immediate and long-term applications of these alphabets are discussed.</Paragraph>
      <Paragraph position="26"> Extensive examples are given.</Paragraph>
      <Paragraph position="27">  Chinese characters may be defined as objects produced by a generative grammar suitably extended to two dimensions. At the heart of this extension is the coordinate-free configuration operator which, together with its inflections, permits the desired stroke linking to compose a given character. The character so generated corresponds then to a derivation tree which reveals structural properties common with other characters. This tree in turn leads to a quasi-algebraic expression for the character which can be coded to give character indexing for storage and retrieval purposes. The recognition problem is considered briefly. Some experimental results with a character design language are presented.</Paragraph>
      <Paragraph position="28">  Chinese characters are characterized by strokes which are well distributed in a block space of a definite size. The use of radicals plus weight design allows composition of characters from radicals to yield pleasing results. Radicals with fewer strokes are given smaller weights; those with complicated strokes are given greater weights, so that the character obtained from the composition is evenly distributed in the block. The characters thus composed directly look very much like those integral characters which are obtained from dot matrices without decomposition. Using only 496 radicals, 18,713 Chinese characters can be generated. The radical system proposed is a precedence grammar  There are two steps in composing a Chinese character from its radical formula. 1) Calculate the position of each radical and the area occupied by each in the character. 2) Compress each radical and place the radicals into their right positions. There are seven methods for representing the radical system in a computer: 1) dot matrix method, 2) absolute line segment method (the method adopted in implementing the system), 3) relative line segment method, 4) core method, 5) chain code method (basically a line segment method), 6) analytical method, 7) mixed mode, which requires the smallest memory space (7.992 K bytes) to store all 496 radicals.</Paragraph>
      <Paragraph position="29">  including more than one common radical. 100 common radicals were selected for use in the system and nearly 70% of the characters belong to class 2. Since most of the common radicals are on the left sides of characters, we suggest indexing a character according to its upper or right stroke form. In the resulting system it is necessary to learn only 47 indices. The characters as well as the main parts are equally distributed under the indices. The system is easy to learn, fast in operation, has a large vocabulary, is low in cost, and is readily implementable on a mini-computer.  S. Gould, Ed., Proceedings of the First International Symposium on Computers and Chinese Input/Output Systems, Academia Sinica, 1007-1012 We assume that Chinese characters are digitized into n by n matrices of 0's and 1's and represented in packed form in computer storage, usually on a magnetic disk. Assume that an unblocked indexed sequential method is used for organizing the c.p. (character pattern) file. Each record contains a 2-byte key and 34n bytes of data, where n is the number of Chinese character patterns contained in the record. We assume here that we use 2 bytes for character ID and 32 bytes of character patterns. If we increase the number of character patterns in each record from one to four, we reduce the number of records per track (disc memory) by a factor less than 2. A study of contexts of occurrence will tell us what c.p.'s should be stored together. Thus a record should contain not only the c.p. of a given character, but the c.p.'s of some characters that would follow the given character with high probability.</Paragraph>
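The 34-bytes-per-pattern figure above is consistent with a 16 x 16 binary matrix: 256 bits pack into 32 bytes, plus a 2-byte character ID. A minimal sketch of the packing (the 16 x 16 size and checkerboard pattern are illustrative assumptions):

```python
# Sketch of the storage arithmetic: a character pattern digitized as a
# 16 x 16 matrix of 0s and 1s packs into 32 bytes, so each pattern plus its
# 2-byte ID occupies 34 bytes of a record.

def pack_pattern(matrix):
    """Pack a 0/1 matrix, row by row, 8 cells per byte (MSB first)."""
    bits = [b for row in matrix for b in row]
    return bytes(sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))

pattern = [[(r + c) % 2 for c in range(16)] for r in range(16)]  # checkerboard
packed = pack_pattern(pattern)
print(len(packed))                  # 16*16 bits / 8 = 32 bytes
print(2 + len(packed))              # one pattern plus its 2-byte ID: 34 bytes
```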
      <Paragraph position="30"> WRITING: CHINESE An Melligent Terminal for Chinese Character Processing F. F. Fang, C. N. Liu, and D. T. Tang IBM Thomas J. CVatson Research Center, Yorktowrt Heights, New York S. Gould, Ed., Proceedings of the First International Symposiunr on Conlputers and Chinese Input /Output Systems, Acadernia Sinica, 103- 114 The proposed terminal system consists of three modules: Input Output. Control. The input module includes: 1) a character board consisting of a position sensing input device or an array of keys which may be used to input a selected character by transforming its x, y positions to any desired character code. 2) a general purpose keyboard which may contain 50 tn 300 keys each of which is associated with a discriminating mask, and 3) a set of discriminating masks which, when used singly or as a group, perform the character set selection lo_eic. The autput module consists of: I) a character description file which is organized according to the character code and colitains information for pneration of characters for pri~ting or display, and 2) a printer or display for physical output of characters. The control for the input and/or output system is obtained ttjrough either software or firmware. Various degrees of intellieence may be programmed to achieve additional character selection and character generation features.</Paragraph>
      <Paragraph position="31"> WRITING: CHINESE 55 Techniques for the Implementation of a Chinese Input/Output System G, W. Crawford, and S. H. Chung Weslinghouse Electric Corporarion, Pittsburgh, Pennsylvania S. Gould Ed., Proceedings of the First international Symposium on Computers and Chinese Input /Oufpul Sysrems, Academia Sinica, 96 1-968 The input problem is solved using an extension of Wang's Four-Corner Dictiorary Method that uniquely describes a Chinese word by a six digit number. The technique is then simplified by utilizing a five digit number pnd an interactive CRT terniinal that allows complete resolution of ambiguities. Various methods of output are briefly considered and it is concluded and a Computer Output Microfil~n (COM) unit is the nlost. logical device presently available. A specific example of an implementation of n Chinese input/output system based on a. CDC 7600 computer driving a Stromberg-Dotagraphix COM unit is described and a specific example of the output included.</Paragraph>
      <Paragraph position="32">  Input is handled by the author's mixed typing system Modern Chinese typewriter, which has a total of 2400 normal and half-size Chinese words which permit the formation of the necessary 8000 common Chinese words. With this smaller number of Chinese words it is possible to represent them by different punched holes on the same kind of punch card as in the occidental computers. Any occidental computer and any new Chinese computer could be used with the system for the Chinese language as well as for English and other languages. Output is through the author's Modern Chinese typewriter equipped with automatic typing and positioning devices.</Paragraph>
      <Paragraph position="33">  The system consists of: 1) HP 2100A mini-computer, 2) keyboard of 640 keys, 3) or keyboard of 88 keys, 4) tape reader, 5) electrostatic printer, 6) display scope. Chinese characters may be fed into the computer either through keyboards or through paper tape. Output is printed by the printer and also displayed on the scope. The input radicals are combined into characters by the composition method in the computer. After three hours of familiarization, an operator may reach a speed of 10 characters per minute. It is estimated that a speed of 30 cpm may be reached in a month. To be compatible with other input methods (phonetic alphabets, four-corner code, standard telephone code, etc.) only a table is needed. Since it takes merely 1 ms to compose a character, time sharing may be utilized.  First a dictionary of the corpus (here an extract from S. Maugham's The Painted Veil) is produced. Classification of words is based on the maximum information principle, which considers that for a given number of groups, the greater the quantity of information, the better the distribution of the words into these groups. The words are distributed into two groups in such a way that the quantity of information associated with this classification is maximized. Dichotomization is carried on until the statistical uncertainty on the amount of information is greater than the gain of information obtained by a new dichotomy. For each sentence of the corpus, the code produces a structure based on the degree of correlation of two consecutive words. Inside the sentence consecutive words are connected two by two in order of decreasing degree of correlation,  The distribution of a word in a text can be described in two ways:
by the manner in which it differs from that in some other texts, and by the manner in which it varies from one part of the text to another. Thematic words are those which stand out in comparison with a given background. The use of the proper statistical techniques (a measure due to Hassler-Goransson, which is defined as the chi-square value divided by the number of degrees of freedom, is suggested) makes it possible to study the way in which words enter and leave the scene in a pattern which characterizes the text in much the same way as the entrances and exits determine a play. Using more sophisticated methods it is possible to study the strength of connection between any two parts of the text, and a segmentation of the text into internally more strongly connected and mutually more loosely connected portions can be tested or even mechanically suggested.</Paragraph>
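The chi-square-over-degrees-of-freedom measure described above can be sketched directly: tabulate a word's counts across equal-sized parts of the text and compare them with a uniform expectation. The four-part split and the toy counts are illustrative assumptions.

```python
# Sketch of the dispersion measure above: chi-square of a word's counts
# across equal text parts, divided by the degrees of freedom. A value near 1
# suggests an evenly used word; a large value, a thematic one.

def dispersion(counts):
    """Chi-square against a uniform expectation, over (parts - 1) df."""
    parts = len(counts)
    expected = sum(counts) / parts
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2 / (parts - 1)

print(dispersion([5, 5, 5, 5]))     # perfectly even: 0.0
print(dispersion([20, 0, 0, 0]))    # concentrated in one part
```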
      <Paragraph position="34">  Four items of information are recorded for each entry: (1) the form of the word as it is in the text, (2) the lemma, the form of the word in a dictionary, (3) a detailed reference to the form, (4) its coded analysis, from morphological and syntactic viewpoints. It was found that for Caesar's Commentarii de bello gallico the 704 words which occurred more than 10 times covered 86.03% of the text. Studies of this sort are important in determining what vocabulary should be taught to students; there is no immediate need to teach words of low occurrence.</Paragraph>
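The coverage figure quoted above (704 high-frequency words covering 86.03% of the running text) is a cumulative-frequency computation. A minimal sketch with an assumed toy frequency list:

```python
# Sketch of vocabulary coverage: the fraction of all tokens accounted for by
# the m most frequent entries. The frequency list is illustrative.

def coverage(frequencies, m):
    """Fraction of all tokens covered by the m most frequent entries."""
    ranked = sorted(frequencies, reverse=True)
    return sum(ranked[:m]) / sum(ranked)

freqs = [500, 300, 100, 50, 30, 10, 5, 3, 1, 1]
print(round(coverage(freqs, 3), 3))
```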
    </Section>
  </Section>
  <Section position="16" start_page="8540" end_page="8540" type="metho">
    <SectionTitle>
LEXICOGRAPHY -LEXICOLOGY: DIALECTOLOGY
</SectionTitle>
    <Paragraph position="0"> An Organization for a Dictionary of Senses Dick H. Fredericksen IBM Thomas J. Watson Research Center, Yorktown Heights, New York IBM Research Report 5548, 4 June 1975 &amp;quot;Senses&amp;quot; are represented separately from &amp;quot;wordings&amp;quot; and the mutual connections between them are made explicit in both directions. &amp;quot;Wordings&amp;quot; may be either single words or multi-word phrases, which are on the same footing with regard to &amp;quot;sense&amp;quot; connections. Each word is associated with an exhaustive list of the phrases in which it occurs. Classifiers and features, drawn from appropriate sets, may be attributed separately to words, to phrases, to senses, or to particular senses of words or phrases (i.e. to particular wordings of senses). The data items which represent senses are globally chained, and may be exhaustively listed. The data items which represent words are accessible as &amp;quot;leaves&amp;quot; of a lexical tree and may be retrieved either by lookup or volunteered in alphabetical order. The &amp;quot;senses&amp;quot; represented in this dictionary are not a set of primitives into which human experience can be decomposed; meaning is still a step removed, still evoked rather than embodied by the elements of this basis. In a full-fledged system for natural language processing the &amp;quot;dictionary of senses&amp;quot; could be envisioned as a component stretching vertically across the &amp;quot;upper&amp;quot; layers. The &amp;quot;sense data items&amp;quot; must link to the deeper-lying data structures which encode &amp;quot;knowledge of the world.&amp;quot;  A collection of English language data based on free association is organized as a network: node = word, arc = associative frequency. We obtained 100 responses for each of 8,400 words; the net has 56,000 nodes.
For each of 35,000 words we can obtain an environment of related words and this environment is fairly large and relevant in content for 8400 stimi~lus words. The environment can be clustered into subsets, which turn out to be a semantic sorting of the environment. The search can be forward (stimulus to associate), inverse, or in both directions. Since growing environments can produce ponderously large subsets of network, environments are lin~ited by techniques involving transmittance to each node, the number of paths traversed which lead to each node, path length, and a frequency cut-off. It seems better to consider forward and inverse environments separately.</Paragraph>
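The limited environment search described in this abstract can be pictured as a bounded, frequency-thresholded breadth-first traversal of the association net. The miniature net, depth bound, and cut-off below are invented for illustration; they are not the paper's 56,000-node data, and the paper's transmittance and path-count limits are reduced here to a depth bound and a frequency cut-off. A minimal sketch:

```python
from collections import defaultdict, deque

# Hypothetical miniature association net: net[stimulus][response] = frequency.
# The words and frequencies are illustrative only.
net = {
    "dog": {"cat": 40, "bone": 25, "bark": 20},
    "cat": {"dog": 35, "mouse": 30, "fur": 10},
    "mouse": {"cat": 20, "cheese": 45},
    "bone": {"dog": 15},
}

def inverse(net):
    """Invert the net so arcs run from response back to stimulus,
    supporting the inverse (associate-to-stimulus) search direction."""
    inv = defaultdict(dict)
    for stim, responses in net.items():
        for word, freq in responses.items():
            inv[word][stim] = freq
    return inv

def environment(net, start, max_depth=2, cutoff=20):
    """Collect the words reachable from `start` within `max_depth` steps,
    following only arcs whose associative frequency meets the cut-off."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        word, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt, freq in net.get(word, {}).items():
            if freq >= cutoff and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}
```

Raising `cutoff` or lowering `max_depth` shrinks the environment, which is the abstract's remedy for "ponderously large" subsets.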
    <Paragraph position="2"> Stochastic context-free languages as models of NL. A sentence is parsed first and transformed into a sentence derivation, and then this derivation, expressed as a string of results of &amp;quot;productions,&amp;quot; is coded by coding each production.</Paragraph>
    <Paragraph position="3"> For this coding procedure, it is shown that if the productions are divided into groups with the same left-hand side and if each production group is Huffman-coded, then the mean code length is less than the sum of the entropy and the mean number of steps of sentence derivations. Furthermore, under certain conditions this code becomes optimal in the sense that the mean code length and the entropy coincide, that is, there is no uniquely decodable code with shorter mean code length.</Paragraph>
    <Paragraph position="4">  A stochastic context-free grammar (scfg) is an automaton which stops when a sentence is generated. However, in investigating the information-theoretic properties of a long text it is convenient to use an scfg-based system which returns to the start symbol and begins the generation of another sentence as soon as it completes the generation of one sentence. The system becomes a Markov chain having the set of all derivations as the state space, which is called here the Markov chain associated with an scfg. It turns out that: 1) The Markov chain associated with an scfg is irreducible. 2) For the chain to be recurrent it is necessary and sufficient that the language generated by the scfg be a probability space. 3) For the chain to be positive recurrent, it is necessary and sufficient that the mean number of steps of the sentence derivations be finite. 4) It is well known that when a Markov chain is positive recurrent it has an invariant distribution and its entropy per step HI is defined. If an scfg satisfies certain conditions, we have HI = H(E)/(M(E) + 1).</Paragraph>
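The group-wise coding described in these two abstracts (divide productions by left-hand side, Huffman-code each group, code a derivation production by production) can be sketched as follows. The toy grammar, its probabilities, and the derivation format are invented for illustration; `huffman` is an ordinary textbook Huffman construction, not necessarily the paper's exact procedure:

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a Huffman code (symbol -> bit string) for a probability dict."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tie = count()  # tie-breaker so heap tuples never compare the dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

# Hypothetical stochastic CFG: productions grouped by left-hand side,
# each with its probability within the group.  Names are illustrative.
groups = {
    "S":  {"S->NP VP": 1.0},
    "NP": {"NP->Det N": 0.6, "NP->N": 0.3, "NP->NP PP": 0.1},
    "VP": {"VP->V NP": 0.7, "VP->V": 0.3},
}
codes = {lhs: huffman(g) for lhs, g in groups.items()}

def encode(derivation):
    """Code a derivation (a list of (lhs, production) pairs) by
    concatenating the per-group Huffman code of each production used."""
    return "".join(codes[lhs][prod] for lhs, prod in derivation)
```

Because each group is Huffman-coded separately, frequent productions within a group get short codes, which is what bounds the mean code length by the derivation entropy plus the mean derivation length.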
    <Paragraph position="5">  Beckmann's error-detecting code model for the structure of natural languages is used in a simple, efficient procedure for generating a large number of English sentences with simple grammatical structure and a high rate of word repetition--such as diagnostic messages of a compiler, messages in interactive systems, etc. The grammar described is implemented with a state table with 8 states: Article, Numeral, Adjective, Noun, Conjunction, Preposition, Verb, Auxiliary Verb. The syntactic type of the various words is specified by the location of the entry in the dictionary. Sentences are formed by supplying the routine with a sequence of pointers to dictionary entries; the routine itself checks whether this is a possible sentence, calculates the correct check morphemes, and puts a period at the end.</Paragraph>
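The state-table mechanism can be sketched as a finite-state check over the eight syntactic categories the abstract names. The transition table and the tiny dictionary below are invented (the abstract gives neither), and the check-morpheme calculation is omitted; only the pointer-sequence check and the final period are shown:

```python
# Hypothetical transition table over the abstract's eight categories;
# "END" marks states in which a sentence may legally finish.
TRANSITIONS = {
    "START":       {"Article", "Numeral", "Noun"},
    "Article":     {"Adjective", "Noun", "Numeral"},
    "Numeral":     {"Adjective", "Noun"},
    "Adjective":   {"Adjective", "Noun"},
    "Noun":        {"Verb", "Auxiliary Verb", "Conjunction", "Preposition", "END"},
    "Auxiliary Verb": {"Verb"},
    "Verb":        {"Article", "Noun", "Preposition", "END"},
    "Preposition": {"Article", "Noun"},
    "Conjunction": {"Article", "Noun"},
}

# As in the abstract, the dictionary position encodes the syntactic type.
DICTIONARY = [("the", "Article"), ("two", "Numeral"), ("big", "Adjective"),
              ("dogs", "Noun"), ("chase", "Verb"), ("cats", "Noun")]

def accepts(pointers):
    """Check that a sequence of dictionary pointers forms a possible sentence."""
    state = "START"
    for p in pointers:
        _, cat = DICTIONARY[p]
        if cat not in TRANSITIONS.get(state, set()):
            return False
        state = cat
    return "END" in TRANSITIONS.get(state, set())

def render(pointers):
    """Emit the sentence text with a period at the end, as the routine does."""
    return " ".join(DICTIONARY[p][0] for p in pointers) + "."
```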
    <Paragraph position="6">  A truth-definition (in the manner of Tarski) or something to the same effect must be a part of any adequate semantic theory. The syntax of a Montague grammar is a simultaneous recursive definition of all of the syntactic categories of the language. For every syntactic category there must be a unique corresponding semantic category, and for every syntactic rule that combines (operates on) phrases of categories A and B to produce a category C, there must be a unique semantic rule that operates on the corresponding semantic interpretations to give a semantic interpretation for the resulting phrase; that interpretation will be of the semantic category corresponding to the syntactic category C. Techniques involving labelled bracketing and &amp;quot;starred variables&amp;quot; enable the addition of transformations to Montague grammars.</Paragraph>
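The rule-to-rule requirement stated above can be pictured with a toy pairing: one syntactic rule (build a sentence from an NP and a VP by concatenation) matched by exactly one semantic rule (apply the VP's denotation to the NP's denotation). The lexicon and the tuple-style meaning representation are invented for illustration and are not Montague's intensional logic:

```python
def syn_sentence(np_phrase, vp_phrase):
    """Syntactic rule: combine an NP and a VP into a sentence."""
    return f"{np_phrase} {vp_phrase}"

def sem_sentence(np_meaning, vp_meaning):
    """The paired semantic rule: apply the VP denotation to the NP denotation,
    yielding an interpretation of the category corresponding to sentences."""
    return vp_meaning(np_meaning)

# Toy lexicon pairing each wording with its (invented) denotation.
lexicon = {
    "John":  "john",                          # an entity
    "walks": lambda subj: ("walk", subj),     # a function from entities
}

sentence = syn_sentence("John", "walks")
meaning = sem_sentence(lexicon["John"], lexicon["walks"])
```

The point of the pairing is compositionality: whatever the syntactic rule builds, the corresponding semantic rule interprets, with no rule on one side lacking a partner on the other.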
    <Paragraph position="7">  1 Language and Information Processing, Dominic W. Massaro
I. Introduction . . . 3
II. Information Processing . . . 5
III. Auditory Information Processing . . . 7
IV. Visual Information Processing . . . 21
V. Conclusion . . . 27
References . . . 27
Part II Speech Perception
2 Articulatory and Acoustic Characteristics of Speech Sounds, Lucinda Wilder
I. Introduction . . . 31
II. Production of Speech Sounds . . . 33
III. General Acoustic Properties of Speech Sounds . . . 36
IV. Articulation of Speech Sounds . . . 43
V. Occurrence of Speech Sounds . . . 46
VI. Vowel Phonemes of English . . . 51
VII. Coarticulation . . . 56
VIII. Consonant Phonemes of English . . . 57
benefits come from making the distinction. Plans stated in high-level programming languages are most appropriately seen as texts, since they do not seem to be either real-world procedures or procedures for handling natural language. As a general method of parsing natural language, expectation is radically defective unless it has some general and systematic capacity for attending to what it is reading. The important distinction between causes and reasons in the explanation of human behavior should have procedural and not only taxonomic reflection in an understanding system. Template-to-template (in preferential semantics) inference rules are: GOAL, CAUSE, IMPLIC. 
The CAUSE/GOAL distinction often reduces to no more than the temporal directionality of the rule. Real-world knowledge can be represented quite usefully at quite local levels in an understanding system and can function there as part of a general system of linguistic inference. The use of frames threatens to swamp us in large frame structures for relatively trivial matters, and it is not clear how frames are related to lower-level structures. We need multilevel representation. It is not at all obvious that a NL understanding system should be responsible for modeling general knowledge of the world.  Some of the arguments which have been given both for and against the use of natural languages in question-answering (QA) systems are discussed. The following systems are considered in evaluating the current level of QA system development: LSNLIS, REL, SHRDLU, REQUEST. There is a trade-off between syntactic and semantic complexity. A system with relatively simple syntactic capabilities must have complex semantic analysis procedures, while a system, such as REQUEST, with sophisticated syntax can produce underlying syntactic structures which directly reflect meaning without the need for &amp;quot;creative&amp;quot; interpretation. A brief comparison between processing times in LSNLIS and REQUEST is given.</Paragraph>
    <Paragraph position="8">  This theory of question answering is based on the SAM (Script Applier Mechanism) mechanism. In the interpretative phase it takes a question in Conceptual Dependency form and categorizes it in terms of particular question types: 1) why, 2) how, 3) yes or no, 4) occurrence, 5) component. Each question type corresponds to a specific form of CD representation. In the response phase the memory is searched for the answer; this may involve nothing more than simple information retrieval or it may entail inferring the answer using general knowledge of the world. The system tries for completeness in answering; a yes/no question will be answered with yes or no plus an account of that answer. Work is being done on a Generation-Selection paradigm in which each question generates a number of feasible answers (the problem of memory representation) and a selection procedure chooses among them (pure QA). Selection rules are presented and discussed.</Paragraph>
    <Paragraph position="9">  The 'text-structure-world-structure theory' (TeSWeST) is an empirically motivated, logic-oriented theory aiming at the grammatical description of a text as a complex sign (intensional-semantic description) and the assignment of the possible extensional interpretations to the intensional-semantically described text structure (extensional-semantic theory). The intensional-semantic and extensional-semantic descriptions are such that they also contain the description of the pragmatic aspects. The grammatical component of the TeSWeST is a generative transformational text grammar operating with linearly non-fixed canonical basic structures. The formation rule system of this grammar consists of a so-called communicative rule (R) expressed informally as: A communicative basis (TB) is a communicative predicate-complex: a communicator (C1) communicates (COMM) to a (potential)</Paragraph>
    <Paragraph position="11"> The elements of the normed implicit representation of the text intension are definienda in the lexicon, and the elements of the normed explicit representation of the text intension are definientes in the lexicon. The task of the extensional-semantic component is to assign possible extensional-semantic interpretations to the possible intensional-semantic representations.</Paragraph>
  </Section>
  <Section position="17" start_page="8540" end_page="8540" type="metho">
    <SectionTitle>
LINGUISTICS: METHODS
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="8540" end_page="8540" type="sub_section">
      <SectionTitle>
Linguistics and Artificial Intelligence
Petr Sgall
</SectionTitle>
      <Paragraph position="0"> Centre of Numerical Mathematics, Charles University, Prague Prague Bulletin of Mathematical Linguistics 24.9-33, 1975 (1) Winograd's approach to language has forced a reevaluation of the relationship between the theory of competence and the study of performance, pragmatics, etc., though it is still wise to acknowledge the description of language as a relatively independent enterprise. (2) The use of linguistic descriptions in AI provides a test for linguistic theories. (3) Winograd's &amp;quot;imperative form&amp;quot; of representing knowledge and semantics is more effective than one based entirely on deductive logic. Considerations of topic and focus, theme and rheme, functional sentence perspective, etc. are similarly &amp;quot;imperative&amp;quot; in their emphasis on the different uses which various items of information in a communication have. (4) Winograd's work, in his way of introducing new definitions into the semantic component, suggests that in man-machine communication the burden of learning the other participant's language may be shiftable from man to the computer. (5) The significance of linguistics for AI is connected with making programming languages more and more like NL's. (6) The study of tense and time and the study of negation allow us to see how linguistic meaning and non-linguistic content can be distinguished.</Paragraph>
      <Paragraph position="1">  Four procedures based on binary tree structures are presented along with PL/1 code. 1) Symmetric or inorder traversal (Knuth) exploits the powerful syntax of PL/1 in the use of modular code and recursion. 2) Explicit stack (Knuth). 3) &amp;quot;Threading&amp;quot; a tree (Perlis and Thornton) permits traversal without giving up space for either an implicit or an explicit stack. 4) This technique requires an estimate of an outer bound for the size of the list; by allocation of a fixed binary array at the outset the tree can be constructed by basing fresh nodes on progressively higher elements in the array.</Paragraph>
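The abstract's procedures are given in PL/1; as a sketch, the first two (recursive inorder traversal and the same traversal with an explicit stack) look like this in Python, with a minimal node type invented for the example:

```python
class Node:
    """Minimal binary tree node for the traversal sketches."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder_recursive(node, visit):
    """Procedure 1: symmetric (inorder) traversal by recursion."""
    if node is not None:
        inorder_recursive(node.left, visit)
        visit(node.key)
        inorder_recursive(node.right, visit)

def inorder_stack(root, visit):
    """Procedure 2: the same traversal with an explicit stack,
    trading the call stack for one the program manages itself."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:        # descend left, stacking ancestors
            stack.append(node)
            node = node.left
        node = stack.pop()             # visit the deepest unvisited node
        visit(node.key)
        node = node.right              # then traverse its right subtree
```

The threaded-tree variant (procedure 3) avoids even the explicit stack by storing successor links in otherwise-null right pointers; it is omitted here for brevity.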
      <Paragraph position="2">  strings, 3) statement format, 4) INPUT and OUTPUT, 5) simple programs. A simple program to count word frequencies in a text is described and discussed, and 12 examples of output from student programs are given.</Paragraph>
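The word-frequency program discussed here belongs to a PL/1 course; a minimal equivalent sketch (case-folded, punctuation stripped) can be written in a few lines of Python. The tokenization rule below is an assumption, not the course program's:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count word frequencies in a text, folding case and
    treating any run of letters (plus apostrophes) as a word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

freqs = word_frequencies("The cat saw the dog; the dog ran.")
```

`Counter.most_common()` then yields the frequency-ordered listing such a program typically prints.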
      <Paragraph position="3"> COMPUTATION: PROGRAMMING: LANGUAGES Revised Report on the Algorithmic Language ALGOL 68 A. van Wijngaarden, B. J. Mailloux, J. E. L. Peck, C. H. A. Koster, M. Sintzoff, C. H. Lindsey, L. G. L. T. Meertens, and R. G. Fisker, Eds.
0. Introduction . . . 8
PART I: Preliminary Considerations
1. Language and metalanguage . . . 17
2. The computer and the program . . . 35
PART II: Fundamental Constructions
3. Clauses . . . 53
4. Declarations, declarers and indicators . . . 66
5. Units . . . 77
PART III: Context Dependence
6. Coercion . . . 91
7. Modes and nests . . . 98
PART IV: Elaboration-independent constructions
8. Denotations . . . 108
9. Tokens and symbols . . . 113
PART V: Environment and Examples
10. Standard Environment . . . 124
11. Examples . . . 209
12. Glossaries . . . 218
COMPUTATION: PICTORIAL SYSTEMS Interactive Graphics on Intelligent Terminals in a Time-Sharing Environment W. K. Giloi and S. Savitt  Interactive graphics in a time-sharing environment should be organized in such a way that the user's activities are locally processed in order to avoid unacceptably long response times. 
On the other hand, the host computer must be kept informed about the user's actions and, conversely, the display file in the terminal has to be updated whenever the execution of the application program causes a change in the visual representation. In order to avoid the transmission of redundancy, the display file is decomposed into two intersecting parts such that the part in the host computer and the part in the terminal each contain only the locally required information. The necessary communication between both parts is maintained by an information module generated on the basis of a low-level intermediate language (L) and exchanged between computer and terminal. This leads to the notion of an abstract terminal programming system.</Paragraph>
      <Paragraph position="4">  A multilevel data structure has been developed to allow its use to construct graphical symbols on an automated drafting machine. The implemented logic allows a user to define several sets of alphanumeric characters and graphical symbols in terms of three levels of tables for creating and retrieving the needed line-segment strokes. It is believed that the same logic could be applied to provide a solution to the computer output of Chinese characters. To call any desired Chinese character as output, it is simply a matter of calling the character's legal designator within a particular library.</Paragraph>
      <Paragraph position="5">  Subroutines are written to generate each word or part of the word. On the input side, one can use a light-pen on the appropriate part of the screen to choose which part of the program one wants to go to, e.g., go to the next frame, delete this point, etc.</Paragraph>
      <Paragraph position="6"> A push of the designated function key switch will also convey information. Chernoff proposed the mapping of multidimensional variables onto features of faces. Mouth, nose, eyes, facial contour, etc., are modified in form and size to represent each vector of measurements. A graphics program was written using this idea and extending it to provide histogram comparison and interactive classification of individual cases.</Paragraph>
    </Section>
  </Section>
  <Section position="18" start_page="8540" end_page="8540" type="metho">
    <SectionTitle>
DOCUMENTATION
</SectionTitle>
    <Paragraph position="0"> Two Major Flaws in the CODASYL DDL 1973 and Proposed Corrections G. M. Nijssen Control Data Europe, 46, Avenue des Arts, B-1040, Brussels, Belgium Information Systems 1:115-132, 1975 In a schema written in CODASYL DDL 1973 it is syntactically correct to describe an entity either by declaring a data item in the record of the entry or by declaring a CODASYL set type in which the record, describing the entity, is a member record. This is a major flaw because extension of a database or integration of two existing databases will then lead to either reprogramming or inconsistency, or both. The flaw can be corrected by requiring that all attributes are represented as a data item in the (logical) schema. In the CODASYL DDL 1973, there are five places to optionally declare a record identifier, and four of these five places are not in the record but in the CODASYL set type. Declaring record identifiers therefore results in fairly complex and non-orthogonal declarations. This could be simplified by abandoning these five places and by introducing a record identifier clause in the record type entry. For integrity reasons it is necessary to require that at least one record identifier is declared in every record type entry. The previous two corrections will make it possible to design a CODASYL set selection clause still providing the same functional capabilities. The corrected DDL is functionally equivalent, yet offers more data independence, is simpler and more orthogonal. Examples are given.</Paragraph>
  </Section>
</Paper>