File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/84/p84-1098_abstr.xml
Size: 5,607 bytes
Last Modified: 2025-10-06 13:46:13
<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1098"> <Title>MACHINE-READABLE COMPONENTS IN A VARIETY OF INFORMATION-SYSTEM APPLICATIONS</Title> <Section position="1" start_page="0" end_page="463" type="abstr"> <SectionTitle> MACHINE-READABLE COMPONENTS IN A VARIETY OF INFORMATION-SYSTEM APPLICATIONS </SectionTitle> <Paragraph position="0"> Components of the machine-readable dictionary can be applied in a number of information systems. The most direct applications of the kind are in wordprocessing or in &quot;writing-support&quot; systems built on a wordprocessing base. However, because a central function of any dictionary is in fact data verification, there are other proposed applications in communications and data storage and retrieval systems.</Paragraph> <Paragraph position="1"> Moreover, the complete interrelational electronic dictionary is in some sense the model of the language; and there are, accordingly, additional implications for language-based information search and retrieval.</Paragraph> <Paragraph position="2"> In regard to wordprocessing, the electronic lexicon can serve as the base for spelling verification (in which the computer detects many spelling or typographical errors) and spelling correction (in which the computer offers corrections to the errors it has identified). Because it is possible to develop algorithms that permit the computer to calculate the chances that the single best alternative it offers is actually correct, this substitution can in many cases be made automatically. 
At this point in the development of such systems, it is wise to flag such automatic corrections for inspection by the operator.</Paragraph> <Paragraph position="3"> At the present time, these processes generally depend upon the application of strict frequency measures, which permit the lexicon to be reduced to small-machine proportions and thereby reduce the possibility of a false hit--the passing of a misspelled common word that happens to coincide in orthography with a legitimate but rare word. As our ability to draw cognitive information from text increases, and as available memory increases, such limits can be abandoned.</Paragraph> <Paragraph position="4"> Truncation of the lexicon for other specific applications can be considered. It is possible, for example, to shape the lexicon to reflect a children's vocabulary and thereby to develop spelling correction and other writing aids for the early educational years on a very small machine base. It is also possible to shape the lexicon to the needs of the educated adult user, for whom information about common words is unnecessary, and thereby to provide an exceptionally rich resource about &quot;difficult&quot; words within small-machine memory for on-line access to spelling, definition, and pronunciation.</Paragraph> <Paragraph position="5"> Configuring the lexicon pyramidally by frequency, including all words of high frequency, seems an inevitable model to us now, but it is of course a kind of historical accident.</Paragraph> <Paragraph position="6"> As many of these comments already make clear, even if one resolves to work within the linguistic bounds of the ordinary print dictionary, there are differences in the demands placed upon the dictionary by print applications and those arising out of electronic applications. 
It is a matter of judgment or taste for the print lexicographer not to include geographic and biographic terms in the lexicon, but the electronic lexicographer does not have that latitude.</Paragraph> <Paragraph position="7"> Access to on-line dictionaries can be by the standard alphabetic means or by well-developed phonetic algorithms (which solve the conundrum of needing to know spelling before being able to find spelling) or by definition (the reverse dictionary). As electronic citation for words and senses is done on the basis of machine scans of print-composition tapes and even of voice scans, sensitive subject coding should permit the development of lexicons tailored to the user profile, with attendant benefits in comprehensiveness and economy of memory. One can conceive of dictionaries that monitor their own use and respond by offering only unknown information to the individual user.</Paragraph> <Paragraph position="8"> The dictionary that contains synonymy is a resource in the construction of electronic synonym generators, of which there is at least one model that returns synonyms in the inflections of the source words, including phrasal synonyms, taking precise account of all irregularities in doing so. Presentation of synonyms is useful for &quot;knowledge workers&quot; but not for clerical workers.</Paragraph> <Paragraph position="9"> If usage information is included in the dictionary, then it is deliverable as a discrete electronic product. The most direct key to specific usage guidance is by &quot;trigger&quot; words or phrases that call up guidance information for the operator, but much more sophisticated implementations are possible when programming addresses grammar and syntax.</Paragraph> <Paragraph position="10"> In large-system management, where accuracy of alpha data is a consideration, the machine dictionary can be the base or one of the bases for verification and correction of data streams in communication or of stored data. 
What I have called the complete interrelational dictionary--fully coded to reflect the range of significant linguistic information--will serve as the base for retrieving information by meaning rather than mechanics.</Paragraph> </Section> </Paper>