<?xml version="1.0" standalone="yes"?> <Paper uid="C82-2023"> <Title>COLLOCATIONAL GRAMMAR AS A MODEL FOR HUMAN-COMPUTER INTERACTION</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> COLLOCATIONAL GRAMMAR AS A MODEL FOR HUMAN-COMPUTER INTERACTION </SectionTitle> <Paragraph position="0"> Contrary to the long-held belief of transformational grammarians for communication in general, the majority of natural language sentences which people actually use in communicating with a computer, in an unconstrained mode, are not novel. As Thompson and Thompson 1981:660 observe: &quot;monotony of structure is the rule rather than the exception in human-computer communication.&quot; Thompson 1981:41 reports in her study of such communications that 75 percent of the queries were wh-questions, 19 percent were commands, 5 percent were statements, and 1 percent were yes/no questions.</Paragraph> <Paragraph position="1"> The repetitive feature of natural language is not a new concept. Similar observations have been made before. Damerau 1971 used collocation of lexical items as the basis for a Markov model in an experiment for text generation. Becker claims that &quot;the wonderful feats of the human intellect ... are based as much on memorization as on any impromptu problem-solving&quot; (1975:62). He posits a phrasal lexicon consisting of six major categories of lexical phrases by which we &quot;stitch together swatches of text that we have heard before; productive processes have the secondary role of adapting the old phrases to the new situations&quot; (1975:60). All of these approaches to natural language data rely heavily on the observation that many lexical items tend to co-occur. This surface co-occurrence is the result of what may at times be complicated syntactic and semantic interrelations of language units. Unfortunately, a systematic accounting of these interrelations has not been achieved in any linguistic theory.
The thrust of our approach is that the more of language that can be handled lexically, the easier human language will be to model.</Paragraph> <Paragraph position="2"> Actual data on the frequency of lexical collocation are very sparse. A study of word sequences in PANALOG text has shown a surprising amount of repetition of word sequences (Bienstock and Smith in preparation). (PANALOG is a system for passing messages among small groups in computer conferencing with telemail and calendar features. See Housman 1979. The data are of human-human communication and not human-computer communication and are thematically restricted. They therefore resemble Damerau's data.) A study of parts of the Brown English Corpus has been undertaken in order to get less thematically homogeneous material. In addition, Wizard of Oz experiments with unconstrained human-computer input will begin soon at GTE Laboratories in order to gather the more relevant human-computer data.</Paragraph> <Paragraph position="3"> A PANALOG text of 16,133 words was chosen for study. The longest string which occurred more than once was a seven-word quote from Nietzsche. Two six-word strings occurred twice, and at length five, one string occurred three times and thirteen were repeated twice.</Paragraph> <Paragraph position="4"> An interesting feature of these distributions is that the number of hapaxes (those strings occurring only once at a given string length) reaches a peak at length three. This is a revealing measure of the amount of repetition in a text of this length. In particular, recurring two-word strings account for 40.8 percent of the running text and recurring three-word strings comprise 8.1 percent of the text. The basic assumption of the frequent occurrence of lexical collocation in natural language texts, especially in human-computer communication, is the basis for the development of a new type of natural language processor.
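The kind of repetition profile described above (recurring strings and hapaxes tallied by string length) can be sketched as follows. This is our own minimal reconstruction, not the procedure Bienstock and Smith used; in particular, the tokenization (lowercased whitespace split) and the function names are assumptions.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count every contiguous word string of length n in the token list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def repetition_profile(text, max_len=7):
    """For each string length 1..max_len, return (number of recurring
    strings, number of hapaxes, i.e. strings occurring exactly once)."""
    words = text.lower().split()
    profile = {}
    for n in range(1, max_len + 1):
        counts = ngram_counts(words, n)
        recurring = sum(1 for c in counts.values() if c > 1)
        hapaxes = sum(1 for c in counts.values() if c == 1)
        profile[n] = (recurring, hapaxes)
    return profile
```

On a corpus like the 16,133-word PANALOG text, a profile of this shape is what yields observations such as the hapax count peaking at length three.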
Ford, 1981, has constructed a natural language processing system for database updating, retrieval, and manipulation, which relies critically on the observation that real users tend to employ a very limited set of lexical string types in querying databases.</Paragraph> <Paragraph position="5"> The Ford natural language processor consists of a two-stage reduction algorithm for translating natural language inputs into basic functions which are then used to perform the query. The first stage of the reduction changes the input words to meaning representations using a list of lexical items and a meaning correlate list. The second stage takes as input strings of these meaning correlates and changes them into basic functions.</Paragraph> <Paragraph position="6"> 409 numeric representations for words mapped down to 132 unique meanings, and 1328 canonical sentence vectors mapped down to 19 functions. This two-stage reduction scheme worked efficiently enough to respond to 93.8 percent of the 1697 input queries, including ungrammatical ones from inexperienced users, with a response time of 1.5 seconds, operating in an environment of 90K 8-bit bytes. This compares very favorably to Thompson 1981, where only 67.7 percent of REL queries were correctly parsed, with an average response time of 10 seconds.</Paragraph> <Paragraph position="7"> (Space requirements were not reported.) Similarly, Damerau 1981 and Patrick 1981 report a success rate for TQA of 65.1 percent of inputs correctly parsed, with the time required to process a sentence typically being 10 seconds.</Paragraph> <Paragraph position="8"> The reason the system works so well in terms of accuracy, speed, and small storage requirements is the two-stage reduction technique which, in turn, is based on the fact that a great many inputs in human-computer communication are repetitious syntactically, semantically, and lexically.
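A two-stage reduction of this general shape can be sketched in a few lines. The lexicon, the correlate names, and the function table below are invented for illustration; Ford's actual tables (the 409 word representations, 132 meanings, 1328 canonical vectors, and 19 functions) are not given in the text.

```python
# Stage 1 table: surface words mapped to meaning correlates.
# (Entries here are hypothetical, chosen only to make the example run.)
LEXICON = {
    "show": "DISPLAY", "list": "DISPLAY", "display": "DISPLAY",
    "all": "QUANT_ALL", "every": "QUANT_ALL",
    "employees": "ENTITY_EMP", "staff": "ENTITY_EMP",
}

# Stage 2 table: canonical correlate strings mapped to basic functions.
FUNCTIONS = {
    ("DISPLAY", "QUANT_ALL", "ENTITY_EMP"): "select_all_employees",
}

def reduce_query(query):
    """Stage 1: map input words to meaning correlates.
    Stage 2: map the correlate string to a basic function,
    or None if the string matches no canonical vector."""
    correlates = tuple(LEXICON[w] for w in query.lower().split() if w in LEXICON)
    return FUNCTIONS.get(correlates)
```

Because both stages are table lookups rather than full parses, many surface variants ("Show all employees", "List every staff") collapse onto the same correlate string and the same function, which is one way a system of this kind can stay fast and small while tolerating ungrammatical input.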
Repetition is a principal characteristic of human-computer communication.</Paragraph> </Section></Paper>