<?xml version="1.0" standalone="yes"?> <Paper uid="P80-1019"> <Title>Expanding the Horizons of Natural Language Interfaces</Title> <Section position="3" start_page="0" end_page="71" type="metho"> <SectionTitle> 2. Non-Literal Aspects of Communication </SectionTitle> <Paragraph position="0"> In this section we will discuss four human communication needs and the non-literal aspects of communication they have given rise to. The account here is based in part on work reported more fully in \[8, 9\]. Humans must deal with non-grammatical utterances in conversation simply because people produce them all the time. They arise from various sources: people may leave out or swallow words; they may start to say one thing, stop in the middle, and substitute something else; they may interrupt themselves to correct something they have just said; or they may simply make errors of tense, agreement, or vocabulary. For a combination of these and other reasons, it is very rare to see three consecutive grammatical sentences in ordinary conversation.</Paragraph> <Paragraph position="1"> Despite the ubiquity of ungrammaticality, it has received very little attention in the literature or from the implementers of natural-language interfaces. Exceptions include PARRY \[17\], COOP \[14\], and interfaces produced by the LIFER \[11\] system. Additional work on parsing ungrammatical input has been done by Weischedel and Black \[25\], and Kwasny and Sondheimer \[15\]. As part of a larger project on user interfaces \[1\], we (Hayes and Mouradian \[7\]) have also developed a parser capable of dealing flexibly with many forms of ungrammaticality.</Paragraph> <Paragraph position="2"> Perhaps part of the reason that flexibility in parsing has received so little attention in work on natural language interfaces is that the input is typed, and so the parsers used have been derived from those used to parse written prose. Speech parsers (see for example \[10\] or \[26\]) have always been much more flexible.
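A flexible parser of the kind described, one that recovers from misspellings and omitted words instead of rejecting the input, can be sketched minimally in Python. This is an illustrative toy, not the Hayes and Mouradian parser of \[7\]; the lexicon, command pattern, and two-edit correction threshold are all invented for the example.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# A toy lexicon; a real interface would use its full vocabulary.
LEXICON = {"delete", "show", "the", "message", "file"}

def correct(word):
    """Map a possibly misspelled word onto its closest lexicon entry."""
    if word in LEXICON:
        return word
    best = min(LEXICON, key=lambda w: edit_distance(word, w))
    # Accept the correction only if it is a near miss (one or two edits).
    return best if edit_distance(word, best) in (1, 2) else word

def flexible_parse(tokens, pattern):
    """Match corrected tokens against a command pattern, tolerating an
    omitted function word (e.g. a dropped 'the')."""
    fixed = [correct(t) for t in tokens]
    remaining = list(fixed)
    for want in pattern:
        if remaining and remaining[0] == want:
            remaining.pop(0)
        elif want == "the":       # the user omitted a function word
            continue
        else:
            return None           # a strict parser would reject sooner
    return fixed if not remaining else None

# "delte message" is both misspelled and missing "the", yet still parses.
print(flexible_parse("delte message".split(), ["delete", "the", "message"]))
```

The real systems cited handle far more variation (restarts, transpositions, fragments); the point is only that a parser can prefer a near-miss interpretation over outright failure.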
Prose is normally quite grammatical simply because the writer has had time to make it grammatical. The typed input to a computer system is produced in &quot;real time&quot; and is therefore much more likely to contain errors or other ungrammaticalities.</Paragraph> <Paragraph position="3"> The listener at any given turn in a conversation does not merely decode or extract the inherent &quot;meaning&quot; from what the speaker said. Instead, he interprets the speaker's utterance in the light of the total available context (see, for example, Hobbs \[13\], Thomas \[24\], or Wynn \[27\]). In cooperative dialogues, and computer interfaces normally operate in a cooperative situation, this contextually determined interpretation allows the participants considerable economies in what they say: substituting pronouns or other anaphoric forms for more complete descriptions, not explicitly requesting actions or information that they really desire, omitting participants from descriptions of events, and leaving unsaid other information that will be &quot;obvious&quot; to the listener because of the context shared by speaker and listener. In less cooperative situations, the listener's interpretations may be other than the speaker intends, and speakers may compensate for such distortions in the way they construct their utterances.</Paragraph> <Paragraph position="4"> While these problems have been studied extensively in more abstract natural language research (for just a few examples see \[4, 5, 16\]), little attention has been paid to them in more applied language work. The work of Grosz \[6\] and Sidner \[21\] on focus of attention and its relation to anaphora and ellipsis stands out here, along with work done in the COOP \[14\] system on checking the presuppositions of questions with a negative answer. In general, contextual interpretation covers most of the work in natural language processing, and subsumes numerous currently intractable problems.
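One of the economies of expression just described, ellipsis, can be given a minimal computational illustration: a fragmentary follow-up request is completed by filling its missing slots from the previous request. The slot-based representation and the interpret function below are assumptions made for the example, not a mechanism proposed here; full contextual interpretation is far harder than this.

```python
# Slots of the most recently interpreted request form the context.
last_request = {}

def interpret(request):
    """Complete an elliptical request from the context of the last one.

    Slots present in the new request override the old ones; slots the
    user left unsaid are carried over from the previous request.
    """
    merged = {**last_request, **request}
    last_request.update(merged)
    return merged

print(interpret({"verb": "list", "object": "messages", "from": "Smith"}))
print(interpret({"from": "Jones"}))   # elliptical: "what about Jones?"
```

The second call supplies only the changed slot, yet is interpreted as a full "list messages from Jones" request, mirroring how a listener fills in what the speaker leaves unsaid.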
It is only tractable in natural language interfaces because of the tight constraints provided by the highly restricted worlds in which they operate.</Paragraph> <Paragraph position="5"> Just as in any other communication across a noisy channel, there is always a basic question in human conversation of whether the listener has received the speaker's utterance correctly. Humans have evolved robust communication conventions for performing such checks with considerable, though not complete, reliability, and for correcting errors when they occur (see Schegloff \[20\]). Such conventions include: the speaker assuming an utterance has been heard correctly unless the reply contradicts this assumption or there is no reply at all; the speaker trying to correct his own errors himself; the listener incorporating his assumptions about a doubtful utterance into his reply; the listener asking explicitly for clarification when he is sufficiently unsure.</Paragraph> <Paragraph position="6"> This area of robust communication is perhaps the non-literal aspect of communication most neglected in natural language work. Just a few systems such as LIFER \[11\] and COOP \[14\] have paid even minimal attention to it. Interestingly, it is perhaps the area in which the new technology mentioned above has the most to offer, as we shall see.</Paragraph> <Paragraph position="7"> Finally, the spoken part of a human conversation takes place over what is essentially a single shared channel. In other words, if more than one person talks at once, no one can understand anything anyone else is saying. There are marginal exceptions to this, but by and large reasonable conversation can only be conducted if just one person speaks at a time. Thus people have evolved conventions for channel sharing \[19\], so that people can take turns to speak. Interestingly, if people are put in new communication situations in which the standard turn-taking conventions do not work well,
they appear quite able to evolve new conventions \[3\].</Paragraph> <Paragraph position="8"> As noted earlier, computer interfaces have sidestepped this problem by making the interaction take place over a half-duplex channel somewhat analogous to the half-duplex channel inherent in speech, i.e. alternate turns at typing on a scroll of paper (or scrolled display screen). However, rather than providing flexible conventions for changing turns, such interfaces typically brook no interruptions while they are typing, and then when they are finished insist that the user type a complete input with no feedback (apart from character echoing), at which point the system takes over the channel again.</Paragraph> <Paragraph position="9"> In the next section we will examine how the new generation of interface technology can help with some of the problems we have raised.</Paragraph> </Section> <Section position="4" start_page="71" end_page="72" type="metho"> <SectionTitle> 3. Incorporating Non-Literal Aspects of </SectionTitle> <Paragraph position="0"> Communication into User Interfaces If computer interfaces are ever to become cooperative and natural to use, they must incorporate non-literal aspects of communication. My main point in this section is that there is no reason they should incorporate them in a way directly imitative of humans: so long as they are incorporated in a way that humans are comfortable with, direct imitation is not necessary; indeed, direct imitation is unlikely to produce satisfactory interaction. Given the present state of natural language processing and artificial intelligence in general, there is no prospect in the foreseeable future that interfaces will be able to emulate human performance, since this depends so much on bringing to bear larger quantities of knowledge than current AI techniques are able to handle.
Partial success in such emulation is only likely to raise false expectations in the mind of the user, and when these expectations are inevitably crushed, frustration will result. However, I believe that by making use of some of the new technology mentioned earlier, interfaces can provide very adequate substitutes for human techniques for non-literal aspects of communication; substitutes that capitalize on capabilities of computers that are not possessed by humans, but that nevertheless will result in interaction that feels very natural to a human.</Paragraph> <Paragraph position="1"> Before giving some examples, let us review the kind of hardware I am assuming. The key item is a bit-map graphics display capable of being filled with information very quickly. The screen can be divided into independent windows to which the system can direct different streams of output independently. Windows can be moved around on the screen, overlapped, and popped out from under a pile of other windows. The user has a pointing device with which he can position a cursor to arbitrary points on the screen, plus, of course, a traditional keyboard. Such hardware exists now and will become increasingly available as powerful personal computers such as the PERQ \[18\] or LISP machine \[2\] come onto the market and start to decrease in price. The examples of the use of such hardware which follow are drawn in part from our current experiments in user interface research \[1, 7\] on similar hardware.</Paragraph> <Paragraph position="2"> Perhaps the aspect of communication that can receive the most benefit from this type of hardware is robust communication. Suppose the user types a non-grammatical input to the system which the system's flexible parser is able to recognize if, say, it inserts a word and makes a spelling correction.
Going by human convention, the system would either have to ask the user to confirm explicitly whether its correction was correct, to cleverly incorporate its assumption into its next output, or just to assume the correction without comment. Our hypothetical system has another option: it can alter what the user just typed (possibly highlighting the words that it changed). This achieves the same effect as the second option above, but substitutes a technological trick for human intelligence. Again, if the user names a person, say &quot;Smith&quot;, in a context where the system knows about several Smiths with different first names, the human options are either to incorporate a list of the names into a sentence (which becomes unwieldy when there are many more than three alternatives) or to ask for the first name without giving alternatives. A third alternative, possible only in this new technology, is to set up a window on the screen with an initial piece of text followed by a list of alternatives (twenty can be handled quite naturally this way). The user is then free to point at the alternative he intends, a much simpler and more natural alternative than typing the name, although there is no reason why this input mode should not be available as well in case the user prefers it.</Paragraph> <Paragraph position="3"> As mentioned in the previous section, contextually based interpretation is important in human conversation because of the economies of expression it allows. There is no need for such economy in an interface's output, but the human tendency to economy in this matter is something that technology cannot change. The general problem of keeping track of focus of attention in a conversation is a difficult one (see, for example, Grosz \[6\] and Sidner \[22\]), but the type of interface we are discussing can at least provide a helpful framework in which the current focus of attention can be made explicit.
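The selection-window scheme for the ambiguous &quot;Smith&quot; might be sketched as follows. The directory contents and the resolve_name interface are invented for illustration, and the user's pointing at an entry in the window is simulated here by an index argument.

```python
# A hypothetical directory of known people: (first name, surname) pairs.
DIRECTORY = [
    ("John", "Smith"), ("Mary", "Smith"), ("Ann", "Smith"),
    ("Paul", "Jones"),
]

def resolve_name(surname, pick=None):
    """Resolve a surname mentioned by the user.

    Returns the unique match when there is one; otherwise returns the
    list of alternatives to display in a selection window. `pick`
    simulates the user pointing at one of the displayed alternatives.
    """
    matches = [p for p in DIRECTORY if p[1] == surname]
    if len(matches) == 1:
        return matches[0]
    if pick is not None:               # the user pointed at an entry
        return matches[pick]
    return matches                     # show these in a selection window

print(resolve_name("Jones"))           # unambiguous, no window needed
print(resolve_name("Smith"))           # ambiguous: alternatives to display
print(resolve_name("Smith", pick=1))   # the user points at the second entry
```

As the text notes, this scales to a couple of dozen alternatives far more gracefully than embedding the list in a sentence, while typed input can remain available for users who prefer it.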
Different loci of attention can be associated with different windows on the screen, and the system can indicate what it thinks is the current focus of attention by, say, making the border of the corresponding window different from all the rest. Suppose in the previous example that at the time the system displays the alternative Smiths, the user decides that he needs some other information before he can make a selection. He might ask for this information in a typed request, at which point the system would set up a new window, make it the focused window, and display the requested information in it. At this point, the user could input requests to refine the new information, and any anaphora or ellipsis he used would be handled in the appropriate context.</Paragraph> <Paragraph position="4"> Representing contexts explicitly with an indication of what the system thinks is the current one can also prevent confusion. The system should try to follow a user's shifts of focus automatically, as in the above example. However, we cannot expect a system of limited understanding always to track focus shifts correctly, and so it is necessary for the system to give explicit feedback on what it thinks the shift was. Naturally, this implies that the user should be able to change focus explicitly as well as implicitly (probably by pointing to the appropriate window).</Paragraph> <Paragraph position="5"> Explicit representation of loci can also be used to bolster a human's limited ability to keep track of several independent contexts. In the example above, it would not have been hard for the user to remember why he asked for the additional information and to return and make the selection after he had received that information. With many more than two contexts, however, people quickly lose track of where they are and what they are doing.
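A minimal sketch of this kind of explicit focus tracking, under my own assumption of Window and FocusManager objects (they are not part of any system described here): each window carries its own context, opening a window shifts the focus to it, and the user can shift focus back explicitly, as by pointing.

```python
class Window:
    """A screen window holding one context of the conversation."""
    def __init__(self, title, entities):
        self.title = title
        self.entities = entities      # entities mentioned here, most recent last

class FocusManager:
    def __init__(self):
        self.windows = []
        self.current = None           # the window whose border is highlighted

    def open_window(self, title, entities):
        w = Window(title, entities)
        self.windows.append(w)
        self.current = w              # a newly opened window takes the focus
        return w

    def set_focus(self, window):
        """Explicit focus shift, e.g. the user points at another window."""
        self.current = window

    def resolve(self):
        """Resolve an anaphor like 'it' against the focused window's context."""
        return self.current.entities[-1] if self.current.entities else None

mgr = FocusManager()
smiths = mgr.open_window("Smiths", ["John Smith", "Mary Smith"])
mgr.open_window("Phone records", ["the extension number"])
print(mgr.resolve())      # resolved in the newly focused window's context
mgr.set_focus(smiths)     # the user points back at the first window
print(mgr.resolve())      # now resolved against the Smiths window
```

The highlighted border in the text corresponds to `current` here: the system's feedback about which context it will use, which the user can override by pointing.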
Explicit representation of all the possibly active tasks or contexts can help a user keep things straight.</Paragraph> <Paragraph position="6"> All the examples of how sophisticated interface hardware can help provide non-literal aspects of communication have depended on the ability of the underlying system to produce possibly large volumes of output rapidly at arbitrary points on the screen. In effect, this allows the system multiple output channels independent of the user's typed input, which can still be echoed even while the system is producing other output. Potentially, this frees interaction over such an interface from any turn-taking discipline. In practice, some will probably be needed to avoid confusing the user with too many things going on at once, but it can probably be looser than that found in human conversations.</Paragraph> <Paragraph position="7"> As a final point, I should stress that natural language capability is still extremely valuable for such an interface. While pointing input is extremely fast and natural when the object or operation that the user wishes to identify is on the screen, it obviously cannot be used when the information is not there. Hierarchical menu systems, in which the selection of one item in a menu results in the display of another more detailed menu, can deal with this problem to some extent, but the descriptive power and conceptual operators of natural language (or an artificial language with similar characteristics) provide greater flexibility and range of expression. If the range of options is large but well discriminated, it is often easier to specify a selection by description than by pointing, no matter how cleverly the options are organized.</Paragraph> </Section> </Paper>