<?xml version="1.0" standalone="yes"?> <Paper uid="J86-4002"> <Title>REFERENCE IDENTIFICATION AND REFERENCE IDENTIFICATION FAILURES</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 MISCOMMUNICATION </SectionTitle> <Paragraph position="0"> People must and do manage to resolve lots of (potential) miscommunication in everyday conversation. Much of it seems to be resolved subconsciously - with the listener unconcerned that anything is wrong. Other miscommunication is resolved with the listener actively deleting or replacing information in the speaker's utterance until it fits the current context. Sometimes this resolution is postponed until the questionable part of the utterance is actually needed. Still, when all these fail, the listener can ask the speaker to clarify what was said. 3 In this section we present evidence that people do miscommunicate and yet they often manage to repair reference failures. We look at specific forms of miscommunication and describe ways to detect them. We highlight relationships between different miscommunication problems and demonstrate ways for resolving some of them. The different kinds of miscommunication we present directly motivate the need by listeners for many of the types of knowledge we describe in Section 3.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 2.1 CAUSES OF MISCOMMUNICATION </SectionTitle> <Paragraph position="0"> This section motivates a paradigm for the kinds of conversation' that we studied and points out places in the paradigm that leave room for miscommunication.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> EFFECTS OF THE STRUCTURE OF TASK-ORIENTED DIALOGUES </SectionTitle> <Paragraph position="0"> Task-oriented conversations have a specific goal to be achieved: the performance of a task (e.g., the air compressor assembly in Grosz (1977)). 
The participants in the dialogue can have the same skill level and they can work together to accomplish the task; or one of them, the expert, could know more and could direct the other, the apprentice, to perform the task. We have concentrated primarily on the latter case - due to the protocols that we examined - but many of our observations can be generalized to the former case.</Paragraph> <Paragraph position="1"> The viewpoints of the expert and apprentice differ greatly in apprentice-expert exchanges. The expert, having an understanding of the functionality of the elements in the task, has more of a feel for how the elements work together, how they go together, and how the individual elements can be used. The apprentice normally has no such knowledge and must base his decisions on perceptual features such as shape (Grosz 1981). These differences can lead to problems.</Paragraph> <Paragraph position="2"> The structure of the task affects the structure of the dialogue (Grosz 1977), particularly through the center of attention of the expert and apprentice during the accomplishment of each step of the task. The common center of attention of the dialogue participants is called the focus (Grosz 1977, Reichman 1978, Sidner 1979). Shifts in focus correspond to shifts between the tasks and subtasks; e.g., the objects in a task and the subpieces of each object. Focus and focus shifts are governed by many rules (Grosz 1977, Reichman 1978, Sidner 1979).</Paragraph> <Paragraph position="3"> Computational Linguistics, Volume 12, Number 4, October-December 1986 277 Bradley A. Goodman Reference Identification and Reference Identification Failures Confusion may result when expected shifts do not take place. 
For example, if the expert changes focus to some object but never bothers to talk about the object reasonably soon after its introduction (i.e., between the time of its introduction and its use, without digressing in a well-structured way in between (see Reichman 1978)), or never discusses its subpieces (such as an obvious attachment surface), then the apprentice may become confused, leaving him likely to misunderstand further utterances. The reverse influence between focus and objects can lead to trouble, too. A shift in focus by the expert that does not have a manifestation in the apprentice's world will also perplex the apprentice.</Paragraph> <Paragraph position="4"> Focus also influences how descriptions are formed (Grosz 1981, Appelt 1981). The level of detail required in a description depends directly on the elements currently in focus. If the object to be described is similar to other elements in focus, the expert must be more specific in the formulation of the description or may consider shifting focus away from the confusing objects.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 INSTANCES OF MISCOMMUNICATION </SectionTitle> <Paragraph position="0"> Figure 2 outlines some of the ways people get confused during a conversation. These instances were derived from analyzing the water pump protocols. We only discuss referent confusion in this paper. The other forms of confusion - Action, Goal, and Cognitive Load - are described in Goodman (1982, 1984). The confusions themselves, coupled with the description at the end of this section on how to recognize when one of them is occurring and the knowledge people use to perform reference described in Section 3, provide motivation for the use of the algorithm outlined in Section 4 as a means for repairing communication problems.</Paragraph> <Paragraph position="1"> We illustrate here many of the confusions in the taxonomy through numerous excerpts. 
Each excerpt has marked in parentheses the modality of communication that was used in the excerpt (face-to-face, over the telephone, and so forth). A description about the collection of these excerpts can be found in Cohen (1984). Each bracketed portion of the excerpt explains what was occurring at that point in the dialogue.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> ERRONEOUS SPECIFICITY </SectionTitle> <Paragraph position="0"> A speaker can be over- or underspecific in his descriptions (which violates Grice's (1975) maxim of quantity). Such descriptions are a form of erroneous specificity that can lead to mistakes on the part of the listener even though, technically, nothing is wrong with the description.</Paragraph> <Paragraph position="1"> A request is overspecific if extra details are given that seem obvious to the listener (Grosz 1978). Since the listener would not expect the speaker to provide him with obvious details, the listener might become confused; thinking that he had done something incorrectly as the task seemed easier than the one apparently described by the speaker. 4 For example, in Excerpt 2, E's description of the bubbled piece (i.e., the AIRCHAMBER) is overspecific because it supplies many more features than needed to identify the piece. The extra description in Lines 15 to 17 confused the listener, who appeared to have correctly identified the piece by Line 13 but ended up taking the wrong one when the expert kept adding more details. See Excerpt 10 in the section on bad analogies for other related examples of overspecificity.</Paragraph> <Paragraph position="2"> Excerpt 2 (Telephone) E: 1. Okay? 2. Now you have two devices that 3. are clear plastic.</Paragraph> <Paragraph position="3"> \[A picks up MAINTUBE and SPOUT\] A: 4. Okay.</Paragraph> <Paragraph position="4"> E: 5. One of them has two openings 6. 
on the outside with threads on 7. the end, and its about five 8. inches long.</Paragraph> <Paragraph position="5"> Figure 2. A taxonomy of confusions (top-level categories: Referent Confusion, Action Confusion, Goal Confusion, Cognitive Load Confusion).</Paragraph> <Paragraph position="6"> 17. Both of these are tubular.</Paragraph> <Paragraph position="7"> \[A puts down SPOUT\] A: 18. Okay.</Paragraph> <Paragraph position="8"> 19. not the bent one.</Paragraph> <Paragraph position="9"> Ambiguous descriptions are underspecified and can cause confusion about the referent. Excerpt 3 below illustrates a case where the speaker's description is underspecified - it does not provide enough detail to prune the set of possible referents to one.</Paragraph> <Paragraph position="10"> Excerpt 3 (Face-to-Face) E: 1. And now take the little red 2. peg, \[A takes PLUG\] 3. Yes, 4. and place it in the hole at the 5. green end, \[A starts to put PLUG into OUTLET2 of MAINTUBE\] 6. no 7. the-in the green thing</Paragraph> <Paragraph position="12"> A: 8. Okay.</Paragraph> <Paragraph position="13"> In Lines 4 and 5, E describes where to place the peg by giving spatial information. Since the location is given relative to another location by &quot;in the hole at the green end,&quot; it defines a region where the peg might go instead of a specific location. In this particular case, there are three possible holes to choose from that are near the green end. The listener chooses one - the wrong one - and inserts the peg into it. 
Because this dialogue took place face to face, E is able to correct the ambiguity in Lines 6 and 7.</Paragraph> <Paragraph position="14"> An underspecified description can be imprecise in many possible ways.</Paragraph> <Paragraph position="15"> * A description may consist of features that do not readily apply or that are inappropriate in the domain. In Line 3, Excerpt 4, the feature &quot;funny&quot; has no meaning to the listener here. It is not until E provides a fuller description in Lines 5 to 8 that A is able to select the proper piece.</Paragraph> <Paragraph position="16"> * It may use imprecise feature values. For example, one could use an imprecise head noun coupled with few or no feature values (and context alone does not necessarily suffice to distinguish the object). In Excerpt 5, Lines 8 and 9, &quot;attachment&quot; is imprecise because all objects in the domain are attachable parts. The expert's use of &quot;attachment&quot; was most likely to signal the action the apprentice can expect to take next. The use of the feature value &quot;clear&quot; provides little benefit either because three clear, unused parts exist. The size descriptor &quot;little&quot; prunes this set of possible referents to two contenders.</Paragraph> <Paragraph position="17"> Another use of imprecise feature values occurs when enough feature values are provided but at least one value is too imprecise. In Excerpt 6, Line 3, the use of the attribute value &quot;rounded&quot; to describe the shape does not sufficiently reduce the set of four possible referents (though, in this particular instance, A correctly identifies it) because the term is applicable to numerous parts in the domain. 5 A more precise shape descriptor such as &quot;bell-shaped&quot; or &quot;cylindrical&quot; would have been more beneficial to the listener.</Paragraph> <Paragraph position="18"> Excerpt 4 (Telephone) E: 1. All right.</Paragraph> <Paragraph position="19"> 2. 
Now.</Paragraph> <Paragraph position="20"> 3. There's another funny little 4. red thing, a \[A is confused, examines both NOZZLE and SLIDEVALVE\] 5. little teeny red thing that's 6. some-should be somewhere on 7. the desk, that has um-there's 8. like teeth on one end.</Paragraph> <Paragraph position="21"> \[A takes SLIDEVALVE\] A: 9. Okay.</Paragraph> <Paragraph position="22"> E: 10. It's a funny-loo-hollow, 11. hollow projection on one end 12. and then teeth on the other.</Paragraph> <Paragraph position="23"> Excerpt 5 (Teletype) E: 1. take the red thing with the 2. prongs on it 3. and fit it onto the other hole 4. of the cylinder 5. so that the prongs are 6. sticking out A: 7. ok E: 8. now take the clear little 9. attachment 10. and put on the hole where you 11. just put the red cap on 12. make sure it points 13. upward A: 14. ok Excerpt 6 (Teletype) E: 1. Ok, 2. put the red nozzle on the outlet 3. of the rounded clear chamber 4. ok? A: 5. got it.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> IMPROPER FOCUS </SectionTitle> <Paragraph position="0"> Earlier we talked about focus and problems that occur due to it. In this section, we discuss how misfocus can cause misreference. Focus confusion can occur when the speaker sets up one focus and then proceeds with another one without letting the listener know of the switch (i.e., a focus shift occurs without any indication). The opposite phenomenon can also happen - the listener may feel that a focus shift has taken place when the speaker actually never intended one. These really are very similar - one is viewed more strongly from the perspective of the speaker and the other from that of the listener.</Paragraph> <Paragraph position="1"> Excerpt 7 illustrates an instance of the first type of focus confusion. 
In the excerpt, the speaker (E) shifts focus without notifying the listener (A) of the switch. As the excerpt begins, A is holding the TUBEBASE. In Lines 1 to 16, E instructs A to attach the CAP and the SPOUT to OUTLET1 and OUTLET2, respectively, on the MAINTUBE. Upon A's successful completion of these attachments, E switches focus in Lines 17 to 20 to the TUBEBASE assembly and asks A to screw it on to the bottom of the MAINTUBE. While A completes the task, E realizes (Line 22) she left out a step in the assembly - the placement of the SLIDEVALVE into OUTLET2 of the MAINTUBE before the SPOUT is placed over the same outlet. E attempts to correct her mistake by asking A (Line 23) to remove &quot;the plas&quot;6 piece. Since E never indicated a shift in focus from the TUBEBASE back to the SPOUT, A interprets &quot;the plas&quot; to refer to the TUBEBASE.</Paragraph> <Paragraph position="2"> 2. the blue cap that's left \[A takes CAP\] 3. on the side holes that are 4. on the cylinder, \[A lays down TUBEBASE\] 5. the side hole that is farthest 6. from the green end.</Paragraph> <Paragraph position="3"> 24. no 25. the clear plastic thing that I 26. told you to put on \[A removes SPOUT\] 27. sorry.</Paragraph> <Paragraph position="4"> 28. And place the little red thing \[A takes SLIDEVALVE\] 29. in there first, \[A inserts SLIDEVALVE into OUTLET2 of MAINTUBE\] 30. it fits loosely in there.</Paragraph> <Paragraph position="5"> Excerpt 8 demonstrates the latter type of focus confusion that occurs when the speaker (E) sets up one focus - the MAINTUBE, which is the correct focus in this case - but then proceeds in such a manner that the listener (A) thinks a focus shift to another piece, the TUBEBASE, has occurred. Thus, Line 15, &quot;a bottom hole,&quot; refers to &quot;the lower side hole in the MAINTUBE&quot; for E and &quot;the hole in the TUBEBASE&quot; for A. 
A has no way of realizing that he has focused incorrectly unless the description as he interprets it doesn't have a real world correlate (here something does satisfy the description so A doesn't sense any problem) or unless, later in the exchange, a conflict arises due to the mistake (e.g., a requested action cannot be performed). In Line 31, A inserts a piece into the wrong hole because of the misunderstanding in Line 15. Line 31 hints that A may have become suspicious that an ambiguity existed somewhere in the previous conversation, but since the task appeared to be successfully completed (i.e., the red piece fit into the hole in the base), and since E did not provide any clarification, he assumed he was correct.</Paragraph> <Paragraph position="6"> Excerpt 8 (Telephone) E: 1. Um now.</Paragraph> <Paragraph position="7"> 2. Now we're getting a little 3. more difficult.</Paragraph> <Paragraph position="8"> A: 4. (laughs) E: 5. Pick out the large air tube \[A picks up STAND\] 6. that has the plunger in it.</Paragraph> <Paragraph position="9"> \[A puts down STAND, takes PLUNGER/MAINTUBE assembly\] A: 7. Okay.</Paragraph> <Paragraph position="10"> E: 8. And set it on its base, \[A puts down MAINTUBE, standing vertically, on the TABLE\] 9. which is blue now, 10. right? \[A has shifted focus to the TUBEBASE\] A: 11. Yeah.</Paragraph> <Paragraph position="11"> E: 12. Base is blue.</Paragraph> <Paragraph position="12"> 13. Okay, 14. Now 15. You've got a bottom hole still 16. to be filled, 17. correct? A: 18. Yeah.</Paragraph> <Paragraph position="13"> E: 19. 
Okay.</Paragraph> <Paragraph position="14"> \[A answers this with MAINTUBE still sitting on the TABLE; he shows no indication of what hole he thinks is meant - the one on the MAINTUBE, OUTLET2, or the one in the TUBEBASE\] 20. You have one 21. red piece remaining? \[A picks up MAINTUBE assembly and looks at TUBEBASE, rotating the MAINTUBE so that TUBEBASE is pointed up, and sees the hole in it; he then looks at the SLIDEVALVE\]</Paragraph> <Paragraph position="15"> A: 22. Yeah.</Paragraph> <Paragraph position="16"> E: 23. Okay.</Paragraph> <Paragraph position="17"> 24. Take that red piece.</Paragraph> <Paragraph position="18"> \[A takes SLIDEVALVE\] 25-26. It's got four little feet on it?</Paragraph> <Paragraph position="19"> A: 27. Yeah.</Paragraph> <Paragraph position="20"> E: 28-29. And put the small end into that hole on the air tube- 30. on the big tube.</Paragraph> <Paragraph position="26"> A: 31. On the very bottom? \[A starts to put it into the bottom hole of TUBEBASE - though he indicates he is unsure of himself\] E: 32. On the bottom, 33. Yes.</Paragraph> <Paragraph position="27"> Misfocus can also occur when the speaker inadvertently fails to distinguish the proper focus because he did not notice a possible ambiguity; or when, through no fault of the speaker, the listener just fails to recognize a switch in focus indicated by the speaker. Excerpt 8 is an example of the first type because E failed to notice that an ambiguity existed since he never explicitly brought the TUBEBASE either into or out of focus. 
He just assumed that A had the same perspective as he had - a perspective in which no ambiguity occurred.</Paragraph> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> WRONG CONTEXT </SectionTitle> <Paragraph position="0"> Context differs from focus. The context of a portion of a conversation is concerned with the intention of the discussion in that fragment and with the set of objects relevant to that discussion, though not attended to currently. Focus pertains to the elements currently being attended to in the context. For example, two people can share the same context but have different focus assignments within it - we're both talking about the water pump, but you're describing the MAINTUBE and I'm describing the AIRCHAMBER. Alternatively, we could just be using different contexts - I think you're talking about taking the pump apart, but you're talking about replacing the pump with new parts - in both cases we may be sharing the same focus - the pump - but our contexts are totally off from one another. 7 The kinds of misunderstandings that can occur because of context inconsistencies are similar to those for focus problems: * the speaker might set up or use one context for a discussion and then proceed in another one without effectively letting the listener know of the change, * the listener may feel a change in context has taken place when in fact the speaker never intended one, or * the listener fails to recognize an indicated context switch by the speaker.</Paragraph> <Paragraph position="1"> Context affects reference identification because it helps define the set of available objects that are possible contenders for the referent of the speaker's descriptions. 
If the contexts of the speaker and listener differ, then misreference might result.</Paragraph> </Section> <Section position="10" start_page="0" end_page="0" type="metho"> <SectionTitle> BAD ANALOGY </SectionTitle> <Paragraph position="0"> An analogy (see Gentner (1980) for a discussion on analogies) is a useful way to describe an object more precisely by drawing on shared past experience and knowledge - especially shape and functional information. If that past experience or knowledge doesn't contain the information the speaker assumes it does, then trouble occurs. Thus, one more way referent confusion can occur is by describing an object using a poor analogy. An analogy can be improper for several reasons. It might not be specific enough - confusing the listener because several potential referents might conform to the analogy. Alternatively, the analogy might fail because discovering a mapping between the analogous object and something in the environment is too difficult. In Excerpt 9, A at first has trouble correctly satisfying E's functional analogy &quot;stopper&quot; in &quot;the big blue stopper,&quot; but finally selects what he considers to be the closest match to &quot;stopper.&quot; The problem for A was that E's functional analogy was not specific enough. It would have been better to use cap instead of stopper.</Paragraph> <Paragraph position="1"> Excerpt 9 (Telephone) E: 1. Okay. Now, 2. take the big blue 3. stopper that's laying around \[A grabs AIRCHAMBER\] 4. ... and take the black 5. ring- A: 6. The big blue stopper? \[A is confused and tries to communicate it to E; he is holding the AIRCHAMBER here\] E: 7. Yeah, 8. the big blue stopper 9. 
and the black ring.</Paragraph> <Paragraph position="2"> \[A drops AIRCHAMBER and takes the O-RING and the TUBEBASE\] In other cases the analogy might be too specific, confusing the listener because none of the available referents appear to fit it. In Line 8 of Excerpt 7, &quot;nozzle-looking&quot; forms a poor shape analogy because the object being referred to actually is an elbow-shaped spout and not a nozzle. The &quot;nozzle-looking&quot; part of the description convinced the listener that what he was looking for was something identified by the typical properties of a nozzle (which is a small tube used as an outlet). However, sometimes when an object is a clear representative of a specified analogy class, the apprentice will not tend to select it as the intended referent. He would assume that, to refer to that object, the expert would not bother to form an analogy instead of just directly describing the object as a member of the class. Hence, the apprentice may very well ignore the best representative of the class for some less obvious exemplar. Given the case just mentioned, it is therefore better to say nozzle instead of nozzle-looking. In Excerpt 10, the descriptions &quot;hippopotamus face shape&quot; (a shape analogy) in Lines 2 and 3, and &quot;champagne top&quot; (a shape analogy) in Line 9, are too specific and the listener is unable to easily find something close enough to match either of them. He can't discover a mapping between the object in the analogy and one in the real world (a discussion on discovering such mappings can be found in Gentner (1980)). In fact, when this excerpt was played back to one listener, he was so overwhelmed by E's descriptions that he exclaimed &quot;What!&quot; when he heard them and was unable to correctly proceed.</Paragraph> <Paragraph position="3"> Excerpt 10 (Audiotape) E: 1. take the bright pink flat 2. piece of hippopotamus face 3. shape piece of plastic 4. and you notice that the two 5. 
holes on it \[E is trying to refer to BASEVALVE\] 6. match 7. along with the two 8. peg holes on the 9. champagne top sort of 10. looking bottom that had 11. threads on it \[E is trying to refer to TUBEBASE\]</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 DETECTING REFERENCE MISCOMMUNICATION </SectionTitle> <Paragraph position="0"> The previous section illustrated some of the ways reference miscommunication occurs. Part of our research, however, has been to examine how a listener discovers the need for a repair of a description during communication. The incompatibility of a description or action with the scene is the strongest signal of possible trouble.</Paragraph> </Section> </Section> <Section position="11" start_page="0" end_page="0" type="metho"> <SectionTitle> DESCRIPTION INCOMPATIBILITY </SectionTitle> <Paragraph position="0"> The strongest hint that there is a description incompatibility occurs when the listener finds no real world object to correspond to the speaker's description (i.e., referent identification fails). This can occur when the description does not agree with the current state of the world: * when one or more of the specified feature values in the description are not satisfied by any of the pieces (e.g., saying &quot;the orange cap&quot; when none of the objects are orange); * when one or more specified constraints do not hold (e.g., saying &quot;the red plug that fits loosely&quot; when all the red plugs attach tightly); or * when no one object satisfies all of the features specified in the description (i.e., there is, for each feature, an object that exhibits the specified feature value, but no one object exhibits all of the values). In Lines 7 and 8 of Excerpt 10 above, E's description of &quot;the two peg holes&quot; leads to bewilderment for the listener because the &quot;champagne top sort of looking bottom that had threads on it&quot; (i.e., the TUBEBASE) has no holes in it. 
E actually meant &quot;two pegs&quot;.</Paragraph> <Paragraph position="1"> An impossible reference might suggest a mistake in the speaker's description, but it could instead indicate an earlier action error (e.g., two parts were put together improperly or never had been intended to be assembled together).</Paragraph> <Paragraph position="2"> With respect to actual reference mechanisms, description incompatibility means that a referent could not be found. The reference mechanism was not able to find a match between its representation of the speaker's description and the representations of the objects in the world (i.e., the possible referents). Section 4.2.1 provides details on how our reference mechanism attempts such a match.</Paragraph> </Section> <Section position="12" start_page="0" end_page="0" type="metho"> <SectionTitle> ACTION INCOMPATIBILITY </SectionTitle> <Paragraph position="0"> An action incompatibility problem is likely if * the listener cannot perform the action specified by the speaker because of some obstacle; * the listener performs the action but does not arrive at its intended effect (i.e., a specified or default constraint isn't satisfied); or * the current action affects a previous action in an adverse way, yet the speaker has given no sign of any importance to this side-effect.</Paragraph> <Paragraph position="1"> Such action incompatibility might indicate an earlier misreference (e.g., the wrong part was chosen and used in an earlier action).</Paragraph> <Paragraph position="2"> The detection of most misreferences isn't so hard - the difficult part is determining why there is a problem so that the problem can be repaired. The problem could be one of the many illustrated in this section. 
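As a rough illustration only (it is not the reference mechanism of Section 4), the description-incompatibility cases listed earlier in this section can be sketched in Python. The dictionary representation of objects, the function name diagnose_description, and the returned labels are all assumptions introduced for this sketch; a constraint such as "fits loosely" is treated here as just another feature.

```python
from typing import Dict

# Hypothetical world model: each object name maps feature names to values.
Obj = Dict[str, str]


def diagnose_description(description: Dict[str, str], world: Dict[str, Obj]) -> str:
    """Classify a description against the world (illustrative labels only)."""
    # Objects that exhibit every feature value in the description.
    referents = [name for name, feats in world.items()
                 if all(feats.get(f) == v for f, v in description.items())]
    if referents:
        return "referent found" if len(referents) == 1 else "ambiguous"
    # Feature values that no object in the world exhibits at all.
    unsatisfied = [f for f, v in description.items()
                   if not any(feats.get(f) == v for feats in world.values())]
    if unsatisfied:
        return "no object has: " + ", ".join(sorted(unsatisfied))
    # Every value occurs somewhere, but never together on one object.
    return "no single object has all features"


# A toy world loosely modeled on the water pump parts (values are assumed).
world = {
    "CAP": {"color": "blue", "type": "cap"},
    "PLUG": {"color": "red", "type": "plug"},
}
print(diagnose_description({"color": "orange", "type": "cap"}, world))
print(diagnose_description({"color": "red", "type": "cap"}, world))
```

In this toy world, "the orange cap" falls under the first case (no piece is orange), while "the red cap" falls under the third (red and cap each occur, but not on the same piece). 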
The knowledge sources described in the next section help provide a better handle for determining the problem with the speaker's description.</Paragraph> </Section> <Section position="13" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 KNOWLEDGE FOR REFERENCE </SectionTitle> <Paragraph position="0"> This section describes the language and physical knowledge that people use to perform reference identification and to recover from reference failure. The classification of knowledge sources and the observations on how to perform reference and to recover from reference failures were motivated from the analysis of the excerpts in the previous section. Those observations have been formalized as a set of metarules (which we call relaxation rules) that are used both to guide the reference process and to determine when to delete or modify portions of a speaker's description. Section 4 presents those rules in the context of the reference and miscommunication recovery mechanism. We feel that the knowledge sources motivated in this section carry across different people and domains. However, we recognize that the particulars described within each knowledge source are not universal and can vary across people and domains. For example, we would expect a difference in the knowledge used by two experts communicating as opposed to that employed by a novice and an expert.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 KNOWLEDGE FOR REPAIRING DESCRIPTIONS </SectionTitle> <Paragraph position="0"> When things go wrong during a conversation, people have many sources of knowledge that they bring to bear to get around the problem (e.g., see Ringle and Bruce 1981). Much of the time the repairs are so natural that we aren't conscious that they have taken place. At other times, we must make an effort to correct what we have heard, or determine that we need clarification from the speaker. 
Either repair process involves the use of knowledge about conversation, social conventions, and the world around us.</Paragraph> <Paragraph position="1"> In this work, we chose to consider the repair of descriptions rather than complete utterances. The most relevant knowledge for repairing descriptions is the conversation itself and the real world described therein (as illustrated by the excerpts in Section 2.2). This knowledge can be broken down into numerous forms.</Paragraph> <Paragraph position="2"> * Linguistic knowledge is the knowledge that expresses the use of the structure and meaning of a description.</Paragraph> <Paragraph position="3"> * Perceptual knowledge is composed of information about a person's abilities to distinguish feature values, his preferences in features and feature values (i.e., what features are most important to him in this domain), and his extraction of information from the internal representation of his perception of an object.</Paragraph> <Paragraph position="4"> * Discourse knowledge is concerned with how a person interprets the flow of conversation and its effects on highlighting relevant parts of the world.</Paragraph> <Paragraph position="5"> * Hierarchical knowledge is concerned with the use of knowledge about generality and specificity of descriptions to decide if a description is either too vague or overly specific.</Paragraph> <Paragraph position="6"> * Trial and error knowledge is information gained when a listener attempts a requested action on requested objects and then compares the result of the action with his expectations.</Paragraph> <Paragraph position="7"> Other knowledge sources will not be covered here. 
For example, pragmatic knowledge about mutual belief and actions (Cohen 1978, Allen 1979, Perrault and Cohen 1981, Appelt 1981) is missing because we restricted our work to noun phrases instead of complete utterances.</Paragraph> <Paragraph position="8"> Domain knowledge (including functional information) isn't covered because it is treated well elsewhere (Grosz 1977).</Paragraph> <Paragraph position="9"> These knowledge sources can be used to guide the repair of the speaker's description when no referent is found. They are part of a &quot;relaxation&quot; process. In the reference identification paradigm, relaxation would typically mean that the system drops features in the speaker's description one at a time until a referent is found or none are left. We have something different in mind. First, relaxation means more than simply dropping a feature value. It also means replacing the feature value with another one the knowledge sources consider reasonable.</Paragraph> <Paragraph position="10"> Second, we want the features to be dropped in a chosen order. The interesting part is that this ordering comes from a negotiation among the knowledge sources. The actual negotiation, which is a control problem, is discussed in Section 4.</Paragraph> <Paragraph position="11"> Speakers can utilize many different kinds of linguistic structures to describe objects in the extensional world. This section outlines some of these structures and their meanings and shows how they can be used to guide repairs in the description.</Paragraph> <Paragraph position="12"> A description of an object in the extensional world usually includes enough information about physical features of the object so that listeners can use their perceptual abilities to identify the object. 
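The two departures from plain feature dropping described earlier in this section (replacing a value before dropping it, and relaxing features in a chosen order) can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the names find_referents and relax, the fixed drop_order list, and the substitutes table are all hypothetical, and the fixed list merely stands in for the negotiation among knowledge sources, which is discussed in Section 4 and is not modeled here.

```python
from typing import Dict, List, Tuple

Obj = Dict[str, str]


def find_referents(description: Dict[str, str], world: Dict[str, Obj]) -> List[str]:
    """Return the names of all objects exhibiting every described feature value."""
    return sorted(name for name, feats in world.items()
                  if all(feats.get(f) == v for f, v in description.items()))


def relax(description: Dict[str, str],
          world: Dict[str, Obj],
          drop_order: List[str],
          substitutes: Dict[Tuple[str, str], List[str]]):
    """Relax features in drop_order until some object matches.

    For each feature, assumed replacement values are tried first; only if
    none of them yields a match is the feature dropped altogether.
    Returns (matching object names, relaxed description).
    """
    current = dict(description)
    found = find_referents(current, world)
    for feature in drop_order:
        if found or feature not in current:
            continue
        value = current[feature]
        for alt in substitutes.get((feature, value), []):
            trial = dict(current)
            trial[feature] = alt
            found = find_referents(trial, world)
            if found:
                current = trial
                break
        if not found:
            del current[feature]
            # An empty description would match everything, so treat it as failure.
            found = find_referents(current, world) if current else []
    return found, current


world = {
    "CAP": {"color": "blue", "type": "cap"},
    "PLUG": {"color": "red", "type": "plug"},
}
# "the red cap": relax the head noun first, allowing "plug" in place of "cap".
print(relax({"color": "red", "type": "cap"}, world,
            ["type", "color"], {("type", "cap"): ["plug"]}))
```

With a different drop_order the same description can resolve to a different object, which is why the choice of ordering matters. 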
8 Those physical features are normally specified as modifiers of nouns and pronouns. The typical modifiers are adjectives, relative clauses, and prepositional phrases. They are often interchangeable; that is, one could specify a feature using any of the modifier forms. One modifier form, however, may be better suited for expressing some particular feature than another.</Paragraph> <Paragraph position="13"> Relative clauses are well suited for expressing complicated information since they are separate from the main part of the noun phrase and can be arbitrarily complex themselves. They can restrict the word or phrase they modify. They function in the following ways in extensional reference: * Complex relationships such as spatial relations (e.g., the blue cap that is on the main tube), and function information (e.g., the thing with the wire that acts like a plunger).</Paragraph> <Paragraph position="14"> * Assertions of extra (usually restrictive) information, information possibly outside the domain knowledge and not useful for finding the referent at this time (e.g., an L-shaped tube of clear plastic that is defined as a spout).</Paragraph> <Paragraph position="15"> * Material useful for confirming that the proper referent was found (e.g., the long blue tube that has two outlets on the side).</Paragraph> <Paragraph position="16"> * A respecification of the initial description in more detail. For example, in the case of the descriptions the thing that is flared at the top and the main tube which is the biggest tube, the relative clauses are needed because the initial descriptions are too general to distinguish any one object.</Paragraph> <Paragraph position="17"> Prepositional phrases are better fitted for simpler pieces of information. 
They are often part of expressions of predicative relationships.</Paragraph> <Paragraph position="18"> * A comparative or superlative relation (e.g., the smallest of the red pieces).</Paragraph> <Paragraph position="19"> * A subpart specification - used to access the subpart of the object under consideration (e.g., the top end of the little elbow joint, that water chamber with the blue bottom and the globe top).</Paragraph> <Paragraph position="20"> * Most perceptual features (e.g., with a clear tint, with a red color).</Paragraph> <Paragraph position="21"> Just like relative clauses, prepositional phrases can also provide confirmation information.</Paragraph> <Paragraph position="22"> Adjectives are used to express almost any perceptual feature - though complex relations can be awkward.</Paragraph> <Paragraph position="23"> Usually they modify the noun phrase directly, but sometimes they are expressed as a predicate complement. In those situations, the complement describes the subject of the linking verb (e.g., the tube is large). As with some of the relative clauses above, predicate complements have an assertional nature to them because they are normally used to state something about the subject of a sentence. Sometimes the head noun carries feature information.</Paragraph> <Paragraph position="24"> For example, one can use the bell to refer to a bell-shaped object (though it does not necessarily have the function of a bell), or can say the cube instead of saying the block to refer to an object.</Paragraph> <Paragraph position="25"> It is implicitly clear that the structure of a noun phrase can affect its meaning in many ways (such as the ones mentioned above under relative clauses). 
Since there is no one-to-one mapping between a noun phrase's structure and its meaning, it is the hearer's job to determine how the structural information is being used.</Paragraph> </Section> </Section> <Section position="14" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.1.2 RELAXING A DESCRIPTION USING LINGUISTIC KNOWLEDGE </SectionTitle> <Paragraph position="0"> We examined the water pump protocols and noted where and when the modifiers of a noun phrase come into play during reference resolution (e.g., we saw that people would often commence their search for a referent immediately, using each piece of the description as it is heard). Adjectives and prepositional phrases play a more central role during referent identification, because they are heard first, while relative clauses usually play a secondary role, because they normally come at the end of a description, often after a pause. However, relative clauses and predicate complements exhibit an assertional nature that, while reducing their usefulness for resolving the current reference, provides useful information that can be expressed in subsequent (anaphoric) references. For example, a speaker can describe the MAINTUBE by saying the long violet tube that has two outlets on the side versus the shorter the long violet tube with two outlets on the side. Our claim is that the speaker would use the relative clause version to emphasize the information in the relative clause. Thus, relative clauses promote their contents (especially linguistically since they provide separation from the main clause) to an almost independent status. 
We feel this independent status stresses that the speaker took care in formulating the relative clause and that the information it conveys is less likely to be in error than if it had been expressed in a prepositional phrase or as an adjective; the water pump protocols tend to back up this claim (e.g., listeners would often use the information in a relative clause to confirm that their referent choice was correct). The head noun of the description can also be relaxed. It normally is relaxed last but could be relaxed prior to a relative clause (especially in the instances where the relative clause expresses confirmational information). Hence, our relaxation process attempts to weaken or remove features in a description in this order: adjectives, then prepositional phrases and finally relative clauses and predicate complements.</Paragraph> <Paragraph position="1"> For example, consider the description the blue cap that is on the main tube. Here, the features &quot;color&quot; and &quot;function&quot; are described in the adjective and head noun of the description, and the &quot;position&quot; in the relative clause. Following the rule suggested above, the relaxation of function and color should be attempted before position. The relaxation order proposed here is not meant to be the only way to relax the description. The order, in fact, may be modified by other knowledge sources.</Paragraph> <Paragraph position="2"> There are many other kinds of linguistic constituents that can be examined to see if there are principled ways to relax them, too. 
These include premodifier and post-modifier forms, nominals, participles, and genitives.</Paragraph> <Paragraph position="3"> While we didn't consider any of them in detail, there is no reason why they should not fit into the relaxation framework.</Paragraph> <Paragraph position="4"> Our system must take into account how people perceive objects in the world and how their perceptions can be represented. To do so, each object in the world has two representations in our system: a spatial (3-D) representation and a cognitive/linguistic representation that shows how the system could actually talk about the object. The spatial description is a physical description of the object in terms of its dimensions, the basic 3-D shapes composing it, and its physical features (along the lines developed in Agin (1979) and Goodman (1981)). It represents the result of human perceptual skill. The cognitive/linguistic form is a representation of the parts and features of the object in linguistic terms. In many ways this representation encodes the human capacity to extract information from our perceptual system and turn physical representations into words. It overlaps the spatial form - which holds relatively constant across people - in many respects, but it is more suggestive of the listener's own perceptions. The cognitive/linguistic form often describes aspects of an object, such as its subparts, by its position on the object (&quot;top&quot;, &quot;bottom&quot;) and its functionality (&quot;outlets&quot;, &quot;places for attachment&quot;). More than one cognitive/linguistic form can refer to the same physical description. Some properties of an object differ in how they are expressed in the two forms. 
In the 3-D form, there are primarily properties such as numerical dimensions (e.g., 3 feet by 5 feet) and basic shapes (e.g., generalized cylinders), while, in the cognitive/linguistic form, there are relative dimensions (e.g., large) and analogical shapes (e.g., the L-shaped tube or the champagne top sort of looking bottom).</Paragraph> <Paragraph position="5"> Perceived objects, when spoken about, must be interpreted. This can lead to discrepancies between individuals. People usually agree on the spatial representation but not necessarily on the cognitive/linguistic description. This disagreement can lead to reference problems. For example, misjudgements by the speaker in calling an object &quot;large&quot; can cause the hearer to fail to find an object in the visual world that has dimensions that are perceptually &quot;large&quot; to the listener.</Paragraph> <Paragraph position="6"> To avoid confusing the listener, a speaker must distinguish the objects in the environment from each other using perceptually useful features because these perceptual features provide people with a way to discriminate one object from another. A speaker must take care when selecting from these features since the hearer can become confused about the values of a feature irrespective of the actual object being described. Perceptual features may be inherently confusing because a feature's values are difficult to differentiate (e.g., is the tube a cylinder or a slightly tapering cone?). They may also be confusing because the speaker and listener may have differing sets of values for a feature (e.g., what may be blue for someone may be turquoise for another). These characteristics affect the salience of a feature (see McDonald and Conklin (1982) for a description of feature salience) which in turn determines the feature's usefulness in a description. 
A feature that is common in everyday usage (e.g., color, shape, or size) is salient because the listener assumes that he can readily distinguish the feature's possible values from one another. Of course, very unusual values of a feature can stand out, making it even easier to discriminate a unique object from all other objects (McDonald and Conklin 1982).</Paragraph> <Paragraph position="7"> The objects in the world may exhibit a feature whose possible values are difficult to distinguish. This occurs when a perceived feature does not have much variability in its range of values: all or subsets of the values are clustered closely together making it hard to tell the difference between one value and the next. 9 This increases the likelihood of confusion because the usefulness of specifying the feature to a non-expert is diminished (especially if the speaker is more expert than the listener in distinguishing feature values). Hence, if one of these difficult feature values appears in the speaker's description, the listener, if he isn't an expert, will often relax the feature value to any of the members of the set of feature values. For example, if the speaker knows many shades of the color &quot;red&quot; (such as scarlet, crimson, cherry, maroon, or magenta), the average listener may not be able to distinguish them from each other and may be just as happy to pick up the maroon plug for the magenta plug.</Paragraph> <Paragraph position="8"> When the number of features available for describing an object is small, one could expect to have trouble discerning one object from the next depending on the quality of the features themselves. If the environment is full of objects whose perceived features (e.g., color, size or shape) are similar, one would expect more miscommunication the larger the similarities. 
In those cases where perceptual information can only group objects instead of highlighting a unique one, the members of the group might become distinguishable when functional information is added. 10 In other words, one may only know about the appearance of an object, but once one knows the function, the object and other potential contenders (might) become dissimilar (Grosz 1981). Of course, poor functional descriptions, like the ones illustrated in Section</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 for Bad Analogies, can lead to even more trouble. </SectionTitle> <Paragraph position="0"/> </Section> </Section> <Section position="15" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.1.4 RELAXING A DESCRIPTION USING PERCEPTUAL KNOWLEDGE </SectionTitle> <Paragraph position="0"> When examining the features presented in a speaker's description, one can consider perceptual aspects to determine which features are most likely in error. Such an inspection generates a partial ordering of features for use during the repair process to determine which feature in a description to relax. The relaxation ordering suggested by the inspection of features interacts with ordering proposals from other knowledge sources.</Paragraph> <Paragraph position="1"> Active features are ones that require a listener to do more than simply recognize that a particular feature value belongs to a set of possible values - the listener must perform some kind of evaluation. They include the use of relative dimensions (e.g., large), comparatives (e.g., larger), or superlatives (e.g., largest). When considering the water pump domain, we found that listeners were better at judging less active feature values (e.g., color values). 
Speakers, however, seem to be casual with less active features (possibly because they feel listeners are better with them) while the active ones require their full attention. Hence, in a reference failure, the source of the problem is often the less active ones. This suggests that one should first relax those features that require less active consideration such as color (though it is easier to relax red to orange than red to blue; we will ignore such facts until a later stage of the relaxation process), composition, transparency, shape, and function because we would expect a speaker to be more serious about his use of active features. Only after them should one relax those features that require active consideration of the object under discussion and its surroundings (such as superlatives, comparatives, and relative values of size, length, height, thickness, position, distance, and weight).</Paragraph> <Paragraph position="2"> The water pump dialogues provided some evidence for this. For example, many speakers described the MAINTUBE using a relative size adjective such as big or large. One of the descriptions of the tube was the large blue tube. The MAINTUBE, which was the largest object, actually was violet but there was a smaller blue tube, the STAND. Subjects still tended to select the MAINTUBE over the STAND, even with the color discrepancy, hinting that they preferred relaxing color (a less active feature) before relative size (an active feature).</Paragraph> </Section> <Section position="16" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.1.5 DISCOURSE KNOWLEDGE IN REFERENCE </SectionTitle> <Paragraph position="0"> Discourse knowledge concerns discourse structure, the flow of discourse, and the use of discourse to highlight parts of the real world (see Grosz (1977), Reichman (1978, 1981), Sidner (1979), Allen, Frisch, and Litman (1982), Litman (1983), and Polanyi and Scha (1984) for detailed treatments on discourse). 
There are several mechanisms that can highlight objects in discourse (see work on focus by Grosz (1977), Reichman (1978) and Sidner (1979)). They provide a partition of the real world that prunes the set of objects to consider during referent identification. Discourse knowledge also helps highlight what knowledge a speaker and listener have in common at any point in a dialogue. Conversants share knowledge about past actions and objects and general knowledge about the world (e.g., how to fit objects together or the functions of common objects). Focusing can demarcate which of several perspectives of world knowledge conversants should be using to interpret each other's utterances. This simplifies the amount of information that must be packaged in each utterance, reducing places for error. For example, deictics can be used to anchor descriptions to current or past context. The description the yellow polka-dotted motor requires a listener to look to see how the description hooks up to the current discourse situation. However, the description the yellow polka-dotted motor I showed you yesterday is anchored by the deictic yesterday and is more easily searchable.</Paragraph> </Section> <Section position="17" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.1.6 RELAXING A DESCRIPTION USING DISCOURSE KNOWLEDGE </SectionTitle> <Paragraph position="0"> Discourse knowledge helps the listener determine whether or not the problem is in the speaker's description or resides elsewhere. When normal reference fails (i.e., no referent corresponds to a description) and recovery is attempted, discourse knowledge can be used to determine whether the problem resides not in the description itself but possibly at the discourse level. For example, midstream corrections in an utterance by a speaker could cause a listener to either miss a shift in focus or to shift focus when no shift was intended. 
This was exemplified in Excerpt 7 in Section 2.2 when the speaker attempted to undo an earlier request and did not properly demark the shift of focus. The work of Grosz (1977, 1981), Reichman (1978, 1981), Webber (1978), and Sidner (1979) provided rules on deictics, anaphoric definite noun phrases, the use of pronominals versus nonpronominals, and so forth, that can be used to zero in on discourse problems. So, for example, if a self-correction of the use of a pronominal occurs (e.g., &quot;....it - the X&quot;), then a rule might state that focus could have shifted to X.</Paragraph> <Paragraph position="1"> Relaxation is then achieved by trying the hypothesized focus to see if a referent can now be found. In general, discourse knowledge can suggest when the problem may be due to the listener focussing on the wrong set of objects. Correction can be attempted by shifting to another set and testing whether or not the description better fits one of the objects in the new set.</Paragraph> </Section> <Section position="18" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.1.7 HIERARCHICAL KNOWLEDGE IN REFERENCE </SectionTitle> <Paragraph position="0"> Imprecision (i.e., being overly general) in a speaker's description can lead to confusion. Being too specific can lead to similar results. Hierarchical knowledge - that is, knowledge about a hierarchy of taxonomic information about our world - can be used by a listener to determine the degree of imprecision or specificity of a description.</Paragraph> <Paragraph position="1"> We can model this behavior by consulting a prestored generic/specific hierarchy of world elements, using the current context to guide the comparison of the speaker's current description to elements in the hierarchy, and deciding on the basis of the comparison if the description was imprecise. 
This comparison can isolate two types of imprecision: imprecision of the whole description or imprecision of a particular feature value.</Paragraph> <Paragraph position="2"> An imprecise description, missing details needed to fully distinguish a unique real world object, should point out numerous candidates that exhibit the general features in the description rather than none at all. Imprecise descriptions can, however, lead to confusion that blocks the listener from finding any referent. If a particular feature specified in a description is difficult to apply because it isn't specific or well-defined, then it may be necessary to ignore it (e.g., the use of a value like &quot;funny&quot; such as in that funny red thing). If a feature is ambiguous with respect to how it should be applied, then it may either require relaxation or further restriction (e.g., for the use of a feature value like &quot;rounded,&quot; we must ask whether we mean &quot;2-D&quot; or &quot;3-D&quot; rounded, &quot;cylindrical&quot; or &quot;bell-shaped&quot;, and so on). The determination that a feature is too imprecise might be possible before a search for a referent is commenced. An examination of how high in the hierarchy the feature value appears could signal when a more detailed value is needed. Each of these problems was reflected in the water pump protocols by listeners (e.g., see Excerpts 4 and 6).</Paragraph> <Paragraph position="3"> They often avoided searching for a referent because the speaker's description was just too imprecise, causing them confusion from the onset.</Paragraph> <Paragraph position="4"> The condition of being too specific is more difficult to detect. 
In a task-oriented environment, one would not easily notice that something was too specific since normally being very specific is a wise goal for a speaker.</Paragraph> <Paragraph position="5"> The drawback of being too specific occurs not so much because of the specificity itself but because of its adverse side-effects. These side-effects include the use of feature values that are too difficult for a non-expert to determine, leading to confusion. A description can also be overspecific if it contains too many feature values or contains a feature that is overpowering (e.g., see, respectively, Excerpts 2 and 10).</Paragraph> </Section> <Section position="19" start_page="0" end_page="0" type="metho"> <SectionTitle> 3.1.8 RELAXING A DESCRIPTION USING HIERARCHICAL KNOWLEDGE </SectionTitle> <Paragraph position="0"> Hierarchical knowledge can resolve certain ambiguities by climbing or descending the hierarchy. Such a hierarchy search requires looking at a description at two levels: * the description's placement in the generic/specific hierarchy and * the placement of the filler of each feature of the description in the generic/specific hierarchy.</Paragraph> <Paragraph position="1"> Hierarchical knowledge also interacts with perceptual knowledge. The hearer can become confused when a feature value in the speaker's description is too hard to judge. For example, it is difficult to determine whether a particular feature value applies when it is too specific. If a more imprecise value is used (and it applies only to one object), it might be easier to find the described object (e.g., hippopotamus face shaped valve would be better stated as rounded valve, as seen in Excerpt 10). Hence, in cases where a feature value is too specific, more imprecise values could be tried to see if a referent can then be found. 
These more imprecise values are found by looking higher in the hierarchy above the current feature value for more general terms.</Paragraph> <Paragraph position="2"> The use of hierarchical knowledge isn't always the most appropriate way to repair a description. Consider descriptions introduced only by head nouns (i.e., category descriptions) such as the plunger. In such instances, when no clear representative of the category object is present, it is not necessarily best to check the generic/specific hierarchy to see what is &quot;above&quot; the concept representing the category (e.g., finding device above plunger). It might be better to examine the attributes relevant to an average member of the category set since it may be the standard values of those attributes that the speaker is trying to get across in his or her description. For example, in the description the man drinking the martini, the speaker may be trying to get the listener to look for someone drinking a clear liquid from a certain shaped glass with an olive in it. The speaker isn't particularly concerned with the fact that the drink contains gin and vermouth. If we just consulted the generic/specific taxonomy, however, we might simply relax martini to alcoholic beverage (such as to the man drinking the beer) to liquid (such as to the man drinking water) and miss the descriptors that the speaker really intended us to use. 11 3.1.9 TRIAL AND ERROR KNOWLEDGE IN REFERENCE Trial and error knowledge has to do with performance feedback. Its primary use is to determine whether a referent was properly identified (including ones found with the relaxation process). Performance of a requested action is the strongest determining factor of whether or not the listener correctly interpreted a speaker's description. 12 Successful completion of an action will be likely to build confidence in the listener that he correctly interpreted a description. 
Failure to find an object after relaxation leads the listener to ask the speaker to clarify; failure to successfully perform the requested action on the object found during referent identification causes the listener to ask himself what is wrong. The trouble might be due to: an action but also from an action's postcondition failing. 13 Determination of how badly a postcondition must fail before the listener asks for clarification instead of reconsidering the description - is unclear from the current protocols; further analysis collected from different protocols might resolve this matter.</Paragraph> </Section> <Section position="20" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 REPAIRING REFERENCE FAILURES 4.1 INTRODUCTION </SectionTitle> <Paragraph position="0"> The previous sections illustrated how task-oriented natural language interactions in the real world can induce contextually poor utterances and the kinds of knowledge people use to reason about them. Given all the possibilities for confusion, when confusions do occur, they must be resolved if the task is to be performed. This section explores the problem of fixing reference failures.</Paragraph> <Paragraph position="1"> Reference identification is a search process where a listener looks for something in the world that satisfies a speaker's uttered description. A computational scheme for performing such reference identifications has evolved from work by other artificial intelligence researchers (e.g., see Grosz 1977, Hoeppner et al. 1983). That traditional approach succeeds if a referent is found, or fails if no referent is found (see Figure 3a). However, a reference identification component must be more versatile than those previously constructed. 
The excerpts provided in Section 2.2 show the traditional approach is inadequate because people's real behavior is much more elaborate. In particular, listeners often find the correct referent even when the speaker's description does not describe any object in the world. For example, a speaker could describe a turquoise block as the blue block. Most listeners would go ahead and assume the turquoise block was a listener's perception of the world. The listener must ask himself whether he can perceive one of the objects in the world the way the speaker described it. In some cases, the listener's perception may overrule parts of the description because the listener can't perceive it the way the speaker described it.</Paragraph> <Paragraph position="2"> To repair the traditional approach we have developed an algorithm that captures for certain cases the listener's ability to negotiate with himself for a referent. It can search for a referent and, if it doesn't find one, it can try to find possible referent candidates that might work, and then loosen the speaker's description using knowledge about the speaker, the conversation, and the listener himself. Thus, the reference process becomes multi-step and resumable. This computational model, which we call FWIM for &quot;Find What I Mean&quot;, is more faithful to the data than the traditional model (see Figure 3b).</Paragraph> <Paragraph position="3"> A key feature to reference identification is negotiation.</Paragraph> <Paragraph position="4"> Negotiation in reference identification comes in two forms. First, it can occur between the listener and the speaker. The listener can step back, expand greatly on the speaker's description of a plausible referent, and ask for confirmation that he has indeed found the correct referent. For example, a listener could initiate negotiation with I'm confused. Are you talking about the thing that is kind of flared at the top? Couple inches long. It's kind of blue. 
Second, negotiation can be with oneself. This self-negotiation is the one we are most concerned with in this research. The listener considers aspects of the speaker's description, the context of the communication, the listener's own abilities, and other relevant sources of knowledge. He then applies that deliberation to determine whether one referent candidate is better than another or, if no candidate is found, what are the most likely places for error or confusion. Such negotiation can result in the listener testing whether or not a particular referent works. For example, linguistic descriptions can influence One means of making sense of a failed description is to delete or replace the portions that cause it not to match objects in the hearer's world. In our program we are using &quot;relaxation&quot; techniques to capture this behavior. Our reference identification module treats descriptions as approximate. It relaxes a description in order to find a referent when the literal content of the description fails to provide the needed information.</Paragraph> <Paragraph position="5"> Relaxation, however, is not performed blindly on the description. We try to model a person's behavior by drawing on sources of knowledge used by people. We have developed a computational model that can relax aspects of a description using many of these sources of knowledge. Relaxation then becomes a form of communication repair (in the style of the work on repair theory found in Brown and VanLehn (1980)). A goal in our model is to use the knowledge sources to reduce the number of referent candidates that must be considered while making sure that a particular relaxation makes sense.</Paragraph> </Section> <Section position="21" start_page="0" end_page="0" type="metho"> <SectionTitle> 4.2 THE REFERENT IDENTIFIER AND RELAXATION COMPONENT </SectionTitle> <Paragraph position="0"> This section describes the overall relaxation component in the context of the referent identifier. 
We explain how the relaxation component draws on knowledge sources about descriptions and the real world as it tries to relax an errorful description to one for which a referent can be identified.</Paragraph> <Paragraph position="1"> Identifying the referent of a description requires finding an element in the world that corresponds to the speaker's description (where every feature specified in the description is present in the element in the world but not necessarily vice versa). This process corresponds to the technique employed in the traditional reference mechanism. The initial task of our reference mechanism is to determine whether or not a search of the (taxonomic) knowledge base that we use to model the world is necessary. For example, in the water pump domain, the reference component should not bother searching - unless specifically requested to do so - for a referent for indefinite noun phrases (which usually describe new or hypothetical objects) or extremely vague descriptions (which are ambiguous because they do not clearly describe an object since they are composed of imprecise feature values). A number of aspects of discourse pragmatics can be used in that determination. For example, the use of a deictic in a definite noun phrase, such as this X or the last X, hints that the object was either mentioned previously or that it probably was evoked by some previous reference, and that it is searchable. We will not examine such aspects any further in this paper.</Paragraph> <Paragraph position="2"> The knowledge base contains linguistic descriptions and a description of the listener's visual scene itself. In our implementation and algorithms, we assume it is represented in KL-One (Brachman 1977), a system for describing taxonomic knowledge. KL-One is composed of CONCEPTs, ROLEs on concepts, and links between them.</Paragraph> <Paragraph position="3"> A CONCEPT denotes a set, representing those elements described by it. 
A SUPERC link (&quot;==>&quot;) is used between concepts to show set inclusion. It defines a relation called subsumption that specifies that the set denoted by one concept is included in the other. For example, consider Figure 4. The SUPERC from Concept B to Concept A is like stating B ⊂ A for two sets A and B.</Paragraph> <Paragraph position="4"> An INDIVIDUAL CONCEPT is used to guarantee that the set specified by a concept denotes a singleton set. The Individual Concept D shown in the figure is defined to be a unique member of the set specified by Concept C.</Paragraph> <Paragraph position="5"> ROLEs on concepts are like attributes or slots in other knowledge representation languages. They define a functional relationship between the concept and other concepts that specifies a restriction on what can fill a particular slot.</Paragraph> <Paragraph position="6"> Once a search of the knowledge base is considered necessary, a reference search mechanism is invoked. The search mechanism uses the KL-One Classifier (Lipkis 1982) to search the knowledge base taxonomy. This search is constrained by a focus mechanism based on the one developed by Grosz (1977). The Classifier's purpose is to discover all appropriate subsumption relationships between a newly-formed description and all other concepts in a given taxonomy. With respect to reference, this means that descriptions of all possible referents of the description will be subsumed by the description after it has been classified into the knowledge base taxonomy.</Paragraph> <Paragraph position="7"> If more than one candidate referent is below (when a concept A is subsumed by B, we say A is below B) the classified description, then, unless a quantifier in the description specified more than one element, the speaker's description is ambiguous. If exactly one concept is below it, then the intended referent is assumed to have been found. 
Finally, if no referent is found below the classified description, the relaxation component can be invoked. Prior to actually using the relaxation component, FWIM checks to see if the problem resides not with the description but with pragmatic issues. We will only consider the no reference case in the rest of the paper.</Paragraph> </Section> <Section position="22" start_page="0" end_page="0" type="metho"> <SectionTitle> 4.2.2 COLLECT VOTES FOR OR AGAINST RELAXING THE DESCRIPTION </SectionTitle> <Paragraph position="0"> If the referent search fails, then it is necessary to determine whether the lack of a referent for a description has to do with the description itself (i.e., reference failure) or with outside forces that are causing reference confusion.</Paragraph> <Paragraph position="1"> For example, an external problem due to outside forces may be with the flow of the conversation and the speaker's and listener's perspectives on it; it may be due to incorrect attachment of a modifier; it may be due to the action requested; and so on. Pragmatic rules are invoked to decide whether or not the description should be relaxed. For example, aspects of focus, metonymy and synecdoche are considered to see if they affected the referent search. 14 These rules will not be discussed here; we will assume that the problem lies in the speaker's description.</Paragraph> <Paragraph position="2"> to relax and in what order, and use those ordered features to order the potential candidates with respect to the preferred ordering of features, and * determine the proper relaxation techniques to use and apply them to the description.</Paragraph> </Section> <Section position="23" start_page="0" end_page="0" type="metho"> <SectionTitle> FIND POTENTIAL REFERENT CANDIDATES </SectionTitle> <Paragraph position="0"> Before relaxation takes place, the algorithm looks for potential candidates for referents (which denote elements in the listener's visual scene). 
These candidates are discovered by performing a &quot;walk&quot; in the knowledge base taxonomy in the general vicinity of the speaker's classified description as partitioned by the focusing mechanism. The walk is performed by moving up and down the SuperC links, checking each candidate. A KL-One partial matcher is used to determine how close the candidate descriptions found during the walk are to the speaker's description. The partial matcher generates a numerical score to represent how well the descriptions match (after first generating scores at the feature level to help determine how the features are to be aligned and how well they match). This score is based on information about KL-One (e.g., the subsumption relationship between or the equality of two feature values) and does not take into account any information about the task domain. The set of best descriptions returned by the matcher (as determined by some cutoff score) is selected as the set of referent candidates. The ordering of features and candidates for relaxation described below takes into account the task domain.</Paragraph> </Section> <Section position="24" start_page="0" end_page="0" type="metho"> <SectionTitle> ORDER THE FEATURES AND CANDIDATES FOR RELAXATION </SectionTitle> <Paragraph position="0"> At this point the reference system inspects the speaker's description and the candidates, decides which features to relax and in what order, 15 and generates a master ordering of features for relaxation. That ordering is important since relaxing in different orders could yield matches to different objects. Once the feature order is created, the reference system uses that ordering to determine the order in which to try relaxing the candidates.</Paragraph> <Paragraph position="1"> We draw primarily on sources of linguistic knowledge, pragmatic knowledge, discourse knowledge, domain knowledge, perceptual knowledge, hierarchical knowledge, and trial and error knowledge during this repair process. 
A detailed treatment of many of them was presented in Section 3. These knowledge sources are consulted to determine the feature ordering for relaxation. We represent information from each knowledge source as a set of relaxation rules. Most of the rules were motivated by the problems illustrated in the protocols.</Paragraph> <Paragraph position="2"> They are written in a PROLOG-like language. Figure 5 illustrates one such linguistic knowledge relaxation rule.</Paragraph> <Paragraph position="3"> This rule is motivated by the observation that speakers typically add more important information at the end of a description (where it is separated from the main part of the description and, thus, provides more emphasis). The rule in Figure 5 simply embodies the fact that relative clauses are found at the end of noun phrases, while adjectives are not and, thus, the features of a description that are provided adjectivally should be relaxed before those provided by a relative clause. However, a more general and more applicable rule is that information presented at the end of a description is usually more prominent (i.e., that information was placed more strongly in focus by the speaker).</Paragraph> <Paragraph position="4"> Relax the features in the speaker's description in the order: adjectives, then prepositional phrases, and finally relative clauses and predicate complements.</Paragraph> <Paragraph position="5"> discourse knowledge relaxation rules. The rules note when misfocus is likely. They simulate how a listener can detect confusion on the part of the speaker during the search for a referent if the speaker interrupts his own utterance. 16 An interruption can come about with a false start or a self-correction. A false start occurs when the speaker goofs on his initial description, stops, and then restarts the description (also see Polanyi (1978) on false starts). 
For example, exclamations like oops, never mind, oh no, and so on, are signals of false starts meant to inform the listener that there is a problem, though not stating precisely where the problem occurred. The problem could be due to the current utterance or a previous one. Speakers often (falsely) assume the listener &quot;knows&quot; just where the speaker means. Typically, a listener presumes the problem is with the current utterance. A listener should, however, note that a false start has occurred at this point in the dialogue and be prepared to back up to the same place later on. Self-corrections are less interruptive than false starts and more explicit about the source of the problem. They are redescriptions of a piece of the speaker's utterance that occur as it is spoken. Descriptions like it - the tube or the large blue - uh violet tube are typical ones that occur. As with false starts, such places are conducive to confusion and should be noted by the listener. 290 Computational Linguistics, Volume 12, Number 4, October-December 1986 Bradley A. Goodman Reference Identification and Reference Identification Failures</Paragraph> <Paragraph position="6"> Focus shift relaxation rules:</Paragraph> <Paragraph position="8"> where FalseStart(u): This predicate determines whether or not a false start has occurred in some utterance, u. Such false starts have to be caught by the parser.</Paragraph> <Paragraph position="9"> Self-Correction(d): This predicate looks for self-corrections in a description, d. As with FalseStart, it would have to be implemented inside the parser.</Paragraph> <Paragraph position="10"> Figure 6. Two discourse knowledge relaxation rules. Each knowledge source produces its own partial ordering of features. 17 Each partial ordering is topologically sorted to provide a consistent format for comparison. The partial orderings are then considered together. For example, perceptual knowledge may say to relax color. 
However, if the color value was asserted in a relative clause, linguistic knowledge would rank color lower, i.e., placing it later in the list of things to relax. Since different knowledge sources generally produce different partial orderings of features, these differences can lead to a conflict over which features to relax. It is the job of the best candidate algorithm to resolve these disagreements among knowledge sources. Its goal is to order the referent candidates, C1, C2, ..., Cn, so that relaxation is attempted on the best candidates first. Those candidates are the ones that conform best to the proposed feature orderings. To start, the algorithm examines candidates and the feature orderings from each knowledge source. For each candidate Cj, the algorithm scores the effect of relaxing the speaker's original description D to Cj, using the feature ordering from one knowledge source. The score reflects the goal of minimizing the number of features relaxed while trying to relax the features that are &quot;earliest&quot; in the feature ordering. 18 Thus, these heuristics provide a simple way to reflect in the score how well a particular candidate fits a feature ordering. Notice that such scoring could very well favor a candidate C1 that requires more features to be relaxed in D than another candidate C2 if those features are earlier in the feature ordering than those required by is represented at the top of the figure. The set of specified features and their assigned feature values (e.g., the pair Color: Maroon) are also shown there. A set of objects in the real world is selected by the partial matcher as potential candidates for the referent. These candidates are shown near the top of the figure (C1, C2, ..., Cn). Inside each box is a set of features and feature values that describe that object. 
A set of partial orderings are generated that suggest which features in the speaker's description should be relaxed first - one ordering for each knowledge source (shown as &quot;Linguistic,&quot; &quot;Perceptual,&quot; and &quot;Hierarchical&quot; in the figure). For example, linguistic knowledge recommends relaxing Color or Shape before Function, and relaxing Function before Size. Finally, the referent candidates are reordered using the information expressed in the speaker's description and in the partial orderings of features.</Paragraph> </Section> <Section position="25" start_page="0" end_page="0" type="metho"> <SectionTitle> DETERMINE WHICH RELAXATION METHODS TO APPLY </SectionTitle> <Paragraph position="0"> Once a set of ordered, potential candidates is selected, the relaxation mechanism begins step 3 of relaxation; it tries to find proper relaxation methods to relax the features that have just been ordered (success in finding such methods justifies relaxing the speaker's description to a particular candidate). It stops at the first candidate in the list of candidates to which methods can be successfully applied. This step is the second place where the knowledge sources are useful.</Paragraph> <Paragraph position="1"> Relaxation can take place with many aspects of a speaker's description: with complex relations specified in the description, with individual features of a referent specified by the description, and with the focus of attention in the real world where one attempts to find a match. Complex relations specified in a speaker's description include spatial relations (e.g., the outlet near the top of the tube), comparatives (e.g., the larger tube), and superlatives (e.g., the longest tube). These can be relaxed. 
The simpler features of an object (such as size or color) specified in the speaker's description are also open to relaxation.</Paragraph> <Paragraph position="2"> Relaxation of a description has a few global strategies that can be followed for each part of the description: 1. drop the errorful feature value from the description altogether, 2. weaken or tighten the feature value in a principled way keeping its new value close to the specified one (e.g., movement within a subsumption hierarchy of feature values), or 3. try some other feature value based on some outside information (e.g., knowing that people often confuse opposite word pairs such as using hole for peg as illustrated in Excerpt 10).</Paragraph> <Paragraph position="3"> When performing relaxation, one would attempt to use the least drastic measures first. (1) is the most drastic, while (2) is the least; (3) is in between.</Paragraph> <Paragraph position="4"> Often the objects in focus in the real world implicitly cause other objects to be in focus (Grosz 1977, Webber 1978). The subparts of an object in focus, for example, are reasonable candidates for the referent of a failing description and should be checked. At other times, the speaker might attribute features of a subpart of an object to the whole object (e.g., describing a plunger that is composed of a red handle, a metal rod, a blue cap, and a green cup as the green plunger). In these cases, the relaxation mechanism utilizes the part-whole relation in object descriptions to suggest a way to relax the speaker's description.</Paragraph> <Paragraph position="5"> These strategies are realized through a set of procedures (or relaxation methods) that are organized hierarchically. Each procedure is an expert at relaxing its particular type of feature and draws on the knowledge sources for its expertise. 
For example, a Generate-Similar-Feature-Values procedure is composed of procedures like Generate-Similar-Shape-Values, Generate-Similar-Color-Values and Generate-Similar-Size-Values. Each of those procedures is a specialist that attempts to first relax the feature value to one &quot;near&quot; or somehow &quot;related&quot; to the current one (e.g., one would prefer to first relax the color red to pink before relaxing it to blue) and then, if that fails, to try relaxing it to any of the other possible values. 19 The effect of the latter case is really the same as if the feature were simply ignored.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 AN EXAMPLE OF MISREFERENCE RESOLUTION </SectionTitle> <Paragraph position="0"> This section describes how a referent identification system can recover from a misreference using the scheme outlined in the previous section. For the purposes of this example, assume that the water pump objects currently in focus include the CAP, the MAINTUBE, the AIRCHAMBER and the STAND. Assume also that the speaker tries to describe two of the objects - the MAINTUBE and the AIRCHAMBER.</Paragraph> <Paragraph position="1"> &quot;... two devices that are clear plastic.</Paragraph> <Paragraph position="2"> One of them has two openings on the outside with threads on the end, and it's about five inches long.</Paragraph> <Paragraph position="3"> The other one is a rounded piece with a turquoise base on it. 
Both are tubular.</Paragraph> <Paragraph position="4"> The rounded piece fits loosely over ...&quot;</Paragraph> </Section> </Section> <Section position="26" start_page="0" end_page="297" type="metho"> <SectionTitle> MAINTUBE AIRCHAMBER </SectionTitle> <Paragraph position="0"> The reference system can find a unique referent for the first object (described by DescrA, DescrB and DescrD) but not for the second (described by DescrA, DescrC, DescrD, and DescrE), since none of the focused objects is TURQUOISE. The relaxation algorithm is shown below to reduce the number of referent candidates for the second object to two. It then requires the system/listener to try out those candidates to determine if one, or both, fits loosely. The protocols exhibit a similar result when the listener uses &quot;fits loosely&quot; to get the correct referent (e.g., Excerpt 6 exemplifies where &quot;fit&quot; is used by the speaker to help confirm that the proper referent was found). Our system simulates this test by asking the user about the fit.</Paragraph> <Paragraph position="1"> Figure 8 provides a simplified and linearized view of the actual KL-One representation of the speaker's descriptions after they have been parsed and semantically interpreted.</Paragraph> <Paragraph position="2"> A representation of each of the water pump objects currently under consideration (i.e., in focus) is presented in Figure 9. Each provides a physical description of the object - in terms of its dimensions, the basic 3-D shapes composing it, and its physical features - and a basic functional description of the object. The first upper case entry in each representation in Figure 9 defines the basic kind of entity being described (e.g., TUBE means that the object being described is some kind of tube). The words in mixed case refer to the names of features, and the other upper case words refer to possible fillers of those features from things in the water pump world. 
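The layout just described (an entity kind, named features with fillers, and embedded subparts) can be pictured with a small Python sketch. It is only an illustration and not the KL-One representation itself: the class name `ObjectDescription`, the function `describes`, and the particular fillers are assumptions made for this example, and the real entries are those of Figure 9. The check follows the matching criterion stated in Section 4.2, namely that every feature given in the description must be present in the object but not necessarily vice versa.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectDescription:
    """Hypothetical, flattened stand-in for one Figure 9 style entry."""
    kind: str                                      # basic entity kind, e.g. "TUBE"
    features: dict = field(default_factory=dict)   # feature name to filler
    subparts: dict = field(default_factory=dict)   # role name to ObjectDescription


def describes(description, obj):
    """True if every feature given in the description appears, with an equal
    filler, in the object. The object may carry extra features, and the
    entity kind is ignored in this simplified sketch."""
    return all(obj.features.get(name) == filler
               for name, filler in description.features.items())


# Illustrative fillers only (assumed, not copied from Figure 9).
maintube = ObjectDescription(
    kind="TUBE",
    features={"Transparency": "CLEAR", "Composition": "PLASTIC"},
    subparts={"Lip": ObjectDescription("CYLINDER",
                                       {"Function": "OUTLET-ATTACHMENT-POINT"})},
)
cap = ObjectDescription(kind="CAP", features={"Transparency": "OPAQUE"})
descr_a = ObjectDescription(kind="DEVICE",
                            features={"Transparency": "CLEAR",
                                      "Composition": "PLASTIC"})

print(describes(descr_a, maintube))  # True
print(describes(descr_a, cap))       # False
```

Exact equality of fillers is the simplest possible test; the Classifier and partial matcher discussed in this section also reason about subsumption between fillers and about subparts, which this sketch leaves out. 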
The &quot;Subpart&quot; feature provides a place for an embedded description of an object that is a subpart of a parent object.</Paragraph> <Paragraph position="3"> Such subparts can be referred to on their own or as part of the parent object. The &quot;Orientation&quot; feature, used in the representations in Figure 9, provides a rotation and translation of the object from some standard orientation to the object's current orientation in 3-D space. The standard orientation provides a way to define relative positions such as top, bottom, or side. Figure 10 shows the KL-One taxonomy representing the same objects.</Paragraph> <Paragraph position="4"> The first step in the reference process is the actual search for a referent in the knowledge base. In people, the reference identification process is incremental in nature, i.e., the listener can begin the search process before he hears the complete description. This was observed throughout the videotape excerpts where an apprentice would commence his search after just a few words in a description. We try to simulate this incremental nature in our algorithm. It is readily apparent when considering the placement of the first description in DescrD into the KL-One taxonomy shown in Figure 10.</Paragraph> <Paragraph position="5"> DescrD is incrementally defined by first adding DescrA - as shown in Figure 11 - and then DescrB - as shown in Figure 13 - to the taxonomy. The KL-One Classifier compares the features specified in the speaker's descriptions with the features specified for each element in the KL-One taxonomy that corresponds to one of the current objects of interest in the real world. Notice that some features are directly comparable. 
For example, the &quot;Transparency&quot; feature of DescrA and the &quot;Transparency&quot; feature of MAINTUBE are both equal to &quot;CLEAR.&quot; All the other features specified in DescrA fit the MAINTUBE so the MAINTUBE can be described by DescrA. This is illustrated in Figure 12, where MAINTUBE is shown as a subconcept of DescrA. STAND also is shown as a subconcept of DescrA. AIRCHAMBER is shown as a possible subconcept (with the dotted arrow) because DescrA mismatches with it on one of its subparts. 20 CAP#1 is not shown as a subconcept of DescrA since its &quot;Transparency&quot; feature is OPAQUE and not CLEAR. Other features require in-depth processing - which is outside the capability of the KL-One classifier - before they can be compared. The OPENING value of &quot;Subpart&quot; in DescrB provides a good example of this. Consider comparing it to the &quot;Subpart&quot; entries for MAINTUBE shown in Figure 9. An OPENING, as seen in Figure 14, is thought of primarily as a 2-D cross-section (such as a &quot;hole&quot;), while the three CYLINDER subparts of MAINTUBE (labelled as Lip, Outlet1, and Outlet2 in Figure 9) are viewed as (3-D) cylinders that have the &quot;Function&quot; of being outlets, i.e., OUTLET-ATTACHMENT-POINTS. To compare OPENING and one of the cylinders, say CYLINDER#1 (for Lip), the inference must be made that both things can describe the same thing (similar inferences are developed in Mark (1982)). One way this inference can occur is by recursively examining the subparts of MAINTUBE (and their subparts, etc.) with the KL-One partial matcher until the cylinders are examined at the 2-D level. At that level, an end of the cylinder will be defined as an OPENING. With that examination, the MAINTUBE can be seen as described by DescrB. This inference process is illustrated in Figure 14. 
There the partial matcher examines the roles Lip, Outlet1, and Outlet2 of MAINTUBE, which represent its subparts, and determines the following: These facts imply that OPENING can match any of the subparts Lip, Outlet1, or Outlet2 on MAINTUBE since those subparts are defined as cylinders that function as outlets (i.e., Outlet-Attachment-Points).</Paragraph> <Paragraph position="6"> DescrC poses different problems. DescrC refers to an object that is supposed to have a subpart that is TURQUOISE. The Classifier determines that DescrC could not describe either the CAP or STAND because both are BLUE. It also could not describe the MAINTUBE 21 or AIRCHAMBER since each has subparts that are either VIOLET or BLUE. The Classifier places DescrC as best it can in the taxonomy, showing no connections between it and any of the objects currently in focus. DescrD provides no further help and is similarly placed. This is shown in Figure 15. At this point, a probable misreference is noted. The reference mechanism now tries to find potential referent candidates, using the taxonomy exploration routine described in Section 4.2.3, by examining the elements closest to DescrD in the taxonomy and using the partial matcher to score how close each element is to DescrD. 22 This is illustrated in Figure 16. The matcher determines MAINTUBE, STAND, and AIRCHAMBER as reasonable candidates by aligning and comparing their features to DescrD.</Paragraph> <Paragraph position="7"> Scoring DescrD to MAINTUBE: viewed as a kind of BOTTOM. Therefore, BASE in DescrD could match to the subpart in MAINTUBE that has a Translation of (0.0 0.0 0.0) - i.e., Threads of MAINTUBE. However, they mismatch since color DescrD could match to the subpart in AIRCHAMBER that has a translation of (0.0 0.0 0.0) - i.e., Chamber-Bottom of AIRCHAMBER. However, they mismatch since color TURQUOISE in DescrD differs from color BLUE of AIRCHAMBER. (--) Figure 17 summarizes the scoring. 
A weighted, overall numerical score is generated from the scores shown there.</Paragraph> <Paragraph position="8"> The above analysis using the partial matcher provides no clear winner since the differences are so small, causing the scores generated for the candidates to be almost exactly the same (i.e., the only difference was in the score for Transparency). If there were a candidate that had a score significantly better than the others, then that candidate would be a clear winner. For example, a clear winner occurs if all but one of the candidates differ drastically in their feature values when compared to the feature values in the speaker's description. In that instance, it would be unnecessary to proceed further; we would assume the winner was our referent. For this example, however, all candidates will be retained.</Paragraph> <Paragraph position="9"> At this point, the knowledge sources and their associated rules, mentioned earlier, apply. These rules attempt to order the feature values in the speaker's description for relaxation. First, we'll order the features in DescrD using linguistic knowledge. Linguistic analysis of DescrD, &quot;... are clear plastic ... a rounded piece with a turquoise base ... Both are tubular ... fits loosely over ...&quot; tells us that the features were specified using the following modifiers: Observations from the protocols (as described by the rules developed by Goodman (1984)) have shown that people tend to relax first those features specified as adjectives, then as prepositional phrases, and finally as relative clauses or predicate complements. Figure 5 shows this rule. 
The rule suggests relaxation of DescrD in the order: {Shape} &lt; {Color,Subpart} &lt; {Transparency,Composition, Analogical-Shape,Fit}.</Paragraph> <Paragraph position="10"> The set of features on the left side of a &quot;&lt;&quot; symbol is relaxed before the set on the right side. The order in which the features inside the braces, &quot;{...}&quot;, are relaxed is left unspecified (i.e., any order of relaxation is all right). Perceptual information about the domain also provides suggestions. Whenever a feature has feature values that are close, then one should be prepared to relax any of them to any of the others (we call this the clustered feature value rule; it was motivated in Section 3.1.3).</Paragraph> <Paragraph position="11"> Figure 18 illustrates a set of assertions that compose a data base of similar color values in some domain. The Similar-Color predicate is defined to be reflexive and symmetric but not transitive. In this example, since a number of the color pairs are very close, color may be a reasonable thing to relax (see Figure 19). The clustered color rule defined in Figure 20 would suggest such a relaxation. It requires that at least three objects in the world have similar colors. It is meant as an exemplar for a whole series of rules (e.g., ClusteredShapeValues, ClusteredTransparencyValues, and so on).</Paragraph> <Paragraph position="12"> One can relax a feature whose feature values are clustered closely together before those of a non-clustered feature.</Paragraph> <Paragraph position="13"> Hierarchical information about how closely related one feature value is to another can also be used to determine what to relax. The Shape values are a good example, as shown in Figure 21. A CYLINDRICAL shape is also a CONICAL shape, which is also a 3-D ROUND shape. 
Hence, it is very reasonable to match ROUNDED to CYLINDRICAL.</Paragraph> <Paragraph position="14"> {Shape} &lt; {Color,Subpart,Transparency,Composition, Analogical-Shape,Fit}.</Paragraph> <Paragraph position="15"> The suggested orderings above can be merged. Since such orderings can have contradictory suggestions, we must describe the merging process. For example, in the above orderings, one says to relax Color first, while the other says to relax Shape first. We combine both into one rule that says to relax either of them first: &quot;{Shape,Color} &lt; ...&quot;. Another condition that occurs in the above orderings is when one rule says to relax a particular feature before others, while another rule does not care which of those features are relaxed first. In that case, one should use the more restrictive rule. Hence, since one rule states that Subpart should be relaxed before the features Transparency, Composition, Analogical-Shape, and Fit, and the other rule does not care about their ordering, we split out Subpart and put it before the others: Analogical-Shape,Fit}.</Paragraph> <Paragraph position="16"> The referent candidates MAINTUBE, STAND, and AIRCHAMBER can be examined and possibly ordered themselves using the above feature ordering. For this example, the relaxation of DescrD to any of the candidates requires relaxing their SHAPE and COLOR features. Since they each require relaxing the same features, the candidates cannot be ordered with respect to each other (i.e., none of the possible feature orders is better for relaxing the candidates). Hence, no one candidate stands out as the most likely referent.</Paragraph> <Paragraph position="17"> While no ordering of the candidates was possible, the order generated to relax the features in the speaker's description can still be used to guide the relaxation of each candidate. The relaxation methods mentioned at the end of the last section come into use here. Consider the shape values. 
The goal is to see if the ROUND shape specified in the speaker's description is similar to the shape values of each candidate. Generate-Similar-Shape-Values determines that it is reasonable to match ROUND to either the CYLINDRICAL or HEMISPHERICAL shapes of the AIRCHAMBER by examining the taxonomy shown in Figure 21 and noting that both shapes are below ROUND and 3D-ROUND. Notice that it is less reasonable to match CYLINDRICAL to HEMISPHERICAL since they are in different branches of the taxonomy. This holds equally true for the CYLINDRICAL shapes of the MAINTUBE and the STAND. Generate-Similar-Color-Values next tries relaxing the Color TURQUOISE. The assertions Similar-Color(&quot;BLUE&quot;,&quot;TURQUOISE&quot;) ← and Similar-Color(&quot;GREEN&quot;,&quot;TURQUOISE&quot;) ← are found as rules containing TURQUOISE. The colors BLUE and GREEN are, thus, the best alternates. Here only two clear winners exist - the AIRCHAMBER and the STAND - while the MAINTUBE is dropped as a candidate since it is reasonable to relax TURQUOISE to BLUE or to GREEN but not to VIOLET. Subpart, Transparency, Analogical-Shape, and Composition provide no further help (though the fact that the AIRCHAMBER has both CLEAR and OPAQUE subparts could be used to put it slightly lower than the STAND, whose subparts are all CLEAR; this difference, however, is not significant). This leaves trial and error attempts to try to complete the FIT action specified in DescrE. The one (if any) that fits - and fits loosely - is selected as the referent. 
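The color step of this example can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the PROLOG-like rules themselves: the names `SIMILAR_COLOR_PAIRS`, `similar_color`, and `relax_color` are hypothetical, only the two Similar-Color assertions quoted above are included, and each candidate is reduced to the single color of the subpart aligned with the described base (VIOLET for the MAINTUBE, BLUE for the STAND and the AIRCHAMBER, as the discussion above implies).

```python
# Only the two Similar-Color assertions quoted in the text are modelled.
SIMILAR_COLOR_PAIRS = {("BLUE", "TURQUOISE"), ("GREEN", "TURQUOISE")}


def similar_color(a, b):
    """Reflexive and symmetric, but deliberately not transitive."""
    return (a == b
            or (a, b) in SIMILAR_COLOR_PAIRS
            or (b, a) in SIMILAR_COLOR_PAIRS)


def relax_color(described, candidates):
    """Keep the candidates whose color is similar to the described one.

    `candidates` maps a candidate name to the color of the subpart aligned
    with the described part (a simplifying assumption of this sketch).
    """
    return [name for name, color in candidates.items()
            if similar_color(described, color)]


candidates = {"MAINTUBE": "VIOLET", "STAND": "BLUE", "AIRCHAMBER": "BLUE"}
survivors = relax_color("TURQUOISE", candidates)
print(survivors)  # ['STAND', 'AIRCHAMBER']
```

Because the predicate is not transitive, BLUE and GREEN are each similar to TURQUOISE without being similar to one another. The two surviving candidates would then go on to the trial and error test of the FIT action. 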
The protocols showed that people often do just that - reducing their set of choices as best they can and then taking each of the remaining choices and trying out the requested action on them.</Paragraph> </Section> <Section position="27" start_page="297" end_page="297" type="metho"> <SectionTitle> 4.4 THE ACTUAL IMPLEMENTATION </SectionTitle> <Paragraph position="0"> The goal of our actual implementation of the reference and miscommunication mechanism was to provide a simulation of such a module in the context of a natural language system. We did not use an actual parser or semantic interpreter but assumed that we started with output expected from them. Such output was a representation in KL-One of the semantic interpretation of a description of an object in the water pump domain. We also built in KL-One a network of approximately 250 concepts to represent many of the water pump parts and their physical and functional features. A focus mechanism was simulated by a menu-driven routine that partitioned the network representation of the world to reflect focus spaces of referent candidates. We built a KL-One partial matcher and a network explorer to look for feasible referent candidates in the network. Finally, we wrote up a small batch of relaxation rules to test out our mechanism.</Paragraph> </Section> </Paper>