<?xml version="1.0" standalone="yes"?> <Paper uid="J84-2002"> <Title>The Pragmatics of Referring and the Modality of Communication 1</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The Phenomena of Interest: Referent Identification </SectionTitle> <Paragraph position="0"> One referential goal that is essential to the present communication task is to get the hearer to identify the object the speaker has in mind. I shall be using the term &quot;identify&quot; in a very narrow, though important and basic, sense - one that intimately involves perception. Thus, the analysis is not intended to be general; it applies only when the referents are perceptually accessible to the hearer, and when the hearer is intended to use perceptual means to pick them out. For the time being, I shall explicitly not be concerned with a hearer's mentally &quot;identifying&quot; some entity satisfying a description, or discovering a co-referring description, although these operations are certainly important aspects of processing many referring expressions. In the remainder of this section, properties of the referent identification act are examined, in part by contrasting it with other concepts that have previously entered into computational linguistic analyses of reference.</Paragraph> <Paragraph position="1"> Referent identification requires an agent and a description. The essence of the act is that the agent pick out the thing or things satisfying the description. The agent need not be the speaker of the description, and indeed, the description need not be communicated linguistically, or even communicated at all. A crucial component of referent identification is the act of perceptually searching for something that satisfies the description. To determine which method(s) should be used in identifying the referent, the agent first requires some representation of the description per se. The description is decomposed by the hearer into a plan of action for identifying the referent. The intended and expected physical, sensory, and cognitive actions to be included in that plan may be signalled by the speaker's choice of predicates. For example, a speaker who utters, &quot;the magnetic screwdriver, please&quot;, may expect and intend for the hearer to place various screwdrivers against some piece of iron to determine which is magnetic. Similarly, a speaker uttering the description &quot;the three two-inch long salted green noodles&quot; may expect and intend the hearer to count, look at, measure, and perhaps taste various objects. For their part, hearers decompose the noun phrase/description to discover that &quot;green&quot; is determinable by vision, &quot;inch&quot; by measuring, &quot;salted&quot; primarily by taste, &quot;noodle&quot; primarily by vision, and &quot;three&quot; by counting. Speakers know this is what hearers can do, and thus, using a model of the hearer's capabilities and the causal connections among people, their senses, and physical objects, design the referring expression D to suggest the actions needed to identify the referent.</Paragraph> <Paragraph position="2"> Speakers often not only plan for hearers to identify the referents of descriptions, but also communicate, in the Gricean way (1957), their intention that the hearers do so. This intention may not be explicitly signalled in the utterance, but rather have to be recognized by the hearer. 
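Returning briefly to the decomposition described two paragraphs above, the following is a minimal Python sketch of how descriptors might be mapped to the perceptual or physical actions that could verify them; the table entries, action labels, and function name are illustrative assumptions, not part of the paper.

# Illustrative sketch (not from the paper): decomposing a description into
# the perceptual or physical actions that could verify each descriptor.
PERCEPTUAL_METHODS = {
    "green": "look",               # color is determinable by vision
    "noodle": "look",              # object type primarily by vision
    "two-inch": "measure",         # length by measuring
    "salted": "taste",             # saltiness primarily by taste
    "three": "count",              # cardinality by counting
    "magnetic": "test-with-iron",  # dispositional property needs a physical test
}

def identification_plan(descriptors):
    """Map each descriptor to a hypothesized identification action."""
    return [(d, PERCEPTUAL_METHODS.get(d, "ask-or-infer")) for d in descriptors]

# identification_plan(["three", "two-inch", "salted", "green", "noodle"])
# -> [('three', 'count'), ('two-inch', 'measure'), ('salted', 'taste'),
#     ('green', 'look'), ('noodle', 'look')]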
To respond appropriately, a hearer decides when identification is the intended act to perform in response to a description, what part this act will play in the speaker's and hearer's plans, and when to perform the act. If perceptually identifying a referent is represented as an action in the speaker's plan, hearers could reason about it just as they do about any other act, thereby becoming able to infer the speaker's intentions behind, for example, indirect identification requests.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 A sketch of a definition of perceptual referent identification </SectionTitle> <Paragraph position="0"> Figure 1 presents a sketchy definition of the referent identification action, in which the description is formed from &quot;a/the y such that D(y)&quot;. 7 7 This definition is not particularly illuminating, but it is not any vaguer than others in the literature, including Searle's (1969). The point of giving it is that if a definition can be given in this form (i.e., as an action characterizable in a dynamic logic), a plan-based analysis (see section 7.5) applies.</Paragraph> <Paragraph position="1"> The formula follows the usual axiomatization of actions in a dynamic logic: P ⊃ \[Act\]Q; that is, if P is true, after doing Act, Q holds. Following Moore's (1980) possible worlds semantics for action, the modal operator RESULT is taken to be true of an agent, an action, and a formula, iff in all world states resulting from the agent's performing that action, the formula is true. 8</Paragraph> <Paragraph position="2"> The antecedent says there exists some (perhaps more than one) object satisfying three conditions. The first is a &quot;perceptual accessibility&quot; condition to guarantee that the IDENTIFY-REFERENT action is applicable. This should guarantee that, for example, a speaker does not intend someone to pick out the referent of &quot;3&quot;, &quot;democracy&quot;, or &quot;the first man to land on Mars&quot;. The condition is satisfied in the experimental task because it rapidly becomes mutual knowledge that the task requires communication about the objects in front of the hearer. The second condition states that X fulfills the description D. Here, I am ignoring cases in which the description is not literally true of the intended referent, including metonymy, irony, and the like (but see Perrault and Cohen 1981). Finally, D should be a description that is identifiable to this particular Agt. It should use descriptors whose extension the agent already knows or can discover by action. I am assuming that we can know that a combination of descriptors is identifiable without having formed a plan for identifying the referent.</Paragraph> <Paragraph position="3"> If the antecedent is true, then the agent picks out something (not necessarily the object satisfying the antecedent) as the referent of D. His picking out the &quot;right&quot; (i.e., the intended) object is handled by a separate characterization of the speaker's intention with respect to this action (see section 7.5). Here, I will merely give a name to the state of knowledge the agent is in after having identified the referent of D - (IDENTIFIED-REFERENT Agt D X). That is, Agt has identified the referent of D to be X. 
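Figure 1 itself is not reproduced in this extraction. A minimal sketch of the general shape just described is given below, assuming placeholder predicate names ACCESSIBLE (for the perceptual-accessibility condition) and IDENTIFIABLE (for the identifiability condition); this is not the paper's exact formula.

\[
\exists x\,\bigl(\mathrm{ACCESSIBLE}(Agt,x)\ \wedge\ D(x)\ \wedge\ \mathrm{IDENTIFIABLE}(Agt,D)\bigr)\ \supset\ [\,\mathrm{IDENTIFY\text{-}REFERENT}\ Agt\ D\,]\ \exists X\ \mathrm{IDENTIFIED\text{-}REFERENT}(Agt,D,X)
\]

Read with the P ⊃ \[Act\]Q schema above: if some perceptually accessible object satisfies D and D is identifiable to Agt, then after Agt performs the identification act, Agt has identified some X as the referent of D (not necessarily the object satisfying the antecedent).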
Of course, what has been notoriously difficult to specify is just what Agt has to know about X to say he has identified it as the referent of D. Clearly, &quot;knowing who the D is&quot; (Hintikka 1969, Moore 1980) is no substitute for having identified a referent. After having picked out the referent of a description, we may still not know who the D is. On the other hand, we may know who or what the description denotes, for example, by knowing some &quot;standard name&quot; for it, and yet be unable to use that knowledge to pick out the object. For example, if we ask &quot;Which is the Seattle train?&quot; and receive the reply &quot;It's train number 11689&quot;, we may still not be able to pick out and board the train if its serial number is not plainly in view. Clearly, the notion of identification needs to be made relative to a purpose, which perhaps could be derived from the bodily actions that Agt is intended to perform upon the intended referent. 9 Finally, although not stated in this definition, the means by which the act is performed is some function mapping D to some plan or procedure that, when executed by Agt, enables Agt to discover the X that is the referent of D.</Paragraph> <Paragraph position="4"> Even with this imprecise understanding of referent identification, it is apparent that not all noun phrases used in task-oriented conversations (even with the perceptual access conditions satisfied) are uttered with the intention that their referents be identified. For example, in dialogues with an information booth clerk in a train station (Allen 1979, Horrigan 1977), patrons uttering &quot;the 3:15 to Montreal?&quot; are not intending the clerk to pick out the train. Instead, as part of their plan for boarding a train, patrons are intending the clerk to supply them with a co-referring noun phrase that will allow them to identify the train. The attributive use of definite noun phrases (Donnellan 1966) is another case in which the speaker has no intention that the hearer identify a referent. Other non-anaphoric uses of noun phrases include labeling an object, correcting a referential miscommunication, getting the speaker to wait while the speaker identifies the referent, etc. 10</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Comparisons with computational linguistics approaches to reference </SectionTitle> <Paragraph position="0"> Computational linguistics research has usually been concerned with co-reference - the relationship of words and symbols to other words and symbols. Typically, referents of descriptions are determined by intersecting the extensions of the predicates in the description, subject to the quantificational constraints imposed by the determiner. Although perhaps adequate for interfacing with databases, this approach presupposes that the extensions can be computed from information currently in the database. However, in interpreting and generating discourse about some physical task, the system may have to form a plan in which it or its user performs physical actions to determine the extensions of the predicates.</Paragraph> <Paragraph position="1"> Five approaches are most closely related to ours. 
First, Winograd's SHRDLU (1972) attempted to simulate true reference with co-reference. 11 SHRDLU had a PLANNER function, THFIND, that could find objects in the database satisfying THGOAL statements as a simulation of finding blocks in the real world. THFIND was included in the semantic representation of definite NPs, and in the representation of indefinite NPs when those NPs were embedded in an action verb. However, THFIND is not attributed as a user goal, nor is it reasoned about (other than to maintain a distinction between definite and indefinite NPs). Furthermore, it is not treated in the same way as acts such as PICKUP, whose execution is marked specially so that the system can later answer &quot;why&quot; questions.</Paragraph> <Paragraph position="2"> 8 Actually, Moore characterizes RESULT as taking an event and a formula as arguments. In his framework, an agent's doing an action denotes an event. However, this difference is not critical for what follows.</Paragraph> <Paragraph position="3"> 9 The connection with the contextually relevant actions is a matter of inference (see section 6).</Paragraph> <Paragraph position="4"> 10 For other discussion of speakers' goals in uttering noun phrases, see Sidner (1983) and Wilkes-Gibbs (unpublished ms).</Paragraph> <Paragraph position="5"> 11 On the other hand, one might argue that SHRDLU engaged in true reference because the discourse was about non-existing blocks &quot;contained&quot; within the system. To pursue the truth of the matter would take us too far afield.</Paragraph> <Paragraph position="6"> Second, Allen's (1979) system used an IDENTIFY state in the control part of the plan-recognition mechanism. Again, for this system, identification meant to find something in the database satisfying the requisite predicates. However, the IDENTIFY action itself was not part of the plan being recognized. The system did not reason about when IDENTIFY should be done (it always tried to IDENTIFY referents), nor did it attribute IDENTIFY to be part of its user's plan.</Paragraph> <Paragraph position="7"> The TDUS system (Robinson et al. 1980) engaged in a dialogue about the assembly of an air compressor that, it was understood, was being assembled by an apprentice. Thus, the referents of the system's noun phrases were perceptually accessible to the hearer. The system was primarily oriented towards utterance interpretation, but it did generate responses to questions. In doing so, the system was in the same circumstances as the experts in the present study. However, because it was assumed that the extensions of all of the system's descriptors were already known to the hearer, the system did not reason that it should choose particular referring expressions so that the hearer could pick out their referents. Instead, the choice of referring expressions was constrained by uniqueness and focus (Grosz 1977), constraints that are not considered here but are clearly necessary. 
Although TDUS employed the concept of locating an object in its representation of successful task performance, this concept did not play a role in choosing referring expressions unless the system was asked a question about an object's location.</Paragraph> <Paragraph position="3"> Appelt's KAMP system (1981) generalized TDUS to plan referring actions as part of the planning of illocutionary acts. However, KAMP would only include descriptors in a referring expression for which it was already mutually believed that the hearer knew the referent. Thus, it could not generate referring expressions to new objects for the hearer to pick out. Furthermore, as argued earlier, the concept of &quot;knowing what the referent is&quot;, which was central to KAMP's planning of referring phrases, is too strong to be an accurate representation of referent identification.</Paragraph> <Paragraph position="4"> Finally, the HAM-ANS question-answering system (Hoeppner, Morik, and Marburger 1984) generates descriptions of objects in a hotel room from visually derived information, assuming the user's visual search processes are identical with its own. In another application, the system answers questions about traffic flow based on visual data. In its tying reference to perception, the HAM-ANS system has some of the flexibility that I am advocating. However, as with the others, it does not reason about identification as an action that the speaker intends it to do. In this paper, I argue why such reasoning is needed.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Summary </SectionTitle> <Paragraph position="0"> In summary, I am suggesting that referent identification be an action that the hearer infers to be part of the speaker's plan, and that speakers plan for hearers to perform. To ensure that hearers can do so, speakers employ their knowledge of the hearer's perceptual abilities, and choose descriptions that will make use of those abilities. The ability to reason about the referent identification act will allow the hearer to infer the intentions behind many utterances that secure reference separately from predication, and do so indirectly. With this concept in mind, we can proceed to examine its use in discourse.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The Study </SectionTitle> <Paragraph position="0"> Twenty-five subjects (&quot;experts&quot;) each instructed a randomly chosen &quot;apprentice&quot; in assembling a toy water pump, following Grosz's (1977) and Chapanis et al.'s (1972) task-oriented dialogue paradigm. 12 Subjects were paid volunteer students from the University of Illinois, all of whom were familiar with CRT terminals. Five &quot;dialogues&quot; took place in each of the following modalities: face-to-face, by telephone, keyboard (&quot;linked&quot; CRTs), (noninteractive) audiotape, and (non-interactive) written. In all modes, the apprentices were videotaped as they followed the experts' instructions.</Paragraph> <Paragraph position="1"> Face-to-face and written modalities are the ones usually compared in oral/written discussions. However, they differ along many dimensions (Rubin 1980). Pairwise comparisons of the modalities in this study can determine the effects of mutual vision, interaction, and the use of voice or print. 
Telephone and keyboard dialogues are analyzed first because our conclusions would indicate the effects of having a voice channel, and moreover would have implications for the design of speech understanding and production systems. These modalities take on intermediate values in Rubin's dimensional space: the conversants share the same time frame, can interact, cannot see each other, and are conversing about objects mutually known to be physically present to one of them.</Paragraph> <Paragraph position="2"> Each expert participated in the experiment on two consecutive days, the first for training and the second for instructing an apprentice. Subjects playing the expert role were trained by following a set of assembly directions consisting entirely of imperatives, assembling the pump as often as desired, and then instructing a research assistant. This practice session took place face to face. Experts knew that the research assistant already knew how to assemble the pump. Experts were given an initial statement of the purpose of the experiment, which indicated that communication would take place in one of a number of different modes. Experts were not informed of the modality in which they would communicate until the next day. 13 Apprentices were told the purpose of the experiment was to analyze the communicating of a set of instructions in different modalities. They were not initially informed that they were engaged in an assembly task. 12 An exploded parts diagram of the pump can be found in Appendix A.</Paragraph> <Paragraph position="3"> In both modes, experts and apprentices were located in different rooms. Experts had a set of pump parts that, they were told, were not to be assembled but could be manipulated. In Telephone mode, experts communicated through a standard telephone and apprentices communicated through a speaker-phone. This device did not need to be held and allowed simultaneous two-way communication. Distortion of the expert's voice was apparent, but not measured.</Paragraph> <Paragraph position="4"> Subjects in &quot;keyboard&quot; mode typed their communication on Elite Datamedia 1500 CRT terminals connected by the Telenet computer network to a computer at Bolt Beranek and Newman Inc. The terminals were &quot;linked&quot; so that whatever was typed on one would appear on the other. Simultaneous typing was possible and did occur.</Paragraph> <Paragraph position="5"> Subjects were informed that their typing would not appear simultaneously on either terminal. Response times averaged 1 to 2 seconds, with occasionally longer delays due to system load.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Sample transcripts </SectionTitle> <Paragraph position="0"> The following are representative samples of transcripts in the two modalities.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> A TELEPHONE DIALOGUE FRAGMENT </SectionTitle> <Paragraph position="0"> S: &quot;OK. Take that. Now there's a thing called a plunger. It has a red handle on it, a green bottom, and it's got a blue lid.</Paragraph> <Paragraph position="1"> J: OK S: OK now, the small blue cap we talked about before? 
J: Yeah S: Put that over the hole on the side of that tube - J: Yeah S: - that is nearest to the top, or nearest to the red handle.</Paragraph> <Paragraph position="2"> J: OK S: OK. Now. now, the smallest of the red pieces? J: OK&quot; 13 The instructions given to the expert about the experiment and the assembly task are given in Appendix A. Burke (1982) reports that the order of the instructions, and the descriptions of the pieces, influenced the order and vocabulary of the expert's subsequent instructions. A KEYBOARD DIALOGUE FRAGMENT B: &quot;fit the blue cap over the tub end N: done B: put the little black ring into the large blue cap with the hole in it...</Paragraph> <Paragraph position="3"> N: ok B: right Put the 1/4 inch long 'post' into the loosely fitting hole...</Paragraph> <Paragraph position="4"> N: i don't understand what you mean B: the red piece, with the four tiny projections? N: OK B: place it loosely into the hole on the side of the large tube...</Paragraph> <Paragraph position="5"> N: done B: very good. See the clear elbow tube? N: yes B: place the large end over that same place. N: ready B: take the clear dome and attach it to the end of the elbow joint...&quot;</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Method of analysis </SectionTitle> <Paragraph position="0"> Discourses are analyzed for many reasons, with a corresponding variety of methods. Some analyses of discourse strive to explain what the text itself meant. Recent work on discourse pragmatics emphasizes the need to explain what the speaker meant in producing the utterances, i.e., what were the speaker's intentions? To build dialogue systems, we need to devise first theories and then algorithms for deriving what the speaker meant as a function of what was said and of contextual factors. The logical first step toward such a formalization is to establish reliable methods for isolating the words, context, and speaker intent. Each of these aspects of the discourse is considered below.</Paragraph> <Paragraph position="1"> First, the typewritten transcript of a verbal interaction provides reasonably accurate data on what was said, provided one's goal is not to study prosody. Second, contextual factors can be modelled in a setting in which the objects, communication task, and modality have been selected by the experimenter. The conversants' knowledge of the domain is somewhat constrained by the experimental setup and the initial instructions. This semi-controlled environment can enable the experimenter to model the participants' initial experiment-induced beliefs, intentions, and expectations, which constitute our model of the cognitive effects of context.</Paragraph> <Paragraph position="2"> Finally, as the conversation progresses, one needs interpretations of what each speaker meant, stated in terms of further attributions of beliefs and intentions.</Paragraph> <Paragraph position="3"> Standard empirical methods should be used to minimize experimenter bias in making such attributions. In particular, the theorist must be careful not to be the source of belief/intent attributions, for if given the leeway, he will undoubtedly find what he is looking for. 
To avoid this problem, I trained two people to employ a vocabulary for describing intentions in discourse, the so-called &quot;illocutionary acts&quot; (or, loosely, &quot;speech acts&quot;) (Austin 1962, Searle 1969). That is, the discourse analysts &quot;code&quot; the speaker's intentions in making an utterance by assigning illocutionary act labels to utterances (or groups of them). Fortunately, the illocutionary act vocabulary is the natural one in our common-sense psychology for making such attributions. However, unlike most theories of illocutionary acts, I do not claim that the conversants themselves attempt to determine what illocutionary acts were performed, although they might be able to do so if requested. 14 The illocutionary act interpretations are therefore our interpretations, as coders and as theorists.</Paragraph> <Paragraph position="4"> The data that need to be compared and explained are these illocutionary act codings. As mentioned earlier, a number of researchers have attempted similar analyses, but are content with solely identifying regularities in their discourses. A preferable analysis would derive regularities from more basic principles. The method employed here for formulating such derivations includes the following components: * A logic of beliefs, mutual beliefs, and goals.</Paragraph> <Paragraph position="5"> * A specification of the goals achieved by utterances of various forms (e.g., a yes/no question is an attempt to get the hearer to inform the speaker whether or not the proposition in question is true).</Paragraph> <Paragraph position="6"> * A formal theory of rational, intentional action that specifies how an agent's actions are determined by both his goals and his knowledge of the effects of, preconditions for, and means of accomplishing various action types.</Paragraph> <Paragraph position="7"> 14 See Cohen and Levesque (1980, in preparation) for a plan-based theory of communication that does not require the recognition of illocutionary acts.</Paragraph> <Paragraph position="8"> The aim of a competence theory of communication based on plans is to specify the set of possible plans underlying the appropriate use of various illocutionary acts. In applying such a theory to the analysis of discourse, plans are used to connect an utterance's form and content with the observers' illocutionary act coding, which is our best approximation to the speaker's intent. It is important to remember that these intentions may not be identical to those conveyed by the literal utterance. The plans make use of a formalization of the experimental task, the modality and the prior discourse, expressed in terms of the participants' mutual beliefs, goals, expectations, and possibilities for action. Thus, the theory captures, albeit in an indirect way, the dependence of the discourse structure on the experimental task and communication modality.</Paragraph> <Paragraph position="9"> In addition, a performance model would include algorithms for forming and recognizing plans of action to derive the observer's intent codings. 
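As a minimal sketch of how such intent codings might be represented and tallied across modalities (a comparison described in the methodology summary that follows), the Python fragment below is illustrative only; the data structure, field names, and function are assumptions, not the study's actual coding apparatus.

from collections import Counter
from dataclasses import dataclass

# Illustrative sketch (not from the paper): one record per illocutionary-act
# label assigned to an utterance (or bracketed group) in a given modality.
@dataclass
class IACoding:
    modality: str   # e.g. "telephone" or "keyboard"
    utterance: str  # the coded utterance or bracketed group of utterances
    act: str        # e.g. "REQUEST", "INFORMIF", "IDENTIFY-REFERENT"

def ia_distribution(codings):
    """Tally illocutionary-act labels per modality for cross-modality comparison."""
    dist = {}
    for c in codings:
        dist.setdefault(c.modality, Counter())[c.act] += 1
    return dist

# ia_distribution([IACoding("keyboard", "See the clear elbow tube?", "IDENTIFY-REFERENT")])
# -> {"keyboard": Counter({"IDENTIFY-REFERENT": 1})}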
Although such models have been built (Allen 1979, Brachman et al. 1979), I do not discuss them further here.</Paragraph> <Paragraph position="11"> In summary, the discourse analysis methodology is as follows: * Train coders to identify various illocutionary acts (IAs).</Paragraph> <Paragraph position="12"> * Compare the distribution of IAs across modalities.</Paragraph> <Paragraph position="13"> * Independently, characterize those IA types in terms of plans.</Paragraph> <Paragraph position="14"> * Formally derive the IA codings as a rational strategy of action, given attributions of the participants' beliefs, goals, and expectations at the point in the discourse in which the IAs occurred.</Paragraph> <Paragraph position="15"> When our work is complete, we will have analyses of the differences in achievement of the same overarching set of goals (the assembly task) as a function of modality.</Paragraph> <Paragraph position="16"> The first stage of discourse analysis involves the coding of the communicator's intent in making various utterances. Following the experiences of Sinclair and Coulthard (1975), Dore et al. (1978), and Mann, Carlisle, Moore, and Levin (1977), a coding scheme was developed and two people were trained in its use. The coders relied on written transcripts, audiotapes, and videotapes. The scheme, which was tested and revised on pilot data until reliability was attained, included a set of approximately eight illocutionary act categories that were used to label intent, and a set of &quot;operators&quot; and propositions that were used to describe the assembly task, as in Sacerdoti (1975). Appendix B lists the propositions and operators for the physical actions. For example, putting two hollow, pipe-like pieces together was termed CONNECTing; putting a part with a protrusion into a part with a hole was termed MESHing. The operators for physical actions often served as the propositional content of the communicative acts.</Paragraph> <Paragraph position="17"> The following illocutionary act categories were coded: \[table of illocutionary act categories and example utterances, such as &quot;and the purpose of that is to cover up that hole&quot; and &quot;that's a plunger&quot;, not preserved in this extraction\] As discussed earlier, the action of referent identification is labelled IDENTIFY-REFERENT, and the state of affairs resulting from it is termed IDENTIFIED-REFERENT. Communicating that the speaker wants the hearer to do something is termed REQUESTing. Yes/no questions are REQUESTs to get a hearer to perform an INFORMIF action, i.e., to tell the speaker whether or not some proposition holds. One subcase of this is to tell the speaker whether or not a referent for a description has been identified. Finally, speakers often request that hearers make a relation true, without specifying an action that would do so. This is captured by the REQUEST to ACHIEVE \[relation\] coding.</Paragraph> <Paragraph position="20"> Regarding referent identification, the coders were asked to state which utterances, or groups of utterances, constituted either an explicit request by the speaker that the hearer identify the referent of a noun phrase or a question about whether or not the hearer had done so.</Paragraph> <Paragraph position="21"> The coders were instructed not to consider whether or not an utterance was an indirect request to pick something up (but see section 6.4.1). 
Furthermore, they were told not to consider noun phrases in assembly requests as identification requests unless identification was somehow &quot;explicitly marked&quot;. 15 Because agreement about the intent behind utterance parts was not obtainable, I cannot assert, on the basis of empirical evidence alone, that noun phrases embedded in imperatives are requests to identify the referents. Instead, the speaker's intent behind whole utterances (though not necessarily complete sentences) was coded. 16 Of course, a coding scheme must not only capture the domain of discourse, it must be tailored to the nature of discourse per se. Many theorists have observed that a speaker can use a number of utterances to achieve a goal, and can use one utterance to achieve a number of goals.</Paragraph> <Paragraph position="22"> Correspondingly, the coders could consider utterances as jointly achieving one intention (by &quot;bracketing&quot; them), could place an utterance in multiple categories, and could attribute more than one intention to the same utterance or utterance part. The coders were instructed to ignore false starts, even though a false start may communicate information.</Paragraph> <Paragraph position="23"> Although our goals did not include a precise analysis of how prosody reflects speaker-intent and meaning, some decisions about how to translate prosody into orthographic form, which undoubtedly influence subsequent discourse analyses, were made by the transcriber of the audiotapes. To minimize inconsistencies in transcription, all transcriptions were checked by a second party. Moreover, it was discovered that the physical layout of a transcript, particularly the location of line breaks, affected which utterances were coded. To ensure uniformity, each coder first divided each transcript into utterances that he or she would code. These joint &quot;bracketings&quot; were compared to yield a base set of codable utterance parts. The coders could later bracket utterances differently if necessary.</Paragraph> <Paragraph position="24"> For one third of the transcripts, interrater reliabilities were calculated within each mode, for each category.</Paragraph> <Paragraph position="25"> The measure consisted of twice the number of agreements divided by the number of times that category was coded (cf. Mann, Carlisle, Moore, and Levin 1977).</Paragraph> <Paragraph position="26"> Reliabilities were high (above 88%). Because each disagreement counted twice (against both categories that were coded), agreements also counted twice.</Paragraph> <Paragraph position="27"> 15 The above Telephone dialogue fragment contains one such intonationally marked noun phrase.</Paragraph> <Paragraph position="28"> 16 For a formal analysis that does make such a claim, see section 8.4 and Cohen (1984).</Paragraph> <Paragraph position="29"> 17 The action-effect relation holding between the various propositions and assembly actions can be readily inferred from Appendix B.</Paragraph> <Paragraph position="30"> 4.2.3 Coding the sample dialogue fragments The previous fragments are coded below to indicate some of the complexities of the data as well as the scoring scheme. A number of shortcuts have been taken for expository purposes. First, if an act is stated as COMPLETE, then the proposition stated as the effect of that act holds. 17 
Second, some of the arguments to the embedded propositions have not been presented when those arguments are not problematic. Third, as argued above, the second argument of IDENTIFY-REFERENT should be a description in some appropriate logical form representing the meaning of the speaker's noun phrase.</Paragraph> <Paragraph position="31"> However, because it was too difficult to get coders to determine logical forms for the noun phrases, they instead coded only the canonical names of the referents as arguments. Finally, the elapsed time between utterances is not shown here, but is available from the videotapes. The codings of S's first turn indicate an attempt to achieve more than one intention in one utterance.</Paragraph> <Paragraph position="32"> Specifically, the form &quot;there's a ...&quot; is a typical way to perform a request to identify something satisfying the description (the &quot;...&quot;). In this case, the speaker said &quot;thing&quot;, and labelled that thing a plunger. Whereas the labelling act may be finished, the request for referent identification apparently is not, and is continued over a number of utterances.</Paragraph> <Paragraph position="33"> The other &quot;bracketed&quot; turn is an example of a speaker's prosodically achieving multiple goals at once. Here, the use of rising intonation in the middle of an imperative is used to check whether the hearer knows what the speaker is talking about. The pragmatics of this discourse situation led to the coding of &quot;knowing what the speaker is talking about&quot; as a request to physically identify a referent. Finally, notice the subsequent use of a questioned noun phrase fragment to perform the same act. The use of fragments will be discussed further below.</Paragraph> <Paragraph position="34"> The coding of the Keyboard utterances is more straightforward. There are three strategies of instruction here. First, direct requests for assembly actions, in the form of imperatives, as in line (1). Second, there are conjoined direct requests, for picking up followed by an assembly action, as in (12). Finally, B performs separate identification requests, as in (7) and (8).</Paragraph> <Paragraph position="35"> What is important to notice here is that B shifts his strategy (in a fashion that resembles driving a three-speed car). Before this fragment, the conversation had proceeded smoothly, in &quot;high gear&quot;, with B initially &quot;upshifting&quot; from first a &quot;take and assemble&quot; request to six consecutive assembly requests (one of them indirect), the last of which is utterance (1) of this fragment.</Paragraph> <Paragraph position="36"> In (5)-(7), we observe clarification dialogue about a noun phrase. Immediately after an apparent breakdown at (3), B &quot;downshifts&quot; to questioning the achievement of his first subgoal, identifying the red piece. Once that is corrected, B stays in &quot;low gear&quot;, explicitly ensuring success of his reference, in (8), before requesting an assembly action in (9). After that success, he &quot;upshifts&quot; to &quot;second gear&quot; - with requests to pick-up and assemble in (13). After being successful yet again, B &quot;upshifts&quot; to &quot;high gear&quot;, using direct assembly requests, for the rest of the dialogue (seven more requests).</Paragraph> <Paragraph position="37"> What could explain this conversation pattern? 
A common sense analysis of the plan for assembling would indicate that to install a piece, one must be holding it; to hold it, one must pick it up; to perform any action on an object, one must have identified that object. By requesting an assembly action (&quot;high gear&quot;), one requires the listener to infer the rest of the plan. By requesting the sequence take-and-assemble (&quot;second gear&quot;), the speaker makes one of the inferences himself, but requires the listener to realize that identification of the speaker's part description is needed. Finally, &quot;low gear&quot; involves the speaker's checking the success of the component subgoals, which involves identifying the referents of the speaker's descriptions. In summary, the strategy shift to &quot;low gear&quot; occurs after a referential miscommunication because it affords a more precise monitoring of the listener's achievement of the speaker's goals. The question to be asked is how, if at all, the use of identification requests differs across modes of communication.</Paragraph> </Section> </Section> </Paper>