<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0319">
  <Title>Lexical, Prosodic, and Syntactic Cues for Dialog Acts</Title>
  <Section position="3" start_page="115" end_page="116" type="metho">
    <SectionTitle>
3 Lexical Cues to Dialog Act Identity
</SectionTitle>
    <Paragraph position="0"> Perhaps the most studied cues for discourse structure are lexical cues, also called 'cue phrases', which Hirschberg and Litman (1993) define as follows: &quot;Cue phrases are linguistic expressions such as NOW and WELL that function as explicit indicators of the structure of a discourse&quot;. This section examines the role of lexical cues in distinguishing four common DAs whose lexical realizations overlap considerably: continuers, agreements, yes-answers, and incipient speakership.</Paragraph>
    <Paragraph position="1"> What makes these four types so difficult to distinguish is that they can all be realized by common words like uh-huh, yeah, right, yes, and okay.</Paragraph>
    <Paragraph position="2"> But while some tokens (like yeah) are highly ambiguous, others (like uh-huh or okay) are somewhat less ambiguous, occurring with different likelihoods in different DAs. This suggests a generalization of the 'cue word' hypothesis: while some utterances may be ambiguous, in general the lexical form of a DA places strong constraints on which DA the utterance can realize. Indeed, we and our colleagues, as well as many other researchers working on automatic DA recognition, have found that the words and phrases in a DA are the strongest cue to its identity.</Paragraph>
    <Paragraph position="3"> Examining the individual realizations of our four DAs, we see that although the word yeah is highly ambiguous, in general the distribution of possible realizations is quite different across DAs. Table 6 shows the most common realizations.</Paragraph>
    <Paragraph position="4"> As Table 6 shows, the Switchboard data supports Jefferson's (1984) hypothesis that uh-huh tends to be used for passive recipiency, while yeah tends to be used for incipient speakership. (Note that the transcriptions do not distinguish mm-hm from uh-huh; we refer to both of these as uh-huh.) In fact, uh-huh is twice as likely as yeah to be used as a continuer, while yeah is three times as likely as uh-huh to be used to take the floor.</Paragraph>
    <Paragraph position="5"> Our results differ somewhat from earlier statistical investigations of incipient speakership. In their analysis of 750 acknowledge tokens from telephone conversations, Drummond and Hopper (1993a) found that yeah was used to initiate a turn about half the time, while uh-huh and mm-hm were only used to take the floor 4%-5% of the time. Note that in Table 6, uh-huh is used to take the floor 1402 times. The corpus contains a total of 15,818 tokens of uh-huh, of which 13,106 (11,704+1402) are used as backchannels. Thus 11% of the backchannel tokens of uh-huh (or alternatively 9% of the total tokens of uh-huh) are used to take the floor, about twice as many as in Drummond and Hopper's study. This difference could be caused by differences between SWBD and their corpora, and bears further investigation.</Paragraph>
    <Paragraph position="6"> Drummond and Hopper (1993b) were not able to separately code yes-answers and agreements, which suggests that their study might be extended in this way. Since we did code these separately, we also checked what percentage of just the backchannel uses of yeah marked incipient speakership. We found that 41% of the backchannel uses of yeah were used to take the floor (4773/(4773+6961)), similar to their finding of 46%.</Paragraph>
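    <Paragraph> The percentages in the two preceding paragraphs can be re-derived from the counts quoted in the text. The following Python sketch is only an arithmetic check; the function and variable names are ours, and we assume that the 6961 figure counts the backchannel uses of yeah that do not take the floor.

```python
def share(part, whole):
    """Return part / whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts quoted in the running text (Table 6 itself is not reproduced here).
UH_HUH_CONTINUER = 11704
UH_HUH_FLOOR = 1402          # uh-huh used to take the floor
UH_HUH_TOTAL = 15818         # all tokens of uh-huh in the corpus
YEAH_FLOOR = 4773            # yeah used to take the floor
YEAH_OTHER_BACKCHANNEL = 6961  # assumption: remaining backchannel uses of yeah

# Backchannel uses of uh-huh are the continuers plus the floor-taking tokens.
uh_huh_backchannel = UH_HUH_CONTINUER + UH_HUH_FLOOR

uh_huh_floor_of_backchannel = share(UH_HUH_FLOOR, uh_huh_backchannel)
uh_huh_floor_of_total = share(UH_HUH_FLOOR, UH_HUH_TOTAL)
yeah_floor_of_backchannel = share(YEAH_FLOOR, YEAH_FLOOR + YEAH_OTHER_BACKCHANNEL)

print(uh_huh_backchannel)
print(uh_huh_floor_of_backchannel, uh_huh_floor_of_total, yeah_floor_of_backchannel)
```

Rounded to whole percentages, this reproduces the 11%, 9%, and 41% figures reported in the text.</Paragraph>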
    <Paragraph position="7"> While yeah is the most common token for continuer, agreement, and yes-answer, the rest of the distribution is quite different. Uh-huh is much less common as a yes-answer than tokens of yeah or yes; in fact, 86% of the yes-answer tokens contained the words yes, yeah, or yep, while only 14% contained uh-huh.</Paragraph>
    <Paragraph position="8"> Note also that uh-huh is not a good cue for agreements, occurring only 4% of the time.</Paragraph>
    <Paragraph position="9"> Tokens like exactly and that's right, on the other hand, uniquely specify agreements (among these four types). The word no, while not unique (it also marks incipient speakership), is a generally good discriminative cue for agreement (it is very commonly used to agree with negative statements).</Paragraph>
    <Paragraph position="10"> We are currently investigating speaker dependencies in the realization of these four DAs. Anecdotally, we have noticed that some speakers used characteristic intonation on a particular lexical item to differentiate between its use as a continuer and as an agreement, while others seemed to use one lexical item exclusively for backchannels and others for agreements.</Paragraph>
  </Section>
  <Section position="4" start_page="116" end_page="118" type="metho">
    <SectionTitle>
4 Prosodic Cues to Dialog Act Identity
</SectionTitle>
    <Paragraph position="0"> While lexical information is a strong cue to DA identity, prosody also clearly plays an important role. For example, Hirschberg and Litman (1993) found that intonational phrasing and pitch accent play a role in disambiguating cue phrases, and hence in helping determine discourse structure.</Paragraph>
    <Paragraph position="1">  Hirschberg and Litman also looked at the difference in cues between text transcriptions and complete speech.</Paragraph>
    <Paragraph position="2"> We followed a similar line of research to examine the effect of prosody on DA identification, by studying how DA labeling is affected when labelers are able to listen to the soundfiles. As mentioned earlier, labeling had been done only from transcripts for practical reasons, since listening would have added time and resource requirements beyond what we could handle for the JHU workshop. The fourth author (an original labeler) listened to and relabeled 44 randomly selected conversations that she had previously labeled only from text. In order not to bias changes in the labeling, she was not informed of the purpose of the relabeling, other than that she should label after listening to each utterance. As in the previous labeling, the transcript and full context were available; this time, however, her originally coded labels were also present on the transcripts. Also as previously, segmentations were not allowed to be changed; this made it feasible to match up previous and new labels. The relabeling by listening took approximately 30 minutes per conversation.</Paragraph>
    <Paragraph position="3"> For this set of 44 conversations, 114 of the 5757 originally labeled Dialog Acts (2%) were changed. The fact that 98% of the DAs were unchanged suggests that DA labeling from text transcriptions was probably a good idea for our purposes overall. However, there were some frequent changes which were significant for certain DAs. Table 7 shows the DAs that were most affected by relabeling, and hence were presumably most ambiguous from text alone. The most prominent change was clearly the conversion of continuers to agreements. This accounted for 38% of the 114 changes made. While there were also a number of changes to statements and opinions, the continuer → agreement changes were primary for two reasons. First, statements have a much higher prior probability than continuers or agreements. After normalizing the number of changes by DA prior, continuer → agreement changes occur for over 4% of original continuer labels. In contrast, the normalized rates for the second and third most frequent types of changes were 22/989 (2%) for opinions → statements and 17/2147 (1%) for statements → opinions. Second, continuer → agreement changes often played a causal role in the other changes: a continuer which changed to an agreement often caused a preceding statement to be relabeled as an opinion.</Paragraph>
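    <Paragraph> The normalization by DA prior described above amounts to dividing the number of changes of a given type by the number of original labels of the source DA. The following Python sketch re-derives the rates from the counts quoted in the text; the function and variable names are ours, and the continuer → agreement count is only approximated from the reported 38% share, since the number of original continuer labels is not given here.

```python
def rate_percent(changed, original_count):
    """Changes of one type as a percentage of the original labels of that DA."""
    return 100.0 * changed / original_count


TOTAL_LABELS = 5757   # DAs originally labeled in the 44 conversations
TOTAL_CHANGES = 114   # labels changed after listening

overall_rate = rate_percent(TOTAL_CHANGES, TOTAL_LABELS)

# Normalized rates for the second and third most frequent change types.
opinion_to_statement = rate_percent(22, 989)
statement_to_opinion = rate_percent(17, 2147)

# Approximate count implied by "38% of the 114 changes" (an estimate, not a
# figure reported in the text).
approx_continuer_to_agreement = round(0.38 * TOTAL_CHANGES)

print(round(overall_rate, 2))
print(round(opinion_to_statement, 2), round(statement_to_opinion, 2))
print(approx_continuer_to_agreement)
```

Rounded to whole percentages, the first three values match the 2%, 2%, and 1% reported in the text.</Paragraph>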
    <Paragraph position="4"> There are a number of potential causes for the high rate of continuer → agreement changes.</Paragraph>
    <Paragraph position="5"> First, because continuers were more frequent and less marked than agreements, labelers were originally instructed to code ambiguous cases as continuers. Second, the two codes often shared identical lexical form: as was mentioned above, while some speakers used lexical form to distinguish agreements from continuers, many others used prosody. We did find some distinctive prosodic indicators when a continuer was relabeled as an agreement. In general, continuers are shorter in duration and less intonationally marked (lower F0, flatter, lower energy (less loud)) than agreements. There are exceptions, however. A continuer can be higher in F0, with considerable energy and duration, if it ends in a continuation rise. This has the effect of inviting the other speaker to continue, resembling question intonation for English. A high fall, on the other hand, sounds more like an agreement than a continuer.</Paragraph>
    <Paragraph position="6"> Another important prosodic factor not reflected in the text is the latency between DAs, since pauses were not marked in the SWBD transcripts. One mark of a dispreferred response is a significant pause before speaking. Thus when listening, a DA which was marked as an agreement in the text could be easily heard as a continuer if it began with a particularly long pause. Lack of a pause, conversely, contributes to an opposite change, from continuer → agreement. The SWBD segmentation conventions placed yeah and uh-huh in separate units from the subsequent utterances. Listening, however, sometimes indicated that these yeahs or uh-huhs were followed by no discernible pause or delay, in effect &quot;latched&quot; onto the subsequent utterance. Taken as a single utterance, the combination of the affirmative lexical items and the other material actually indicated agreement. In the following example there is no pause between A.1 and A.2, which led to relabeling of A.1 as an agreement, based mainly on this latching effect and to a lesser extent on the intonation (which is probably colored by the latching, since both utterances are part of one intonation contour).</Paragraph>
    <Paragraph position="7">
B: I don't think they even realize what's out there and to what extent. /
A.1: &lt;Lipsmack&gt; Yeah, /
A.2: I'm sure a lot of them are missing those household items &lt;laugh&gt;. /</Paragraph>
  </Section>
  <Section position="5" start_page="118" end_page="119" type="metho">
    <SectionTitle>
5 Syntactic Cues
</SectionTitle>
    <Paragraph position="0"> As part of our exploratory study, we have also begun to examine the syntactic realization of certain dialog acts. In particular, we have been interested in the syntactic formats found in evaluations and assessments. Evaluations and assessments represent a subtype of what Lyons (1972) calls &quot;ascriptive sentences&quot; (471). Ascriptive sentences &quot;are used...to ascribe to the referent of the subject-expression a certain property&quot; (471). In the case of evaluations and assessments, the property being ascribed is part of the semantic field of positive-negative, good-bad. Common examples of evaluations and assessments are:  1. That's good.</Paragraph>
    <Paragraph position="1"> 2. Oh that's nice.</Paragraph>
    <Paragraph position="2"> 3. It's great.</Paragraph>
    <Paragraph position="3">  The study of evaluations and assessments has attracted considerable attention in Conversation Analysis. Goodwin and Goodwin (1987) provide an early description of evaluations/assessments. Goodwin (1996:391) found that assessments often display the following format:</Paragraph>
    <Paragraph position="5"> In examining evaluations and assessments in the SWBD data, we found that this format does occur extremely frequently. But perhaps more interestingly, at least in these data we find a very strong tendency with regard to the exact lexical identity of the Pro Term (the first grammatical item in the format): the Pro Term is overwhelmingly &quot;that&quot; in the Switchboard data (out of 1150 instances with an overt subject, 922 (80%) had that as the subject). Moreover, in the 1150 utterances included in this study (those displaying an overt subject), intensifiers (like very, so) were extremely rare, occurring in only 27 instances (2%), and all involved the same two intensifiers -- really and pretty. Of the 1150 utterances used as the database for this exploratory study, those utterances that showed an assessment adjective displayed a very small range of such adjectives. The entire list follows: great, good, nice, wonderful, cool, fun, terrible, exciting, interesting, wild, scary, hilarious,</Paragraph>
    <Paragraph position="6"> neat, funny, amazing, tough, incredible, awful.</Paragraph>
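    <Paragraph> The restricted pattern described above can be illustrated with a small matcher. This is a minimal Python sketch, not the procedure used in the study: it assumes the format consists of a Pro Term, a copula, an optional intensifier, and an assessment adjective, as the surrounding description implies. The pro-term and copula lists, the optional leading oh, and the function name parse_assessment are hypothetical; only the adjective list and the two intensifiers come from the text.

```python
import re

# Adjectives and intensifiers as listed in the text.
ADJECTIVES = (
    "great good nice wonderful cool fun terrible exciting interesting "
    "wild scary hilarious neat funny amazing tough incredible awful"
).split()
INTENSIFIERS = ["really", "pretty"]

# Hypothetical choices made for this sketch only.
PRO_TERMS = ["that", "it", "this"]
COPULAS = ["'s", r"\s+is", r"\s+was"]


def alternation(options):
    """Build a capturing regex group that matches any one of the options."""
    return "(" + "|".join(options) + ")"


ASSESSMENT = re.compile(
    r"^(?:oh\s+)?"                                      # optional leading "oh"
    + alternation(PRO_TERMS)                            # group 1: Pro Term
    + alternation(COPULAS)                              # group 2: copula
    + r"\s+(?:" + alternation(INTENSIFIERS) + r"\s+)?"  # group 3: intensifier
    + alternation(ADJECTIVES)                           # group 4: adjective
    + r"\W*$",                                          # trailing punctuation
    re.IGNORECASE,
)


def parse_assessment(utterance):
    """Return the matched slots as a dict, or None if the format does not fit."""
    match = ASSESSMENT.match(utterance.strip())
    if match is None:
        return None
    pro_term, _copula, intensifier, adjective = match.groups()
    return {
        "pro_term": pro_term.lower(),
        "intensifier": intensifier.lower() if intensifier else None,
        "adjective": adjective.lower(),
    }


for example in ["That's good.", "Oh that's nice.", "It's great."]:
    print(example, parse_assessment(example))
```

The three example utterances given earlier in this section all fit the pattern, while an utterance with no assessment adjective yields None.</Paragraph>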
    <Paragraph position="7"> The very strong patterning of these utterances suggests a much more restricted notion of grammatical production than linguistic theories typically propose. This result lends itself to the notion of &quot;micro-syntax&quot;, that is, the possibility that particular dialog acts show their own syntactic patterning and may, in fact, be the site of syntactic patterning.</Paragraph>
  </Section>
</Paper>