Acknowledgment Use with Synthesized and Recorded Prompts

Karen Ward, Tasha Hollingsed, Javier A. Aldaz Salmon
The University of Texas at El Paso
El Paso, Texas USA 79968
{kward,tasha,jaldaz}@cs.utep.edu

Abstract

Acknowledgments, e.g., “yeah” and “uh-huh,” are ubiquitous in human conversation but are rarer in human-computer interaction. What interface factors might contribute to this difference? Using a simple spoken-language interface that responded to acknowledgments, we compared subjects’ use of acknowledgments when the interface used recorded speech with that seen when the interface used synthesized speech. Contrary to our hypothesis, we saw a drop in the numbers of subjects using acknowledgments: subjects appeared to interpret the recorded-voice interface as signalling a more limited interface. These results were consistent for both Mexican Spanish and American English versions of the interface.

1 Introduction

In previous studies, we showed that subjects use acknowledgments and politeness words when interacting with a simple spoken-language application even when the interface does not offer such behaviors itself (Ward and Heeman, 2000; Ward et al., 2003). In post-experiment interviews conducted as part of that study, 50% of the subjects (11 in the English-language condition, 9 in the Spanish) had thought that they might be more likely to use acknowledgments if the interface had a more human-like voice. In this study, we tested that hypothesis: we examined the effect of changing the interface prompts from synthesized speech to recorded speech.

The term “acknowledgment” is from Clark and Schaefer (1989), who describe a hierarchy of methods by which one conversant may signal that another’s contribution has been understood well enough to allow the conversation to proceed. Acknowledgments often appear in English as “uh-huh” and in Spanish as “ajá.” Acknowledgments, also called “back-channels” by some researchers (e.g., Chu-Carroll and Brown, 1997), are one of several meta-dialogue behaviors that people use to control the flow of conversation.

Meta-dialogue behaviors such as acknowledgment are of interest because of their role in managing turn-taking: although acknowledgments may preface a new contribution by the same speaker (Novick and Sutton, 1994), often they occur alone as a single-phrase turn that appears to serve the purpose of explicitly declining an opportunity to take a turn (Sacks et al., 1974). If acknowledgment behavior is incorporated in spoken-language systems, it may offer a more fluid and adaptable means of managing turn-taking and pacing in human-computer interaction.

Although some research systems incorporate acknowledgments (e.g., Aist, 1998; Iwase and Ward, 1998; Okato et al., 1998), real-world spoken-language interfaces generally don’t allow acknowledgments to serve their turn-taking purpose. Turn-taking is completely controlled by one conversant, usually the system. To reduce errors, designers of spoken-language systems create prompts that guide the user toward short, focused, in-vocabulary responses (e.g., Basson et al., 1996; Cole et al., 1997). In many systems, the use of barge-in defeats the common interpretation of an acknowledgment: if the user speaks, the system quits speaking and begins interpreting the user utterance. If the user intended to signal that the system should continue, the effect is exactly the opposite of the one intended. Thus, current design practices both discourage and render meaningless the standard uses of acknowledgments.
2 Experiment
The study design, described below, is identical to that used in our baseline study (Ward et al., 2003) except that the interface prompts and messages were delivered using recorded human voices instead of a synthesized voice. These studies were conducted in both American English and Mexican Spanish.
2.1 Method
We did not want to explicitly instruct or require subjects to use acknowledgment behavior, as that would tell us nothing about their preferences. Instead, we wanted to create a situation in which subjects would have a reason to use acknowledgments, perhaps even gain an advantage from doing so, while still keeping the behavior optional. Conversants are likely to offer acknowledgments and repetitions when complex or important information is being transcribed, especially when the cost of making an error may be high. Acknowledgments in this context may serve a dual purpose of conveying understanding and of controlling the pace of the interaction. Furthermore, there may be more verbal acknowledgments offered during telephone-based interaction than during face-to-face interaction (Cohen and Oviatt, 1994). We therefore designed a task in which the subject is asked to make written notes of information presented verbally over the telephone.
We selected the domain of a telephone interface to E-mail. Subjects were told that the computer system would read E-mail messages to them over the telephone and that their task was to locate and transcribe particular items of information contained in the messages, e.g., “How do you get to the coffee house?” The messages included both “interesting” information that was to be copied and “uninteresting” information that was not, so that subjects would want to move through the “uninteresting” material more quickly. In this way we hoped to motivate subjects to try to control the pace at which information was presented.
The E-mail was presented in segments roughly corresponding to a long phrase, with each segment followed by a pause of about five seconds. Five seconds is a long response time, uncomfortably so for human conversation, so we hoped that this lengthy pause would encourage the subjects to take the initiative in controlling the pace of the interaction. If the subject said nothing, the system would continue by presenting the next message segment. Subjects could reduce this delay by acknowledging the contribution, e.g., “okay,” or by commanding the system to continue, e.g., “go on” or “continuar.” The system signalled the possibility of controlling the delay by asking the subject the question “Are you ready to go on?” or “¿Estás listo(a) para continuar?” after the first pause. This prompting was repeated for every third pause in which the subject said nothing. In this way we hoped to suggest to the subjects that they could control the wait time without explicitly telling them to do so.
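The pacing protocol just described can be sketched as a simple loop. This is an illustrative reconstruction, not the system’s actual implementation; the `speak` and `listen` callbacks are hypothetical stand-ins for the interface’s audio output and speech detection:

```python
PAUSE_SECONDS = 5    # inter-segment pause described above
REPROMPT_EVERY = 3   # re-prompt on every third consecutive silent pause

def present_message(segments, listen, speak):
    """Present message segments, advancing early whenever the user speaks.

    `listen(timeout)` returns the user's utterance, or None on silence;
    `speak(text)` plays a prompt or message segment.
    """
    silent_pauses = 0
    for segment in segments:
        speak(segment)
        utterance = listen(timeout=PAUSE_SECONDS)
        if utterance is None:
            silent_pauses += 1
            # Prompt after the first silent pause, then every third one.
            if silent_pauses % REPROMPT_EVERY == 1:
                speak("Are you ready to go on?")
        else:
            silent_pauses = 0  # any response ends the pause and resets the count
```

Note that an acknowledgment and a command are deliberately indistinguishable at this level: either one simply ends the pause early.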
On the surface, there is no functional difference in system behavior between a subject’s use of a command to move the system onward (e.g., “go on,” “next,” “continue”) and the use of an acknowledgment. In either case, the system responds by presenting the next message segment, and in fact it eventually presents the next segment even if the subject says nothing at all. Thus, the design allows the subject to choose freely among accepting the system’s pace, commanding the system to continue, or acknowledging the presentations in a fashion more typical of human conversation. In this way, we hoped to understand how the subject preferred to interact with the computer.
Subjects were told that the study’s purpose was to assess the understandability and usability of the interface, and that their task was to find the answers to a list of questions. They were given no instructions in the use of the program beyond the information that they were to talk to it using normal, everyday speech.
We tested a total of 40 subjects, balanced for gender and language. Subjects were solicited from the University of Texas at El Paso campus. They ranged in age from 18 to 65, with most being between 20 and 25. Each subject was paid $10.00 for participating in the study.
We used a Wizard of Oz protocol as a way to allow the system to respond to acknowledgments and to provide robustness in handling repetitions. The wizard’s interface was constructed using the Rapid Application Developer in the Center for Spoken Language Understanding Toolkit (Sutton et al., 1998). A simple button panel allowed the wizard to select the appropriate response from the actions supported by the application. The application functionality was limited to suggest realistic abilities for a current spoken-language interface. The subject could request a message by message number, for example, but not by content or sender.
The interface prompts and messages were presented using recorded human voices. The message texts were presented in a male voice, and the control portions of the interface were in a female voice. It was hoped that the two voices would help the subjects determine the state of the interface: delivering message text vs. controlling the interface functions.
2.2 Measures
In comparing the strategies used to control the length of the pauses (acknowledgment, command use, or none), the dependent variable was the number of times each strategy was used to control the pacing of the interface. The total number of turns varied between subjects because some subjects listened to each message only once while others went through messages multiple times. We therefore normalized the counts by dividing the number of times each strategy was used by the number of turns in which the subject had a choice of strategies. We considered the possibility that subjects who completed the task in only one pass through the messages might show a preference for a different strategy than those who required multiple passes through the messages, thus creating a bias in the normalized statistic. A preliminary analysis showed no significant difference, so we did not consider this possibility further.
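The normalization step can be written out directly. This minimal sketch assumes each turn offering a choice has been labeled with the strategy the subject chose; the label names are invented for illustration:

```python
def strategy_rates(turn_labels):
    """Normalize strategy counts by the number of turns offering a choice.

    `turn_labels` holds one label per turn in which the subject could
    choose a strategy: "acknowledgment", "command", or "wait".
    """
    n = len(turn_labels)
    if n == 0:
        return {}
    # Divide each strategy's count by the number of choice turns.
    return {label: turn_labels.count(label) / n
            for label in ("acknowledgment", "command", "wait")}
```

Dividing by turns with a choice, rather than by total turns, keeps subjects who re-listened to messages comparable with those who made a single pass.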
The determination as to whether a particular utterance constituted an acknowledgment or a command was based primarily on word choice and dialogue context; this approach is consistent with definitions of acknowledgment, e.g., (Chu-Carroll and Brown, 1997). Immediately following a system inform (presentation of a segment of an E-mail message), the words “yes,” “sí,” “uh-huh,” “ajá,” and “okay” or a repetition of part or all of the system inform were considered acknowledgments. Phrases such as “go on,” “continue,” “next,” “continuar,” or “siguiente” following an inform were considered commands. The interpretation was confirmed during the post-experiment interview by questioning the subjects about their word choice. Transcriptions and categorizations of the subject utterances were checked by a second person for accuracy.
Some subjects (one in the Spanish-language condition and eight in the English-language condition) combined acknowledgments and commands in a single utterance, e.g., “okay, go on.” If an acknowledgment was the first part of the phrase, then it was included in the analysis as an acknowledgment, and if a command was the first part, then it was included as a command. Most subjects did this only once (the single subject in the Spanish-language condition and three of the eight in the English-language condition), and one speaker (English) produced as many as six combined-type responses.
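The coding rules above, including the first-part rule for combined utterances, can be approximated with a simple word-position check. This is a sketch only: the real coding was done by hand using dialogue context, and the substring matching here is deliberately naive:

```python
ACKS = ("yes", "sí", "uh-huh", "ajá", "okay")
COMMANDS = ("go on", "continue", "next", "continuar", "siguiente")

def classify(utterance, prev_inform=""):
    """Label a post-inform utterance as "ack", "command", or "other".

    Combined utterances such as "okay, go on" take the label of
    whichever part occurs first, as in the coding scheme above.
    """
    text = utterance.lower()
    ack_hits = [text.find(w) for w in ACKS if w in text]
    cmd_hits = [text.find(w) for w in COMMANDS if w in text]
    if not ack_hits and not cmd_hits:
        # A repetition of part or all of the system inform counts as an ack.
        if prev_inform and text.strip(" .!?") in prev_inform.lower():
            return "ack"
        return "other"
    # First-part rule: the earlier-occurring marker wins.
    if not cmd_hits or (ack_hits and min(ack_hits) < min(cmd_hits)):
        return "ack"
    return "command"
```

A production coder would tokenize rather than match substrings, but the first-part tie-breaking logic is the point of the sketch.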
A post-experiment interview was conducted to determine each subject’s impression of the system. Several of the questions were drawn from the PARADISE model (Walker et al., 2000). The experimenter also explained the true purpose of the experiment and answered subjects’ questions. This interview was taped and the experimenter took notes. Data from subjects who had realized that they were interacting with a human instead of a completely-automated system were excluded from the study because of the well-verified tendency for people to speak differently when they believe that they are speaking with a human instead of a computer (e.g., Brennan, 1991).
3 Results
We hypothesized that subjects would use acknowledgment behaviors more often to control the recorded-voice version of the interface than they did with the synthesized-voice version. We expected this increase to be seen in both Spanish and English conditions and across both female and male speakers. The results were contrary to our expectations.
When interacting with the recorded-voice interface, commands were preferred as a strategy by 15% of all subjects and acknowledgments by 17.5%. This result was not significantly different from that seen in the synthesized-voice study, as confirmed by the Wilcoxon-Mann-Whitney test (z = -0.5041, p = 0.0139 for commands; z = 1.686, p = 0.0465 for acknowledgments).
Contrary to our expectations, the numbers of subjects using either acknowledgments or commands actually dropped. This was because the number of subjects who used waiting as their sole strategy rose sharply, from 9 subjects in the synthesized-voice study to 19 in the recorded-voice study (χ² = 14.34, p < 0.001).
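A shift of this kind can be checked with a Pearson chi-square test on a 2×2 table. In the sketch below the wait-only counts (9 and 19) come from the text, but treating each study as having 40 subjects in total is our assumption for illustration, so the computed value is not expected to reproduce the paper’s reported statistic:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Uses the closed form n * (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)).
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Wait-only vs. other strategies: 9 of 40 (synthesized) vs. 19 of 40 (recorded).
# The 40-subject totals for each condition are an illustrative assumption.
chi2 = chi_square_2x2(9, 31, 19, 21)
```

With one degree of freedom, any value above 3.84 is significant at the 0.05 level.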
Forty percent of the subjects used a command at least once, and 45% used an acknowledgment at least once. Seven subjects seemed comfortable with both commands and acknowledgments, using at least five examples of each. When acknowledgments were used, the most common word choice was “okay” (both languages). When commands were used, the most common word choices were “go on” in English and “continuar” in Spanish.
We found no significant difference between the recorded- and synthesized-voice conditions when comparing male and female speakers, nor when comparing English and Spanish speakers.
Politeness behaviors were common. These included the use of the phrases “thank you” or “gracias” and “please” or “por favor,” as well as responding “goodbye” or “adiós” to the system. Many subjects (7 Spanish-language and 8 English-language, 37.5% total) used a politeness behavior at least once, and a few subjects (1 Spanish-language and 5 English-language, 15% total) used them more than once. One English-speaking female used politeness behaviors with almost half of her interactions with the system. One subject, when asked in the post-experiment interview why he chose to use this behavior, responded “I don’t know, it’s just habit I guess.” Three other subjects made similar statements.

We believe, and some subjects confirmed, that some subjects in the recorded version assumed that they were listening to recordings similar to voice-mail messages on their telephones. They believed that the pauses were part of the message and so did not realize that the system was awaiting their response.
4 Conclusions
We compared subjects’ use of various strategies for controlling the pacing of information presentation in a simple spoken-language interface using synthetic speech with one using recorded speech. We had hypothesized that subjects would offer more acknowledgments in the recorded-voice condition. In fact, we saw no differences
in the numbers of subjects using acknowledgment as a preferred strategy. We also saw a significant increase in the number of subjects who made no attempt to control the pacing of information presentation at all. We conclude that, in this case, use of a human voice in the interface misled subjects into assuming a more limited capability based on their previous experience with existing technology. In future work, we plan to move to a richer domain that will support a more complex interaction, one in which the system will have more opportunities to signal its interactive capabilities to the user.
Acknowledgments
This work was partially supported by a gift from Microsoft Corporation and by the National Science Foundation’s Model Institutions for Excellence Initiative EEC-9550502. The authors thank David Herrera, Christian Servin, Tyler Smith, and Pauline Williamson for their assistance with the study. We also thank the anonymous reviewers for their helpful comments and suggestions.

References
Gregory Aist. 1998. “Expanding a Time-Sensitive Conversational Architecture for Turn-Taking to Handle Content-Driven Interruption,” in Proceedings of ICSLP 98 Fifth International Conference on Spoken Language Processing, 413-417.

Sara Basson, Stephen Springer, Cynthia Fong, Hong Leung, Ed Man, Michele Olson, John Pitrelli, Ranvir Singh, and Suk Wong. 1996. “User Participation and Compliance in Speech Automated Telecommunications Applications,” in Proceedings of ICSLP 96 Fourth International Conference on Spoken Language Processing, 1676-1679.

Susan E. Brennan. 1991. “Conversation With and Through Computers,” User Modeling and User-Adapted Interaction, 1:67-86.

Jennifer Chu-Carroll and Michael K. Brown. 1997. “Tracking Initiative in Collaborative Dialogue Interactions,” in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 262-270.

Herbert H. Clark and Edward F. Schaefer. 1989. “Contributing to Discourse,” Cognitive Science, 13:259-294.

Phillip Cohen and Sharon Oviatt. 1994. “The Role of Voice in Human-Machine Communication,” in Voice Communication Between Humans and Machines (ed. by D. Roe and J. Wilpon), National Academy of Sciences Press, Washington, D.C., Ch. 3, 34-75.

Ron A. Cole, David G. Novick, P.J.E. Vermeulen, Stephen Sutton, Mark Fanty, L.F.A. Wessels, Jacques de Villiers, J. Schalkwyk, Brian Hansen and D. Burnett. 1997. “Experiments with a Spoken Dialogue System for Taking the U.S. Census,” Speech Communication, Vol. 23.

Tatsuya Iwase and Nigel Ward. 1998. “Pacing Spoken Directions to Suit the Listener,” in Proceedings of ICSLP 98 Fifth International Conference on Spoken Language Processing, Vol. 4, 1203-1207.

David G. Novick and Stephen Sutton. 1994. “An Empirical Model of Acknowledgment for Spoken-Language Systems,” in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 96-101.

Y. Okato, K. Kato, M. Yamamoto and S. Itahashi. 1998. “System-User Interaction and Response Strategy in Spoken Dialogue System,” in Proceedings of ICSLP 98 Fifth International Conference on Spoken Language Processing, Vol. 2, 495-498.

H. Sacks, E. Schegloff and G. Jefferson. 1974. “A Simplest Systematics for the Organization of Turn-Taking in Conversation,” Language, 50:696-735.

Stephen Sutton, Ron Cole, Jacques de Villiers, J. Schalkwyk, P. Vermeulen, M. Macon, Y. Yan, E. Kaiser, B. Rundle, K. Shobaki, P. Hosom, A. Kain, J. Wouters, D. Massaro and M. Cohen. 1998. “Universal Speech Tools: the CSLU Toolkit,” in Proceedings of the International Conference on Spoken Language Processing, 3221-3224.

Marilyn A. Walker, Candace A. Kamm and Diane J. Litman. 2000. “Towards Developing General Models of Usability with PARADISE,” Natural Language Engineering.

Karen Ward and Peter A. Heeman. 2000. “Acknowledgments in Human-Computer Interaction,” in Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2000), April 29-May 4, 280-287.

Karen Ward, Tasha Hollingsed, and Javier A. Aldaz Salmon. 2003. “Toward Building Conversational Spoken-Language Interfaces: Acknowledgment Use in American English and Mexican Spanish,” in Proceedings of the Fourth Mexican International Conference on Computer Science, September 10-12, 10-17.
