Flexible Mixed-Initiative Dialogue Management using 
Concept-Level Confidence Measures of Speech Recognizer Output 
Kazunori Korea|an| and Tatsuya Kawahara 
Graduat(~ School of lnt'ormati(:s, I{yoto University 
Kyoto 606-8501, JaI)an 
{kolnatani, kawahara} (@kuis.k.yoto-u. ac.j i) 
Abstract 
We i)rcsent a method to r(:aliz(: th:xil)le mix(;(l- 
initiative dialogue, in which the syst(:m can 
mak(, etti:ctive COlflirmation mad guidmn(:(: us- 
ing (-oncel)t-leve,1 confidcn('e mcmsur(,s (CMs) 
derived from st)eech recognizer output in ord(:r 
to handl(: sl)eech recognition errors. W(: d(:tine 
two con('et)t-level CMs, which are oil COllt(~,Ilt - 
words and on semantic-attrilmtes, using 10-best 
outtmts of the Sl)e(:ch r(:cognizt:r and l)arsing 
with t)hrmse-level grammars. Content-word CM 
is useflll for s(:lecting 1)\]ausible int(:rl)retati(ms. 
Less contid(:nt illt(:rl)r(:tmtions arc given to con- 
firmation 1)roc(:ss. The strat(:gy iml)roved the 
interpr(:tmtion accuracy l)y 11.5(/0. Moreover, 
th(: semanti(:-mttrilmt(: CM ix us(:d to (:stimmtc 
user's intention and generates syst(mi-initiative 
guidances (:v(,,n wh(:n suc(-(:sstSfl int(:rl)r(:tmtiol~ is 
not o|)tain(:(1. 
1 Introduction 
In a st)oken dialogu(: system, it fr(:(tuently o(:- 
cm:s that the system incorrectly rccogniz(:s user 
utterances and the user makes exl)ressions the 
system has not (~xt)ccted. These prot)lcms arc 
essentially incvital)le in handling the natural 
 1)y comlmters , even if vocal)ulary and 
grammar of the system are |~lmed. This lack of 
robustness is one of the reason why spoken dia- 
logue systems have not been widely deployed. 
In order to realize a rol)ust st)oken dialogue 
system, it is inevital)le to handle speech recog- 
nition errors. To sut)t)ress recognition errors, 
system-initiative dialogue is eitbctive. But it 
ca.n 1)e adopted only in a simi)le task. For in- 
stance, the form-tilling task can 1)e realized 1)y a 
simi)le strategy where the system asks a user the 
slut wdues in a fixed order. In such a systeln- 
initiated intera('tion, the recognizer easily nar- 
rows down the vocabulary of the next user's ut- 
tcrance, thus the recognition gets easier. 
()n the other hand, in more eoniplicat('A task 
such ms inforination rctriewd, the vocml)ulmry of 
the llCXI; lltt(2rauco callllot 1)e limited on all oc- 
casions, because the user should be abh~ to in- 
put the values in various orders based on his 
i)rel'erence. Therefore, without imposing a rigid 
teml)late ut)on the user, the system must behav(~ 
at)t)rol)riately even when sl)ecch recognizer out- 
1)ut contains some errors. 
Obviously, making confirmal;ion is efl'cctive 
to mvoid misun(lerstandings caused by slme(:h 
recognition errors, ttowcver, when contirmm- 
tions are made \]'or every utterance, |;lie di- 
~dogue will l)ccome too redundant mad con- 
sequcntly |;rout)lcsomc, for users. Previous 
works have, shown that confirmation strategy 
shouM 1)c decided according to the frequency of 
stretch recognition errors, using mathematicml 
formula (Niimi and Kolmymshi, 1.996) and using 
comt)uter-to-comlml;er silnulation (W~tanabe et 
al., 1!)98). These works assume tixe(t l)erfof 
mance (averaged speech recognition accuracy) 
in whole (lialogue with any speakers. For flex- 
ible dialogue management, howeve, r the confir- 
mation strategy luust 1)e dynamically changc, d 
bmsed on the individual utterances. For in- 
stmncc, we human make contirmation only when 
we arc not coat|dent. Similarly, confidence, inca- 
sures (CMs) of every speech recognition output 
should be modeled as a criterion to control dia- 
logue management. 
CMs have been calculated in previous works 
using transcripts and various knowledge sources 
(Litman et al., 1999) (Pao et, al., 1998). For 
more tlexible interaction, it, ix desirable that 
CMs are detined on each word rather than whole 
sentence, because the systeln can handle only 
unreliable portions of an utterance instead of 
accepting/rejecting whole sentence. 
467 
In this paper, we propose two concept-level 
CMs that are on content-word level and on 
semantic-attribute level for every content word. 
Because the CMs are defined using only speech 
recognizer output, they can be computed in real 
time. The system can make efficient confir- 
mation and effective guidance according to the 
CMs. Even when successful interpretation is 
not obtained o51 content-word level, the system 
generates system-initiative guidances based on 
the semantic-attribute level, which lead the next 
user's utterance to successful interpretation. 
2 Definition of Confidence Measures (CMs) 
Confidence Measures (CMs) have been studied 
for utterance verification that verifies speech 
recognition result as a post-processing (Kawa- 
hara et al., 1998). Since an automatic speech 
recognition is a process finding a sentence hy- 
pothesis with the maximum likelihood for an 
input speech, some measures are needed in or- 
der to distinguish a correct recognition result 
from incorrect one. In this section, we de- 
scribe definition of two level CMs which are on 
content-words and on semantic-attritmtes, us- 
ing 10-best output of the speech recognizer and 
parsing with phrase-level grammars. 
2.1 Definition of CM for Content Word 
In the speech recognition process, both acoustic 
probability and linguistic t)robability of words 
are multiplied (summed up in log-scale) over 
a sentence, and the sequence having maximum 
likelihood is obtained by a search algorithm. A 
score of sentence derived from the speech rec- 
ognizer is log-scaled likelihood of a hypothesis 
sequence. We use a grammar-based speech rec- 
ognizer Julian (Lee et al., 1999), which was de- 
veloped in our laboratory. It correctly obtains 
the N-best candidates and their scores by using 
A* search algorithm. 
Using the scores of these N-best candidates, 
we calculate content-word CMs as below. The 
content words are extracted by parsing with 
phrase-level grammars that are used in speech 
recognition process. In this paper, we set N = 
10 after we examined various values of N as the 
nmnber of computed candidates J 
1Even if we set N larger tt,an 10, the scores of i-th 
hypotheses (i > 10) are too small to affect resulting CMs. 
First, each i-th score is multiplied by a factor 
a(a < 1). This factor smoothes tile difference 
of N-best scores to get adequately distributed 
CMs. Because the distribution of the abso- 
lute values is different among kinds of statisti- 
cal acoustic model (monophone, triphone, and 
so oi1), different values must be used. The value 
of c~ is examined in the preliminary experiment. 
In this paper, we set c~ = 0.05 when using tri- 
phone model as acoustic model. Next, they are 
transtbrnmd from log-scaled value (<t. scaledi) 
to probability dimension by taking its exponen- 
tial, and calculate a posteriori probability tbr 
each i-th candidate (Bouwman et al., 1999). 
e~.scaledi 
Pi = ~n Co~.scaledj j=l 
This Pi represents a posteriori probability of the 
i-th sentence hypothesis. 
Then, we compute a posteriori probability tbr 
a word. If the i-th sentence contains a word w, 
let 5w,i = 1, and 0 otherwise. A posteriori prob- 
ability that a word w is contained (Pw) is de- 
rived as summation of a posteriori prob~bilities 
of sentences that contain the word. 
/L 
Pw = ~ Pi " 5w,i 
i=1 
We define this Pw as the content-word CM 
(CM,,). This CM.,, is calculated tbr every con- 
tent word. Intuitively, words that appear many 
times in N-best hypotheses get high CMs, and 
frequently substituted ones in N-best hypothe- 
ses are judged as mn'eliable. 
In Figure 1, we show an example in CMw 
calculation with recognizer outputs (i-th recog- 
nized candidates and their a posteriori proba- 
bilities) tbr an utterance "Futaishisctsu ni rcsu- 
toran no aru yado (Tell me hotels with restau- 
rant facility.)". It can be observed that a correct 
content word 'restaurant as facility' gets a high 
CM value (CMw = 1). The others, which are 
incorrectly recognized, get low CMs, and shall 
be rejected. 
2.2 CM for Semantic Attribute 
A concept category is semantic attribute as- 
signed to content words, and it is identified 
by parsing with phrase-level gramnmrs that are 
used in speech recognition process and repre- 
sented with Finite State Automata (FSA). Since 
468 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
Recognition candidates 
aa shisetsu ni resutmnu, no kayacho 
with restaurant facility / Kayacho(location) 
aa shisetsu ni rcsuto7nn no katsurn no 
with restaurant fimility / Katsura(location) 
aa shisctsu ni resutoran no kamigamo 
with restaurant facility / Kmnigamo(location) 
<g> shisctsu ni 
with restaurant 
<g> shisetsu ni 
with restaurant 
<g> shisetsu ni 
resutoran no kayacho 
facility / Kayacho(location) 
rcsutor'a~t 7to kat.~'~tra 
facility / Katsura(location) 
7"cs?ttoritTt 7~,o kamigamo 
with restaurant facility / I(amigamo(location) 
aa sh, isetsu ni resutoran no kafc 
with restaurant fimility / care(facility) 
<g> shisetsu ni resutoran no kafe 
with restaurant facility / cafc(facility) 
<g> setsubi wo rcsutoran no kayacho 
with restaurant facility / I(ayacho(locatlon) 
<g> sctsubi wo resutoran no katsura no 
with restaurant facility / Katsura(location) 
.24 
.24 
.20 
.08 
.08 
.06 
.05 
.02 
.01 
.01 
<g>: tiller model 
CM,,, \]. 
0.33 0.33 
0.25 
0,07 
(content word) ~ (semantic attribute) 
restaurant @ fimility 
Kayacho @ location 
Katsura 0 location 
Kmnigmno ~ location 
care ~ facility 
Figure. 1: Example of content-word CM (CM,,,) 
these FSAs are, classified into (:on(:cl)t categories 
lmforehand, we can auton|atically derive the 
concept categories of words by parsing with 
these grammars. In our hotel query task, there 
are sevelt concept categories such as qocation', 
'fi, cility' and so on. 
For this concept (:ategory, we also de- 
fine semantic-attritmtc CMs (CM~:) as tbllows. 
First, we (-ah:ulnte a t)osteriori probabilities of 
N-best sentences in the same. way of comtmt- 
ing content-word CM. If a concel)t c~tegory c is 
contained in the i-th sentence, let 5,,,i = 1, and 0 
otherwise. The t)robability that a concept cat- 
egory c is correct (Pc) is derived as below. 
Pc = E pi ' sc,i 
i=1 
We define this Pc as semantic-attribute CM 
(CM~). This CMc estimates which category the 
user refers to and is used to generate ett'ective 
guidances. 
HSel'~ S ut\[el'ancc ) 
v 
~__ speech recognizer 
(___ each content word ) g'N-be~t, candidatcs-~' j. 
/ cont{3n~wo|'d / ', 
CM 
r 
acccpt~ 
i 
semantic atl|ibutc / 
CM S 
fill \] semantic slots \[ guidance \] prompt to 
rcpht'asc 
Figure 2: Overview of OlU" strategy 
3 Mixed-initiative Dialogue Strategy 
using CMs 
There m:e a lot of systems that hawe a(lopted 
a mixed-initiative strategy (Sturm et al., 
1999)(Goddeau et a.l., 1996)(Bennacef e.t al., 
1996). It has several adwmtages. As the. sys- 
tems do not impose rigid system-initiated tem- 
plates, the user can input values he has in 
mind directly, thus the dialogue l)ecomes more 
natural. In conventional systems, the system- 
initiated utterances are considered only when 
semantic mnbiguity occurs. But in order to re- 
alize robust interaction, the system should make 
confirmations to remove recognition errors and 
generate guidances to lead next user's utterance 
to succcssflll interpretation. In this section, we 
describe how to generate the system-initiated 
utterances to deal with recognition errors. An 
overview of our strategy is shown in Figure 2. 
3.1 Making Effective Confirmations 
Confidence Measure (CM) is useflll in selecting 
reliable camlidates and controlling coniirnlation 
strategy. By setting two thresholds 01,02(01 > 
0~) on content-word CM (CM.,), we provide the 
confirmation strategy as tbllows. 
469 
• C-Mw > 0~ 
accept the hypothesis 
• Oj >_CM~>02 
-~ make confirmation to the user 
"Did you say ...?" 
• 02 >_ CM~,, 
--* reject the hypothesis 
The. threshold 01 is used to judge whether the 
hypothesis is accepted or should be confirmed, 
and tile threshold 02 is used to judge whether it 
is reiected. 
Because UMw is defined for every content 
word, judgment among acceptance, confirma- 
tion, or rejection is made for every content 
word when one utterance contains several con- 
tent words. Suppose in a single utterance, one 
word has CM,,,, between 0~ and 0~ and tile other 
has t)elow 02, the tbrlner is given to confirma- 
tion process, and tile latter is rejected. Only if 
all content words are rejected, the system will 
prompt the user to utter again. By accepting 
confident words and rejecting mlreliable candi- 
dates, this strategy avoids redundant confirma- 
tions and tbcuses on necessary confirmation. 
We optinfize these thresholds 0t, 02 consider- 
tug the false, acceptance (FA) and the false re- 
jection (FR) using real data. 
Moreover, the system should confirm using 
task-level knowledge. It is not usual that users 
change the already specified slot; values. Thus, 
recognition results that overwrite filled slots are 
likely to be errors, even though its CM~, is high. 
By making confirmations ill such a situation, it 
is expected that false acceptance (FA) is sup- 
pressed. 
3.2 Generating System-Inltiated 
Guidanees 
It is necessary to guide tile users to recover ti'om 
recognition errors. Especially for novice users, 
it is often eflbctive to instruct acceptable slots 
of the system. It will be helpful that tile system 
generates a guidance about the acceptable slots 
when the user is silent without carrying out tile 
dialogue. 
The system-initiated guidances are also effec- 
tive when recognition does not go well. Even 
when any successflfl output of content words is 
not obtained, the system cast generate effective 
guidances based on the semantic attribute with 
f 
utterance: 
correct: 
"shozai ga oosakaflt no yado" 
(hotels located in Osaka pref.) 
Osaka-pref¢location 
i 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
recognition candi(tat, es (<g>: filler model) 
dtozai ga potoairando no <g> located in Port-island 
shozai ga potoairando no <g> 
located in Port-island 
shozai ga oosakafu no <g> 
located in Osaka-pref. 
shozai ga oosakafu no <g> 
located in Osaka-pref. 
shozai ga oosa\]cashi no <g> located in Osaka-city 
shozai ga oosakashi no <g> 
located in Osaka-city 
shozai ga ohazaki no <g> 
located in Okazaki 
shozai ga otcazaki no <g> located in Okazaki 
shozai ga oohara no <g> located in Ohara 
shozai ga oohara no <g> located in Ohara 
C2~tc semantic attributes 
1 location 
CMw 
0.38 
0.30 
0.13 
0.11 
0.08 
content words 
Port-islandelocation 
Osaka-pref.~location 
Osaka-city,location 
Okazakielocation 
Ohara01ocation 
Figure 3: Example of high semantic attribute 
confidence in spite of low word confidence 
high confidence. An example is shown in Fig- 
ure 3. In this example, all the 10-best candi- 
dates are concerning a name of place but their 
CMw values are lower than the threshold (02). 
As a result, any word will be neither accepted 
nor confirmed. In this case, rather than re- 
jecting the whole sentence and telling the user 
"Please say again", it; is better to guide the user 
based oll the attribute having high CM,., such 
as "Which city is your destination?". This guid- 
ance enables tile system to narrow down the 
vocabulary of the next user's utterance and to 
reduce the recognition difficulty. It will conse- 
quently lead next user's utterance to successful 
interpretation. 
When recognition on a content word does not 
470 
go well repeatedly in spite of high semanti(:- 
attribute CM, it is reasoned that the content 
word may be out-ofvocalmlary, in such a case, 
the systmn shouht change the que.stion. For 
example, if an uttermme coal;alas all out-of 
vocat)ulary word and its semantic-attribute is 
inibrred as "location", the system can make 
guidance, "Please st)eci(y with the name of t)re- 
fecture", which will lead the next user's utter- 
ance into the system's vocabulary. 
4 Experimental Evaluation 
4.1 Task and Data 
We evaluate our nmthod on the hotel query 
task. We colh;cted 120 mimll;es speech data by 
24 novice users l)y using the 1)rototylm system 
with GUI (Figure 4) (Kawahara et al., 1999). 
The users were given siml)le instruction before- 
hand oll the system's task, retriewfi)le il;(nns, 
how to cancel intmt values, and so o11. The data 
is segmented into 705 utterances, with a t)ause 
of 1.25 seconds. The voeal)ulary of I;he system 
contains 982 words, and the aural)or of database 
records is 2040. 
()tit of 705 lltterailces, \]24 llttelTallces (1.7.6%) 
are beyond the system's eal)al)ility , namely they 
are out-ofvocalmlary, ou|;-ofgrmnmar~ out-of 
task, or fragment of llttel'allC(L \]i1 tbllowing ex- 
1)erim(mt;s, we cvahmte th(', sys|;t',ln \])erl))rm~nce 
using all (lath including these mm,c(:el)tnt)le ut- 
terances in order to evahlalt;e how the system 
can reject unexl)ected utterances at)t)rot)riately 
as well as recognize hernial utterances correctly. 
4.2 Thresholds to Make Confirmations 
In section 3.1, we t)resented confirmation strat- 
egy 1)y setting two thresholds 01,02 (01 > 02) for 
eolfl, enl;-word CM (CMw). We optinlize these 
threshoht wflues using t;11(; collected data. \¥e 
count ca:ors 11ol; by the utterance lint by the 
content-word (slot). The number of slots is 804. 
The threshold 01 decides t)etween accel)tanee 
and confirmation. The wdue of 0\] shouhl be 
determined considering both the ratio of ineof 
rectly accepting recognition errors (False At-- 
ceptance; FA) and the ratio of slots that are 
not filh;d with correct wfiues (Slot; Error; SErr). 
Namely, FA and SErr are defined an the (:(mq)le- 
meats of t)recision and recall rate of the outl)ltt , 
respectively. 
FA = ~ el' incorrectly accepted words 
of accepted words 
fl~ of correet;ty aecel)ted words SE'rr = I - 
of all correct words 
After experimental optimization to minimize 
FA+SErr, we derive a wflue of 0i as 0.9. 
Similarly, the threshold 02 decides contirlna- 
tion and rejection. The value of 02 should be 
decided considering both the ratio of incorrectly 
rqjeeting content words (False Rejection; FR) 
and the ratio of aceel)ting recognition errors into 
the eonfirlnation 1)recess (conditional False At:- 
eel)tahoe; cFA). 
fl: of incorrectly re.jetted words 
~- of all rejected words 
If we set the threshohl 02 lower, FR de- 
creases and correspondingly cFA increases, 
which means that more candidates are ol)tained 
but more eontirmations are needed. By mini- 
m izing \]q/.+cFA, we deriw; a value of 02 as 0.6. 
4.3 Comparison with Conventional 
Methods 
Ill many conventional st)oken di~dogue syst;ems, 
only 1-best candidate of a speech recognizer 
outt)ut is used in the subsequent processing. 
\¥e (:Oral)are ore' method with a conventional 
method that uses only 1-best ean(lidate in in- 
terpretation ae(:uraey. 'l.'he result is shown in 
%rifle 1. 
1111 the qlo eonfirnlation' strategy, the hy- 
pothes(,s are classified by a single threshohl (0) 
into either the accepted or the rejected. Namely, 
(:ontent words having CM,,, over threshohl 0 are 
aecet)ted, mM otherwise siml)ly r(\[iected. In this 
case, a 1;hreshold wflue of 0 is set to 0.9 that 
gives miniature FA-FSErr. 111 the 'with con- 
firmation' strategy~ the proposed (:oniirmation 
strategy is adol)ted using ()1 and 02. We set 
01 = 0.9 and 02 = 0.6. The qTA+SErr' in Ta- 
ble 1 means FA(0~)+SErr(02), on the assump- 
tion that the contirnmd l)hrases are correctly ei- 
ther accel)ted or rejected. -We regard this as- 
smnt)tion as at)l)rol)riate, because users tend to 
answer ~ye, s' simply to express their affirmation 
(Hockey et al., 1997), so the sys|;em can dis- 
tinguish affirmative answer and negative olle by 
grasping simple 'yes' utterances correctly. 
471 
i~ ........................ % ............. II III I ~t 
(a) A real system in Japanese 
Hotel Accommodation Search 
hotel type is I Japanese-style I 
location is I downtown Kyoto \] 
room rate is less than I 10,000 I yen 
These are query results • 
(b) Upper I)ortion translated in English 
Figure 4: An outlook of GUI (Graphical User Interface.) 
Table 1: Comparison of methods 
FA+SErr FA SErf 
only 1st candidate 51.5 27.6 23.9 
no confirmation 46.1 14.8 31.3 
with confirmation 40.0 14.8 25.2 
FA: ratio of incorrectly accepting recognition errors 
SErr: ratio of slots that are not filled with correct values 
As shown in Table 1, interpretation ~,c('u- 
racy is improved by 5.4% in the 'no confirma- 
tion' strategy compared with the conwmtional 
method. And 'with confirmation' strategy, we 
achieve 11.5% improvement in total. This result 
proves that our method successflflly eliminates 
recognition errors. 
By making confirmation, the interaction be- 
comes robust, but accordingly the number of 
whole utterances increases. If all candidates 
having CM, o under 01 are given to confirma- 
tion process without setting 0u, 332 wdn con- 
firmation for incorrect contents are generated 
out of 400 candidates. By setting 02,102 candi- 
dates having CMw between 01 and 02 are con- 
firmed, and the number of incorrect confirma- 
tions is suppressed to 53. Namely, the ratio 
of correct hypotheses and incorrect ones being 
confirmed are ahnost equM. This result shows 
indistinct candidates are given to confirmation 
process whereas scarcely confident candidates 
are rejected. 
content-word CM and semantic-attribute CM 100 
FA+SErr(content word) 
FA+SErr(semantic attribute) ----- 
8O 
6O 
+ 
m< 40 
"N,_~.~.~.~ ..................... ~'//1 21 , _\] 
0 0.2 0.4 0.6 0.8 1 
threshold 
Figure 5: Pertbrm~mce of the two CMs 
4.4 Effectiveness of Semantic-Attribute 
CM 
In Figure 5, the relationship between content- 
word CM and semantic-attribute CM is shown. 
It is observed that semantic-attribute CMs are 
estimated more correctly than content-word 
CMs. Therefore, even when successful interpre- 
tation is not obtained fl'om content-word CMs, 
semantic-attribute can be estimated correctly. 
In experimental data, there are 148 slots 2 
that are rejected by content-word CMs. It is 
also observed that 52% of semantic-attributes 
2Out-of-vocabulary and out-of-grammar utterances 
are included in their phrases. 
472 
with CA4c over 0.9 is correct. Such slots amomit 
to 34. Namely, our system can generate ett.'cc- 
rive guidances against 23% (34/148) of utter- 
antes that had been only rejected in conven- 
tional methods. 
5 Conclusion 
We present dialogue mallagement using two 
concel)t-level CMs in order to realize rolmst ill- 
teractioll. The content-word CM provides a 
criterion to decide whether an interpretation 
should be accel)ted, confirmed, or rejected. This 
strategy is realized by setting two thresholds 
that are optimized balancing false acceptance 
and false rejection. The interpretation error 
(FA+SErr) is reduced by 5.4% with no confir- 
mation and by \] 1.5% with confirmations. More- 
over, we &',line CM on semantic attribut(~s, and 
propose a new met;hod to generate eilbx:tive 
guidances. The concept-t)ased (:onfidence mea- 
sure realizes tlexible mixed-initiative dialogue in 
which the system can make effective contirma- 
tion and guidance by estimating user's inten- 
tion. 

References 

S. \]~(mnacef, L. Devillers: S. Rosset, and 
L. Lamel. 1996. Dialog in the I/.AIIfl?EL 
telet)hone-1)as(~(t system. In Pwc. \]nt'l Con.fi 
on ,5'pokc'n, Language l)Tvcc.ssi'n.g. 

G. Bouwman, ,1. Sturm, and L. Boves. 1999. In- 
cort)orating contidcnce measures in the. Dutch 
train timetable information system developed 
in the ARISE t)roject. In P'lvc. ICASSP. 

D. God(lean, H. Meng, J. Polifroni, S. Seneff, 
and S. Busayapongchai. 1996. A form-based 
diah)gue manager for spoken  al)pli- 
cations. In P~vc. lnt'l Co7@ on Spoken Lan- 
guage \])rocessing. 

B. A. Hockey, D. l:l,ossen-Knill, B. Spejew- 
ski, M. Stone, and S. Isard. 1997. Can 
you predict resl)onses to yes/no questions? 
yes,no,and stuff. In Proc. EUIl, OSPEECI\]'97. 

T. Kawahara, C.-H. Lee, ~md B.-H. Juang. 
1998. Flexible speech understanding based 
on confl)ined key-t)hrase detection and veri- 
fication. IEEE TTnns. on Speech and Audio 
Processing, 6 (6):558-568. 

T. Kawahara, K. q_?maka, and S. Doshita. 1999. 
Domain-independent t)latform of spoken di- 
alogue interfaces for information query, in 
Proc. ESCA workshop on Interactive Dia- 
loguc in Multi-Modal Systems, pages 69 72. 

A. Lee, T. Kawahara, and S. Doshita. 1999. 
Large. vocabulary continuous speech recogni- 
tion 1)arser based on A* search using gram- 
mar category category-pair constraint (in 
,lapancsc). Trans. h~:formation Processing 
Society of ,lapau,, 40(4):1374-1382. 

D. J. Litman, M. A. Walker, and M. S. Kearns. 
1999. Automatic detection of 1)oor speech 
recognition at the dialogue level. In Pwc. of 
37th Annual Meeting o.f the A CL. 

Y. Niimi and Y. Kobayashi. 1996. A dialog con- 
trol strategy based on the reliability of st)eech 
recognition. In Proc. Int'l Cml:\[. on ,5'pokcn 
Language Processing. 

C. Pao, P. S('hmid, and J. Glass. 1998. Con- 
tidence scoring tbr st)eech mlderstnnding sys- 
tems. In P~v(:. \]'nt'l Conf. on S"poken Lan- 
guage P~vcessing. 

.}. Sturm, E. Os, and L. Boves. 1999. Issues in 
spoken dialogue systems: Experiences with 
the Dutch ARISE system. In Pwc. of ESCA 
IDS'99 Workshop. 

T. Watanal)e, M. Ar~ki, and S. Doshitm 1998. 
Evaluating diah)gue strategies un(lex colnnm- 
nication errors using COlntmter-to-comlmter 
simulation. 7}'an.s. of IEICE, he:fo g Syst., 
E81-D(9):I025 1033. 
