File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/c00-1068_metho.xml
Size: 12,536 bytes
Last Modified: 2025-10-06 14:07:10
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1068"> <Title>Flexible Mixed-Initiative Dialogue Management using Concept-Level Confidence Measures of Speech Recognizer Output</Title> <Section position="3" start_page="468" end_page="470" type="metho"> <SectionTitle> 3 Mixed-initiative Dialogue Strategy using CMs </SectionTitle> <Paragraph position="0"> Many systems have adopted a mixed-initiative strategy (Sturm et al., 1999)(Goddeau et al., 1996)(Bennacef et al., 1996). It has several advantages. Because such systems do not impose rigid system-initiated templates, the user can directly input the values he has in mind, and the dialogue thus becomes more natural. In conventional systems, system-initiated utterances are considered only when semantic ambiguity occurs. To realize robust interaction, however, the system should make confirmations to remove recognition errors and generate guidance that leads the next user utterance to a successful interpretation. In this section, we describe how to generate system-initiated utterances that deal with recognition errors. An overview of our strategy is shown in Figure 2.</Paragraph> <Section position="1" start_page="468" end_page="469" type="sub_section"> <SectionTitle> 3.1 Making Effective Confirmations </SectionTitle> <Paragraph position="0"> The Confidence Measure (CM) is useful for selecting reliable candidates and for controlling the confirmation strategy. By setting two thresholds θ1 and θ2 (θ1 > θ2) on the content-word CM (CMw), we define the confirmation strategy as follows.</Paragraph> <Paragraph position="1"> CMw above θ1 → accept the hypothesis; CMw between θ2 and θ1 → make a confirmation to the user; CMw below θ2 → reject the hypothesis. The threshold θ1 is used to judge whether the hypothesis is accepted or should be confirmed, and the threshold θ2 is used to judge whether it is rejected.</Paragraph> <Paragraph position="2"> Because CMw is defined for every content word, the judgment among acceptance, confirmation, and rejection is made separately for each content word when one utterance contains several content words. 
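As a minimal sketch of this per-word judgment (not the authors' implementation), each content word can be mapped to one of three actions, with the user re-prompted only when every word is rejected. The threshold values 0.9 and 0.6 are the ones derived in Section 4.2; the names Action, judge_word and judge_utterance, and the rule that a score exactly equal to a threshold falls into the higher band, are assumptions for illustration.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    CONFIRM = "confirm"
    REJECT = "reject"


# Values derived in Section 4.2 (theta1 must be greater than theta2).
THETA1 = 0.9
THETA2 = 0.6


def judge_word(cm_w: float, theta1: float = THETA1, theta2: float = THETA2) -> Action:
    """Map one content word's CMw to accept / confirm / reject.

    Assumption: a score exactly equal to a threshold falls into the higher band.
    """
    if cm_w >= theta1:
        return Action.ACCEPT
    if cm_w >= theta2:
        return Action.CONFIRM
    return Action.REJECT


def judge_utterance(word_cms: dict[str, float]) -> tuple[dict[str, Action], bool]:
    """Judge each content word independently.

    Returns the per-word actions and whether to re-prompt the user, which
    happens only when every content word is rejected.
    """
    actions = {word: judge_word(cm) for word, cm in word_cms.items()}
    reprompt = all(a is Action.REJECT for a in actions.values())
    return actions, reprompt


if __name__ == "__main__":
    # Hypothetical CMw values for three content words in one utterance.
    acts, reprompt = judge_utterance({"Kyoto": 0.95, "Japanese-style": 0.7, "10,000": 0.3})
    print({w: a.value for w, a in acts.items()}, reprompt)
```

With these made-up scores, "Kyoto" is accepted, "Japanese-style" is sent to confirmation, and "10,000" is rejected; since not every word is rejected, no re-prompt is issued.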
Suppose that in a single utterance one word has a CMw between θ2 and θ1 and another has a CMw below θ2; the former is passed to the confirmation process, and the latter is rejected. Only if all content words are rejected does the system prompt the user to speak again. By accepting confident words and rejecting unreliable candidates, this strategy avoids redundant confirmations and focuses on the necessary ones. We optimize the thresholds θ1 and θ2 on real data, considering false acceptance (FA) and false rejection (FR).</Paragraph> <Paragraph position="3"> Moreover, the system should make confirmations using task-level knowledge. Users rarely change slot values that they have already specified. Thus, recognition results that overwrite filled slots are likely to be errors, even when their CMw is high. By making confirmations in such situations, false acceptance (FA) is expected to be suppressed.</Paragraph> </Section> <Section position="2" start_page="469" end_page="470" type="sub_section"> <SectionTitle> 3.2 Generating System-Initiated Guidances </SectionTitle> <Paragraph position="0"> It is necessary to guide users so that they can recover from recognition errors. Especially for novice users, it is often effective to tell them which slots the system accepts. It is also helpful for the system to generate guidance about the acceptable slots when the user remains silent and does not carry the dialogue forward.</Paragraph> <Paragraph position="1"> System-initiated guidance is also effective when recognition does not go well. Even when no content word is successfully recognized, the system can generate effective guidance based on a semantic attribute with high confidence, in spite of low word confidence. An example is shown in Figure 3. 
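The fallback described here can be sketched as a choice between a generic re-prompt and an attribute-specific question, made only when no content word is accepted or confirmed. This is a hypothetical illustration: the cut-off of 0.9 on the semantic-attribute CM (CMc), the function name choose_prompt, and the attribute-to-question table are assumptions; only the wording of the "location" question comes from the example in the text.

```python
from typing import Optional

# Hypothetical cut-off on the semantic-attribute CM (CMc).
ATTR_THETA = 0.9

# Hypothetical attribute-to-question table; only the "location" wording
# comes from the example in the text.
GUIDANCE = {"location": "Which city is your destination?"}
GENERIC_REPROMPT = "Please say again."


def choose_prompt(word_actions: dict[str, str],
                  attr_cms: dict[str, float]) -> Optional[str]:
    """Pick a system prompt when no content word survives the CMw thresholds.

    word_actions: content word -> "accept", "confirm" or "reject".
    attr_cms: semantic attribute -> CMc.
    Returns None when some word is accepted or confirmed (normal handling).
    """
    if any(action != "reject" for action in word_actions.values()):
        return None
    if attr_cms:
        # Use the most confident semantic attribute.
        attr, cm_c = max(attr_cms.items(), key=lambda item: item[1])
        if cm_c >= ATTR_THETA and attr in GUIDANCE:
            return GUIDANCE[attr]
    return GENERIC_REPROMPT
```

Asking about the attribute rather than re-prompting generically is what lets the next utterance be recognized against a narrower vocabulary.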
In this example, all of the 10-best candidates concern a place name, but their CMw values are lower than the threshold θ2.</Paragraph> <Paragraph position="2"> As a result, no word is either accepted or confirmed. In this case, rather than rejecting the whole sentence and telling the user &quot;Please say again&quot;, it is better to guide the user based on the attribute having a high CMc, for example &quot;Which city is your destination?&quot;. This guidance enables the system to narrow down the vocabulary of the next user utterance and thereby reduce the recognition difficulty. It consequently leads the next user utterance to a successful interpretation.</Paragraph> <Paragraph position="3"> When recognition of a content word repeatedly fails in spite of a high semantic-attribute CM, it is reasoned that the content word may be out-of-vocabulary. In such a case, the system should change the question. For example, if an utterance contains an out-of-vocabulary word and its semantic attribute is inferred as &quot;location&quot;, the system can give the guidance &quot;Please specify with the name of prefecture&quot;, which will lead the next user utterance into the system's vocabulary.</Paragraph> </Section> </Section> <Section position="4" start_page="470" end_page="471" type="metho"> <SectionTitle> 4 Experimental Evaluation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="470" end_page="470" type="sub_section"> <SectionTitle> 4.1 Task and Data </SectionTitle> <Paragraph position="0"> We evaluate our method on the hotel query task. We collected 120 minutes of speech data from 24 novice users of the prototype system with a GUI (Figure 4) (Kawahara et al., 1999).</Paragraph> <Paragraph position="1"> The users were given simple instructions beforehand on the system's task, the retrievable items, how to cancel input values, and so on. The data is segmented into 705 utterances by pauses of 1.25 seconds. 
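The text gives only the pause length used for segmentation. One plausible reading, sketched here purely as an assumption, is that a new utterance starts whenever the silence between two voiced regions reaches 1.25 seconds. The input format (sorted or unsorted (start, end) times of voiced regions, for example from a voice-activity detector) and the function name segment_utterances are hypothetical.

```python
PAUSE_SEC = 1.25  # pause length reported for segmentation


def segment_utterances(voiced: list[tuple[float, float]],
                       pause: float = PAUSE_SEC) -> list[tuple[float, float]]:
    """Merge voiced regions (start, end), in seconds, into utterances.

    Assumption: regions separated by a silence of at least `pause` seconds
    belong to different utterances; shorter gaps are merged.
    """
    utterances: list[tuple[float, float]] = []
    for start, end in sorted(voiced):
        if not utterances or start - utterances[-1][1] >= pause:
            utterances.append((start, end))  # long enough gap: new utterance
        else:
            prev_start, prev_end = utterances[-1]
            utterances[-1] = (prev_start, max(prev_end, end))  # short gap: merge
    return utterances
```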
The vocabulary of the system contains 982 words, and the number of database records is 2040.</Paragraph> <Paragraph position="2"> Out of 705 utterances, 124 utterances (17.6%) are beyond the system's capability; namely, they are out-of-vocabulary, out-of-grammar, out-of-task, or fragments of utterances. In the following experiments, we evaluate the system performance using all the data, including these unacceptable utterances, in order to evaluate how well the system rejects unexpected utterances appropriately as well as recognizes normal utterances correctly.</Paragraph> </Section> <Section position="2" start_page="470" end_page="470" type="sub_section"> <SectionTitle> 4.2 Thresholds to Make Confirmations </SectionTitle> <Paragraph position="0"> In Section 3.1, we presented the confirmation strategy based on two thresholds θ1 and θ2 (θ1 > θ2) for the content-word CM (CMw). We optimize these threshold values using the collected data. We count errors not per utterance but per content word (slot). The number of slots is 804.</Paragraph> <Paragraph position="1"> The threshold θ1 decides between acceptance and confirmation. The value of θ1 should be determined considering both the ratio of incorrectly accepted recognition errors (False Acceptance; FA) and the ratio of slots that are not filled with correct values (Slot Error; SErr).</Paragraph> <Paragraph position="2"> Namely, FA and SErr are defined as the complements of the precision and the recall of the output, respectively.</Paragraph> <Paragraph position="3"> $$\mathrm{FA} = \frac{\#\text{ of incorrectly accepted words}}{\#\text{ of accepted words}}, \qquad \mathrm{SErr} = 1 - \frac{\#\text{ of correctly accepted words}}{\#\text{ of all correct words}}$$ After experimental optimization to minimize FA+SErr, we derive a value of 0.9 for θ1.</Paragraph> <Paragraph position="4"> Similarly, the threshold θ2 decides between confirmation and rejection. 
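To make the FA and SErr definitions concrete, the following sketch computes both for a candidate threshold and picks the threshold minimizing FA+SErr by grid search. The input representation (one (CMw, is_correct) pair per hypothesized content word plus the number of correct words in the reference), the 0.05-step grid, the function names, and treating FA as 0 when nothing is accepted are all assumptions; the text does not state how its optimization was carried out.

```python
from typing import Optional


def fa_serr(hyps: list[tuple[float, bool]], n_correct_total: int,
            theta: float) -> tuple[float, float]:
    """Return (FA, SErr) when every word with CMw at or above theta is accepted.

    hyps: one (CMw, is_correct) pair per hypothesized content word.
    n_correct_total: number of correct words (slots) in the reference.
    FA is taken as 0.0 when nothing is accepted (an assumed convention).
    """
    accepted = [ok for cm, ok in hyps if cm >= theta]
    n_acc = len(accepted)
    n_acc_correct = sum(accepted)  # True counts as 1
    fa = (n_acc - n_acc_correct) / n_acc if n_acc else 0.0
    serr = 1.0 - n_acc_correct / n_correct_total
    return fa, serr


def best_theta(hyps: list[tuple[float, bool]], n_correct_total: int,
               grid: Optional[list[float]] = None) -> float:
    """Grid search for the threshold minimizing FA + SErr (first minimum wins)."""
    if grid is None:
        grid = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00
    return min(grid, key=lambda t: sum(fa_serr(hyps, n_correct_total, t)))
```

Raising the threshold lowers FA (fewer errors accepted) but raises SErr (fewer correct words accepted), which is why the sum has an interior minimum.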
The value of θ2 should be decided considering both the ratio of incorrectly rejected content words (False Rejection; FR) and the ratio of recognition errors accepted into the confirmation process (conditional False Acceptance; cFA).</Paragraph> <Paragraph position="5"> $$\mathrm{FR} = \frac{\#\text{ of incorrectly rejected words}}{\#\text{ of all rejected words}}$$ If we set the threshold θ2 lower, FR decreases and cFA correspondingly increases, which means that more candidates are obtained but more confirmations are needed. By minimizing FR+cFA, we derive a value of 0.6 for θ2.</Paragraph> </Section> <Section position="3" start_page="470" end_page="471" type="sub_section"> <SectionTitle> 4.3 Comparison with Conventional Methods </SectionTitle> <Paragraph position="0"> In many conventional spoken dialogue systems, only the 1-best candidate of the speech recognizer output is used in the subsequent processing.</Paragraph> <Paragraph position="1"> We compare the interpretation accuracy of our method with that of a conventional method that uses only the 1-best candidate. The result is shown in Table 1.</Paragraph> <Paragraph position="2"> In the 'no confirmation' strategy, the hypotheses are classified by a single threshold θ as either accepted or rejected. Namely, content words having CMw over the threshold θ are accepted, and the others are simply rejected. In this case, θ is set to 0.9, which gives the minimum FA+SErr. In the 'with confirmation' strategy, the proposed confirmation strategy is adopted using θ1 and θ2. We set θ1 = 0.9 and θ2 = 0.6. The 'FA+SErr' value in Table 1 means FA(θ1)+SErr(θ2), on the assumption that the confirmed phrases are correctly either accepted or rejected. 
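Under this assumption, the 'with confirmation' score reduces to FA computed at θ1 plus SErr computed at θ2. A minimal sketch, reusing the same hypothetical (CMw, is_correct) input representation; the function name and the zero-FA convention for an empty accepted set are illustrative assumptions.

```python
def with_confirmation_score(hyps: list[tuple[float, bool]], n_correct_total: int,
                            theta1: float = 0.9, theta2: float = 0.6) -> float:
    """FA(theta1) + SErr(theta2) under the oracle-confirmation assumption.

    hyps: one (CMw, is_correct) pair per hypothesized content word.
    FA is measured only on the words accepted outright (CMw at or above theta1).
    SErr counts every correct word with CMw at or above theta2 as filling its
    slot, because confirmed words are assumed to be resolved correctly.
    """
    accepted = [ok for cm, ok in hyps if cm >= theta1]
    fa = (len(accepted) - sum(accepted)) / len(accepted) if accepted else 0.0
    filled_correct = sum(1 for cm, ok in hyps if ok and cm >= theta2)
    serr = 1.0 - filled_correct / n_correct_total
    return fa + serr
```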
We regard this assumption as appropriate because users tend to answer 'yes' simply to express their affirmation (Hockey et al., 1997), so the system can distinguish affirmative answers from negative ones by correctly recognizing simple 'yes' utterances.</Paragraph> <Paragraph position="3"> [Figure: (a) A real system in Japanese]</Paragraph> </Section> <Section position="4" start_page="471" end_page="471" type="sub_section"> <SectionTitle> Hotel Accommodation Search </SectionTitle> <Paragraph position="0"> [Figure panel (Hotel Accommodation Search GUI): hotel type is Japanese-style; location is downtown Kyoto; room rate is less than 10,000 yen; These are query results]
Table 1: Comparison with the conventional 1-best method (%)
Strategy | FA+SErr | FA | SErr
only 1st candidate | 51.5 | 27.6 | 23.9
no confirmation | 46.1 | 14.8 | 31.3
with confirmation | 40.0 | 14.8 | 25.2
FA: ratio of incorrectly accepted recognition errors. SErr: ratio of slots that are not filled with correct values.
As shown in Table 1, the 'no confirmation' strategy improves interpretation accuracy by 5.4 points compared with the conventional method (FA+SErr drops from 51.5% to 46.1%), and the 'with confirmation' strategy achieves an 11.5-point improvement in total (FA+SErr of 40.0%). This result shows that our method successfully eliminates many recognition errors.</Paragraph> <Paragraph position="1"> Making confirmations renders the interaction robust, but the total number of utterances increases accordingly. If all candidates having CMw under θ1 are given to the confirmation process without setting θ2, 332 vain confirmations of incorrect contents are generated out of 400 candidates. By setting θ2, only the 102 candidates having CMw between θ2 and θ1 are confirmed, and the number of incorrect confirmations is suppressed to 53. Namely, the numbers of correct and incorrect hypotheses being confirmed are almost equal. 
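The Table 1 differences and the confirmation counts quoted here can be checked with simple arithmetic. The derived figures (49 correct candidates among the 102 confirmed, and 68 correct ones among the 400 candidates below θ1) follow only from the numbers given in the text; they are not separately reported results.

```python
# FA+SErr (%) from Table 1 for each strategy.
table1_fa_serr = {
    "only 1st candidate": 51.5,
    "no confirmation": 46.1,
    "with confirmation": 40.0,
}
base = table1_fa_serr["only 1st candidate"]
gain_no_conf = round(base - table1_fa_serr["no confirmation"], 1)      # points
gain_with_conf = round(base - table1_fa_serr["with confirmation"], 1)  # points

# Candidates below theta1 = 0.9, confirmed without and with theta2 = 0.6.
below_theta1, incorrect_below_theta1 = 400, 332
confirmed, incorrect_confirmed = 102, 53
correct_below_theta1 = below_theta1 - incorrect_below_theta1
correct_confirmed = confirmed - incorrect_confirmed

print(gain_no_conf, gain_with_conf)
print(correct_confirmed, incorrect_confirmed, correct_below_theta1)
```

The first line prints the 5.4- and 11.5-point improvements; the second shows 49 correct versus 53 incorrect confirmations, the near-equal split the text describes.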
This result shows that uncertain candidates are passed to the confirmation process, whereas candidates with very little confidence are rejected.</Paragraph> </Section> <Section position="5" start_page="471" end_page="471" type="sub_section"> <SectionTitle> 4.4 Effectiveness of Semantic-Attribute </SectionTitle> <Paragraph position="0"/> </Section> </Section> <Section position="5" start_page="471" end_page="472" type="metho"> <SectionTitle> CM </SectionTitle> <Paragraph position="0"> Figure 5 shows the relationship between the content-word CM and the semantic-attribute CM.</Paragraph> <Paragraph position="1"> It is observed that semantic-attribute CMs are estimated more correctly than content-word CMs. Therefore, even when a successful interpretation cannot be obtained from content-word CMs, the semantic attribute can be estimated correctly.</Paragraph> <Paragraph position="2"> In the experimental data, there are 148 slots that are rejected by content-word CMs (footnote 2: out-of-vocabulary and out-of-grammar utterances are included in their phrases).</Paragraph> <Paragraph position="3"> It is also observed that 52% of the semantic attributes with CMc over 0.9 are correct. Such slots amount to 34. Namely, our system can generate effective guidance for 23% (34/148) of the utterances that conventional methods would simply have rejected.</Paragraph> </Section></Paper>