<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0609"> <Title>Turn off the Radio and Call Again: How Acoustic Clues can Improve Dialogue Management</Title> <Section position="2" start_page="0" end_page="46" type="metho"> <SectionTitle> 2 State-of-the-art in Dialogue </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="46" type="sub_section"> <SectionTitle> Management </SectionTitle> <Paragraph position="0"> Most of the ideas presented here were developed during work at Forum Technology, Malvern. I am indebted to all of my colleagues there, especially Jörg Überla and David Bijl for providing valuable discussions. I also owe thanks to Afzal Ballim and Yoshiki Mori for their comments on drafts of this poster.</Paragraph> <Paragraph position="1"> Today, many dialogue management components base their operations on a combination of dialogue history, a rule-based or statistical dialogue model, rule-based or statistical dialogue act identification rules, and semantic representations of user utterances. For spoken language input, this repertoire of information results in interactions like the (unsuccessful) dialogue in Figure 1 (from a train timetable enquiry system). What happens in a dialogue like this is the following: the dialogue manager tries to verify a value that is needed for a database query. This attempt fails several times, since the incoming semantic representations are inconsistent due to recognition errors. After a fixed number of trials (two in the example), the dialogue manager gives up.</Paragraph> <Paragraph position="2"> More advanced systems exploit additional information to generate more intelligent feedback for the user. The Verbmobil system, for example, which translates spoken utterances between German and English as well as between Japanese and English (see Kay et al., 1994),
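The use of phonetically similar words for clarification can be illustrated with a small sketch. This is not Verbmobil's actual mechanism; the pronunciation lexicon and phone strings below are invented for illustration, and a plain Levenshtein distance over phone sequences stands in for a real phonetic similarity measure.

```python
# Sketch: propose phonetically similar lexicon entries for a low-confidence
# recognition hypothesis, to seed a clarification question such as
# "Did you mean Hamburg or Homburg?". Phone strings are illustrative only.

def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

def similar_words(phones, lexicon, max_dist=2):
    """Return lexicon entries whose pronunciation is within max_dist edits,
    closest first."""
    hits = sorted((edit_distance(phones, p), w) for w, p in lexicon.items())
    return [w for dist, w in hits if dist <= max_dist]

# Toy pronunciation lexicon (made-up phone sequences).
LEXICON = {
    "Hamburg":   ["h", "a", "m", "b", "u", "r", "k"],
    "Homburg":   ["h", "o", "m", "b", "u", "r", "k"],
    "Offenburg": ["o", "f", "n", "b", "u", "r", "k"],
}

candidates = similar_words(["h", "a", "m", "b", "u", "r", "k"], LEXICON)
print(candidates)  # → ['Hamburg', 'Homburg']
```

A clarification component could then verbalize the top candidates, much as in the Figure 2 dialogue.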
exploits representations of unknown words and information about phonetically similar words to generate clarification dialogues along the lines of Figure 2 (cf. Maier et al., 1997).</Paragraph> <Paragraph position="3"> Ich möchte gerne um zwei Uhr nach Hamburg fahren.</Paragraph> <Paragraph position="4"> I would like to travel to Hamburg at two o'clock.</Paragraph> <Paragraph position="5"> Sie möchten nach Homburg fahren? You would like to go to Homburg? Nein, nicht nach Homburg, sondern nach Hamburg.</Paragraph> <Paragraph position="6"> No, not to Homburg but to Hamburg. Sie möchten nach Offenburg fahren? You would like to go to Offenburg? Nein, ich möchte auch nicht nach Offenburg, sondern nach Hamburg. No, I also don't want to go to Offenburg but to Hamburg.</Paragraph> <Paragraph position="7"> Es scheint ein Kommunikationsproblem zu geben. Bitte versuchen Sie es später noch einmal. We seem to have a communication problem. Please try again later. Still missing even from the more sophisticated systems like Verbmobil is a flow of information all the way from the acoustic level up to the dialogue management component. This kind of coupling will enable more successful systems for two reasons: improved speech recognition and more informative responses to the user.</Paragraph> <Paragraph position="8"> Improvement in speech recognition will stem from the dialogue manager acting as a kind of mediator between the speech recognizer and the user. In case of poor recognition rates (speech recognizers already deliver confidence scores), the dialogue manager could ask for acoustic clues concerning the recognition conditions. If it then receives clues about background noise (e.g., a radio), it might initiate a request to the user to establish a better acoustic environment. More specifically, the dialogue manager could generate the concepts to ask: Could you please turn off the radio?
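The mediation strategy just described can be sketched as a small decision procedure. The confidence threshold, noise labels, and message texts here are assumptions for illustration, not taken from any deployed system.

```python
# Sketch of the dialogue manager as mediator between recognizer and user:
# a low confidence score triggers a query to an acoustic-clue detector,
# and a user-removable noise source yields a concrete request.

REMOVABLE_NOISE = {"radio", "music", "television"}  # assumed label set

def mediate(confidence, noise_type=None, threshold=0.5):
    """Decide how the dialogue manager reacts to a recognition result.

    confidence: recognizer confidence score in [0, 1]
    noise_type: label from an acoustic-clue detector, or None
    """
    if confidence >= threshold:
        return "proceed"          # accept the hypothesis, continue the dialogue
    if noise_type in REMOVABLE_NOISE:
        # The user can improve the acoustic environment themselves.
        return f"request: please turn off the {noise_type}"
    if noise_type is not None:
        # Noise present but not user-removable (e.g. street noise).
        return "inform: recognition is unreliable in this environment"
    return "clarify"              # low confidence, no acoustic clue available

print(mediate(0.9))           # → proceed
print(mediate(0.3, "radio"))  # → request: please turn off the radio
```

The same procedure also covers the unremovable-noise case discussed next, where the only honest response is to tell the user that recognition cannot succeed.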
The quality of responses to users can profit equally from information about the acoustic environment. To see this, imagine a situation where a police officer reports from the scene of an accident. In this situation, the acoustic conditions are presumably so adverse that recognition accuracy is unacceptable. Unlike in the scenario above, however, little can be done to change the environment. The appropriate action of the dialogue manager would thus be to make it clear to the officer that he is wasting his time in trying to get his message through.</Paragraph> <Paragraph position="9"> Thus, with information on the acoustic conditions and environment, the dialogue in Figure 1 could become the one outlined in Figure 3: Ich möchte gerne um zwei Uhr nach Hamburg fahren.</Paragraph> <Paragraph position="10"> I would like to travel to Hamburg at two o'clock. Sie sind wegen der Musik im Hintergrund leider schwer zu verstehen. Wäre es möglich, daß Sie nochmals anrufen, wenn Sie die Musik abgestellt haben? Unfortunately, I have difficulties understanding you due to the music in the background. Could you call again after having turned it off? Ja; bis gleich.</Paragraph> <Paragraph position="11"> Yes; see you shortly.</Paragraph> </Section> </Section> <Section position="3" start_page="46" end_page="46" type="metho"> <SectionTitle> 4 Techniques for Detecting Acoustic </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="46" end_page="46" type="sub_section"> <SectionTitle> Clues </SectionTitle> <Paragraph position="0"> Work on the acoustic level that is suited for integration into a dialogue framework like the one depicted above has not yet advanced far enough. Only recently, against the background of the DARPA evaluations for speech recognition systems, has the importance of the required type of noise tracking been recognized.
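The kind of noise tracking just mentioned can be prototyped with a deliberately naive classifier: each noise category is represented by the centroid of its training feature vectors, and an incoming signal is assigned to the nearest centroid. This is only a stand-in for the HMM- and topic-spotting-based approaches discussed below; the feature values are invented for illustration.

```python
# Naive noise classifier: nearest-centroid over per-category training
# feature vectors. A real system would model temporal structure (e.g. HMMs).

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(samples_by_category):
    """samples_by_category: {label: [feature_vector, ...]}"""
    return {label: centroid(vecs) for label, vecs in samples_by_category.items()}

def classify(model, features):
    """Assign the incoming feature vector to the closest category centroid."""
    return min(model, key=lambda label: sq_dist(model[label], features))

# Toy 2-d "acoustic features" (e.g. spectral flatness, low-band energy);
# the numbers are made up.
training = {
    "music":        [[0.2, 0.8], [0.3, 0.9]],
    "street noise": [[0.8, 0.3], [0.9, 0.2]],
}
model = train(training)
print(classify(model, [0.25, 0.85]))  # → music
```

The classifier's output label is exactly the acoustic clue a dialogue manager would need to decide between "ask the user to remove the noise" and "explain that recognition cannot succeed".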
A component that not only detects but also classifies noise (as, e.g., music or street noise) has a good chance of becoming the first plug-and-play spoken language interface entity (SLIE), and seems realizable with well-mastered speech recognition techniques like Hidden Markov Models (Rabiner, 1989).</Paragraph> <Paragraph position="1"> One approach to classifying acoustic conditions into different categories (e.g., background music) would be to use techniques like those used for non-word-based topic spotting (see Nowell and Moore, 1995).</Paragraph> <Paragraph position="2"> The different categories of noise would correspond to topics, and typical sections of acoustic material from each category would correspond to keywords. Based on samples from each category/topic, the keywords that are most useful in identifying that topic would then be extracted automatically. An incoming signal could then be classified as belonging to one of the categories, depending on which keywords appear most frequently.</Paragraph> <Paragraph position="3"> Another approach is to build a simple Hidden Markov Model that is trained for each category from the data in that category.</Paragraph> <Paragraph position="4"> An incoming signal could then be assigned to the category whose HMM gives the best match.</Paragraph> <Paragraph position="5"> Research is also needed in the realm of dialogue management. It remains to be investigated exactly how the acoustic information can be used. Obvious requests or follow-up questions like those exemplified above are one option; more sophisticated questions like Our communication may proceed more smoothly if the system adapts to your acoustic conditions; shall this be done? are another.</Paragraph> </Section> </Section></Paper>