<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1040">
<Title>Automatic Detection of Poor Speech Recognition at the Dialogue Level</Title>
<Section position="6" start_page="314" end_page="314" type="concl">
<SectionTitle>4 Discussion</SectionTitle>
<Paragraph position="0"> The experiments presented here establish several findings. First, it is possible to give an objective definition of poor speech recognition at the dialogue level, and to apply machine learning to build classifiers that detect poor recognition solely from features of the system log. Second, with appropriate sets of features, these classifiers significantly outperform the baseline percentage of the majority class. Third, the comparable performance of classifiers constructed from rather different feature sets (such as acoustic and lexical features) suggests that there is some redundancy between these feature sets (at least with respect to the task). Fourth, the fact that the best estimated accuracy was achieved using all of the features suggests that even problems that seem inherently acoustic may best be solved by exploiting higher-level information.</Paragraph>
<Paragraph position="1"> This work differs from previous work in focusing on behavior at the (sub)dialogue level, rather than on identifying single misrecognitions at the utterance level (Smith, 1998; Levow, 1998; van Zanten, 1998). The rationale is that a single misrecognition may not warrant a global change in dialogue strategy, whereas a user's repeated problems communicating with the system might warrant such a change. While we are not aware of any other work that has applied machine learning to detecting patterns suggesting that the user is having problems over the course of a dialogue, (Levow, 1998) has applied machine learning to identifying single misrecognitions. We are currently extending our feature set to include acoustic-prosodic features such as those used by Levow, in order to predict misrecognitions at both the dialogue and utterance levels.</Paragraph>
<Paragraph position="2"> We are also interested in extending and generalizing our findings in a number of additional directions. In other experiments, we demonstrated the utility of allowing the user to dynamically adapt the system's dialogue strategy at any point during a dialogue. Our results show that dynamic adaptation clearly improves system performance, with the level of improvement sometimes a function of the system's initial dialogue strategy (Litman and Pan, 1999). Our next step is to incorporate classifiers such as those presented in this paper into a system, in order to support dynamic adaptation according to recognition performance. Another area for future work is to explore alternative methods for classifying dialogues as good or bad. For example, the user satisfaction measures we collected in a series of experiments using the PARADISE evaluation framework (Walker et al., 1998c) could serve as the basis for such an alternative classification scheme. More generally, in the same way that learning methods have found widespread use in speech processing and other fields where large corpora are available, we believe that the construction and analysis of spoken dialogue systems is a ripe domain for machine learning applications.</Paragraph>
</Section>
</Paper>
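As a concrete illustration of the pipeline discussed above, here is a minimal sketch, not the paper's implementation: it assumes hypothetical log-derived features, synthetic labels, and a scikit-learn decision tree as a stand-in learner. It trains a per-dialogue classifier of poor recognition, compares its cross-validated accuracy against the majority-class baseline, and uses a prediction to choose a dialogue strategy.

```python
"""Hedged sketch: per-dialogue classification of poor speech recognition
from system-log features, plus a toy dynamic-adaptation rule. Feature
names, thresholds, and the learner are illustrative assumptions, not
taken from the paper."""
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical per-dialogue features extracted from a system log, e.g.
# mean ASR confidence, fraction of rejected utterances, number of help
# requests, mean user-turn duration.
X = rng.random((200, 4))

# Synthetic "poor recognition" labels: low mean ASR confidence combined
# with many rejections stands in for an objective threshold on the
# dialogue's misrecognition rate.
y = ((X[:, 0] < 0.45) & (X[:, 1] > 0.3)).astype(int)

# Majority-class baseline: always predict the most frequent label.
baseline = max(y.mean(), 1 - y.mean())

# Cross-validated accuracy of a learned classifier (a shallow decision
# tree is used here purely as a stand-in learner).
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
cv_acc = cross_val_score(tree, X, y, cv=10).mean()
print(f"majority baseline: {baseline:.2f}  classifier (10-fold CV): {cv_acc:.2f}")

# Toy dynamic adaptation: if the classifier flags the dialogue so far as
# going poorly, fall back to a more conservative dialogue strategy.
clf = tree.fit(X, y)
features_so_far = rng.random((1, 4))  # hypothetical running feature vector
strategy = "system-initiative" if clf.predict(features_so_far)[0] else "mixed-initiative"
print(f"adapted strategy: {strategy}")
```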