<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1012">
  <Title>Detecting problematic turns in human-machine interactions: Rule-induction versus memory-based learning approaches</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Related work
</SectionTitle>
    <Paragraph position="0"> Recently there has been an increased interest in developing automatic methods to detect problematic dialogue situations using machine learning techniques. For instance, Litman et al. (1999) and Walker et al. (2000a) use RIPPER (Cohen 1996) to classify problematic and unproblematic dialogues.</Paragraph>
    <Paragraph position="1"> Following up on this, Walker et al. (2000b) aim at detecting problems at the utterance level, based on data obtained with AT&amp;T's How May I Help You (HMIHY) system (Gorin et al. 1997). Walker and co-workers apply RIPPER to 43 features which are automatically generated by three modules of the HMIHY system, namely the speech recognizer (ASR), the natural language understanding module (NLU) and the dialogue manager (DM). The best result is obtained using all features: communication problems are detected with an accuracy of 86%, a precision of 83% and a recall of 75%. It should be noted that the NLU features are by far the most informative ones in the full feature set. In fact, using only the NLU features performs comparably to using all features. Walker et al. (2000b) also briefly compare the performance of RIPPER with some other machine learning approaches, and show that it performs comparably to a memory-based (instance-based) learning algorithm (IB, see Aha et al. 1991).</Paragraph>
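To make the cited evaluation figures concrete, the following minimal Python sketch computes accuracy, precision and recall from confusion-matrix counts for a binary problem detector. The counts are hypothetical, chosen only so that the resulting scores match the figures quoted above; they are not Walker et al.'s actual confusion matrix.

```python
# Hypothetical confusion-matrix counts for a binary "problematic turn"
# detector (positive class = communication problem). Illustrative only.
tp = 75    # problems correctly flagged
fp = 15    # unproblematic turns wrongly flagged
fn = 25    # problems missed
tn = 171   # unproblematic turns correctly passed

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.86 precision=0.83 recall=0.75
```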
    <Paragraph position="2"> The results which Walker and co-workers describe show that it is possible to automatically detect communication problems in the HMIHY system using machine learning techniques. Their approach also raises a number of interesting follow-up questions, some concerned with problem detection, others with the use of machine learning techniques. (1) Walker et al. train their classifier on a large set of features, and show that the features produced by the NLU module are the most important ones. However, this leaves an important general question unanswered: which particular features contribute, and to what extent? (2) Moreover, the set of features which the NLU module produces appears to be rather specific to the HMIHY system, indicating things like the percentage of the input covered by the relevant grammar fragment, the presence or absence of context shifts, and the semantic diversity of subsequent utterances. Many current-day spoken dialogue systems do not have such a sophisticated NLU module, and consequently it is unlikely that they have access to these kinds of features. In sum, it is uncertain whether other spoken dialogue systems can benefit from the findings described by Walker et al. (2000b), since it is unclear which features are important and to what extent these features are available in other spoken dialogue systems. Finally, (3) we agree with Walker et al. (and the machine learning community at large) that it is important to compare different machine learning techniques to find out which techniques perform well for which kinds of tasks. Walker et al. found that RIPPER does not perform significantly better or worse than a memory-based learning technique.</Paragraph>
    <Paragraph position="3"> Is this incidental, or does it reflect a general property of the problem detection task?</Paragraph>
    <Paragraph position="4"> The current paper uses a methodology for on-line problem detection similar to that of Walker et al. (2000b), but (1) we take a bottom-up approach, focusing on a small number of features and investigating their usefulness on a per-feature basis, and (2) the features which we study are automatically available in the majority of current spoken dialogue systems: the sequence of system question types and the word graphs corresponding to the respective user utterances. A word graph is a lattice of word hypotheses, and we conjecture that various features which have been shown to cue communication problems (prosodic, linguistic and ASR features; see e.g. Hirschberg et al. 1999, Krahmer et al. 1999 and Swerts et al. 2000) have correlates in the word graph.</Paragraph>
    <Paragraph position="5"> The sequence of system question types is taken to model the dialogue history. Finally, (3) to gain further insight into the adequacy of various machine learning techniques for problem detection, we use both RIPPER and the memory-based IB1-IG algorithm.</Paragraph>
  </Section>
</Paper>