<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0603">
<Title>GENERALITY AND OBJECTIVITY: Central Issues in Putting a Dialogue Evaluation Tool into Practical Use</Title>
<Section position="3" start_page="0" end_page="17" type="intro">
<SectionTitle>1. Introduction</SectionTitle>
<Paragraph position="0"> Spoken language technologies are viewed as one of the most important next steps towards truly natural interactive systems which are able to communicate with humans the same way that humans communicate with each other. After more than a decade of promises that versatile spoken language dialogue systems (SLDSs) using speaker-independent continuous speech recognition were just around the corner, the first such systems are now in the marketplace. These developments highlight the need for novel tools and methods that can support efficient development and evaluation of SLDSs.</Paragraph>
<Paragraph position="1"> There is currently no best practice methodology available which specialises software engineering best practice to the particular purposes of dialogue engineering, that is, to the development and evaluation of SLDSs. In June 1997, a European Concerted Action, DISC (Spoken Language Dialogue Systems and Components - Best Practice in Development and Evaluation), will be launched with the goal of systematically addressing this problem. DISC aims to develop a first detailed and integrated set of development and evaluation methods and procedures (guidelines, checklists, heuristics) for best practice in the field of dialogue engineering, as well as a range of much-needed dialogue engineering support concepts and software tools. The goals of dialogue engineering include optimisation of the user-friendliness of SLDSs, which will ultimately determine their rank among emerging input/output technologies. The present paper reports ongoing work on one of the tools that are planned to result from DISC.</Paragraph>
<Paragraph position="2"> It is a well-recognised fact that the production of a new software engineering tool or method is difficult and time-consuming. The difficulties lie not only in the initial conception of, for instance, a new tool, or in tool drafting and early in-house testing. Even if these stages yield encouraging results, there is a long way to go before the tool can stand on its own and be used as an integral part of best practice in the field. One central reason why this is the case is the problem of generalisation. A tool which only works, or is only known to work, on a single system, in a highly restricted domain of application, or in special circumstances, is of little interest to other developers. In-house testing will inevitably be confined to a limited number of systems and application domains, and it is often subject to other limitations of scope as well. To achieve and demonstrate an acceptable degree of generality, the tool must be iteratively developed and tested on systems, application domains and circumstances that differ significantly from those available in-house. Achieving generality therefore requires access to other systems, corpora and/or development processes. Such access is notoriously difficult to obtain for several reasons, including commercial confidentiality, protection of in-house know-how and protection of developers' time. A second reason why software engineering tool or method development is difficult and time-consuming is the problem of objectivity.
It is not sufficient that some method or tool has been trialled on many different cases and in widely different conditions. It must also have been shown that different developers are able to use the new method or tool with approximately the same result on the same corpus, system or development process. The benefits of using a new tool or method should attach to the tool or method itself rather than to its originators.</Paragraph>
<Paragraph position="3"> Prior to the start of DISC, we developed and tested a tool for dialogue design evaluation on an in-house SLDS project (Bernsen et al. 1996, Bernsen et al. 1997a). This paper presents the first test results on the generality and objectivity of this tool, called DET (Dialogue Evaluation Tool). Building on the assumption that most, if not all, dialogue design errors can be viewed as problems of non-cooperative system behaviour, DET has two closely related aspects to its use. Firstly, it may be used as part of a methodology for diagnostic evaluation of spoken human-machine dialogue. Following the detection of human-machine miscommunication, DET enables in-depth classification of the miscommunication problems that are caused by flawed dialogue design. In addition, the tool supports the repair of those problems, preventing their occurrence in future user interactions with the system. Secondly, DET can be used to guide early dialogue design in order to prevent dialogue design errors from occurring in the implemented system. The distinction between the use of DET for diagnostic evaluation and as a design guide mainly depends on the stage of systems development at which it is used: when applied prior to implementation, DET acts as a design guide; when applied to an implemented system, it acts as a diagnostic evaluation tool. In what follows, we describe the development and in-house testing of the tool (Section 2), present ongoing work on testing its generality and objectivity (Section 3), and conclude by taking a look at the work ahead (Section 4).</Paragraph>
</Section>
</Paper>
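To make the two usage modes of DET more concrete, here is a minimal Python sketch, not the authors' implementation, of how dialogue design errors might be recorded as cooperativity violations and tallied. All names (GuidelineGroup, DesignError, EvaluationReport) and the guideline categories, loosely inspired by Gricean cooperativity, are illustrative assumptions; the paper's actual guideline taxonomy may differ.

    # Hypothetical sketch of DET-style classification of dialogue design
    # errors as non-cooperative system behaviour. Names are illustrative.
    from dataclasses import dataclass, field
    from enum import Enum, auto

    class GuidelineGroup(Enum):
        # Assumed guideline groups; not taken from the paper.
        INFORMATIVENESS = auto()
        RELEVANCE = auto()
        MANNER = auto()
        PARTNER_ASYMMETRY = auto()
        BACKGROUND_KNOWLEDGE = auto()
        REPAIR_AND_CLARIFICATION = auto()

    @dataclass
    class DesignError:
        system_turn: str                # system utterance exhibiting the problem
        violated_group: GuidelineGroup  # guideline the behaviour violates
        diagnosis: str                  # why the behaviour is non-cooperative
        repair: str                     # proposed redesign preventing recurrence

    @dataclass
    class EvaluationReport:
        # mode is "design-guide" (pre-implementation) or "diagnostic"
        # (applied to an implemented system), mirroring DET's two uses.
        mode: str
        errors: list = field(default_factory=list)

        def log(self, error: DesignError) -> None:
            self.errors.append(error)

        def summary(self) -> dict:
            # Count violations per guideline group: the kind of in-depth
            # classification the tool is said to support.
            counts: dict = {}
            for e in self.errors:
                counts[e.violated_group.name] = counts.get(e.violated_group.name, 0) + 1
            return counts

    # Usage: diagnostic mode on a transcribed user interaction.
    report = EvaluationReport(mode="diagnostic")
    report.log(DesignError(
        system_turn="Please state departure airport, arrival airport, "
                    "date, time and number of passengers.",
        violated_group=GuidelineGroup.PARTNER_ASYMMETRY,
        diagnosis="Requests too many items at once for a first-time caller.",
        repair="Split the request into separate, focused questions.",
    ))
    print(report.summary())  # {'PARTNER_ASYMMETRY': 1}

The same DesignError records could drive the design-guide mode: before implementation, a designer would walk through planned system prompts against the guideline groups and log prospective violations, rather than classifying observed miscommunication after the fact.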