Knowledge acquisition for a constrained speech system using WoZ 
Laila Dybkjær & Niels Ole Bernsen & Hans Dybkjær
Centre for Cognitive Informatics (CCI), Roskilde University 
PO Box 260, DK-4000 Roskilde, Denmark 
emails: laila@ruc.dk, nob@ruc.dk, dybkjaer@ruc.dk 
This paper describes the knowledge acquisition 
phase in a national project¹ aimed at the design of
realistic spoken language dialogue system prototypes 
in the domain of airline ticket reservation and flight 
information [Dybkjær and Dybkjær, 1993].
The goals of the knowledge acquisition phase were 
to define a dialogue structure and a sublanguage vo- 
cabulary and grammar for subsequent implementa- 
tion of a first prototype. The development method 
was the Wizard of Oz simulation technique [Fraser
and Gilbert, 1991]. The dialogue model had to satisfy
a number of conflicting constraints, most importantly:
(1) A maximum user vocabulary of 500
word forms. (2) A maximum user utterance length 
of 10 words and an average length of 3-4 words. (3) 
A usable dialogue, including sufficient domain and 
task coverage, robustness and real-time system per- 
formance. (4) A natural form of dialogue and lan- 
guage. 
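Constraints (1) and (2) are measurable, so a set of transcribed user
utterances can be checked against them mechanically. The following is
a minimal sketch, not the project's tooling. All names are illustrative,
the whitespace tokenizer is a simplification, and the average of 3-4
words is read here as an upper bound of 4, which is an assumption.

```python
from dataclasses import dataclass

# Limits taken from constraints (1) and (2); the names are illustrative.
MAX_VOCABULARY = 500        # distinct user word forms
MAX_UTTERANCE_WORDS = 10    # words in a single user utterance
MAX_AVERAGE_WORDS = 4.0     # assumption: upper end of the 3-4 word average


@dataclass
class ConstraintReport:
    vocabulary_size: int
    longest_utterance: int
    average_length: float
    too_long: int  # number of utterances above MAX_UTTERANCE_WORDS

    @property
    def satisfied(self) -> bool:
        """True when all three measurable limits hold strictly."""
        return (
            self.vocabulary_size <= MAX_VOCABULARY
            and self.too_long == 0
            and self.average_length <= MAX_AVERAGE_WORDS
        )


def tokenize(utterance: str) -> list[str]:
    """Naive whitespace tokenizer; a real transcription scheme would differ."""
    return utterance.lower().split()


def check_constraints(utterances: list[str]) -> ConstraintReport:
    """Measure vocabulary size and utterance lengths for a list of utterances."""
    token_lists = [tokenize(u) for u in utterances]
    lengths = [len(tokens) for tokens in token_lists]
    vocabulary = {token for tokens in token_lists for token in tokens}
    return ConstraintReport(
        vocabulary_size=len(vocabulary),
        longest_utterance=max(lengths, default=0),
        average_length=sum(lengths) / len(lengths) if lengths else 0.0,
        too_long=sum(1 for n in lengths if n > MAX_UTTERANCE_WORDS),
    )


if __name__ == "__main__":
    # Invented sample utterances, not taken from the experiments.
    sample = ["I want a ticket to Copenhagen", "yes", "on Monday"]
    print(check_constraints(sample))
```

The check is deliberately strict: a single utterance above 10 words
makes `satisfied` false. A tolerance for rare outliers could be added
if the limits are meant as design targets rather than hard bounds.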
A usable system is one which can do the tasks re- 
quired of it. In principle, it can replace a human op- 
erator on those tasks. A natural system, on the other 
hand, is one which allows users to use free and uncon- 
strained spontaneous speech in efficiently achieving 
their goals. In the development of the first proto- 
type to be described here, the focus was on usability 
(constraints (1)-(3) above) and on laying the founda- 
tions for meeting the naturalness constraint (4) in a 
second prototype. The real-time requirement of (3) 
forces the recogniser to handle at most 100 active 
words at a time, and together with (1) and (2) this 
obviously pushes the dialogue model towards a rigid 
system-directed dialogue structure. 

¹ The project is carried out in collaboration with the
Speech Technology Centre at Aalborg University (STC)
and the Centre for Language Technology at Copenhagen
University (CST). We gratefully acknowledge the support
of the project by the Danish Government's Informatics
Programme.

Seven iterations of Wizard of Oz experiments were
performed, involving taped and transcribed dialogues
between the wizard and subjects. Voice-distorting
hardware (equalizer and harmonizer) was used only
in the final set of experiments. A wizard's assistant
was used in the last three sets of experiments.
From iteration 3 onwards, the wizard used a graph
structure based on the notion of basic tasks, with
canned phrases in the nodes and the contents of
possible user answers along the edges. In addition,
users were instructed to answer questions briefly and
one at a time in order to be understood by the system.
Users were given broadly described scenarios,
the goals of which they had to achieve in dialogue
with the system. In the last three iterations, 23 subjects
performed 107 dialogues in all with 28 different
scenarios, using a total of 4455 words.
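The wizard's graph structure, with canned phrases in the nodes and
possible answer contents along the edges, can be pictured as a small
state machine. The sketch below is only an illustration under stated
assumptions: the node names, phrases, destinations and keyword matching
are all hypothetical and are not taken from the experiments. Relating
the 100-active-word limit to the words expected at a single node is
likewise an assumption made for this example.

```python
from dataclasses import dataclass

MAX_ACTIVE_WORDS = 100  # real-time limit on simultaneously active words


@dataclass(frozen=True)
class Edge:
    accepts: frozenset  # words that signal this answer content (hypothetical)
    target: str         # name of the node to move to


@dataclass(frozen=True)
class Node:
    phrase: str         # canned system phrase (hypothetical wording)
    edges: tuple = ()


WEEKDAYS = frozenset(
    {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}
)

# A toy reservation dialogue; every phrase and keyword here is invented.
GRAPH = {
    "task": Node(
        "Do you want to make a reservation?",
        (Edge(frozenset({"yes"}), "destination"),
         Edge(frozenset({"no"}), "end")),
    ),
    "destination": Node(
        "Where does the journey go?",
        (Edge(frozenset({"copenhagen", "aalborg"}), "day"),),
    ),
    "day": Node("On which day?", (Edge(WEEKDAYS, "confirm"),)),
    "confirm": Node(
        "Is that correct?",
        (Edge(frozenset({"yes"}), "end"),
         Edge(frozenset({"no"}), "task")),
    ),
    "end": Node("Thank you. Goodbye."),
}


def step(graph, current, answer):
    """Follow the first edge whose words occur in the answer, else stay put."""
    words = set(answer.lower().split())
    for edge in graph[current].edges:
        if words & edge.accepts:
            return edge.target
    return current


def run(graph, start, answers):
    """Return the final node and the system phrases spoken along the way."""
    current = start
    spoken = [graph[current].phrase]
    for answer in answers:
        current = step(graph, current, answer)
        spoken.append(graph[current].phrase)
    return current, spoken


def active_vocabulary(graph, name):
    """All words expected at a node, i.e. the union over its outgoing edges."""
    words = set()
    for edge in graph[name].edges:
        words |= edge.accepts
    return words


if __name__ == "__main__":
    final, spoken = run(GRAPH, "task", ["yes", "to Aalborg", "Monday", "yes"])
    print(final)
    for phrase in spoken:
        print(phrase)
    assert all(len(active_vocabulary(GRAPH, n)) <= MAX_ACTIVE_WORDS for n in GRAPH)
```

An answer that matches no edge leaves the dialogue at the same node, so
the canned phrase is simply repeated. This mirrors the system-directed
style described above, where each question expects a brief answer.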
The constraints (1) and (2) above on vocabulary 
size and maximum and average user utterance length 
have been met. In the last iteration, only 3 user utterances
out of 881 contained more than 10 tokens, and
the average number of tokens per user turn was 1.85.
The total number of word types was 165 excluding 
numbers, weekdays, months, and destinations. Ad- 
ditional inflexions and a complete list of numbers, 
weekdays, months, and destinations are incorporated 
in the final sublanguage which includes close to 500 
word forms. 
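The measures reported here (utterances above the length limit, average
tokens per turn, and word types excluding closed classes such as
numbers, weekdays, months and destinations) could be computed from
transcriptions along the following lines. This is a hedged sketch: the
function names, the whitespace tokenization and the sample turns are
assumptions, and the project's actual counting procedure is not
described in this paper.

```python
def corpus_measures(turns, excluded=frozenset(), max_tokens=10):
    """Compute simple token and type statistics over user turns.

    `excluded` holds words left out of the type count, for example
    numbers, weekdays, months and destinations.
    """
    token_lists = [turn.lower().split() for turn in turns]
    total_tokens = sum(len(tokens) for tokens in token_lists)
    types = {
        token
        for tokens in token_lists
        for token in tokens
        if token not in excluded
    }
    return {
        "turns": len(token_lists),
        "over_limit": sum(1 for tokens in token_lists if len(tokens) > max_tokens),
        "average_tokens": total_tokens / len(token_lists) if token_lists else 0.0,
        "word_types": len(types),
    }


def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Invented sample turns, not taken from the experiments.
    turns = ["yes", "monday the third", "a ticket to aalborg please"]
    closed_classes = frozenset({"monday", "third", "aalborg"})
    print(corpus_measures(turns, excluded=closed_classes))

    # Share of over-long utterances reported for the last iteration.
    print(round(percentage(3, 881), 2))  # prints 0.34
```

With the reported figures, 3 of 881 utterances is roughly 0.34 percent,
which shows how rarely the 10-token limit was exceeded.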
In order to evaluate the simulated system's usability
and naturalness (constraints (3) and (4)), users were given a
questionnaire asking them about their opinion of the system.
On average they found the system desirable
(62%), efficient (60%), robust (82%), reliable (73%), 
easy to use (73%), simple (78%), and friendly (82%), 
but still 81% preferred to talk to a human travel 
agent! Apart from a general preference for talking to
humans, this is probably due to the rigid menu-like
structure. As for robustness, the wizard did not simulate
misrecognitions, so the first prototype may prove
less robust than the simulated system. The domain and task
coverage was sufficient for the scenarios used and the 
system would seem adequate for handling the tasks 
which were found in recordings from a travel agency. 
The vocabulary is believed to be usable but its 
natural limits have not yet been identified. Moreover,
subjects tended to model their formulations on
those in the scenarios. To improve data reliability,
scenarios should be used which provide only an abstract
scenario frame and force subjects to be inventive.
The second prototype should demonstrate im- 
proved naturalness, including: a less rigid menu 
structure which allows immediate focused choice; 
longer average user utterances; well-tested robust- 
ness; and an increased amount of information trans- 
ferred between different tasks and subtasks. 
References 
[Dybkjær and Dybkjær, 1993] Laila Dybkjær and
Hans Dybkjær. Wizard of Oz Experiments in the
Development of the Dialogue Model for P1. Report
3, STC, CCI, CST, 1993.
[Fraser and Gilbert, 1991] Norman M. Fraser and
G. Nigel Gilbert. Simulating Speech Systems.
Computer Speech and Language, no. 5, 1991.
