A. Generic Template to evaluate integrated components 
in spoken dialogue systems 
Gavin E Churcher and Eric S Atwell and Clive Souter 
Centre for Computer Analysis of Language And Speech (CCALAS) 
Artificial Intelligence Division, School of Computer Studies 
The University of Leeds, LEEDS LS2 9JT, Yorkshire, England 
gavin~scs.leeds.ac.uk and eric~scs.leeds.ac.uk and cs@scs.leeds.ac.uk 
WWW: http://agora.leeds.ac.uk/amalgam/ 
Abstract enough to be employed in the commercial market 
We present a generic template for spoken 
dialogue systems integrating speech recog- 
nition and synthesis with 'higher-level' nat- 
ural language dialogue modelling compo- 
nents. The generic model is abstracted 
from a number of real application sys- 
tems targetted at very different domains. 
Our research aim in developing this generic 
template is to investigate a new approach 
to the evaluation of Dialogue Management 
Systems. Rather than attempting to mea- 
sure accuracy/speed of output, we propose 
principles for the evaluation of the underly- 
ing theoretical linguistic model of Dialogue 
Management in a given system, in terms 
of how well it fits our generic template for 
Dialogue Management Systems. This is 
a measure of 'genericness' or 'application- 
independence' of a given system, which can 
be used to moderate accuracy/speed scores 
in comparisons of very unlike DMSs serving 
different domains. This relates to (but is 
orthogonal to) Dialogue Management Sys- 
tems evaluation in terms of naturalness 
and like measurable metrics (eg Dybkjaer 
et al 1995, Vilnat 1996, EAGLES 1994, 
Fraser 1995); it follows more closely emerg- 
ing qualitative evaluation techniques for 
NL grammatical parsing schemes (Leech et 
al 1996, Atwell 1996). 
KEYWORDS: evaluation, comparisons, generic 
model, standards. 
1 Background 
Dialogue management systems, particularly those 
which replace a graphical user interface with a spo- 
ken language one, have become increasingly popu- 
lar. Speech recognition is gradually becoming robust 
place, and because of this many companies are real- 
ising the value of a spoken interface to their prod- 
ucts and services. The research community pro- 
7" vides a number of methodologies to the represen- 
tation of dialogue and its implementation on a com- 
puter. Correspondingly, there are a number of de- 
sign methodologies for building such a system. De- 
spite there many differences, every one contains a 
common process: an evaluative cycle. Evaluating a 
dialogue management system is a difficult and often 
subjective experience. Whilst it is possible to objec- 
tively measure recognition performance, evaluation 
of a dialogue is not as straightforward. Even those 
systems which exhibit appalling speech recognition 
performance can nevertheless lead to "successful" di- 
alogues. 
2 Quantitative and qualitative 
evaluation 
There are two approaches to evaluating a dialogue 
management system: to use a qualitative or a quan- 
titative measure. A qualitative evaluation would 
rely on the user's opinion of the system. Dybkjaer 
et al (1995) conducted interviews after each session 
and asked whether the dialogue seemed natural and 
pleasant. Such a subjective evaluation is fraught 
with problems. For example, the user may learn af- 
ter the first attempt how to address the system and 
which words to use or avoid. Subsequent evaluations 
of the same system may then vary even though the 
system has not changed. Some users may find the 
system difficult to use whilst other will find it effort- 
less. "Pleasantness" differs from person to person, 
too. As Vilnat (1996) argues, there is no clear con- 
sensus of what comprises a good dialogue. When 
asking the user, the designer has to make sure that 
the user is representative of the end user in terms of 
background and frequency of use. Because of these 
problems, many researchers have tried to provide a 
means of objectively evaluating a system. 
The two methodologies for quantitative evalua- 
tion, black and glass box, are concerned with input 
and output behaviour and the behaviour of each of 
the components in the system, respectively. Glass 
box evaluation can rely on a comparison between 
the output of a component and a retrospective ref- 
erence. By directly comparing the two it is possible 
to measure the accuracy of that component. The 
black box approach, on the other hand, cannot use 
this method to evaluate a dialogue since there is no 
"correct" dialogue to compare it with. Despite this, 
objective evaluation of the dialogue is necessary in 
order to compare the performance of different sys- 
tems. Initial efforts have been made to standardise 
this (for example in EAGLES, see Fraser 1995a) but 
remain work in progress. 
3 Common components in practical 
Dialogue Management Systems 
Our recent survey of a number of dialogue man- 
agement systems has led us to identify those fea- 
tures and components which occur in many of the 
systems. By examining a range of successful sys- 
tems, from flight information services (Fraser 1995b) 
and appointment scheduling in Verbmobil (Alexan- 
der and Reithinger 1995, Maier 1996, Alexandersson 
1996) to theatre ticket booking (Hulstijn et al. 1996) 
and virtual space navigation (Nugues et al. 1996), a 
template for a generic dialogue management system 
has been drafted. A number of features are incor- 
porated, including a pragmatics interpreter dealing 
with discourse phenomena such as anaphoric resolu- 
tion and ellipsis, a model of the task structure and 
how it relates to the dialogue structure, a model of 
conversation incorporating an interaction strategy 
and a recovery strategy, and a semantic interpreter 
which resolves the full interpretation of an utterance 
in light of its context. This generic template can 
be used in the design of future dialogue manage- 
ment systems, highlighting important features and 
the mechanisms required to implement them. The 
template also provides an application-independent 
method for assessing systems according to the fea- 
tures they exhibit. 
4 Advantages of qualitative 
assessment against a standard 
Speech And Language Technology researchers are 
used to thinking of evaluation in terms of speed 
and accuracy of system outputs, for example 'suc- 
cess rate' of a speech recogniser or syntactic parser 
in analysing a standard test corpus. However, 'Din- 
10 
logue Management' is a high-level linguistic concept 
which cannot be measured so straightforwardly for 
several reasons: 
- existing DMSs are very domain-specific, and we 
need to compare dialogue systems across domains; 
so it makes no sense to look for a common standard 
'test corpus'; 
- the boundary between 'good' and 'bad' dialogue 
is very ill-defined, so it makes little sense to try to 
assess against a target 'correct output', or even by 
subjective assessment of 'pleasantness' of output; 
- the structure of dialogue (and hence a DMS) is 
complex, multi-level, and non-algorithmic, making a 
single overall 'evaluation metric' meaningless with- 
out consideration of component behaviours; 
- we need to evaluate the integrated system holis- 
tically, as opposed to measuring speed or accuracy 
of individual components; 
- alternative dialogue systems use a wide range of 
alternative component technologies; only by fitting 
these against a generic template can we discrimi- 
nate between superficial and substantive differences 
in component assumptions and functionalities. 
There is a useful analogy with evaluation of NL 
parsers; typically, rival parsers are compared by 
measuring speed (sentences-per-minute) and/or ac- 
curacy (e.g. percentage of sentences parsed) - e.g. 
(Sutcliffe et al 1996). However, rival parsing schemes 
include varying 'levels' of syntactic information, as 
shown in EAGLES recommendations (Leech et al 
1995). Atwell (1996) proposes an orthogonal eval- 
uation of parsing schemes against the generic EA- 
GLES 'template' of syntactic levels, so that a given 
parser speed/accuracy measure should be moder- 
ated by a 'genericness' weight; for example, the EN- 
GCG parser (Voutilainen and Jarvinen 1996) is very 
fast and accurate BUT its underlying parsing scheme 
instantiates only a small subset of the EAGLES 
'template', which moderates an overall 'score'. In 
much the same way, we propose that very unlike ri- 
val DMSs can be meaningfully compared by assess- 
ing how well they match our generic template for 
dialogue management architecture, and using this 
'genericness' score to temper any measures of speed, 
accuracy, naturalness, etc. 
Consider (Churcher et al 1997), which included a 
first attempt at an outline of a generic spoken lan- 
guage system. The model includes generic modules 
for syntactic, semantic, and speech act constraints; 
these constraints are integrated into spoken input in- 
terpretation to compensate for limitations in speech 
recognition components. The model constitutes a 
template tool for designing integrated systems; it 
specifies the standard components and how they fit 
together. As is the predicament of any generic sys- 
tem it is necessarily vague and since it attempts to 
combine components found in a variety of individual 
models, it may not fit all systems, if any in particu- 
lar. 
In our survey, we studied how this generic model 
mapped onto a range of existing real systems, by 
looking at the representation formats for the vari- 
ous linguistic features in the dialogue management 
schemes; as with grammatical analysis schemes, 
there is a need for a theory-neutral 'interlingua' stan- 
dard dialogue representation scheme (Atwell 1996). 
5 Features of Natural Dialogue 
'Naturalness' in dialogue is difficult to define, but 
by examining phenomena which occur in human to 
human dialogue we can begin to draw some fea- 
tures which contribute to its definition. The pro- 
posed model in (Churcher et al 97) reflects this to 
a certain extent by incorporating components for 
phenomena such as anaphora and ellipsis whilst ab- 
stracting away from those components which are do- 
main specific, such as the model of task/dialogue 
structure. To begin with, seven such features are 
described below. 
A: Anaphora 
Anaphora frequently occurs in dialogue. This form 
of deixis is applied to words which can only be inter- 
preted in the given context of the dialogue. There 
are a number of different forms of anaphora includ- 
ing personal pronouns (" I", "you", "he/she/it" etc.), 
spatial anaphora ("there", "that" etc.) and tempo- 
ral anaphora ("then"). Expressions relative to the 
current context often need to be interpreted into an 
absolute or canonical form. This form of anaphora 
includes expressions such as "next week" and "the 
next entry" which can only be resolved in relation to 
a previous expression. By incorporating anaphora, a 
speaker can reduce redundancy and economise their 
speech. 
B: Ellipsis 
Ellipsis commonly occurs in a sentence where for 
reasons of economy, style or emphasis, part of the 
structure is omitted. The missing structure can be 
recovered from the context of the dialogue and nor- 
mally the previous sentences. Without modelling 
ellipsis, dialogue can appear far from natural. 
C: Recovery strategy 
Although misunderstandings often occur in conver- 
sations, speakers have the ability to recover from 
these and other deviations in communication. Taleb 
(1996) presents an analysis of the type of commu- 
nicative deviations which can occur in conversation 
and categorises them into content and role devia- 
tions. The inadequacies of speech recognition tech- 
nology introduces additional potential deviations. A 
dialogue management system must be able to re- 
cover from any deviations which occur. Seldom 
in human to human conversation does the dialogue 
'break down'. 
D: Interaction strategy 
At any stage in a dialogue, one participant has the 
initiative of the conversation. In everyday conversa- 
tion, it is possible for either participant to take the 
initiative at any stage. Turning to dialogue man- 
agement, the interaction strategy is important when 
defining the naturalness of the system. System- 
orientated question and answer systems where the 
system has the initiative throughout the dialogue 
are the simplest to model since the user is explicitly 
constrained in their response. The greater freedom 
the user has to control the dialogue, the more com- 
plicated this modelling strategy becomes. Where the 
user has the initiative throughout the dialogue such 
as in command and control applications, the user has 
greater expressibility and freedom of choice. The 
most difficult dialogues to model are those where 
the initiative can be taken be either the system or 
the user at various points in the dialogue. As noted 
by Eckert (1996), mixed initiative systems involve 
dialogues which approach the intricacies of conver- 
sational turn-taking, utilising strategies which deter- 
mine when, for example, the system can take the ini- 
tiative away from the user. For systems using speech 
recognition, the ability to confirm or clarify given 
information is essential, hence system-orientated or 
mixed initiative should exist. 
E: Functional perplexity 
To a lesser extent, the range of tasks that can be 
performed by a particular dialogue is important. In 
human to human conversations, for example, an ut- 
terance can perform more than one illocutionary or 
speech act. In an analogous way, a dialogue can 
include more than one task, whether it is to book 
tickets for a performance, or to enquire about flight 
times. Looking to individual utterances, the greater 
the number of acts which can be performed, the 
more complex (or perplex) the language model be- 
comes. In everyday conversation, humans are adept 
at marking topic boundaries and changes. For appli- 
cations where more than one task is to be performed 
in a single dialogue, the dialogue manager needs to 
11 
be able to identify when the user switches from one 
task to another. Functional perplexity is a measure 
of the density of the topic changes in a single di- 
alogue and is accordingly difficult to calculate. A 
simpler measure is to count the number of semanti- 
cally distinct tasks a user can perform. 
F: Language perplexity 
The ability to express oneself as one wishes and still 
be understood is an important factor which con- 
tributes to naturalness in dialogue. This does not 
necessarily entail a very large vocabulary since cor- 
pus studies and similar language elicitation exer- 
cises can provide a relatively small, core vocabulary. 
The user's freedom of expression is implicitly related 
to the initiative strategy employed by the dialogue 
manager. For example, when the system has the 
initiative, the user's language can be explicitly con- 
strained. In contrast a system which allows the user 
to take the initiative has less control of the user's 
language. Again, as with functional perplexity, the 
perplexity of a language in this sense is difficult to 
measure but it is helpful to look to the extent that 
the system attempts to constrain the user's language 
for performing a task. The level of constraint should 
not be measured when the system is recovering from 
deviations in the dialogue, since focussing the user 
may be necessary for recovering from the deviation 
in as few steps as possible. 
G: Over-informativeness 
There are two inter- 
pretations of over-informativeness, system and user 
orientated, system orientated over-informativeness 
allows the dialogue manager to present more infor- 
mation to the user than was actually explicitly re- 
quested. User orientated over-informativeness is an 
important feature to have and is directly related to 
the degree of freedom of expression. In natural dia- 
logue, a speaker can provide more information than 
is actually requested. Humans are able to take this 
additional information into consideration or ignore 
it depending on how relevant it is to the conversa- 
tion. The information may have been volunteered in 
anticipation of a future request for information and 
as a result a dialogue manager which ignores it will 
not appear very natural. As an example, consider 
the following dialogue between the system and user 
where the user responds with a reply which is over- 
informative: 
User: I'd like to make an appointment. 
System: Who would you like to make an appoint- 
ment with? 
User: John Smith at 2pm. 
6 A Questionnaire 
Whilst each of the above features are important, it 
is not obvious which are more important to 'natural- 
ness' than others. Turning to the research commu- 
nity we asked those who had designed systems incor- 
porating dialogue management for their experiences 
and opinions. The questionnaire asked the commu- 
nity to rank the features according to how important 
they thought they were to their particular dialogue 
manager and to comment on each one. Given the 
time constraints, it was not possible to ask more 
detailed questions about each feature, although the 
respondents were encouraged to give examples. 
Table 1 shows the six systems detailed, table 2 a 
summary of the importance of the features to each 
system. The results range from 1 - the most im- 
portant to 7 - the least important; the ratings were 
allowed to be tied. 
Table 1:6 DMSs 
\[1\] Daimler-Benz Generic DMS (Heisterkamp 
1993, Heisterkamp and McGlashan 1996, 
Regel-Brietzm~nn et al. (forthcoming)) 
\[2\] LINLIN (Ahrenberg et al. 1990, 
Jonsson 1993, 1996) 
\[3\] EVAR German Train-Timetable Spoken 
Dialogue Information System (Eckert 
et al. 1993, Boros et al. 1996) 
\[4\] VERBMOBIL dialogue component 
(Alexandersson et al. 1996,1997) 
\[5\] The Slovenian Dialog System for Air 
Flight Inquiries (Ipsic et al. 1997, 
Pepelnjak et al. 1996) 
\[6\] SAPLEN - Sistema Automatico de 
Pedidos en Lenguaje Natural 
(Lopez-Cozar(forthcoming)) 
Table 2: Features ranked in 6 DMSs 
Feature A B C D E F G 
Eli 2 1 1 2 3 2 2 
\[2\] 2 1 1 2 2 2 2 
\[3\] 2 1 1 1 3 1 1 
\[4\] 1 1 1 6 - 2 - 
\[5\] - 3 6 6 5 3 2 
\[6\] 3 5 5 5 5., ,5 ,5 
Men 2.0 2.0 2.5 3.7 3.6 2.5 2.4 
12 
Note that where '-' occurs, the feature was not 
ranked, and so is omitted from the mean. It is inter- 
esting to note that different respondents interpreted 
the ranking differently. Whilst some understood the 
points system to indicate the order of importance of 
each feature, others, such as \[6\] considered the points 
to be an indication of how important the feature was 
to their system. 
By taking the mean of the scores, the features can 
be ordered as follows, most important first: 
A: Anaphora == B: Ellipsis 
G: Over-informativeness 
C: Recovery strategy == F: Language perplexity 
E: Functional perplexity 
D: Interaction strategy 
7 Comments on approach taken 
The initial, tentative ranking of features indicates 
that anaphora and ellipsis are important, whilst 
functional perplexity and interaction strategy are 
least important. Given that the systems surveyed 
performed just one or two tasks, it is not surprising 
that functional perplexity is not ranked highly. The 
low ranking of the interaction strategy reflects the 
application of the system. For example, system \[4\], 
Verbmobil, regarded the interaction strategy to be 
of low importance since it is a minimally intrusive 
system which facilitates the dialogue between two 
humans. 
What is made clear is that we need to conduct fur- 
ther research into explicitly quantifying each feature 
for this approach to be worthwhile. Whilst features 
such as over-informativeness are either present or 
not, others are finer grained; the interaction strat- 
egy can be system-orientated, user-orientated or a 
combination of both. Language perplexity, in the 
sense meant here, needs to be quantified, too, before 
it can be considered a useful feature. In retrospect, 
the ranking of each feature needs to be made consis- 
tent. 
8 Conclusion 
~ecent technological advances are bringing spoken 
dialogue systems closer to markets, to real applica- 
tions. As the focus of this research field shifts from 
academic study to commercial reality, we feel it is 
important to maintain a theoretical underpinning: 
a generic model for independent qualitative assess- 
ment and comparison of practical Interactive Spoken 
Dialogue Systems. We invite practical systems de- 
velopers to help us assess their products against this 
generic template, allowing us in turn to maintain 
and refine the theoretical generic model to keep step 
with practical developments. 
The list of features can be used in two ways: to 
evaluate the 'genericness' of a dialogue manager, and 
to ascertain whether a dialogue manager is suitable 
to a particular application. In choosing between ri- 
val Dialogue Managment Systems, it is not sensible 
to try to use a simple metric of accuracy or natural- 
ness applicable across all applications. Different ap- 
plications require different DMS features. Prospec- 
tive users hoping to re-use a DMS should first decide 
what they want from one; if they can frame their re- 
quirements in terms of our generic template, they 
can eliminate candidate systems which do not focus 
on the required features. 

References 
L. Ahrenberg, A. Jhnsson and N. Dahlb~ick, "Dis- 
course Representation and Discourse Management 
for Natural Language Interfaces", in Proceedings of 
the 2nd Nordic Conference on Text Comprehension 
in Man and Machine, T~iby, Sweden. 1990. 
J. Alexandersson and N. Reithinger, "Designing 
the dialogue component in a speech translation sys- 
tem - a corpus-based approach", in Andernach et hi. 
(eds.) 1995. 
J. Alexandersson, "Some ideas for the automatic 
acquisition of dialogue structure", in Luperfoy et al. 
1996. 
J. Alexandersson, N. Reithinger and E. Maier, 
"Insights into the Dialogue Processing of Verbmo- 
bil", in Proceedings of the Fifth Conference on 
Applied Natural Language Processing, ANLP '97, 
Washington, DC. 1997. 
E. Atwell, "Comparative evaluation of grammati- 
cal annotation models", in: Sutcliffe et al. 1996. 
G. Churcher, E. Atwell, C. Souter, "Dialogue 
Management Systems: a survey and overview", Re- 
search Report 97.06, School of Computer Studies, 
Leeds University, 1997. 
M. Boros, W. Eckert, F. Fallwitz, G. Hanrieder, 
G. Goerz, H. Niemann, "Towards Understanding 
Spontaneous Speech: Word Accuracy vs. Concept 
Accuracy", in Proceedings of the 4th International 
Conference on Spoken Language Processing (ICSLP- 
96), Philadelphia, 1996. 
H. Dybkjeer, L. Dybkjeer, N. O. Bernsen, "De- 
sign, formalization and evaluation of spoke language 
dialogue", in J. A. Andernach, S. P. van de Burgt 
and G. F. van der Hoeven (eds), "Corpus-based Ap- 
proaches to Dialogue Modelling", Proceedings of the 
9th Twente Workshop on Language Technology , 
University of Twente, Enschede, Netherlands. 1995. 
W. Eckert, T. Kuhn, N. Niemann, S. Rieck, A. 
Scheuer, E. G. Schukat-Talamazzini, "A Spoken Di- 
alogue System for German Intercity Train Timetable 
Inquiries", in Proceedings of Eurospeech '93, Berlin 
W. Eckert, "Understanding of Spontaneous Utter- 
ances in Human- Machine-Dialog", in Luperfoy et al. 
1996. 
N. Fraser, "Quality Standards for Spoken Dia- 
logue Systems: a report on progress in EAGLES", 
in Dalsgaard et al. (1995), pp 157-160. 1995. 
N. Fraser, "Messy data, what can we learn from 
it?", in Andernach et al. (eds.) 1995. 
P. tteisterkamp, "Ambiguity and uncertainty in 
spoken dialogue", in Proceedings of Eurospeech '93, 
Berlin, 1993. 
P. IIeisterkamp and S. McGlashan, "Units of di- 
alogue management: an example", in Proceedings 
of the 4th International Conference on Spoken Lan- 
guage Processing (ICSLP-96), Philadelphia, 1996. 
J. Hulstijn, R. Steetskamp, If. ter Doest, S. van 
de Burgt and A. Nijholt, "Topics in SCIIISMA Di- 
alogues", in Luperfoy et al. 1996. 
I. Ipsic, F. Mihelic, K. Pepelnjak, J. Gros, S. Do- 
brisek, N. Pavesic, E. Noth, "The Solvian Dialog 
System for Air Flight Inquiries", in Proceedings of 
the 2nd SQEL Workshop on Multi-Lingual Informa- 
tion Retrieval Dialogs, Pilsen, 1997. 
A. JSnsson, "Dialogue Actions for Natural Lan- 
guage Interfaces", in Proceedings of IJCAI-95, 
MontrEal, Canada, 1995. 
A. JSnsson, "A Model for Dialogue Management 
for Human Computer Interaction", in Proceedings 
of ISSD'96, Philadelphia, 1996. 
G. N. Leech, R. Barnett, P. Kahrel, "EAGLES 
Final Report and Guidelines for the Syntactic An- 
notation of Corpora", (EAGLES Document EAG- 
TCWG-SASG), Pisa. Italy. 1995 
R. Lopez-Cozr, P. Garcia, J. Diaz and A. J. Ru- 
bio, "A voice activated dialogue system for fast-food 
restaurant applications", Proceedings of Eurospeech 
'97, Rhodes. (forthcoming) 
S. Luperfoy, A. Nijholt and G. Veldhuijzen van 
Zanten (eds.), "Dialogue Management in Natural 
Language Systems", Proceedings of the 11th Twente 
Workshop on Language Technology , University of 
Twente, Enschede, Netherlands. 1996. 
E. Maier, "Context construction as subtask of di- 
alogue processing - the VEKBMOBIL case", in Lu- 
perfoy et al. 1996. 
P. Nugues, C. Godereaux, P. El Guedj and F. Re- 
volta, "A conversational agent to navigate in virtual 
worlds", in Luperfoy et al. 1996. 
K. Pepelnjak, F. Mihelic, N. Pavesic, "Semantic 
Decomposition of Sentences in the System Support- 
ing Flight Services", in Journal of Computing and 
Information Technology (CIT), Vol. 4, No. 1, Za- 
greb, 1996. 
P. Regel-Brietzmann et al., "ACCESS - Auto- 
mated Call CEnter through Speech understanding 
System. A description of an advanced application. 
Proceedings of Eurospeech '97, Rhodes. (forthcom- 
ing) 
M. Schillo, "Working while Driving: Corpus based 
language modelling of a natural English Voice-User 
Interface to the in-car Personal Assistant", MSc 
Thesis, School of Computer Studies, Leeds Univer- 
sity, 1996. 
M. Schillo, E. Atwell, C. Souter, T. Denson, "Lan- 
guage modelling for the in-car personal assistant", 
in C. Moghrabi (ed), "Proceedings of NLP+IA'96: 
International Conference on Natural Language Pro- 
cessing and Industrial Applications", Universite de 
Moncton, Canada, 1996. 
R. Sutcliffe, If. D. Koch, and A. McElligott 
(editors), Industrial Parsing of Software Manuals, 
Rodopi. 1996 
L. Taleb, "Communicative Deviation in Finalized 
Informative Dialogue Management", in Luperfoy et 
al. 1996. 
A. Vilnat, "Which processes to manage human- 
machine dialogue?", in Luperfoy et al. 1996. 
A. Voutilainen, T. Jarvinen, "Using English Con- 
straint Grammar to Analyse a Software Manual Cor- 
pus", in Sutcliffe et al. 1996. 
Michael Schillo, "Working while Driving: Corpus 
based language modelling of a natural English Voice- 
User Interface to the in-car Personal Assistant", 
MSc Thesis, School of Computer Studies, Leeds Uni- 
versity, 1996. 
Michael Schillo, Eric Atwell, Clive Sourer, Tony 
Denson, "Language modelling for the in-car personal 
assistant", in Chadia Moghrabi (ed), "Proceedings of 
NLP+IA '96: International Conference on Natural 
Language Processing and Industrial Applications", 
Universite de Moncton, Canada, 1996. 
