DARPA FEBRUARY 1992 ATIS BENCHMARK TEST RESULTS

David S. Pallett, Nancy L. Dahlgren, Jonathan G. Fiscus,
William M. Fisher, John S. Garofolo, Brett C. Tjaden
National Institute of Standards and Technology 
Building 225, Room A216 
Gaithersburg, MD 20899 
1 INTRODUCTION 
This paper documents the third in a series of Benchmark Tests for the DARPA Air Travel Information System (ATIS) common task domain. The first results in this series were reported at the June 1990 Speech and Natural Language Workshop [1], and the second at the February 1991 Speech and Natural Language Workshop [2]. The February 1992 Benchmark Tests include: (1) ATIS domain spontaneous speech recognition system tests, (2) ATIS natural language understanding tests, and (3) ATIS spoken language understanding tests.

Since the February 1991 tests, a large ATIS spoken language corpus has been collected, coordinated by the DARPA Multi-Site ATIS Data COllection Working (MADCOW) Group. The activities of this group, and NIST's role in that effort, are documented in another paper in this Proceedings [3].
2 OCTOBER 1991 "DRY RUN" TESTS 
The procedures for test set selection, testing, scoring, adjudication, and reporting for the February 1992 ATIS Benchmark Tests were developed and used for a "dry run" test in October 1991, with unpublished results. A somewhat smaller test set was used at that time, which did not include test data from AT&T. The implementation of the tests was generally regarded as successful within the DARPA MADCOW Group and by the DARPA Spoken Language Program Coordinating Committee.
3 NEW CONDITIONS FOR THESE 
TESTS 
The structure (and scoring) of these ATIS domain tests differs in several ways from the tests reported at the June 1990 and February 1991 Workshops:
• Following the February 1991 Workshop, minor revisions (e.g., to accommodate connecting flights, clarify terminology, revise headings and restructure tables, improve representation of fare structures, and fix bugs) were made to the relational air-travel-information database. The MADCOW data collection effort, and systems developed with this data, made use of this revised relational database (Version 3.3).
• The MADCOW data collection effort provided data from five sites (AT&T, BBN, CMU, MIT/LCS, and SRI), rather than the single ATIS data collection site (TI) used for the June 1990 and February 1991 tests.
• Some (but not all) of the collecting sites provided secondary (Crown PCC-160) microphone data in addition to the primary (close-talking Sennheiser) microphone data. The use of the secondary microphone data was encouraged, but not required, for the February 1992 tests.
• The definition of "Class D" queries was broadened to include "Class D1" queries.
• The files indicating the "classification" (i.e., Class A, D or X) for each query were not provided along with the test queries (as they had been in previous tests), so that each site had no extra information regarding the context-dependency or answerability of each query.
• Similarly, "unanswerable" (Class X) queries were 
not identified when the test material was released. 
If system developers provided answers for these 
queries, they were not scored. 
• No utterances were to be treated differently on the 
grounds of the presence of disfluencies such as false 
starts or restarts. In the February 1991 tests, these 
utterances were regarded as "Optional". 
• Concern had been expressed at the February 1991 meeting that some sites might have chosen to "overgenerate" (by providing verbose NL and SLS answers) rather than provide more succinct answers. It was argued that "correct" answers should have at least the information in the ".ref" files previously used in scoring answers, but no more than in some specified maximal answer. Bob Moore and Eric Jackson, at SRI, proposed and implemented an algorithmic procedure for deriving maximal reference answers (".rf2") from the NLParse-generated SQL files used to generate the .ref files. Bill Fisher at NIST subsequently modified the NIST comparator (used in scoring the NL and SLS results) to implement the new "minimum/maximum" scoring procedure, a containment check sketched after this list. The Principles of Interpretation document was modified to accommodate these changes.
• Special reports were to be prepared by NIST to partition the tabulations of results according to the originating sites for the test data.
• Following completion of each phase of scoring the 
results, NIST was to prepare and make available to 
all participants both detailed and summary reports 
via anonymous ftp. 
• Because there had been a recommendation to report results for all answerable queries in complete subject-scenarios (i.e., the material collected during one subject's working of one scenario), test material was to be provided to the testing sites in complete subject-scenarios. Emphasis was to be placed on analysis of the subset of "answerable" queries (i.e., Class A+D), rather than on the individual classes A and/or D. Further, the weighted error percentage (defined as twice the percentage of incorrect or "false" answers plus the percentage of "No_Answer" responses, as computed in the sketch following this list) was identified as preferable to the single-number "Score" reported at the February 1991 meeting (Score (%) = 100% - Weighted Error (%)).
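The two scoring conventions above reduce to simple operations. The following is a minimal illustrative sketch, not the NIST comparator: the weighted error computation comes directly from the definition above, while representing answers as flat sets for the minimum/maximum containment check is a simplifying assumption (actual CAS answers are structured tables).

    # A minimal sketch (not the NIST comparator) of the two scoring rules
    # described above. Representing answers as flat sets is a simplifying
    # assumption; actual CAS answers are structured tables.

    def weighted_error(num_false, num_no_answer, num_utterances):
        """Weighted Error (%) = 2 * (% False) + (% No_Answer)."""
        return 100.0 * (2 * num_false + num_no_answer) / num_utterances

    def min_max_correct(hypothesis, minimal_ref, maximal_ref):
        """Correct iff the answer covers the minimal (.ref) reference
        and contains nothing beyond the maximal (.rf2) reference."""
        return minimal_ref <= hypothesis <= maximal_ref

    # Example, using the Class A+D figures reported for the "cmu1" NL
    # system in Table 5 (#F = 102, #NA = 3, 687 utterances):
    print(round(weighted_error(102, 3, 687), 1))   # 30.1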
4 TEST MATERIAL SELECTION 
AND DISTRIBUTION 
With the approval of the MADCOW Group, NIST had reserved approximately 20% of the pooled MADCOW data for test purposes. NIST screened this data for the occurrence of truncated utterances, rejected the subject-scenarios that included these phenomena, and determined that there was a sufficient quantity of reserved potential test material to permit release of a test set consisting of approximately 200 utterances from each of the five MADCOW sites contributing data. NIST did not monitor the audio quality of the .wav files nor review the accuracy of the transcriptions, since no criteria for acceptability based on these have been defined, although in retrospect this might have simplified the adjudication process.
The test material, subsequent to deletion of some material during the adjudication process, consisted of 970 non-null (and 1 null) utterances in all classes. The number of distinct scenarios used by all subjects was 42, with a total of 37 subjects ("speakers") completing 122 subject-scenarios; 17 of the subjects were male and 20 were female. Seven of the 122 subject-scenarios used the "Common-1" scenario; however, the test material selected from BBN and CMU did not include any instances of this scenario. The average number of queries per subject-scenario was 8: the MIT subject-scenarios averaged 4.6 queries, while the SRI and CMU subject-scenarios each averaged 12.1 queries. There were 508 lexemes represented in the test material, and the average number of words per utterance was about 11.
After NIST selected the test material, it was produced 
on CD-ROM. The test disc (NIST Speech Disc T3-1.1) 
was distributed to the testing sites on Jan. 6, 1992. 
Concurrent with preparation of the CD-ROMs, NIST 
staff and the "Annotation Group" at SRI initiated 
preparation of the annotation files required to implement 
scoring. 
5 TEST PROCEDURE 
Following completion of locally administered single-pass- 
per-system tests, participating sites submitted results 
for (at least) three ATIS tests: the SPeech RECogni- 
tion (SPREC), Natural Language (NL) and Spoken Lan- 
guage System (SLS) tests. 
The format for data submission via e-mail was specified by NIST, and all "official" results were received at NIST by 6:00 AM on Jan. 20, 1992. As in previous ATIS tests, answer hypotheses were to be in the form of lexical SNOR (.lsn) files for the SPREC results and in Common Answer Specification (CAS) format files for the NL and SLS results. Each submission was to be accompanied by a text file for each system providing a system description following a suggested format.
6 TEST SCORING, ADJUDICATION 
AND REPORTING PROCEDURE 
Upon receipt of the test results, NIST implemented preliminary scoring with a reference answer set including .cat, .ref and .rf2 files developed at NIST and SRI for the NL and SLS tests, and the "lexical SNOR" (.lsn) files derived from the detailed (.sro) transcriptions provided by the collecting sites for the SPREC tests. On Jan. 24, 1992, upon completion of the preliminary scoring and preparation of the required reports, NIST released the preliminary results by anonymous ftp.
At the MADCOW Group's request, a detailed and formal procedure was established at NIST for handling requests for adjudication.
The participating sites filed a total of 122 requests for adjudication, which were treated by NIST and the SRI Annotation Group in a manner similar to that followed for the training data's bug reports. Some of these requests involved more than one utterance, or reported on more than one "bug" in an utterance, so that the number of unique utterances potentially affected by the requests for adjudication was 193, or approximately 19% of the test material.
Of these utterances, the adjudicators determined that 
99 (51%) actually required one or more changes. "No 
Action" decisions were made for the remaining 49%. 
NIST was advised by Francis Kubala at BBN during the adjudication period that some of the reference transcriptions used for scoring the SPREC test appeared to be inaccurate. NIST subsequently reviewed all of the transcriptions noted by Kubala and corrected them as deemed appropriate.
In addition to the 99 utterances for which the formal requests for adjudication resulted in changes to the annotations, 26 test utterances were identified by the adjudicators themselves as requiring changes.
The final total of 125 utterances (12.9% of the entire test set) for which annotation changes were made includes the following breakdown (by category):

• 42 with software problems related to annotations or scoring (e.g., NLParse, batching, or Comparator bugs),

• 36 with annotation errors,

• 27 with problems in the transcriptions developed at the originating sites, and

• 20 with differences of opinion in applying the Principles of Interpretation or in the use of context in interpreting the query.
Following completion of the adjudication process, NIST 
released a set of "Official" ATIS Benchmark Test results 
to the community on Feb. 5, 1992. 
NIST was subsequently advised by Paramax that corrections to the reference answer set that were to have been made during the adjudication process did not appear to have been made. NIST and SRI determined that this had in fact been the case, and a total of 19 .rf2 files were corrected. The entire set of NL and SLS results was then re-scored, and a "Revised Official" set of results was made available to the community. Analysis of the differences between these two sets of "official" results shows that only 5 of Paramax's NL and 4 of their SLS answers were scored differently.
Paramax also noted, following release of the "Revised Official" results, that 20 of their NL as well as another 20 of their SLS answers were scored as "False" because of known limitations in the NIST official scoring software. NIST had determined that the degree to which Paramax's answers were affected by this known limitation was approximately ten times more severe than for any other site, and declined to alter the scoring software to accommodate Paramax's unusual responses. NIST encouraged Paramax to develop and document "unofficial" results [4] with slightly modified scoring software.
A "handout" was prepared for, and distributed at, the 
February 1992 Speech and Natural Language Workshop 
containing the System Descriptions provided by the par- 
ticipants and NIST's summaries of Benchmark Test re- 
sults. 
7 BENCHMARK TEST RESULTS AND DISCUSSION
7.1 ATIS SPeech RECognition (SPREC) Test Results
7.1.1 Close-Talking Microphone 
Table 1 presents a tabulation of the February 1992 ATIS spontaneous Speech RECognition (SPREC) test results. Results are presented for a number of defined subsets of the utterances, with the utterance classes defined in the annotation process. The set Class A+D+X is the set of all utterances in all classes, consisting of 971 utterances. The set Class A+D includes all answerable utterances, 687 in all. Individual scores for the component subsets Class A, Class D, and Class X are also included. The utterances in Classes D and X tend to have a greater degree of disfluency than those in Class A. This factor may be reflected in the corresponding error rates, since the lowest subset error rates are to be found for Class A utterances, and the highest for Class X.
In the set of answerable queries, Class A+D, the word error ranges from 6.2% to 13.8%, and the utterance error rate (corresponding approximately to "sentence error rate," but acknowledging the fact that some utterances consist of more than one sentence) ranges from 34.6% to 60.1%.
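The word and utterance error figures reported here follow the usual alignment-based definitions: word error (%) is 100 x (substitutions + deletions + insertions) / reference words, and an utterance counts as an error if it contains any word error. The following is a minimal illustrative sketch of such an alignment, assuming equal costs for the three error types; NIST's actual scoring software is more elaborate.

    # Minimal sketch of a word-level alignment counting substitutions,
    # deletions, and insertions. Equal edit costs are an assumption.

    def align_errors(ref, hyp):
        """Return (substitutions, deletions, insertions) for one utterance."""
        # cost[i][j] = (total errors, subs, dels, ins) for ref[:i] vs hyp[:j]
        rows, cols = len(ref) + 1, len(hyp) + 1
        cost = [[None] * cols for _ in range(rows)]
        cost[0][0] = (0, 0, 0, 0)
        for i in range(1, rows):          # deleting all reference words
            e, s, d, n = cost[i - 1][0]
            cost[i][0] = (e + 1, s, d + 1, n)
        for j in range(1, cols):          # inserting all hypothesis words
            e, s, d, n = cost[0][j - 1]
            cost[0][j] = (e + 1, s, d, n + 1)
        for i in range(1, rows):
            for j in range(1, cols):
                if ref[i - 1] == hyp[j - 1]:
                    match = cost[i - 1][j - 1]
                else:
                    e, s, d, n = cost[i - 1][j - 1]
                    match = (e + 1, s + 1, d, n)   # substitution
                e, s, d, n = cost[i - 1][j]
                delete = (e + 1, s, d + 1, n)      # deletion
                e, s, d, n = cost[i][j - 1]
                insert = (e + 1, s, d, n + 1)      # insertion
                cost[i][j] = min(match, delete, insert)
        return cost[-1][-1][1:]

    # Word error (%) = 100 * (S + D + I) / number of reference words;
    # utterance error counts an utterance wrong if it has any error.
    ref = "SHOW ME FLIGHTS FROM BOSTON TO DALLAS".split()
    hyp = "SHOW FLIGHTS FROM BOSTON TO TO DALLAS".split()
    print(align_errors(ref, hyp))   # (0, 1, 1): "ME" deleted, "TO" inserted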
The lowest word error rate, in any of the subsets, 5.8%, is 
noted for the BBN system described in \[5\] for the subset 
of Class A utterances. 
Table 2 presents a matrix tabulation of ATIS SPREC results for the set of answerable queries, Class A+D. This matrix form of tabulation was developed at the MADCOW Group's request to shed light on potential variabilities in the data for test set components from differing originating sites. The five columns of the matrix block correspond to the five originating sites for the MADCOW test data. In this case, the six rows of the matrix block correspond to the six sets of SPREC test results sent to NIST. The "Overall Totals" column at the right of the central block presents results corresponding to those cited for the Class A+D subset in Table 1. Note, for example, that the previously cited lowest Class A+D subset word error of 6.2% (for the BBN system) is shown in the second row entry of this column.
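A tabulation of this sort can be produced by aggregating per-utterance scores over the system and originating-site keys. The sketch below illustrates only that aggregation step; it is not NIST's tooling, and the field names and toy values are hypothetical.

    # Illustrative aggregation of per-utterance scores into a Table 2-style
    # systems-by-sites matrix. Field names and values are hypothetical.
    import pandas as pd

    scores = pd.DataFrame(
        [("bbn3", "MIT", 11, 0),   # system, originating site,
         ("bbn3", "ATT", 9, 2),    # reference words, word errors (S+D+I)
         ("att3", "MIT", 11, 1),
         ("att3", "ATT", 9, 3)],
        columns=["system", "origin", "ref_words", "errors"])

    totals = scores.pivot_table(index="system", columns="origin",
                                values=["ref_words", "errors"], aggfunc="sum")
    word_error = 100.0 * totals["errors"] / totals["ref_words"]
    print(word_error.round(1))   # one word-error cell per system/site pair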
The "Overall Totals" row presents results accumulated 
over all systems for which results were reported to NIST. 
Note that the Overall (subset) Total Word Error ("W. 
Error") ranges from a low of 5.9%, for the data originat- 
ing at MIT/LCS, to 14.6% for the AT&T data subset. 
These data suggest that the MIT data subset is less chal- 
lenging for ATIS SPREC systems than the data from 
other sites, but the reasons for this are not immediately 
evident. 
Analysis of the transcriptions suggests that the AT&T data subset has a higher incidence of disfluencies than other subsets, partially explaining why it is more challenging than the other data subsets.
For the "Class A+D" data, the lowest subset word er- 
ror for any SPREC system is 3.2%, again for the BBN 
SPREC system and for the MIT data subset. Analysis 
of a similar matrix for the Class A data (not shown) in- 
dicates that the lowest subset word error (again for the 
MIT data subset) is 2.6% for the BBN system, with a 
corresponding utterance error of 20.7%. 
7.1.2 Secondary (Crown PCC-160) Microphone 
Data 
Three ATIS MADCOW sites provided data for both the Sennheiser close-talking microphone and the secondary (Crown PCC-160) microphone: CMU, MIT/LCS, and SRI. Two sites agreed to use the Crown microphone data with SPREC systems using "robust" recognition algorithms: CMU and SRI. In some cases, results for other algorithms for comparable subsets of the data are available, and these have been excised from larger sets of data provided to NIST by CMU and SRI for the purposes of comparisons.
Table 3 presents a matrix tabulation of the SPREC data for the Class A+D data from CMU, MIT/LCS and SRI for 5 systems (i.e., 3 from CMU and 2 from SRI). The "cmu4" system is the CMU Sphinx-II system [6] processing the close-talking microphone data, the "cmu6" system is the CMU codeword-dependent-cepstral-normalization (CDCN) system [7] processing the close-talking data, and the "cmu3" system is the CMU CDCN system processing the Crown microphone data. The "sri3" system (processing the close-talking microphone data) and "sri4" system (processing the Crown microphone data) are versions of the SRI Decipher system incorporating the "RASTA" procedure for high-pass filtering of a log-spectral representation of speech [8].
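As a rough illustration of the RASTA idea, each log-spectral coefficient is filtered along the time (frame) axis with a band-pass filter that suppresses very slowly varying components, such as a fixed channel coloration. The sketch below uses the commonly published RASTA filter coefficients (an FIR ramp numerator and a single low-frequency pole); the exact filter and pole value used in the SRI Decipher system are not specified in this paper, so treat these constants as assumptions.

    # Rough sketch of RASTA-style filtering of log filter-bank trajectories.
    # Coefficients follow the commonly published RASTA filter; the pole
    # value (0.94 vs. 0.98) varies across published versions.
    import numpy as np
    from scipy.signal import lfilter

    def rasta_filter(log_spectra):
        """log_spectra: array of shape (num_frames, num_bands)."""
        b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR ramp numerator
        a = np.array([1.0, -0.94])                        # low-frequency pole
        # Filter each spectral band independently along the frame axis.
        return lfilter(b, a, log_spectra, axis=0)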
For the close-talking microphone data subset, the lowest word error rate (7.0%) is for the sri3 system, which may be compared with the cmu4 system (10.4%) and the cmu6 system (13.7%). According to the system description provided by CMU, the two CMU systems differ in the amount of training material, among other factors.
For the secondary microphone data subset, the word error rate for the cmu3 system is 17.8%, and for the sri4 system is 30.4%.
There are indications of substantial variabilities due to originating site for the secondary microphone data, with both the SRI and CMU secondary microphone data subsets giving rise to higher error rates than the MIT data subsets.
7.1.3 Statistical Significance: SPREC 
As in previous benchmark tests, two statistical significance tests are routinely implemented at NIST in analysis of speech recognition performance assessment tests. The utterance (sentence) error test is an application of McNemar's test, first suggested for use in this community by Gillick [9]. Another test consists of a MAtched-Pairs Sentence-Segment Word Error ("MAPSSWE") significance test, originally devised for use with the Resource Management corpora.
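For the sentence-level comparison, only the utterances on which the two systems disagree carry information: under the null hypothesis that the systems are equally good, each is equally likely to be the one that is correct on such an utterance. A minimal sketch of the exact binomial form of McNemar's test follows (one standard variant; NIST's implementation details may differ, and the example counts are hypothetical).

    # Minimal sketch of McNemar's test on utterance correctness for two
    # systems; exact two-sided binomial form. Example counts are hypothetical.
    from math import comb

    def mcnemar_p(only_a_correct, only_b_correct):
        """p-value from the discordant counts (utterances where exactly
        one of the two systems produced a correct transcription)."""
        n = only_a_correct + only_b_correct
        k = min(only_a_correct, only_b_correct)
        tail = sum(comb(n, i) for i in range(k + 1)) / 2.0 ** n
        return min(1.0, 2.0 * tail)

    # E.g., if system A alone is correct on 30 utterances and system B
    # alone on 55, the difference is significant at about the 1% level.
    print(mcnemar_p(30, 55))   # p ~ 0.009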
Analysis of the tabulation of the word error test results for the answerable query subset (Class A+D) shown in Table 4a indicates that for the BBN system [5], the word error rates are significantly different from (lower than) those for the other systems included in these tests. The sentence error McNemar test (Table 4b) indicates a similar result, but in this case, the sentence error rate for the Paramax SPREC system [4] does not differ significantly from that of the BBN system.
7.2 Natural Language (NL) Tests 
Table 5 presents a tabulation of the February 1992 ATIS Natural Language (NL) understanding test results. Results are presented for the set of all "answerable utterances", Class A+D, and for the individual Class A and Class D subsets. As was the case for the SPREC results, in general the error rates are higher for Class D than for Class A utterances.
For the set of answerable queries, Class A+D, the weighted error ranges from 30.1% to 75.4%. Note that five of the systems have weighted error percentages between 30.1% and 33.9%.
Table 6 presents a matrix tabulation for the NL test results for the set of answerable queries, Class A+D. There were a total of 687 queries in this set. The numbers tabulated for this set in Table 5 appear in the "Overall Totals" column, along with the corresponding weighted error percentages. The "Overall Totals" row indicates the variability due to the test subsets' originating site.
Of the 5 data subsets, the lower weighted error percentages in the "Overall Totals" row are to be found for the CMU and MIT data, with the SRI, AT&T, and BBN data giving rise to higher weighted error percentages.
Since the AT&T data was collected using a significantly 
different collection paradigm- with the subject interfac- 
ing with the ATIS system simulation only over a phone 
line, rather than viewing a screen display of travel infor- 
mation \[10\] - the fact that the AT&T data subset is more 
difficult than three other sites is perhaps not surprising. 
However, the BBN ATIS data collection effort also differed somewhat from that at other MADCOW sites in that, although information was presented using a screen display, the BBN scenarios "included not only trip planning scenarios, but also problem solving involving more general kinds of database access... This was done to try to elicit a richer range of language usage" [3]. This factor ("richer language usage") may provide a partial explanation for the high NL error rates noted for the BBN data subset.
For the CMU and MIT [11] systems, there appears to be some indication that the error percentages for "locally-collected" data are lower than for "foreign" data, perhaps because of greater familiarity with the local data-collection scenarios and environment, or use of a variant of the system under test when collecting the MADCOW data from which the test set was selected.
7.3 Spoken Language Systems (SLS) 
Tests 
Table 7 presents a tabulation of the February 1992 Spoken Language System understanding test results. As was the case for Table 5 (for the corresponding NL results), results are shown for several classes of the data, but emphasis in this material is placed on the answerable utterances, comprising Class A+D.
For the Class A+D set, the seven SLS systems have weighted error ranging from 43.7% to 90.2%. Note that four systems (from three sites: BBN, MIT and SRI [12]) have weighted error percentages between 43.7% and 52.8%.
Table 8 presents a matrix tabulation for the SLS test 
results for Class A+D, comparable in structure to that 
for the NL results of Table 6. 
Of the 5 data subsets corresponding to different collec- 
tion sites, the range in weighted error is from 49.5% (for 
the MIT data) to 73.1% (for the AT&T data). 
8 ACKNOWLEDGEMENT 
The authors would like to acknowledge the help provided by the entire MADCOW community throughout the ordeal of collecting, annotating, distributing and using the MADCOW corpus for test purposes. A companion paper in this Proceedings provides a detailed acknowledgement, but special credit was earned by Lynette Hirschman as Chair of the MADCOW Group. It is to everyone's credit that the essential data was collected, annotated, and distributed and that "deadlines" were usually honored!
Special thanks are also due to the group at MIT, particularly Michael Phillips and Christie Clark Winterton, for quick turn-around in producing recordable CD-ROM discs for distribution of the MADCOW training corpus from master tapes produced by Brett Tjaden at NIST.
The Annotation Group at SRI, consisting of Kate Hunicke-Smith, Harry Bratt and Beth Bryson, was invaluable to the NIST effort to implement these tests. They participated actively and cheerfully in annotation of the test material and the adjudication process, in addition to "training" one of the authors (ND) in the use of the NLParse software and annotation techniques.
Francis Kubala, at BBN, called NIST's attention to some problematic transcriptions for the SPREC tests. NIST reviewed and revised these as appropriate, and in the process noted 3 truncated utterances (in one subject-scenario collected at BBN). While the revised transcriptions were used in NIST's "revised official" scoring, NIST neglected to delete this subject-scenario from the NL and SLS tests, as specified by MADCOW protocols for handling data with truncated utterances. Analysis of performance on this particular subject-scenario indicates that most sites did well, nonetheless.
9 References 
1. Pallett, D.S., et al., "DARPA ATIS Test Results June 1990", in Proc. Speech and Natural Language Workshop, June 1990 (R. Stern, ed.), Morgan Kaufmann Publishers, Inc., ISBN 1-55860-157-0, pp. 114-121.

2. Pallett, D.S., "Session 2: DARPA Resource Management and ATIS Benchmark Test Poster Session", in Proc. Speech and Natural Language Workshop, February 1991 (P. Price, ed.), Morgan Kaufmann Publishers, Inc., ISBN 1-55860-207-0, pp. 49-58.

3. MADCOW, "Multi-Site Data Collection for a Spoken Language Corpus", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.

4. Norton, L.M., Dahl, D.A., and Linebarger, M.C., "Recent Improvements and Benchmark Results for the Paramax ATIS System", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.

5. Kubala, F., et al., "BBN BYBLOS and HARC February 1992 ATIS Benchmark Results", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.

6. Ward, W., et al., "Speech Understanding in Open Tasks", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.

7. Stern, R.M., et al., "Multiple Approaches to Robust Speech Recognition", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.

8. Murveit, H., Butzberger, J., and Weintraub, M., "Reduced Channel Dependence for Speech Recognition", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.

9. Gillick, L. and Cox, S.J., "Some Statistical Issues in the Comparison of Speech Recognition Algorithms", in Proc. ICASSP-89, Glasgow, May 1989, pp. 532-535.

10. Pieraccini, R., et al., "Progress Report on the Chronus System: ATIS Benchmark Results", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.

11. Zue, V., et al., "The MIT ATIS System: February 1992 Progress Report", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.

12. Appelt, D.E. and Jackson, E., "SRI International February 1992 ATIS Benchmark Test Results", in Proc. Speech and Natural Language Workshop, February 1992 (M. Marcus, ed.), Morgan Kaufmann Publishers, Inc.
10 APPENDIX: "OFFICIAL" VS. "UNOFFICIAL" RESULTS
Several sites expressed interest in having results for additional systems included in NIST's "official" summary, although these results typically were not available at the required time for "official" scoring. At least one site took exception to an idiosyncratic property of the "official" comparator's treatment of their system's responses to several queries, and requested permission to present "unofficial" results at the meeting. Another site noted that they had identified a "bug" in their CAS-answer-format software, and after it was fixed, they also requested permission to report unofficial results.
It was subsequently decided that the results submitted to NIST by the specified deadline, and uniformly scored at NIST with the "official" comparator and the adjudicated final set of reference answers, would comprise the only "official" results, and that locally scored results should be represented as "unofficial", even if scored with the same scoring software and answer set as the "official" results.
It should be noted that since the results are for locally implemented tests, and since NIST's role in the tests is principally one of selecting and distributing the test material, implementing the scoring software, and uniformly tabulating the results of the tests, the results are not to be construed or represented as endorsements of any systems or official findings on the part of NIST, DARPA or the U.S. Government.
(Corr, Sub, Del, Ins and Err are word percentages; U. Err is the utterance error percentage.)

Class A+D+X Subset
System        Corr  Sub   Del  Ins  Err   U. Err  # Utt  Description
att3-adx      85.6  10.5  3.9  3.1  17.5  64.6    970    ATT Feb 92 Sprec Results
bbn3-adx      92.5   5.7  1.8  1.8   9.4  40.3    971    BBN Feb 92 Sprec Results
cmu4-adx      88.2   9.7  2.1  4.4  16.2  60.2    971    CMU Feb 92 ATIS Sphinx-II Senn.
mit4-adx      84.1  11.5  4.4  2.3  18.1  59.6    971    MIT-LCS Feb 92 Sprec Results
paramax3-adx  91.5   6.3  2.1  2.1  10.6  42.2    971    Paramax/BBN Feb 92 Sprec Results
sri3-adx      91.4   6.8  1.8  2.4  11.0  48.7    971    SRI Feb 92 Sprec Results

Class A+D Subset
att3-a_d      88.9   7.7  3.4  2.7  13.8  60.1    687    ATT Feb 92 Sprec Results Class A+D
bbn3-a_d      95.2   3.6  1.1  1.5   6.2  34.6    687    BBN Feb 92 Sprec Results Class A+D
cmu4-a_d      91.9   6.5  1.6  3.7  11.8  54.4    687    CMU Feb 92 ATIS Sphinx-II Senn. Class A+D
mit4-a_d      88.3   8.7  3.1  1.9  13.6  54.1    687    MIT-LCS Feb 92 Sprec Results Class A+D
paramax3-a_d  94.6   4.0  1.4  1.7   7.1  36.4    687    Paramax/BBN Feb 92 Sprec Results Class A+D
sri3-a_d      93.8   4.9  1.4  2.1   8.4  44.5    687    SRI Feb 92 Sprec Results Class A+D

Class A Subset
att3-a        88.9   7.2  3.9  2.0  13.1  60.9    402    ATT Feb 92 Sprec Results Class A
bbn3-a        95.4   3.3  1.3  1.2   5.8  35.6    402    BBN Feb 92 Sprec Results Class A
cmu4-a        92.8   5.7  1.6  3.2  10.4  54.2    402    CMU Feb 92 ATIS Sphinx-II Senn. Class A
mit4-a        89.1   7.8  3.1  1.6  12.5  54.5    402    MIT-LCS Feb 92 Sprec Results Class A
paramax3-a    94.9   3.6  1.5  1.4   6.5  36.6    402    Paramax/BBN Feb 92 Sprec Results Class A
sri3-a        94.4   4.0  1.5  1.7   7.3  44.0    402    SRI Feb 92 Sprec Results Class A

Class D Subset
att3-d        89.0   8.7  2.3  4.1  15.2  58.9    285    ATT Feb 92 Sprec Results Class D
bbn3-d        94.9   4.2  0.8  1.9   7.0  33.3    285    BBN Feb 92 Sprec Results Class D
cmu4-d        90.3   8.2  1.5  4.8  14.5  54.7    285    CMU Feb 92 ATIS Sphinx-II Senn. Class D
mit4-d        86.7  10.3  3.0  2.3  15.7  53.7    285    MIT-LCS Feb 92 Sprec Results Class D
paramax3-d    94.1   4.7  1.1  2.2   8.1  36.1    285    Paramax/BBN Feb 92 Sprec Results Class D
sri3-d        92.5   6.4  1.1  2.8  10.3  45.3    285    SRI Feb 92 Sprec Results Class D

Class X Subset
att3-x        77.4  17.3  5.3  3.9  26.5  75.6    283    ATT Feb 92 Sprec Results Class X
bbn3-x        85.5  11.0  3.5  2.7  17.2  53.9    284    BBN Feb 92 Sprec Results Class X
cmu4-x        78.9  17.6  3.4  6.1  27.2  74.3    284    CMU Feb 92 ATIS Sphinx-II Senn. Class X
mit4-x        73.8  18.5  7.7  3.3  29.5  72.9    284    MIT-LCS Feb 92 Sprec Results Class X
paramax3-x    83.7  12.2  4.0  3.1  19.4  56.3    284    Paramax/BBN Feb 92 Sprec Results Class X
sri3-x        85.5  11.5  3.0  2.9  17.4  58.8    284    SRI Feb 92 Sprec Results Class X
Table 1: ATIS SPREC Test Results 
Class A+D Subset. Each cell: %Sub %Del %Ins (upper line) and %W.Err %Utt.Err (lower line).
Originating site of test data (utterances): ATT (114), BBN (151), CMU (137), MIT (152), SRI (133); Overall Totals (687).

System      ATT             BBN             CMU             MIT             SRI             Overall Totals  Foreign Coll. Site Totals
att3        13.0 3.2 4.9     7.9 2.7 0.5     7.3 6.8 2.0     4.4 1.3 2.0     8.7 2.9 4.0     7.7 3.4 2.7     6.7 3.4 2.3
            21.0 69.3       11.1 55.0       16.0 75.9        7.8 42.1       15.6 62.4       13.8 60.1       12.4 58.3
bbn3         6.3 1.1 3.0     3.3 0.8 1.2     3.2 1.6 1.0     1.8 0.7 0.8     4.5 1.4 1.7     3.6 1.1 1.5     3.7 1.2 1.5
            10.4 50.9        5.3 31.1        5.8 38.7        3.2 21.7        7.7 35.3        6.2 34.6        6.5 35.6
cmu4        10.2 1.4 7.3     4.8 1.5 1.5     7.0 2.1 5.2     3.9 1.1 1.2     7.9 1.7 4.4     6.5 1.6 3.7     6.4 1.4 3.3
            18.9 68.4        7.9 47.0       14.3 69.3        6.3 44.1       14.0 47.4       11.8 54.4       11.1 50.7
mit4         7.9 2.5 3.1     7.6 3.6 1.5     9.9 3.9 1.7     5.9 1.8 0.9    13.1 3.7 2.6     8.7 3.1 1.9     9.5 3.5 2.2
            13.4 51.8       12.7 57.6       15.5 61.3        8.6 46.1       19.4 64.1       13.6 54.1       15.1 56.4
paramax3     6.8 1.5 2.8     3.1 0.7 1.2     3.4 2.3 1.2     2.5 0.7 1.1     5.2 1.9 2.6     4.0 1.4 1.7     4.0 1.4 1.7
            11.0 48.2        5.0 28.5        6.9 41.6        4.3 26.3        9.7 41.4        7.1 36.4        7.1 36.4
sri3         8.1 1.3 3.6     3.2 1.2 2.5     5.4 1.9 2.7     3.2 1.3 1.0     5.2 1.1 2.2     4.9 1.4 2.1     4.8 1.4 2.1
            13.1 57.0        6.9 35.8       10.0 56.2        5.5 40.8        8.6 36.1        8.4 44.5        8.3 46.6
Overall      8.7 1.8 4.1     4.8 1.8 1.4     6.0 3.1 2.3     3.6 1.1 1.2     7.4 2.1 2.9
Totals      14.6 57.6        8.0 42.5       11.4 57.2        5.9 30.5       12.5 46.1
Foreign      7.9 1.5 4.0     5.1 1.9 1.4     5.8 3.3 1.7     3.2 1.0 1.2     7.9 2.3 3.1
System      13.4 55.3        8.5 44.8       10.8 54.7        5.4 35.0       13.3 48.1
Table 2: ATIS SPREC Results Class A+D by Collection Site 
Class A+D Subset. Each cell: %Sub %Del %Ins (upper line) and %W.Err %Utt.Err (lower line).
Originating site of test data (utterances): CMU (101), MIT (152), SRI (79); Overall Totals (332).

System              CMU              MIT              SRI              Overall Totals   Foreign Coll. Site Totals
cmu3 (Crown)        14.9 3.4 6.9      8.3 2.9 1.3     11.1 3.9 4.6     10.9 3.3 3.7      9.1 3.2 2.2
                    25.2 84.2        12.4 61.8        19.6 54.4        17.8 66.9        14.5 59.3
cmu4 (Sennheiser)    7.6 1.4 6.6      3.9 1.1 1.2      6.4 0.8 5.5      5.6 1.1 3.7      4.7 1.1 2.4
                    15.5 70.3         6.3 44.1        12.7 43.0        10.4 51.8         8.1 43.7
cmu6 (Sennheiser)    9.1 1.4 11.7     5.0 1.3 2.7      6.0 1.1 5.3      6.4 1.3 6.0      5.3 1.2 3.4
                    22.1 80.2         8.9 53.3        12.5 50.6        13.7 60.8         9.9 52.4
sri3 (Sennheiser)    5.2 1.5 2.7      3.2 1.3 1.0      3.8 1.1 2.2      4.0 1.3 1.8      4.0 1.3 1.6
                     9.4 52.5         5.5 40.8         7.1 34.2         7.0 42.8         7.0 45.5
sri4 (Crown)        26.0 2.7 17.4    14.1 3.4 3.3     17.1 4.1 8.4     18.4 3.3 8.7     18.7 3.2 8.7
                    46.2 93.1        20.8 78.9        29.6 77.2        30.4 82.8        30.6 84.6
Overall             12.6 2.1 9.1      6.9 2.0 1.9      8.9 2.2 5.2
Totals              23.7 76.0        10.8 55.8        16.3 51.9
Foreign             15.6 2.1 10.1     6.9 2.0 1.9      7.8 2.0 5.1
System              27.8 72.8        10.8 55.8        14.9 49.4

Table 3: ATIS SPREC Test Results (Crown Microphone and the Crown Subset of the Sennheiser Data) by Collection Site
(4a) COMPARISON MATRIX: MATCHED-PAIRS (MAPSSWE) TEST
Feb 92 ATIS SPREC Class A+D Results
Minimum number of correct boundary words: 2
Each cell names the system with the significantly lower word error, or "same".

            bbn3-a_d   cmu4-a_d   mit4-a_d   paramax3   sri3-a_d
att3-a_d    bbn3-a_d   cmu4-a_d   same       paramax3   sri3-a_d
bbn3-a_d               bbn3-a_d   bbn3-a_d   bbn3-a_d   bbn3-a_d
cmu4-a_d                          cmu4-a_d   paramax3   sri3-a_d
mit4-a_d                                     paramax3   sri3-a_d
paramax3                                                paramax3

(4b) COMPARISON MATRIX: McNEMAR'S TEST ON CORRECT SENTENCES
Feb 92 ATIS SPREC Class A+D Results, for all systems
(Number of correct utterances in parentheses; D = difference in number correct;
each cell names the significantly better system, or "same".)

               bbn3-a_d(449)  cmu4-a_d(313)  mit4-a_d(315)  paramax3(437)  sri3-a_d(381)
att3-a_d(274)  D=(175)        D=(39)         D=(41)         D=(163)        D=(107)
               bbn3-a_d       cmu4-a_d       mit4-a_d       paramax3       sri3-a_d
bbn3-a_d(449)                 D=(136)        D=(134)        D=(12)         D=(68)
                              bbn3-a_d       bbn3-a_d       same           bbn3-a_d
cmu4-a_d(313)                                D=(2)          D=(124)        D=(68)
                                             same           paramax3       sri3-a_d
mit4-a_d(315)                                               D=(122)        D=(66)
                                                            paramax3       sri3-a_d
paramax3(437)                                                              D=(56)
                                                                           paramax3
Table 4: ATIS SPREC Significance Test Comparisons: Class A+D 
(#T = correct ("true") answers, #F = false answers, #NA = "No_Answer" responses; W. Err is the weighted error percentage.)

Class A+D
system       # T   # F   # NA  # Utt  W. Err  Description
att1         378   209   100   687    75.4    ATT Feb92 ATIS
bbn1         527    73    87   687    33.9    BBN Feb92 ATIS
cmu1         582   102     3   687    30.1    CMU-Phoenix Feb92 ATIS
cmu8         560   101    26   687    33.2    CMU-MINDS-II Feb92 ATIS
mit2         551    87    49   687    32.5    MIT Feb92 ATIS
paramax1     311   122   254   687    72.5    PARAMAX Feb92 ATIS
sri1         533    60    94   687    31.1    SRI Feb92 ATIS

Class A
att1-a       256    96    50   402    60.2    ATT Feb92 ATIS Class A NL
bbn1-a       322    26    54   402    26.4    BBN Feb92 ATIS Class A NL
cmu1-a       356    46     0   402    22.9    CMU-Phoenix Feb92 ATIS Class A NL
cmu8-a       346    46    10   402    25.4    CMU-MINDS-II Feb92 ATIS Class A NL
mit2-a       342    34    26   402    23.4    MIT Feb92 ATIS Class A NL
paramax1-a   223    50   129   402    57.0    PARAMAX Feb92 ATIS Class A NL
sri1-a       335    25    42   402    22.9    SRI Feb92 ATIS Class A NL

Class D
att1-d       122   113    50   285    96.8    ATT Feb92 ATIS Class D NL
bbn1-d       205    47    33   285    44.6    BBN Feb92 ATIS Class D NL
cmu1-d       226    56     3   285    40.4    CMU-Phoenix Feb92 ATIS Class D NL
cmu8-d       214    55    16   285    44.2    CMU-MINDS-II Feb92 ATIS Class D NL
mit2-d       209    53    23   285    45.3    MIT Feb92 ATIS Class D NL
paramax1-d    88    72   125   285    94.4    PARAMAX Feb92 ATIS Class D NL
sri1-d       198    35    52   285    42.8    SRI Feb92 ATIS Class D NL
Table 5: Feb 92 ATIS NL Test Results - Using 
Minimal/Maximal Scoring Criterion 
Class (A+D) Set. Each cell: #T #F #NA (upper line) and Weighted Error % (lower line).
Originating site of test data (utterances): ATT (114), BBN (151), CMU (137), MIT (152), SRI (133); Overall Totals (687).

System      ATT            BBN            CMU            MIT            SRI            Overall Totals  Foreign Coll. Site Totals
att1        60 39 15       69 36 46       80 46 11       98 43 11       71 45 17       378 209 100     318 170 85
            81.5           78.1           75.2           63.8           80.5           75.4            74.2
bbn1        84 13 17       120 19 12      98 10 29       130 11 11      95 20 18       527 73 87       407 54 75
            37.7           33.1           35.8           21.7           43.6           33.9            34.1
cmu1        99 14 1        110 41 0       125 10 2       134 18 0       114 19 0       582 102 3       457 92 1
            25.4           54.3           16.1           23.7           28.6           30.1            33.6
cmu8        89 13 12       107 41 3       122 10 5       131 19 2       111 18 4       560 101 26      438 91 21
            33.3           56.3           18.2           26.3           30.1           33.2            36.9
mit2        83 19 12       114 26 11      111 14 12      137 10 5       106 18 9       551 87 49       414 77 44
            43.9           41.7           29.2           16.4           33.8           32.5            37.0
paramax1    36 22 56       58 27 66       89 18 30       71 26 55       57 29 47       311 122 254     311 122 254
            87.7           79.5           48.2           70.4           78.9           72.5            72.5
sri1        76 13 25       116 11 24      119 10 8       129 16 7       93 10 30       533 60 94       440 50 64
            44.7           30.5           20.4           25.7           37.6           31.1            29.6
Overall     527 133 138    694 201 162    744 118 97     830 143 91     647 159 125
Totals      50.6           53.4           34.7           35.4           47.6
Foreign     467 94 123     574 182 150    497 98 90      693 133 86     554 149 95
System      45.5           56.7           41.8           38.6           49.2
Table 6: ATIS NL Test Results - Using 
Minimal/Maximal Scoring Criterion 
(#T = correct ("true") answers, #F = false answers, #NA = "No_Answer" responses; W. Err is the weighted error percentage.)

Class A+D
system       # T   # F   # NA  # Utt  W. Err  Description
att2         300   233   154   687    90.2    ATT Feb 92 ATIS SLS
bbn2         493   106    88   687    43.7    BBN Feb 92 ATIS SLS
cmu2         458   226     3   687    66.2    CMU Feb 92 ATIS SLS
mit1         471   132    84   687    50.7    MIT/SRI Feb 92 ATIS SLS
mit3         419    95   173   687    52.8    MIT Feb 92 ATIS SLS
paramax2     302   148   237   687    77.6    PARAMAX/BBN Feb 92 ATIS SLS
sri2         444    69   174   687    45.4    SRI Feb 92 ATIS SLS

Class A
att2-a       208   118    76   402    77.6    ATT Feb 92 ATIS SLS Class A
bbn2-a       301    43    58   402    35.8    BBN Feb 92 ATIS SLS Class A
cmu2-a       298   104     0   402    51.7    CMU Feb 92 ATIS SLS Class A
mit1-a       305    58    39   402    38.6    MIT/SRI Feb 92 ATIS SLS Class A
mit3-a       288    47    67   402    40.0    MIT Feb 92 ATIS SLS Class A
paramax2-a   215    70   117   402    63.9    PARAMAX/BBN Feb 92 ATIS SLS Class A
sri2-a       305    32    65   402    32.1    SRI Feb 92 ATIS SLS Class A

Class D
att2-d        92   115    78   285   108.1    ATT Feb 92 ATIS SLS Class D
bbn2-d       192    63    30   285    54.7    BBN Feb 92 ATIS SLS Class D
cmu2-d       160   122     3   285    86.7    CMU Feb 92 ATIS SLS Class D
mit1-d       166    74    45   285    67.7    MIT/SRI Feb 92 ATIS SLS Class D
mit3-d       131    48   106   285    70.9    MIT Feb 92 ATIS SLS Class D
paramax2-d    87    78   120   285    96.8    PARAMAX/BBN Feb 92 ATIS SLS Class D
sri2-d       139    37   109   285    64.2    SRI Feb 92 ATIS SLS Class D
Table 7: ATIS SLS Test Results - Using 
Minimal/Maximal Scoring Criterion 
Class (A+D) Set. Each cell: #T #F #NA (upper line) and Weighted Error % (lower line).
Originating site of test data (utterances): ATT (114), BBN (151), CMU (137), MIT (152), SRI (133); Overall Totals (687).

System      ATT            BBN            CMU            MIT            SRI            Overall Totals  Foreign Coll. Site Totals
att2        36 50 28       61 41 49       69 42 26       84 48 20       50 52 31       300 233 154     264 183 126
            112.3          86.8           80.3           76.3           101.5          90.2            85.9
bbn2        72 23 19       113 21 17      95 13 29       122 18 12      91 31 11       493 106 88      380 85 71
            57.0           39.1           40.1           31.6           54.9           43.7            45.0
cmu2        73 38 3        82 69 0        98 39 0        113 39 0       92 41 0        458 226 3       360 187 3
            69.3           91.4           56.9           51.3           61.7           66.2            68.5
mit1        72 28 14       103 30 18      94 19 24       121 22 9       81 33 19       471 132 84      350 110 75
            61.4           51.7           45.3           34.9           63.9           50.7            55.1
mit3        68 21 25       81 15 55       88 19 30       110 14 28      72 26 35       419 95 173      309 81 145
            58.8           56.3           49.6           36.8           65.4           52.8            57.4
paramax2    36 23 55       52 33 66       87 27 23       74 30 48       53 35 45       302 148 237     302 148 237
            88.6           87.4           56.2           71.1           86.5           77.6            77.6
sri2        55 12 47       101 13 37      93 13 31       112 20 20      83 11 39       444 69 174      361 58 135
            62.3           41.7           41.6           39.5           45.9           45.4            45.3
Overall     412 195 191    593 222 242    624 172 163    736 191 137    522 229 180
Totals      72.8           64.9           52.9           48.8           68.5
Foreign     376 145 163    480 201 225    526 133 163    505 155 100    439 218 141
System      66.2           69.2           52.2           53.9           72.3
Table 8: ATIS SLS Test Results - Using 
Minimal/Maximal Scoring Criterion 
