Overview of the 
1994 ARPA Human Language Technology Workshop 
Clifford J. Weinstein, Chair, Editor 
MIT Lincoln Laboratory 
Lexington, MA 02173 
1. INTRODUCTION 
This volume presents papers, session summaries, 
and project summaries from the Second ARPA 
Human Language Technology Workshop, which 
was held at the Merrill Lynch Conference Center 
in Plainsboro, NJ, March 8-11, 1994. The 
Workshop was actually the seventh in a series of 
ARPA workshops which began in 1988; the first 
five were called the DARPA Speech and Natural 
Language Workshops, and the name was changed 
to the ARPA Human Language Technology 
(HLT) Workshop in 1993 to reflect the broadened 
focus and increasing unification of ARPA- 
sponsored research in spoken and written 
language. In addition, the "D" was dropped from 
DARPA in 1993, reflecting a broadening of its 
charter. The Proceedings of the seven 
Workshops, all of which have been published by 
Morgan Kaufmann, represent a rich source of 
information on the rapid progress in spoken and 
written language technology which has been 
achieved over the past decade, due in no small 
measure to the leadership of the ARPA Program 
Managers. 
2. THE 1994 ARPA HUMAN 
LANGUAGE TECHNOLOGY 
WORKSHOP 
As with past workshops, the 1994 HLT 
Workshop provided a forum where researchers 
were able to share information about very recent 
technical progress in a highly interactive setting. 
The scope included all areas of spoken and 
written language research under ARPA's HLT 
program, including speech recognition, speech 
understanding, text understanding, information 
retrieval, and machine translation, with an 
emphasis on topics of particular current interest 
such as evaluation of language understanding 
systems, and statistical and learning methods. 
The majority of the workshop participants 
receive funding under ARPA's HLT program. 
Other participants included: government 
researchers and users of the technology; 
researchers not funded by ARPA who participate 
voluntarily in these programs; and selected 
visitors from both inside and outside the United 
States. Non-U.S. participation was particularly 
strong in 1994, with 30 attendees representing 9 
countries; many of these non-U.S. attendees 
participated directly and voluntarily in the various 
formal evaluations of human language systems. 
In all, there were 230 attendees at the 1994 
Workshop, consisting of approximately 150 
from ARPA sites, 30 U.S. government 
representatives, 30 non-U.S. participants, and 20 
non-ARPA attendees from the U.S. 
For the first time, this HLT Workshop directly 
followed an ARPA Spoken Language 
Technology (SLT) Workshop which was held at 
the same location on March 6-8. This allowed 
the detailed reporting and discussion of the latest 
evaluations of speech recognition and spoken 
language systems to be held at the SLT 
Workshop, leaving time for broader coverage of 
technical topics at the HLT Workshop. It also 
facilitated the attendance at both Workshops of 
the non-U.S. participants in these evaluations. 
The SLT Workshop was chaired by Richard Stern 
of Carnegie Mellon University and attracted 
approximately 125 attendees. A separate SLT 
Workshop Proceedings is being published by 
Morgan Kaufmann. 
3. HUMAN LANGUAGE 
TECHNOLOGY PROGRAM 
OVERVIEW 
George Doddington, ARPA Program Manager for 
Human Language Technology, began the 
Workshop with an overview of the program's 
motivation, mission, theme areas, 
accomplishments, and future directions. 
Doddington emphasized the key role of Human 
Language Technology in providing people with 
the ability to effectively use the National 
Information Infrastructure. He stated a three-part 
Human Language Technology Program Mission, 
to: 
• develop Human Language Technologies of 
key importance; 
• demonstrate Human Language Technologies 
in compelling application contexts; and 
• transfer Human Language Technologies into 
productive use. 
Doddington emphasized the dual roles of 
technology R&D and technology transfer in 
serving this mission. In the ARPA HLT 
Program, R&D progress is driven by 
establishing formal technical challenge tasks in 
the theme areas including speech recognition, 
speech understanding, document retrieval, 
information extraction from text, and machine 
translation, and by providing infrastructure 
support including corpus development and 
regular, formal evaluations of the technology. 
Technology transfer is driven by identifying 
critical needs and technology transfer champions, 
and supporting the transfer with focussed R&D. 
Doddington summarized and highlighted recent 
progress in: spoken language understanding in 
the Air Travel Information System (ATIS) task; 
large-vocabulary continuous speech recognition; 
document detection; and machine translation. 
With regard to new technical directions, he 
highlighted the current investigation of a task- 
independent evaluation of understanding, referred 
to as Semantic Evaluation (SemEval), which 
constituted a major topic of discussion later in 
the Workshop. 
4. HIGHLIGHTS OF THE 
WORKSHOP SESSIONS 
The Workshop comprised 14 sessions, 
including a government panel session and a 
demonstration session. For a good overview of 
the technical content of the workshop, the reader 
is encouraged to first read the Session Chairs' 
summaries which precede the collected papers 
from each of the sessions. These summaries 
provide perspective on the research reported as 
well as outlining the key points in each set of 
papers. 
A few of the highlights of the Workshop 
included: 
• The impressive progress reported by the 
Linguistic Data Consortium (three papers in 
Session 1) in collecting and disseminating 
lexicons, text resources, and speech corpora 
which are supporting the advancement of 
Human Language Technology worldwide. 
• The new hub and spoke paradigm for large- 
vocabulary continuous speech recognition 
(CSR) evaluation, which has successfully 
balanced common evaluation and diverse 
research goals. 
• The Human Language Evaluation Session, 
which included many diverse views on the 
evaluation of language understanding and a 
spirited discussion of the Semantic 
Evaluation (SemEval) approaches currently 
being explored in the ARPA HLT 
community. 
• A strong Machine Translation Session, 
including a report on a very substantial 
recent evaluation which included 19 
participants; the fact that all but three of 
these participants were volunteers not 
supported by ARPA is indicative of ARPA's 
world leadership role in this area. 
• A Demonstration Session, organized and 
chaired by Victor Abrash, which included 
demonstrations of HLT for command and 
control data access, spoken language 
translation, access to data sources on the 
information highway, text retrieval and 
understanding, and reading education. 
• A double-length session on Statistical and 
Learning Methods, highlighting the 
continuing progress in corpus-based 
approaches to text understanding. 
• A Government Panel, organized and chaired 
by Oscar Garcia, which included both U.S. 
and international views on the directions of 
Human Language Technology. 
• A New Directions Session, highlighted by a 
presentation by a Canadian astronaut and 
speech researcher, Julie Payette, on 
applications of speech recognition in space, 
and also including two papers describing 
ways in which speech recognition 
technology has been applied to the automatic 
recognition of handwritten text. 
5. ACKNOWLEDGMENTS 
The success of this Workshop was due to the 
hard work of many people. A special thanks 
goes to James Glass (MIT-LCS), who served as 
Workshop Vice-Chair and contributed greatly to 
all aspects of the Workshop organization. The 
Workshop Planning Committee was responsible 
for reviewing and selecting papers and 
demonstrations for presentation, shaping the 
overall Workshop program, and setting 
Workshop policies. Members of that committee 
were: James Allen (U. Rochester), Madeleine 
Bates (BBN), Michael Cohen (SRI), Oscar Garcia 
(NSF), Ralph Grishman (NYU), Donna Harman 
(NIST), Lynette Hirschman (MITRE), Eduard 
Hovy (ISI), Paul Jacobs (GE), Mitchell Marcus 
(U. Penn), Mari Ostendorf (BU), Richard 
Schwartz (BBN), Richard Stern (CMU), the Vice-Chair, 
James Glass (MIT-LCS), and myself. 
Six of those people (Bates, Garcia, Grishman, 
Marcus, Ostendorf, and Weinstein) also served on 
the Workshop Standing Committee, chaired this 
year by Mitch Marcus, which is responsible for 
overall organization and continuity of the series 
of ARPA workshops. 
A special acknowledgment also goes to Richard 
Stern, who chaired the SLT Workshop which 
preceded this one, and helped make the 
coordination go very smoothly. 
A great deal of credit and very special thanks 
go to Linda Nessman (MIT Lincoln 
Laboratory) who served as Workshop 
Administrator, which means everything from 
e-mail archivist to registrar. She contributed to all 
aspects of Workshop organization. She handled 
registration, finance, paper abstracts, and the 
varied requests and problems of the many 
attendees. She prepared the notebooks of 
preliminary papers that were handed out at the 
Workshop, and collected all the final papers and 
summaries and assembled this volume in photo- 
ready form. 
Victoria Palay (MIT-LCS), a past Workshop 
Administrator, provided very valuable help and 
guidance in many of the administrative aspects of 
the Workshop, and particularly in setting up the 
various databases. 
Victor Abrash (SRI) deserves particular thanks 
for taking complete charge of the demonstration 
session and handling the many difficult details 
involved in arranging and coordinating a large 
number of live demos. 
I would like to acknowledge all the Session 
Chairs, who did an excellent job of keeping the 
sessions on track and contributed the excellent 
session summaries contained in these 
Proceedings. The chairs were: George Miller 
(Princeton), Xuedong Huang (Microsoft), Lynette 
Hirschman (MITRE), Eduard Hovy (ISI), Paul 
Jacobs (GE), Madeleine Bates (BBN), Victor 
Abrash (SRI), Frederick Jelinek (Johns Hopkins), 
Oscar Garcia (NSF), Steven Young (Cambridge 
U.), Donna Harman (NIST), Richard Schwartz 
(BBN), and Richard Stern (CMU). 
As Human Language Technology Program 
Manager at ARPA, George Doddington provided 
overall direction and encouragement to the 
Workshop planners, and was particularly helpful 
to me as Workshop Chair. His enthusiasm for 
the program and his devotion to technical 
excellence were crucial in making this Workshop 
a resounding success. 
