SPOKEN-LANGUAGE RESEARCH AT CARNEGIE MELLON 
Raj Reddy, Principal Investigator 
School of Computer Science 
Carnegie Mellon University 
Pittsburgh, Pennsylvania 15213 
PROJECT GOALS 
The goal of speech research at Carnegie Mellon continues to 
be the development of spoken language systems that effec- 
tively integrate speech processing into the human-computer 
interface in a way that facilitates the use of computers in the 
performance of practical tasks. Research in spoken lan- 
guage is currently focussed in the following areas: 
• Improved speech recognition accuracy: Improving 
SPHINX-II by better acoustic and language models and 
more efficient search algorithms. Obtaining task-indepen- 
dent acoustic models and senone mapping tables using 
large speech corpora. Developing task-adaptation tech- 
niques forboth acoustic and language models. 
° Fluent human/machine interfaces: Developing tools that 
provide fluent communication between users and comput- 
ers by voice and understanding the role of voice in the com- 
puter interface. 
• Robust language modeling: Developing techniques for 
combining multiple language training sources to create 
improved language models. Applying new knowledge 
sources to strengthen existing language models. 
• Understanding spoken language: Developing flexible 
recognition and parsing strategies to cope with spoken lan- 
guage. Developing and evaluating methods of integrating 
speech recognition and natural language understanding. 
Developing automatic training procedures for grammars. 
• Acoustical and environmental robustness: Extending 
environment-adaptation procedures to telephone channels 
and other more difficult environments. 
RECENT RESULTS 
• A task-independent very-large vocabulary version of 
SPHINX-II has been developed from 37,000 WSJ utter- 
antes. WSJ acoustic models have been adapted to ATIS by 
use of pruning and interpolation. Recognition accuracy has 
been further improved by the use of phone-dependent code- 
books and multiple speaker dusters. 
• A one-pass trigram langnge model has been implemented 
that reduces search time by using a lexical tree. 
• An adaptive long.distance language model was created that 
reduces pe~lexity by 32%-39% and word error rate by 
14%-16% over the conventional Izigram. 
451 
• User interaction for the ATIS system has been improved by 
incorporating clarification and mixed-initiative dialogs, 
speech output, and form-based displays. 
• The Recursive Transition Network grammars used in tho 
PHOENIX system have been combined with bigram lan- 
guage models in the A* portion of the recognition searfh of 
SPHINX-II. This reduces sentence understanding errors. 
• Error rates for telephone speech and other secondary envi- 
ronments have been reduced by 40 percent by implement- 
ing reduced-bandwidth analysis, phone-dopendent cepstral 
normalization, and codebook-adaptation procedures. 
• A phonetically-lranseribed 100,000-word dictionary has 
been completed and released to the public. 
• SPHINX-II has been adapted for use in CMU's Project 
LISTEN, a prototype reading coach funded by NSF. 
PLANS FOR THE COMING YEAR 
• Develop a 100,000-word implementation of SPHINX-II 
that runs in real time and uses less than 50 megabytes. 
• Add multiple dynamic language models and other forms of 
incremental learning to SPHINX-II. 
• Investigate the use of fully-continuous HMMs to take full 
advantage of the large amount of available training data. 
• Investigate technologies to reduce recognition errors 
caused by OOV words. 
• Develop speech coding algorithms f~ real time implemen- 
tation and use compressed speech signal in the speech rec- 
ognition system. 
• Develop techniques for combining multiple language mod- 
els from different sources, and develop new knowledge 
sources to s~engthen existing language models. 
• Implement a speech.only interface for a workstation envi- 
ronment, supporting both command and dictation capabili- 
ties. 
• Continue development and evaluation of systems in which 
both stochastic and grammar-based language model infor- 
marion are used to improve the speech recognition search. 
• Improve the representation of context information and its 
relation to the parser in our Spoken Language System 
• Continue to develop methods for automatic acqni~ition of 
Natural Language information used by an SLS system. 
• Further extend and improve environmental robustness. 
• Port spoken-language technologies to a new secretarial- 
assistant task. 
