Segment-Based Acoustic Models with Multi-level Search 
Algorithms for Continuous Speech Recognition 
Marl Oatendov/ J. Robin Rohlicel¢ 
Boston University 
Boston, MA 02215 
PROJECT GOALS 
The goal of this project is to develop improved acoustic 
models for speaker-independent recognition of continuous 
speech, together with efficient search algorithms appropri- 
ate for use with these models. The current work on acoustic 
modelling is focussed on stochastic, segment-based models 
that capture the time correlation of a sequence of observa- 
tions (feature vectors) that correspond to a phoneme. Since 
the use of segment models is computationally complex, we 
are investigating multi-level, iterative algorithms to achieve 
a more efficient search. Furthermore, these algorithms will 
provide a formalism for incorporating higher-order infor- 
mation. This research is jointly sponsored by DARPA and 
NSF. 
RECENT RESULTS 
• Demonstrated improved phoneme classification per- 
formance (23% reduction of error) with left-context- 
dependent, gender-dependent segment models. 
• Developed a new segment model for recognition based 
on a dynamical system representation of time correla- 
tion. Since classical approaches to system identifica- 
tion (parameter estimation) are too costly, an alterna- 
tive algorithm was developed based on the Estimate- 
Maximize algorithm. Phoneme classification results 
show a significant improvement over previous approaches 
to time correlation modeling. 
• Jointly with BBN, developed a methodology for in- 
tegrating recognition systems based on rescoring the 
N-best sentence hypotheses and combining scores to 
rerank the sentences. Demonstrated improved word 
recognition performance by combining the BBN Byb- 
los system results with the BU segment model rescor- 
ing. Implemented a segment model rescoring system 
using context-dependent models which gives encour- 
aging initial results. 
• Applied Bayesian techniques for speaker-adaptation 
to the stochastic segment model. Adapting distribu- 
tions means alone, with 30 sentences, yields a 16% re- 
BBN Inc. 
Cambridge, MA 02138 
duction in phoneme recogntion error relative to speaker- 
independent performance. 
• Investigated iterative search algorithms for word recog- 
nition. 
PLANS FOR THE COMING YEAR 
• Extend dynamical system model results by consid- 
ering different correlation structures and confirming 
that classification performance extends to recognition 
applications. 
• Investigate the use of context-dependent models in 
segment-based word recognition in the N-best rescor- 
ing formalism. 
• Investigate parallel algorithms for segment recognition 
implemented on the Connection Machine. 
• Extend algorithms for speaker adaptation which in- 
clude variance adaptation. 
• Continued investigation of iterative and multi-level al- 
gorithms including: incorporation of global constraints 
and features, iterative search for word recognition, 
and progressive application of phoneme context, word 
and grammar constraints. 
409 
