Evaluating the Use of Prosodic Information 
in Speech Recognition and Understanding 
Marl Ostendorf Patti Price 
Boston University SRI International 
Boston, MA 02215 Menlo Park, CA 94025 
PROJECT GOALS 
The goal of this project is to investigate the use of differ- 
ent levels of prosodic information in speech recognition 
and understanding. There are two thrusts in the current 
work: use of prosodic information in parsing and detec- 
tion/correction of disfluencies. The research involves de- 
tern'fining a representation of prosodic information suit- 
able for use in a speech understanding system, devel- 
oping reliable algorithms for detection of the prosodic 
cues in speech, investigating architectures for integrat- 
ing prosodic cues in a speech understanding system, and 
evaluating the potential performance improvements pos- 
sible through the use of prosodic information in a spo- 
ken language system (SLS). This research is sponsored 
jointly by DARPA and NSF, NSF grant no. IRI-8905249, 
and in part by a DARPA SLS grant to SKI. 
RECENT RESULTS 
• Evaluated the break index and prominence recogni- 
tion algorithms on a larger corpus, with paragraphs 
(as opposed to sentences) of radio announcer speech. 
• Extended the prosody-parse scoring algorithm to 
use a more integrated probabilistic scoring criterion 
and to include prominence information, making use 
of tree-based recognition and prediction models. 
• Collaborated with a multi-site group for develop- 
ment of a core, standard prosody transcription 
method: TOBI, (TOnes and Break Indices), and 
labeled over 800 utterances from the ATIS cor- 
pus with prosodic break and prominence informa- 
tion. Analyses of consistency between labelers 
shows good agreement for the break and prominence 
labels on ATIS. 
• Ported prosody-parse scoring algorithms to ATIS, 
which required: developing new features for the 
acoustic and prosody/syntax models and represent- 
ing new classes of breaks to represent hesitation; 
currently evaluating the algorithm for reranking the 
N-best sentence hypotheses in the MIT and SKI 
SLS systems. (This work was made possible by re- 
searchers at MIT and SRI who provided the parses 
and recognition outputs needed for training and 
evaluating the prosody models.) 
• Developed a new approach to duration modeling 
in speech recognition, involving context-conditioned 
parametric duration distributions and increased 
weighting on duration. 
• Developed tools for analysis of large number of re- 
pairs and other disfluencies; analyzed the prosody 
of filled pauses in ATIS data and extended the work 
on disfluencies to data in the Switchboard corpus of 
conversational speech. 
• Developed methods for automatic detection and 
correction of repairs in ATIS corpus, based on in- 
tegrating information from text pattern-matching, 
syntactic and semantic parsing. 
PLANS FOR THE COMING YEAR 
• Evaluate the break index and prominence recogni- 
tion algorithms on spontaneous speech, specifically 
the ATIS corpus, and further refine algorithms to 
improve performance in this domain. 
• Improve the parse scoring algorithm performance 
in the ATIS domain by exploring new syntactic fea- 
tures, and asses performance on SKI vs. MIT SLS 
systems. 
• Investigate alternative approaches to integrating 
prosody in speech understanding. 
• Continue study of acoustic and grammatical cues to 
repairs and other spontaneous speech effects. 
• Based on the results of the acoustic analyses, de- 
velop automatic detection algorithms for flagging 
repairs that are missed by the syntactic pattern 
matching algorithms and develop algorithms for 
classifying detected repairs to aid in determining the 
amount of traceback in the repair. 
388 
