Evaluating the Use of Prosodic Information 
in Speech Recognition and Understanding 
Mari Ostendor.f Patti Price 
Boston University 
Boston, MA 02215 
PROJECT GOALS 
The goal of this project is to investigate the use of dif- 
ferent levels of prosodic information in speech recognition 
and understanding. In particular, the current focus of the 
work is the use of prosodic phrase boundary information in 
parsing. The research involves determining a representation 
of prosodic information suitable for use in a speech under- 
standing system, developing reliable algorithms for detec- 
tion of the prosodic cues in speech, investigating architec- 
tures for integrating prosodic cues in a parser, and evaluat- 
ing the potential improvements of prosody in the context of 
the SRI Spoken Language System. This research is spon- 
sored jointly by DARPA and NSF. 
RECENT RESULTS 
• Investigated duration lengthening at different levels of 
prosodic breaks over different types of units (e.g., final 
syllable, interstress interval), finding that the primary 
region of lengthening is the phrase-final, word-final 
syllable rhyme. 
• Implemented a binary tree quantizer in our HMM 
prosodic phrase boundary detection system, which en- 
abled the use of multiple cues determined from the 
above analysis. A new algorithm for adapting model 
parameters has also been implemented. The result- 
ing algorithm has speaker-independent break recog- 
nition performance above the level of our previous 
speaker-dependent system. A paper describing this 
work (Wightman and Ostendorf) appears in the Pro- 
ceedings of the International Conference on Acoustics, 
Speech and Signal Processing. 
• Submitted for publication an article describing a per- 
ceptual, phonological and phonetic analysis of the re- 
lationship between prosodic structure and syntactic 
structure. A shortened version of this paper (Price 
et al.) appears in this volume. (This work was also 
funded by a related grant on prosody, NSF grant num- 
ber IRI-8805680.) 
• Developed an initial version of an analysis-by-synthesis 
approach for scoring a syntactic parse in terms of the 
SRI International 
Menlo Park, CA 94025 
observed prosodic constituents. This algorithm lever- 
ages recent results on the role of prosody in disam- 
biguation, mentioned above. We assessed this method 
by using the algorithm to decide between pairs of am- 
biguous sentences, finding that the algorithm perfor- 
mance is close to human perception. A paper describ- 
ing this work (Wightman et al.) appears in this pro- 
ceedings. 
PLANS FOR THE COMING YEAR 
• Further improve the break index algorithms by adding 
an intonation feature to the quantizer. We expect 
this to improve performance considerably because the 
main source of errors is confusion between break in- 
dices 3 (no boundary tone) and 4 (marked with bound- 
ary tone). 
• Evaluate the break index detection algorithms on para- 
graphs of speech (as opposed to sentences) and on 
spontaneous speech as opposed to read speech. 
• Implement the parse scoring algorithm using automat- 
ically obtained parses and evaluate on additional data. 
• Utilize the parse scoring algorithm in speech under- 
standing. 
408 
