Discovering the Sounds of Discourse Structure* 
Extended Abstract 
Barbara J. Grosz 
Division of Engineering and Applied Sciences 
Harvard University 
33 Oxford Street 
Cambridge, MA 02138 USA 
grosz@eecs.harvard.edu
It is widely accepted that discourses are com- 
posed of segments and that the recognition of 
segment boundaries is essential to a determina- 
tion of discourse meaning (Grosz and Sidner, 
1986). Written language has orthographic cues 
such as section headings, paragraph boundaries, 
and punctuation which can assist in identifying 
discourse structure. In spoken language, into- 
national variation provides essential information 
about discourse structure. For instance, it may
be used to mark structural features of discourse 
at the global level, such as segment boundaries. 
Intonation also provides more local information 
about relations among utterances within a seg- 
ment, for example indicating whether phrases are 
parenthetical. It can also help distinguish between 
different interpretations of phrases that can func- 
tion either as cue phrases that indicate discourse 
segment boundaries or sententially to convey domain
information. Finally, variations in intonational
prominence may be used to convey information
about the discourse status of entities referred
to by definite noun phrases and pronouns. 
An understanding of intonational variation and 
the ways in which it carries information about discourse
characteristics of spoken language is important
for computer-based interpretation and generation
of speech. From the interpretation perspective,
this understanding may provide new techniques
for identifying discourse structure. From
the generation perspective, it would lead to more 
natural synthetic speech, making it possible to 
produce computer speech that is easier for people
to understand and less susceptible to misinterpre- 
tation. 
Three major challenges have faced researchers
attempting to discover the relationship between
intonational features and the structure of spoken
discourse. First, the collection of corpora of
spontaneous speech has required the development of
new experimental methodologies. Whereas it is
straightforward to have the same text read by
many speakers, it is much more difficult to obtain
similar samples of spontaneous speech from
multiple speakers. Second, techniques must be
developed to obtain reliable segmentations and
labelings of the corpora. Because discourse structure
is rooted in semantics rather than syntax, this
has proved more difficult than tagging corpora for
sentence structure. Third, measures of agreement
among segmentations must be designed. In this
area too, the semantic nature of discourse structure
leads to a more complex problem than comparing
sentence parse structures.

* The research described in this presentation was
supported by the National Science Foundation, Grant
IRI 94-04756. The research has been done collaboratively
with Julia Hirschberg and Christine Nakatani.
David Ahn provided invaluable technical assistance.

This talk will begin with a summary of pi- 
lot studies that demonstrated reliable correla- 
tions of discourse structure and intonational fea- 
tures (Grosz and Hirschberg, 1992; Hirschberg 
and Grosz, 1992; Hirschberg and Grosz, 1994). 
It will then focus on a new corpus of direction-giving
monologues, the Boston Directions Corpus
(Nakatani et al., 1995a; Hirschberg and Nakatani, 
1996). I will describe the methodology we devel- 
oped to elicit fluent spontaneous direction-giving 
monologues ranging over a spectrum of planning 
complexity. Next I will describe the development 
of annotation instructions used to train labelers 
to segment spoken discourses (Nakatani et al., 
1995b) and will discuss agreement among segmen- 
tations on the Boston Directions Corpus obtained 
using these instructions. Then I will describe re- 
sults of our analyses of the correlation between 
discourse structure and intonational features. Finally,
I will present a list of challenges for future
research in this area. 
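
As a rough illustration of the third challenge named above, measuring agreement among segmentations, the sketch below computes Cohen's kappa over boundary decisions made by two labelers. This is not the measure used in the work summarized here, which the abstract does not specify; it is one common baseline, shown under simplifying assumptions. It treats a segmentation as a flat set of boundary positions between consecutive units (for example, phrases) and ignores the hierarchical structure of discourse segments. The names boundary_vector and cohen_kappa, and the example data, are hypothetical.

```python
from typing import List, Sequence, Set


def boundary_vector(boundaries: Set[int], n_units: int) -> List[int]:
    """Encode a flat segmentation as 0/1 decisions, one per potential boundary site.

    Site i is the gap after unit i, so n_units units give n_units - 1 sites.
    """
    return [1 if i in boundaries else 0 for i in range(n_units - 1)]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two labelers' binary boundary decisions."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Proportion of sites on which the two labelers made the same decision.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each labelers' own boundary rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both labelers made the same constant decision everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: a 10-unit monologue segmented by two labelers.
labeler_1 = boundary_vector({2, 5, 7}, n_units=10)
labeler_2 = boundary_vector({2, 5, 8}, n_units=10)
kappa = cohen_kappa(labeler_1, labeler_2)
print(round(kappa, 3))  # 0.5
```

A measure of this kind penalizes a boundary placed one unit away as heavily as a boundary that is missing entirely, and it cannot express agreement on how segments are embedded in one another. That is part of why comparing segmentations is a harder problem than comparing sentence parse structures.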
References 
Barbara Grosz and Julia Hirschberg. 1992. Some 
intonational characteristics of discourse structure.
In John Ohala et al., editors, Proceedings
of the 1992 International Conference on Spoken 
Language Processing (ICSLP-92), pages 429-432,
Edmonton, Canada. Personal Publishing
Ltd. 
Barbara Grosz and Candace Sidner. 1986. Atten- 
tion, intentions, and the structure of discourse. 
Computational Linguistics, 12(3):175-204.
Julia Hirschberg and Barbara Grosz. 1992. Into- 
national features of local and global discourse 
structure. In Proceedings of the Speech and 
Natural Language Workshop, pages 441-446. 
Defense Advanced Research Projects Agency, 
February. 
Julia Hirschberg and Barbara Grosz. 1994. Into- 
nation and discourse structure in spontaneous 
and read direction-giving. In Proceedings of 
the International Symposium on Prosody, pages 
103-109. Japan Society for the Promotion of
Science. 
Julia Hirschberg and Christine H. Nakatani. 1996. 
A prosodic analysis of discourse segments in 
direction-giving monologues. In Proceedings of 
the Annual Meeting of the Association for
Computational Linguistics.
Christine Nakatani, Julia Hirschberg, and Bar- 
bara Grosz. 1995a. Discourse structure in spo- 
ken language: Studies on speech corpora. In 
Working Notes of the AAAI-95 Spring Symposium
on Empirical Methods in Discourse Interpretation,
pages 106-112, Menlo Park, CA.
American Association for Artificial Intelligence. 
Christine H. Nakatani, Barbara J. Grosz, 
David D. Ahn, and Julia Hirschberg. 1995b. 
Instructions for annotating discourse. Techni- 
cal Report TR-21-95, Harvard University. 
