<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1207">
<Title>Semantic and Discourse Information for Text-to-Speech Intonation</Title>
<Section position="4" start_page="0" end_page="48" type="intro">
<SectionTitle> 2 Semantic and Discourse Effects on Intonation </SectionTitle>
<Paragraph position="0"> The effects of &quot;givenness&quot; on the accentability of lexical items have been examined in some detail and have led to the development of intonation algorithms for both text-to-speech (Hirschberg, 1990; Hirschberg, 1993; Monaghan, 1991; Terken and Hirschberg, 1994) and concept-to-speech systems (Monaghan, 1994).</Paragraph>
<Paragraph position="1"> While the strategy of accenting open-class items on first mention often produces appropriate and natural-sounding intonation in synthesized speech, such algorithms fail to account for certain accentual patterns that occur with some regularity in natural speech, such as items accented to mark an explicit contrast among the salient discourse entities. In addition, the given/new distinction alone does not seem to account for the variation among accent types found in natural speech.[2] Unfortunately, such issues have been difficult to resolve in text-to-speech systems, because little semantic and discourse-level information is readily available without sophisticated text-understanding algorithms and robust knowledge representations.</Paragraph>
<Paragraph position="2"> Previous concept-to-speech (CTS) work (Prevost, 1995; Prevost, 1996; Prevost and Steedman, 1994) showed that both contrastive accentual patterns and limited pitch accent variation could be modeled in a spoken language generation system. The present work incorporates these results in a text-to-speech (TTS) system, using a similar representation for discourse context (i.e. information structure) and replacing the domain-specific knowledge base with WordNet.</Paragraph>
<Paragraph position="3"> [2] Of course, the granularity of the given/new distinction may be at issue here. The relationship of accent types to the given/new taxonomy proposed by Prince (1981) may warrant more exploration in a computational framework.</Paragraph>
<Paragraph position="4"> We represent local discourse context using a two-tiered information structure framework. In the higher tier, propositions are divided into theme and rheme. The theme represents what the proposition is about and provides the contextual link to prior utterances. The rheme provides the core contribution of the proposition to the discourse: the material the listener is unlikely to predict from context. In the simplest case, where an utterance conveys a single proposition, the division into theme and rheme is often straightforward, as shown in the question/answer pair in Figure 1.</Paragraph>
<Paragraph position="5"> Steedman (1991) and Prevost and Steedman (1994) argue that, for the class of utterances exemplified in Figure 1, the rheme is often realized as an intonational phrase carrying the H* L-L% tune (or an intermediate phrase carrying H* L-), while the theme, when it bears any marked intonational features, often carries the L+H* L-H% tune (or L+H* L- for an intermediate phrase).</Paragraph>
<Paragraph position="6"> While this mapping of information structure constituents onto intonational tunes is certainly an oversimplification, it has been quite useful in previous concept-to-speech work. We are currently using the Boston University radio news corpus (Ostendorf, Price, and Shattuck-Hufnagel, 1995) to compile statistics to support our use of this mapping.[3] Preliminary results show that H* is the most prevalent accent, occurring more than fifty percent of the time. !H* and L+H* occur less frequently than H*, but more frequently than any of the other possible accents.
We take the prevalence of H* and L+H* in the corpus to support our decision to focus on these accent types.</Paragraph>
<Paragraph position="7"> Given the mapping of tunes onto thematic and rhematic phrases, one must still determine which items within those phrases are to be accented. We consider such items to be in theme- or rheme-focus, the secondary tier of our information structure framework. Focus assignment is based on both givenness and contrastiveness. [Figure 1, partially recovered: Q: I know the SMART programmer wrote the SPEEDY algorithm, ...]</Paragraph>
<Paragraph position="8"> For the current TTS task, we consider items to be in focus on first mention and whenever WordNet finds a contrasting item in the current discourse segment. The algorithm for determining the contrast sets is described in Section 4. The adaptation of an information structure approach to the TTS task highlights a number of important issues. First, while it may be convenient to think of the division into theme and rheme in terms of utterances, it may be more appropriate to consider the division in terms of propositions. Complex utterances may contain several clauses conveying several propositions, and consequently more than one theme/rheme segmentation. Our program annotates thematic and rhematic stretches of text by first trying to locate propositional constituents, as described in Section 4.</Paragraph>
<Paragraph position="9"> Another information structure issue brought to light by the TTS task is that themes may not consist solely of background material, but may also include inferable items, as shown in example (1). In this example, &quot;name&quot; is certainly not part of the shared background between the speaker and the listener. However, since it is common knowledge that pets have names, it serves as a coherent thematic link to the previous utterance.
[4]
(1) Miss Smith has a collie.</Paragraph>
<Paragraph position="10"> The dog's NAME is LASSIE.</Paragraph>
<Paragraph position="11"> Tunes: L+H* L- on NAME, H* L-L% on LASSIE.
[4] WordNet can capture some inferences, but is unable to account for a complex relationship like this one.</Paragraph>
</Section>
</Paper>
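<!--
A minimal Python sketch of the focus and accent rules described in this section, offered as an illustration under stated assumptions rather than the authors' program. An item is treated as in focus on first mention, or when a different item of the same contrast class has already occurred in the current discourse segment; focused items in a theme receive L+H* and focused items in a rheme receive H*, both followed by the L- phrase accent (the intermediate-phrase tunes L+H* L- and H* L-). The contrast table TOY_CONTRAST_CLASSES is a hypothetical stand-in for the WordNet lookup, whose details are not given here; the sketch checks contrast only against earlier items, and the names assign_focus and assign_accents are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for WordNet: words sharing a class contrast with
# each other. This table is an assumption for illustration only.
TOY_CONTRAST_CLASSES: Dict[str, str] = {
    "smart": "intelligence",
    "stupid": "intelligence",
    "speedy": "speed",
    "slow": "speed",
}


def assign_focus(segment: List[str],
                 contrast_classes: Dict[str, str]) -> List[bool]:
    """Return True for each word in focus, False otherwise.

    A word is in focus on its first mention in the segment, or when a
    different word of the same contrast class occurred earlier in it.
    """
    seen: List[str] = []
    focus: List[bool] = []
    for word in segment:
        first_mention = word not in seen
        cls = contrast_classes.get(word)
        # Contrast: some earlier, different word belongs to the same class.
        contrasts = cls is not None and any(
            other != word and contrast_classes.get(other) == cls
            for other in seen
        )
        focus.append(first_mention or contrasts)
        seen.append(word)
    return focus


def assign_accents(words: List[str], focus: List[bool],
                   role: str) -> Tuple[List[Tuple[str, Optional[str]]], str]:
    """Pair each word with a pitch accent (or None) and return the phrase accent.

    Theme phrases use L+H* on focused words, rheme phrases use H*;
    both end in the L- phrase accent. Boundary tones are not modeled.
    """
    if role not in ("theme", "rheme"):
        raise ValueError(f"unknown role: {role}")
    if len(words) != len(focus):
        raise ValueError("words and focus must have the same length")
    accent = "L+H*" if role == "theme" else "H*"
    pairs = [(w, accent if f else None) for w, f in zip(words, focus)]
    return pairs, "L-"


segment = ["smart", "programmer", "stupid", "programmer", "stupid"]
print(assign_focus(segment, TOY_CONTRAST_CLASSES))
# [True, True, True, False, True]
print(assign_accents(["slow", "algorithm"], [True, False], "rheme"))
# ([('slow', 'H*'), ('algorithm', None)], 'L-')
```

With the toy table, the second mention of "programmer" is not in focus because it is given and has no contrasting item, while the repeated "stupid" is refocused because "smart" occurred earlier in the segment. A real system would replace the table with a lexical resource lookup and would also need the theme/rheme segmentation, which this sketch takes as given.
-->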