<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0608"> <Title>A practical Message-to-Speech strategy for dialogue systems</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Many Natural Language Generation (NLG) systems that produce flexible output, i.e. sentences that vary at the syntactic and morphological levels, aim only at the production of written text and do not deal with spoken language. As a result, they leave the important topic of generating natural prosody untouched (see e.g. (Elhadad, 1992; Reiter et al., 1995; Dalianis, 1996; Somers et al., 1997)). On the other hand, message generating systems that provide speech of a natural quality (e.g. announcement systems, phone banking and voice mail applications) often combine fixed pieces of pre-recorded speech. These text and message generating systems are either resource intensive (powerful CPU, large storage and memory capacity, ...) or provide only limited flexibility, which seriously hampers their integration in a dialogue system.</Paragraph> <Paragraph position="1"> The Message-to-Speech (MTS) system described below is specifically designed to function in an environment with severely constrained computational resources, where it is impossible to store large amounts of pre-recorded speech. In this context, Text-to-Speech (TTS) is an obvious alternative. However, for dialogue systems using a predefined set of message types, special purpose prosody models can yield a prosodic quality superior to that of TTS systems, which apply general purpose prosody models for unrestricted text (see also (Hovy, 1995, p.161)). Our prosody transplantation tool (see section 2) exploits this idea: for the fixed parts of a message, it allows the prosody generated by general models, as used in TTS, to be overruled by specific prosody copied from natural speech. 
Prosody by general model is used only for those parts of the message where flexibility is needed. The MTS system combines transplanted prosody with prosody by model in order to cope with partly variable messages while still preserving natural prosody (Van Coile et al., 1995).</Paragraph> <Paragraph position="2"> Details on the MTS system are provided in section 3. The system consists of two components: the MTS generation module and the MTS prosodic integration module. The former (see section 3.1) is template driven (canned &quot;text&quot; interspersed with slots). For a discussion of template driven systems see (van Deemter et al., 1994; van Deemter and Odijk, 1997; Reiter, 1995). The templates account for the flexibility of the messages, including their linguistic variation. The latter (see section 3.2) takes care of assimilation and of the prosodic integration of the slot values with the rest of the template. A discussion concludes this paper (see section 4).</Paragraph> </Section> </Paper>