<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2160">
  <Title>THE PARSODY SYSTEM : AUTOMATIC PREDICTION OF PROSODIC BOUNDARIES FOR TEXT-TO-SPEECH</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Modern text-to-speech (TTS) systems are quite good at word-level synthesis, but tend to perform badly on connected word sequences. It has been suggested that the poor prosody of synthetic connected speech is the primary factor behind difficulties in comprehension [1,5]. TTS systems must therefore incorporate better mechanisms for prosodic processing. For the purposes of this article, prosodic processing is interpreted narrowly as the prediction of the location and relative strength (salience) of prosodic boundaries, although there are, of course, several other important aspects to prosody. A prosodic boundary is a point in a spoken utterance associated with important acoustic prosodic phenomena, such as pauses and pitch changes.</Paragraph>
    <Paragraph position="1"> There are two main approaches to the prosodic marking problem: the rule-based approach and the stochastic-based approach.</Paragraph>
    <Paragraph position="2"> The rule-based approach stems from Gee and Grosjean's work on performance structures, that is, &quot;structures based on experimental data, such as pausing and parsing values&quot; [7]. This work has been the basis of many extensions, such as those reported by Bachenko and Fitzpatrick [2,4]. Gee and Grosjean sought to account for the disparity that then existed between linguistic phrase-structure theories and the performance structures actually produced by human speakers, and focused on recreating, from syntax, the pause data of several analysed sentences (although they claim that their method could easily account for other prosodic features). The central tenet of their work was that prosodic phrasing is a compromise between the need to respect the linguistic structure of the sentence and the demands of performance, which manifest themselves as a need to balance the lengths of the output constituents.</Paragraph>
    <Paragraph position="3"> More recent efforts have extended the Gee and Grosjean approach in various ways. Bachenko and Fitzpatrick take a similar rule-based approach, but hold that syntax plays a lesser role in determining phrasing and that certain prosodic performance constraints, such as length, override syntactic structure. They allow prosodic boundaries to cross syntactic boundaries under certain conditions, whereas Gee and Grosjean viewed their rules as acting within basic sentence clauses.</Paragraph>
    <Paragraph position="4"> Bachenko and Fitzpatrick made several other changes to the basic Gee and Grosjean algorithm, including counting phonological words rather than orthographic words when determining node strengths. Wightman et al. [12] have proposed some further interesting extensions to the Bachenko and Fitzpatrick method.</Paragraph>
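The compromise between syntactic structure and constituent length, and Bachenko and Fitzpatrick's change of length unit, can be illustrated with a toy scoring function. This is a hypothetical sketch, not either published algorithm. `best_split`, `phonological_length`, `FUNCTION_WORDS`, the numeric syntactic strengths and the linear penalty `length_weight` are all assumptions. Because this section does not define a phonological word, the sketch assumes it is one content word together with the function words that lean on it. The function chooses the single break that best trades syntactic strength against the length imbalance of the two resulting constituents.

```python
from collections.abc import Callable

# Hypothetical illustration only, NOT the published Gee and Grosjean or
# Bachenko and Fitzpatrick rules. Assumptions: every juncture between
# adjacent words has a numeric syntactic strength, and a "phonological word"
# is one content word plus any function words leaning on it.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "on", "at", "for", "with", "by", "and", "or"}


def phonological_length(words: list[str]) -> int:
    """Count phonological words: one per content word, or one for a span of only function words."""
    if not words:
        return 0
    content = sum(1 for w in words if w.lower() not in FUNCTION_WORDS)
    return max(content, 1)


def best_split(
    words: list[str],
    strengths: list[float],
    length_weight: float = 1.0,
    length_fn: Callable[[list[str]], int] = len,
) -> int:
    """Return i such that the single phrase break falls after words[i].

    strengths[i] is the assumed syntactic strength of the juncture between
    words[i] and words[i + 1]; the score trades that strength against the
    length imbalance of the two resulting constituents.
    """
    if len(strengths) != len(words) - 1:
        raise ValueError("need exactly one strength per juncture")
    best_index, best_score = -1, float("-inf")
    for i, strength in enumerate(strengths):
        left, right = words[: i + 1], words[i + 1 :]
        imbalance = abs(length_fn(left) - length_fn(right))
        score = strength - length_weight * imbalance  # syntax vs. balance
        if score > best_score:  # ties keep the earlier juncture
            best_index, best_score = i, score
    return best_index
```

Take "John gave the book to Mary" with assumed strengths [3, 1, 0, 2, 0], where the subject/predicate juncture is strongest. `best_split(words, strengths, 0.75)` returns 3 (John gave the book | to Mary), because counting orthographic words the subject/predicate break would leave one word against five. With `length_fn=phonological_length` the predicate shrinks to three units, and the same call returns 0 (John | gave the book to Mary). The choice of length unit alone can therefore move the predicted break.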
    <Paragraph position="5"> With the growing availability of large, accurately labelled, prosodically annotated corpora, the stochastic-based approach will come increasingly to the fore. Wang and Hirschberg [11] and Ostendorf et al. [9] have both described methods for automatically predicting prosodic information using decision-tree models. Generally, decision trees are derived by associating a probability with each potential boundary site in the text and relating various features to each site (e.g. utterance and phrase duration, utterance length in syllables or words, and position relative to the start or end of the nearest boundary location) [11].</Paragraph>
    <Paragraph position="6"> The resulting decision tree provides, in effect, an algorithm for predicting prosodic boundaries in new input text.</Paragraph>
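The derivation can be sketched as a small CART-style learner. This is a minimal illustration under stated assumptions, not the method of Wang and Hirschberg or Ostendorf et al. The names `juncture_features`, `Node`, `grow` and `predict_prob` are hypothetical. The features are text-only stand-ins (utterance length in words, distance in words from the utterance start and end, and an assumed punctuation flag), because duration features require recorded speech. Splits are chosen by Gini impurity with a fixed depth limit. Each leaf stores the proportion of training junctures that carried a boundary, which serves as the probability associated with a boundary site.

```python
from dataclasses import dataclass
from typing import Optional

# Minimal CART-style sketch, NOT the published decision-tree systems.
# Assumptions: text-only features, Gini impurity, binary threshold splits,
# and a fixed depth limit. Each leaf stores the proportion of training
# junctures that carried a boundary, i.e. an estimated boundary probability.


def juncture_features(n_words: int, i: int, punct_after: bool) -> list[float]:
    """Hypothetical features for the juncture after word i (0-based) of an n-word utterance."""
    return [
        float(n_words),               # utterance length in words
        float(i + 1),                 # words since the utterance start
        float(n_words - i - 1),       # words remaining to the utterance end
        1.0 if punct_after else 0.0,  # punctuation at the juncture (assumed feature)
    ]


@dataclass
class Node:
    prob: float                    # P(boundary) among training junctures at this node
    feature: Optional[int] = None  # split feature index; None marks a leaf
    threshold: float = 0.0         # go left when threshold >= x[feature]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: list[int]) -> float:
    """Gini impurity of binary labels (1 = boundary, 0 = no boundary)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def grow(X: list[list[float]], y: list[int], depth: int = 3, min_size: int = 2) -> Node:
    """Recursively grow a tree, choosing the split with the lowest weighted Gini impurity."""
    node = Node(prob=sum(y) / len(y))
    if depth == 0 or min_size > len(y) or node.prob in (0.0, 1.0):
        return node  # depth limit reached, too few examples, or already pure
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if t >= row[f]]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or best[0] > cost:
                best = (cost, f, t)
    if best is None or best[0] >= gini(y):
        return node  # no split reduces impurity
    _, f, t = best
    go_left = [t >= row[f] for row in X]
    node.feature, node.threshold = f, t
    node.left = grow([r for r, g in zip(X, go_left) if g],
                     [lab for lab, g in zip(y, go_left) if g], depth - 1, min_size)
    node.right = grow([r for r, g in zip(X, go_left) if not g],
                      [lab for lab, g in zip(y, go_left) if not g], depth - 1, min_size)
    return node


def predict_prob(node: Node, x: list[float]) -> float:
    """Follow the splits for feature vector x and return the leaf's boundary probability."""
    while node.feature is not None:
        node = node.left if node.threshold >= x[node.feature] else node.right
    return node.prob
```

Suppose the tree is trained on a handful of invented, hand-labelled junctures in which every boundary coincides with punctuation. In that case `grow` learns a single split on the punctuation flag, and `predict_prob` then returns 1.0 for a punctuated juncture and 0.0 otherwise. Duration features would need time-aligned speech and are therefore omitted here.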
    <Paragraph position="7"> Interestingly, Ostendorf et al. [9] report similar results in their evaluations of the rule-based and decision-tree algorithms.</Paragraph>
  </Section>
</Paper>