<?xml version="1.0" standalone="yes"?>
<Paper uid="J94-1002">
  <Title>A Hierarchical Stochastic Model for Automatic Prediction of Prosodic Boundary Location</Title>
  <Section position="3" start_page="29" end_page="30" type="relat">
    <SectionTitle>
2. Previous Work
</SectionTitle>
    <Paragraph position="0"> Initial attempts to incorporate prosody in speech synthesis determined intonation and duration patterns as a function of syntactic phrase structure (Allen, Hunnicutt, Carlson, and Granstrom 1979; Allen, Hunnicutt, and Klatt 1987), an approach that requires syntactic parsing. More recently, researchers have attempted to address the fact that prosodic and syntactic structure do not correspond directly, by explicitly predicting prosodic phrase boundaries rather than using syntactic clause boundaries. An important difference among these later approaches is the amount of syntactic information used to predict prosodic boundaries. The algorithms reflect different assumptions about the relationship between prosody and syntax, as well as different levels of computational complexity: greater use of syntactic information requires more computation, because a more detailed syntactic parse must be found.</Paragraph>
    <Paragraph position="1"> One approach is based on the idea that a prosodic parse may require neither a full syntactic parse nor detailed part-of-speech information (e.g., noun, verb, determiner). Sorin, Larreur, and Llorca (1987) proposed a simple prosodic parser for French that uses content/function word classification to determine prosodic constituents, referred to as prosodic groups. The length and relative location of these prosodic groups are then used to determine phrase break locations, which are marked with a pause. Our earlier work drew on this scheme to predict phrase boundaries in English: a Markov model was developed to predict phrase breaks by representing the sequence of prosodic groups and breaks as a Markov chain (Veilleux, Ostendorf, Price, and Shattuck-Hufnagel 1990). An advantage of these approaches is that they require only a small dictionary of function words to assign part-of-speech labels. Motivated by similar principles and using only a 300-word dictionary, O'Shaughnessy (1989) proposes a somewhat more sophisticated parser for English based on function word identification, number agreement, and suffix identification. O'Shaughnessy's work differs from the other approaches in that his goal is a syntactic parse, albeit an incomplete one, and he does not address the differences between prosody and syntax.</Paragraph>
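As a rough illustration of how little lexical knowledge the function-word approach needs, the Python sketch below groups words into prosodic groups using only a small function-word list and then estimates a first-order Markov chain over a sequence of group symbols and break labels. Everything specific here is an assumption for illustration: the tiny FUNCTION_WORDS list, the grouping rule (start a new group whenever a function word follows a content word), the two length classes "S" and "L", the toy training sequences, and the helper names. It is not the exact procedure of Sorin, Larreur, and Llorca (1987) or of Veilleux et al. (1990).

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny function-word list; a practical system
# would use a larger dictionary.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "at",
                  "and", "or", "is", "was", "with", "for"}


def is_function(word):
    """Content/function classification by dictionary lookup alone."""
    return word.lower() in FUNCTION_WORDS


def prosodic_groups(words):
    """Assumed grouping rule: start a new group whenever a function word
    follows a content word, so each group is function words + content words."""
    groups, current = [], []
    for word in words:
        if current and is_function(word) and not is_function(current[-1]):
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return groups


def length_class(group):
    """Map a group to a coarse symbol: 'S' (1-2 words) or 'L' (3+ words)."""
    return "L" if len(group) >= 3 else "S"


def estimate_transitions(sequences):
    """Maximum-likelihood first-order Markov chain: P(next symbol | previous)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for prev, c in counts.items()}


def predict_breaks(groups, trans):
    """Greedy decision at each group boundary: 'B' (break) or 'N' (no break),
    whichever is more probable given the preceding group's symbol.
    Ties (including unseen symbols) default to 'N'."""
    decisions = []
    for group in groups[:-1]:
        probs = trans.get(length_class(group), {})
        decisions.append(max(("N", "B"), key=lambda s: probs.get(s, 0.0)))
    return decisions


# Toy, made-up training sequences: group length classes interleaved with
# boundary labels ('B' = break, 'N' = no break).
train = [["L", "N", "L", "B", "L"], ["S", "N", "L", "B", "S"], ["L", "B", "L"]]
trans = estimate_transitions(train)
groups = prosodic_groups("the cat sat on the mat with a hat".split())
print([" ".join(g) for g in groups])  # ['the cat sat', 'on the mat', 'with a hat']
print(predict_breaks(groups, trans))  # ['B', 'B']
```

The per-boundary decision here looks only at the preceding symbol; a fuller Markov model would score the whole group/break sequence jointly rather than deciding each boundary greedily.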
    <Paragraph position="2"> At the other end of the spectrum are approaches based on the hypothesis that prosodic phrase boundaries can be predicted by rule from a full syntactic parse. Gee and Grosjean (1983) developed a rule-based system, called the Phi Algorithm, to predict psycholinguistic "performance structures," in which each pair of adjacent words is assigned an integer index of boundary salience. Constituent length information is incorporated primarily through their verb balancing rule, which splits the verb phrase and groups the verb with either the preceding or the following material, subject to syntactic constraints. Gee and Grosjean developed the Phi Algorithm only to predict performance structures. However, Bachenko and Fitzpatrick (1990) extended their work to prosodic phrase prediction for speech synthesis, explicitly finding prosodic phrase breaks from the derived boundary salience indices. They relax many of the constraints on the use of the verb rule and propose a Verb Adjacency Rule, so their algorithm requires a fairly detailed parse, although not a complete one. One of the relaxed constraints obviates the need for clause information. Altenberg (1987) has also proposed an algorithm for predicting phrase boundary locations (specifically, tone unit boundaries in British English) by rule from syntactic structure and semantic information. However, the detailed information the algorithm requires cannot currently be acquired automatically from text.</Paragraph>
    M. Ostendorf and N. Veilleux Hierarchical Stochastic Model for Automatic Prediction
    <Paragraph position="3"> Departing from these approaches, Wang and Hirschberg (1992) have recently used binary decision trees to predict the presence or absence of a prosodic break at each word boundary in a sentence. They consider a range of input variables, including text-derived information such as detailed part-of-speech (POS) labels and syntactic constituent structure and, in some experiments, acoustic information. POS labels were assigned by Church's tagger (Church 1988) and syntactic constituents by Hindle's parser (Hindle 1987). The acoustic information (previous boundary location, pitch accent location, and phrase duration), which was based on hand-labeled prosodic markers, did not improve performance but did yield a much smaller prediction tree.</Paragraph>
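The core step in growing such a tree is choosing the yes/no question about a word boundary that best separates break from no-break examples. The Python sketch below grows only a single node (a decision stump), scoring candidate questions of the form "feature == value" by size-weighted Gini impurity, which is one common splitting criterion. The feature names (left, right, punct), the toy labeled boundaries, and the helper functions are hypothetical; they do not reproduce Wang and Hirschberg's input variables, data, or tree-growing procedure.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of binary labels (True = break, False = no break)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Majority label; ties and empty lists default to no break."""
    counts = Counter(labels)
    return counts[True] > counts[False]


def partition(samples, labels, question):
    """Split labels by the yes/no question 'feature == value'."""
    feature, value = question
    yes = [y for s, y in zip(samples, labels) if s.get(feature) == value]
    no = [y for s, y in zip(samples, labels) if s.get(feature) != value]
    return yes, no


def split_score(samples, labels, question):
    """Size-weighted Gini impurity of the two children of a split."""
    yes, no = partition(samples, labels, question)
    return (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)


def fit_stump(samples, labels):
    """Grow a one-question tree: choose the question with the lowest
    weighted impurity (first in sorted order on ties) and label each
    leaf with its majority class."""
    questions = sorted({(f, v) for s in samples for f, v in s.items()})
    best = min(questions, key=lambda q: split_score(samples, labels, q))
    yes, no = partition(samples, labels, best)
    return {"question": best, "if_yes": majority(yes), "if_no": majority(no)}


def predict(stump, sample):
    """True if a break is predicted at this word boundary."""
    feature, value = stump["question"]
    return stump["if_yes"] if sample.get(feature) == value else stump["if_no"]


# Toy, made-up word-boundary features: class of the word on each side of
# the boundary and whether punctuation follows the left word.
boundaries = [
    {"left": "content", "right": "function", "punct": "yes"},
    {"left": "content", "right": "function", "punct": "no"},
    {"left": "function", "right": "content", "punct": "no"},
    {"left": "content", "right": "content", "punct": "yes"},
    {"left": "content", "right": "function", "punct": "no"},
    {"left": "function", "right": "content", "punct": "no"},
]
breaks = [True, False, False, True, False, False]

stump = fit_stump(boundaries, breaks)
print(stump["question"])  # ('punct', 'no')
print(predict(stump, {"left": "function", "right": "content", "punct": "yes"}))  # True
```

A full tree would apply the same question search recursively to each side of the split until a stopping criterion is met.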
    <Paragraph position="4"> All of these approaches have influenced the model proposed here. For example, we investigate simple content/function word POS assignment, as in Sorin, Larreur, and Llorca (1987). Like Wang and Hirschberg (1992), we use decision trees to determine automatically the important factors influencing phrase break location. In addition, all of the above works have influenced the choice of factors and questions incorporated in the decision tree. Our approach differs in two important respects: it uses a stochastic model to capture variability, and it explicitly represents a linguistically motivated hierarchy. Of course, whether it is effective and/or efficient for a computational model to reflect a linguistic hierarchy is an empirical question.</Paragraph>
  </Section>
</Paper>