<?xml version="1.0" standalone="yes"?>
<Paper uid="P86-1025">
  <Title>JAPANESE PROSODIC PHRASING AND INTONATION SYNTHESIS</Title>
  <Section position="7" start_page="177" end_page="178" type="ackno">
    <SectionTitle>
ACKNOWLEDGEMENTS
</SectionTitle>
    <Paragraph position="0"> Ken Church, Julia Hirschberg, and Mitch Marcus gave useful comments on earlier drafts of this paper.</Paragraph>
    <Paragraph position="1"> APPENDIX: GLOSSARY catathesis. A sudden compression of pitch range that is triggered by a particular tonal configuration, and that lowers all tones following the trigger within some phrasal unit. In Japanese, catathesis is triggered by every accent, and in English, by every bitonal pitch accent.</Paragraph>
    <Paragraph position="2"> declination. A gradual lowering of the pitch range that is effected as some function of time from the beginning of an utterance without regard to the tonal structure.</Paragraph>
    <Paragraph position="3"> final lowering. A gradual lowering of the pitch range starting at some distance from the end of the utterance.</Paragraph>
    <Paragraph position="4"> fundamental frequency. The reciprocal of the period in a periodic signal, and the main physical correlate of pitch. Fundamental frequency is abbreviated f0 and is measured in periods per second (unit hertz). In speech, f0 corresponds to the frequency of vibration of the vocal cords during voiced segments.</Paragraph>
    <Paragraph position="5"> H. A high tone.</Paragraph>
    <Paragraph position="6"> The rates of the declination and of the final lowering and the size of the smoothing window are speaker- and rate-specific variables like the reference line, and are treated in the same way in the synthesis program. high-tone line. In Japanese tone-scaling, the upper bound of the pitch range. Its f0 value corresponds to that of a hypothetical highest possible H tone in that range.</Paragraph>
    <Paragraph position="7"> intonational phrase. A prosodic unit delimited phonologically by some sort of intonational feature such as a boundary tone. L. A low tone.</Paragraph>
    <Paragraph position="8"> LPC coding. A specification of the spectral characteristics of a signal in terms of sets of linear predictor coefficients at fixed intervals. An nth-order analysis of the signal is obtained by a least squares estimation of successive samples within an analysis frame from the linear combination of the last n samples. The set of predictor coefficients for each analysis frame can then be used as a filter for an input pulse train to synthesize a new signal with the same spectral pattern and an arbitrarily different f0 pattern.</Paragraph>
    <Paragraph position="9"> [Figure caption, beginning missing] ... the utterance is a question ending in a H% boundary tone. (3) The contour is smoothed by convolution with a syllable-sized square window. (4) Jitter is added and f0 values excised during voiceless segments [t], Ill, and [k]. (5) The f0 contour of the original utterance is shown for comparison with (4).</Paragraph>
    <Paragraph position="10"> pitch accent. A tonal configuration that is associated to a designated syllable in an utterance, and that marks the syllable (or the word containing the syllable) as accented or intonationally prominent. In Japanese, accent consists of a pitch fall from H tone to L at a lexically designated syllable in a word. In English, an accent is any one of six tonal patterns (H*, L*, H*+L, L*+H, H+L*, L+H*) that can be associated to a lexically designated syllable.</Paragraph>
    <Paragraph position="11"> pitch range. The spread of fundamental frequency between the &quot;floor&quot; of a speaker's voice and the highest f0 appropriate to the occasion. Linguistic factors such as prominence or intonational focus (see Section 1.2) can locally affect pitch range, but it is determined overall by paralinguistic factors such as degree of animation and projection; the overall pitch range is raised or expanded when the speaker &quot;speaks up&quot; to project his voice, or when he is excited.</Paragraph>
    <Paragraph position="12"> prosody. The rhythm and melody of speech as specified phonologically in the representation of its phrasal organization and intonational structure, and as realized phonetically in duration and loudness and pitch patterns.</Paragraph>
    <Paragraph position="13"> reference line. In Japanese tone-scaling, the bottom of the pitch range, corresponding to the lowest possible f0 value for a tone in a speaker's pitch range.</Paragraph>
    <Paragraph position="14"> standard Japanese. The speech of educated Tokyo speakers, as prescribed by the Japanese Broadcasting Corporation.</Paragraph>
    <Paragraph position="15"> stress. A local non-tonal prominence on a lexically designated syllable in an English word, which is realized phonetically in the rhythmic pattern of relative lengths and loudnesses, and also by certain segmental patterns such as vowel and consonant lenition.</Paragraph>
    <Paragraph position="16"> tone. The basic phonological element representing distinctive events in the melody -- i.e., the melodic counterpart of a phonemic segment in the text string. We believe that these melodic segments are target pitch level specifications such as &quot;high&quot; and &quot;low&quot; rather than specifications of pitch change such as &quot;rise&quot; and &quot;fall&quot;. (See Pierrehumbert and Beckman (forthcoming) for detailed arguments on this point.) In both English and Japanese, there are two tone types -- H and L -- and the type of each tone in an utterance, and its temporal location and f0 value reflect the prosodic phrasing and intonational focus structure of the utterance.</Paragraph>
  </Section>
</Paper>