<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0803">
  <Title>Speech Annotation by Multi-sensory Recording</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> 10 20 30 40 50 This paper describes our effort to mark read speech based on multi-sensor)' recording. Although speech data are available for many languages (e.g. English, Putonghua, etc.), Cantonese speech data are still rare. Annotation of speech by hand is a tedious task, subject to errors and consistency problems. Therefore, we aim to annotate read Cantonese speech automatically. However, automatic annotation can be difficult even though the pronunciation of the speech sound is known because certain phonetic events are difficult to detect (e.g.</Paragraph>
    <Paragraph position="1"> plosives). We have adopted the multi-sensory technique developed for annotating English for Cantonese, after Chan and Fourcin \[1\].</Paragraph>
    <Paragraph position="2"> Apart from annotation, marking speech data is also important for general speech analysis. For example, pitch synchronous Fourier transform can take the advantages of both a wide-band and a narrow-band Fourier transform where finer details of the spectrum are more apparent with pitch synchronous transform (Figure 1).</Paragraph>
    <Paragraph position="3">  on 64-point FFT with autocorrelation at double the pitch period (b) 64-point FFT with fixed window size of 200 and overlap of 50 sample points.</Paragraph>
  </Section>
class="xml-element"></Paper>