<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0402">
  <Title>ELICITING NATURAL SPEECH FROM NON-NATIVE USERS: COLLECTING SPEECH DATA FOR LVCSR</Title>
  <Section position="3" start_page="0" end_page="5" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> As part of work in improving speech recognition performance for non-native speakers, we wanted to develop a database that captures ways in which non-native language use differs from native language use in a specific domain. Features of interest included pronunciation, lexical choice, syntax, expressive goals, and the strategies speakers use when they are unsure of the appropriate English expression. We wanted the recorded data to be appropriate for LVCSR system training. This means that the signal quality should be good and that the speech should be as close as possible in style and content to the speech expected in the target application, a tourist information query system.</Paragraph>
    <Paragraph position="1"> We also wanted to elicit data which would contain examples of systematic and unsystematic variation in the speech of low- to mid-fluency non-native speakers.</Paragraph>
    <Paragraph position="2"> One of the most interesting aspects of these experiments was the extent to which we had to adapt our usual data collection strategies to the needs of our speakers, whose English abilities ranged from beginning to near-native. It is important to be aware of several commonly made assumptions that do not necessarily hold for non-native speakers and that must be addressed when designing a data collection protocol.</Paragraph>
    <Paragraph position="3"> The act of speaking is not difficult. When recording native speakers speaking spontaneously for standard LVCSR projects (that is, projects not geared towards special populations or difficult tasks), it is assumed that the act of speaking does not in itself represent a major cognitive load for the speaker. This can be very untrue of non-native speakers: several of our speakers asked to stop in the middle of the recording because they felt unable to continue. The researcher needs to decide what to do in such a situation and may need to prepare an alternative task.</Paragraph>
    <Paragraph position="4"> There is little risk of alienating the community. Local communities of non-native speakers are not always large, and if a community is close-knit, word can spread quickly if the task is too hard or embarrassing. It is also important to de-emphasize the fact that we are interested, among other things, in imperfections in the speakers' speech; otherwise we risk offending the community.</Paragraph>
    <Paragraph position="5"> The task is not perceived as a test.</Paragraph>
    <Paragraph position="6"> Again, when speaking spontaneously, few native speakers of nonstigmatized varieties of English would feel that they are being evaluated on the correctness of their speech. Many non-native speakers, however, will feel tested. Because this can make them nervous and affect their speech, it is important to reassure them as far as possible that they are not being tested and that the data will be anonymized.</Paragraph>
    <Paragraph position="7"> The speaker knows what to say. Most spontaneous collection tasks are chosen because speakers can be expected to have performed them before and to be comfortable with them. Although a non-native speaker has probably made an airplane reservation in his native language, it is entirely possible that he has never done so in the target language and does not have a good idea of what to say in that situation. If he were really planning to make an airplane reservation in the target language, he would probably think about what to say in advance and might even ask someone for help, which he may not have a chance to do during the data collection. This undermines the representativeness of the database.</Paragraph>
    <Paragraph position="8"> We carried out a number of exploratory experiments to determine which format was most comfortable for the speakers and elicited the most natural data; two of these experiments are described in Section 3. For these experiments we worked with native speakers of Japanese. The protocol we settled on, which we feel is very effective for non-native speakers, is described in Section 4. Although transcription and analysis of this data are still at an early stage, we have already seen patterns that will be useful for developing acoustic and language models. Examples are shown in Section 5.</Paragraph>
  </Section>
</Paper>