<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0402">
  <Title>ELICITING NATURAL SPEECH FROM NON-NATIVE USERS: COLLECTING SPEECH DATA FOR LVCSR</Title>
  <Section position="4" start_page="5" end_page="6" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Byrne et al. (Byrne and others, 1998) describe a conversational English data collection protocol targeting native speakers of Spanish. They assigned each speaker to one of three skill levels and had them perform level-appropriate tasks designed to elicit specific grammatical structures. Participants spoke with other non-native speakers over the telephone, which forced them to communicate through speech alone. The authors found this an effective way to elicit spontaneous speech from non-native speakers at all fluency levels in a purely conversational domain.</Paragraph>
    <Paragraph position="1"> A number of studies discuss techniques for collecting spoken data from non-native speakers in the context of a language tutoring system. Most such systems (e.g., Eskenazi, 1997; Witt and Young, 1997; Kawai and Hirose, 1997) ask users to read a prompt or narrowly constrain what they are allowed to say. Neumeyer et al. (Neumeyer et al., 1998) describe a system that evaluates students' pronunciation in text-independent speech. They collected a database of read speech (both newspaper and conversational sentences) and of imitated speech, in which students imitated the speech of native speakers; their subjects were American students of French.</Paragraph>
    <Paragraph position="2"> Aist et al. (Aist and others, 1998) discuss considerations in collecting speech from children, pointing out that children may be uncooperative and easily bored, and may have difficulty reading. They describe an unsupervised data collection method in which recognized speech is compared to the transcript that the child is expected to read, and utterances in which part or all of the hypothesis matches the transcript are used for additional system training. This type of technique is less effective for a system that handles completely spontaneous queries, since there is no expected transcript to compare against. However, their observations about children's abilities (especially articulatory and reading difficulties) and their reactions to formalized data collection parallel our own observations of non-native speakers. Outside the field of speech recognition, there is a substantial body of research on methods for eliciting natural speech. Briggs (Briggs, 1986) emphasizes the importance of understanding what the speech event means to the speaker.</Paragraph>
    <Paragraph position="3"> Recording for a research project may be a familiar event for the researcher, but not for the speaker. Reading aloud is commonplace in American schools, but participants from other backgrounds may be intimidated or even offended when asked to read aloud. Native speakers of English certainly vary in how comfortable they are reading and speaking, but when researchers and participants are all native speakers of English, there are far fewer cultural variables that can lead to misunderstanding and compromise the integrity of the data.</Paragraph>
    <Paragraph position="4"> In his description of the field methodology of the project on linguistic change and variation, Labov (Labov, 1984) discusses a number of issues in spoken data collection, including the long-term relationship with the speaker pool. Such a relationship is of course important for longitudinal studies; beyond that, when studying the speech of a restricted group, it is important that people do not leave the data collection experience feeling that they have been objectified or misunderstood. Labov returns to this point in his discussion of ethical considerations in data collection.</Paragraph>
    <Paragraph position="5"> What exactly does &quot;natural speech&quot; mean in the case of the non-native speaker? Wolfson (Wolfson, 1976) defines the notion of natural speech &quot;as properly equivalent to that of appropriate speech; as not equivalent to unselfconscious speech.&quot; That is, in some situations it is natural to speak carefully, and careful speech in such contexts should not be considered unnatural. Semi-fluent non-native speakers will most likely plan their speech, whether they are at a real information desk or recording a contrived scenario.</Paragraph>
  </Section>
</Paper>