<?xml version="1.0" standalone="yes"?> <Paper uid="N04-1028"> <Title>Non-Native Users in the Let's Go!! Spoken Dialogue System: Dealing with Linguistic Mismatch</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Spoken Dialogue Systems and Non-Native Speakers </SectionTitle> <Paragraph position="0"> Spoken dialogue systems rely on models of human language to understand users' spoken input. Such models cover the acoustic and linguistic space of the common language used by the system and the user. In current systems, these models are learned from large corpora of recorded and transcribed conversations matching the domain of the system. In most cases, these corpora are gathered from native speakers of the language, both because native speakers are the main target of the system and because developers and researchers are often native speakers themselves. However, when the common language is not the users' native language, their utterances may fall outside this &quot;standard&quot; native model, seriously degrading recognition accuracy and overall system performance. As telephone-based information access systems become more common and available to the general public, this inability to deal with non-native speakers (or with any &quot;non-standard&quot; subgroup such as the elderly) is a serious limitation since, at least for some applications (e.g. tourist information, legal/social advice), non-native speakers represent a significant portion of the everyday user population.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.2 Previous Work on Non-Native Speech Recognition </SectionTitle> <Paragraph position="0"> Over the past ten years, extensive work has been done on non-native speech recognition.
Early research aimed at endowing Computer Assisted Language Learning software with speech recognition capabilities (e.g. (Eskenazi and Hansma, 1998), (Witt and Young, 1997)). Usually, such systems are targeted at one specific population, that is, people who share the same native language (L1). Thus, most research in non-native speech recognition uses knowledge of the L1, as well as databases of accented speech specially recorded from speakers of the target population. Ideally, by training acoustic models on target non-native speech, one would capture its specific characteristics just as training on native speech does. However, collecting amounts of non-native speech that are large enough to fully train speaker-independent models is a hard and often impractical task. Therefore, researchers have resorted to using smaller amounts of non-native speech to retrain or adapt models that were originally trained on large corpora of native speech. As for native speech, such methods were mostly applied to read speech, with some success (e.g. (Mayfield Tomokiyo and Waibel, 2001)).</Paragraph> <Paragraph position="1"> Unfortunately, we know from past research on native speech recognition that read speech models perform poorly on conversational speech (Furui, 2001), which is the style used when talking to spoken dialogue systems.</Paragraph> <Paragraph position="2"> A few studies have built and used databases of non-native conversational speech for evaluation (Byrne et al., 1998) and training (Wang and Schultz, 2003).</Paragraph> <Paragraph position="3"> In all those cases, the native language of the speaker is known in advance. One exception is (Fischer et al., 2001), who apply multilingual speech recognition methods to non-native speech recognition. The authors train acoustic models on a database comprising native speech from five European languages (English, Spanish, French, German and Italian) and use them to recognize non-native English from speakers of 10 European countries.
However, their task is the recognition of read digit strings, quite different from conversational speech.</Paragraph> <Paragraph position="4"> Also, because of the difficulty of recording large amounts of spontaneous non-native speech, no thorough study of the impact of the linguistic differences between native and non-native spontaneous speech has, to our knowledge, been conducted. The two spontaneous non-native speech studies cited above report perplexity and out-of-vocabulary (OOV) word rate (for (Wang and Schultz, 2003)) but do not provide any analysis. In this paper, while acknowledging the importance of acoustic mismatch between native models and non-native input, we focus on linguistic mismatch in the context of a task-based spoken dialogue system. This includes differences in word choice, which influence the number of OOV words, and in syntax, which affect the performance of the speech recognizer's language model and of the natural language understanding (NLU) grammar.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.3 Non-Native Speakers as Language Learners </SectionTitle> <Paragraph position="0"> All the research on non-native speech recognition described in the previous section sees non-native speakers as a population whose acoustic characteristics need to be modeled specifically but in a static way, just like one would model the acoustics of male and female voices differently. A different approach to the problem is to see non-native speakers as engaged in the process of acquiring the target language's acoustic, phonetic and linguistic properties.
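The linguistic-mismatch measures mentioned above (OOV rate and language model perplexity) can be illustrated with a small sketch. The function names and toy utterances below are illustrative assumptions, not material from the paper; the study itself works with real Let's Go!! transcripts, and its language models are far richer than this add-one-smoothed unigram model.

```python
import math
from collections import Counter

def oov_rate(utterances, vocab):
    """Fraction of test tokens missing from the recognizer's vocabulary."""
    tokens = [w for utt in utterances for w in utt.split()]
    return sum(1 for w in tokens if w not in vocab) / len(tokens)

def unigram_perplexity(train_utts, test_utts, smoothing=1.0):
    """Perplexity of an add-one-smoothed unigram model on test utterances;
    unseen words receive the single smoothed <unk> probability."""
    counts = Counter(w for utt in train_utts for w in utt.split())
    total = sum(counts.values())
    v = len(counts) + 1  # vocabulary size, plus one slot for <unk>
    log_prob, n = 0.0, 0
    for utt in test_utts:
        for w in utt.split():
            log_prob += math.log((counts.get(w, 0) + smoothing)
                                 / (total + smoothing * v))
            n += 1
    return math.exp(-log_prob / n)

# Hypothetical in-domain data, standing in for real transcribed dialogues.
native = ["when does the next bus leave", "i want to go downtown"]
non_native = ["when bus is coming to downtown place"]
vocab = set(w for utt in native for w in utt.split())

print(round(oov_rate(non_native, vocab), 3))  # 3 of 7 tokens are OOV
```

On this toy data, the non-native test set shows both a higher OOV rate and a higher perplexity under the native-trained model, which is exactly the kind of linguistic mismatch the experiments in Section 3 quantify on real data.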
In this paradigm, adapting dialogue systems to non-native speakers means not only being able to recognize and understand their speech as it is, but also helping them acquire the vocabulary, grammar, and phonetic knowledge necessary to fulfill the task the system was designed for.</Paragraph> <Paragraph position="1"> This idea follows decades of language teaching research that, since the mid-sixties, has emphasized the value of learning language in realistic situations, in order to perform specific tasks. Immersion is widely considered the best way to learn to speak a language, and modern approaches to foreign language teaching try to mimic its characteristics. If the student cannot be present in the country where the language is spoken, then the student should be put into a series of situations imitating the linguistic experience that he/she would have in the target country.</Paragraph> <Paragraph position="2"> Thus, most current language teaching methods, following the Communicative Approach (Littlewood, 1981), have focused on creating exercises where the student is forced to use language quickly in realistic situations and thus to learn from the situation itself as well as from reactions to the student's actions.</Paragraph> <Paragraph position="3"> From a different viewpoint, (Bortfeld and Brennan, 1997) showed in a psycholinguistic study that non-native speakers engaged in conversation-based tasks with native speakers not only achieve the primary goal of the task through collaborative effort but also acquire idiomatic expressions about the task from the interaction.</Paragraph> <Paragraph position="4"> The research described in this paper has the dual goal of improving the accessibility of spoken dialogue systems to non-native speakers and of studying the usability of a computer for task-based language learning that simulates immersion.</Paragraph> <Paragraph position="5"> The next section gives an overview of the CMU Let's Go!!
bus information system that we built and use in our experiments. Section 3 describes and analyzes the results of experiments aimed at comparing the accuracy of speech recognition and the quality of language modeling on both native and non-native data. Section 4 describes the use of automatically generated confirmation prompts to help the user speak the language expected by the system. Finally, Section 5 draws conclusions and presents future directions of research.</Paragraph> </Section> </Section> </Paper>