<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0602"> <Title>Some Challenges of Developing Fully-Automated Systems for Taking Audio Comprehension Exams</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> There is currently interest in using reading comprehension exams to evaluate natural language processing (NLP) systems. Reading comprehension tests are designed to help evaluate a reader's understanding of a written passage and are thus an example of a text-based language processing task. Audio comprehension tests, on the other hand, are designed to help evaluate a listener's understanding of a spoken passage and are an example of a spoken language processing task. These tests are frequently a key component of language competency exams, such as the Test of English as a Foreign Language (TOEFL) in the United States.</Paragraph> <Paragraph position="1"> In this paper, we focus on some of the future challenges of developing fully-automated techniques for audio comprehension, in which the system developed processes the exam passages (and possibly questions) from the original audio source. Audio comprehension provides an excellent example of an understanding-based evaluation paradigm for speech systems, in which the emphasis is not solely on &quot;getting all the words right&quot; but rather on using speech recognition technology to automatically accomplish a task with a human benchmark: answering questions about a natural language story. 
The traditional paradigm for spoken language processing tasks, such as audio comprehension, has consisted largely of applying an existing text-based system to the hypothesis words output by an automatic speech recognition (ASR) system, ignoring the fact that information is lost due to recognition errors when moving from text to speech and the possibility that it can be regained in part via word confidence prediction.</Paragraph> <Paragraph position="2"> We believe that successful approaches to audio comprehension will tackle the speech problem directly, by avoiding the use of features that are characteristic of written text and by explicitly addressing the problem of speech recognition errors through the use of smoothing techniques and word confidence information. Preliminary research in fully-automated techniques for reading comprehension, such as the Deep Read system developed by Hirschman et al.</Paragraph> <Paragraph position="3"> (1999), has included many standard NLP components, such as part-of-speech tagging, coreference/pronoun resolution, proper name finding, and morphological analysis (stemming). While the techniques that are being developed for reading comprehension provide a starting point, these techniques cannot be applied directly and effectively to audio comprehension exams because of the differences between written and spoken language data. In this paper we address three specific challenges in developing audio comprehension systems: * Fundamental differences between text-based data and spoken language data (Section 2) In our discussion we will use examples taken from television and radio broadcast news, a &quot;found&quot; source of audio passages with a virtually unlimited vocabulary and a wide range of opportunities for audio comprehension evaluation. All ASR transcriptions we use will be actual output from a broadcast news ASR system with a word error rate of 30%.</Paragraph> </Section> </Paper>