<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1057">
  <Title>Experimental Results for Baseline Speech Recognition Performance using Input Acquired from a Linear Microphone Array</Title>
  <Section position="2" start_page="0" end_page="285" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> It is widely accepted that appropriate data-acquisition technology must be available to make speech recognition a viable computer input mode [1, 2, 3]. While work has been done in the area of signal conditioning [4], research at Brown University has, for the last three years, been developing hardware, software, and algorithms to make non-intrusive speech acquisition a practical reality [5, 6]. The principal focus to date has been to use the phase relationships among a group of microphones spaced in a line (hence a linear array) for the remote, real-time acquisition of a talker's speech. Various beamforming and talker location/tracking algorithms have been studied, reported, and evaluated with respect to listening quality [7, 8, 9, 10, 11, 12]. The quality of a speech data-acquisition system may be assessed in several ways. For many applications, it is evaluated quantitatively, in terms of a signal-to-noise measure or a human-listening experiment score, or qualitatively, in terms of human judgment. However, for a system whose output is fed to a speech recognizer, recognition performance is an excellent, quantifiable measure; this approach and its results make up the body of this paper.</Paragraph>
    <Section position="1" start_page="0" end_page="285" type="sub_section">
      <SectionTitle>
Division of Engineering
Brown University
Providence, RI 02912
*Supported principally by NSF/DARPA Grant No. IRI-8901882
</SectionTitle>
      <Paragraph position="0"> A key problem for such systems to overcome is reverberation. Acoustic reflections in a normal room environment make the output of a remote microphone quite different from that of the close-talking microphone normally used with a recognizer. Several ways have been suggested to alleviate this problem: * A more focused array system will attenuate reflections arriving from a wider off-axis volume [13]. Many microphones are required to do this, and a system with beamwidth control over a broad spectrum and in two or three directions is essential. This is the spatial-filtering approach to the problem.</Paragraph>
      <Paragraph position="1"> * The acoustic environment near the microphones is critical. New ways of mounting the microphones in appropriately sound-absorbent material substantially improve performance without necessarily limiting the practicality of the array. More directional microphone elements can also be used. This is the acoustical approach to the problem.</Paragraph>
      <Paragraph position="2"> * Some form of deconvolution can be used to undo the effects of reverberation [3, 14, 15, 16, 17, 18, 19]. Directly or indirectly, a characterization of the room is obtained, usually as a spatially dependent impulse response. Once this non-trivial problem is solved, some processing "art" is often still needed to overcome nulls in the spectrum and perform the inverse filtering.</Paragraph>
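The last item notes that inverse filtering must contend with nulls in the room's spectrum. The Python sketch below illustrates why, under loudly stated assumptions: a hypothetical toy impulse response `h` (a direct path plus a single reflection), circular convolution in place of real room acoustics, and Tikhonov-style regularization `G = conj(H) / (|H|^2 + beta)` as one common example of the processing "art" mentioned above. It is not the method of this paper or of references [3, 14-19], and the names `inverse_filter`, `dereverberate`, and `beta` are illustrative only.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short toy signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def inverse_filter(h, n, beta=0.0):
    """Length-n frequency response G[k] = conj(H[k]) / (|H[k]|^2 + beta).

    beta = 0 is the plain inverse 1/H: its gain explodes near spectral nulls
    of H (and it raises ZeroDivisionError at an exact null). beta > 0 bounds
    the gain by 1 / (2 * sqrt(beta)) at the cost of inexact inversion.
    """
    padded = list(h) + [0.0] * (n - len(h))
    return [hk.conjugate() / (abs(hk) ** 2 + beta) for hk in dft(padded)]


def circular_convolve(s, h):
    """Toy 'room': circular convolution of the clean signal s with h."""
    n = len(s)
    return [sum(h[j] * s[(t - j) % n] for j in range(len(h))) for t in range(n)]


def dereverberate(y, h, beta=0.0):
    """Apply the (optionally regularized) inverse of h to the observed signal y."""
    g = inverse_filter(h, len(y), beta)
    return [v.real for v in idft([gk * yk for gk, yk in zip(g, dft(y))])]


if __name__ == "__main__":
    # Hypothetical room: direct path plus one reflection 4 samples later at
    # 0.9 of the direct amplitude, so |H| dips to 0.1 at some frequencies.
    h = [1.0, 0.0, 0.0, 0.0, 0.9]
    n = 16
    for beta in (0.0, 0.01, 0.05):
        peak = max(abs(g) for g in inverse_filter(h, n, beta))
        print(f"beta={beta}: peak inverse-filter gain {peak:.2f}")

    s = [1.0 if t == 3 else 0.0 for t in range(n)]  # a single click
    y = circular_convolve(s, h)                      # click plus its echo
    s_hat = dereverberate(y, h)                      # exact inverse, noise-free
    print(max(abs(a - b) for a, b in zip(s, s_hat)))  # tiny rounding error only
```

With this toy response the plain inverse (`beta = 0`) has a peak gain of 10 (20 dB) at the near-null frequencies, so any noise there would be amplified tenfold; `beta = 0.05` limits the peak gain to about 1.67, trading exact inversion for robustness.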
      <Paragraph position="3"> This project investigates all of the above methods. It should be added that, when working with real acoustic systems, mechanisms for reducing reverberation must be applied carefully; it is a hard problem. However, the purpose of this paper is not to report the improvements achieved by the various means of dereverberating the array's output signal; rather, it is to establish a baseline against which future developments can be compared. The question posed is: how badly does recognizer performance degrade when the input signal comes from 1) a single remote omnidirectional microphone, or 2) the beamformed output of a linear microphone array? This experiment quantifies the acceptability (or lack thereof) of using relatively straightforward implementations of remote-microphone technology for speech recognition.</Paragraph>
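The second input condition above is the beamformed output of a linear array. As a rough illustration of how a delay-and-sum beamformer exploits the phase relationships among the microphones, and of why a focused, spatially selective beam needs many microphones, here is a minimal Python sketch. It assumes a uniform linear array, far-field (plane-wave) arrival, a speed of sound of 343 m/s, and whole-sample delays; the spacing, frequency, sample rate, and microphone counts in the demo are hypothetical values chosen for illustration, not the configuration used in these experiments, and every function name is illustrative.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def steering_delays(num_mics, spacing, angle_deg):
    """Compensating delays (s) that time-align a plane wave arriving from
    angle_deg (measured from broadside) across a uniform linear array.

    Convention: a source at a positive angle reaches the higher-index
    microphones first, so those channels are delayed the most.
    """
    s = math.sin(math.radians(angle_deg))
    delays = [m * spacing * s / SPEED_OF_SOUND for m in range(num_mics)]
    earliest = min(delays)
    return [d - earliest for d in delays]  # shift so all delays are causal


def delay_and_sum(channels, delays_samples):
    """Delay channel m by delays_samples[m] whole samples, then average."""
    length = len(channels[0])
    out = [0.0] * length
    for ch, d in zip(channels, delays_samples):
        for t in range(d, length):
            out[t] += ch[t - d] / len(channels)
    return out


def array_response(num_mics, spacing, steer_deg, source_deg, freq_hz):
    """Narrowband delay-and-sum gain for a plane wave from source_deg when
    the array is steered to steer_deg (1.0 in the look direction)."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber, rad/m
    delta = math.sin(math.radians(source_deg)) - math.sin(math.radians(steer_deg))
    total = sum(cmath.exp(1j * k * m * spacing * delta) for m in range(num_mics))
    return abs(total) / num_mics


def half_power_beamwidth(num_mics, spacing, freq_hz, step_deg=0.1):
    """Full width (degrees) of the broadside main lobe at its -3 dB points,
    or None if the gain never falls that far within +/-90 degrees."""
    threshold = 1 / math.sqrt(2)
    angle = 0.0
    while 90.0 > angle:
        if threshold > array_response(num_mics, spacing, 0.0, angle, freq_hz):
            return 2 * angle
        angle += step_deg
    return None


if __name__ == "__main__":
    spacing = 0.05  # m, hypothetical microphone spacing
    fs = 16000      # Hz, hypothetical sample rate

    # Steer a 4-microphone array toward +30 degrees and align a simulated click.
    d_s = [round(tau * fs) for tau in steering_delays(4, spacing, 30.0)]
    arrivals = [10 - d for d in d_s]  # higher-index mics hear the click earlier
    channels = [[1.0 if t == a else 0.0 for t in range(32)] for a in arrivals]
    out = delay_and_sum(channels, d_s)
    print(d_s, max(out), out.index(max(out)))

    # More microphones at the same spacing give a narrower main lobe.
    for mics in (4, 8, 16):
        width = half_power_beamwidth(mics, spacing, 1000.0)
        print(f"{mics} mics: -3 dB beamwidth at 1 kHz is about {width:.1f} degrees")
```

In this sketch, adding microphones at a fixed spacing narrows the main lobe, consistent with the observation that a focused array requires many microphones; lowering the frequency widens the beam again, which is why beamwidth control over a broad speech spectrum is hard with a simple delay-and-sum design.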
    </Section>
  </Section>
</Paper>