<?xml version="1.0" standalone="yes"?> <Paper uid="H89-2049"> <Title>SPEECH RECOGNITION IN PARALLEL</Title> <Section position="14" start_page="368" end_page="369" type="evalu"> <SectionTitle> EXPERIMENTAL HARDWARE </SectionTitle> <Paragraph position="0"> We are performing our studies on a small-scale tree-structured machine, a newer version of the DADO machine that we call DADO4. The general parallel-processing requirements of fast broadcast, report and min-resolve, as well as partitionable MIMD processing and distributed (program) memory, can be provided by a bus-based architecture, be it a tree or a linear bus. Indeed, the tree structure simply provides a high-speed global communication bus and a convenient structure for partitioning the machine. No general interconnection topology is necessary to execute almost decomposable search problems, and thus speech recognition tasks.</Paragraph> <Paragraph position="1"> We believe tree structures may provide the necessary communication bandwidth to deliver data to the PE's and report the final results of matching references to test patterns. For example, the prototype DADO4 is designed with 15 DSP32C chips delivering in the aggregate approximately 300 megaflops on a two-board system. Each DSP32C-based DADO4 PE is clocked at 40 MHz, delivering 20 megaflops per processor. Each PE has a full megabyte of storage (and space is available on the prototype boards to increase this memory configuration if needed).</Paragraph> <Paragraph position="2"> The DADO4 is designed with a new I/O feature, an advance over its predecessors. The broadcast of 32-bit word data is handled through high-speed PAL circuits, allowing data to reach all PE's essentially instantaneously.</Paragraph> <Paragraph position="3"> (The use of PAL's allows us to directly experiment with other hardware I/O features that we may find useful later in our research.
For example, a built-in hardware feature to allow high-speed voting among a number of recognizers may have utility.) Let us assume we sample speech at a 20 KHz sampling rate with 16 bits per sample. That generates data at 40 kilobytes per second. The DADO4 I/O circuits can broadcast blocks of data at 20 megabytes per second. Thus, we may broadcast a 200-sample block, or 100 32-bit words, in 0.002 centiseconds. The percentage of time devoted to I/O for speech sampling at this high sampling rate is only 0.2% in this machine configuration. In each centisecond frame, nearly 3 million floating-point operations are available for processing the speech. Thus, nearly all of the 300 megaflops of computing power of the ensemble of DSP chips is available for direct computation. This substantial number of operations is applied to acoustic preprocessing, matching and report operations.</Paragraph> <Paragraph position="4"> There are other reasons why we believe a tree-structured topology suffices in our experimental work. Trees are not overly specialized, since such architectures have been the focus of study over the years, leading to a wide range of algorithms for many problems (see \[Kosaraju 89\] for a recent example). Trees are especially well suited to executing almost decomposable searching problems, as noted earlier, which we believe provide a general model for many pattern recognition tasks. Trees are also partitionable: a single tree can be dynamically configured into a collection, or forest, of subtrees. Hence, a number of concurrent recognizers can be executed effectively on trees. Trees are well known to be efficiently scalable \[Stolfo 84\]. Linear-area embeddings of trees in the plane argue for high-density scaling and low-volume devices. Constant pin-out in interconnecting tree-based modules argues for linear cost in scaling to larger numbers of PE's, and thus larger-vocabulary speech recognizers and larger numbers of recognizers can be supported optimally.
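The I/O budget arithmetic above can be checked with a short sketch. This is a minimal illustration, not part of the original system: the 20 KHz, 16-bit, 20 MB/s, and 300-megaflop figures are taken from the text, while the variable names are our own.

```python
# Recompute the DADO4 speech I/O budget from the figures in the text.
SAMPLE_RATE_HZ = 20_000          # 20 KHz speech sampling
BYTES_PER_SAMPLE = 2             # 16 bits per sample
BROADCAST_BYTES_PER_SEC = 20e6   # DADO4 broadcast bandwidth, 20 MB/s
AGGREGATE_FLOPS = 300e6          # 15 DSP32C chips x 20 megaflops each

data_rate = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE           # 40,000 bytes/s
frame_samples = 200                                     # one centisecond frame
frame_bytes = frame_samples * BYTES_PER_SAMPLE          # 400 bytes = 100 words
frame_seconds = frame_samples / SAMPLE_RATE_HZ          # 0.01 s = 1 centisecond

broadcast_time = frame_bytes / BROADCAST_BYTES_PER_SEC  # 2e-5 s = 0.002 centiseconds
io_fraction = broadcast_time / frame_seconds            # 0.002, i.e. 0.2%
ops_per_frame = AGGREGATE_FLOPS * frame_seconds         # ~3 million operations

print(f"data rate: {data_rate} bytes/s")
print(f"I/O fraction per frame: {io_fraction:.1%}")
print(f"operations available per frame: {ops_per_frame:.0f}")
```

Running the sketch reproduces the text's figures: a 40-kilobyte-per-second data stream, a 0.2% I/O overhead per centisecond frame, and roughly 3 million floating-point operations left over per frame.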
Thus, if our approach to executing multiple recognizers on tree structures succeeds, the resulting system has a significant chance of delivering real-time, low-cost dedicated speech recognition devices.</Paragraph> <Paragraph position="5"> Indeed, since nearly all parallel architecture networks have an underlying spanning tree, our work should generalize easily and be immediately applicable to any MIMD parallel computer that is partitionable. We believe this is an important advantage. Since the scalability of trees is well known, it is not unreasonable to expect that if our techniques are successful, then massive parallelism for dedicated speech recognition systems can be delivered by low-cost tree-structured parallel computers. Inexpensive front-end speech recognition might therefore be possible. However, other more general-purpose parallel computing systems can also deliver speech recognition with little difficulty by a straightforward mapping of our algorithms.</Paragraph> </Section> </Paper>