<?xml version="1.0" standalone="yes"?>
<Paper uid="H90-1018">
  <Title>Hardware for Hidden Markov-Model-Based, Large-Vocabulary Real-Time Speech Recognition</Title>
  <Section position="3" start_page="82" end_page="85" type="intro">
    <SectionTitle>
2 Architecture
</SectionTitle>
    <Paragraph position="0"> Fig. 1 shows the overall architecture of the recognition hardware. The phone processing system updates the state probabilities using the Viterbi algorithm, while the grammar processing system handles the transitions between phones. Communication between these subsystems is carried out through &quot;grammar nodes&quot;. Associated with each grammar node is a probability: the probability that a phone starts (source grammar node) or that a phone ends (destination grammar node). These nodes are not states in the hidden Markov model; that is, a transition into a grammar node does not consume a frame delay, and grammar nodes do not output a speech segment. Their purpose is solely to formalize the communication between the subsystems.</Paragraph>
    <Paragraph position="1"> The grammar subsystem multiplies the destination grammar node probabilities (DGNP) by the transition probabilities to the source grammar nodes (see Fig. 1). The source grammar node probability (SGNP) of a given phone is the maximum of these products over all incoming transitions.</Paragraph>
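The SGNP update above can be sketched in a few lines of Python. This is an illustrative model only, not the hardware's implementation; the function name and the pair-based input format are assumptions.

```python
# Sketch of the grammar subsystem's update: each source grammar node takes
# the maximum, over its incoming transitions, of DGNP * transition
# probability. Names and data layout are illustrative.

def source_grammar_node_prob(incoming):
    """incoming: list of (dgnp, transition_prob) pairs for one source node.

    Returns the source grammar node probability (SGNP): the best product
    over all incoming transitions.
    """
    return max(dgnp * trans for dgnp, trans in incoming)

# Example: two predecessor phones with different ending probabilities.
sgnp = source_grammar_node_prob([(0.8, 0.5), (0.3, 0.9)])
```

In the actual system this maximization is the part of the grammar computation done on the custom board, while the DGNP-times-transition products are formed on the general-purpose hardware.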
    <Paragraph position="2"> The recognition hardware is partitioned according to Fig. 2: the phone processing system and the part of the grammar system that computes the best SGNP are implemented on a custom board using application-specific integrated circuits. The computation of the product of the DGNP with the transition probability is performed on general-purpose hardware. Thus, different algorithms for dynamically deriving the transition probabilities between phones can be implemented on the general-purpose hardware, while the computationally most intensive part of the grammar system, finding the best SGNP, is done with custom VLSI hardware.</Paragraph>
    <Paragraph position="3"> Fig. 3 shows the overall architecture of the custom board. At any given frame two processes, each implemented with three custom VLSI processors, are operating in parallel.</Paragraph>
    <Paragraph position="4"> One process computes the state probabilities of the active phones that are listed in the ActiveWord Memory (Viterbi process), while the other process generates the list of active phones for the next frame (ToActiveWord process).</Paragraph>
    <Section position="1" start_page="82" end_page="84" type="sub_section">
      <SectionTitle>
2.1 Viterbi process
</SectionTitle>
      <Paragraph position="0"> The Viterbi process sequentially reads active phones from the ActiveWord Memory and computes their state probabilities. Based on a pruning threshold derived from the best state probability of the current frame, the Viterbi process decides whether the phone should stay active in the next frame and/or whether it has a high enough probability of ending that succeeding phones can be activated.</Paragraph>
      <Paragraph position="1"> Based on this decision, information associated with this phone is sent to the ToActiveWord processor and/or to the general-purpose grammar processor. To prevent arithmetic overflow, the Viterbi process also normalizes probabilities based on the best state probability of the previous frame.</Paragraph>
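The pruning and normalization steps can be sketched as follows. This is a minimal software model under assumed conventions (dictionary of phone probabilities, a relative beam factor); the hardware works on fixed-point values and the threshold derivation may differ in detail.

```python
# Sketch of per-frame pruning and normalization in the Viterbi process:
# phones falling too far below the frame's best state probability are
# dropped, and surviving probabilities are rescaled by the previous
# frame's best to keep the arithmetic in range. All names are illustrative.

def prune_and_normalize(state_probs, beam, prev_best):
    """state_probs: {phone: prob} after this frame's Viterbi update.
    beam: relative pruning factor (e.g. 0.01 keeps phones within a
    factor of 100 of the best). prev_best: best probability of the
    previous frame, used for normalization."""
    best = max(state_probs.values())
    threshold = best * beam
    return {phone: prob / prev_best
            for phone, prob in state_probs.items()
            if prob >= threshold}

# Example: three active phones; the weakest falls below the beam.
kept = prune_and_normalize({"k": 0.02, "ae": 0.5, "t": 0.004},
                           beam=0.01, prev_best=0.5)
```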
      <Paragraph position="3"> The model parameters that describe the topology of phonemes are partitioned into two memories. One memory is located on the prob and the back processor (see Fig. 3) and describes the graph of the hidden Markov chain for certain prototype phonemes. This description can span up to 128 states, partitioned into up to 32 prototype phonemes.</Paragraph>
      <Paragraph position="4"> The other memory is an off-chip static memory that contains the transition probabilities of up to 64,000 unique phonemes. Thus, the topology of a phoneme is defined by a 5-bit value that selects the graph and a 16-bit address that specifies the transition probabilities. To reduce the memory bandwidth, the processors contain a dual-ported register file to cache the state probabilities of the previous frame (see \[1\]).</Paragraph>
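The 5-bit-plus-16-bit descriptor can be illustrated with a simple bit-packing sketch. The particular word layout (graph index in the upper bits) is an assumption for illustration; the paper specifies only the field widths.

```python
# Sketch of a phoneme topology descriptor as described above: a 5-bit
# prototype-graph index (up to 32 graphs) and a 16-bit address into the
# off-chip transition-probability memory (up to 64K phonemes). The exact
# bit layout is a hypothetical choice.

def pack_phoneme(graph_id, trans_addr):
    """Pack a 5-bit graph index and a 16-bit transition-table address."""
    assert 0 <= graph_id < 32 and 0 <= trans_addr < 65536
    return (graph_id << 16) | trans_addr

def unpack_phoneme(word):
    """Recover (graph_id, trans_addr) from a packed descriptor word."""
    return (word >> 16) & 0x1F, word & 0xFFFF
```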
    </Section>
    <Section position="2" start_page="84" end_page="84" type="sub_section">
      <SectionTitle>
2.2 ToActiveWord process
</SectionTitle>
      <Paragraph position="0"> The ToActiveWord process has two inputs. One comes from the Viterbi process and carries information about phones that were active in the current frame and should stay active in the next frame. The other comes from the grammar processor and identifies phonemes that are newly activated because their predecessor phonemes had a high probability of ending. Given these inputs, the ToActiveWord process generates the list of active phonemes for the next frame. A given phoneme can be activated several times, since it might be activated by the grammar processor as well as by the Viterbi process.</Paragraph>
      <Paragraph position="1"> Moreover, the grammar processor itself can activate a phoneme several times, especially if it is the first phoneme of a word with several predecessor words that have a high probability of ending. To avoid duplicate entries in the ActiveWord Memory, the ToActiveWord process merges all these instances of an active phoneme into one, keeping the best probability that the phone starts.</Paragraph>
    </Section>
    <Section position="3" start_page="84" end_page="84" type="sub_section">
      <SectionTitle>
2.3 Caching model parameters
</SectionTitle>
      <Paragraph position="0"> To decrease the amount of memory on the system board, we use a caching scheme for the output probabilities, the parameters with the largest storage requirements: only a small subset of these parameters, the subset corresponding to the output probabilities for a given speech segment, is loaded onto the board. This loading operation is overlapped with the processing of the frame whose output probabilities were downloaded during the previous frame. With this approach it is possible to use different modeling techniques for computing the output probability distributions. The current approach is to use as many as four independent discrete probability distributions that are stored and combined on the &quot;Output Distribution Board&quot;. Other modeling approaches, such as continuous distributions and tied mixtures, are also possible, as long as the probabilities can be computed and loaded in real time.</Paragraph>
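The overlapped-loading scheme can be modeled as a two-buffer pipeline: while one frame is processed with the parameters already on the board, the next frame's parameters are fetched. The sketch below is sequential Python standing in for concurrent hardware; `load_params` and `process_frame` are hypothetical placeholders for the board's download and compute steps.

```python
# Toy model of the double-buffered parameter cache: frame t is processed
# with parameters loaded during frame t-1, while frame t+1's parameters
# are fetched "in the background". In hardware the load and the compute
# run concurrently; here they simply alternate.

def run(frames, load_params, process_frame):
    """frames: list of speech segments. load_params(frame) fetches the
    output-probability subset for that frame; process_frame(frame, params)
    consumes it. Returns the per-frame results in order."""
    results = []
    current = load_params(frames[0])  # prime the first buffer
    for i, frame in enumerate(frames):
        nxt = frames[i + 1] if i + 1 < len(frames) else None
        pending = load_params(nxt) if nxt is not None else None  # overlapped load
        results.append(process_frame(frame, current))
        current = pending             # swap buffers for the next frame
    return results
```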
    </Section>
    <Section position="4" start_page="84" end_page="85" type="sub_section">
      <SectionTitle>
2.4 Switching processors
</SectionTitle>
      <Paragraph position="0"> A frame has been processed when the Viterbi process has finished computing the state probabilities of the active phones in the ActiveWord Memory and the ToActiveWord process has finished generating the list of active phones for the next frame. Conceptually, the ActiveList Memories as well as the memories containing the state probabilities have to be swapped before the next frame can be processed. However, instead of swapping the memories, we activate a second set of processing elements that are connected to the appropriate memories.</Paragraph>
      <Paragraph position="1"> Fig. 4 sketches this principle. During frame A, the ToActiveWord process A is active and builds up ActiveWordMemoryA. Simultaneously, ViterbiB is active and processes the active phonemes listed in ActiveWordMemoryB. In the next frame, ViterbiB and ToActiveWordA are inactive while ViterbiA and ToActiveWordB are active. This way, no multiplexors are needed to swap memories; all that is required is to activate the right set of processors. This approach also has the advantage that the complete system is symmetric: the subsystem containing the A elements is identical to the subsystem containing the B elements.</Paragraph>
    </Section>
  </Section>
</Paper>