<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2070">
  <Title>PILOT PROJECT ON SPEECH RECOGNITION USING LAYERED ABDUCTION AND MULTIPLE KNOWLEDGE TYPES</Title>
  <Section position="1" start_page="0" end_page="454" type="abstr">
    <SectionTitle>
PILOT PROJECT ON SPEECH RECOGNITION USING
LAYERED ABDUCTION AND MULTIPLE KNOWLEDGE TYPES
</SectionTitle>
    <Paragraph position="0">J. Josephson, M. Beckman, R. Fox, A. Krishnamurthy, T. Patten, L. Feth, B. Chandrasekaran
The Ohio State University, Columbus, OH 43210
Our long-term goal is to test the feasibility of basing a recognition system on a model of human speech processing. Key hypotheses are: (1) computational models for reasoning from incomplete knowledge provide a useful metaphor for many aspects of human speech understanding, even at levels where highly automatic perceptual processes are at work; (2) a computational model of human cochlear processing must be used as the signal-processing front end; (3) some representation of articulation should mediate between the acoustics and the phonology in order to accommodate contextual variation of various sorts; and (4) the phonological representation must encode the prosodic structure and intonation pattern as well as the phoneme string.
Accordingly, our specific short-term goal is to build a small prototype system that uses a layered-abduction architecture, in which there are many stages of processing corresponding to different levels of knowledge. The common information-processing task at each stage is to form a coherent, composite (multi-part) hypothesis that explains the data presented from the preceding levels. The input signal will be digitized speech processed by Patterson's Stabilized Auditory Image system. An articulatory representation based on Browman and Goldstein's gestural score will mediate between the auditory representation and the phonology, and Pierrehumbert and Beckman's prosodic tree and tone string will be used for the phonological organization and intonational melody.
We are nearing completion of an initial two-level system incorporating a stratum of acoustic features and a stratum of phonological features, for a small vocabulary of CV monosyllables produced by one speaker. An abduction machine, with distinct elements of recognition, classification, and composite hypothesis formation, controls all the processing beyond some rudimentary initial signal processing; it even does the formant tracking. Our next step is to incorporate the Stabilized Auditory Image system as the signal-processing front end for the acoustic stratum in our initial two-level system.
We have also begun to write gestural scores for a small vocabulary of monosyllables and disyllables, on the basis of articulatory data that we recorded for our one speaker at the University of Wisconsin's X-ray microbeam facility. We will use these scores in building a separate system with three strata representing articulation, phonology, and a lexicon. This will provide a first test of a three-level machine; it will incorporate some top-down processing and will provide the first opportunities to test the computational efficiency of the model. We are also using these articulatory data to determine the mapping between auditory features and articulatory gestures, in preparation for merging the two systems into a single four-level system.</Paragraph>
  </Section>
class="xml-element"></Paper>