File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/98/p98-1094_abstr.xml

Size: 3,846 bytes

Last Modified: 2025-10-06 13:49:21

<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1094">
  <Title>A concurrent approach to the automatic extraction of subsegmental primes and phonological constituents from speech</Title>
  <Section position="2" start_page="0" end_page="578" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We demonstrate the feasibility of using unary primes in speech-driven language processing. Proponents of Government Phonology (one of several phonological frameworks in which speech segments are represented as combinations of relatively few subsegmental primes) claim that primes are acoustically realisable. This claim is examined critically by searching for signatures of primes in multi-speaker speech signal data. In response to a wide variation in the ease of detection of primes, it is proposed that the computational approach to phonology-based, speech-driven software should be organised in stages. After each stage, computational processes like segmentation and lexical access can be launched to run concurrently with later stages of prime detection.</Paragraph>
    <Paragraph position="1"> Introduction and overview. In § 1, the subsegmental primes and phonological constituents used in Government Phonology (GP) are described, and the acoustic realisability claims which make GP primes seem particularly attractive to developers of speech-driven software are summarised. We then outline an approach to defining identification signatures for primes (§ 2). Our approach is based on cluster analysis using a set of acoustic cues chosen to reflect familiar events in spectrograms: plosion, frication, excitation, resonance... We note that cues indicating manner of articulation, which change abruptly at segment boundaries, are computationally simple, while those for voicing state and resonance quality are complex and calculable only after signal segmentation. Also, the regions of cue space where the primes cluster (and which serve as their signatures) are disconnected, with separate sub-regions corresponding to the occurrence of a prime in nuclear or non-nuclear segmental positions.</Paragraph>
    <Paragraph position="2"> A further complication is that GP primes combine asymmetrically in segments: one prime, the HEAD of the combination, is dominant, while the other element(s), the OPERATOR(S), tend to be recessive. This is handled by establishing in cue space a central location and within-cluster variance for each prime. The training sample needed for this consists of segments in which the prime suffers modification only by minimal combination with others, i.e. on its own, or with as few other primes as possible. Then, when a segment containing the prime in less than minimal combination is presented for identification, its location in cue space lies within a restricted number of units of within-cluster variance of the central location of the prime cluster. The number of such distance units determines headedness in the segment, with separate thresholds for occurrence as head and as operator.</Paragraph>
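The thresholding idea in the preceding paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the cue values, cluster centre, variances and both thresholds are hypothetical, distance is measured per cue in units of within-cluster standard deviation (one possible reading of "units of within-cluster variance"), and it is assumed that a prime acting as head lies closer to its cluster centre than one acting as operator.

```python
import math


def variance_scaled_distance(cues, center, variances):
    """Distance from the cluster centre, each cue scaled by its within-cluster variance."""
    total = 0.0
    for value, mean, variance in zip(cues, center, variances):
        total += (value - mean) ** 2 / variance
    return math.sqrt(total)


def classify_role(cues, center, variances, head_threshold, operator_threshold):
    """Return 'head', 'operator' or 'absent' for one prime in one segment.

    Assumption: head_threshold is the smaller of the two thresholds.
    """
    distance = variance_scaled_distance(cues, center, variances)
    if distance > operator_threshold:
        return "absent"
    if distance > head_threshold:
        return "operator"
    return "head"


# Hypothetical two-cue signature for a single prime.
center = [1.0, 2.0]
variances = [0.25, 1.0]

for segment_cues in ([1.5, 2.0], [2.0, 3.0], [3.0, 5.0]):
    role = classify_role(segment_cues, center, variances,
                         head_threshold=1.5, operator_threshold=3.0)
    print(segment_cues, role)
```

With these made-up numbers the three segments fall at roughly 1.0, 2.2 and 5.0 distance units from the centre, so they are labelled head, operator and absent respectively.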
    <Paragraph position="3"> In § 3 we describe in more detail the stagewise procedure for identifying via quadratic discriminants the primes present in segments. At each stage, we detail the computational processes which are driven by the partial identification achieved by the end of the stage. The processes include segmentation, selection of lexical cohort by manner class, detection of constituent structure, and detection and repair of the effects of phonological processes on the speech signal. The prototype, speaker-independent, isolated-word automatic speech recognition (ASR) system is described in § 4. Called 'PhonMaster', it is implemented in C++ using objects which perform separate stages of lexical access and process repair concurrently.</Paragraph>
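The staged organisation described above, in which a downstream process starts as soon as an early stage finishes and runs alongside later prime detection, might be sketched as follows. This is an illustration only and not PhonMaster's design (which the source describes as C++ objects): the toy lexicon, the segment records, the manner and resonance labels and all function names are hypothetical, and a Python thread pool stands in for whatever concurrency mechanism the real system uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy lexicon: each word lists its manner classes and a vowel label.
LEXICON = {
    "pit": {"manner": ("stop", "vowel", "stop"), "vowel": "I"},
    "pat": {"manner": ("stop", "vowel", "stop"), "vowel": "A"},
    "sit": {"manner": ("fricative", "vowel", "stop"), "vowel": "I"},
}


def stage_manner(segments):
    """Stage 1 stand-in: read off the cheap manner-of-articulation cues."""
    return tuple(segment["manner"] for segment in segments)


def stage_resonance(segments):
    """Stage 2 stand-in: read off the costlier resonance quality of the vowel."""
    return next(s["resonance"] for s in segments if s["manner"] == "vowel")


def select_cohort(manner):
    """Lexical cohort selection by manner class, driven by stage 1 output only."""
    return sorted(w for w, entry in LEXICON.items() if entry["manner"] == manner)


def recognise(segments):
    with ThreadPoolExecutor(max_workers=2) as pool:
        manner = stage_manner(segments)
        # Launch cohort selection as soon as stage 1 is done ...
        cohort_future = pool.submit(select_cohort, manner)
        # ... while stage 2 continues in the calling thread.
        resonance = stage_resonance(segments)
        cohort = cohort_future.result()
    # Narrow the cohort with the information from the later stage.
    return [word for word in cohort if LEXICON[word]["vowel"] == resonance]


segments = [
    {"manner": "stop", "resonance": None},
    {"manner": "vowel", "resonance": "I"},
    {"manner": "stop", "resonance": None},
]
print(recognise(segments))
```

The point of the sketch is the ordering: cohort selection needs only the partial identification from the first stage, so it does not have to wait for resonance analysis to finish.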
  </Section>
</Paper>