<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1313">
  <Title>Combining Utterance-Boundary and Predictability Approaches to Speech Segmentation</Title>
  <Section position="2" start_page="93" end_page="94" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The design of speech segmentation¹ methods has been much studied ever since Harris' seminal propositions (1955). Research conducted since the mid-1990s by cognitive scientists (Brent and Cartwright, 1996; Saffran et al., 1996) has established it as a paradigm of its own in the field of computational models of language acquisition.</Paragraph>
    <Paragraph position="1"> In this paper, we investigate two boundary-based approaches to speech segmentation. Such methods "attempt to identify individual word-boundaries in the input, without reference to words per se" (Brent and Cartwright, 1996). The first approach we discuss relies on the utterance-boundary strategy, which consists in reusing the information provided by the occurrence of specific phoneme sequences at utterance beginnings or endings in order to hypothesize boundaries inside utterances (Aslin et al., 1996; Christiansen et al., 1998; Xanthos, 2004).</Paragraph>
    <Paragraph position="2"> ¹ To avoid a latent ambiguity, it should be stated that speech segmentation refers here to a process taking as input a sequence of symbols (usually phonemes) and producing as output a sequence of higher-level units (usually words).</Paragraph>
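The utterance-boundary strategy can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the function names, the use of utterance beginnings only, the n-gram length n, the ratio used as a typicality score and the threshold are all assumptions made for illustration. Utterances are plain strings in which each character stands for one phoneme.

```python
from collections import Counter

def ngram_counts(utterances, n):
    """Count every n-gram and, separately, those that start an utterance."""
    total = Counter()
    initial = Counter()
    for utt in utterances:
        for i in range(len(utt) - n + 1):
            total[utt[i:i + n]] += 1
        if len(utt) >= n:
            initial[utt[:n]] += 1
    return total, initial

def initial_typicality(ngram, total, initial):
    """Relative frequency at utterance beginnings divided by overall relative frequency.

    This ratio is an illustrative assumption, not a definition taken from the paper.
    """
    if total[ngram] == 0 or not initial:
        return 0.0
    p_initial = initial[ngram] / sum(initial.values())
    p_overall = total[ngram] / sum(total.values())
    return p_initial / p_overall

def segment_by_typicality(utt, total, initial, n=2, threshold=1.0):
    """Return positions where the n-gram starting there looks utterance-initial."""
    return [i for i in range(1, len(utt) - n + 1)
            if initial_typicality(utt[i:i + n], total, initial) > threshold]

# Toy corpus: "ab" opens both utterances, "ba" never does.
corpus = ["ab", "abab"]
total, initial = ngram_counts(corpus, 2)
boundaries = segment_by_typicality("abab", total, initial)
```

On this toy corpus the only hypothesized boundary falls before the second "ab", because "ab" is over-represented at utterance beginnings while "ba" never occurs there.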
    <Paragraph position="3"> The second approach is based on the predictability strategy, which assumes that speech should be segmented at locations where some measure of the uncertainty about the next symbol (phoneme or syllable for instance) is high (Harris, 1955; Gammon, 1969; Saffran et al., 1996; Hutchens and Adler, 1998; Xanthos, 2003).</Paragraph>
    <Paragraph position="4"> Our implementation of the utterance-boundary strategy is based on n-gram statistics. It was previously found to perform a "safe" word segmentation, that is, one with rather high precision, but also an overly conservative one, as witnessed by its lower recall (Xanthos, 2004). As regards the predictability strategy, we have implemented an incremental interpretation of the classical successor count (Harris, 1955). This approach also relies on the observation of phoneme sequences, the length of which is however not restricted to a fixed value. Consequently, the memory load incurred by the successor count algorithm is expected to be higher than for the utterance-boundary approach, and its performance substantially better.</Paragraph>
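A rough Python sketch of a successor count in the spirit of Harris (1955) is given below. It is a batch simplification, not the incremental interpretation implemented in this paper: the function names, the table built over all observed phoneme sequences, the fixed threshold and the restart of the context after each cut are assumptions made for illustration. Storing successors for sequences of unrestricted length is what makes the table grow quickly.

```python
from collections import defaultdict

def build_successors(utterances):
    """Map every observed phoneme sequence to the set of symbols seen right after it."""
    successors = defaultdict(set)
    for utt in utterances:
        for start in range(len(utt)):
            for end in range(start + 1, len(utt)):
                successors[utt[start:end]].add(utt[end])
    return successors

def segment_by_successor_count(utt, successors, threshold=2):
    """Cut where the current context has at least `threshold` distinct successors.

    The context restarts after each cut; both choices are illustrative assumptions.
    """
    boundaries = []
    start = 0
    for i in range(1, len(utt)):
        count = len(successors.get(utt[start:i], ()))
        if count >= threshold:
            boundaries.append(i)
            start = i
    return boundaries

# Toy corpus: one character per phoneme, no word boundaries marked.
corpus = ["thedog", "thecat", "adog", "acat"]
table = build_successors(corpus)
cuts = segment_by_successor_count("thedog", table)
```

In this toy corpus the sequence "the" is followed by two different symbols, so a boundary is hypothesized after it, whereas "t" and "th" each have a single successor and trigger no cut.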
    <Paragraph position="5"> The experiments presented in this paper were inspired by the intuition that both algorithms could be combined in order to make the most of their respective strengths. The utterance-boundary typicality could be used as a computationally inexpensive preprocessing step, finding some true boundaries without inducing too many false alarms; then, the heavier machinery of the successor count would be used to accurately detect more boundaries, its burden being lessened as it would process the chunks produced by the first algorithm rather than whole utterances. We will show the results obtained for a word segmentation task on a phonetically transcribed and child-oriented French corpus, focusing on the effect of the preprocessing step on precision and recall, as well as its impact on memory load and processing time.</Paragraph>
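The two-stage idea and its evaluation can be sketched as follows. This is an illustration only: the function names and the stand-in second-stage segmenter are hypothetical, and boundary precision and recall are computed here over sets of boundary positions, which is assumed rather than taken from the experimental setup of the paper.

```python
def refine(utterance, coarse_cuts, fine_segmenter):
    """Run fine_segmenter on each chunk delimited by coarse_cuts and merge all boundaries."""
    cuts = set(coarse_cuts)
    edges = [0] + sorted(coarse_cuts) + [len(utterance)]
    for left, right in zip(edges, edges[1:]):
        chunk = utterance[left:right]
        # Positions returned for a chunk are relative to it, so shift them by `left`.
        cuts.update(left + c for c in fine_segmenter(chunk))
    return sorted(cuts)

def precision_recall(predicted, gold):
    """Boundary precision and recall over sets of boundary positions."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted.intersection(gold))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

def toy_fine_segmenter(chunk):
    """Hypothetical stand-in for the second-stage algorithm."""
    return [3] if chunk == "dogruns" else []

utterance = "thedogruns"   # gold words: the / dog / runs
gold = [3, 6]
coarse = [3]               # a cautious first pass that finds one true boundary
combined = refine(utterance, coarse, toy_fine_segmenter)
before = precision_recall(coarse, gold)
after = precision_recall(combined, gold)
```

In this toy case the first pass alone is precise but misses a boundary, and the second pass, which only ever sees the shorter chunks, recovers it.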
    <Paragraph position="6"> The next section is devoted to the formal definition of both algorithms. Section 3 discusses some issues related to the space and time complexity they involve. The experimental setup as well as the results of the simulations are described in Section 4, and in conclusion we will summarize our findings and suggest directions for further research.</Paragraph>
  </Section>
</Paper>