<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2047">
  <Title>Improvements in the Stochastic Segment Model for Phoneme Recognition</Title>
  <Section position="4" start_page="332" end_page="332" type="metho">
    <SectionTitle>
TIME CORRELATION
MODELLING
</SectionTitle>
    <Paragraph position="0"> Preliminary experiments using a full covariance structure for p(X|a) did not improve performance over the block-diagonal structure used when time samples are assumed to be uncorrelated. We believe that this was due to insufficient training data, since a full covariance model has roughly m times as many free parameters as a block-diagonal one, and since the particular task on which the SSM was tested had a very limited amount of training data. Our efforts have focused on two parallel approaches to the training data problem: devising an effective parameter reduction method, and constraining the structure of the covariance matrix to further reduce the number of parameters to estimate.</Paragraph>
  </Section>
  <Section position="5" start_page="332" end_page="332" type="metho">
    <SectionTitle>
PARAMETER REDUCTION
</SectionTitle>
    <Paragraph position="0"> A first step towards the incorporation of time correlation in the SSM is parameter reduction; an obvious candidate is the method of linear discriminants \[Wilks 1962\].</Paragraph>
    <Paragraph position="1"> Our intuition suggested that sample-dependent reduction would outperform a single transformation for reduction.</Paragraph>
    <Paragraph position="2"> In fact, contrary to other results \[Brown 1987\], the single transformation yielded poor performance (see Section 3).</Paragraph>
    <Paragraph position="3"> Linear discriminant parameter reduction for the SSM was implemented using sample-dependent transformations as follows. The speech segment X = [z_1 z_2 ... z_m]^T is replaced by a sequence of linear transformations of the m original cepstral vectors, X~ = [z~_1 z~_2 ... z~_m]^T = [R(1)z_1 R(2)z_2 ... R(m)z_m]^T. The transformation matrices, R(i), are obtained by solving m generalized eigenvalue problems, S_B^(i) v = lambda S_W^(i) v, i = 1, 2, ..., m. The classes used at each time instant from 1 to m are the phonemes, but the between- and within-class scatter matrices (S_B^(i) and S_W^(i), respectively) are computed using only the observations for that specific sample.</Paragraph>
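The per-sample discriminant transforms described above can be sketched in Python. The scatter-matrix construction and the generalized eigenvalue problem follow the standard linear discriminant recipe; the function name, the data layout, and the small ridge term added for numerical stability are assumptions of this sketch, not details from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_transform(frames, labels, p):
    """Sample-dependent linear discriminant reduction (a sketch).

    frames: (N, d) cepstral vectors observed at one time position i
            across N training segments.
    labels: (N,) phoneme class label for each segment.
    p:      number of discriminant directions to keep.
    Returns R(i), a (p, d) transformation matrix.
    """
    mean = frames.mean(axis=0)
    d = frames.shape[1]
    S_w = np.zeros((d, d))   # within-class scatter for this sample
    S_b = np.zeros((d, d))   # between-class scatter for this sample
    for c in np.unique(labels):
        x = frames[labels == c]
        mc = x.mean(axis=0)
        S_w += (x - mc).T @ (x - mc)
        S_b += len(x) * np.outer(mc - mean, mc - mean)
    # Generalized eigenvalue problem S_b v = lambda S_w v;
    # eigh returns eigenvalues in ascending order, so reverse.
    w, v = eigh(S_b, S_w + 1e-8 * np.eye(d))  # ridge for stability
    return v[:, ::-1][:, :p].T   # top-p discriminants as rows of R(i)
```

For each time position i = 1, ..., m, the function would be called on the cepstral vectors observed at that position only, yielding a different R(i) per sample.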
  </Section>
  <Section position="6" start_page="332" end_page="334" type="metho">
    <SectionTitle>
MARKOVIAN ASSUMPTION
</SectionTitle>
    <Paragraph position="0"> Structure. In order to obtain the advantages of parameterization of time correlation without the m-fold increase in the number of parameters of the full-covariance case, we consider a constrained structure for the covariance matrix. More specifically, we assume that the density of the unobserved segment is that of a non-homogeneous Markov process.</Paragraph>
    <Paragraph position="2"> Under this hypothesis, the number of parameters that have to be estimated increases by less than a factor of 3 over the block-diagonal case (see Table 1). Furthermore, by introducing the Markov restriction to the covariance matrix, we shall also see in the following section that we can simplify the reestimation formulas for the Estimate-Maximize (EM) algorithm.</Paragraph>
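A quick way to check the claimed factor-of-3 bound is to count free covariance parameters directly. The sketch below assumes a segment of m frames with d features each, a block-diagonal model with one symmetric d x d block per frame, and a Markov (block-tridiagonal) model that adds one full d x d cross-covariance block per adjacent frame pair; these modelling details are inferred from the text, not quoted from it.

```python
def covariance_parameter_counts(m, d):
    """Free covariance parameters per phoneme model (a sketch;
    m frames per segment, d features per frame)."""
    sym = d * (d + 1) // 2          # one symmetric d x d block
    block_diag = m * sym            # frames assumed uncorrelated
    # Markov: diagonal blocks plus one full cross-covariance
    # block C(i, i+1) per adjacent frame pair.
    markov = m * sym + (m - 1) * d * d
    full = (m * d) * (m * d + 1) // 2
    return block_diag, markov, full
```

For example, with m = 5 and d = 14 (hypothetical values), the Markov model has 1309 covariance parameters against 525 for the block-diagonal model, a factor of about 2.5, consistent with the stated bound of 3.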
    <Section position="1" start_page="333" end_page="334" type="sub_section">
      <SectionTitle>
Parameter Estimation
</SectionTitle>
      <Paragraph position="0"> As mentioned earlier, we assume that the observed segment Y is a &amp;quot;downsampled&amp;quot; version of the underlying fixed-length &amp;quot;hidden&amp;quot; segment, X. We used two different approaches for the parameter estimation problem.</Paragraph>
      <Paragraph position="1"> Linear Time Upsampling. The first approach, linear time upsampling, interpolates an unobserved sample z_i of the underlying sequence X by mapping that point to an observed frame, y_j, with a linear-in-time warping transformation of the observed length k to the fixed length m. The disadvantage of this method is that linear time upsampling introduces a correlation problem when models with non-independent frames are assumed, and in \[Roucos et al 1988\] better results were reported when the parameter estimates were obtained with the EM algorithm. However, in the case of frame-dependent transformations, a missing observation is not interpolated by an adjacent one, but by a different transformation of that observation. This partially eliminates the correlation problem.</Paragraph>
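The linear-in-time warp from observed length k to fixed length m can be made concrete with a small sketch. Zero-based indices and the rounding rule are assumptions here (and k is assumed to be at least 2); the text only specifies that each observation is mapped to the closest hidden sample in time.

```python
def linear_time_assignment(k, m):
    """Map each observed frame y_j (j = 0..k-1) to the nearest
    hidden sample z_i under a linear-in-time warp of the observed
    length k onto the fixed length m (0-based indices; k >= 2).
    Returns (assignment, missing): assignment[j] = i, plus the
    hidden indices that receive no observation ("missing").
    """
    assignment = [round(j * (m - 1) / (k - 1)) for j in range(k)]
    missing = sorted(set(range(m)) - set(assignment))
    return assignment, missing
```

With k = 4 and m = 7, the observations map to hidden indices 0, 2, 4, 6, leaving samples 1, 3, 5 missing, which is exactly the situation the EM algorithm must handle.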
      <Paragraph position="2"> Estimate-Maximize Algorithm for the Markovian Case. A second approach to the parameter estimation problem is to use the Estimate-Maximize (EM) algorithm to obtain a maximum likelihood solution under the assumptions given in Section 2.2.1. As defined in Section 1, X represents the sequence of the incomplete data and Y that of the observed data. In this case, the observed length k is mapped through a linear time warping transformation to the fixed length m, and each observation y_j is assigned to the closest in time z_i. Under the assumption that m is always greater than k, there are certain elements of X that have no elements of Y assigned to them, and we refer to them as &amp;quot;missing&amp;quot;. Let mu_i^r and C_ij^r denote the estimates at the r-th iteration of the mean vector of the i-th sample and the cross-covariance between the i-th and j-th samples, respectively. Then the steps of the EM algorithm are: 1. Estimate step: estimate the following complete data statistics for each frame or appropriate combination of frames and each observation:</Paragraph>
      <Paragraph position="4"> where j = i or j = i + 1, ' denotes transposition, and E^r(.) is the expectation operator using the r-th iteration density estimates. Under the assumption that the observations form a Markov chain, we have that</Paragraph>
      <Paragraph position="6"> where k and l are the immediately previous and next nonmissing elements of X relative to i and j. Let mu_i be the mean of z_i and C_ij be the covariance of z_i and z_j. Assuming Gaussian densities, the conditional expectations become</Paragraph>
      <Paragraph position="8"> and the V_ik, V_il, and Sigma_i matrices are obtained from the matrix inversion by partitioning lemma.</Paragraph>
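The conditional expectations used in the Estimate step are instances of standard Gaussian conditioning, which is where the matrix inversion by partitioning lemma enters. A minimal sketch, with hypothetical argument names, for a single missing sample conditioned on its stacked observed neighbours:

```python
import numpy as np

def conditional_gaussian(mu_i, mu_obs, C_i, C_i_obs, C_obs, y_obs):
    """E[z_i | observed neighbours] and its covariance for jointly
    Gaussian variables (standard Gaussian conditioning, i.e. the
    matrix-inversion-by-partitioning identities).

    mu_i     : (d,)   prior mean of the missing sample z_i
    mu_obs   : (q,)   stacked mean of the conditioning samples
    C_i      : (d,d)  prior covariance of z_i
    C_i_obs  : (d,q)  cross-covariance of z_i with the others
    C_obs    : (q,q)  covariance of the conditioning samples
    y_obs    : (q,)   their observed values
    """
    gain = C_i_obs @ np.linalg.inv(C_obs)       # the V-type matrices
    cond_mean = mu_i + gain @ (y_obs - mu_obs)
    cond_cov = C_i - gain @ C_i_obs.T           # Schur complement
    return cond_mean, cond_cov
```

In the Markov case the conditioning set reduces to the immediately previous and next nonmissing samples, so C_obs and C_i_obs stay small regardless of the segment length.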
      <Paragraph position="9">  where T is the set of all observations for a certain phoneme. In order to simplify the reestimation formulas, we also investigated a &amp;quot;forward prediction&amp;quot; type approximation, where the expectations of the r-th step are conditioned only on the last observed sample instead of both the last and the next:</Paragraph>
      <Paragraph position="11"/>
    </Section>
  </Section>
</Paper>