<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1014">
  <Title>Language Modeling with Sentence-Level Mixtures</Title>
  <Section position="4" start_page="83" end_page="84" type="evalu">
    <SectionTitle>
3. EXPERIMENTS
</SectionTitle>
    <Paragraph position="0"> The corpus used for training the different component models comprised the 38 million WSJ0 data, as well as the 38 million word augmented LM data obtained from BBN Inc. The vocabulary is the standard 5K non-verbalized pronunciation (NVP) data augmented with the verbalized punctuation words and a few additional words. In order to compute the mixture weights, both at the trigram-level as well as the sentencelevel, the WSJ1 speaker-independent transcriptions serve as  the &amp;quot;held out&amp;quot; da!a set. Because we felt that the training data may not accurately represent the optional verbalized punctuation frequency in the WS.I1 data, we chose to train models on two dat~ sets. The general model Pa and the component models Pt were trained on the WSJ0 NVP data augnlented by the BBN data. The general model Pa, was trained on the WSJ0 verbalized pronunciation data, so that using Po, in smoothing the component models also provides a simple means of allowing for verbalized pronunciation.</Paragraph>
    <Paragraph position="1"> The experiments compare a single trigram language model to a five-component mixture of trigram models. To explore the trade-offs of using different numbers of clusters, we also consider an eight-component trigram mixture. Perplexity and recognition results are reported on the Nov. '93 ARPA development and evaluation 5k vocabulary WSJ test sets.</Paragraph>
    <Section position="1" start_page="84" end_page="84" type="sub_section">
      <SectionTitle>
3.1. Recognition Paradigm
</SectionTitle>
      <Paragraph position="0"> The BU Stochastic Segment Model recognition system is combined with the BBN BYBLOS system and uses the N-best resc0ring formalism \[10\]. In this formalism, the BYBLOS system, a speaker-independent Hidden Markov Model System \[14\] 1, is used to compute the top N sentence hypotheses of which the top 100 are subsequently rescored by the SSM.</Paragraph>
      <Paragraph position="1"> A five-pass search strategy is used to generate the N-best hypotheses, and these are re.scored with thirteen state HMMs.</Paragraph>
      <Paragraph position="2"> A weighted combination of scores from different knowledge sources is used to re-rank the hypotheses and the top ranking hypothesis is used as the recognized output. The weights for recombination are estimated on one test set (in this case the  on the '93 ARPA 5k WSJ development test set.</Paragraph>
      <Paragraph position="3"> The next series of experiments, summarized in Table 22 , compared recognition performance for the BBN trigram language model \[15\], the BU 5-component mixture model, and the case where both language model scores are used in the N-best reranking. All language models were estimated from the same training a,!8: The results show a 7% reduction in error rate on the evaluation test set, comparing the combined language models to the BBN trigram. It is interesting that the combination of the trigram and the mixture model yielded a small improvement in performance (not significant, but consistent across lest sets), since the trigram is a component of the mixture model. The difference between the mixture model and the two combined models corresponds to a linear vs. non-linear combination of component probabilities, respectively.</Paragraph>
      <Paragraph position="4"> For reference, we also include the best case system performance, which corresponds the the case where all acoustic and language model scores. Even with all the acoustic model scores, adding the mixture language model improves performance, giving a best case result of 5.3% word error on the '93  We conducted a series of experiments in the rescoring paradigm to assess the usefulness of the mixture model. Unless otherwise noted, the only acoustic model score used was based on the stochastic segment model. The language model scores used varied with the experiments. For the best-case system, we used all scores, which included the SSM and the BBN Byblos HMM and SNN acoustic scores, and both the BBN trigram and BU mixture language model scores.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="84" end_page="84" type="evalu">
    <SectionTitle>
4. DISCUSSION
</SectionTitle>
    <Paragraph position="0"> In summary, this paper presents a new approach to language modeling, which offers the potential for capturing both topic-dependent effects and long-range sentence level effects in 2The performance figul~ quoted here are better than throe repoaed in the official November 1993 WSJ benchmark results, because more language model training data was available in the experimant repoNe.d here.</Paragraph>
  </Section>
class="xml-element"></Paper>