<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4006"> <Title>Language model adaptation with MAP estimation and the perceptron algorithm</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 3 Experimental Results </SectionTitle> <Paragraph position="0"> We evaluated the language model adaptation algorithms by measuring the transcription accuracy of an adapted voicemail transcription system on voicemail messages received at a customer care line of a telecommunications network center. The initial voicemail system, named Scanmail, was trained on general voicemail messages collected from the mailboxes of people at our research site in Florham Park, NJ. The target domain is also composed of voicemail messages, but for a mailbox that receives messages from customer care agents regarding network outages. In contrast to the general voicemail messages in the training corpus of the Scanmail system, the messages from the target domain, named SSNIFR, are focused solely on network-related problems. They contain frequent mention of various network-related acronyms and trouble ticket numbers that are rarely (if ever) found in the training corpus of the Scanmail system.</Paragraph> <Paragraph position="1"> To evaluate the transcription accuracy, we used a multi-pass speech recognition system that employs various unsupervised speaker and channel normalization techniques. An initial search pass produces word-lattice output that is used as the grammar in subsequent search passes. The system is almost identical to the one described in detail in (Bacchiani, 2001). The main differences in terms of the acoustic model are the use of linear discriminant analysis features; the use of a 100-hour rather than a 60-hour training set; and the modeling of speaker gender, which in this system is identical to that described in (Woodland and Hain, 1998). 
Note that the acoustic model is appropriate for either domain, as the messages are collected on a voicemail system of the same type. This parallels the experiments in (Lamel et al., 2002), where the focus was on AM adaptation in the case where the LM was deemed appropriate for either domain.</Paragraph> <Paragraph position="2"> The language model of the Scanmail system is a Katz backoff trigram, trained on approximately 100 hours of hand-transcribed voicemail messages (1 million words).</Paragraph> <Paragraph position="3"> The model contains 13460 unigram, 175777 bigram, and 495629 trigram probabilities. The lexicon of the Scanmail system contains 13460 words and was compiled from all the unique words found in the 100 hours of transcripts of the Scanmail training set.</Paragraph> <Paragraph position="4"> For every experiment, we report the accuracy of the one-best transcripts obtained at two stages of the recognition process: after the first-pass lattice construction, and after vocal tract length normalization and gender modeling (VTLN), constrained model-space adaptation (CMA), and maximum likelihood linear regression (MLLR) adaptation. Results after the first pass will be denoted FP; results after VTLN, CMA, and MLLR will be denoted MP.</Paragraph> <Paragraph position="5"> For the SSNIFR domain we have available a 1-hour manually transcribed test set (10819 words) and approximately 17 hours of manually transcribed adaptation data (163343 words). In all experiments, the vocabulary of the system is left unchanged. Generally, for a domain shift this can raise the error rate significantly due to an increase in the OOV rate. 
However, this increase in error rate is limited in these experiments, because the majority of the new domain-dependent vocabulary consists of acronyms, which are covered by the Scanmail vocabulary through individual letters. [Table 1: …systems obtained by supervised LM adaptation on the 17-hour adaptation set using the two methods, versus the baseline out-of-domain system.]</Paragraph> <Paragraph position="6"> The OOV rate of the SSNIFR test set, using the Scanmail vocabulary, is 2%.</Paragraph> <Paragraph position="7"> Following (Bacchiani and Roark, 2003), θ in Eq. 2 is set to 0.2 for all reported MAP estimation trials. Following (Roark et al., 2004), λ in Eq. 4 is also (coincidentally) set to 0.2 for all reported perceptron trials. For the perceptron algorithm, approximately 10 percent of the training data is reserved as a held-out set for deciding when to stop the algorithm.</Paragraph> <Paragraph position="8"> Table 1 shows the results using MAP estimation and the perceptron algorithm independently. For the perceptron algorithm, the baseline Scanmail system was used to produce the word lattices used in estimating the feature weights. There are two ways to do this: one is to use the lattices produced after FP; the other is to use the lattices produced after MP.</Paragraph> <Paragraph position="9"> These results show two things. First, MAP estimation on its own is clearly better than the perceptron algorithm on its own. Since the MAP model is used in the initial search pass that produces the lattices, it can consider all possible hypotheses. In contrast, the perceptron algorithm is limited to the hypotheses available in the lattice produced with the unadapted model.</Paragraph> <Paragraph position="10"> Second, training the perceptron model on FP lattices and applying that perceptron at each decoding step outperformed training on MP lattices and applying the perceptron only at that decoding step. 
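As a concrete illustration of the perceptron half of this comparison, the sketch below trains a discriminative reranker over recognizer hypotheses, combines its score with the baseline model score scaled by λ = 0.2, and uses a held-out set to decide when to stop, broadly following the procedure described above. This is a minimal sketch under simplifying assumptions, not the authors' implementation: it reranks n-best lists rather than word lattices, and the feature map `ngram_feats`, the function names, and the toy data are all hypothetical.

```python
from collections import Counter

def ngram_feats(words):
    """Hypothetical feature map: unigram and bigram counts of the word string."""
    w = ["<s>"] + words + ["</s>"]
    f = Counter((tok,) for tok in w)
    f.update(zip(w, w[1:]))
    return f

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (the error count behind WER)."""
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (h != r))
    return d[-1]

def rerank(weights, nbest, lam=0.2):
    """Pick the hypothesis maximizing baseline score + lam * perceptron score."""
    return max(nbest, key=lambda hb: hb[1] + lam * sum(
        weights.get(k, 0.0) * v for k, v in ngram_feats(hb[0]).items()))[0]

def train_perceptron(train, heldout, epochs=5, lam=0.2):
    """train/heldout: lists of (nbest, reference), where each nbest entry
    is (hypothesis_words, baseline_model_score)."""
    weights, best = Counter(), (float("inf"), Counter())
    for _ in range(epochs):
        for nbest, ref in train:
            # Oracle: the hypothesis closest to the reference transcript.
            oracle = min(nbest, key=lambda hb: edit_distance(hb[0], ref))[0]
            guess = rerank(weights, nbest, lam)
            if guess != oracle:                       # standard perceptron update
                weights.update(ngram_feats(oracle))
                weights.subtract(ngram_feats(guess))
        # Held-out error decides when to stop, as described in the text.
        errs = sum(edit_distance(rerank(weights, nb, lam), ref)
                   for nb, ref in heldout)
        if errs < best[0]:
            best = (errs, Counter(weights))
    return best[1]
```

In the real system the oracle and the update are computed over lattice paths rather than n-best lists, and averaged parameters are typically used; the held-out stopping structure is the same.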
This demonstrates the benefit of better transcripts for the unsupervised adaptation steps.</Paragraph> <Paragraph position="11"> The benefit of MAP adaptation that leads to its superior performance in Table 1 suggests a hybrid approach that uses MAP estimation to ensure that good hypotheses are present in the lattices, and the perceptron algorithm to further reduce the WER. Within the multi-pass recognition approach, several scenarios could be considered to implement this combination. We investigate two here.</Paragraph> <Paragraph position="12"> For each scenario, we split the 17-hour adaptation set into four roughly equal-sized subsets. In the first scenario, we produced a MAP estimated model on the first 4.25-hour subset, and produced word lattices on the other three subsets for use with the perceptron algorithm. Table 2 shows the results for this training scenario.</Paragraph> <Paragraph position="13"> [Table 2: systems obtained by supervised LM adaptation on the 17-hour adaptation set using the first method of combining the two methods, versus the baseline out-of-domain system.]</Paragraph> <Paragraph position="14"> A second scenario involves making use of all of the adaptation data for both MAP estimation and the perceptron algorithm. As a result, it requires a more complicated control of the baseline models used for producing the word lattices for perceptron training. For each of the four subsets of the adaptation data, we produced a baseline MAP estimated model using the other three subsets. Using these models, we produced training lattices for the perceptron algorithm for the entire adaptation data set. At test time, we used the MAP estimated model trained on the entire adaptation set, as well as the perceptron model trained on the entire set. 
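The control structure of this second scenario can be sketched as a simple round-robin: each subset's training lattices are produced by a model that never saw that subset's transcripts. In the sketch below, `map_adapt` and `make_lattices` are hypothetical stand-ins for the actual MAP estimation and decoding steps; a toy "model" that merely records its training utterances makes the property easy to check.

```python
def round_robin_lattices(subsets, map_adapt, make_lattices):
    """For each adaptation subset, estimate a MAP-adapted model on the
    other subsets and use it to decode the held-out subset, so that no
    training lattice is produced by a model that saw its own transcripts."""
    lattices = []
    for i, held_out in enumerate(subsets):
        others = [utt for j, s in enumerate(subsets) if j != i for utt in s]
        lattices.extend(make_lattices(map_adapt(others), held_out))
    return lattices
```

With four subsets this yields four leave-one-out models for lattice generation, while the final test-time system uses a single MAP model estimated on all of the adaptation data, as described above.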
The results for this training scenario are shown in Table 3.</Paragraph> <Paragraph position="15"> Both of these hybrid training scenarios demonstrate a small improvement from using the perceptron algorithm on FP lattices rather than MP lattices. Closely matching the testing condition during perceptron training is important: applying a perceptron trained on MP lattices to FP lattices hurts performance. Iterative training did not produce further improvements: training a perceptron on MP lattices produced by using both MAP estimation and a perceptron trained on FP lattices achieved no improvement over the 19.6 percent WER shown above.</Paragraph> </Section></Paper>