<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1011">
<Title>PRECISE N-GRAM PROBABILITIES FROM STOCHASTIC CONTEXT-FREE GRAMMARS</Title>
<Section position="9" start_page="77" end_page="77" type="evalu">
<SectionTitle>EXPERIMENTS</SectionTitle>
<Paragraph position="0">The algorithm described here has been implemented and is being used to generate bigrams for a speech recognizer that is part of the BeRP spoken-language system (Jurafsky et al., 1994). An early prototype of BeRP was used in an experiment to assess the benefit of using bigram probabilities obtained through SCFGs versus estimating them directly from the available training corpus.4 The system's domain is inquiries about restaurants in the city of Berkeley. The training corpus used had only 2,500 sentences, with an average length of about 4.8 words per sentence.</Paragraph>
<Paragraph position="1">Our experiments made use of a context-free grammar hand-written for the BeRP domain. With 1,200 rules and a vocabulary of 1,100 words, this grammar was able to parse 60% of the training corpus. Computing the bigram probabilities from this SCFG takes about 24 hours on a SPARCstation 2-class machine.5 In experiment 1, the recognizer used bigrams that were estimated directly from the training corpus, without any smoothing, resulting in a word error rate of 35.1%. In experiment 2, a different set of bigram probabilities was used, computed from the context-free grammar, whose probabilities had previously been estimated from the same training corpus using standard EM techniques. This resulted in a word error rate of 35.3%. This may seem surprisingly good given the low coverage of the underlying CFG, but notice that the conversion into bigrams is bound to result in a less constraining language model, effectively increasing coverage.
Finally, in experiment 3, the bigrams generated from the SCFG were augmented by those from the raw training data, in a proportion of 200,000 : 2,500. We have not attempted to optimize this mixture proportion, e.g., by deleted interpolation (Jelinek and Mercer, 1980).6 With the bigram estimates thus obtained, the word error rate dropped to 33.5%. (All error rates were measured on a separate test corpus.) The experiment therefore supports our earlier argument that more sophisticated language models, even if far from perfect, can improve n-gram estimates obtained directly from sample data.</Paragraph>
<Paragraph position="2">4 Corpus and grammar sizes, as well as the recognition performance figures reported here, are not up-to-date with respect to the latest version of BeRP. For ACL-94 we expect to have revised results available that reflect the current performance of the system.
In experiments predating the method described in this paper, bigrams had to be estimated from the SCFG by random sampling. Generating 200,000 sentence samples was found to give good converging estimates for the bigrams. The bigrams from the raw training sentences were then simply added to the randomly generated ones. We later verified that the bigrams estimated from the SCFG were indeed identical to the ones computed directly using the method described here.</Paragraph>
</Section>
</Paper>
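The combination used in experiment 3 amounts to adding the SCFG-derived bigram statistics and the raw training counts in a fixed proportion (200,000 : 2,500) and renormalizing. The following Python sketch illustrates one way this could be done; the function name, the dictionary representation of bigram statistics, and the assumption that the SCFG estimates are given as expected bigram frequencies per sentence are ours for illustration, not taken from the BeRP implementation.

from collections import defaultdict

def mix_bigram_counts(scfg_bigram_expectations, corpus_bigram_counts, scfg_weight=200_000):
    # Scale the SCFG's expected per-sentence bigram frequencies as if they had
    # been observed in scfg_weight sentences, then add the raw corpus counts
    # (here, counts from the 2,500 training sentences) without reweighting.
    counts = defaultdict(float)
    for bigram, expected_per_sentence in scfg_bigram_expectations.items():
        counts[bigram] += expected_per_sentence * scfg_weight
    for bigram, c in corpus_bigram_counts.items():
        counts[bigram] += c
    # Renormalize into conditional bigram probabilities P(w2 | w1).
    totals = defaultdict(float)
    for (w1, _), c in counts.items():
        totals[w1] += c
    return {(w1, w2): c / totals[w1] for (w1, w2), c in counts.items()}

As in the paper, no deleted interpolation or other optimization of the mixture proportion is attempted here; the two sources of counts are simply added.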
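The final footnote describes an earlier approach in which bigrams were estimated from the SCFG by randomly sampling 200,000 sentences. A minimal Monte Carlo sketch of that idea is given below, assuming a toy grammar representation (a dictionary mapping each nonterminal to weighted right-hand sides); this format and the sentence-boundary markers are illustrative assumptions, not the representation used in BeRP.

import random
from collections import defaultdict

def sample_sentence(grammar, symbol="S", max_depth=50):
    # Expand `symbol` top-down; symbols without a grammar entry are terminals.
    if symbol not in grammar:
        return [symbol]
    if max_depth == 0:
        return []  # crude guard against runaway recursion
    rules, probs = zip(*grammar[symbol])
    rhs = random.choices(rules, weights=probs, k=1)[0]
    words = []
    for sym in rhs:
        words.extend(sample_sentence(grammar, sym, max_depth - 1))
    return words

def bigrams_by_sampling(grammar, n_samples=200_000):
    # Estimate P(w2 | w1) by generating sentences and counting adjacent word
    # pairs, with boundary markers so initial and final bigrams are captured.
    counts = defaultdict(float)
    totals = defaultdict(float)
    for _ in range(n_samples):
        words = ["<s>"] + sample_sentence(grammar) + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            counts[(w1, w2)] += 1.0
            totals[w1] += 1.0
    return {(w1, w2): c / totals[w1] for (w1, w2), c in counts.items()}

For example, with a toy grammar such as {"S": [(("i", "want", "NP"), 0.7), (("hello",), 0.3)], "NP": [(("food",), 1.0)]}, bigrams_by_sampling(grammar) converges toward the bigram probabilities implied by the SCFG; the exact algorithm of the paper computes these same quantities directly, without sampling error.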