<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2028">
<Title>LM Studies on Filled Pauses in Spontaneous Medical Dictation Jochen Peters</Title>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle> 4 Experimental results </SectionTitle>
<Paragraph position="0"> The three approaches are evaluated in terms of the overall perplexity (PP) and local values: PP_FP and PP_word are measured at FP and word positions only, and PP_after-FP and PP_after-word are measured at the positions immediately thereafter.</Paragraph>
<Paragraph position="1"> The results in Table 2 show that discarding FP from the history clearly improves performance (approach 2 versus 1).</Paragraph>
<Paragraph position="2"> The overall PP is reduced by 4-5%. Large reductions of 30-40% are found at positions immediately following FP.</Paragraph>
<Paragraph position="3"> This, and the improvements as we go from bi- to trigrams (which are contrary to (Siu and Ostendorf, 1996)), indicates that sentences are, on average, continued after FP. Using merged counts further improves our LMs. Gains are (almost) additive to those from FP-skipping. In particular, PP_after-FP decreases by another 10% for approach 2, which shows that the "recovered" FP-free M-grams are indeed valuable if we use FP-free histories.</Paragraph>
<Paragraph position="4"> A comparison of PP_after-FP and PP_after-word confirms the common knowledge that word prediction after FP is particularly hard. Even the unigram perplexity is almost 50% higher for words following FP than for words following fluent contexts. This supports (Shriberg and Stolcke, 1996), where the reduced predictability after FP is partly attributed to the words chosen in those positions.</Paragraph>
<Paragraph position="5"> For trigrams, the discrepancy between PP_after-FP and PP_after-word is much larger. Asking "how unexpected is a word in a given context?", we evaluated the entropy $H(h_i) = -\sum_{w} p_{LM}(w \mid h_i) \log p_{LM}(w \mid h_i)$ and the rank $R_i$ of $w_i$ following $h_i$ in the distribution $p_{LM}(\cdot \mid h_i)$.</Paragraph>
<Paragraph position="6"> Both quantities were averaged over histories $h_i$ ending on FP or on words.¹ Note that $e^{H_{mean}}$ represents a perplexity for the case that words following each history are distributed according to $p_{LM}(\cdot \mid h)$. An actually measured PP above $e^{H_{mean}}$ indicates a bias in the corpus towards words with low $p_{LM}(w \mid h)$. The results in Table 3 show almost no such bias after words. After FP, however, the following words are clearly biased towards low probabilities within the trigram distributions. Also, the mean ranks are considerably higher after FP than after words.</Paragraph>
<Paragraph position="7"> Together, these findings support our impression that FP often represents a hesitation where the speaker is searching for a less common word or formulation.</Paragraph>
<Paragraph position="8"> Table 2 (perplexities; the "size" row gives the number of evaluation positions per column):
LM range | Appr.   | Counts  | PP_overall   | PP_FP      | PP_word       | PP_after-FP   | PP_after-word
size     |         |         | 81 k         | 5 k        | 76 k          | 5 k           | 76 k
Unigram  | 1. = 2. | with FP | 786.5 ± 14.0 | 12.4 ± 0.0 | 1042.2 ± 17.9 | 1136.8 ± 85.7 | 767.1 ± 14.0
         | 3.      | FP-free | 786.4 ± 14.0 | 12.5 ± 0.0 | 1041.7 ± 17.9 | 1136.3 ± 85.6 | 767.0 ± 14.0
Bigram   | 1.      | with FP | 115.6 ± 2.4  | 11.0 ± 0.2 | 135.7 ± 3.0   | 957.5 ± 76.0  | 100.2 ± 2.2
         | 2.      | with FP | 112.0 ± 2.4  | 11.1 ± 0.2 | 131.0 ± 2.9   | 579.3 ± 50.6  | 100.2 ± 2.2
         | 3.      | FP-free | 110.9 ± 2.3  | 12.5 ± 0.0 | 128.6 ± 2.8   | 503.5 ± 42.6  | 100.1 ± 2.1
Trigram  | 1.      | with FP | 61.4 ± 1.4   | 10.4 ± 0.2 | 69.3 ± 1.6    | 605.9 ± 49.9  | 52.6 ± 1.2
         | 1.      | merged  | 60.3 ± 1.3   | 9.8 ± 0.2  | 68.2 ± 1.6    | 646.3 ± 53.2  | 51.4 ± 1.1
         | 2.      | with FP | 59.2 ± 1.3   | 10.9 ± 0.2 | 66.4 ± 1.5    | 427.2 ± 39.8  | 51.8 ± 1.2
         | 2.      | merged  | 57.5 ± 1.2   | 11.4 ± 0.2 | 64.2 ± 1.5    | 383.6 ± 34.5  | 50.5 ± 1.1
         | 3.      | FP-free | 57.9 ± 1.2   | 12.5 ± 0.0 | 64.3 ± 1.5    | 367.0 ± 33.0  | 51.1 ± 1.1

Table 3 (bias towards low-probability words and mean ranks, after FP versus after words):
LM      | Appr.   | PP_after-FP / e^{H_mean} | PP_after-word / e^{H_mean} | mean rank after FP | mean rank after word
Unigram | 1. = 2. | 1.6                      | 1.1                        | 1301               | 881
Trigram | 1.      | 2.6                      | 1.2                        | 1050               | 336
Trigram | 2.      | 5.1                      | 1.2                        | 719                | 335</Paragraph>
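As an illustration of how the quantities reported in Table 3 can be accumulated, the following minimal Python sketch (not the code used in the paper) computes the measured local perplexity, e^{H_mean}, the PP / e^{H_mean} ratio, and the mean rank separately for positions after FP and after words. The tokenized corpus, the callable lm_dist(h) returning the full distribution p_LM(. | h) as a dictionary, and the FP token symbol "<FP>" are all assumptions made for the example; OOV handling is omitted.

import math
from collections import defaultdict

FP = "<FP>"  # hypothetical symbol for the filled-pause token


def entropy_and_rank_stats(tokens, lm_dist, order=3):
    """For each position i, classify the history h_i by its last token (FP vs. word)
    and accumulate the entropy H(h_i) = -sum_w p(w|h_i) log p(w|h_i), the rank of the
    observed word w_i within p_LM(.|h_i), and its log-probability."""
    H_sum = defaultdict(float)
    rank_sum = defaultdict(float)
    logp_sum = defaultdict(float)
    n = defaultdict(int)

    for i in range(1, len(tokens)):
        history = tuple(tokens[max(0, i - order + 1):i])
        cls = "after_FP" if tokens[i - 1] == FP else "after_word"
        dist = lm_dist(history)  # assumed: full distribution p_LM(. | h) as {word: prob}

        H = -sum(p * math.log(p) for p in dist.values() if p > 0.0)
        p_obs = dist[tokens[i]]
        rank = 1 + sum(1 for p in dist.values() if p > p_obs)  # rank 1 = most probable word

        H_sum[cls] += H
        rank_sum[cls] += rank
        logp_sum[cls] += math.log(p_obs)
        n[cls] += 1

    stats = {}
    for cls in n:
        pp = math.exp(-logp_sum[cls] / n[cls])    # measured local perplexity
        e_h_mean = math.exp(H_sum[cls] / n[cls])  # e^{H_mean}
        stats[cls] = {
            "PP": pp,
            "e^H_mean": e_h_mean,
            "PP / e^H_mean": pp / e_h_mean,       # > 1 indicates a bias towards low-probability words
            "mean_rank": rank_sum[cls] / n[cls],
        }
    return stats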
<Paragraph position="9"> Recall that approach 3 cannot discriminate between positions with an increased or reduced FP probability. To evaluate this discrimination for approaches 1 and 2, we calculated p(FP | h) instead of p(w | h) at each position in the corpus. The crucial result is that the mean FP probability is 48% and 45% lower (approaches 1 and 2, respectively) at word positions than at FP positions. This is an important feature of these LMs, since small FP probabilities reduce confusion of proper words with FP.</Paragraph>
</Section>
<Section position="5" start_page="0" end_page="0" type="metho">
<SectionTitle> 5 Summary </SectionTitle>
<Paragraph position="0"> Concerning the question of how best to predict words next to FP, we obtain the following results for our spontaneous dictation task: Discarding FP from the LM histories reduces PP_overall by 4% and PP_after-FP by 30%. (The latter reduction is larger than in (Stolcke and Shriberg, 1996). Note that our measurements include positions after sentence-initial FP, which suffer from the FP removal.) Count merging with FP-free M-grams gives an additional reduction of PP_overall by 3% and of PP_after-FP by 10%.</Paragraph>
<Paragraph position="1"> Comparisons of local perplexities and studies of entropies and word rankings indicate that FP often represents a hesitation in which the speaker is searching for a less common word or formulation that is hard to predict.</Paragraph>
<Paragraph position="2"> At positions following FP, trigrams outperform bigrams. This, together with the gains from discarding FP, suggests that FP rarely represents sentence breaks or restarts. We presented a new analysis of the LM's power to discriminate between FP and word positions. Predicting FP with a trigram lowers the FP probability at word positions by almost 50%. This is an important feature for reducing confusion of words with FP.</Paragraph>
<Paragraph position="3"> Speech recognition experiments are published in (Schramm et al., 2003). Using merged counts and discarding FP from the LM history reduces the error rate on Eval by 2.2% (relative), while PP is reduced by 7%.</Paragraph>
<Paragraph position="4"> Acknowledgements: I would like to thank Xavier Aubert and Hauke Schramm for their contribution of speech recognition experiments.</Paragraph>
</Section>
</Paper>