File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/99/p99-1022_intro.xml
Size: 3,323 bytes
Last Modified: 2025-10-06 14:06:57
<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1022">
<Title>Dynamic Nonlocal Language Modeling via Hierarchical Topic-Based Adaptation</Title>
<Section position="3" start_page="0" end_page="167" type="intro">
<SectionTitle>Introduction</SectionTitle>
<Paragraph position="0"> Word probabilities vary substantially across broad topics (e.g. BUSINESS and INTERNATIONAL news), as illustrated in the figure, and also within a topic: peace, for example, is 50 times more likely in INTERNATIONAL:MIDDLE-EAST news than in INTERNATIONAL:JAPAN news. We propose methods of hierarchical smoothing of P(w_t | topic_t) in a topic tree to capture this subtopic variation robustly.</Paragraph>
<Section position="1" start_page="167" end_page="167" type="sub_section">
<SectionTitle>1.1 Related Work</SectionTitle>
<Paragraph position="0"> Recently, the speech community has begun to address the issue of topic in language modeling. Lowe (1995) used the hand-assigned topic labels for the Switchboard speech corpus to develop a topic-specific language model for each of the 42 Switchboard topics, and used a single topic-dependent language model to rescore the lists of N-best hypotheses. An error-rate improvement of 0.44% over the baseline language model was reported.</Paragraph>
<Paragraph position="1"> Iyer et al. (1994) used bottom-up clustering techniques on discourse contexts, performing sentence-level model interpolation with weights updated dynamically through an EM-like procedure. Evaluation on the Wall Street Journal (WSJ0) corpus showed a 4% perplexity reduction and a 7% word error rate reduction. In Iyer and Ostendorf (1996), the model was improved by model probability re-estimation and interpolation with a cache model, resulting in better dynamic adaptation and an overall 22%/3% perplexity/error rate reduction due to both components.</Paragraph>
<Paragraph position="2"> Seymore and Rosenfeld (1997) reported significant improvements when using a topic detector to build specialized language models on the Broadcast News (BN) corpus. They used TF-IDF and Naive Bayes classifiers to detect the topics most similar to a given article and then built a specialized language model to rescore the N-best lists corresponding to that article (yielding an overall 15% perplexity reduction using document-specific parameter re-estimation, and no significant word error rate reduction). Seymore et al. (1998) split the vocabulary into three sets: general words, on-topic words, and off-topic words, and then used non-linear interpolation to compute the language model. This yielded an 8% perplexity reduction and a 1% relative word error rate reduction.</Paragraph>
<Paragraph position="3"> In collaborative work, Mangu (1997) investigated the benefits of using an existing Broadcast News topic hierarchy extracted from topic labels as a basis for language model computation. Manual tree construction and hierarchical interpolation yielded a 16% perplexity reduction over a baseline unigram model. In a concurrent collaborative effort, Khudanpur and Wu (1999) implemented clustering and topic-detection techniques similar to those presented here and computed a maximum entropy topic-sensitive language model for the Switchboard corpus, yielding an 8% perplexity reduction and a 1.8% word error rate reduction relative to a baseline maximum entropy trigram model.</Paragraph>
</Section>
</Section>
</Paper>
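As a rough illustration of the hierarchical smoothing of P(w_t | topic_t) mentioned in the introduction, the sketch below interpolates each topic node's relative-frequency estimate with its parent's smoothed estimate along the path from a leaf topic up to the root of the topic tree. The TopicNode class, the fixed interpolation weight LAMBDA, the toy counts, and the add-one smoothing at the root are illustrative assumptions only, not the estimator actually used in the paper.

    # Sketch: parent-interpolated ("hierarchical") smoothing of P(w | topic)
    # along a topic tree.  Structure, counts, and LAMBDA are assumptions.
    from collections import Counter

    LAMBDA = 0.7  # assumed weight on a node's own relative-frequency estimate


    class TopicNode:
        def __init__(self, name, parent=None):
            self.name = name
            self.parent = parent
            self.counts = Counter()  # word -> observed count under this topic
            self.total = 0

        def add(self, word, n=1):
            self.counts[word] += n
            self.total += n

        def smoothed_prob(self, word, vocab_size):
            """P(word | this topic), recursively backed off to the parent."""
            if self.parent is None:
                # Add-one smoothing at the root so every word keeps nonzero mass.
                return (self.counts[word] + 1) / (self.total + vocab_size)
            mle = self.counts[word] / self.total if self.total else 0.0
            return LAMBDA * mle + (1 - LAMBDA) * self.parent.smoothed_prob(word, vocab_size)


    # Toy usage: 'peace' is far more frequent under MIDDLE-EAST than JAPAN,
    # but both leaves still inherit probability mass from INTERNATIONAL and the root.
    root = TopicNode("ROOT")
    intl = TopicNode("INTERNATIONAL", parent=root)
    mideast = TopicNode("INTERNATIONAL:MIDDLE-EAST", parent=intl)
    japan = TopicNode("INTERNATIONAL:JAPAN", parent=intl)

    for node, n in [(mideast, 50), (japan, 1), (intl, 51), (root, 60)]:
        node.add("peace", n)
        node.add("the", 200)

    V = 10000  # assumed vocabulary size
    print(mideast.smoothed_prob("peace", V))
    print(japan.smoothed_prob("peace", V))

In this sketch the leaf estimates still differ sharply (reflecting the MIDDLE-EAST vs. JAPAN contrast), while backing off to the parent keeps sparse subtopics from assigning zero probability to words seen only higher in the tree.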