<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1031"> <Title>The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources</Title> <Section position="2" start_page="0" end_page="1" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The purpose of a language model (LM) is to determine the a priori probability of a word sequence W (i.e., P(W)). Language modeling is essential in a wide variety of applications; we focus on speech recognition in our research. Although word-based LMs (with bigram and trigram being the most common) remain the mainstay in many continuous speech recognition systems, recent efforts have explored a variety of ways to improve LM performance (Niesler and Woodland, 1996; Chelba et al., 1997; Srinivas, 1997; Heeman, 1998; Chelba, 2000; Rosenfeld, 2000; Goodman, 2001; Roark, 2001; Charniak, 2001).</Paragraph> <Paragraph position="1"> Class-based LMs attempt to deal with data sparseness and generalize better to unseen word sequences by first grouping words into classes and then using these classes to compute n-gram probabilities. Part-of-Speech (POS) tags were initially used as classes by Jelinek (1990) in a conditional probabilistic model, which predicts the tag sequence for a word sequence first and then uses it to predict the word sequence.</Paragraph> <Paragraph position="3"> However, Jelinek's POS LM is less effective at predicting word candidates than an n-gram word-based LM because it deletes important lexical information for predicting the next word. Heeman's (1998) POS LM achieves a perplexity reduction compared to a trigram LM by instead redefining the speech recognition problem as determining the most likely word and tag sequence given the speech utterance A. The LM P(W,T) is a joint probabilistic model that accounts for both the sequence of words w_1 ... w_N and their tag assignments t_1 ... t_N by estimating the joint probabilities of words and tags.</Paragraph> <Paragraph position="5"> Johnson (2001) and Lafferty et al. (2001) provide insight into why a joint model is superior to a conditional model.</Paragraph>
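<Paragraph> As an illustrative sketch (standard forms of these two model families, not necessarily the exact conditioning contexts used by Jelinek or Heeman), the contrast between the conditional and joint decompositions can be written as:
  Conditional (tag sequence predicted first, then words):
    P(W) = \sum_{T} P(W \mid T)\, P(T) \approx \sum_{T} \prod_{i=1}^{N} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
  Joint (words and tags predicted together):
    P(W, T) = \prod_{i=1}^{N} P(w_i, t_i \mid w_1^{i-1}, t_1^{i-1}) = \prod_{i=1}^{N} P(w_i \mid w_1^{i-1}, t_1^{i})\, P(t_i \mid w_1^{i-1}, t_1^{i-1})
Under the conditional decomposition each word is predicted from its tag alone, discarding the preceding words; the joint decomposition conditions each word on the preceding words as well as the tags. </Paragraph>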
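<Paragraph> A minimal toy implementation of the joint idea (a hypothetical Python sketch, not the authors' code or Heeman's actual model) makes the difference concrete: the history retains the previous word, so lexical information that a P(w_i | t_i) model discards remains available when predicting the next word/tag pair.

from collections import defaultdict

class JointPosBigramLM:
    """Toy joint word/tag bigram model: P(w_i, t_i | w_{i-1}, t_{i-1})."""

    def __init__(self):
        # counts[(prev_word, prev_tag)][(word, tag)] = frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, tagged_sentences):
        for sent in tagged_sentences:          # sent = [(word, tag), ...]
            history = ("BOS", "BOS")           # previous (word, tag) pair
            for word, tag in sent:
                self.counts[history][(word, tag)] += 1
                history = (word, tag)

    def prob(self, word, tag, prev_word, prev_tag):
        hist = self.counts[(prev_word, prev_tag)]
        total = sum(hist.values())
        seen = max(len(hist), 1)
        # crude add-one smoothing; a toy estimate, not a properly normalized LM
        return (hist[(word, tag)] + 1) / (total + seen)

lm = JointPosBigramLM()
lm.train([[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]])
print(lm.prob("dog", "NN", "the", "DT"))
</Paragraph>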
<Paragraph position="6"> Recently, there has been good progress in developing structured models (Chelba, 2000; Charniak, 2001; Roark, 2001) that incorporate syntactic information. These LMs capture the hierarchical characteristics of a language rather than specific information about words and their lexical features (e.g., case, number). In an attempt to incorporate even more knowledge into a structured LM, Goodman (1997) developed a probabilistic feature grammar (PFG) that conditions not only on structure but also on a small set of grammatical features (e.g., number) and achieved improved parse accuracy. Goodman's work suggests that integrating lexical features with word identity and syntax would benefit LM predictiveness. PFG uses only a small set of lexical features because it integrates those features at the level of the production rules, causing a significant increase in grammar size and a concomitant data sparsity problem that preclude the addition of richer features. This sparseness problem can be addressed by associating lexical features directly with words.</Paragraph> <Paragraph position="8"> We hypothesize that high levels of word prediction capability can be achieved by tightly integrating structural constraints and lexical features at the word level. Hence, we develop a new dependency-grammar almost-parsing LM, the SuperARV LM, which uses enriched tags called SuperARVs. In Section 2, we introduce our SuperARV LM. Section 3 compares the performance of the SuperARV LM to other LMs.</Paragraph> <Paragraph position="9"> Section 4 investigates the knowledge source contributions by constraint relaxation. Conclusions appear in Section 5.</Paragraph> </Section> </Paper>