<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2002">
  <Title>Factored Language Models and Generalized Parallel Backoff</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The art of statistical language modeling (LM) is to create probability models over words and sentences that trade off statistical prediction against parameter variance. The field is both diverse and intricate (Rosenfeld, 2000; Chen and Goodman, 1998; Jelinek, 1997; Ney et al., 1994), with many different forms of LMs, including maximum-entropy, whole-sentence, adaptive, and cache-based models, to name a few. Many models are simply smoothed conditional probability distributions for a word given its preceding history, typically the two preceding words.</Paragraph>
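The "smoothed conditional probability distribution for a word given the two preceding words" is the standard trigram model with backoff. As a minimal illustrative sketch (not the paper's method, and using unsmoothed relative frequencies rather than a discounted estimator for clarity):

```python
from collections import Counter

def train(sentences):
    """Count n-grams (n = 1..3) and their contexts from tokenized sentences."""
    grams, ctx = Counter(), Counter()
    for sent in sentences:
        toks = ("<s>", "<s>") + tuple(sent) + ("</s>",)
        for i in range(2, len(toks)):
            for n in (1, 2, 3):
                g = toks[i - n + 1: i + 1]
                grams[g] += 1
                ctx[g[:-1]] += 1   # ctx[()] accumulates the unigram total

    return grams, ctx

def p_backoff(w, h2, h1, grams, ctx):
    """P(w | h2 h1): relative frequency at the longest observed order,
    backing off trigram -> bigram -> unigram."""
    for history in ((h2, h1), (h1,), ()):
        if grams[history + (w,)] > 0:
            return grams[history + (w,)] / ctx[history]
    return 0.0
```

A real LM would discount the higher-order counts and weight the backed-off estimate so the distribution normalizes; the point here is only the fixed backoff path trigram → bigram → unigram, which is what GPB later generalizes.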
    <Paragraph position="1"> In this work, we introduce two new methods for language modeling: the factored language model (FLM) and generalized parallel backoff (GPB). An FLM treats a word as a bundle of features, and GPB is a technique that generalizes backoff to arbitrary conditional probability tables. While these techniques can be considered in isolation, the two methods seem particularly suited to each other; in particular, GPB can greatly facilitate the production of FLMs with better performance.</Paragraph>
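The core idea behind the two methods can be sketched together: condition a word on several parent features at once, and when the full context is unseen, drop each parent in parallel and combine the lower-order estimates instead of following one fixed backoff order. The sketch below is a hypothetical illustration under simplifying assumptions (unsmoothed relative frequencies, maximum as the combining function); the feature names and helper functions are invented for the example, not taken from the paper:

```python
from collections import Counter

def train(events):
    """events: iterable of (parent_features, word) pairs, where
    parent_features is a tuple such as ("stem=run", "tag=V").
    Count the word jointly with every subset of its parents, so that
    any backed-off table is available at query time."""
    counts, ctx = Counter(), Counter()
    for parents, w in events:
        for mask in range(1 << len(parents)):   # every parent subset
            sub = tuple(p for i, p in enumerate(parents) if mask >> i & 1)
            counts[sub + (w,)] += 1
            ctx[sub] += 1
    return counts, ctx

def parallel_backoff(counts, ctx, w, parents):
    """P(w | parents): if the full event was observed, use its relative
    frequency; otherwise drop each parent in parallel and take the
    maximum of the recursive lower-order estimates."""
    if counts[parents + (w,)] > 0:
        return counts[parents + (w,)] / ctx[parents]
    if not parents:
        return 0.0
    return max(
        parallel_backoff(counts, ctx, w, parents[:i] + parents[i + 1:])
        for i in range(len(parents))
    )
```

With a fixed backoff order, dropping the "wrong" parent first can discard the most predictive feature; combining the parallel estimates lets whichever reduced context was actually observed contribute.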
  </Section>
</Paper>