<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1068">
<Title>Multi-Class Composite N-gram Language Model for Spoken Language Processing Using Multiple Word Clusters</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> Abstract </SectionTitle>
<Paragraph position="0"> In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to address the data-sparseness problem of spoken language, for which it is difficult to collect training data. The Multi-Class Composite N-gram maintains accurate word prediction and reliability under sparse data with a compact model size, based on multiple word clusters called Multi-Classes. In a Multi-Class, the statistical connectivity at each position of the N-gram is regarded as a word attribute, and a separate word cluster is created to represent each positional attribute. Furthermore, by introducing higher-order word N-grams through the grouping of frequent word successions, Multi-Class N-grams are extended to Multi-Class Composite N-grams. In experiments, the Multi-Class Composite N-grams achieve 9.5% lower perplexity and a 16% lower word error rate in speech recognition, with a 40% smaller parameter size, than conventional word 3-grams.</Paragraph>
</Section>
</Paper>
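The abstract's core idea can be illustrated with a toy sketch: a class-based bigram in which each word has one cluster for its role as a predicted (target) word and a different cluster for its role as context, so that P(w_i | w_{i-1}) is factored through position-dependent classes. The cluster assignments, corpus, and factorization below are illustrative assumptions for a minimal sketch, not the paper's actual clusters or training procedure.

```python
from collections import defaultdict

# Hypothetical position-dependent clusters: a word's target-position
# class may differ from its context-position class (the "Multi-Class"
# idea). These toy assignments are assumptions for illustration.
target_class = {"the": "DET_t", "a": "DET_t", "cat": "N_t", "dog": "N_t"}
context_class = {"the": "DET_c", "a": "DET_c", "cat": "N_c", "dog": "N_c"}

corpus = [["the", "cat"], ["a", "dog"], ["the", "dog"]]

# Count class-to-class transitions and per-class word emissions.
trans = defaultdict(int)      # (context class, target class) -> count
ctx_total = defaultdict(int)  # context class -> count
emit = defaultdict(int)       # (target class, word) -> count
cls_total = defaultdict(int)  # target class -> count

for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        cf, ct = context_class[prev], target_class[cur]
        trans[(cf, ct)] += 1
        ctx_total[cf] += 1
        emit[(ct, cur)] += 1
        cls_total[ct] += 1

def prob(prev: str, cur: str) -> float:
    """P(cur | prev) ~ P(C_t(cur) | C_c(prev)) * P(cur | C_t(cur))."""
    cf, ct = context_class[prev], target_class[cur]
    return (trans[(cf, ct)] / ctx_total[cf]) * (emit[(ct, cur)] / cls_total[ct])

print(prob("the", "cat"))  # class transition 3/3 times emission 1/3
```

Because parameters are counted over clusters rather than individual word pairs, the model stays compact and robust when word-pair counts are sparse, which is the trade-off the abstract describes.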