<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2020">
  <Title>A Simple Statistical Class Grammar for Measuring Speech Recognition Performance</Title>
  <Section position="4" start_page="147" end_page="147" type="metho">
    <SectionTitle>
3 GRAMMAR TRAINING
</SectionTitle>
    <Paragraph position="0"> To train the grammar, we began by assigning one or more classes to each word in the vocabulary. For example, the word &quot;SEA-WOLF&quot; is assigned to a single class, ship-name, whereas the word &quot;DISPLAY&quot; is assigned to three classes: command-verb, adjective, and noun. Once the words were assigned to classes, the grammar statistics were computed directly from the training data by counting the number of transitions from each class to every other class. These counts were then padded slightly so that class-to-class transitions never observed in training receive nonzero probability, which allows the grammar to parse sentences containing them. Finally, the grammar was evaluated on a test set to measure its perplexity.</Paragraph>
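The training procedure above can be sketched as a first-order (bigram) class transition model. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each training sentence has already been mapped to one class per word (the text does not say how words with several classes, such as DISPLAY, were resolved during counting), it adds hypothetical boundary classes SENT_START and SENT_END, and it uses a hypothetical padding constant of 0.1 because the actual amount is not given. The function name train_class_bigram is likewise illustrative.

```python
from collections import defaultdict

# Hypothetical boundary classes; the paper does not say how sentence
# starts and ends are modeled.
START, END = "SENT_START", "SENT_END"


def train_class_bigram(tagged_sentences, classes, pad=0.1):
    """Estimate P(next class | previous class) from class-tagged sentences.

    tagged_sentences: one class per word, assumed already disambiguated.
    pad: hypothetical constant added to every transition count so that
    class-to-class transitions unseen in training get nonzero probability.
    """
    # Count observed transitions, including boundary transitions.
    counts = defaultdict(lambda: defaultdict(float))
    for seq in tagged_sentences:
        path = [START] + list(seq) + [END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1.0

    # Pad every row, then normalize it into a probability distribution.
    targets = list(classes) + [END]
    probs = {}
    for prev in [START] + list(classes):
        row = {c: counts[prev][c] + pad for c in targets}
        total = sum(row.values())
        probs[prev] = {c: v / total for c, v in row.items()}
    return probs


# Toy usage with class names from the text; the sentences are made up.
classes = ["command-verb", "adjective", "noun", "ship-name"]
train = [["command-verb", "ship-name"], ["command-verb", "adjective", "noun"]]
model = train_class_bigram(train, classes)
print(round(model["command-verb"]["ship-name"], 3))  # 0.44
```

Because every row is padded before normalization, an unobserved transition such as noun to adjective still receives a small nonzero probability, which is what lets the grammar parse sentences with unseen class transitions.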
  </Section>
  <Section position="5" start_page="147" end_page="148" type="metho">
    <SectionTitle>
4 GRAMMAR PERFORMANCE
</SectionTitle>
    <Paragraph position="0"> Below is a summary of some of the characteristics of the statistical first-order class grammar with 99 classes for the DARPA 1000-word Resource Management task domain, with null and word-pair grammar characteristics given for comparison.</Paragraph>
    <Paragraph position="1"> The class grammar figures given here are based on a grammar trained using all 2800 sentences available for the 1000-word DARPA Resource Management task domain. The training set perplexity is computed over the entire training set, and the test set perplexity is computed over the 300 sentences used for the May 1988 standard system evaluation. The word-pair coverage and perplexity are approximate theoretical figures assuming an independent test set. The test set perplexity for the word-pair grammar is degenerate: a sentence that does not parse receives zero probability, so if even a single test sentence fails to parse, the perplexity becomes infinite.</Paragraph>
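The degenerate word-pair case follows directly from how perplexity is computed. The sketch below assumes the standard definition, 2 raised to the negative average per-word log2 probability over the test set; the helpers perplexity and log2_prob are hypothetical, and the text does not specify whether sentence-end tokens are included in the word count N.

```python
import math


def log2_prob(p):
    # An unparseable sentence has probability 0, mapped to log2 P = -inf.
    return math.log2(p) if p > 0 else -math.inf


def perplexity(sentence_log2_probs, total_words):
    """Test set perplexity: PP = 2 ** (-(1/N) * sum of log2 P(sentence)).

    A single -inf term makes the sum -inf, so the perplexity is infinite.
    """
    total = sum(sentence_log2_probs)
    return 2.0 ** (-total / total_words)


# Two toy sentences (2 words with P = 1/64, 1 word with P = 1/8), N = 3.
print(perplexity([log2_prob(1 / 64), log2_prob(1 / 8)], total_words=3))  # 8.0
# Same test set, but the second sentence does not parse.
print(perplexity([log2_prob(1 / 64), log2_prob(0.0)], total_words=3))  # inf
```

A padded class grammar never assigns zero probability to a sentence, so its test set perplexity stays finite; the word-pair grammar has no such padding, which is why its test set figure is only meaningful as a theoretical approximation.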
    <Paragraph position="2"> In informal tests, we were able to &quot;tune&quot; the perplexity of the grammar by adjusting the number of classes into which the words are categorized. Using the full 2800-sentence training set and a fixed 100-sentence test set, the perplexity ranged from 203 (with 50 classes) to 62 (with 168 classes).</Paragraph>
    <Paragraph position="3"> We have obtained some preliminary results for a statistical class grammar designed for a 2170-word personnel database access task domain. The grammar uses 637 classes (each word is assigned to between 1 and 5 classes) and is trained on 750 sentences. Its perplexity, measured on an independent test set of 200 sentences, was 89.4; measured on the training set, it was 46.1. We have not yet performed a full set of recognition experiments using this grammar.</Paragraph>
  </Section>
</Paper>