<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1028">
  <Title>Beyond N-Grams: Can Linguistic Sophistication Improve Language Modeling?</Title>
  <Section position="3" start_page="186" end_page="187" type="intro">
    <SectionTitle>
2 Experimental Framework
</SectionTitle>
    <Paragraph position="0"> In an attempt to gain insight into what linguistic knowledge we should be exploring to improve language models for speech recognition, we ran experiments where people tried to improve the output of speech recognition systems and then recorded what types of knowledge they used in doing so. We hoped to both assess how much gain might be expected from very sophisticated models and to determine just what information sources could contribute to this gain.</Paragraph>
    <Paragraph position="1"> People were given the ordered list of the ten most likely hypotheses for an utterance according to the recognizer. They were then 2 For a more comprehensive review of the historical involvement of natural language parsing in language modelling, see Stolcke(1997).</Paragraph>
    <Paragraph position="2">  asked to choose from the ten-best list the hypothesis that they thought would have the lowest word error rate, in other words, to try to determine which hypothesis is closest to the truth. Often, the truth is not present in the 10-best list. An example 5-best list from the Wall Street Journal corpus is shown in Figure 1. Four subjects were used in this experiment, and each subject was presented with 75 10-best lists from three different speech recognition systems (225 instances total per subject). From this experiment, we hoped to gauge what the upper bound is on how much we could improve upon state of the art by using very rich models) For our experiments, we used three different speech recognizers, trained respectively on Switchboard (spontaneous speech), Broadcast News (recorded news broadcasts) and Wall Street Journal data. 4 The word error rates of the recognizers for each corpus are shown in the first line of Table 1.</Paragraph>
    <Paragraph position="3"> The human subjects were presented with the ten-best lists. Sentences within each ten-best list were aligned to make it easier to compare them. In addition to choosing the most appropriate selection from the 10-best list, subjects were also allowed to posit a string not in the list by editing any of the strings in the 10-best list in any way they chose. For each sample, subjects were asked to determine what types of information were used in deciding.</Paragraph>
    <Paragraph position="4"> This was done by presenting the subjects with a set of check boxes, and asking them to check all that applied. A list of the options presented to the human can be found in Figure 2. Subjects were provided with a detailed explanation, as well as examples, for each of these options .5</Paragraph>
  </Section>
class="xml-element"></Paper>