<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1011">
  <Title>Modelling Context Dependency in Acoustic-Phonetic and Lexical Representations</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Variability in speech arises from many different sources.</Paragraph>
    <Paragraph position="1"> For example, acoustic variability can be due to noise or channel characteristics, phonetic variability can be due to contextual or speaker-specific effects, and dialect effects can alter speakers' pronunciations of words. Speech recognition systems must have mechanisms to model these various types of variability, and sometimes it may be necessary to deal with different types of variability with different mechanisms. For example, it may be difficult to find a single model that can deal effectively with both low-level acoustic variability and dialect differences among speakers.</Paragraph>
    <Paragraph position="2"> In the SUMMIT system, we have made a rough distinction between the sort of variability that we can deal with within our phonetic models (including acoustic variability and speaker differences at a phonetic level), and higher-level phonological variation (including dialect effects and word-boundary effects). In both cases, our goal is to account for as much of the variability as possible, and it is clear that at least some of the variability is due to contextual effects.</Paragraph>
    <Paragraph position="3"> Just as there are many types of variability, there are many types of contextual effects, including local phonetic effects (coarticulation), effects of stress, phrase-level effects (such as prepausal lengthening), and higher-level effects (such as sentential stress or dialect differences). Therefore, we need to find mechanisms that can account for many different types of contextual factors.</Paragraph>
    <Paragraph position="4"> In this paper, we describe a number of experiments intended to address some of the problems mentioned above. So far, we have attempted to account for some of the contextual effects on our phonetic models, although the approach we have taken should also apply to the higher levels of the system. Briefly, we have found that we can increase recognition performance by creating context-specific models or by using more flexible models. However, we did not see a performance increase when we combined the two in a straightforward manner, presumably because more flexible models tend to require more training data. If, instead of using context-specific models, we accounted for context by adjusting the input to the phonetic models (creating a context-normalized input vector), we could account for contextual effects while still using more flexible phonetic models, resulting in the highest performance for our system.</Paragraph>
    <Paragraph position="5"> In the following sections, we first provide an overview of the system, followed by a more detailed description of the changes we have made to the system and evaluation results on the Resource Management task.</Paragraph>
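The context-normalized input vector described earlier can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual SUMMIT implementation: here each feature vector is simply shifted by the mean of all vectors sharing its context label, so that a single context-independent phonetic model sees inputs with the contextual bias removed.

```python
from collections import defaultdict

def context_normalize(features, contexts):
    """Shift each feature vector by the mean vector of its context class.

    features: list of equal-length lists of floats (acoustic feature vectors)
    contexts: parallel list of context labels (e.g. neighboring-phone classes)
    Returns context-normalized vectors.
    Hypothetical sketch of the idea, not SUMMIT's actual method.
    """
    # Group the feature vectors by their context label.
    by_ctx = defaultdict(list)
    for vec, ctx in zip(features, contexts):
        by_ctx[ctx].append(vec)
    # Per-context mean vector.
    means = {
        ctx: [sum(col) / len(vecs) for col in zip(*vecs)]
        for ctx, vecs in by_ctx.items()
    }
    # Subtract each vector's context mean, removing the contextual offset.
    return [
        [x - m for x, m in zip(vec, means[ctx])]
        for vec, ctx in zip(features, contexts)
    ]
```

After normalization the vectors within each context class have zero mean, so a single, more flexible phonetic model need not spend its parameters (or training data) learning per-context offsets.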
  </Section>
</Paper>