
<?xml version="1.0" standalone="yes"?>
<Paper uid="P91-1039">
  <Title>FACTORIZATION OF LANGUAGE CONSTRAINTS IN SPEECH RECOGNITION</Title>
  <Section position="3" start_page="0" end_page="299" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In the past, speech recognition has mostly been applied to small domain tasks in which language constraints can be characterized by regular grammars. All the knowledge sources required to perform speech recognition and understanding, including acoustic, phonetic, lexical, syntactic and semantic levels of knowledge, are often encoded in an integrated manner using a finite state network (FSN) representation. Speech recognition is then performed by finding the most likely path through the FSN, so that the acoustic distance between the input utterance and the recognized string decoded from the most likely path is minimized. Such a procedure is known as maximum likelihood decoding, and such systems are referred to as integrated systems. Integrated systems can generally achieve high accuracy, mainly because decisions are delayed until enough information, derived from the knowledge sources, is available to the decoder. For example, in an integrated system there is no explicit segmentation into phonetic units or words during the decoding process. All the segmentation hypotheses consistent with the introduced constraints are carried along until the final decision is made, in order to maximize a global function. An example of an integrated system was HARPY (Lowerre, 1980), which integrated multiple levels of knowledge into a single FSN. This produced relatively high performance for the time, but at the cost of multiplying out constraints in a manner that expanded the grammar beyond reasonable bounds for even moderately complex domains, an approach that may not scale up to more complex tasks.</Paragraph>
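The maximum likelihood decoding described above can be viewed as a shortest-path search over a weighted FSN. The following toy Python sketch (not from the paper; the network, labels, and costs are all invented for illustration) finds the label sequence whose accumulated acoustic distance is minimal, standing in for the decoder's global maximization:

```python
def viterbi_decode(arcs, start, finals, frame_costs):
    """Minimum-cost path through a toy FSN.

    arcs: list of (src_state, dst_state, label) transitions.
    frame_costs: frame_costs[t][label] = acoustic distance of matching
                 `label` against frame t (a stand-in for a negative
                 log-likelihood).
    Returns (total_cost, decoded_label_sequence).
    """
    # best[state] = (cost so far, label path so far); no segmentation is
    # committed to early -- all partial hypotheses are carried along.
    best = {start: (0.0, [])}
    for costs in frame_costs:
        nxt = {}
        for src, (cost, path) in best.items():
            for s, d, label in arcs:
                if s != src or label not in costs:
                    continue
                cand = (cost + costs[label], path + [label])
                if d not in nxt or cand[0] < nxt[d][0]:
                    nxt[d] = cand  # keep only the best hypothesis per state
        best = nxt
    # Pick the cheapest hypothesis that ends in an accepting state.
    reachable = {s: v for s, v in best.items() if s in finals}
    state = min(reachable, key=lambda s: reachable[s][0])
    return reachable[state]

# Toy network: state 0 --a|b--> state 1 --c--> state 2 (accepting).
arcs = [(0, 1, "a"), (0, 1, "b"), (1, 2, "c")]
frame_costs = [{"a": 1.0, "b": 0.5}, {"c": 0.2}]
cost, labels = viterbi_decode(arcs, 0, {2}, frame_costs)
# -> cost 0.7, labels ["b", "c"]
```

The decision between "a" and "b" is made only when the globally cheapest accepting path is selected, which is the delayed-decision property the paragraph attributes to integrated systems.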
    <Paragraph position="1"> Other examples of integrated systems may be found in Baker (1975) and Levinson (1980).</Paragraph>
    <Paragraph position="2"> Modular systems, on the other hand, clearly separate the knowledge sources. Unlike an integrated system, a modular system usually makes explicit use of the constraints at each level of knowledge to make hard decisions.</Paragraph>
    <Paragraph position="3"> For instance, in modular systems there is an explicit segmentation into phones during an early stage of the decoding, generally followed by lexical access and by syntactic/semantic parsing. While a modular system, such as HWIM (Woods, 1976) or HEARSAY-II (Reddy, 1977), may be the only solution for extremely large tasks in which the size of the vocabulary is on the order of 10,000 words or more (Levinson, 1988), it generally achieves lower performance than an integrated system on a restricted domain task (Levinson, 1989). The degradation in performance is mainly due to the way errors propagate through the system. It is widely agreed that it is dangerous to make a long series of hard decisions: the system cannot recover from an error at any point along the chain. One would want to avoid this chain architecture and look for an architecture that would enable modules to compensate for each other. Integrated approaches have this compensation capability, but at the cost of multiplying the size of the grammar in such a way that the computation becomes prohibitive for the recognizer. A solution to the problem is to factorize the constraints so that the size of the grammar used for maximum likelihood decoding is kept within reasonable bounds without a loss in performance. In this paper we propose an approach in which speech recognition is still performed in an integrated fashion, using a covering grammar with a smaller FSN representation. The decoded string of words is used as input to a second module in which the complete set of task constraints is imposed to correct possible errors introduced by the speech recognition module.</Paragraph>
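The factorized architecture proposed above can be sketched as a two-stage pipeline. This toy Python example (an illustration under invented data, not the paper's actual system: the hypothesis list, the task language, and the edit-distance correction criterion are all assumptions) shows the division of labor: stage 1 picks the acoustically best string from a covering grammar, which accepts a superset of the task language; stage 2 maps that string onto the full task constraints:

```python
def edit_distance(a, b):
    # Levenshtein distance, used here as a toy correction criterion.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def covering_decode(hypotheses):
    # Stage 1: integrated recognition with the smaller covering grammar.
    # hypotheses: list of (string, acoustic_cost) it can produce.
    return min(hypotheses, key=lambda h: h[1])[0]

def correct(decoded, task_language):
    # Stage 2: impose the complete task constraints by mapping the decoded
    # string to the closest string the full task language accepts.
    return min(task_language, key=lambda s: edit_distance(decoded, s))

# The covering grammar may output a string outside the task language...
hypotheses = [("show me flights", 2.1), ("show me fights", 1.8)]
task_language = ["show me flights", "list all fares"]
decoded = covering_decode(hypotheses)          # "show me fights"
corrected = correct(decoded, task_language)    # "show me flights"
```

The point of the factorization is visible in the example: the recognizer's grammar stays small because it need not encode every task constraint, while the second module recovers from the resulting errors.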
  </Section>
</Paper>