<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1040">
<Title>COMBINING KNOWLEDGE SOURCES TO REORDER N-BEST SPEECH HYPOTHESIS LISTS</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>
1. INTRODUCTION
</SectionTitle>
<Paragraph position="0"> During the last few years, the previously separate fields of speech and natural language processing have moved much closer together, and it is now common to see integrated systems containing components for both speech recognition and language processing. An immediate problem is the nature of the interface between the two. A popular solution has been the N-best list (see, for example, [9]): for some N, the speech recognizer hands the language processor the N utterance hypotheses it considers most plausible. The recognizer chooses the hypotheses on the basis of the acoustic information in the input signal and, usually, a simple language model such as a bigram grammar. The language processor brings more sophisticated linguistic knowledge sources to bear, typically some form of syntactic and/or semantic analysis, and uses them to choose the most plausible member of the N-best list. We will call an algorithm that selects a member of the N-best list a preference method. The most common preference method is to select the highest member of the list that receives a valid semantic analysis; we will refer to this as the "highest-in-coverage" method. Intuitively, highest-in-coverage seems a promising idea. However, practical experience shows that it is surprisingly hard to extract concrete gains from it. For example, a recent paper [8] concluded that the highest-in-coverage candidate was, in terms of word error rate, only marginally better than the candidate the recognizer considered best. In view of the considerable computational overhead required to perform linguistic analysis on a large number of speech hypotheses, the method's practical worth is dubious.</Paragraph>
<Paragraph position="1"> In this paper, we will describe a general strategy for constructing a preference method as a near-optimal combination of a number of different knowledge sources. By a "knowledge source", we will mean any well-defined procedure that associates some potentially meaningful piece of information with a given utterance hypothesis H. Some examples of knowledge sources are:
The methods described here were tested on a 1001-utterance unseen subset of the ATIS corpus; speech recognition was performed using SRI's DECIPHER(TM) recognizer [7, 5], and linguistic analysis by a version of the Core Language Engine (CLE [2]). For 10-best hypothesis lists, the best method yielded proportional reductions of 13% in the word error rate and 11% in the sentence error rate; if sentence error was scored in the context of the task, the reduction was about 21%. By contrast, the corresponding figures for the highest-in-coverage method were a 7% reduction in word error rate, a 5% reduction in sentence error rate (strictly measured), and a 12% reduction in sentence error rate in the context of the task.</Paragraph>
<Paragraph position="2"> The rest of the paper is laid out as follows. In Section 2 we describe a method that allows different knowledge sources to be merged into a near-optimal combination. Section 3 describes the experimental results in more detail. Section 4 concludes.</Paragraph>
</Section>
</Paper>
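
To make the baseline concrete, here is a minimal Python sketch of the "highest-in-coverage" preference method described in the introduction: walk the N-best list in recognizer order and return the first hypothesis that receives a valid semantic analysis, falling back to the recognizer's top choice if nothing is in coverage. The Hypothesis class and the in_coverage predicate are hypothetical stand-ins for the recognizer output and the CLE analysis step, neither of which the paper specifies at this level of detail.

    from dataclasses import dataclass
    from typing import Callable, Optional, Sequence

    @dataclass
    class Hypothesis:
        """Hypothetical container for one N-best entry."""
        words: str             # hypothesized word string
        acoustic_score: float  # recognizer score used to rank the list

    def highest_in_coverage(
        nbest: Sequence[Hypothesis],
        in_coverage: Callable[[Hypothesis], bool],
    ) -> Optional[Hypothesis]:
        """Return the highest-ranked hypothesis with a valid semantic analysis.

        `nbest` is assumed sorted in the recognizer's preference order
        (best first). If no hypothesis is in coverage, fall back to the
        recognizer's first choice, i.e. behave like the plain recognizer.
        """
        for hyp in nbest:
            if in_coverage(hyp):
                return hyp
        return nbest[0] if nbest else None

    # Toy usage with a stand-in coverage test (a real system would call
    # its semantic analyzer here, e.g. a CLE-style parser).
    if __name__ == "__main__":
        nbest = [
            Hypothesis("show me flights boston to denver", -102.3),
            Hypothesis("show me flights from boston to denver", -104.1),
        ]
        best = highest_in_coverage(nbest, lambda h: "from" in h.words)
        print(best.words)  # -> "show me flights from boston to denver"

Note that this baseline uses only a single binary knowledge source (in coverage or not); the strategy developed in Section 2 generalizes it by combining several such per-hypothesis knowledge sources into one near-optimal preference method.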