<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2027">
  <Title>The N-Best Algorithm: An Efficient Procedure for Finding Top N Sentence Hypotheses</Title>
  <Section position="3" start_page="0" end_page="199" type="metho">
    <SectionTitle>
2 The N-best Paradigm
</SectionTitle>
    <Paragraph position="0"> Figure 1 illustrates the general N-best search paradigm.</Paragraph>
    <Paragraph position="1"> We order the various KSs in terms of their relative power and cost. Those that provide more constraint, at a lesser cost, are used first in the N-best search. The output of this search is a list of the most likely whole sentence hypotheses, along with their scores. These hypotheses  are then rescored (or filtered) by the remaining KSs.</Paragraph>
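To make the paradigm concrete, the following minimal Python sketch treats each remaining KS as a scoring function applied to the hypotheses in the first-stage list; the names (Hypothesis, rescore, the KS callables) are hypothetical and are not taken from the system described here.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    words: List[str]
    score: float  # combined log score from the first-stage N-best search

def rescore(nbest: List[Hypothesis],
            knowledge_sources: List[Callable[[List[str]], float]]) -> List[Hypothesis]:
    """Add each remaining KS's log score to every hypothesis and re-rank."""
    rescored = [Hypothesis(h.words,
                           h.score + sum(ks(h.words) for ks in knowledge_sources))
                for h in nbest]
    rescored.sort(key=lambda h: h.score, reverse=True)  # best hypothesis first
    return rescored

A KS that only filters can be expressed as a scorer that returns a very large negative value for hypotheses it rejects, so rescoring and filtering share one interface.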
    <Paragraph position="1"> Depending on the amount of computation required, we might include more or fewer KSs in the first N-best search. For example, it is quite inexpensive to search using a first-order statistical language model, since we need only one instance of each word in our network. Frequently, a syntactic model of NL will be quite large, so it might be reserved until after the list generation. Given the list, each alternative can usually be considered in turn in a fraction of a second. If the syntax is small enough, it can be included in the initial N-best search, to further reduce the list that would be presented to the remainder of the KSs. Another example of this progressive filtering could be the use of high-order statistical language models. While the high-order model frequently provides added power (over a first-order model), the added power is usually not commensurate with the large amount of extra computation and storage needed for the search. In this case, a first-order language model can be used to reduce the choice to a small number of alternatives, which can then be reordered using the higher-order model.</Paragraph>
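For the progressive-filtering example above, the second pass might look roughly as follows; the trigram table, the flat floor used in place of proper backoff smoothing, and the language-model weight are illustrative assumptions rather than details of the models used here.

def trigram_score(words, trigram_logprobs, floor=-10.0):
    """Sum (toy) trigram log probabilities over a sentence, using a flat floor
    for unseen trigrams as a crude stand-in for backoff smoothing."""
    padded = ["BOS", "BOS"] + list(words)
    total = 0.0
    for i in range(2, len(padded)):
        total += trigram_logprobs.get((padded[i - 2], padded[i - 1], padded[i]), floor)
    return total

def reorder_with_trigram(nbest, trigram_logprobs, lm_weight=1.0):
    """nbest: list of (word_list, first_stage_log_score) pairs from the cheap search."""
    rescored = [(words, score + lm_weight * trigram_score(words, trigram_logprobs))
                for words, score in nbest]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored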
    <Paragraph position="3"> Besides the obvious computational advantages, there are several other practical advantages of this paradigm.</Paragraph>
    <Paragraph position="4"> Since the output of the first stage is a small amount of text, and there is no further processing required from the acoustic recognition component, the interface between the speech recognition and the other KSs is trivially simple, while still optimal. As such, this paradigm provides a most convenient mechanism for integrating work among several research sites. In addition, the high degree of modularity means that the different component subsystems can be optimized and even implemented separately (both hardware and software). For example, the speech recognition might run on a special-purpose, array-processor-like machine, while the NL might run on a general-purpose host.</Paragraph>
  </Section>
  <Section position="4" start_page="199" end_page="200" type="metho">
    <SectionTitle>
3 The N-Best Algorithm
</SectionTitle>
    <Paragraph position="0"> The optimal N-Best algorithm is quite similar to the commonly used time-synchronous Viterbi decoder, with a few small changes. It must compute probabilities of word sequences rather than state sequences, and it must find all such sequences within the specified beam.</Paragraph>
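A highly simplified sketch of one time-synchronous frame of such a search is given below; it propagates per-state dictionaries of word-sequence theories rather than a single best score per state. The state and transition representation is an assumption made for illustration, and merging identical word sequences with max is a Viterbi-style simplification (the exact algorithm would add their path probabilities).

def nbest_step(prev, transitions, emission_logprob, word_end_label):
    """One time-synchronous frame of a (simplified) N-best dynamic program.
    prev: dict state -> dict(word_sequence_tuple -> log prob)
    transitions: iterable of (from_state, to_state, transition_log_prob)
    emission_logprob: function(to_state) -> log prob of the current frame in to_state
    word_end_label: function(from_state) -> word emitted when leaving from_state,
                    or None if the transition stays inside a word.
    """
    nxt = {}
    for s_from, s_to, trans_lp in transitions:
        word = word_end_label(s_from)
        for seq, lp in prev.get(s_from, {}).items():
            new_seq = seq + (word,) if word is not None else seq
            score = lp + trans_lp + emission_logprob(s_to)
            bucket = nxt.setdefault(s_to, {})
            if score > bucket.get(new_seq, float("-inf")):  # max: Viterbi-style simplification
                bucket[new_seq] = score
    return nxt

In a full decoder the per-state dictionaries would then be pruned with the state-dependent threshold described next.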
    <Paragraph position="1"> At each state we keep the theories whose probabilities are within a threshold of the probability of the most likely word sequence at that state. Note that this state-dependent threshold is distinct from the global beam-search threshold.</Paragraph>
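A minimal sketch of this per-state pruning, with arbitrary values for N and for the state-local threshold (neither is a setting reported here); the final cut uses a sort for brevity, whereas the prune-and-count step described below avoids sorting.

def keep_within_threshold(theories, n_max=20, log_threshold=-15.0):
    """theories: dict mapping a word-sequence tuple to its log probability at
    this state and frame.  Keep only theories within log_threshold of the best,
    and at most n_max of them."""
    if not theories:
        return {}
    best = max(theories.values())
    cutoff = best + log_threshold                       # state-dependent cutoff
    survivors = {seq: lp for seq, lp in theories.items() if lp >= cutoff}
    if len(survivors) > n_max:                          # still too many: keep the n_max best
        ranked = sorted(survivors.items(), key=lambda kv: kv[1], reverse=True)
        survivors = dict(ranked[:n_max])
    return survivors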
    <Paragraph position="2"> This algorithm requires (at least) N times the memory for each state of the hidden Markov model. However, this memory is typically much smaller than the amount of memory needed to represent all the different acoustic models. We assume here that the overall "beam" of the search is much larger than the "beam at each state", to avoid pruning errors. In fact, for the first-order grammar it is even reasonable to have an infinite beam, since the number of states is determined only by the vocabulary size.</Paragraph>
    <Paragraph position="3"> At first glance, one might expect that the cost of combining several sets of N theories into one set of N theories at a state might require computation on the order of N². However, we have devised a "grow and prune" strategy that avoids this problem. At each state, we simply gather all of the incoming theories. At any instant, we know the best-scoring theory coming into this state at this time. From this, we compute a pruning threshold for the state, which is used to discard any theories that are below the threshold. At the end of the frame (or if the number of theories gets too large), we reduce the number of theories using a prune-and-count strategy that requires no sorting. Since this computation is small, we find, empirically, that the overall computation increases with √N, that is, more slowly than linearly. This makes it practical to use somewhat high values of N in the search.</Paragraph>
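One possible realization of such a grow-and-prune buffer is sketched below. The threshold-tightening loop used for the final cut is an assumption made for illustration; the text specifies only that the prune-and-count step avoids sorting, not how it is implemented.

class StateTheoryBuffer:
    """Per-state buffer of word-sequence theories for one time frame
    (a sketch, not the implementation described in this paper)."""

    def __init__(self, n_max=20, log_threshold=-15.0):
        self.n_max = n_max
        self.log_threshold = log_threshold   # width of the state-local beam (negative)
        self.best = float("-inf")            # best score seen at this state so far
        self.theories = {}                   # word-sequence tuple -> log probability

    def add(self, word_seq, log_prob):
        """Gather an incoming theory, discarding it at once if it is below threshold."""
        if log_prob > self.best:
            self.best = log_prob
        if log_prob >= self.best + self.log_threshold:
            prev = self.theories.get(word_seq, float("-inf"))
            self.theories[word_seq] = max(prev, log_prob)  # max: Viterbi-style simplification
            if len(self.theories) > 4 * self.n_max:
                self.prune()

    def prune(self):
        """Cut back to at most n_max theories by tightening the cutoff and counting;
        no sorting is performed (also called at the end of each frame)."""
        width = -self.log_threshold
        survivors = self.theories
        while len(survivors) > self.n_max and width > 0.01:
            width /= 2.0
            survivors = {seq: lp for seq, lp in self.theories.items()
                         if lp >= self.best - width}
        self.theories = survivors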
    <Paragraph position="4"> 4 Rank of the Correct Answer
Whether the N-best search is practical depends directly on whether we can assure that the correct answer is reliably within the list that is created by the first stage. (Actually, it is sufficient if there is any answer that will be accepted by all the NL KSs, since no amount of search would make the system choose the lower-scoring correct answer in this case.) It is possible that when the correct answer is not the top choice, it might be quite far down the list, since there could be exponentially many other alternatives that score between the highest-scoring answer and the correct answer. Whether this is true depends on the power of the acoustic-phonetic models and the statistical language model used in the N-best search. Therefore we have accumulated statistics of the rank of the correct sentence in the list of N answers for two different language models: the statistical class grammar and no language model at all. The distribution of the rank for the two language models is plotted for lists up to 100 long, and we have also marked the average rank on the distribution. As can be seen, for the case of no language model, the average rank is higher than that for the statistical grammar. In fact, about 20% of the time the correct answer is not on the list at all. However, when we use the statistical class grammar, which is a fairly weak grammar for this domain, we find that the average rank is 1.8, since most of the time the correct answer is within the first few choices. In fact, for this test of 215 sentences, 99 percent of the sentences were found within the top 24 choices.</Paragraph>
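The bookkeeping behind such rank statistics can be sketched as follows; the cutoff points (1, 5, 10, 24, 100) are chosen only to mirror the numbers quoted above, and the function names are hypothetical.

def rank_of_correct(nbest, reference):
    """Return the 1-based rank of the reference sentence in an N-best list,
    or None if it does not appear at all."""
    for rank, hypothesis in enumerate(nbest, start=1):
        if hypothesis == reference:
            return rank
    return None

def rank_statistics(results, cutoffs=(1, 5, 10, 24, 100)):
    """results: iterable of (reference_sentence, nbest_list) pairs."""
    ranks = [rank_of_correct(nbest, ref) for ref, nbest in results]
    found = [r for r in ranks if r is not None]
    average_rank = sum(found) / len(found) if found else float("nan")
    missing_fraction = ranks.count(None) / len(ranks)   # never on the list
    # Cumulative distribution: fraction of sentences whose correct answer
    # appears within the top k choices, for each cutoff k.
    coverage = {k: sum(1 for r in found if k >= r) / len(ranks) for k in cutoffs}
    return average_rank, missing_fraction, coverage

Here coverage[24], for instance, corresponds to the 99 percent figure quoted above for the class grammar, and missing_fraction to the roughly 20 percent of sentences absent from the list when no language model is used.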
    <Paragraph position="5"> Furthermore, the acoustic model used in this experiment is an earlier version that results in twice the word error rate of the most recent models reported elsewhere in these proceedings. This means that when the experiment is redone with better acoustic models, the rank should be considerably lower.</Paragraph>
    <Paragraph position="6"> To illustrate the types of lists that we see, we show below a sample N-best output. In this example, the correct answer is the fifth one on the list.</Paragraph>
    <Section position="1" start_page="200" end_page="200" type="sub_section">
      <SectionTitle>
Example of N-best Output
</SectionTitle>
      <Paragraph position="0"> Answer: Set chart switch resolution to high.
Top N Choices:
Set charts which resolution to five.
Set charts which resolution to high.
Set charts which resolution to on.
Set chart switch resolution to five.
Set chart switch resolution to high. (***)
Set chart switch resolution to on.
Set charts which resolution to the high.
Set the charts which resolution to five.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="200" end_page="202" type="metho">
    <SectionTitle>
5 Other Applications for the N-Best Algorithm
</SectionTitle>
    <Paragraph position="0"> We have, so far, found two additional applications for the N-Best algorithm. The first is to generate alternative hypotheses for discriminative training algorithms. Typically, alternatives must be generated using a fast match procedure, or by using overall statistics of typical errors. Instead, we can generate all the actual alternatives that are appropriate to each particular sentence. A second application for the N-best algorithm is to generate alternative sentences that can be used to test overgeneration in the design of natural language systems. Typically, if overgeneration is tested at all, it is by generating random sentences using the NL model and seeing whether they make sense. One problem with this is that many of the word sequences generated this way would never, in fact, be presented to an NL system by any reasonable acoustic recognition component. Thus, much of the tuning is being done on unimportant problems. A second problem is that the work of examining the generated sentences is a very tedious manual process. If, instead, we generate N-best lists from a real acoustic recognition system, then we can ask the NL system to parse all the sentences that are known to be wrong. Hopefully the NL system will reject most of these, and we only need to look at those that were accepted, to see whether they should have been.</Paragraph>
  </Section>
  <Section position="6" start_page="200" end_page="200" type="metho">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have presented a new algorithm for computing the top N sentence hypotheses for a hidden Markov model.</Paragraph>
    <Paragraph position="1"> Unlike previous algorithms, this one is guaranteed to find the most likely hypotheses with essentially constant computation time. This new algorithm makes possible a simple and efficient approach to the integration of several knowledge sources, in particular the integration of arbitrary natural language knowledge sources in spoken language systems. In addition, there are other useful applications of the algorithm.</Paragraph>
  </Section>
  <Section position="7" start_page="200" end_page="200" type="metho">
    <SectionTitle>
Acknowledgement
</SectionTitle>
    <Paragraph position="0"> This work was supported by the Defense Advanced Research Projects Agency and monitored by the Office of Naval Research under Contract No. N00014-85-C0279.</Paragraph>
  </Section>
</Paper>