<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1043"> <Title>SOME RESULTS ON STOCHASTIC LANGUAGE MODELLING</Title> <Section position="4" start_page="225" end_page="226" type="metho"> <SectionTitle> Complexity Results </SectionTitle> <Paragraph position="0"> First, consider the computation of the probability that a given nonterminal H generates a tree whose yield is the string u x^(*) v y^(*), where u = w_i ... w_{i+p} and v = w_j ... w_{j+q} are two already recognized substrings, while x^(*) and y^(*) represent two gaps of unspecified length, i.e. two not yet specified strings of terminal symbols that can be generated in those positions by G_s. Such a probability will be indicated by Pr(H ⇒ u x^(*) v y^(*)). For H = S, this probability gives the syntactic plausibility of the partial theory u x^(*) v y^(*), which may be used for computing hypothesis scores in the search for the most plausible interpretation of a spoken sentence. The asterisk indicates that nothing is known about the length of the gap.</Paragraph> <Paragraph position="1"> We have determined that the calculation of such island probabilities with unknown gap length requires solving a rather huge non-linear system of |N|(q+1)^2 equations, q being the length of the island. If an approximate solution is of any interest, such a system can be rendered linear and solved by inverting an |N|(q+1)^2 x |N|(q+1)^2 matrix; this takes O(|N|^3 q^6) time. For practical values of |N| and q the required computational effort seems unaffordable.</Paragraph> <Paragraph position="2"> Tables 1, 2, and 3 list the remaining cases that have been examined, along with the worst-case time complexity of calculating each probability given a known SCFG G_s. Table 1 is self-explanatory. Table 2 deals with a problem of great practical interest: the computation of a theory that has been obtained from a previous theory by means of a single-word extension. In these cases the only calculation required concerns the new terms introduced by the added word. Table 3 shows the complexity of the additional computation when not the theory but the gap is extended by one term. Since suffixes and prefixes are symmetric, the tables show only one of the two symmetric cases (the results remain valid if the strings are reversed).</Paragraph> <Paragraph position="3"> The computations shown in Table 3 are particularly worth studying because we do not know exactly the number of words filling the gap but often know a probability distribution for this quantity; hence we have to take into account more than one value for the gap length.</Paragraph> <Paragraph position="4"> Rows 3 and 5 in Table 3 show that a one-unit extension of a gap within a string costs a cubic amount of time (on top of work already done). If it is possible to get bounds on the number of possible words in a gap, this extra work will be repeated a fixed (in practical cases small) number of times.</Paragraph> <Section position="1" start_page="225" end_page="226" type="sub_section"> <SectionTitle> Island-Driven Parsing Strategies </SectionTitle> <Paragraph position="0"> Given a method for scoring partial sentence interpretations in ASU systems, how can the method be utilized? This section discusses how the computations listed previously support island-driven bidirectional strategies for ASU.</Paragraph>
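As a concrete point of reference for these complexity figures, the following is a minimal sketch of the standard inside algorithm for an SCFG in Chomsky normal form: it computes Pr(H ⇒ w_i ... w_j) for fully specified substrings, and the island probabilities with gaps discussed above generalize recursions of this kind (see [3]). The grammar encoding and all names are illustrative assumptions of this sketch, not the paper's implementation.

```python
# Minimal sketch (illustrative, not the paper's algorithm) of the standard
# inside algorithm for a stochastic CFG in Chomsky normal form. It computes
# Pr(H => words[i..j]) for every nonterminal H and fully specified span;
# the island probabilities with unknown gaps generalize recursions of this
# kind. Assumed grammar encoding: binary rules as {(H, B, C): prob} and
# lexical rules as {(H, word): prob}.
from collections import defaultdict

def inside_probabilities(words, binary_rules, lexical_rules):
    n = len(words)
    # inside[i][j][H] = Pr(H => words[i..j])
    inside = [[defaultdict(float) for _ in range(n)] for _ in range(n)]

    # Base case: spans of length one, covered by lexical rules H -> w.
    for i, w in enumerate(words):
        for (h, word), p in lexical_rules.items():
            if word == w:
                inside[i][i][h] += p

    # Recursive case: combine adjacent sub-spans with binary rules H -> B C.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between B and C
                for (h, b, c), p in binary_rules.items():
                    pb = inside[i][k].get(b, 0.0)
                    pc = inside[k + 1][j].get(c, 0.0)
                    if pb and pc:
                        inside[i][j][h] += p * pb * pc
    return inside

# Toy usage: Pr(S => "john runs") under a two-rule grammar.
binary = {("S", "NP", "VP"): 1.0}
lexical = {("NP", "john"): 1.0, ("VP", "runs"): 1.0}
chart = inside_probabilities(["john", "runs"], binary, lexical)
print(chart[0][1]["S"])  # prints 1.0
```

The cubic behaviour of this recursion in the string length is the same flavour of cost that appears in the cubic gap-extension entries of Table 3.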
<Paragraph position="1"> In speech recognition and speech understanding tasks, partial theories are created and a strategy is used to select the most probable theory (or theories) for growing.</Paragraph> <Paragraph position="2"> The score of a theory th can be expressed as: Score(th) = Pr(A | u x^(*) v y^(*)) Pr(u x^(*) v y^(*)), where A denotes the acoustic evidence of the uttered sentence.</Paragraph> <Paragraph position="3"> A parsing strategy can be considered that starts from left to right, generating a sequence of word hypotheses u; subsequently, syntactic or semantic predictions generate a sequence v.</Paragraph> <Paragraph position="4"> An upper bound for Pr(A | u x^(*) v y^(*)) can be obtained by running the Viterbi algorithm using a model for u, followed by a looped model of the lexicon (or the phonemes) for x^(*), followed by a model for v and by a looped model for y^(*). Starting from th, a theory can grow by trying to fill the gap x^(*) with a sequence of words. The hypotheses used for filling the gap may have one word, two words, three words, and so on. For each size m of the gap, an upper bound of the probability coming from the language model is Pr(u x^(m) v y^(*)). Reasonable assumptions about the possible values of m can be obtained if suprasegmental acoustic cues such as energy contour descriptors are available. Based on a string A_g describing these features in the gap, it is possible to express the probability Pr(A_g | m) of observing A_g given a gap of m words as follows: Pr(A_g | m) = Σ_{s=s_min}^{s_max} Pr(A_g | n_s = s) Pr(n_s = s | m) (5) where n_s indicates the number of syllables in the gap, and Pr(A_g | n_s = s) denotes the a priori probability of observing A_g given that there are s syllables in the gap. It is reasonable to assume that this probability is a good approximation of the probability of observing A_g given that there are s syllables and m words in the gap. Pr(n_s = s | m) is the probability that a string of m words is made up of s syllables, and it can be estimated from written text.</Paragraph> <Paragraph position="5"> The limits s_min and s_max are chosen in such a way that Pr(n_s = s | m) < ε for s < s_min and s > s_max; they therefore depend on m and on the language model, but not on the input string, and can be computed off-line.</Paragraph> <Paragraph position="6"> Thanks to (5) it is possible to delimit the practical values between which m can vary (a sketch of this computation is given below). Let m_1 and m_2 be the lowest and the highest such values of m.</Paragraph> <Paragraph position="7"> An upper bound for the language model probability of theory th can then be expressed as: max_{m_1 ≤ m ≤ m_2} Pr(u x^(m) v y^(*)).</Paragraph> <Paragraph position="8"> We are mainly interested in ASU systems performing sentence interpretation in restricted domains. In this kind of task, non-syntactic information is usually available to predict words on the basis of previously obtained partial interpretations of the uttered sentence. Predicted words may be "islands" in the sense that they do not follow an existing partial theory in a strictly left-to-right manner.</Paragraph> <Paragraph position="9"> The acoustic evidence of these islands can be evaluated using word-spotting techniques. For these situations, island-driven parsers can be used. These parsers produce partial parses in which sequences of hypothesized words can be interleaved with gaps, yielding theories of the kind listed in the previous section (whose probabilities are calculated as described in [3]).</Paragraph> <Paragraph position="10"> The same methods permit assessment of word candidates adjacent to an already recognized string, i.e., computation of the probability that the first (last) word of the gap, x_1 (x_m), is a certain a ∈ Σ.</Paragraph>
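A minimal sketch of the gap-length computation in (5) and of the delimitation of m_1 and m_2 follows. The corpus-based estimate of Pr(n_s = s | m), the thresholds, and all names are assumptions of this sketch; the acoustic term Pr(A_g | n_s = s) is taken as a caller-supplied model.

```python
# Minimal sketch of equation (5) and the delimitation of m_1 and m_2.
# All names and thresholds are assumptions of this sketch; pr_Ag_given_s
# stands for the acoustic term Pr(A_g | n_s = s).
from collections import Counter

def estimate_syllables_given_words(texts, syllables_of, max_m):
    """Estimate Pr(n_s = s | m) from written text by sliding a window of
    m words over each sentence and counting the syllables it contains."""
    counts = {m: Counter() for m in range(1, max_m + 1)}
    for words in texts:
        syl = [syllables_of(w) for w in words]
        for m in range(1, max_m + 1):
            for i in range(len(words) - m + 1):
                counts[m][sum(syl[i:i + m])] += 1
    return {m: {s: c / sum(cnt.values()) for s, c in cnt.items()}
            for m, cnt in counts.items() if cnt}

def gap_evidence_probability(pr_Ag_given_s, pr_s_given_m, m, eps=1e-4):
    """Equation (5): sum Pr(A_g | n_s = s) * Pr(n_s = s | m) over the s whose
    text-derived probability is non-negligible. The truncation to
    [s_min, s_max] depends only on m and the text statistics, so it can be
    fixed off-line, as noted above."""
    return sum(pr_Ag_given_s(s) * p
               for s, p in pr_s_given_m.get(m, {}).items() if p >= eps)

def gap_length_bounds(pr_Ag_given_s, pr_s_given_m, max_m, threshold=1e-6):
    """Delimit m_1 and m_2: the smallest and largest gap sizes m whose
    evidence probability Pr(A_g | m) is above a plausibility threshold."""
    plausible = [m for m in range(1, max_m + 1)
                 if gap_evidence_probability(pr_Ag_given_s, pr_s_given_m, m)
                 > threshold]
    return (min(plausible), max(plausible)) if plausible else (None, None)
```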
<Paragraph position="11"> This new word will extend the current theory. Normally, the system would select the word candidate(s) that maximize the prefix-string-with-gap probability of the theory augmented with the candidate. Instead of computing these probabilities for all the entries in the dictionary, it is possible to restrict such an expensive process to the preterminal symbols (as in [8]); a sketch of this restriction is given below.</Paragraph> <Paragraph position="12"> The approach discussed here should be compared with standard lattice parsing techniques, where no restriction is imposed by the parser on the word search space (see, for example, [4] and the discussion in [11]). Our framework accounts for bidirectional expansion of partial analyses, which improves the predictive capabilities of the system. In fact, bidirectional strategies can be used to restrict the syntactic search space for gaps surrounded by two partial analyses. This idea has been discussed, without reference to stochastic grammars, in [12] for the case of gaps of length one. We propose a generalization to gaps of length m and to cases where partial analyses do not represent entire parse trees but partial derivation trees.</Paragraph> <Paragraph position="13"> A fair comparison between island-driven and left-to-right theory growing in stochastic parsing is not possible at present. In practice, island-driven parsers may considerably accelerate the theory-growing process if island predictions are made by a look-ahead mechanism that leads to a correct partial theory with a limited number of competitors, and if a limited number of predictions can be made for the words that can fill the gap.</Paragraph> </Section> </Section>
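Finally, a minimal sketch of the preterminal restriction mentioned above, in the spirit of the idea credited to [8]; the scoring interface, lexicon layout, and pruning parameter are illustrative assumptions rather than the method of [8].

```python
# Minimal sketch of restricting one-word theory extension to preterminals.
# theory_prob_with is assumed to return the prefix-string-with-gap
# probability of the theory extended by a given preterminal symbol;
# lexicon maps preterminal -> {word: Pr(word | preterminal)}.

def best_extensions(theory, preterminals, lexicon, theory_prob_with, top_k=3):
    # 1. Score the few preterminal symbols instead of the whole dictionary.
    scored = sorted(((theory_prob_with(theory, pt), pt) for pt in preterminals),
                    reverse=True)[:top_k]
    # 2. Expand only the surviving preterminals into word candidates, scoring
    #    each word by its class score times Pr(word | preterminal).
    candidates = [(pt_score * p_word, word, pt)
                  for pt_score, pt in scored
                  for word, p_word in lexicon.get(pt, {}).items()]
    return sorted(candidates, reverse=True)
```
</Paper>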