SOME RESULTS ON STOCHASTIC LANGUAGE MODELLING 
Renato De Mori and Roland Kuhn 
School of Computer Science 
McGill University 
3480 University St. 
Montreal, Quebec, Canada H3A 2A7 
ABSTRACT 
The paper will discuss three issues. The first is the derivation of precise probability scores for partial hypotheses containing islands, in the context of a Stochastic Context-Free Grammar (SCFG) for Language Modeling (LM). The second issue is the possibility of adding a cache component to an LM; this component alters the expected probability of words to reflect the speaker's patterns of word use. Finally, the idiosyncratic properties of dialogue are being studied; this work will indicate how knowledge about the discourse state can be incorporated into the LM and into the semantic component.
ISLAND-DRIVEN PARSING 
Language Modeling and Theories with Islands
Automatic Speech Understanding (ASU) is based on a search process that generates partial interpretations of a spoken sentence called theories; theories are scored on the basis of a likelihood L proportional to Pr(A | th) Pr(th). We are interested in the computation of Pr(th) when th is a partial interpretation of a spoken sentence generated by a Stochastic Context-Free Grammar (SCFG) G_s. A recent report [2] reviews this problem and gives interesting results.
The most popular parsers used in Automatic Speech 
Recognition (ASR) generate new theories in a left-to-right 
fashion. To score the theories generated by these parsers, 
the probability of all parse trees generating the first p words 
of a sentence must be computed; the appropriate algorithms 
are given in [8].
Parsers that are "island-driven" proceed outward in both directions from islands of words that have been hypothesized with high acoustic evidence. Interesting island-driven parsers have been proposed in [13], [12] and [6], whose authors have also discussed the motivations for considering these parsers for ASU. None of these parsers uses a stochastic grammar.
If island-driven parsers are used for generating partial 
interpretations of a spoken sentence, it is important to com- 
pute Pr(th), which is the probability that a SCFG generates 
sequences of words intermixed with gaps corresponding to 
portions of the acoustic signal that are still uninterpreted. 
Recent work provides a precise theoretical framework for 
this computation [3].
Many different cases involving islands and gaps have 
been examined; space considerations do not permit us to 
give here the lengthy formulas obtained for each of these 
cases. Instead, this paper will list the cases along with the 
worst-case time complexity of the computation of Pr(th) for 
each. Perhaps the most striking result of this work was the 
sharp division between the cases where one must compute 
the probability that a partial tree generates substrings of 
a sentence intermixed with a gap of unknown length, and 
the cases where the gap has a known length. The former
computation appears to have an unacceptable time com- 
plexity; the latter computation is quite tractable. For this 
reason a later section considers ways in which one might 
estimate the length of a gap. 
Definitions 
A SCFG is a quadruple G_s = (N, Σ, P, S), where N is a finite set of nonterminal symbols, Σ is a finite set of terminal symbols disjoint from N, P is a finite set of productions of the form H → α, H ∈ N, α ∈ (Σ ∪ N)*, and S ∈ N is a special symbol called the start symbol. Each production is associated with a probability, indicated by Pr(H → α). If the grammar is proper, the following relation holds:

    \sum_{\alpha \in (\Sigma \cup N)^*} \Pr(H \to \alpha) = 1, \qquad H \in N.    (1)

An SCFG G_s is in Chomsky Normal Form (CNF) if all productions in G_s are in one of the following forms:

    H \to F\,G, \qquad H \to w, \qquad H, F, G \in N,\ w \in \Sigma.    (2)

In the following we will always refer to SCFGs in CNF.
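To make the formalism concrete, here is a small Python sketch (ours, not code from the cited reports) of one possible encoding of a CNF SCFG, together with a check of the properness condition (1). The toy grammar and all rule probabilities are invented for illustration:

```python
# A CNF SCFG as a dictionary: each left-hand side maps to a list of
# (right-hand side, probability) pairs.  A right-hand side is either a
# (F, G) pair of nonterminals or a single terminal string, as in (2).
grammar = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [("we", 0.5), ("speech", 0.5)],
    "VP": [(("V", "NP"), 0.7), ("parse", 0.3)],
    "V":  [("parse", 1.0)],
}

def is_proper(g, tol=1e-9):
    """Condition (1): the rule probabilities of every nonterminal sum to 1."""
    return all(abs(sum(p for _, p in rules) - 1.0) < tol for rules in g.values())

print(is_proper(grammar))  # True
```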
In the adopted formalism u, v and t represent strings of already recognized terminals; i, j and l are position indices; p, q and r are shift indices; m indicates a (known) gap length; and k, h are used as running indices. Furthermore, z^(m) stands for a gap of unknown terminals with specified length m, while a gap of unknown terminals with unknown length is represented by z^(*). Finally, Σ* represents the set of all strings of finite length over Σ, while Σ^m, m ≥ 0, is the set of all strings in Σ* of length m.
The derivation of a string in G_s is usually represented as a parse (or derivation) tree, which shows the rules employed. It is also possible to associate with each derivation tree the probability that it was generated from a nonterminal symbol H by the grammar G_s. This probability is the product of the probabilities of all the rules employed in the derivation. Given a string z ∈ Σ*, the notation H<z>, H ∈ N, indicates the set of all trees with root H generated by G_s and spanning z. Therefore Pr(H<z>) is the sum of the probabilities of these subtrees, i.e. the probability that the string z has been generated by G_s starting from symbol H. We assume that the grammar G_s is consistent. This means that the following condition holds:

    \sum_{z \in \Sigma^*} \Pr(S<z>) = 1.    (3)

From this hypothesis it follows that a similar condition holds for all nonterminals.
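Pr(H<z>) for a fully specified string z can be computed bottom-up with a CYK-style dynamic program, in the spirit of the inside algorithm (cf. [10]). A minimal sketch, reusing the toy grammar encoding from the previous example:

```python
def inside(g, words):
    """Inside probabilities for a CNF SCFG: beta[(i, j, H)] = Pr(H < w_i ... w_j >),
    the total probability of all trees with root H spanning words[i..j]."""
    n = len(words)
    beta = {}
    for i, w in enumerate(words):                       # length-1 spans: H -> w
        for H, rules in g.items():
            beta[(i, i, H)] = sum(p for rhs, p in rules if rhs == w)
    for span in range(2, n + 1):                        # longer spans: H -> F G
        for i in range(n - span + 1):
            j = i + span - 1
            for H, rules in g.items():
                beta[(i, j, H)] = sum(
                    p * beta[(i, k, F)] * beta[(k + 1, j, G)]
                    for rhs, p in rules if isinstance(rhs, tuple)
                    for F, G in [rhs]
                    for k in range(i, j))               # all split points
    return beta

beta = inside(grammar, ["we", "parse", "speech"])
print(beta[(0, 2, "S")])   # Pr(S < "we parse speech" >) = 0.175
```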
We are concerned with the computation of probabilities of strings involving islands. The assumed model of computation is the Random Access Machine under the uniform cost criterion (see [1]). We will indicate with |P| the size of set P, i.e. the number of productions in G_s. We will also write f(z) = O(g(z)) whenever there exist constants c, z̄ > 0 such that f(z) ≤ c g(z) for every z > z̄. In the following section, we give the worst-case time complexity results we have derived.
Complexity Results 
First, consider the computation of the probability that a given nonterminal H generates a tree whose yield is the string u z^(*) v y^(*), where u = w_i ... w_{i+p} and v = w_j ... w_{j+q} are two already recognized substrings, while z^(*) and y^(*) represent two gaps of unspecified length, i.e. two not yet specified strings of terminal symbols that can be generated in those positions by G_s. Such a probability will be indicated by Pr(H < u z^(*) v y^(*) >). For H = S, this probability gives the syntactic plausibility of the partial theory u z^(*) v y^(*), which may be used for computing hypothesis scores in the search for the most plausible interpretation of a spoken sentence. The asterisk indicates that nothing is known about gap z.

We have determined that calculation of such island probabilities with unknown gap length requires solving a rather huge non-linear system of |N|(q + 1)^2 equations, q being the length of the island. If an approximate solution is of any interest, such a system can be rendered linear and solved by inverting an |N|(q + 1)^2 × |N|(q + 1)^2 matrix; this takes an O(|N|^3 q^6) amount of time. For practical values of N and q the required computational effort seems unaffordable.
Tables 1, 2, and 3 list the remaining cases that have been examined, along with the worst-case time complexity of calculating each probability given a known SCFG G_s. Table 1 is self-explanatory. Table 2 deals with a problem of great practical interest: the computation of a theory that has been obtained from a previous theory by means of a single-word extension. In these cases the only calculation required concerns the new terms whose introduction is due to the added word. Table 3 shows the complexity of the additional computation when not the theory, but the gap, is extended by one term. Since suffixes and prefixes are symmetric, the tables show only one of each pair of symmetric cases (the results remain valid if the strings are reversed).
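To give the flavor of these dynamic programs, the following sketch (ours, not the algorithm of [3]) computes the gap probabilities of row 1 of Table 1, Pr(H < z^(k) >) for 1 ≤ k ≤ m, reusing the toy grammar dictionary from the Definitions section. The loop over lengths, productions and split points gives the O(|P| m^2) bound shown in the table:

```python
def gap_probs(g, m):
    """gamma[k][H] = Pr(H < z^(k) >): the probability that H generates *some*
    terminal string of length k.  Filling the table up to length m touches
    every production once per (length, split) pair, hence O(|P| m^2) time."""
    gamma = [None] * (m + 1)
    gamma[1] = {H: sum(p for rhs, p in rules if isinstance(rhs, str))
                for H, rules in g.items()}              # k = 1: H -> w rules
    for k in range(2, m + 1):                           # k > 1: H -> F G rules
        gamma[k] = {H: sum(p * gamma[j][F] * gamma[k - j][G]
                           for rhs, p in rules if isinstance(rhs, tuple)
                           for F, G in [rhs]
                           for j in range(1, k))
                    for H, rules in g.items()}
    return gamma

print(gap_probs(grammar, 3)[3]["S"])   # Pr(S < z^(3) >) = 0.7 for the toy grammar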
The computations shown in Table 3 are particularly 
worth studying because we do not know exactly the number
of words filling the gap but often know a probability distri- 
bution for this quantity; hence we have to take into account 
more than one value for the gap length. 
Rows 3 and 5 in Table 3 show that a one-unit extension 
of a gap within a string costs a cubic amount of time (on 
top of work already done). If it is possible to get bounds 
on the number of (possible) words in a gap, this extra work 
will be repeated a fixed (in practical cases small) number of 
times. 
Island-Driven Parsing Strategies 
Given a method for scoring partial sentence interpre- 
tations in ASU systems, how can the method be utilized? 
This section discusses how the computations listed previ- 
ously support island-driven bidirectional strategies for ASU. 
In speech recognition and speech understanding tasks, 
partial theories are created and a strategy is used to select 
the most probable theory (theories) for growing. 
The score of a theory th can be expressed as:

    \Pr(u z^{(*)} v y^{(*)} \mid A) = \frac{\Pr(A \mid u z^{(*)} v y^{(*)}) \, \Pr(u z^{(*)} v y^{(*)})}{\Pr(A)}    (4)
A parsing strategy can be considered that proceeds from left to right generating a sequence of word hypotheses u; subsequently, syntactic or semantic predictions generate a sequence v.
An upper bound for Pr(A | u z^(*) v y^(*)) can be obtained by running the Viterbi algorithm using a model for u, followed by a looped model of the lexicon (or the phonemes) for z^(*), followed by a model for v and by a looped model for y^(*).
Starting from th, a theory can grow by trying to fill the gap z^(*) with a sequence of words. The hypotheses used for filling the gap may have one word, two words, three words, etc. For each size m of the gap, an upper bound on the probability coming from the language model is Pr(u z^(m) v y^(*)).
Reasonable assumptions about possible values of m can be obtained if suprasegmental acoustic cues such as energy contour descriptors are available. Based on a string A_g describing these features in the gap, it is possible to express the probability Pr(A_g | m) of observing A_g given a gap of m words as follows:

    \Pr(A_g \mid m) = \sum_{s=0}^{\infty} \Pr(A_g \mid n_s = s, m) \Pr(n_s = s \mid m)
                    \approx \sum_{s=s_{\min}}^{s_{\max}} \Pr(A_g \mid n_s = s) \Pr(n_s = s \mid m)    (5)

where n_s indicates the number of syllables in the gap, and Pr(A_g | n_s = s) denotes the a priori probability of observing A_g given that there are s syllables in the gap. It is reasonable to assume that this probability is a good approximation of the probability of observing A_g given that there are s syllables and m words in the gap. Pr(n_s = s | m) is the probability that a string of m words is made up of s syllables, and it can be estimated from a written text. The limits s_min and s_max are chosen in such a way that Pr(n_s = s | m) < ε for s < s_min and s > s_max, so that they depend on m and on the language model, but not on the input string, and can be computed off-line.
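A sketch of estimate (5) follows. The acoustic likelihoods Pr(A_g | n_s = s) and the syllable-count distributions Pr(n_s = s | m) used here are invented placeholder numbers; in practice they would come from acoustic models and written-text statistics respectively:

```python
def gap_length_likelihoods(pr_Ag_given_s, pr_s_given_m, m_values, eps=1e-3):
    """Approximate Pr(A_g | m) as in (5), truncating the sum over syllable
    counts s to those with Pr(n_s = s | m) >= eps (i.e. s_min .. s_max).
    pr_Ag_given_s: dict s -> Pr(A_g | n_s = s).
    pr_s_given_m:  dict m -> dict s -> Pr(n_s = s | m)."""
    result = {}
    for m in m_values:
        dist = pr_s_given_m[m]
        result[m] = sum(pr_Ag_given_s.get(s, 0.0) * p
                        for s, p in dist.items() if p >= eps)
    return result

# Hypothetical numbers: a gap whose energy contour suggests about 3 syllables.
pr_Ag_given_s = {2: 0.1, 3: 0.6, 4: 0.25}
pr_s_given_m = {1: {1: 0.4, 2: 0.4, 3: 0.2},
                2: {2: 0.3, 3: 0.4, 4: 0.3},
                3: {3: 0.2, 4: 0.4, 5: 0.3, 6: 0.1}}
likes = gap_length_likelihoods(pr_Ag_given_s, pr_s_given_m, [1, 2, 3])
print(likes)  # m = 2 scores highest; m_1 and m_2 bound the plausible values of m
```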
Thanks to (5) it is possible to delimit practical values between which m can vary. Let m_1 and m_2 be the lowest and the highest values for m. An upper bound for the probability of the language model relative to theory th can be expressed as:

    U(u z^{(*)} v y^{(*)}) = \max_{m_1 \le m \le m_2} \Pr(u z^{(m)} v y^{(*)})    (6)
We are mainly interested in ASU systems performing 
sentence interpretation in restricted domains. In this kind 
of task, non-syntactic information is usually available to 
predict words on the basis of previously obtained partial 
interpretations of the uttered sentence. Predicted words 
may be "islands" in the sense that they do not follow an 
existing partial theory in a strictly left-to-right manner. 
The acoustic evidence of these islands can be evaluated us- 
ing word-spotting techniques. For these situations, island- 
driven parsers can be used. These parsers produce partial 
parses in which sequences of hypothesized words can be in- 
terleaved by gaps, making theories of the kind listed in the 
previous section (whose probabilities are calculated as de- 
scribed in [3]).
The same methods permit assessment of word candidates adjacent to an already recognized string, i.e., computation of the probability that the first (last) word of the gap, z_1 (z_m), is a certain a ∈ Σ. This new word will extend the current theory. Normally, the system would select the word candidate(s) maximizing the prefix-string-with-gap probability of the theory augmented with the candidate. Instead of computing these probabilities for all the elements in the dictionary, it is possible to restrict such an expensive process to the preterminal symbols (as in [8]).
The approach discussed here should be compared with standard lattice parsing techniques, where no restriction is imposed by the parser on the word search space (see, for example, [4] and the discussion in [11]). Our framework accounts for bidirectional expansion of partial analyses; this improves the predictive capabilities of the system. In fact, bidirectional strategies can be used to restrict the syntactic search space for gaps surrounded by two partial analyses. This idea has been discussed, without reference to stochastic grammars, in [12] for the case of one-word gaps. We propose a generalization to m-length gaps and to cases where partial analyses do not represent entire parse trees but partial derivation trees.
A fair comparison between island-driven and left-to-right 
theory growing in stochastic parsing is not possible at present. 
In practice, island-driven parsers may remarkably accelerate 
the theory-growing process if island predictions are made by 
a look-ahead mechanism that leads to a correct partial the- 
ory with a limited number of competitors and if a limited 
number of predictions can be made for the words that can 
fill the gap. 
HEURISTICS FOR IMPROVED 
LANGUAGE MODELING 
The domains of discourse into which we might wish to 
introduce speech recognition systems vary widely. Often, 
the way in which human beings employ speech within a 
given domain has idiosyncracies which should be incorpo- 
rated into the probabillstic language model, because they 
greatly increase its predictive power. In this section, we 
discuss two heuristics which may improve language model- 
ing in specific situations. 
Adding a Cache Component to a Standard Language Model
Our work on cache-based language modeling began with the simple observation that a given speaker or writer is likely to use the same words repeatedly, and gave rise to a heuristic which greatly improved the performance of a standard probabilistic language model (the 3g-gram model).
heuristic is likely to be useful in any context where a speaker 
interacts with the speech recognition system for some length 
of time (the longer the interaction continues, the more ac- 
curate the system's estimate of the speaker's characteristic 
word use frequencies). Dictation systems are an obvious 
application, as is any interactive system in which the in- 
teraction is prolonged, or where the same people use the 
system repeatedly. 
This section summarizes our work on cache components; 
see [9] for more details. Since the cache is superimposed on
a standard language model (we used the 3g-gram model) we 
will begin with an overview of such models. 
Consider the most straightforward kind of language modeling, where the task of the LM is to determine Pr(th) = Pr(w_1 ... w_n). The trigram model [7] approximates this by setting Pr(w_i | w_1 ... w_{i-1}) = Pr(w_i | w_{i-2} w_{i-1}), which can be estimated from a training text. Thus we have
    \Pr(th) = \Pr(w_1) \Pr(w_2 \mid w_1) \Pr(w_3 \mid w_1 w_2) \cdots \Pr(w_n \mid w_{n-2} w_{n-1}).    (7)
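A minimal maximum-likelihood version of the trigram factors in (7) (our sketch; a real recognizer would smooth these relative frequencies and handle sentence boundaries):

```python
from collections import Counter

def train_trigram(words):
    """Relative-frequency estimates Pr(w_i | w_{i-2} w_{i-1}) for equation (7)."""
    tri = Counter(zip(words, words[1:], words[2:]))
    bi = Counter(zip(words, words[1:]))
    return {t: c / bi[t[:2]] for t, c in tri.items()}

def sentence_prob(model, words):
    """Product of the trigram factors; omits the unigram/bigram start terms."""
    p = 1.0
    for t in zip(words, words[1:], words[2:]):
        p *= model.get(t, 0.0)       # unseen trigram -> 0 without smoothing
    return p

corpus = "we parse speech and we parse text and we parse speech".split()
model = train_trigram(corpus)
print(sentence_prob(model, "we parse speech".split()))  # 2/3 on this toy corpus
```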
The 3g-gram model uses grammatical parts of speech (POS). Let g(w_i) = g_i denote the POS of the word that appears at time i. Based on g_{i-1} and g_{i-2}, one part of the model gives the probability that g_i is a noun or a verb or an article, etc.: Pr(g_i = X | g_{i-2}, g_{i-1}). Another part gives the probability of a particular word if the POS is known. Both parts are estimated from frequencies in training texts. Thus, for a word W that has only one possible POS, g(W), the probability Pr(w_i = W) is estimated by

    \Pr(w_i = W \mid g_{i-2}, g_{i-1}) = \Pr(W \mid g(W)) \, \Pr(g_i = g(W) \mid g_{i-2}, g_{i-1}).    (8)
Our contention was that while someone is speaking, a 
word used in the immediate past is very likely to be used 
again - much more likely than would be predicted by ei- 
ther of the models just described. We believed that these 
short-term word frequency fluctuations depend on the POS. 
Therefore, we used the 3g-gram model along with a cache 
component as the basis for a combined model which 
could weight the short-term cache component heavily for 
some POSs and not for others. The relative weights as- 
signed to the cache and 3g-gram components within each 
POS category were obtained by maximum-likelihood esti- 
mation. To assess the improvement achieved by incorporat- 
ing the cache component, we ran the combined model and 
a pure 3g-gram model (both trained on exactly the same 
data) on a text and compared the perplexities obtained. 
The combined model gives a probability to each POS in the same way as the 3g-gram model. For a fixed POS, the probability of any word W which belongs to it is a weighted average of W's frequency in that POS category in the training text (the 3g-gram component) and W's frequency in the cache belonging to the POS category (the cache component). During the speech recognition task, the cache for a POS will contain the last N words which were guessed to have that POS (we set N to 200). If a word has occurred often in the recent past, it will occur many times in the cache for its POS.

Let C_j(W, i) be the cache-based probability estimate for word W at time i for POS g_j. This is calculated from the frequency of W among the N = 200 most recent words belonging to POS g_j. Our combined model estimates Pr(w_i = W | g(W)) by k_{M,j} f(w_i = W | g(W)) + k_{C,j} C_j(W, i), where k_{M,j} + k_{C,j} = 1, instead of by f(w_i = W | g(W)) alone. The POS component Pr(g_i = g(W) | g_{i-2}, g_{i-1}) of the combined model was estimated as described in [5].

To train and test the pure 3g-gram model and the combined model, we utilized different portions of the LOB Corpus of British English; 100 sample texts drawn from this corpus form the testing text. The results exceeded our expectations. On the testing text, the pure 3g-gram model gave a perplexity of 332; the combined model gave a perplexity of 107. Thus, the incorporation of a cache component yielded a 3-fold improvement in perplexity.
The results confirm our hypothesis that recently-used 
words have a higher probability of occurrence than the 3g- 
gram model would predict, and that incorporating this knowl- 
edge into the LM via a cache component gives a significant 
improvement in performance. 
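A sketch of the combined estimate on top of a per-POS cache. The POS labels, the 3g-gram frequency f, and the weight k_C below are placeholder values; the real weights are found per POS by maximum-likelihood estimation, as described above:

```python
from collections import deque

class PosCache:
    """Per-POS cache of the last N guessed words (N = 200 in [9])."""
    def __init__(self, n=200):
        self.n, self.caches = n, {}

    def add(self, word, pos):
        self.caches.setdefault(pos, deque(maxlen=self.n)).append(word)

    def estimate(self, word, pos):
        """C_j(W, i): relative frequency of W among the last N words of POS j."""
        cache = self.caches.get(pos)
        return cache.count(word) / len(cache) if cache else 0.0

def combined_prob(word, pos, f_3g, cache, k_cache):
    """k_M f(w_i = W | g(W)) + k_C C_j(W, i), with k_M + k_C = 1."""
    return (1.0 - k_cache) * f_3g + k_cache * cache.estimate(word, pos)

cache = PosCache()
for w in ["perplexity", "model", "perplexity", "cache"]:
    cache.add(w, "NOUN")
# The 3g-gram frequency of "perplexity" as a NOUN is assumed to be 0.001 here.
print(combined_prob("perplexity", "NOUN", 0.001, cache, k_cache=0.3))  # 0.1507
```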
The following are some ideas for extending this work:

• The weighting of the cache component could be made to depend on the number of words in the cache.

• The idea of tracking the speaker's recent behaviour could be extended to POSs; that is, recently employed POSs could be assigned higher probabilities.

• A word association matrix could be built, so that the occurrence of a word would increase the estimated probability of words that often co-occur with it.

• A cache component could be incorporated into a SCFG, where it would affect the probabilities of those productions which give rise to terminals.
Language Modeling and Semantics in Dialogue Systems
More recent work focuses on the special characteristics of dialogue. Given the context provided by the state of the discourse, there will be strong constraints on both syntax and word choice which should be expressed probabilistically in the LM. Development of such an LM is one of our main current goals.
We have recently begun to consider the influence of dis- 
course state on semantics. It has usually been tacitly as- 
sumed that almost all the words in an utterance must be 
correctly recognized for its meaning to be determined. This 
is true for isolated sentences, but it is seldom true during
a dialogue. We wish to design a dialogue system capable 
of extracting semantic content even from distorted utter- 
ances, by using Bayesian criteria to decide between possible 
meanings. This work has links with the results for island- 
driven parsers, since meaningful word sequences often form 
islands within a user utterance. A rigorous quantitative 
theory linking these themes, together with the appropriate 
parsing algorithm, is under development. 
REFERENCES 
1. A.V. Aho, J.E. Hopcroft and J.D. Ullman, "The Design and Analysis of Computer Algorithms", Addison-Wesley Publishing Company, Reading, MA, 1974.

2. P. Brown, F. Jelinek and R.L. Mercer, "Basic Method of Probabilistic Context Free Grammars", Internal Report, T.J. Watson Research Center, Yorktown Heights, NY 10598, 85 pages.

3. A. Corazza, R. De Mori, R. Gretter and G. Satta, "Computation of Probabilities for an Island-Driven Parser", McGill University Technical Report, No. SOCS 90.19, Jan. 1991.

4. Y.L. Chow and S. Roukos, "Speech Understanding Using a Unification Grammar", Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, 1989, Glasgow, Scotland, pp. 727-731.

5. A.M. Derouault and B. Merialdo, "Natural Language Modeling for Phoneme-to-Text Transcription", IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, Nov. 1986, pp. 742-749.

6. E.P. Giachin and C. Rullent, "A Parallel Parser for Spoken Natural Language", Proc. of Eleventh International Joint Conference on Artificial Intelligence, 1989, Detroit, Michigan, USA, pp. 1537-1542.

7. F. Jelinek, "The Development of an Experimental Discrete Dictation Recognizer", Proc. IEEE, vol. 73, n. 11, Nov. 1985, pp. 1616-1624.

8. F. Jelinek, "Computation of the Probability of Initial Substring Generation by Stochastic Context Free Grammars", Internal Report, Continuous Speech Recognition Group, IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, 10 pages.

9. R. Kuhn and R. De Mori, "A Cache-Based Natural Language Model for Speech Recognition", IEEE Trans. Pattern Anal. Machine Intell., vol. 12, n. 6, June 1990, pp. 570-583.

10. K. Lari and S.J. Young, "The Estimation of Stochastic Context-Free Grammars Using the Inside-Outside Algorithm", Computer Speech and Language, vol. 4, n. 1, 1990, pp. 35-56.

11. R. Moore, F. Pereira and H. Murveit, "Integrating Speech and Natural Language Processing", Proceedings of the Speech and Natural Language Workshop, 1989, Philadelphia, Pennsylvania, pp. 243-247.

12. O. Stock, R. Falcone and P. Insinnamo, "Bidirectional Charts: A Potential Technique for Parsing Spoken Natural Language Sentences", Computer Speech and Language, vol. 3, n. 3, 1989, pp. 219-237.

13. W.A. Woods, "Optimal Search Strategies for Speech Understanding Control", Artificial Intelligence, vol. 18, n. 3, 1981, pp. 295-326.
Computed probability and its time complexity

gap probabilities
1. {Pr(H < z^(k) >) | H ∈ N, 1 ≤ k ≤ m}                              → O(|P| m^2)

inside probabilities
2. {Pr(H < w_i ... w_{i+p} >) | H ∈ N}                                → O(|P| p^3)

prefix/suffix-string probabilities
3. {Pr(H < w_i ... w_{i+p} z^(*) >) | H ∈ N}                          → O(|P| p^3)
4. {Pr(H < w_i ... w_{i+p} z^(m) >) | H ∈ N}                          → O(|P| max{p^3, m^2 p})

gap-in-string probabilities
5. {Pr(H < w_i ... w_{i+p} z^(*) w_j ... w_{j+q} >) | H ∈ N}          → O(|P| max{p^3, q^3})
6. {Pr(H < w_i ... w_{i+p} z^(m) w_j ... w_{j+q} >) | H ∈ N}          → O(|P| max{p^3, q^3, m^2 p})

island probabilities
7. {Pr(H < z^(m) w_j ... w_{j+q} y^(*) >) | H ∈ N}                    → O(|P| max{q^3, m^2 q})

prefix-string-with-gap probabilities
8. {Pr(H < w_i ... w_{i+p} z^(m) w_j ... w_{j+q} y^(*) >) | H ∈ N}    → O(|P| max{p^3, q^3, m^2 p, m^2 q})

Table 1: Worst-case time complexity for the computation of bounded and unbounded gap length probabilities.
Prob. of theory extended by word a

inside probabilities
1. {Pr(H < w_i ... w_{i+p} a >) | H ∈ N}                              → O(|P| p^2)

prefix-string probabilities
2. {Pr(H < w_i ... w_{i+p} a z^(*) >) | H ∈ N}                        → O(|P| p^2)
3. {Pr(H < a w_i ... w_{i+p} z^(m) >) | H ∈ N}                        → O(|P| max{p^2, m^2})
4. {Pr(H < w_i ... w_{i+p} a z^(m-1) >) | H ∈ N}                      → O(|P| max{p^2 m, p m^2})

gap-in-string probabilities
5. {Pr(H < w_i ... w_{i+p} z^(*) w_j ... w_{j+q} a >) | H ∈ N}        → O(|P| max{p^2, q^2})
6. {Pr(H < w_i ... w_{i+p} z^(*) a w_j ... w_{j+q} >) | H ∈ N}        → O(|P| max{p^2, q^2})
7. {Pr(H < w_i ... w_{i+p} z^(m) w_j ... w_{j+q} a >) | H ∈ N}        → O(|P| max{p^2, q^2, m^2, m(m + q)})
8. {Pr(H < w_i ... w_{i+p} z^(m) a w_j ... w_{j+q} >) | H ∈ N}        → O(|P| max{p^2 q, p q^2, q^2 m, q m^2})
9. {Pr(H < w_i ... w_{i+p} z^(*) a >) | H ∈ N}                        → O(|P| p^2)
10. {Pr(H < w_i ... w_{i+p} z^(m) a >) | H ∈ N}                       → O(|P| max{p^2, m^2})

island probabilities
11. {Pr(H < z^(m) w_j ... w_{j+q} a y^(*) >) | H ∈ N}                 → O(|P| max{q^2, m^2})
12. {Pr(H < z^(m) a w_j ... w_{j+q} y^(*) >) | H ∈ N}                 → O(|P| max{m^2 q, m q^2})

prefix-string-with-gap probabilities
13. {Pr(H < w_i ... w_{i+p} a z^(m-1) w_j ... w_{j+q} y^(*) >) | H ∈ N}   → O(|P| max{p^2 q, p q^2, p^2 m, p m^2})
14. {Pr(H < w_i ... w_{i+p} z^(m-1) a w_j ... w_{j+q} y^(*) >) | H ∈ N}   → O(|P| max{p^2 q, p q^2, p^2 m, p m^2})
15. {Pr(H < w_i ... w_{i+p} z^(m) w_j ... w_{j+q} a y^(*) >) | H ∈ N}     → O(|P| max{p^2, q^2, m^2, (m + q)p})
16. {Pr(H < w_i ... w_{i+p} z^(m) a y^(*) >) | H ∈ N}                 → O(|P| max{m^2, p^2})

Table 2: Worst-case time complexity for the computation of probabilities of theories extended by the addition of a single word a ∈ Σ.
Prob. of one-unit extension of known-length gap

gap probabilities
1. {Pr(H < z_1^(m) z_2^(1) >) | H ∈ N}                                → O(|P| m)

prefix-string probabilities
2. {Pr(H < w_i ... w_{i+p} z_1^(m) z_2^(1) >) | H ∈ N}                → O(|P| max{p^2, pm})

gap-in-string probabilities
3. {Pr(H < w_i ... w_{i+p} z_1^(m) z_2^(1) w_j ... w_{j+q} >) | H ∈ N}    → O(|P| max{p^2 q, q^2 p, m^2 q})

island probabilities
4. {Pr(H < z_1^(m) z_2^(1) w_j ... w_{j+q} y^(*) >) | H ∈ N}          → O(|P| max{q^2, qm})

prefix-string-with-gap probabilities
5. {Pr(H < w_i ... w_{i+p} z_1^(m) z_2^(1) w_j ... w_{j+q} y^(*) >) | H ∈ N}   → O(|P| max{p^2 q, p q^2, (q + m)p})

Table 3: Worst-case time complexity for the computation of probabilities of theories extended by means of incrementing the known length gap by one unit.
