Weakly Restricted Stochastic Grammars
Rieks op den Akker and Hugo ter Doest
University of Twente, Department of Computer Science
P.O. Box 217, 7500 AE Enschede, The Netherlands
Keywords: stochastic languages, grammars, grammar
inference.
Abstract 
A new type of stochastic grammars is introduced for
investigation: weakly restricted stochastic grammars. In this
paper we will concentrate on the consistency problem. To
find conditions for stochastic grammars to be consistent,
the theory of multitype Galton-Watson branching processes
and generating functions is of central importance.
The unrestricted stochastic grammar formalism generates
the same class of languages as the weakly restricted
formalism. The inside-outside algorithm is adapted for use
with weakly restricted grammars.
1 Introduction 
If we consider a natural language as a structure modelled
by a formal grammar we do not consider it any
more as a language that is used. Formal (context-free)
grammars are often advocated as a model for
the "linguistic competence" of an ideal natural language
user. It is also noticed that this mathematical
concept is far from a sufficient model for describing all
aspects of the language. What cannot be expressed by
this model is the fact that some sentences or phrases
are more likely to occur than others. This notion of
occurrence refers to the use of language, and therefore
considering this kind of statistical knowledge about
language has to do with the pragmatics of language
laid down in a corpus of the language. With a particular
context of use in mind a syntactically ambiguous
sentence will often have a most likely meaning and
hence a most likely analysis. Some of the shortcomings
of the pure (context-free) grammar model may
be solved by stochastic grammars, a model
that makes it possible to incorporate certain statistical
facts about the language use into a model of the
possible structures of sentences as we conceive them
from a mathematical, formal point of view. Natural
languages are now seen as stochastic; a user of a language
as a stochastic source producing sentences. A
stochastic language over some alphabet $\Sigma$ is simply a
formal language L over $\Sigma$ together with a probability
function $\phi$ assigning to each string x in the language
a real number $\phi(x)$ in [0, 1]. Since $\phi(x)$ is interpreted
as the chance that the event x, or the event that a
language source produces x, will occur, it will be clear
that the sum of $\phi(x)$ where x ranges over all possible
sentences is equal to one. The stochastic language is
called context-free if the language L is context-free.
The usual grammatical model for a stochastic
context-free language is a context-free grammar together
with a probability function f that assigns a
real number in [0, 1] to each of the productions of the
grammar. The meaning of this function is the following.
A step in a derivation of a sentential form, in
which a nonterminal A is rewritten using production
p, has chance f(p) to occur, independent of which A
is rewritten in the sentential form and independent
of the history of the process that produced the sentential
form. The probability of a derivation (tree) is
the product of the probabilities of the derivation steps
that produce the tree. The probability of a sentence
generated by the grammar is the sum of the probabilities
of all the trees of the sentence. So given a stochastic
grammar we can compute the probabilities of all its
sentences. The distribution language generated by a
stochastic grammar G, DL(G), is defined as the set
of all derivation trees with their probabilities. The
stochastic language generated by a stochastic grammar
G, SL(G), is defined as the set of all sentences
generated by the grammar with their probabilities.
A stochastic grammar G is an adequate model of a
language L if on its basis we can correctly compute
the probabilities of the sentences in the language L.
Of course this assumes a statistical analysis of a language
corpus. A stochastic grammar that generates a
stochastic language is called consistent.
Definition 1.1 A stochastic grammar G is called
consistent if for the probability measure p induced by
G onto the language L generated by its underlying grammar
$$\sum_{x \in L} p(x) = 1.$$
Otherwise the grammar is called inconsistent.
Not all stochastic grammars generate a stochastic language.
Even proper and reduced grammars¹ do not
necessarily generate a stochastic language. This is
illustrated in the following example.²

¹ A grammar is called proper if for all nonterminals A, the
sum of the probabilities assigned to the rules for A is 1. A
grammar is called reduced if all nonterminals are reachable and
can produce a terminal string.
Example 1.1 Consider the stochastic grammar G
with nonterminal set $V_N = \{S\}$ and terminal set $V_T = \{a\}$.
The productions with their probabilities are
given by:
S → SS   (q)
S → a    (1 − q)
Following the technique presented in [2] we find
that the production generating function is given by
$g_1(s_1) = q s_1^2 + 1 - q$, and that the first-moment matrix
E is given by [2q]. We can conclude that the grammar
is consistent if and only if q ≤ 1/2. For details we
refer to [5]. Notice that all the different trees of string
$a^n$ have the same probability. Hence, they cannot be
distinguished according to their probabilities. □
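The threshold q ≤ 1/2 in Example 1.1 can be checked numerically. The sketch below relies on the standard branching-process fact that the total probability of all finite derivation trees is the smallest non-negative fixed point of $g_1$, which is reached by iterating $g_1$ from 0. The function name `termination_probability` and the iteration count are illustrative assumptions, not taken from [2] or [5].

```python
def termination_probability(q, iterations=200):
    """Iterate s <- g1(s) = q*s**2 + 1 - q starting from s = 0.

    After n steps, s is the probability that a derivation from S
    finishes in a tree of height at most n; the limit is the total
    probability mass of all finite derivation trees.
    """
    s = 0.0
    for _ in range(iterations):
        s = q * s * s + 1.0 - q
    return s


# q = 0.3 lies below the threshold 1/2, q = 0.75 above it.
for q in (0.3, 0.75):
    print(q, termination_probability(q))
```

For q = 0.3 the iteration approaches 1 (the grammar is consistent); for q = 0.75 it approaches (1 − q)/q = 1/3, so two thirds of the probability mass is lost to non-terminating derivations. Exactly at q = 1/2 the iteration still tends to 1, but only slowly.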
It has been noticed that the usual model of a stoch- 
astic grammar as presented above, and which we 
from now on call the unrestricted stochastic grammar
model, has some disadvantages for modelling "real" 
languages. In this paper we present a more ade- 
quate model, the weakly restricted stochastic gram- 
mar model. We give necessary and sufficient con- 
ditions to test in an efficient way whether such a 
grammar defines a stochastic language. Moreover, we 
will show that these grammars can be transformed 
into an equivalent model of the usual type. The nice 
thing about the new model is that it models "context- 
dependent" probabilities of production-rules directly 
in terms of the grammar specification of the language 
and not in terms of some particular implementation 
of the grammar as a parser. The latter is done by 
Briscoe and Carroll \[3\] by assigning probabilities to 
the transitions of the LR-parser constructed for the 
grammar. In section 2 weakly restricted grammars 
are introduced, in section 3 conditions for their con- 
sistency are investigated; in section 4 it is proven that 
weakly restricted grammars and unrestricted gram- 
mars generate the same class of stochastic languages 
and section 5 presents the inside-outside algorithm for 
weakly restricted grammars. 
2 Weakly Restricted Stochastic 
Grammars 
To add context-sensitivity to the assignment of probabilities
to the application of production rules, we take
into account (and distinguish) the occurrences of the 
nonterminals. Then, for each nonterminal occurrence 
distinct probabilities can be given for the production 
rules that can be used to rewrite the nonterminal. 
This way of assigning probabilities to the application
of production rules seems unknown in literature, although
we found some other formalisms that were designed
to add context-sensitivity to the assignment of
probabilities. For instance, the definition of stochastic
grammars by Salomaa in [8] is somewhat different
from the definition we gave in our introduction:
the probability of a production to be applied is here
dependent on the production that was last applied.
To escape the bootstrap problem (when a derivation
is started, there is no last applied production)
an initial stochastic vector is added to the grammar.
Weakly restricted stochastic grammars are introduced
in [1]. In the following definition $C_{A_i}$ denotes the set
of productions for $A_i$ and $R(A_i)$ denotes the number
of right-hand side occurrences of nonterminal $A_i$.

² Although we found in [7] by Jelinek and Lafferty the (false)
statement that a stochastic grammar is consistent if and only
if it is proper, given that the underlying grammar is reduced.
Example 1.1 gives a clear counterexample to their statement.
Definition 2.1 A weakly restricted stochastic grammar
$G_w$ is a pair $(G_c, \Delta)$, where $G_c = (V_N, V_T, P, S)$
is a context-free grammar and $\Delta$ is a set of functions
$$\Delta = \{p_i \mid A_i \in V_N\}$$
where, if $j \in 1 \ldots R(A_i)$ and $k \in 1 \ldots |C_{A_i}|$, $p_i(j,k) =
p_{ijk} \in [0, 1]$. The set of productions P contains exactly
one production for start symbol S.
In words, $p_{ijk}$ stands for the probability that the k-th
production with left-hand side $A_i$ is used for rewriting
the j-th right-hand side occurrence of nonterminal
$A_i$. The usefulness of this context-dependency can
be seen immediately from the following unrestricted
stochastic grammar, which is taken (in part) from the
example grammar in [3] (p. 29):
S → NP VP
VP → Vt NP
NP → ProNP
NP → Det N
NP → NP PP
Unrestricted stochastic grammars cannot model con- 
text dependent use of productions. For example, an 
NP is more likely to be expanded as a pronoun in sub- 
ject position than elsewhere. Exactly this dependence 
on where a nonterminal was introduced can be mod- 
eled by using a weakly restricted stochastic grammar. 
Since in a weakly restricted stochastic grammar the 
probabilities of applying a production are dependent 
on the particular occurrence of a nonterminal in the 
right-hand side of a production, it is useful to require
that there is only one start production. 
The characteristic grammar of a weakly restricted 
grammar is the underlying context free grammar. 
The next step is to compute probabilities for strings
with respect to weakly restricted stochastic grammars.
For this purpose a tree is written in terms
of its subtrees (trees with a nonterminal as root) as
$q[t_{i_1 j_1}, t_{i_2 j_2}, \ldots, t_{i_{n(q)} j_{n(q)}}]$, in which q is a production,
n(q) is the number of nonterminals in the right-hand
side of q and $t_{ij}$ denotes a (sub)tree with the j-th
occurrence of nonterminal $A_i$ at its root. A tree for
which n(q) = 0 is written as q[].
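Definition 2.2 The probability of a derivation tree t
with respect to a weakly restricted stochastic grammar
is defined recursively as
$$p(q[t_{i_1 j_1}, \ldots, t_{i_{n(q)} j_{n(q)}}]) = \prod_{m=1}^{n(q)} p_{i_m j_m k_m}\, p(t_{i_m j_m})$$
where $1 \le k_m \le |C_{A_{i_m}}|$ is the index of the production
at the root of subtree $t_{i_m j_m}$.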
The probability of a string is defined as the sum of
the probabilities of all distinct derivation trees that
yield this string.
Definition 2.3 The probability of a string x in
$L(G_c)$ is defined as
$$p(x) = \sum_{t \,:\, \mathrm{yield}(t) = x} p(t).$$
The distribution language $DL(G_w)$ and stochastic
language $SL(G_w)$ of a weakly restricted grammar
$(G_c, \Delta)$ are defined analogously to the distribution
language and stochastic language of an unrestricted
grammar.
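The recursion of Definition 2.2 can be sketched in a few lines of Python. The encoding is an assumption made for illustration only: a subtree is a triple `(occurrence, k, children)`, `probs[occurrence][k]` plays the role of $p_{ijk}$ with k counted from 0, and the occurrence names `S1`, `S2`, `S3` and the values of p, q, r are hypothetical numbers attached to the grammar Z → S, S → S S, S → a used in Example 3.1.

```python
from math import prod


def tree_probability(subtrees, probs):
    """Probability of a tree q[t_1, ..., t_n], given its list of subtrees.

    Each subtree is (occurrence, k, children): the nonterminal occurrence
    at its root, the 0-based index of the production that rewrites it, and
    the list of its own subtrees. An empty list has probability 1.
    """
    return prod(
        probs[occ][k] * tree_probability(children, probs)
        for occ, k, children in subtrees
    )


# Hypothetical probabilities: index 0 is S -> S S, index 1 is S -> a.
p, q, r = 0.5, 0.3, 0.4
probs = {"S1": (p, 1 - p), "S2": (q, 1 - q), "S3": (r, 1 - r)}

# The single tree for "aa": Z -> S1, S1 -> S2 S3, S2 -> a, S3 -> a.
tree_aa = [("S1", 0, [("S2", 1, []), ("S3", 1, [])])]
print(tree_probability(tree_aa, probs))
```

With these numbers the tree has probability p(1 − q)(1 − r), roughly 0.21. Summing `tree_probability` over all trees with the same yield gives the string probability of Definition 2.3.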
3 Consistency 
In this section consistency of weakly restricted stochastic
grammars will be considered. The theory of
multitype branching processes will be used to come
to a similar theorem as is given in [2] for unrestricted
stochastic grammars.
Definition 3.1 For the j-th occurrence of nonterminal
$A_i \in V_N$ the production generating function for
weakly restricted stochastic grammars is defined as:
$$g_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)}) = \sum_{u=1}^{|C_{A_i}|} p_{iju} \prod_{m=1}^{k} \prod_{n=1}^{R(A_m)} s_{mn}^{\,r_{mn}(u)}$$
where $r_{mn}(u)$ is 1 if nonterminal occurrence $A_{mn}$ appears
in the right-hand side of the u-th production rule
with nonterminal $A_i$ as left-hand side and 0 otherwise.
Note that for each right-hand side nonterminal occurrence
a dummy variable is introduced: $s_{ij}$ corresponds
to the j-th occurrence of nonterminal $A_i$. A
special variable is $s_{1,1}$: it corresponds to the start
symbol, which is the right-hand side of the start production
in P, a production of the form Z → S. The generating
function for nonterminal occurrence $A_{ij}$ entails
for each production for $A_i$ a term. If $g_{ij}$ has a term
of the form
$$\alpha\, s_{i_1} s_{i_2} \cdots s_{i_n}$$
then we know that it corresponds to a production for
$A_i$ of the form
$$A_i \to x_{i_1} A_{i_1} x_{i_2} A_{i_2} \cdots x_{i_n} A_{i_n} x_{i_{n+1}}$$
where the $x_{i_j} \in V_T^*$. The production has, if it is used
for rewriting occurrence $A_{ij}$, probability $\alpha$ of being
applied. In Example 3.1 it will be illustrated how the
terms of the generating functions correspond to the
productions of the grammar.
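Theorem 3.1 Let $A_{ij} \Rightarrow \alpha$; thus the j-th occurrence
of nonterminal $A_i$ is rewritten using exactly one production.
The probability that $\alpha$ contains the n-th occurrence
of nonterminal $A_m$ is given by
$$e_{ijmn} = \left.\frac{\partial g_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)})}{\partial s_{mn}}\right|_{s_{1,1} = \cdots = s_{k,R(A_k)} = 1}$$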
Proof In general the generating function can be written
as
$$g_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)}) = g'_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)}) + c_{ij}$$
where $g'_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)})$ only contains terms dependent
on $s_{1,1}, \ldots, s_{k,R(A_k)}$ and where $c_{ij}$ is a constant
term. The terms dependent on $s_{1,1}, \ldots, s_{k,R(A_k)}$
come from productions for $A_i$ that contain nonterminals
in their right-hand sides and the constant terms
from productions for $A_i$ that only contain terminals
in their right-hand sides. When partial derivatives
are taken from $g_{ij}$ we can just as well consider $g'_{ij}$,
since the constant term will become zero. We know
that the terms in $g'_{ij}$ do not contain any powers higher
than 1 of the variables in it. This leads us to the insight
that taking the mn-th partial derivative of $g'_{ij}$
results in at most one term of the form
$p_{ijh} f(s_{1,1}, \ldots, s_{k,R(A_k)})$ where f does not depend on
$s_{mn}$ and $p_{ijh}$ is one of the probabilities resulting from
applying $p_i$ to j and some h in $1 \ldots |C_{A_i}|$. If we substitute
1 for all remaining variables in the partial derivative
we find as value for $e_{ijmn}$ the probability that the
j-th occurrence of nonterminal $A_i$ is rewritten by the
production that contains in its right-hand side nonterminal
occurrence $A_{mn}$. □
The first-moment matrix for weakly restricted gram- 
mars is defined just like the first-moment matrix for 
unrestricted grammars: 
Definition 3.2 The first-moment matrix E associated
with the weakly restricted grammar G is
$$E = [e_{ijmn}]$$
with rows indexed by the occurrences $A_{ij}$ and columns by the occurrences $A_{mn}$.
We order the set of eigenvalues of the first-moment
matrix from the largest one to the smallest, such that
$\rho_1$ represents the maximum.
Theorem 3.2 A proper weakly restricted grammar is
consistent if $\rho_1 < 1$ and is not consistent if $\rho_1 > 1$.
The proof of this theorem is analogous to the proof
of the related theorem in [2] and we will not treat it
here (see [5] for a proof).
Example 3.1 Consider the weakly restricted stochastic
grammar $(G_c, \Delta)$ where $G_c = (V_N, V_T, P, Z) =
(\{Z, S\}, \{a\}, P, Z)$ and P and $\Delta$ are as follows:
Z → S      occurrence $S_1$: (p, 1 − p)
S → S S    occurrences $S_2$: (q, 1 − q) and $S_3$: (r, 1 − r)
S → a
Each pair gives the probabilities with which the occurrence is
rewritten by S → S S and by S → a, respectively.
For a reason that will become clear at the end of the example,
we assume that p ≠ 0. The production generating
functions are given by
$$g_{11}(s_{11}, s_{12}, s_{13}) = p\,s_{12}s_{13} + 1 - p$$
$$g_{12}(s_{11}, s_{12}, s_{13}) = q\,s_{12}s_{13} + 1 - q$$
$$g_{13}(s_{11}, s_{12}, s_{13}) = r\,s_{12}s_{13} + 1 - r$$
The first-moment matrix E is given by
$$E = \begin{pmatrix} 0 & p & p \\ 0 & q & q \\ 0 & r & r \end{pmatrix}$$
The characteristic equation is given by $\phi(x) = x((x -
q)(x - r) - qr) = x^2(x - (q + r)) = 0$. Thus, the eigenvalues
of the matrix are 0 and x = q + r. According
to Theorem 3.2 the grammar is consistent if q + r < 1
and inconsistent if q + r > 1. If q + r = 1 the theorem
does not decide the consistency of the grammar.
From the characteristic equation it follows that the
value of p does not influence the consistency of the
grammar. However, looking at the grammar we find
that it is consistent if p = 0, regardless of probabilities
q and r. Therefore, before Theorem 3.2 can be
used for checking the consistency of the grammar, the
grammar must be stripped of productions having for
each nonterminal occurrence probability zero of being
applied. □
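The computation in Example 3.1 can be reproduced numerically. The sketch below uses the observation from the proof of Theorem 3.1 that no variable occurs with a power higher than 1, so each partial derivative at $s = (1, \ldots, 1)$ equals the difference between $g_{ij}$ at the all-ones point and at the point where one variable is set to 0. The helper names, the values of p, q, r and the use of power iteration for the largest eigenvalue are assumptions for illustration; power iteration is only a rough estimate and may fail to settle for periodic matrices.

```python
def first_moment_matrix(gen_funcs):
    """Entry [a][b] is the partial derivative of g_a w.r.t. s_b at all ones.

    Every g_a is linear in each single variable, so the derivative equals
    g_a(1, ..., 1) minus g_a evaluated with s_b replaced by 0.
    """
    n = len(gen_funcs)
    ones = [1.0] * n
    matrix = []
    for g in gen_funcs:
        row = []
        for b in range(n):
            point = ones.copy()
            point[b] = 0.0
            row.append(g(ones) - g(point))
        matrix.append(row)
    return matrix


def spectral_radius(matrix, iterations=500):
    """Rough power-iteration estimate of the largest eigenvalue."""
    v = [1.0] * len(matrix)
    norm = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        norm = max(abs(x) for x in w)
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return norm


# Hypothetical values; s[0], s[1], s[2] stand for s_11, s_12, s_13.
p, q, r = 0.5, 0.3, 0.4
gens = [lambda s, c=c: c * s[1] * s[2] + 1 - c for c in (p, q, r)]
E = first_moment_matrix(gens)
rho_1 = spectral_radius(E)
print(E, rho_1)
```

For these values the estimate of $\rho_1$ comes out at about q + r = 0.7, in agreement with the characteristic equation, so Theorem 3.2 classifies the grammar as consistent.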
Definition 3.3 A final class C of nonterminal occurrences
is a subset of the set of all nonterminal occurrences
having the property that any occurrence in C
has probability 1 of producing, when rewritten using
one production rule, exactly one occurrence also in
C.
Theorem 3.3 A weakly restricted stochastic grammar
is consistent if and only if $\rho_1 \le 1$ and there are
no final classes.
For the proof of Theorem 3.3 we refer to [5]. Applying
this theorem to the example tells us that if
q + r = 1, the grammar is consistent if and only if
there is no final class of nonterminal occurrences. Looking at the
grammar we see that there is a final class of occurrences
if q = 1 or r = 1 (or both); the final classes then
are $\{S_2\}$, $\{S_3\}$ and $\{S_2, S_3\}$, respectively; if in addition
p = 1, then the final classes are $\{S_1, S_2\}$, $\{S_1, S_3\}$
and $\{S_1, S_2, S_3\}$, respectively. Hence, the grammar is
consistent if and only if $q + r \le 1 \wedge q \ne 1 \wedge r \ne 1$.
Notice that if q ≠ r then all trees of $a^n$ have different
probabilities.
4 Equivalence 
In this section we will show that a weakly restricted 
stochastic grammar can be transformed into an equiv- 
alent unrestricted grammar. We define two grammars 
G and H to be equivalent if DL(G) = DL(H).
The transformation is performed as follows. With
each nonterminal occurrence $A_{ij}$ in the right-hand
side of a production rule associate a new unique nonterminal
$A_{ij}$; for each new nonterminal $A_{ij}$ copy the
set of production rules with nonterminal $A_i$ as left-hand
side, replace the left-hand sides with $A_{ij}$ and
replace in the right-hand sides each nonterminal with
its new (associated) nonterminal; assign probability
$p_{ijk}$ to the k-th production rule with left-hand side
$A_{ij}$. We formalize this in the following algorithm.
Algorithm 4.1
1 Associate with the j-th occurrence of nonterminal
$A_i$ in the right-hand sides of the production
rules a (new) unique nonterminal $A_{ij}$ (clearly
$j \in 1, \ldots, R(A_i)$). The set of nonterminals for
the rewritten grammar G' is denoted by $V'_N$ and
is the set of associated nonterminals plus the
start symbol S from the weakly restricted grammar
G.
2 This step is given in pseudo-Pascal:
for i := 1 to |V_N| do
  for j := 1 to R(A_i) do
    P' := P' ∪ C_{A_i}(j)
  od
od
where $C_{A_i}(j)$ is the set of productions $C_{A_i}$ with
left-hand sides $A_i$ replaced by $A_{ij}$ and the nonterminals
in the right-hand sides of the production
rules replaced by their associated nonterminals.
The probabilities to be assigned to the production
rules in $C_{A_i}(j)$ are deduced from the $p_{ij} =
(p_{ij1}, \ldots, p_{ij|C_{A_i}|})$: the k-th production rule in
$C_{A_i}(j)$ is assigned probability $p_{ijk}$.
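The two steps of Algorithm 4.1 can be sketched in Python as follows. The data layout is an assumption made for illustration: productions are `(lhs, rhs)` pairs, occurrences are numbered in the order in which they appear in the production list, the new nonterminal $A_{ij}$ is represented by the pair `(A, j)`, and the single start production is kept with probability 1. The function name `to_unrestricted` is hypothetical.

```python
from collections import defaultdict


def to_unrestricted(productions, start, probs):
    """Turn a weakly restricted grammar into an unrestricted one.

    productions: list of (lhs, rhs) with rhs a list of symbols.
    probs[(A, j)][k]: probability that the k-th production for A
    (0-based, in list order) rewrites the j-th occurrence of A.
    Returns a list of (lhs, rhs, probability) rules.
    """
    nonterminals = {lhs for lhs, _ in productions}

    # Step 1: give every right-hand-side occurrence its own nonterminal.
    count = defaultdict(int)
    renamed = []
    for lhs, rhs in productions:
        new_rhs = []
        for sym in rhs:
            if sym in nonterminals:
                count[sym] += 1
                new_rhs.append((sym, count[sym]))
            else:
                new_rhs.append(sym)
        renamed.append((lhs, tuple(new_rhs)))

    # The start production keeps its left-hand side and probability 1.
    rules = [(start, rhs, 1.0) for lhs, rhs in renamed if lhs == start]

    # Step 2: copy the productions for A once per occurrence (A, j).
    for a, occurrences in count.items():
        own = [rhs for lhs, rhs in renamed if lhs == a]
        for j in range(1, occurrences + 1):
            for k, rhs in enumerate(own):
                rules.append(((a, j), rhs, probs[(a, j)][k]))
    return rules


# Grammar of Example 3.1 with hypothetical values for p, q, r.
p, q, r = 0.5, 0.3, 0.4
productions = [("Z", ["S"]), ("S", ["S", "S"]), ("S", ["a"])]
probs = {("S", 1): (p, 1 - p), ("S", 2): (q, 1 - q), ("S", 3): (r, 1 - r)}
rules = to_unrestricted(productions, "Z", probs)
for rule in rules:
    print(rule)
```

For this grammar the result has seven rules: the start rule and two copies of the S-productions for each of the three occurrences, each copy carrying the probabilities of its own occurrence.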
Theorem 4.1 For every weakly restricted stochastic
grammar there is an unrestricted stochastic
grammar which is distributively equivalent.
Proof We can prove the theorem by proving that
the algorithm finds for every weakly restricted grammar
an unrestricted grammar that is distributively
equivalent. From the algorithm it immediately follows
that the languages (without the probabilities)
generated by the weakly restricted grammar and the
unrestricted grammar generated by the algorithm are
equal. The production rules introduced by the algorithm
in the unrestricted grammar cannot generate
any other strings than the strings generated by
the weakly restricted grammar. Also it can be seen
from the algorithm that the unrestricted grammar associates
the same probabilities with its strings as the
weakly restricted grammar. Hence, the theorem holds. □
A corollary of this theorem is that for each weakly
restricted grammar there exists an unrestricted grammar
that is stochastically equivalent.
The time complexity of the algorithm can easily be
found. We observe that, if we denote the number of
nonterminals in the weakly restricted grammar by k,
each step can be done in O(k) steps. Then the
total time complexity is O(k). We define the size of
a grammar to be the product of the number of nonterminals
and the number of productions. The size of
the newly created grammar can be found to be polynomial
in the size of the weakly restricted grammar.
5 Inference 
The inside-outside algorithm is originally a reestimation
procedure for the rule probabilities of an unrestricted
stochastic grammar in Chomsky Normal
Form (CNF) [4]. It takes as input an initial unrestricted
stochastic grammar G in CNF and a sample
set E of strings and it iteratively reestimates rule
probabilities to maximize the probability that the
grammar would produce the sample set.
The basic idea of the inside-outside algorithm is to
use the current rule probabilities to estimate from the
sample set the expected frequencies of certain derivation
steps, and then compute new rule probability
estimates as appropriate frequency ratios. Therefore,
each iteration of the algorithm starts by calculating
the inside and outside probabilities for all strings in
the sample set. These probabilities are in fact probability
functions which have as arguments a string
w from the sample set, indexes which indicate what
substring of w is to be considered, and an occurrence
of a nonterminal, say A. With these arguments, the
inside probability is the probability that the occurrence
of A derives the substring of w; the outside
probability is the probability that the occurrence of
nonterminal A appears in the intermediate string of
some derivation of string w.
In what follows, we will take $V_N$, $V_T$ as fixed, $n =
|V_N|$, $t = |V_T|$, and assume that $V_N = \{Z = A_0, S =
A_1, A_2, \ldots, A_n\}$ and $V_T = \{a_1, \ldots, a_t\}$. By definition
it is required that the grammar has one production for
start symbol Z: Z → S. Parallel to the definition of
generating functions for weakly restricted grammars,
we have to distinguish all nonterminal occurrences in
right-hand sides of productions; recall that the
probability of each production depends on the particular
nonterminal occurrence to be rewritten. The
inside and outside probabilities now have to be specified
for each nonterminal occurrence separately. As
already stated in the introduction, the inside-outside
algorithm is designed only for context-free grammars
in CNF. Using this fact we can simplify the way nonterminal
occurrences are indexed: $A_{q(p.r)}$ ($A_{r(pq.)}$) denotes
the occurrence of $A_q$ ($A_r$) in the production
$A_p \to A_q A_r$; for this production also the notation
(pqr) is used and for the production $A_p \to a_q$ the notation (pq).
Similarly the probability of occurrence $A_{q(p.r)}$ to be
rewritten using rule (qst) is denoted by $p_{q(p.r)(qst)}$.
For the start production a special provision has to be
made: the nonterminal occurrence in its right-hand
side is denoted by $A_{1(0.)}$. A stochastic grammar in
CNF over these sets can then be specified by
$$\sum_{i} R(A_i)\,|C_{A_i}|$$
probabilities. Since we require stochastic grammars
to be proper, we know that for p, q, r = 1, ..., n,
$$\sum_{s,t} p_{q(p.r)(qst)} + \sum_{s} p_{q(p.r)(qs)} = 1$$
If we want to use the inside-outside algorithm for
grammar inference, then the grammar probabilities
have to meet the above condition in order for the reestimation
to make sense.
If string $w = w_1 w_2 \ldots w_{|w|}$, then ${}_i w_j$, $0 \le i <
j \le |w|$, denotes the substring $w_{i+1} \ldots w_j$. The inside
probability $I^w_{p(q.r)}(i, j)$ estimates the likelihood
that occurrence $A_{p(q.r)}$ derives ${}_i w_j$, while the outside
probability $O^w_{p(q.r)}(i, j)$ estimates the likelihood of deriving
${}_0 w_i\, A_{p(q.r)}\, {}_j w_{|w|}$ from the start symbol S. The
inside probability for string w and nonterminal occurrence
$A_{p(q.r)}$ is defined by the recurrent relation
$$I^w_{p(q.r)}(i-1, i) = p_{p(q.r)(ps)}, \quad \text{where } a_s = w_i$$
$$I^w_{p(q.r)}(i, k) = \sum_{s,t} \sum_{i<j<k} p_{p(q.r)(pst)}\, I^w_{s(p.t)}(i, j)\, I^w_{t(ps.)}(j, k)$$
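The inside recurrence can be sketched with a memoised recursion. The encoding below is an assumption for illustration, not the paper's notation: an occurrence is a tuple `(nonterminal, parent_rule, position)`, `binary[occ]` maps a pair of child nonterminals to the probability of the corresponding binary rule at that occurrence, and `unary[occ]` maps a terminal to the probability of the corresponding unary rule. The grammar is that of Example 3.1 with hypothetical values for p, q, r.

```python
from functools import lru_cache


def string_probability(w, binary, unary, start_occ):
    """Inside probability of the start occurrence over the whole of w."""

    @lru_cache(maxsize=None)
    def inside(occ, i, k):
        # Base case: a span of length one is covered by a unary rule.
        if k - i == 1:
            return unary.get(occ, {}).get(w[i], 0.0)
        a = occ[0]
        total = 0.0
        for (b, c), prob in binary.get(occ, {}).items():
            # The two occurrences introduced by the rule a -> b c.
            left = (b, (a, b, c), 0)
            right = (c, (a, b, c), 1)
            for j in range(i + 1, k):
                total += prob * inside(left, i, j) * inside(right, j, k)
        return total

    return inside(start_occ, 0, len(w))


# Occurrences S1 (in Z -> S), S2 and S3 (in S -> S S); hypothetical values.
p, q, r = 0.5, 0.3, 0.4
S1 = ("S", ("Z", "S"), 0)
S2 = ("S", ("S", "S", "S"), 0)
S3 = ("S", ("S", "S", "S"), 1)
binary = {S1: {("S", "S"): p}, S2: {("S", "S"): q}, S3: {("S", "S"): r}}
unary = {S1: {"a": 1 - p}, S2: {"a": 1 - q}, S3: {"a": 1 - r}}

for w in ("a", "aa", "aaa"):
    print(w, string_probability(w, binary, unary, S1))
```

For "aaa" the two derivation trees contribute p(1 − q)(1 − r) · r(1 − q) and p(1 − q)(1 − r) · q(1 − r), which differ when q ≠ r, as remarked at the end of Section 3.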
Similarly, the outside probabilities for shorter spans
of w can be computed from the inside probabilities
and the outside probabilities for longer spans by the
following recurrence:
$$O^w_{p(q.r)}(0, |w|) = \begin{cases} 1 & \text{if } A_{p(q.r)} \text{ is the start occurrence } A_{1(0.)} \\ 0 & \text{otherwise} \end{cases}$$
$$O^w_{p(q.r)}(i, k) = \sum_{s,t} \sum_{j=k+1}^{|w|} O^w_{q(s.t)}(i, j)\, I^w_{r(qp.)}(k, j)\, p_{q(s.t)(qpr)}$$
The second equation above is somewhat simpler
than the corresponding one for unrestricted stochastic
grammars, because the occurrence $A_{p(q.r)}$ for
which the outside probability $O^w_{p(q.r)}(i, k)$ is computed
specifies the production used for creating it, and
consequently the probability of generating
${}_0 w_i\, A_{p(q.r)}\, {}_k w_{|w|}$ is the sum of far fewer possibilities.
Once the inside and outside probabilities are computed
for each string in the sample set E, the reestimated
probability of binary rules, $\hat{p}_{p(q.r)(pst)}$, and the
reestimated probability of unary rules, $\hat{p}_{p(q.r)(ps)}$, are
computed using the following reestimation formulae:
$$\hat{p}_{p(q.r)(pst)} = \frac{\displaystyle\sum_{w \in E} \frac{1}{P^w} \sum_{0 \le i < j < k \le |w|} p_{p(q.r)(pst)}\, I^w_{s(p.t)}(i, j)\, I^w_{t(ps.)}(j, k)\, O^w_{p(q.r)}(i, k)}{\displaystyle\sum_{w \in E} P^w_{p(q.r)} / P^w}$$
$$\hat{p}_{p(q.r)(ps)} = \frac{\displaystyle\sum_{w \in E} \frac{1}{P^w} \sum_{1 \le i \le |w|,\; w_i = a_s} p_{p(q.r)(ps)}\, O^w_{p(q.r)}(i-1, i)}{\displaystyle\sum_{w \in E} P^w_{p(q.r)} / P^w}$$
where $P^w$ is the probability assigned by the current
model to string w
$$P^w = I^w_{1(0.)}(0, |w|)$$
and $P^w_{p(q.r)}$ is the probability assigned by the current
model to the set of derivations involving some instance
of $A_{p(q.r)}$
$$P^w_{p(q.r)} = \sum_{0 \le i < j \le |w|} I^w_{p(q.r)}(i, j)\, O^w_{p(q.r)}(i, j)$$
The denominator of the estimates $\hat{p}_{p(q.r)(pst)}$ and
$\hat{p}_{p(q.r)(ps)}$ estimates the probability that a derivation
of a string $w \in E$ will involve at least one expansion
of the nonterminal occurrence $A_{p(q.r)}$. The numerator
of $\hat{p}_{p(q.r)(pst)}$ estimates the probability that a derivation
of a string $w \in E$ will involve rule $A_p \to A_s A_t$,
while the numerator of $\hat{p}_{p(q.r)(ps)}$ estimates the probability
that a derivation of a string $w \in E$ will rewrite
$A_p$ to $a_s$. Thus $\hat{p}_{p(q.r)(pst)}$ estimates the probability
that a rewrite of $A_{p(q.r)}$ in a string from E will use
rule $A_p \to A_s A_t$, and $\hat{p}_{p(q.r)(ps)}$ estimates the probability
that occurrence $A_{p(q.r)}$ in a string from E will
be rewritten to $a_s$. Clearly, these are the best current
estimates for the binary and unary rule probabilities.
The process is then repeated with the reestimated
probabilities until the increase in the estimated
probability of the sample set given the model becomes
negligible. We presented the inside, outside and (estimated)
production probabilities only for the nonterminal
occurrences of the form $A_{p(q.r)}$; for occurrences
$A_{p(qr.)}$ these can simply be found by adapting the
equations we have given for them.
The reestimation algorithm can be used both to
refine the current estimated probabilities of a stochastic
grammar and to infer a stochastic grammar from
scratch. The former application can be said to be
incremental. In the latter case, the initial weakly
restricted grammar for the inside-outside algorithm
consists of all possible CNF rules over the given sets
$V_N$ of nonterminals and $V_T$ of terminals, with suitable
nonzero probabilities assigned to the nonterminal
occurrences.
6 Conclusions 
In this paper we have investigated consistency of 
weakly restricted stochastic grammars and presented 
an adapted version of the inside-outside algorithm. 
Other issues concerning stochastic grammars and es- 
pecially weakly restricted grammars that are being in- 
vestigated at the moment are stochastic grammatical 
inference and parsing using weakly restricted gram- 
mars. By stochastic grammatical inference we mean 
grammatical inference whereby the production prob- 
abilities are computed simultaneously. Consistency 
of stochastic grammars and stochastic inference will 
be treated in full in the master's thesis of H.W.L. ter
Doest, which is to appear in 1994 \[5\]. 
Acknowledgement We are grateful to Jorma
Tarhio, presently at the University of Berkeley, California,
for stimulating discussions.
References 
[1] R. op den Akker. Stochastic Grammars: theory
and applications. University of Twente, Department
of Computer Science, Memoranda Informatica
93-19, 1993.
[2] T.L. Booth, R.A. Thompson. Applying Probability
Measures to Abstract Languages. In: IEEE
Transactions on Computers, Vol. C-22, No. 5, May
1973.
[3] T. Briscoe, J. Carroll. Generalized Probabilistic
LR Parsing of Natural Language (Corpora) with
Unification-Based Grammars. In: Computational
Linguistics, Vol. 19, No. 1.
[4] K. Lari and S.J. Young. Applications of Stochastic
Context-free Grammars Using the Inside-Outside
Algorithm. In: Computer Speech and Language,
Vol. 5, pp. 237-257, 1991.
[5] H.W.L. ter Doest. Stochastic Grammars: Consistency
and Inference. M.Sc. Thesis, University of
Twente, Enschede, The Netherlands, in preparation.
[6] T.E. Harris. The Theory of Branching Processes.
Springer-Verlag (Berlin and New York), 1963.
[7] F. Jelinek, J.D. Lafferty. Computation of the
Probability of Initial Substring Generation by
Stochastic Context-Free Grammars. In: Computational
Linguistics, Vol. 17, No. 3.
[8] A. Salomaa. Probabilistic and Weighted Grammars.
In: Information and Control, Vol. 15
(1969), pp. 529-544.
[9] C.S. Wetherell. Probabilistic Languages: A Review
and Some Open Questions. In: Computing
Surveys, Vol. 12, No. 4, pp. 361-379, December
1980.
