<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2150">
  <Title>Weakly Restricted Stochastic Grammars</Title>
  <Section position="4" start_page="929" end_page="930" type="metho">
    <SectionTitle>
2 Weakly Restricted Stochastic
Grammars
</SectionTitle>
    <Paragraph position="0"> To add context-sensitivity to the assignment of probabilities to the application of production rules, we take into account (and distinguish) the occurrences of the nonterminals. Then, for each nonterminal occurrence, distinct probabilities can be given for the production rules that can be used to rewrite the nonterminal.</Paragraph>
    <Paragraph position="2"> Footnote 2: Although we found in [7] by Jelinek and Lafferty the (false) statement that a stochastic grammar is consistent if and only if it is proper, given that the underlying grammar is reduced. The example gives a clear counterexample to their statement.</Paragraph>
    <Paragraph position="3"> This way of assigning probabilities to the application of production rules seems unknown in the literature, although we found some other formalisms that were designed to add context-sensitivity to the assignment of probabilities. For instance, the definition of stochastic grammars by Salomaa in [8] is somewhat different from the definition we gave in our introduction: there, the probability of a production being applied depends on the production that was last applied.</Paragraph>
    <Paragraph position="4"> To escape the bootstrap problem (when a derivation is started, there is no last applied production), an initial stochastic vector is added to the grammar.</Paragraph>
    <Paragraph position="5"> Weakly restricted stochastic grammars are introduced in [1]. In the following definition $C_{A_i}$ denotes the set of productions for $A_i$ and $R(A_i)$ denotes the number of right-hand side occurrences of nonterminal $A_i$.</Paragraph>
    <Paragraph position="6"> Definition 2.1 A weakly restricted stochastic grammar $G_w$ is a pair $(G_c, \Delta)$, where $G_c = (V_N, V_T, P, S)$ is a context-free grammar and $\Delta$ is a set of functions $\Delta = \{p_i \mid A_i \in V_N\}$ where, if $j \in 1 \ldots R(A_i)$ and $k \in 1 \ldots |C_{A_i}|$, $p_i(j,k) = p_{ijk} \in [0, 1]$. The set of productions $P$ contains exactly one production for start symbol $S$.</Paragraph>
    <Paragraph position="7"> In words, $p_{ijk}$ stands for the probability that the k-th production with left-hand side $A_i$ is used for rewriting the j-th right-hand side occurrence of nonterminal $A_i$. The usefulness of this context-dependency can be seen immediately from the following unrestricted stochastic grammar, which is taken (in part) from the example grammar in [3] (p. 29):</Paragraph>
    <Paragraph position="9"> Unrestricted stochastic grammars cannot model context-dependent use of productions. For example, an NP is more likely to be expanded as a pronoun in subject position than elsewhere. Exactly this dependence on where a nonterminal was introduced can be modeled by using a weakly restricted stochastic grammar.</Paragraph>
    <Paragraph position="10"> Since in a weakly restricted stochastic grammar the probabilities of applying a production depend on the particular occurrence of a nonterminal in the right-hand side of a production, it is useful to require that there is only one start production.</Paragraph>
    <Paragraph position="11"> The characteristic grammar of a weakly restricted grammar is the underlying context-free grammar.</Paragraph>
    <Paragraph position="12"> The next step is to compute probabilities for strings with respect to weakly restricted stochastic grammars. For this purpose a tree is written in terms of its subtrees (trees with a nonterminal as root) as $q[t_{i_1 j_1}, t_{i_2 j_2}, \ldots, t_{i_{n(q)} j_{n(q)}}]$, in which $q$ is a production, $n(q)$ is the number of nonterminals in the right-hand side of $q$ and $t_{ij}$ denotes a (sub)tree with the j-th occurrence of nonterminal $A_i$ at its root. A tree for which $n(q) = 0$ is written as [].</Paragraph>
    <Paragraph position="13"> Definition 2.2 The probability of a derivation tree $t$ with respect to a weakly restricted stochastic grammar is defined recursively as</Paragraph>
    <Paragraph position="15"> where $1 \le k_m \le |C_{A_{i_m}}|$.</Paragraph>
    <Paragraph position="16"> The probability of a string is defined as the sum of the probabilities of all distinct derivation trees that yield this string.</Paragraph>
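    <Paragraph> A minimal sketch in Python of the recursive tree probability and the string probability described above. The encoding is an assumption made for illustration and is not taken from the paper: the grammar Z -> S, S -> S S, S -> a is used, its three right-hand side occurrences of S are named S1 (in Z -> S), S2 and S3 (left and right in S -> S S), and the helper names example_probs, tree_probability, leaf, branch and string_probability are hypothetical. Trees are rooted at S1, since the single start production is applied with probability 1.

```python
from math import prod


def example_probs(p, q, r):
    # Hypothetical encoding: one probability vector per occurrence.
    # Index 0 is the production S -> S S, index 1 is S -> a.
    return {"S1": (p, 1 - p), "S2": (q, 1 - q), "S3": (r, 1 - r)}


def tree_probability(tree, probs):
    """Probability of a derivation tree, computed recursively.

    A tree is a triple (occurrence, k, subtrees): the occurrence at the
    root is rewritten with its k-th production, and every subtree is
    rooted at one nonterminal occurrence of that production's
    right-hand side.
    """
    occurrence, k, subtrees = tree
    return probs[occurrence][k] * prod(
        tree_probability(sub, probs) for sub in subtrees
    )


def leaf(occurrence):
    # The occurrence is rewritten with S -> a (no nonterminals remain).
    return (occurrence, 1, [])


def branch(occurrence, left, right):
    # The occurrence is rewritten with S -> S S; the children are
    # rooted at the occurrences S2 (left) and S3 (right).
    return (occurrence, 0, [left, right])


def string_probability(trees, probs):
    # Sum over all distinct derivation trees that yield the string.
    return sum(tree_probability(t, probs) for t in trees)


# The two derivation trees of the string "aaa".
left_heavy = branch("S1", branch("S2", leaf("S2"), leaf("S3")), leaf("S3"))
right_heavy = branch("S1", leaf("S2"), branch("S3", leaf("S2"), leaf("S3")))
```

    With q and r different, the two trees of "aaa" receive different probabilities, which an unrestricted stochastic grammar over the same productions could not express.</Paragraph>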
    <Paragraph position="17"> Definition 2.3 The probability of a string $x \in L(G_c)$ is defined as $p(x) = \sum_{t} p(t)$, where the sum ranges over all distinct derivation trees $t$ that yield $x$. The distribution language $DL(G_w)$ and stochastic language $SL(G_w)$ of a weakly restricted grammar $(G_c, \Delta)$ are defined analogously to the distribution language and stochastic language of an unrestricted grammar.</Paragraph>
  </Section>
  <Section position="5" start_page="930" end_page="931" type="metho">
    <SectionTitle>
3 Consistency
</SectionTitle>
    <Paragraph position="0"> In this section consistency of weakly restricted stochastic grammars will be considered. The theory of multitype branching processes will be used to arrive at a theorem similar to the one given in [2] for unrestricted stochastic grammars.</Paragraph>
    <Paragraph position="1"> Definition 3.1 For the j-th occurrence of nonterminal $A_i \in V_N$ the production generating function for weakly restricted stochastic grammars is defined as:</Paragraph>
    <Paragraph position="3"> $$g_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)}) = \sum_{h=1}^{|C_{A_i}|} p_{ijh} \prod_{m=1}^{|V_N|} \prod_{n=1}^{R(A_m)} s_{mn}^{r_{mn}(h)}$$ where $k = |V_N|$ and $r_{mn}(h)$ is 1 if nonterminal occurrence $A_{mn}$ appears in the right-hand side of the h-th production rule with nonterminal $A_i$ as left-hand side and 0 otherwise. Note that for each right-hand side nonterminal occurrence a dummy variable is introduced: $s_{ij}$ corresponds to the j-th occurrence of nonterminal $A_i$. A special variable is $s_{1,1}$: it corresponds to the start symbol, which is the right-hand side of the start production $s \in P$ of the form $Z \to S$. The generating function for nonterminal occurrence $A_{ij}$ entails a term for each production for $A_i$. If $g_{ij}$ has a term of the form $\alpha s_{i_1} s_{i_2} \ldots s_{i_n}$ then we know that it corresponds to a production for $A_i$ of the form $A_i \to x_{i_1} A_{i_1} x_{i_2} A_{i_2} \ldots x_{i_n} A_{i_n} x_{i_{n+1}}$ where the $x_{i_j} \in V_T^*$. The production has, if it is used for rewriting occurrence $A_{ij}$, probability $\alpha$ of being applied. In Example 3.1 it will be illustrated how the terms of the generating functions correspond to the productions of the grammar.</Paragraph>
    <Paragraph position="4"> Theorem 3.1 Let $A_{ij} \Rightarrow \alpha$; thus the j-th occurrence of nonterminal $A_i$ is rewritten using exactly one production. The probability that $\alpha$ contains the n-th occurrence of nonterminal $A_m$ is given by $$e_{ijmn} = \left.\frac{\partial g_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)})}{\partial s_{mn}}\right|_{s_{1,1} = \cdots = s_{k,R(A_k)} = 1}$$ Proof In general the generating function can be written as</Paragraph>
    <Paragraph position="6"> $$g_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)}) = g'_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)}) + c_{ij}$$ where $g'_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)})$ only contains terms dependent on $s_{1,1}, \ldots, s_{k,R(A_k)}$ and where $c_{ij}$ is a constant term. The terms dependent on $s_{1,1}, \ldots, s_{k,R(A_k)}$ come from productions for $A_i$ that contain nonterminals in their right-hand sides</Paragraph>
    <Paragraph position="7"> and the constant terms from productions for $A_i$ that only contain terminals in their right-hand sides. When partial derivatives are taken of $g_{ij}$ we can just as well consider $g'_{ij}$, since the constant term will become zero. We know that the terms in $g'_{ij}$ do not contain any powers higher than 1 of the variables in it. This leads us to the insight that taking the mn-th partial derivative of $g'_{ij}$ results in at most one term of the form $p_{ijh} f(s_{1,1}, \ldots, s_{k,R(A_k)})$ where $f$ does not depend on $s_{mn}$ and $p_{ijh}$ is one of the probabilities resulting from applying $p_i$ to $j$ and some $h$ in $1 \ldots |C_{A_i}|$. If we substitute 1 for all remaining variables in the partial derivative we find as value for $e_{ijmn}$ the probability that the j-th occurrence of nonterminal $A_i$ is rewritten by the production that contains in its right-hand side nonterminal occurrence $A_{mn}$. $\Box$ The first-moment matrix for weakly restricted grammars is defined just like the first-moment matrix for unrestricted grammars: Definition 3.2 The first-moment matrix $E$ associated with the weakly restricted grammar $G$ is $E = [e_{ijmn}]$. We order the set of eigenvalues of the first-moment matrix from the largest one to the smallest, such that $\rho_1$ represents the maximum.</Paragraph>
    <Paragraph position="8"> Theorem 3.2 A proper weakly restricted grammar is consistent if $\rho_1 &lt; 1$ and is not consistent if $\rho_1 &gt; 1$. The proof of this theorem is analogous to the proof of the related theorem in [2] and we will not treat it here (see [5] for a proof).</Paragraph>
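    <Paragraph position="9"> Example 3.1 Consider the weakly restricted stochastic grammar $(G_c, \Delta)$ where $G_c = (V_N, V_T, P, Z) = (\{Z, S\}, \{a\}, P, Z)$ and $P$ and $\Delta$ are as follows: $Z \to S$ with vector $(p, 1 - p)$; $S \to S\,S$ with vectors $(q, 1 - q)$ and $(r, 1 - r)$; $S \to a$. Each vector belongs to one right-hand side occurrence of $S$ and gives the probabilities of rewriting that occurrence with $S \to S\,S$ and with $S \to a$, respectively. For a reason that will become clear at the end of the example, we assume that $p \neq 0$. The production generating functions are given by</Paragraph>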
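    <Paragraph> As a hedged illustration of Definition 3.1, assume that $S = A_1$, that its occurrences are numbered 1 (in $Z \to S$), 2 and 3 (left and right in $S \to S\,S$), and that the first component of each probability vector belongs to $S \to S\,S$. Under these assumptions, which are a reading of the example and not a quotation from the paper, the generating functions and first moments would be:

```latex
\[ g_{1,1} = p\, s_{1,2}\, s_{1,3} + (1 - p) \]
\[ g_{1,2} = q\, s_{1,2}\, s_{1,3} + (1 - q) \]
\[ g_{1,3} = r\, s_{1,2}\, s_{1,3} + (1 - r) \]
% First moments: differentiate and then set every variable to 1.
\[ \frac{\partial g_{1,1}}{\partial s_{1,2}} = \frac{\partial g_{1,1}}{\partial s_{1,3}} = p, \quad
   \frac{\partial g_{1,2}}{\partial s_{1,2}} = \frac{\partial g_{1,2}}{\partial s_{1,3}} = q, \quad
   \frac{\partial g_{1,3}}{\partial s_{1,2}} = \frac{\partial g_{1,3}}{\partial s_{1,3}} = r \]
\[ \frac{\partial g_{1,j}}{\partial s_{1,1}} = 0 \quad (j = 1, 2, 3) \]
```

    With the occurrences ordered $S_1, S_2, S_3$, the rows of the first-moment matrix would then be $(0, p, p)$, $(0, q, q)$ and $(0, r, r)$, whose characteristic polynomial is $x^2(x - (q + r))$.</Paragraph>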
    <Paragraph position="11"> The characteristic equation is given by $\phi(x) = x((x - q)(x - r) - qr) = x^2(x - (q + r)) = 0$. Thus, the eigenvalues of the matrix are 0 and $x = q + r$. According to Theorem 3.2 the grammar is consistent if $q + r &lt; 1$ and inconsistent if $q + r &gt; 1$. If $q + r = 1$ the theorem does not decide the consistency of the grammar.</Paragraph>
    <Paragraph position="13"> From the characteristic equation it follows that the value of $p$ does not influence the consistency of the grammar. However, looking at the grammar we find that it is consistent if $p = 0$, regardless of probabilities $q$ and $r$. Therefore, before Theorem 3.2 can be used for checking the consistency of the grammar, the grammar must be stripped of productions having for each nonterminal occurrence probability zero of being applied. $\Box$ Definition 3.3 A final class $C$ of nonterminal occurrences is a subset of the set of all nonterminal occurrences having the property that any occurrence in $C$ has probability 1 of producing, when rewritten using one production rule, exactly one occurrence also in $C$.</Paragraph>
    <Paragraph position="14"> Theorem 3.3 A weakly restricted stochastic grammar is consistent if and only if $\rho_1 \le 1$ and there are no final classes.</Paragraph>
    <Paragraph position="15"> For the proof of Theorem 3.3 we refer to [5]. Applying this theorem to the example tells us that if $q + r = 1$, the grammar is consistent if and only if there is no final class of nonterminal occurrences. Looking at the grammar we see that there is a final class of occurrences if $q = 1$ or $r = 1$ (or both); the final classes then are $\{S_2\}$, $\{S_3\}$ and $\{S_2, S_3\}$, respectively; if in addition $p = 1$, then the final classes are $\{S_1, S_2\}$, $\{S_1, S_3\}$ and $\{S_1, S_2, S_3\}$, respectively. Hence, the grammar is consistent if and only if $q + r \le 1 \wedge q \neq 1 \wedge r \neq 1$. Notice that if $q \neq r$ then all trees of a ~ have different probabilities.</Paragraph>
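    <Paragraph> The test of Theorem 3.2 on this example can be sketched in Python as follows. The first-moment matrix used here is an assumption: it is reconstructed from the grammar with the occurrences ordered S1, S2, S3, and it agrees with the characteristic equation quoted in the text. The function names are hypothetical, and the power iteration is a generic numerical method, not a procedure from the paper. As the example itself points out, the result is only meaningful after productions that can never be applied have been stripped (for instance when p = 0).

```python
def first_moment_matrix(p, q, r):
    # Rows and columns are ordered S1, S2, S3. Entry [a][b] is the
    # expected number of copies of occurrence b produced when occurrence
    # a is rewritten once; only S -> S S creates new occurrences
    # (one S2 and one S3).
    return [[0.0, p, p],
            [0.0, q, q],
            [0.0, r, r]]


def spectral_radius(matrix, iterations=200):
    # Plain power iteration with max-norm scaling; adequate for this
    # small nonnegative matrix.
    vector = [1.0] * len(matrix)
    radius = 0.0
    for _ in range(iterations):
        image = [sum(a * v for a, v in zip(row, vector)) for row in matrix]
        radius = max(image)
        if radius == 0.0:
            return 0.0
        vector = [x / radius for x in image]
    return radius


def consistency_by_theorem_3_2(p, q, r, tolerance=1e-9):
    # Theorem 3.2 decides only the cases where the largest eigenvalue
    # differs from 1; the boundary case needs the final-class test.
    rho = spectral_radius(first_moment_matrix(p, q, r))
    if 1.0 - rho > tolerance:
        return "consistent"
    if rho - 1.0 > tolerance:
        return "inconsistent"
    return "undecided"
```

    For this matrix the computed radius equals q + r up to rounding, in line with the eigenvalues derived above.</Paragraph>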
  </Section>
  <Section position="6" start_page="931" end_page="932" type="metho">
    <SectionTitle>
4 Equivalence
</SectionTitle>
    <Paragraph position="0"> In this section we will show that a weakly restricted stochastic grammar can be transformed into an equivalent unrestricted grammar. We define two grammars $G$ and $H$ to be equivalent if $DL(G) = DL(H)$.</Paragraph>
    <Paragraph position="1"> The transformation is performed as follows. With each nonterminal occurrence $A_{ij}$ in the right-hand side of a production rule associate a new unique nonterminal $A_{ij}$; for each new nonterminal $A_{ij}$ copy the set of production rules with nonterminal $A_i$ as left-hand side, replace the left-hand sides with $A_{ij}$ and replace in the right-hand sides each nonterminal with its new (associated) nonterminal; assign probability $p_{ijk}$ to the k-th production rule with left-hand side $A_{ij}$. We formalize this in the following algorithm.</Paragraph>
    <Paragraph position="2"> Algorithm 4.1 1 Associate with the j-th occurrence of nonterminal $A_i$ in the right-hand sides of the production rules a (new) unique nonterminal $A_{ij}$ (clearly $j \in 1, \ldots, R(A_i)$). The set of nonterminals for the rewritten grammar $G'$ is denoted by $V'_N$ and is the set of associated nonterminals plus the start symbol $S$ from the weakly restricted grammar $G$.</Paragraph>
    <Paragraph position="3"> 2 This step is given in pseudo-Pascal: for i := 1 to |V_N| do</Paragraph>
    <Paragraph position="5"> where $C_{A_i}(j)$ is the set of productions $C_{A_i}$ with left-hand sides $A_i$ replaced by $A_{ij}$ and the nonterminals in the right-hand sides of the production rules replaced by their associated nonterminals.</Paragraph>
    <Paragraph position="6"> The probabilities to be assigned to the production rules in $C_{A_i}(j)$ are deduced from the $p_{ij} = (p_{ij1}, \ldots, p_{ij|C_{A_i}|})$: the k-th production rule in $C_{A_i}(j)$ is assigned probability $p_{ijk}$.</Paragraph>
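    <Paragraph> The transformation can be sketched in Python as follows. This is an illustration under stated assumptions and not the paper's own code: the dictionary-based grammar encoding, the function name to_unrestricted, the naming scheme "A_j" for the fresh nonterminals (assumed not to clash with existing symbols) and the numbering of occurrences in order of appearance are all hypothetical choices.

```python
def to_unrestricted(productions, probs, start):
    """Replace occurrences by fresh nonterminals and copy productions.

    productions: dict mapping each nonterminal to a list of right-hand
        sides (lists of symbols); symbols that are keys are nonterminals.
    probs: dict mapping (nonterminal, j) to the probability vector of
        the j-th right-hand side occurrence of that nonterminal
        (j counts from 1).
    start: start symbol; its single production gets probability 1.
    Returns a dict mapping each new left-hand side to a list of
    (right-hand side, probability) pairs.
    """
    # Step 1: number the occurrences and rename each one.
    counter = {a: 0 for a in productions}
    renamed = {}
    for lhs, alternatives in productions.items():
        renamed[lhs] = []
        for rhs in alternatives:
            new_rhs = []
            for symbol in rhs:
                if symbol in productions:
                    counter[symbol] += 1
                    symbol = f"{symbol}_{counter[symbol]}"
                new_rhs.append(symbol)
            renamed[lhs].append(new_rhs)
    # Step 2: copy the renamed productions of A_i once per occurrence
    # A_ij, and attach the probabilities p_ijk to the k-th copy.
    result = {start: [(renamed[start][0], 1.0)]}
    for a, count in counter.items():
        for j in range(1, count + 1):
            result[f"{a}_{j}"] = list(zip(renamed[a], probs[(a, j)]))
    return result
```

    Applied to the grammar Z -> S, S -> S S, S -> a, the sketch yields the nonterminals Z, S_1, S_2 and S_3, each S_j carrying its own copy of the two S productions with the probabilities of the j-th occurrence.</Paragraph>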
    <Paragraph position="7"> Theorem 4.1 For every weakly restricted stochastic grammar there is an unrestricted stochastic grammar which is distributively equivalent.</Paragraph>
    <Paragraph position="8"> Proof We can prove the theorem by proving that the algorithm finds for every weakly restricted grammar an unrestricted grammar that is distributively equivalent. From the algorithm it immediately follows that the languages (without the probabilities) generated by the weakly restricted grammar and the unrestricted grammar generated by the algorithm are equal. The production rules introduced by the algorithm in the unrestricted grammar cannot generate any strings other than the strings generated by the weakly restricted grammar. Also it can be seen from the algorithm that the unrestricted grammar associates the same probabilities with its strings as the weakly restricted grammar. Hence, the theorem holds. $\Box$ A corollary of this theorem is that for each weakly restricted grammar there exists an unrestricted grammar that is stochastically equivalent.</Paragraph>
    <Paragraph position="9"> The time complexity of the algorithm can easily be found. We observe that, if we denote the number of nonterminals in the weakly restricted grammar by $k$, each step can be done in $O(k)$ steps. Then the total time complexity is $O(k)$. We define the size of a grammar to be the product of the number of nonterminals and the number of productions. The size of the newly created grammar can be found to be polynomial in the size of the weakly restricted grammar.</Paragraph>
  </Section>
  <Section position="7" start_page="932" end_page="933" type="metho">
    <SectionTitle>
5 Inference
</SectionTitle>
    <Paragraph position="0"> The inside-outside algorithm is originally a reestimation procedure for the rule probabilities of an unrestricted stochastic grammar in Chomsky Normal Form (CNF) [4]. It takes as input an initial unrestricted stochastic grammar $G$ in CNF and a sample set $E$ of strings, and it iteratively reestimates rule probabilities to maximize the probability that the grammar would produce the sample set.</Paragraph>
    <Paragraph position="1"> The basic idea of the inside-outside algorithm is to use the current rule probabilities to estimate from the sample set the expected frequencies of certain derivation steps, and then compute new rule probability estimates as appropriate frequency rates. Therefore, each iteration of the algorithm starts by calculating the inside and outside probabilities for all strings in the sample set. These probabilities are in fact probability functions which have as arguments a string $w$ from the sample set, indices which indicate what substring of $w$ is to be considered, and an occurrence of a nonterminal, say $A$. With these arguments, the inside probability is the probability that the occurrence of $A$ derives the substring of $w$; the outside probability is the probability that the occurrence of nonterminal $A$ appears in the intermediate string of some derivation of string $w$.</Paragraph>
    <Paragraph position="2"> In what follows, we will take $V_N$, $V_T$ as fixed, $n = |V_N|$, $t = |V_T|$, and assume that $V_N = \{Z = A_0, S = A_1, A_2, \ldots, A_n\}$ and $V_T = \{a_1, \ldots, a_t\}$. By definition it is required that the grammar has one production for start symbol $Z$: $Z \to S$. Parallel to the definition of generating functions for weakly restricted grammars, we have to distinguish all nonterminal occurrences in right-hand sides of productions; recall that the probability of each production depends on the particular nonterminal occurrence to be rewritten. The inside and outside probabilities now have to be specified for each nonterminal occurrence separately. As already stated in the introduction, the inside-outside algorithm is designed only for context-free grammars in CNF. Using this fact we can simplify the way nonterminal occurrences are indexed: $A_{q(p.r)}$ ($A_{r(pq.)}$) denotes the occurrence of $A_q$ ($A_r$) in the production $A_p \to A_q A_r$; for this production also the notation $(pqr)$ is used, and for the production $A_p \to a_q$ the notation $(pq)$.</Paragraph>
    <Paragraph position="3"> Similarly, the probability of occurrence $A_{q(p.r)}$ to be rewritten using rule $(qst)$ is denoted by $p_{q(p.r)(qst)}$.</Paragraph>
    <Paragraph position="4"> For the start production a special provision has to be made: the nonterminal occurrence in its right-hand side is denoted by $A_{1(0.)}$. A stochastic grammar in CNF over these sets can then be specified by its rule probabilities, one for each pair of a nonterminal occurrence and a production that can rewrite it. Since we require stochastic grammars to be proper, we know that for $p, q, r = 1, \ldots, n$, $$\sum_{s,t} p_{q(p.r)(qst)} + \sum_{s} p_{q(p.r)(qs)} = 1$$ If we want to use the inside-outside algorithm for grammar inference, then the grammar probabilities have to meet the above condition in order for the reestimation to make sense.</Paragraph>
    <Paragraph position="5"> If string $w = w_1 w_2 \ldots w_{|w|}$, then ${}_i w_j$, $0 \le i &lt; j \le |w|$, denotes the substring $w_{i+1} \ldots w_j$. The inside probability $I^w_{p(q.r)}(i,j)$ estimates the likelihood that occurrence $A_{p(q.r)}$ derives ${}_i w_j$, while the outside probability $O^w_{p(q.r)}(i,j)$ estimates the likelihood of deriving ${}_0 w_i\, A_{p(q.r)}\, {}_j w_{|w|}$ from the start symbol $S$. The inside probability for string $w$ and nonterminal occurrence $A_{p(q.r)}$ is defined by the recurrent relation</Paragraph>
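    <Paragraph> A minimal sketch in Python of an inside computation per nonterminal occurrence, in the spirit of the recurrence introduced above. It is an assumption-laden illustration and not the paper's formula: occurrences are plain strings, the dictionaries binary and unary and the function names are hypothetical, and the example reuses the grammar Z -> S, S -> S S, S -> a with occurrences S1, S2 and S3. The key point it shows is that the probability of a rule depends on the occurrence being rewritten, while the children of a binary rule are always the two occurrences in that rule's right-hand side.

```python
from functools import lru_cache


def example_grammar(p, q, r):
    # Hypothetical encoding of Z -> S, S -> S S, S -> a in CNF style.
    # Every occurrence of S may use S -> S S (children S2 and S3) or
    # S -> a, each with its own probability.
    binary = {"S1": [(p, "S2", "S3")],
              "S2": [(q, "S2", "S3")],
              "S3": [(r, "S2", "S3")]}
    unary = {"S1": {"a": 1 - p}, "S2": {"a": 1 - q}, "S3": {"a": 1 - r}}
    return binary, unary


def inside_probability(word, occurrence, binary, unary):
    """Probability that `occurrence` derives the whole of `word`.

    binary: dict mapping an occurrence to a list of
        (probability, left child occurrence, right child occurrence).
    unary: dict mapping an occurrence to a dict terminal -> probability.
    """
    @lru_cache(maxsize=None)
    def inside(occ, i, j):
        # The span covers word[i:j], i.e. the symbols w_{i+1} ... w_j.
        if j - i == 1:
            return unary.get(occ, {}).get(word[i], 0.0)
        total = 0.0
        for prob, left, right in binary.get(occ, []):
            # Try every split point k strictly between i and j.
            for k in range(i + 1, j):
                total += prob * inside(left, i, k) * inside(right, k, j)
        return total

    return inside(occurrence, 0, len(word))
```

    Calling inside_probability("aaa", "S1", *example_grammar(p, q, r)) sums the probabilities of the two derivation trees of "aaa".</Paragraph>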
    <Paragraph position="7"> Similarly, the outside probabilities for shorter spans of $w$ can be computed from the inside probabilities and the outside probabilities for longer spans by the following recurrence:</Paragraph>
    <Paragraph position="9"> The second equation above is somewhat simpler than the corresponding one for unrestricted stochastic grammars, because the occurrence $A_{p(q.r)}$ for which the outside probability $O^w_{p(q.r)}(i,k)$ is computed specifies the production used for creating it, and consequently the probability of generating ${}_0 w_i\, A_{p(q.r)}\, {}_j w_{|w|}$ is the sum of far fewer possibilities. Once the inside and outside probabilities are computed for each string in the sample set $E$, the reestimated probability of binary rules, $\hat{p}_{p(q.r)(pst)}$, and the reestimated probability of unary rules, $\hat{p}_{p(q.r)(ps)}$, are computed using the following reestimation formulae:</Paragraph>
    <Paragraph position="11"> where $P^w$ is the probability assigned by the current model to string $w$</Paragraph>
    <Paragraph position="13"> and $P^w_p$ is the probability assigned by the current model to the set of derivations involving some instance of $A_p$</Paragraph>
    <Paragraph position="15"> The denominator of the estimates $\hat{p}_{p(q.r)(pst)}$ and $\hat{p}_{p(q.r)(ps)}$ estimates the probability that a derivation of a string $w \in E$ will involve at least one expansion of the nonterminal occurrence $A_{p(q.r)}$. The numerator of $\hat{p}_{p(q.r)(pst)}$ estimates the probability that a derivation of a string $w \in E$ will involve rule $A_p \to A_s A_t$, while the numerator of $\hat{p}_{p(q.r)(ps)}$ estimates the probability that a derivation of a string $w \in E$ will rewrite $A_p$ to $a_s$. Thus $\hat{p}_{p(q.r)(pst)}$ estimates the probability that a rewrite of $A_{p(q.r)}$ in a string from $E$ will use rule $A_p \to A_s A_t$, and $\hat{p}_{p(q.r)(ps)}$ estimates the probability that occurrence $A_{p(q.r)}$ in a string from $E$ will be rewritten to $a_s$. Clearly, these are the best current estimates for the binary and unary rule probabilities. The process is then repeated with the reestimated probabilities until the increase in the estimated probability of the sample set given the model becomes negligible. We presented the inside, outside and (estimated) production probabilities only for the nonterminal occurrences of the form $A_{p(q.r)}$; for occurrences $A_{p(qr.)}$ these can simply be found by adapting the equations we have given for them.</Paragraph>
    <Paragraph position="16"> The reestimation algorithm can be used both to refine the current estimated probabilities of a stochastic grammar and to infer a stochastic grammar from scratch. The former application can be said to be incremental. In the latter case, the initial weakly restricted grammar for the inside-outside algorithm consists of all possible CNF rules over the given sets $V_N$ of nonterminals and $V_T$ of terminals, with suitable nonzero probabilities assigned to the nonterminal occurrences.</Paragraph>
  </Section>
</Paper>