<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2066"> <Title>Stochastic Lexicalized Tree-Adjoining Grammars *</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Motivations </SectionTitle> <Paragraph position="0"> Although stochastic techniques applied to syntax modeling have recently regained popularity, current language models suffer from obvious inherent inadequacies.</Paragraph> <Paragraph position="1"> Early proposals such as Markov Models, N-gram models (Pratt, 1942; Shannon, 1948; Shannon, 1951) and Hidden Markov Models were very quickly shown to be linguistically inappropriate for natural language (e.g. Chomsky (1964, pages 13-18)) since they are unable to capture long distance dependencies or to describe hierarchically the syntax of natural languages. Stochastic context-free grammar (Booth, 1969) is a hierarchical model more appropriate for natural languages; however, none of the proposals based on it (Lari and Young, 1990; Jelinek, Lafferty, and Mercer, 1990) performs as well as the simpler Markov Models because of the difficulty of capturing lexical information. The parameters of a stochastic context-free grammar do not correspond directly to a distribution over words, since distributional phenomena over words that are embodied by the application of more than one context-free rule cannot be captured under the context-freeness assumption. This leads to the difficulty of maintaining a standard hierarchical model while capturing lexical dependencies.</Paragraph> <Paragraph position="2"> *This work was partially supported by DARPA Grant N001490-31863, ARO Grant DAAL03-89-C-0031 and NSF Grant IRI90-16592. We thank Aravind Joshi for suggesting the use of TAGs for statistical analysis during a private discussion that followed a presentation by Fred Jelinek during the June 1990 meeting of the DARPA Speech and Natural Language Workshop. We are also grateful to Peter Brown, Fred Jelinek, Mark Liberman, Mitch Marcus, Robert Mercer, Fernando Pereira and Stuart Shieber for providing valuable comments.</Paragraph> <Paragraph position="4"> This fact prompted researchers in natural language processing to give up hierarchical language models in favor of non-hierarchical statistical models over words (such as word N-gram models). Probably for lack of a better language model, it has also been argued that the phenomena that such devices cannot capture occur relatively infrequently. Such argumentation is linguistically unsound.</Paragraph> <Paragraph position="5"> Lexicalized tree-adjoining grammars (LTAG)1 combine hierarchical structure with lexical sensitivity and are therefore more appropriate for statistical analysis of language. In fact, LTAGs are the simplest hierarchical formalism which can serve as the basis for lexicalizing context-free grammar (Schabes, 1990; Joshi and Schabes, 1991).</Paragraph> <Paragraph position="6"> LTAG is a tree-rewriting system that combines trees of large domain with adjoining and substitution. The trees found in a TAG take advantage of the available extended domain of locality by localizing syntactic dependencies (such as filler-gap, subject-verb, verb-object) and most semantic dependencies (such as the predicate-argument relationship).
For example, the following trees can be found in a LTAG lexicon:</Paragraph> <Paragraph position="8"> [Tree diagrams omitted: elementary trees anchored by 'eats', 'John', 'peanuts' and 'hungrily'.] Since the elementary trees of a LTAG are minimal syntactic and semantic units, distributional analysis of the combination of these elementary trees based on a training corpus will inform us about relevant statistical aspects of the language, such as the classes of words appearing as arguments of a predicative element, the distribution of the adverbs licensed by a specific verb, or the adjectives licensed by a specific noun.</Paragraph> <Paragraph position="9"> This kind of statistical analysis, as independently suggested in (Resnik, 1991), can be made with LTAGs not only because of their extended domain of locality but also because of their lexicalized property.</Paragraph> <Paragraph position="10"> 1We assume familiarity throughout the paper with TAGs and their lexicalized variant. See, for instance, (Joshi, 1987), (Schabes, Abeillé, and Joshi, 1988), (Schabes, 1990) or (Joshi and Schabes, 1991).</Paragraph> <Paragraph position="11"> In this paper, this intuition is made formally precise by defining the notion of a stochastic lexicalized tree-adjoining grammar (SLTAG). We present an algorithm for computing the probability of a sentence generated by a SLTAG, and finally we introduce an iterative algorithm for estimating the parameters of a SLTAG given a training corpus of text. This algorithm can either be used for refining the parameters of a SLTAG or for inferring a tree-adjoining grammar from a training corpus. We also report preliminary experiments with this algorithm.</Paragraph> <Paragraph position="12"> Due to the lack of space, the algorithms are described succinctly in this paper without proofs of correctness, and more attention is given to the concepts and techniques used for SLTAG.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 SLTAG </SectionTitle> <Paragraph position="0"> Informally speaking, SLTAGs are defined by assigning a probability to the event that an elementary tree is combined (by adjunction or substitution) at a specific node of another elementary tree. These events of combination are the stochastic processes considered.</Paragraph> <Paragraph position="1"> Since SLTAGs are defined on the basis of the derivation, and since TAG allows for a notion of derivation independent from the trees that are derived, a precise mathematical definition of the SLTAG derivation must be given. For this purpose, we use stochastic linear indexed grammars (SLIG) to formally express SLTAG derivations.</Paragraph> <Paragraph position="2"> Linear indexed grammar (LIG) (Aho, 1968; Gazdar, 1985) is a rewriting system in which the non-terminal symbols are augmented with a stack. In addition to rewriting non-terminals, the rules of the grammar can have the effect of pushing or popping symbols on top of the stacks that are associated with each non-terminal symbol. A specific rule is triggered by the non-terminal on the left hand side of the rule and the top element of its associated stack.</Paragraph> <Paragraph position="3"> The productions of a LIG are restricted to copy the stack corresponding to the non-terminal being rewritten to at most one stack associated with a non-terminal symbol on the right hand side of the production.2</Paragraph> <Paragraph position="4"> 2LIGs have been shown to be weakly equivalent to Tree-Adjoining Grammars (Vijay-Shanker, 1987).</Paragraph>
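<Paragraph> To make the stack discipline concrete, the following sketch shows one linear indexed rewriting step in which the unbounded stack is copied to at most one right-hand-side symbol. This is our own illustration in Python; the rule encoding and the toy rule are assumptions, not the paper's notation.

# One linear indexed rewriting step. Rules are keyed by
# (non-terminal, top-of-stack); each right-hand-side symbol receives
# either the unbounded stack (copied, pushed or popped) or a fixed
# bounded stack -- at most one symbol may receive the copy.

BOTTOM = "$"

def rewrite(symbol, stack, rules):
    """Return all right-hand sides applicable to `symbol` carrying `stack`.

    `stack` is a tuple whose last element is the top; `rules` maps
    (symbol, top) to a list of right-hand sides, each a list of
    (symbol, action) pairs with action "copy", "pop", ("push", q),
    or a fixed stack tuple.
    """
    top, rest = stack[-1], stack[:-1]
    expansions = []
    for rhs in rules.get((symbol, top), []):
        items = []
        for sym, action in rhs:
            if action == "copy":
                items.append((sym, stack))                 # [..p] copied
            elif action == "pop":
                items.append((sym, rest))                  # [..p] -> [..]
            elif isinstance(action, tuple) and action[:1] == ("push",):
                items.append((sym, stack + (action[1],)))  # [..p] -> [..p q]
            else:
                items.append((sym, action))                # bounded stack
        expansions.append(items)
    return expansions

# Toy rule: S[..p] -> T[$] U[..p q]  (the copy goes to U, with a push)
rules = {("S", "p"): [[("T", (BOTTOM,)), ("U", ("push", "q"))]]}
print(rewrite("S", (BOTTOM, "p"), rules))
# [[('T', ('$',)), ('U', ('$', 'p', 'q'))]]
</Paragraph>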
In the following, [..p] refers to a possibly unbounded stack whose top element is p and whose remaining part is schematically written as '..'. [$] represents a stack whose only element is the bottom of the stack. While it is possible to define SLIGs in general, we define them for the particular case where the rules are binary branching and where the left hand sides are always incomparable.</Paragraph> <Paragraph position="5"> A stochastic linear indexed grammar, G, is denoted by (VN, VT, VI, S, Prod, P), where VN is a finite set of non-terminal symbols; VT is a finite set of terminal symbols; VI is a finite set of stack symbols; S ∈ VN is the start symbol; Prod is a finite set of productions of the form:</Paragraph> <Paragraph position="6"> X0[..p0] → X1[$p1] X2[..p2]   X0[..p0] → X1[..p2] X2[$p1]   X0[..p0] → X1[..p0 p1]   X0[..p0 p1] → X1[..p0]   X0[..p0] → X1[..p1]   X0[$p0] → a</Paragraph> <Paragraph position="7"> where Xk ∈ VN, a ∈ VT and p0, p1, p2 ∈ VI; and P is a probability distribution which assigns a probability, 0 < P(X[..x] → Λ) ≤ 1, to each rule X[..x] → Λ ∈ Prod, such that the sum of the probabilities of all the rules that can be applied to any non-terminal annotated with a stack is equal to one. More precisely, ∀X ∈ VN, ∀p ∈ VI:</Paragraph> <Paragraph position="8"> Σ_Λ P(X[..p] → Λ) = 1</Paragraph> <Paragraph position="9"> where P(X[..p] → Λ) is the probability that X[..p] is rewritten as Λ.</Paragraph> <Paragraph position="10"> A derivation starts from S associated with the empty stack (S[$]) and each level of the derivation must be validated by a production rule. The language of a SLIG is defined as follows: L = {w ∈ VT* | S[$] ⇒* w}.</Paragraph> <Paragraph position="11"> The probability of a derivation is defined as the product of the probabilities of all individual rules involved (counting repetition) in the derivation, the derivation being validated by a correct configuration of the stack at each level. The probability of a sentence is then computed as the sum of the probabilities of all derivations of the sentence.</Paragraph> <Paragraph position="12"> Following the construction described in (Vijay-Shanker and Weir, 1991), given a LTAG, Gtag, we construct an equivalent LIG, Gslig. The constructed LIG generates the same language as Gtag, and each derivation of Gtag corresponds to a unique derivation in Gslig (and conversely). In addition, a probability is assigned to each production of the LIG. For simplicity of explanation and without loss of generality, we assume that each node in an elementary tree in Gtag is either a leaf node (i.e. either a foot node or a non-empty terminal node) or binary branching.3 The construction of the equivalent SLIG follows.</Paragraph> <Paragraph position="13"> The non-terminal symbols of Gslig are the two symbols 'top' (t) and 'bottom' (b), the set of terminal symbols is the same as that of Gtag, the set of stack symbols is the set of nodes (not node labels) found in the elementary trees of Gtag augmented with the bottom of the stack ($), and the start symbol is 'top' (t). For all root nodes η0 of an initial tree whose root is labeled by S, the following starting rules are added:</Paragraph> <Paragraph position="15"> t[$] -p-> t[$η0] (1)</Paragraph> <Paragraph position="16"> These rules state that a derivation must start from the top of the root node of some initial tree.
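<Paragraph> The condition that the probabilities of all rules rewriting the same X[..p] sum to one can be checked mechanically. Below is a minimal sketch in Python; the rule table is hypothetical and previews the competition, defined by rules (5) and (6) below, between adjoining some auxiliary tree at a node and adjoining nothing.

from collections import defaultdict

def is_well_formed(rules, tol=1e-9):
    """Check the SLIG condition: for every pair (non-terminal X, top
    stack symbol p), the probabilities of all rules rewriting X[..p]
    sum to one. `rules` is a list of ((X, p), rhs, prob) triples."""
    totals = defaultdict(float)
    for (lhs, top), _rhs, prob in rules:
        assert 0.0 < prob <= 1.0
        totals[(lhs, top)] += prob
    return all(abs(total - 1.0) <= tol for total in totals.values())

# Hypothetical rules competing at one node eta: null adjunction versus
# adjoining one of two auxiliary trees rooted in beta1_r and beta2_r.
rules = [
    (("t", "eta"), ("b", "eta"), 0.6),
    (("t", "eta"), ("t", "eta", "beta1_r"), 0.3),
    (("t", "eta"), ("t", "eta", "beta2_r"), 0.1),
]
print(is_well_formed(rules))  # True
</Paragraph>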
p is the probability that the derivation starts from the initial tree associated with a lexical item and rooted by η0.</Paragraph> <Paragraph position="16"> Then, for every node η in an elementary tree, the following rules are generated.</Paragraph> <Paragraph position="17"> * If η1, η2 are the two children of a node η such that η2 is on the spine (i.e. subsumes the foot node), include: b[..η] -1-> t[$η1] t[..η2] (2) Since (2) encodes an immediate domination link defined by the tree-adjoining grammar, its associated probability is one.</Paragraph> <Paragraph position="18"> * Similarly, if η1, η2 are the two children of a node η such that η1 is on the spine (i.e. subsumes the foot node), include: b[..η] -1-> t[..η1] t[$η2] (3) Since (3) encodes an immediate domination link defined by the tree-adjoining grammar, its associated probability is one.</Paragraph> <Paragraph position="19"> 3The algorithms explained in this paper can be generalized to lexicalized tree-adjoining grammars that need not be in this binary normal form, using techniques similar to the one found in (Schabes, 1991).</Paragraph> <Paragraph position="20"> * If η1, η2 are the two children of a node η such that neither of them is on the spine, include: b[..η] -1-> t[$η1] t[$η2] (4) Since (4) also encodes an immediate domination link defined by the tree-adjoining grammar, its associated probability is one.</Paragraph> <Paragraph position="23"> * If η is a node labeled by a non-terminal symbol and it does not have an obligatory adjoining constraint, then we need to consider the case that adjunction does not take place. In this case, include: t[..η] -p-> b[..η] (5) The probability of rule (5) corresponds to the probability that no adjunction takes place at node η.</Paragraph> <Paragraph position="25"> * If η is a node on which the auxiliary tree β can be adjoined, the adjunction of β can be predicted; therefore (assuming that ηr is the root node of β) include: t[..η] -p-> t[..η ηr] (6) The probability of rule (6) corresponds to the probability of adjoining the auxiliary tree whose root node is ηr, say β, on the node η belonging to some elementary tree, say α.4</Paragraph> <Paragraph position="28"> * If ηf is the foot node of an auxiliary tree β that has been adjoined, then the derivation of the node below ηf must resume. In this case, include: b[..η ηf] -1-> b[..η] (7) The above stochastic production is included with probability one since the decision of adjunction has already been made in rules of the form (6).</Paragraph> <Paragraph position="30"> * Finally, if η1 is the root node of an initial tree that can be substituted on a node marked for substitution η, include: t[$η] -p-> t[$η1] (8) Here, p is the probability that the initial tree rooted by η1 is substituted at node η. It corresponds to the probability of substituting the lexicalized initial tree whose root node is η1, say δ, at the node η of a lexicalized elementary tree, say α.5</Paragraph> <Paragraph position="32"> The SLIG constructed as above is well defined if the following equalities hold for all nodes η:</Paragraph> <Paragraph position="33"> Σ_{η0} P(t[$] → t[$η0]) = 1 (9)   P(t[..η] → b[..η]) + Σ_{ηr} P(t[..η] → t[..η ηr]) = 1 (10)   Σ_{η1} P(t[$η] → t[$η1]) = 1 (11)</Paragraph>
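<Paragraph> As a concrete rendering of how the probability-one rules (2)-(4) fall out of the tree structure, the sketch below walks a binary elementary tree and emits one domination rule per internal node. The Node class is our own minimal representation; the probabilistic rules (1) and (5)-(8) would be added analogously.

class Node:
    def __init__(self, name, children=(), on_spine=False):
        self.name = name          # unique node identity (not its label)
        self.children = children  # () for leaves, or (left, right)
        self.on_spine = on_spine  # True iff the node subsumes the foot

def domination_rules(root):
    """Yield (lhs, rhs, prob) triples for rules (2)-(4). The unbounded
    stack '..' is copied to the child on the spine when there is one;
    every rule has probability one since it only encodes an immediate
    domination link."""
    agenda = [root]
    while agenda:
        node = agenda.pop()
        if len(node.children) != 2:
            continue
        left, right = node.children
        agenda.extend(node.children)
        if right.on_spine:    # rule (2)
            yield ("b", "..", node.name), (("t", "$", left.name),
                                           ("t", "..", right.name)), 1.0
        elif left.on_spine:   # rule (3)
            yield ("b", "..", node.name), (("t", "..", left.name),
                                           ("t", "$", right.name)), 1.0
        else:                 # rule (4)
            yield ("b", "..", node.name), (("t", "$", left.name),
                                           ("t", "$", right.name)), 1.0

# Toy auxiliary tree: root eta0 with non-spine child eta1 and spine
# child eta2; this yields exactly one rule of the form (2).
root = Node("eta0", (Node("eta1"), Node("eta2", on_spine=True)),
            on_spine=True)
for rule in domination_rules(root):
    print(rule)
</Paragraph>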
4Since the grammar is lexicalized, both trees α and β are associated with lexical items, and the site node for adjunction corresponds to some syntactic modification. Such a rule encapsulates S modifiers (e.g. sentential adverbs as in "apparently John left"), VP modifiers (e.g. verb phrase adverbs as in "John left abruptly"), NP modifiers (e.g. relative clauses as in "the man who left was happy"), N modifiers (e.g. adjectives as in "pretty woman"), or even sentential complements (e.g. "John thinks that Harry is sick").</Paragraph> <Paragraph position="34"> 5Among other cases, the probability of this rule corresponds to the probability of filling some argument position by a lexicalized tree. It will encapsulate the distribution for selectional restriction since the position of substitution is taken into account.</Paragraph> <Paragraph position="35"> Σ_{w ∈ L} P(w) = 1 (12)</Paragraph> <Paragraph position="36"> A grammar satisfying (12) is called consistent.6</Paragraph> <Paragraph position="37"> Besides the distributional phenomena that we mentioned earlier, SLTAG also captures the effect of adjoining constraints (selective, obligatory or null adjoining) which are required for tree-adjoining grammar.7</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Algorithm for Computing the Probability of a Sentence </SectionTitle> <Paragraph position="0"> We now define a bottom-up algorithm for SLTAG which computes the probability of an input string. The algorithm is an extension of the CKY-type parser for tree-adjoining grammar (Vijay-Shanker, 1987). The extended algorithm parses all spans of the input string and also computes their probabilities in a bottom-up fashion.</Paragraph> <Paragraph position="1"> Since the string on the frontier of an auxiliary tree is broken up into two substrings by the foot node, for the purpose of computing the probability of the sentence we will consider the probability that a node derives two substrings of the input string. This entity will be called the inside probability. Its exact definition is given below.</Paragraph> <Paragraph position="2"> We will refer to the subsequence of the input string w = a1 ··· aN from position i to j as w_i^j. It is defined as follows: w_i^j = a_{i+1} ··· a_j if i < j, and w_i^j = ε if i ≥ j.</Paragraph> <Paragraph position="3"> Given a string w = a1 ··· aN and a SLTAG rewritten as in (1-8), the inside probability, I(pos, η, i, j, k, l), is defined for all nodes η contained in an elementary tree α, for pos ∈ {t, b}, and for all indices 0 ≤ i ≤ j ≤ k ≤ l ≤ N, as follows: (i) If the node η does not subsume the foot node of α (if there is one), then j and k are unbound and: I(pos, η, i, -, -, l) =def P(pos[$η] ⇒* w_i^l) (ii) If the node η subsumes the foot node ηf of α, then: I(pos, η, i, j, k, l) =def P(pos[$η] ⇒* w_i^j b[$ηf] w_k^l) In (ii), only the top element of the stack matters since, as a consequence of the construction of the SLIG, if pos[$η] ⇒* w_i^j b[$ηf] w_k^l then for all strings γ ∈ VI* we also have pos[$γη] ⇒* w_i^j b[$γηf] w_k^l.8</Paragraph> <Paragraph position="4"> Initially, all inside probabilities are set to zero. Then, the computation goes bottom-up, starting from the productions introducing lexical items: if η is a node such that b[$η] -p-> a and a = a_{i+1}, then: I(b, η, i, -, -, i+1) = p</Paragraph> <Paragraph position="5"> Then, the inside probabilities of larger substrings are computed bottom-up relying on the recurrence equations stated in Appendix A.</Paragraph>
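<Paragraph> To fix the indexing, the sketch below shows the base step of the inside computation and the final summation over the starting rules (1). The chart keys follow the definition above; `lexical_rules` and `start_rules` are hypothetical inputs, and the recurrences of Appendix A would fill the chart in between.

from collections import defaultdict

def initialize_inside(words, lexical_rules):
    """Base case: for each node eta with a lexical rule b[$eta] -> a of
    probability p and each position i with words[i] == a, set
    I(b, eta, i, -, -, i+1) = p. All other entries default to zero.
    `lexical_rules` maps a terminal to a list of (eta, prob) pairs."""
    inside = defaultdict(float)  # keyed by (pos, eta, i, j, k, l)
    for i, a in enumerate(words):
        for eta, prob in lexical_rules.get(a, []):
            inside[("b", eta, i, None, None, i + 1)] = prob
    return inside

def sentence_probability(inside, start_rules, n):
    """Once the chart is filled, sum over the starting rules (1):
    P(w) = sum over eta0 of P(t[$] -> t[$eta0]) * I(t, eta0, 0, -, -, n)."""
    return sum(p * inside[("t", eta0, 0, None, None, n)]
               for eta0, p in start_rules)

# Hypothetical one-word example with a single lexical anchor.
chart = initialize_inside(["eats"], {"eats": [("eta_eats", 1.0)]})
chart[("t", "alpha_root", 0, None, None, 1)] = 0.5  # pretend-filled entry
print(sentence_probability(chart, [("alpha_root", 1.0)], 1))  # 0.5
</Paragraph>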
6We will not investigate the conditions under which (12) holds. We conjecture that the techniques used for checking the consistency of stochastic context-free grammars (Booth and Thompson, 1973) can be adapted to SLTAG.</Paragraph> <Paragraph position="6"> 7For example, for a given node η, setting to zero the probability of all rules of the form (6) has the effect of blocking adjunction.</Paragraph> <Paragraph position="7"> 8This can be seen by observing that, for any node on the path from the root node to the foot node of an auxiliary tree, the stack remains unchanged.</Paragraph> <Paragraph position="8"> This computation takes in the worst case O(|G|^2 N^6) time and O(|G| N^4) space for a sentence of length N.</Paragraph> <Paragraph position="9"> Once the inside probabilities are computed, we obtain the probability of the sentence as follows: P(w) = Σ_{η0} P(t[$] → t[$η0]) I(t, η0, 0, -, -, N), where the sum ranges over the starting rules (1).</Paragraph> <Paragraph position="10"> We now consider the problem of re-estimating a SLTAG.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Inside-Outside Algorithm for Reestimating a SLTAG </SectionTitle> <Paragraph position="0"> Given a set of positive example sentences, W = {w1, ..., wK}, we would like to compute the probability of each rule of a given SLTAG in order to maximize the probability that the corpus was generated by this SLTAG. An algorithm solving this problem can be used in two different ways.</Paragraph> <Paragraph position="1"> The first use is as a reestimation algorithm. In this approach, the input SLTAG derives structures that are reasonable according to some criteria (such as a linguistic theory and some a priori knowledge of the corpus) and the intended use of the algorithm is to refine the probability of each rule.</Paragraph> <Paragraph position="2"> The second use is as a learning algorithm. At the first iteration, a SLTAG which generates all possible structures over a given set of nodes and terminal symbols is used. Initially the probability of each rule is randomly assigned and then the algorithm re-estimates these probabilities.</Paragraph> <Paragraph position="3"> Informally speaking, given a first estimate of the parameters of a SLTAG, the algorithm re-estimates these parameters on the basis of the parses of each sentence in a training corpus obtained by a CKY-type parser. The algorithm is designed to derive a new estimate after each iteration such that the probability of the corpus is increased, or equivalently such that the cross entropy estimate (negative log probability) H(W, G) is decreased.</Paragraph> <Paragraph position="5"> In order to derive a new estimate, the algorithm needs to compute, for all sentences in W, the inside probabilities and the outside probabilities. Given a string w = a1 ··· aN, the outside probability, O(pos, η, i, j, k, l), is defined for all nodes η contained in an elementary tree α, for pos ∈ {t, b}, and for all indices 0 ≤ i ≤ j ≤ k ≤ l ≤ N, as follows: (i) If the node η does not subsume the foot node of α (if there is one), then j and k are unbound and: O(pos, η, i, -, -, l) =def P(∃γ ∈ VI* s.t. t[$] ⇒* w_0^i pos[$γη] w_l^N) (ii) If the node η does subsume the foot node ηf of α, then: O(pos, η, i, j, k, l) =def P(∃γ ∈ VI* s.t. t[$] ⇒* w_0^i pos[$γη] w_l^N and b[$γηf] ⇒* w_j^k)</Paragraph>
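<Paragraph> The quantity driving the iteration is the cross entropy estimate H(W, G) mentioned above. The sketch below is our own; the paper does not show its exact normalization, so the per-word convention here is an assumption.

import math

def cross_entropy(corpus, sentence_prob):
    """Cross entropy estimate H(W, G): the negative log probability of
    the corpus, normalized here by the total number of words (one common
    convention). `sentence_prob` maps a sentence to its probability
    under the current grammar, e.g. as computed by the inside pass."""
    total_log_prob = sum(math.log2(sentence_prob(w)) for w in corpus)
    total_words = sum(len(w) for w in corpus)
    return -total_log_prob / total_words

# Toy usage with a hypothetical fixed-probability model:
corpus = [("a", "b"), ("a", "a", "b", "b")]
print(cross_entropy(corpus, lambda w: 0.25 ** (len(w) // 2)))  # 1.0
</Paragraph>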
<Paragraph position="7"> Once the inside probabilities are computed, the outside probabilities can be computed top-down by considering smaller spans of the input string, starting with O(t, $, 0, -, -, N) = 1 (by definition). This is done by computing the recurrence equations stated in Appendix B.</Paragraph> <Paragraph position="8"> In the following, we assume that η subsumes the foot node ηf within the same elementary tree, and also that η1 subsumes the foot node η1f (within the same elementary tree). The other cases are handled similarly. Table 1 shows the reestimation formulae for the adjoining rules (16) and the null adjoining rules (17).</Paragraph> <Paragraph position="9"> (16) corresponds to the average number of times that the rule t[..η] → t[..η η1] is used, and (17) to the average number of times no adjunction occurred on η. The denominators of (16) and of (17) estimate the average number of times that a derivation involves the expansion of t[..η]. The numerator of (16) estimates the average number of times that a derivation involves the rule t[..η] → t[..η η1]. Therefore, for example, (16) estimates the probability of using the rule t[..η] → t[..η η1]. The algorithm reiterates until H(W, G) is unchanged (within some epsilon) between two iterations. Each iteration of the algorithm requires at most O(|G| N^6) time for each sentence of length N.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Grammar Inference with SLTAG </SectionTitle> <Paragraph position="0"> The reestimation algorithm explained in Section 4 can be used both to reestimate the parameters of a SLTAG derived by some other means and to infer a grammar from scratch. In the following, we investigate grammar inference from scratch.</Paragraph> <Paragraph position="1"> The initial grammar for the reestimation algorithm consists of all SLIG rules for the trees in Lexicalized Normal Form (in short LNF) over a given set {ai | 1 ≤ i ≤ T} of terminal symbols, with suitably assigned non-zero probabilities:9 [Tree diagrams omitted: the LNF elementary tree templates, auxiliary trees β_1^{ai}, β_2^{ai} and initial trees α^{ai}, each anchored by ai, with indexed S nodes.] The above normal form is capable not only of deriving any lexicalized tree-adjoining language, but also of imposing any binary bracketing over the strings of the language. The latter property is important as we would like to be able to use bracketing information in the input corpus as in (Pereira and Schabes, 1992).</Paragraph> <Paragraph position="2"> The worst case complexity of the reestimation algorithm given in Section 4 with respect to the length of the input string (O(N^6)) makes this approach in general impractical for LNF grammars.</Paragraph> <Paragraph position="3"> However, if only trees of the form β_1^{ai} and α^{ai} (or only of the form β_2^{ai} and α^{ai}) are used, the language generated is a context-free language and can be handled more efficiently by the reestimation algorithm.</Paragraph> <Paragraph position="4"> 9Adjoining constraints can be used in this normal form. They will be reflected in the equivalent SLIG grammar. Indices have been added on S nodes in order to be able to uniquely refer to each node in the grammar.</Paragraph>
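<Paragraph> As a sketch of how such a randomly initialized LNF grammar might be instantiated before reestimation, the code below distributes probability at each S node over null adjunction and all candidate adjunctions, in the spirit of rules (5) and (6). Tree and node names are illustrative only, not the paper's.

import random

def initial_lnf_grammar(terminals, seed=0):
    """For every S node of every LNF tree, assign random non-zero
    probabilities that sum to one over 'no adjunction' (key None) and
    adjoining any auxiliary tree root (cf. rules (5) and (6))."""
    rng = random.Random(seed)
    aux_roots = [f"beta_{a}_root" for a in terminals]
    probs = {}
    for a in terminals:
        for node in (f"alpha_{a}_S", f"beta_{a}_S"):
            weights = [rng.random() for _ in range(len(aux_roots) + 1)]
            total = sum(weights)
            probs[(node, None)] = weights[0] / total        # rule (5)
            for aux_root, wgt in zip(aux_roots, weights[1:]):
                probs[(node, aux_root)] = wgt / total       # rule (6)
    return probs

# The probabilities competing at any single node sum to one:
grammar = initial_lnf_grammar(["a", "b"])
print(round(sum(p for (node, _), p in grammar.items()
                if node == "alpha_a_S"), 6))  # 1.0
</Paragraph>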
<Paragraph position="6"> It can be shown that if only trees of the form β_1^{ai} and α^{ai} are considered, the reestimation algorithm requires in the worst case O(N^3) time.10</Paragraph> <Paragraph position="7"> The system consisting of trees of the form β_1^{ai} and α^{ai} can be seen as a stochastic lexicalized context-free grammar since it generates exactly context-free languages while being lexically sensitive.</Paragraph> <Paragraph position="8"> In the following, due to the lack of space, we report only a few experiments on grammar inference using these restricted forms of SLTAG and the reestimation algorithm given in Section 4. We compare the results of the TAG inside-outside algorithm with the results of the inside-outside algorithm for context-free grammars (Baker, 1979).</Paragraph> <Paragraph position="9"> These preliminary experiments suggest that SLTAG achieves faster convergence (and converges to a better solution) than stochastic context-free grammars.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Inferring the Language {a^n b^n | n > 0} </SectionTitle> <Paragraph position="0"> We consider first an artificial language. The training corpus consists of 100 sentences in the language L = {a^n b^n | n > 0} randomly generated by a stochastic context-free grammar.</Paragraph> <Paragraph position="1"> The initial grammar consists of the trees β_1^a, β_1^b, α^a and α^b with random probabilities of adjoining and null adjoining.</Paragraph> <Paragraph position="2"> The inferred grammar correctly models the language L. Its rules of the form (1), (5) or (6) with high probability follow (any excluded rule of the same form has a probability at least 10^13 times lower than the rules given below). The structural rules of the form (2), (3), (4) or (7) are not shown since their probabilities always remain 1.</Paragraph> <Paragraph position="3"> 10This can be seen by observing that, for example in I(pos, η, i, j, k, l), it is necessarily the case that k = l, and also by noting that k is superfluous.</Paragraph> <Paragraph position="5"> [Inferred rule listing largely lost in extraction; one legible rule is the null adjoining rule t[..η^a] -1.0-> b[..η^a].] In the above grammar, a node Sk in a tree α^a or β^a associated with the symbol a is referred to as η_k^a, and a node Sk in a tree associated with b as η_k^b.</Paragraph> <Paragraph position="6"> We also conducted a similar experiment with the inside-outside algorithm for context-free grammars (Baker, 1979), starting with all possible Chomsky Normal Form rules over 4 non-terminals and the set of terminal symbols {a, b} (72 rules). The inferred grammar does not quite correctly model the language L. Furthermore, the algorithm does not converge as fast as in the case of SLTAG (see Figure 1).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Experiments on the ATIS Corpus </SectionTitle> <Paragraph position="0"> We consider the part-of-speech sequences of the spoken-language transcriptions in the Texas Instruments subset of the Air Travel Information System (ATIS) corpus (Hemphill, Godfrey, and Doddington, 1990). This corpus is of interest since it has been used for inferring stochastic context-free grammars from partially bracketed corpora (Pereira and Schabes, 1992).</Paragraph>
We use the data given by Pereira and Schabes (1992) on raw text and compare with an inferred SLTAG.</Paragraph> <Paragraph position="1"> The initial grammar consists of all trees (96) of the form β^{ai} and α^{ai} for the 48 part-of-speech terminal symbols. As shown in Figure 2, the grammar converges very rapidly to a lower value of the log probability than the stochastic context-free grammar reported by Pereira and Schabes (1992).</Paragraph> </Section> </Section> </Paper>