INSIDE-OUTSIDE REESTIMATION FROM 
PARTIALLY BRACKETED CORPORA 
Fernando Pereira
2D-447, AT&T Bell Laboratories
PO Box 636, 600 Mountain Ave
Murray Hill, NJ 07974-0636

Yves Schabes
Dept. of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104-6389
ABSTRACT 
The inside-outside algorithm for inferring the parameters of 
a stochastic context-free grammar is extended to take advan- 
tage of constituent information in a partially parsed corpus. 
Experiments on formal and natural language parsed corpora 
show that the new algorithm can achieve faster convergence 
and better modelling of hierarchical structure than the orig- 
inal one. In particular, over 90% of the constituents in the 
most likely analyses of a test set are compatible with test 
set constituents for a grammar trained on a corpus of 700 
hand-parsed part-of-speech strings for ATIS sentences. 
1. MOTIVATION 
Grammar inference is a challenging problem for statis- 
tical approaches to natural-language processing. The 
most successful grammar inference techniques involve 
stochastic finite-state language models such as hidden 
Markov models (HMMs) \[1\]. However, finite-state lan- 
guage models fail to represent the hierarchical struc- 
ture of natural language. Therefore, stochastic versions 
of grammar formalisms structurally more expressive are 
worth investigating. Baker \[2\] generalized the parameter 
estimation methods for HMMs to stochastic context-free 
grammars (SCFGs) \[3\] as the inside-outside algorithm. 
Unfortunately, the application of SCFGs and the inside- 
outside algorithm to natural-language modeling \[4, 5, 6\] 
has so far been inconclusive. 
Several reasons can be adduced for the difficulties. First, 
each iteration of the inside-outside algorithm on a grammar
with n nonterminals may require $O(n^3 |w|^3)$ time per
training sentence w, while each iteration of its finite-state
counterpart training an HMM with s states requires at
worst $O(s^2 |w|)$ time per training sentence. Second, the
convergence properties of the algorithm sharply deteri- 
orate as the number of nonterminal symbols increases. 
This fact can be intuitively understood by observing that 
the algorithm searches for the maximum of a function 
whose number of local maxima grows with the number 
of nonterminals. Finally, although SCFGs provide a
hierarchical model of the language, that structure is
undetermined by raw text and only by chance will the inferred
grammar agree with qualitative linguistic judgments of 
sentence structure. For example, since in English texts 
pronouns are very likely to immediately precede a verb, 
a grammar inferred from raw text will tend to group together
the subject pronoun with the verb.
We describe here an extension of the inside-outside algo- 
rithm that infers the parameters of a stochastic context- 
free grammar from a partially parsed corpus, thus pro- 
viding a tighter connection between the hierarchical 
structure of the inferred SCFG and that of the training 
corpus. The algorithm takes advantage of whatever
constituent information is provided by the training corpus
bracketing, ranging from a complete constituent analysis 
of the training sentences to the unparsed corpus used for 
the original inside-outside algorithm. In the latter case, 
the new algorithm reduces to the original one. 
Using a partially parsed corpus has several important
advantages. We empirically show that the use of a
partially parsed corpus can decrease the number of
iterations needed to reach a solution. We also exhibit cases
where a good solution is found from a partially parsed
corpus but not from raw text. Most importantly, the use
of a partially parsed corpus enables the algorithm to infer
grammars that derive constituent boundaries that
cannot be inferred from raw text.
We first outline our extension of the inside-outside al- 
gorithm to partially parsed text, and then report pre- 
liminary experiments illustrating the advantages of the 
extended algorithm. 
2. PARTIALLY BRACKETED TEXT 
Informally, a partially bracketed corpus is a set of sen- 
tences annotated with parentheses marking constituent 
boundaries that any analysis of the corpus should re- 
spect. More precisely, we start from a corpus C con- 
sisting of bracketed strings, which are pairs c = (w, B) 
where w is a string and B is a bracketing of w. For
convenience, we will define the length of the bracketed string
c by $|c| = |w|$.
Given a string $w = w_1 \cdots w_{|w|}$, a span of w is a pair of
integers (i, j) with $0 \le i < j \le |w|$. By convention, span
(i, j) delimits substring ${}_iw_j = w_{i+1} \cdots w_j$ of w. We also
use the abbreviation ${}_iw$ for ${}_iw_{|w|}$.
A bracketing B of a string w is a finite set of spans on w
(that is, a finite set of pairs of integers (i, j) with
$0 \le i < j \le |w|$) satisfying a consistency condition that ensures
that each span (i, j) can be seen as delimiting a (sequence
of) constituents ${}_iw_j$. The consistency condition is simply
that no two spans in a bracketing may overlap, where two
spans (i, j) and (k, l) overlap if either $i < k < j < l$ or
$k < i < l < j$. We also say that two bracketings of the
same string are compatible if their union is consistent.
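
The overlap, consistency, and compatibility conditions above can be checked mechanically. The following Python sketch is illustrative only: the function names `overlap`, `is_consistent`, and `compatible` are hypothetical choices, and spans are represented as pairs of integers.

```python
from itertools import combinations

def overlap(s, t):
    """True if spans s=(i, j) and t=(k, l) cross: i < k < j < l or k < i < l < j."""
    (i, j), (k, l) = s, t
    return i < k < j < l or k < i < l < j

def is_consistent(spans):
    """A set of spans is a consistent bracketing if no two of its spans overlap."""
    return not any(overlap(s, t) for s, t in combinations(spans, 2))

def compatible(b1, b2):
    """Two bracketings of the same string are compatible if their union is consistent."""
    return is_consistent(set(b1) | set(b2))

# Bracketing ((w1 w2) (w3 w4)) of a four-word string.
B = {(0, 4), (0, 2), (2, 4)}
print(is_consistent(B))         # True
print(compatible(B, {(1, 3)}))  # False: (1, 3) crosses (0, 2)
print(compatible(B, {(0, 1)}))  # True: (0, 1) is nested inside (0, 2)
```

Nested and adjacent spans never count as overlapping under this definition; only crossing spans do.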
Note that there is no requirement that a bracketing of 
w describe fully the constituent structure of w. In fact, 
some or all sentences in a corpus may have empty brack- 
etings, in which case the new algorithm behaves like the 
original one. 
To present the notion of compatibility between a deriva- 
tion and a bracketed string, we need first to define the 
span of a symbol occurrence in a context-free derivation. 
Let (w, B) be a bracketed string, and
$\alpha_0 \Rightarrow \alpha_1 \Rightarrow \cdots \Rightarrow \alpha_m = w$
be a derivation of w for (S)CFG G. The span
of a symbol occurrence in $\alpha_j$ is defined inductively as
follows:
• If $j = m$, then $\alpha_j = w \in \Sigma^*$, and the span of $w_i$ in
$\alpha_j$ is $(i - 1, i)$.
• If $j < m$, then $\alpha_j = \beta A \gamma$ and
$\alpha_{j+1} = \beta X_1 \cdots X_k \gamma$,
where $A \to X_1 \cdots X_k$ is a production of G. Then the
span of A in $\alpha_j$ is $(i_1, j_k)$, where for each $1 \le l \le k$,
$(i_l, j_l)$ is the span of $X_l$ in $\alpha_{j+1}$. The spans in $\alpha_j$ of
the symbol occurrences in $\beta$ and $\gamma$ are the same as
those of the corresponding symbols in $\alpha_{j+1}$.
A derivation of w is then compatible with a bracketing B 
of w if no span of a symbol occurrence in the derivation 
overlaps a span in B. 
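
As a minimal sketch of this definition, the spans of all symbol occurrences can be read off a parse tree bottom-up and tested against a bracketing. The tree encoding (a terminal is a string, a nonterminal occurrence is a tuple of a label followed by its children) and the names `tree_spans`, `crosses`, and `derivation_compatible` are assumptions made for illustration.

```python
def tree_spans(tree, start=0):
    """Return (end, spans): the end position of the subtree that begins at
    position start, and the spans of all symbol occurrences inside it."""
    if isinstance(tree, str):                  # terminal w_i has span (i - 1, i)
        return start + 1, [(start, start + 1)]
    spans, pos = [], start
    for child in tree[1:]:                     # children X_1 ... X_k, left to right
        pos, child_spans = tree_spans(child, pos)
        spans.extend(child_spans)
    spans.append((start, pos))                 # span of the parent is (i_1, j_k)
    return pos, spans

def crosses(s, t):
    (i, j), (k, l) = s, t
    return i < k < j < l or k < i < l < j

def derivation_compatible(tree, bracketing):
    """True if no symbol span of the tree overlaps a span of the bracketing."""
    _, spans = tree_spans(tree)
    return not any(crosses(s, b) for s in spans for b in bracketing)

# A parse of "a b b a" with S -> A C, C -> S A, S -> B B.
tree = ("S", ("A", "a"), ("C", ("S", ("B", "b"), ("B", "b")), ("A", "a")))
print(derivation_compatible(tree, {(1, 3)}))  # True: (1, 3) is a constituent
print(derivation_compatible(tree, {(0, 2)}))  # False: (0, 2) crosses (1, 3)
```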
3. THE INSIDE-OUTSIDE 
ALGORITHM 
The inside-outside algorithm \[2\] is a reestimation proce- 
dure for the rule probabilities of a Chomsky normal-form 
(CNF) SCFG. It takes as inputs an initial CNF SCFG 
and a training corpus of sentences and it iteratively rees- 
timates rule probabilities to maximize the probability 
that the grammar used as a stochastic generator would 
produce the corpus. 
A reestimation algorithm can be used either to refine the
parameter estimates for a CNF SCFG derived by other
means \[7\] or to infer a grammar from scratch. In the
latter case, the initial grammar for the inside-outside
algorithm consists of all possible CNF rules over given sets
N of nonterminals and $\Sigma$ of terminals, with suitably
assigned nonzero probabilities. In what follows, we will
take N, $\Sigma$ as fixed, $n = |N|$, $t = |\Sigma|$, and assume
enumerations $N = \{A_1, \ldots, A_n\}$ and $\Sigma = \{b_1, \ldots, b_t\}$, with
$A_1$ the grammar start symbol. A CNF SCFG over N, $\Sigma$
can then be specified by the $n^3 + nt$ probabilities $B_{p,q,r}$
of each possible binary rule $A_p \to A_q A_r$ and $U_{p,m}$ of
each possible unary rule $A_p \to b_m$. Since for each p the
parameters $B_{p,q,r}$ and $U_{p,m}$ are supposed to be the
probabilities of different ways of expanding $A_p$, we must have
for all $1 \le p \le n$

$$\sum_{q,r} B_{p,q,r} + \sum_{m} U_{p,m} = 1 \qquad (1)$$

For grammar inference, we give random initial values to
the parameters $B_{p,q,r}$ and $U_{p,m}$ subject to the constraints
(1).
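
One simple way to obtain random initial values satisfying constraint (1) is to draw positive weights for every rule expanding a given nonterminal and normalize them. The sketch below is an assumption about how this could be done, not a description of the initialization actually used; the name `random_cnf_scfg`, the seed, and the small positive offset are hypothetical. Indices are 0-based, so the start symbol $A_1$ is index 0.

```python
import random

def random_cnf_scfg(n, t, seed=0):
    """Return (B, U) with B[p][q][r] and U[p][m] strictly positive and, for
    each p, summing to 1 over all binary and unary expansions of A_p."""
    rng = random.Random(seed)
    B = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    U = [[0.0] * t for _ in range(n)]
    for p in range(n):
        # One positive weight per rule with left-hand side A_p.
        weights = [rng.random() + 1e-3 for _ in range(n * n + t)]
        total = sum(weights)
        for q in range(n):
            for r in range(n):
                B[p][q][r] = weights[q * n + r] / total
        for m in range(t):
            U[p][m] = weights[n * n + m] / total
    return B, U

B, U = random_cnf_scfg(n=5, t=2)
n_rules = sum(len(row) for Bp in B for row in Bp) + sum(len(Up) for Up in U)
print(n_rules)  # 135 = 5**3 + 5 * 2
```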
The intended meaning of rule probabilities in a SCFG 
is directly tied to the intuition of context-freeness: a 
derivation is assigned a probability which is the prod- 
uct of the probabilities of the rules used in each step of 
the derivation. Context-freeness together with the com- 
mutativity of multiplication thus allow us to identify all 
derivations associated to the same parse tree, and we 
will speak indifferently below of derivation and analy- 
sis (parse tree) probabilities. Finally, the probability of 
a sentence or sentential form is the sum of the proba- 
bilities of all its analyses (equivalently, the sum of the 
probabilities of all of its leftmost derivations from the 
start symbol). 
The basic idea of the inside-outside algorithm is to use 
the current rule probabilities to estimate from the train- 
ing text the expected frequencies of certain derivation 
steps, and then compute new rule probability estimates 
as appropriate frequency ratios. Therefore, each itera- 
tion of the algorithm starts by calculating estimates of 
the number of occurrences of the relevant configurations 
in each of the sentences w in the training corpus W. 
Because the frequency estimates are most conveniently 
computed as ratios of other frequencies, they are a bit 
loosely referred to as inside and outside probabilities. 
In the original inside-outside algorithm, for each $w \in W$,
the inside probability $I_p^w(i, j)$ estimates the likelihood
that $A_p$ derives ${}_iw_j$, while the outside probability
$O_p^w(i, j)$ estimates the likelihood of deriving sentential
form ${}_0w_i \, A_p \, {}_jw$ from the start symbol $A_1$. In adapting
the algorithm to partially bracketed strings we must take
into account the constraints that the bracketing imposes
on possible derivations, and thus on possible phrases.
Clearly, nonzero values for $I_p^w(i, j)$ or $O_p^w(i, j)$ should
only be allowed if ${}_iw_j$ is compatible with the bracketing
of w, or, equivalently, if (i, j) does not overlap any span
in the bracketing of w. Therefore, we will in the following
assume a bracketed corpus C, which as described
above is a set of bracketed strings c = (w, B), and will
modify the standard formulae for the inside and outside
probabilities and rule probability reestimation \[2, 4, 5\]
to involve only constituents whose spans are compatible
with string bracketings. For this purpose, for each
bracketed string c = (w, B) we define the auxiliary function

$$\delta(i, j) = \begin{cases} 1 & \text{if } (i, j) \text{ does not overlap any } b \in B \\ 0 & \text{otherwise} \end{cases}$$
For each bracketed sentence c in the training corpus, the 
inside probabilities of longer spans of c can be computed 
from those for shorter spans by the following recurrence 
equations: 
$$I_p^c(i - 1, i) = U_{p,m} \quad \text{where } c = (w, B) \text{ and } b_m = w_i \qquad (2)$$

$$I_p^c(i, k) = \delta(i, k) \sum_{q,r} \sum_{i<j<k} B_{p,q,r} \, I_q^c(i, j) \, I_r^c(j, k) \qquad (3)$$

Equation (3) computes the expected relative frequency
of derivations of ${}_iw_k$ from $A_p$ compatible with the
bracketing B of c = (w, B). The multiplier $\delta(i, k)$ is 0 just in
case (i, k) overlaps some span in B, which is exactly when
$A_p$ cannot derive ${}_iw_k$ compatibly with B.
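
The recurrences (2) and (3) can be sketched as a bottom-up table computation. This is an illustrative Python version under stated assumptions: nonterminals and terminals are 0-indexed, `sym` is a hypothetical mapping from terminal strings to indices, and the names `overlaps` and `inside` are not from the original.

```python
def overlaps(span, bracketing):
    """True if span crosses some span of the bracketing (the multiplier is then 0)."""
    i, k = span
    return any(i < a < k < b or a < i < b < k for a, b in bracketing)

def inside(words, bracketing, B, U, sym):
    """I[p][i][k]: probability that A_p derives words[i:k] compatibly with
    the bracketing, computed from shorter spans to longer ones."""
    n, L = len(B), len(words)
    I = [[[0.0] * (L + 1) for _ in range(L + 1)] for _ in range(n)]
    for i, w in enumerate(words):                  # equation (2)
        for p in range(n):
            I[p][i][i + 1] = U[p][sym[w]]
    for width in range(2, L + 1):                  # equation (3)
        for i in range(L - width + 1):
            k = i + width
            if overlaps((i, k), bracketing):       # blocked span stays at 0
                continue
            for p in range(n):
                I[p][i][k] = sum(B[p][q][r] * I[q][i][j] * I[r][j][k]
                                 for q in range(n) for r in range(n)
                                 for j in range(i + 1, k))
    return I

# Toy grammar (an assumption for illustration): S -> S S (0.4), S -> a (0.6).
B, U, sym = [[[0.4]]], [[0.6]], {"a": 0}
words = ["a", "a", "a"]
raw = inside(words, set(), B, U, sym)[0][0][3]          # both binary trees count
bracketed = inside(words, {(0, 2)}, B, U, sym)[0][0][3] # only ((a a) a) counts
print(raw, bracketed)
```

With the bracket (0, 2), the span (1, 3) is blocked, so only one of the two analyses of the three-word string contributes and the inside probability of the whole string is halved.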
Similarly, the outside probabilities for shorter spans of c 
can be computed from the inside probabilities and the 
outside probabilities for longer spans by the following 
recurrence: 
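$$O_p^c(0, |c|) = \begin{cases} 1 & \text{if } p = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

$$O_p^c(i, k) = \delta(i, k) \sum_{q,r} \left[ \sum_{j=0}^{i-1} O_q^c(j, k) \, I_r^c(j, i) \, B_{q,r,p} + \sum_{j=k+1}^{|c|} O_q^c(i, j) \, B_{q,p,r} \, I_r^c(k, j) \right] \qquad (5)$$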
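Once the inside and outside probabilities have been computed for
each sentence in the corpus, the reestimated probability
of binary rules, $\hat{B}_{p,q,r}$, and the reestimated probability
of unary rules, $\hat{U}_{p,m}$, are computed using the following
reestimation formulae, which are just like the standard
ones \[2, 5, 4\] except for the use of bracketed strings
instead of unbracketed ones:

$$\hat{B}_{p,q,r} = \frac{\displaystyle\sum_{c \in C} \frac{1}{P^c} \sum_{0 \le i < j < k \le |w|} B_{p,q,r} \, I_q^c(i, j) \, I_r^c(j, k) \, O_p^c(i, k)}{\displaystyle\sum_{c \in C} P_p^c / P^c} \qquad (6)$$

$$\hat{U}_{p,m} = \frac{\displaystyle\sum_{c \in C} \frac{1}{P^c} \sum_{\substack{1 \le i \le |c| \\ c = (w, B),\; w_i = b_m}} U_{p,m} \, O_p^c(i - 1, i)}{\displaystyle\sum_{c \in C} P_p^c / P^c} \qquad (7)$$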
where $P^c$ is the probability assigned by the current
model to bracketed string c

$$P^c = I_1^c(0, |c|)$$

and $P_p^c$ is the probability assigned by the current model
to the set of derivations compatible with c involving some
instance of nonterminal $A_p$

$$P_p^c = \sum_{0 \le i < j \le |c|} I_p^c(i, j) \, O_p^c(i, j)$$
The denominator of ratios (6) and (7) estimates the
probability that a compatible derivation of a bracketed
string in C will involve at least one expansion of
nonterminal $A_p$. The numerator of (6) estimates the
probability that a compatible derivation of a bracketed string in
C will involve rule $A_p \to A_q A_r$, while the numerator of
(7) estimates the probability that a compatible
derivation of a string in C will rewrite $A_p$ to $b_m$. Thus (6)
estimates the probability that a rewrite of $A_p$ in a
compatible derivation of a bracketed string in C will use rule
$A_p \to A_q A_r$, and (7) estimates the probability that an
occurrence of $A_p$ in a compatible derivation of a string
in C will be rewritten to $b_m$. Clearly, these are the best
current estimates for the binary and unary rule
probabilities.
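
To make one full iteration concrete, the following Python sketch combines the inside recurrences (2)-(3), the outside recurrences (4)-(5), and the ratios (6)-(7). It is a hedged illustration, not the implementation used for the experiments: indices are 0-based (the start symbol is index 0), `sym` is a hypothetical terminal-to-index mapping, and the toy parameters and corpus are invented for the example.

```python
def crosses(i, k, bracketing):
    return any(i < a < k < b or a < i < b < k for a, b in bracketing)

def inside_outside(words, bracketing, B, U, sym):
    n, L = len(B), len(words)
    d = [[0.0 if crosses(i, k, bracketing) else 1.0 for k in range(L + 1)]
         for i in range(L + 1)]                       # the 0/1 span multiplier
    I = [[[0.0] * (L + 1) for _ in range(L + 1)] for _ in range(n)]
    O = [[[0.0] * (L + 1) for _ in range(L + 1)] for _ in range(n)]
    for i, w in enumerate(words):                     # equation (2)
        for p in range(n):
            I[p][i][i + 1] = U[p][sym[w]]
    for width in range(2, L + 1):                     # equation (3)
        for i in range(L - width + 1):
            k = i + width
            for p in range(n):
                I[p][i][k] = d[i][k] * sum(
                    B[p][q][r] * I[q][i][j] * I[r][j][k]
                    for q in range(n) for r in range(n) for j in range(i + 1, k))
    O[0][0][L] = 1.0                                  # equation (4)
    for width in range(L - 1, 0, -1):                 # equation (5)
        for i in range(L - width + 1):
            k = i + width
            for p in range(n):
                total = 0.0
                for q in range(n):
                    for r in range(n):
                        for j in range(i):            # A_p is the right child
                            total += O[q][j][k] * I[r][j][i] * B[q][r][p]
                        for j in range(k + 1, L + 1): # A_p is the left child
                            total += O[q][i][j] * B[q][p][r] * I[r][k][j]
                O[p][i][k] = d[i][k] * total
    return I, O

def reestimate(corpus, B, U, sym):
    n, t = len(B), len(U[0])
    numB = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    numU = [[0.0] * t for _ in range(n)]
    den = [0.0] * n
    for words, bracketing in corpus:
        L = len(words)
        I, O = inside_outside(words, bracketing, B, U, sym)
        P = I[0][0][L]
        if P == 0.0:
            continue          # no compatible derivation under the current model
        for p in range(n):
            den[p] += sum(I[p][i][j] * O[p][i][j]
                          for i in range(L) for j in range(i + 1, L + 1)) / P
            for i, w in enumerate(words):             # numerator of (7)
                numU[p][sym[w]] += U[p][sym[w]] * O[p][i][i + 1] / P
            for q in range(n):                        # numerator of (6)
                for r in range(n):
                    numB[p][q][r] += sum(
                        B[p][q][r] * I[q][i][j] * I[r][j][k] * O[p][i][k]
                        for i in range(L) for j in range(i + 1, L)
                        for k in range(j + 1, L + 1)) / P
    newB = [[[numB[p][q][r] / den[p] if den[p] else B[p][q][r]
              for r in range(n)] for q in range(n)] for p in range(n)]
    newU = [[numU[p][m] / den[p] if den[p] else U[p][m]
             for m in range(t)] for p in range(n)]
    return newB, newU

# Toy setup (assumed values): two nonterminals, two terminals.
sym = {"a": 0, "b": 1}
B0 = [[[0.1, 0.2], [0.2, 0.1]], [[0.15, 0.15], [0.1, 0.1]]]
U0 = [[0.2, 0.2], [0.3, 0.2]]
corpus = [(["a", "b", "b", "a"], {(1, 3)}), (["a", "a"], set())]
B1, U1 = reestimate(corpus, B0, U0, sym)
```

Because the outside probabilities already carry the span multiplier, the numerators need no separate compatibility test, and the reestimated parameters for each nonterminal again satisfy constraint (1).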
The process is then repeated with the reestimated prob- 
abilities until the increase in the estimated probability 
of the training text given the model becomes negligible, 
or, what amounts to the same, the decrease in the cross 
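entropy estimate (negative log probability)

$$H_G(C) = -\frac{\sum_{c \in C} \log \mathrm{Prob}(c)}{\sum_{c \in C} |c|} = -\frac{\sum_{c \in C} \log I_1^c(0, |c|)}{\sum_{c \in C} |c|} \qquad (8)$$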
becomes negligible. Note that for comparisons with the 
original algorithm, we should use the cross entropy of 
the unbracketed text with respect to the grammar, not (8). 
4. EXPERIMENTAL EVALUATION 
The following experiments, although preliminary, give 
some support to our earlier suggested advantages of the 
inside-outside algorithm for partially bracketed corpora. 
We start with a formal-language example used by Lari 
and Young \[4\] in a previous evaluation of the inside- 
outside algorithm. In this case, training on a bracketed 
corpus can lead to a good solution while no reasonable 
solution is found training on raw text only. 
Then, using a naturally occurring corpus and its par- 
tially bracketed version provided by the Penn Treebank, 
we compare the bracketings assigned by grammars in- 
ferred from raw and from bracketed training material 
with the Penn Treebank bracketings. 
Together, the experiments support the view that train- 
ing on bracketed corpora can lead to better convergence, 
and the resulting grammars agree better with linguistic 
judgments of sentence structure. 
4.1. Inferring the Palindrome Language 
We consider first an artificial language discussed by Lari 
and Young \[4\]. Our training corpus consists of 100
sentences in the palindrome language L over two symbols a
and b

$$L = \{ w w^R \mid w \in \{a, b\}^* \}$$

randomly generated with the SCFG
S → AC
S → BD
S → AA
S → BB
C → SA
D → SB
A → a
B → b
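
A fully bracketed palindrome corpus of this kind can be generated by expanding the start symbol top-down and recording the span of every nonterminal occurrence. The sketch below is illustrative: the rule probabilities for S are assumed values chosen for the example, and the names `GRAMMAR` and `sample` are hypothetical.

```python
import random

# Rule probabilities for S are assumptions made for this example; the other
# nonterminals have a single expansion and therefore probability 1.
GRAMMAR = {
    "S": [(0.3, ("A", "C")), (0.3, ("B", "D")), (0.2, ("A", "A")), (0.2, ("B", "B"))],
    "C": [(1.0, ("S", "A"))],
    "D": [(1.0, ("S", "B"))],
    "A": [(1.0, ("a",))],
    "B": [(1.0, ("b",))],
}

def sample(symbol, rng, start=0):
    """Return (words, spans) for a random expansion of symbol whose first
    word sits at position start; spans lists every nonterminal's span."""
    if symbol not in GRAMMAR:                 # terminal symbol
        return [symbol], []
    probs, rhss = zip(*GRAMMAR[symbol])
    rhs = rng.choices(rhss, weights=probs)[0]
    words, spans = [], []
    for child in rhs:
        w, s = sample(child, rng, start + len(words))
        words += w
        spans += s
    spans.append((start, start + len(words)))
    return words, spans

rng = random.Random(0)
corpus = [sample("S", rng) for _ in range(100)]
words, bracketing = corpus[0]
print("".join(words), sorted(set(bracketing)))
```

Dropping the second component of each pair gives the corresponding raw training set, and keeping only a subset of the spans gives a partially bracketed one.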
The initial grammar consists of all possible CNF rules 
over five nonterminals and the terminals a and b (135 
rules), with a random assignment of initial probabilities. 
As shown in Figure 1, with an unbracketed training set 
the log probability remains almost unchanged after 40 
iterations (from 1.57 to 1.43) and no useful solution is 
found. In contrast, with the same training set fully 
bracketed, the log probability of the inferred grammar 
computed on the raw text decreases rapidly (1.57 ini- 
tially, 0.87 after 22 iterations). Similarly, the cross en- 
tropy estimate of the bracketed text with respect to the 
grammar improves rapidly (2.85 initially, 0.87 after 22 
iterations). 
[Figure 1: Convergence for the Palindrome Corpus. Curves for raw and bracketed training plotted against iteration.]

The inferred grammar models correctly the palindrome
language. Its high probability rules (p > 0.1, p/p' > 30
for any excluded rule p') are
S → AD
S → CB
B → SC
D → SA
A → b
B → a
C → a
D → b
which is a close-to-optimal CNF CFG for the palindrome
language.
The results on this grammar are quite sensitive to the 
size and statistics of the training corpus and the ini- 
tial rule probability assignment. In fact, for a couple of 
choices of initial grammar and corpus, the original algo- 
rithm yields somewhat better results than the new one. 
However, in no experiment did the training on unparsed 
text achieve nearly as good a result as that shown above 
for parsed text. 
4.2. Experiments on the ATIS Corpus 
We also conducted an experiment on inferring grammars 
for the language consisting of part-of-speech sequences of 
spoken-language transcriptions in the Texas Instruments 
subset of the Air Travel Information System (ATIS) cor- 
pus \[8\]. We take advantage of the availability of the 
hand-parsed version of the ATIS corpus provided by the 
Penn Treebank project \[9\] and use the corresponding 
bracketed corpus over parts of speech as training data. 
Out of the 770 bracketed sentences (7812 words) in the 
corpus, we used 700 as training data and 70 (901 words) 
as test set. The following is an example training string 
(((VB (DT NNS (IN ((NN) (NN
CD)))))) .)
corresponding to the parsed sentence
(((List (the fares (for ((flight)
(number 891)))))) .)
The initial grammar consists of all possible CNF rules 
(4095 rules) over 15 nonterminals (the same number as 
in the tree bank) and 48 terminals corresponding to the 
parts of speech used in the tree bank. 
We trained a random initial grammar twice, on the un- 
bracketed version of the training corpus yielding gram- 
mar GR, and on the bracketed training set, yielding 
grammar GB. 
Figure 2 shows that the convergence to GB is faster than 
the convergence to GR. Even though the cross-entropy 
estimates for the raw training text with both grammars 
are not that different after 50 iterations (3.0 for GB, 3.02 
for GR), the analyses assigned by the resulting grammars 
to the test set are drastically different. 
To evaluate objectively the quality of the analyses 
yielded by a grammar G, we used a Viterbi-style parser 
to find the most likely analyses of the test set according 
to G, and computed the proportion of phrases in those 
analyses that are compatible in the sense defined in Sec- 
tion 2 with the tree bank bracketings of the test set. This 
criterion is closely related to the "crossing parentheses" 
score of Black et al. \[10\]. We found that only 35%
of the constituents in the most likely GR analyses of the
test set are compatible with the tree bank bracketing, in
contrast to 88% of the constituents in the most likely GB
analyses.
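
The compatibility criterion can be sketched as follows: read the spans off two parenthesized strings over the same words and report the fraction of proposed constituents that cross no reference constituent. This is an illustrative reading of the criterion, not the evaluation code used here; in particular, collapsing duplicate spans into a set and the names `bracket_spans` and `compatible_fraction` are assumptions.

```python
def bracket_spans(text):
    """Parse a parenthesized string into (words, spans), one span per
    non-empty parenthesis pair; duplicate spans are collapsed."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    words, spans, stack = [], set(), []
    for tok in tokens:
        if tok == "(":
            stack.append(len(words))
        elif tok == ")":
            start = stack.pop()
            if len(words) > start:
                spans.add((start, len(words)))
        else:
            words.append(tok)
    return words, spans

def crosses(s, t):
    (i, j), (k, l) = s, t
    return i < k < j < l or k < i < l < j

def compatible_fraction(parse, gold):
    """Fraction of constituents in parse that overlap no constituent of gold."""
    words_p, spans_p = bracket_spans(parse)
    words_g, spans_g = bracket_spans(gold)
    if words_p != words_g:
        raise ValueError("both bracketings must cover the same words")
    ok = sum(not any(crosses(s, g) for g in spans_g) for s in spans_p)
    return ok / len(spans_p)

print(compatible_fraction("((a b) c)", "(a (b c))"))  # 0.5: (a b) crosses (b c)
print(compatible_fraction("((a b) c)", "(a b c)"))    # 1.0: a finer analysis is compatible
```

Note that a proposed analysis may contain more brackets than the reference and still score 1.0, since only crossing brackets are penalized.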
[Figure 2: Convergence for the ATIS Corpus. Curves for raw and bracketed training plotted against iteration.]

It is interesting to look at some of the differences between
GR and GB, as seen from the most likely analyses they
assign to certain sentences. For readability, we give the
analyses in terms of the original words rather than part
of speech tags.

As a first example, GB gives the following bracketings:
(((I (would (like (to (take (((Delta
(flight number)) 83) (to
Atlanta))))))) .)
((What (is (((the cheapest) fare) (I
(can get))))) ?))
Although the constituent (the cheapest) is linguistically
wrong, the only constituent not compatible with
the tree bank bracketing is (Delta flight number):
(I would (like (to (take (Delta
((flight number) 83)) (to
Atlanta)))).)
(What ((is (the cheapest fare (I can
get)))) ?)
In contrast, GR gives the following analyses for the same
strings, with 16 constituents incompatible with the tree
bank:
(I (would (like ((to ((take (Delta
flight)) (number (83 ((to Atlanta)
.)))))
((What (((is the) cheapest) fare)) ((I
can) (get ?)))))))
Another example analysis for GB is
((Tell (me (about (((the public)
transportation) ((from SFO) (to
(San Francisco))))))) .)
which is compatible with the tree bank one:
((Tell me (about (the public
transportation ((from SFO) (to San
Francisco))))). )
However, the most likely GR analysis has nine
constituents incompatible with the tree bank:
(Tell ((me (((about the) public)
transportation)) ((from SFO) ((to
San) (Francisco . ) ) ) ) )
In this analysis, Francisco and the final punctuation
are placed in a lowest-level constituent. Since final
punctuation is quite often preceded by a noun, a grammar
inferred from raw text will tend to bracket the noun with
the punctuation mark.
Even better results can be obtained by continuing the 
reestimation on bracketed text. After 78 iterations, 91% 
of the constituents of the most likely parse of the test 
set are compatible with the tree bank bracketing. 
This experiment illustrates the fact that although 
SCFGs provide a hierarchical model of the language, 
that structure is undetermined by raw text and only by 
chance will the inferred grammar agree with qualitative 
linguistic judgments of sentence structure. This prob- 
lem has also been previously observed with linguistic 
structure inference methods based on mutual informa- 
tion. Magerman and Marcus \[11\] propose to alleviate 
this behavior by enforcing that a predetermined list of 
pairs of words (such as verb-preposition, pronoun-verb) 
are never embraced by a constituent. However, these 
constraints are stipulated in advance rather than being 
automatically derived from the training material, in con- 
trast with what we have shown to be possible with the 
inside-outside algorithm for partially bracketed corpora. 
5. CONCLUSIONS AND FURTHER 
WORK 
We have introduced a modification of the well-known 
inside-outside algorithm for inferring the parameters of a 
stochastic context-free grammar that can take advantage 
of constituent information (constituent bracketing) in a 
partially bracketed corpus. 
The method has been successfully applied to SCFG in- 
ference for formal languages and for part-of-speech se- 
quences derived from the ATIS spoken-language corpus. 
The use of a partially bracketed corpus can reduce the
number of iterations required for convergence of
parameter reestimation. In some cases, a good solution is found
from a bracketed corpus but not from raw text. Most
importantly, the use of a partially bracketed natural corpus
enables the algorithm to infer grammars specifying
linguistically reasonable constituent boundaries that
cannot be inferred by the inside-outside algorithm on raw
text.
These preliminary investigations could be extended in 
several ways. First, it is important to determine the 
sensitivity of the training algorithm to the initial proba- 
bility assignments and training corpus, as well as to lack 
or misplacement of brackets. We have started experi- 
ments in this direction, but reasonable statistical models 
of bracket elision and misplacement are lacking. 
Second, we would like to extend our experiments to 
larger terminal vocabularies. As is well-known, this 
raises both computational and data sparseness problems, 
so clustering of terminal symbols will be essential. 
Finally, this work does not address a central weakness 
of SCFGs, their inability to represent lexical influences 
on distribution except by a statistically and computa- 
tionally impractical proliferation of nonterminal sym- 
bols. One might instead look into versions of the current 
algorithm for more lexically-oriented formalisms such as 
stochastic lexicalized tree-adjoining grammars \[12\]. 
ACKNOWLEDGMENTS 
We thank Aravind Joshi and Stuart Shieber for useful 
discussions. The second author is partially supported by 
DARPA Grant N0014-90-31863, ARO Grant DAAL03- 
89-C-0031 and NSF Grant IRI90-16592. 
REFERENCES 
1. Rabiner, L. R. A tutorial on hidden Markov models and 
selected applications in speech recognition. Proceedings 
of the IEEE, 77(2):257-285, February 1989. 
2. Baker, J. Trainable grammars for speech recognition. 
In Wolf, J. J. and Klatt, D. H., editors, Speech
communication papers presented at the 97th Meeting of the
Acoustical Society of America, MIT, Cambridge, MA, 
June 1979. 
3. Booth, T. Probabilistic representation of formal
languages. In Tenth Annual IEEE Symposium on Switching
and Automata Theory, October 1969. 
4. Lari, K. and Young, S. J. The estimation of stochas- 
tic context-free grammars using the Inside-Outside algo- 
rithm. Computer Speech and Language, 4:35-56, 1990. 
5. Jelinek, F., Lafferty, J. D., and Mercer, R. L. Basic 
methods of probabilistic context free grammars. Tech- 
nical Report RC 16374 (72684), IBM, Yorktown Heights, 
New York 10598, 1990. 
6. Lari, K. and Young, S. J. Applications of stochastic 
context-free grammars using the Inside-Outside algo- 
rithm. Computer Speech and Language, 5:237-257, 1991. 
7. Fujisaki, T., Jelinek, F., Cocke, J., Black, E., and 
Nishino, T. A probabilistic parsing method for sen- 
tence disambiguation. In Proceedings of the Interna- 
tional Workshop on Parsing Technologies, Pittsburgh, 
August 1989. 
8. Hemphill, C. T., Godfrey, J. J., and Doddington, G. R. 
The ATIS spoken language systems pilot corpus. In 
DARPA Speech and Natural Language Workshop, Hid- 
den Valley, Pennsylvania, June 1990. 
9. Brill, E., Magerman, D., Marcus, M., and Santorini, 
B. Deducing linguistic structure from the statistics of 
large corpora. In DARPA Speech and Natural Language 
Workshop. Morgan Kaufmann, Hidden Valley, Pennsyl- 
vania, June 1990. 
10. Black, E., Abney, S., Flickenger, D., Grishman, R.,
Harrison, P., Hindle, D., Ingria, R., Jelinek, F., Klavans,
J., Liberman, M., Marcus, M., Roukos, S., Santorini,
B., and Strzalkowski, T. A procedure for quantitatively
comparing the syntactic coverage of English
grammars. In DARPA Speech and Natural Language
Workshop, pages 306-311, Pacific Grove, California, 1991.
Morgan Kaufmann.
11. Magerman, D. and Marcus, M. Parsing a natural lan- 
guage using mutual information statistics. In AAAI-90, 
Boston, MA, 1990. 
12. Schabes, Y. Stochastic lexicalized tree-adjoining gram- 
mars. Also in these proceedings, 1992. 
