Dynamic compilation of weighted context-free grammars 
Mehryar Mohri and Fernando C. N. Pereira 
AT&T Labs - Research 
180 Park Avenue 
Florham Park, NJ 07932, USA 
{mohri, pereira}@research, att. com 
Abstract 
Weighted context-free grammars are a conve- 
nient formalism for representing grammatical 
constructions and their likelihoods in a variety 
of language-processing applications. In partic- 
ular, speech understanding applications require 
appropriate grammars both to constrain speech 
recognition and to help extract the meaning 
of utterances. In many of those applications, 
the actual languages described are regular, but 
context-free representations are much more con- 
cise and easier to create. We describe an effi- 
cient algorithm for compiling into weighted fi- 
nite automata an interesting class of weighted 
context-free grammars that represent regular 
languages. The resulting automata can then be 
combined with other speech recognition compo- 
nents. Our method allows the recognizer to dy- 
namically activate or deactivate grammar rules 
and substitute a new regular language for some 
terminal symbols, depending on previously rec- 
ognized inputs, all without recompilation. We 
also report experimental results showing the 
practicality of the approach. 
1. Motivation 
Context-free grammars (CFGs) are widely used 
in language processing systems. In many appli- 
cations, in particular in speech recognition, in 
addition to recognizing grammatical sequences 
it is necessary to provide some measure of the 
probability of those sequences. It is then natu- 
ral to use weighted CFGs, in which each rule is 
given a weight from an appropriate weight alge- 
bra (Salomaa and Soittola, 1978). Weights can 
encode probabilities, for instance by setting a 
rule's weight to the negative logarithm of the 
probability of the rule. Rule probabilities can 
be estimated in a variety of ways, which we will 
not discuss further in this paper. 
Since speech recognizers cannot be fully cer- 
tain about the correct transcription of a spoken 
utterance, they instead generate a range of al- 
ternative hypotheses with associated probabil- 
ities. An essential function of the grammar is 
then to rank those hypotheses according to the 
probability that they would be actually uttered. 
The grammar is thus used together with other 
information sources - pronunciation dictionary, 
phonemic context-dependency model, acoustic 
model (Bahl et al., 1983; Rabiner and Juang, 
1993) - to generate an overall set of transcrip- 
tion hypotheses with corresponding probabili- 
ties. 
General CFGs are computationally too de- 
manding for real-time speech recognition sys- 
tems, since the amount of work required to ex- 
pand a recognition hypothesis in the way just 
described would in general be unbounded for 
an unrestricted grammar. Therefore, CFGs 
used in spoken-dialogue applications often rep- 
resent regular languages (Church, 1983; Brown 
and Buntschuh, 1994), either by construction or 
as a result of a finite-state approximation of £ 
more general CFG (Pereira and Wright, 1997). 1 
Assuming that the grammar can be efficiently 
converted into a finite automaton, appropriate 
techniques can then be used to combine it with 
other finite-state recognition models for use in 
real-time recognition (Mohri et al., 1998b). 
There is no general algorithm that would map 
an arbitrary CFG generating a regular language 
into a corresponding finite-state automaton (UI- 
lian; 1967). However, we will describe a use- 
ful class of grammars that can be so trans- 
formed, and a transformation algorithm that 
avoids some of the potential for combinatorial 
1 Grammars representing regular languages have also 
been used successfully in other areas of computational 
linguistics (Karlsson et al., 1995). 
891 
explosion in the process. 
Spoken dialogue systems require grammars 
or language models to change as the dialogue 
proceeds, because previous interactions set the 
context for interpreting new utterances. For in- 
stance, a previous request for a date might ac- 
tivate the date grammar and lexicon and inac- 
tivate the location grammar and lexicon in an 
automated reservations task. Without such dy- 
namic grammars, efficiency and accuracy would 
be compromised because many irrelevant words 
and constructions would be available when eval- 
uating recognition hypotheses. We consider two 
dynamic grammar mechanisms: activation and 
deactivation of grammar rules, and the substi- 
tution of a new regular language for a terminal 
symbol when recognizing the next utterance. 
We describe a new algorithm for compil- 
ing weighted CFGs, based on representing the 
grammar as a weighted transducer. This 
representation provides opportunities for op- 
timization, including optimizations involving 
weights, which are not possible for general 
CFGs. The algorithm also supports dynamic 
grammar changes without recompilation. Fur- 
thermore, the algorithm can be executed on de- 
mand: states and transitions of the automa- 
ton are expanded only as needed for the recog- 
nition of the actual input utterances. More- 
over, our lazy compilation algorithm is opti- 
mal in the sense that the construction requires 
work linear in the size of the input grammar, 
which is the best one can expect given that 
any algorithm needs to inspect the whole in- 
put grammar. It is however possible to speed- 
up grammar compilation further by applying 
pre-compilation optimizations to the grammar, 
as we will see later. The class of grammars 
to which our algorithm applies includes right- 
linear grammars, left-linear grammars and cer- 
tain combinations thereof. 
The algorithm has been fully implemented 
and evaluated experimentally, demonstrating 
its effectiveness. 
2. Algorithm 
We will start by giving a precise definition of 
dynamic grammars. We will then explain each 
stage of grammar compilation. Grammar com- 
pilation takes as input a weighted CFG repre- 
sented as a weighted transducer (Salomaa and 
Soittola, 1978), which may have been opti- 
mized prior to compilation (preoptimized). The 
weighted transducer is analyzed by the com- 
pilation algorithm, and the analysis, if suc- 
cessful, outputs a collection of weighted au- 
tomata that are combined at runtime according 
to the current dynamic grammar configuration 
and the strings being recognized. Since not all 
CFGs can be compiled into weighted automata, 
the compilation algorithm may reject an input 
grammar. The class of allowed grammars will 
be defined later. 
2.1. Dynamic grammars 
The following notation will be used in the rest 
of the paper. A weighted CFG G = (V,P) 
over the alphabet E, with real-number weights 
consists of a finite alphabet V of variables or 
nonterminals disjoint from ~, and a finite set 
P C V × R × (V U Z)* of productions or deriva- 
tion rules (Autebert et al., 1997). Given strings 
u, v E (V U ~)*, and real numbers c and c', we 
write (u, c) 2+ (v, c') when there is a derivation 
from u with weight c to v with weight c'. We 
denote by La(X) the weighted language gener- 
ated by a nonterminal X: 
LG(X) = {(w,c) E ~* x R: (X, 0) -~ (w,c)} 
We can now define the two grammar-changing 
operations that we use. 
Dynamic activation or deactivation of 
rules 2 We augment the grammar with a set 
of active nonterminals, which are those avail- 
able as start symbols for derivations. More pre- 
cisely, let A C_ V be the set of active nonter- 
minals. The language generated by G is then 
LG = \[.JxEA LG(X). Note that inactive nonter- 
minals, and the rules involving them, are avail- 
able for use in derivations; they are just not 
available as start symbols. Dynamic rule acti- 
vation or deactivation is just the dynamic re- 
definition of the set A in successive uses of the 
grammar. 
Dynamic substitution Let a be a weighted 
rational transduction of ~* to A* x R, ~ C_ A, 
that is a regular weighted substitution (Berstel, 
1979). a is a monoid morphism verifying: 
2This is the terminology used in this area, though a 
more appropriate expression would be dynamic activa- 
tion or deactivation of nonterminal symbols. 
892 
Vx E ~, a(x) C Reg(A" × R) 
where Reg(A* x R) denotes the set of 
weighted regular languages over the alphabet 
A. Thus a simply substitutes for each symbol 
a E ~ a weighted regular expression a(a). A 
dynamic substitution consists of the application 
of the substitution a to ~, during the process 
of recognition of a word sequence. Thus, after 
substitution, the language generated by the new 
grammar G I is: 3 
La, = a( Lc) 
Our algorithm allows for both of those dy- 
namic grammar changes without recompiling 
the grammar. 
2.2. Preprocessing 
Our compilation algorithm operates on a 
weighted transducer v(G) encoding a factored 
representation of the weighted CFG G, which 
is generated from G by a separate preproces- 
sor. This preprocessor is not strictly needed, 
since we could use a version of the main algo- 
rithm that works directly on G. However, pre- 
processing can improve dramatically the time 
and space needed for the main compilation step, 
since the preprocessor uses determinization and 
minimization algorithms for weighted transduc- 
ers (Mohri, 1997) to increase the sharing -- fac- 
toring- among grammar rules that start or end 
the same way. 
The preprocessing step builds a weighted 
transducer in which each path corresponds to a 
grammar rule. Rule X(~ -+ Y1 --.Y~ has a cor- 
responding path that maps X to the sequence 
I/1 ...Y~ with weight ~. For example, the small 
CFG in Figure 1 is preprocessed into the com- 
pacted transducer shown in Figure 2. 
2.3. Compilation 
The compilation of weighted left-linear or right- 
linear grammars into weighted automata is 
straightforward (Aho and Ullman, 1973). In 
the right-linear case, for instance, the states of 
the automaton are the grammar nonterminals 
together with a new final state F. There is a 
3a can be extended as usual to map ~* × R to 
Reg( A * × R ). 
Z .1 -~ XY 
X .2 -~ aY 
Y .3 --+ bX 
Y.4-~c 
(i) 
Figure 1: Grammar G1. 
:¢10.1 
Figure 2: Weighted transducer r(G1). 
transition labeled with a E E and weight a E R 
from X E V to Y E V iff the grammar con- 
tains the rule Xa --+ aY. There is a transition 
from X to F labeled with a and weight a iff 
Xa --~ a is a rule of the grammar. The initial 
states are the states corresponding to the active 
nonterminals. For example, Figure 3 shows the 
weighted automaton for grammar G2 consisting 
of the last three rules of G1 with start symbol 
X. 
However, the standard methods for left- and 
right-linear grammars cannot be used for gram- 
mars such as G1 that generate regular sets but 
have rules that are neither left- nor right-linear. 
But we can use the methods for left- and right- 
linear grammars as subroutines if the grammar 
can be decomposed into left-linear and right- 
linear components that do not call each other 
recursively (Pereira and Wright, 1997). More 
precisely, define a dependency graph Dc for 
G's nonterminals and examine the set of its 
strongly-connected components (SCCs). 4 The 
nodes of Da are G's nonterminals, and there 
is a directed edge from X to Y if Y appears 
in the right-hand side of a rule with left-hand 
side X, that is, if the definition of X depends 
on Y. Each SCC S of DG has a corresponding 
subgrammar of G consisting of those rules with 
4 Recall that the strongly connected components of a 
directed graph are the equivalence classes of graph nodes 
under the relation R defined by: q R q~ if q~ can be 
reached from q and q from q~. 
893 
Figure 3: Compilation of G2. 
Figure 4: Dependency graph DG1 for grammar 
G1. 
left-hand nonterminals in S, with nonterminals 
not in S treated as terminal symbols. If each of 
these subgrammars is either left-linear or right- 
linear, we shall see that compilation into a single 
finite automaton is possible. 
The dependency graph DG can be obtained 
easily from the transducer r(G). For exam- 
ple, Figure 4 shows the dependency graph for 
our example grammar G1, with SCCs {Z} and 
(X, Y}. It is clear that G1 satisfies our condi- 
tion, and Figure 5 shows the result of compiling 
G1 with A = (Z}. 
The SCCs of Da can be obtained in time lin- 
ear in the size of G (Aho et hi., 1974). Be- 
fore starting the compilation, we check that 
each subgrammar is left-linear or right-linear 
(as noted above, nonterminals not in the SCC 
of a subgrammar are treated as terminals). For 
example, if (X1, X2} is an SCC, then the sub- 
grammar 
Xt -'~ aYlbY2X1 
X1 --~ bY2aY1X2 
X2 -~ bbYlabX1 
(2) 
Figure 5: Compilation of G1 with start symbol 
Z. 
Figure 6: Weighted automaton K((X, Y}) cor- 
responding to the strongly connected compo- 
nent {X, Y} of G1. 
with X1,X2, Y1,Y2 E V and a,b E ~ is right- 
linear, since expressions such as aYlbY2 can be 
treated as elements of the terminal alphabet of 
the subgrammar. 
When the compilation condition holds, for 
each SCC S we can build a weighted automa- 
ton K(S) representing the language of S's sub- 
grammar using the standard methods. Since 
some nonterminals of G are treated as termi- 
nal symbols within a subgrammar, the transi- 
tions of an automaton K(S) may be labeled 
with nonterminals not in S. 5 The nontermi- 
nals not in S can then be replaced by their cor- 
responding automata. The replacement opera- 
tion is lazy, that is, the states and transitions of 
the replacing automata are only expanded when 
needed for a given input string. Another inter- 
esting characteristic of our algorithm is that the 
weighted automata K(S) can be made smaller 
by determinization and minimization, leading 
to improvements in runtime performance. 
The automaton M(X) that represents the 
language generated by nonterminal symbol X 
can be defined using K(S), where S is the 
strongly connected component containing X, 
X E S. For instance, when the subgrammar 
of S is right-linear, M(X)is the automaton 
that has the same states, transitions, and final 
states as K(S) and has the state correspond- 
ing to X as initial state. For example, Figure 
6 shows K((X,Y)) for G1. M(X) is then ob- 
tained from K((X,Y}) by taking X as initial 
state. The left-linear case can be treated in a 
similar way. Thus, M(X) can always be de- 
fined in constant time and space by editing the 
automaton K(S). We use a lazy implementa- 
tion of this editing operation for the definition 
5More precisely, they can only be part of other 
strongly connected components that come before S in 
a reverse topological sort of the components. This guar- 
antees the convergence of the replacement of the nonter- 
minals by the corresponding automata. 
894 
xt0 
Figure 7: Automaton Ma with activated non- 
terminals: A = {X, Y, Z}. 
of the automata M(X): the states and transi- 
tions of M(X) are determined using K(S) only 
when necessary for the given input string. This 
allows us to save both space and time by avoid- 
ing a copy of K(S) for each X E S. 
Once the automaton representing the lan- 
guage generated by each nonterminal is cre- 
ated, we can define the language generated by G 
by building an automaton Ma with one initial 
state and one final state, and transitions labeled 
with active nonterminals from the initial to the 
final state. Figure 7 illustrates this in the case 
where A -- {X, Y, Z}. 
Given this construction, the dynamic activa- 
tion or deactivation of nonterminals can be done 
by modifying the automaton MG. This opera- 
tion does not require any recompilation, since it 
does not affect the automaton M(X) built for 
each nonterminal X. 
All the steps in building the automata M(X) 
-- construction of DG, finding the SCCs, and 
computing for K(S) for each SCC S -- require 
linear time and space with respect to the size 
of G. In fact, since we first convert G into 
a compact weighted transducer r(G), the to- 
tal work required is linear in the size of r(G). 6 
This leads to significant gains as shown by our 
experiments. 
In summary, the compilation algorithm has 
the following steps: 
1. Build the dependency graph Da of the 
grammar G. 
2. Compute the SCCs of Da. 7 
3. For each SCC S, construct the automaton 
K(S). For each X E S, build M(X) from 
SApplying the algorithm to a compacted weighted 
transducer r(G) involves various subtleties that we omit 
for simplicity. 
TWe order the SCCs in reverse topological order, but 
this is not necessary for the correctness of the algorithm. 
K(X). 8 
4. Create a simple automaton MG accepting 
exactly the set of active nonterminals A. 
5. The automaton is then expanded on-the-fly 
for each input string using lazy replacement 
and editing. 
The dynamic substitution of a terminal sym- 
bol a by a weighted automaton 9 aa is done by 
replacing the symbol a by the automaton aa, us- 
ing the replacement operation discussed earlier. 
This replacement is also done on demand, with 
only the necessary part of aa being expanded for 
a given input string. In practice, the automaton 
aa can be large, a list of city or person names for 
example. Thus a lazy implementation is crucial 
for dynamic substitutions. 
3. Optimizations, Experiments and 
Results 
We have a full implementation of the compila- 
tion algorithm presented in the previous section, 
including the lazy representations that are cru- 
cial in reducing the space requirements of speech 
recognition applications. Our implementation 
of the compilation algorithm is part of a gen- 
era\] set of grammar tools, the GRM Library 
(Mohri, 1998b), currently used in speech pro- 
cessing projects at AT&T Labs. The GRM Li- 
brary also includes an efficient compilation too\] 
for weighted context-dependent rewrite rules 
(Mohri and Sproat, 1996) that is used in text- 
to-speech projects at Lucent Bell Laboratories. 
Since the GRM library is compatible with the 
FSM general-purpose finite-state machine li- 
brary (Mohri et al., 1998a), we were able to use 
the tools provided in FSM library to optimize 
the input weighted transducers r(G) and the 
weighted automata in the compilation output. 
We did several experiments that show the ef- 
ficiency of our compilation method. A key fea- 
ture of our grammar compilation method is the 
representation of the grammar by a weighted 
transducer that can then be preoptimized using 
weighted transducer determinization and mini- 
mization (Mohri, 1997; Mohri, 1998a). To show 
SFor any X, this is a constant time operation. For 
instance, if K(S) is right-llnear, we just need to pick out 
the state associated to X in K(X). 
9In fact, our implementation allows more generally 
dynamic substitutions by weighted transducers. 
895 
O 
// " 
/ / 
no op6mization /// 
- (Un~/10) / 
optimization i I 
i I I 
= // 
i/ 
i/ / 
/ 
i / =/ 
/ 
7 / °~" .~:'~.~ 
50 100 1~50 2()0 250 
VOCABULARY 
/" 
i I /' 
i I 
no optimization // / 
- (size / 25) / 
optimization ,/ 
/' 
// 
// / 
// 
// ,= 
// 
// 
// / 
/ 
VOCABULARY 
Figure 8: Advantage of transducer representation combined with preoptimization: time and space. 
the benefits of this representation, we compared 
the compilation time and the size of the re- 
sulting lazy automata with and without preop- 
timization. The advantage of preoptimization 
would be even greater if the compilation output 
were fully expanded rather than on-demand. 
We did experiments with full bigram models 
with various vocabulary sizes, and with two un- 
weighted grammars derived by feature instanti- 
ation from hand-built feature-based grammars 
(Pereira and Wright, 1997). Figure 8 shows 
the compilation times of full bigram models 
with and without preoptimization, demonstrat- 
ing the importance of the optimization allowed 
by using a transducer representation of the 
grammar. For a 250-word vocabulary model, 
the compilation time is about 50 times faster 
with the preoptimized representation. 1° Figure 
8 also shows the sizes of the resulting lazy au- 
tomata in the two cases. While in the preop- 
timized case time and s_~ace grow linearly with 
vocabulary size (O(x/IGI)), they grow quadrat- 
ically in the unoptimized case (O(\[G\[)). 
The bigram examples also show the advan- 
tages of lazy replacement and editing over the 
full expansion used in previous work (Pereira 
and Wright, 1997). Indeed, the size of the 
fully-expanded automaton for the preoptimized 
1°For convenience, the compilation time for the unop- 
timized case in Figure 8 was divided by 10, and the size 
of the result by 25. 
Table 1: Feature-based grammars. 
I \[GI I optim, time expanded (s) states \[expanded 
transitions \[ 14 1\]no 04  14'° i 
431 yes .02 1535 2002 i1 0 ,i 0,0  ,40141 i 
12657 yes 2.02 112795 144083 
case grows quadratically with the vocabulary 
size (O(IGI)), while it grows with the cube of 
the vocabulary size in the unoptimized case (0(IGt3/2)). 
For example, compilation is about 
700 times faster in the optimized case for a fully 
expanded automaton even for a 40-word vo- 
cabulary model, and the result about 39 times 
smaller. 
Our experiments with a small and a medium- 
sized CFG obtained from feature-based gram- 
mars confirm these observations (Table 1). 
If dynamic grammars and lazy expansion are 
not needed, we can expand the result fully and 
then apply weighted determinization and min- 
imization algorithms. Additional experiments 
show that this can yield dramatic reductions in 
automata size. 
4. Conclusion 
A new weighted CFG compilation algorithm has 
been presented. It can be used to compile effi- 
896 
ciently an interesting class of grammars repre- 
senting weighted regular languages and allows 
for dynamic modifications that are crucial in 
many speech recognition applications. 
While we focused here on CFGs with real 
number weights, which are especially relevant in 
speech recognition, weighted CFGs can be de- 
fined more generally over an arbitrary semiring 
(Salomaa and Soittola, 1978). Our compilation 
algorithm applies to general semirings without 
change. Both the grammar compilation algo- 
rithms (GRM library) and our automata opti- 
mization tools (FSM library) work in the most 
general case. 
Acknowledgements 
We thank Bruce Buntschuh and Ted Roycraft 
for their help with defining the dynamic gram- 
mar features and for their comments on this 
work. 

References 
Alfred V. Aho and Jeffrey D. Ullman. 1973. 
The Theory of Parsing, Translation and 
Compiling. Prentice-Hall. 
Alfred V. Aho, John E. Hopcroft, and Jeffrey D. 
Ullman. 1974. The design and analysis of 
computer algorithms. Addison Wesley: Read- 
ing, MA. 
Jean-Michel Autebert, Jean Berstel, and Luc 
Boasson. 1997. Context-free languages and 
pushdown automata. In Grzegorz Rozenberg 
and Arto Salomaa, editors, Handbook of For- 
mal Languages, volume 1, pages 111-172. 
Springer. 
Lalit R. Bahl, Fred Jelinek, and Robert Mercer. 
1983. A maximum likelihood approach to 
continuous speech recognition. IEEE Trans- 
actions on Pattern Analysis and Machine In- 
telligence (PAUl), 5(2):179-190. 
Jean Berstel. 1979. Transductions and Context- 
Free Languages. Teubner Studienbucher: 
Stuttgart. 
Michael K. Brown and Bruce M. Buntschuh. 
1994. A context-free grammar compiler for 
speech understanding systems. In Proceed- 
ings of the International Conference on Spo- 
ken Language Processing (ICSLP '94), pages 
21-24, Yokohama, Japan. 
Kenneth W. Church. 1983. A finite-state parser 
for use in speech recognition. In 21 st Meet- 
ing of the Association for Computational Lin- 
guistics (ACL '88), Proceedings of the Con- 
ference. ACL. 
Fred Karlsson, Atro Voutilalnen, Juha Heikkila, 
and Atro Anttila. 1995. Constraint Gram- 
mar, A language-Independent System for 
Parsing Unrestricted Tezt. Mouton de 
Gruyter. 
Mehryar Mohri and Richard Sproat. 1996. An 
efficient compiler for weighted rewrite rules. 
In 34th Meeting of the Association for Com- 
putational Linguistics (ACL 96), Proceedings 
of the Conference, Santa Cruz, California. 
ACL. 
Mehryar Mohri, Fernando C. N. Pereira, and 
Michael Riley. 1998a. A rational design for a 
weighted finite-state transducer library. Lec- 
tune Notes in Computer Science, to appear. 
Mehryar Mohri, Michael Riley, Don Hindle, 
Andrej Ljolje, and Fernando C. N. Pereira. 
1998b. Full expansion of context-dependent 
networks in large vocabulary speech recogni- 
tion. In Proceedings of the International Con- 
ference on Acoustics, Speech, and Signal Pro- 
cessing (ICASSP '98), Seattle, Washington. 
Mehryar Mohri. 1997. Finite-state transducers 
in language and speech processing. Compu- 
tational Linguistics, 23:2. 
Mehryar Mohri. 1998a. Minimization algo- 
rithms for sequential transducers. Theoretical 
Computer Science, to appear. 
Mehryar Mohri. 1998b. Weighted Grammar 
Tools: the GRM Library. In preparation. 
Fernando C. N. Pereira and Rebecca N. Wright. 
1997. Finite-state approximation of phrase- 
structure grammars. In Emmanuel Roche 
and Yves Schabes, editors, Finite-State Lan- 
guage Processing, pages 149-173. MIT Press, 
Cambridge, Massachusetts. 
Lawrence Rabiner and Biing-Hwang Juang. 
1993. Fundamentals of Speech Recognition. 
Prentice-Hall, Englewood Cliffs, NJ. 
Arto Salomaa and Matti Soittola. 1978. 
Automata-Theoretic Aspects of Formal Power 
Series. Springer-Verlag: New York. 
Joseph S. Ullian. 1967. Partial algorithm prob- 
lems for context free languages. Information 
and Control, 11:90-101. 
