Technical Correspondence 
Techniques for Automatic Memoization with 
Applications to Context-Free Parsing 
Peter Norvig* 
University of California 
It is shown that a process similar to Earley's algorithm can be generated by a simple top-down 
backtracking parser, when augmented by automatic memoization. The memoized parser has the 
same complexity as Earley's algorithm, but parses constituents in a different order. Techniques 
for deriving memo functions are described, with a complete implementation in Common Lisp, 
and an outline of a macro-based approach for other languages. 
1. Memoization 
The term memoization was coined by Donald Michie (1968) to refer to the process by 
which a function is made to automatically remember the results of previous compu- 
tations. The idea has become more popular in recent years with the rise of functional 
languages; Field and Harrison (1988) devote a whole chapter to it. The basic idea is 
just to keep a table of previously computed input/result pairs. In Common Lisp one 
could write: 1 
(defun memo (fn) 
"Return a memo-function of fn." 
(let ((table (make-hash-table))) 
#'(lambda (x) 
(multiple-value-bind (val found) 
(gethash x table) 
(if found 
val 
(setf (gethash x table) (funcall fn x))))))). 
(For those familiar with Lisp but not Common Lisp, gethash returns two values, 
the stored entry in the table, and a boolean flag indicating if there is in fact an entry. 
The special form multiple-value-bind binds these two values to the symbols val 
and found. The special form serf is used here to update the table entry for x.) 
In this simple implementation fn is required to take one argument and return 
one value, and arguments that are eql produce the same value. Below we will relax 
some of these restrictions. The problem with this approach is that we also want to 
memoize any recursive calls that fn may make. To use the canonical example, if we 
have: 
* Computer Science Division, University of California, Berkeley, CA 94720 1 All examples are in Common Lisp, rather than in generic pseudo-code for two reasons. First, I want to 
stress that these techniques can actually be used in existing languages. Second, Common Lisp provides 
a rich set of primitives (such as make-hash-table) that would otherwise require lengthy explanations. 
(~) 1991 Association for Computational Linguistics 
Computational Linguistics Volume 17, Number 1 
(defun fib (n) 
"Compute the Nth Fibonacci number." 
(if (<= n 1) n 
(+ (fib (- n I)) (fib (- n 2))))), 
then the function (memo (function fib)) will not have linear-order complexity, be- 
cause recursive calls will go to the original function fib, not to the memoized version. 
One way to fix this problem is to assign the new memoized function to the name f lb. 
Common Lisp is a good host language for this approach, because there are primitives 
for accessing and altering the global function name space. Consider the following: 
(defun memoize (fn-name) 
"Replace fn-name's global definition with a memoized version." 
(setf (symbol-functio n fn-name) (memo (symbol-function fn-name)))). 
When passed a symbol that names a function, memoize changes the global defini- 
tion of the function to a memo-function. Thus, any recursive calls will go first to the 
memo-function, rather than to the original function. This is just what we want; all we 
have to say is (memoize ' fib) to transform fib from an exponential- to a linear-order 
algorithm. 
To make sure that the memoization step isn't left out during program develop- 
ment, some programmers may prefer to write code like this: 
(memoize 
(defun f (x) ...)). 
Or even like this: 
(defmacro defun-memo (fn args &body body) 
"Define a memoized function." 
~(memoize (defun ,fn ,args . ,body))) 
(defun-memo f (x) ...). 
Both of these approaches rely on the fact that defun returns the name of the 
function defined. Note that the following will not work 
(labels ((fib (n) 
(if (<= n I) n 
(+ (fib (- n I) 
(memoize 'fib) 
(fib i00)) 
(fib (- n 2))))))) 
because memoize affects only the global function binding, not the lexical (local) defi- 
nition. 
The version of memo presented above suffers from a serious limitation -- the func- 
tion to be memoized can only have one argument. In the revised definition given 
below, the function can take any number of arguments, and the indexing can be on 
92 
Norvig Memoization and Context-Free Parsing 
any function of the arguments. When there is only argument, the default key function, 
first, is appropriate. To hash on all the arguments, use identity as the key. 
(defun memo (fn &key (key #'first) (test #'eql) name) 
"Return a memo-function of fn." 
(let ((table (make-hash-table :test test))) 
(serf (get name 'memo) table) 
#'(lambda (&rest args) 
(let ((k (funcall key args))) 
(multiple-value-bind (val found) 
(gethash k table) 
(if found val 
(setf (gethash k table) (apply fn args)))))))) 
(defun memoize (fn-name &key (key #'first) (test #'eql)) 
"Replace fn-name's global definition with a memoized version." 
(setf (symbol-function fn-name) 
(memo (symbol-function fn-name) 
:name fn-name :key key :test test))) 
(defun clear-memoize (fn-name) 
(clrhash (get fn-name ~memo))) 
Also note that the hash table is stored away on the function name's property list, so 
that it can be inspected or cleared at will. The intent is that the user's program should 
clear the table when the results are likely to be out of the working set. We might also 
want a hash mechanism that caches only recent entries, and discards earlier ones, as 
originally suggested by Michie (1968). 
The user also has the responsibility of choosing the appropriate hashing function. 
By choosing eql instead of equal hashing, for example, hashing overhead will be 
reduced, but computation will be duplicated for equal lists. A compromise is to use 
eql hashing in conjunction with unique or canonical lists (Szolovits and Martin, 1981). 
eql hashing has the additional advantage of allowing infinite circular lists, as discussed 
by Hughes (1985). 
2. Context-Free Parsing 
All efficient algorithms for parsing context-free grammars make use of some kind 
of well-formed substring table. Earley's algorithm (1970) is perhaps the best-known 
example. The algorithm builds up a vector of parse lists, where each entry in a parse 
list is an item -- a production rule with one indicator showing how much of the right- 
hand side has been parsed, and another saying where the parse started. Kay (1980) 
introduced a similar data structure called a chart, along with a family of parsing 
algorithms that operate on the chart. Chart parsing is described in virtually all recent 
texts on natural language processing; for example, Winograd (1983) devotes 19 pages to 
the topic. It is part of the "folklore" on parsing that these algorithms can be thought of 
as tabular (i.e., memoized) versions of corresponding simpler algorithms. Earley (1970) 
himself mentions it, and Shieber (1989) gives a general abstract parsing strategy, and 
then proves that Earley's algorithm can be derived by suitable constraints on control 
of the strategy. 
93 
Computational Linguistics Volume 17, Number 1 
This paper's contribution is a concrete demonstration of just how direct the cor- 
respondence is between the simple and the efficient algorithm. We present a simple 
parser which, when memoized, performs the same calculations as Earley's algorithm. 
The core of the parser is only 15 lines of code: 
(defun parse (tokens start-symbol) 
"Parse a list of tokens, return parse trees and remainders." 
(if (eq (first tokens) start-symbol) 
(list (make-parse :tree (first tokens) :rem (rest tokens))) 
(mapcan #' (lambda (rule) 
(extend-parse (lhs rule) nil tokens (rhs rule))) 
(rules-for start-symbol) ) ) ) 
(defun extend-parse (lhs rhs rem needed) 
"Parse the remaining needed symbols. " 
(if (null needed) 
(list (make-parse :tree (cons lhs rhs) :rem rem)) 
(mapcan 
#' (lambda (p) 
(extend-parse lhs (append rhs (list (parse-tree p))) 
(parse-rem p) (rest needed))) 
(parse rem (first needed))))). 
This assumes that there are no left-recursive rules. The parser requires the following 
definitions: 
(defstruct (parse) "A parse tree and a remainder." tree rem) 
;; Trees (and rules) are of the form: (lhs . rhs) 
(defun lhs (tree) (first tree)) 
(defun rhs (tree) (rest tree)) 
(defun rules-for (symbol) 
"Return a list of the rules with symbol on the left hand side." 
(remove symbol *grammar* :key #'lhs :test-not #'eql)). 
We now need to specify the memoization. In addition, since the function parse returns 
all valid parses of all prefixes of the input, we add the function parser which returns 
only parses of the complete input. 
(memoize 'rules-for) 
(memoize 'parse :test #'equal :key #'identity) 
(defun parser (tokens start-symbol) 
"Return all complete parses of a list of tokens." 
(clear-memoize 'parse) 
(mapcar #'parse-tree 
(remove-if-not @'null (parse tokens start-symbol) 
:key #'parse-rem))) 
As an example, consider the following grammar, taken from page 321 of Aho and 
Ullman (1972). Here the Lisp form (E T ÷ E) corresponds to the grammar rule 
94 
Norvig Memoization and Context-Free Parsing 
E --, T + E, where +, *, ~ and \] are all terminal symbols in the grammar, and con- 
catenation is implicit. 
(defparameter *grammar* 
'((E T + E) 
(E T) 
(T F * T) 
(T F) 
(F \[ E \]) 
(F a))) 
We can use this to obtain parses like the following: 
> (parser '( \[ a + a \] * a) 'E) 
((E (T (F \[ (E (T (F A)) + (E (T (F A)))) \]) 
(T (F A))))). 
We do not include here a proof that the memoized parse does the same work as 
Earley's algorithm, but it is not too difficult to show. Interested readers can insert 
the following format statement as the first expression in extend-parse, and then run 
the example again. The resulting output, when sorted, matches exactly the parse lists 
shown on page 322 of Aho and Ullman (1972). 
(format t "~&~d ~a ->~{ ~a ~} . ~{ ~a~}" 
(- 7 (length rem)) lhs 
(mapcar #'(lambda (x) (if (consp x) (first x) x)) rhs) 
needed) 
The items come out in a different order because the Earley algorithm is strict left-to- 
right, while the memoized version is doing backtracking, but with the memoization 
eliminating all duplication of effort. 
Unfortunately, the memoized parser as presented does not have the same asymp- 
totic complexity as Earley's algorithm. The problem is that hashing is done on the 
complete argument list to parse: the start symbol and the list of remaining tokens. 
Hashing on this takes time proportional to the number of tokens, so the whole algo- 
rithm is O(r~ 4) instead of O(n3). What we really want is a hash table that is a compro- 
mise between equal and eql hashing, one that takes keys that are lists, where each 
element of the list should be compared by eql. Such a table can be easily managed: 
(defun put-multi-hash (keylist value hash-table) 
"Store a value in a multi-level hash table: 
one level for each element of keylist" 
(if (= (length keylist) I) 
(serf (gethash (first keylist) hash-table) value) 
(let ((tablel (or (gethash (first keylist) hash-table) 
(setf (gethash (first keylist) hash-table) 
(make-hash-table))))) 
(put-multi-hash (rest keylist) value tablel)))) 
95 
Computational Linguistics Volume 17, Number 1 
(defun get-multi-hash (keylist hash-table) 
"Fetch a value from a multi-level hash table: 
one level for each element of keylist" 
(if (= (length keylist) i) 
(gethash (first keylist) hash-table) 
(let ((table1 (or (gethash (first keylist) hash-table) 
(serf (gethash (first keylist) hash-table) 
(make-hash-table))))) 
(get-multi-hash (rest keylist) tablel)))). 
Now to use these multi-level hash tables, we need another version of memo, one that 
gives us the flexibility to specify the get and put functions: 
(defun memo (fn &key name (maker #'make-hash-table) 
(getter #'gethash) (putter #'puthash)) 
"Return a memo-function of fn." 
(let ((table (funcall maker))) 
(serf (get name 'memo) table) 
#'(lambda (~rest args) 
(multiple-value-bind (val found) 
(funcall getter args table) 
(if found val 
(funcall putter args (apply fn args) table)))))) 
(defun memoize (fn-name &rest memo-keys) 
"Replace fn-name's global definition with a memoized version." 
(serf (symbol-function fn-name) 
(apply #'memo (symbol-function fn-name) 
:name fn-name memo-keys))). 
Finally, we can get an O(n 3) parser by saying: 
(memoize 'parse :getter #'get-multi-hash :putter #'put-multi-hash). 
3. Memoizing in Other Languages 
In the Scheme dialect of Lisp, the programmer would have to write (set! fib (memo 
fib)) instead of (memoize 'fib). This is because Scheme lacks symbol-value and 
set-symbol-value! functions, and has nothing to do with the fact that Scheme has a 
single name space for functions and variables. It is also possible to write the memoized 
function all in one step: 
(define fib 
(memo (lambda (n) 
(if (<= n i) n 
(+ (fib (- n 1) (fib (- n 2)))))))). 
This is the approach taken by Abelson and Sussman (1985; see exercise 3.27). 
In a Scheme implementation with macros, we could of course define memoize or 
define-memo as macros, thereby achieving the same result as in Common Lisp. In 
96 
Norvig Memoization and Context-Free Parsing 
fact, the macro solution even works in languages without first class functional objects, 
if a sufficiently powerful macro facility is available, or if the language's parser can 
be modified. For example, one could add the keyword memo to Pascal, and write the 
following: 
memo function fib(n:integer) : integer; 
begin 
if n <= i then 
fib := n 
else fib := fib(n-l) + fib(n-2); 
end; 
and have it parsed as if it were: 
function fib(n:integer) : integer; 
begin 
if not inTable(n) then 
if n <= i then 
addToTable(n,n) 
else addToTable(n,fib(n-1) + fib(n-2)); 
fib := tableValue(n); 
end; 
Texts such as Field and Harrison (1988) assume that memoization is done by a built-in 
feature of the language that in effect implements this kind of source-to-source trans- 
formation. This paper shows that memoization can also be done completely within 
the language -- provided the language has either (a) a macro facility or (b) both 
higher-order functions and a way to set function bindings. 
4. Conclusion 
What has been gained by memoization? The memoized algorithms presented here are 
of the same order of complexity as the existing Earley algorithm and linear-recursive 
Fibonacci function. In fact, unless tables are implemented very carefully, the memoized 
algorithms perform worse by a constant factor. 
However, note that Earley's algorithm is not the best for all parsing applications. 
One might prefer a bottom-up parser or a deterministic parser. The point is that ex- 
perimenting with alternate parsing strategies is easier when starting from a simple 
15-line program, rather than from a multi-page parser that maintains complex explicit 
data structures. 
Memoization supports an incremental, lazy-evaluation-like style that can make 
the overall design simpler. For example, note that besides parse, we also memoized 
rules-for. This has the effect of building an inverted index of the grammar rules, 
implicitly and incrementally. The programmer is freed from having to remember to 
declare, allocate, and initialize a rule table, but still gets the benefit of efficient access 
to the rules. This is the ultimate data abstraction: the programmer is shielded not only 
from the details of the implementation of the table, but even from the existence of a 
table at all. 
For twenty years, algorithms like Earley's (and more recently chart parsing) have 
been treated as special techniques, worthy of special attention. This paper has shown 
97 
Computational Linguistics Volume 17, Number 1 
that the maintenance of well-formed substring tables or charts can be seen as a spe- 
cial case of a more general technique: memoization. Furthermore, we have shown 
that Common Lisp, with its mutable function name space, is an especially congenial 
host language for memoization, and that other languages can be hosts with other 
approaches. It is our hope that programmers will adopt automatic techniques like 
memoization where appropriate, and concentrate their data-creation and algorithm- 
optimization efforts on truly special cases. 
Acknowledgments 
This work has been supported by the 
Defense Advanced Research Projects 
Agency (DoD), Arpa Order No. 4871, 
monitored by Space and Naval Warfare 
Systems Command under Contract 
N00039-84-C-0089. 
References 
Abelson, H. and Sussman, G.J. (1985). 
Structure and Interpretation of Computer 
Programs. MIT Press. 
Aho, A.V. and Ullman, J.D. (1972). The 
Theory of Parsing, Translation, and 
Compiling, Vol 1: Parsing. Prentice-Hall. 
Earley, J. (1970). "An efficient context-free 
parsing algorithm." Communications of the 
Association for Computing Machinery 6(2), 
451-455. 
Field, A.J. and Harrison, P.G. (1988). 
Functional Programming. Addison-Wesley. 
Grosz, B.J., Sparck-Jones, K., and Webber, 
B.L. (1986). Readings in Natural Language 
Processing. Morgan Kaufman. 
Hughes, R.J.M. (1985). "Lazy memo 
functions." In Proceedings, Conference on 
Functional Programming and Computer 
Architecture, 129-146. Springer-Verlag. 
Kay, M. (1980). "Algorithm schemata and 
data structures in syntactic processing." 
In Proceedings, Symposium on Text 
Processing, Nobel Academy. 
Michie, D. (1968). "Memo functions and 
machine learning." Nature, 218, 19-22. 
Sheiber, Stuart M. (1989). "Parsing and type 
inference for natural and computer 
languages." SRI International Technical 
Note 460. 
Szolovits, P. and Martin, W.A. (1981). 
"Brand X: Lisp support for semantic 
networks." In Proceedings, 7th IJCAL 
Vancouver. 
Winograd, T. (1983). Language as a Cognitive 
Process. Addison-Wesley. 
98 
