Sorry, what was your name again, or how to overcome
the tip-of-the tongue problem with the help of a computer?
Michael Zock
LIMSI-CNRS, B.P.133
91403 Orsay-Cedex, France
zock@limsi.fr
Abstract
A speaker or writer has to find words for
expressing his thoughts. Yet, knowing a
word does not guarantee its access. Who
hasn’t experienced the problem of looking
for a word he knows, yet is unable to ac-
cess (in time) ? Work done by psy-
chologists reveals that people being in this
so called tip-of-the-tongue state (TOT)
know a lot about the word : meaning,
number of syllables, origine, etc. Speakers
are generally able to recognize the word,
and if they produce an erroneous word,
that token shares many things with the tar-
get word (initial/final letter/phoneme, part
of speech, semantic field, etc.). This being
so, one might want to take advantage of
the situation and build a program that as-
sists the speaker/writer by revealing the
word that’s on his/her mind (tongue/pen).
Three methods will be presented, the first
one being implemented.
1 The context or starting point
I’m currently involved in a project (PA-
PILLON)
1
 whose goal is to build a huge mul-
tilingual lexical data-base (English-French-
Japanese, Thai) from which one can extract
digital bilingual dictionaries. The latter can
be customized to fit different needs: dictiona-
ries to be used by people vs. dictionaries to
be used by machines (e.g. automatic transla-
tion).
One of the ideas is to enhance this dic-
tionary by adding certain functions, in order
to capitalize on the data. Rather than being a
                                                       
1
 http://www.papillon-dictionary.org
component supporting a single task, the dic-
tionary is at the centre, supporting the user in
a variety of tasks like reading, writing, me-
morization of words or automation of syn-
tactic structures. Particular emphasis will be
given to word access, the topic of this paper,
because, what is a dictionary good for, if one
cannot access the data it contains? The ap-
proach taken is generic, hence, it applies not
only within this particular context.
Word access being a fundamental task in
language production, one might wonder what
could be learned by gleaning at work done in
the context of automatic text generation.
2 Word access in Natural-Language 
Generation
A lot of (natural language generation) resear-
chers have been interested in lexical issues
during the last fifteen years or so.
2
 Yet des-
pite this enormous body of work, the issue of
word access has not been addressed at all
within this community, not even in Ward’s
extensive problem catalog (Ward 1988).
While from a strict computational linguistic
point of view, the whole matter may be a non-
issue,
3
 however, if we address the problem of
lexicalization from a psycholinguistic or
man-machine interaction point of view
                                                       
2
 For excellent surveys see (Robin, 1990; Wanner
1996).
3
 Most programs running serially, there is no such
thing as competition. Hence, problems like inter-
ference, confusion or forgetting do not occur.
Furthermore, computers having a perfect memory,
stored information can generally be easily acces-
sed. The situation is quite different for people.
(spontaneous discourse or writing on a com-
puter), things are quite different. Just as
“knowing a word” does not imply “being
able to access it”, ‘choosing a word’ does not
imply ‘being able to find the set of candidates
from which to choose’. The situation is so-
mehow different in psycholinguistics. Again,
there is an enormous body of research (Mar-
slen-Wilson, 1989; Aitchinson, 1987; Levelt,
1992, to name just those). While all these
authors take up the issue of word access, they
do not consider the use of computers for hel-
ping people in their task. Yet this is precisely
what I do here.
3 What prevents us from finding a
word?
In order to answer this question, let us take a
look at the time course of lexicalization. Ac-
cording to psychologists (Butterworth,
1989:110; Levelt 1989), lexical access takes
place in two temporally distinct stages. In the
first stage the speaker checks the semantic
lexicon for a lemma expressing the concep-
tual input.
4
 If he can find one, he will take it
and consult then the phonological lexicon in
order to find the appropriate phonological
form.
5
Errors can occur at both ends. Yet, since
the two stages are independent, errors belong
to either category, but never to both. Errors at
the semantic level will yield a wrong lemma
(e.g. hate instead of love), while errors at the
phonological level may result in the wrong
phonological form. For example, the intented
relegate may surface as renegate or delegate
(/l/ and /n/ as well /r/ and /d/ being phonolo-
gically relatively close), turn on the heater
switch may result in turn on the sweeter hitch
(Clark & Clark, 1977), etc. As one can see,
these words are all phonologically reasonably
close to the target word, yet, it is precisely
                                                       
4
 Suppose you wanted to refer to a castor, then
there could be a competition between the lemmata
“otter, beaver”.
5
 If the chosen lemma were “beaver” then  all
words starting with “b-e-a” or “b-e-e”  could be
considered as candidates.
this proximity that prevents the speaker to
access the “right” word.
4 The speaker’s problem: choosing
words, finding them, or both?
Obviously, there is more to lexicalisation than
just choosing words: one has to find them to
begin with. No matter how rich a lexical data-
base may be, it is of little use if one cannot ac-
cess the relevant information in time.
Work on memory has shown that access de-
pends crucially on the way information is orga-
nized (Baddeley, 1982). From speech error lit-
erature (Fromkin 1973) we learn that ease of
access depends not only on meaning relations
(word bridges, i.e. associations) or the structure
of the lexicon (i.e. the way words are organized
in our mind),— but also on linguistic form.
Researchers collecting speech errors have of-
fered countless examples of phonological errors
in which segments (phonemes, syllables or
words) are added, deleted, anticipated or ex-
changed. Reversals like /aminal/ instead of
/animal/, or /karpsihord/ instead of /harpsikord/
are not random at all, they are highly systematic
and can be explained. Examples such as grastly,
a blend of grizzly + ghastly, or Fatz and Kodor
(instead of Katz and Fodor) clearly show that
knowing the meaning of a word does not guar-
antee its access.
The work on speech errors also reveals that
words are stored in two modes, by meaning and
by form (sound), and it is often this latter which
inhibits finding the right token: having recom-
bined inadvertently the components of a given
word (syllable scrambling), one may end up
producing a word, which either does not exist or
is simply different from the one in mind. This
kind of recombination, resulting from bookkee-
ping problems due to time pressure, parallel
processing and information overload, may dis-
turb or prevent the access of words. Hence the
usefulness of a tool that allows to revert the
process.
5 Three methods for enabling the
computer to guess the right word
I shall present here three methods (one based on
form, another based on meaning, the last one
based on a combination of both) for helping the
writer to find the word he is looking for. So far,
only the first method is implemented.
5.1 Access by form
The component described below is part of a
larger system PIC (Fournier & Letellier, 1990).
For its adaptation to the problem of lexical ac-
cess see Zock & Fournier (2001).
The system has two basic mechanisms for
finding the right word: anacodes and phoneti-
codes. The former deals with errors due to per-
mutations, while the latter deals with homo-
phones. Since an anacode is equivalent to the set
of letters composing the word, scrambled letters
(unwanted permutations) are not a problem at
all. The system would still find the right word,
provided that there is such a candidate in the
dictionary, and provided that the user didn't
omit, add or replace one character(s) by another.
For example, if the input were aclibrer instead
of calibrer, the system would have no difficulty
to find the target word (calibrer), since both
words are composed of the same set of letters
(a/b/c/e/i/l/r). If the user added letters outside of
the anacode, the system would need several runs
to check alternative spellings by making local
variations (delete or add a character by making
systematic permutations).
The second technique (phoneticodes) con-
sists in converting graphemes into phonemes,
which allows the system to deal with spelling
errors due to homophony, a very frequent phe-
nomenon in French (see figure 1).
FRENCH ENGLISH SYNT. CAT. DOMAIN
vingt twenty Adjective NUMBER
vin wine Noun-singular BEVERAGE
vins wines Noun-plural BEVERAGE
je vins I came Verb-past
tense
MOUVEMENT
tu vins you came Verb-past tense MOUVEMENT
il vint he came Verb-past
tense
MOUVEMENT
…qu’il
vînt
...that he
came
Verb-
subjonctif
MOUVEMENT
je vaincs I win V-pres. tense COMPETITION
tu vaincs you win V-pres. tense COMPETITION
il vainc he wins V-pres. tense COMPETITION
vaincs win V-Imperative COMPETITION
en vein in vain Adverb UTILITY
Figure 1 : The many ways of writing /vR/
For example, the system would be able to
deal with errors like vin, vins, vein, vint,
vaincs, etc. instead of vingt. If the system can-
not find directly a candidate, it will perform lo-
cal changes by performing permutations of pho-
nemes or syllables. Hence it would have no
problem to find the word poteau (pole) instead
of topos (topic), both words being composed of
the same syllables (/po-to/), the only difference
being their order.
The situation is more complex and may even
become intractable if extraneous material is
added, or if the correction yields an existing, yet
semantically different word from what was in-
tended. Suppose that the target word were "mai-
son" (house), while the user typed “masson”.
Relying on the phoneticode, the system might
suggest "maçon" (bricklayer), a word that exists,
but which is not at all what was intended. Rely-
ing on the anacode, it would fail, because none
of the permutations would result in the target
word.
5.2 Access by meaning: navigation in a 
huge associative network
As mentionned before, words are stored by
meaning and by form (sound). Therefore we
need a method to access words in both modali-
ties. This is all the more necessary, as the spea-
ker starts from a meaning representation
(concept, message). We plan to experiment with
the following two methods. In the first case
search is done by propagation in a dynamically
built network. In the second case search is done
by filtering (5.3).
The fact that the dictionary is organized as a
web rather than a taxonomy, has obvious advan-
tages : there is more than one way to reach the
goal. Hence, if ever one has gone in the
« wrong » direction, one may still recover later
on. To illustrate this last point, let’s take an
example. Suppose you played chess and you
wanted to find the French equivalent for the
word « knight » (cavalier). If the dictionary
were structured along the lines suggested, than
one could access the word via any of the follo-
wing links or associations : horseman (cavalier-
noun), to be nonchalant (être cavalier), to be a
loner (faire cavalier seul), but also to bolt (ca-
valer), King Arthur (chevalier), famous French
singer (Maurice Chevalier). Note, that while in
the first three cases we get homonymes of the
target word (cavalier), to bolt produces a simi-
larly sounding word (cavaler instead of cava-
lIer). The last association (Maurice Chevalier)
results in a perfect match, except for the first
syllable and the first name, which would have to
be dropped, of course.
5.2.1 Search by progagation in the net-
work
I start from the assumption that the mental dic-
tionary is a huge semantic network composed of
words (nodes) and associations (links), either
being able to activate the other.
6
 Finding a word
amounts thus to entering the network and to
follow the links leading to the target word.
Being unable to access the desired word, a spea-
ker being in the TOT-state may still be able to
recognize it in a list. If the list doesn’t contain
the exact word, he is generally able to decide
which word leads in the right direction, i.e.
which word is most closely connected to the
target word.
Suppose you wanted to find the word nurse
(target word), yet the only token coming to your
mind were hospital. In this case the system
would build (internally) a small semantic net-
                                                       
6
 The idea according to which the mental dictionary
(or encyclopedia) is basically an associative network,
composed of nodes (words or concepts) and links
(associations) is not new. Actually the very notion of
association goes back at least to Aristotle (350 before
our time), but it is also inherent in work done by
philosophers (Locke, Hume) physiologists (James &
Stuart Mills), psychologists (Galton, 1880 ; Freud,
1901 ; Jung & Riklin, 1906) and psycholinguists
(Deese, 1965 ; Jenkins, 1970, Schvaneveldt, 1989 ).
For surveys in psycholinguistics see (Hörmann,
1972 ; chapters 6-10), or more recent work (Spitzer,
1999). The notion of association is also implicit in
work on semantic networks (Sowa, 1992), hypertext
(Bush, 1945), the web (Nelson, 1967), connectionism
(Stemberger, 1985 ; Dell, 1986) and of course Word-
Net (Miller, 1990, Fellbaum, 1998).
work with hospital in the center (Figure 2a)
and as immediate satellites all the words having
a direct link with it (Figure 2b).
7
 This process is
recursive: satellites can become the center, thus
triggering a new search, and since the speaker
knows the concept/word he is looking for, he is
likely to encounter it sooner or later.
Figure 2b shows the candidates from which
the user is supposed to choose. If he finds in any
of these groups the word he is looking for, the
process halts, otherwise it goes on. As you can
see words are presented in clusters. Each cluster
corresponds to a specific link. The assumption is
that the user will use this information in order to
jump quickly from one group to the next.
This approach might work fine provided : 1)
the speaker is able to come up with a word rea-
sonably close to the target; 2) The dictionary
contains (or allows to infer) all the relations/
associations a speaker typically uses. This se-
cond condition hardly ever holds. Hence, we
need to find out what these assocations are.
Also, while a single piece of information (a
word, a relationship or part of the definition) can
be useful, it is obviously better to provide more
information (number of syllables, sound, ori-
gine, etc.) as it will reduce the search space.
dentist
assistant ako
near-synonym
nurse
gynecologist
physician
health
institution
clinic
patient
sick person
hospital doctor
sanatorium
psychiatric hospital
military hospital
asylum synonym
ako
treat
ako
ako
ako
isa
isa
isa
isa
isa
isa
take care of
treatactor
actor
actor
Figure 2a : Search based on propagation in a network
(internal representation)
                                                       
7
 Of course, in case of ambiguity the user would have
to signal the specific meaning he has in mind.
clinic
sanatorium
military hospital
psychiatric hospital
doctor
patient
nurse
Figure 2b: proposed candidates grouped according
to the nature of the link
5.3 Search through a combination of 
conceptual and linguistic constraints
As mentionned already, a speaker finding him-
self in the TOT state knows generally many
things about the object he is looking for: parts of
the definition, ethymology, beginning/ending of
the word, number of syllables, part of speech
(noun, verb, adjectif, etc.), and sometimes even
the gender (Brown et McNeill,1966 ; Burke et
al. 1991 ; Vigliocco et al.,1997). We could use
all this information as constraints. The interface
for communicating this knowledge is somehow
akin to what MEDLINE offers to researchers
helping them to specify the kind of book they
are looking for.
6 Conclusion
I have drawn the readers’ attention to the im-
portance of word access in the context of NLG:
information must not only be available, it must
also be accessible. While this problem may not
be relevant for NLG in general, or in the strict
computational linguistic’s framework, it cer-
tainly is relevant when we look at generation as
a machine mediated process (people using a
word processors for writing), or from a psycho-
linguistic point of view: word access in writing
or (spontaneous) discourse. Looking at some of
the psycholinguistic findings, and looking at the
work done on spell checking, it seemed that
some of the techniques developed in the context
of the latter could profitably be used in the do-
main of the former. While the use of certain spell
checking techniques can certainly enhance word
access in speaking and writing (hence the poten-
tial of electronic dictionaries associated with
word processors), more work is needed in order
to adjust the method to be in line with psycho-
linguistic data and in order to keep the search
space small.
I have also tried to show that in order to sup-
port a speaker being in the TOT-state, we need
to create an associative memory. That is, I've
raised and partially answered the question what
kind of information semantic networks need to
have in order to be able to help a speaker being
in this state. Actually, my basic proposal is to
build a system akin to WordNet, but containing
many more links — in particular on the hori-
zontal plane. These links are basically associa-
tions, whose role consists in helping the speaker
to find either related ideas to a given stimulus,
(concept/idea/word - brainstorming), or to find
the word he is thinking of (word access). Hence,
future work will consist in identifying the most
useful assocations, by considering relevant work
in linguistics
8
 and in collecting data by running
psycholinguistic expermiments. For example,
one could ask people to label the links for the
words (associations) they have given in response
to a stimulus (word) ; or one could also ask them
to lable couples of words (eg. apple-fruit, lemon-
yellow, etc.). A complementary approach would
be to look for lexical-data-base mining-strate-
gies, as the desired information may be distri-
buted or burried deep down in the base. Finally,
one can also look at texts and try to extract au-
tomatically co-occurences (see Rapp & Wettler,
1991 ; Wettler & Rapp 1992).

References
Aitchinson, J. (1987) Words in the Mind: an Intro-
duction to the Mental Lexicon, Oxford, Blackwell.
Aitchison, J., A. Gilchrist & D. Bawden (2000) The-
saurus construction and use : a practical ma-nual,
Fitzroy Deaborn Pbs, Chicago
Aristotle (350 before JC) De memoria et reminiscen-
tia. In Parva Naturalia, Vrin
Baddeley, A. (1982) Your memory: A user's guide.
Penguin
Brown, R and Mc Neill, D. (1966). The tip of the
tongue  phenomenon. Journal of Verbal Learning
and Verbal Behavior, 5, 325-337
Burke, D.M., D.G. MacKay, J.S. Worthley & E.
Wade (1991) «On the Tip of the Tongue:What
Causes Word Finding Failures in Young and Older
Adults?», Journal of Memory and Language 30,
542-579.
Bush, V. (1945) "As we may think". The Atlantic
Monthly; Volume 176, No. 1; pp. 101-108
Butterworth, B. (1989) Lexical Acces in Speech
Production. In, W. Marslen-Tayler  (Ed.).
Clark, H & Eve V. Clark, (1977) Psychology and
Language.Harcourt, Brace, Jovanovich, New York
Crouch, C. (1990). An approach to the automatic
construction of global thesauri. Information Pro-
cessing and Management, vol. 26, no. 5, pp. 624-
640
Deese, J.(1965) The structure of associations in lan-
guage and thought. Baltimore
Dell, G. S., Chang, F., and Griffin, Z. M. (1999),
"Connectionist Models of Language Production:
Lexical Access and Grammatical Encoding," Cog-
nitive Science, 23/4, pp. 517-542.
Fellbaum, C. ( Ed.) 1998, WordNet : an electronic
lexical database, Cambridge (Massachusetts), The
MIT Press.
Fournier, J.P & S. Letellier. (1990) PIC: a Parallel
Intelligent Corrector. Artificial Intelligence Appli-
cation & Neural Networks AINN'90, pp 38-41,
Zürich
Freud, S. (1901) Psychopathology of everyday life.
Paris : Payot, 1997.
Fromkin, V. (1973) (Ed.) Speech errors as linguistic
evidence. The Hague: Mouton Publishers
Galton, F. (1880). Psychometric experiments. Brain,
2, 149-162.
Hörmann H. (1972). Introduction à la psycholin-
guistique. Paris: Larousse
Jenkins, J.J. (1970). The 1952 Minnesota word asso-
ciation norms. In: L. Postman, G. Keppel (eds.):
Norms of Word Association. New York: Academic
Press, 1-38.
Johnson, R, C. Fillmore, E. Wood, J. Ruppenhofer,
M. Urban, M. Petruck, C. Baker (2001) The Fra-
meNet Project: Tools for Lexicon Building,
http://www.icsi.berkeley.edu/~framenet/
Jung, C.G., Riklin, F. (1906). Experimentelle Unter-
suchungen über Assoziationen Gesunder. In: C.G.
Jung (ed.): Diagnostische Assoziationsstudien.
Leipzig: Barth, 7-145.
Levelt, W. (1992). Accessing Words in Speech Pro-
duction: Stages, Processes and Representations.
Cognition 42: 1-22.
Marslen-Taylor, W. (Ed.) (1979) Lexical Repre-
sentation and Process, Bradford book, MIT Press,
Cambridge, Mass.
Mel'cuk, I. et al. (1992) Dictionnaire Explicatif et
Combinatoire du français contemporain. Recherche
lexico-sémantique III. Les presses de l’université
de Montréal.
Miller, G.A., ed. (1990). WordNet: An On-Line Lexi-
cal Database. International Journal of Lexico-
graphy, 3(4).
Nelson, T. (1967) Xanadu Projet hypertextuel,
http://xanadu.com/
Palermo, D., Jenkins, J. (1964). Word Association
Norms. Minneapolis, MN: University of Minnesota
Press.
Perreault, J. (1965). Categories and relators: a new
schema. In Rev.Int. Doc., vol. 32, no.4, pp.136-144
Rapp, R. & M. Wettler (1991) A connectionist simu-
lation of word associations. Int. Joint Conf. on
Neural Network, Seattle
Robin, J. (1990) A Survey of Lexical Choice in Natu-
ral Language Generation, Technical Report CUCS
040-90, Dept. of Computer Science, University of
Columbia
Schvaneveldt, R. (Ed.) (1989) Pathfinder Associative
Networks : studies in knowledge organization.
Ablex, Norwood, New Jersay
Sowa, J. (1992) "Semantic networks," Encyclopedia
of Artificial Intelligence, edited by S. C. Shapiro,
Wiley, New York
Spitzer, M. (1999). The mind within the net : models
of learning, thinking and acting. A Bradford book.
MIT Press, Cambridge
Stemberger, J. P. (1985) "An interactive activation
model of language production." In A. W. Ellis [ed]
Progress in the Psychology of Language, Vol. 1,
143-186.  Erlbaum.
Vigliocco, G., Antonini, T., & Garrett, M. F. (1997).
Grammatical gender is on the tip of Italian tongues.
Psychological Science, 8, 314-317.
Wanner, L. (1996). Lexical Choice in Text Gene-
ration and Machine Translation: Special Issue on
Lexical Choice, Machine Translation. L. W. (ed.).
Dordrecht, Kluwer Academic Publishers. 11: 3-35.
Ward, N. (1988). Issues in Word Choice. COLING-
88, Budapest.
Wettler, M. & R. Rapp (1992) Computation of Word
associations based on the co-occurences of words
in large corpora
Wettler, M. (1980) Sprache, Gedächtnis, Verstehen.
Berlin, de Gruyter
Zock, M. & J.-P. Fournier (2001). How can compu-
ters help the writer/speaker experiencing the Tip-
of-the-Tongue Problem ?, Proc. of RANLP, Tzigov
Chark, pp. 300-302
