COMMENTS ON LEXICAL ANALYSIS* 
George A. Miller 
How lexical information should be 
represented in a computer program for 
processing natural language depends both on 
the goals that the program is intended to 
achieve and on the lexical information 
itself. Although programs can be imagined 
that might use lexical information in 
different ways, the information base that is 
exploited must be invariant over alternative 
programs. 
The present paper is concerned with the 
lexical information that must be 
represented, rather than with programming 
devices for representing it. First, an 
analysis scheme will be illustrated through 
a study of a single English verb. Then the 
scheme will be used as background for a 
discussion of some fundamental theoretical 
issues. 
Hand: An Exercise in Lexical Analysis
Consider the verb "hand" as it is used 
in: 
(1) a. She handed her hat to him.
    b. She handed him her hat.
A paraphrase of (1) that captures all of the
components of meaning to be discussed here
is:
(2) She had her hat prior to some time t at 
which she used her hand to do something 
that caused her hat to travel to him, 
after which time he had her hat. 
The difference between (1a) and (1b) is
usually regarded as syntactic, (1b) deriving
from the structure underlying (1a) as a
consequence of a dative-movement 
transformation that inverts the order of the 
direct and indirect objects and deletes 
"to". Some people, however, detect a 
difference in meaning: "She handed her hat 
to him," they say, merely suggests that he 
took it, whereas "She handed him her hat" 
asserts that he took it -- the sense 
expressed in (2). If one respects this 
difference in meaning, and also holds to the 
semantic neutrality of such grammatical 
transformations as dative movement, then 
presumably one must distinguish two 
different meanings of "hand" -- one
resembling "offer" and another
"offer-and-take." If one does not respect
this meaning difference, both (1a) and (1b)
have the "offer-and-take" sense. In either
case, it is the sense paraphrased in (2) 
that will be considered here. 
*Preparation of this paper has been
supported by grants to The Rockefeller
University from the Grant Foundation and
from the Public Health Service, GM21796.
The style of lexical analysis that is
illustrated here was developed in
collaboration with P.N. Johnson-Laird and
will appear in more detail in Miller and
Johnson-Laird (in press).
Let the verb "hand" be represented as
an operator, HAND, taking three arguments:
the grammatical subject x, the indirect
object y, and the direct object z:
HAND(x,y,z). Then (1) can be represented
(to a first approximation) by:
(3) (∃x,y,z)[WOMAN(x) & MAN(y) & HAT(z) & HAND(x,y,z)]
WOMAN, MAN, and HAT are not uninteresting
concepts -- in particular, men and women
"have" hands (inalienable possession),
whereas hats do not, and men and women can
"have" hats (either accidental possession or
ownership), but not vice versa -- but the
present discussion is confined to HAND, 
which will be analyzed to illustrate the 
need for certain very general lexical 
concepts. 
HAPPEN: Consider first the temporal 
shape of the handing episode in (1). It
begins in the state: "she has her hat and he 
does not have it". Then an event occurs at 
time t which results in a change of state. 
And the episode ends in the state: "she does 
not have her hat and he does have it". This 
characterization raises two questions: how 
to represent changes of state, and how to 
reduce the redundancy of the state 
descriptions. 
A statement-forming operator R_t (Rescher
& Urquhart, 1971) can be used to represent
changes of state. R_t takes a temporally
indefinite statement S and forms a new
statement R_t(S) to the effect that "S is
realized at t." In order to indicate a
change of state at moment t, another
operator -- call it HAPPEN -- is needed to
form a new statement to the effect that
"notS is realized at t-1 and S is realized
at t":
(4) HAPPEN(S) = (∃t)[R_{t-1}(notS) & R_t(S)]
HAPPEN is a very general operator, 
characteristic of verbs that denote events. 
Note that the first conjunct of (4) will 
ordinarily be presupposed; that is to say, 
"S didn't happen" is not ordinarily taken to 
mean "R_t(S) for all t."
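The operators R_t and HAPPEN can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that time is discrete and that a statement S is modeled as a list of truth values indexed by moment; the function names realized and happen are hypothetical and are not part of the analysis itself.

```python
from typing import Sequence

def realized(s: Sequence[bool], t: int) -> bool:
    """R_t(S): the temporally indefinite statement S is realized at moment t."""
    return s[t]

def happen(s: Sequence[bool]) -> bool:
    """HAPPEN(S): for some t, notS is realized at t-1 and S is realized at t."""
    return any(not realized(s, t - 1) and realized(s, t)
               for t in range(1, len(s)))

# "He has her hat" is false at moments 0 and 1 and true from moment 2 on.
he_has_hat = [False, False, True, True]
# A statement that holds at every moment involves no change of state.
always_had_hat = [True, True, True, True]

print(happen(he_has_hat))
print(happen(always_had_hat))
```

On this reading a statement that is realized at every moment does not "happen", which is why the first conjunct of (4) behaves like a presupposition rather than an assertion.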
The two state characterizations -- "she 
had it and he didn't" and "she didn't have 
it and he did" -- are clearly redundant. 
The fact that a hat cannot be in two places 
at the same time (which must be part of a 
language user's general knowledge) merely 
compounds the redundancy of such state 
descriptions for double-object verbs of 
motion. However, it is a general 
characteristic of double-object verbs, not 
limited to motion verbs like "hand", that, 
in some sense of the ambiguous verb "have", 
the event ends with the indirect object 
"having" the direct object (Green, 1974). 
In the case of "hand", either x or y, but 
not both, will have z at any moment t; since 
y has z after t, x cannot also have it. On 
the other hand, if x "tells" y some 
information z, x does not stop having z
after t. What is common to both, however, 
is that y does not have z before t. Thus, 
the simplest state description is 
S = HAVE(y,z), in which case the antecedent 
state would be notS. Since notS seems to be
presupposed by HAND, (4) would be satisfied. 
On this analysis, therefore, some part of 
the meaning of "hand" must be: 
(5) HAPPEN(HAVE(y,z)) = GET(y,z) 
Discussion of HAVE will be omitted here; see 
Bendix (1966) or Miller and Johnson-Laird. 
Actually, of course, two things happen 
in handing: the object changes location as 
well as possessor. Indeed, the former 
change seems to be causally related to the 
latter. So, in order to complete the 
analysis, it is necessary to consider also
what happens at t that results in the 
transition from notHAVE(y,z) to HAVE(y,z). 
Roughly, x uses x's hand to do something,
and what x does causes z to travel to y. 
This paraphrase introduces four new 
operators -- USE, DO, CAUSE, and 
TRAVEL -- which can combine as follows to 
provide additional parts of HAND: 
(6) USE(x,hand,S_x) &
CAUSE(S_x,(TO(TRAVEL))(z,y))
Because the concepts associated with these 
operators -- instrumentality, agency, 
causality, and motion -- are required in the 
analysis of many English verbs, they will be 
discussed individually. 
USE: The first conjunct of (6)
corresponds to "x uses hand to S_x" or, more
generally, USE(x,w,S_x) is "x uses w to S_x,"
as in "Tom used a knife to open the box." A
fuller paraphrase would be: "x intentionally
does something S that causes w to do
something S' that allows S_x." "Use"
contrasts with instrumental "with" in being
intentional: "He broke the window with his
elbow" is not synonymous with "He used his
elbow to break the window." If we introduce
an operator ACT to represent intentional
acts, then USE can be defined:
(7) USE(x,w,S_x) = ACT(x,S) &
CAUSE(S,DO(w,S')) &
ALLOW(S',S_x)
This formulation adds two more 
operators -- ACT and ALLOW -- for which an 
account must be given. 
ACT: Intention will be taken as an 
unanalyzed primitive and represented by 
INTEND(x,g), where x is understood to be 
animate and g is understood to be a goal 
that x intends to achieve. It is further 
assumed that intentions can stand in a 
causal relation to behavior, so: 
(8) ACT(x,S) = CAUSE(INTEND(x,g),DO(x,S))
ACT and DO are closely related: ACT is the 
intentional counterpart of unintentional DO. 
DO: Let S denote a statement whose 
grammatical subject is x and whose predicate 
phrase is an event description (i.e., whose 
predicate entails HAPPEN). Then the 
relation between x and the event will be 
DO(x,S). DO is essentially a place holder. 
That is to say, DO will be restricted to 
contexts in which S can be a dummy 
variable -- see (7) for example, where 
DO(w,S') can be paraphrased as "w does 
something." If S cannot be a dummy 
variable -- if what x does is relevant to 
the meaning -- then DO will be replaced by 
an operator that makes the action explicit. 
CAUSE: Causation is too complex for 
brief explication. The following 
formulation is simply lifted from Miller and 
Johnson-Laird: 
(9) CAUSE(S,S') = 
BEFORE(HAPPEN(S),HAPPEN(S')) & 
notPOSSIBLE(S & notS') 
This formulation adds two more 
operators -- BEFORE and POSSIBLE -- for 
which accounts are needed. It is obvious 
that the plausibility of (9) must depend 
very heavily on POSSIBLE and on how a 
language user acquires general knowledge
about what combinations of events are 
possible or impossible. Lacking any clear 
psychological theory, POSSIBLE can be taken 
as a primitive, undefined term. 
ALLOW: "Cause" and "allow" are closely 
related, as a comparison of (9) with the 
following formulation should show:
(10) ALLOW(S,S') = 
BEFORE(HAPPEN(S),HAPPEN(S')) & 
notPOSSIBLE (notS & S') 
Note that, although it is impossible for S'
to occur unless S has occurred, the 
occurrence of S does not insure the 
subsequent occurrence of S'; that is to say, 
(S and notS') may well be possible. 
BEFORE: Sentences of the form "S before 
S'" can be interpreted to mean that there is 
some moment t such that S has been realized 
at t and S' has not yet been
realized -- that there is an interval 
between the first realization of S and the 
first realization of S'. In terms of the 
temporal operator R: 
(11) BEFORE(S,S') = (∃t_0)[(∃t)[t<t_0 & R_t(S)]
& not(∃t)[t<t_0 & R_t(S')]]
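The relation among BEFORE, CAUSE, and ALLOW can be made concrete with a small Python sketch. It rests on several assumptions that go beyond the text: time is discrete, a statement is a list of truth values indexed by moment, and POSSIBLE (which is left primitive here) is approximated by an explicit set of jointly possible outcomes supplied by the caller. The names onset, before, cause, allow, and Outcomes are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

def onset(s: Sequence[bool]) -> List[bool]:
    """Truth values of "S happens at t": notS at t-1 and S at t."""
    return [False] + [(not s[t - 1]) and s[t] for t in range(1, len(s))]

def before(s: Sequence[bool], s2: Sequence[bool]) -> bool:
    """(11): at some t0, S was realized at an earlier t and S' at no earlier t."""
    return any(any(s[:t0]) and not any(s2[:t0])
               for t0 in range(len(s) + 1))

# Stand-in for POSSIBLE: the jointly possible (S occurs, S' occurs) outcomes.
Outcomes = Set[Tuple[bool, bool]]

def cause(s: Sequence[bool], s2: Sequence[bool], possible: Outcomes) -> bool:
    """(9): S happens before S', and S without S' is not possible."""
    return before(onset(s), onset(s2)) and (True, False) not in possible

def allow(s: Sequence[bool], s2: Sequence[bool], possible: Outcomes) -> bool:
    """(10): S happens before S', and S' without S is not possible."""
    return before(onset(s), onset(s2)) and (False, True) not in possible

hat_arrives = [False, True, True, True]   # S: the hat is at him
he_has_hat = [False, False, True, True]   # S': he has the hat

# Assumed general knowledge: arrival guarantees possession ...
forced: Outcomes = {(True, True), (False, False), (False, True)}
# ... or arrival merely makes possession available.
permitted: Outcomes = {(True, True), (False, False), (True, False)}

print(cause(hat_arrives, he_has_hat, forced), allow(hat_arrives, he_has_hat, forced))
print(cause(hat_arrives, he_has_hat, permitted), allow(hat_arrives, he_has_hat, permitted))
```

The two definitions share the temporal conjunct and differ only in which combination of outcomes is excluded, which is the sense in which "cause" and "allow" are closely related.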
TRAVEL: According to Miller (1972), 
verbs of motion constitute a semantic field 
of English having "change of location" or, 
more briefly, "travel", as the core concept. 
It is sufficient evidence that something has 
traveled if one notices that it has appeared 
where it wasn't before, or if one notices 
that it is no longer where it was before. 
These conditions are accommodated by: 
(12) TRAVEL(z) = (∃y)[HAPPEN(AT(z,y))
or HAPPEN(notAT(z,y))]
for an appropriate choice of the location y 
as the origin or destination of motion. The 
first disjunct represents "z travels to y" 
and the second "z travels from y."
Miller and Johnson-Laird adopt the 
convention of using A(B(x)) for sentential 
adverbials and (A(B))(x) for predicate 
adverbials, so the notation: 
(13) (TO(TRAVEL))(z,y) = HAPPEN(AT(z,y))
reflects a judgment that "to y" is a
predicate adverbial in "z traveled to y."
This analysis of TRAVEL, however, introduces 
still another operator, AT. 
AT: The form "z is at y" seems to mean 
that z is included in the characteristic 
region of interaction with y. Miller and 
Johnson-Laird give:
(14) AT(z,y) = INCL(z,REGION(y)) & 
notINCL(y,REGION(z)) 
The second conjunct is required to 
distinguish "at" from "with". If z and y
are commensurate, so that INCL is 
symmetrical between them, "with" is the 
preferred preposition. 
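The asymmetry built into (14) can be sketched in Python under some simplifying assumptions that are not part of the analysis: space is one-dimensional, each object occupies an interval, and its characteristic region of interaction is that interval widened by an assumed "reach". The class Thing and the functions region, incl, and at are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]

@dataclass(frozen=True)
class Thing:
    name: str
    extent: Interval  # the space the object occupies (1-D for simplicity)
    reach: float      # assumed size of its characteristic region of interaction

def region(y: Thing) -> Interval:
    """REGION(y): the extent of y widened by its reach on both sides."""
    lo, hi = y.extent
    return (lo - y.reach, hi + y.reach)

def incl(z: Thing, r: Interval) -> bool:
    """INCL(z, r): the extent of z lies wholly inside the region r."""
    lo, hi = z.extent
    return r[0] <= lo and hi <= r[1]

def at(z: Thing, y: Thing) -> bool:
    """(14): z is in the region of y, but y is not in the region of z."""
    return incl(z, region(y)) and not incl(y, region(z))

hat = Thing("hat", (4.0, 4.25), 0.25)
man = Thing("man", (4.5, 5.0), 1.0)
woman = Thing("woman", (4.0, 4.5), 1.0)

print(at(hat, man))    # the hat falls within the man's region, not vice versa
print(at(woman, man))  # commensurate objects include each other: "with", not "at"
```

When the two objects are commensurate, each falls within the other's region, the second conjunct fails in both directions, and neither is "at" the other.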
The two operators used to define AT can 
both be taken as primitive concepts. The 
relation of spatial inclusion that is 
supposed to be captured by INCL probably 
derives rather directly from perception of 
spatial relations. REGION, an operator 
indicating the characteristic region of 
interaction with its argument, derives from 
general knowledge of objects and their uses.
HAND: Enough machinery has now been 
introduced to provide some rationalization 
for the following formulation: 
(15) HAND(x,y,z) = USE(x,hand,S_x) &
CAUSE(S_x,(TO(TRAVEL))(z,y))
& CAUSE(TRAVEL(z),GET(y,z))
Apparently some users of English have
another meaning for "hand":
(16) (TO(HAND))(x,z,y) = USE(x,hand,S_x) &
CAUSE(S_x,(TO(TRAVEL))(z,y))
& ALLOW(TRAVEL(z),GET(y,z))
according to which x's action merely allows 
y to get z, rather than causes y to get z. 
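How the formulas combine can be traced mechanically. The following Python sketch is only an illustration under stated assumptions: formulas are encoded as nested tuples, the dummy variables S, S', S_x, and g are plain strings, and HAPPEN, BEFORE, and TRAVEL are left unexpanded because their definitions quantify over moments or locations. The names DEFS, expand, and operators, and the labels TO_TRAVEL and TO_HAND, are hypothetical encodings of the notation used above.

```python
# A formula is a nested tuple (OPERATOR, arg1, ...); a bare string is a variable.

def AND(*parts):
    return ("AND",) + parts

def NOT(p):
    return ("NOT", p)

DEFS = {
    # (5) GET(y,z) = HAPPEN(HAVE(y,z))
    "GET": lambda y, z: ("HAPPEN", ("HAVE", y, z)),
    # (7) USE(x,w,S_x) = ACT(x,S) & CAUSE(S,DO(w,S')) & ALLOW(S',S_x)
    "USE": lambda x, w, sx: AND(("ACT", x, "S"),
                                ("CAUSE", "S", ("DO", w, "S'")),
                                ("ALLOW", "S'", sx)),
    # (8) ACT(x,S) = CAUSE(INTEND(x,g),DO(x,S))
    "ACT": lambda x, s: ("CAUSE", ("INTEND", x, "g"), ("DO", x, s)),
    # (9) and (10) differ only in the combination that is ruled out
    "CAUSE": lambda s, s2: AND(("BEFORE", ("HAPPEN", s), ("HAPPEN", s2)),
                               NOT(("POSSIBLE", AND(s, NOT(s2))))),
    "ALLOW": lambda s, s2: AND(("BEFORE", ("HAPPEN", s), ("HAPPEN", s2)),
                               NOT(("POSSIBLE", AND(NOT(s), s2)))),
    # (13) (TO(TRAVEL))(z,y) = HAPPEN(AT(z,y))
    "TO_TRAVEL": lambda z, y: ("HAPPEN", ("AT", z, y)),
    # (14) AT(z,y) = INCL(z,REGION(y)) & notINCL(y,REGION(z))
    "AT": lambda z, y: AND(("INCL", z, ("REGION", y)),
                           NOT(("INCL", y, ("REGION", z)))),
    # (15) and (16)
    "HAND": lambda x, y, z: AND(("USE", x, "hand", "S_x"),
                                ("CAUSE", "S_x", ("TO_TRAVEL", z, y)),
                                ("CAUSE", ("TRAVEL", z), ("GET", y, z))),
    "TO_HAND": lambda x, z, y: AND(("USE", x, "hand", "S_x"),
                                   ("CAUSE", "S_x", ("TO_TRAVEL", z, y)),
                                   ("ALLOW", ("TRAVEL", z), ("GET", y, z))),
}

def expand(f):
    """Rewrite every defined operator until only undefined ones remain."""
    if isinstance(f, str):
        return f
    op, *args = f
    args = [expand(a) for a in args]
    if op in DEFS:
        return expand(DEFS[op](*args))
    return (op, *args)

def operators(f):
    """Collect the operator names that occur anywhere in a formula."""
    if isinstance(f, str):
        return set()
    op, *args = f
    return {op}.union(*(operators(a) for a in args))

hand_ops = operators(expand(("HAND", "x", "y", "z")))
print(sorted(hand_ops))
```

In this encoding the two senses of "hand" expand to different formulas only in the third conjunct, where (15) excludes travel without getting and (16) excludes getting without travel.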
As detailed as this analysis is, some 
omissions are obvious. For example, the 
noun "hand" introduced in (6) as the 
instrument x uses is not only undefined, but 
no explicit indication is given that the 
hand x uses is x's own hand. This relation
of inalienable possession could be
introduced, of course, by adding an
appropriate HAVE relation between x and the
hand in question, but one feels that this
goes beyond the limits of lexicology -- the
fact that people have hands and enjoy a
special user's privilege with respect to
them is surely part of one's general
knowledge about people. Also omitted is any
recognition of one's intuition that, when x
hands z to y, not only does x use his hand
to deliver z, but y also uses his hand to
receive it -- one would not ordinarily say, 
for example, "I handed him his dinner" if 
what one had done was to use one's hand to 
feed the food into his mouth. The 
characteristic region of interaction with 
the recipient is, in this case, his hand. 
Moreover, "hand" seems to implicate y's 
conscious acknowledgement that he has 
received z -- one would not ordinarily say 
"I handed it to him" if what one had done
was to slip it surreptitiously into his coat 
pocket. Some of these features of "hand" 
could be introduced by defining GET in the
third conjunct of (16) to be something like:
USE(y,hand,ACCEPT(y,z)), with an appropriate 
formula for ACCEPT. Also omitted are any 
explicit grounds for distinguishing handing 
from throwing -- something more would have 
to be said about the temporal shape of the 
transfer. No doubt there are still other 
omissions. Since the present discussion of 
"hand" is merely an expository device to 
motivate the introduction of certain very 
general semantic operators, however, the 
definition offered in (15) will be left 
incomplete. 
The general problem of completeness 
requires comment. How far one should go in 
adding such features to a lexical analysis 
is an important question of lexicology for 
which a principled answer could be most 
useful. There is at present no way to 
refute the claim that, after all general 
components of meaning have been specified as 
fully as possible, there will always be a 
residuum of meaning unique to each 
particular lexical item. 
Some Theoretical Alternatives 
Beginning with the English verb "hand",
several paths were followed in search of its 
lexical primitives. What turned up were 
such things as the symbols of first-order 
predicate calculus, the concept of state, 
the generic concept of possession, the modal 
operators R_t and POSSIBLE, the psychological
operator INTEND, and the spatial operators 
INCL and REGION. This is not an exhaustive 
list, but it is illustrative. In each case, 
a level of general knowledge was reached 
that went beyond the usual bounds of a 
lexicon. In terms of these primitive 
operators it was then possible to offer 
formulas for such general and important 
operators as HAPPEN, USE, ACT, DO, CAUSE, 
ALLOW, BEFORE, TRAVEL, and AT; these, in
turn, made possible a first approximation of 
HAND. 
What is the status of these various 
operators? In what sense are any of them 
lexically "primitive"? Or, to ask a closely 
related question, what is the status of the 
= that occurs in so many of the
derivations? 
Two kinds of answer can be suggested, 
one too strong and the other too weak.
Somewhere between them seems the best place 
to search. 
COMPLETE DECOMPOSITION: Probably the 
strongest claim one could hope to make would
resemble the fundamental theorem of
arithmetic, e.g., any lexical item can be
expressed as a unique (Cartesian?) product 
of prime lexical items. In this case, = 
would be a reflexive, symmetric, and 
transitive relation -- synonymy -- between 
lexical items. Since one reason for playing 
the lexical decomposition game seems to be 
the hope of reducing lexical variety to a 
relatively small set of concepts, some 
workers might also insist that the number of 
lexical primes be finite. (Note,
incidentally, that the number of different
primes into which an integer can be factored 
has little to do with the "complexity" of 
that integer, except in very special tasks; 
that is to say, for most tasks, reaction 
times to an integer would not correlate with 
its number of prime factors. Presumably the 
same could be said of the lexical version of 
this hypothesis.) 
COMPLETE INDIVIDUALITY: Probably the 
weakest claim one would care to make is that 
each individual lexical item is a unique 
prime in its own right; just as one person 
cannot be decomposed into some combination 
of other persons, so no lexical item can be 
decomposed into others. Various shared 
properties might be used to partition the 
lexicon, and various relations might be found
to hold between many pairs of lexical items,
but such properties and relations could not
be regarded as conceptual atoms from which
lexical items are built or to which they can
be reduced. In this case, = would not 
hold between lexical items, although it 
might be a convenient metalinguistic 
relation between various properties, 
relations, or other theoretical statements. 
(Note again that no correlations would be 
expected between reaction times and 
"complexity", since all individual lexical 
items are, presumably, equally complex.) 
Although some theorists might be 
interpreted as embracing one or the other of 
these alternatives, it seems more plausible
to regard them as upper and lower bounds. 
Complete decomposition is implausible in 
view of the difficulty lexicographers have 
in providing complete definitions; there is 
usually some residuum of meaning, often but 
not necessarily affective, that vitiates the 
equivalence relation. Complete 
individuality is inadequate to explain the 
rich and relatively consistent patterns of 
properties and relations that have been 
described. 
So, one is led to speculate about 
intermediate alternatives. For example: 
Suppose there were many lexical items having 
the characteristic that, whenever they 
occurred in a simple declarative statement, 
that statement's verification required the 
execution of some particular cognitive 
(perceptual or memorial) test. That test 
would, of course, partition the lexicon into 
those items that needed it vs. those that
did not, as the individuality hypothesis 
suggests. One might go further, however, 
and argue that the need to perform this test 
and its acceptable outcome must be indicated 
explicitly in the information associated 
with those lexical items and so, in a real 
sense, it can be said to be "incorporated" 
into their meanings (Gruber, 1965). The 
goal of analysis would be to determine which 
items incorporated it, or in short, to 
decompose such items into that test plus 
anything else required for verification. 
This program falls short of complete 
decomposition in that: (1) it is a
decomposition of words into cognitive 
entities, like tests, rather than into other 
words; (2) the method of incorporation is
left unspecified, but would surely be more 
complex than taking a Cartesian product; and 
(3) there is no guarantee that decomposition 
will be complete without introducing more 
cognitive entities than there are lexical 
items to be defined, i.e., the problem of 
the residuum is unresolved. But a sort of 
limited decomposition would be possible. 
LIMITED DECOMPOSITION: Every lexical 
item incorporates several primitive lexical 
concepts, but no primitive lexical concept 
is expressed directly in a single lexical
item. Certain patterns of these primitive 
concepts recur frequently, and so give the 
impression of underlying concepts into which 
surface words could be decomposed. 
Individuality and the appearance of residual 
meanings result from the existence of unique 
lexical primitives not expressed directly by 
any single word and not entering into 
recurrent patterns. Although underlying 
concepts (patterns of primitives) would 
reflect the considerable order that has been 
repeatedly noticed in the lexicon and in 
selectional restrictions for word 
combinations, it is not obvious that there 
can be any unique solution to the 
decomposition problem (many alternative
formulations may seem equally plausible) and 
theoretical economy is highly unlikely 
(there will not be fewer lexical primitives 
than there are lexical items). 
The style of lexical analysis 
illustrated above for the English verb 
"hand" is taken as providing evidence for 
the plausibility of such an intermediate 
theoretical position. 
REFERENCES 
Bendix, E.H. Componential Analysis of
General Vocabulary: The Semantic
Structure of a Set of Verbs in English,
Hindi, and Japanese. The Hague,
Netherlands: Mouton, 1966.
Green, G.M. Semantics and Syntactic
Regularity. Bloomington: Indiana
University Press, 1974.
Gruber, J.S. Studies in lexical relations.
Unpublished doctoral dissertation,
Massachusetts Institute of Technology,
Cambridge, 1965.
Miller, G.A. English verbs of motion: A
case study in semantics and lexical
memory. In A.W. Melton & E. Martin
(eds.), Coding Processes in Human
Memory. Washington: Winston, 1972.
Miller, G.A., & Johnson-Laird, P.N.
Perception and Language. Cambridge:
Harvard University Press (in press).
Rescher, N. & Urquhart, A. Temporal Logic.
New York: Springer-Verlag, 1971.
