ON THE INTERPRETATION 
OF NATURAL LANGUAGE INSTRUCTIONS 
Bm'bara Di Eugenio Michael White 
Department of Computer and lnfommtion Science 
University of Pennsylvania 
Philadelphia, PA, USA 
{dieugeni, mwhite}@linc.cis.upenn.edu 
Abstract 
In this paper, we dLscuss the approach we take to the interpretation 
of instructions. Instructions describe actions related to each other 
and to other goals the agent may have; our claim is that the agent 
must actively compute the actions that s/he has to perfomt, not 
simply "extract" their descriptions from the input. 
We will start by discussing some inferences that are necessary 
m understand instructions, and we will draw some conclusions 
about action representation formalisms and inference processes. 
We will discuss our approach, which includes an action represan- 
tation formalism based on Conceptual Structures \[Jac90\], and the 
construction of the structure of the agent's intentions. We will 
conclude with an example that shows why such representations 
help us in analyzing instructions. 
1 Making sense of instructions 
Consider the following three instructions: 
(la) Go into the other room to get the urn of coffee. 
(tb) Before you pick it up, be sure to unplug it. 
(lc) When you bring it back here, carry it carefully with 
both hands. 
Let's consider (la). To understand this instructiou, an 
agent must find the connection between ttle two actions 
a--go into the other room, and \[3--get the urn of coffee. 
The infinitival to alerts the agent to the fact that a con- 
tributes to achieving /3. General knowledge about phys- 
ically getting objects requires that the agent move to the 
place where the object is located; therefore, the agent will 
infer that the (most direct) connection between these ac- 
titms has go into the other room fulfilling this requiremenL 
However, this is not enough. An assumption needs to be 
made for such connection to go through, namely, that the 
urn is in the other room. 
This example shows that to make sense of instructions, 
an agent must engage in the active computation of the 
action(s) to be executed, and cannot simply "extract" all 
such information from the input. This differentiates our 
work from others', as we will discuss shortly. 
Another important point that arises from (la) is that the 
relation contributes holding between c~, described in the 
matrix clause, and t, described in the purpose clause 1, 
can be specilied either as generation or enablement, as 
a study of naturally occurring purpose clauses \[Di 92,'1\] 
shows. 
Generation was introduced by \[Gol70\]. Informally, if ac- 
tion ,~ generates action t, we can say that fl is exe- 
cuted hy executing c~. An exmnple is Turning on the 
light hy flipping the switch. 
Enablement. Following \[Po186\] and \[Bat901, action cr en- 
ables action fl if and only if an occurrence of c, brings 
about conditions necessary for the subsequent perfor- 
mance of ft. In Unscrew the protective plate to expose 
the box, "unscrew the protective plate" enables "tak- 
ing the plate off" which generates "exposing the box". 
In \[Po186\], it is shown that these two relations are nec- 
essary to model action descriptions conveyed by Natural 
Language. We would like to add one further observation: 
such relations allow us to draw conclusions about action 
execution too. Tbis is quite useful since we do have to 
execute (it., animate) the input iustractions, as our work is 
taking place in the context of the Animation from Natural 
Language (AnimNL) project at the University of Pennsyl- 
vania \[WBD*91I. 
As far as generation is concerned, while two actions arc 
described, only a, the generator, needs to be performed; 
instead, if c~ etmbles t, after executing ~r, fl still needs to 
be executed. In fact, if cx enables t, cr bas to begin, but 
not necessarily end, before/3. 
I We am using the term purpose clauses to informally designate rob- 
ordinate clausel -- such as those introduced by to -- that express the 
igenI'l pmpog in executing the action delcdbed in the matrix clause. 
The usage of the term purpose clause in the lyntactic literature ii sOme- 
what different--~ee \[JonS5l. 
ACI'ES DE COLING-92, NANTES. 23-28 Ao~r 1992 l I 4 7 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
In both eases, the goal/3 also constrains the interpreta- 
tion and / or execution of c~. An example of this as regards 
generation is 
(2) Cut the square in half to create two triangles. 
The only action to be performed is cut the square in half. 
However, there is an infinite number of ways to cut a 
square in half: the goal create two triangles restricts the 
choice to cut the square along one of the two diagonals. 
We turn next to the second instrnctiQn (lb), Observe that 
the agent understands pick up to be part of the sequence 
that achieves get the urn of coffee. This is not warranted by 
the preposition before: if (lb) were Before you ruin it, be 
sure it's unplugged, the agent clearly shouldn't infer that 
ruin it is part of getting the urn! This shows that in before 
c~, /3, the action e~ is not necessarily part of achieving a 
certain goal, even if/~ is, 
As far as (lc) goes, the agent has to understand that 
bring it back here is part of achieving getting the urn; that 
carry it carefully with both hands generates bring it back 
here, provided that carry it carefully with both hands is 
augmented with the destination back here. Notice that 
the action description carry it carefully with both hands is 
fairly complex, sporting two modifiers in addition to the 
traditional arguments of agent and patient, 
2 Problems and Proposed Solutions 
The following conclusions can be drawn from the obser- 
vations in the previous section: 
1. NL action descriptions are fairly complex, including 
modifiers of many different types--see also \[WD90\]. 
An action representation formalism must be able to 
deal with complex descriptions, such as carry it care- 
fully with both hands; with descriptions at different 
levels of abstraction, such as go and walk to, or such 
as cut the square in half and cut the square in half 
along the diagonal in (2). 
2. NL instructions include a wide variety of construe- 
tions, such as purpose clauses and temporal clauses. 
Instruction interpretation systems must be able to deal 
with complex imperatives and with the relations be- 
tween actions that they express. 
3. An instruction interpretation system cannot assume 
that the descriptions of the actions to be performed 
are equivalent to the logical forms computed by the 
parser: such logical forms have to be constrained in 
various ways, e.g. by computing assumptions, as in 
(la), or more specific action descriptions, as in (2) 2. 
Notice that these coustralnts derive from the interac- 
tion between the actions to be executed and the goals 
21n thi~ paper we will ~ly discuss the former type of co~st.raint com- 
petition; the latter ii diJoassed in \[Di 92b\]. 
the agent adopts. It is essential that this interaction is 
taken into account by such systems. 
Work done in the past on understanding instructions has 
generally concentrated on simple positive commands, and 
has failed to address some of the desiderata listed above: 
\[VB90\] limits the interaction between new and preexist- 
ing goals to inserting the new goals in the list of goals 
if their execution does not violate preexisting constraints, 
otherwise they am rejected. \[Cha91\] proposes a model of 
instruction interpretation which seems useful at the level 
of the basic skills an agent is endowed with, but in winch 
there is no internal structure to actions, and no distinction 
between the agent's actions and goals. \[AZC91\] instead 
does assume a rich relation between instructions and pre- 
existing goal(s). However, instructions are not continually 
integrated into the plan the agent is developing; instead 
they are used as a resource when the stored knowledge 
about plans cannot be adapted to the situation at hand. 
Turning now to our proposal, our approach to these prob- 
lems includes 
1. An action representation formalism based on Jack- 
endofrs Conceptual Structures \[Jac90\]. 
2. An action KB that contains simple plans that repre- 
sent common sense knowledge about actions. 
3. A plan graph that represents the structure of the 
agent's intentions. 
3 Action representation 
We have chosen to use Jackendoff's Conceptual Structures 
\[Jac90\] for two reasons. First, as our point of departure is 
NL, there are the obvious benefits of using a linguistically 
motivated representational theory, e.g. easing the burden 
upon the parser to produce such representations \[Whi92\]. 
Second, there is significant mileage to be gained from using 
a decompositional theory of meaning, insofar as the prim- 
itives effectively capture important generalizations. In this 
section we introduce the notation and some minor modifi- 
cations to the theory as presented in \[Jae90\]. We use Go 
into the other room as a representative example. 
In Jackendoff's theory, an entity may be of ontological 
type Thing, Place, Path, Event, State, Manner or Property. 
The conceptual structure for a room is shown in (3a) below: 
(3a) \[Thins ROOM\] 
(3b) \[Whlns l<rrCHEs\] 
Square brackets indicate an entity of type Thing meet- 
ing the enclosed featural description. Small caps indi- 
cate atoms in conceptual structure, which serve as links 
to other systems of representation; for example, the con- 
ceptual structure for a kitchen (3b) differs from that of a 
Acrl~ Dr: COLING-92, NANTES, 23-28 AOt~T 1992 l 1 4 8 Paoc:. OF COL1NG-92, NANq'ES, AUG. 23-28, 1992 
tO a body that generates a header. The annotations on the 
body specify the relations between the subactions; such re- 
latious include partial temporal ordering, enablement, and 
possibly othel's. 
From the planning tradition, we retain the notions of 
qualifiers and effects. Qualifiers are conditions tfiat make 
an action relevant: for example, unplug x is relevant only 
if x is plugged. 
Notice the importance of using a representation such as 
Jackendoff's: it helps us capture the comnlou characteris- 
tics of different actions, e.g. get and carry. Tfie seman- 
tic representation for carry would also match the generic 
move-action template, and would add to it a qualification 
suclt as 
(10) \[M~,ne~ WITH(\[Thmg HANDS\])\] 
Having such a representation is also useful for comput- 
ing qualifiers and effects in a systematic way: they can be 
precompiled from tile representation itself. For example, 
for every action including a component ?J such as 
we know tlmt after 6, j must be at 1, theretore we can 
include this in the effects of the action. Given the filrther 
restriction that j cannot be in two places at once, we may 
infer that j cannot be at l now, and thus precompnte the 
qualifier s . 
4 The plan graph 
The plan graph represents tile structure of the intentions 
that the agent adopts as a response to the instructions. It 
keeps hack of the goals the agent is pursuing, of the hier- 
archical relations between the goals and the actions whose 
execution achieves such goals, and of various relations be- 
tween the actions. It also helps interpret tile instructions 
that follow. In (t), establishing the initial goal get the urn 
of coffee provides the context in which the two following 
instructions have to be interpreted--a similar strategy is 
adopted for example by \[Kau90\]. In Fig. 2, we show the 
complete structure built after interpreting (1). 
A node in a plan graph contains the Conceptual Structure 
representation of an action, augmented with the consequent 
state achieved alter the execution of that action 9. The arcs 
represent relations between actions; among them, those 
relevant to our example are: temporal, such as precedes 
in Fig. 2; enablement; generation, and its generalization 
substep, used when ~ belongs to a sequence of more than 
one action that generates 3- 
BJickendoff suggests something antlogous with his inference rules, which have yet to be form~lizea. 
°In Fig. 2 the libels on the nodes tire only mnemonics, tad do not 
represent their ~eal contents. 
AI: album. IN(Iotlv~r-,x~oml)) 
A2: B E(urn, pluggvd-in ) 
~~~n~ "tlOaP'wlth~n 
E w3 
Figure 2: The plan graph. 
There may also be assumptions associated with a plan 
graph. If an assumption is derived from the quMifiers as- 
sociated with an action, it is associated with the node de- 
scribing that action--A2 in Fig. 2; if it is derived while 
inferring a rehdion between two actions, it is associated 
with the corresponding arenA1. 
The plan graph is built by an interpretation algorithm 
that takes as its input the logical form constructed by the 
p,'wser. The algorithm works by keeping track of the ac- 
tive nodes, which include the goal currently in focus, and 
the nodes just added to the tree. The topmost level of the 
algorithm invokes different procedure.s, according to the 
particular syntactic construction at hand - e.g. the con- 
struction Do c~ to do/3 will trigger the hypothesis that 
either generates or enables fl \[Di 92b\]. These procedures 
retrieve the plan(s) associated with the goal currently in 
focus, and then expand such plans in a hierarchical fash- 
iou. 
These procedures embody various inference processes, 
that can be characterized either as planning--e.g, plan ex- 
pausion, subgoaling-- or as plan inference---e.g, inferring 
assumptions, inferring the more abstract goal some actions 
are supposed to achieve. Space doesn't allow as to go into 
further details about the algorithm or the inference pro- 
eesses; rather, in the next section we will give an example 
of how assumptions are computed. 
5 Making an Assumption 
We will now show how the assumption that the urn is to 
be found in the other room is made while processing (la), 
Go into the other room to get the urn of coffee. 
The process begins with the following representation 
constructed by the parser, where the FOR-function (de- 
rived from the to-phrase) encodes the contributes relation 
holding between the go-action ~, and the get-action B: 
ACRES DECOL1NG-92, NANTES, 23-28 AOt/r 1992 1 1 4 9 PROC. OF COLING-92, NANTES, AUG. 23-28. 1992 
room only in its choice of constant, leaving the determi- 
nation of their similarities and differences to a system of 
representation better suited to the task 3. 
To distinguish instances of a type, we follow \[ZV91\] in 
requiring every conceptual stnlcture to have an index: 
(4) \[Thing ROOM\]I 
Conceptual structures may also contain complex features 
generated by conceptual functions over other conceptual 
structures. For example, the conceptual function IN: Thing 
Place may be used to represent the location in the room 
as shown in (5a) below. Likewise, the traction TO: Place 
Path describes a path that ends in the specified place, 
as shown in (5b) -- (5c) is an equivalent representation of 
(5b), where the index 1 stands for the entire constituent4: 
(5a) \[vl~e IN(\[Thing ROOM\]k)\]/ 
(5b) \[P~.th TO(\[pl~ IN(\[Thins ltOOM\]k)\]l)\]n. 
(5c) \[Path TO(I)\]., 
To complete our clause 5, it remains only to add the con- 
ceptual function GO: Thing × Path ~ Event: 
(6) \[Event GO(\[whiag \]i. m)\] 
\[Path TO(\[IN(\[O THER-RO O M\])\])\]m 
AS there is no subject in our clause, the constituent i (prag- 
matically, the AGENT) in (6) is left unspecified. 
To distinguish Walk into the other room from (6), we 
include an indication of manncr~: 
\[ GO(i. m) 
\[Mmmer WALKING\] \] (7) L 
Finally, semantic fields, such as Spatial and Posses- 
sional, are intended to capture the similarities between sen- 
tencas like Jack went into the other room and 7"he gift went 
to Bill, as shown in (8) below: 
(Sa) \[GOso(\[JACK\], \[TO(\[IN(\[OTIIER=ROOM\])\])\])\] 
(8b) \[GOpo~s(\[GWT\]. \[TO(fAT(\[BILL\])\])\])\] 
The idea is that verbs like go leave the semantic field un- 
derspecified, whereas verbs like donate specify a particular 
field. In addition to these semantic fields, we propose to 
add a new one called Control. It is intended to represent 
the functional notion of having control over some object. 
For example, in sports, the meanings of having the ball. 
keeping the ball. and getting the ball embody this notion. 
and are clearly quite distinct from their Spatial and Pos- 
sessional counterparts; (9) represents Jack got the ball: 
(9) \[GOCtrI(\[BALL\], \[TO(fAT(\[JACK\])\])\])\] 
3In our c.as*, th~ action representation formalism is grounded in the 
animation system serving as the back~nd to the AnimNL project. 
4We win often adopt the tcpte.tentation in (5c), and leave out indices 
and ot~tological types, in order to hi~en the typographical berden of 
rep~seafing large c.~ccpmal stmcturct 
Slgnodng, of course, the meaning of other for now. 
e Though tiff s is clearly intended. Jackendoff never explicitly ~presc~nt~ 
such a distincfiota. 
Header 
\[CAUSE(fAoE.~\],, \[~Os,(j, ~)\])\] 
FROM(fAT(j)\]) 
TO(I) \] 
Body 
- \[GOsp(\[i, lYfO(\[AT(j)l)l)\]-fi 
- \[CAUSE~i, \[GOct~l(j, I'IO(\[AT(i)\])\])\])1-~ 
\[ GOs,(i./~) 
\[WITII(j)\] \] "~3 
- Annotations - 
- "71 enables "7~ enables "/3 
Qualifiers 
- \[NOT BEsp(j, 1)\] 
Effects 
- \[nEsp(J, l)\] 
Figure 1: A Move Something Somewhere Action. 
3.1 The action KB 
The action KB contains simple plans that represent com- 
mon sense knowledge about actions, and whose compo- 
nents are expressed in terms of Jacketldofffs semantic prim- 
itives. To discuss the characteristics of these plans, we will 
refer to the move-action KB entry shown in Fig. 1, which 
might be described as follows: go to wherej is, get control 
over it, then take it to 17. 
Actions have a header and a body. This terminology is 
reminiscent of planning operators; however we express the 
relations between these components in terms of enablement 
and generation---e.g, the body generates its header. 
The representation does not employ preconditions, be- 
cause it is very difficult to draw the line between what is 
a precondition and what is part of the body of an action. 
One could say that having control over the object to be 
moved is a precondition for a move-action. However, if 
the object is heavy, the agent will start exerting lorce to 
lift it, and then carry it to the other location. It is not obvi- 
ous whether the lifting action is still part of achieving the 
precondition, or already part of the body. Therefore, we 
don't have preconditions, but only actions which are sub- 
steps in executing another action, that is, they may belong 
ZThis do-it-younelf method is bet one way to move something front 
where it is to somewhere else. Other methods would be listed separately 
in the aclion KB. 
ACTES DE COLING-92, NAN'll/.S, 23-28 AOr.~T 1992 1 l 5 0 PREC. OF COLING-92, NANTES, AUG. 23-28, 1992 
\[ (:~Osp(\[AGFN'I'\]I , \[TO(\[IN(\[OTIIER-ItOOM\])\])\]) \], 
\[CAUSE(i, \[GOsp(\[URN-OF-COI~'I.'EI"\]j, k)\])\]fl 
FROM(\[A'I'(j)\]) 
TOll) \] k 
Given the presence of the to phrase, we know that a 
may be part of a sequence of actions that generate ft. To 
pursue this hypothesis, we begin by looking up fl in tile 
action KB. /3 matches the general move-action shown in 
Fig. 1 if the object to be moved j is bound to the urn of 
coffee: 
j \[URN-OF-COI~'FEE\] 
Next we try to match r, with some stthaction 7 of ft. a 
matches the iirst action 71 in /3 if we take tAT(j)\] and 
\[\[N(\[OTIIER-ROOM\])\] to be tile same place. This is tanta- 
mount to making the following assmnption: 
(11) \[BEsp(J, \[IN(\[oTItER-ROOM\])\])\] 
Once the instruction is understood itt this way, the two 
actions may be incorporated into the plan graph ,'ts shown 
in Fig. 2. 
One should mention that assumption (11) could of 
course be wrong, say if there were a note in the next room 
saying ha ha, it's not really in this room but the next. 
Notice that even if there is already an urn of coffee in the 
current room, the instraction Go into lhe other roortl lo get 
the urn of coffee is still understood to refer to an um in the 
other rcmm. This contrasts sharply with Go into the other 
room to wash out the urn of coffee, where the most likely 
urn is the currently visible one. In the current framework, 
this difference would be captured in the following way. 
Unlike itt the case of the get-action, the go-action matches 
tile following subaction of wash-out: 
\[GOsp(\[i, \[TO(\[AT(wA sit IN C~ -MATEttI A 1,S \]1111~. 
TIterelore, assumption (11) will not be derived, permitting 
the possibility of the urn being in the current room. 
6 Summary and Future Research 
We have presented an approach to action representation 
and instruction interpretation which we feel is more llexible 
than previously proposed formalisms: it allows us to use 
terms at different levels of specificity, and to perform the 
complex inferences that NL instructions require. 
Fnture research includes exploring how to integrate a 
hierarchical organization of entities, actions and plans with 
the action KB. 
The system is being implemented in Quintus Prolog, 
with substantial progress having been made in particular 
on the parser \[Whi92\], and on the action KB. 
Acknowledgements 
This research was supported by rite following grants: DARPA 
no. N00014-90 J°1863, ARO no. DAAL 03-894:-0031, NSF 
no. IRI 90-16592, and Ben Franklin no. 91S.3078C-1. We 
would like to thank all the members of the AoimNL group, and 
in particular Bonnie Webber, Lihby Levison and Chris Geib, for 
very stimulating and helpful discussions. 
References 
\[AZC911 Richard Altcnnan, Roland Zito-Wolf, and Tamitha 
Carpenter. Interaction, Comprehension, and ln,~truc- 
tion Usage. Technical Report CS-91-161, Brandeis 
University, 1991. 
\[Bal90\] Cecile Balkarmki. Modelling act-type relations in col- 
laborative activity. Technical Report TP,-23-90, Cen- 
tel" for Research in Computing Technology. Harvard 
University, 1990. 
\[Cha91\] l)avid Chapman. Vision, Im'truction and Action. 
Cambridge: MIT lhress, lga)l. 
\[I)i 92a\] Barbara Di Eugenio. Goals and Actions in Natural 
Language Instructior~¢. Technical Report MS4:IS- 
92-07. University of Pennsylvania, 1992. 
\[Di 92b\] Barbara Di Eugenio. Understanding Natural Lan- 
guage Instructions: the Case of Purpose Clauses. In 
Proceedings ACL 92, 1992. 
\[GolT0\] Alvin Goldman. A Theory of ttuman Action. Prince- 
ton University Press, 1970. 
\[JacgO\] Ray Jackendoff. Semantic Structures. Current Stud- 
ies in Linguistics Series. The MYI" Press, 1990. 
\[Jon85\] Charles Jones. Agent, patient, and control into pur- 
pose clauses. In Chicago LinguL¢tic Society, 21, 
1985. 
\[Kan90\] Henry Kautz. A circamscriptive theory of plan recog- 
nition. Iu J. Morgan, P. Cohen, and M. Pollack, edi- 
tors, Intentions in Communication, MIT Press, 1990. 
\[1'o186\] Martha Pollack. Inferring domain plans in question- 
at~wering, Phi) thesia, thfiversity of Pennsylvania, 
1986. 
\[VB901 Steveu Vete and Timothy Bickmore. A basic agent. 
Computational Intelligence. 6:41 60. 1990, 
\[WBD*9I\] Bonnie Webber, Norman Badler, Barbara Di Euge- 
nio, Libby Levison, and Michael White. Instruct- 
ing Animated AgentS. In Proc. US-Japan Workshop 
on lnlegrated Systems in Multi-Media Enviromnents. 
Las Cruces, NM, 1991. 
\[WD90\] Bonnie Webber and Barbara Di Eugenio. Free Ad- 
juncts in Natural Language Instructions. In Proceed- 
ings COldNG 90, pages 395~100, 1990. 
\[Whi92\] Michael White. Conceptual Structures mid CCG: 
Linking Theory mid Incorporated Argument Ad- 
junctS. 1992. COLING '92. 
\[ZV9I\] Joost ZwarLs and tlenk Verkayl. An Algebra of Con- 
ceptual Structure; an investigation into Jackendoff's 
Conceptual Semantics. 1992. To appear in Linguis- 
tics and Philosophy. 
Aortas DE COLING-92, NMCrES, 23-28 Ao(Yr 1992 1 1 5 1 \]'ROe. Ol: COLING-92, NANTES, AUG. 23-28, 1992 
