A Corpus Study of Negative Imperatives in 
Natural Language Instructions* 
Keith Vander Linden t 
Information Technology Research Institute 
University of Brighton 
Brighton BN2 4AT, UK 
knvl@itri.brighton.ac.uk 
Barbara Di Eugenio 
Computational Linguistics 
Carnegie Mellon University 
Pittsburgh, PA, 15213 USA 
dieugeni@andrew.cmu.edu 
Abstract 
In this paper, we define the notion of 
a preventative expression and discuss a 
corpus study of such expressions in in- 
structional text. We discuss our cod- 
ing schema, which takes into account 
both form and function features, and 
present measures of inter-coder reliabil- 
ity for those features. We then discuss 
the correlations that exist between the 
function and the form features. 
1 Introduction 
While interpreting instructions, an agent is con- 
tinually faced with a number of possible actions 
to execute, the majority of which are not appro- 
priate for the situation at hand. An instructor is 
therefore required not only to prescribe the ap- 
propriate actions to the reader, but also to pre- 
vent the reader from executing the inappropriate 
and potentially dangerous alternatives. The first 
task, which is commonly achieved by giving simple 
imperative commands and statements of purpose, 
has received considerable attention in both the 
interpretation (e.g., (Di Eugenio, 1993)) and the 
generation communities (e.g., (Vander Linden and 
Martin, 1995)). The second, achieved through the 
use of preventative expressions, has received corn 
siderably less attention. Such expressions can in- 
dicate actions that the agent should not perform, 
or manners of execution that the agent should not 
adopt. An agent may be told, for example, "Do 
not enter" or "Take care not to push too hard". 
* This work is partially supported by the Engi- 
neering and Physical Sciences Research Council (~n,- 
sac) Grant J19221 and by the Commission of the Eu- 
ropean Union Grant LRE-62009. 
t After September 1, Dr. Vander Linden's address 
will be Dept. of Mathematics and Computer Science, 
Calvin College, Grand l~apids, MI 49546, USA. 
Both of the examples just given involve negation 
("do not" and "take care not"). Although this is 
not strictly necessary for preventative expressions 
(e.g., one might say "stay out" rather than "do 
not enter"), we will focus on the use of negative 
forms in this paper. We will use the following 
categorisation of explicit preventative expressions: 
• negative imperatives proper (termed DONT 
imperatives). These are characterised by the 
negative auxiliary do not or don't. 
(1) Your sheet vinyl floor may be vinyl as- 
bestos, which is no longer on the mar- 
ket. Don't sand it or tear it up 
because this will put dangerous asbestos 
fibers into the air. 
• other preventative imperatives (termed neg- 
TC imperatives). These include take care 
and be careful followed by a negative infiniti- 
val complement, as in the following examples: 
(2) To book the strip, fold the bottom third 
or more of the strip over the middle of 
the panel, pasted sides together, taking 
care not to crease the wallpaper 
sharply at the fold. 
(3) If your plans call for replacing the wood 
base molding with vinyl cove molding, be 
careful not to damage the walls as 
you remove the wood base. 
The question of interest for us is under which 
conditions one or the other of the surface forms is 
chosen. We are currently using this information 
to drive the generation of warning messages in the 
DRAFTER system (Vander Linden and Di Eugenio, 
1996). We will start, by discussing previous work 
on negative imperatives, and by presenting an hy-. 
pothesis to be exI)lored. We will then describe 
the nature of our corpus and our coding schema, 
346 
detailing the results of our inter-coder reliability 
tests. Finally, we will describe the results of our 
analysis of the correlation between function and 
form features. 
2 Related work on Negative 
Imperatives 
While instructional text has sparked much inter- 
est in both the semantics/pragmatics community 
and the computational linguistics community, lit- 
tle work on preventative expressions, and in par- 
ticular on negative imperatives, has been done. 
This lack of interest in the two coinmunities has 
been in some sense complementary. 
In semantics and pragmatics, negation has been 
extensively studied (cf. Itorn (1989)). hnpera- 
rives, on the other hand, have not (for a notahle 
exception, see Davies (1986)). 
In computational linguistics, on the other hand, 
positive imperatives have been extensively inves- 
tigated, both from the point of view of interpre- 
tation (Vere and Bickmore, 1990; Alterman et al., 
1991; Chapman, 1991; Di Eugenio, 1993) and gen- 
eration (Mellish and Evans, 1989; McKeown et 
al., 1990; Paris et al., 1995; Vander Linden and 
Martin, 1995). Little work, however, has been (ti- 
rected at negative imt)eratives. (for exceptions see 
the work of Vere and Bickmore (1990) in interpre- 
tation and of Ansari (1995) in generation). 
3 A Priori Hypotheses 
Di Eugenio (1993) lint forward the following hy- 
pothesis concerning the realization of preventative 
expressions. In this discussion, S refers to the in- 
structor (speaker / writer) who is referred to with 
feminine pronouns, and H to the agent (hearer / 
reader), referred to with masculine t)ronouns: 
• DONT imperatives. A DONT imperative 
is used when S expects H to be aware of a cer- 
tain choice point, but to be likely to choose 
the wrong alternative among many possi- 
bly infinite ones, as in: 
(4) Dust-mop or vacuum your parquet floor 
as you would carpeting. Do not scrub 
or wet-mop the parquet. 
Here, H is aware of the choice of various clean- 
ing methods, but m W choose an inappropri- 
ate one (i.e., scrul)bing or wet-mopping). 
• Neg-TC imperatives. In general, neg-TC 
imperatives are used when S expects H to 
overlook a certain choice point; such choice 
point may be identified through a possible 
side effect that the wrong choice will cause. 
It may, for example, be used when H might 
execute an action in an undesirable way. Con- 
sider: 
(5) To make a piercing cut, first drill a hole 
in the waste stock on the interior of the 
pattern. If you want to save the waste 
stock for later use, drill the hole near a 
corner in the pattern. Be careful not 
to drill through the pattern line. 
Here, H has some choices as regards the exact 
position where to drill, so S constrains him 
by saying Be careful not to drill through the 
pattern line. 
So tile ilypothesis is that H's awareness of the 
presence of a certain choice point in executing a 
set of instructions affects the choice of one preven- 
tative expression over another. This hypothesis, 
however, was based on a small corpus and on intu- 
itions. In this paper we present a more systematic 
analysis. 
4 Corpus and coding 
Our interest is in finding correlations between fea- 
tures related to the function of a preventative ex- 
pression, and those related to the form of that ex- 
pression. Functional features are the semantic fea- 
tures of the message being expressed and the prag- 
matic features of the context of communication. 
The h)rm feature is the grammatical structure of 
the expression. In this section we will start with a 
discussion of our corpus, and then detail the func- 
tion and form features that we have coded. We 
will conclude with a discussion of the inter-coder 
reliability of our coding. 
4.1 Corpus 
The raw instructional corpus t}'om which we take 
all the examples we have coded has been collected 
opportunistically off the internet and from other 
sources. It is at)l)roximately 4 MB in size and 
is made entirely of written English instructional 
texts. The corpus includes a collection of recipes 
(1.7 MB), two comt)lete do-it-yourself nmnuals 
(RD, 1991; McGowan and R. DuBern, 1991) (1.2 
MB) l , a set of comt)utcr games instructions, the 
Sun Open-windows on-line instructions, and a col- 
lection of administrative application forms. As a 
1These do-it-yourself manuals were scanned by 
Joseph ltosenzweig. 
347 
collection, these texts are the result of a variety of 
authors working in a variety of instructional con- 
texts. 
We broke the corpus texts into expressions us- 
ing a simple sentence breaking algorithm and then 
collected the negative imperatives by probing for 
expressions that contain the grammatical forms 
we were interested in (e.g., expressions containing 
phrases such as "don't" and "take care"). The 
first row in Table 1 shows the frequency of occur- 
rence for each of the grammatical forms we probed 
for. These grammatical forms, 1175 occurrences 
in all, constitute 2.5% of the expressions in the full 
corpus. We then filtered the results of this probe 
in two ways: 
1. When the probe returned more than 100 ex- 
amples for a grammatical form, we randomly 
selected around 100 of those returned. We 
took all the examples for those forms that re- 
turned fewer than 100 examples. The number 
of examples that resulted is shown in row 2 
of Table 1 (labelled "raw sample"). 
2. We removed those examples that, although 
they contained the desired lexical string, did 
not constitute negative imperatives. This 
pruning was done when the example was not 
an imperative (e.g., "If you don't see the 
Mail Tool window ... ") and when the exam- 
ple was not negative (e.g., "Make sure to lock 
the bit tightly in the collar."). The number 
of examples which resulted is shown in row 
3 of Table 1 (labelled "final coding"). Note 
that the majority of the "make sure" exam- 
ples were removed here because they were en- 
surative. 
As shown in Table 1, the final corpus sample is 
made up of 239 examples, all of which have been 
coded for the features to be discussed in the next 
two sections. 
4.2 Form 
Because of its syntactic nature, the form feature 
coding was very robust. The possible feature val- 
ues were: DONT -- for the do not and don't 
forms discussed above; and neg-TC -- for take 
care, make sure, ensure, be careful, be sure, be 
certain expressions with negative arguments. 
4.3 Function Features 
The design of semantic/pragmatic features usu- 
ally requires a series of iterations and modifica- 
tions. We will discuss our schema, explaining the 
reasons behind our choices when necessary. We 
coded for two function features: INTENTIONAL- 
ITY and AWARENESS, which we will illustrate in 
turn using ~ to refer to the negated action. The 
conception of these features was inspired by the 
hypothesis put forward in Section 3, as we will 
briefly discuss below. 
4.3.1 Intentlonality 
This feature encodes whether the agent con- 
sciously adopts the intention of performing a. 
We settled on two values, CON(scious) and 
UNC (onscious). As the names of these values may 
be slightly misleading, we discuss them in detail 
here: 
CON is used to code situations where S expects 
H to intend to perform ~. This often happens 
when S expects H to be aware that ~ is an 
alternative to the ~ H should perform, and to 
consider them equivalent, while S knows that 
this is not the case. Consider Ex. (4) above. 
If the negative imperative Do not scrub or 
wet-mop the parquet were not included, the 
agent might have chosen to scrub or wet-mop 
because these actions may result in deeper 
cleaning, and because he was unaware of the 
bad consequences. 
UNC is perhaps a less felicitous name because 
we certainly don't mean that the agent 
may perform actions while being unconscious! 
Rather, we mean that the agent doesn't re- 
alise that there is a choice point It is used in 
two situations: when c~ is totally accidental, 
as in: 
(6) Be careful not to burn the garlic. 
In the domain of cooking, no agent would 
consciously burn the garlic. Alternatively, an 
example is coded as UNC when a has to be 
intentionally planned for, but the agent may 
not take into account a crucial feature of a, 
as in: 
(7) Don't charge - or store a tool where 
the temperature is below 40 degrees F or 
above 105 degrees. 
While clearly the agent will have to intend to 
perform charging or storing a tool, he is likely 
to overlook, at least in S's conception, that 
temperature could have a negative impact on 
the results of such actions. 
4.3.2 Awareness 
This binary feature captures whether the agent 
is AWare or UNAWare that the consequences of 
are bad. These features are detailed now: 
348 
Raw Grep 
Raw Sample 
Final Coding 
DONT Neg-TC 
II d°n~t I d° not II take care make sure be careful 
417 385 21 229 52 
100 99 21 104 52 
78 89 17 3 46 
167 72 
be sure 
71 
71 
6 
Table 1: l)istribution of negative imperatives 
UNAW is used when H is perceived to be un- 
aware that a is bad. For example, Exam- 
pie (7) ("Don't charge or store a tool 
where the temt)erature is below 40 degrees F 
oz' above 105 degrees") is coded as UNAW be- 
cause it is unlikely that tile reader will know 
about this restriction; 
AW is used when It is aware that a is bad. Ex- 
ample (6) ("Be careful not to burn the gar- 
lic") is coded as AW t)e(:ause the reader is 
well aware that burning things when cooking 
them is bad. 
4.4 Inter-coder reliability 
Each author independently coded each of the fea- 
tures for all tile examples in tile sample. The per- 
centage agreement is 76.1% for intentionality and 
92.5% for awareness. Until very recently, these 
values would most; likely have been accepted as 
a basis for fllrther analysis. To support a more 
rigorous analysis, however, wc have followed Car- 
letta's suggestion (1996) of using the K coettMcnt 
(Siegel and Castellan, 1988) as a measure of coder 
agreement. This statistic not only measures agree- 
ment, but also factors out chance agreement, and 
is used for nominal (or categorical) scales. In nom- 
inal scales, tiler(; is no relation between the differ- 
ent categories, and classification induces equiva- 
lence classes on the set of classified objects. In our 
coding schema, each feature determines a nominal 
scale on its own. Thus, we report the values of the 
K statistics for each feature we coded for. 
if P(A) is the prot)ortion of times the coders 
agree, and P(E) is the t)rot)ortion of times that 
coders are expected to agree by chance, K is com- 
puted as follows: 
K = P(A) - P(E) 
1 - P(E) 
Thus, if there is total agreement among the 
coders, K will be 1; if there is no agreement 
other than chance agreement, K will be 0. There 
are various ways of computing P(E); according 
to Siegel and Castellan (1988), most researchers 
Table 2: 
liability 
Kappa Value Reliability Level 
.00 - .20 
.21 - .40 
.41 .60 
.61 .80 
.81 1.00 
slight 
fair 
moderate 
substantial 
almost t)erfe.ct 
Tim Kappa Statistic and Inter-coder Re- 
feature K 
INTENTIONALITY 0.51 
AWARENESS 0.75 
Table 3: Kappa values for flmction features 
agree oil tile following formula, which we also 
adopted: 
: Zg 
j=l 
where m is the nulnber of categories, and pj is the 
proportiorL of t)bjccts assigned to category j. 
The mere fact that K may have a vahw. 
k greater than zero is not sufficient to draw 
any conclusion, though, as it inust be estab- 
fished whether k is significantly different fl'om 
zero. While Siegel and Castellan (1988, p.289) 
point out that it is possible to check tile sig- 
nificance of K when tile ,lumber of objects 
is large, Rietveh! and van Hout (1993) suggest a 
much simpler correlation between K values and 
inter-coder reliability, shown in Figure 2. 
For the form feature, the Kappa wfiue is 1.0, 
which is not surprising given its syntactic nature. 
The flmction features, which are more subjec- 
tive in nature, engender more disagreenmnt ainong 
coders, as shown by the K vahms in Table 3. Ac- 
cording to Rietveld and van Hout, tile awareness 
feature shows "substantial" agreement and the in- 
tentioimlity feature shows "mo(lerate" agreement. 
5 Analysis 
In our analysis, we have attempted to discover 
and to empirically verify correlations between tile 
349 
feature X 2 significance level 
intentionality 51.4 0.001 
awareness 56.9 0.001 
Table 4: X 2 statistic and significance levels 
function features and the form feature. We did 
this by computing X 2 statistics for the various 
functional features as they compared with form 
distinction between DONT and neg-TC impera- 
tives. Given that the features were all two-valued 
we were able to use the following definition of the 
statistic, taken from (Siegel and Castellan, 1988): 
= N(IAD - BCI - 
(A + B)(C 4- D)(A 4- O)(B 4- D) 
Here N is the total number of examples and A-D 
are the values of the elements of the 2x2 con- 
tingency table (see Figure 5). The X 2 statistic 
is appropriate for the correlation of two indepen- 
dent samples of nominally coded data, and this 
particular definition of it is in line with Siegel's 
recommendations for 2x2 contingency tables in 
which N > 40 (Siegel and Castellan, 1988, page 
123). Concerning the assumption of indepen- 
dence, while it is, in fact, possible that some of 
the examples may have been written by a single 
author, the corpus was written by a considerable 
number of authors. Even the larger works (e.g., 
the cookbooks and the do-it-yourself manuals) are 
collections of the work of multiple authors. We felt 
it acceptable, therefore, to view the examples as 
independent and use the X 2 statistic. 
To compute X 2 for the coded examples in our 
corpus, we collected all the examples for which 
we agreed on both of the functional features (i.e., 
intentionality and awareness). Of the 239 total 
examples, 165 met this criteria. Table 4 lists the 
X 2 statistic and its related level of significance for 
each of the features. The significance levels for in- 
tentionality and awareness indicate that the fea- 
tures do correlate with the forms. We will focus 
on these features in the remainder of this section. 
The 2x 2 contingency table from which the in- 
tentionality value was derived is shown in Ta- 
ble 5. This table shows the frequencies of exam- 
ples marked as conscious or unconscious in rela- 
tion to those marked as DONT and neg-TC. A 
strong tendency is indicated to prevent actions 
the reader is likely to consciously execute using 
the DONT form. Note that the table entry for 
conscious/neg-TC is 0, indicating that there were 
no examples marked as both CON and neg-TC. 
Similarly, the neg-TC form is more likely to be 
Conscious Unconscious Total 
DONT 61 (A) 45 (B) 106 
neg-WC 0 (C) 59 (D) 59 
Total 61 104 165 (N) 
Table 5: Contingency Table for Intentionality 
Aware Unaware ~btal 
DONT 3 103 106 
neg-TC 32 27 59 
Total 35 130 165 
Table 6: Contingency Table for Awareness 
used to prevent actions the reader is likely to ex- 
ecute unconsciously. 
In Section 3 we speculated that the hearer's 
awareness of the choice point, or more accurately, 
the writer's view of the bearer's awareness, would 
affect the appropriate form of expression of the 
preventative expression. In our coding, awareness 
was then shifted to awareness of bad consequences 
rather than of choices per se. However, the basic 
intuition that awareness plays a role in the choice 
of surface form is supported, as the contingency 
table for this feature in Table 6 shows. It indi- 
cates a strong preference ibr the use of the DONT 
form when the reader is presumed to be unaware 
of the negative consequences of the action to be 
prevented, the reverse being true for the use of the 
neg-TC form. 
The results of this analysis, therefore, demon- 
strate that the intentionality and awareness fea- 
tures do co-vary with grammatical form, and in 
particular, support a form of the hypothesis put 
forward in Section 3. 
6 Application 
We have successfully used the correlations dis- 
cussed here to support the generation of warning 
messages in the DRAFTER project (Paris and Van- 
der Linden, 1996). DRAFTER is a technical author= 
ing support tool which generates instructions for 
graphical interfaces. It allows its users to spec- 
ify a procedure to be expressed in instructional 
form, and in particular, allows them to specify ac- 
tions which must be prevented at the appropriate 
points in the procedure. At generation time, then, 
DRAFTER must be able to select the appropriate 
grammatical form for the preventative expression. 
We have used the correlations discussed in this 
paper to build the text planning rules required 
to generate negative imperatives. This is dis- 
cussed in more detail elsewhere (Vander Linden 
and Di Eugenio, 1996), but in short, we input our 
350 
coded examples to Quinlan's C4.5 learning algo- 
rithm (Quinlan, 1993), which induces a decision 
tree mapping from the functional features to the 
appropriate form. Currently, these features are 
set mammlly I)y the user as they are too ditticult 
t,o derive automatically. 
7 Conclusions 
This paper has detailed a corpus study of pre- 
ventative expressions in instructional text. The 
study highlighted correlations between flmctional 
features and grammatical form, tim sort of corre- 
lations usefld in I)oth interpretation and genera- 
tion. Studies such as this have been done before 
in Computational Linguistics, although not, to 
our knowledge, on preventative expressions. The 
point we want to emphasise here is a methodolog- 
ical one. Only recently have studies been making 
use of more rigorous statistical measures of accu- 
racy and reproducibility used here. We have found 
the Kappa statistic critical in the definition of the 
features we coded (see Section 4.4). 
We intend to augment and refine the list; of fea- 
tures discussed here and hope to use them in un- 
derstanding applical;ions as well as generation ap- 
plications. We, also intend to extend the analysis 
to ensurative ext)ressions. 
References 
Richard Alterman, Roland Zito-Wolf, and 
Tamitha Carpenter. 1991. Interaction, Com- 
prehension, and Instruction Usage. Technical 
Report CS-91-161, Dept. of Computer Science, 
Center R)r Complex Systems, Brandeis Uniw~.r - 
sity. 
Daniel Ansari. 1995. Deriving Procedural and 
Warning Instructions from Device and Envi- 
ronment Models. Master's tt,esis, University of 
Toronto. 
Jean Carletta. 1996. Assessing agreement o~, clas- 
sification tasks: the kappa statistic. Computa- 
tional Lingustics, 22(2). 
David Chapman. 1991. Vision, Instruction and 
Action. Cambridge: MIT Press. 
Eirlys Davies. 1986. The English Imperative. 
Croom Helm. 
Barbara Di Eugenio. 1993. Understanding Nat- 
ural Language Instructions: a Computational 
Approach to Purpose Ulauses. Ph.D. thesis, 
University of Pennsylvania, December. Tech- 
nical Report MS-CIS-93-91 (Also Institute for 
Research in Cognitive Science report IRCS-93- 
59,). 
Laurence Horn. 1989. A Natural History of Nega- 
tion. The University of Chicago Press. 
J. McGowan and editors R. DuBern. 1991. Home 
Repair. London: Dorlin Kingersley Ltd. 
Kathleen R. McKeown, Michael Elhadad, Yu- 
miko l%lkumoto, Jong Lira, Christine Lombardi, 
Jacques Robin, and ~'ank Smadja. 1990. Natu- 
ral language generation in COMET. In Robert 
Dale, Chris Mellish, and Michael Zoek, editors, 
Current Research in Natural Language Gene~u- 
tion, chapter 5. Academic Press. 
Chris Mellish and Roger Evans. 1989. Natural 
language generation from plans. Computational 
Linguistics, 15(4):233 249, December. 
Cdcile Paris and Keith Vander Linden. 1996. 
Drafter: An interactive support tool for writ- 
lug multilingual instructions. IEEE Computer. 
to appear. 
C6cile Paris, Keith Vander Linden, Markus 
Fischer, A~,thony Hartley, Lyn Pemberton, 
R.ichard Power, and Donia Scott. 1995. A 
support tool for writing multilingual instruc- 
tions. In Proceedings of the kburtcenth Inter- 
national Joint Conference on Artificial Intelli- 
gence, August 20 25, Montr6al, Canada, pages 
1398 1404. Also availal)le as ITRI report \[Tt{\]- 
95-11. 
J. Ross Quinlan. 1993. C/t.5: Programs for Ma- 
chine Learning. Morgan Kaufmann. 
:1991. Reader's Digest New Comt)lete Do-It- 
Yourself Manual. 
T. Rietveld and R. van Hout. 1993. Statistical 
Techniques for the Study of Language and Lan- 
guage Bchaviour. Mouton de Gruyter. 
Sidney Siegel and N. John Castellan, Jr. 1988. 
Nonparametric statistics fl)r the behavioral sci- 
ences. McGraw Hill. 
Keith Vander Linden and Barbara Di Eugenio. 
1996. Learning micro-planning rules tbr pre- 
ventative expressions. In Proceedings of the 
Eighth International Workshop on Natural Lan- 
guage Generation, Herstmonceux, England, 13-- 
15 June 1996, June. To appear. 
Keith Vander Linden and James Martin. 1995. 
Expressing Local Rhetorical Relations in hl- 
structional Text. Computational Linguistics, 
21(1):29 57. 
Stevcn Vere and Timothy Bickmore. 1990. A Ba- 
sic Agent. Computational Intelligence, 6:41 60. 
351 
