FROM COGRAM TO ALCOGRAM:
TOWARD A CONTROLLED ENGLISH GRAMMAR CHECKER
GEERT ADRIAENS [1,2]  DIRK SCHREURS [2]
[1] Siemens-Nixdorf Software Center Liège, Rue des Forges 2, 4020 Liège, Belgium
[2] University of Leuven Center for Computational Linguistics,
Maria-Theresiastraat 21, 3000 Leuven, Belgium
geert@et.kuleuven.ac.be
0. Abstract*
In this paper we describe the roots of Controlled English (CE), the analysis of several existing CE grammars, the development of a well-founded 150-rule CE grammar (COGRAM), the elaboration of an algorithmic variant (ALCOGRAM) as a basis for NLP applications, the use of ALCOGRAM in a CAI program teaching writers how to use it effectively, and the preparatory study into a Controlled English grammar and style checker within a desktop publishing (DTP) environment.
1. Introduction 
The use of controlled or simplified languages for text writing is a controversial matter, mainly because it is felt as an attack on the writer's freedom of expression. Still, we see more and more attempts to introduce control and simplification in the text writing process, mostly integrated within intelligent text processing environments and complex NLP applications such as machine translation (see section 2 for a short overview). There are at least two types of motivation that have led us and other researchers to pursue this matter with renewed interest.
First, experience with large-scale NLP applications that should be capable of handling a wide range of inputs (in our case, the METAL MT system, used for the translation of technical and administrative texts) has shown that there are limits to fine-tuning big grammars to handle semi-grammatical or otherwise badly written sentences. The degree of complexity added to an already complex NLP grammar tends to lead to a deterioration of overall translation quality and (where relevant) speed. On the other hand, simple pre-editing tools that e.g. help split up overly long sentences into shorter units (a very mild way of simplifying the input) have proved to lead to amazing improvements in output quality for the application of METAL in administrative text translation (Deprez 1991). In general, the avoidance of lexical, syntactic and stylistic ambiguities is believed to make machine translation or other NLP applications easier.
Second, there is a growing need in international industrial environments for standardization and simplification of written communication; the experience is that the language used in industrial documents such as manuals needs a thorough revision to be used efficiently by both native and (especially) non-native writers and readers. To ensure that the language of technical documents is unambiguous, well-structured, economical and easily translatable, controlled language has been thought to be the solution, be it that this solution is often proprietary to a company and hence difficult to access by the NLP research community.
* The research reported in this paper has been funded by Alcatel Bell in the period 1989-1991.
In this paper, we report on ongoing research and development of a Controlled English grammar for technical documentation (course material and systems documentation) in the area of telecommunication. We started by examining three representative controlled grammars (AECMA, Ericsson, IBM). Finding them incomplete and defective in many ways, we developed our own controlled grammar, COGRAM. Since such a paper grammar is not the most motivating of texts for technical writers to use in the writing process, we decided to restructure it in an algorithmic way (ALCOGRAM) with an eye to using it in a computer-aided language learning tool and a more ambitious grammar and style checking program. The first application is finished and currently being tested at the Alcatel-Bell company. We are also designing the checker for operation within the Interleaf DTP environment, which already offers integrated rudimentary lexical control.
But let us start by giving a short overview of the history and current application of Controlled English in the NLP research and the industrial communities.
2. The roots of Controlled English
The foundation for most of the current CE manuals was laid by the Caterpillar Tractor Company (Peoria, Illinois, USA) in the mid-1960s. This company (currently still active in the CE field) introduced Caterpillar Fundamental English (CFE), on which two significant derivatives, i.e. Smart's Plain English Program (PEP) and White's International Language for Servicing and Maintenance (ILSAM), were based. PEP gave birth to grammars used by Clark, Rockwell International, and Hyster, while ILSAM can be considered the root of grammars used by AECMA (Association Européenne des Constructeurs de Matériel Aérospatial), IBM, Rank Xerox, and Ericsson Telecommunications. Nowadays, a considerable number of variants of Controlled English can be found in many corporations. In the USA, Boeing successfully uses an elaborate Simplified English Checker (SEC) to control aircraft maintenance reports (Wojcik et al., 1990). The Xerox Corporation uses Systran and ALPS in conjunction with a Controlled English input (Kingscott, 1990). In the UK, Perkins Engines introduced Perkins Approved Clear English (PACE) to simplify their publications and to aid translation, whether carried out by conventional or computer-aided methods (Pym, 1988). At Wolfson College in Cambridge, E. Johnson developed Airspeak and Seaspeak, both restricted languages. Policespeak is currently being developed to enable fast and accurate communication with the French counterparts when the Channel Tunnel opens in 1993 (Jackson, 1990). In the Netherlands, the BSO/DLT machine-based translation project also benefits from the linguistic confines and standardization of terminology (Van der Korst, 1986). In the French TITUS system, controlled language ("Langage Documentaire Canonique") is used to improve machine translation of abstracts of technical papers on textile fabrics (Ducrot 1984).
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 595 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
CFE
 - PEP: Clark, Rockwell, Hyster
 - ILSAM: AECMA, IBM, Rank Xerox, Ericsson, Boeing SE, Perkins Engines, BSO/DLT
Fig. 1. The Controlled English heritage tree
Since the above-mentioned grammars have been adapted to the individual needs of each company, they may differ from one another to some extent. Unfortunately, we were not able to get hold of any grammar of the PEP branch. Despite this limitation, three of the above-mentioned grammars, namely AECMA, Ericsson English, and the IBM manual, were taken as the starting point from which our research and development in the domain of CE could evolve.
3. Preliminary linguistic study
Although our study of three CE grammars does not claim to be exhaustive, it does reveal the structural dissimilarities between the AECMA, Ericsson, and IBM grammars. Moreover, it underscores some of the qualities and deficiencies of each manual concerning spelling, syntax, style, and other information such as completeness and readability. Whereas the English used in all three grammars is good, the grammars differ overtly in structure. The following subsections summarize the study (Lemmens 1989: 10).
3.1 Spelling 
Spelling              AECMA   ERICSSON   IBM
word list             yes     yes        yes
new words allowed     yes     no         no
free compounding      no      no         no
spelling checker      no      no         yes
Grid 1 : Spelling
As to the lexical organization, all three manuals contain 
a controlled vocabulary list. In particular, Ericsson 
English uses a two-level lexicon : Level 1 documents 
may only contain those lexical items that are marked 1, 
whereas Level 2 documents can be edited using a more 
extended vocabulary. In the IBM word list a marginal 
"!" symbol indicates that "the word has some restriction, 
either a restriction to one meaning or a caution that the 
word is not at eighth-grade level and should only be used
with care." Other words are preceded by a marginal "X" 
indicating "a word to be avoided". 
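The two lexicon conventions just described (Ericsson's level marks and IBM's marginal "!" and "X" flags) can be pictured as annotations on each word-list entry that a checker consults. The Python sketch below is a hypothetical illustration only: the sample entries, the `Entry` fields and the `check_word` function are invented for the example and do not reproduce the Ericsson or IBM word lists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    level: int                    # Ericsson-style level: 1 = basic, 2 = extended vocabulary
    marker: Optional[str] = None  # IBM-style marginal flag: "!" = restricted, "X" = avoid


# Hypothetical sample lexicon; a real controlled word list holds thousands of entries.
LEXICON = {
    "insert": Entry(level=1),
    "press": Entry(level=1),
    "circuit": Entry(level=1),
    "transparent": Entry(level=2, marker="!"),
    "utilize": Entry(level=2, marker="X"),
}


def check_word(word: str, doc_level: int) -> str:
    """Classify one word for a document written at the given level (1 or 2)."""
    entry = LEXICON.get(word.lower())
    if entry is None:
        return "not listed"        # needs a definition or an explicit permission
    if entry.level > doc_level:
        return "level too high"    # e.g. a Level 2 word in a Level 1 document
    if entry.marker == "X":
        return "avoid"
    if entry.marker == "!":
        return "restricted"        # allowed, but only with care or in one meaning
    return "ok"
```

For example, `check_word("transparent", 1)` reports `"level too high"`, while the same word in a Level 2 document is only `"restricted"`. Checking the level before the marker reflects the assumption that a Level 1 document may never use Level 2 vocabulary, whatever its marker.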
All the words used in the three grammars must conform to the spelling used in the word lists. Ericsson English (EE) prefers British spelling, whereas AECMA consistently uses American spelling rules as prescribed in the Webster dictionary. Obviously, as they were inspired by individual heritage and international business matters, each of these companies has taken pragmatic decisions that match its internal organization.
To check lexical terminology and spelling in its 
documents, IBM supports its writers by means of three 
computer-assisted instruction programs : WORD 
CHECKER II, SPELL 370, and PROOF. 
The AECMA grammar reveals a remarkable degree of 
lexical flexibility : "Besides the words in the dictionary, 
the writer can also use those words which he decides 
belong to one of two categories : either Technical 
Names or Manufacturing Processes" (AECMA : iv). 
Nevertheless, controlled rules tell whether or not a term belongs to the field of Technical Names or Manufacturing Processes. "Inhouse preferences" can be "defined in your company's house rules, or by your editors" (AECMA : vi). In a controlled grammar, however, you cannot deliberately add new meanings to the vocabulary list or transfer words from one lexical category to another. For example, the Ericsson grammar demands that no new lexical items may be listed unless the Ericsson Standards Department gives permission to do so. Similar authority holds for the IBM DPPG
Customer and Service Information. Nevertheless, 
Ericsson describes a special procedure for using non- 
listed words : "If you need to use a new word that is 
useful only in a very specialized context, give a 
definition of the word in EE, in the document that you 
are writing. If you need to give several definitions in the 
document, make an alphabetical list of the definitions at 
the end of the document" (EE : 8). The IBM grammar 
restricts the use of new words heavily. Writers can, if 
really necessary, use X-marked words, provided they 
have been defined and even illustrated in every line where 
they might be encountered for the first time, and 
preferably in a glossary, as well. All three manuals allow noun clusters or compounds, provided the number of nouns making up the cluster does not exceed three. However, adding prefixes or suffixes to items listed in the lexicon is not allowed.
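The three-noun limit on clusters is easy to state mechanically once words carry part-of-speech tags. A minimal sketch, assuming the input has already been tagged by some other component; the tag "N" for nouns and the function name `long_noun_clusters` are illustrative assumptions, not taken from any of the three manuals.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag); in this sketch the tag "N" marks a noun


def long_noun_clusters(tokens: List[Token], max_nouns: int = 3) -> List[List[str]]:
    """Return every run of consecutive nouns that is longer than max_nouns."""
    clusters: List[List[str]] = []
    run: List[str] = []
    for word, tag in tokens:
        if tag == "N":
            run.append(word)
            continue
        if len(run) > max_nouns:   # a non-noun closes the current run
            clusters.append(run)
        run = []
    if len(run) > max_nouns:       # a cluster may also close the sentence
        clusters.append(run)
    return clusters
```

With the tagged phrase "Check the test tone circuit output level." the five-noun run `["test", "tone", "circuit", "output", "level"]` is reported, whereas "the test tone circuit" (three nouns) passes.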
3.2 Syntax 
Syntax                AECMA        ERICSSON     IBM
verb forms            restricted   restricted   restricted
subclause             nothing      limited      very little
grammar checker       no           no           no
tense distribution    nothing      nothing      nothing
linguistic basis      weak         weak         weak
descriptive           little       little       little
Grid 2 : Syntax
As to syntax control, Ericsson English states that "the two fundamental principles of writing are : the meaning must be clear; the language must be simple" (EE : 8). Ericsson, AECMA, and IBM control more or less identical grammatical units, although each company has its own way of simplifying syntax. All three grammars control verb forms, but AECMA Simplified English (SE) allows neither a gerund nor a participle. EE only allows gerunds ("EE uses -ing words ... as nouns to describe activities") and it "does not use present participles or the continuous tenses". IBM in its turn lets the present participle function either as an adjective or as a noun.
3.3 Style 
Style                 AECMA   ERICSSON   IBM
punctuation           basic   nothing    basic
sentence structure    +/-     little     little
paragraph structure   basic   nothing    nothing
Grid 3 : Style
Apart from some elementary rules of punctuation control, the EE grammar does not focus on stylistic control. AECMA Simplified English refers to some punctuation, and it discusses sentence length, paragraph length, and structure. IBM has a special Information Development Guidelines manual called "STYLE". It goes without saying that uniformity of style and layout enhances the overall quality of documents in controlled language.
3.4 Miscellaneous
Other information     AECMA   ERICSSON   IBM
check list            no      no         no
completeness          no      no         no
readability           +/-     ok         good
Grid 4 : Other information
At times, one of the three grammars proposes, besides a rule of control, valuable information which cannot be found in the other two grammars. The AECMA grammar, for example, instructs the writer how to change a passive sentence into an active one and states that no verbs should be left out to reduce the sentence length. In addition, one particular grammar sometimes does not contain a rule of control which the two others have : the Ericsson grammar does not refer to control of articles; AECMA and IBM do not take into account subordinate clauses (except for controlling the participial adverbial subclause). Still, although they focus on syntax control, all three manuals are incomplete: since EE considers but a few aspects of subordinate clause control, the grammar reveals insufficiency and incompleteness. There are no satisfactory answers to questions such as : What about gapping and elliptic structures? How about using zero-relative markers and zero-connectives? Are sentential relative clauses allowed? Can nominal relatives be used? The rules of control are also vague, as, for instance, in the EE statement "A comma divides a sentence into its natural components and makes it easier to read". What does "natural components" mean? Such ill-defined rules and vague instructions indubitably cause confusion and lead to grammatical mistakes.
3.5 Conclusion
First of all, we concluded that "the linguistic foundations of these manuals are at times very weak: oversimplifications often lead to linguistic inaccuracies; frequently linguistic structures are not covered; the instructions are at times vague and ambiguous; and often the rules disregard linguistic reality" (Lemmens 1989 : 11).
Secondly, in all three grammars there is a lack of clear distinction between descriptive and normative principles. There is no specification whether the structures to be avoided are ungrammatical or simply non-controlled. Typical of the three grammars is the normative "Do not use" meaning "Avoid". Seldom, if ever, is this phrase used to show that the writer should not use a construction because it is ungrammatical. For example, the rules for distributing "when" and "if" do not mention the incorrect use of "when" in conditional subclauses.
Moreover, sometimes descriptive information needs to be included, e.g. a list of alternative constructions in common English not to be used by the writers. Unfortunately, to guide the writing of descriptive documents, the rules set forth by the above-mentioned grammars have to be violated regularly. To write a new CE grammar, a clear distinction will be required between the rules for editing basic instructive technical documents, on the one hand, and "higher-level" descriptive documents (EE Level 1 and 2), on the other hand.
Consequently, "... it is not sufficient to construct a new grammar by just melting together the three grammars, as was mentioned earlier. The new grammar should also be linguistically well-founded, unambiguous, and, where necessary, descriptively adequate" (Lemmens 1989 : 11).
4. Organization of the COGRAM project 
Since the Controlled English grammar (COGRAM), as it will be presented in this paper, mainly consists of two components, a word list and a grammar, its development requires a two-dimensional strategy.
On the one hand, a limited lexical database is being developed. A basic word list containing 2000 terms has been constituted to be used in computer-aided language learning exercises. Recently, this list has been extended to a vocabulary package of approximately 5000 words. Moreover, another 1000 technical terms were added to make the controlled vocabulary more complete. On the other hand, the field of Controlled English has been studied to generate a selection of adequate grammar rules that pertain to multiple aspects of technical writing: lexical structures, syntactic patterns, and stylistic features.
Both the lexical database and the grammar need to be integrated into a powerful tool for writers. To ensure that an introduction of the grammar at a company will
take place without many users psychologically objecting to Controlled English, we have thought of illustrating the grammar rules by means of straight-to-the-point examples, all taken from the users' field of interest.
5. The Controlled Grammar (COGRAM)
The development of COGRAM has been partly directed by a three-fold division into a lexical, syntactic, and stylistic component. Most of the COGRAM rules can be characterized by the following three models : "Do not use X", "Use only X", and "Avoid X". At times "Do not use X"-rules are complemented with alternative suggestions. The difference between "Do not use"-rules and "Avoid"-rules is fundamental in COGRAM : "Do not use"-rules mean "You must not use", whereas "Avoid"-rules denote "Try not to use". Some crucial questions arise here : How is each type of rule related to the others? To what extent do they need to complement one another, and how?
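One way to make the distinction between the three rule models operational in software is to attach a severity to each model: a "Do not use" rule yields an error, an "Avoid" rule only a warning, and a "Use only" rule rejects any competing form outside its permitted set. The sketch below is a hypothetical encoding; the rule texts come from this paper's examples, but the `Rule` class, the severity labels and the word-level matching are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    model: str                              # "do_not_use", "avoid" or "use_only"
    targets: FrozenSet[str]                 # the words the rule is about
    allowed: FrozenSet[str] = frozenset()   # for "use_only": the permitted subset of targets


# Assumed mapping from rule model to message severity.
SEVERITY = {"do_not_use": "error", "use_only": "error", "avoid": "warning"}


def apply_rule(rule: Rule, words: List[str]) -> List[Tuple[str, str]]:
    """Return a (severity, word) pair for every word that violates the rule."""
    hits = []
    for word in (w.lower() for w in words):
        if rule.model == "use_only":
            violates = word in rule.targets and word not in rule.allowed
        else:
            violates = word in rule.targets
        if violates:
            hits.append((SEVERITY[rule.model], word))
    return hits


# Ex. 6 as a "Use only" rule: of the two competing conjunctions, only "because" is allowed.
REASON = Rule("use_only", frozenset({"because", "since"}), frozenset({"because"}))
```

For instance, `apply_rule(REASON, "Since a DBCS manages the database".split())` returns `[("error", "since")]`. Keeping the competing forms in `targets` gives the NLP side the full set of correct (+) and incorrect (-) usages, while a writer only needs to see the allowed form.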
Unfortunately, a dilemma makes an adequate solution even more complicated. On the one hand, from a pedagogic point of view it is not useful to add all non-controlled forms to complement a "Use only"-rule. All grammar rules should be kept as simple as possible. Moreover, the addition of non-controlled forms may cause confusion on the side of the users; they might be enticed to use non-controlled forms. On the other hand, in view of NLP applications, it is necessary to consider all correct (+) and incorrect (-) usages to develop a powerful grammar checker. The problems that arise in regard to the modeling of rules result from the inability to determine exactly the users' knowledge of non-controlled but correct English, and of Controlled English : What should the level of non-controlled English be before one starts mastering COGRAM?
In the following sections we will focus on each component in terms of descriptive approach, linguistic foundations, and structural organization. Each component will be illustrated by a few COGRAM examples.
5.1 COGRAM : The lexical component
To guarantee that COGRAM would systematically cover 
all major lexical categories in English, the grammatical 
division by Leech and Svartvik was taken as a starting 
point. To create the initial frame of the grammar, all ten lexical categories as described in the Communicative Grammar of English (Leech 1987 : 307) were divided into four major word classes (nouns, main verbs, adjectives, adverbs) and six minor classes (auxiliaries, pronouns, determiners, conjunctions, prepositions, and interjections). All the rules applying to these categories
were methodically brought together into the lexical 
component. 
Ex. 1 : Avoid splitting infinitives, unless the emphasis is on the adverb.
- BOM tries to accurately list all the subassemblies.
+ BOM tries to list all the subassemblies accurately.
Ex. 2 : Use short infinitives of regular action verbs.
- Make a photocopy of the CAD graph.
+ Photocopy the CAD graph.
Ex. 3 : Use "a" before a noun beginning with a consonant sound for non-specific reference.
- Store all numerical information in database program.
+ Store all numerical information in a database program.
5.2 COGRAM : The syntactic component 
Besides the lexical component, a syntactic module, which controls coordination, subordination, tense, and aspect, describes Controlled English sentence patterns. It should
be mentioned that during the development of the 
controlled syntax, two computer-assisted writing 
programs, Grammatik 4 (Reference Software 
International 1989) and Right Writer (Right Soft Inc. 
1987), were analyzed to weigh pros and cons with 
respect to controlled syntactic patterns. 
Ex. 4 : Write all instructions in a chronological order.
- Press the button on your right, after you have set the switch to the middle.
+ Set the switch to the middle. Press the button on your right.
Ex. 5 : Do not use a participle to introduce an adverbial clause.
- Using MIC manufacturing, electroplate the housings.
+ Use MIC manufacturing to electroplate the housings.
Ex. 6 : Use only because, never since, in a subclause of reason.
- Since a DBCS manages the System 12 database, physical storage is transparent to the users.
+ Because a DBCS manages the System 12 database, physical storage is transparent to the users.
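Rules such as Ex. 4 and Ex. 6 can be approximated by surface patterns that propose a rewrite for the writer to confirm. The Python sketch below is an illustration under strong assumptions, not COGRAM's mechanism: real clause detection needs a parser, the participle table is a hypothetical three-entry sample, and temporal "since" must not be changed, which is why the function only returns a suggestion instead of editing the text.

```python
import re
from typing import Optional

# Ex. 4: "<main clause>, after you have <earlier action>."  ->  earlier action first
AFTER_PATTERN = re.compile(r"^(?P<main>[^,]+), after you have (?P<prior>[^.]+)\.$")
# Ex. 6: sentence-initial "Since" + subclause + comma + main clause
SINCE_PATTERN = re.compile(r"^Since (?P<reason>[^,]+), (?P<rest>.+)$")

# Hypothetical mini-table mapping past participles to imperative forms.
PARTICIPLE_TO_BASE = {"set": "set", "pressed": "press", "inserted": "insert"}


def suggest(sentence: str) -> Optional[str]:
    """Return a suggested controlled rewrite, or None if no rule applies."""
    m = AFTER_PATTERN.match(sentence)
    if m:
        verb, _, rest = m.group("prior").partition(" ")
        base = PARTICIPLE_TO_BASE.get(verb)
        if base:
            return f"{base.capitalize()} {rest}. {m.group('main')}."
    m = SINCE_PATTERN.match(sentence)
    # Crude guard against temporal "since" ("Since 1990, ..."): skip numeric subclauses.
    if m and not m.group("reason")[0].isdigit():
        return f"Because {m.group('reason')}, {m.group('rest')}"
    return None
```

Applied to the (-) sentences of Ex. 4 and Ex. 6, the sketch reproduces the (+) versions given above; a sentence such as "Since 1990, the exchange has been digital." is left alone.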
5.3 COGRAM : The stylistic component 
The third subsection in the grammar comprises 
controlled punctuation and layout rules to organize 
textual material efficiently. Extensive study of 
Kirkman's manual on punctuation added to the insight 
into the facilities of style control as well (Kirkman 
1983). 
Ex. 7 : Use a question mark only at the end of a direct question.
+ Is the component single-sourced or multi-sourced?
Ex. 8 : Do not divide words.
Ex. 9 : Expound major topics, restrict minor topics.
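Ex. 7 and Ex. 8 are among the few stylistic rules that can be tested on raw text without any parsing. A minimal Python sketch, assuming plain-text input in which each printed line is one text line; the function name and both heuristics (a "?" not followed by a line end or a new capitalized sentence, a line ending in a letter plus hyphen with a lowercase continuation) are assumptions of this example.

```python
import re
from typing import List

# Ex. 7: a "?" must end a direct question: it may only be followed by the end of
# the line or by whitespace and a capital letter (the start of the next sentence).
MISPLACED_QMARK = re.compile(r"\?(?!\s*$|\s+[A-Z])", re.MULTILINE)
# Ex. 8: a word divided across a line break ("multi-" + newline + "sourced").
DIVIDED_WORD = re.compile(r"[A-Za-z]-\n[a-z]")


def style_warnings(text: str) -> List[str]:
    """Return warnings for Ex. 7 and Ex. 8 violations, with 1-based line numbers."""
    warnings = []
    for m in MISPLACED_QMARK.finditer(text):
        line = text.count("\n", 0, m.start()) + 1
        warnings.append(f"line {line}: question mark not at the end of a direct question (Ex. 7)")
    for m in DIVIDED_WORD.finditer(text):
        line = text.count("\n", 0, m.start()) + 1
        warnings.append(f"line {line}: word divided at line end (Ex. 8)")
    return warnings
```

A hyphen inside a line (as in "single-sourced") is not reported, because only a hyphen immediately before a line break counts as a word division in this sketch.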
6. Testing and evaluating the prototype
The prototype version of COGRAM comprised approximately 100 rules. To test the efficiency of the prototype, we analyzed a technical text sample of 450 lines (Schreurs 1989). Because of its linguistic resemblance to other technical text files, this segment might be a suitable representation of the crucial grammatical problems to be discussed.
In the Appendix, we show a short excerpt from the uncontrolled base text next to its controlled counterpart. A preliminary remark involves the semantics of the terminology. During the revision of the sample file, several incomprehensible terms and phrases had to be decoded. Since most linguists are not technical experts, an irreproachable semantic revision could not be guaranteed. This is a semantic problem, and thus beyond the scope of this lexico-syntactic analysis. Nonetheless, the English of the sample text had been revised as thoroughly as possible to test our prototype Controlled English grammar.
6.1 Summary of the sample text analysis
In the sample of 187 sentences, 452 inaccuracies were traced. This means more than two errors per sentence on average. Sixty-three percent are Controlled English mistakes, 37 % are common English errors. As to non-controlled English, the lexical component reveals a noteworthy lack of precision : 17 % of all mistakes are lexical, another 13 % covers spelling errors and incorrect abbreviations. Concerning Controlled English, 17 % of all inaccuracies pertained to punctuation: overuse of brackets and slashes, lack of clear tabular layouts and imprecise organisation of titles. In addition, the dispensable use of passive sentences that could easily be made active and the huge amount of wordiness are other major problems.
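The reported percentages can be turned back into approximate counts as a consistency check. The counts below are derived here by rounding the percentages of the 452 inaccuracies; they are not figures reported by the study itself.

```python
TOTAL_ERRORS = 452
SENTENCES = 187


def share(percent: int) -> int:
    """Approximate number of the 452 inaccuracies represented by a percentage."""
    return round(TOTAL_ERRORS * percent / 100)


errors_per_sentence = TOTAL_ERRORS / SENTENCES   # roughly 2.42

approximate_counts = {
    "Controlled English mistakes (63 %)": share(63),      # 285
    "common English errors (37 %)": share(37),            # 167
    "lexical errors (17 %)": share(17),                   # 77
    "spelling/abbreviation errors (13 %)": share(13),     # 59
    "punctuation errors (17 %)": share(17),               # 77
}

for label, count in approximate_counts.items():
    print(f"{label}: about {count}")
print(f"errors per sentence: {errors_per_sentence:.2f}")
```

The 63/37 split accounts for all 452 inaccuracies (285 + 167), and the ratio of about 2.42 errors per sentence matches the statement that there are more than two errors per sentence on average.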
6.2 Discussion
After examining the analysis of the sample text through the COGRAM prototype, we concluded that the grammar was still incomplete and not powerful enough to transform technical prose into fully controlled documents. The results, as shown above, do not reflect the linguistic contents of the document in a realistic way. Obviously, because the rules of the prototype were not explicit enough, a lot of constructions that were acceptable in Controlled English were flagged negatively. The rule "Put a period at the end of each syntactic unit", for instance, was not accurate enough. It led to flagging of all titles, headings, and subheadings, which obviously do not end with a period. Consequently, the number of punctuation mistakes should be considered with caution.
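The failure of the period rule suggests that a checker must know what kind of unit it is looking at before it applies a punctuation rule. The Python sketch below contrasts a blanket rule like the prototype's with a refined version; the unit labels ("heading", "sentence") are assumed to come from document markup or an earlier classification step, which is not part of the original description.

```python
from typing import List, Tuple

Unit = Tuple[str, str]  # (kind, text); kind is "heading" or "sentence" in this sketch


def naive_period_rule(units: List[Unit]) -> List[str]:
    """Blanket rule: every unit must end with a period."""
    return [text for _, text in units if not text.endswith(".")]


def refined_period_rule(units: List[Unit]) -> List[str]:
    """Sentences must end with '.' or '?'; headings must not end with a period."""
    flagged = []
    for kind, text in units:
        if kind == "sentence" and not text.endswith((".", "?")):
            flagged.append(text)
        elif kind == "heading" and text.endswith("."):
            flagged.append(text)
    return flagged
```

On the Appendix heading "Automatic test circuits" the naive rule raises a false alarm, while the refined rule accepts it and instead flags a heading that does end with a period.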
In general, this test exercise led to better controlled definitions of technical terms (the lexical component), and to more efficient, clearer and well-illustrated rules (the syntactic component).
7. An algorithmic controlled grammar
After a number of updated versions, the invention and classification of 150 grammatical rules (COGRAM 1.0) could function as a solid infrastructure from which a new stage in the development toward a grammar and style checker can emerge: the organization of an algorithmic controlled grammar (ALCOGRAM). The question to be answered in regard to the logical organization of the new grammar is two-fold. First, can we keep the three-fold division (lexical, syntactic, stylistic) unchanged when storing 150 rules of control in one algorithm? Secondly, how much will an algorithmic structure affect the adequate interaction among the components? To find a suitable solution to the above-mentioned questions, the following paragraphs will deal with the internal structure of the ALCOGRAM modules.
7.1 ALCOGRAM : Algorithmic Controlled Grammar of English
With an eye to NLP applications of COGRAM (being just a linear list of carefully designed rules), a different organization of the rules had to be developed. ALCOGRAM is not a mere blend of conventional controlled grammar rules; it is an algorithmically organized grammar that consists of four modules, each covering particular aspects of the process of controlled writing. Through this division, ALCOGRAM not only operates at the word or sentence level, but also takes into consideration the textual organization of technical documents; guided brainstorming rules should be regarded as an initial textual infrastructure gradually evolving toward controlled text format standards.
The four-block structure of ALCOGRAM constitutes the core of controlled writing, ranging from "conciseness" over "extra-textuality" to "layout and punctuation". In other words, each level in the grammar covers several ideas typical of controlled writing, which, in their turn, are represented by a number of lexical, extra-textual, and style rules.
7.1.1 Preparatory Textual Control Algorithm (PTCA)
Careful text control implies specification of the initial stage, from which the limited and exactly defined steps have to be taken. This starting point is to be situated within a preparatory phase, i.e. before the actual text is written. When a technical writer wants to write a text, guided brainstorming would be the solution to avoid superficiality from the initial point in the process of writing. This segment of the algorithm is labeled Preparatory Textual Control Algorithm (PTCA). The PTCA may entail introductory control, control through adequacy, writing control, paragraph control, and example control. In addition, it generates a textual frame in which the syntactic component can operate adequately.
Ex. 10 : Define technical terms and acronyms in advance. Provide separate lists of them in appendices.
7.1.2 Syntactic Control Algorithm (SCA)
The Syntactic Control Algorithm (SCA) controls, at a second stage, syntax in terms of sentence length, coordination and subordination, tense and aspect. A variety of syntactic units, i.e. titles and headings, statements, and direct and indirect questions, are prepared for lexical control.
Ex. 11 : Write one instruction per sentence for single actions.
+ Insert the disk. Enter your password.
7.1.3 Lexical Control Algorithm (LCA)
At the third stage, the Lexical Control Algorithm (LCA) operates on all major and minor classes: noun control, verb control, adjective control, adverb control, auxiliary control, pronoun control, conjunction control, preposition control, and interjection control. The output of the LCA is a controlled lexico-syntactic unit.
Ex. 12 : Avoid gender-specific language. Use a more neutral term.
- For information, contact our local salesman or saleswoman.
+ For information, contact our local sales manager.
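Ex. 12 amounts to a lexical substitution table consulted by the LCA. The sketch below assumes such a table; apart from the pair taken from Ex. 12, the entries are hypothetical, and the matching is deliberately simple (lowercase, whole words, longest entry first so that a coordinated pair is replaced before its parts).

```python
import re

# Hypothetical substitution table: gender-specific expression -> neutral term.
NEUTRAL = {
    "salesman or saleswoman": "sales manager",   # pair from Ex. 12
    "salesman": "salesperson",                    # assumed entry
    "saleswoman": "salesperson",                  # assumed entry
}

# Longest keys first, so "salesman or saleswoman" wins over "salesman".
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(NEUTRAL, key=len, reverse=True)) + r")\b"
)


def neutralize(sentence: str) -> str:
    """Replace every listed gender-specific expression by its neutral term."""
    return PATTERN.sub(lambda m: NEUTRAL[m.group(1)], sentence)
```

Applied to the (-) sentence of Ex. 12, `neutralize` yields the (+) sentence. Capitalized occurrences are not handled in this sketch.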
7.1.4 Micro Control Algorithm (MCA) 
Stage four aims at controlling particular microfeatures of the lexico-syntactic unit. The Micro Control Algorithm (MCA) includes, among others, numeric control, reference control, series control, omission control, crucial term control, and expression control.
Ex. 13 : Use words for a number when it is the first word in the sentence.
+ [...] engineers developed a new high-quality expert system.
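Numeric control of the kind shown in Ex. 13 needs very little machinery: detect a sentence that opens with digits and offer the number in words. The sketch assumes sentences are already segmented and only spells out numbers below 100; the helper names are hypothetical, and the number 25 in the usage example is invented for illustration.

```python
import re
from typing import Optional

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

LEADING_NUMBER = re.compile(r"^(\d+)\b")


def number_to_words(n: int) -> str:
    """Spell out 0..99 in English words, e.g. 25 -> 'twenty-five'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def fix_leading_number(sentence: str) -> Optional[str]:
    """Return a rewrite if the sentence starts with a number below 100, else None."""
    m = LEADING_NUMBER.match(sentence)
    if not m or int(m.group(1)) >= 100:
        return None
    return number_to_words(int(m.group(1))).capitalize() + sentence[m.end():]
```

For example, "25 engineers developed a new high-quality expert system." becomes "Twenty-five engineers developed a new high-quality expert system."; larger numbers are left for the writer to rephrase.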
7.2 ALCOGRAM : General algorithmic structure
In comparison to the paper grammar and its derivatives, 
the three-block structure could not be kept unchanged : 
the stylistic component is not a separate unit in the 
algorithmic grammar; control of punctuation and style 
has been accurately merged into the textual, syntactic, 
lexical, and micro control subdivisions. Moreover, the 
answer to our second question can thus be formulated : 
the link between the PTCA, SCA, LCA, and MCA is 
definitely more compact, even more structured, and, as to 
the integration of the stylistic component into the 
algorithmic frame, more functional. 
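The fixed order of the four modules, each receiving the result of the previous one, can be pictured as a simple pipeline. In the Python sketch below every stage is a stand-in with a single illustrative rule (a glossary check for acronyms in the PTCA, after Ex. 10; an assumed 20-word sentence limit in the SCA; a one-entry avoid list in the LCA; a leading-digit check in the MCA, after Ex. 13). The stages only collect messages, and all names are hypothetical rather than ALCOGRAM's actual modules.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Draft:
    sentences: List[str]
    glossary: Set[str] = field(default_factory=set)    # acronyms defined in advance (cf. Ex. 10)
    messages: List[str] = field(default_factory=list)  # findings collected by the stages


def ptca(d: Draft) -> Draft:
    """Preparatory textual control (stand-in): acronyms must be in the glossary."""
    for s in d.sentences:
        for acronym in re.findall(r"\b[A-Z]{2,}\b", s):
            if acronym not in d.glossary:
                d.messages.append(f"PTCA: define '{acronym}' in advance")
    return d


def sca(d: Draft, max_words: int = 20) -> Draft:
    """Syntactic control (stand-in): sentence length; the 20-word limit is assumed."""
    for s in d.sentences:
        if len(s.split()) > max_words:
            d.messages.append(f"SCA: sentence too long ({len(s.split())} words)")
    return d


AVOID = {"utilize": "use"}  # hypothetical one-entry lexical rule


def lca(d: Draft) -> Draft:
    """Lexical control (stand-in): flag words from the avoid list."""
    for s in d.sentences:
        for w in re.findall(r"[a-z]+", s.lower()):
            if w in AVOID:
                d.messages.append(f"LCA: use '{AVOID[w]}' instead of '{w}'")
    return d


def mca(d: Draft) -> Draft:
    """Micro control (stand-in): numeric control as in Ex. 13."""
    for s in d.sentences:
        if s[:1].isdigit():
            d.messages.append("MCA: write the leading number in words")
    return d


# Fixed order PTCA -> SCA -> LCA -> MCA; each stage receives the previous stage's result.
PIPELINE: List[Callable[[Draft], Draft]] = [ptca, sca, lca, mca]


def control(d: Draft) -> List[str]:
    for stage in PIPELINE:
        d = stage(d)
    return d.messages
```

Because punctuation and style checks are folded into these four stages rather than forming a separate fifth module, the sketch mirrors the merged structure described above.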
7.3 Flow-chart example of ALCOGRAM
The following algorithmic sample has been taken from 
the SCA. This part of ALCOGRAM controls adverbial 
subclauses. If the users answer the questions generated 
by the algorithm correctly, they will be given 
suggestions on how to control their adverbial subclause.
What type of subordinate clause?
   1. adverbial   2. relative   3. nominal
-> (1) What type of adverbial subclause?
   1. time   2. purpose   3. condition   4. reason   5. concession   6. result   7. place
-> (3) What kind of condition?
   1. positive   2. negative
-> (2) Use unless
Fig. 4 Algorithmic grammar flow-chart
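The question-and-answer walk through Fig. 4 maps naturally onto a nested tree of questions. The Python sketch encodes only the branch visible in the figure (adverbial, then condition, then negative, leading to the advice "Use unless"); the data layout, the `advise` function and the placeholder returned for branches the figure does not show are assumptions.

```python
from typing import Any, Dict, List

# A node is either advice (a string) or a question with named options.
TREE: Dict[str, Any] = {
    "question": "What type of subordinate clause?",
    "options": {
        "adverbial": {
            "question": "What type of adverbial subclause?",
            "options": {
                "condition": {
                    "question": "What kind of condition?",
                    "options": {"negative": "Use unless"},
                },
            },
        },
    },
}

NOT_SHOWN = "(branch not shown in Fig. 4)"


def advise(answers: List[str], node: Any = TREE) -> str:
    """Follow the writer's answers down the tree; return advice or the next question."""
    for answer in answers:
        if isinstance(node, str):
            break                          # advice reached; ignore surplus answers
        node = node["options"].get(answer, NOT_SHOWN)
    return node if isinstance(node, str) else f"Next question: {node['question']}"
```

A CALL front end would call `advise` after each answer: `advise(["adverbial"])` returns the next question, and `advise(["adverbial", "condition", "negative"])` returns "Use unless".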
7.4 ALCOGRAM & NLP applications: present and future
7.4.1 Computer-aided Language Learning (CALL)
Once the controlled grammar (COGRAM) has been structured according to strict algorithmic principles (ALCOGRAM), the notion of applying a computer in the process of technical writing (CAI) is obviously close at hand. Consequently, a three-level (beginner, intermediate, expert) computer program has been developed that guides the writer through the algorithm by asking questions and giving suggestions on how to control a specific item. The user can also retrieve information about linguistic terminology from the database by means of a pop-up window. The entire algorithm (25 files, 2.5 Mb, which may run in MS-WINDOWS's Enhanced Mode) has been programmed and compiled in CLIPPER, and linked by PLINK86 for IBM-compatible 386 SX Personal Computers. It is
currently being tested at the Alcatel-Bell company in 
Belgium, to assess both its completeness and usefulness 
as well as its degree of acceptance by technical writers. 
7.4.2 Grammar/style checking 
After the assessment period of the Controlled Grammar 
via the CALL application, the next more ambitious step 
will be the development of an intelligent grammar and 
style checking program for Controlled Language. We 
are currently designing the ALCOGRAM checker in 
such a way that it can be fully integrated with the 
Interleaf DTP environment (which already contains a Lisp-based rudimentary lexical control component). It should be able to transform non-controlled lexico-syntactic units into controlled ones without substantially affecting the semantic content of the units (cf. Wojcik et al., 1990). Since the development of
parsers and grammars for NLP applications is a costly 
enterprise, we will be looking at the potential 
integration of the METAL MT grammar for English 
into our checker. Experiments in style checking of 
German and Spanish using the METAL analysis 
grammars and the FrameMaker DTP environment in the 
context of the Translator's Workbench ESPRIT project 
(Thurmair 1990a/b) have yielded promising results 
which we might use as a starting point. 
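To make the idea of lexical control concrete, here is a minimal Python sketch of a checker that replaces non-controlled phrases and flags overly long sentences. It is not the ALCOGRAM checker or the Interleaf component: the rule table `LEXICAL_RULES` (modelled on phrase pairs from the Appendix samples), the `check_sentence` function and the 20-word limit `MAX_WORDS` are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule table: non-controlled phrase -> controlled replacement.
# Both pairs are modelled on the Appendix samples; a real checker would rely
# on a lexicon and a parser rather than on plain string patterns.
LEXICAL_RULES: Dict[str, str] = {
    "have to be tested": "need a test",
    "connect to": "reach",
}

MAX_WORDS = 20  # assumed sentence-length limit, not taken from ALCOGRAM


def check_sentence(sentence: str) -> Tuple[str, List[str]]:
    """Apply the lexical rules to one sentence and return the rewritten
    sentence together with messages for the writer."""
    messages: List[str] = []
    rewritten = sentence
    for phrase, replacement in LEXICAL_RULES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, rewritten, flags=re.IGNORECASE):
            messages.append(f"replace '{phrase}' with '{replacement}'")
            rewritten = re.sub(pattern, replacement, rewritten, flags=re.IGNORECASE)
    n_words = len(rewritten.split())
    if n_words > MAX_WORDS:
        messages.append(f"sentence has {n_words} words; consider splitting it")
    return rewritten, messages


text, notes = check_sentence(
    "When many circuits have to be tested the use of automatic "
    "test circuits is recommended."
)
print(text)
print(notes)
```

Plain string patterns cannot check that a rewrite preserves meaning, which is one reason a full analysis grammar such as METAL's is attractive as the checker's back end.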
Proc. of COLING-92, Nantes, Aug. 23-28, 1992, page 600
Appendix 
Non-controlled input sample 
Automatic test circuits 
Special test tone circuits are often foreseen. When the 
test circuit is called, a test tone with the proper transmit 
level is returned. When many circuits have to be tested 
the use of automatic test circuits is recommended. They 
can dial the preset number to connect to the special test 
tone circuit in the distant exchange, and test each circuit 
for noise, transmission level, signalling, and answer 
supervision. The faulty circuits can be printed out, or 
alarm can be given to the technician. The test can be 
made not only from exchange to exchange, but also 
through tandem exchanges to the terminating exchange. 
The automatic test circuit can also be used to test the LD
equipment.
Controlled output sample
Automatic test circuits 
Special test tone circuits are often foreseen. When the 
test circuit is called, a test tone with the proper 
transmit level is returned. When many circuits need a 
test, we recommend automatic test circuits. 
These circuits can : 
dial the preset number to reach the special test tone 
circuit in the distant exchange; 
test each circuit for noise, transmission level, 
signalling, and answer supervision. 
One can print the faulty circuits, or alarm the technician. 
One can do the test not only from exchange to exchange, 
but also through tandem exchanges to the terminating 
exchange. One can also use the automatic test circuit to 
test the LD equipment. 
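The two samples above show a recurring rewrite: modal passives such as "The test can be made ..." become active sentences with "One can ...". The following minimal Python sketch reproduces that single pattern with a regular expression. The `PARTICIPLES` mini-lexicon and the `to_active` function are hypothetical, and the sketch handles only the three verbs seen in the samples.

```python
import re

# Hypothetical mini-lexicon: past participle (with particle) -> base verb.
# "made" -> "do" mirrors the lexical choice in the controlled sample.
PARTICIPLES = {"printed out": "print", "made": "do", "used": "use"}

# "<Subject> can (also) be <participle> <rest>"; longest participles first
# so that a particle verb is preferred over a shorter match.
PATTERN = re.compile(
    r"^(?P<subj>[A-Z].*?) can (?P<adv>also )?be (?P<part>"
    + "|".join(re.escape(p) for p in sorted(PARTICIPLES, key=len, reverse=True))
    + r")\b(?P<rest>.*)$"
)


def to_active(sentence: str) -> str:
    """Rewrite 'X can (also) be <participle> ...' as 'One can (also) <verb> x ...'.
    Sentences that do not match are returned unchanged."""
    m = PATTERN.match(sentence)
    if m is None:
        return sentence
    subj = m.group("subj")
    # The old subject becomes the object; naive: this would also lowercase
    # a sentence-initial proper noun.
    obj = subj[0].lower() + subj[1:]
    verb = PARTICIPLES[m.group("part")]
    return f"One can {m.group('adv') or ''}{verb} {obj}{m.group('rest')}"


print(to_active("The test can be made not only from exchange to exchange, "
                "but also through tandem exchanges to the terminating exchange."))
```

Coordinated clauses such as "..., or alarm can be given to the technician" fall outside this pattern and would need a real parser.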

Bibliography 

Adriaens G. & Schreurs D. (1990) - Controlled English 
(CE) : from COGRAM to ALCOGRAM (presented at
"Computers and Writing III", Edinburgh 1990). Leuven,
Center for Computational Linguistics. 

AECMA (1988) - A Guide for the Preparation of Aircraft 
Maintenance Documentation in the Aerospace 
Maintenance Language. AECMA Simplified English, 
Paris. 

Beeken J. (1990) - CONST : Computer Instructed 
Writing Techniques (presented at "Computers and 
Writing III", Edinburgh 1990). Leuven, Department of 
Linguistics.

Deprez F. (1991) - TARZAN: pre- and post-editing tools 
for the METAL system in the administrative domain.
METAL documentation. 

Ericsson (1983) - English Writer's Guide. Stockholm, 
Ericsson Group. 

IBM (1989) - Information Development Guidelines, 
"Content", "On-line Information", "Vocabularies for 
Customers and Service Information", "Style". 

Jackson T. (1990) - Less is more, article in "Electric 
Word" #19. 

Kingscott G. (1991) - Applications of Machine 
Translation : Study for the Commission of European 
Communities, Praetorius Limited. 

Kirkman J. (1983) - Point on Punctuation for Scientific 
and Technical Writing. South Glamorgan : John 
Kirkman Communication Consultancy. 

Lemmens M. (1989) - Controlled English Project - 
Preliminary Research. Leuven, Department of 
Linguistics. 

Pym P.J. (1988) - Pre-editing and the use of simplified
writing for MT: an engineer's experience of operating 
an MT system, ASLIB. 

Pym P.J. (1990) - Simplified English and Machine 
Translation, Perkins Engines UK. 

Schreurs D. (1989a) - COGRAM, Controlled Grammar 
1.0. Leuven, Department of Linguistics. 

Schreurs D. (1989b) - Grammatical Analysis of a
DATACOM 2 sample through the Controlled English 
Grammar COGRAM. Leuven, Department of 
Linguistics. 

Schreurs D. (1990a) - ALCOGRAM, Algorithmic 
Controlled Grammar 1.0. Leuven, Center for 
Computational Linguistics. 

Schreurs D. (1990b) - Testing of ten Alcatel-Bell
Abstracts through the Computer-controlled Interactive 
Algorithmic Grammar ALCOGRAM. Leuven, Center 
for Computational Linguistics. 

Thurmair G. (1990a) - Parsing for Grammar and Style
Checking. In Proceedings of the 13th International 
Conference on Computational Linguistics (Helsinki 
1990), Volume II, 356-370. 

Thurmair G. (1990b) - Style Checking in TWB 
(Translator's Workbench). Munich : Siemens-Nixdorf. 

Van der Korst B. (1986) - A Dependency Syntax for 
English. Utrecht, BSO Research. 

Wojcik R., Hoard J. & Holzhauser K. (1990a) - On 
Creating a Practical Simplified English Checker. 
Washington : Boeing Computer Services. 

Wojcik R., Hoard J. & Holzhauser K. (1990b) - An 
automated grammar and style checker for writers of SE. 
Washington : Boeing Computer Services. 
