A computer model for Russian grammatical description, and a
method of English synthesis in machine translation
D.M. Yates (National Physical Laboratory, Teddington)
Introduction
This paper is the second of two from the NPL MT group at
this conference. It describes a model designed to express the
grammatical facts discovered by the Russian analysis algorithm
in such a way that they can be used directly by the English
synthesis algorithm. The general nature of this synthesis
process is the subject of the second part of the paper.
The model: linguistic features
Russian and English have many important categories in common.
For instance, both have subjects, verbs, objects, nominal groups,
conjunctions, clauses and so on. When it comes to finer details,
though, the differences between the two languages become more
noticeable than the similarities: the use of auxiliary verbs
to represent tenses, for instance, is quite different (e.g. did
not ask = не спросил).
The basic task of this model is to provide a means of
representing in the computer any Russian grammatical structure
which the analysis algorithm may have to express. As far as
possible this representation must be independent of the particular
conventions of either language. For example не спросил would
not be ascribed any internal structure, but would be represented
as "спрос-/ask, negative, past tense". The analysis would
discover these facts, concerning itself only with Russian
conventions, and the synthesis would express them in English,
concerning itself only with the English conventions. "Negative" and
"past tense" are examples of choices within closed sets of possibilities.
Such sets are known as systems. Our model therefore has two main
linguistic features, structure and system, which will both be
needed to describe a Russian sentence. This terminology is taken
from the work of Halliday (1961).
The structure is fundamentally a hierarchy of constituents,
but there are four ways in which it differs from a conventional
constituent structure:
(1) Each constituent may exemplify choices in systems, and, as
illustrated above, this means that some units in the text
(e.g. particles and auxiliaries) are not given places in
the structure.
(2) One item may occupy more than one place in the structure.
The only need for this in scientific Russian seems to be
the dual role of a relative word in linking a subordinate
clause to some higher constituent and at the same time
filling some role within its structure.
(3) There is no requirement for a constituent to be continuous
in the text (although those found by the current analysis
algorithm always are).
(4) If the systems are powerful enough there is no need for
explicit ordering of subconstituents. This point will
be taken up again later.
The model: computing features
A grammatical structure of word-groups is represented in
the computer by a list structure, that is to say a collection
of stored items called elements with the property that each
element either (i) contains addresses of one or more other
elements, or (ii) is marked as a terminal element. The terminal
elements represent single items (words or idioms); the other
elements each represent a larger word-group or constituent of the
sentence. If an element A contains the address of an element B,
this represents the fact that word-group A includes word-group B.
For example, the description of the structure of the group
наиболее простая доменная структура includes four terminal
elements (for the four words) and two other elements, linked as
follows:
    наиболее  ---+
                 +--- AG ---+
    простая   ---+          |
    доменная  --------------+--- NG
    структура --------------+
(AG = adjectival group,
NG = nominal group)
Each element is labelled with a code giving the constituent
type (noun, verbal group, etc.), and each address referring to
an included word-group is labelled with a code giving the role
of the smaller group in the larger one (complement in prepositional
group, etc.).
With roles included, the above description becomes:
    наиболее  --PrA--+
                     +--- AG --M--+
    простая   --Adj--+            |
    доменная  --M-----------------+--- NG
    структура --H-----------------+
Roles:
M = modifier (in NG)
H = head (in NG)
PrA = pre-adjective (in AG)
Adj = adjective (in AG)
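The list structure with type and role labels can be sketched in a few lines of Python. This is only an illustration of the idea, not the layout used in the NPL program: the class names `Element` and `Link`, the word-class codes "adverb", "adjective" and "noun", and the helper `words` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    ctype: str                   # constituent type code, e.g. "NG", "AG", "noun"
    text: Optional[str] = None   # set only for terminal elements (word or idiom)
    links: List["Link"] = field(default_factory=list)  # ordered "addresses"

    @property
    def is_terminal(self) -> bool:
        return self.text is not None


@dataclass
class Link:
    role: str        # role of the included group in the larger one, e.g. "M", "H"
    target: Element  # the included element (stands in for a store address)


def words(element: Element) -> List[str]:
    """Collect the terminal items under an element, in stored order."""
    if element.is_terminal:
        return [element.text]
    collected: List[str] = []
    for link in element.links:
        collected.extend(words(link.target))
    return collected


# The four-word nominal group used as the example in the text.
ag = Element("AG", links=[
    Link("PrA", Element("adverb", "наиболее")),
    Link("Adj", Element("adjective", "простая")),
])
ng = Element("NG", links=[
    Link("M", ag),
    Link("M", Element("adjective", "доменная")),
    Link("H", Element("noun", "структура")),
])

print(" ".join(words(ng)))
print([link.role for link in ng.links])
```

The role is stored on the link rather than on the element, following the statement that it is the address, not the included element, that carries the role code.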
Choices in systems are also represented in a label in the
element concerned. This label is called the systems word.
In the above example, the systems word in the nominal group
element records the number, gender, and case of the group. In
theory, the observed order of items is either evidence
for a particular structure (as in the order of prepositions and
their complements), or evidence for a choice in a system (as in
the order of auxiliary and subject in English interrogative
sentences). Just the same is true of punctuation (some commas
indicate structure, e.g. those marking clause boundaries; others
indicate a choice in a system, e.g. those distinguishing
'descriptive' and 'restrictive' qualifiers in nominal groups).
Ideally then the model would have no need to represent item order
or punctuation explicitly: it would record the structures and
systems, and the synthesis algorithm would have a free hand in
determining the English order and punctuation according to English
structural and systemic rules. But in practice the language
features concerned are not yet understood in sufficient detail,
so the synthesis keeps the original order and punctuation except
where it has some reason to change them. This means that they
need to be recorded in the model statement. The addresses in an
element are therefore stored in the same order as the constituents
to which they refer, and each element includes details of any
punctuation surrounding the constituent.
The full list of constituent types and roles is as follows:

Constituent                     Subconstituents' roles

Nominal group (NG)              Modifier (M), Head (H), Qualifier (Q),
                                Appositive (Ap)
Adjectival group (AG)           Pre-adjective (PrA), Adjective (Adj),
                                Post-adjective (PtA)
Prepositional group (PG)        Preposition (Pp), Complement (Ct)
Adverbial group (ADV)           Pre-adverb (PrA), Adverb (Adv),
                                Post-adverb (PtA)
Verbal group (VG)               Verb (V), Complement (Ct), Adjunct (At)
Co-ordinate group (CG)          Conjunction (Cj), Member
Clause (CL)                     Subject (S), Predicate (Pd), Adjunct (At)
Subordinate clause (SC)         Conjunction (Cj), Clause (Cl)
Complex clause (CC)             Clause (Cl), Adjunct (At)
Comparative group               Link, Comparison (Cp)
  (e.g. как + noun)
Prefix group                    Prefix (Pf), Stock (St)
  (e.g. вектор-функция)
Although most of the terminology in the table will be self-explanatory,
it should be made clear that in a co-ordinate group
the 'members' may be constituents of any type. Likewise the
prefix group is a general one, the 'stock' being noun, adjective,
or verb. (In practice, for reasons of programming convenience,
the prefix group was not used, such groups being represented by
the 'stock' alone, tagged with the reference number of the prefix).
The table attempts to provide an adequate set of constituent
types and roles for the description of sentences in our texts.
It should not be inferred that our analysis processes could
recognise all these features; indeed the clauses and the
comparative group were not used at all.
Associated with each type of constituent there are certain
systems. For example, a clause may be either non-finite (ecJw
w unyzzc nozaT~ ... ) or finite. If finite, choices of mood
(interrogative/imperative/declarative), conditionality, and
personality will have been made; and if the clause is personal
there will be selections of person and number. All these
systemic choices would be recorded in the element representing
the clause.
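The dependency between clause systems (mood only for finite clauses, person and number only for personal ones) can be illustrated with a small check. This is a sketch under assumptions: the text names the systems but not all of their options, so the option names for conditionality, personality, person and number, and the function names, are invented for the example.

```python
from typing import Dict, List

# Closed sets of options for each system. Only the mood options and the
# finite/non-finite contrast are given in the text; the rest are assumed.
SYSTEMS = {
    "finiteness": {"finite", "non-finite"},
    "mood": {"interrogative", "imperative", "declarative"},
    "conditionality": {"conditional", "non-conditional"},
    "personality": {"personal", "impersonal"},
    "person": {"1st", "2nd", "3rd"},
    "number": {"singular", "plural"},
}


def required_systems(choices: Dict[str, str]) -> List[str]:
    """List the systems in which a clause with these choices must choose."""
    required = ["finiteness"]
    if choices.get("finiteness") == "finite":
        required += ["mood", "conditionality", "personality"]
        if choices.get("personality") == "personal":
            required += ["person", "number"]
    return required


def check_clause(choices: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the choices are consistent."""
    problems = []
    required = required_systems(choices)
    for name in required:
        if name not in choices:
            problems.append(f"missing choice in system '{name}'")
        elif choices[name] not in SYSTEMS[name]:
            problems.append(f"'{choices[name]}' is not an option of system '{name}'")
    for name in choices:
        if name not in required:
            problems.append(f"system '{name}' does not apply to this clause")
    return problems


finite_clause = {
    "finiteness": "finite", "mood": "declarative",
    "conditionality": "non-conditional", "personality": "personal",
    "person": "3rd", "number": "plural",
}
print(check_clause(finite_clause))
print(check_clause({"finiteness": "non-finite", "mood": "declarative"}))
```

In the program described here such choices are packed into the systems word of the clause element; the dictionary above merely stands in for that word.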
Below, an example is given of the structural description
of a complete sentence; again it is not a structure which the
current analysis could produce, but is intended simply to
illustrate the use of the model.
Example of sentence structure description
[Tree diagram of a complete Russian sentence beginning С помощью; the layout is not recoverable from the source.]
(N.B. С помощью is treated as one item since it is included
in the dictionary as an idiom)
The English synthesis algorithm
The synthesis algorithm has the task of taking a sentence
expressed in terms of the model described above, and producing
from it the string of characters which forms the English output
sentence.
The program uses the model statement to guide it in
decisions on: 
(1) re-ordering; 
(2) insertion of English 'function' words (auxiliary 
verbs, etc.); 
(3) selection of English equivalents from the short 
list in each dictionary entry; 
(4) inflection of English equivalents. 
These decisions are of course based on grammatical data only
(both structural and systemic); in particular in the selection
of equivalents no semantic or collocational techniques are used.
The particular tasks under these headings which are
appropriate to a particular type of constituent will in general
need to be carried out whatever the role of the constituent in
some higher structure may be; and we are therefore led to the
need for a separate routine for each constituent type. Such a
routine will be called a constituent type procedure (CTP).
The nominal group CTP, for example, will be called upon when
and only when a nominal group has to be produced by the program.
Since constituents nest within one another freely, one CTP 
will need to call on others to deal with the parts of the 
constituent in turn. The CTPs must in fact be written as fully 
recursive subroutines; and the program consists basically of a 
control routine for exploring the list structure together with
a set of CTPs, one for each constituent type. 
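The arrangement of a control routine plus one recursive routine per constituent type can be sketched as a dispatch table. The following is a minimal illustration, not the NPL program: the dict layout of a node, the decorator `ctp`, the fallback `dummy_ctp`, and the English equivalents in the sample tree are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# A node is a plain dict: {"type": ..., "text": ... (terminals only),
# "parts": [(role, node), ...]}. This layout is an assumption of the sketch.
Node = dict
CTPS: Dict[str, Callable[[Node, List[str]], None]] = {}


def ctp(ctype: str):
    """Register a constituent type procedure for one constituent type."""
    def register(fn):
        CTPS[ctype] = fn
        return fn
    return register


def dummy_ctp(node: Node, out: List[str]) -> None:
    """Stand-in for a CTP not yet written: keep the original order."""
    if "text" in node:
        out.append(node["text"])
    for _role, part in node.get("parts", []):
        synthesise(part, out)


def synthesise(node: Node, out: List[str]) -> None:
    """Control routine: hand over to the CTP for this node's type."""
    CTPS.get(node["type"], dummy_ctp)(node, out)


@ctp("NG")
def nominal_group_ctp(node: Node, out: List[str]) -> None:
    # A real CTP would re-order and insert function words here;
    # this one only calls on other CTPs for its parts.
    for _role, part in node["parts"]:
        synthesise(part, out)


@ctp("word")
def word_ctp(node: Node, out: List[str]) -> None:
    out.append(node["text"])


def w(text: str) -> Node:
    return {"type": "word", "text": text}


tree = {"type": "NG", "parts": [
    ("M", {"type": "AG", "parts": [("PrA", w("most")), ("Adj", w("simple"))]}),
    ("M", w("domain")),
    ("H", w("structure")),
]}
output: List[str] = []
synthesise(tree, output)
print(" ".join(output))
```

No procedure is registered for "AG" in this sketch, so the adjectival group falls through to the dummy routine and is output in its original order.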
As was pointed out by Yngve (1960), it is a linguistic fact
(at least in the Indo-European family of languages to which
Russian and English both belong) that in many constituents the
final sub-constituent is a group of words, while other
sub-constituents are more frequently single items. Thus multiple
nesting of the CTPs usually involves final sub-constituents.
But in these cases all details of the higher constituent can be
"forgotten" by the computer since that constituent will not need
to be returned to; so even a long sentence needs no great depth
of push-down store to handle the nested CTPs. (Language has
presumably evolved in this way because of an analogous advantage
in the ...)
The first task of a CTP is to decide on any re-ordering
needed. It implements such a decision simply by rearranging
the addresses in the element concerned. Each CTP entered does
this, so that the individual items are met in their new order
and can be added to the output string at once.
The selection and inflection of equivalents are carried
out at the time they are to be produced, when all relevant
information is available to the CTP without excursions into
other parts of the structure. The insertion of function words,
on the other hand, may be done by any CTP.
The resulting English output string is then passed to a 
final program which is responsible for format control. The 
normal form of output is punched paper tape, from which the 
printed copy, as shown in McDaniel et al. (this conference), is
produced on a 'Flexowriter'. There is an alternative form of
output on punched cards, from which printed copy can be produced
on a card-controlled typewriter. This earlier form gives the
text in the two languages side by side, which was useful for
research purposes, but the absence of lower-case Roman letters
and pagination, and the restricted width of each language
version, make this form less well suited for general use.
This format control process, and the main control routine 
which deals with the exploration of the tree and the handover 
from one CTP to the next, need not be described further, but 
the tasks of the individual CTPs will be outlined below. 
Tasks of nominal group CTP
(1) To insert before the group a preposition depending on the
case and role of the group, e.g. of is inserted if case is
genitive and role is qualifier in NG. Several instances
occur in the sample output referred to above.
(2) To move modifiers containing items after the adjective or
participle to the end of the group, with appropriate
punctuation.
Structure as received from analysis:
    нужные / necessary
    исследованиям / investigation
    поверхности / surface
Result:
    surfaces necessary for investigations
In a more complex case commas are inserted.
Example:
    выделяемые / choose
    устройством / system
    первичные / primary
    признаки / sign
    речевых / speech
    сигналов / signal
Result:
    primary signs of speech signals, chosen by system
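The two nominal group tasks above can be sketched as follows. This is a simplified illustration under stated assumptions: only the genitive-qualifier rule for inserting of is taken from the text; treating any modifier of more than one word as one "containing items after the adjective or participle", and inserting a comma whenever the moved modifier does not directly follow the head, are guesses that happen to reproduce the two examples. The function names and the `(role, words)` layout are hypothetical.

```python
from typing import List, Optional, Tuple

Part = Tuple[str, List[str]]  # (role in NG, English words already synthesised)

# Only this one entry is stated in the text; other cases are left out.
PREPOSITIONS = {("genitive", "Q"): "of"}


def preposition_for(case: str, role: str) -> Optional[str]:
    """Preposition to insert before a nominal group, if any."""
    return PREPOSITIONS.get((case, role))


def reorder_nominal_group(parts: List[Part]) -> str:
    """Move multi-word modifiers to the end of the group."""
    kept: List[Part] = []
    moved: List[List[str]] = []
    for role, words in parts:
        if role == "M" and len(words) > 1:   # assumed test for a "heavy" modifier
            moved.append(words)
        else:
            kept.append((role, words))
    text = " ".join(word for _role, words in kept for word in words)
    # Assumed punctuation rule: no comma directly after the head.
    needs_comma = bool(kept) and kept[-1][0] != "H"
    for words in moved:
        text += (", " if needs_comma else " ") + " ".join(words)
        needs_comma = True
    return text.strip()


simple = [("M", ["necessary", "for", "investigations"]), ("H", ["surfaces"])]
complex_case = [
    ("M", ["chosen", "by", "system"]),
    ("M", ["primary"]),
    ("H", ["signs"]),
    ("Q", ["of", "speech", "signals"]),
]
print(reorder_nominal_group(simple))
print(reorder_nominal_group(complex_case))
print(preposition_for("genitive", "Q"))
```

In the program itself the re-ordering is done on the addresses in the element before any words are produced; working on finished word lists here just keeps the example short.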
Tasks of verbal group CTP
(1) To insert auxiliary verbs and 'not' as necessary in finite
verbal groups, for instance inserting does not for the
present tense 3rd person singular negative. The precise
rules for the position of the insertion are complex, but
roughly these words are inserted immediately before the verb
in negative verbal groups and before the verb and any
immediately preceding adverbs in positive verbal groups.
Example:
Structure as received from analysis:
    незначительно / insignificant
    меняются / change
The VG has systems coding 3rd plur., present, passive,
positive. The VG CTP therefore outputs are and hands
control to the adjective CTP (since the dictionary entry
for the first word is an adjectival one). As described
below, this CTP will output the adjectival equivalent with
an adverbial inflection -ly. The verb CTP then generates
the verbal equivalent again with the appropriate inflection.
Result:
    are insignificantly changed
(2) The VG CTP also inserts auxiliary verbs before "short form"
predicative adjectives and participles, and inserts to
before infinitives, in both cases with appropriate placing
of 'not' and any adverbs.
(3) Special measures are taken to allow for the non-standard
behaviour (as regards English auxiliaries) when equivalents
include be, should or can.
(4) The CTP is so arranged that a treatment of government 
phenomena could be added conveniently. The routine con- 
cerned was developed only as far as the flowchart stage. 
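The rough positioning rule in task (1) can be sketched as below. This is an illustration only: the auxiliary table holds just the two cases mentioned in the text, the "active" voice for the does not case is an assumption (the text does not state the voice), and the names `AUXILIARIES` and `insert_auxiliary` and the `(kind, word)` layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (tense, person, number, voice, polarity) -> words to insert.
AUXILIARIES = {
    ("present", "3rd", "singular", "active", "negative"): "does not",
    ("present", "3rd", "plural", "passive", "positive"): "are",
}


def insert_auxiliary(items: List[Tuple[str, str]],
                     systems: Dict[str, str]) -> List[str]:
    """Insert the auxiliary into a verbal group given as (kind, word) pairs."""
    key = tuple(systems[name]
                for name in ("tense", "person", "number", "voice", "polarity"))
    words = [word for _kind, word in items]
    aux = AUXILIARIES.get(key)
    if aux is None:
        return words
    # Start immediately before the verb ...
    pos = next(i for i, (kind, _word) in enumerate(items) if kind == "verb")
    # ... and in positive groups step back over immediately preceding adverbs.
    if systems["polarity"] == "positive":
        while pos > 0 and items[pos - 1][0] == "adverb":
            pos -= 1
    return words[:pos] + [aux] + words[pos:]


passive_positive = {"tense": "present", "person": "3rd", "number": "plural",
                    "voice": "passive", "polarity": "positive"}
active_negative = {"tense": "present", "person": "3rd", "number": "singular",
                   "voice": "active", "polarity": "negative"}

print(" ".join(insert_auxiliary(
    [("adverb", "insignificantly"), ("verb", "changed")], passive_positive)))
print(" ".join(insert_auxiliary(
    [("adverb", "usually"), ("verb", "ask")], active_negative)))
```

The word forms are passed in already inflected to keep the sketch short; in the program the inflection is left to the verb and adjective CTPs, which are entered after the auxiliary has been output.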
Tasks of clause CTP 
The principal task of this CTP is to determine the order of
subject, verb and complements. For example, if in Russian a
sentence begins with an intransitive verb, and the subject
follows, the preferred translation depends on the length of the
subject: short subjects can be put before the verb, but with
long subjects this would not be acceptable in English and some
expedient, such as the insertion of the dummy subject there,
must be adopted (e.g. Then there arose the problem of ...).
Unlike the other CTPs described, this one was not 
implemented, being developed only as far as the flowchart stage. 
In its absence, certain pronominal subjects are inserted by 
ad hoc methods.
Tasks of noun, verb and adjective CTPs
Apart from certain insertions (such as having before past
verbal adverbs), the main task of these CTPs is inflection.
The decision to inflect is based on the systems coding and, in the
case of adverb formation, on the role given to the item by the
analysis. The actual type of inflection is chosen according
to a code in the dictionary associated with each correspondent;
thus boundary will be pluralised as boundaries, foot as
feet, and so on. (Irregular forms such as feet are extracted
by the program from a list, using an address given in the
dictionary entry. Including both nouns and verbs, this list
contains 212 forms.) Provision is made for inflecting the
right word in multiple word correspondents such as mode of life.
All vagaries of English inflection called for by present
dictionary equivalents are covered.
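Inflection driven by a dictionary code, with irregular forms fetched from a list by address, might look like the sketch below. The code names ("s", "es", "y_to_ies", "irregular"), the entry fields and the one-item irregular list are assumptions made for the example; the text says only that a code selects the type of inflection and that the real list holds 212 forms.

```python
from typing import Dict, List

# Stands in for the list of irregular forms held by the program.
IRREGULAR_FORMS: List[str] = ["feet"]


def pluralise(entry: Dict) -> str:
    """Form the plural of an English correspondent from its dictionary entry."""
    words = entry["english"].split()
    # Which word of a multiple word correspondent to inflect; defaults to the last.
    index = entry.get("inflect_word", len(words) - 1)
    code = entry["code"]
    word = words[index]
    if code == "irregular":
        word = IRREGULAR_FORMS[entry["address"]]  # address given in the entry
    elif code == "y_to_ies":
        word = word[:-1] + "ies"
    elif code == "es":
        word = word + "es"
    else:                                         # code "s"
        word = word + "s"
    words[index] = word
    return " ".join(words)


print(pluralise({"english": "boundary", "code": "y_to_ies"}))
print(pluralise({"english": "foot", "code": "irregular", "address": 0}))
print(pluralise({"english": "mode of life", "code": "s", "inflect_word": 0}))
```

Keeping the rule in the dictionary entry, rather than deriving it from the spelling, means the procedure needs no knowledge of English orthography beyond the few coded patterns.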
Selection of equivalents
There are five CTPs which select equivalents on various
grammatical criteria, usually the role of the item. A typical
case is that dealing with 'noun/adjectives' such as другой.
This ensures that другими авторами is translated by other
authors, while границы другой is translated as boundary(s)
of another (assuming, of course, that the analysis has given them
structures of modifier-head and head-qualifier respectively).
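Selection by role amounts to a lookup keyed on the role the analysis has assigned. A minimal sketch, assuming a hypothetical entry layout with a `by_role` table and a `default` equivalent (neither is described in the text):

```python
from typing import Dict


def select_equivalent(entry: Dict, role: str) -> str:
    """Pick the English equivalent of an item from its role in the group."""
    return entry["by_role"].get(role, entry["default"])


# Entry for a 'noun/adjective' such as другой; layout assumed for illustration.
drugoj = {"default": "other", "by_role": {"M": "other", "Q": "another"}}

print(select_equivalent(drugoj, "M"))  # modifier of a head noun
print(select_equivalent(drugoj, "Q"))  # qualifier following a head noun
```

The of in "of another" is not part of the equivalent; it would be supplied separately by the nominal group CTP for a genitive qualifier.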
Conclusion
The model and synthesis algorithm described proved
satisfactory in practical use. They have the advantage that
translations can be produced when the algorithms are incomplete:
provided the sub-trees produced by a partial analysis are linked
arbitrarily to produce a single sentence structure, this can 
then be explored by a synthesis algorithm, even one in which 
several CTPs are replaced by dummies. As new packages (analysis 
passes or synthesis CTPs) become available they can be incorpor- 
ated very simply. 
The work described above has been carried out at the 
National Physical Laboratory. 

References

1. Halliday, M.A.K. (1961) - Categories of the theory of
grammar. Word, 17, (3), pp. 241-292.

2. McDaniel, J. et al. (1967) - An evaluation of the usefulness
of machine translations produced at Teddington, with an
account of the translation methods. COLING 67.

3. Yngve, V.H. (1960) - A model and an hypothesis for
language structure. Proc. Am. Phil. Soc., 104, (5),
