1. The numbe r of all the possible structures as a 
function of the ler~th of the sentence 
Ae soon as practical applications are coneidered 
the efficiency of the parsing method is of fundamental 
importance whether natural or programming languagee are 
to be proceesed. The problem of efficiency arises because 
the relationship between the length of e string of symbols 
and the number of structures that may in theory be 
assigned to the string is far from linear, the growing 
number of symbols entails a much more rapidly growing 
number of possible structures. 
For CF ~s it is comparatively easy to determine 
how the number of structures depends on the length of the 
string. Considering binary branchings only and excluding 
the possibilities that arise from having different labels 
attached to one node 
~ii ~n-l/ 
is the number of different trees that can be assiEned to 
a linear string of n elements \[I\]. This means that for a 
i0 element strin E the number of different trees is slightly 
less than 5000, for a 20 element string this number 
I 
becomes more than 1.75 milliard. 
To include non-binary branchings as well I suggest 
the following recursive formula 
g/if= ~21= 1 
g/n/= 2 g/21 ~',~I + g/31 j=l ~'jl + ... 
2 
... + g/n-2/ j~= g/j/ + F/n-I/ ~\] + I 
where n is the number of elements in the string. 
Accordingly, more than I00 000 different structures can 
be assigned to a I0 element string, and 1.6 x 1012 
different structures to a 20 element s~ing. 
Let us stress again that what we have calculated 
here is the number of the essentially different 
derivations, i.e. the number of those yielding different 
results. The number of possible derivational paths for I0 
elements is 18 times larger than the number of the 
different results, for 20 elements the number of paths 
is 750 times larger than that of the different structures 
2. Syntactic ambiguities 
Natural languages utilize but a small fraction of 
these possibilities. As to the number of possible stru- 
ctures of concrete sentences, the syntactic restrictions 
are very strong yet far from sufficient ~o yield inform- 
a~ion for unambiguous assignement. The number of stru- 
ctures allowed by the formal syntactic rules is in most 
cases definitely larger than the number of structures a 
human being becomes aware of in the course of speech. 
A well-known point is that unambiguity cannot al- 
ways be ensured by grammatical means even for artificial 
languages whose structure is immensely less complicated 
\[2\]. It is worth mentioning that the authors of ALGOL-68 
decided to let some ambiguities remain in the language 
as it could have been eliminated but by making the gram- 
mar a lot more complicated \[3\]. 
Where does the majority of syntactic ambiguities 
in natural languages come from? 
I. There is a number of words with varying scopes 
and vice versa: some words may fall within the 
scope of several different words and it cannot be 
determined by formal syntactic means - nor yet 
by semantic ones at times - whose soope they really 
fall within. These two things often combine, espe- 
cially in complex genitive constructions. 
A fine Russian specimen of which is as 
loll ows: 
...soze~oTBxe Xpyr-,x SaXoHos ooxl)aweHxa w 
O000euHoOTell saazuoxeMo~Bzs lao~'x~... 
The corresponding string of symbols: 
Pr g Ag Ng Ng E Ng Ng Ng 
The rules of reductions: 
It' _ + - l~g /i/ Ag MPg 
liil = NPg 
/iv/ ~ + = C 
I 
Apparently a number of different structures can be 
determined by changing the order of rule epplloatlon. 
2. Another source of syntactic ambiguities is that 
not even the string of symbols /categories/ can 
always be unambiguously assigned to the sentence, 
i.e. homor~ve~ may often appear on the morphological 
level. Homon~m~ arises either because formal diffe- 
rentiation between parts of speech is absent /e.g. 
in English/ or because the correspondence of the 
functions of words and the morphological means of 
expressing them is ambiguous, the morphological 
functions are not unambiguously expressed/e.g, in 
Russian/. Completely independent words with or with- 
out afftoea.too can of course agree in form. 
Still one seldom comes across a sentence that could 
be assigned several entirely different structures. Sen- 
tences of this type are usually puns or grammatical 
examples /cf. "Time flies like an arrow"/. It is the 
so-called local syntactic ambiAnaity that normally troubles 
us, i.e. a part of the sentence that can be assigned 
several different part-structures without influencing the 
remainder of the sentence-structure. Now if there are 
several locally ambiguous parts in the sentence and they 
are independent from each other, the number of ambiguities 
for the whole sentence will considerably increase: it will 
be the arithmetic product of the numbers of independent 
local ambiguities. 
3. ~uestions of tacti.c9 
The above numeric data clearly show how hops- 
less it is to simple proceed by checking on all the theo- 
retically possible structures. But it is also apparent 
that syntax-directed parsing systems will fsil in a con- 
siderable number of cases just because the sentence 
structure is syntactically undetermined E4~. The develop- 
ment of an effective analyzer is at least as much a 
mathematical as a linguistic problem. 
The most important demands a parsing algorithm 
should meet are as follows- 
/i/ It should be able to determine all the conceiv- 
able parsinge that a given sentence is assigned by a 
particular grammar. 
/ii/ It should be consistent in the sense that one 
parsing could not be arrived at but in one single way. 
/It should be a 'one-to-one algorithm'./ 
/iii/ In some way or other it should counterbalance 
the immense growth of the number of possible structures. 
The purpose to be strived for is a linear relationship 
between the steps to be taken and the length of the 
sentence. 
The efficiency of the algorithm depends considerably 
on factors that are independent of the particular method 
one has chosen to apply. These problems arise with any 
algorithm even if in different forms. The most important 
'tactical' questions of this type are as follows: 
/i/ Aesumin~ a large set of rules how does the 
algorithm select the rules that are /possibly/ to be 
applied? 
/ii/ How does it check for the conditions of 
applylnE them? 
/iii/ How does it recognize 'blind alleys' i.e. 
illegal paths /if any/? 
/iv/ How does it return from the illegal path to 
the legal one /or to the one that has not proved to 
be illegsl as yet/? 
Some of the well-known methods for solving /i/ are 
as follows: 
/a/ Each rule is explicitly assigned the set of rules 
by which it could be continued. But ehoosing thie method 
for s complicated grammar with a large number of possi- 
bilities one might fsce troubles. 
/b/ The rules are divided into several groups on the 
basis of different characteristics such as the number or 
the character of the symbols within the rule etc. 
Searching is then carried out within a comparatively small 
set of rules. 
7 
/c/ Each symbol is assigned a set of all the rules 
this particular symbol appears in. Assignment can be done 
according to the position numbers rithin the rules. The 
so called initial symbols, i.e. symbols in first position~ 
play then a distinguished role in the rule selection. 
Whatever method one applies one may choose one of 
the several possible ways of practical realization. In 
case of /c/ the choice made will be of immense importance 
/e.g. rules aITanged in matrix form, chainlike 
representation etc./. 
Problems /i/ and /ii/ are strongly interconnected. 
How are we to decide whether the conditions of applying a 
rule are met? 
In the case of CF grammars checking could be carried 
out quite easily. For top-to-bottom analysis all we have 
to do is the identification of the left-hand sidesymbol of 
the rule.Yo~ bottom-to-top analysis based on normal form CF 
rules /i.e. binary branchings/ only, once again it is not 
too difficult to check a twodimensional table for the 
possibilities of connecting a pair of symbols. 
If general form CF or CS grammars are applied, the 
problem is not trivial at all, it turns out to be that of 
identifvin~ strir~s of symbols. It could of course be 
solved in a trivial way but this would require an awful 
lot of work to do. B. DS~iki has developed a most 
elegant method that would examine a whole series of rules 
at ones,. The checking is performed on Boolean vectors, 
end the point DSmSlki has made an excellent use of is that 
computers carry out logical operations on ell the bits of 
a machine word at the same time \[Sj. 
Two subproblems connected with checking rules should 
be discussed: 
/a/ When should it start at all? Suppose that the 
symbol string is processed in sequential order 
/left-to-right or right-to-left/ and a possibly applicable 
rule or a given context should be checked for. Then we 
could either go back to symbols that have already been 
examined /and check them repeatedly when checking for the 
applicability of various rule~ or have already begun and 
completed certain examinations so that we finished 
checking by the time its result is needed. /The second 
solution could of course be applied only if an appropriate 
mechanism automatically provides the checking for the 
conceivable conditions and the /gradual/ cancelling of the 
non-realizable possibilities./ 
/b/ Is some kind of an additional examination 
necessary before the checking is completed? Namely it might 
turn out that the whole checking was superfluous because 
its result cannot be used later on or it will not lead to 
a correct result. 
We have come very near to /iii/, i.e. to how the 
occasional impasses /blind alleys/ could be recognized in 
the course of the analysis? This is a cardinal problem 
concerning the efficiency of automatic analysis. The 
growing length of the sentence /symbol string/ entails not 
only a growing number of possible structures but the 
number of inappropriate part-structures growing as well. 
These 'torsoes' correspond to certain parts of the sentence 
but are incompatible with the remainder of it. What is 
more, the longer a sentence the more levels it ma~ have 
i.e. the deeper its structure can be. This holds for the 
blind alleys as well: the longer the sentence the deeper 
the blind alley can be, the more branches and the more 
valid elements it m8~ contain. Sentences that are 
monosemantic thoug~ syntactically ambiEuous could he 
thought of as bottomless blind alleys not yet explored 
whose exploration needs either a wider context or the use 
of interrelationships not contained in the text. 
The problem once again becomes twofold: 
8/ What is the criteriu~ of having got into a blind 
alley? 
b/ How could we prevent getting into a blind alley 
at least in some cases? 
The answer to these questions ms~7 be different, of 
course, for each algorithm and plays a subordinate thongh 
extremely important role regarding the "strate~" applied. 
I0 
Just to give an example I would like to mention a 
most elegant method of defining end "calculating" the 
criterium of blind alleys using an algorithm built up in 
terms of logical vectors. DSmSlki \[5\] -- who condenses the 
information related to the hypothetically accepted part 
structure and to the given symbol string under processing 
into s state vector defined recursively -- applies the 
following criteria to determine the impossibility of 
continuing the analysis along the given line 
(,(Q0V B)A H \[xt÷ ~ = 0 
Accordingly the new symbol xt+ 1 to be processed may 
neither continue the paths the previous vector of state 
contained that have proved possible so far, i.e. 
T(Qt) A H \[Xt÷l\] = O, nor begin a new rule, i.e. 
B^. =o. 
The only handleap of DSm51ki's method is that impasses 
can be recognized only after the algorithm has got into 
them -- the algorithm cannot pick out the paths that will 
lead into an impasse later on. So we have modified the 
algorithm and instead of using DSmSlki~s vector B - that 
would'activate ~ the first position of each of the rules - 
we let only those of the rules become active that provide 
/direct or indirect/ continuation of the paths that have 
already proved to be legal \[6\]. 
II 
Experience so far shows three practical methods of at 
least partial avoidance of impasses: #i/ taking into 
consideration the context; /ii/ making use of the 
transitive connectivity of the rules; /iii/ checking 
ahead the number of symbols not yet processed. 
Taking into consideration the context means making 
use - if possible -- of only one direction of the con- 
text to avoid the repetition of the tests performed. 
Today such analyzing grammars play an important role in 
the analysis of artificial languages \[TJ. 
In m~ opinion making use of the transitive joining 
of rules has yet mar~y important possibilitiem to offer. 
P. Z. Ingerman's analysis is a good example of experiments 
in this direction \[aj. 
Taking into consideration the number of symbols not 
yet processed, saves the analysis mmly unnecessary tests. 
There have been attempts at doing a preliminary global 
analysis of the complete symbol string on this basis to 
assess in advance the possibilities of each path of the 
analysis \[9\]. 
Finally let us mention the question as the last of 
the questions of tactics: 
/iv/ How to find the way from an' illegal path back 
to a legal one? 
12 
This is the task that must somehow be solved by the 
parsing algorithm. So it is not enough to give a sign or 
~flag" at the points where the decision may perhaps be a 
failure. /i/ It must be ensured that the state prior to 
commltti~ the error is reconstructed. /ii/ It would be 
advantageous to return to the state immediately prior to 
committing the error thus avoiding unnecessary delays. 
Hever%heless9 there exist fine algorithms with no 
assurance that every error could be corrected. One of them 
is the well-known 'compiler compiler' that would never 
reinterpret a part of the symbol string if the part has 
once been accepted, consequently it is unable to recognize 
certain structures.) 
One of the possible solutions to the problem in 
question is to have the "current state" of the analysis 
stored whilst proceeding so that it could be accessed 
later on. What we have termed "current state" here may 
include all the half-finished and abandoned rule 
applications that could be continued only after other 
rules have been applied. Whenever reaching back for 
a previous "current state" the possibilities that have 
ceased to exist in the meantime can always be cancelled. 
/The techniques followed for practical realization may 
vary depending on the amount of information to be stored, 
on the msmor~ area available for the working fields~ etc. 
(In most cases some kind of a push down store is applied.) 
13 
4. The strate~iy of anal~sis I. 
The problems mentioned so far are common in varying 
degrees for all parsing systems, the ways they are solved 
have no decisive influenc~ o,1 the whole flow of analysis 
/though they are of decisive importance as far as efficiency 
is concerned/. 
T.V. Griffiths and S.R. Petrick base the determination 
of the types of parsing systems on two considerations EI~ 
/whilst stressing that 'some procedures are described in 
these terms only with difficulty' and 'others seem to 
allow no such classification'/: 
/i/ In what direction does the parsing proceed - 
is it a top-to-bottom or a bottom-to-top analysis? 
/The third type mentioned - 'direct substitution 
algorithms' - is a subclass of the bottom-to-top 
algorithms./ 
/ii/ Does the algorithm apply any means of a preven- 
tive reduction of the number of blind alleys, i.e. 
for increasing the 'selectivity' of the algorithm? 
Their most important findings concerning the effi- 
ciency of the different types of algorithms are as follows: 
/a/ Algorithms proceeding from top to bottom - 
I . . especially those of the direct substxtutlon type - 
are the more efficient ones. 
/b/ Methods of increasing selectivity are of no 
special importance in the case of top-to-bottom 
14 
analyses but they do considerably increase the 
efficiency in the case of bottom-to-top analyses. 
/c/ Efficiency is demonetratably influenced by the 
asymmetxT/left-branching or right-branching/ of 
the structure to be analyzed. In the case of 
analysis proceeding from top to bottom it is in- 
fluenced in the reverse direction if compared 
with the analysis proceeding from the bottom up- 
wards. /We assume that the analysis proceeds 
either from right to left in both cases or from 
left to right in both cases./ 
They considered the parsing time of the following 
sentence types: 
ab n anb a% n abncd 
left- right- embedding compoun d 
branching branching 
/left-branching 
/'regressive' /'progressive' with respect to 
in Yngve's term/ in Yngve's term/ recursivity/+ 
Parsing time as a function of sentence length in- 
creases - according to Griffiths' and Patrick's date - 
as follows: 
*The grammar given would have allowed right recursivity 
as well /abncdm/ but in the measurements only the above 
restrictions of grammar are dealt with. 
ab n i ' .... 
top-to- non-select. I selective 1 
bottom q u a d a t i c 
bottom- I i n e a r I to-top ~. 
I 
a% 
non-select, selective 
linear 
exponential linear 
/ I 
a 
S 
A / \b 
/ \b 
a/S\~ 
/\ 
8 ". \ 
B 
top-to- 
bottom 
bottom- 
to-top 
anb n abncd 
non-select, selective 
linear 
exp° linear 
non-select, selective 
exponential 
exp. cubic 
a/~\b 
a S b 
/\ 
a b 
A/\ B /\ I\~ 
A bB 
• .\ bin A" b 
I a 
16 
According to GrifTiths's and Patrick's data it is 
the bottom-to-top selective parser alone that is able to 
analyze sentences of the last, comparatively simple type 
grammar with a better than exponential efficiency. 
What are the underlying reasons for the results 
obtained by Griffiths and Petrick? 
/I/ Bottom-to-top algorithms are characterized by the 
fact that they take their start from what actually exists 
instead of looking for what "could be" ~llJ. 
In the case of exceedingly extensive ATsm~ars the 
top-to-bottom analysis must work with a huge number of 
potential possibilities and the elements of the symbol 
string to be analyzed will but slowly filter out the 
possibilities that may not be realized. 
/ii/ Selectivity, in the sense Oriffiths and Petrick 
use the term, does not influence all this to any degree as 
the :filtering on the basis of a precedence-matrix extends 
only to testiv~ the first element. It will be shown later 
on that selectivity can be considerably increased and, 
going even further, it could be made the basis of the 
strategy of the analysis. 
/iii/ In the case of bottom-to-top analysis the 
situation is entirely different. Here the seemingly 
identical apparatus works with a much greater efficiency. 
 t/ /the nook  ead" condition s ested by  etian 
17 
/i.e. the possiblli~y of the resultant symbol aohleving 
its aim oheok~ the uompatlblllty/ one level higher up and 
the distanoe from the top is so muoh less. /b/ Here on17 
euoh rules are to be realised in whioh all the oomponents 
oan be found, the others are omitted in the oourse of the 
rule oontrols. It is out of the question therefor to regard 
this eeleotivi~y as analogoue with the top-to-bottom se- 
leetlvi~y that is based on the flret ~i of the lowest 
level. 
/iv/ Griffiths'e and Patrick's measurements of the 
effect of the asymmetry of sentences on the efficiency of 
the analysis are a practical justification of an observation 
I made in 1964. In an article about Yngve's hypothesis ~13j 
I developed the idea that for lar~uages that have mostly 
"progressive" /right-branching/ structures it is the right- 
to-left analysis that is more effective in the case of ana- 
lysis from bottom to top. /The right-to-left analysis is 
equivalent of course with a left-to-right analysis in e 
system that is a mirror image of the original./ 
In case of pure structures the explanation of the 
phenomenon is simple: In a right-branching structure the 
number of erroneous linkinge is started at the end of a 
sentence. Let us take the example from the above mentioned 
article of mine: 
8HasTe \]~opo TeopeM o llpe~ezez. 
18 
Its processing from right to left is very simple: 
8HeeTe lmoPo Teopex o npexelax 
J 
J 
\] 
If, however, the analysis is started from the 
beginning of the sentence we get erroneous /or incomplete/ 
linkages again and again: 
\]~K BHeeTe 
aHaeTe ~OrO 
Msoro ~eope~ 
In the case of complex structures the situation is 
more complicated. In this case the effectivity greatly 
depends on the method used for eliminating the impasses. 
/On the disadvantages of vertical analysis see the 
next para~rraph. / 
19 
5o The strategy of ~sis II. 
When determining the type of analysis apart from its 
starting point it is also very important to know along 
what paths the analysis proceeds towards its goal, or in 
other words in what sequence the tests are carried out 
together with the inseparable question of in what form or 
structure the part-results su'e stored. 
On the basis of these considerations there are two 
basic types of Parsers.In theory this classification is 
independent of the fact whether the analysis proceeds from 
the bottom upwards or from the top downwards. 
/i/ Those parsezs that proceed with "msxim~ width" 
from level to level working on the full symbol string, 
first produce all the reductions that may be achieved by 
applying a single rule~ than those that may be obtained by 
applying two rules and so on until the part-structu_~es thus 
obtained are &Tadually linked. /In the analysis that 
proceeds from top downwards, these correspond to the 
derivations produced by applying two, there, ... rules, 
followed by the comparison of the terminal symbols thus 
obtained with the symbol string being analyse~./ 
/ii/ The ps.rse~sthat proceed with a "minimum width" 
and the "steepest slope', while gradually extending the 
20 
elements of the symbol string take the first opportunity 
to apply a rule and will not extend the analysis to a new 
symbol until there are new rules that could be built on 
the rules applied so far. 
We could mention as an example for the first method 
the ~kai-Nagao algorithm \[14\] \[15\] the Cocks algorithm 
\[16\] or its application by Kuno to context sensitive 
languages \[17\]/the same strategy is applied by Vsuquois in 
his analysis of Russian/. The algorithms by Woods \[18\] by 
Boracsev \[19\] and the DSmalki-Varga algorithms \[5\] \[6\] are 
exnmples of the second method. 
Both methods have their advantages and disadvantages; 
perhaps it may be :useful to draw the attention to them. 
The great advantage of the analysis that proceeds 
from level to level is the ease with which in case of 
appropriate storage the part-analyses that could be 
continued alon E the same line, are contracted /see 
Griffiths-Petrick: "Merging similar sections of different 
\[ =ing  aohine\] pathsV. 
Its disadvantage is the fact that 
a/ relatively large number of independent part structures 
has to be stored, 
b/ it needs relatively lengthy tests to detemine whether 
the individual part structures are compatible. 
21 
The strategy of "maximal hierarchization" is more 
advantageous beyond doubt as far as econom~ in storage is 
concerned because in this case s single push down store 
will suffice to store the results and all the paths that 
have proved incorrect may be removed once and for all from 
the push down store together with all the derivations. This 
principle may be formalized as follows. 
Let us denote according to inverse Polish notation 
the result of the rule applied to the elements 
8 k ak+l.., ak. r with the result B m as 
r 
a k ak+ I... ak+ r B m • In other words let %he 
elements of the symbol string that we applied the rule 
remain in the symbol string and let us simply add to the 
end of the string the symbol obtained as the result of the 
rule application. 
Accordingly the resulting symbol string will be 
mini al'''ai ~i rl& i 
after applying the first applicable rule, 
Let us suppose that there are a% most m-I more 
applicable rules following the first one while no new 
symbol is read /m- ~ O/ 
The symbol string will become 
rl rm i %...a i ... Bm. rj-  
m 1 
22 
~Fnile continuing the application of this principle 
the symbol string will be increased by new terminal and 
non-terminal symbols: 
r m r. 
al.." ai i .. Bm ai+l aj o rain sin max j m i .... ; 
~I r rj Brm+l r n 
maXn mini maxm minl 81"'" ai "''Bin m ai+l'''aj m+l Bn 
If the analysis gets into an impasse and cannot continue, 
r then we have to return to the symbol BsS last applied, 
remove it and continue the analysis applying the above 
principle. /First an attempt is made at applying another 
permissible rule in the same place and only if this fails 
shall we take a new a t symbol end continue the analysis./ 
The return from an impasse always means the deletion 
of the last non-terminal symbol and the reconstruction of 
the symbol string following it. /We would like to mention 
that this principle of analysis may be quite easily adopted 
to analyze context sensitive languages as well/. 
This undoubtedly elegant principle of application 
produces the first possible analysis relatively rapidly, 
in its canonic form. 
The increased selectivity of the analysis gives us a 
procedure that could be very well used in practical 
applications. Going further, having obtained the first 
23 
analysis if the analysis is continued on the same principles 
/just as if the first correct analysis were in an impasse/ 
all the other analysis ms,V be likewise produced. 
The disadvantages of the applied strategy of analysis 
are as follows: 
/a/ If right at the beginning of the analysis we have 
taken an incorrect path, then the correction of this error 
may only be done after all the following and in part 
independent applications of the rules have been ideleSad. This 
means that the correct, or perhaps the only possible part- 
results are lost: after putting the error right they have 
to be re-generated. 
/b/ The position is somewhat similar as far as the 
erroneous part-results are concerned: the analysis may get 
into a "local" impasse several times. 
/c/ A new, different system of storage and searching 
must be provided if we wish to ensure a newer generation 
of the Identloalcontinuations -- supposing that previously 
some kind of a change took place in the determined 
structure. 
24 
6. A new strateKy su~j~ested for analyzin~CF lanAmaKes 
The exponential increase in the time of analysis in 
various systems of analysis is obviously due to the 
increase in the number and depth of impasses, to their 
various branches-- in short to their dangerousness 
increasing with the length of the symbol string. 
This is the dangerous point I tried to dodge by 
elaborating a parsing system that applies selectivity not 
as an additional device for increasing the efficiency of 
some method but as an independent method itself. 
The linesrity of the increase in the process of 
analysis may be best achieved if the symbol string to be 
analyzed can be segmented in accordance with the highest 
level rules applicable and these parts could be analyzed 
sepsratedly. If several parsings can be assigned to any of 
these segments /cf. what we have said about homor~ym~ on p.5/ 
the structures corresponding to the whole sentence can be 
produced from the local part-results by combinatorical 
means. 
Segmentation requires the following apparatus: 
/i/ the transitive Initial matrix of the rules 
IBla, n/l 
/ii/ the transitive continuation matrix of the rules 
/c/a,n// 
25 
/iii/ the transitive end matrix of the rules /E/a,n// 
/ivy/ the transitive Initial matrix of the i th 
rule component /Bila,n// 
Ivl/ the transitive continuation matrix of the i th 
component /Ci/a,n// 
/vii/ the transitive end matrix of the i th component 
/Ei/a,n// 
/vii/ the matrix of the number of rule components 
/Vii ~n/l 
The structure of the transitive initial ma£rix of 
the rules is almost the same as that of the so called 
precedence /or complete connectivity/ matrix. The 
differences show up in two fao%s,namely 
a/ the lines correspond to the terminal elements only 
and not to all the elements of the vocabulary V; 
b/ the columns are assigned to rules of the grammar 
and not to the symbols. 
Thus it is a Boolean matrix B/a,n/; its element 
B/a,n/ is a truth function whose value is t if and only 
if the grammar allows the terminal symbol ~ to be the •first 
element of the terminal rewritin~ of the n th rule. 
The transitive continuation matrix of the rules 
C_/a,n/ is a Boolean matrix whose element C/a,D/ is t if end 
only if the terminal symbol a is whichever but not the 
first element of the terminal strings of the n th rule. 
The value of an element E/a,rg of the transitive end 
26 
matrix of the rules is t if and only if the terminal 
symbol a can be the last element of the terminal strings 
of the n th rule. 
It follows from the definition that 
B/ap,n/ = E/ap,n/ and C/aq,n/ = E/aq,n/ 
may occur but B/~,n/ = E/ap,n/ = C/sp,n/ may not. 
The Initial, continuation and end matrices of the 
rule components can be defined in s similar way, so it 
will be sufficient to give the definition of the initial 
matrix of the i th rule component: 
The value of an element B i /a,n/ of the initial 
matrix of the i th rule component B i /a,n/ is t if and 
only if the non-terminal symbol a ms~v be first element 
of the terminal strings of the i th direct component of 
the n th rule. 
The line of thought of the algorithm is as follows: 
Tests ape carried out on two levels: on the level of 
inter-Pule linkages /from top downwards/ and on the level 
of inter-terminal-symbol linkages /from left to right/. In 
each successive step of the test the individual components 
of the rules are made to correspond in the sequence of the 
components to a certain series of the terminal symbols of 
which the given component may be built up. By continuing 
this process finally either we arrive st the terminal ending 
in case of all oomponents or the given segmentation is found 
27 
to be incorrect. 
In case of incorrect segmentation first the 
permissible branches of the latest segmentation are tested 
by the algorithm. In our experience the selectivity of the 
system is considerable. Therefore even the storage of 
relatively small quantity of information allows a rapid 
examination of all the possibilities. 
During segmentation we apply a "principle of 
segmentation" that is analogous "4;o the principle 
discussed in connection with the "maximum hierarchization": 
the shortest component that is nearest to the beginning of 
the seEment or to the end of the previous component, is 
taken and used until it becomes evident that for some 
reason the given segmentation is not applicable. In this 
case an attempt is made at solving the situation by 
shifting the last border of segmentation to the right: only 
if this leads to no result, is the previous border of 
segmentation changed. The outstanding effectivity of the 
method applied is due to 
8/ making best use of the bottle-neck for the 
reduction in analyzing tim; 
h/ the fact that the tests for the possibilities of 
various part-segmentations can be quickly 
performed; 
c/ the possibility of testin~ each segment in 
complete separation from all the other segments; 
d/ the fact that the twoaided approach leads to much 
fewer unnecessqry pert results than either Cock's 
or the well-known top-to-bottom algorithms. 
28 

References

1 Berge, C. Th~orie des graphes et see applications, 
Dunod, Paris, 1958. 

M Ginsburg, S. The Mathematical Theory of Context-Free 
Languages, McGraw Hill, New York, 1966. 

3 Algol-68. MR 95. /mimeographed/ 

4 Shrejder, Ju. A. Teorija tolerantnosti, Nauchno- 
Texnieheskaja Informacija, Ssro 2 

5 DSmSlki, B. Voprosy sintaksicheekogo analiza dljs 
formal'nyx jazykov, Computational Linguistics 5, 
pp. 41-93. 

6 Varga, D. Problems of Machine Analysis, Linguistica 
Antverpiensia If. pp. 415-428. 

7 Knuth, D.E. On the Translation of Languages from 
Left to Right, Information and Control Vol. 8, 
NO 6, pp. 607-639. 

Kaufman, V. Sh. 0 raspoznavanii nekotoryx svojstv 
kontekstno-svobodnyx grammatik, l-ya Vsssojuznaja 
konferencija po programmirovaniju, Kiev, 1968. 

8 Ir~erman, P.Z. A Syntax-Oriented Translator, Academic 
Press, New York, 1966. 

9 Unger, S.H. A Global Parser for Context-Free Phrase 
Structure Grammars, Comm. ACM, Vol. ii, No 4, pp. 
240-247. 

I0 Griffiths, T.V., Petrick, S.R. On the Relative 
Efficiencies of Context-Free Grammar Recognizers, 
Comm. ACM, Vol. 8. No 5, pp. 289-300° 

ii Cf. Vakulovskaja, G.V., Kulagina, O.S. Ob odnom al- 
gori1~e sintaksicheskogo analiza russkix tekstov, 
Proble~ kibernetiki 18, p. 218. 

12 Bastian, L. A Phrase-Structure La~uage Translator, 
AFCRL Rep. 62-549, AF Cambridge Research Labs., 
Bedford, Aug. 1962. 

13 rares , D. Yr~ve's Hypothesis and Some Problems of 
the Mechanical Analysis, Computational Linguistics 
3, pP. 47-72. 

14 Sakai, I. Syntax inUniversal Translation, PrOeo 
1961 Internat. conf. on MT of LanguaEes and 
Applied Language Analysis, London, 1962, pp. 59~-608o 

15 NeEao, M. Studies on LanEuaEe Analysis Procedure end 
Charectel- Recognition, Eyoto University, 1965. 

16 Cf° Hsys, D.G. Automatic ~e-Data Processing, 
Computer Appllcatlons in the Behsviorel Sciences t 
Prentice-He11, Englewood Cliffs, N.J., 1962, pp. 
394-421. 

17 Kuno, S° A Context-Sensitive Reco~ition Procedure, 
NSF-18, &u~. 1967. VII-I-28o 

18 Woods~ W.A. Context-Sensitlve ReeoEnition~ NSF-18° 
£ug° 1967. VIII-I-23. 

19 Borsceev, V.B°, Efimova, E.N., 0 sokreshchenil pete- 
bore prl sinteksicheskom anallze, Neuchno-Texni- 
cheskaje Informacija, 1967, No I0, pp. 27-33. 
