1965 International Conference on Computational Linguistics
... INFLECTING LANGUAGES
D. Varga 
Computing Centre of the
Hungarian Academy of Sciences
53, Uri u., Budapest I., Hungary
Varga 2 
ABSTRACT
The paper discusses the two main methods, based on
dependency grammars and on PS grammars, used in the
syntactic analysis of natural languages. In the case
of highly inflecting languages the PS analysis has the
main disadvantage that, since it operates with
syntactically homogeneous categories, the number of
rules to be applied increases rapidly. The paper
proposes the method of partial decomposition into
morphemes in order to increase the efficiency of the
rewriting rules, so that the problems of rection and
agreement can be solved for highly inflecting
languages.
1. Language analysis needs an approach to language
different from that used in generating the sentences
of a given language:
1.1. In the case of analysis one has to reckon with
the fact that, because of the restricted accuracy of
the way language data are designated, we can often
attain our aim /i.e. the establishment of the real
structure of the sentence considered/ only after
testing several alternatives; that is, it is
impossible to solve the problems raised directly,
without returns. We do not have at our disposal, at
every stage of the analysis, the information that
would make a clear-cut decision possible with respect
to the path to be followed in the next stages of the
analysis. This is why it can be said that analysis
depends to some extent on the previous history of the
analysis. This requirement, however, does not
necessarily lead to a reformulation of the rules; it
may instead be met by a new way of applying them or
by a new order of application. Of course, to ensure
the correctness of the analysis it must be guaranteed
that the correct structure can be obtained by testing
in all cases.
1.2. If we are interested in the problem not only
from the side of theory but also from that of its
practical applicability, then we have to ensure the
optimization of the way the correct structure is
revealed. The optimization of analysis is related,
in many respects, to the requirement of simplicity
in language theory. Of course, the method to be
applied is not independent of the typological
properties of the language under consideration, and
this applies, above all, to the optimization.
1.3. If we aim at the analysis of natural languages,
our main requirement may be much less stringent than
the requirement of generative grammars. Generative
grammars, quite reasonably, consider it a principal
requirement that any grammar should generate all
sentences of a given language and only these. An
analogous stipulation is not necessary in the case of
analysis, since we may assume that we want to analyze
only impeccable sentences. /In the case of artificial
languages - for instance, programming languages - the
situation is quite different: it is a basic
requirement that the analyzer should be able to
distinguish the syntactically impeccable strings from
the incorrect ones, i.e. that it should disclose the
syntactic faults./
Now the question is what kind of methods, or which
combination of methods, may lead to the recognition
of the structure or structures of any syntactically
impeccable sentence within optimal time, with special
regard to highly inflecting languages.
2. With respect to non-inflecting or only weakly
inflecting languages there is a useful method for
analysis, namely the reversed application of the
so-called rewriting rules. Besides its simplicity,
this method offers quite a few advantages: firstly,
it is based on the mathematically well-formalized
phrase structure grammars; secondly, from a linguistic
point of view, it is related to the IC grammar that
has been elaborated for the analysis of natural
languages.
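The reversed application of rewriting rules, combined with the returns described in 1.1, can be sketched as a depth-first search that replaces a right-hand side by its left-hand side and backs up whenever a branch leads nowhere. This is a minimal sketch under our own assumptions: the toy grammar and the names `RULES` and `reduce_to` are hypothetical, and the input symbols stand for word categories already assigned by morphology.

```python
# Hypothetical toy phrase structure grammar: (left-hand side, right-hand side).
# VP -> V is listed before VP -> V NP so that the search meets a blind alley.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("A", "N")),
    ("NP", ("N",)),
    ("VP", ("V",)),
    ("VP", ("V", "NP")),
]


def reduce_to(goal, symbols, seen=None):
    """Apply the rules in reverse (right-hand side -> left-hand side), depth-first.

    Returns the list of intermediate strings if `symbols` reduces to `goal`,
    otherwise None. A failed branch returns None, so the caller goes on to
    test its next alternative (a "return" in the sense of 1.1).
    """
    symbols = tuple(symbols)
    if symbols == (goal,):
        return [symbols]
    if seen is None:
        seen = set()
    if symbols in seen:  # already tried (or being tried) without success
        return None
    seen.add(symbols)
    for lhs, rhs in RULES:
        width = len(rhs)
        for i in range(len(symbols) - width + 1):
            if symbols[i:i + width] == rhs:
                reduced = symbols[:i] + (lhs,) + symbols[i + width:]
                path = reduce_to(goal, reduced, seen)
                if path is not None:
                    return [symbols] + path
    return None


for step in reduce_to("S", ["N", "V", "A", "N"]):
    print(" ".join(step))
# N V A N
# N V NP
# NP V NP
# NP VP
# S
```

On the way the search reduces NP V NP to NP VP NP and then to S NP, finds no further rule, and returns; only the second VP rule leads to S.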
In the case of inflectional languages, however, the
application of such rules meets with a difficulty,
which is due to the fact that applying such rewriting
rules means processing symbols assigned to the
categories of syntactically homogeneous elements. The
number of categories consisting of such syntactically
homogeneous elements is very high in these languages,
and each additional category increases the number of
rewriting rules by as many rules as there are
different structures in which the category in
question may occur. In Russian, for instance, the
number of rules would amount to about 30-40 thousand,
which diminishes the applicability of the system
considerably.
The excessive increase of the categories is mostly
due to the fact that the classifications according
to the different points of view may occur
independently of each other. If m different basic
categories were needed according to one aspect of
classification and n different categories according
to another aspect, then, taking into account both
aspects, m·n different basic categories would be
called for. If, for instance, the classification of
substantives according to declension needed seven
basic categories, the classification according to
the cases 6 basic categories, and the classification
according to the numbers 2 different categories,
then, instead of a single substantival /N/ category,
7·6·2 = 84 categories would be necessary.
It is easy to see that, should we take into
consideration the differences between male and
female, animate and inanimate, let alone the semantic
categories, we would obtain a completely unmanageable
apparatus.
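The multiplication of categories can be checked by enumerating one complex symbol per combination of independent feature values. This is a minimal sketch; the dimension names and the symbol format `N[...]` are our own, and only the class counts (7, 6, 2, plus the two binary distinctions male/female and animate/inanimate) come from the text.

```python
from itertools import product
from math import prod

# Class counts quoted in the text; the first dimension is read as declension type.
DIMENSIONS = {"declension": 7, "case": 6, "number": 2}


def category_symbols(dimensions):
    """Enumerate one complex category symbol per combination of feature values."""
    names = list(dimensions)
    value_ranges = [range(1, size + 1) for size in dimensions.values()]
    return [
        "N[" + ",".join(f"{name}={value}" for name, value in zip(names, combo)) + "]"
        for combo in product(*value_ranges)
    ]


symbols = category_symbols(DIMENSIONS)
print(len(symbols))   # 84
print(symbols[0])     # N[declension=1,case=1,number=1]

# Adding the two binary distinctions mentioned next multiplies the count again.
extended = dict(DIMENSIONS, gender=2, animacy=2)
print(prod(extended.values()))  # 336
```

Every rewriting rule that mentions N would, in a PS grammar without decomposition, have to be repeated for each of these symbols it applies to.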
3. Dependency grammars have been elaborated mainly to
circumvent the difficulties raised by inflectional
languages. It is interesting to note in passing that
in the Soviet Union this conception prevails even
today in the groups engaged in machine translation.
According to dependency grammar, each time a
rewriting rule is applied we have to consider the
category of the distinguished word form as the
representative of a complex category. In this way
the concreteness of the categories is maintained.
Lastly, the predicate represents the whole sentence,
standing as it does at the top of the tree diagram.
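A dependency tree of this kind can be stored as a list that gives, for every word, the index of its governing word, with the predicate as the only word without a governor. The sketch below assumes a hypothetical analysis of the Russian sentence "Вы знаете много теорем о пределах" ('You know many theorems about limits'); the names `HEADS`, `root`, `relations` and `subtree` are ours.

```python
# Hypothetical dependency analysis: HEADS[i] is the index of the word governing
# word i, and -1 marks the word at the top of the tree (the predicate).
WORDS = ["Вы", "знаете", "много", "теорем", "о", "пределах"]
HEADS = [1, -1, 1, 2, 3, 4]


def root(heads):
    """Return the index of the single word that has no governor."""
    roots = [i for i, h in enumerate(heads) if h == -1]
    if len(roots) != 1:
        raise ValueError("a dependency tree must have exactly one root")
    return roots[0]


def relations(words, heads):
    """List the governor -> dependent pairs between concrete word forms."""
    return [(words[h], words[d]) for d, h in enumerate(heads) if h != -1]


def subtree(i, heads):
    """Indices of word i and of every word it governs directly or indirectly."""
    covered = {i}
    changed = True
    while changed:
        changed = False
        for d, h in enumerate(heads):
            if h in covered and d not in covered:
                covered.add(d)
                changed = True
    return sorted(covered)


top = root(HEADS)
print(WORDS[top])                                      # знаете
print(subtree(top, HEADS) == list(range(len(WORDS))))  # True
```

The second check expresses the sense in which the predicate represents the whole sentence: its subtree covers every word.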
At first glance a dependency grammar seems to exhibit
quite a few advantages from the point of view of
highly inflecting languages. These advantages may be
summarized as follows.
(i) It traces back the relations within the sentence
to the relations between concrete word forms. In this
way the establishment of the sentence structure is
traceable back to the establishment of the relations
between concrete words, i.e. to the examination of
micro-structures.
(ii) In the case of highly inflecting languages,
where the relations between words come to the fore
through their outer form, namely through the form of
agreement and rection, the information obtained in
this way may be used immediately for finding out the
sentence structure.
(iii) On the basis of the direct relations between
words the analysis may start at any point: at the top
of the tree diagram, at the bottom, or in the order
given by the words of the sentence.
(iv) No difficulty in principle is encountered in a
dependency grammar analysis in the uniform handling
of continuous and discontinuous structures. /These
structures are rather frequent in highly inflecting
languages, because these languages have more
effective means than word order at their disposal for
expressing relations between words./
In spite of these advantages, dependency grammars
have not solved the problems definitively, as it has
turned out that these advantages are of a rather
restricted character.
Ad (i). It may happen that the examination of the
relation between two words does not provide enough
information for further analysis. The statement of
complementary conditions is rather difficult in these
cases and can in most cases be done only by an ad hoc
adjustment.
Ad (iii). Although it is possible to begin the
analysis at the top of the dependency tree, such an
analysis demands either a rather laborious testing
process or the storing of a great amount of
information. /It is illuminating from this point of
view to follow the development of predictive analysis,
from the original conception of Ida Rhodes up to the
variant elaborated by Kuno-Oettinger-Plath. According
to Rhodes the analysis is to be carried out on the
basis of dependency grammar, beginning at the top of
the dependency tree. The new version, in contrast, is
based thoroughly on the conception of IC grammars. As
is known, the main defect of the earlier version was
that, when longer sentences were to be analyzed, the
predictions to be stored increased excessively./
Ad (iv). In principle it would be possible to analyze
all possible cases of the discontinuous structures,
but such a full analysis seems to be unattainable in
the foreseeable future. /Kulagina's main endeavour is
aimed at excluding, on the basis of a preliminary
analysis, those constructions that cannot be expected
further, and at making this analysis equal to the
full analysis [2]./ In practice the analysis is
always carried out on the basis of some simplifying
conditions or hypotheses, concerning chiefly the
decomposition of sentences or the relations of some
structures /projectivity/.
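Projectivity, one of the simplifying hypotheses just mentioned, can be tested mechanically: every word lying between a governor and its dependent must itself be governed, directly or indirectly, by that governor. This is a minimal sketch using the same head-list convention as a hypothetical encoding (index of the governor, -1 for the root); the function names are ours.

```python
def dominates(h, k, heads):
    """True if word h governs word k directly or through a chain of governors."""
    while k != -1:
        k = heads[k]
        if k == h:
            return True
    return False


def is_projective(heads):
    """Check that every word between a governor and its dependent is
    itself governed (directly or indirectly) by that governor."""
    for d, h in enumerate(heads):
        if h == -1:
            continue
        for k in range(min(h, d) + 1, max(h, d)):
            if not dominates(h, k, heads):
                return False
    return True


print(is_projective([1, -1, 1, 2, 3, 4]))  # True: a continuous structure
print(is_projective([-1, 0, 0, 1]))        # False: arc 1 -> 3 spans word 2
```

A discontinuous structure is exactly one in which this test fails, so an analyzer that assumes projectivity excludes such readings in advance.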
4. Different methods have been proposed to circumvent
the difficulties raised by the IC grammar analysis.
Chomsky discusses these proposals /proposals of
Harris, Matthews, Stockwell, Anderson, Schachter,
Harman and others/ in his paper submitted to the
Magdeburg conference; he concludes: "the problem of
remedying this defect in PSG is clearly very much
open, and deserves much further study" [3]. With
respect to Russian it is Plath who has recently
elaborated an ingenious indexing and index-transmission
system, which sets out to ensure the many-sided
applicability of the rules and the transmission of
information from one symbol to another. Chomsky
points out that the indexing of categories and the
introduction of complex symbols essentially amount to
the application of a special type of transformational
grammar. Undoubtedly, the pure methods have not
yielded the expected results in the analysis of
natural languages. Chomsky himself suggests a
compromise with respect to similar difficulties that
arise in generative grammars. Practically, it is a
matter of a new dimension, neglected so far, namely
the paradigmatic level. Chomsky poses the alternative
straightforwardly: either one should accept the
decomposition into morphemes or opt for the
paradigms. He himself pronounced in favour of the
paradigmatic conception.
Chomsky was led to this decision by the complexity of
the morphemes. However, it should be added that quite
different questions arise in the case of agglutinative
languages, where the inflectional morphemes generally
serve to express a single grammatical function. So,
for instance, in Hungarian házaknak ('to the houses')
is segmented as

házaknak = ház   + ak + nak
           house + Pl + Dat

If we take account of this structure of words, the
decomposition into morphemes seems more justified.
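Because each Hungarian suffix carries a single function, a segmentation like the one above can be sketched by stripping known suffixes from the right until a known stem remains. The suffix and stem tables below are hypothetical and far from complete (real Hungarian has many suffix allomorphs); the function name `segment` is ours.

```python
# Hypothetical mini-lexicon; a real analyser would need the full set of
# Hungarian suffix allomorphs and stems.
SUFFIXES = {"nak": "Dat", "ak": "Pl"}
STEMS = {"ház": "house"}


def segment(word):
    """Strip known suffixes from the right until a known stem remains.

    Returns the list of morphs and the parallel list of glosses.
    """
    morphs, glosses = [], []
    while word not in STEMS:
        for suffix in sorted(SUFFIXES, key=len, reverse=True):  # longest first
            if word.endswith(suffix) and len(word) > len(suffix):
                morphs.insert(0, suffix)
                glosses.insert(0, SUFFIXES[suffix])
                word = word[: -len(suffix)]
                break
        else:  # no known suffix matched and no known stem was reached
            raise ValueError(f"cannot segment {word!r}")
    return [word] + morphs, [STEMS[word]] + glosses


morphs, glosses = segment("házaknak")
print(" + ".join(morphs))   # ház + ak + nak
print(" + ".join(glosses))  # house + Pl + Dat
```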
Taking into consideration the aspects of the
syntactic analysis, an intermediary solution offers
itself: with the aid of common rewriting rules
/without essentially increasing their number/ a
considerable part of the syntactic relations may be
detected if we decompose the sentence - but only
partially - into morphemes, i.e. if we separate the
case category from the basic category. This means
that we may use the same symbols for designating the
cases of substantives, adjectives, pronouns etc., and
it is necessary to decompose the corresponding
categories. On the other hand, the case category,
whose role is in the first place a syntactic one, is
handled separately. Last but not least, this
facilitates the separation of case from gender and
number, which is important in the processing of
relative pronouns.
A similar situation can also be produced artificially
in the machine translation of non-agglutinative
languages. Since in machine translation the
morphological analysis precedes the syntactic one,
in practice there is no difficulty in transforming
the occurring word forms, on the basis of the
previously performed morphological analysis, in such
a way that the grammatical information becomes
explicit and the word forms are thus rendered
"agglutinative".
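Rendering analysed word forms "agglutinative" can be sketched as turning each morphological analysis record into two symbols: a case symbol placed before the basic category. The record type `Analysis`, its field names and the case labels are our assumptions about what a preceding morphological analysis might output; the same case symbol serves substantives, adjectives and pronouns alike.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    """Hypothetical output record of a preceding morphological analysis."""
    form: str
    category: str  # basic category without case: "N", "A", "Pron", ...
    case: str      # "nom", "gen", "dat", "acc", "instr", "prep"


def agglutinate(analyses):
    """Make the case explicit as a separate symbol placed before the word's category."""
    symbols = []
    for a in analyses:
        symbols.extend([a.case, a.category])
    return symbols


phrase = [Analysis("руководителю", "N", "dat"), Analysis("кафедрой", "N", "instr")]
print(agglutinate(phrase))  # ['dat', 'N', 'instr', 'N']
```

An adjective in the dative, for example, would simply yield `['dat', 'A']`, so no separate dative-adjective category is needed.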
To find out the rection we usually have to take into
consideration the following factors:
a./ the category of the construing word stem;
b./ the case ending of the construed word;
c./ the category of the construed word stem.
It is, however, unnecessary to consider the case
ending of the construing word. E.g.:
руководитель кафедрой    N_nom + N_instr → N_nom
руководителя кафедрой    N_gen + N_instr → N_gen
руководителю кафедрой    N_dat + N_instr → N_dat
('head of the department' in the nominative, genitive and dative)
By separating the case ending and placing it before
the word, we have, instead of the three rules above,
a single rule:
N + instr + N → N
The rection can be examined by means of simple 
context-restricted phrase structure rules: 
A+~+N ~ N/~ 
A + S +lq --~ N/S~ eta. 
(,> 
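The gain from placing the case symbol before the word can be sketched as follows: the single reversed rule N + instr + N → N reduces the phrase whatever the case of the governing noun, while with fused symbols one rule per case would be needed. This is a minimal illustration under our own assumptions: the names `RHS`, `LHS`, `apply_rule` and `CASES` are hypothetical, and factor a./ (only certain stems, such as руководитель, govern the instrumental) is deliberately ignored here.

```python
# The single rule N + instr + N -> N, applied in reverse (right-hand side -> left).
RHS = ("N", "instr", "N")
LHS = "N"


def apply_rule(symbols):
    """Replace the leftmost occurrence of RHS by LHS; return symbols unchanged if absent."""
    width = len(RHS)
    for i in range(len(symbols) - width + 1):
        if tuple(symbols[i:i + width]) == RHS:
            return symbols[:i] + [LHS] + symbols[i + width:]
    return symbols


# Governing noun in three different cases, governed noun in the instrumental.
for case in ["nom", "gen", "dat"]:
    print(apply_rule([case, "N", "instr", "N"]))
# ['nom', 'N']
# ['gen', 'N']
# ['dat', 'N']

# With the case fused into the category symbol, one rule per case is needed.
CASES = ["nom", "gen", "dat", "acc", "instr", "prep"]
fused_rules = [f"N_{c} + N_instr -> N_{c}" for c in CASES]
print(len(fused_rules))  # 6
```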
The decomposition into morphemes can also be used
with respect to the participles and the infinitive.
Consequently, the problems connected with the rection
of the participle as a verbal derivative may be
handled separately from the problems connected with
participles as secondary parts of speech embedded in
the structure of the sentence.
5. The advantages of dependency grammars derived from
the fact that they could draw conclusions about the
type of the relations merely by examining separate
concrete words, regardless of the arrangement of the
words in the structure of the sentence. With respect
to some local units the same holds for an IC analysis
as well. Such local examinations can be used, on the
one hand, as input information to further analysis;
on the other hand, they may reduce the number of
possibilities to be considered.
1. A typical local problem is represented by the
morphological analysis, which means /in common
parlance/ the determination of the grammatical
properties of separate words.
2. As a local problem we may consider, for instance,
the agreement of the substantive with the immediately
preceding adjective/s and/or preposition in Russian.
/The risk of making a mistake is minimal, although it
is not entirely excluded, because of the adjectives
that may also be used as substantives:
В столовой девушке дали обед.
('In the canteen the girl was given dinner.')/
Such a preliminary examination of compatibility is of
great importance in MT, because the number of case
homonymies may thereby be reduced essentially.
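The compatibility check can be sketched as an intersection of the possible (case, gender, number) readings of an adjective and of the substantive that follows it. The reading table below is hand-written and hypothetical, listing only feminine singular readings of the two forms. It also reproduces the risk noted above: in "В столовой девушке дали обед" the form столовой is a substantive ('canteen') governed by в, yet the check would still pair it with девушке.

```python
# Hypothetical reading table: possible (case, gender, number) analyses of two forms.
READINGS = {
    "столовой": {("gen", "f", "sg"), ("dat", "f", "sg"),
                 ("instr", "f", "sg"), ("prep", "f", "sg")},
    "девушке": {("dat", "f", "sg"), ("prep", "f", "sg")},
}


def agree(adjective, noun):
    """Keep only the readings shared by an adjective and the noun right after it."""
    return READINGS[adjective] & READINGS[noun]


print(sorted(agree("столовой", "девушке")))
# [('dat', 'f', 'sg'), ('prep', 'f', 'sg')]
```

Four adjective readings and two noun readings shrink to two joint readings, which is the reduction of case homonymy the text refers to.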
3. We place the examination of the possibilities of
extension, or of the realization of these
possibilities, among the local problems, at least
insofar as it provides preliminary information for
the analysis. The number of these possibilities is
limited and characteristic of the language under
consideration: first, in what direction, and second,
by what kind of grammatical and lexical means a word
or structure may be extended or continued. It is
highly revealing to examine how a given structure can
be extended starting from a single sentence kernel
/i.e. not from several full sentences/. So, for
instance, in English:
Sometimes a decision to compute is followed by a
process of selecting the particular kind of computing
machine best suited for the given problem.

The designer should be careful in choosing circuit
designs that he not build in additional difficulties
with a choice of a particular circuit in an attempt
to eliminate other difficulties.
The same grammatical relations would be expressed in
Hungarian or in Russian in entirely different ways.
/We would have full clauses instead of participles in
Hungarian; in Russian the participles would be
replaced by substantives derived from verbs./
4. Semantic information may also be used to reduce
the possibilities in the case of a partial analysis
of ambiguous structures. /In the case of no ambiguity
it makes no sense to use semantic information if we
assume that the input sentences are impeccable not
only grammatically but also semantically, cf. 1.3./
Notice that constructional homonymy extending over
the whole sentence is rather unusual; we have,
however, frequent cases of ambiguous structures
within sentences. So, for instance, in Russian the
string "вследствие других законов сохранения, а также
особенностей взаимодействия частиц" ('owing to other
conservation laws, and also to peculiarities of the
interaction of particles') may have 7 different
bracketings, i.e. 7 different structures. If there
are several syntactically ambiguous structures in the
sentence, it would be unnecessary to carry out a new
syntactic analysis for each of them: if we can
localize the ambiguous structure, the production of
all possible sentence structures is merely a matter
of combination.
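Once the ambiguous substructures are localized, producing all sentence structures is a Cartesian product of their readings, with no new syntactic analysis. This is a minimal sketch; the reading labels and the second substructure with 2 readings are hypothetical, and only the figure of 7 bracketings is taken from the text.

```python
from itertools import product
from math import prod


def sentence_structures(local_readings):
    """Combine independently localized ambiguous substructures into whole-sentence structures."""
    return list(product(*local_readings))


# Hypothetical: one substructure with 7 bracketings and another with 2.
first = [f"b{i}" for i in range(1, 8)]
second = ["c1", "c2"]
structures = sentence_structures([first, second])
print(len(structures))                   # 14
print(prod([len(first), len(second)]))   # 14
print(structures[0])                     # ('b1', 'c1')
```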
The local problems mentioned need not be incorporated
into the main program, i.e. the syntactic analysis
proper. A considerable part of them may be carried
out either before or simultaneously with the
morphological analysis, while other problems may be
solved as subsidiary operations, separately in each
case, when some rules are applied, if necessary.
6. The crucial point of the syntactic analysis of the
whole sentence /i.e. not of the form of the rules,
but of the strategy of their application/ is the
problem of where to begin the analysis, i.e. at which
word of the sentence [4]. Lees says, with respect to
the order in which the transformational rules must be
applied, that one has to begin with the constituent
sentence that is embedded deepest and that further
transformations can only be applied to matrix
sentences previously "satisfied". This holds,
mutatis mutandis, for the simplest structures, word
groups, as well. /Namely, assuming that we begin the
analysis with the given string to be examined, i.e.
from the bottom of the tree. The other possibility is
to begin from the top of the presupposed tree
diagram, i.e. with the hierarchy of the given system
of rules. This path has been followed in predictive
analysis./
A basic problem is the determination of the structure
that is embedded deepest in some other structures. If
we have succeeded in determining this structure, we
can obtain the analysis of rather complicated
sentences in a rather simple way by a stepwise
processing of the embedded structures. Naturally, if
an erroneous step is not to destroy the whole
analysis, the different possibilities must be
remembered by the algorithm. A suitable algorithm
worked out by Bálint Dömölki [5] could be used, with
only slight alterations, for an analysis meeting the
above requirement.
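One standard way to remember all possibilities is a chart in which every category found for every span of words is kept. The sketch below is a plain CYK recognizer swapped in for illustration; it is not Dömölki's algorithm, and the grammar `BINARY`, the token categories and the name `chart_parse` are hypothetical.

```python
from collections import defaultdict

# Hypothetical grammar in Chomsky normal form: (B, C) -> categories A with A -> B C.
BINARY = {
    ("NP", "VP"): {"S"},
    ("A", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}


def chart_parse(tokens):
    """Fill a CYK chart: chart[(i, j)] holds every category found for words i..j-1.

    No choice is ever discarded, so a locally plausible but wrong grouping
    cannot destroy the analysis of the whole string.
    """
    n = len(tokens)
    chart = defaultdict(set)
    for i, categories in enumerate(tokens):
        chart[(i, i + 1)] = set(categories)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for b in chart[(i, k)]:
                    for c in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((b, c), set())
    return chart


# Each word comes with every category the morphology allows for it.
tokens = [{"N", "NP"}, {"V", "VP"}, {"A"}, {"N", "NP"}]
chart = chart_parse(tokens)
print(chart[(0, 2)])  # {'S'}: a blind alley, kept but not used
print(chart[(0, 4)])  # {'S'}: the whole string
```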
We can considerably diminish the number of
unnecessary blind alleys by taking into consideration
the type of the language under consideration. As to
Russian, for instance, right-recursive rewriting
rules prevail. There is right recursivity, for
instance, in the case of substantival complements,
adjectives and participles connected with
substantives, or participial constructions embedded
in each other, etc. In Yngve's terminology we can say
that a considerable part of the Russian structures
are of the progressive type. As a consequence, the
tree diagram of the sentence is in most cases
characterized by right-branching /or at least this
holds for some subtrees of most structures/. In this
case, however, we arrive at the deepest part of the
right-branching tree in the simplest way if we begin
the analysis at the end of the sentence. To put it
differently, if we consider the sentence structure as
given by a bracket expression, then in the case of
progressive languages the brackets often accumulate
at the end, but not at the beginning, of the
sentence. To take a simple example, we have in
Russian such sentences as
(Вы (знаете (много (теорем (о пределах)))))
('You know many theorems about limits.')
If we began the analysis at the beginning of the
sentence, we would have to try connecting quite a few
words and structures that are in fact separated by
brackets, that is, that are not connected with each
other. If, however, we start at the end of the
sentence and embed the symbol obtained for the
structure discovered up to that moment into the
subsequent structures, we can arrive at the correct
analysis of the whole string more quickly and with
less effort.
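For a purely right-branching structure, starting at the end of the sentence and embedding the structure found so far into each preceding word yields the bracketing directly, with the brackets accumulating at the end. This is a minimal sketch that assumes the structure is entirely right-branching and checks no grammar rules; the names `analyse_from_end` and `brackets` are ours.

```python
def analyse_from_end(words):
    """Start with the last word and embed the structure built so far into each
    preceding word, as suggested for right-branching (progressive) structures."""
    structure = words[-1]
    for word in reversed(words[:-1]):
        structure = (word, structure)
    return structure


def brackets(structure):
    """Render a nested (left, right) structure as a bracket expression."""
    if isinstance(structure, str):
        return structure
    left, right = structure
    return f"({brackets(left)} {brackets(right)})"


sentence = ["Вы", "знаете", "много", "теорем", "о", "пределах"]
print(brackets(analyse_from_end(sentence)))
# (Вы (знаете (много (теорем (о пределах)))))
```

A real analyzer would still test rules at each embedding step; the sketch only shows why the right-to-left order reaches the deepest structure first.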

References 

[1] Cf. Abraham, S., Some questions of phrase
    structure ..., ... Linguistics IV /.../

[2] Kulagina, O. S., ..., Problemy kibernetiki 12,
    pp. 233-7; ... on machine translation ..., ... 10,
    pp. 205-218

[3] Chomsky, N., Categories and Relations in Syntactic
    Theory, mimeographed, ..., 1964

[4] Varga, D., Yngve's Hypothesis and Some Problems of
    the Mechanical Analysis, Computational Linguistics
    III, pp. 47-74

[5] Dömölki, B., An Algorithm for Syntactic Analysis,
    Computational Linguistics III, pp. 29-46
