1965 International Conference on Computational Linguistics
SUBCLASSIFICATION OF PARTS OF SPEECH IN RUSSIAN: VERBS *
A. Andreyewsky 
International Business Machines Corporation
Thomas J. Watson Research Center
P. O. Box 218
Yorktown Heights, New York, 10598 
* This work was partly sponsored by the Information Processing Laboratory,
Rome Air Development Center, United States Air Force, under Contract
AF 30(602)-3301
Andreyewsky I 
Abstract 
In a trial study, about 500 Russian verbs were coded using 44 
potential classificatory criteria. Through sorting and the introduction 
of a metric, numerous groupings were obtained. Initial results suggest 
that, with proper refinements, the approach described can provide useful
information that may be employed in syntactic analysis and certain
information retrieval applications. 
0.0 Introduction
As part of a broader effort to extend the existing traditional part- 
of-speech classification in modern Russian, this study of verbs is oriented 
toward developing an improved basis for syntactic analysis. Moreover, 
it is hoped that the refinements introduced will be of interest in content 
analysis. To this end, an extensive set of potential classificatory cri- 
teria has been selected, in the hope that eventually this categorization 
can be optimized and extended to other parts of speech. 
1.0 The Experiment
The 514 verbs analyzed came from two sources: (a) a randomized
sample of 370 entries (1) and (b) a list of the most frequently used
Russian verbs (2) from which the first 144 entries were selected.
The classificatory criteria, subdivided into two groups, are
discussed in Section 1.1 below. Generally, each verb was taken in a
particular meaning (stirat', for instance, as "to erase" and not as "to
launder"), and English equivalents were used solely for purposes of
identification. At the same time, for reasons of convenience, provisions
were made in coding to allow for coexisting alternatives. Thus, for
properties A and B there can be four possibilities, which are represented
by the following numerical codes: 1 - "A", 2 - "B", 3 - "AB",
0 - "neither applies".
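The four-valued code can be illustrated with a minimal sketch. The function names encode and decode are hypothetical and do not come from the original card format; the only thing taken from the text is the mapping 1 = "A", 2 = "B", 3 = "AB", 0 = "neither applies".

```python
from typing import Tuple


def encode(a_applies: bool, b_applies: bool) -> int:
    """Return the code for one criterion: 1 = "A", 2 = "B", 3 = "AB", 0 = neither.

    Hypothetical helper; only the four code values are taken from the text.
    """
    return (1 if a_applies else 0) + (2 if b_applies else 0)


def decode(code: int) -> Tuple[bool, bool]:
    """Recover (A applies, B applies) from a code in the range 0..3."""
    if code not in (0, 1, 2, 3):
        raise ValueError(f"unknown code: {code}")
    return (code in (1, 3), code in (2, 3))
```

Under this reading, a verb's full coding is simply a list of such values, one per criterion.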
After the verbs and appropriate codes were punched on cards, 
verbs with identical codes were compared. To obtain additional cluster- 
ing, a program, written by R. F. Hubbard for the IBM 7040, compared 
the code vector of each card against those of the rest of the sample. 
The distance between any two entries was calculated by taking the
square root of the sum of the squares of the distances between
corresponding positions in their code vectors, as defined by the following table:
0-0 = 0    0-2 = 4    1-1 = 0    1-3 = 1    2-3 = 1
0-1 = 4    0-3 = 6    1-2 = 2    2-2 = 0    3-3 = 0
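As a hedged illustration of this calculation (not a reconstruction of the IBM 7040 program, whose details are not given), the sketch below stores the table as a lookup and applies the root-sum-of-squares rule. The names PAIR_DISTANCE, position_distance, and verb_distance are hypothetical, and the table is assumed to be symmetric, so that 1-0 is treated like 0-1.

```python
import math

# Per-position distances as read from the table in the text.
# Assumption: the table is symmetric, so keys are stored as (smaller, larger).
PAIR_DISTANCE = {
    (0, 0): 0, (0, 1): 4, (0, 2): 4, (0, 3): 6,
    (1, 1): 0, (1, 2): 2, (1, 3): 1,
    (2, 2): 0, (2, 3): 1,
    (3, 3): 0,
}


def position_distance(x: int, y: int) -> int:
    """Distance between two codes occupying the same position."""
    return PAIR_DISTANCE[(min(x, y), max(x, y))]


def verb_distance(u, v) -> float:
    """Square root of the sum of squared per-position distances."""
    if len(u) != len(v):
        raise ValueError("code vectors must have the same length")
    return math.sqrt(sum(position_distance(x, y) ** 2 for x, y in zip(u, v)))
```

With this table a disagreement between "A" and "B" at one position contributes 2, while "A" against "neither applies" contributes 4, so a verb to which a criterion does not apply at all is pushed further away than one that merely takes the other alternative.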
1.1 Tests
Since one of the main objectives of this study has been to estab- 
lish the relevance of various classificatory criteria, these were tested 
in two groups as described below. The selection of criteria, based on 
studies of existing grammars of Russian, was directed toward discov- 
ering solutions for problems arising or likely to arise in machine- 
assisted syntactic analysis. 
1.1.1 Test I
In this test, the verbs were coded according to their ability to
combine with selected prepositional phrases, certain adverbs, and
chto-introduced object clauses. Most of the examples are derived from
the discussion of the slovosochetaniye (grammatically bound word group)
problem in the Academy Grammar (3). While the English meanings
supplied do reflect certain semantic differences, the main objective has
been not only to test the ability of a given verb to co-occur with certain
types of phrases (the examples are used solely for illustration) or classes
of adverbs, but also to trace what effect, if any, the verb has on their
syntactic function.
1.1.1.1 Classificatory Criteria
1) ... do menya
(A) before me
(B) as far as me
2) ... do rassveta
(A) before dawn
(B) until dawn
3) ... iz-za stola
(A) because of the table
(B) from behind the table
4) ... k mitingu
(A) for the meeting
(B) to the meeting
5) ... k nam
(A) to us
(B) toward us
6) ... za obedom
(A) after (to get) dinner
(B) during dinner
7) ... u Ziny
(A) at Zina's
(B) from Zina
8) ... pod kapustu
(A) for cabbage
(B) under cabbage
9) ... za stol
(A) at the table
(B) behind the table
10) ... za brata
(A) in brother's place
(B) for brother's sake
11) ... po oshibke
(A) a mistake apiece
(B) by mistake
12) ... yashchik iz-pod uglya
(A) coal crate
(B) crate from under the coal
13) ... o stol*
against the table
14) ... po vodu*
to get water
15) ... chto napishet*
that + (subject) + will write
16) ... nadvoe*
in two (as in cutting)
17) ... ochen'*
very much
18) ... o sestre*
about the sister
1.1.1.2 Results of Sorting
Sorting revealed, among others, the following groupings with identical
codes:
A1  zakipet' (to boil), prosolit'sya (to turn salty)
A2  vzdrognut' (to flinch), ustavat' (to become tired),
    ustat' (to become tired), izzyabnut' (to become chilled)
A3  sovrat' (to tell a lie), soobrazit' (to grasp),
    dogadat'sya (to surmise), nameknut' (to hint)
A4  raznuzdat' (to let become undisciplined), vospitat' (to educate)
A5  vychest' (to subtract), izderzhat' (to spend)
A6  otrubit' (to chop off), vskryt' (to open up)
A7  nabryzgat' (to sprinkle on), rasprostranit' (to spread)
A8  vbezhat' (to run in), yavit'sya (to appear)
A9  podumat' (to think), bredit' (to rave), okhat' (to moan)
A10 gordit'sya (to be proud), veselit'sya (to enjoy self),
    voskhishchat'sya (to admire)
A11 verit' (to believe), toskovat' (to be sad, to pine)
* Only test ability to combine in the meaning indicated. 
A11 (continued) grustit' (to be sad, to yearn), skuchat' (to be bored),
    fantazirovat' (to dream)
A12 volnovat'sya (to worry), opasat'sya (to be afraid)
A13-A15 zapryatat' (to hide), vovlekat' (to draw in), berech' (to save),
    poberech' (to save), uderzhat' (to withhold),
    vzgromozdit'sya (to perch self), uglubit'sya (to go deep into),
    rassazhivat' (to seat)
A16 nastroit' (to incite), bespokoit' (to disturb), obizhat' (to offend),
    proklinat' (to damn), portit' (to spoil, ruin)
A17 bakhvalit'sya (to brag), likovat' (to rejoice)
A18-A20 razrykhlyat' (to loosen), razdrobit' (to pulverize),
    besedovat' (to converse), soveshchat'sya (to confer),
    razodrat' (to tear), rasshibit' (to break, bust)
A21 morosit' (to drizzle), nakrapyvat' (to sprinkle), poroshit' (to snow)
A22-A23 farshirovat' (to stuff), sintezirovat' (to synthesize),
    klassifitsirovat' (to classify), razbivat' (to break)
A24 begat' (to run), prikhodit' (to come)
A25-A26 nesti (to carry), vezti (to cart), volochit' (to drag),
    tashchit' (to pull)
A27 doyti (to reach (walking)), doletet' (to reach (flying))
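The sorting step that yields groupings such as those above can be sketched as a simple bucketing of verbs by their complete code vector. This is an illustration only: the function name group_identical is hypothetical, and the code vectors in the example are invented for the purpose of the sketch (the paper does not list the actual codes).

```python
from collections import defaultdict


def group_identical(coded):
    """Return lists of verbs whose code vectors are identical.

    `coded` maps a verb to its sequence of codes (0..3, one per criterion).
    Verbs whose vector is unique are left out of the result.
    """
    buckets = defaultdict(list)
    for verb, codes in coded.items():
        buckets[tuple(codes)].append(verb)
    return [verbs for verbs in buckets.values() if len(verbs) > 1]


# Invented code vectors, for illustration only.
sample = {
    "morosit'": [0, 0, 1, 2],
    "nakrapyvat'": [0, 0, 1, 2],
    "poroshit'": [0, 0, 1, 2],
    "begat'": [2, 3, 0, 0],
}
groups = group_identical(sample)
```

Because the comparison is on the whole vector, a single differing position is enough to keep two verbs apart, which is why the metric was introduced to obtain additional clustering.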
1.1.1.3 Results of the Introduction of the Metric
On the basis of preliminary results, the maximum distance con- 
sidered was set at 10. Given this arbitrary limitation, the metric produced 
various groupings. The majority of them contained some "noise", i.e.,
apparently incorrect entries were brought together or several distinct
groupings turned out insufficiently differentiated. Partly responsible for
this are the method employed, the distances selected, and the occasional
errors that crept in during the analysis and subsequent processing. These
factors are discussed in greater detail below (1.1.1.4).
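A minimal sketch of applying such a cut-off follows. It assumes that the limit is inclusive (distance at most 10) and that the per-position table is symmetric; the names TABLE, distance, and neighbours, as well as the placeholder entries v1 to v4 and their code vectors, are hypothetical and serve only to show the mechanics.

```python
import math

# Per-position distances from the table in Section 1.0 (identical codes give 0).
TABLE = {(0, 1): 4, (0, 2): 4, (0, 3): 6, (1, 2): 2, (1, 3): 1, (2, 3): 1}


def distance(u, v):
    """Root of the sum of squared per-position distances."""
    total = 0
    for x, y in zip(u, v):
        if x != y:
            total += TABLE[(min(x, y), max(x, y))] ** 2
    return math.sqrt(total)


def neighbours(seed, coded, max_distance=10.0):
    """Entries within max_distance of the seed, nearest first."""
    hits = []
    for verb, codes in coded.items():
        if verb == seed:
            continue
        d = distance(coded[seed], codes)
        if d <= max_distance:  # assumption: the limit of 10 is inclusive
            hits.append((d, verb))
    return sorted(hits)


# Placeholder entries with invented code vectors.
coded = {
    "v1": [1, 0, 3, 2],
    "v2": [1, 0, 3, 3],
    "v3": [0, 1, 0, 1],
    "v4": [0, 3, 0, 0],
}
result = neighbours("v1", coded)
```

Here v2 differs from v1 in one position only, v3 falls inside the limit, and v4 falls just outside it and is dropped.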
Some of the more interesting outcomes were as follows:
A28 Groups A11 (verit', toskovat', grustit', skuchat', and fantazirovat'),
    A17 (bakhvalit'sya and likovat'), A12 (volnovat'sya and opasat'sya),
    and the verb bespokoit'sya (to worry).
A29 Group A16 (nastroit', bespokoit', obizhat', proklinat', portit') and
    the verb nenavidet' (to hate).
A30 Group A10 (voskhishchat'sya, veselit'sya, gordit'sya) and the verbs
    vozmutit'sya (to become disgusted) and boyat'sya (to be afraid).
A31 Group A8 (yavit'sya, vbezhat') and the following verbs: vernut'sya (to
    return), prikhodit' (A24), begat' (A24), ~ (to step out), podyezzhat'
    (to drive up), yezdit' (to ride), vyekhat' (to go away), kinut'sya (to
    lunge), vypolzti (to crawl out), doletet' (A27), and doyti (A27).
A32 garantirovat' (to guarantee), pokazyvat' (to show), demonstrirovat'
    (to demonstrate).
A33 sovrat' (to lie), poverit' (to believe), uverit' (to assure).
A34 znat' (to know), ozhidat' (to expect), videt' (to see).
A35 nagryanut' (to come unexpectedly), zaekhat' (to stop by), probezhat'sya
    (to run), otstupit' (to retreat).
1.1.1.4 Comments
The problems stemming from the application of the metric (the "numbers
game") mentioned in 1.1.1.3 reflect a characteristic of statistical
inference jocularly compared by an anonymous author to a bikini bathing suit:
being sufficiently suggestive, but not revealing. In this regard, alternative
approaches have been considered and will be tried in the near future. As 
it turned out in practice, however, the metric did provide useful insights 
which can point the way toward developing a more powerful set of classi- 
ficatory criteria. This, in turn, can foster increased reliance on simple 
sorting procedures based on proper ranking and grouping of the criteria 
themselves. 
While, not unexpectedly, the verbs of motion in the broad sense of
the term came out more clearly in the classification than did any other
groups, interesting subclasses of abstract verbs exhibiting unexpected
shades of valuation also emerged.
1.1.2 Test II
In contrast to Test I, this test placed relatively less emphasis
on syntagmatic relationships and stressed a mixture of formal and
semantic properties. On the whole, except where noted, the two tests were
developed independently of one another. While Test I was based on
materials derived from the Academy Grammar of Russian (3), Test II
benefited from experience gained in dealing with the problems encountered in
machine translation output and from studies conducted preparatory to
launching syntactic analysis.
1.1.2.1 Classificatory Criteria
In view of the extensive nature of this test, the description of the
various criteria used is given here in abbreviated notation.
1) (A) imperfective
(B) perfective
2) verb (1/3) or
"verboid" (2/0);
"concrete" (1/0)
or "abstract"
(2/3): when "yes"
answer is possible
under criterion 17 of 1.1.1.1.
3) is ~ form
(A) reflexive
(B) non-reflexive
4) generally:
(A) non-reflexive
(B) reflexive
5) when reflexive,
meaning:
(A) active
(B) passive
6) participial forms:
(A) active
(B) passive
7) passive participle:
(A) past
(B) present
8) gerundial forms:
(A) present
(B) past
9) action (gerund):
(A) parallel
(B) sequential
10) deverbal nouns:
(A) in -enie, -ka
(B) other forms
11) deverbal nouns:
(A) concrete
(B) abstract
12) verb used:
(A) personally
(B) impersonally
13) verb function:
(A) link, auxiliary
(B) other
14) meaning affected by
(A) governed infinitive
(B) object(s)
15) subject preference:
(A) inanimate
(B) animate
16) verb governs:
(A) infinitives
(B) objects
17) object preference:
(A) animate
(B) inanimate
18) (A) motion verb
(broad sense)
(B) action perceived
19) verb describes:
(A) action
(B) state
20) (A) beginning
(B) end of action
21) verb is one of:
(A) being
(B) becoming
22) action described:
(A) outward-directed
(B) inward-directed
23) action directed:
(A) downward
(away)
(B) upward
(toward)
24) action in respect
to object:
(A) contacts
(B) permeates
25) reference to:
(A) duration
(B) intensity
26) action produces:
(A) decrease
(B) increase
27) action describes:
(A) gain
(B) loss
1.1.2.2 Results of Sorting
The following groupings had identical codes: 
B1 skuchat' (A11) (to be bored), toskovat' (A11) (to be sad)
B2 morosit' (A21) (to drizzle), poroshit' (A21) (to snow)
B3 nakrapyvat' (A21) (to sprinkle), mertsat' (to twinkle)
B4 izderzhat' (to expend), istratit' (to spend)
B5-B7 nosit' (to carry), tashchit' (to pull), volochit' (to drag),
    podshivat' (to attach), navyazyvat' (to tie on),
    skladyvat' (to put together), ozhivit' (to vivify),
    uverit' (to assure)
1.1.2.3 Results of the Introduction of the Metric
The comments made in 1.1.1.3 above apply. Because of the greater
number of classificatory criteria, the results of introducing the metric were
more important in this test. Numbers in parentheses preceding each verb
indicate distances from the first verb in the group.
B8  pridavit' (to squeeze), (1) prishchemit' (to pinch)
B9  vosstat' (to riot), (1) vystupit' (to appear)
B10 prichesat' (to comb), (1) zaputat' (to tangle)
B11 vbezhat' (to run in), (1) vypolzti (to crawl out)
B12 napevat' (to hum), (2) veshchat' (to speak with authority)
B13 temnet' (to grow dark), (2) teplet' (to grow warm)
B14 vyrabotat' (to develop), (3) vyuchit' (to learn)
B15 khmurit'sya (to frown), (3) tumanit'sya (to grow gloomy)
B16 razbushevat'sya (to start raging), (4) uchastit'sya (to become more frequent)
B17 otkryt' (to open), (5) ubavit' (to decrease)
B18 vynesti (to carry out), (5) vypustit' (to let out)
B19 zheltet' (to turn yellow), (5) umirat' (to die)
B20 terrorizirovat' (to terrorize), (5) khvalit' (to praise)
B21 viset' (to hang), (6) lezhat' (to lie)
B22 podognat' (to drive up), (6) navestit' (to visit)
B23 nestis' (to dash), (7) bezhat' (to run)
B24 vydelit' (to single out), (7) vypisat' (to write out)
B25 potusknet' (to dull), (7) zatverdet' (to harden)
B26 prikrepit' (to fasten), (8) nav'yuchit' (to pack on)
B27 vozvratit' (to return), (8) dopolnit' (to augment)
B28 vvesti (to introduce), (9) dobavit' (to add)
In addition to the shorter groups described above, longer groupings
were observed. Examples are otdokhnut' (to rest), (8) utikhnut' (to quiet
down), and (10) ugasnut' (to become extinguished), or nabryzgat' (to
sprinkle on), (2) nakinut' (to throw on), (3) vzvalit' (to pile on), and
(4) nastrochit' (to sew on).
In other cases, apparently incongruous groups like the following emerged:
strekotat' (to chirr), (1) moshennichat' (to swindle), (5) fokusnichat' (to
juggle), (5) nakrapyvat' (to sprinkle), (5) mertsat' (to twinkle), (6) zvenet'
(to ring). However, upon closer examination it became apparent that
nakrapyvat', mertsat', and zvenet' fall in a group clearly distinguishable from
the one containing the other verbs. Further, fokusnichat' and zvenet' showed
sufficient distance within their respective groups, suggesting at least four
different basic groups in all.
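The paper does not say how this closer examination was carried out. One possible way to make it mechanical, offered here purely as an assumption, is to look at the pairwise distances among the members of a group rather than only at their distances from the first verb, and to split the group into connected components under a tighter threshold. The function name split_group, the placeholder members, and all distances in the example are hypothetical.

```python
def split_group(members, pair_distance, threshold):
    """Partition members into connected components.

    Two members are linked whenever their pairwise distance is at most
    `threshold`; `pair_distance` maps frozenset({a, b}) to a distance.
    """
    remaining = list(members)
    groups = []
    while remaining:
        frontier = [remaining.pop(0)]
        component = []
        while frontier:
            current = frontier.pop()
            component.append(current)
            linked = [m for m in remaining
                      if pair_distance[frozenset((current, m))] <= threshold]
            for m in linked:
                remaining.remove(m)
            frontier.extend(linked)
        groups.append(sorted(component))
    return groups


# Placeholder members with invented pairwise distances.
pairs = {
    frozenset(("seed", "a")): 1, frozenset(("seed", "b")): 5,
    frozenset(("seed", "c")): 5, frozenset(("a", "b")): 5,
    frozenset(("a", "c")): 6, frozenset(("b", "c")): 1,
}
parts = split_group(["seed", "a", "b", "c"], pairs, threshold=2)
```

In this invented example b and c are each 5 away from the seed yet only 1 apart from each other, so they separate from the seed's group as soon as pairwise distances are consulted.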
1.1.2.4 Comments
Aside from the problems traceable to statistics, the sets of criteria
selected for Test II are more open to debate than those found in Test I.
However, correlations between both tests indicate that some of the criteria
are relevant and that others are, at least, redundant. As observed from minor
differences in two versions of coding of nine verbs introduced six months
apart, the results of Test II are less reliable.
1.1.3 Comparison of Test I and Test II
As noted in 1.1.2 above, the two tests differ in the base from which
they were derived. Accordingly, the results obtained from Test I are both
intuitively and actually more reliable. Yet, as suggested in 1.1.1.4, to the
extent that the results of the application of the metric tend to supplement
sorting, the results of Test II tend to back up many of the findings of Test I.
Given a small sample, it is difficult to make any generalizations. At
the same time, the evidence emerging so far suggests some subtle differences
in the two tests. Basically, in both cases the results of the metric
application show little or no discrimination between antonyms. However, the
groupings resulting from Test II tend to be held together, if at all, by
similarity of content; the results of Test I, in contrast, have a peculiar
sort of outward, formal similarity in the manifestation of processes described
by the verbs in question.
2.0 The Outlook
In the months ahead, it is hoped that the small corpus can be increased 
and the time required to code each entry reduced to reasonable proportions. 
While in many respects the results of both tests are self-proving, rigorous
evaluation criteria will have to be formulated in detail. 
As far as potential application of the results obtained is concerned, 
especially the information derivable from Test I could be immediately put to 
use to improve (together with classification of nouns currently in progress) 
the translation of verb-governed prepositional phrases. It is likely that this 
syntagmatic patterning will extend to larger structures dominated by the 
verb. Further, if the apparent trends persist, some framework of semantic
classification can be anticipated. To what extent this will be possible to
accomplish by computers alone, and the degree to which such a classification
will satisfy the needs of computer processing, remain to be established.
While it can be argued that any classification is likely to produce some 
classes, we take solace in the fact that the methodology employed even in 
such classics as Roget's Thesaurus remains unknown to this day. 

References

1. This sample was selected from the Daum and Schenck Dictionary in
another connection and was generally random in its intent more than
in its methodology.

2. A. K. Demidova, O. G. Motovilova, G. D. Shevchenko, E. P. Chaplygin,
Naiboleye upotrebitel'nyye glagoly sovremennogo russkogo yazyka (The
Most Frequently Used Verbs in Modern Russian), Moscow, USSR Academy
of Sciences Publishing House, 1963.

3. V. V. Vinogradov, ed., Grammatika russkogo yazyka (Grammar of the
Russian Language), Moscow, USSR Academy of Sciences Publishing House,
1960, Vol. II, Part I, pp. 113-280.
