<?xml version="1.0" standalone="yes"?> <Paper uid="C65-1002"> <Title>SUBCLASSIFICATION OF PARTS OF SPEECH IN RUSSIAN: VERBS</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> In a trial study, about 500 Russian verbs were coded using 44 potential classificatory criteria. Through sorting and the introduction of a metric, numerous groupings were obtained. Initial results suggest that, with proper refinements, the approach described can provide useful information that may be employed in syntactic analysis and certain information retrieval applications.</Paragraph> <Paragraph position="1"> 0. 0 Introduction As part of a broader effort to extend the existing traditional part-of-speech classification in modern Russian, this study of verbs is oriented toward developing an improved basis for syntactic analysis. Moreover, it is hoped that the refinements introduced will be of interest in content analysis. To this end, an extensive set of potential classificatory criteria has been selected, in the hope that eventually this categorization can be optimized and extended to other parts of speech.</Paragraph> <Paragraph position="2"> 1. 0 The Experiment The 514 verbs analyzed came from two sources: (a) a randomized sample of 370 entries ( 1 ) and (b) a list of the most frequently used Russian verbs ( 2 ), from which the first 144 entries were selected.</Paragraph> <Paragraph position="3"> The classificatory criteria, subdivided into two groups, are discussed in Section 1. 1 below. Generally, each verb was taken in a particular meaning (stirat', for instance, as &quot;to erase&quot; and not as &quot;to launder&quot;), and English equivalents were used solely for purposes of identification. At the same time, for reasons of convenience, provisions were made in coding to allow for coexisting alternatives. 
Thus, for properties A and B there can be four possibilities, which are represented by the following numerical codes: 1 - &quot;A&quot;, 2 - &quot;B&quot;, 3 - &quot;AB&quot;, 0 - &quot;neither applies&quot;. After the verbs and appropriate codes were punched on cards, verbs with identical codes were compared. To obtain additional clustering, a program, written by R. F. Hubbard for the IBM 7040, compared the code vector of each card against those of the rest of the sample.</Paragraph> <Paragraph position="4"> The distance between any two entries was calculated by taking the square root of the sum of the squares of the distances between corresponding positions in their code vectors, as defined by the following table:</Paragraph> <Paragraph position="6"> Since one of the main objectives of this study has been to establish the relevance of various classificatory criteria, these were tested in two groups as described below. The selection of criteria, based on studies of existing grammars of Russian, was directed toward discovering solutions for problems arising or likely to arise in machine-assisted syntactic analysis.</Paragraph> <Paragraph position="7"> 1. 1. 1 Test I In this test, the verbs were coded according to their ability to combine with selected prepositional phrases, certain adverbs, and chto-introduced object clauses. Most of the examples are derived from the discussion of the slovosochetaniye (grammatically bound word group) problem in the Academy Grammar ( 3 ). While the English meanings supplied do reflect certain semantic differences, the main objective has been not only to test the ability of a given verb to co-occur with certain types of phrases (examples are used solely for illustration) or classes of adverbs but also to trace what effect, if any, the verb has on their syntactic function.</Paragraph> <Paragraph position="8"> 1. 1. 1. 1 Classificatory Criteria
1) . . . do menya (A) before me (B) as far as me
2) . . . do rassveta (A) before dawn (B) until dawn
3) . . . iz-za stola (A) because of ... (B) from behind the table
4) . . . k mitingu (A) for the ... (B) to the ... meeting
5) . . . k nam (A) to us (B) toward us
6) . . . za obedom (A) after (to get) ... (B) during dinner
7) . . . u Ziny (A) at Zina's (B) from Zina
8) . . . pod kapustu (A) for cabbage (B) under cabbage
9) . . . za stol (A) at the table (B) behind the table
Andreyewsky 3
10) . . . za brata (A) in brother's place (B) for brother's sake
11) . . . po oshibke (A) a mistake apiece (B) by mistake
12) . . . yashchik iz-pod uglya (A) coal crate (B) crate from under the coal
13) . . . o stol* against the table
14) . . . po vodu* to get water
15) . . . chto napishet* that + (subject) + will write
16) . . . nadvoe* in two (as in cutting)
17) . . . ochen'* very much
18) . . . o sestre* about the sister
1. 1. 1. 2 Results of Sorting Sorting revealed some of the following groupings with identical codes:
. . . (. . . tired), ustat' (to become tired), izzyabnut' (to become chilled)
. . . (to subtract), izderzhat' (to spend)
. . . (to moan)
A3 sovrat' (to tell a lie), soobrazit' (to grasp), dogadat'sya (to surmise)
A6 otrubit' (to chop off), vskryt' (to open up)
A7 nabryzgat' (to sprinkle on), rasprostranit' (to spread)
A10 gordit'sya (to be proud), veselit'sya (to enjoy self), voskhishchat'sya (to admire)
A11 verit' (to believe), toskovat' (to be sad, to pine)
1. 1. 1. 3 Results of the Introduction of the Metric On the basis of preliminary results, the maximum distance considered was set at 10. Given this arbitrary limitation, the metric produced various groupings. The majority of them contained some &quot;noise&quot;, i. e., apparently incorrect entries were brought together, or several distinct groupings turned out insufficiently differentiated. Partly responsible for this are the method employed, the distances selected, and the occasional errors that crept in during the analysis and subsequent processing. These factors are discussed in greater detail below ( 1. 1. 1. 4). 
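The coding, sorting, and distance steps described in 1. 1 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the program written for the IBM 7040: the per-position distance table is not available in this text, so the values returned by `position_distance` are hypothetical, and the verb names and code vectors are invented toy data. The cutoff of 2 is chosen only to suit four-position toy vectors; the study itself set the maximum distance at 10.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def position_distance(a, b):
    """Hypothetical distance between two codes at one position.

    Codes follow the paper: 0 = neither, 1 = A, 2 = B, 3 = AB.
    The numeric values returned here are assumptions, not the paper's table.
    """
    if a == b:
        return 0
    if 3 in (a, b) and 0 not in (a, b):
        return 1  # AB against A or B: the two codes share one alternative
    return 2      # no shared alternative


def distance(u, v):
    """Square root of the sum of squared per-position distances."""
    return sqrt(sum(position_distance(a, b) ** 2 for a, b in zip(u, v)))


def group_identical(codes):
    """Sorting step: collect verbs whose code vectors are identical."""
    groups = defaultdict(list)
    for verb, vector in codes.items():
        groups[tuple(vector)].append(verb)
    return [verbs for verbs in groups.values() if len(verbs) >= 2]


def neighbours(codes, max_distance):
    """Metric step: compare every vector against the rest of the sample."""
    pairs = []
    for (v1, c1), (v2, c2) in combinations(codes.items(), 2):
        d = distance(c1, c2)
        if max_distance >= d:
            pairs.append((v1, v2, d))
    return pairs


# Invented toy sample: four placeholder verbs coded on four criteria.
codes = {
    "verb_a": (1, 0, 3, 2),
    "verb_b": (1, 0, 3, 2),
    "verb_c": (3, 0, 1, 2),
    "verb_d": (0, 2, 0, 1),
}

print(group_identical(codes))
print(neighbours(codes, 2))
```

With this toy data, verb_a and verb_b share a code vector and so fall out of sorting alone, while verb_c is picked up only by the metric, which is the sense in which the metric supplements sorting.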
Some of the more interesting outcomes were as follows: Groups A11 (verit', toskovat', grustit', skuchat', and fantazirovat'), A17 (bakhvalit'sya and likovat'), A12 (volnovat'sya and opasat'sya), and the verb bespokoit'sya (to worry). Group A16 (nastroit', bespokoit', obizhat', proklinat', portit') and the verb nenavidet' (to hate).</Paragraph> <Paragraph position="9"> Group A10 (voskhishchat'sya, veselit'sya, gordit'sya) and the verbs vozmutit'sya (to become disgusted) and boyat'sya (to be afraid). Group A8 (yavit'sya, vbezhat') and the following verbs: vernut'sya (to return), prikhodit' (A24), begat' (A24), . . . (to step out), podyezzhat' (to drive up), yezdit' (to ride), vyekhat' (to go away), kinut'sya (to lunge), vypolzti (to crawl out), doletet' (A27), and doyti (A27); garantirovat' (to guarantee), pokazyvat' (to show), demonstrirovat' (to demonstrate); sovrat' (to lie), poverit' (to believe), uverit' (to assure); znat' (to know), ozhidat' (to expect), videt' (to see); nagryanut' (to come unexpectedly), zaekhat' (to stop by), probezhat'sya (to run), otstupit' (to retreat).</Paragraph> <Paragraph position="10"> 1. 1. 1. 4 Comments The problems stemming from the application of the metric (the &quot;numbers game&quot;) mentioned in 1. 1. 1. 3 reflect a characteristic of statistical inference jocularly compared by an anonymous author to a bikini bathing suit: being sufficiently suggestive, but not revealing. In this regard, alternative approaches have been considered and will be tried in the near future. As it turned out in practice, however, the metric did provide useful insights which can point the way toward developing a more powerful set of classificatory criteria. 
This, in turn, can foster increased reliance on simple sorting procedures based on proper ranking and grouping of the criteria themselves.</Paragraph> <Paragraph position="11"> While, not unexpectedly, the verbs of motion in the broad sense of the term came out more clearly in the classification than did any other group, interesting subclasses of abstract verbs exhibiting unexpected shades of valuation also emerged.</Paragraph> <Paragraph position="13"> In contrast to Test I, this test placed relatively less emphasis on syntagmatic relationships and stressed a mixture of formal and semantic properties. On the whole, except where noted, the two tests were developed independently of one another. While Test I was based on materials derived from the Academy Grammar of Russian ( 3 ), Test II benefited from experience gained in dealing with the problems encountered in machine translation output and from studies conducted preparatory to launching syntactic analysis.</Paragraph> <Paragraph position="14"> 1. 1. 2. 1 Classificatory Criteria In view of the extensive nature of this test, the description of the various criteria used is given here in abbreviated notation.</Paragraph> <Paragraph position="15"> 1. 1. 2. 2 Results of Sorting The following groupings had identical codes: 1. 1. 2. 3 Results of the Introduction of the Metric Comments made in 1. 1. 1. 3 above apply. Because of a greater number of classificatory criteria, the results of introducing the metric were more important in this test. Numbers in parentheses preceding each verb indicate distances from the first verb in the group.</Paragraph> <Paragraph position="16">
B8 pridavit' (to squeeze), (1) prishchemit' (to pinch)
B9 vosstat' (to riot), (1) vystupit' (to appear)
B10 prichesat' (to comb), (1) zaputat' (to tangle)
. . . (to hum), (P.) veshchat' (to speak with authority)
. . . (to carry out), (5) vypustit' (to let out)
. . . (to single out), (7) vypisat' (to write out)
B19 zheltet' (to turn yellow), (5) umirat' (to die)
B25 potusknet' (to dull), (7) zatverdet' (to harden)
B13 temnet' (to grow dark), (2) teplet' (to grow warm)
B20 terrorizirovat' . . .
B26 prikrepit' . . .
In addition to shorter groups described above, longer groupings were observed. Thus, otdokhnut' (to rest), (8) utikhnut' (to quiet down), and (10) ugasnut' (to become extinguished), or nabryzgat' (to sprinkle on), (2) nakinut' (to throw on), (3) vzvalit' (to pile on), and (4) nastrochit' (to sew on) are some of the examples.</Paragraph> <Paragraph position="17"> In other cases, apparently incongruous groups emerged, like the following: strekotat' (to chirr), (1) moshennichat' (to swindle), (5) fokusnichat' (to juggle), (5) nakrapyvat' (to sprinkle), (5) mertsat' (to twinkle), (6) zvenet' (to ring). However, upon closer examination it became apparent that nakrapyvat', mertsat', and zvenet' fall in a group clearly distinguishable from the one containing the other verbs. Further, fokusnichat' and zvenet' showed sufficient distance within their respective groups, suggesting at least four different basic groups in all.</Paragraph> <Paragraph position="18"> 1. 1. 2. 4 Comments Aside from the problems traceable to statistics, the sets of criteria selected for Test II are more open to debate than those found in Test I. However, correlations between both tests indicate that some of the criteria are relevant and that others are, at least, redundant. As observed from minor differences in two versions of coding of nine verbs introduced six months apart, the results of Test II are less reliable.</Paragraph> <Paragraph position="19"> 1. 1. 3 Comparison of Test I and Test II As noted in 1. 1. 2 above, the two tests differ in the base from which they were derived. Accordingly, the results obtained from Test I are both intuitively and actually more reliable. Yet, as suggested in 1. 1. 
1.4, to the extent that the results of the application of the metric tend to supplement sorting, the results of Test II tend to back up many of the findings of Test I. Given a small sample, it is difficult to make any generalizations. At the same time, the evidence emerging so far suggests some subtle differences between the two tests. Basically, in both cases the results of the metric application show little or no discrimination between antonyms. However, the groupings resulting from Test II tend to be held together, if at all, by similarity of content; the results of Test I, in contrast, have a peculiar sort of outward, formal similarity in the manifestation of processes described by the verbs in question.</Paragraph> <Paragraph position="20"> 2. 0 The Outlook In the months ahead, it is hoped that the small corpus can be increased and the time required to code each entry reduced to reasonable proportions. While in many respects the results of both tests are self-proving, rigorous evaluation criteria will have to be formulated in detail.</Paragraph> <Paragraph position="21"> As far as potential application of the results obtained is concerned, the information derivable from Test I in particular could be put to use immediately to improve (together with the classification of nouns currently in progress) the translation of verb-governed prepositional phrases. It is likely that this syntagmatic patterning will extend to larger structures dominated by the verb. Further, if the apparent trends persist, some framework of semantic classification can be anticipated. To what extent this can be accomplished by computers alone, and the degree to which such a classification will satisfy the needs of computer processing, remain to be established. 
While it can be argued that any classification is likely to produce some classes, we take solace in the fact that the methodology employed even in such classics as Roget's Thesaurus remains unknown to this day.</Paragraph> <Paragraph position="22"> This sample was selected from the Daum and Schenck Dictionary in another connection and was random more in its intent than in its methodology.</Paragraph> </Section> </Paper>