THE RUMORS SYSTEM OF RUSSIAN SYN .\[ HtJSIS 
MaX I. Kanovich, Zoya M. ShMyapina. 
Institute of Oriental Sh~dies, Russian Academy of Sciences, 
Rozhdestvenka sir., 12, 103753 Moscow, Kussi~ 
Abstract 
The RUMORS synthesizer of Russian is a,~ integral 
part of the JM~AP experimental system of ,ht.pn.nese- 
Russian antom.t~tic translation, although it cn,n also 
h~ve other ~ppliczrtions. Morphologically, it is based, 
primtrrily, on A.A.ZMizny~k's model of Russi~t. in- 
flexion. Syntactical functions of RUMORS rely on 
word-order ~nd dependency d~t~ as input informa- 
tion. The synthesizer is implemented on IBM PC, 
MS DOS, in Turbo Pascal. 
1 General information 
The RUMORS system of RUssian MORphologicul 
and Syntactical synthesis has been developed as part 
of the JaRAP experimental system of Jt~p~nese- 
Russian antoma~tic tra~sl~tion, described in (Mod- 
ina,Shaly~pin~,1994). Its operation is, however, com- 
pletely independent of the other components of the 
Jatl2kP system, so that RUMORS could be used in 
any other AT system irrespective of its source l~n- 
guage. Basically, RUMORS constitutes a system in 
its own right, which can also be used for purposes 
other th~n tra~nsl~tion, e.g., as ~ computerized refer- 
ence book of Russian morphology and the simplest 
phenomena of syntactic government a~td agreement 
(for students and teachers of Russian), ~ part of 
spell-checking system, etc. 
The RUMORS synthesizer has two m~jor modes of 
oper~tlon: the QUERY mode ttnd the TASK mode. 
In the QUERY mode of operr~tion, ItUMOItS 
accepts, as its input, a separate syntactico- 
morphological query (entered from the keybo~Lrd) 
which represents the lexeme to be processed ~nd the 
synttmtlcal.~.nd morphological ch araeteristics specify- 
lag the word-form to be obtained by the processing. 
The output is, primarily, the desired word-form of the 
input lexeme or, if necessary, ~ periphrastic ~u6stitute 
for this word-form. 
After obtaining this output, the user c~t switch 
at will to the FULL PARADIGM submode of the 
QUERY mode. In this submode, RUMORS gener- 
~rte8 all synthetic word-forms of the input lexeme (or 
of the l~t lexeme processed while obtaining a pc- 
riphra~tic word-combina.tion). It n\],o offers s. set of 
menus alIowing the user to modify his initial qlmry 
by choosing additions\] morphologicM categorh~s from 
~;lLese It\] (2~rtll S. 
In the TASK mode~ the input d~t~ is i~ sequence. 
of queries fed from the special TASK file. Apart 
from lexical, morphological, and syntacticM inform~ 
tlon contained in eaclt query, the TASK mode of op- 
ertLtion m~kes considerable use of word order (l~t~ 
which is essential, axnong other things, for process- 
ing prepositional, adjectival and noun phrases. The 
output is the sequence of word-forms manifesting the 
phrases or sentences specified by the input sequence 
of queries. 
Both in the QUERY ttnd bt th.e TASK modes, the 
output is displayed on the screen and written slmul- 
taneously in the speci~ SOLVE file. If required by 
the user, it may Mso include the alterations rome in 
the queries processed and the d~tubase informa~tion 
used in their processing. Inasmuch as the simulatio~t 
by RUMORS of ttte linguistic processes involved in 
tl.ussla~t synthesis is faithful enough, this ~uxillary 
d~ttL could be valuable by itself (e.g., for learning 
or teaching Russian), ~ide from its significance for 
debugging and controlling purposes. 
2 Synthesis functions 
envisaged by RUMORS 
2.1 Morphoh)gical functions 
The morphologic~ functlolts of RUMORS cover HI 
aspects of Russia.n in~le:don, as well as some semr~n- 
ticMly bafic lcxico-morphologlcaJ relationships. 
'\['he inlh:xiona\] funetions tLre initiated after the 
input query ha~ ~dreaxty been subjected to syntactical 
and lexieo-morphologlcal operations, which retry lltrve 
modified its htitial form. At this stage, ii: contMas 
nothing but the lexeme and the infiexiontd categories 
specifying its desired word-form. If some categories 
needed for complete specifictrtion of this word-form 
are not explicltly stated in the query, they are set- 
tled by default. E.g., a query containing nothing but 
the lexeme of a verb is taken to describe the finite 
177 
form, indicutlve mood, present tense, active voice, 3d 
person singular of this verb, so that the query, s~y, 
6pam~ produces the form 6epem. 
The inilexional model of Russian implemented by 
RUMORS is the one proposed and detailed by prof. 
A.A.Zaliznyak (1977). Its important virtue is tttat 
the generation procedures it envls~ges represent those 
to be expected of human speakers of Russian more 
faithfully than any other known model, wtfile the req- 
uisite database information is very compact. 
Our verslo~t of ZaliznyaMs inflexionai model dif- 
fers from its description in (Zaliznyabq1977) in two 
respects. On the one haztd~ we have reduced the 
scope of the original Zalizny~k's model, implementing 
it only in so fax as written Russltrn is concerned. As 
tr result, quite a number of the p~rticulars of Russian 
accentuation registered in (Zallznyak,1977), nt~mely, 
all those that axe relev~t for oral speech only, have 
been ignored. 
On the other hand, we h~ve extended the model to 
cover analytical word-forms. Moreover, we have in- 
troduced a new type of morphological functions, the 
periphrastic functions allowing RUMORS to produce 
output that makes sense even if the required worfl- 
form is non-existent (e.g., due to the lexeme h~ving 
a defective paradigm or to the combination of cs.te- 
gorlea in the query being beyond the scope of Rus- 
slan infiexlonal morphology). E.g., the future tense 
ist person singular of the verb 
noge~um~ 'win' 
(which does not h~ve thls form) is p~r~phrased 
e.uo~y noge~urn~ '< I > ~hall be able to win'. 
The lexlco-morphological functions of RU- 
MORS axe limited so fax to conversion between lex- 
emes having essentially similak semantics, but differ- 
ins in their part-of-speech or (for verbs only) aspec- 
tuai characteristics. 'l'hus~ the aspectuM or part-of- 
speech markers in the following three queries 
paaocrn.~arn~:coo, 
Snarn~:lI 
c~use the lexeme~ in these queries to be replaced, 
resp., by the required perfective verb, noun, a:nd ad- 
jective: 
paeemu.aarat, 
~l, ne~lue t 
u$o ecmubtl~I. 
The implementation of aspectual 
lexico-morphological relations is based, principally, 
o~t their description in (Zalizny~k,1977). For p~rt- 
of-speech relations, we have adopted, though in a 
very limited sense, the concept of lexical substitutions 
(Zholkovsldj,Mel'chuk, 1970). 
If the d~tabase contains no information necessary 
for switcldng to a lexeme of the desired a~pect or 
paxt of speech, RUMORS resorts to its periphrastic 
functions or else m~kes modifications i~t the query. 
E.g., the query 
'~U~OS~Ug : 1' 
aimed at forming the verb corresponding to the noun 
uunoouur 'bureaucrat', will be processed to produce 
the phrase: 
Deaaern ms, umo xapa~mepno DAx uu~oo~uEa 
'< He > is doing what is typical of a bureaucrat'. 
2.2 Syntactical functions 
RUMORS has two m~jor types of syntactical fune- 
tlons: relational ttnd word-string sues. There is alas 
a third group of prepositional\]unctions. 
Relational functions may be called both in the 
QUERY and in the TASK mode of operation to mod- 
ify the input query with regard to the relational re}. 
erenees it may include. There may be references to 
dependency relations, where the node specified by the 
query acts either as dependent (D-references) or as 
governor (G-references}, and to ~napttoric relations 
(F-references). I)- and G-references may contain 
embedded rehLtional references, so that in the gen- 
era\] case each reference present in the input query 
corresponds to tL more or leas corrtplex fragment of 
the dependency and anaphoric structure this query 
is p~rrt of. 
The job of the rel~tlonal functions is to ensure 
fulfilment of the requirements for syntectical govern- 
ment and ~greement which may be imposed on the 
word-form specified by the query by the dependency 
and anaphorlc relations this query has refereuces to. 
Tkls involves extr0cting such requirements from the 
references in the query, reconciling them with enc.h 
other (if there axe two or more references dictating 
conflicting requlrements)~ a.nd then rnodifylng the inl- 
tim query to fit them: choosirtg the correct prepoul- 
tion or conjunction (tire empty one, if needs be) to 
accompany the goal word-form, ~Itd ulteringj a~ re- 
quired, the inflexional and paxt-obspeech categories 
within the query. E.g., the query 
pemenue R D~(aat~ucem~) 
describing the noun petueuue 'decision~an the synt~tc- 
tical object of the verb oasueem~ 'depend' will pro- 
duce the prepo~itlonai combination: 
om pemenu~r '< depend > on < the > decision'. 
Word-strlng fimctlons axe specific to the TASK 
mode of operation only. Their peculiarity is that they 
include some t~nalysis-llke operations maldng it pos- 
sible to locate s.nd process simple prepositional, ad- 
jectival and noun phrazes, even if the input sequence 
of queries h~ no syntactical marking. 
To be more partlcul~r, word-strlng processing con- 
sists in ext~mining the queries of the input sequence 
one by one until the query under examlutLtlon is 
found to answer our definition of the end of a 
word-string. During this ex~min~tlon, e~ch query is 
checked for information relevant to agreement tLnd 
prel)ozitionaJ gow~rnmenl;, ~J,nd tile inflexion~l ;J,ad 
178 
paxt-of-speech c~tegorles pertaining to such inform,> 
tion axe integrated into a special word-strir~g query 
(w-query). After the end of the word-string bus been 
located and exarnhted, the w-query obta.hted is, i~t 
staatdaxd ca~es~ rome common to all of the individual 
queries within this word-string. 'rhus~ the sequence 
of morphologically empty querle, for lexemes 
o, oec b ~atu, aa.~a~amura 'in, all, our, galaxy' 
will be processed to pro&me the prepodtionM phrase: 
so 8ee~ uataeCl eaaa~mu~e 
'in the whole of our #alaxy'. 
Some types of word-strlngs, e.g. tho~e containing ct~r- 
dinM numerals, have to be subjected to more elabo- 
rate operations. 
If • query with la a word-string contains rela.tlonal 
references, the requirements imposed by these are 
given priority over the requirements extr~ted by 
word-string functions~ so that the l~tter provide a 
sort of default. 
Prepositional functions s.re employed in both 
modes of operation, if the word-form or word-string 
being processed is to be preceded by a preposition. 
Tltus~ if ttte preposition i~t question denote~ location, 
direction or source, the noun it i~ meant to accom- 
panty is checked for having lexical preferences in thi~ 
respect. This helps to account, e,g., for such idiomrtt- 
iCfl a.fl 
aa yam~e 'in the ~treet' 
VS. 
n nepeyaue 'in the aide-,~treet'. 
Other prepositional fa~tctio,ts serve to ~dd the 
prothetie u to personal pronouns ~.fter prepositions 
imposhtg this requirement, to choose the eoutextual 
form of the prepositlon if it ha~ more tht~n one of 
theft h etc. 
The data.base ha~ thus been reduced to leas than 
one fifth of (ZMiznya.k,1977), still Mt'ordiIIg correct 
morphologica.l proees,~ing of M1 of the 100 000 lexemes 
listed in (7,ali'anyak,1977). 
Moreover, so far ~m lexeme8 with st~n(hLrd morpho- 
\]ogle'el ch~razterlstlcs go, they ctLn now be processed 
correctly, even if they axe newly-colned or occasional 
(and do not htwe therefore dictionaxy entries of their 
own). As (ZMizny~k,1977) m~y be trusted to eontMn 
M1 non-standard lexemes, the inllexionM ~nd aspec- 
tu~l information in the resulting d~t,~b~e very nearly 
covers the whole of the ltussb~rt voct~bulaxy. 
The sltu~Mon is different with inform~tlon to be 
used in p~rt-of-speec.h conversion ~nd syntactic pro- 
c(;~dng, for it i~ not provided in (Zslie, ny~k,1977). 
This information is now also being Mded~ but in this 
respect, the datn.ba*m is fa,r from completed ~tnd ha~ 
yet only experimental vMue. 
References 
\[1\] L.S,Modina, L.S., Shaly~qfin% Z.M. (1994). 
The JattAP experimental system of Ja.pa.nese- 
Itussit~n ~utomatic tr~ndation (submitted for 
COLING 94). 
\[2\] Z~liznyak, A.A, (1977). Gra,mmaticheskil slow~,r 
russkogo y~.z, yka. (Tile Gru.mmn.tical 1)icl;ionary of 
ll~ussla.n). - Mos(ow.: l/,u,'lskil y~yk, (in ltussiu.rl). 
\[3\] Zholkovskij, A.K., MePchuk, I.A. (1970). Sur la 
synth~se udm~ntique. T.A.Ir~formation, No.2. 
3 Database 
The dat,~b~e used by RUMORS has been derived, 
prlm~.rily, from (g~dlmtya.kd977) which provide, in- 
formation on inflexion ~utd ~pectual cmtverdon for 
~bout 100 000 lexemes. We used • cornputerlzed ver- 
sion of (gMiznyakd977) m0xle twaila~ble to as by the 
Department of tile mttehine pool 0f tile Russian ln.n- 
gu~ge (Institute of the ltussi~tn l~aguage, of the l~.uu- 
sien acMemy of sciences, Moscow). By now, how- 
ever~ our d~t~ba~e is appreciably ditrerent from its 
source, 
Aside from various minor moditic~tloitB, we ha.re 
t~ken Mv~tage of the fnzt that it is ~tot unusu~d 
for inttexiona.l informz,£ion chaxacterir, lng ,u lexeme 
to correlate with some components of this lexeme'u 
word-structure. Such componeat~ have been orga,- 
nixed into ,. dlction~ry of their own, the inforrm~tlon 
a~socia.ted with ea.ch of tltem inchtded in their re- 
spective entries, ~nd N1 the eorrresponding Iexemes 
removed from the dut~b~um. 
/79 
