SOME PROBLEMS OF MACHINE TRANSLATION 
BETWEEN CLOSELY RELATED LANGUAGES
Alevtina BÉMOVÁ, Karel OLIVA and Jarmila PANEVOVÁ
Faculty of Mathematics and Physics 
Charles University 
Malostranské náměstí 25
CS-118 00 Praha 1 - Malá Strana
Czechoslovakia 
Abstract
We describe the linguistic background of
a Czech-to-Russian MT system, stressing its
features resulting from the close relatedness
of the two languages, above all the
possibility of minimizing the transfer.
Related linguistic problems are analyzed 
within the MT project, as well as in the 
perspective of contrastive linguistics. 
1. The Czech-to-Russian MT system called
RUSLAN is conceived (similarly to all
linguistically based MT systems) as a
modular system consisting (in brief) of a
source language parser, a transfer and a
synthesis of the target language. The task is
to translate texts from the domain of
computers, in particular manuals of operating
systems. Since in RUSLAN the source language
is closely genetically related to the target
one, some of the modules of the system could
be considerably simplified without abandoning
the theoretical linguistic framework on which
the system is based (the dependency and
stratificational approach). The
simplifications concern, first of all, the
transfer phase, so that the system cannot be
understood as including a complete transfer.
2. The effort towards a maximally effective
procedure has also resulted in
simplifications in the parser. This was made
possible, inter alia, by the similarity of
cases of syntactic ambiguity in the source
and the target language. For example, with
sequences of the type
Verb Noun_1 Noun_2 ... Noun_i, where each
Noun_j stands for a nominal or a prepositional
group serving as a free modifier, the surface
order can generally be preserved, which makes
it unnecessary to identify in detail whether
any of the Noun_j's modifies the Verb or one
of the preceding Nouns. This can be
illustrated by the output Russian sentence
"Vo vremja svoej raboty programma možet
potrebovat' takže pomošč' sistemy pri
obrabotke fajlov dannych." (lit. "In course
of-its work program can need also help
of-system in processing of-files of-data."),
where the group "pri obrabotke ..." can be
analyzed (in both languages) as modifying the
verb "potrebovat'" or the nouns "pomošč'" or
"sistemy". If the order of the nominal groups
is preserved, the translation also preserves
the structural ambiguity of the original.
Nominalizations can also be translated
independently of their underlying structure
(e.g., "Indeksno-posledovatel'nyje fajly
neobchodimo do obrabotki preobrazovat'." -
lit. "Index-sequential files have-to-be
before processing transformed.", or
"Programmy, napisannye na jazyke Assembler v
ramkach predyduščej versii, neobchodimo snova
translirovat'." - lit. "Programs written in
language Assembler in framework-of preceding
version have-to-be again compiled.").
Such an approach made it possible, at
first, to minimize the transfer phase in the
design of the project, and then, in the
process of realization, to distribute the
transfer operations between the parser and
the synthesis, which may lead to an impression
that RUSLAN works completely without
transfer, i.e., as a direct binary MT system.
In principle, it can be said that the
minimization of the transfer reflects the
empirical fact that the two languages have a
lot of common features.
3. A great role is played in RUSLAN by 
the lexicon. The lexical entry contains a
maximum of information, which is then projected
to the syntactic rules; only the most general 
behaviour of words is rendered purely by 
means of syntax. 
The rules for the choice of lexical
equivalents include different types of
information. Along with the data on parts of
speech and morphemics, semantic features are
listed, and (esp. with verbs) also the valency
(subcategorization) frame; the valency slots
are accompanied by information on their Czech
morphemic form as well as that of the
corresponding Russian items (an example of
their discrepancy is the pair "užívat
něco(acc.)" vs. "pol'zovat'sja čem(instr.)" -
"to use sth."). Where passivization is
possible, it is indicated which of the slots
(mostly, but not always, expressed by the
accusative) is selected as the passive surface
subject, expressed then by the nominative.
With each of the slots, the semantic features
required or excluded for the filler of that
slot are indicated. These features help to
identify the fillers, especially in cases of
ambiguity, e.g. in Czech "Výstupní zařízení
nastaví řádkování na požadovanou hodnotu."
(lit. "Output device sets line-spacing at
required value") the verb "nastavit" ("set")
has the following valency frame:
Actor (nom/nom, +Human, +Device),
Objective (acc/acc, +Concr,
+Result-of-process, -Human),
where "+" denotes semantic features such that
at least one of them has to be present with
the filler of the respective slot, "-" denotes
semantic features excluded with the filler,
and the case pairs (bold print) denote
Czech/Russian morphological forms. In this
way, the ambiguity of morphemic case with
"řádkování" and "zařízení" (in both cases
between nom and acc) can be solved on the
basis of semantic features of the two nouns.
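
The way such a frame resolves the nom/acc ambiguity can be sketched as follows. This is only a minimal illustration, not the RUSLAN implementation: the dictionary layout, the function names and the feature sets assigned to the two nouns are assumptions made for the example.

```python
from itertools import permutations

# Hypothetical encoding of the valency frame of "nastavit" ("set").
# "plus": at least one feature must be present on the filler;
# "minus": none of these features may be present on the filler.
FRAME = {
    "Actor": {"case": ("nom", "nom"),
              "plus": {"Human", "Device"}, "minus": set()},
    "Objective": {"case": ("acc", "acc"),
                  "plus": {"Concr", "Result-of-process"},
                  "minus": {"Human"}},
}

# Assumed lexicon entries: both forms are ambiguous between nom and acc.
NOUNS = {
    "zařízení": {"cases": {"nom", "acc"}, "features": {"Device", "Concr"}},
    "řádkování": {"cases": {"nom", "acc"}, "features": {"Result-of-process"}},
}


def fits(noun, slot):
    """Check Czech case and semantic features of a noun against a slot."""
    entry, spec = NOUNS[noun], FRAME[slot]
    return (spec["case"][0] in entry["cases"]
            and bool(spec["plus"] & entry["features"])
            and not (spec["minus"] & entry["features"]))


def assign(nouns):
    """Return every slot assignment compatible with the frame."""
    slots = list(FRAME)
    return [dict(zip(slots, perm))
            for perm in permutations(nouns, len(slots))
            if all(fits(n, s) for n, s in zip(perm, slots))]
```

With these assumed entries, `assign(["zařízení", "řádkování"])` leaves a single reading, because "řádkování" carries neither +Human nor +Device and so cannot fill the Actor slot.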
3.1 The choice of the Russian equivalents 
for Czech lexical units should reflect also 
structural differences between the two
languages. These differences concern also
syntactic patterns; at least the following
cases should be distinguished:
a. Adj Adj Noun -> Adj Noun
ex.: datový řídicí příkaz
-> upravljajuščij operator
lit.: data control command
-> control operator
b. Noun -> Adj Noun
ex.: počítač -> vyčislitel'naja mašina
lit.: computer -> computing machine
c. Verb -> Verb Noun
ex.: zkompilovat -> osuščestvit' kompiljaciju
lit.: to compile -> to carry out compilation
d. Noun -> Noun Noun
ex.: počátek -> točka peresečenija
lit.: beginning -> point of-intersection
e. Adj Adj Noun -> Noun Noun Adj Noun
ex.: vyšší programovací jazyk
-> jazyk programmirovanija vysšego urovnja
lit.: higher programming language
-> language of-programming of-higher level
Clearly, some types are easier to implement
than others, depending on the complexity of
the respective Czech and Russian
constructions. To simplify some cases of the
type d., where the Russian equivalent
includes a modifying noun in a fixed
morphemic form, this noun is treated as an
uninflected word, the syntactic relation of
which is established already in the dictionary.
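
A dictionary entry of type d. with a fixed modifier might be sketched like this. The entry layout and the function name are hypothetical; only the head noun is inflected, while the modifier is stored as an uninflected string.

```python
# Paradigm of the inflected head noun "točka" ("point"), singular only.
TOCKA = {"nom": "točka", "gen": "točki", "dat": "točke",
         "acc": "točku", "ins": "točkoj", "loc": "točke"}

# Hypothetical lexicon entry: the modifier keeps its fixed genitive form.
LEXICON = {
    "počátek": {"head": TOCKA, "fixed_modifier": "peresečenija"},
}


def render_equivalent(czech_lemma, case):
    """Inflect the head for the required case; the modifier never changes."""
    entry = LEXICON[czech_lemma]
    return f"{entry['head'][case]} {entry['fixed_modifier']}"
```

For instance, `render_equivalent("počátek", "acc")` inflects only the head and leaves "peresečenija" untouched.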
3.2 Due to the closeness of the languages,
a useful ingredient can be seen in the idea
of a transducing dictionary, proposed and
elaborated in the English-to-Czech MT system
(cf. Kirschner, 82). The transducing
dictionary, based on algorithmic handling of
the regular productive international affixes
(with exceptions listed in the main
dictionary) and of the orthographic and
similar differences, can be illustrated by
the following:
a. with the suffixes -áž (montáž, "assembly"),
-át (agregát, "aggregate"), -ent (koeficient,
"coefficient"), -ura (kubatura, "cubic
volume"), and the lexical components of Greek
or Latin origin, such as -graf, -skop
(kardiograf, "cardiograph", elektroskop,
"electroscope"), the Russian equivalents
differ at most in details
b. with other suffixes of international use,
the Russian equivalents correspond in a
systematic way to the Czech ones, as with
-ista/-ist, -ce/-cija, -ismus/-izm,
-ární/-arnyj, -ický/-ičeskij
c. to a certain degree also words of Slavonic
origin can be handled by a procedure based on
correspondences with regular segment pairs
such as h/g, ř/r, TraT/ToroT (where T stands
for an occlusive: krátký/korotkij "short");
such pairs as "hrad" ("castle") vs. "gorod"
("town"), where the lexical semantics
differs, have to be listed in the lexicon.
d. whenever a word has not been identified in
the main dictionary and cannot be treated by
the procedures of the types a., b., c., at
least transliteration and some of the
elementary correspondences are carried out, so
that if e.g. "přeplnění" ("overloading") or
"disketa" ("floppy disc") were not found in
the dictionary, they would be transduced as
"perepolnenie" (correctly) and "disketa"
(instead of "gibkij disk"), respectively.
This procedure, and a set of similar
fail-soft rules for syntax, should ensure
that the output be basically understandable.
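
The cascade described above (main dictionary first, then regular correspondences, then plain transliteration as a fail-soft fallback) can be sketched as follows. The rule set is a deliberately tiny, hypothetical subset chosen for illustration; the function name, the adjective-ending rule and the treatment of vowel length are assumptions, not the actual RUSLAN rules.

```python
import re

# Main dictionary: words whose equivalent cannot be derived by rule.
MAIN_DICTIONARY = {"disketa": "gibkij disk"}

# Assumed transliteration step: Czech vowel length marks are dropped.
LENGTH_MARKS = str.maketrans("áéíóúůý", "aeiouuy")


def transduce(word, dictionary=MAIN_DICTIONARY):
    """Return a Russian (transliterated) equivalent of a Czech word."""
    if word in dictionary:
        return dictionary[word]
    # Segment pair h/g (the digraph "ch" is left alone).
    w = re.sub(r"(?<!c)h", "g", word)
    # TraT -> ToroT, where T is an occlusive.
    w = re.sub(r"([pbtdkg])r[aá]([pbtdkg])", r"\1oro\2", w)
    # Assumed rule for one adjective ending.
    w = re.sub(r"ký$", "kij", w)
    # Fail-soft fallback: plain transliteration of what remains.
    return w.translate(LENGTH_MARKS)
```

With this sketch `transduce("krátký")` gives "korotkij", and `transduce("hrad")` gives "gorod", which is formally regular but semantically wrong for "castle"; this is exactly why such pairs have to be listed in the lexicon. With an empty dictionary, "disketa" simply passes through unchanged.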
4. The procedures of syntactic analysis
and synthesis are based on lexical
information, including the valency frames.
Certain difficulties arise when filling the
slots of obligatory adverbials (see Panevová,
80), with which the forms of a given
adverbial type are variable, e.g. "vrátit se
kam" ("to return somewhere"): "napravo" ("to
the right", adverb), "k problému" ("to the
problem", preposition "k" + dative), "do
bytu" ("into the flat", preposition "do" +
accusative), etc. Such cases are handled by
the parser together with free adverbials; it
must only be ensured that the obligatory
modifier is identified (in a case of
ellipsis, it is necessary to take into account
the preceding sentence, although often the
Czech deletion goes in parallel with that in
the corresponding Russian sentence).
4.1 One of the relevant differences between
Czech and Russian syntax concerns sentences
with the Czech 1st person plural
corresponding to the Russian reflexive form,
e.g. Czech "Algoritmus rozmisťování bloků
popisujeme v části 6" vs. Russian "Algoritm
razmeščenija blokov opisyvaetsja v razdele 6"
("The algorithm of dislocation of blocks is
described in Sect. 6"). Often a modal
expression is present: "Názvy programů můžeme
najít v knihovně" vs. Russian "Nazvanija
programm možno najti v biblioteke" ("The
titles of the programs can be found in the
library"). The linguistic rules underlying the
practical solution of these problems can have
the following form:
Noun_acc Verb_1stPl -> Noun_nom Verb_refl
(Noun_acc) Verb_modal,1stPl Verb_inf
-> (Noun_nom) Modal Verb_inf
("Modal" stands here for such expressions as
"možno" ("possible"), "nado" ("necessary");
parentheses "(", ")" denote the fact that the
Objective is not always obligatory.)
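
The two rules can be read as rewriting operations on tagged tokens. The following is a minimal sketch under the assumption that each token is a dictionary with hypothetical keys `pos`, `lemma`, `case`, `person`, `modal` and `form`, and that the only accusative noun passed in is the Objective; lexical substitution (e.g. choosing "možno") and agreement with the new subject are left out.

```python
def transfer_first_plural(tokens):
    """Apply the 1st-person-plural rules to a list of token dicts."""
    has_1pl = any(t["pos"] == "verb" and t.get("person") == "1pl"
                  for t in tokens)
    if not has_1pl:
        return [dict(t) for t in tokens]  # rules do not apply
    out = []
    for t in tokens:
        t = dict(t)
        if t["pos"] == "noun" and t.get("case") == "acc":
            t["case"] = "nom"  # Noun_acc -> Noun_nom
        elif t["pos"] == "verb" and t.get("person") == "1pl":
            if t.get("modal"):
                # Verb_modal,1stPl -> impersonal Modal
                t = {"pos": "modal", "lemma": t["lemma"]}
            else:
                # Verb_1stPl -> Verb_refl
                del t["person"]
                t["form"] = "refl"
        out.append(t)
    return out
```

An infinitive following the modal verb is passed through unchanged, which mirrors the right-hand side of the second rule.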
4.2 In some cases the ambiguity of a Czech
sentence corresponds to a similar ambiguity
in Russian. In other cases the ambiguity in
the two languages is not in such accordance.
This is illustrated by the following:
a. Czech:
V létě proběhlo jednání o nové variantě OS.
Russian:
Letom prošlo soveščanie o novom variante OS.
(In summer, the negotiations on the new
variant of OS took place.)
b. Czech:
V létě proběhlo jednání o prázdninách.
Russian:
Letom soveščanie prošlo vo vremja kanikul.
(In summer, the negotiations took place
during vacations.)
The preposition "o" with locative in Czech is 
kept also in Russian or, with nouns having 
the feature Time, translated as "vo vremja" 
with genitive. 
Differences in prepositional construct- 
ions are found also with the following pairs: 
c. Czech:
Práce na programu pokračují i v tomto roce.
Russian:
Raboty nad programmoj prodolžajutsja i v ètom
godu.
(The works on the program continue also this
year.)
d. Czech:
Práce na fakultě pokračují i v tomto roce.
Russian:
Raboty na fakul'tete prodolžajutsja i v ètom
godu.
(The works at the faculty continue also this
year.)
These examples cannot be fully accounted for
by means of lexical information, nor can
they be included in the general scheme of
syntactic rules. It is necessary to have a
list of such differences.
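
The two mechanisms of this section, a feature-driven rule for "o" and an explicit list of differences for cases like "na", might be sketched as follows. The function names, the case labels and the key format of the exception list are assumptions made for the illustration; the text does not specify how the list is organized.

```python
def translate_o(noun_features):
    """Czech "o" + locative: keep "o", or use "vo vremja" + genitive
    when the governed noun has the feature Time."""
    if "Time" in noun_features:
        return ("vo vremja", "gen")
    return ("o", "loc")


# Hypothetical list of differences, keyed by
# (Czech governing noun, Czech preposition, Czech dependent noun).
PREPOSITION_DIFFERENCES = {
    ("práce", "na", "program"): ("nad", "ins"),
}


def translate_na(head, dependent):
    """Look up a listed difference; otherwise keep "na"."""
    return PREPOSITION_DIFFERENCES.get((head, "na", dependent),
                                       ("na", "loc"))
```

Under these assumptions, "o prázdninách" (a noun with the feature Time) is rendered with "vo vremja", while "na fakultě" keeps "na" because no listed difference applies.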
4.3 In translating Czech subordinate
clauses introduced by such conjunctions as
"zda", "-li" ("whether"), "jestliže" ("if"),
"když" ("when"), "dokud" ("till"), "dokud ne"
("until"), "pokud" ("as long as"), some of
which are ambiguous, the text can be treated
as relatively homogeneous. The functioning of
a clause introduced by "zda" or "-li" as a
subject can be identified on the basis of the
valency of the verb in the superordinate
clause, where it is marked whether the verb
may take a subordinate clause as its Actor or
Objective. In the other cases, suitable or at
least acceptable translations of the
conjunctions are as follows: Czech "zda",
"-li", "pokud", "jestliže" as Russian "esli";
Czech "dokud", "dokud ne" as Russian "poka",
"poka ne"; Czech "když" as Russian "kogda".
It follows that while it is necessary
to work to a certain degree with the
underlying structure, in the majority of cases
the equivalent can be chosen just in accordance
with the conjunctions themselves.
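
The choice of conjunction can thus be sketched as a table lookup with one structural check. The function name and the flag `fills_valency_slot` are hypothetical; the equivalent for a "zda"/"-li" clause that fills an Actor or Objective slot is not given in the text, so the sketch only signals that case by returning `None`.

```python
# Equivalents as listed in the text for the non-valency cases.
CONJUNCTIONS = {
    "zda": "esli",
    "-li": "esli",
    "pokud": "esli",
    "jestliže": "esli",
    "dokud": "poka",
    "dokud ne": "poka ne",
    "když": "kogda",
}


def translate_conjunction(conjunction, fills_valency_slot=False):
    """Return the Russian conjunction, or None when the clause fills a
    valency slot and must be handled via the governing verb."""
    if conjunction in ("zda", "-li") and fills_valency_slot:
        return None
    return CONJUNCTIONS[conjunction]
```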
4.4 The Czech verb "být" ("to be") has
several Russian equivalents: the copula
"byt'" and the verbs "est'", "javljat'sja",
"nachodit'sja", "imet'sja". The selection of
the equivalent depends on the syntactic
context: if the nominal predicate in Czech is
in the instrumental case, then a form of the
verb "javljat'sja" is preferred; if a local
adverbial is present, then the translation
"nachodit'sja" is appropriate; otherwise the
appropriate form of the copula is chosen. Of
course, another point concerns the
translation of "být" within idioms ("byt' v
porjadke", but "imet'sja v rasporjaženii").
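
The selection just described amounts to an ordered sequence of tests on the syntactic context. A minimal sketch, with a hypothetical function name and parameters, and with idioms left to the lexicon:

```python
def choose_byt_equivalent(predicate_case=None, has_local_adverbial=False):
    """Pick the Russian equivalent of Czech "být" from its context."""
    if predicate_case == "ins":
        # Nominal predicate in the instrumental case.
        return "javljat'sja"
    if has_local_adverbial:
        return "nachodit'sja"
    # Default: the copula.
    return "byt'"
```

The order of the tests is an assumption: the text does not say which equivalent wins when an instrumental predicate and a local adverbial co-occur.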
4.5 The surface behaviour of negation is
not the same in Czech and in Russian: in
Czech, even partial negation is often
expressed as a prefix of the verb, which gives
rise to an ambiguity absent in Russian, where
this distinction is always transparent. Some
of the examples from our texts are:
a. Czech:
To ani systém přesně neví.
Russian:
Ètogo daže sistema točno ne znaet.
(This even the system does not know exactly.)
b. Czech:
Tabulka není uložena na pevném místě v
paměti.
Russian:
Tablica pomeščaetsja ne na postojannom meste
v pamjati.
(The matrix is not placed in a fixed position
in the storage.)
4.6 We assume that the surface order is
substantially the same in the two languages;
the differences concern only such specific
cases as, e.g., the positions of parts of the
complex verb forms or those of certain
pronouns and particles which have the
character of clitics in Czech, but usually
follow the verb in Russian:
a. Czech:
... vypadal by tak, že by tabulka obsahovala
údaje ...
Russian:
... vygljadel by tak, čto tablica soderžala
by dannye ...
(... he would look as if the matrix
contained(cond.) data ...)
b. Czech:
Budeme se v operačních systémech snažit ...
Russian:
V operacionnych sistemach budem starat'sja ...
(In the operating systems, we shall try ...)
The differences described in this section do 
not concern the structural order, and there 
is no danger that ambiguity might arise. The 
dislocation of function words and particles 
can be described by general rules. 
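
One such general rule, moving the conditional particle "by" behind its verb as in example a., could be sketched as below. The token format (form, part-of-speech tag), the tag names and the function name are assumptions, and the sketch operates on tokens whose lexical forms have already been replaced by their Russian equivalents.

```python
def place_particle_after_verb(tokens, particle="by"):
    """Move a particle that precedes its verb to the position right
    after the next verb; a particle already following a verb stays."""
    out, held = [], []
    for form, pos in tokens:
        if form == particle and not (out and out[-1][1] == "verb"):
            held.append((form, pos))  # wait for the next verb
            continue
        out.append((form, pos))
        if pos == "verb" and held:
            out.extend(held)
            held = []
    return out + held  # fail-soft: keep a particle that found no verb
```

Since only a function word is moved, the structural order of the clause is untouched.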
4.7 In 4.1 through 4.6 we wanted to show
what the problems of parsing are if the
correspondences in the underlying structure,
in surface syntax and in the surface order of
morphemes are to be made use of, while the
differences are solved; we also wanted to
illustrate the narrowed, but nonetheless
necessary role of transfer.
5. We wanted to point out that, on the one 
hand, the closeness of the two languages 
makes it relatively easy to find a strategy 
for an MT system, since the most complex pro- 
blems of ambiguities might be partially a- 
voided, although, on the other hand, compara- 
tive empirical research in the domains of 
lexicon and of syntax is necessary also for 
such a pair of languages. Results of such an 
approach may be useful in MT, and also in the 
context of a contrastive comparison of cog- 
nate languages. 
References

Kirschner Z.: On a Device in Dictionary
Operations in Machine Translation,
in Proceedings of Coling '82, Prague

Panevová J.: Formy a funkce ve stavbě české
věty, Academia, Prague, 1980
