PROBLEMS OF FORMAL REPRESENTATION OF TEXT STRUCTURE
FROM THE POINT OF VIEW OF AUTOMATIC TRANSLATION
Z.M. Shalyapina
Institute of Oriental Studies
of the Academy of Sciences of the USSR
Moscow, USSR
Summary
The paper is devoted to linguistic problems of defining the
basic formalized representation of text in an automatic
translation system within the framework of the so-called
integral formal model of the translation process, the primary
requirement for this representation considered to be a
compromise between its semanticity, superficiality, and
exhaustiveness. A representation covering five major aspects
of text structure (its lexico-grammatical composition; its
predicate-argument organization on the semantico-syntactic
level; the syntactic grouping of its units; the anaphoric
relations between them; the peculiarities of their linear
arrangement) and referred to as Combined Structural
Representation (CSR) of text, is described to show the ways
and means of achieving this compromise in the Japanese-Russian
Automatic Translation Project, now under development at the
Institute of Oriental Studies of the Academy of Sciences of
the USSR (Moscow).
Introduction 
Many problems of the automatic processing of text require for
their effective solution a prior analysis of the text
processed, aimed at transforming this text into an
intermediate formalized representation of some kind, more
suitable for further processing than the text itself. When
determining the concrete characteristics of such a
representation one must obviously take into account the
operations meant to be applied to it, or to be performed on
its basis within the framework of the system involved. If it
is the problem of automatic translation that the system is to
solve, the set of the corresponding operations will depend
primarily on the general formal model of the translation
process underlying this system. One version of the model in
question, proposed in [1] and discussed in more detail in [2],
envisages the following main groups of operations:
1) analysis and interpretation of the initial text,
simulating the process of perceiving and understanding its
signification and denotation; ideally, it presupposes a
semantic description of the text, as well as a model of the
situation ("world" fragment) presented in it, being
constructed from this text (possibly, via a number of
intermediate representations);
2) translation proper, which is performed at a level R of
some formal representation Ri of the initial text, derived
from its analysis, and amounts to selecting translation
equivalents for the units included in Ri: the result is an
intermediate representation Rt of the target text, this
representation being usually (although not necessarily) of
the same level as Ri;
3) verification of the adequacy of the translation
performed, by means of analyzing the resultant representation
Rt and comparing the semantic description and the situational
model obtained with the semantic description and the model of
the situation corresponding to the initial text;
4) generation (synthesis) of the target text by transforming
the intermediate representation Rt, formed during translation
proper and assumed to be adequate by the verification
procedure, into a sequence of actual word-forms and
punctuation marks making up the target language text;
5) evaluation of the target text with a view to detecting
undesirable ambiguities and inaccuracies that might have
slipped in during the synthesis process; it implies analyzing
the text back to the R level and checking whether the
resulting representation Rt coincides with the representation
Rt from which this text has been formed;
6) editing operations dictated by the checks and comparisons
made: if the translation is judged to be inadequate, they
will consist in returning to the phase of translation proper
and either substituting alternative translation equivalents
for some of the previously selected ones, or reconsidering
the entire procedure used and repeating it at a different
("deeper") representation level or in a different form
(probably, resorting to synonymous transformations of the
initial text at the Ri level); if it is the target text
ambiguities and stylistic imperfections that are to be
removed, better expressive means will be sought chiefly by
actuating the system of synonymous transformations at the Rt
level.
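
The six groups of operations can be read as a control loop
built around level R. The following Python sketch is only an
illustration of that control flow under simplifying
assumptions; it is not the implementation used in the project.
All function names, the toy dictionary and the use of word
lists as "representations" are invented for the example, and
the toy adequacy check merely stands in for the comparison of
semantic descriptions.

```python
from itertools import product
from typing import Callable, Iterable, List, Optional

Rep = List[str]  # toy stand-in for a level-R representation


def translate(source: str,
              analyze_source: Callable[[str], Rep],
              equivalents: Callable[[Rep], Iterable[Rep]],
              adequate: Callable[[Rep, Rep], bool],
              synthesize: Callable[[Rep], str],
              analyze_target: Callable[[str], Rep]) -> Optional[str]:
    """Hypothetical driver for operation groups 1-6."""
    r_i = analyze_source(source)           # group 1: analysis up to level R
    for r_t in equivalents(r_i):           # group 2: translation proper
        if not adequate(r_i, r_t):         # group 3: verification
            continue                       # group 6: try other equivalents
        text = synthesize(r_t)             # group 4: synthesis
        if analyze_target(text) == r_t:    # group 5: re-analysis must give Rt back
            return text
    return None                            # no candidate survived the checks


# Toy data, invented for the illustration only.
TOY_DICT = {"inu": ["dog"], "hashiru": ["run", "runs"]}


def toy_equivalents(r_i: Rep) -> Iterable[Rep]:
    # Enumerate every combination of dictionary equivalents.
    for combo in product(*(TOY_DICT[unit] for unit in r_i)):
        yield list(combo)


def toy_adequate(r_i: Rep, r_t: Rep) -> bool:
    # Pretend check: same length, and the verb must carry the "-s" ending.
    return len(r_i) == len(r_t) and r_t[-1].endswith("s")


result = translate("inu hashiru", str.split, toy_equivalents,
                   toy_adequate, " ".join, str.split)
print(result)
```

In this sketch a rejected candidate simply gives way to the
next combination of equivalents; repeating the procedure at a
"deeper" level, also foreseen in group 6, is left out.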
It is readily seen that the basic level of formal text
representation from the standpoint of the above conception of
the translation process is level R, directly concerned with
the most important translation operations: primarily, the
operations of translation proper, the scope of which is
practically confined to the level in question, and the
operations of synthesis ensuring the transition from the
R-level representation of a text to its more "superficial"
representations up to the text as such.
Some of the other operations mentioned also involve switching
from the R-level to "deeper" levels of intermediate formal
text representation and taking into consideration such
supplementary factors as the essence of the situation
described by the text to be translated, the semantic
peculiarities of the vocabulary and the syntax of the two
languages, the requirements of grammaticality and stylistic
normativity (regularity) of the target text, and so on. The
foregoing shows that these operations are mostly auxiliary in
nature, their main purpose being to improve the content
adequacy and the linguistic acceptability of the translation
text formed through the use of the R-level representation; in
a concrete automatic translation system based essentially on
the formal model we have outlined, they may be reduced or even
altogether omitted for various practical reasons.
However, whether these supplementary operations are included
in an AT system or not, it is clear that the system will
depend largely for its efficiency on the choice of the
intermediate level R. It is precisely this basic level that we
are now going to consider.
General Requirements 
From the point of view of the purpos- 
es and peculiarities of the translation 
process, there are two opposite require- 
ments that can be placed upon the inter- 
mediate formalized representation R in 
an automatic translation system. 
On the one hand, insofar as translation boils down to
transforming the surface structure of a text while preserving
its content, it seems safe to assume that if some components
of the text to be translated, some features of these
components, or links between them are relevant for the content
structure of this text, they may also prove of importance for
choosing the correct translation equivalents for the text
units. Consequently, an adequate representation R used in an
AT system should be sufficiently "semantic" for all the
necessary information concerning the components, links and
features in question to be either explicitly given in this
representation or, at least, easily obtainable from it. To put
it differently, representation R of a text processed must
reflect its semantic structure with sufficient precision and
in sufficient detail.
On the other hand, the structures of the source and the
target languages will, as often as not, have certain features
in common, this leading to an inevitable neutralization of any
analysis transformations involving such features by the
inverse transformations during the synthesis process. Such
transformations will thus prove unnecessary for translation
purposes, no matter how important they might be as regards the
full semantic analysis of the text. Accordingly,
representation R must be sufficiently "superficial" for its
construction to incorporate as few of such superfluous
transformations as possible.
As we see, the second requirement provides a kind of
limitation on the first one, restricting the extent and the
methods of the explication, necessitated by the latter, of the
semantic structure of the text. Taking into account both of
these requirements will most likely result in a kind of
compromise solution suggesting that information made explicit
in representation R of a certain text should not include all
the elements of its semantic structure; rather, it should
cover only those of them which are a priori known to be
extensively used in establishing inter-language correlations
during translation.
With such a solution, however, one 
must be fully aware that real texts will 
contain a substantial proportion of cases
where some text information overlooked 
by our analysis might eventually turn out 
relevant for translation. If we do not 
want to give up the idea of adequately 
processing such texts as impracticable 
in principle, it seems useful to impose a 
third requirement on representation R - 
the requirement of "exhaustiveness" which 
may be formulated as follows. All infor- 
mation contained in a natural language 
text and not made explicit in its inter- 
mediate representation must be preserved 
within this representation; if possible, 
it should be preserved fully and without 
changing its original (natural language) 
form, so that there might be no acciden- 
tal losses or distortions. 
If so, the substitution of the forma- 
lized representation R for the original 
text will not exclude the possibility of 
additional analysis amplifying the results of the standard
analyzing procedure and providing access to some extra
information that may be required. This is to say that the
linguist describing the means of translating concrete language
units within such a system will not be subject to the pressure
of too stringent limitations originating from the conventions
of the system, rather than from the nature of the material he
deals with, and complicating his task (difficult enough as it
is). Theoretically, he will be free to use any text
information (both "superficial" and "deep") in any way he may
find linguistically appropriate: whether as source units to be
replaced by translation equivalents, or as conditions
determining the equivalents chosen for some other units, or
else as translation equivalents themselves.
The above principles are general enough to allow of various
ways of implementing them in a concrete automatic translation
project. We shall present here one attempt at such an
implementation, made in defining the so-called Combined
Structural Representation to be used in the system of
Japanese-Russian automatic translation now under development
at the Institute of Oriental Studies of the Academy of
Sciences of the USSR (Moscow) [3].
Combined Structural Representation (CSR)
Taking into account the typological correlation between the
Japanese and the Russian languages, we consider it necessary
to specify in the CSR of the initial Japanese texts, as well
as of their Russian translations, five main aspects of text
structure: the lexico-grammatical composition of the text
processed, its predicate-argument organization on the
semantico-syntactic level, the syntactic grouping of its
units, the anaphoric relations between them, and the
peculiarities of their linear arrangement. Within the CSR the
corresponding five types of linguistic information about the
text form separate components which will now be discussed in
turn, mostly from the point of view of their consistency with
the general requirements stated above.
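
As a rough picture of how five separate components might sit
side by side, here is a minimal Python sketch. It is an
assumption-laden illustration rather than the project's actual
data format: the class and field names, the upper-case lexeme
symbols and the dependency labels "A1" and "A2" are all
invented, and the internal structure of word forms is
simplified to a single lexeme symbol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Unit:
    surface: str           # original form, kept verbatim
    lexeme: str            # symbol of the lexeme or marker (hypothetical)
    relevant: bool = True  # False: auxiliary surface marker


@dataclass
class CSR:
    # 1) lexico-grammatical composition
    units: List[Unit]
    # 2) predicate-argument links as (head, dependent, label)
    dependencies: List[Tuple[int, int, str]] = field(default_factory=list)
    # 3) syntactic grouping: tuples of unit indices
    groups: List[Tuple[int, ...]] = field(default_factory=list)
    # 4) anaphoric relations: unit index -> antecedent index
    anaphora: Dict[int, int] = field(default_factory=dict)
    # 5) linear arrangement: unit indices in text order
    order: List[int] = field(default_factory=list)

    def surface_text(self) -> str:
        # Rebuild the word sequence from the stored surface forms.
        return " ".join(self.units[i].surface for i in self.order)

    def auxiliary(self) -> List[str]:
        # Units kept in the CSR but marked as needing no equivalent.
        return [u.surface for u in self.units if not u.relevant]


csr = CSR(
    units=[Unit("kare", "KARE"), Unit("wa", "WA", relevant=False),
           Unit("mannenhitsu", "MANNENHITSU"),
           Unit("o", "O", relevant=False),
           Unit("nusumaremashita", "NUSUMU")],
    dependencies=[(4, 0, "A1"), (4, 2, "A2")],  # placeholder labels
    order=[0, 1, 2, 3, 4],
)
print(csr.surface_text())
print(csr.auxiliary())
```

Keeping the verbatim surface form on every unit, including
the auxiliary ones, is one simple way of respecting the
"exhaustiveness" requirement in such a structure.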
Lexico-grammatical composition
The component of the CSR concerned with the lexico-grammatical
composition of the text is intended to contain explicit
descriptions of all lexemes present or implied (if ellipsis is
the case) in the text under consideration, as well as of all
grammatical (morpho-syntactical) elements accompanying them in
the corresponding word forms or quasi-word forms (units taken
to be functionally analogous to word forms). The descriptions
required must include, apart from the symbols of the units
involved, information about their meanings within the text in
question and about their relevance or irrelevance as regards
the process of its translation.
The operations necessary to obtain this component of the CSR
when analyzing the initial Japanese text will evidently
comprise isolating separate word forms and determining their
internal structure (in terms of lexemes and morphological
markers); resolving ambiguities for all units established;
eliminating synonymy where it is manifested as complementary
distribution or free variation of morphological units;
detecting phraseological word combinations and reducing them
to a one-word symbol; giving special labels to those word
forms or parts of word forms which play an auxiliary role in
the text analyzed and require no special translation
equivalents; filling in the units omitted in the source text
if their absence obscures its structure and hinders the
translation process (due to the differences between the rules
of linguistic ellipsis in the two languages), etc.
From this it follows that the lexico-grammatical composition
of a text cannot be definitively established in the course of
its analysis without drawing upon information about its
structural characteristics. The same kind of information is
also needed when working with this component of the CSR in the
synthesis process (chiefly in connection with such means of
expressing structural relations as grammatical agreement and
government, typical of the Russian language).
Therefore, in deciding what language units are to be described
as permissible in the given component of the CSR, and what
status is to be attributed to them within its framework
(specifically, which units it is best to treat as individual
words and which ones should rather be regarded as meaningful
parts of words, i.e. morphemes, the problem being of
particular importance for Japanese, where no regular graphical
means are used in writing to separate words from each other),
we believe it advisable to pay special attention to the
functions of the corresponding units in the general structure
of the text and in the system of operations used for its
processing. With this aim in view, we have devised an
operational criterion for distinguishing words and their
meaningful parts, based on the principle of the homogeneity of
the levels of text processing and on the requirement that each
level's units should have structurally significant functions
within the level
itself, while there should also exist a well-defined (although
not necessarily one-to-one) correspondence between certain
subsets of units belonging to the adjacent levels of
processing. According to this criterion, the status of
separate words is justified, among others, for such Japanese
units as the so-called "causative voice" marker -seru/-saseru,
the "conditional mood" marker -ba, the negation marker -nai
(at least, in conditional contexts) and some others. Among the
units functionally analogous to independent word forms (and,
consequently, appearing as such within the CSR) we also class
punctuation marks, which are, to our mind, quite similar to
words in that they can be meaningful and can correspond to
definite translation equivalents (or play the role of such,
cf. Japanese ka vs. Russian ?).
In this way, so far as the position of 
a unit in text structure and in the sys- 
tem of translation transformations is re- 
lated to the meaning of this unit, our 
general principles of describing the le- 
xico-grammatical composition of texts in 
their CSR conform to the requirement of 
its "semanticity". On the other hand, the 
"exhaustiveness" requirement is also met, 
since we make it a point not to leave out
of the CSR any text elements, up to those 
that serve essentially as surface mark- 
ers of other linguistic units made expli- 
cit in this representation, and do not 
themselves participate to any significant 
extent in the semantic operations provid- 
ed in the system (e.g. Japanese "case" 
particles; Russian morphological catego- 
ries of case, gender and number of adjec- 
tives; "surface" linguistic expression 
of "lexical functions" and their transla- 
tion equivalents, etc.). 
Predicate-argument organization of the 
text on the semantico-syntactic level
This component of the CSR represents 
semantico-syntactic links between words 
and/or quasi-words corresponding to their 
predicate-argument relations and, accordingly,
constituting meaningful text
units. It is common knowledge that the 
surface expression of these units is 
language-specific while their semantic 
content is generally assumed to be of a 
more or less universal nature. So in tra- 
nslation they must either remain essen- 
tially the same (naturally, with all the 
necessary modifications of their surface 
markers) or must be transformed by cer- 
tain formal rules depending on the seman- 
tic interpretation of the links in ques- 
tion and on their relation with the mean- 
ing of the units linked. 
The lexico-syntactic translation transformations mentioned are
most commonly used where the source and the target languages
have appreciable typological differences. This is precisely
the case with the Japanese-Russian correlation (a simple
example: kare-wa mannenhitsu-o nusumaremashita, lit. "he was
stolen a pen", transl. У него украли ручку "he had his pen
stolen"). Bearing this in mind, we have chosen dependency
grammar to represent the predicate-argument structure of texts
in their CSR, preferring it to its alternative, the immediate
constituent system, for according to a number of specialists
this type of transformation is easier to describe in
dependency terms.
One of the central linguistic problems connected with
presenting the predicate-argument structure of a text in its
CSR is which among the various (and often semantically
overlapping) dependencies between the text units should be
selected for explicit description. In solving this problem we
proceed from the principle of the possibility of "immediate
semantic substantiation" of the dependencies to be selected.
It can be specified as the following requirement bearing on
the ways and methods of describing words and grammatical
constructions when compiling the linguistic information for
the automatic translation system:
- all syntactic dependencies registe- 
red in the CSR of a certain text must 
realize some semantico-syntactic valen- 
cies of the lexical or grammatical units 
present in it (and usually forming part 
of the lexico-grammatical composition of 
the word forms or quasi-word forms link- 
ed by the corresponding dependencies). 
These valencies, in their turn, must 
directly correlate with the semantic 
characteristics of the units they are 
ascribed to, semantic considerations 
viewed as the major factor underlying 
their assignment to those units. One im- 
portant consideration of this kind con- 
sists in preferring the descriptions 
where the maximum possible of the valen- 
cies envisaged could be realized in con- 
crete texts by two-word combinations and 
the maximum possible of such combinations 
could be checked for their semantic ac- 
ceptability (consistency) without regard 
to any units outside them. 
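
The requirement can be pictured as a local check on two-word
combinations: a dependency is admitted only if it fills a
valency of the head whose semantic conditions the dependent
satisfies, with no reference to other units. The Python sketch
below is a minimal illustration under that reading; the
lexemes, feature names and valency labels are hypothetical and
do not come from the project's dictionaries.

```python
from typing import Dict, FrozenSet

# Hypothetical semantic features of a few lexemes.
FEATURES: Dict[str, FrozenSet[str]] = {
    "thief": frozenset({"animate", "human"}),
    "pen": frozenset({"thing"}),
}

# Hypothetical valencies: lexeme -> valency label -> features
# that the filler of the valency must possess.
VALENCIES: Dict[str, Dict[str, FrozenSet[str]]] = {
    "steal": {
        "agent": frozenset({"animate"}),
        "object": frozenset({"thing"}),
    },
}


def realizes_valency(head: str, label: str, dependent: str) -> bool:
    """Check one two-word combination without looking outside it."""
    required = VALENCIES.get(head, {}).get(label)
    if required is None:
        return False  # the head has no such valency to realize
    # Unknown dependents have no features and fail any non-empty condition.
    return required <= FEATURES.get(dependent, frozenset())


print(realizes_valency("steal", "object", "pen"))
print(realizes_valency("steal", "agent", "pen"))
```

The subset test is a deliberately crude stand-in for the
semantic acceptability check; the point is only that it
inspects the head and the dependent and nothing else.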
Apart from the situations where some of the syntactically
linked units perform auxiliary functions in the text processed
(thus having no independent semantic content), the application
of the above criteria can only be limited for reasons of
economy and effective controllability of
the linguistic description.
From the above it can be inferred that the linguistic
information used to reveal and/or process the
predicate-argument structure of concrete texts should combine
data on the means of surface expression of the links involved
(i.e. word order, function words, etc.) with fairly detailed
semantic descriptions of the words to be linked and of their
combinatorial potentialities. To provide the formal tools
necessary for constructing such descriptions we have devised a
special formalized semantic language SL [5], the
characteristic properties of which can be briefly outlined as
follows.
The vocabulary of SL comprises three categories of so-called
semantic elements: categorial elements, encyclopaedic elements
and identifying elements. Among these the leading role belongs
to the categorial elements, which are given special
descriptions constituting a kind of formalized semantic
grammar of the natural language. The syntax of SL, used to
combine semantic elements into semantic formulae, accounts
both for the semantic relations established between the
components of such a formula and for its communicative
organization, determining the behaviour of its components as
regards the logical operations that can be applied to the
formula as a whole. From the formal point of view a semantic
formula is a linear sequence of symbols, structurally
equivalent to a special type of dependency tree where the
nodes can be labeled by the symbols not only of single
semantic elements, but also of their combinations (subtrees)
of any length.
Semantic formulae can be employed to express: 1) semantic
definitions of natural language units (from a separate word up
to a whole text); 2) their paradigmatic semantic features;
3) their syntagmatic semantic properties (semantic
interpretations of their syntactic valencies).
An important distinguishing characteristic of SL is that it
affords formal derivability of information about the semantic
paradigmatic and syntagmatic features of language units from
their semantic definitions. This helps to make the semantic
descriptions of these units more compact (by eliminating the
unnecessary reiteration of essentially the same data) and to
improve their reliability, owing to the possibility of more
objectively evaluating the adequacy of semantic definitions on
the basis of such a criterion as the degree of correlation
between the syntagmatic properties of a unit derivable from
its definition, on the one hand, and its actual semantic
combinability as observed in real texts, on the other hand.
Moreover, it increases the range of linguistic facts
explainable on semantic grounds. Thus, it becomes possible to
give uniform rules (unattainable if one stays within the
bounds of purely lexico-syntactic phenomena) for the selection
of the correct morpho-syntactical markers (as well as for the
appropriate synonymous transformations and logical deductions,
operations commonly used as translation devices) when handling
constructions with such Russian verbs as грозить ("run the
risk"), опасаться ("fear"), ожидать ("expect"), успевать ("be
in time"), etc., taking predicate words as their arguments.
These rules will enable us, for example, to choose the correct
Russian sentence
К раненому опоздали с помощью
("Help came late to the wounded man"),
rather than *Раненый опоздал с помощью
("The wounded man was late with help")
as translation of the Japanese sentence
Keganin-wa teate-ga okurete shimatta.
With semantic definitions of words formulated in SL terms, all
syntactic dependencies linking these words in texts can be
interpreted (for the most part, unambiguously) as semantic
relations between certain elements within their definitions,
and replacing a word by its semantic definition will not alter
the general form of the predicate-argument structure of the
text. The effect is that in the framework of the
predicate-argument component of the CSR the contradiction
between the "semanticity" and the "superficiality" required of
it turns out to be to a large extent eliminated. For one
thing, any fragment of the predicate-argument structure of a
text can be interpreted (developed) as a structure of semantic
elements and relations; for another, the scope of such
interpretation does not depend on any but linguistic
considerations, and if no transformations affecting the
internal semantic structure of words or relations between them
are necessary for translating a certain text fragment, the
latter need not be semantically interpreted, no matter whether
this kind of interpretation is indispensable for some other
fragments of the same text.
Syntactic grouping of text units
This type of structural information about the text concerns
the grouping of the words contained in it into larger
combinations possessing a certain syntactic and/or semantic
independence, which makes it advisable to treat them as
separate units at least at some stages of processing the text
in question. In a way such
information is analogous to the information about the
constituent structure of the text. The difference is, though,
that the aspects of syntactic word-grouping included in the
CSR of a text are limited to those that carry semantically
relevant information lacking in its dependency structure [6]
(and, for that matter, not always directly expressible in the
classical constituent marker form, either).
For the present, the given component of the CSR of a text is
supposed to specify only the word groups established within
connected fragments of its dependency tree in situations where
the composition of such groups and their boundaries are
important for some of the operations employed to process it,
such as ascertaining the domain of quantifiers; distinguishing
between descriptive and restrictive attributes; revealing the
full form of some types of elliptical constructions (e.g.
those with co-ordinative reduction); deciding whether it would
be safe to employ transformations disjoining elements of some
word-combinations within the text's dependency structure or
linear representation (it seems reasonable to mark the
combinations excluding this kind of lexico-syntactic
transformation as a special type of syntactic word-group),
etc.
The relevancy of the data on syntactic 
word-grouping for translation purposes 
can be illustrated by the Japanese sentence
Watakushitachi-no tsukau nichi- ~hin- 
:doen:olaw~in~da tsukatte me hera-hal 
meaning "Among the things we use daily 
there are none that could be used for a 
long time and still remain as good as new".
If the data in question are not taken into account here, we
are liable to distort the presuppositional structure of the
sentence by giving it the "literal" translation:
*Среди используемых нами вещей домашнего обихода нет таких,
которые бы не изнашивались, даже если ими пользоваться
долгое время
("Among the things we use daily there are none that do not
wear out, even if used for a long time"),
having the evidently false implication that the longer things
are used the less they wear out (cf.: Нет вещей, которые бы не
изнашивались, даже если ими пользоваться очень аккуратно,
"There are no things that would not wear out even if they are
taken good care of").
The origin of this undesirable implication can be explained in
two ways. The first explanation is that one of the word-group
boundaries in the given Russian sentence separates the
negation не ("not") from the whole of the fragment following
it in the linear sequence of this sentence: изнашивались бы,
даже если ими пользоваться долгое время ("wear out even if
they are used for a long time"), so that the fragment cited is
interpreted as an integral semantico-syntactic unit, this
giving rise to the implication to be avoided. According to the
other explanation, the boundary responsible for the
interpretation of the Russian sentence runs between the whole
of its initial fragment Среди используемых нами вещей
домашнего обихода нет таких, которые бы не изнашивались
("Among the things we use daily there are none that do not
wear out") and the remaining sequence даже если ими
пользоваться долгое время ("even if they are used for a long
time"). From this standpoint, the false implication is
accounted for by the possibility, suggested by grouping the
sentence units into the above two fragments, of interpreting
and/or transforming these independently of each other, thus
obtaining
Все из используемых нами вещей домашнего обихода
изнашиваются, даже если ими пользоваться долгое время
("All of the things we use daily wear out, even if used for a
long time").
No matter which of the two explanations is taken as true (the
second one seeming more plausible, while the first one
suggests simpler checks in processing texts), it is clear that
the translation problem is to achieve in Russian the same
syntactic grouping as in the original, by introducing the
corresponding lexical and/or positional (linear)
modifications, e.g.:
Среди используемых нами вещей домашнего обихода нет таких,
которые бы даже при длительном пользовании оставались
неизношенными.
Another (and, probably, more ordinary) case of using data on
syntactic word-grouping in translation can be exemplified by
the sentence:
q-~F'.~%~ ~ewa- 
re-no seikatsu sui un-o itsumade-mo 
xoku saseru koto-~a de~. 
Here it is essential that the negation marker, as well as the
expression of condition, which in the translation sentence
must take a position different from the one its Japanese
counterpart occupies in the original word-sequence, should not
interpose between the two members of the co-ordinative-type
word-group present in the sentence (for clarity, we have
enclosed this group in brackets). That is, the translation
must be (English being structurally similar to Russian in this
respect): 
"If the workers do not unite and put forward political
demands, we shall never be able to raise our standard of
living"
and not
*"If the workers unite and do not put forward political
demands..."
Generally speaking, the correct translation of the last
example (as well as of other constructions explainable in
terms of co-ordinative reduction) could also be obtained
without recourse to the information about syntactic
word-grouping. Instead, one could use a "deeper" description
of the text to be translated, with elliptical constructions
transformed into their full representations. However, this
kind of transformation would be basically superfluous, for in
the synthesis process it would be necessary to reduce the
constructions in question back to their elliptical form using
only slightly different rules. It seems therefore preferable
for the operations of translation proper to result directly in
an elliptical construction analogous to the original one and
differing only in details of its surface expression (such as
the position of negation in the above example), specified by
the subsequent synthesis procedure.
So we see that while the component of 
the CSR under discussion registers only 
semantically significant phenomena of 
text structure, the means of representing 
them in it remain essentially superficial, 
so as to satisfy both the "semanticity" 
and the "superficiality" requirements. 
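
The role of a word-group boundary in positioning negation can
be mimicked by a very small routine. The Python sketch below
is an assumed simplification, not the project's synthesis
procedure: groups are given as index ranges over an English
token list, and an element that would land inside a protected
group is moved to the group's left edge. The function name and
the range convention are hypothetical.

```python
from typing import List, Sequence, Tuple


def insert_outside_groups(tokens: Sequence[str],
                          groups: Sequence[Tuple[int, int]],
                          position: int,
                          item: str) -> List[str]:
    """Insert item before tokens[position] without splitting a group.

    Each group (start, end) covers tokens[start:end]. If the requested
    position lies strictly inside a group, the item goes to its start.
    """
    for start, end in groups:
        if start < position < end:
            position = start
            break
    return list(tokens[:position]) + [item] + list(tokens[position:])


words = "if the workers unite and put forward political demands".split()
group = [(3, 9)]  # "unite and put forward political demands"

# Position 5 ("put") is where a word-by-word rendering would put it.
grouped = insert_outside_groups(words, group, 5, "do not")
naive = insert_outside_groups(words, [], 5, "do not")
print(" ".join(grouped))
print(" ".join(naive))
```

With the group recorded, the negation precedes the whole
co-ordinative group; without it, the routine reproduces the
rejected variant in which negation applies to the second
member only.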
Anaphoric relations between text units
For interpreting texts in respect of 
their signification and especially denotation, the structure
of anaphoric relations between their units is on the whole
no less important than their predicate- 
argument structure. However, the anapho- 
ric structure is expressed mainly by le- 
xical repetition, and this can be easily 
accounted for if we require that as long 
as one text is dealt with, one and the 
same translation equivalent should be se- 
lected, so far as possible, for all oc- 
currences of one and the same lexeme (le- 
xeme being defined as a word taken in one 
of its various lexical meanings). Given 
this requirement (which appears to be na- 
tural enough and, but for some special 
cases, easy to comply with), there is no 
need to include this structure in the 
CSR in its full form. It seems sufficient 
to indicate it only for those types of 
language units which directly depend for 
their translation on the properties of 
their antecedents in the text at hand. 
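
The consistency requirement just stated can be illustrated by a small sketch. The Python fragment below is only an illustration under our own assumptions: the function name `translate_lexemes`, the `choose` callback and lexeme labels such as `nimotsu#1` are hypothetical and are not taken from the project described. It shows one simple way of making every occurrence of a lexeme within one text receive the equivalent selected at its first occurrence.

```python
from typing import Callable, Dict, List


def translate_lexemes(lexemes: List[str],
                      choose: Callable[[str], str]) -> List[str]:
    """Translate a sequence of lexemes taken from ONE text.

    `choose` (hypothetical) picks an equivalent for a lexeme. It is
    called only at the first occurrence; the result is reused for all
    later occurrences of the same lexeme in this text.
    """
    chosen: Dict[str, str] = {}   # lexeme -> equivalent fixed for this text
    result: List[str] = []
    for lexeme in lexemes:
        if lexeme not in chosen:
            chosen[lexeme] = choose(lexeme)
        result.append(chosen[lexeme])
    return result


# Toy dictionary with several candidate equivalents (illustrative only).
CANDIDATES = {"nimotsu#1": ["luggage", "baggage"], "kakari#1": ["clerk"]}


def first_candidate(lexeme: str) -> str:
    """Simplest possible selection rule: take the first candidate."""
    return CANDIDATES[lexeme][0]


print(translate_lexemes(["nimotsu#1", "kakari#1", "nimotsu#1"],
                        first_candidate))
```

Because the table of chosen equivalents is created anew on each call, consistency is enforced only "as long as one text is dealt with", which is all the requirement asks for.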
In Japanese (as also in other languages)
there are two types of such units.
The first type are pronouns: when 
translating, say, the pronoun sore, the 
choice of one of the words this, he,
she, it, the, one, etc. as its text
equivalent will be determined, among
other things, by the syntactic class of
the unit chosen as the equivalent of its
antecedent. If this unit is a noun, one
will also need to know its number and
(for Russian) gender.
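
The dependence of a pronoun's translation on its antecedent can be sketched as follows. This is a minimal illustration, assuming a simplified record for the antecedent's equivalent; the names `Equivalent` and `russian_pronoun` are hypothetical, only nominative forms are covered, and the actual selection rules of the project are not given in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Equivalent:
    """Translation equivalent chosen for the antecedent (hypothetical type)."""
    text: str
    syntactic_class: str            # e.g. "noun" or "clause"
    number: Optional[str] = None    # "sg" or "pl", nouns only
    gender: Optional[str] = None    # "m", "f" or "n", singular nouns only


# Russian nominative third-person pronouns for singular nouns.
SINGULAR = {"m": "он", "f": "она", "n": "оно"}


def russian_pronoun(antecedent: Equivalent) -> str:
    """Pick a Russian pronoun from the properties of the antecedent."""
    if antecedent.syntactic_class != "noun":
        return "это"                # non-nominal antecedent: demonstrative
    if antecedent.number == "pl":
        return "они"                # plural: gender plays no role
    if antecedent.gender not in SINGULAR:
        raise ValueError("a singular noun antecedent must carry its gender")
    return SINGULAR[antecedent.gender]


print(russian_pronoun(Equivalent("посылка", "noun", "sg", "f")))
```

The point of the sketch is only that the function consults the equivalent chosen for the antecedent, not the source pronoun itself.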
The second type of units which can- 
not be translated properly without in- 
formation about their antecedents is 
more specific. These are words which are 
graphically identical with components of 
more complex units, also lexicalized 
from the point of view of their semantic 
behaviour, and which can function as
structural substitutes for the latter. 
When used in this function, such words 
must be replaced either by the translation
equivalents of their antecedents,
or by pronouns (with the data on these
antecedents used in the same fashion as
in translating usual pronouns). Anyway,
their own translation equivalents are
ruled out. 
Thus, the word nimotsu, meaning
"luggage" if used independently, will be
translated as "them" or "these parcels"
in the context of the sentence
Konimotsu-gakari-wa mazu nimotsu-no
megata-o hakarimasu,
where nimotsu is substituted for
konimotsu ("parcel"):
The clerk dealing with parcels first
weighs them / these parcels.
As regards all other types of lexical
units, our approach is that the existence
of anaphoric relations between them
should be checked and the relations
themselves registered in the CSR for further
processing only in those infrequent
situations (due mostly to dissimilarities in
the combinatorial properties of the
original language words and of their
translation equivalents, this necessitating
the use of synonymous transformations)
where it is impossible to fulfil the
above requirement of translating different
occurrences of the same lexeme by
the same equivalent, and one has to make
sure that employing different equivalents
in this case does not affect the original
anaphoric structure of the translated
text.
Linear arrangement of text units
In dealing with linear arrangement
of units in a text in the framework of
an automatic translation system, it is
important to distinguish between two
types of their positional (word-order) 
relations requiring different processing 
during translation. 
If the first type of such relations 
occurs between two text units, the posi- 
tion of one of them in respect to the 
other is merely a surface syntactic mar- 
ker showing the presence (or absence) of, 
say, some semantico-syntactic link
between the two, an anaphoric relation
between them, a syntactic word-group
boundary, and so on. In case of the second
type such position is meaningful in
itself, irrespective of whether it should
or should not be taken into account when
establishing certain syntactic links or
boundaries: it shows the relative
positions of the units in question in the
communicative structure of the text (i.e.
from the point of view of its functional 
perspective). 
It should be noted that the opposition 
of these two types of positional rela- 
tions is not the same as that of rigid 
(fixed) and free word order: while free
word order is always "semantic" to some
extent, rigid word order can, to our
mind, correspond to both cases,
depending on whether the given syntactic
construction with rigid word order
correlates in the language under consideration
with any alternative constructions
providing the same predicate-argument
structure and/or syntactic grouping of their
components, but assigning them a
different linear arrangement (a possible
example of such alternative constructions
which can be considered as dependent for 
their selection on the word order requi- 
red, rather than vice versa, is furnish- 
ed by predicative constructions differ- 
ing in their voice value). 
Guided by the "exhaustiveness"
principle, we judge it expedient for the CSR
to contain information both about the
"meaningful" and the "auxiliary" type of
word-order relations, though represented
and employed in different ways. 
The sphere of employment of the
"auxiliary" word-order information is
practically limited to the analysis and
synthesis procedures. During the analysis phase
this information serves mainly as a means 
of revealing and formally representing 
units and constructions pertaining to 
other components of the CSR; in the syn- 
thesis phase it is used to obtain the 
correct form of the same type of units 
and constructions in the target language.
The corresponding facts of the linear ar- 
rangement of the text do not play any in- 
dependent role either in its semantic 
processing or in choosing translation 
equivalents for its units, so it is per- 
fectly sufficient to regard them as just 
one of the various features of the units
and constructions involved, important 
enough to be registered in their lingui- 
stic descriptions, but constituting no 
separate objects of description. To
incorporate these facts in the CSR, we
resort to numbering the words in the text
processed in the order of their
successive occurrence (the resulting numbers
used also, in combination with some other 
data, as their identifiers throughout 
the processing). 
If, on the contrary, a construction
is characterized by a meaningful
word-order relation between its lexical
components, it is given the status of a
special "positional unit", distinct from
the construction itself and represented
explicitly in the CSR. Such a unit
directly participates in semantic
operations, including those of translation
proper, which means that it must have
its own description (in particular, its
own translation equivalent). It stands
to reason that the range of inter-language
correspondences involving positional
units of either the source or the
target language is not restricted to this
class of units alone, as the communicative
organization of text can also be
conveyed by some types of syntactic
constructions and lexical elements. An
example is the Japanese particle ga as used
in independent sentences (or, sometimes,
in the main clauses of complex
sentences), where its best Russian equivalent
(if the same type of predicative
construction is used) is the reverse order of
the subject and the predicate.
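
The two treatments of word order described above can be pictured with a small data-structure sketch. It is an illustration only: the class names `Word` and `PositionalUnit`, the helper functions and the relation label are hypothetical, and the paper does not specify the actual format used in the CSR. "Auxiliary" order is kept merely as successive word numbers, while a "meaningful" order relation becomes an explicit object that could carry its own description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    number: int    # order of occurrence; also serves as (part of) an identifier
    form: str


@dataclass(frozen=True)
class PositionalUnit:
    """A meaningful word-order relation, kept as a unit in its own right."""
    earlier: int   # number of the word standing earlier in the text
    later: int     # number of the word standing later in the text
    label: str     # hypothetical name of the relation


def number_words(forms: List[str]) -> List[Word]:
    """Record 'auxiliary' word order simply as successive word numbers."""
    return [Word(i, form) for i, form in enumerate(forms, start=1)]


def positional_units(words: List[Word],
                     meaningful: List[Tuple[int, int, str]]
                     ) -> List[PositionalUnit]:
    """Build explicit units for the word-order relations judged meaningful."""
    known = {w.number for w in words}
    units = []
    for earlier, later, label in meaningful:
        if earlier not in known or later not in known or earlier >= later:
            raise ValueError(f"bad word numbers: {earlier}, {later}")
        units.append(PositionalUnit(earlier, later, label))
    return units


words = number_words(["konimotsu-gakari-wa", "mazu", "nimotsu-no",
                      "megata-o", "hakarimasu"])
units = positional_units(words, [(1, 5, "subject-before-predicate")])
print(words[2], units[0])
```

Which relations count as meaningful is decided elsewhere (here it is simply passed in as a list); the sketch only shows that such relations are stored separately from the words they connect.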
As we see, here also, as in the other 
components of the CSR, there is a compro- 
mise between the "semanticity" and the 
"superficiality" requirements. On the one 
hand, explicit indication of the
word-order relations found to be meaningful
in the text processed characterizes some
aspects of its semantic structure. On
the other hand, the form of "positional
units" chosen to represent them is
rather superficial in that it does not
display the semantic correlations underlying
the interchangeability of these units
with other structural text features (such
as the selection of the nexus vs.
junction form of expressing the
predicate-argument dependencies between text units;
the use of "relational" words of the
Oper_i or Func_i type and the like; the
occurrence of emphatic particles and
constructions, etc.).
Conclusion 
In the foregoing we have tried to 
show the way the Combined Structural Re- 
presentation of text reflects the requi- 
rements of "semanticity", "superficiali- 
ty" and "exhaustiveness" formulated at 
the beginning of the paper as essential
for the basic level of formally
representing text structure in an automatic
translation system. We shall now briefly 
recapitulate the points.
The "semanticity" requirement is
accounted for in the CSR, in the first
place, by the very possibility provided 
in it of explicitly describing the five 
most important aspects of text structure 
and composition, as stated above. The 
quest for "semanticity" forms also the 
basis of the principles we employ in se- 
lecting concrete information to be made 
explicit. Among these one can mention 
the criterion of structural significance 
of the units to be represented in the 
CSR as separate words or quasi-words; the 
principle of "immediate semantic
substantiation" of the predicate-argument
syntactic relations registered in it; the
requirement of supplying the elements of
the lexico-grammatical composition of 
the text under consideration, as well as 
of its linear arrangement, with indica- 
tions of their meaningful or auxiliary 
role within this text; the employment 
of a special formal language to define 
the semantic properties of words and 
word-combinations, etc. 
The "superficiality" of the CSR is
seen, among other things, in the fact
that this level of text representation
envisages the use of lexico-syntactic
translation equivalents and does not
necessarily require decomposition of
lexemes into combinations of smaller units
of meaning, such decomposition being
considered appropriate only in comparatively rare
cases of descriptive and interpretative
translation. Other features of the CSR
originating from the "superficiality"
principle are absence of exhaustive
information about the anaphoric structure
of the text, inclusion of only those
data on syntactic word-grouping which are
of importance for the translation
process, direct translation of elliptical
constructions, wherever possible, etc.
Finally, the "exhaustiveness"
requirement is specified as what may be called
the "lose-nothing" principle of
constructing the CSR. It means that when special
labels are introduced into it to explicitly
display various structural elements
implicitly present in the surface form of
the text at hand, the surface text
markers (such as the "auxiliary" type word
order; morphological features expressing
grammatical agreement or government;
function words and punctuation marks
having no independent translation
equivalents, and so on), though having been
used already to reveal those elements,
are not eliminated from the
representation being formed. They are merely
supplemented by the designations of the
elements revealed, as well as by formal
indications of their own auxiliary nature,
and thus remain accessible for any
further analysis that might prove useful,
should it turn out that their functions
in the text are not limited to just
identifying the units already made explicit.
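
The "lose-nothing" principle can be illustrated by a short sketch. It rests on our own assumptions: the `Item` record, the `annotate` function and the label `attributive-link` are hypothetical and serve only to show that a surface marker is kept in the representation, flagged as auxiliary and supplemented by the label of the element it reveals, rather than being deleted once it has been used.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    form: str
    auxiliary: bool = False                 # marker already used to reveal something
    reveals: List[str] = field(default_factory=list)  # labels made explicit


def annotate(forms: List[str], marker_labels: Dict[str, str]) -> List[Item]:
    """Keep every surface form; flag markers instead of removing them.

    `marker_labels` (hypothetical) maps a surface marker to the label
    of the structural element it makes explicit.
    """
    items = []
    for form in forms:
        if form in marker_labels:
            items.append(Item(form, auxiliary=True,
                              reveals=[marker_labels[form]]))
        else:
            items.append(Item(form))
    return items


items = annotate(["nimotsu", "no", "megata"], {"no": "attributive-link"})
for item in items:
    print(item)
```

Since the marker stays in the list, a later procedure can still inspect it if its function turns out not to be exhausted by the label attached to it.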
Notes 
1 З.М.Шаляпина. К проблеме построения
формальной модели процесса перевода.
- В кн.: Теория перевода и научные
основы подготовки переводчиков. Часть
~. М., 1975, с. 165-172.
2 Z.M. Shalyapina. Automatic translation
as a model of the human translation
activity. - International Forum on Infor- 
mation and Documentation, 1980, vol.5, 
No.2, p.13-23. 
3 An earlier version of text
representation aimed at incorporating the
principles proposed had been developed
in the framework of an Anglo-Russian
automatic translation project and described
briefly in: З.М.Шаляпина. Англо-русский
многоаспектный автоматический словарь
(АРМАС). - Машинный перевод и
прикладная лингвистика. Вып. 17. М.,
1974, с. 7-67.
4 The notion of levels of text
processing is not identical with the notion
of levels of text representation
(although there certainly exist some strong
correlations). Linguistically, the
former corresponds rather to the notion of
language tiers introduced in: ~.,D.Bap-
~y~B. Основы описательной лингвистики.
М., "Наука", 1977.
5 A detailed formal definition of
this language and a description of some
of its linguistic interpretations are
given in: З.М.Шаляпина. Формальный язык
для записи толкований слов и
словосочетаний. - Проблемы кибернетики. Вып. 36.
М., 1979, с. 247-278.
6 There is also a paper on a
French-Russian automatic translation project
where a similar type of structural
information is mentioned as necessary (see:
Ю.Д.Апресян и др. Лингвистическое
обеспечение в системе автоматического
перевода третьего поколения. М., 1978, с. 13).
In our case of Japanese-Russian
translation, however, such information seems to
require more attention due to wider
differences between positional and other
rules of expressing the corresponding
constructions in the two languages.
