GRAMMATIC AND SEMANTIC NORMATIVITY OF LINGUISTIC UNITS
AND FEATURES AS A FACTOR OF AUTOMATIC TEXT PROCESSING
Z. M. Shalyapina
Institut vostokovedenija AN SSSR,
ul. Ždanova 12, 103 777 Moskva GSP, SSSR
All systems of automatic text processing are explicitly
or implicitly based on two general linguistic assumptions:
the assumption of grammaticality of the texts processed, and
the assumption of their meaningfulness. These assumptions,
however, cannot be considered absolute laws. It is not
uncommon that a text, though acceptable to most speakers of the
corresponding language, still contains some morphologic
and/or syntactic ungrammaticalities or cannot be completely
interpreted in terms of "standard" semantics; and conversely,
starting from an acceptable (meaningful) semantic structure
one may as often as not fail to find fully "grammatical"
language means that could express this structure with absolute
accuracy (one of the usual translation difficulties). This is
due, from our point of view, not only to incompleteness of the
linguistic and extralinguistic knowledge of individual people
or to imperfections in the corresponding formal models, but
also to the following two fundamental principles of linguistic
performance:
1) a large number of requirements on the lexico-grammatic
(superficial) manifestation of natural-language texts, and
on their semantic interpretation, are relative in that they
characterize certain manifestations or interpretations as more
or less normative (preferable) in the given conditions, rather
than obligatory vs. inadmissible in the absolute sense;
2) the interaction between the requirements of grammatical
and semantic normativity of texts adheres to a sort of
complementarity principle: if the basic meaning of a text
fragment is supposed by its author to be sufficiently
transparent or known a priori to the text addressee, the
grammaticality requirement for this fragment's surface
manifestation may be somewhat slackened; if, on the contrary,
the author believes the text to contain much important
information new to the addressee, the language rules used in
composing its surface manifestation are apt to be as standard
and rigid as possible.
In this presentation we intend to describe one way of
incorporating the above principles in the design of the
analysis and synthesis (generation) components of an
automatic translation system.
The general structure of the system viewed from this 
standpoint is planned to be as follows.
The major part of factual linguistic information is
formulated in the system regardless of the tasks of analysis
or synthesis. It is shaped principally as a set of
descriptive rules arranged into dictionary and grammar
according to the so-called lexicographic principle and
classified into two main types: the context-representation
rules making up the contextual dictionary and grammar
component, and the context-contrastive rules forming the
inter-contextual grammar.
The rules of both types describe the possible superficial
manifestations and semantic interpretations for elementary
potential components of text structure. The kind of text
structure serving as the point of reference for this
description is defined in our model at the language-sign (LS)
level, based primarily on the Saussurian conception of
linguistic sign and roughly corresponding to the level of
N. Chomsky's deep structures.
The context-representation rules proceeding from this
type of structure specify the contextual functioning of 
language units and features isolated at the LS-level, by
relating them to their associated manifestations and
interpretations. Essentially, they amount to statements of the
following pattern: "If the LS-structure of a text contains a
certain unit or feature X in a certain contextual position,
this unit or feature can be superficially manifested (resp.
semantically interpreted) in this text through the use of
expression means Y (resp. of meaning constituent Z)."
The above principle of "relativity" is incorporated in
these rules by supplementing each of them with a priority
coefficient showing the degree of its normativity. In
contrast to many other "preferential" linguistic models we
emphasize the linguistic significance of these coefficients
which, in our view, must be derived primarily from the
interplay of synonymy and homonymy as phenomena inherent in
natural language. With our linguistic description centered as
it is around the notion of linguistic sign in the Saussurian
sense, it is possible to evaluate these phenomena, as well as
the priority coefficients required, in terms of statistical
data bearing on the occurrence rate (relative frequency) of
various specific manifestations and interpretations of each
LS-structure among their alternatives.
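
As an illustration only (the abstract gives no algorithm), a
priority coefficient of this kind could be estimated as a plain
relative frequency. In the sketch below the function name, the
(unit, manifestation) input format, and the toy sample are all
assumptions made for the example, not parts of the system
described.

```python
from collections import Counter, defaultdict


def priority_coefficients(observations):
    """Estimate priority coefficients as relative frequencies.

    `observations` is an iterable of (ls_unit, manifestation) pairs
    (a hypothetical corpus format). For each LS-unit, the coefficient
    of a manifestation is its share among all manifestations observed
    for that unit.
    """
    counts = defaultdict(Counter)
    for unit, manifestation in observations:
        counts[unit][manifestation] += 1

    coeffs = {}
    for unit, manifestation_counts in counts.items():
        total = sum(manifestation_counts.values())
        coeffs[unit] = {m: c / total for m, c in manifestation_counts.items()}
    return coeffs


# Invented toy sample: one LS-unit with three competing manifestations.
sample = [
    ("FUTURE", "will"),
    ("FUTURE", "will"),
    ("FUTURE", "going to"),
    ("FUTURE", "will"),
    ("FUTURE", "shall"),
]
coeffs = priority_coefficients(sample)
print(coeffs["FUTURE"])
```

The same counting could be applied in the other direction, from
an expression to its alternative interpretations, which is where
homonymy rather than synonymy would show up.
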
The context-contrastive rules implement the "relativity"
principle even more immediately. Their general pattern is:
"If a fragment of the LS-structure of a text has several
alternative manifestations (resp. interpretations) differing
in a certain characteristic Y, preference should be given,
all things being equal, to the alternative where the value of
Y is related to the values of the same variable for the other
alternatives in a definite way". In terms of such rules one
can state all those particulars of the surface and/or semantic
arrangement of natural-language texts (or of a special type
of texts) which involve a kind of overall stylistic comparison,
rather than the properties of individual linguistic units and
structural features.
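
The pattern of a context-contrastive rule can be sketched as a
comparison across alternatives. This is a minimal sketch under
stated assumptions: the characteristic Y is taken to be word
count and the "definite way" is taken to be "not larger than for
any other alternative"; both choices, and all names in the code,
are hypothetical.

```python
def apply_contrastive_rule(alternatives, characteristic, relation):
    """Return the alternatives preferred by one contrastive rule.

    characteristic: maps an alternative to its value of Y.
    relation: takes (value, values_of_the_other_alternatives) and
        returns True if the alternative is preferred.
    If no alternative satisfies the relation, all are returned, so
    the rule expresses a preference and never a prohibition.
    """
    values = [characteristic(alt) for alt in alternatives]
    preferred = [
        alt
        for i, alt in enumerate(alternatives)
        if relation(values[i], values[:i] + values[i + 1:])
    ]
    return preferred or list(alternatives)


def word_count(text):
    """Assumed characteristic Y: number of words."""
    return len(text.split())


def is_minimal(value, others):
    """Assumed relation: Y is not larger than for any other alternative."""
    return all(value <= other for other in others)


candidates = [
    "the man's hat",
    "the hat of the man",
    "the hat belonging to the man",
]
preferred = apply_contrastive_rule(candidates, word_count, is_minimal)
print(preferred)
```

Falling back to the full list when nothing qualifies keeps the
rule relative rather than absolute, in line with the first
principle stated above.
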
The descriptive part of the system is made operational by
means of special control components acting as "planners" of
the analysis and synthesis processes. One of the main tasks
of these components within the framework outlined consists in
grading the alternatives obtained from processing separate
text fragments as more or less promising for accomplishing
the analysis (resp. synthesis) of the whole text, this
gradation being based, among other things, on the priority
coefficients of the rules used to form (or check) different
alternatives, and on the interrelation between these rules
with respect to the grammar and semantics "complementarity"
principle. Inasmuch as this aspect of processing is concerned,
the approach accepted makes it possible to re-interpret the
well-known idea of "analysis through synthesis" and "synthesis
through analysis" from the "normativity" angle of view. Thus,
for analysis one can reduce this idea to a formalization of
the following line of reasoning (quite popular with translators
or people somehow concerned with texts in foreign languages):
"Expression X in the text at hand cannot mean Y because had
the author meant Y he would have much rather used expression Z".
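
One possible reading of this line of reasoning, offered as a
sketch and not as the procedure used in the system: an
interpretation Y of expression X is dropped when some rival
expression Z has a far higher synthesis-side coefficient for Y.
The `margin` threshold, the function name, and the numbers in
the toy table are invented for the example.

```python
def plausible_meanings(expression, synthesis_coeffs, margin=3.0):
    """Filter interpretations by "analysis through synthesis".

    synthesis_coeffs maps each meaning to a dict of
    {wording: priority coefficient} (hypothetical format).
    A meaning is rejected if a rival wording is preferred for it by
    more than `margin` times the coefficient of `expression`.
    """
    kept = []
    for meaning, wordings in synthesis_coeffs.items():
        own = wordings.get(expression, 0.0)
        if own == 0.0:
            continue  # the expression cannot manifest this meaning at all
        best_rival = max(
            (c for w, c in wordings.items() if w != expression),
            default=0.0,
        )
        if best_rival <= margin * own:
            kept.append(meaning)
    return kept


# Invented coefficients, for illustration only.
synthesis_coeffs = {
    "OBLIGATION": {"must": 0.7, "has to": 0.2, "shall": 0.1},
    "FUTURE": {"will": 0.6, "shall": 0.3, "going to": 0.1},
}
meanings = plausible_meanings("shall", synthesis_coeffs)
print(meanings)
```

With these toy numbers "shall" is not read as OBLIGATION, since
an author meaning OBLIGATION would much rather have used "must".
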
Apart from affording better processing accuracy and
efficiency, explicit introduction of data on normativity and
preferability of linguistic units and features throughout all
the major components of a text processing system, and drawing
on statistical characteristics of LS-units' contextual
manifestations and interpretations as the controlling factor in
selecting the more "promising" among the alternative routes of
processing concrete texts, seems to have one more asset.
Namely, possibilities are opened up for automatically
perfecting the system's functioning when required, and
tailoring it to different text styles, by way of modifying the
priority coefficients of the linguistic rules involved,
directly from the current results of the system's operation.
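
How such a modification might look is not specified in the
abstract. A minimal sketch, assuming that the coefficients of
competing alternatives sum to one and that each processed text
reports which alternative was finally accepted, is an
exponential moving average; the update rule, the `rate`
parameter, and the sample values are assumptions.

```python
def update_coefficients(coeffs, chosen, rate=0.1):
    """Shift weight toward the alternative accepted in the last run.

    coeffs: {alternative: priority coefficient}, assumed to sum to 1.
    chosen: the alternative confirmed by the system's current result.
    rate: hypothetical adaptation speed, between 0 and 1.
    The returned coefficients again sum to 1.
    """
    if chosen not in coeffs:
        raise KeyError(chosen)
    return {
        alt: (1 - rate) * c + (rate if alt == chosen else 0.0)
        for alt, c in coeffs.items()
    }


# Invented starting values for one LS-unit.
before = {"will": 0.6, "going to": 0.3, "shall": 0.1}
after = update_coefficients(before, "going to")
print(after)
```

Repeating the update over texts of one style would gradually
pull the coefficients toward the preferences of that style.
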
