B. V. SUKHOTIN
DECIPHERING METHODS AS A MEANS OF LINGUISTIC 
RESEARCH 
Methods of linguistic deciphering may be regarded as a set of pro- 
cedures aimed at the recognition of linguistic objects in a text whose 
language is not known to the investigator. 
They combine many advantages of the formal approach to language. 
Assuming that each deciphering procedure may serve as a definition 
of the respective linguistic object we may view the set of such procedures 
as a certain linguistic theory which has the following properties: 
1) A high degree of generality, because its definitions should 
be valid both for known and unknown languages. 
2) Formality, because, naturally enough, the deciphering pro- 
cedures should be presented in the form of algorithms. 
3) Constructivity, i.e. the possibility of identifying a certain lin- 
guistic object with the help of a deciphering procedure within a rea- 
sonable (finite) time interval. 
To identify a linguistic object a deciphering algorithm makes use 
of a set of its features. Those features are sufficient for a non-constructive 
definition of the object under investigation and to a great extent 
determine the kind of recognition algorithm. 
It seems obvious that a linguistic object cannot be defined by means 
of binary features alone. A definition based on binary features will 
be either too specific and valid only for a chosen language, or too abstract 
and insufficient for identifying the object in a given text. 
The following scheme seems to be better founded: 
(1) Binary features are used to determine the general type of cer- 
tain linguistic objects. The objects belonging to that type form the set 
of admissible solutions of a deciphering problem. 
(2) An objective function which estimates the quality of each 
solution is introduced on the set of admissible solutions. The values of 
the objective function are calculated with the help of the investigated 
text. They reflect the individuality of the given language. 
A maximum or a minimum of the objective function should cor- 
respond to the linguistic object which is to be defined. 
(3) It follows that a recognition procedure should be an optimi- 
zation algorithm which finds "the best" admissible solution - from 
the point of view of the objective function. 
Thus, the set of admissible solutions, the objective function and the 
optimization algorithm constitute the definition of a linguistic object 
which may be used for the purposes of deciphering. A definition of 
this kind will be further referred to as a deciphering algorithm, or simply, 
an algorithm. 
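In modern terms, such a definition can be sketched as a generic optimization routine (a schematic Python illustration; the function and parameter names are ours, not the paper's):

```python
def decipher(admissible_solutions, objective, maximize=True):
    """Schematic deciphering algorithm in the paper's sense: search the
    set of admissible solutions for the one that optimizes the objective
    function.  (A practical deciphering algorithm replaces this exhaustive
    scan with a cleverer optimization procedure.)"""
    pick = max if maximize else min
    return pick(admissible_solutions, key=objective)
```

For instance, `decipher(range(10), lambda x: -(x - 3) ** 2)` returns 3; each concrete algorithm discussed below instantiates this scheme with its own admissible set and objective function.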
There is a natural hierarchy of deciphering algorithms. An algorithm 
B is senior to an algorithm A if the former makes use of the information 
provided by the latter. If A and B work alternately, each time im- 
proving the output, then the seniority is determined by the first iteration. 
Consequently, taking into account the fact that the set of essentially 
different deciphering algorithms should be finite, it appears that there 
must exist "zero" algorithms which use no information produced 
by any other deciphering algorithm. 
Zero algorithms should be different due to the fact that the phy- 
sical substances of different languages may be different too. Thus the 
zero algorithm for the analysis of the written form of languages should 
be able to discriminate between a dark spot and a light one and to find 
the set of alphabetic symbols of the language. A similar algorithm 
adjusted to the analysis of audible speech should produce an alphabet of 
phonemes, exploiting its capacity to discern certain minimal differences 
of sonation. The plurality of zero algorithms may be reduced by con- 
verting signals of different nature into a set of curves. As is well known 
such algorithms are the goal of pattern recognition theory. 
Senior algorithms should be used for the analysis of grammar; the 
highest level corresponds to the problems of semantics and translation. 
Many algorithms of different levels display great similarity and 
sometimes even identity, their only difference consisting in the linguistic 
material which serves as the input. 
Thus the algorithms that classify letters according to their pro- 
nunciation may closely resemble the algorithms that classify morphe- 
mes according to their grammatical role; the algorithms that find 
the boundaries between sentences may be similar to those that find 
boundaries between words and so on. 
The following types may be pointed out: 
1) Algorithms of classification, which divide the set of investigated 
objects into several subsets. For instance, the letters are classified 
into vowels and consonants, morphemes into auxiliary and root 
morphemes, words into parts of speech. 
2) Algorithms of aggregation which form larger units from 
smaller ones. For instance, they put together letters into morphemes 
or syllables, morphemes into words and words into sentences. 
3) Algorithms of connection, which find out some relation of 
partial ordering. A typical example is provided by different algorithms 
of discovering the dependency graph of a sentence. 
4) Algorithms of mapping the elements of an unknown language 
into the elements of a known one. Algorithms of this type should 
solve problems of translation and discover the kinship of languages. 
The simplest classification algorithm is the one which classifies let- 
ters into vowels and consonants. 
In effect this algorithm is valid for any string composed of objects 
of two different classes and characterized by the fact that objects of the 
same class co-occur rather rarely whereas objects of different classes 
co-occur relatively more often. 
The set of admissible solutions in this case is a set of divisions of the 
list of objects into two subsets; the quality Q of a division D = {K1, 
K2} is evaluated by the following formula: 

Q = 2 Σ_{i,j} f(e_i, e_j) 

Here f(e_i, e_j) denotes the frequency of co-occurrence of objects 
e_i, e_j from classes K1 and K2 respectively. The maximum of Q corresponds 
to the optimal classification. An appropriate optimization procedure 
reduces the amount of divisions that should be evaluated to a reason- 
able number. This algorithm has been thoroughly tested in a number 
of computer experiments and in every case yielded almost entirely 
correct results. 
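For a small alphabet the optimal division can even be found by exhaustive search. The following Python sketch (our illustration; the author's actual optimization procedure prunes the search rather than enumerating all divisions) maximizes the cross-class co-occurrence sum Q:

```python
from collections import Counter
from itertools import combinations

def cooccurrences(text):
    """Frequencies f(a, b) of adjacent, unordered pairs of distinct letters."""
    f = Counter()
    for a, b in zip(text, text[1:]):
        if a != b:
            f[frozenset((a, b))] += 1
    return f

def classify(text):
    """Divide the alphabet into two classes (K1, K2), maximizing the sum
    of co-occurrence frequencies across the class boundary."""
    letters = sorted(set(text))
    f = cooccurrences(text)
    best_q, best = -1, None
    for r in range(1, len(letters)):          # all non-trivial bipartitions
        for k1 in combinations(letters, r):
            k1 = set(k1)
            # a pair counts toward Q iff exactly one member lies in K1
            q = sum(n for pair, n in f.items() if len(pair & k1) == 1)
            if q > best_q:
                best_q, best = q, (k1, set(letters) - k1)
    return best
```

Note that Q alone does not say which of the two classes contains the vowels; some further criterion, e.g. assuming that the most frequent letter is a vowel, is needed for that.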
The most important algorithm of aggregation is the morpheme 
identification algorithm. Apart from identifying morphemes this al- 
1 See B. V. Sukhotin, Problemy strukturnoj lingvistiki [Problems of Structural Linguistics], 1962. Later 
on appeared the works of V. V. Ševoroškin and others. 
Since the pioneering work of Z. Harris, "From phoneme to morpheme", Language, 
XXXI (1955), pp. 190-222, attempts at solving this problem were made by N. D. 
Andreev and others. The author's first paper on this problem appeared in 1963 
(Problemy strukturnoj lingvistiki). 
gorithm discovers an IC (immediate constituent) graph which shows the way in which morphe- 
mes are combined into words. The algorithm is valid even for texts 
which have no special devices for marking the boundaries between 
words and may be used in order to find those boundaries. 
An admissible solution in this case is a series of divisions D_1, ..., D_n 
of the text, each class of D_i being included in a certain class of D_{i+1}. 
Morphemes form the classes of the smallest division D_1, the classes 
of the biggest division D_n corresponding to words. 
The objective function is set up by ascribing to each class K_ij of 
D_i a certain number p(K_ij) which shows the strength of mutual pre- 
diction of the components of K_ij, and by adding up all p(K_ij): 
Q = Σ_i Σ_j p(K_ij) 
p(K_ij) is the product of St_in(K_ij) (internal stability) and St_ex(K_ij) (exter- 
nal stability). 
St_in(K_ij) is the mean conditional probability 

St_in(K_ij) = (1 / (2(L − 1))) Σ_k [ f(K_ij)/f(l_k) + f(K_ij)/f(r_k) ] 

where the string K_ij of the length L is divided into the left part l_k and 
the right part r_k (l_k r_k = K_ij) in all possible ways; f(l_k), f(r_k), f(K_ij) de- 
noting the frequencies of the respective strings. 
St_ex(K_ij) is equal to zero if there is a string K such that K_ij ⊂ K 
and f(K_ij) = f(K), and equal to 1 in other cases. 
This algorithm was tested in a number of manual experiments. 
A large computer experiment is going on at the present time. 
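The two stability measures may be sketched in Python as follows (our reconstruction; the 1/(2(L − 1)) normalization of the internal stability is an assumption on our part, since the formula is only partially legible in the source):

```python
from collections import Counter

def substring_freqs(text, max_len):
    """Frequencies of all substrings of the text up to a given length."""
    f = Counter()
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            f[text[i:j]] += 1
    return f

def internal_stability(s, f):
    """Mean conditional probability of the whole string given each of its
    left and right parts, averaged over all split points."""
    L = len(s)
    total = 0.0
    for k in range(1, L):
        left, right = s[:k], s[k:]
        total += f[s] / f[left] + f[s] / f[right]
    return total / (2 * (L - 1))

def external_stability(s, f, alphabet):
    """0 if some extension of s occurs exactly as often as s, else 1.
    Checking one-letter extensions suffices, because substring frequencies
    are non-increasing under extension."""
    for a in alphabet:
        if f[s + a] == f[s] or f[a + s] == f[s]:
            return 0
    return 1
```

In the toy text "abcabc", for example, "abc" is both internally and externally stable, whereas "ab" fails the external test because it is always followed by "c".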
It is only statistically that the immediate neighbourhood of the 
words in a text reflects their semantic connections. The understanding 
of the text is greatly facilitated by the discovery of those connections, 
a procedure carried out by the connection algorithms. 
Representative of these is the algorithm of finding the dependency 
graph of a sentence. For this purpose the words of the language should 
be classified into parts of speech so that we may consider a word v 
to be included in a class K_v. The conditional probability p(K_v/K_w) of 
occurrence of K_v near K_w is calculated with the help of the text. 
The set of admissible solutions is the set of all possible dependency 
trees which may be ascribed to a given sentence. The conditional pro- 
babilities provide the weights for the arcs of the tree. The quality of 
a tree is the sum (or the mean) of the weights of all arcs. The optimal 
tree presumably has the maximum quality. Some algorithms of this 
type have recently been tested in computer experiments and yielded 
good results. 
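For a short sentence the optimal tree can be found by scoring every admissible head assignment. The sketch below (our illustration, not the program used in the experiments) takes the conditional probabilities p(K_v/K_w) as arc weights:

```python
from itertools import product

def tree_quality(heads, classes, p):
    """Sum of arc weights p(dependent class given head class).
    heads[i] is the index of the head of word i, or -1 for the root."""
    return sum(p.get((classes[h], classes[i]), 0.0)
               for i, h in enumerate(heads) if h != -1)

def is_tree(heads):
    """True iff there is exactly one root and no cycles."""
    if heads.count(-1) != 1:
        return False
    for i in range(len(heads)):
        seen, j = set(), i
        while heads[j] != -1:          # walk up toward the root
            if j in seen:
                return False
            seen.add(j)
            j = heads[j]
    return True

def best_tree(classes, p):
    """Exhaustively score all admissible dependency trees."""
    n = len(classes)
    candidates = (h for h in product([-1] + list(range(n)), repeat=n)
                  if all(h[i] != i for i in range(n)) and is_tree(list(h)))
    return max(candidates, key=lambda h: tree_quality(h, classes, p))
```

For two words classed Verb and Noun, with p(Noun near Verb) the larger of the two conditional probabilities, the verb is chosen as the root and the noun attached to it.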
One such experiment, which employed 19 syntactic classes, was 
carried out for a Russian text of about 10,000 words. It established 
about 80% correct connections. Here are some typical examples: 
Однажды играли в карты у конногвардейца Нарумова 
Adv. Verb Prep. Acc. Sub. Prep. Gen. Sub. Gen. Sub. 
('Once they were playing cards at the house of Narumov, an officer of the Horse Guards.') 
Прочие в рассеянности сидели перед пустыми своими приборами 
Nom. Sub. Prep. Loc. Sub. Verb Prep. Ins. Adj. Ins. Adj. Ins. Sub. 
('The others sat absent-mindedly in front of their empty places.') 
Applying the optimization algorithm to the alphabet of syntactic 
classes, we get the "representing graph" which shows the typical 
connections recognized by the given algorithm. For the algorithm 
mentioned above the representing graph looks as follows: 
[Figure: the representing graph, with nodes for Adverb, Nom. Subst., Nom. Adj., Ins. Subst., Gen. Subst., Loc. Subst. and other syntactic classes, joined by their typical connections.] 
Algorithms of this kind may be used for the purposes of machine 
translation, in which case a greater amount of the input information 
is needed. 
A typical example of an algorithm which obtains a mapping M = 
{ E_i → E'_i } (E_i being some elements of the unknown language, 
E'_i the respective elements of the known one) is furnished by the 
algorithm which discovers the pronunciation of letters. 
It is based on the hypothesis that letters of two different languages 
which have similar pronunciation possess similar combinatory power 
in their respective languages as well. 
The combinatory power of a letter l_i may be described by the vec- 
tor of conditional probabilities { p(l_i/l_j) } which characterizes the oc- 
currences of l_i in the neighbourhood of l_j. In the same way, the vector 
{ p(l'_i/l'_j) } characterizes the combinatory power of l'_i. 
The quality of a mapping may be estimated by the formula: 
Q(M) = Σ_i d(l_i, l'_i) 

Here d denotes the distance (e.g. Euclidean) between the vectors 
{ p(l_i/l_j) } and { p(l'_i/l'_j) }. All pairs l_i → l'_i belong to the mapping M, so 
that d may be calculated by the formula: 

d(l_i, l'_i) = √( Σ_j ( p(l_i/l_j) − p(l'_i/l'_j) )² ) 
The minimum of Q corresponds to the optimal mapping. Some 
algorithms of this type have been tested with good results. It is obvious 
that a similar algorithm will be able to compile a bilingual dictionary 
with the entries in the unknown language, although the latter problem 
is, naturally, far more difficult. 
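A brute-force version of the letter-mapping algorithm may be sketched as follows (our illustration; taking "combinatory power" to be the vector of following-letter conditional probabilities is one possible reading of the text):

```python
from collections import Counter
from itertools import permutations
from math import sqrt

def cond_probs(text):
    """p(a | b): probability that letter a immediately follows letter b."""
    pair = Counter(zip(text, text[1:]))
    ctx = Counter(text[:-1])
    return lambda a, b: pair[(b, a)] / ctx[b] if ctx[b] else 0.0

def best_mapping(text_unknown, text_known):
    """Minimize Q(M) = sum_i d(l_i, l'_i) over all injective letter mappings."""
    au, ak = sorted(set(text_unknown)), sorted(set(text_known))
    pu, pk = cond_probs(text_unknown), cond_probs(text_known)
    best_q, best = float("inf"), None
    for perm in permutations(ak, len(au)):
        m = dict(zip(au, perm))
        # Euclidean distance between the combinatory-power vectors,
        # with the vector components paired up through the mapping itself
        q = sum(sqrt(sum((pu(a, b) - pk(m[a], m[b])) ** 2 for b in au))
                for a in au)
        if q < best_q:
            best_q, best = q, m
    return best
```

The exhaustive scan over permutations is feasible only for tiny alphabets; for a real pair of languages the paper's optimization procedure would replace it.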
