OPTIMIZATION ALGORITHMS OF DECIPHERING AS THE ELEMENTS OF A LINGUISTIC THEORY
B.V.SUKHOTIN 
Institute of the Russian Language 
121019, Volkhonka 18/2, Moscow, USSR
Abstract 
This paper presents an outline of a
linguistic theory which may be identified
with a partially ordered set of
optimization algorithms of deciphering. An
algorithm of deciphering is an operational
definition of a given linguistic phenomenon
which has the following three components:
a set of admissible solutions, an objective
function, and a procedure which finds
the minimum or the maximum of the objective
function.
The paper describes four algorithms
of the proposed type:
1. The algorithm which classifies the
letters into vowels and consonants.
2. The algorithm which identifies the
morphemes in a text without boundaries
between words.
3. The algorithm which finds the
dependency tree of a sentence.
4. The algorithm which finds the mapping
of the letters of an unknown language into
the letters of a known one.
The forties and the first half of the
fifties were marked by a pronounced
interest among linguists in the so-called
"discovery procedures". These investigations
were not very successful at that time.
Chomskyan criticism also hindered
progress in this direction.
There is no reason to revive the old 
discussions. We will try to show further 
that the optimization algorithms we propose 
combine the theoretical generality on the 
one hand with the practical usefulness on 
the other. Moreover, it appears that the
methods of generative grammar theory
and those of discovery procedures are
not at all contradictory. For example,
in a recent work of M. Remmel the set of
admissible solutions is defined as a set
of N. Chomsky's generative grammars.
In this paper we prefer to use the term
"deciphering procedures (algorithms)"
instead of "discovery procedures", because 
the latter implies the operations which are 
not necessarily formal. 
An algorithm of linguistic deciphering
is a formal procedure aimed at the
recognition of linguistic objects in a text whose
language is not known to the investigator.
Assuming that any deciphering procedure
may serve as a definition of the respective
linguistic object, we may view the set of
such procedures as a certain linguistic
theory which has the following properties:
1) A great degree of generalization,
because its definitions should be valid
both for known and unknown languages.
2) Formality, because, naturally enough,
the deciphering procedures should be
presented in the form of algorithms.
3) Constructivity, i.e. the possibility 
of identifying a certain linguistic object 
with the help of a deciphering procedure 
within a reasonable time interval. 
To identify a linguistic object a 
deciphering algorithm makes use of a set 
of its features. It seems obvious that a 
linguistic object cannot be defined by 
means of binary features alone. The 
following scheme seems to be better 
founded: 
1. Binary features are used to
determine the general type of certain 
linguistic objects. The objects belonging 
to that type form the set of admissible 
solutions of a deciphering problem. 
2. An objective function which estim- 
ates the quality of each solution is 
introduced on the set of admissible 
solutions. The values of the objective 
function are calculated with the help of 
the investigated text. They reflect the 
individuality of the given language. A 
maximum or a minimum of the objective 
function should correspond to the linguist- 
ic object which is to be defined. 
3. It follows that a deciphering
procedure should be an optimization algorithm
which finds "the best" admissible solution
from the point of view of the objective
function.
Thus, the set of admissible solutions, 
the objective function and the optimiza- 
tion algorithm constitute the definition 
of a linguistic object which may be used 
for the purposes of deciphering; a 
definition of this kind will be further 
referred to as a deciphering algorithm, 
or simply, an algorithm. 
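The three-part definition above can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class name `DecipheringAlgorithm`, its field names and the toy instance are hypothetical, and exhaustive search merely stands in for whatever optimization procedure a real algorithm would use.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TypeVar

S = TypeVar("S")  # type of an admissible solution


@dataclass
class DecipheringAlgorithm(Generic[S]):
    """Hypothetical container for the three components named in the text."""
    admissible: Callable[[str], Iterable[S]]  # admissible solutions for a text
    objective: Callable[[S, str], float]      # quality of a solution, computed from the text
    maximize: bool = True                     # is the optimum a maximum or a minimum?

    def solve(self, text: str) -> S:
        # Exhaustive search is an assumption made for illustration only.
        pick = max if self.maximize else min
        return pick(self.admissible(text), key=lambda s: self.objective(s, text))


# Toy instance: the "linguistic object" is the most frequent letter of a text.
most_frequent = DecipheringAlgorithm(
    admissible=lambda text: sorted(set(text)),
    objective=lambda letter, text: text.count(letter),
)
```

Calling `most_frequent.solve(text)` returns the admissible solution with the best objective value; the four algorithms described later differ only in what they supply for the three components.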
There is a natural hierarchy of de- 
ciphering algorithms. An algorithm B is 
senior to an algorithm A if the former 
makes use of the information provided by 
the latter. If A and B work alternately,
each time improving the output, then the
seniority is determined by the first
iteration. Taking into account the fact
that the set of essentially different 
algorithms should be finite, it appears 
that there must exist "zero" algorithms 
which use no information produced by any 
other deciphering algorithms. 
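The seniority relation can be illustrated as a dependency table. This is a sketch only: the algorithm names and the particular dependencies in `uses` are hypothetical, chosen to show how "zero" algorithms and a junior-to-senior ordering fall out of the relation "B uses the information provided by A".

```python
from graphlib import TopologicalSorter

# Hypothetical table: each algorithm maps to the algorithms whose output it uses.
uses = {
    "letters": set(),                  # a "zero" algorithm: it uses nothing
    "vowels_consonants": {"letters"},
    "morphemes": {"letters"},
    "dependency_trees": {"morphemes"},
}

# Zero algorithms are exactly those with no predecessors.
zero_algorithms = sorted(name for name, deps in uses.items() if not deps)

# static_order() yields every algorithm after all the algorithms it depends on,
# that is, juniors before seniors.
order = list(TopologicalSorter(uses).static_order())
```

Because the relation is only a partial order, several valid orderings may exist; `static_order()` returns one of them.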
Zero algorithms must differ from one
another, because the physical substance
of different languages may differ
too. Thus the zero algorithm
for the analysis of the written form of 
languages should be able to discriminate 
between a dark spot and a light one and 
to identify the place of each spot on the 
page; it should discover the set of alpha- 
betic symbols of the language. A similar 
algorithm adjusted to the analysis of 
audible speech should produce the alpha- 
bet of phonemes, exploiting its capacity 
to discern certain minimal differences of 
sonation. The plurality of zero algorithms 
may be reduced by converting signals of 
different nature into a set of curves. As
is well known, such algorithms are the
goal of pattern recognition theory.
Senior algorithms should be used for
the analysis of grammar; the highest levels 
correspond to the problems of semantics 
and translation. 
Many algorithms of different levels
display great similarity and sometimes 
even identity, their only difference con- 
sisting in the linguistic material which 
serves as the input. The following types 
of the algorithms may be pointed out: 
1. Algorithms of classification, which
divide the set of investigated objects 
into several subsets.
2. Algorithms of aggregation which 
form larger units from smaller ones.
3. Algorithms of connection which 
find out some relation of partial 
ordering. 
4. Algorithms of mapping the elements 
of an unknown language into the elements 
of a known one. 
The simplest classification algorithm
is that which classifies the set of letters
$A = \{l_i\}$ into vowels and consonants.
In this case an admissible solution is
a division $D = \{V, C\}$ such that
$V \cup C = A$, $V \cap C = \emptyset$.
The objective function reflects the fact
that letters of the same class co-occur
rather rarely whereas letters of different
classes co-occur relatively more often;
it is formulated as follows:
$$Q(D) = \sum_{l_i \in V} \sum_{l_j \in C} f(l_i, l_j)$$
Here $f(l_i, l_j)$ denotes the frequency of
co-occurrence of the letters $l_i$ and $l_j$.
The maximum of $Q(D)$
corresponds to the optimal classification.
An appropriate optimization procedure
reduces the number of divisions that have
to be evaluated to a reasonable number. This
algorithm has been thoroughly tested in a
number of computer experiments and in
every case yielded almost entirely correct
results.
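As an illustration of this classification, the sketch below maximizes the total frequency of neighbouring letter pairs that fall into different classes. It rests on several assumptions not taken from the paper: co-occurrence is read as adjacency within a word, words are separated by spaces, and a simple hill-climbing search (which may stop at a local maximum) replaces the author's unspecified optimization procedure. The function names are hypothetical.

```python
from collections import Counter


def adjacency_counts(text):
    """f(a, b): how often distinct letters a and b are neighbours within a word."""
    f = Counter()
    for word in text.split():
        for a, b in zip(word, word[1:]):
            if a != b:
                f[frozenset((a, b))] += 1
    return f


def quality(class_a, letters, f):
    """Q(D): total frequency of neighbouring pairs that straddle the two classes."""
    return sum(f[frozenset((a, b))] for a in class_a for b in letters - class_a)


def classify(text):
    """Hill-climb on Q(D), moving one letter at a time between the classes."""
    letters = set("".join(text.split()))
    f = adjacency_counts(text)
    class_a, best = set(), 0
    while letters:
        # Evaluate every single-letter move and keep the most profitable one.
        q, letter = max((quality(class_a ^ {l}, letters, f), l) for l in sorted(letters))
        if q <= best:
            break
        class_a ^= {letter}
        best = q
    return class_a, letters - class_a
```

Since Q(D) is symmetric in the two classes, the objective alone does not say which class holds the vowels; the sketch only returns the two-way division.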
The most important algorithm of 
aggregation is the morpheme identification 
algorithm. Apart from identifying morphemes 
this algorithm discovers an IC graph which 
shows the way in which morphemes are 
combined into words. An admissible solution
in this case is a sequence of divisions
$D_1, \ldots, D_n$ of the text, each class of $D_{i+1}$
being included in a certain class of $D_i$.
A morpheme $m$ is a string of letters at
least one occurrence of which is an
element of a certain class of some $D_i$.
The sequence $D_1, \ldots, D_n$ determines the
set of morphemes uniquely. The
objective function is set up by ascribing
to each morpheme a certain number $q(m)$
which is large when $m$ consists of
letters which predict each other more strongly
than they predict the letters of the
neighbouring morphemes. A number of
experiments have been carried out; the best
results have been obtained with the help of
the following function:
$$q(m) = q(aXb) = \frac{f^2(aXb)}{\max(f(aX), f(Xb))} - \max_{x,y}(f(aXbx), f(yaXb))$$
Here f denotes the frequency of a string, 
a is the initial, b is the final letter of 
m, y is a letter which precedes m, x is a 
letter which follows it, X is a string. 
The best solution should correspond to
the maximum of $Q(M) = \sum_i q(m_i)$, where
$M = \{m_i\}$. A Russian text of 10000 letters
was chosen for the experiments. Here is
an extract of the analysed text:
((~exoBe)z) c ((a~mTZ)O~) (.Pop,K) OZ 
(yzop~s~)(eHHo) ycMex ((Hy~)c2) 
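The morpheme score q(m) given above can be evaluated directly from substring counts. The sketch below assumes that f counts overlapping occurrences in an unsegmented text, that the string m = aXb has at least two letters, and that q is taken to be 0 for a string that never occurs; the helper names `count` and `q` and the toy English text are hypothetical, not the author's material.

```python
def count(text, s):
    """f(s): number of (possibly overlapping) occurrences of s in text."""
    return sum(text.startswith(s, i) for i in range(len(text) - len(s) + 1))


def q(text, m):
    """q(m) for a candidate morpheme m = aXb of at least two letters."""
    if len(m) < 2:
        raise ValueError("m must contain an initial and a final letter")
    f_m = count(text, m)
    if f_m == 0:
        return 0.0                                         # assumption: unseen strings score 0
    inner = max(count(text, m[:-1]), count(text, m[1:]))   # max(f(aX), f(Xb))
    alphabet = set(text)
    outer = max(
        max(count(text, m + x) for x in alphabet),         # max over x of f(aXbx)
        max(count(text, y + m) for y in alphabet),         # max over y of f(yaXb)
    )
    return f_m ** 2 / inner - outer


text = "thecatthedogthecow"
score_the = q(text, "the")   # a string that recurs as a unit
score_hec = q(text, "hec")   # a string that cuts across two units
```

In this toy text the recurring unit "the" scores higher than "hec", which straddles a boundary; the first term rewards letters that predict each other, the second penalizes strings that are themselves predictable from their neighbours.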
Representative of the algorithms of the 
third type is the algorithm of finding the 
dependency graph of a sentence. For this 
purpose the words of the language should 
be classified into syntactical classes 
so that we may consider a word $v$ to be
included in a class $K_v$. The conditional
probability $P(K_v/K_w)$ of occurrence of $K_v$
near $K_w$ is calculated with the help of the
text.
The set of admissible solutions is the 
set of all possible dependency trees which 
may be ascribed to a given sentence. The 
conditional probabilities provide the 
weights for the arcs of the tree. The 
quality of a tree is the sum (or the mean) 
of the weights of all arcs. The optimal 
tree presumably has the maximum quality. 
A great number of the algorithms of this 
type have been tested in computer experi- 
ments; the best ones correctly identified 
more than 80% of connections. Here is a 
typical example taken from an experiment 
which was carried out for a Russian text 
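of 10000 words:
Однажды  играли  в      карты     у      конногвардейца  Нарумова
Adv.     Verb    Prep.  Acc.Sub.  Prep.  Gen.Sub.        Gen.Sub.
("Once they were playing cards at the horse guardsman Narumov's.")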
Algorithms of this type may be used for 
the purposes of machine translation, in 
which case a greater amount of the input 
information is needed. 
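The search for the best dependency tree can be sketched as follows. This is an illustration under assumptions, not one of the tested algorithms: the weight table is invented for the example, an arc from a dependent to its head is weighted by the entry for (dependent class, head class), the quality is the sum of the arc weights, and exhaustive enumeration is used, which is feasible only for very short sentences.

```python
from itertools import product


def is_tree(heads):
    """True if exactly one word is the root (-1) and every word reaches it."""
    if heads.count(-1) != 1:
        return False
    for start in range(len(heads)):
        seen, node = set(), start
        while node != -1:
            if node in seen:          # walked into a cycle
                return False
            seen.add(node)
            node = heads[node]
    return True


def best_tree(classes, weight):
    """Exhaustively search head assignments; heads[i] is the head of word i."""
    n = len(classes)
    best_heads, best_quality = None, float("-inf")
    for heads in product(range(-1, n), repeat=n):
        if any(h == i for i, h in enumerate(heads)) or not is_tree(heads):
            continue
        quality = sum(
            weight.get((classes[i], classes[h]), 0.0)
            for i, h in enumerate(heads) if h != -1
        )
        if quality > best_quality:
            best_heads, best_quality = heads, quality
    return best_heads, best_quality


# Hypothetical weights keyed by (dependent class, head class).
weight = {
    ("Adv", "Verb"): 0.6, ("Prep", "Verb"): 0.5,
    ("Noun", "Prep"): 0.7, ("Noun", "Verb"): 0.3, ("Adv", "Noun"): 0.1,
}
heads, quality = best_tree(["Adv", "Verb", "Prep", "Noun"], weight)
```

With these invented weights the verb becomes the root, the adverb and the preposition attach to it, and the noun attaches to the preposition. For sentences of realistic length the enumeration would have to be replaced by a proper optimization procedure.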
A typical example of an algorithm which 
obtains the mapping $M = \{E_i \to E'_i\}$ ($E_i$
being some elements of the unknown language,
$E'_i$ the respective elements of the known
one) is furnished by the algorithm which 
discovers the pronunciation of letters. 
It is based on the hypothesis that
letters of two different languages which 
have similar pronunciation possess similar 
combinatory power as well. 
The combinatory power of the letter $l_i$
may be described by the vector of
conditional probabilities $C_i = \{P(l_i/l_x)\}$
which characterizes the occurrences of $l_i$
in the neighbourhood of $l_x$. In the same
way, the vector $C'_i = \{P(l'_i/l'_x)\}$ characterizes
the combinatory power of $l'_i$.
The quality of a mapping may be 
estimated by the formula: 
$$Q(M) = \sum_i d(C_i, C'_i) = \sum_i d(l_i, l'_i)$$
Here d denotes the distance (e.g. 
Euclidean) between the vectors $C_i$ and $C'_i$.
All pairs $l_i \to l'_i$, $l_x \to l'_x$ belong to the
mapping $M$, so that $d$ may be calculated
by the formula:
$$d(l_i, l'_i) = \sqrt{\sum_x \left(P(l_i/l_x) - P(l'_i/l'_x)\right)^2}$$
The minimum of Q(M) corresponds to the 
optimal mapping. Some algorithms of this 
type have been tested with interesting 
results. It is obvious that a similar 
algorithm will be able to compile a 
bilingual dictionary with the entries in 
the unknown language, although the latter 
problem is, naturally, far more difficult. 
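The quality of a letter mapping can be computed as in the following sketch. It makes assumptions beyond the text: P(a/b) is estimated as the share of occurrences of b that are immediately followed by a, the two alphabets are tiny, and every one-to-one mapping is tried in turn, which a practical procedure would have to avoid. The function names and the toy texts are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def cond_probs(text):
    """Estimate P(a/b) as the share of occurrences of b that are followed by a."""
    pairs = Counter(zip(text[1:], text))      # (a, b): a immediately follows b
    followed = Counter(text[:-1])             # how often b has a successor
    return {(a, b): n / followed[b] for (a, b), n in pairs.items()}


def quality(mapping, p_unknown, p_known):
    """Q(M): sum over letters of the Euclidean distance d(l_i, l'_i)."""
    total = 0.0
    for li, li2 in mapping.items():
        total += sqrt(sum(
            (p_unknown.get((li, lx), 0.0) - p_known.get((li2, lx2), 0.0)) ** 2
            for lx, lx2 in mapping.items()
        ))
    return total


def best_mapping(unknown_text, known_text):
    """Try every one-to-one mapping; feasible only for very small alphabets."""
    src = sorted(set(unknown_text))
    dst = sorted(set(known_text))
    p_u, p_k = cond_probs(unknown_text), cond_probs(known_text)
    return min(
        (dict(zip(src, perm)) for perm in permutations(dst, len(src))),
        key=lambda m: quality(m, p_u, p_k),
    )


# Toy check: the "unknown" text is a letter-for-letter re-spelling of the known one.
mapping = best_mapping("xyxzxyxwxyxzxyx", "abacabadabacaba")
```

Note that the distance between two letters depends on the whole mapping, because the components of the two vectors are paired through the pairs $l_x \to l'_x$; this is why Q(M) has to be evaluated for a complete mapping rather than letter by letter.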

