A LINGUISTIC APPROACH TO THE DESIGN OF A LANGUAGE FOR
COMPUTATIONAL LINGUISTICS
V. Andrewshtshenko
Faculty of Numerical Mathematics and Cybernetics,
Moscow State University, USSR
Computational linguistics is a sphere of science and its
applications lying between linguistics and computer science.
The main task of computational linguistics is developing
methods and designing tools for man-machine communication. To
this task its two main directions are subordinated: natural
language data processing, including machine translation, and
automation of linguistic research, including automatic
lexicography. The formation of computational linguistics requires
designing a unified language, rich enough to satisfy diverse
computational conditions arising in the above-mentioned
applications.
As such a language we propose LICOL (LIngua COmputationum
Linguisticarum), an inevitably short review of which is
given in the present report. The linguistic approach to the
design of this language consists in viewing it as a semiotic
system and forming its units in accordance with R. Jakobson's
linguo-semiotic functions.
LICOL is intended for communication between man (native
speaker, user) and computer (interpreter of this language) in
communicative situations of automatic and automatised (user
directed) natural language data processing. The concepts and
constructions of LICOL presuppose a rather broad range of
users - from nonprofessional ones (translators, lexicographers,
editors, etc.) to computational linguists and traditional
programmers. LICOL can be used as a command language in
information retrieval, as a data description and data
manipulation language in data base design and as a programming
language in the usual sense. It is intended as a means of
definition and control both in dialogical (on-line) and in batch
(off-line) processing.
As a semiotic system LICOL consists of signs - bilateral
entities - composed of signifiants (sequences of letters) and
signifiés - those entities of the real or conceptual world which
are objects and/or means for automatic processing. The role of
signifiés in this language can also be played by the signifiants
and by the signs of the same language. We call the "world of
language" the system of its concepts whose relations are
expressible in terms of relations between the signs of the
language. To use LICOL, specifically to program in this
language, is to express one's thought, knowledge and notions in
the framework of the "world of LICOL" in accordance with the rules
of its grammar (syntax). In this language the function-argument
form is chosen as the means for stating relations between the
concepts: suffixal compound stems and prefixal and/or suffixal
incorporated syntagms, the part of the suffixes being played
by names of operations. The notation of LICOL therefore has
the reversed Polish form.
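
The reversed Polish form, in which the name of an operation
follows its operands like a suffix, can be illustrated with a
minimal Python sketch. This is not LICOL syntax: the operation
names "add" and "mul", the token representation and the function
eval_postfix are assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Union

Token = Union[int, str]

# Hypothetical operation names; the abstract does not list LICOL's operations.
OPERATIONS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def eval_postfix(tokens: List[Token]) -> int:
    """Evaluate a reversed Polish (postfix) token sequence with a stack."""
    stack: List[int] = []
    for tok in tokens:
        if isinstance(tok, int):
            # Operands are pushed as they are read.
            stack.append(tok)
        else:
            # An operation name acts as a suffix on the two preceding operands.
            if len(stack) < 2:
                raise ValueError("operation %r lacks operands" % tok)
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[tok](left, right))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# (2 + 3) * 4 written with the operation names after their operands.
result = eval_postfix([2, 3, "add", 4, "mul"])
print(result)
```

Because every operation name follows its operands, no brackets
or precedence rules are needed to read the expression.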
Since LICOL is a language for computation, the main
concept of its "world" (i.e. the main sign of its semiotic
system) is the notion of Computational Construction (CC).
The CCs are constructional material both for the programs and
the data, the underlying form for which are IC trees. Since
the trees are easily representable in a linear form by means
of the Polish form of notation, it is natural to interpret
programs as data and vice versa, the data being structured into
a relationally-hierarchical data network. This makes it possible
to consider the data base also as a data base for procedurally
represented knowledge and therefore not to draw distinctions
between the two main forms of data representation - a naming
and a procedural one.
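
The claim that trees are easily represented in linear Polish
form can be sketched as follows. This is an illustrative Python
model, not LICOL itself: the class Node, the "label/arity" token
format and the functions linearize and rebuild are assumptions
introduced here, and the example tree is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node with a label and ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def linearize(node: Node) -> List[str]:
    """Post-order walk: children first, then 'label/arity' for the node."""
    out: List[str] = []
    for child in node.children:
        out.extend(linearize(child))
    out.append(f"{node.label}/{len(node.children)}")
    return out


def rebuild(tokens: List[str]) -> Node:
    """Rebuild the tree from its reversed Polish linear form using a stack."""
    stack: List[Node] = []
    for tok in tokens:
        label, arity_text = tok.rsplit("/", 1)
        arity = int(arity_text)
        if arity > len(stack):
            raise ValueError("token %r lacks children" % tok)
        start = len(stack) - arity
        children = stack[start:]
        del stack[start:]
        stack.append(Node(label, children))
    if len(stack) != 1:
        raise ValueError("token sequence does not form a single tree")
    return stack[0]


# A small constituent tree, invented for the example.
tree = Node("S", [
    Node("NP", [Node("the"), Node("cat")]),
    Node("VP", [Node("sleeps")]),
])
linear = linearize(tree)
print(linear)
```

The round trip through rebuild returns a tree equal to the
original, so the same linear sequence can be stored as data or
read as a structure to be processed.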
According to the type of signifiants the CCs are divided
into representing and processing constructions, each of
which is further subdivided: the representing class into
names and pictures and the processing class into controllers
and operations. If the signifiant is, semiotically, a symbol,
we have to do with a naming construction; if it is an icon,
we have a picture construction; the index-sign represents
either a controller or an operational construction. The
signifiés for the CCs are the so-called descriptions, consisting
of a descriptor (which corresponds to the concept of the
sign) and of its referent (value).
According to the type of signifiés the CCs are divided
into real, virtual and notational constructions. The real CCs
correspond to external data of the usual programming
languages. They are structured into elements, chains, fields,
records, fragments, sets and bases. The virtual CCs correspond
to the user's notions of processing and are structured into
atoms, sequences, trees, bunches (arbitrary graphs), blocks,
files and (file) systems. The notational CCs correspond to
constitutive parts of entities of signifiants. These are:
letters, strings, groups (of strings), segments, modules, corpus
and packets (of texts).
According to the form of value the CCs can be subdivided
into scalars, vectors and lists. The scalar CCs have the
following types: numbers, codes, logicals, figures, symbols,
keys, references, descriptors and masks. The notion of the
vector corresponds to that of an array; its components may be
not only scalars but also vectors or lists, provided their
components are of the same type. The lists may consist of
scalars, vectors or sublists, which may be of route, tree,
structure or executive type.
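
The classification by form of value can be modelled roughly as
below. This Python sketch is an assumption-laden illustration,
not a definition from the report: the class names Scalar, Vector
and ListCC, the helper type_tag and the reading of "components
of the same type" as "all components of one vector share one
type tag" are choices made here.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Type names taken from the text; their spelling as strings is an assumption.
SCALAR_KINDS = {"number", "code", "logical", "figure", "symbol",
                "key", "reference", "descriptor", "mask"}
LIST_KINDS = {"route", "tree", "structure", "executive"}


@dataclass(frozen=True)
class Scalar:
    kind: str
    value: object

    def __post_init__(self) -> None:
        if self.kind not in SCALAR_KINDS:
            raise ValueError("unknown scalar kind: %s" % self.kind)


@dataclass(frozen=True)
class ListCC:
    kind: str
    items: Tuple["CC", ...]  # scalars, vectors or sublists, freely mixed

    def __post_init__(self) -> None:
        if self.kind not in LIST_KINDS:
            raise ValueError("unknown list kind: %s" % self.kind)


@dataclass(frozen=True)
class Vector:
    components: Tuple["CC", ...]

    def __post_init__(self) -> None:
        # Assumed reading: all components of a vector share one type tag.
        tags = {type_tag(c) for c in self.components}
        if len(tags) > 1:
            raise TypeError("vector components differ in type: %s" % sorted(tags))


CC = Union[Scalar, Vector, ListCC]


def type_tag(cc: CC) -> str:
    """Return a coarse type tag used for the homogeneity check."""
    if isinstance(cc, Scalar):
        return "scalar:" + cc.kind
    if isinstance(cc, ListCC):
        return "list:" + cc.kind
    return "vector"


numbers = Vector((Scalar("number", 1), Scalar("number", 2)))
mixed = ListCC("structure", (Scalar("symbol", "a"), numbers))
print(type_tag(numbers), type_tag(mixed))
```

A vector mixing, say, a number and a symbol is rejected in this
sketch, while a list may combine scalars, vectors and sublists.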
Such a multibase classification of the CCs has the following
sense. The operations of LICOL are defined on the virtual
CCs having various origins - either notational (textual)
constituents or denotational, virtual and real CCs. They may
be intended either for displaying in textual form, or they
may be used in further processing as virtual ones, or they
may be transmitted to the external environment in the form of
real CCs.
The CCs can be defined either by a description of their
type and the mode of evaluating or by a picture, the simplest
type of which is a literal. Two or more CCs can be associated
together, one of them being an object and the rest its
features. There are the following possibilities: implicit
transformation of data from one type into another; indirect
definitions of operands; participation in operations by
objects and their features both separately and jointly;
evaluation of the operands via pictures, so that the operands
may themselves be procedures. Diverse operations on sequences,
sets, graphs with labelled and unlabelled nodes and arcs are
defined. This allows operating both on the constituents and
dependences, forming both the paradigms and syntagms, examining
alternatives and controlling this processing by putting diverse
conditions and restrictions on evaluating objects by pictures
without explicit description of the processing sequence.
Specifically, some operations on files and systems can be
immediately interpreted as operations on dictionaries.
The system of units in LICOL is defined by a system of
linguosemiotical functions, i.e. it is necessarily close to
the structure of functions of natural language, while specific
features of programming are taken into consideration. This
makes it possible to proceed from expressions in a natural
language to expressions in LICOL, i.e. the highest function is
fulfilled, the metalanguage function, the existence of which is
ensured by the fulfilment of lower functions: the cognitive one,
the communicative one etc.
