COLING 82, J. Horecký (ed.)
North-Holland Publishing Company
© Academia, 1982
TRANSFORMATION OF NATURAL LANGUAGE INTO LOGICAL FORMULAS 
Leonard Bolc and Tomasz Strzalkowski 
Institute of Informatics 
Warsaw University 
PKIN, pok. 850 
00-901 Warszawa, POLAND 
This paper presents an attempt to develop a full
parsing system for Polish, which is being worked
out at the Institute of Informatics of Warsaw
University. The system was adapted to parse a
corpus of real medical texts concerning a
subdomain of medicine. We drew on the experience
of the authors of (6), (7), (8), (9), (10), (11),
(12), (13), (14).
INTRODUCTION 
The system described below could serve as an interface to natural
language information systems, natural language question-answering
systems and expert systems, or as a component for automatic text
understanding. The authors paid close attention to the syntactical
and semantical constraints of medical dialogue so that physicians
could use the system without prior training. Although the current
application is a subdomain of medicine, the domain of conversation
can be changed or extended easily: this requires only that a new
dictionary be established and that some expert parts of the
semantical interpreter be changed.
Our system comprises two stages: syntactical analysis and semantical
interpretation. The stages cooperate in such a way that the second
stage checks the correctness of the syntactical structures built by
the first one. Finally, the parser produces a formula of the First
Order Predicate Calculus which corresponds to the input sentence.
Other outputs, such as Minsky frames or fuzzy formulas, are being
considered.
We used the CATN (Cascaded ATN) method (14) to implement the system.
The CATN retains all of the advantages which ATNs have proved to have
in natural language processing. A high degree of universality is a
very important feature of the system.
Here are some examples of sentences the parser can understand: 
Alkohol podany doustnie powoduje wzmozone wydzielanie gastryny.
(Alcohol given per os causes increased secretion of gastrin.)
Alkohol zwieksza wydzielanie soku trzustkowego.
(Alcohol increases pancreatic juice secretion.)
Gastryna jest hormonem powodujacym wydzielanie kwasu solnego w zoladku.
(Gastrin is a hormone which causes gastric HCl secretion.)
Sekretyna i pankreozymina stymuluja czynnosc wewnatrzwydzielnicza
trzustki.
(Secretin and pancreozymin stimulate the endocrine activity of the
pancreas.)
Wzrost napiecia miesniowki dwunastnicy moze byc przyczyna wzrostu
cisnienia w przewodach trzustkowych.
(An increase in the tonus of the tunica muscularis of the duodenum
may cause higher pressure in the pancreatic ducts.)
Dlugotrwale dzialanie alkoholu powoduje prawdopodobnie bezposrednie
uszkodzenie komorek wydzielniczych trzustki.
(Prolonged action of alcohol probably causes direct injury to the
pancreatic endocrine cells.)
Jakie sa kliniczne objawy ostrego zapalenia trzustki?
(What are the clinical symptoms of acute pancreatitis?)
Co stymuluje czynnosc wewnatrzwydzielnicza trzustki?
(What stimulates the endocrine activity of the pancreas?)
CATN AS A TOOL 
A Cascaded Augmented Transition Network consists of two or more
"cascades" which successively process the same information. Each of
them is an ATN grammar (1) which has, in addition, a new action
called TRANSMIT. The TRANSMIT action may be placed on any arc and
causes a piece of information to be sent from the current "cascade"
to the lower one. Whenever a TRANSMIT occurs, all information about
the current "cascade" is saved on the stack while the parser operates
on the lower "cascade" until new information or data is required.
Then the higher "cascade" is reactivated from the point at which it
was stopped.
The two stages of our parsing system correspond to the CATN cascades.
In the present implementation, the structure popped from the
syntactical stage is TRANSMITted to the semantical interpretation,
because the free word order of Polish sentences rules out any other
solution. In particular, the positions of the subject and the main
verb in the sentence may vary.
If the second stage is unable to find an appropriate interpretation
for the syntactical structure, the first stage is reactivated to
build an alternative parse. When no such parse can be built, the
parser fails.
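The cooperation of the two stages can be illustrated with a minimal
Python sketch. It is not the authors' CATN implementation: the
functions syntactic_stage, semantic_stage and cascade, the toy
two-reading grammar and the toy semantic constraint are all
assumptions made for illustration. The sketch only shows the control
flow: the first stage proposes parses lazily, and it is resumed for
an alternative only when the second stage rejects the current one.

```python
from typing import Iterator, List, Optional, Tuple

Parse = Tuple[Tuple[str, str], ...]  # hypothetical: (role, word) pairs


def syntactic_stage(words: List[str]) -> Iterator[Parse]:
    """Hypothetical first stage: lazily yield alternative parses.

    For a three-word noun-verb-noun input it proposes two readings,
    one per assignment of SUBJECT and OBJECT to the two nouns.
    """
    first, verb, second = words
    yield (("SUBJECT", first), ("VERB", verb), ("OBJECT", second))
    yield (("SUBJECT", second), ("VERB", verb), ("OBJECT", first))


def semantic_stage(parse: Parse) -> Optional[str]:
    """Hypothetical second stage: return a formula, or None to reject."""
    roles = dict(parse)
    # Toy constraint (an assumption): only a substance may be a cause.
    if roles["SUBJECT"] not in {"ALKOHOL", "GASTRYNA"}:
        return None
    return "#IMPLY({}, {})".format(roles["SUBJECT"], roles["OBJECT"])


def cascade(words: List[str]) -> Optional[str]:
    """Resume the first stage until the second stage accepts a parse."""
    for parse in syntactic_stage(words):
        formula = semantic_stage(parse)
        if formula is not None:
            return formula
    return None  # no parse could be interpreted: the parser fails


if __name__ == "__main__":
    print(cascade(["WYDZIELANIE", "POWODOWAC", "ALKOHOL"]))
```

Because the first stage is a generator, its state is kept while the
second stage runs, which loosely mirrors the saving of the higher
cascade described above.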
In another implementation of the CATN we used Earley's algorithm,
a well-known context-free parsing method (10). In this case the
syntactical analyser produces all possible parses at once. The
semantical interpreter has to verify them and reject each
meaningless parse.
THE FIRST STAGE - SYNTACTICAL ANALYSIS 
The First Stage of the parser, applied to an utterance, yields the
surface structure of the sentence. This means that such elements as
VERB/ACTION, SUBJECT, OBJECT (direct and indirect), PREPOSITIONAL
PHRASES, etc. are identified.
Polish is a typical example of a flexional language. One of its most
characteristic features is free word order within a sentence. It is
therefore very important for the parser to know every lexical
parameter of nouns, adjectives, adverbs, numerals, prepositions, etc.
These parameters are number, gender, case, person and degree. They
are carried over the whole phrase and determine the role of the
phrase in the sentence. The flexional form of the main verb also
influences the construction of the sentence. In particular, the
flexional properties of the main verb can help the parser to find
the subject and the direct object.
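A minimal Python sketch of this idea follows. It is an illustration
under stated assumptions, not the authors' ATN grammar: the Word
record, the assign_roles function, the hand-tagged test sentence
"Gastryna stymuluje trzustke" and the simplified rule (a nominative
noun agreeing in number with the verb is the subject, an accusative
noun is the direct object) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Word:
    lemma: str
    pos: str          # "noun" or "verb"
    number: str       # "sg" or "pl"
    case: str = ""    # "nom", "acc", ... (nouns only)


def assign_roles(words: List[Word]) -> Dict[str, str]:
    """Assign SUBJECT and OBJECT from case and agreement, not position."""
    verb = next(w for w in words if w.pos == "verb")
    roles = {"VERB": verb.lemma}
    for w in words:
        if w.pos != "noun":
            continue
        if w.case == "nom" and w.number == verb.number:
            roles.setdefault("SUBJECT", w.lemma)
        elif w.case == "acc":
            roles.setdefault("OBJECT", w.lemma)
    return roles


# Hand-tagged words of a hypothetical sentence (not from the corpus).
gastryna = Word("GASTRYNA", "noun", "sg", "nom")
stymuluje = Word("STYMULOWAC", "verb", "sg")
trzustke = Word("TRZUSTKA", "noun", "sg", "acc")

if __name__ == "__main__":
    # Subject-verb-object and object-verb-subject orders give one result.
    print(assign_roles([gastryna, stymuluje, trzustke]))
    print(assign_roles([trzustke, stymuluje, gastryna]))
```

Real Polish morphology is often ambiguous (many nouns have identical
nominative and accusative forms), so a full parser needs more than
this single rule.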
These problems and several others, such as post-modifiers,
wh-movement and conjunction, were solved successfully.
The syntactical analysis covers a wide subset of Polish, e.g. simple
affirmative sentences and questions, complements and relative
clauses, and certain types of complex sentences. We had to take into
account a number of special properties of the medical dialect which
rarely occur in everyday conversation. The grammar is able to parse
not only common Polish but "medical" Polish as well. This means,
among other things, a great many participles, gerunds, modal verbs
(e.g. moze - could, powinien - should) and vague adverbs (e.g.
prawdopodobnie - probably, czesto - frequently, rzadko - rarely,
czasami - sometimes).
The syntactical analyser transforms an input sentence into an
uninflected and ordered form. Some examples of the output of the
First Stage are given below. The | mark divides the whole sentence
into phrases. An empty place between two |s indicates a missing
phrase. The S and END flags indicate the beginning and the end of
each simple clause in the sentence. If the DCL flag occurs just after
the S mark in the top-level clause, the sentence is treated as an
assertion. In a question there are one or more question words
instead. The MODIFIERS flag divides a direct object (if any) into the
main phrase and post-modifiers. This last flag is important because
the head word of a direct object phrase may be a predicative element
of the clause (e.g. byc przyczyna - to be a cause). Note that a
predicative element of the top-level clause becomes the main
predicative element of the whole sentence.
alkohol podany doustnie powoduje wzmozone wydzielanie gastryny.
(alcohol given per os causes increased secretion of gastrin.)
(S DCL | | | POWODOWAC | | ALKOHOL | S | | | PODAC | DOUSTN* | |
ALKOHOL MODIFIERS | | | END | S | | | WYDZIELANIE | WZMOZON* | |
GASTRYNA MODIFIERS | | | END | | | END)
alkohol zwieksza wydzielanie soku trzustkowego.
(alcohol increases pancreatic juice secretion.)
(S DCL | | | ZWIEKSZAC | | ALKOHOL | S | | | WYDZIELANIE | | |
TRZUSTKOW* SOK MODIFIERS | | | END | | | END)
co stymuluje czynnosc wewnatrzwydzielnicza trzustki?
(what stimulates the endocrine activity of the pancreas?)
(S CO | | | STYMULOWAC | | | WEWNATRZWYDZIELNICZ* CZYNNOSC MODIFIERS
TRZUSTKA | | | END)
Nevertheless, because such information is not sufficient, an
interpretation in the Second Stage is needed.
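To make the First Stage output format concrete, the following Python
sketch reads such a structure into nested clauses. It is a reader
written for illustration only, and it rests on assumptions: the
phrase separator is taken to be the character |, tokens are separated
by white space, everything between S and the first separator is
treated as a header (DCL or question words), and the names tokenize
and read_clause are hypothetical. The meaning of the individual slot
positions is not specified in the text, so the sketch does not
interpret them.

```python
from typing import Any, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Drop the outer parentheses and split on white space."""
    return text.strip().strip("()").split()


def read_clause(tokens: List[str], pos: int = 0) -> Tuple[Dict[str, Any], int]:
    """Read one S ... END clause; return it and the next position."""
    if tokens[pos] != "S":
        raise ValueError("a clause must start with S")
    pos += 1
    header: List[str] = []          # DCL flag or question words
    while tokens[pos] not in ("|", "END"):
        header.append(tokens[pos])
        pos += 1
    slots: List[List[Any]] = []     # one list per phrase slot
    while tokens[pos] != "END":
        if tokens[pos] == "|":      # separator: open a new slot
            slots.append([])
            pos += 1
        elif tokens[pos] == "S":    # embedded clause inside a slot
            sub, pos = read_clause(tokens, pos)
            slots[-1].append(sub)
        else:
            slots[-1].append(tokens[pos])
            pos += 1
    return {"header": header, "slots": slots}, pos + 1


QUESTION = ("(S CO | | | STYMULOWAC | | | WEWNATRZWYDZIELNICZ* CZYNNOSC "
            "MODIFIERS TRZUSTKA | | | END)")

if __name__ == "__main__":
    clause, _ = read_clause(tokenize(QUESTION))
    print(clause["header"])   # question word(s) or DCL
    print(clause["slots"])    # empty lists are missing phrases
```

Empty slots are kept as empty lists, so a missing phrase stays
visible to whatever consumes the structure.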
The First Stage contains the main ATN net, named SENTENCE, which
processes Polish sentences. There are four special subnets:
NOUN_PHR, ADJ_PHRA, ADV_PHRA and Q_EXPR, which recognize different
types of phrases: nominal phrases, adjectival phrases, adverbial
phrases and question expressions, respectively.
The First Stage uses a syntactical dictionary which contains the
flexional forms of the words.
THE SECOND STAGE - SEMANTICAL INTERPRETATION 
When the syntactical analysis has been completed, the Second Stage of
the parser tries to find a semantical interpretation of the
syntactical structure. The main predicative element of this structure
(e.g. VERB/ACTION or OBJECT) creates one or more instances of a
framework describing an event. That framework looks like a
pattern-concept pair (8), (12), although more than one verb may
indicate the same frame (7). For example, the following verbs and
verb expressions: powodowac (to cause), stymulowac (to stimulate),
prowadzic do (to lead to), byc przyczyna (to be a cause), byc
skutkiem (to be a result), etc. refer to the conceptualization
#IMPLY, and podac (to give), stosowac (to apply), etc. to the
conceptualization #APPLY.
The pattern determines which phrases may be expected around the
predicate and which of them must occur. The interpretation process is
driven by such a pattern, so it is called expectation-driven. It may
be called structure-driven too, because the pattern contains
structural conditions which must hold during parsing.
A concept is a notation that represents the meaning of a clause.
Together, the pair associates different forms of an utterance with
its meaning.
The #APPLY conceptualization looks like:
(APPLY TYPE TREATMENT
       AGT ( () HUM OPT )
       OBJ ( () MEDIC OBL )
       MANNER ( () MOA OPT )
       CONCEPT (BUILDQ
                ((#APPLY 3) + + +) AGT OBJ MANNER)
)
where TYPE is an indicator which points out that the described event
is a treatment. AGT, OBJ and MANNER specify that there may be three
phrases around the predicate, but only one of them must occur in an
utterance (OBL marks an obligatory parameter, OPT an optional one).
None of these phrases may be preceded by a preposition, which is what
the empty list () denotes. The AGT phrase (the agent that applies
something) must be a human; the OBJ phrase (the object which is
applied) must be a medicament; the MANNER slot may be filled when the
manner of application is specified (e.g. doustnie - per os). The
CONCEPT indicator describes how an atomic formula is to be built. As
can be seen above, we obtain a 3-ary predicate called #APPLY, whose
arguments will be constructed during the interpretation process. The
BUILDQ function is a special ATN form which provides BUILDing of
Quoted expressions (see (1) for details).
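The way such a pattern drives slot filling can be sketched in Python.
This is a simplified illustration, not the authors' ATN code: the
names APPLY_PATTERN, SEMANTIC_TYPE and fill_pattern are hypothetical,
the word LEKARZ (physician) and the small type lexicon are invented
for the example, preposition conditions are omitted, and the
generated arguments carry no logical variables.

```python
from typing import Dict, Optional, Tuple

# Slot name -> (required semantic type, obligatory?), after the pattern.
APPLY_PATTERN: Dict[str, Tuple[str, bool]] = {
    "AGT": ("HUM", False),      # OPT
    "OBJ": ("MEDIC", True),     # OBL
    "MANNER": ("MOA", False),   # OPT
}

# Hypothetical mini-lexicon of semantic types (an assumption).
SEMANTIC_TYPE = {"LEKARZ": "HUM", "ALKOHOL": "MEDIC", "DOUSTN*": "MOA"}


def fill_pattern(name: str,
                 pattern: Dict[str, Tuple[str, bool]],
                 phrases: Dict[str, str]) -> Optional[tuple]:
    """Fill the slots from the phrases; return an atomic formula or None.

    None means the structure cannot be squeezed into this pattern.
    """
    args = []
    for slot, (sem_type, obligatory) in pattern.items():
        word = phrases.get(slot)
        if word is None:
            if obligatory:
                return None                 # obligatory phrase missing
            args.append((sem_type, None))   # optional slot left open
        elif SEMANTIC_TYPE.get(word) != sem_type:
            return None                     # semantic requirement violated
        else:
            args.append((sem_type, word))
    return ((name, len(args)),) + tuple(args)


if __name__ == "__main__":
    print(fill_pattern("#APPLY", APPLY_PATTERN,
                       {"OBJ": "ALKOHOL", "MANNER": "DOUSTN*"}))
```

Returning None corresponds to the case in which the interpretation
fails and an alternative syntactical structure has to be requested.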
Frame slots are filled once the syntactical and semantical
requirements have been satisfied. When the whole pattern has been
completed, an atomic formula is generated. The interpretation process
is therefore an attempt to squeeze the syntactical structure of a
sentence into one or more instances of a framework of an event.
Besides the main predicate(s), a great deal of additional information
is added to the output formula. These facts are stored partly in
pattern-concept pairs and partly in the expert subnets of the
interpreter. Together they constitute the system's knowledge. The
system needs such knowledge because no corpus of real texts can
completely describe a domain of the real world.
A great deal of context information may also be taken from a special
context stack. It helps to solve the problems of pronoun reference
and ellipsis.
If the "squeezing" cannot be done, the First Stage is activated
again.
In addition, a semantical dictionary is attached to the Second Stage.
It keeps all the patterns of the frameworks mentioned above. It also
contains some special entries indicating the correspondence between
verbs and patterns.
The Second Stage also contains a main ATN net, named FORMULA. It
guides the interpretation process and controls the semantical
correctness of utterances. There are also some expert nets which
recognize special medical expressions (e.g. names of diseases,
symptoms, organs, treatments, etc.). These subnets are the
replaceable part of the system and determine the system's knowledge.
The expert subnets communicate with the main net through the middle
level of the interpreter, the CASES net. This net handles nominal
phrase structures, e.g. prepositions, conjunctions and
post-modifiers.
The Second Stage produces a formula of the First Order Predicate
Calculus corresponding to the input sentence. The formula has an
implicative form, in which the main predicate of the utterance is the
conclusion and the other generated facts are the premises.
Two generated formulas are given below. The first is an assertion;
the second denotes a question. They are in LISP notation, so some
clarification is needed. The IMPLSYM and KONJSYM marks are the
logical operators IMPLY (=>) and AND (&). The integer just after the
KONJSYM mark indicates how many conjuncts were joined. Each predicate
name is preceded by a hash mark (#) and followed by an integer giving
the number of arguments. Each argument is a pair or triple which
gives the type of the argument, the name of a variable and a constant
(if any), respectively.
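The notation can be made easier to read with a small Python sketch
that parses such an expression and prints it with infix operators.
The functions tokenize, read and render are hypothetical helpers
written for this illustration, and the rendering conventions (the
argument type is dropped, a constant is shown as VAR=CONST) are
assumptions, not part of the system described here.

```python
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def tokenize(text: str) -> List[str]:
    """Split a LISP-style expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens: List[str]) -> SExpr:
    """Consume tokens and build one nested list (or a single atom)."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok
    items: List[SExpr] = []
    while tokens[0] != ")":
        items.append(read(tokens))
    tokens.pop(0)  # drop the closing parenthesis
    return items


def render(f: SExpr) -> str:
    """Render IMPLSYM as =>, KONJSYM as &, atoms as NAME(args)."""
    head = f[0]
    if head == "IMPLSYM":
        return "({} => {})".format(render(f[1]), render(f[2]))
    if head == "KONJSYM":
        # Skip the conjunct count when it is present.
        conjuncts = f[2:] if isinstance(f[1], str) else f[1:]
        return " & ".join(render(g) for g in conjuncts)
    # Atomic formula: ((#NAME arity) (TYPE VAR [CONST]) ...)
    name = head[0].lstrip("#")
    args = ", ".join("=".join(arg[1:]) for arg in f[1:])
    return "{}({})".format(name, args)


EXAMPLE = """
(IMPLSYM (KONJSYM 2
    ((#BADMEDIC 1)(MEDIC X36))
    ((#MEDICAMENT 2)(MEDIC X36)(MNAME X37 ALKOHOL)))
  ((#APPLY 3)(ANIM X38)(MEDIC X36)(MANNER X33 DOUSTN*)))
"""

if __name__ == "__main__":
    print(render(read(tokenize(EXAMPLE))))
```

For the sub-formula used as EXAMPLE this prints
(BADMEDIC(X36) & MEDICAMENT(X36, X37=ALKOHOL) => APPLY(X38, X36, X33=DOUSTN*)).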
Alkohol zwieksza wydzielanie soku trzustkowego.
(Alcohol increases pancreatic juice secretion.)
(IMPLSYM (KONJSYM 3
    ((#BADMEDIC 1)(MEDIC X0002585))
    ((#MEDICAMENT 2)(MEDIC X0002585)(MNAME X0002586 ALKOHOL))
    (IMPLSYM (KONJSYM
            ((#ORGAN 2)(ORGAN X0002589)(ONAME X0002590
                TRZUSTKA))
            ((#WYDZIELNICZ*-NARZAD 1)(ORGAN X0002589))
            ((#JUICE 1)(LIQUID X0002588))
            ((#LIQUID 3)(LIQUID X0002588)
                (LNAME X0002591)(ORGAN X0002589)))
        ((#SICKNESS 4)(SICK X0002587)(STYPE X0002592 FI)
            (SNAME X0002593 WYDZIELANIE)
            (BODY X0002588))))
    ((#RAISE 2)(ETIO X0002585)(SYMPTOM X0002587)))
Co powoduje alkohol podany doustnie?
(What damage does drinking alcohol cause?)
((X39) (IMPLSYM (KONJSYM 3
        ((#BADMEDIC 1)(MEDIC X30))
        ((#MEDICAMENT 2)(MEDIC X30)(MNAME X31 ALKOHOL))
        (IMPLSYM (KONJSYM 2
                ((#BADMEDIC 1)(MEDIC X36))
                ((#MEDICAMENT 2)(MEDIC X36)
                    (MNAME X37 ALKOHOL)))
            ((#APPLY 3)(ANIM X38)(MEDIC X36)
                (MANNER X33 DOUSTN*))))
    ((#IMPLY 2)(ETIO X30)(SICKNESS X39))))
The parser can also produce other kinds of formal representation of
natural language.
CONCLUSION 
The parsing system described above is an attempt to build a universal
parser for natural language analysis. The authors incline to the
currently popular thesis that the syntactical and semantical
components should act at the same time, though with syntax dominating
semantics. This remark is particularly important for Polish. The
approach nevertheless provides no less efficiency of the parsing
process than semantics-dominated systems (7), (8), (11), (12), and
certainly greater universality of the system. Among other things, it
exploits most of the advantages of the regularity of natural
language.

BIBLIOGRAPHY 

(1) Bates, M., The Theory and Practice of Augmented Transition Network
Grammars, in (2).

(2) Bolc, L. (ed.), Natural Language Communication with Computers,
Lecture Notes in Comp. Sci., vol. 63 (Springer-Verlag, Berlin,
Heidelberg, New York, 1978).

(3) Bolc, L. (ed.), Natural Language Based Computer Systems (Hanser
Verlag and Macmillan Press, London, 1980).

(4) Bolc, L. (ed.), Natural Language Question Answering Systems (as (3)).

(5) Bolc, L. (ed.), Representation and Processing of Natural Language
(as (3)).

(6) Burton, R., Brown, J.S., Semantic Grammars: A Technique for
Constructing Natural Language Interfaces to Instructional Systems
(BBN Rep. No. 3587, Bolt Beranek and Newman Inc., Cambridge MA, 1977).

(7) Carbonell, J.G., Multi-Strategy Parsing (Dept. of Comp. Sci.,
Carnegie-Mellon Univ., Pittsburgh PA, 1981).

(8) Gershman, A.V., Knowledge-Based Parsing (Research Rep. 156,
Yale University, Dept. of Comp. Sci., 1979).

(9) Landsbergen, J., Adaptation of Montague Grammar to the
Requirements of Parsing (reprint from MC Tract 136, Formal Methods in
the Study of Language, J.A.G. Groenendijk, T.M.V. Janssen, M.B.J.
Stokhof (eds.), 1981).

(10) Martin, W.A., Church, K.W., Patil, R.S., Preliminary Analysis
of a Breadth-First Parsing Algorithm (MIT Laboratory for Comp.
Sci., 1981).

(11) Schank, R.C., Lebowitz, M., Birnbaum, L.A., Integrated Partial
Parsing (Research Rep. 143, Yale Univ., Dept. of Comp. Sci., 1978).

(12) Wilensky, R., Arens, Y., PHRAN - A Knowledge-Based Approach to
Natural Language Analysis (Dept. of Comp. Sci., Univ. of
California, Berkeley, 1980).

(13) Woods, W.A., An Experimental Parsing System for Transition
Network Grammars (BBN Rep. No. 2362, Bolt Beranek and Newman Inc.,
Cambridge MA, 1972).

(14) Woods, W.A., Cascaded ATN Grammars (in AJCL vol. 6, no. 1, 1980).
