PARSING FREE WORD ORDER LANGUAGES IN PROLOG 
Janusz Stanisław Bień +
Krystyna Laus-Mączyńska ++
Stanisław Szpakowicz +
+ Institute of Informatics, Warsaw University
Warsaw, Poland
++ Institute for Scientific, Technical and Economic
Information, Warsaw, Poland
The Prolog programming language allows
the user to write powerful parsers in the form
of metamorphosis grammars. However,
metamorphosis grammars, as defined by
Colmerauer [2], have to specify strictly the
order of terminal and nonterminal symbols.
A modification of Prolog has been implemented
which allows "floating terminals" to be
included in a metamorphosis grammar, together
with information that controls the search for
such a terminal in the unprocessed part of the
input. The modification is illustrated by several
examples from the Polish language, and some
open questions are discussed.
Metamorphosis grammars [2, 3] are a convenient
tool for the formal description of the syntax
of natural languages. Their convenience
is due to their straightforward relation to
the programming language Prolog: a metamorphosis
grammar is an ordinary part of
a Prolog program. It defines a language as
well as a parser for it.
We suggest here modifications to the way
metamorphosis grammars are handled in Prolog
which allow these grammars to analyse
constructions whose components have no
strictly specified order.
Let us consider an example. The following
sentence in Polish:
(1) PRACOWAĆ   BYŁO      BARDZO  PRZYJEMNIE
    'to work'  'it was'  'very'  'nice'
"It was very nice to work."
is accepted by the metamorphosis grammar
given below (nonterminals are prefixed by %,
terminals by ~, and == stands for an arrow):
%S == %INF %V %ADVP.
%INF == ~PRACOWAC.
%V == ~BYLO.
%ADVP == ~BARDZO ~PRZYJEMNIE.
In order to simplify the example we ne- 
glect the grammatical categories of phrases 
and words. The last three rules serve 
as "dictionary rules". 
This grammar does not, however, 
account for many correct Polish sentences, 
such as : 
(2) BARDZO PRZYJEMNIE BYŁO PRACOWAĆ
(3) BYŁO BARDZO PRZYJEMNIE PRACOWAĆ
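The effect of the strictly specified order can be checked with a small sketch. It is written in Python purely for illustration (the grammars in this paper are run by Prolog), the rule table and the names `parse` and `accepts` are assumptions of the sketch, and the words are spelled without diacritics as in the grammar rules.

```python
# Fixed-order recognizer for the grammar %S == %INF %V %ADVP.
# Illustrative only: this is not the Prolog implementation.
GRAMMAR = {
    "S": [["INF", "V", "ADVP"]],
    "INF": [["PRACOWAC"]],
    "V": [["BYLO"]],
    "ADVP": [["BARDZO", "PRZYJEMNIE"]],
}

def parse(symbol, words, pos):
    """Yield every position reachable after recognizing symbol at pos."""
    if symbol not in GRAMMAR:
        # Terminal: it must match the word at the current position.
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:
        # Nonterminal: recognize the items of the rule body left to right.
        positions = [pos]
        for item in body:
            positions = [q for p in positions for q in parse(item, words, p)]
        yield from positions

def accepts(sentence):
    """True if the whole sentence is derived from S."""
    words = sentence.split()
    return len(words) in parse("S", words, 0)

print(accepts("PRACOWAC BYLO BARDZO PRZYJEMNIE"))  # sentence (1): True
print(accepts("BARDZO PRZYJEMNIE BYLO PRACOWAC"))  # sentence (2): False
print(accepts("BYLO BARDZO PRZYJEMNIE PRACOWAC"))  # sentence (3): False
```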
To make the grammar accept these
sentences we should, for example, add two
rules:
%S == %ADVP %V %INF.
%S == %V %ADVP %INF.
One-third of the possible permutations
of the words BYŁO, BARDZO, PRACOWAĆ,
PRZYJEMNIE constitute admissible Polish
sentences (although sometimes stylistically
marked). The complete grammar would
then have 21 rules, including dictionary
rules. Such a solution is obviously clumsy
and unsatisfactory.
Our first proposal consists in allowing
two kinds of terminal symbols: anchored
terminals, retrieved at the current
position of a given sentence (available in
metamorphosis grammars [2] and prefixed by ~
in our example), and floating terminals,
retrieved anywhere in the unprocessed part
of a sentence (we shall prefix them by @).
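The difference between the two kinds of terminals can be sketched as two retrieval operations on the unprocessed input. The sketch is in Python for illustration only; the function names `anchored` and `floating` and the list representation of the input are assumptions, not the way the Prolog interpreter is actually organized.

```python
# Two ways of retrieving a terminal from the unprocessed input.
# Names and data layout are illustrative assumptions.

def anchored(word, rest):
    """Anchored terminal: word must be the first unprocessed item."""
    if rest and rest[0] == word:
        yield rest[1:]

def floating(word, rest):
    """Floating terminal: word may be anywhere in the unprocessed part."""
    for i, item in enumerate(rest):
        if item == word:
            # Remove the word and keep the remaining items in order.
            yield rest[:i] + rest[i + 1:]

rest = ["BYLO", "BARDZO", "PRZYJEMNIE", "PRACOWAC"]
print(list(anchored("PRACOWAC", rest)))  # [] (not at the current position)
print(list(floating("PRACOWAC", rest)))  # [['BYLO', 'BARDZO', 'PRZYJEMNIE']]
```

Both operations yield the input that remains to be processed, so a failed retrieval simply yields nothing and the parser can try another rule.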
The easiest and most concise way of
expressing a grammar for the sentences
mentioned above consists in replacing every
anchored terminal by a floating terminal.
This is, however, not satisfactory because
such a grammar also accepts deviant
(syntactically or stylistically) sequences,
e.g.
(4) BYŁO BARDZO PRACOWAĆ PRZYJEMNIE
(5) PRZYJEMNIE PRACOWAĆ BARDZO BYŁO
By using both the anchored terminals and 
the floating terminals we can define the 
following grammar : 
%S == %INF %V %ADVP.
%INF == @PRACOWAC. 
%V == @BYLO. 
%ADVP == ~BARDZO @PRZYJEMNIE. 
The grammar accepts only half of the
incorrect sequences, but (a usual trade-off)
it also rejects some correct Polish sentences.
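The trade-off can be explored by enumerating all orders of the four words. The Python sketch below is an illustration under stated assumptions: terminals are retrieved in the order in which the rule for %S expands them, "current position" is read as the first item of what is still unprocessed after earlier floating terminals were removed, and BARDZO is taken to be the only anchored terminal of the mixed grammar. All names are hypothetical.

```python
from itertools import permutations

def anchored(word, rest):
    """Anchored terminal: word must be the first unprocessed item."""
    if rest and rest[0] == word:
        yield rest[1:]

def floating(word, rest):
    """Floating terminal: word may be anywhere in the unprocessed part."""
    for i, item in enumerate(rest):
        if item == word:
            yield rest[:i] + rest[i + 1:]

# Terminals in the order in which %S == %INF %V %ADVP expands them.
ALL_FLOATING = [(floating, "PRACOWAC"), (floating, "BYLO"),
                (floating, "BARDZO"), (floating, "PRZYJEMNIE")]
MIXED = [(floating, "PRACOWAC"), (floating, "BYLO"),
         (anchored, "BARDZO"), (floating, "PRZYJEMNIE")]

def accepts(grammar, words):
    """True if every terminal is retrieved and no input is left over."""
    states = [list(words)]
    for retrieve, word in grammar:
        states = [nxt for rest in states for nxt in retrieve(word, rest)]
    return [] in states

WORDS = ["PRACOWAC", "BYLO", "BARDZO", "PRZYJEMNIE"]
for name, grammar in [("all floating", ALL_FLOATING), ("mixed", MIXED)]:
    accepted = [p for p in permutations(WORDS) if accepts(grammar, p)]
    print(name, len(accepted))  # all floating 24, then mixed 12
```

Under this reading the mixed grammar accepts exactly the orders in which BARDZO precedes PRZYJEMNIE, so sequence (4) is still accepted while sequence (5) is rejected. Which of the accepted orders are correct Polish is a linguistic judgement that the sketch does not encode.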
It seems that only a grammar with
numerous specific rules can satisfy the
strong requirement of accepting those and
only those sequences which are considered
correct.
The formalism is, however, quite appropriate
for describing, e.g., the syntax of
some noun phrases in Polish or syntactically
unbound modifiers.
Introducing floating terminals
into the Marseille-originated Prolog interpreter
requires only minor alterations of
the bootstrap. The facility has already
been made standard in the Prolog version
for ODRA 1305 (ICL 1900 compatible), which
is distributed in Poland.
To illustrate deficiencies of the pro- 
posed mechanism in parsing certain kinds 
of free word-order constructions we shall 
consider the following Polish sentences: 
(6) TRZEBA        BY            CZEGOŚ       WIĘCEJ
    'is needed'                 'something'  'more'
    [present,     [conditional  [genitive]
    impersonal]   formative]
(7) CZEGOŚ BY WIĘCEJ TRZEBA
"Something more would be needed."
The sentences (6), (7) consist of the
impersonal conditional verb-like phrase
TRZEBA BY and the noun phrase CZEGOŚ
WIĘCEJ. The words CZEGOŚ and WIĘCEJ
may occupy any position, but the order of
TRZEBA and BY is restricted. If BY
precedes TRZEBA, then BY must not be
the first word of the sentence; otherwise
BY must be adjacent to TRZEBA.
Therefore, in order to write a concise
grammar accepting all correct Polish
sentences built of the words TRZEBA, BY,
WIĘCEJ, CZEGOŚ, we must introduce
more selective information concerning the
order of words. We supply selected terminals
and nonterminals with control items
restricting their scopes of floating. The
lack of such an item means that the restrictions
are inherited from the left-hand nonterminal
(in particular, there may be no restrictions).
For example, such restrictions could
be:
a terminal should be the last (the first),
a terminal must follow (immediately follow)
the most recently retrieved terminal.
Coming back to our example we 
should specify: 
either BY follows a verb immediately, 
or BY must not be the first and must 
precede a verb. 
We can now write a grammar accepting
the sentences (6), (7). The grammar
is as follows (variable parameters are prefixed
by asterisks, control items are separated by
commas).
%S(*TENSE, *MOOD) ==
    %VPIMPERS(*TENSE, *MOOD).
%VPIMPERS(*TENSE, *MOOD) ==
    %VIMPERS(*TENSE, *MOOD, *SYNTREQ)
    %REQ(*SYNTREQ).
%VIMPERS(*TENSE, COND, *SYNTREQ) ==
    %VERB(IMPERS, *TENSE, *SYNTREQ)
    @BY, NEXT.
%VIMPERS(*TENSE, COND, *SYNTREQ) ==
    @BY, NOT.FIRST
    %VERB(IMPERS, *TENSE, *SYNTREQ), AFTER.
%VERB(IMPERS, PRESENT, NP(GEN)) ==
    @TRZEBA.
%REQ(NP(*CASE)) == %NP(*CASE).
%NP(*CASE) == %NPRON(*CASE) %MOD.
%NPRON(GEN) == @CZEGOS.
%MOD == @WIECEJ.
In order to make the example clear
we use only the categories relevant for
the sentences under discussion. We omit,
for instance, the number and gender of a
noun phrase; the parameter *SYNTREQ
expresses a single syntactic requirement
(in general a verb can have more than
one requirement; for details, see
Szpakowicz [5]). The rule for NP is also
much simplified. From the point of view of
the description of Polish syntax the grammar
presented above is, in fact, unsophisticated
and fragmentary. It is sufficient,
however, to illustrate the linguistic
phenomena mentioned earlier.
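One possible reading of the control items can be sketched as follows. The sketch is in Python and is not the ODRA-Prolog implementation; it rests on several assumptions: every word keeps its position in the original sentence, NEXT means "at the position immediately after the most recently retrieved terminal", AFTER means "at any later position", NOT.FIRST means "not at the first position of the sentence", and the control item attached to %VERB is passed down to its terminal TRZEBA. The grammar is flattened to its terminals and the parameters (*TENSE and so on) are omitted.

```python
from itertools import permutations

def floating(word, control, state):
    """Retrieve word anywhere in the unprocessed input, subject to control."""
    rest, last = state  # rest: (position, word) pairs; last: last position used
    for pos, item in rest:
        if item != word:
            continue
        if control == "NEXT" and (last is None or pos != last + 1):
            continue
        if control == "AFTER" and (last is None or pos <= last):
            continue
        if control == "NOT.FIRST" and pos == 0:
            continue
        yield ([t for t in rest if t[0] != pos], pos)

# The two %VIMPERS rules, flattened to (terminal, control item) pairs.
VIMPERS = [
    [("TRZEBA", None), ("BY", "NEXT")],          # %VERB  @BY, NEXT
    [("BY", "NOT.FIRST"), ("TRZEBA", "AFTER")],  # @BY, NOT.FIRST  %VERB, AFTER
]
NP = [("CZEGOS", None), ("WIECEJ", None)]        # %NPRON %MOD

def accepts(words):
    """True if some %VIMPERS rule followed by the NP consumes all words."""
    start = (list(enumerate(words)), None)
    for verb_part in VIMPERS:
        states = [start]
        for word, control in verb_part + NP:
            states = [s for st in states for s in floating(word, control, st)]
        if any(not rest for rest, _ in states):
            return True
    return False

WORDS = ["TRZEBA", "BY", "CZEGOS", "WIECEJ"]
print(accepts("TRZEBA BY CZEGOS WIECEJ".split()))  # sentence (6): True
print(accepts("CZEGOS BY WIECEJ TRZEBA".split()))  # sentence (7): True
print(accepts("BY TRZEBA CZEGOS WIECEJ".split()))  # BY is first: False
print(sum(accepts(p) for p in permutations(WORDS)))
```

Under these assumptions the sketch accepts 12 of the 24 orders of the four words. Whether that set coincides with the correct Polish sentences depends on the linguistic facts, which the sketch does not settle.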
An experimental version of ODRA-Prolog
accepts metamorphosis grammar
rules with control items (syntactically just
Prolog terms). The inventory of word
order restrictions has yet to be established
by research on word order in Polish.
Thus, for the time being, the interpretation
of the control items is implemented in
an ad hoc manner.
A formal description of the syntax of
a natural language of the free word-order type,
such as Polish and other Slavonic
languages, requires, however, some additional
technical and linguistic problems to
be solved.
We now present those problems
which we find to be the most important.
In some cases the occurrence of a
word-form depends on particular properties
of the word which immediately precedes
it (usually it is the phonetic shape of
the preceding word which influences the
choice of the proper word-form). For
example, the agglutinative present tense form
of the verb BYĆ in the second person, singular,
masculine can be realized either by
Ś or by EŚ. The forms Ś and EŚ are
written jointly with the preceding syntactic
item, but on the level of syntactic description
they are clearly distinguishable.
Let us illustrate this problem by the 
following sentences : 
(8) NAROBIŁ     + EŚ       ŁADNEGO       KŁOPOTU
    'to cause'             'cute',       'trouble'
                           here: 'big'
    [sg, masc]  [2p, sg,   [sg, masc,    [sg, masc,
                masc]      gen]          gen]
(9) ŁADNEGO + Ś KŁOPOTU NAROBIŁ
"You've caused quite a lot of trouble."
The very simple grammar presented 
below accepts these two sentences but it 
accepts also some incorrect sequences 
because the rules do not express the 
dependency phenomena mentioned above. 
%S == %PP(*GENDER, *NUMBER, NP(GEN))
    %VPT(*GENDER, *NUMBER, *PERSON, *X)
    %NP(*NUMBER2, *GENDER2, GEN).
%VPT(MASC, SING, 2P, VOW) == @S.
%VPT(MASC, SING, 2P, CON) == @ES.
%PP(MASC, SING, NP(GEN)) == @NAROBIL.
%NP(SING, MASC, GEN) ==
    @LADNEGO @KLOPOTU, AFTER.
(VPT: the abbreviated present tense form
of the verb BYĆ; VOW and CON mean
"used after a vowel" and "used after a
consonant").
So far we do not see a simple and
satisfactory way of relating the parameter
*X of %VPT to the other words and
phrases. Provisionally, the agreement of
the agglutinative forms of the verb BYĆ
with the corresponding words may be resolved
during dictionary lookup in the
pre-parsing phase.
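A pre-parsing check of this kind might look like the following Python sketch. It is only one conceivable reading of "resolved during dictionary lookup": the function names, the vowel set and the treatment of the forms as separate tokens S and ES (spelled without diacritics, as in the grammar rules) are all assumptions of the sketch.

```python
# Pre-parsing check of the agglutinative forms S (after a vowel) and
# ES (after a consonant). Illustrative assumption, not the actual lookup.
VOWELS = set("AĄEĘIOÓUY")

def required_form(host):
    """Return the form licensed by the final letter of the preceding word."""
    return "S" if host[-1] in VOWELS else "ES"

def check_pairs(words):
    """Check every agglutinative form against the word written before it."""
    for prev, word in zip(words, words[1:]):
        if word in ("S", "ES") and word != required_form(prev):
            return False
    return True

print(check_pairs(["NAROBIL", "ES", "LADNEGO", "KLOPOTU"]))  # (8): True
print(check_pairs(["LADNEGO", "S", "KLOPOTU", "NAROBIL"]))   # (9): True
print(check_pairs(["NAROBIL", "S", "LADNEGO", "KLOPOTU"]))   # wrong form: False
```

Sequences that fail the check can be discarded before parsing, so the grammar itself does not have to relate *X to the neighbouring word.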
The other, purely linguistic, problems
are related to the influence of free word
order on accommodating the verb phrase to
the gender of a compound noun phrase.
For example, the verb phrases in the apo- 
sition agree in gender with the last consti- 
tuent of the noun phrase, as in: 
(10) JAN     LUB   MARIA   PRZYSZŁA
     'John'  'or'  'Mary'  'came'
                           [fem]
Similarly, the gender of the verb 
phrase in the postposition may agree with 
the first constituent of the noun phrase, 
for example : 
(11) PRZYSZEDŁ JAN LUB MARIA
     'came'
     [masc]
It is only recently that this difficult
problem has become a subject of partial
research. A formal syntax description of
written sentences in Polish with neutral
word order is available [5, 6]. It accepts
practically all nonelliptical declarative and
negative sentences, as well as the majority
of interrogative sentences. Nevertheless,
we can propose only a provisional solution
of this problem.
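The two agreement patterns seen in (10) and (11) amount to comparing the verb with the conjunct nearest to it. The Python sketch below only restates that observation; it is not the provisional solution referred to above, whose details are not given here, and the gender table and function name are assumptions (words are spelled without diacritics).

```python
# Nearest-conjunct gender agreement as observed in sentences (10) and (11).
# The table covers only the word forms used in those two sentences.
GENDER = {"JAN": "masc", "MARIA": "fem",
          "PRZYSZLA": "fem", "PRZYSZEDL": "masc"}

def verb_agrees(verb, conjuncts, verb_follows_np):
    """Compare the verb with the conjunct nearest to it."""
    nearest = conjuncts[-1] if verb_follows_np else conjuncts[0]
    return GENDER[verb] == GENDER[nearest]

print(verb_agrees("PRZYSZLA", ["JAN", "MARIA"], verb_follows_np=True))    # (10): True
print(verb_agrees("PRZYSZEDL", ["JAN", "MARIA"], verb_follows_np=False))  # (11): True
```

Since the text says only that the verb "may" agree with the first constituent in (11), a fuller treatment would have to allow other patterns as well.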
Another complicated question consists 
in the discontinuity of the phrases which 
constitute the sentence, as for example 
interpenetration of the verb phrase and the 
noun phrase : 
(12) NOWĄ    DAŁ     JAN     KSIĄŻKĘ  MARII
     'new'   'gave'  'John'  'book'   'Mary'
     [acc]           [nom]   [acc]    [dat]
     (NP: NOWĄ ... KSIĄŻKĘ; VP: DAŁ JAN MARII)
"It is a new book that John gave to Mary."
Therefore the control information
should allow the search for missing constituents
of a phrase even far from the
main component. On the other hand, it
should protect against "borrowing" an inappropriate
constituent from a quite different
phrase, e.g. from a subordinate clause.
It is now clearly visible that parsing 
free word-order languages is really dif- 
ferent from the syntactic analysis of, say, 
English. Although the presented modifica- 
tions of metamorphosis grammars do not 
solve all the problems discussed above, 
they provide a useful instrument for furt- 
her experimental studies. 
Finally, we want to emphasize that we
are aware of the semantic and pragmatic
functions of free word order, which are
studied e.g. by Sgall [4] and Szwedek [7]. But
we believe that, from the methodological
point of view, it is justified to abstract
from them in the syntax description.
A reader interested in the
impact of word order on the semantico-pragmatic
level may wish to consult
Bień [1].

References 

[1] Bień J.S. Multiple Environments Model
of Natural Language [in Polish, unpublished
Ph.D. thesis], 1977.

[2] Colmerauer A. Metamorphosis Grammars.
In Bolc L. (ed.) Natural Language Communication
with Computers, Lecture Notes
in Computer Science 63, 1978.

[3] Pereira F., Warren D.H.D. Definite
Clause Grammars Compared with Augmented
Transition Networks. Dept. of AI
Report 58, University of Edinburgh, 1978.

[4] Sgall P., Hajičová E., Benešová E. Topic,
Focus and Generative Semantics.
Kronberg Taunus: Scriptor Verlag
GmbH, 1973.

[5] Szpakowicz S. Automatic Syntactic Analysis
of Polish Written Utterances [in
Polish, unpublished Ph.D. thesis], 1978.

[6] Szpakowicz S. Syntactic Analysis of
Written Polish. In Bolc L. (ed.) Natural
Language Communication with Computers,
Lecture Notes in Computer Science 63,
1978.

[7] Szwedek A. Word Order, Sentence Stress
and Reference in English and Polish.
Edmonton: Linguistic Research Inc., 1976.
