|  i!i 
I i! 
Ii  ,ili 
Two Useful Measures of Word O 'der Complexity 
Tomfi~ Holan, Vladislav Kubofi t, Karel Oliva IMartin Plfitek§ 
Abstract 
This paper presents a class of dependency-based for- 
real grammars (FODG) which can be parametrized 
by two different but similar measures of non- 
projectivity. The measures allow to formulate con- 
straints on the degree of word-order freedom in a lan- 
guage described by a FODG. We discuss the problem 
of the degree of word-order freedom which should be 
allowed ~, a FODG describing the (surface) syntax 
of Czech. 
1 Introduction 
In \[Kuboh,Holan:Pl~tek.1997\] we have introduced a 
class of formal grammars. Robust Free.Order Depen- 
dency Grammars (RFODG's), in order to provide 
for a formal foundation to the way we are devel- 
oping a grammar-checker for Czech. a natural lan- 
guage with a considerable level of word-order free- 
dora. The design of RFODG's was inspired by tile 
commutative CF-grammars (see \[Huynh.83\]), and 
several types of dependency based grammars (cf.. 
e.g... \[Gaifman.. 1965\]. \[B61eckij.1967\], \[Pl~tek,1974\], 
\[Mer~uk.1988\] ). Also in \[Kubofi.Holan.Pl~tek,1997\]. 
we have introduced different measures of incorrect- 
ness and of non-projectivity of a sentence. The 
measures of the non-projectivity create the focus of 
our interest in this paper. They are considered as 
the measures of word-order freedom. Considering 
this aim we work here with a bit simplified version 
of RFODG's. namely with Fre~-Order Dependency 
Grammars (FODG's). The measures of word-order 
freedom are used to formulate constraints which can 
be imposed on FODG's globally, or on their individ- 
ual rules. 
• Department of Software and Computer Science Educa- 
t ion. Faculty of Mat hemat ics and Ph.~asi~. Charles University, 
Prague. Czech Republic. e-math holan~aksvi.ms.mff.cuni.cz 
I Institute of Formal and Applied Linguistics, Faculty of 
Mathematics and Physics. Charles University. Prague, Czech 
Republic. e-mail:vk~ u fal.ms.mff.cuni.cz 
: C'c, mputalionM Linguistio,, University of Saarland. Saz~r- 
briicken. Germany. e-mail: oliva(Icoli.uni-sb.de 
§ D~par!men! of Ther,relical Computer Science. Faeuhy of 
.Mat 10,-mal i*.'s and i'hysics.. Charles University. Prague. t"z*~'h 
l~epuhlic, e-math platek~kkl.ms.mff.runi.r~. 
Two types of syntactic structures, namely DR - 
trees (delete-rewrite-trees), and De- fre~s (depen- 
dency trees), are connected with FODG's. Any DR, 
tree can be transformed into a De-tree in an easy 
and uniform w~,. In \[Kubofi,Holan.Pl/Ltek.1997\] the 
measures of non-projectivity are introduced with the 
help of DR-trees only. Here we discuss one of them, 
called node-gaps complexity (Ng). It has some very 
interesting properties. CFI.'s are characterized by 
the complexity 0 of Ng. The N9 also character- 
izes the time complexity of the parser used by the 
above-n~'entioned grammar-checker. The sets of sen- 
tences with the Ny less than a fixed constant are 
parsable in a polynomial time. This led us to the 
idea to look for a fixed upper bound of Ng for 
all Czech sentences with a correct word order. In 
\[Kubofi,Holan,Pbltek.1997\] we even worked with the 
conjecture that such an upper bound can be set to 1. 
We will show in Section 5 that. it is theoretically im- 
possible to find such an upper bound, and that even 
for practical purposes, e.g.. for grammar-checking. 
it should be set to a value considerably higher than 
1. This is shown with the help of the measure dNg 
which is introduced in the same way as :Vg. but on 
the dependency trees, dNg creates tile lower esti- 
mation for Ng. The advantage of d.Vg is that it 
is linguistically transparent. It allows for an easy 
discussion of the complexity of individual examples 
(e.g., Czech ~ntences). On the other hand. it al- 
lows neither for characterizing the class of CFL's, 
nor for imposing the context-free interpretation for 
some individual rules of a FODG. Also. no useful 
relation between dXg and some upper estimation 
of the time complexity of parsing has been estab- 
lished yet. These complementary properties of Ny 
and dNg force us to consider both of them sinmha- 
neously here. 
2 FOD-Grammars 
The basic notion we work with are free-order de- 
pendency grammars (FODG's). Ill tile .sequel tile 
FODG's are analytic (recognition) grammars. 
Dcfinititm -( FODG ). I.'lv ,-oldcr ch l,¢ mh ,,'y 
grammar (FOD(;) is a tupb" (; -- (T.A'..g't./~). 
21 
where the union of N and T is denotc~! as I'. T 
is the set of terminals. N is the set of nonterminais. 
St C V is the ~.t. of rool-symbols (starling symbols), 
and P is the set. of rewrhing rules of two types of the 
form: 
a) A "+x BC, where A E N. B,C E V. X is 
denoted as the subscript, of the rule, X E {L, R}. 
b) .4 --+ B. where A E N, B E V. 
The letters L (R) in the subscripts of the rules 
mean that the first. (second) symbol on the right- 
hand side of the rule is considered dominant, and 
the other dependent. 
If a rule has only one symbol on its right-hand 
side, we consider the symbol to be dominant. 
A rule is applied (for a reduction) in the following 
way: The dependent symbol is deleted (if there is 
one on the right-hand side of the rule) , and the 
dominant one is rewritten (replaced) ~, the symbol 
standing on the left-hand side of the rule. 
The rules A "+L BC, A -~n BC, can be applied 
for a reduction of a string z to any of the occur- 
rences of symbols B, C in z; where B precedes (not 
necessarily' immediately) C in z. 
For the sake of the following explanations it is 
necessary to introduce a notion of a DR-tree (delete- 
rewrite-tree) according to G. A DR-tree maps the 
essential part of history of deleting dependent sym- 
bols and rewriting dominant symbols, performed by 
the rules applied. 
Put informally', a DR-tree (created by a FODG G) 
is a finite tree with a root and with the following two 
types of edges: 
a) vertical: these edges correspond to the rewriting 
of the dominant symbol by" the symbol which is 
on the left-hand side of the rule (of G) used. 
The vertical edge leads {is oriented) from the 
node containing the original dominant symbol 
to the node containing the symbol from the left- 
hand side of the rule used. 
b) oblique: these edges correspond to the deletion 
of a dependent symbol. Any, such edge is ori- 
ented from the node with the dependent deleted 
symbol to the node containing the symbol from 
th{ left-hand side of the rule used. 
Let us now proceed more formally'. The following 
technical definition allows to derive a corresponding 
dependency" tree from a DR-tree in a natural way,. to 
define the notion of coverage of a node. and to de- 
fine two measures of non-projectivity. In the sequel 
the symbol :Vat means the set of natural numbers 
(without zero). 
Definition -(DR-tree). A tuple Tr = 
(Nod. Ed. Rl) is called DR-tree created by a FODG 
G ( where Nod means the set of nodes. Ed the set of 
edges, and Rt means the root node), if the following 
pohlls hold for any l" $ Nod: 
22 
a) U is a ,l-lnl)le of Ihe form \[.-I. i. j. el. where .! E 
I" (terminal or nonterminal of G). i.j E ,Vat. 
¢ is either equal to 0 or it has tile shape L" r. 
where k, p E ,Vat. The A is called symbol of I r. 
the number i is called hori:o,tal indcz" of U. j 
is called vertical index, c is called domination 
index. The horizontal index expresses the cor- 
respondence of U with the i-th input symbol. 
The vertical index corresponds to the length in- 
creased by I of the path leading bottom-up to 
U from the leaf with the horizontal index i. The 
domination index either represents the fact that 
no edge starts in U (e = 0) or it represents the 
final node of the edge starting in U (e = k r, cf. 
also the point e) below). 
b) Let U = \[A,i,j,e\]and j > 1. Then there is 
exactly, one node Ul of the form \[B, i, j- 1, ij\] 
in Tr, such that the pair {U\], U) creates a (ver- 
tical) edge of Tr, and there is a rule in G with 
A on its left-hand side, and with B in the role 
of the dominant symbol of its right-hand side. 
c) Let U = \[A, i, j, el. Then U is a leaf if and only 
if A E T (terminal D, mbol of G), and j = 1. 
d) Let"U = \[A,i.j,e\]. U = tit iffit is the single 
node with the domination index (e) equal to 0. 
e) Let U = \[A,i.j,e\]. lfe = k v and k < i (resp. 
k > i). then an oblique edge leads from U (de- 
pendent node) to its mother node t% with the 
horizontal index k. and vertical index p. Fur- 
ther a vertical edge leads from some node U, 
to Urn. Let C be the symbol from Urn, B from 
Us. then there exists a rule in G of the shape 
C "-~L BA (resp. C "-~n AB). 
f) Let U = \[A.i.j.~\]. If e = k r. and k = i. then 
p = j + 1, and a vertical edge leads (bottom 
up) from U to its mother node with the same 
horizontal index i. and with the vertical index j+L 
We will say that a DR-tree Tr is complete if for any 
ofits leaves U = \[.4. i. 1, el. where i > 1. it holds that 
there is exactly" one leaf with the horizontal index 
i- 1 in Tr. 
Example 1. This formal example illustrates the 
notion of FODG. The following grammars G1. G_, 
are FODG's. Gl = (NI,TI.{S}.PI). T1 = {a.b.c}, 
N~ = {T,S}. PI = {S--*L aTISS, S"+R To. T ~L 
be, T ~s cb}. 
G.~ = (N.-,,T.,.{S},P.,), 7"_, = {a.b,c.d}. :V 1 = 
{S. SI,S~.}, P~. = {S ~ Slold, St ~n S~. b, S.., "+s Se}. 
Fig.l. displays a DR-tree generated by, GI for the 
input sentence aabbcc. 
Definitions. TN(G) denotes the set of compb'le 
DR-tlvcs rooted in a symbol from ,%. created hy (;. 
If 7"rE 7"N((;). w,-say' Ihal Tr is loors, d I,y (;. 
I 
I 
m 
Figure 1: A DR-bye :/'rl generated by the grammar 
GI. The nodes ofT'r1 are L~ = \[a, l, l, l.~\], L.~ = 
\[a.2.1.2.~\],La = \[b. 3,1, Z:\], L4 = \[b,4,1,4.*\].Ls = 
\[e. 5.1,3.~\], L6 = \[c, 6, 1.4.~\] (the leaves), and A"I = 
\[T,4.2, I.~\],A~ = \[T,3,2,2.~\],Na = \[S.2,2, la\],A~ = 
\[s. 1.2, h\], ~v~ = IS. L 3, 0\]. 
I 
LI L2 La L4 L5 L6 
Let w = aaa~....an, u'. E T', Tr E TN(G). and 
let \[ai.i, l.e(i)\] denote the i-th leaf of Tr for i = 
1,...,n. In such a case we say that the string w is 
parsed into Tr by G. 
The symbol L(G) represents the set of strings 
(sentences) parsed into some DR-tree from TN(G). 
We will also write 
TN(u'..G) = {Tr:u" is parsed into Tr by G}. 
Example 2. Let us take the grammar G1 from 
the example 1. Then L(G1) = {u" E {a,b,c}+l u" 
contains the same number of a's. b's. and c's }. Let 
us take the G.~. Then L(G.) = {du'lu'E L(G1)}. 
L(G1). L(G,.) are two variants of a well known non- 
context-free language. 
Let us now introduce dependency trees parsed 
from a,string. Informally, a dependency tree is ob- 
tained by contracting each vertical path of a DR-tree 
into its (starting) leaf. 
Definition - (De-tree). Let Tr E TN(w.G) (w 
is parsed into Tr ~, G), where u' = ala2..a,. The 
dependency tree dT(Tr) contracted from Tr is de- 
fined as follows: The set of nodes of dT(Tr) is the 
set of 3-tuples \[ai.i, k(i)\] (note that ai is the i-th 
symbol of u:). k(i) = 0 if and only if the root of 
Tr has the horizontal index i (then the \[ai. i, k(i)\] is 
also the root ofdT(Tr)}, k(i) E Nat if and only ifin 
Tr an oblique (~lge leads from some node with the 
horizomal index i to ~me node with the horizontal 
index k(i). 
Figure 2: The dependency tree dT'rl corre- 
sponding to the Trl. The nodes of dTr3 are 
^5 = \[,i.J.o\].:x~ = \[o.2.q.,% = \[b, 3, ~\]. .v, = 
\[b,4, q.Ns = \[,-, 5.3\],.~ = \[~,6,4\]. 
Sl \ 
h~ lx~ 
We can see that the edges of dT(Tr) correspond 
Cone to one) to the oblique edges of Tr, and that they 
are fully represented by the second and the third 
slot of nodes of dT(Tr). The second slot is called 
hori:ontal index of the node. 
The symbol dTN(w, G) denoles the set of dT(Tr). 
where Tr E TN(w,G). The symbol dTN(G) de- 
notes the union of all dTN(u,, G) for u" E L(G). We 
say that dTN(G) is the set of De-trees parsed by G. 
An example of a dependency tree is given in Fig. 
2. 
3 Discontinuity measures 
In this section we introduce two measures of non- 
project ivity (discontinuity). 
First we introduce the notion of a coverage of a 
node of a DR-tree. 
Defnitlon. Let Tr be a DR- tree. Let u be a 
node of Tr. As Cov(u. Tr) we denote the set. of hor- 
izontal indices of nodes from which a path (bottom 
up) leads to u. (Cot'(u, Tr) obligatorily contains the 
horizontal index of u). We say that Cov(u. Tr) is the 
coverage of u (according to Tr). 
Exmnple 3. This example shows the coverage of 
individual nodes of Trl front Fig.l. 
Co,,(\[a, 1. l, h\]. Tra) = {1}, 
Co,'{\[a, 2, 1.2-_,\]. Trl) = {2}, 
Co,,(\[t,. 3. l, 321. rr3) = {3}. 
Coy(\[b, 4.1,42\],Trl) = {4}. 
Co,.(\[~. ~. i. 3..,\]. rr,) = 15}. 
co,,(\[,., el. 1. 4...\]. T,.~) = {ti}. 
23 
I 
I 
I 
I 
I 
I 
I 
I 
i 
! 
i 
I 
l 
I 
I 
I 
I 
('o,.(\[T.4.'2. I=,\].T,.,) = 14.6}. 
Cow.dr. 3. ~. ~.q. T~j) = 13.5}. 
Co,.(\[s,'2.'2.13\], 7"w.~7 = 12,3.5}, Cov(\[S. 
1,2.13\]. Try) - {1,4.6}. Cor(\[S, 
1,3.0\],Tra) = {I,2.3.4,5,6} 
Let. us define a (complexity7 measure of non- 
projectivity ~' means of the notion of a coverage 
for each type of trees: 
Definition. Let Tr be a D\]:I- tree. Let. u be 
a node of Tr, Cot:(u, Tr) = {iI,i.~ ..... in}, and 
ia < i2 ...in-a < in. We say that the pair (ij,ij+l) 
forms a gap if 1 _< j < n, and ij+l - i~ > 1. The 
symbol Ng(u, Tr) represents the number of gaps in 
Cov(u, Tr). Ng(Tr) denotes the maximum from 
{Ng(u, Tr);u E Tr}. We say that Ng(Tr) is the 
node-gaps complexity of Tr. 
In the same way dNg(dTr 7 can be introduced for 
any dependency tree dTr. 
Example 4. We stick to the DR-tree Trt from 
previous examples. The following coverages contain 
gaps: 
Coy(IT, 4, 2, 1~\], Trl) = {4, 6} 
Co~,(\[T, 3, 2, 2,.\], T,~) = {3, 5} 
Cot,(\[S. 2.2, 13\], Tra) = {2, 3: 5} 
Cov(\[S, 1,2, ls\], Trl) = {1,4,6} 
has one gap (4..67, 
has one gap (3~5), 
has one gap (3,57, 
has two gaps (1.4) and (4.6). 
We can see that Trl has three different gaps 
(1.4),(3,5).(4.6). and .Vg(Trl) = 2. 
Example 5. This example shows the coverages 
of tl, e nodes of dTra from Figure 2. 
Cor(\[a. 1,0\],dTra) = {1.2.3,4,5.6}. 
Cov(\[a. 2.1\], drra) = {2.3.5}. 
Co~,(\[b. 3.2\]. dr,a) = 13.5}. 
Cot,(\[b,4.1\],dTra) = {4.6}, 
Co,,(\[~. 5.3\]. dr,,) = {5}, 
Cov(\[c. 6.4\]. dTrl) = {6} 
We can see that dTrl has two different gaps 
(3.5).(4,6). and dNg(dTrl) = 1. 
Definitions. Let i E CVa/U{0}u{*}) and let * be 
greater than any natural number. Let us denote as 
TN(w. G. i) the set of DR-trees from TA'(w, G) such 
that the value of the measure A'g does not exceed i 
on them. When i is the symbol .. it means that no 
limitation is imposed on the corresponding value of 
the measure A'g. 
Let us denote LN(G. i) = {u'\[ TN(w. G. i) ~ ~}. 
TN(G,i) denotes the union of all TA'(u',G.i} over 
all u" E L(G, i). 
TN(i) denotes the class of sets (of DR-trees} 
TN(G. i). for all FOD-grammars G. 
LN(i) denotes the class of languages LN(G, i). for 
all FOl)-grammars G. 
The denolalions dTN ( w. G. i). dLN (G. i). 
dTN(i), dLN(i) cau be introdm't~l slepwi.~r in the 
same way for l)~tr~t.'s' as TN(w,G.i). LN(G.i). 
TN(iT, LN(i) for DR-tr~x,s. 
4 Formal observations 
This section contains some observations concerning 
the notions defined above. Due to space limitations. 
they are not accompanied ~' (detailed) proofs. The 
symbol CF + denotes the set. of context-free law,- 
guages without the empty string and comCF + de- 
notes the set. of commutative context-free languages 
without the empty string (L is a commutative CF- 
language if it is composed from a CF-language and 
from all permutations of its words (see, e.g., C0)): 
note, in particular, that L need not be a context- 
free language, actually the classes CF+,comCF + 
are incomparable). 
a) LN(0) = CF +. 
The rules ofa FODG not allowing any gaps are 
interpreted in an usual context-free way. 
b) dLN(07 contains non-context-free languages. 
Such a language is, e.g., L(G~.) frown example 2. 
It holds that all De-trees from dT(G,.) do not 
have any gaps. On the other hand the number 
of gaps in the set of DR-trees parsed ~, G.~ is 
not bounded by any constant. 
c) LN(*) = dLN(,) D comCF +. 
It easy to see the inclusion. The fact that it is a 
proper inclusion can be shown by the context- 
free language {ancb'ln > 0}, a language which 
is not commutative context-free, and is obvi- 
ously from LN(i7 for any i E (Nat L) {0} O {*}) 
d) Tile L(Ga). L(G2) from tile example 2 are com- 
mutative CF-languages which are not context- 
free. 
The language Lc! = {anbmcb'na'\[n.m > 0} is 
from CF +, but it is not from LN(*). 
It means that the classes LN(*) and LN(0) = 
CF + are incomparable. 
e) L:Y(i) C_ dLN(i) for any natural number i 
(since Ng(Tr) >__ dNg(dT(Tr)) for any DR-tree 
Tr). 
f) Any language L from LN(i). where i E (Nat U 
{0}): is recognizable in a polynomial time com- 
pared t~ the size of the input. 
We have implemented (see \[Holan.Kuboh, 
Pidtek, 1995\]. \[Holan.Kuboh. Pl~tek. 1997\].) a 
natural bottom-up parsing algorithm based on 
the stepwise computalion of pairs of the shape 
(U. Cv). where U means a node of a DR-tree 
and Cr means its coverage. With Ihe N.q (il can 
be. interpreted as the maximum of the number 
of gaps in lhe coverages during a comlmlation 
of the parser) limited by a con.~!ant. ! he number 
24 
g) 
of such pairs.- depends polynomially on the size 
of the input. The assertion f) is derived front 
this observation. 
Becau~ a limited dN 9 for a language L does 
not ensure also the limited N9 (see the item 
b)) for L, the limited dN9 need not ensure the 
parsing in a polynomial time for the language 
L. 
5 A measure of word order freedom 
of syntactically correct Czech 
sentences 
In this section we describe linguistic observations 
concerning the word-order complexity of the surface 
Czech syntax. The notions defined above are used in 
this section. We are going to discuss the fact that for 
Czech syntax (described by means of a FODG GE) 
there is no adequate upper estimate of the bound- 
ary of correctness of word-order freedom based on 
node-gaps complexities. It means that we are going 
to show that there is no i0 such that each De-tree 
belonging to dTN(GE, i) - dTN(GE, io) for i > i0 
is (quite clearly) syntactically incorrect. 
Let us now show a number of examples of non- 
projective constructions in Czech. In the previous 
work (see \[Holan.Kubofi.Pl~tek,1997\]) we have pu t 
forward a hypothesis that from the practical point 
of view it is advisable to restrict the (possible) lo- 
cal number of gaps to one. However, we found out 
soon that this is not generally true because it is 
not very difficult to find a perfectly natural, un- 
derstandable and syntactically well-formed sentence 
with dN9 higher than one. Such a sentence may for 
example look like this: 
Tuto knihu jsem se mu rozhodl ddt k 
narozeningm. 
(Lit.: This book l-have-Refl, him de- 
cided \[to\] give to birthday.) 
\[I decided to give him this book to 
birt hday.\] 
The Fig.3 shows one of possible De-trees repre- 
senting, this sentence: 
The node\[ddt, 7,6\] has a coverage containing two 
gaps. Since no other node has a coverage with 
more gaps. then (according to the definition of d.X'g) 
the dependency node-gaps complexity of this depen- 
dency tree is equal to 2. It is even quite clear that 
it is not possible to find a linguistically adequate 
De-t ree representing the same sentence with a lower 
value of dNg (words js~m, se and ro:hodl will al- 
ways cause gaps in the coverage of ddl). The max- 
imum empirically attested number of verbal partic- 
ipants is 5 in Czech (see \[.qgalI,Panevovti.1988/t~9\]). 
The previous example showed thal the infinitives of 
verbs wit h sucl, a number of part icipants may quite 
naturally form constructions containing four gaps. 
It. might even suggest that the nmnber four mighl. 
hence, serve as an upper estimation of dN9 (based 
on the highest possible number of participants of an 
infinitive), ltowever, this conclusion would not be 
correct., since in the general case it. is necessary to 
take into account also the possibility that partici- 
pants are combined with free modifiers, and from 
this it follows that it is not reasonable t~ set a cer- 
tain constant as an upper boundary of dN9 • In 
order to exemplify this, we present the sentence 
Zo dnegnl krize by se lidem joke PeW 
kvdli jejich p~'ijmdm takov+j byt -ddn~ marl- 
tel domu :a tu ccnu ncmohl na tak diouhou 
dobu sna~it pronafimat. 
(Lit.: In today's crisis would Refl. peo- 
ple (dot.) as Petr because-of their income 
such flat no owner of-a-house for that price 
couldn:t for such long time try to-rent) 
\[In today's crisis no landlord would try 
to rent a fiat for that. price for such a long 
time to such people as Petr because of their 
income.\] 
Similarly as in the previous case. Fig.5 shows a 
dependency tree representing the structure of the 
sample sentence. The words b~/, p~qjrn~m, rnajitel. 
sna~it and pronafimat depend (immediately) on 
the main verb nemohl. They are together with their 
dependents interlocked with the set of word groups 
{Za da¢inl kri:~: lidem jako Pelt; takor~ byt: :a tu 
cent: and no tak dlouhou dobu } dependent on the 
subordinated verb in infinitive form (prvnajimat - to 
rent), creating thus five gaps in the local tree whose 
governor is pronafimat. 
The previous claim is supported also by another 
example of combination of different types of non- 
projectiviti~ inside one clause. Apart from gaps 
caused by complemeutations, the following syntacti- 
cally correct sentence contains also a gap between a 
wh-pronoun and a noun. The Fig.4 shows that the 
value of dNf is equal to 3 for this sentence. 
K~ kolikdt~m jsem se mu nakon~c tuto 
knihu ro'_hodl ddt naro-_enindmf 
(Lit.: To which l-have-Rcfl, him finally 
this book decided to give birthday) 
\[Which birthday I finally decided to 
give him this book to?\] 
The examples presented above illustrate the fact 
that the measure ofdN9 has (in Czech) similar prop- 
erties as some other simple measures of complexity 
of sentences (the length of a semence, the level of 
center-embedding, elc.), it is quite clear that for 
the .sake of readabilil.v and simplicily it is advisable 
to produce ~nlcnces wit h a loss" ~ore of these simple 
|ll{,a:gllrP~4. ()li the ollwr hand. it is not i~ossibh • to 
25 
Tuto to~hu ~ se mu ~zho~ (~t k narozenk'dlm. ~ ~ .~ ~ ~.. ~ ~ 
Figure 3: 
...... :..~......~.~ ..........~.~!...... 
Ke kolik~ ~m ~e |S.h~mO| mlkon~ tuto 
which I have 
Figure 4: 
laChu ~ ~ na~? 
26 
J~r 
• j~o~ A . 
s~ 
........................................... i~qr~ C 27 
I 
I 
I 
I 
I 
! 
I 
I 
I 
1 
! 
I 
I 
I 
I 
I 
I 
I 
| 
~t a fixed threshold above wl,ieh the senlence may 
be cousidered ungranmtatical. Similarly as very long 
sentences or sentences with complex self-embedding, 
also the sentences containing a large nuntl~r of gaps 
may be considered stylistically inappropriate, but 
not syntactically ill-formed. 
6 An enhancing of FODG 
Even when the examples presented above show 
the unlimited nature of nonprojective constructions 
in Czech, it is necessary to point out that there 
are some constructions, which do not. allow non- 
projectivily at all. These constructions therefore 
require a limitation applied to particular grammar 
rules handling them. For example, some ofthe gram- 
mar rules then allow only projective interpretation 
of input data. As an example of such a construc- 
tion in Czech we may take an attachment of a nom- 
inal modifier in genitive case to a governing noun 
(obchod\[nom.\] otce\[yen.\] - shop of-father, father's 
shop). Only the words modifying (directly or indi- 
rectly) one of the nouns may appear in between the 
two nouns, no "gap" is allowed. 
(obchod s potrarinami na~eho otce) 
Formally we add the previously described "projec- 
tivity" constraint to a rule of the form .4 --->x CB 
with help of an upper index 0. Hence the augmented 
rule has the following form A--->°xCB. It means that 
any node of a DR-tree created by this rule (corre- 
sponding to the left-hand side A) can be a root of a 
subt ree without gaps only. 
We can easily see that with such a type of rules. 
the power of (enhanced) FODG's increases substan- 
tially. 
Formal observation. The class of languages 
parsed (generated) ~" enhanced FODG's contains 
CF + and comCF +. 
This type of enhanced FODG is used for our cur- 
rent robust parser of Czech. 
We outline a further enhancing of FODG's: We 
can imagine that if we write for some natural number 
i a rule of the shape A-+ixCB it means that any 
node created by this rule can have a coverage with 
at most'i gaps. We can easily ~e that with such a 
type of rules, the power of FODG's enhanced in this 
way again increases substantially. 
7 Conclusion 
We have not considered some other natural mea- 
sur~ of non-projectivity here.e.g., the number of 
crossings (\[Kunze.1972\]). the size of gaps or the tree- 
gaps complexity (\[Holan. Kubofi. Plfilek.1995\]}. By 
Ihis mea.,~ures it is much easier to show the non- 
existence of their upper boundarit.*s concerning the 
non-proj,.,'livily in Czech than by d.Y!l. The prt-- 
sented nwasur¢.'s .'V.q and d.Yq help us to show !ha! 
28 
it is I)ol easy to d,~:ril>e tile Czech syntax (in parlic- 
ular. to rellect the word-order) fully and adequately. 
\]1 is file!her shown that i1. is not easy (maybe im- 
possible) to find an algorilm which should guaran- 
tee parsing of any correct. Czech semence with a (no! 
too high) polynomial time complexity according to 
its size. We will try in future to improve the de- 
~ription, and also the parsing algorithm, in order 
to come closer to meeting this challenge. 
8 Aknowledgement 
We are very thankful to prof. Haiti:oval and 
prof.PanevovA for their recomendations. 
The work on this paper is a part of the project. 
"Formal Tools for Separation of Syntactically Cor- 
rect Struclu~,es from Syntactically lnco1~ect Struc- 
tures" supported by the Grant Agency of the Czech 
Republic under Grant No. '201/96/0195. 

References 
M.I.Beleckij: Beskontekstnye i dominacionnye 
grammatiki i srjazannye s nimi algoritmideskije 
problem.y, Kibernetika, 1967. A'o ,l,pp. 90-97 
H. Gaifinan: Dependency Systems and Pbrase- 
Structure Systems: Information and Control, 8. 
196.5, pp. 30~-337 
A. |: Gladkij: Formal~ye grammatiki i jazyki, 1:.: 
NA UIC4. Moskva. 1973 
D.T.Huynb: Commutative Grammars: The com- 
plexity of uniform word problems. Information 
and Control 57, 1983, pp. 21-39 
T.Holan, VKubofi. M.Pidtek: An Implementation 
of Syntactic Analysis of Czech. Proceedings of 
IH'PT" 95. Charles Umrersity Prague. 1995.. pp. 
IO_6.135 
T.Holan. I:Kubo~ . M.Pidtek : A Prototyp~ of a 
Grammar Checker for Czech. Proceedings of the 
Fifth Confel~nce on Applied A'atural Languaye 
Processin 9. ed. Association for Computational 
Linguistics, Washington. Mm~b 1997.. pp. 14 ?- 15~ 
V Kubofi. T.Holan. M.Pidtek : A Grammar-Checker 
for Czech. UE4L Tech. Report TR-1997-02, MFF. 
Charles Unirtrsity Prague; 1997 
J.h'un:e: Die A uslassbarkeit ton Sat:teilen bei koor- 
dinativen Verbindungen in Deutschen, Akadtmie- 
| "crlag-Berlin. 197~ 
l.A.Mel'~uk: Dependency Syntax: Theory and Prac- 
tice. Stole Unirersity of A'eu" }brk Press. 1988 
M.Phltek. Qutstions of Graphs and Automata m 
Generative Description of Language. The Prague 
Bulletin of Mathematical Linguistics ~I.197~. 
pp.'27-63 
P.Sgall. J.Pam yard: Dependency ~qwda2" - a Cbal- 
hng¢. Thto1~tical Linguistics. IbLIS. A'o.i/~.'. 
H'alh r & Gruytcr. Berlin - ,V¢ w )brk 1988/89. 
lq~. 7.?. l;6" 
