Guaranteeing Parsing Termination of Unification Grammars
Efrat Jaeger and Nissim Francez
Department of Computer Science
Technion, Israel Institute of Technology
32000 Haifa, Israel
Shuly Wintner
Department of Computer Science
University of Haifa
31905 Haifa, Israel
Abstract
Unification grammars are known to be Turing-equivalent; given a grammar $G$ and a word $w$, it is undecidable whether $w \in L(G)$. In order to ensure decidability, several constraints on grammars, commonly known as off-line parsability (OLP), were suggested. The recognition problem is decidable for grammars which satisfy OLP. An open question is whether it is decidable if a given grammar satisfies OLP. In this paper we investigate various definitions of OLP, discuss their inter-relations and show that some of them are undecidable.
1 Introduction
Context-free grammars are considered to lack the expressive power needed for modelling natural languages. Unification grammars have originated as an extension of context-free grammars, the basic idea being to augment the context-free rules with feature structures (FSs) in order to express additional information. Today, several variants of unification grammars exist, some of which do not necessarily assume an explicit context-free backbone.
The recognition problem (also known as the membership problem), for a grammar $G$ and a string $w$, is whether $w \in L(G)$. The parsing problem, for a grammar $G$ and a string $w$, is to deliver all parse trees that $G$ induces on $w$, determining what structural descriptions are assigned by $G$ to $w$. The rest of this paper is concerned with recognition.
Unification grammars have the formal power of Turing machines, thus the recognition problem for them is undecidable. In order to ensure decidability of the recognition problem, a constraint called off-line parsability (OLP) was suggested. The recognition problem is decidable for OLP grammars. There exist several variants of OLP in the literature (Pereira and Warren, 1983; Johnson, 1988; Haas, 1989; Torenvliet and Trautwein, 1995; Shieber, 1992; Wintner and Francez, 1999; Kuhn, 1999).
Some variants of OLP were suggested without recognizing the existence of all other variants. In this paper we make a comparative analysis of the different OLP variants for the first time. Some researchers (Haas, 1989; Torenvliet and Trautwein, 1995) conjecture that some of the OLP variants are undecidable (it is undecidable whether a grammar satisfies the constraint), although none of them gives any proof of it. There exist some variants of OLP for which decidability holds, but these conditions are too restrictive; there is a large class of non-OLP grammars for which parsing termination is guaranteed. Our main contribution is to show proofs of undecidability for three OLP definitions.
Section 2 defines the basic concepts of our formalism. Section 3 discusses the different OLP definitions. Section 4 gives an analysis of several OLP definitions and the inter-relations among them. Section 5 proves the undecidability of three of the OLP conditions.
2 Preliminaries
The following definitions are based on Francez and Wintner (In preparation) and Carpenter (1992). Grammars are defined over a finite set FEATS of features, a finite set ATOMS of atoms, and a finite set CATS of categories. A multi-rooted feature structure (MRS) is a pair $\langle \bar{Q}, G \rangle$ where $G = \langle Q, \delta, \theta \rangle$ is a finite, directed, labelled graph consisting of a set $Q \subseteq \mathrm{NODES}$ of nodes, a partial function $\delta : Q \times \mathrm{FEATS} \to Q$ specifying the arcs, and a partial function $\theta : Q \to \mathrm{ATOMS}$ labelling the sinks, and $\bar{Q}$ is an ordered set of distinguished nodes in $Q$ called roots. $G$ is not necessarily connected, but the union of all the nodes reachable from all the roots in $\bar{Q}$ is required to yield exactly $Q$. The length of an MRS is the number of its roots, $|\bar{Q}|$.
Meta-variables $\sigma, \rho$ range over MRSs, and $\delta, \theta, Q, \bar{Q}$ over their constituents. If $\sigma = \langle \bar{Q}, G \rangle$ is an MRS and $\bar{q}_i$ is a root in $\bar{Q}$ then $\bar{q}_i$ naturally induces an FS $A_i = (Q_i, \bar{q}_i, \delta_i, \theta_i)$, where $Q_i$ is the set of nodes reachable from $\bar{q}_i$, $\theta_i = \theta|_{Q_i}$ and $\delta_i = \delta|_{Q_i}$. Thus $\sigma$ can be viewed as an ordered sequence $\langle A_1, \ldots, A_n \rangle$ of (not necessarily disjoint) FSs. We use the two views of MRSs interchangeably.
The sub-structure of $\sigma = \langle A_1, \ldots, A_n \rangle$, induced by the pair $\langle i, j \rangle$ and denoted $\sigma^{i \ldots j}$, is $\langle A_i, \ldots, A_j \rangle$. If $i > j$, $\sigma^{i \ldots j} = \lambda$. If $i = j$, $\sigma^{i}$ is used for $\sigma^{i \ldots i}$.
An MRS $\sigma = \langle \bar{Q}, G \rangle$ subsumes an MRS $\sigma' = \langle \bar{Q}', G' \rangle$ (denoted by $\sigma \sqsubseteq \sigma'$) iff $|\bar{Q}| = |\bar{Q}'|$ and there exists a total function $h : Q \to Q'$ such that for every root $\bar{q}_i \in \bar{Q}$, $h(\bar{q}_i) = \bar{q}'_i$; for every $q \in Q$ and $f \in \mathrm{FEATS}$, if $\delta(q, f){\downarrow}$ then $h(\delta(q, f)) = \delta'(h(q), f)$; and for every $q \in Q$, if $\theta(q){\downarrow}$ then $\theta(q) = \theta'(h(q))$.
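As a rough illustration of the subsumption definition, the following Python sketch searches for the function $h$ by walking the subsuming structure from its roots. The triple representation `(roots, delta, theta)` and the name `subsumes` are assumptions made for this example only; they are not part of the formalism.

```python
from collections import deque


def subsumes(sigma, sigma_prime):
    """Return True iff sigma subsumes sigma_prime.

    An MRS is modelled (hypothetically) as a triple (roots, delta, theta):
    roots is a list of nodes, delta maps (node, feature) to a node, and
    theta maps a node to an atom. Node identifiers must not be None.
    """
    roots, delta, theta = sigma
    roots_p, delta_p, theta_p = sigma_prime
    if len(roots) != len(roots_p):
        return False
    h = {}
    queue = deque()
    # h must send the i-th root to the i-th root.
    for q, q_p in zip(roots, roots_p):
        if h.setdefault(q, q_p) != q_p:
            return False
        queue.append(q)
    while queue:
        q = queue.popleft()
        # Atoms must be preserved by h.
        if q in theta and theta_p.get(h[q]) != theta[q]:
            return False
        # Every arc leaving q must be mirrored from h(q).
        for (node, feat), target in delta.items():
            if node != q:
                continue
            image = delta_p.get((h[q], feat))
            if image is None:
                return False
            if target in h:
                if h[target] != image:
                    return False
            else:
                h[target] = image
                queue.append(target)
    return True
```

Because every node is reachable from some root, the walk visits all of $Q$, so $h$ is total whenever the function returns `True`. A structure with two distinct paths subsumes one in which the paths are reentrant, but not the other way around.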
Skeletal grammars are a variant of unification grammars which have an explicit context-free backbone/skeleton. These grammars can be viewed as an extension of context-free grammars, where every category is associated with an informative FS. An extended category is a pair $\langle A, c \rangle$ where $A$ is an FS and $c \in \mathrm{CATS}$.
Definition 2.1. A skeletal grammar (over FEATS, ATOMS and CATS) is a tuple $G = \langle R, \mathcal{L}, A^s \rangle$ where $R$ is a finite set of rules, each of which is an MRS of length $n \geq 1$ (with a designated first element, the head of the rule), and a sequence of length $n$ of categories; $\mathcal{L}$ is a lexicon, which associates with every terminal $a$ (over a fixed finite set $\Sigma$ of terminals) a finite set of extended categories $\mathcal{L}(a)$; $A^s$ is the start symbol (an extended category).
A skeletal form is a pair $\langle \sigma, \vec{c} \rangle$, where $\sigma$ is an MRS of length $n$ and $\vec{c}$ is a sequence of $n$ categories ($c_i \in \mathrm{CATS}$ for $1 \leq i \leq n$).
Definition 2.2 (Derivation). Let $\langle \sigma_a, \vec{c}_a \rangle$ and $\langle \sigma_b, \vec{c}_b \rangle$ be forms such that $\sigma_a = \langle A_1, \ldots, A_k \rangle$ and $\sigma_b = \langle B_1, \ldots, B_m \rangle$. $\langle \sigma_a, \vec{c}_a \rangle$ immediately derives $\langle \sigma_b, \vec{c}_b \rangle$ iff there exist a skeletal rule $\langle \rho', \vec{c}_R \rangle \in R$ of length $n$ and an MRS $\rho$, $\rho' \sqsubseteq \rho$, such that:
• $m = k + n - 2$;
• $\rho$'s head is some element $i$ of $\sigma_a$: $\rho^{1} = \sigma_a^{i}$;
• $\rho$'s body is a sub-structure of $\sigma_b$: $\rho^{2 \ldots n} = \sigma_b^{i \ldots i+n-2}$;
• The first $i - 1$ elements of $\sigma_a$ and $\sigma_b$ are identical: $\sigma_a^{1 \ldots i-1} = \sigma_b^{1 \ldots i-1}$;
• The last $k - i$ elements of $\sigma_a$ and $\sigma_b$ are identical: $\sigma_a^{i+1 \ldots k} = \sigma_b^{m-(k-i-1) \ldots m}$;
• $\vec{c}_b$ is obtained by replacing the $i$-th element of $\vec{c}_a$ by the body of $\vec{c}_R$.
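The index arithmetic of an immediate derivation step can be checked on the category sequences alone. The sketch below is only an illustration: the function name `replace_categories` and the list representation are assumptions, and the check that the rule's head category equals the replaced category is an added assumption that the definition leaves implicit.

```python
def replace_categories(cats_a, i, rule_cats):
    """Replace the i-th (1-based) category of cats_a by the body of a rule.

    cats_a is the category sequence of the deriving form (length k) and
    rule_cats is the category sequence of the rule (length n, head first).
    """
    k, n = len(cats_a), len(rule_cats)
    if not 1 <= i <= k:
        raise ValueError("i must select an element of the form")
    # Assumption for this sketch: the head category must match.
    if cats_a[i - 1] != rule_cats[0]:
        raise ValueError("rule head does not match the selected category")
    # First i-1 elements, then the rule body, then the last k-i elements.
    cats_b = cats_a[: i - 1] + rule_cats[1:] + cats_a[i:]
    assert len(cats_b) == k + n - 2  # m = k + n - 2
    return cats_b
```

For a unit rule ($n = 2$) the length of the form is unchanged, which is why unit rules can lengthen a derivation without lengthening its yield.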
The reflexive transitive closure of '$\to$' is denoted '$\stackrel{*}{\to}$'. A form $\langle \sigma_a, \vec{c}_a \rangle$ derives $\langle \sigma_b, \vec{c}_b \rangle$ (denoted $\langle \sigma_a, \vec{c}_a \rangle \stackrel{*}{\Rightarrow} \langle \sigma_b, \vec{c}_b \rangle$) iff there exist MRSs $\sigma'_a, \sigma'_b$ such that $\sigma_a \sqsubseteq \sigma'_a$, $\sigma_b \sqsubseteq \sigma'_b$ and $\langle \sigma'_a, \vec{c}_a \rangle \stackrel{*}{\to} \langle \sigma'_b, \vec{c}_b \rangle$.
Definition 2.3 (Pre-terminals). Let $w = a_1 \cdots a_n \in \Sigma^*$. $PT_w(j, k)$ is defined if $1 \leq j \leq k \leq n$, in which case it is the skeletal form $\langle \langle A_j, A_{j+1}, \ldots, A_k \rangle, \langle C_j, C_{j+1}, \ldots, C_k \rangle \rangle$ where $\langle A_i, C_i \rangle \in \mathcal{L}(a_i)$ for $j \leq i \leq k$.
Definition 2.4 (Language). The language of a skeletal grammar $G$ is $L(G) = \{ w \in \Sigma^* \mid w = a_1 \cdots a_n \text{ and } \langle A^s \rangle \stackrel{*}{\Rightarrow} \langle \langle A_1, \ldots, A_n \rangle, \langle C_1, \ldots, C_n \rangle \rangle \}$, where $\langle A_i, C_i \rangle \in \mathcal{L}(a_i)$ for $1 \leq i \leq n$.
Definition 2.5 (Derivation trees). (also known as constituent structures, c-structures) Let $G = \langle R, \mathcal{L}, A^s \rangle$ be a skeletal grammar. A tree is a derivation tree admitted by $G$ iff:
• The root of the tree is the start symbol $A^s$;
• The internal vertices are extended categories (over the same features, atoms and categories as the grammar $G$);
• The leaves are pre-terminals of length $1$;
• If a vertex $\langle A, c \rangle$ has $k$ descendants, $\langle B_1, c_1 \rangle, \langle B_2, c_2 \rangle, \ldots, \langle B_k, c_k \rangle$, then $\langle \langle A \rangle, \langle c \rangle \rangle$ immediately derives $\langle \langle B_1, \ldots, B_k \rangle, \langle c_1, \ldots, c_k \rangle \rangle$ with respect to some rule $\langle \rho, \vec{c}_R \rangle \in R$.
Definition 2.6. A general unification grammar (over FEATS and ATOMS) is a tuple $G = \langle R, \mathcal{L}, A^s \rangle$ where $R$ is a finite set of rules, each of which is an MRS of length $n \geq 1$; $\mathcal{L}$ is a lexicon, which associates with every terminal $a$ a finite set of FSs $\mathcal{L}(a)$; $A^s$ is the start symbol (an FS).
General unification grammar formalisms do not assume the existence of a context-free backbone. Derivations, pre-terminals, languages and derivation trees for general unification grammars are defined similarly to skeletal grammars, ignoring all categories.
3 Off-line-parsability constraints
It is well known that unification based grammar formalisms are Turing-equivalent in their generative capacity (Pereira and Warren, 1983; Johnson, 1988, 87-93); determining whether a given string $w$ is generated by a given grammar $G$ is equivalent to deciding whether a Turing machine $M$ halts on an empty input, which is known to be undecidable. Therefore, the recognition problem is undecidable in the general case. However, for grammars that satisfy a certain restriction, called the off-line parsability constraint (OLP), decidability of the recognition problem is guaranteed. In this section we present some different variants of the OLP constraint suggested in the literature. Some of the constraints (Pereira and Warren, 1983; Kaplan and Bresnan, 1982; Johnson, 1988; Kuhn, 1999) apply only to skeletal grammars since they use the term category, which is not well defined for general unification grammars. Others (Haas, 1989; Shieber, 1992; Torenvliet and Trautwein, 1995; Wintner and Francez, 1999) are applicable to both skeletal and general unification grammars.
Some of the constraints impose a restriction on allowable derivation trees, but provide no explicit definition of an OLP grammar. Such a definition can be understood in (at least) two manners:
Definition 3.1 (OLP grammar).
1. A grammar $G$ is OLP iff for every $w \in L(G)$ every derivation tree for $w$ satisfies the OLP constraint.
2. A grammar $G$ is OLP iff for every $w \in L(G)$ there exists a derivation tree which satisfies the OLP constraint.
We begin the discussion with OLP constraints for skeletal grammars. One of the first definitions was suggested by Pereira and Warren (1983). Their constraint was designed for DCGs (a skeletal unification grammar formalism which assumes an explicit context-free backbone) for guaranteeing termination of general proof procedures of definite clause sets. Rephrased in terms of skeletal grammars, the definition is as follows:
Definition 3.2 (Pereira and Warren's OLP for skeletal grammars ($OLP_{PW}$)). A grammar is off-line parsable iff its context-free skeleton is not infinitely ambiguous.
The context-free skeleton is obtained by ignoring all FSs of the grammar rules and considering only the categories. In Jaeger et al. (2002) we prove that the depth of every derivation tree generated by a grammar whose context-free skeleton is finitely ambiguous is bounded by the number of syntactic categories times the size of its yield, therefore the recognition problem is decidable.
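Whether a context-free skeleton is infinitely ambiguous can be tested mechanically. The sketch below is not taken from the paper; it relies on the standard characterisation that a context-free grammar is infinitely ambiguous iff some useful (reachable and productive) nonterminal derives itself in one or more steps. The rule representation and the function name are assumptions made for this example.

```python
def is_infinitely_ambiguous(rules, start, terminals):
    """rules: list of (lhs, rhs); rhs is a tuple of symbols, () for epsilon."""
    nonterminals = {lhs for lhs, _ in rules}

    def closure(ok):
        # Least set of nonterminals with a rule whose symbols all satisfy ok.
        found, changed = set(), True
        while changed:
            changed = False
            for lhs, rhs in rules:
                if lhs not in found and all(ok(s, found) for s in rhs):
                    found.add(lhs)
                    changed = True
        return found

    productive = closure(lambda s, found: s in terminals or s in found)
    nullable = closure(lambda s, found: s in found)
    # Rules that can take part in a complete derivation.
    usable = [(lhs, rhs) for lhs, rhs in rules
              if all(s in terminals or s in productive for s in rhs)]

    # Nonterminals reachable from the start symbol through usable rules.
    reachable, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for lhs, rhs in usable:
            if lhs == a:
                for s in rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        stack.append(s)

    # Edge A -> B: A derives B in one step, all siblings of B deriving epsilon.
    edges = {a: set() for a in nonterminals}
    for lhs, rhs in usable:
        for i, s in enumerate(rhs):
            siblings = rhs[:i] + rhs[i + 1:]
            if s in nonterminals and all(x in nullable for x in siblings):
                edges[lhs].add(s)

    def derives_itself(a):
        seen, todo = set(), list(edges[a])
        while todo:
            b = todo.pop()
            if b == a:
                return True
            if b not in seen:
                seen.add(b)
                todo.extend(edges[b])
        return False

    return any(derives_itself(a) for a in reachable if a in productive)
```

Under this characterisation a skeleton with a unit cycle such as `S -> S` is rejected, while right-recursive rules such as `S -> a S` are accepted because each step consumes a terminal.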
Kaplan and Bresnan (1982) suggested a linguistically motivated OLP constraint which refers to valid derivations for the lexical functional grammar formalism (LFG), a skeletal grammar formalism. They impose constraints on two kinds of $\epsilon$'s, optionality and controlled $\epsilon$'s, but as these terms are not formally defined, we use a variant of their constraint, suggested by Johnson (1988, 95-97), eliminating all $\epsilon$'s of any kind.
Definition 3.3 (Johnson's OLP ($OLP_J$)). A constituent structure satisfies the off-line parsability constraint iff it does not include a non-branching dominance chain in which the same category appears twice and the empty string $\epsilon$ does not appear as a lexical form annotation of any (terminal) node.
This constraint bounds the depth of any OLP
derivation tree by a linear function of the size of its
yield, thus ensuring decidability of the recognition
problem.
Johnson's definition is a restriction on allowable c-structures rather than on the grammar itself. We use definition 3.1 for $OLP_J$ grammars and refer only to its second part since it is less restrictive. The next definition is also based on Kaplan and Bresnan's constraint and also deals only with OLP derivations. OLP grammar definitions are according to definition 3.1.
X-bar theory grammars (Chomsky, 1975) have a strong linguistic justification in describing natural languages. Unfortunately neither Kaplan and Bresnan's nor Johnson's constraints allow such grammars, since they do not allow derivation trees in which the same category appears twice in a non-branching dominance chain. Kuhn (1999) refers to the problem from a linguist's point of view. The purpose of his constraint was to expand the class of grammars which satisfy Kaplan and Bresnan's constraint in order to allow X-bar derivations. Again, since there exists no formal definition of the different kinds of $\epsilon$'s we assume that $\epsilon$ does not represent a lexical item (no $\epsilon$-rules).
Definition 3.4 (Kuhn's OLP ($OLP_K$)). A c-structure derivation is valid iff no category appears twice in a non-branching dominance chain with the same f-annotation.
Kuhn (1999) gives some examples of X-bar the-
ory derivation trees of German and Italian sen-
tences which contain the same category twice in a
non-branching dominance chain with a different f-
annotation. Therefore they are invalid OLP deriva-
tion trees (by both Kaplan and Bresnan’s and John-
son’s constraints), but they satisfy Kuhn’s OLP con-
straint.
According to Kuhn (1999), "The Off-line parsability condition is a restriction on allowable c-structures excluding that for a given string, infinitely many c-structure analyses are possible". In other words, Kuhn assumes that OLP is, in fact, a condition that is intended to guarantee finite ambiguity. Kuhn's definition may allow X-bar derivations, but it does not ensure finite ambiguity. The following grammar is an LFG grammar generating c-structures in which the same category appears twice in a non-branching dominance chain only with a different f-annotation, therefore it satisfies Kuhn's definition of OLP. But the grammar is infinitely ambiguous:
$P \to P$, where the daughter $P$ is annotated $(\uparrow subj) = \downarrow$
$\mathcal{L}(b) = \{ \langle P, (\uparrow pred) = \text{'b'} \rangle \}$

$P \quad [subj\ [subj \ldots [pred\ \text{'b'}] \ldots ]]$
$|$
$P \quad [subj \ldots [pred\ \text{'b'}] \ldots ]$
$\vdots$
$P \quad [pred\ \text{'b'}]$
Therefore, it is not clear whether the condition guarantees parsing termination or decidability of the recognition problem, and we exclude Kuhn's definition from further analysis.
The following definitions are applicable to both skeletal and general unification grammars. The first constraint was suggested by Haas (1989). Based on the fact that not every natural unification grammar has an obvious context-free backbone, Haas suggested a constraint for guaranteeing solvability of the parsing problem which is applicable to all unification grammar formalisms.
Haas' definition of a derivation tree is slightly different from the definition given above (definition 2.5). He allows derivation trees with non-terminals at their leaves, therefore a tree may represent a partial derivation.
Definition 3.5 (Haas' Depth-boundedness (DB)). A unification grammar is depth-bounded iff for every $L > 0$ there is a $D > 0$ such that every parse tree for a sentential form of $L$ symbols has depth less than $D$.
According to Haas (1989), "a depth-bounded grammar cannot build an unbounded amount of tree structure from a bounded number of symbols". Therefore, for each sentential form of length $n$ there exists a finite number of partial derivation trees, guaranteeing parsing termination.
The $OLP_{PW}$ definition applies only to skeletal grammars, since general unification grammars do not necessarily yield an explicit context-free skeleton. But the definition can be extended for all unification grammar formalisms:
Definition 3.6 (Finite ambiguity for unification grammars (FA)). A unification grammar $G$ is OLP iff for every string $w$ there exists a finite number of derivation trees.
Shieber's OLP definition (Shieber, 1992, 79-82) is defined in terms of logical constraint based grammar formalisms. His constraint is defined in logical terms, such as models and operations on models. We reformulate the definition in terms of FSs.
Definition 3.7 (Shieber's OLP ($OLP_S$)). A grammar $G$ is off-line parsable iff there exists a finite-ranged function $F$ on FSs such that $F(A) \sqsubseteq A$ for all $A$ and there are no derivation trees admitted by $G$ in which a node $\langle A \rangle$ dominates a node $\langle B \rangle$, both are roots of sub-trees with an identical yield and $F(A) = F(B)$.
The constraint is intended to bound the depth of every derivation tree by the range of $F$ times the size of its yield. Thus the recognition problem is decidable.
Johnson's OLP constraint is too restrictive, since it excludes all repetitive unary branching chains and $\epsilon$-rules; furthermore, it is applicable only to skeletal grammars. Therefore, Torenvliet and Trautwein (1995) have suggested a more liberal constraint, which is applicable to all unification grammar formalisms.
Definition 3.8 (Honest parsability constraint (HP)). A grammar $G$ satisfies the Honest Parsability Constraint (HPC) iff there exists a polynomial $p$ s.t. for each $w \in L(G)$ there exists a derivation with at most $p(|w|)$ steps.
The definition guarantees that for every string of the grammar's language there exists at least one polynomial depth (in the size of the derived string) derivation tree. Furthermore, the definition allows X-bar theory derivation trees, since a category may appear twice in a non-branching dominance chain as long as the depth of the tree is bounded by a polynomial function of its yield.
4 OLP Analysis
In this section we first give some grammar examples and mention their OLP properties, then compare the different variants of OLP definitions using these examples. The examples use a straightforward encoding of lists as FSs, where an empty list is denoted by $\langle \rangle$, and $\langle head \mid tail \rangle$ represents a list whose first item is $head$, followed by $tail$.
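One way to picture the list encoding is as nested feature structures with one feature for the first item and one for the remainder. The sketch below is purely illustrative: the feature names `FIRST` and `REST` and the atom `elist` are hypothetical choices, since the text does not spell out the encoding it uses.

```python
EMPTY = "elist"  # hypothetical atom standing for the empty list


def encode_list(items):
    """Encode a Python list as a nested FS (dicts with FIRST/REST features)."""
    fs = EMPTY
    for item in reversed(items):
        fs = {"FIRST": item, "REST": fs}
    return fs


def decode_list(fs):
    """Recover the Python list from its FS encoding."""
    items = []
    while fs != EMPTY:
        items.append(fs["FIRST"])
        fs = fs["REST"]
    return items
```

With such an encoding, a list with a given first item and a shared remainder is simply an FS whose `REST` value is reentrant with another list-valued feature, which is what the example grammars exploit to make lists grow by one item per rule application.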
Figure 1 lists an example unification grammar generating the language $\{b\}^+$. A string of $N$ occurrences of $b$ has exactly one parse tree and its depth is $2N$. Therefore, $G_1$ is FA and HP. $G_1$ is neither DB nor $OLP_S$; it may generate arbitrarily deep derivation trees (containing lists of increasing length) whose frontier consists of only one symbol, and thus there exists no finite-ranged function mapping each FS on such a derivation to a finite set of FSs.
$A^s = [\mathrm{CAT}: s,\ \mathrm{WORD}: \langle \mathrm{s} \rangle]$
$R = \left\{ \begin{array}{l}
[\mathrm{CAT}: s,\ \mathrm{WORD}: \langle \mathrm{s} \rangle] \to [\mathrm{CAT}: p,\ \mathrm{WORD}: \langle \mathrm{t} \rangle] \\
[\mathrm{CAT}: p,\ \mathrm{WORD}: \boxed{1}] \to [\mathrm{CAT}: p,\ \mathrm{WORD}: \langle \mathrm{t} \mid \boxed{1} \rangle] \\
[\mathrm{CAT}: p,\ \mathrm{WORD}: \boxed{1}] \to [\mathrm{CAT}: q,\ \mathrm{WORD}: \boxed{1}] \\
[\mathrm{CAT}: q,\ \mathrm{WORD}: \langle \mathrm{t} \mid \boxed{1} \rangle] \to [\mathrm{CAT}: q,\ \mathrm{WORD}: \boxed{1}]\ \ [\mathrm{CAT}: q,\ \mathrm{WORD}: \langle \mathrm{t} \rangle]
\end{array} \right\}$
$\mathcal{L}(b) = \{ [\mathrm{CAT}: q,\ \mathrm{WORD}: \langle \mathrm{t} \rangle] \}$
Figure 1: A unification grammar, $G_1$, $L(G_1) = \{b\}^+$.
Figure 2 lists an example unification grammar generating the language $\{b\}$. There exist infinitely many derivation trees, of arbitrary depths, for the string $b$, therefore, $G_2$ is neither DB nor FA nor $OLP_S$. $G_2$ is HP; there exists a derivation tree for $b$ of depth $2$.
$A^s = [\mathrm{CAT}: s,\ \mathrm{WORD}: \langle \mathrm{s} \rangle]$
$R = \left\{ \begin{array}{l}
[\mathrm{CAT}: s,\ \mathrm{WORD}: \langle \mathrm{s} \rangle] \to [\mathrm{CAT}: p,\ \mathrm{WORD}: \langle \mathrm{t} \rangle] \\
[\mathrm{CAT}: p,\ \mathrm{WORD}: \boxed{1}] \to [\mathrm{CAT}: p,\ \mathrm{WORD}: \langle \mathrm{t} \mid \boxed{1} \rangle] \\
[\mathrm{CAT}: p,\ \mathrm{WORD}: \boxed{1}] \to [\mathrm{CAT}: q,\ \mathrm{WORD}: \boxed{1}] \\
[\mathrm{CAT}: q,\ \mathrm{WORD}: \langle \mathrm{t} \mid \boxed{1} \rangle] \to [\mathrm{CAT}: q,\ \mathrm{WORD}: \boxed{1}]
\end{array} \right\}$
$\mathcal{L}(b) = \{ [\mathrm{CAT}: q,\ \mathrm{WORD}: \langle \mathrm{t} \rangle] \}$
Figure 2: A unification grammar, $G_2$, $L(G_2) = \{b\}$.
Figure 3 lists an example unification grammar generating the language $\{b\}^+$. A string of $N$ occurrences of $b$ has exactly one parse tree. The feature DEPTH represents the current depth of the derivation tree; at each derivation step an item is added to the DEPTH list. The feature TEMP represents the number of derivation steps before generating the next $b$ symbol. Every application of the second rule doubles the length of the TEMP list (with respect to its length after the previous application of the rule). Thus the number of derivation steps for generating each $b$ is always twice the number of steps for generating its predecessor, and for every sentential form of length $N$ any partial derivation tree's depth is bounded by an exponential function of $N$ (approximately $2^N$). Therefore $G_3$ is FA and DB but neither $OLP_S$ nor HP.
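The exponential growth described for the third example grammar can be reproduced by tracking only the lengths of the DEPTH and TEMP lists. The sketch below rests on an assumed reading of the rules, stated in the comments: DEPTH grows by one item per step, the rule that introduces a new terminal copies the current DEPTH length into TEMP, and a further rule shortens TEMP by one item until it is empty. The function name is hypothetical.

```python
def derivation_steps(n):
    """Count derivation steps needed to yield n terminals (n >= 1).

    Assumed reading of the grammar:
      - an initial step sets DEPTH to length 1 and TEMP to length 0;
      - when TEMP is empty, a branching step emits one terminal and sets
        TEMP to the current DEPTH length;
      - otherwise a unary step removes one item from TEMP;
      - a final step rewrites the remaining node to the last terminal;
      - every step adds one item to DEPTH.
    """
    if n < 1:
        raise ValueError("the language contains no empty string")
    depth_len, temp_len, steps = 1, 0, 1  # state after the initial step
    for _ in range(n - 1):
        # Branching step: emit a terminal, reset TEMP to the DEPTH length.
        temp_len = depth_len
        depth_len += 1
        steps += 1
        # Unary steps: count TEMP down to the empty list.
        while temp_len > 0:
            temp_len -= 1
            depth_len += 1
            steps += 1
    return steps + 1  # final step producing the last terminal
```

Under these assumptions the counts for $N = 1, 2, 3, 4$ are 2, 4, 8 and 16, so no polynomial bounds the number of steps, while the tree for each string remains unique.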
$A^s = [\mathrm{CAT}: s,\ \mathrm{DEPTH}: \langle \rangle,\ \mathrm{TEMP}: \langle \rangle]$
$R = \left\{ \begin{array}{l}
[\mathrm{CAT}: s,\ \mathrm{DEPTH}: \langle \rangle,\ \mathrm{TEMP}: \langle \rangle] \to [\mathrm{CAT}: p,\ \mathrm{DEPTH}: \langle \mathrm{t} \rangle,\ \mathrm{TEMP}: \langle \rangle] \\
[\mathrm{CAT}: p,\ \mathrm{DEPTH}: \boxed{1},\ \mathrm{TEMP}: \langle \rangle] \to [\mathrm{CAT}: p,\ \mathrm{DEPTH}: \langle \mathrm{t} \mid \boxed{1} \rangle,\ \mathrm{TEMP}: \boxed{1}]\ \ [\mathrm{CAT}: v,\ \mathrm{LEX}: \mathrm{t}] \\
[\mathrm{CAT}: p,\ \mathrm{DEPTH}: \boxed{1},\ \mathrm{TEMP}: \langle \mathrm{t} \mid \boxed{2} \rangle] \to [\mathrm{CAT}: p,\ \mathrm{DEPTH}: \langle \mathrm{t} \mid \boxed{1} \rangle,\ \mathrm{TEMP}: \boxed{2}] \\
[\mathrm{CAT}: p,\ \mathrm{DEPTH}: [\,],\ \mathrm{TEMP}: \langle \rangle] \to [\mathrm{CAT}: v,\ \mathrm{LEX}: \mathrm{t}]
\end{array} \right\}$
$\mathcal{L}(b) = \{ [\mathrm{CAT}: v,\ \mathrm{LEX}: \mathrm{t}] \}$
Figure 3: A unification grammar, $G_3$, $L(G_3) = \{b\}^+$.
Inter-relations among the OLP definitions
Below we make a comparison of all given OLP definitions; such relationships were not investigated in the past. We begin by considering skeletal grammars.
Johnson's condition is the only one omitting all $\epsilon$'s, thus none of the others implies $OLP_J$.
$OLP_J \Rightarrow HP$: The depth of any $OLP_J$ derivation tree is bounded by a linear function of its yield, therefore for every string there exists a derivation tree of at most a polynomial depth, and an $OLP_J$ grammar is HP.
$OLP_J \not\Rightarrow OLP_{PW}, DB, FA, OLP_S$: The grammar of figure 2 is an $OLP_J$ grammar (viewing CAT as the category) but it does not satisfy the other constraints.
$OLP_{PW} \Rightarrow DB, FA, OLP_S, HP$: By Jaeger et al. (2002), the depth of any derivation tree (partial/non-partial) admitted by an $OLP_{PW}$ grammar is bounded by a linear function of the size of its yield, thus an $OLP_{PW}$ grammar satisfies all the other constraints. A grammar satisfying the constraints may still have an infinitely ambiguous context-free backbone.
We continue the analysis by comparing the definitions which are applicable to general unification grammars.
$DB \Rightarrow FA$: A DB grammar is also FA; it can only generate derivation trees whose depth is bounded by a function of their yield, and there exist only a finite number of derivation trees up to a certain depth. By figure 1, an FA grammar is not necessarily DB.
$DB \nLeftrightarrow OLP_S$: None of the conditions implies the other. The grammar of figure 3 is DB but not $OLP_S$. A grammar whose language consists of only one word, and whose derivation is of a constant depth, may still contain a redundant rule generating arbitrarily deep trees whose frontier is of length $1$. Thus it is $OLP_S$ but not DB.
$DB, FA \nLeftrightarrow HP$: DB means that the depth of every derivation tree is bounded by some function of its yield. HP means that for every string there exists at least one derivation tree whose depth is polynomial in the size of its yield. The grammar of figure 3 is DB and FA, but since every derivation tree's depth is exponential in the size of its yield, it is not HP. The grammar of figure 2 is HP, but since it is infinitely ambiguous, it is neither FA nor DB.
$FA, HP \Leftarrow OLP_S$: The depth of any derivation tree admitted by an $OLP_S$ grammar is bounded by a linear function of its yield. Thus an $OLP_S$ grammar is FA and HP. By figure 1, an FA and HP grammar is not necessarily $OLP_S$.
Figure 4 depicts the inter-relations hierarchy diagram of the OLP definitions, separated for skeletal and general unification grammars. The arrows represent the implications discussed above.
[Diagram. Skeletal grammars: nodes DB, FA, HP, $OLP_{PW}$, $OLP_S$, $OLP_J$. General unification grammars: nodes DB, FA, HP, $OLP_S$.]
Figure 4: Inter-relations Hierarchy diagram.
5 Undecidability proofs
For the definitions which are applicable only to skeletal grammars it is easy to verify whether a grammar satisfies the constraint. The definitions that apply to arbitrary unification grammars are harder to check. In this section we give sketches of proofs of undecidability of three of the OLP definitions: Finite Ambiguity (FA), Depth-Boundedness (DB) and Shieber's OLP ($OLP_S$).
Theorem 1. Finite ambiguity is undecidable.
Proof sketch. In order to show that finite ambiguity is undecidable, we use a reduction from the membership problem, which is known to be undecidable (Johnson, 1988). We assume that there exists an algorithm, $A_{fa}$, for deciding FA and show how it can be used to decide whether $w \in L(G)$.
Given a string $w$ and a grammar $G$, construct $G'$ by adding the rule $A^s \to A^s$ to $G$'s set of rules. Apply $A_{fa}$ to $G'$: $G'$ is FA on $w$ iff $w \notin L(G)$. If $w \in L(G)$ then $w \in L(G')$, therefore by applying the rule $A^s \to A^s$ any number of times, there exists an infinite number of derivation trees for $w$ admitted by $G'$. If $w \notin L(G)$ then $w \notin L(G')$, no application of the additional rule would generate any derivation tree for $w$, and $G'$ is finitely ambiguous.
Since the membership problem is undecidable, it is undecidable whether there exists only a finite number of derivation trees for a string $w$ admitted by $G$. Hence finite ambiguity is undecidable.
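The shape of this reduction can be written down as a short program. The sketch below is schematic: the `Grammar` container, the rule representation and the oracle parameter `is_finitely_ambiguous_on` are all hypothetical, and by the theorem no total, correct implementation of the oracle can exist. The code only shows how such an oracle would be turned into a membership test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    rules: frozenset   # each rule is a tuple (head, body_1, ..., body_n)
    lexicon: tuple     # pairs (terminal, lexical entry)
    start: object      # the start symbol


def add_start_loop(grammar):
    """Build G' by adding the unit rule start -> start to the rules of G."""
    loop = (grammar.start, grammar.start)
    return Grammar(grammar.rules | {loop}, grammar.lexicon, grammar.start)


def decide_membership(grammar, word, is_finitely_ambiguous_on):
    """Decide whether word is in L(G), given a (hypothetical) FA oracle.

    G' has infinitely many trees for word exactly when G has at least one,
    so word is in L(G) iff G' is not finitely ambiguous on word.
    """
    return not is_finitely_ambiguous_on(add_start_loop(grammar), word)
```

Since `decide_membership` would solve the undecidable membership problem, the oracle it depends on cannot be computable, which is the content of the theorem.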
Theorem 2. Depth-boundedness is undecidable.
Proof sketch. In order to prove undecidability of depth-boundedness, we use a reduction from the Turing machines halting problem, which is known to be undecidable (Hopcroft and Ullman, 1979, 183-185). We assume that there exists an algorithm, $A_{db}$, for deciding DB and show how it can be used to decide whether a Turing machine $M$ terminates on the empty input $\epsilon$.
Johnson (1988) suggested a transformation from the Turing machines halting problem to unification grammars. The transformation generates a grammar, $G_M$, which consists of unit-rules only, and can generate at most one complete derivation tree. Assume the existence of an algorithm $A_{db}$. Apply $A_{db}$ to $G_M$. If $G_M$ is DB then the grammar generates a complete derivation tree, therefore its language is non-empty and $M$ terminates on the empty input. Otherwise, $L(G_M) = \emptyset$ and $M$ does not terminate on the empty input. Thus, we have decided the Turing machines halting problem.
Theorem 3. $OLP_S$ is undecidable.
Proof sketch. In order to prove undecidability of $OLP_S$, we use a combination of the undecidability proofs of DB and FA. Given a Turing machine $M$, construct $G_M$ using Johnson's reduction, then construct $G'_M$ by adding $A^s \to A^s$ to $G_M$. Assume the existence of an algorithm $A_S$, deciding $OLP_S$. $G'_M$ is $OLP_S$ iff $M$ does not terminate on the empty input. Thus, by applying $A_S$ to $G'_M$, we have decided the Turing machines halting problem.
6 Conclusions
In this paper we compare several variants of the OLP constraint for the first time. We give sketches of proofs of undecidability of three OLP conditions; full proofs, along with undecidability proofs of other conditions, are given in Jaeger et al. (2002). In Jaeger et al. (2002) we also give a novel OLP constraint as well as an algorithm for deciding whether a grammar satisfies it. The constraint is applicable to all unification grammar formalisms. It is more liberal than the existing constraints that are limited to skeletal grammars only, yet, unlike all definitions that are applicable to general unification grammars, it can be tested efficiently.
Acknowledgements
The work of Nissim Francez was partially funded
by the vice-president’s fund for the promotion of
research at the Technion. The work of Shuly Wint-
ner was supported by the Israeli Science Foundation
(grant no. 136/1).

References

Bob Carpenter. 1992. The Logic of Typed Fea-
ture Structures. Cambridge Tracts in Theoretical
Computer Science. Cambridge University Press.

Noam Chomsky. 1975. Remarks on nominalization. In Donald Davidson and Gilbert H. Harman, editors, The Logic of Grammar, pages 262-289. Dickenson Publishing Co., Encino, California.

Nissim Francez and Shuly Wintner. In preparation. Feature structure based linguistic formalisms.

Andrew Haas. 1989. A parsing algorithm for unification grammar. Computational Linguistics, 15(4):219-232.

J. Hopcroft and J. Ullman. 1979. Introduction to Automata Theory, Languages and Computation.

Efrat Jaeger, Nissim Francez, and Shuly Wintner. 2002. Unification grammars and off-line parsability. Technical report, Technion, Israel Institute of Technology.

Mark Johnson. 1988. Attribute-Value Logic and the
Theory of Grammar. CSLI Lecture Notes. CSLI.

Ronald M. Kaplan and Joan Bresnan. 1982.
Lexical-functional grammar: A formal system
for grammatical representation. The MIT Press,
page 266.

Jonas Kuhn. 1999. Towards a simple architecture
for the structure-function mapping. Proceedings
of the LFG99 Conference.

Fernando C. N. Pereira and David H. D. Warren.
1983. Parsing as deduction. Proceedings of ACL
- 21.

Stuart M. Shieber. 1992. Constraint-based gram-
mar formalisms. MIT Press.

Leen Torenvliet and Marten Trautwein. 1995. A
note on the complexity of restricted attribute-
value grammars. ILLC Research Report and
Technical Notes Series CT-95-02, University of
Amsterdam, Amsterdam.

Shuly Wintner and Nissim Francez. 1999. Off-line
parsability and the well-foundedness of subsump-
tion. Journal of Logic, Language and Informa-
tion, 8(1):1-16, January.
