Unification of Disjunctive Feature Descriptions 
Andreas Eisele, Jochen Dörre
Institut für Maschinelle Sprachverarbeitung
Universität Stuttgart
Keplerstr. 17, 7000 Stuttgart 1, West Germany
Netmail: ims@rusvx2.rus.uni-stuttgart.dbp.de
Abstract 
The paper describes a new implementation of 
feature structures containing disjunctive values, 
which can be characterized by the following main 
points: Local representation of embedded dis- 
junctions, avoidance of expansion to disjunctive 
normal form and of repeated test-unifications for 
checking consistency. The method is based on a
modification of Kasper and Rounds' calculus of
feature descriptions, so its correctness
is easy to see. It can handle cyclic structures and
has been incorporated successfully into an envi- 
ronment for grammar development. 
1 Motivation 
In current research in computational linguistics,
but also in extralinguistic fields, unification has
turned out to be a central operation in the modelling
of data types or knowledge in general.
Among linguistic formalisms and theories which 
are based on the unification paradigm are such 
different theories as FUG \[Kay 79,Kay 85\], LFG 
\[Kaplan/Bresnan 82\], GPSG \[Gazdar et al. 85\],
CUG \[Uszkoreit 86\]. However, research in unification
is also relevant for fields like logic programming,
theorem proving, knowledge representation
(see \[Smolka/Ait-Kaci 87\] for multiple inheritance 
hierarchies using unification), programming lan- 
guage design \[Ait-Kaci/Nasr 86\] and others. 
The version of unification our work is based on 
is graph unification, which is an extension of term 
unification. In graph unification the number of 
arguments is free and arguments are selected by 
attribute labels rather than by position. The al- 
gorithm described here may easily be modified to 
apply to term unification. 
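As a point of reference, the following minimal Python sketch shows graph unification in its simplest form, where arguments are selected by attribute label rather than by position. It assumes a simplified encoding that is not taken from the paper: complex structures are dicts, atoms are strings, and neither reentrancy nor disjunction is represented. The name unify_fs is hypothetical.

```python
def unify_fs(x, y):
    """Unify two feature structures; return None if they clash.

    Complex structures are dicts from attribute labels to values,
    atomic values are strings, and {} carries no information.
    """
    if x == {}:
        return y
    if y == {}:
        return x
    if isinstance(x, dict) and isinstance(y, dict):
        result = dict(x)
        for label, value in y.items():
            if label in result:
                sub = unify_fs(result[label], value)
                if sub is None:
                    return None        # clash somewhere below this label
                result[label] = sub
            else:
                result[label] = value  # label known to one side only
        return result
    return x if x == y else None       # atoms unify only with themselves


noun = {"agr": {"num": "sg"}, "case": "nom"}
print(unify_fs(noun, {"agr": {"pers": "3"}}))
print(unify_fs(noun, {"agr": {"num": "pl"}}))
```

The number of attributes on each side is free, and attributes that occur on one side only are simply carried over, which is the main difference from positional term unification.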
The structures we are dealing with are rooted 
directed graphs where arcs starting in one node 
must carry distinct labels. Terminal nodes may 
also be labelled. These structures are referred to 
by various names in the literature: feature struc- 
tures, functional structures, functional descrip- 
tions, types, categories. We will call them feature 
structures 1 throughout this paper.
In applications other than toy applications, the
efficient processing of indefinite information that
is represented by disjunctive specifications becomes
a relevant factor. A strategy of multiplying
out disjunctions by trying (nearly) every combination
of disjuncts through backtracking, as is
done, e.g., in a simple DCG parser,
quickly runs into efficiency problems. On the other
hand, the descriptive power of disjunction often
helps to state highly ambiguous linguistic knowledge
clearly and concisely (see Fig. 1 for a disjunctive
description of morphological features for the
six readings of the German noun 'Koffer').
Koffer:
  morph: { [ agr:  [ num: sg, pers: 3 ]
             gend: masc
             case: {nom dat acc} ]
           [ agr:  [ num: pl, pers: 3 ]
             gend: masc
             case: {nom gen acc} ] }
  sem:   [ arg: [ ] ]
  ...
Figure 1: Using disjunction in the description of 
linguistic structures 
Kasper and Rounds \[86\] motivated the distinc- 
tion between feature structures and formulae of a 
logical calculus that are used to describe feature 
structures. Disjunction can be used within such 
a formula to describe sets of feature structures. 
With this separation the underlying mathematical 
framework which is used to define the semantics 
of the descriptions can be kept simple. 
1 We do not, as is frequently done, restrict ourselves to
acyclic structures.
2 Disjunctive Feature Descriptions
We use a slightly modified version of the formula
language FML of Kasper and Rounds \[86\] to describe
our feature structures. Fig. 2 gives the syntax
of FML', where A is the set of atoms and L the
set of labels.
FML' contains:
  NIL
  TOP
  $a$                    where $a \in A$
  $l : \phi$             where $l \in L$, $\phi \in$ FML'
  $\phi \wedge \psi$     where $\phi, \psi \in$ FML'
  $\phi \vee \psi$       where $\phi, \psi \in$ FML'
  $\langle p \rangle$    where $p \in L^*$
Figure 2: Syntax of FML' 
In contrast to Kasper and Rounds \[86\] we do 
not use the syntactic construct of path equivalence 
classes. Instead, path equivalences are expressed 
using non-local path expressions (called pointers 
in the sequel). This choice is motivated by the 
fact that we use these pointers for an efficient
representation below, and we want to keep FML' as
simple as possible.
The intuitive semantics of FML' is as follows (see
\[Kasper/Rounds 86\] for formal definitions): 
1. NIL is satisfied by any feature structure. 
2. TOP is never satisfied. 
3. a is satisfied by the feature structure consisting 
only of a single node labelled a. 
4. $l : \phi$ requires a (sub-)structure under arc $l$ to
satisfy $\phi$.
5. $\phi \wedge \psi$ is satisfied by a feature structure that
satisfies $\phi$ and satisfies $\psi$.
6. $\phi \vee \psi$ is satisfied by a feature structure that
satisfies $\phi$ or satisfies $\psi$.
7. $\langle p \rangle$ requires a path equivalence (two paths
leading to the same node) between the path $p$
and the actual path relative to the top-level
structure. 2
The denotation of a formula $\phi$ is usually defined
as the set of minimal elements of SAT($\phi$) with
respect to subsumption 3, where SAT($\phi$) is the set
of feature structures which satisfy $\phi$.
2 This construct is context-sensitive in the sense that the
denotation of $\langle p \rangle$ may only be computed with respect to the
whole structure that the formula describes.
3 The subsumption relation $\sqsubseteq$ is a partial ordering on
feature structures inducing a semi-lattice. It may be defined
as: $FS_1 \sqsubseteq FS_2$ iff the set of formulae satisfied by $FS_2$
includes the set of formulae satisfied by $FS_1$.
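Example: The formula
$\phi = subj : agr : \langle agr \rangle \wedge case : (nom \vee acc)$
denotes two graphs. In both, the root has the arcs
subj, agr and case, and the arc agr below subj
leads to the same node as the arc agr at the root.
The value of case is nom in one graph and acc in
the other.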
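The intuitive semantics can be made concrete with a small satisfaction test. The following Python sketch is only an illustration under assumed encodings that do not come from the paper: formulae are nested tuples tagged "NIL", "TOP", "atom", "feat", "and", "or" and "ptr", and the names Node, follow and satisfies are hypothetical.

```python
class Node:
    """A feature-structure node: an atom, or a set of labelled arcs."""
    def __init__(self, atom=None, arcs=None):
        self.atom = atom
        self.arcs = arcs or {}

def follow(node, path):
    """Return the node reached from `node` via `path`, or None."""
    for label in path:
        node = node.arcs.get(label)
        if node is None:
            return None
    return node

def satisfies(node, formula, root=None):
    """Check the seven clauses of the intuitive semantics of FML'."""
    root = node if root is None else root
    kind = formula[0]
    if kind == "NIL":
        return True
    if kind == "TOP":
        return False
    if kind == "atom":
        return node.atom == formula[1] and not node.arcs
    if kind == "feat":                       # l : phi
        child = node.arcs.get(formula[1])
        return child is not None and satisfies(child, formula[2], root)
    if kind == "and":
        return (satisfies(node, formula[1], root)
                and satisfies(node, formula[2], root))
    if kind == "or":
        return (satisfies(node, formula[1], root)
                or satisfies(node, formula[2], root))
    if kind == "ptr":                        # same node as path p from the root
        return follow(root, formula[1]) is node
    raise ValueError(f"unknown formula: {formula!r}")

# The example formula: subj:agr:<agr> AND case:(nom OR acc)
phi = ("and",
       ("feat", "subj", ("feat", "agr", ("ptr", ("agr",)))),
       ("feat", "case", ("or", ("atom", "nom"), ("atom", "acc"))))

agr = Node()
shared = Node(arcs={"subj": Node(arcs={"agr": agr}), "agr": agr,
                    "case": Node(atom="acc")})
unshared = Node(arcs={"subj": Node(arcs={"agr": Node()}), "agr": Node(),
                      "case": Node(atom="nom")})
print(satisfies(shared, phi), satisfies(unshared, phi))
```

The pointer clause is the only one that needs the root of the structure, which reflects the context sensitivity noted in footnote 2.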
3 The Problem 
The unification problem for disjunctive feature de- 
scriptions can be stated as follows: 
Given two formulae that describe feature 
structures, find the set of feature struc- 
tures that satisfy both formulae, if it is 
nonempty, else announce 'fail'. 
The simplest way to deal with disjunction is
to rewrite any description into disjunctive normal
form (DNF). In the worst case, this transformation
requires time and space exponential in the number
of disjuncts in the initial formula.
Although the problem of unifying disjunctive
descriptions is known to be NP-complete (see
\[Kasper 87a\]), methods which avoid this transfor- 
mation may perform well in most practical cases. 
The key idea is to keep disjunction local and con- 
sider combinations of disjuncts only when they re- 
fer to the very same substructure. This strategy, 
however, is complicated by the fact that feature 
structures may be graphs with path equivalences 
and not only trees. Fig. 3 shows an example where 
unifying a disjunction with a structure containing 
reentrancy causes parts of the disjunction to be 
linked to other parts of the structure. The disjunction
is exported via this reentrancy. Hence,
the value of attribute d cannot be represented 
uniquely. It may be + or -, depending on which 
disjunct in attribute a is chosen. To represent this 
information without extra formal devices we have 
to lift the disjunction one level up. 4 
4 In this special case we still could keep the disjunction 
in the attribute a by inverting the pointer. A pointer $\langle a\ b \rangle$
underneath label d would allow us to specify the value of d 
dependent on the disjunction under a. 
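[Figure: a structure whose attribute a holds a disjunction
is unified with a structure in which the path a b is
reentrant with d; in the result, the disjunction appears
at the top level.]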
Figure 3: Lifting of disjunction due to reentrancy 
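The exponential cost of expansion to DNF mentioned at the beginning of this section is easy to reproduce. The sketch below is an illustration only, under assumed encodings that are not part of the paper: a conjunction is a dict from labels to values, a disjunction is a tuple ("or", d1, d2, ...), atoms are strings, and pointers are left out. The names dnf and atoms are hypothetical.

```python
from itertools import product

def dnf(f):
    """Expand f into the list of its disjunction-free alternatives."""
    if isinstance(f, tuple) and f and f[0] == "or":
        return [alt for disjunct in f[1:] for alt in dnf(disjunct)]
    if isinstance(f, dict):                   # conjunction of label:value pairs
        labels = list(f)
        choices = [dnf(f[label]) for label in labels]
        return [dict(zip(labels, combo)) for combo in product(*choices)]
    return [f]                                # atom

def atoms(f):
    """Count the atoms written down in f, as a crude size measure."""
    if isinstance(f, tuple):
        return sum(atoms(d) for d in f[1:])
    if isinstance(f, dict):
        return sum(atoms(v) for v in f.values())
    return 1

n = 10
local = {f"f{i}": ("or", "+", "-") for i in range(n)}   # n independent disjunctions
expanded = dnf(local)
print(atoms(local), len(expanded), sum(atoms(alt) for alt in expanded))
```

With ten independent binary disjunctions the local description mentions 20 atoms, while the expansion consists of 2^10 alternatives of ten atoms each.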
4 From Description to Efficient Representation
It is interesting to investigate whether FML' is
suitable as an encoding of feature structures, i.e. if it
can be used for computational purposes. 
However, this is clearly not the case for the un- 
restricted set of formulae of FML', since a given 
feature structure can be represented by infinitely 
many different formulae of arbitrary complexity 
and -- even worse -- because it is also not pos- 
sible to ascertain whether a given formula repre- 
sents any feature structure at all without extensive 
computation. 
On the other hand, the formulae of FML' have
some properties that are quite attractive for repre- 
senting feature structures, such as embedded and 
general disjunction and the possibility to make use 
of the law of distributivity for disjunctions. 
Therefore we have developed an efficiency-oriented
normal form ENF, which is suitable as an
efficient representation for sets of feature struc- 
tures. 
The formulae are built according to a restricted
syntax (Fig. 4, Part A) and have to satisfy condition
$C_{ENF}$ (Part B). The syntax restricts the use of
conjunction and TOP in order to disallow contradictory
information in a formula other than TOP.
However, even in a formula of the syntax of Part A,
inconsistency can be introduced by a pointer to a
location that is 'blocked' by an atomic value on a
higher level. For example, in the formula
$a : \langle b\ c \rangle \wedge b : d$ the path $\langle b\ c \rangle$ is blocked, since it would
require the value of attribute b to be complex, in
conflict with the atomic value d, thus rendering the
formula non-satisfiable. With the additional condition
$C_{ENF}$ such inconsistencies are excluded. Its
explanation in the next section is somewhat technical
and is not a prerequisite for the overall understanding
of our method.
A) Restricted syntax of ENF:
  NIL
  TOP
  $a$    where $a \in A$
  $l_1 : \phi_1 \wedge \dots \wedge l_n : \phi_n$    where $\phi_i \in$ ENF \ {TOP},
         $l_i \in L$, $l_i \neq l_j$ for $i \neq j$
  $\phi \vee \psi$    where $\phi, \psi \in$ ENF \ {TOP}
  $\langle p \rangle$    where $p \in L^*$
B) Additional condition $C_{ENF}$:
  If an instance $\psi$ of a formula $\phi$ contains a pointer
  $\langle p \rangle$, then the path $p$ must be realized in $\psi$.
Figure 4: A normal form to describe feature structures
efficiently
Condition $C_{ENF}$
First we have to introduce some terminology. 
Instance: When every disjunction in a formula 
is replaced by one of its disjuncts, the result is 
called an instance of that formula. 
Realized: A recursive definition of what we call
a realized path in an instance $\psi$ is given in Fig. 5.
The intuitive idea behind this notion is to restrict
pointers in such a way that the path to their destination
may not be blocked by the introduction
of an atomic value on a prefix of this path. Note
that by virtue of the second line of the definition,
the last label of the path does not have to actually
occur in the formula, if there are other labels.
  $\epsilon$ is realized in $\psi$, if $\psi \neq$ TOP
  $l \in L$ is realized in $l_1 : \psi_1 \wedge \dots \wedge l_n : \psi_n$ (even
    if $l \notin \{l_1, \dots, l_n\}$)
  $l \cdot p$ is realized in $\dots \wedge l : \psi \wedge \dots$, if $p$ is
    realized in $\psi$
  $p$ is realized in $\langle p' \rangle$, if $p' \cdot p$ is realized in
    the top-level formula
Figure 5: Definition of realized paths
Example: In $a : \langle b\ c \rangle$ only the path $\epsilon$ and each
path of length 1 is realized. Any longer path may
be blocked by the introduction of an atomic value
at level 1. Thus, the formula violates $C_{ENF}$.
$a : \langle b\ d \rangle \wedge b : \langle c \rangle \wedge c : (d : x \vee b : y)$, on the
other hand, is a well-formed ENF formula, since it
contains only pointers with realized destinations
in every disjunct.
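The definitions of instance and realized path can be checked mechanically on the two examples. The following Python sketch is a hedged illustration, not the authors' implementation. It assumes that conjunctions are dicts, atoms are strings, and that Ptr and Or are small helper classes introduced here. The guard against cyclic pointer chains in realized is an added assumption (the recursive definition is read inductively), and all function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

TOP = "TOP"   # reserved name; ordinary atoms are any other strings

@dataclass(frozen=True)
class Ptr:
    path: tuple          # destination, as a tuple of labels

@dataclass(frozen=True)
class Or:
    left: object
    right: object

def instances(f):
    """Yield every instance of f: each disjunction replaced by one disjunct."""
    if isinstance(f, Or):
        yield from instances(f.left)
        yield from instances(f.right)
    elif isinstance(f, dict):
        labels = list(f)
        for combo in product(*(list(instances(f[l])) for l in labels)):
            yield dict(zip(labels, combo))
    else:
        yield f

def pointers(f):
    """Yield all pointers in a disjunction-free formula."""
    if isinstance(f, Ptr):
        yield f
    elif isinstance(f, dict):
        for value in f.values():
            yield from pointers(value)

def realized(path, f, top, visiting=frozenset()):
    """Is `path` realized in the instance f (a subformula of `top`)?"""
    if not path:
        return f != TOP                            # line 1 of Figure 5
    if isinstance(f, Ptr):                         # line 4: continue at destination
        target = f.path + tuple(path)
        if target in visiting:                     # assumed guard for pointer cycles
            return False
        return realized(target, top, top, visiting | {target})
    if isinstance(f, dict):
        if len(path) == 1:
            return True                            # line 2: last label may be absent
        label, rest = path[0], path[1:]
        return label in f and realized(rest, f[label], top, visiting)  # line 3
    return False                                   # atoms realize only the empty path

def satisfies_c_enf(f):
    """Every pointer of every instance must have a realized destination."""
    return all(realized(p.path, inst, inst)
               for inst in instances(f) for p in pointers(inst))

bad = {"a": Ptr(("b", "c"))}
good = {"a": Ptr(("b", "d")), "b": Ptr(("c",)),
        "c": Or({"d": "x"}, {"b": "y"})}
print(satisfies_c_enf(bad), satisfies_c_enf(good))
```

The check enumerates all instances and is therefore exponential in the number of disjunctions; it is meant to clarify the definition, not to be used inside the unifier.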
The easiest way to satisfy the condition is to in- 
troduce for each pointer the value NIL at its des- 
tination when building up a formula. With this 
strategy we actually never have to check this con- 
dition, since it is maintained by the unification 
algorithm described below. 
Properties of ENF 
The most important properties of formulae in ENF
are:
• For each formula of FML' an equivalent formula
in ENF can be found.
• Each instance of a formula in ENF (besides
TOP) denotes exactly one feature structure.
• This feature structure can be computed in lin- 
ear time. 
The first property can be established by virtue of
the unification algorithm given in the next section,
which can be used to construct an equivalent ENF
formula for an arbitrary formula in FML'.
The next point says: It doesn't matter which
disjunct in one disjunction you choose; you cannot
get a contradiction. Disjunctions in ENF are
mutually independent. This also implies that TOP 
is the only formula in ENF that is not satisfiable. 
To see why this property holds, first consider for- 
mulae without pointers. Contradictory informa- 
tion (besides TOP) can only be stated using con- 
junction. But since we only allow conjunctions of 
different attributes, inconsistent information can- 
not be stated in formulae without pointers. 
Pointers could introduce two sorts of incon- 
sistencies: Since a pointer links two paths, one 
might assume that inconsistent information could 
be specified for them. But since conjunction with 
a pointer is not allowed, only the destination path 
can carry additional information, thus excluding 
this kind of inconsistency. On the other hand, 
pointers imply the existence of the paths they refer 
to. The condition $C_{ENF}$ ensures that no information
in the formula contradicts the introduction of
these implied paths. We can conclude that even 
formulae containing pointers are consistent. 
The condition $C_{ENF}$ additionally requires that
no extension of a formula, gained by unification 
with another formula, may contain such contra- 
dicting information. A unification algorithm thus 
can introduce an atomic value into a formula with- 
out having to check if it would block the destina- 
tion path of some pointer. 
5 The Unification Procedure 
Figure 6 shows an algorithm that takes as input
two terms representing formulae in ENF and
computes an ENF representation of their unification.
The representation of the formulae is given
by a one-to-one mapping between formulae and data
structures, so that we can abstract from the data
structures and write formulae instead. In this
sense, the logical connectives $\wedge$, $\vee$, $:$ are used as
term constructors that build more complex data
structures from simpler ones. In addition, we use
the operator $\cdot$ to express concatenation of labels
or label sequences and write $\langle p \rangle$ to express the
pointer to the location specified by the label sequence
$p$. $p : \phi$ is an abbreviation for a formula
where the subformula $\phi$ is embedded on path $p$.
The auxiliary function unify_aux performs the
essential work of the unification. It traverses both 
formulae in parallel and builds all encountered 
subformulae into the output formula. The follow- 
ing cases have to be considered: 
• If one of the input formulae specifies a
subformula at a location where the other input
provides no information or if both inputs con- 
tain the same subformula at a certain location, 
this subformula is built into the output with- 
out modification. 
• The next statement handles the case where one
input contains a pointer whereas the other contains
a different subformula. Since we regard
the destination of the pointer as the representative
of the equivalence class of paths, the subformula
has to be moved to that place. This
case requires additional discussion, so we have
moved it to the procedure move_formula.
• In case of two conjunctions the formulae have
to be traversed recursively and all resulting at- 
tribute - value pairs have to be built into the 
output structure. For clarity, this part of the 
algorithm has been moved to the procedure 
unify_complex. 
• The case where one of the input formulae
is a disjunction is handled in the procedure
unify_disj that is described in Section 5.2.
• If none of the previous cases matches (e.g. if
the inputs are different atoms or an atom and
a complex formula), a failure of the unification
has to be announced, which is done in the last
statement.
unify(X,Y) -> formula
  repeat
    (X,Y) := unify_aux(X,Y,ε)
  until Y = NIL or Y = TOP
  return(X)
unify_aux(A0,A1,Pa) -> (formula,formula)
  if A0 = A1 then
    return (A0,NIL)
  else if Ai = NIL then
    return (A1-i,NIL)
  else if Ai is the pointer <Pto> then
    return move_formula(A1-i,Pa,Pto)
  else if both Ai are conjunctions then
    return unify_complex(A0,A1,Pa)
  else if Ai is the disjunction (B ∨ C) then
    return unify_disj(A1-i,B,C,Pa)
  else return (TOP,TOP)
unify_complex(A0,A1,Pa) -> (formula,formula)
  L := ∧ l:v, where l:v occurs in one Ai
       and l does not occur in A1-i
  G := NIL
  for all l that appear in both Ai do
    let V0,V1 be the values of l in A0,A1
    (V,GV) := unify_aux(V0,V1,Pa·l)
    if V = TOP or GV = TOP then
      return (TOP,TOP)
    else L := L ∧ l:V
         G := unify(G,GV)
         if G = TOP then return (TOP,TOP)
  return (L,G)
Figure 6: The unification procedure
The most interesting case is the treatment of 
a pointer. The functional organization of the al- 
gorithm does not allow for side effects on remote 
parts of the top-level formula (nor would this be 
good programming style), so we had to find a different
way to move a subformula to the destination
of the pointer. For that reason, we have defined 
our procedures so that they return two results: a 
local result that has to be built into the output for- 
mula at the current location (i.e. the path both in- 
put formulae are embedded on) and a global result 
that is used to express 'side effects' of the uni- 
fication. This global result represents a formula 
that has to be unified with the top-level result in 
order to find a formula covering all information 
contained in the input. 
This global result is normally set to NIL, but the
procedure move_formula must of course produce
something different. For the time being, we can assume
the preliminary definition of move_formula
in Figure 7, which will be modified in the next
subsection. Here, the local result is the pointer 
(since we want to keep the information about the 
path equivalence), whereas the global result is a 
formula containing the subformula to be moved 
embedded at its new location. 
move_formula(F, Pfrom, Pto) -> (formula,formula)
  return (<Pto>, Pto:F)
Figure 7: Movement of a Subformula -- Prelimi- 
nary Version 
The function unify_complex unifies conjunctions
of label-value pairs by calling unify_aux recursively
and placing the local results of these unifications
at the appropriate locations. Labels that
appear only in one argument are built into the out- 
put without modification. If any of the recursive 
unifications fail, a failure has to be announced. 
The global results from recursive unifications are 
collected by top-level unification 5. The third ar- 
gument of unify_aux and unify_complex contains 
the sequence of labels to the actual location. It is 
not used in this version but is included in prepara- 
tion of the more sophisticated treatment of point- 
ers described below. 
To perform a top-level unification of two formulae,
the call to unify_aux is repeated in order to
unify the local and global results until either the 
unification fails or the global result is NIL. 
Before extending the algorithm to handle dis- 
junction, we will first concentrate on the question 
how the termination of this repeat-loop can be 
guaranteed. 
5.1 Avoiding Infinite Loops 
There are cases where the algorithm in Figure 6
will not terminate if the movement of subformulae
is defined as in Figure 7. Consider the unification
of $a : \langle b \rangle \wedge b : \langle a \rangle$ with $a : \phi$. Here, the formula $\phi$
will be moved along the pointers infinitely often
and the repeat-loop in unify will never terminate.
An algorithm that terminates for arbitrary input
must include precautions to avoid the introduction
of cyclic pointer chains or it has to recognize such
cycles and handle them in a special way.
5 If we allow the global result to be a list of formulae, this
recursion could be replaced by list concatenation. However,
this would imply modifications in the top-level loop and
would slightly complicate the treatment of disjunction.
When working with pointers, the standard tech- 
nique to avoid cycles is to follow pointer chains 
to their end and to install a new pointer only to 
a location that does not yet contain an outgoing 
pointer. For various reasons, dereferencing is not
the method of choice in the context of our treat- 
ment of disjunction (see \[Eisele 87\] for details). 
However, there are other ways to avoid cyclic
movements. A total order '<p' on all possible lo- 
cations (i.e. all paths) can be defined such that, if 
we allow movements only from greater to smaller 
locations, cycles can be avoided. A pointer from a 
greater to a smaller location in this order will be 
called a positive pointer, a pointer from a smaller 
to a greater location will be called negative. But
we have to be careful about choosing the right order;
not every order will prevent the algorithm from
entering an infinite loop.
For instance, it would not be adequate to move 
a formula along a pointer from a location $p$ to
its extension $p \cdot q$, since the pointer itself would
block the way to its destination. (The equivalence
class contains $p$, $p \cdot q$, $p \cdot q \cdot q$, ... and it makes
no sense to choose the last one as a representative.)
Since cyclic feature structures can be introduced 
inadvertently and should not lead to an infinite 
loop in the unification, the first condition the order 
'<p' has to fulfill is: 
$p <_p p \cdot q$  if  $q \neq \epsilon$
The order must be defined in a way that positive 
pointers can not lead to even indirect cycles. 
This is guaranteed if the condition 
$p <_p q \Rightarrow r \cdot p \cdot s <_p r \cdot q \cdot s$
holds for arbitrary paths p, q, r and s. 
We get an order with the required properties if 
we compare, in the first place, the length of the 
paths and use a lexicographic order $<_l$ for paths
of the same length. A formal statement of this 
definition is given in Figure 8. 
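The two conditions on the order can be checked exhaustively on a small set of paths. The sketch below assumes that paths are Python tuples of label strings and that the order on labels is ordinary string comparison; the function name less is hypothetical.

```python
from itertools import product

def less(p, q):
    """p <p q: shorter paths first, ties broken lexicographically."""
    return (len(p), p) < (len(q), q)

labels = ("a", "b")
paths = [tuple(c) for n in range(3) for c in product(labels, repeat=n)]

# First condition: a location is smaller than each of its proper extensions.
cond1 = all(less(p, p + q) for p in paths for q in paths if q)

# Second condition: the order is preserved when both paths are embedded
# in the same context r ... s.
cond2 = all(less(r + p + s, r + q + s)
            for p, q, r, s in product(paths, repeat=4) if less(p, q))
print(cond1, cond2)
```

Comparing lengths first is what makes the first condition hold; a purely lexicographic order would also satisfy it, but would violate the second condition as soon as the compared paths have different lengths.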
Note that positive pointers can turn into neg- 
ative ones when the structure containing them is 
moved, as the following example shows: 
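$a : b : c : d : \langle a\ b\ e \rangle \;\sqcup\; a : b : c : \langle f \rangle$    (both pointers positive)
$= a : b : c : \langle f \rangle \wedge f : d : \langle a\ b\ e \rangle$    ($\langle f \rangle$ positive, $\langle a\ b\ e \rangle$ negative)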
$p <_p q$  if $|p| < |q|$
  or if $|p| = |q|$, $p = r\,l_1\,s$, $q = r\,l_2\,t$,
     $r, s, t \in L^*$, $l_i \in L$, $l_1 <_l l_2$
Figure 8: An Order on Locations in a Formula 
However, we can be pragmatic about this point; 
the purpose of ordering is the avoidance of cyclic 
movements. Towards this end, we only have to 
avoid using negative pointers, not writing them 
down. 
To avoid movement along a negative pointer, 
we now make use of the actual location that is 
provided by the third argument of unify_aux and
unify_complex and as the second argument of
move_formula.
move_formula(F, Pfrom, Pto) -> (formula,formula)
  if Pto <p Pfrom then
    return (<Pto>, Pto:F)
  else if Pto = Pfrom then
    return (F, NIL)
  else return (F, Pto:<Pfrom>)
Figure 9: Movement of a Subformula -- Correct 
Version 
The definition of move_formula given in Figure 7
has to be replaced by the version given in
Figure 9. We distinguish three cases: 
• If the pointer is positive we proceed as usual. 
• If it points to the actual location, it can be 
ignored (i.e. treated as NIL). This case occurs, 
when the same path equivalence is stated more 
than once in the input. 
• If the pointer is negative, it is inverted by in- 
stalling at its destination a pointer to the ac- 
tual position. 
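The procedures of Figures 6 and 9 can be put together into a small runnable sketch for formulae without disjunction. This is an illustration under assumptions, not the authors' PROLOG implementation: conjunctions are Python dicts, atoms are strings, paths are tuples of labels, and the helpers _Const, Ptr, embed and less are names introduced here. The disjunction case of unify_aux is left out.

```python
from dataclasses import dataclass

class _Const:
    """Marker object for the two special formulae NIL and TOP."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

NIL, TOP = _Const("NIL"), _Const("TOP")

@dataclass(frozen=True)
class Ptr:
    path: tuple                      # destination path as a tuple of labels

def embed(path, f):
    """Build the formula path:f, i.e. f nested under the labels of path."""
    for label in reversed(path):
        f = {label: f}
    return f

def less(p, q):
    """The order of Figure 8: shorter paths first, then lexicographic."""
    return (len(p), p) < (len(q), q)

def move_formula(f, p_from, p_to):
    if less(p_to, p_from):           # positive pointer: move f to the destination
        return Ptr(p_to), embed(p_to, f)
    if p_to == p_from:               # pointer to the actual location: ignore it
        return f, NIL
    return f, embed(p_to, Ptr(p_from))   # negative pointer: invert it

def unify_aux(a0, a1, pa):
    if a0 == a1:
        return a0, NIL
    for a, other in ((a0, a1), (a1, a0)):
        if a is NIL:
            return other, NIL
    for a, other in ((a0, a1), (a1, a0)):
        if isinstance(a, Ptr):
            return move_formula(other, pa, a.path)
    if isinstance(a0, dict) and isinstance(a1, dict):
        return unify_complex(a0, a1, pa)
    return TOP, TOP                  # clash (disjunction is not covered here)

def unify_complex(a0, a1, pa):
    # Labels that occur on one side only are copied unchanged.
    local = {l: v for a, b in ((a0, a1), (a1, a0))
             for l, v in a.items() if l not in b}
    glob = NIL
    for l in a0:
        if l in a1:
            v, gv = unify_aux(a0[l], a1[l], pa + (l,))
            if v is TOP or gv is TOP:
                return TOP, TOP
            local[l] = v
            glob = unify(glob, gv)   # collect global results at top level
            if glob is TOP:
                return TOP, TOP
    return local, glob

def unify(x, y):
    while True:
        x, y = unify_aux(x, y, ())
        if y is NIL or y is TOP:
            return x

# The example of Section 5.1: a:b:c:d:<a b e> unified with a:b:c:<f>
moved = unify({"a": {"b": {"c": {"d": Ptr(("a", "b", "e"))}}}},
              {"a": {"b": {"c": Ptr(("f",))}}})
# The cyclic case a:<b> AND b:<a> unified with a:x now terminates.
cyclic = unify({"a": Ptr(("b",)), "b": Ptr(("a",))}, {"a": "x"})
print(moved, cyclic)
```

In the cyclic case the pointer at a is negative (its destination b is the greater location), so the atom stays at a and the pointer is inverted instead of being followed, which is why the repeat-loop stops after the second pass.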
5.2 Incorporating Disjunction 
The procedure unify_disj in Figure 10 has four
arguments: the formula to unify with the disjunc- 
tion (which also can be a disjunction), both dis- 
juncts, and the actual location. In the first two 
statements, the unifications of the formula A with 
the disjuncts B and C are performed indepen- 
dently. We can distinguish three main cases: 
• If one of the unifications fails, the result of the
other is returned without modification.
• If both unifications have no global effect or if
the global effects happen to result in the same
formula, a disjunction is returned as local result
and the common global result of both disjuncts
is taken as the global result for the disjunction.
• If both unifications have different global results,
we cannot return a disjunction as local
result, since remote parts of the resulting formula
depend on the choice of the disjunct at
the actual location. This case arises if one or
both disjuncts have outgoing pointers and if
one of these pointers has been actually used to
move a subformula to its destination.
unify_disj(A,B,C,Pa) -> (formula,formula)
  (L1,G1) := unify_aux(A,B,Pa)
  (L2,G2) := unify_aux(A,C,Pa)
  if L1 = TOP or G1 = TOP then
    return (L2,G2)
  else if L2 = TOP or G2 = TOP then
    return (L1,G1)
  else if G1 = G2 then
    return (L1 ∨ L2, G1)
  else return (NIL, pack(unify(Pa:L1,G1) ∨
                         unify(Pa:L2,G2)))
Figure 10: Unification with a Disjunction
The last point describes exactly the case where 
the scope of a disjunction has to be extended to 
a higher level due to the interaction between dis- 
junction and path equivalence, as was shown in 
Figure 3. A simple treatment of such effects would 
be to return a disjunction as global result where 
the disjuncts are the global results unified with the 
corresponding local result embedded at the actual 
position. However, it is not always necessary to 
return a top-level disjunction in such a situation. 
If the global effect of a disjunction concerns only 
locations 'close' to the location of the disjunction, 
we get two global results that differ only in an em- 
bedded substructure. To minimize the 'lifting' of 
the disjunction, we can assume a procedure pack 
that takes two formulae X and Y and returns a 
formula equivalent to X V Y where the disjunction 
is embedded at the lowest possible level. 
Although the procedure pack can be defined in a 
straightforward manner, we refrain from a formal 
specification, since the discussion in the next sec- 
tion will show how the same effect can be achieved 
in a different way. 
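Since the paper leaves pack unspecified, the following Python sketch shows only one possible reading of it, under assumptions: conjunctions are dicts, Or is a helper class introduced here, and the disjunction is pushed down only when both conjunctions have the same labels and differ in exactly one value. This relies on the law of distributivity and is more conservative than a full definition would need to be.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Or:
    left: object
    right: object

def pack(x, y):
    """Return a formula equivalent to x OR y with the disjunction pushed down."""
    if x == y:
        return x
    if isinstance(x, dict) and isinstance(y, dict) and x.keys() == y.keys():
        differing = [l for l in x if x[l] != y[l]]
        if len(differing) == 1:
            # (C AND l:X) OR (C AND l:Y) is equivalent to C AND l:(X OR Y)
            (l,) = differing
            return {**x, l: pack(x[l], y[l])}
    return Or(x, y)

g1 = {"a": {"b": "x", "c": "+"}, "d": "z"}
g2 = {"a": {"b": "x", "c": "-"}, "d": "z"}
print(pack(g1, g2))

h1 = {"a": {"c": "-"}, "d": "+"}
h2 = {"a": {"c": "+"}, "d": "-"}
print(pack(h1, h2))
```

In the first call the two results differ only below a c, so the disjunction ends up there; in the second call they differ under both a and d, so the disjunction has to stay at the top level, as in the situation of Figure 3.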
6 Implementation 
We now have given a complete specification of a 
unification algorithm for formulae in ENF. How- 
ever, there are a couple of modifications that can 
be applied to it in order to improve its efficiency. 
The improvements described in this section are all 
part of our actual implementation. 
Unification of Two Pointers 
If both arguments are pointers, the algorithm in 
Figure 6 treats one of them in the same way as
an arbitrary formula and tries to move it to the 
destination of the other pointer. Although this 
treatment is correct, some of the necessary com- 
putations can be avoided if this case is treated in 
a special way. Both pointer destinations and the 
actual location should be compared and pointers 
to the smallest of these three paths should be in- 
stalled at the other locations. 
Special Treatment of Atomic Formulae 
In most applications, we do not care about the 
equivalence of two paths if they lead to the same 
atom. Under this assumption, when moving an 
atomic formula along a pointer, the pointer itself 
can be replaced by the atom without loss of infor- 
mation. This helps to reduce the amount of global 
information that has to be handled. 
Ordering Labels 
The unification of conjunctions that contain many 
labels can be accelerated by keeping the labels 
sorted according to some order (e.g. $<_l$). This
avoids searching one formula for each label that 
occurs in the other. 
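With sorted labels, two conjunctions can be combined in a single linear merge. The sketch below assumes that a conjunction is a list of (label, value) pairs sorted by label and that the recursive unification of values is passed in as a function; the names merge_sorted and unify_atoms are hypothetical.

```python
def merge_sorted(c0, c1, unify_values):
    """Merge two label-sorted conjunctions in one pass over both lists."""
    out, i, j = [], 0, 0
    while i < len(c0) and j < len(c1):
        (l0, v0), (l1, v1) = c0[i], c1[j]
        if l0 < l1:                  # label occurs only in c0
            out.append((l0, v0))
            i += 1
        elif l1 < l0:                # label occurs only in c1
            out.append((l1, v1))
            j += 1
        else:                        # shared label: unify the two values
            out.append((l0, unify_values(v0, v1)))
            i += 1
            j += 1
    out.extend(c0[i:])
    out.extend(c1[j:])
    return out

def unify_atoms(a, b):
    """Stand-in for the recursive unification, for atomic values only."""
    return a if a == b else "TOP"

print(merge_sorted([("agr", "sg"), ("case", "nom")],
                   [("case", "nom"), ("gend", "masc")],
                   unify_atoms))
```

Each label is inspected once, so the cost is linear in the total number of labels instead of proportional to the product of the two sizes.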
Organisation of the Global Results on a 
Stack 
In the algorithm described so far, the global re- 
sult of a unification is collected, but is - apart 
from disjunction - not used before the traversal 
of the input formulae is finished. When formulae 
containing many pointers are unified, the repeated 
traversal of the top-level formula slows down the 
unification, and may lead to the construction of 
many intermediate results that are discarded later 
(after having been copied partially). 
To improve this aspect of the algorithm, we have 
chosen a better representation of the global result. 
Instead of one formula, we represent it as a stack of 
formulae where the first element holds information 
for the actual location and the last element holds 
information for the top-level formula. Each time 
a formula has to be moved along a pointer, its 
destination is compared with the actual location 
and the common prefix of the paths is discarded. 
From the remaining part of the actual location 
we can determine the first element on the stack 
where this information can be stored. The rest of 
the destination path indicates how the information 
has to be represented at that location. 
When returning from the recursion, the first el- 
ement on the stack can be popped and the infor- 
mation in it can be used immediately. 
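The bookkeeping for the stack can be sketched as follows. The code assumes that stack element 0 belongs to the actual location and that each further element belongs to the next shorter prefix of the actual path, so that the last element belongs to the top-level formula. The function name stack_slot is hypothetical.

```python
def stack_slot(actual, destination):
    """Return (stack index, remaining path) for a moved formula.

    The common prefix of both paths is discarded. The unmatched part of the
    actual location gives the number of levels to climb, and the unmatched
    part of the destination says where to embed the formula at that level.
    """
    common = 0
    for a, d in zip(actual, destination):
        if a != d:
            break
        common += 1
    return len(actual) - common, destination[common:]

print(stack_slot(("a", "b", "c"), ("a", "e")))
print(stack_slot(("a", "b", "c"), ("f",)))
```

For the actual location a b c and the destination a e, the information is stored two levels up (at location a) as e:F; for the destination f it has to travel all the way to the top-level element.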
This not only improves efficiency, but also
affects the treatment of disjunction. Instead
of trying to push down a top-level disjunction
to the lowest possible level, we climb up the
stacks returned by the recursive unifications and
collect the subformulae until the remainders of the
stacks are identical. In this way, 'lifting' disjunctions
can be limited to the necessary amount without using
a function like pack.
Practical Experiences 
In order to be compatible with existing software, 
the algorithm has been implemented in PROLOG. 
It has been extended to the treatment of unification
in an LFG framework where indirectly specified
labels (e.g. in the equation $(\uparrow (\downarrow pcase)) = \downarrow$),
set values and various sorts of constraints have to
be considered.
This version has been incorporated into an 
existing grammar development facility for LFGs 
\[Eisele/Dörre 86,Eisele/Schimpf 87\] and has not
only improved efficiency compared to the former 
treatment of disjunction by backtracking, but also 
helps to survey a large number of similar results 
when the grammar being developed contains (too) 
much disjunction. One version of this system runs 
on PCs with reasonable performance. 
7 Comparison with Other Approaches
7.1 Asymptotical Complexity 
Candidates for a comparison with our algorithm 
are the naive multiplying-out to DNF, Kasper's 
representation of general disjunction \[Kasper 87b\], 
and Karttunen's treatment of value disjunction 
\[Karttunen 84\], also the improved version in 
\[Bear 87\]. Since satisfiability of formulae in FML is
known to be an NP-complete problem, we cannot
expect better than exponential time complexity in
the worst case. Nevertheless it might be interesting
to find cases where the asymptotic behaviour
of the algorithms differs. The following statements
- although somewhat vague - may give an im- 
pression of strong and weak points of the differ- 
ent methods. For each given statement we have 
specific examples, but their presentation or proofs 
would be beyond the scope of this paper. 
7.1.1 Space Complexity (Compactness of
the Representation)
• When many disjunctions concern different 
substructures and do not depend on each 
other, our representation uses exponentially 
less space than expansion to DNF. 
• There are cases where Kasper's representation 
uses exponentially less space than our representation.
This happens when disjunctions interact
strongly, but an exponential number of
consistent combinations remains.
• Since Karttunen's method enumerates all con- 
sistent combinations when several disjunctions 
concern the same substructure, but allows 
for local representation in all other cases, his 
method seems to have a space complexity
similar to ours.
7.1.2 Time Complexity 
• There are cases where Kasper's method uses
exponentially more time than ours. This happens
when disjunctions interact so strongly
that only few consistent combinations remain,
but none of the disjunctions can be resolved.
• When disjunctions interact strongly, but an
exponential number of consistent combinations
remains, our method needs exponential time.
An algorithm using Kasper's representation
could do better in some of these cases, since
it could find out in polynomial time that each
of the disjuncts is used in a consistent combination.
However, the actual organisation of
Kasper's full consistency check introduces
exponential time complexity for different reasons.
7.2 Average Complexity and Conclusion
It is difficult to find clear results when comparing 
the average complexity of the different methods, 
since everything depends on the choice of the examples.
However, we can make the following general
observation: 
All methods have to multiply out disjunctions 
that are not mutually independent in order to find 
inconsistencies. 
Kasper's and Karttunen's methods discard the 
results of such computations, whereas our algorithm
keeps everything that is computed until a
contradiction appears. Thus, our method tends to use
more space than the others. On the other hand, 
since Kasper's and Karttunen's methods 'forget' 
intermediate results, they are sometimes forced to 
perform identical computations repeatedly. 
In conclusion, our algorithm
sacrifices space in order to save time.
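The trade-off can be illustrated with a toy model. The sketch below is not an implementation of any of the three methods compared here. It assumes flat feature descriptions (dictionaries of atomic values) and counts merge steps for two hypothetical strategies: one that keeps the multiplied-out combinations and one that repeats the full consistency check whenever new definite information arrives. All names are assumptions made for the illustration.

```python
def merge(a, b, stats):
    """Unify two flat feature descriptions; None signals a value clash."""
    stats["merges"] += 1
    out = dict(a)
    for label, value in b.items():
        if out.setdefault(label, value) != value:
            return None
    return out


def multiply_out(disjunctions, stats):
    """All consistent combinations, one disjunct taken from each disjunction."""
    results = [{}]
    for disjunction in disjunctions:
        combined = []
        for partial in results:
            for disjunct in disjunction:
                merged = merge(partial, disjunct, stats)
                if merged is not None:
                    combined.append(merged)
        results = combined
    return results


def keep_strategy(disjunctions, constraints):
    """Multiply out once, then only filter the stored combinations."""
    stats = {"merges": 0}
    stored = multiply_out(disjunctions, stats)
    for constraint in constraints:
        survivors = []
        for combination in stored:
            merged = merge(combination, constraint, stats)
            if merged is not None:
                survivors.append(merged)
        stored = survivors
    return stored, stats["merges"]


def forget_strategy(disjunctions, constraints):
    """Repeat the whole consistency check after every new constraint."""
    stats = {"merges": 0}
    definite = {}
    combos = multiply_out(disjunctions, stats)  # initial check, then discarded
    for constraint in constraints:
        definite = merge(definite, constraint, stats)
        if definite is None:  # the definite part itself is inconsistent
            return [], stats["merges"]
        combos = multiply_out([[definite]] + disjunctions, stats)
    return combos, stats["merges"]


if __name__ == "__main__":
    disjunctions = [
        [{"case": "nom", "num": "sg"}, {"case": "acc", "num": "pl"}],
        [{"case": "nom", "gen": "f"}, {"case": "acc", "gen": "m"}],
    ]
    constraints = [{"per": "3"}, {"tense": "past"}]
    print(keep_strategy(disjunctions, constraints)[1])
    print(forget_strategy(disjunctions, constraints)[1])
```

Both strategies arrive at the same consistent combinations. The first holds them in memory between steps and needs fewer merge steps on this input; the second stores only the definite part and the original disjunctions, but recomputes the products.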
8 Further Work 
The algorithm or the underlying representation 
can still be improved or extended in various re- 
spects: 
General Disjunction 
For the time being, when a formula is unified with 
a disjunction, the information contained in it has 
to be distributed over all disjuncts. This may 
involve some unnecessary copying of label-value
pairs in cases where the disjunction does not interact
with the information in the formula. (Note,
however, that in such cases only the first level of 
the formula has to be copied.) It seems worthwhile 
to define a relaxed ENF, where a formula (A ∨ B) ∧ C
is allowed under certain circumstances (e.g. when
(A ∨ B) and C do not contain common labels)
and to investigate whether a unification algorithm 
based on this relaxed normal form can help to save 
unnecessary computations. 
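The difference between distributing a formula over a disjunction and keeping the two factored can be sketched as follows. This is a minimal illustration, assuming feature descriptions are nested dictionaries with atomic leaves and no reentrancy. The functions unify, distribute and relaxed_unify, and the tagged tuples used for the factored form, are hypothetical and are not the data structures of the algorithm presented in this paper.

```python
def unify(a, b):
    """Unify two nested-dict feature descriptions; None signals failure.

    Only the first level of `a` is copied. A substructure that occurs on
    one side only is shared with the result, not copied.
    """
    out = dict(a)
    for label, value in b.items():
        if label not in out:
            out[label] = value
        elif isinstance(out[label], dict) and isinstance(value, dict):
            sub = unify(out[label], value)
            if sub is None:
                return None
            out[label] = sub
        elif out[label] != value:
            return None
    return out


def distribute(disjuncts, c):
    """(A v B) & C  ->  (A & C) v (B & C), dropping inconsistent disjuncts."""
    results = []
    for disjunct in disjuncts:
        merged = unify(disjunct, c)
        if merged is not None:
            results.append(merged)
    return results


def relaxed_unify(disjuncts, c):
    """Keep (A v B) & C factored when the two parts share no labels."""
    labels = set().union(*(d.keys() for d in disjuncts))
    if labels.isdisjoint(c):
        return ("and", ("or", disjuncts), c)
    return ("or", distribute(disjuncts, c))


if __name__ == "__main__":
    a, b = {"case": "nom"}, {"case": "acc"}
    c = {"agr": {"num": "sg", "per": "3"}}
    spread = distribute([a, b], c)
    print(spread[0]["agr"] is c["agr"])      # substructure is shared
    print(relaxed_unify([a, b], c)[0])       # stays factored
    print(relaxed_unify([a, b], {"case": "nom"}))
```

In the distributed result every disjunct receives its own top-level entry for agr, although all of them point to the same substructure. The factored result avoids even this first-level copying as long as the disjunction and the formula have no label in common.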
Functional Uncertainty 
The algorithm for unifying formulae with regular
path expressions given by Johnson \[Johnson 86\]
yields a finite disjunction of cases as the result
of a unification. The algorithm presented here seems to
be a good base for an efficient implementation of 
Johnson's method. The details still have to be 
worked out. 
Acknowledgments 
The research reported in this paper was supported by the 
EUROTRA-D accompanying project (BMFT grant No. 
101 3207 0), the ESPRIT project ACORD (P393) and the 
project LILOG (supported by IBM Deutschland). Much of 
the inspiration for this work originated from a course about
extensions to unification (including the work of Kasper and
Rounds) which Hans Uszkoreit held at the University of
Stuttgart in spring 1987. We had fruitful discussions with
Lauri Karttunen about an early version of this algorithm.
Thanks also go to Jürgen Wedekind, Henk Zeevat, Inge
Bethke, and Roland Seiffert for helpful discussions and
important counterexamples, and to Fionn McKinnon, Stefan
Momma, Gert Smolka, and Carin Specht for polishing up
our argumentation.
References 
\[Aït-Kaci/Nasr 86\] Aït-Kaci, H. and R. Nasr (1986). LOGIN:
A Logic Programming Language with Built-In Inheritance.
The Journal of Logic Programming, 1986 (3).
\[Bear 87\] Bear, J. (1987). Feature-Value Unification with 
Disjunctions. Ms. SRI International, Stanford, CA. 
\[Eisele 87\] Eisele, A. (1987). Eine Implementierung rekursiver
Merkmalstrukturen mit disjunktiven Angaben.
Diplomarbeit. Institut f. Informatik, Stuttgart.
\[Eisele/Dörre 86\] Eisele, A. and J. Dörre (1986). A Lexical
Functional Grammar System in Prolog. In: Proceedings
of COLING 1986, Bonn.
\[Eisele/Schimpf 87\] Eisele, A. and S. Schimpf (1987). Eine
benutzerfreundliche Softwareumgebung zur Entwicklung
von LFGen. Studienarbeit. IfI, Stuttgart.
\[Gazdar et al. 85\] Gazdar, G., E. Klein, G. Pullum and I.
Sag (1985). Generalized Phrase Structure Grammar.
London: Blackwell.
\[Johnson 86\] Johnson, M. (1986). Computing with Regular
Path Formulas. Ms. CSLI, Stanford, California.
\[Kaplan/Bresnan 82\] Kaplan, R. and J. Bresnan (1982).
Lexical Functional Grammar: A Formal System for
Grammatical Representation. In: J. Bresnan (ed.), The
Mental Representation of Grammatical Relations. MIT Press,
Cambridge, Massachusetts.
\[Karttunen 84\] Karttunen, L. (1984). Features and Values.
In: Proceedings of COLING 1984, Stanford, CA.
\[Kasper 87a\] Kasper, R.T. (1987). Feature Structures: A 
Logical Theory with Application to Language Analysis.
Ph.D. Thesis. University of Michigan. 
\[Kasper 87b\] Kasper, R.T. (1987). A Unification Method
for Disjunctive Feature Descriptions. In: Proceedings of
the 25th Annual Meeting of the ACL. Stanford, CA.
\[Kasper/Rounds 86\] Kasper, R.T. and W. Rounds (1986).
A Logical Semantics for Feature Structures. In: Proceedings
of the 24th Annual Meeting of the ACL. Columbia
University, New York, NY.
\[Kay 79\] Kay, M. (1979). Functional Grammar. In: C.
Chiarello et al. (eds.) Proceedings of the 5th Annual Meeting
of the Berkeley Linguistic Society.
\[Kay 85\] Kay, M. (1985). Parsing in Functional Unification 
Grammar. In: D. Dowty, L. Karttunen, and A. Zwicky 
(eds.) Natural Language Parsing, Cambridge, England.
\[Smolka/Aït-Kaci 87\] Smolka, G. and H. Aït-Kaci (1987).
Inheritance Hierarchies: Semantics and Unification.
MCC Tech. Rep. No AI-057-87. To appear in: Journal
of Symbolic Logic, Special Issue on Unification, 1988.
\[Uszkoreit 86\] Uszkoreit, H. (1986). Categorial Unification
Grammars. In: Proceedings of COLING 1986, Bonn.
