DISJUNCTIVE FEATURE STRUCTURES AS HYPERGRAPHS
JEAN VÉRONIS
Groupe Représentation et Traitement des Connaissances,
Centre National de la Recherche Scientifique,
31, Chemin Joseph Aiguier, 13402 Marseille Cedex 09, France
and
Department of Computer Science, Vassar College
Poughkeepsie, New York 12601, U.S.A.
e-mail: veronis@vassar.edu
Abstract -- In this paper, we present a new math- 
ematical framework in which disjunctive feature 
structures are defined as directed acyclic hypergraphs. 
Disjunction is defined in the feature structure domain, 
and not at the syntactic level in feature descriptions. This 
enables us to study properties and specify operations in 
terms of properties of, or operations on, hypergraphs 
rather than in syntactic terms. We illustrate the
expressive power of this framework by defining a class 
of disjunctive feature structures with interesting 
properties (factored normal form or FNF), such as 
closure under factoring, unfactoring, unification, and 
generalization. Unification, in particular, has the 
intuitive appeal of preserving as much as possible the 
particular factoring of the disjunctive feature structures 
to be unified. We also show that unification in the FNF 
class can be extremely efficient in practical applications. 
1. INTRODUCTION 
It has become common to make a distinction between a 
language for the description of feature structures and
feature structures themselves, which are seen as directed
acyclic graphs (dags) or automata (see, for instance,
Kasper and Rounds, 1986). To avoid confusion, the
terms of the representation language are often referred to 
as feature descriptions. Disjunction is a representation 
tool in the representation language, intended to describe 
sets of feature structures. In this framework, there are 
no disjunctive feature structures, but only disjunctive 
feature descriptions. 
This framework has enabled researchers to explore 
the computational complexity of unification. However, it
has some drawbacks. First, properties have to be stated 
(and proofs carried out) at the syntactic level. This 
implies using a complicated calculus based on formula 
equivalence rules, rather than using graph-theoretical 
properties. In addition, unification is not well-defined
with respect to disjunction. There is reference in the
literature to the "unification of disjunctive feature 
descriptions", but, formally, we should instead speak of 
the unification of the sets of feature structures the 
descriptions represent. 
For example, unifying the sets of feature structures 
represented by the disjunctive feature descriptions in 
Fig. 1 yields a set of four (non-disjunctive) feature 
structures, which can be described by several equally 
legitimate formulae: A factored, B factored, disjunctive 
normal form (DNF), etc. Depending on the algorithm 
that is used, the description of the result will be one or
the other of these formulae. Some algorithms require 
expansion to DNF and will therefore produce a DNF 
representation as a result, but other algorithms may 
produce different representations. 
There is an important body of research concerned 
with the development of algorithms that avoid the 
expensive expansion to DNF (e.g., Kasper, 1987). 
These algorithms typically produce descriptions of the
unification, in which some of the disjunctions in the 
original descriptions are retained. However, these 
descriptions are produced as a computational side-effect 
(potentially different depending on the algorithm) rather 
than as a result of the application of a formal definition. 
Fig. 1. Different descriptions for the same set of feature
structures
In this paper, we first consider disjunctive feature 
structures as objects in themselves, defined in terms of 
directed acyclic hypergraphs. This enables us to build a 
mathematical framework based on graph theory in order 
to study the properties of disjunctive feature structures 
and specify operations (such as unification) in algebraic 
rather than syntactic terms. It also enables the
specification of algorithms in terms of graph 
manipulations, and suggests a data structure for 
implementation. 
We then illustrate the expressive power of this 
framework by defining a class of disjunctive feature 
structures with interesting properties (factored normal 
form or FNF), such as closure under factoring, 
unfactoring, unification, and generalization. These 
operations (and the relation of subsumption) are defined 
in terms of operations on (or relations among) 
hypergraphs. Unification, in particular, has the intuitive
appeal of preserving as much as possible the particular
factoring of the disjunctive feature structures to be
unified. We also show that unification in the FNF class
can be extremely efficient in practical applications. 
For lack of space, proofs will be omitted or only
suggested.
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 - 498 - PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
2. BASIC FRAMEWORK 
2.1 Disjunctive feature structures as hypergraphs
(Disjunctive) feature structures will be defined as
directed acyclic hypergraphs. In a hypergraph (see
Berge, 1970), arcs (hyperarcs) connect sets of nodes
instead of pairs of nodes, as in usual graphs. We will
consider hyperarcs as directed from their first node to all
other nodes. More precisely, each hyperarc will be an
ordered pair consisting of an input node n_i0, and a (non-
empty) set of output nodes n_i1, ..., n_ik. We will say
that (n_i0, {n_i1, ..., n_ik}) is a k-arc from n_i0 to
n_i1, ..., n_ik, that n_i0 is an immediate predecessor of
n_i1, ..., n_ik, and that n_i1, ..., n_ik are immediate
successors of n_i0.
A path¹ in a hypergraph is a sequence of nodes
n_i1, ..., n_ip such that for j = 1 ... p-1, n_ij is an
immediate predecessor of n_ij+1. If there exists a path
from a node n_i to a node n_j, we will write n_i ⇒ n_j. A
hypergraph is acyclic if there is no node n_i such that
n_i ⇒ n_i. A hypergraph has a root n_0 if for each node
n_i ≠ n_0, n_0 ⇒ n_i. The leaves of a hypergraph are those
nodes with no successor. A path terminating with a leaf is a
maximal path. Nodes with more than one immediate
predecessor are called merging nodes.
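The notions above (k-arcs, reachability, merging nodes) can be sketched as a small data structure. The encoding below is an illustration of our own, not part of the formal framework; the class and method names are assumptions.

```python
# Minimal sketch of a directed hypergraph in the sense of Sec. 2.1:
# each hyperarc is an ordered pair (input node, set of output nodes).
class Hypergraph:
    def __init__(self):
        self.arcs = []                     # list of (n_i0, frozenset of outputs)

    def add_arc(self, src, outs):
        self.arcs.append((src, frozenset(outs)))   # a k-arc, k = len(outs)

    def successors(self, n):
        # immediate successors of n through any hyperarc
        return {m for src, outs in self.arcs if src == n for m in outs}

    def reachable(self, a, b):
        # True iff there is a path a => b
        seen, stack = set(), [a]
        while stack:
            for m in self.successors(stack.pop()):
                if m == b:
                    return True
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return False

    def merging_nodes(self):
        # nodes with more than one immediate predecessor
        preds = {}
        for src, outs in self.arcs:
            for m in outs:
                preds.setdefault(m, set()).add(src)
        return {m for m, p in preds.items() if len(p) > 1}
```

For example, a 2-arc (0, {1, 2}) followed by 1-arcs (1, {3}) and (2, {3}) makes node 3 a merging node reachable from the root 0.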
Definition 2.1 Let L be a set of labels and A be a set
of atomic values. A (disjunctive) feature structure on (L,
A) is a quadruple F = (D, n_0, λ, α), respecting the
consistency conditions 2.1 below, where D is a finite
directed acyclic hypergraph with a root n_0, λ is a partial
function from the 1-arcs of D into L, and α is a partial
function from the leaves of D into A.
Feature structures which have isomorphic hyper- 
graphs, whose corresponding leaves have the same 
value, and whose corresponding feature-arcs have the 
same labels, are isomorphic. We will consider such 
feature structures to be equal up to isomorphism.
Definition 2.2 Labeled 1-arcs are called feature-arcs.
Non-labeled hyperarcs are called OR-arcs.
Note that OR-arcs are usually k-arcs with k > 1, but
(non-labeled) 1-arcs can be OR-arcs as a special case.
We will use a graphic representation for disjunctive
feature structures in which OR-arcs are represented as k
lines connected together² (see Fig. 2).
Definition 2.3 The extended label of a given path is
the concatenation of all labels along that path. We will
use the notation l1:l2: ... :ln to represent extended labels.
A maximal extended label from a node is an extended
label for a maximal path from that node.
¹We use this term in the sense usual in graph theory. It should not
be confused with the term path used in many feature structure
studies, which is a string of labels, and for which we will
introduce the term extended label later in the paper.
²In some work involving AND/OR graphs, this convention is used
for AND-arcs. This should not create further confusion.
Fig. 2. Graphic representation
Conditions 2.1 Disjunctive feature structures must
verify the following consistency conditions:
(C1) No output node of an OR-arc is a leaf;
(C2) Output nodes of OR-arcs are not merging nodes;
(C3) All feature-arcs from the same node have different
labels;
(C4) No maximal extended label from a given node is a
prefix of a non-maximal extended label obtained by
following a different hyperarc from the same node.
C1 and C2 constrain OR-arcs to represent only
disjunctions. C3 and C4 are extensions of the
determinism that is usually imposed on dags (no
outgoing arcs with the same label from any given node).
Definition 2.4 A dag feature structure is a feature
structure with no OR-arc.
Definition 2.5 A projection of a feature structure x is
a hypergraph obtained by removing all but one output
node of all OR-arcs of x.
Therefore, a projection has only 1-arcs.
Definition 2.6 A dag feature structure y is a dag-
projection of a feature structure x if there exist some
projection y' of x and a function h mapping nodes of y'
into nodes of y such that:
(1) the root of y' is mapped to the root of y;
(2) if (n_i0, {n_i1}) is a feature-arc of y', then
(h(n_i0), {h(n_i1)}) is a feature-arc of y with the
same label;
(3) if (n_i0, {n_i1}) is a 1-OR-arc of y', then h(n_i0) =
h(n_i1);
(4) the value associated with a node n_i in y' is the same
as the value associated with h(n_i) in y, or both have
no value;
(5) each feature-arc in y is the image of at least one
feature-arc in y'.
In other terms, a dag-projection is obtained from a
projection by merging the input and output nodes of
each 1-OR-arc, and merging paths with common
prefixes to ensure determinism.
Definition 2.7 A sub-feature structure rooted at a
node n_i is a quadruple composed of a sub-hypergraph
rooted at that node, the root n_i, together with the
restrictions of the label and value functions to this sub-
hypergraph. The AND-part of a node is the sub-feature
structure rooted at that node, starting with only the
feature-arcs from that node. The OR-parts of a node are
the different sub-feature structures rooted at that node,
starting with each of the OR-arcs. The disjuncts of an
OR-arc are the sub-feature structures rooted at each of
the output nodes of that OR-arc. If a node has only one
OR-arc, we will call its disjuncts the disjuncts of the
node.
Fig. 3. Description of the feature structure in Fig. 2.
2.2 Representation language
Definition 2.8 The representation language for
(disjunctive) feature structures described above is
defined by the following grammar:
F → [T, ..., T]
T → l : I V
T → {F, ..., F}
I → i | e
V → F | a | e
where F is the axiom, e is the empty string, l belongs to
the set of labels L, a belongs to the set of atomic values
A, and i belongs to a set I of identifiers (we use the
symbols [1], [2], etc.), disjoint from L. A formula of
that language is called a (disjunctive) feature description.
The mapping between feature structures and feature 
descriptions is straightforward (Fig. 3). Translating 
between feature descriptions and feature structures and 
checking that a description is valid (that is, corresponds 
to a valid feature structure) is computationally trivial, 
and does not rely on the (potentially expensive) 
application of equivalence rules as in Kasper and 
Rounds (1986). 
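To make the claim of computational triviality concrete, here is a rough recursive-descent reader for the grammar of Definition 2.8. It is a sketch of our own, not the paper's implementation: identifiers for reentrancy and validity checking are omitted, and the output is a nested Python tuple rather than a hypergraph.

```python
import re

# Tokens: brackets, braces, colon, comma, and names (labels or atomic values).
TOKEN = re.compile(r"[\[\]{}:,]|[A-Za-z0-9_]+")

def parse(text):
    toks = TOKEN.findall(text)
    pos = 0
    def peek():
        return toks[pos] if pos < len(toks) else None
    def eat(expected=None):
        nonlocal pos
        tok = toks[pos]; pos += 1
        assert expected is None or tok == expected
        return tok
    def feature_structure():           # F -> [T, ..., T]
        eat("[")
        terms = []
        while peek() != "]":
            terms.append(term())
            if peek() == ",":
                eat(",")
        eat("]")
        return ("F", terms)
    def term():                        # T -> l : V  |  {F, ..., F}
        if peek() == "{":              # a disjunction of feature structures
            eat("{")
            alts = []
            while peek() != "}":
                alts.append(feature_structure())
                if peek() == ",":
                    eat(",")
            eat("}")
            return ("OR", alts)
        label = eat()
        eat(":")
        return (label, value())
    def value():                       # V -> F | a | empty
        if peek() == "[":
            return feature_structure()
        if peek() in (",", "]"):       # empty value
            return None
        return eat()                   # atomic value
    return feature_structure()
```

For instance, parse("[A: [B: b, C: c], D: d]") yields ("F", [("A", ("F", [("B", "b"), ("C", "c")])), ("D", "d")]).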
3. A TYPOLOGY OF NORMAL FORMS 
In this section, we will first define the disjunctive 
normal form (DNF) in terms of hypergraphs. We will 
then define a family of increasingly restricted normal 
forms, the most restricted of which is the DNF. One of 
them, the factored normal form (FNF) enables a clear 
definition of the "format" of a feature structure. It also 
imposes a strict hierarchical view of the data, and is 
exactly the class of feature structures that are reachable 
from the DNF through sequences of factoring 
operations. We believe that the FNF class is of great 
linguistic interest, since it is clear that disjunction is 
often used to reflect hierarchical organization, factoring, 
etc., and thus is more than just a space-saving device. In 
the sections that follow, factoring operations in the FNF 
class will be defined formally, along with appropriate 
extensions to the notions of subsumption and
unification. 
3.1 Disjunctive Normal Form 
Definition 3.1 A (disjunctive) feature structure is said 
to be in disjunctive normal form (DNF) if: 
(1) the root has only one OR-part, and no AND-part; 
(2) each disjunct is a dag feature structure; 
(3) all the disjuncts are disjoint and different (non- 
isomorphic). 
Note that the disjunctive normal form is defined for 
feature structures themselves, not for their descriptions. 
Definition 3.2 The disjunctive normal form of a
given feature structure x, noted DNF(x), is a DNF
feature structure in which the set of disjuncts Di is equal
to the set of dag-projections of x.
Definition 3.3 Two feature structures x and y are
DNF-equivalent if DNF(x) = DNF(y). We will note
x ≡dnf y.
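A quick way to see Definitions 3.2-3.3 in action is to enumerate dag-projections by choosing one disjunct per OR-part. The sketch below is our own: it uses nested Python dicts instead of hypergraphs (so the merging steps of Definition 2.6 are implicit in the encoding), and writes an OR-part as ('OR', [alt1, alt2, ...]).

```python
from itertools import product

def dag_projections(fs):
    # Enumerate the dag feature structures obtained by picking one
    # disjunct per OR-part (cf. Def. 2.5-2.6, nested-dict encoding).
    if isinstance(fs, tuple) and fs[0] == "OR":
        return [p for alt in fs[1] for p in dag_projections(alt)]
    if not isinstance(fs, dict):
        return [fs]                    # atomic value
    keys = list(fs)
    choices = [dag_projections(fs[k]) for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*choices)]

def dnf(fs):
    # DNF(x): the set of dag-projections, duplicates removed (Def. 3.2)
    out = []
    for p in dag_projections(fs):
        if p not in out:
            out.append(p)
    return out
```

For x = {'A': ('OR', [{'B': 'b1'}, {'B': 'b2'}]), 'C': 'c'}, dnf(x) has two disjuncts, each carrying the factored feature C: c.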
3.2 Typology of normal forms 
We can define several interesting restrictions on feature 
structures, which in turn define a typology of 
increasingly restricted normal forms. 
Condition 3.1 Dag-projections obtained by different
selections of output nodes of OR-arcs are different.
Condition 3.2 Each node has at most one OR-part. 
Condition 3.3 The AND-part of each node is a dag. 
Definition 3.4 When combined, the three conditions 
above define several normal forms: 
(1) 3.1: non-redundant normal form (NRNF); 
(2) 3.1 and 3.2: hierarchical normal form (HNF); 
(3) 3.1 and 3.3: AND-normal form (ANF); 
(4) 3.1, 3.2 and 3.3: layered normal form (LNF).
Definition 3.5 In an ANF feature structure x, the 
AND-part of a node ni is a maximal AND-part of x if ni 
is the output node of no feature arc. 
Definition 3.6 The layers of a LNF feature structure 
are defined recursively as follows: 
(1) Layer 0 is the AND-part of the root; 
(2) Layer n+1 is the set of (maximal) AND-parts of all the
output nodes of OR-arcs originating in layer n. 
Let us now turn back to formats. 
Definition 3.7 The format of a dag feature structure
is the set of maximal extended labels starting at its root.
The format of a layer is the union of the formats of all the
maximal AND-parts in that layer.
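Encoding dag feature structures as nested dicts (an illustrative convention of our own), the format of Definition 3.7 can be collected by a depth-first walk; this helper is a sketch, not the paper's definition of an algorithm.

```python
def fmt(dag, prefix=""):
    # Set of maximal extended labels of a dag feature structure
    # written as nested dicts with atomic values at the leaves (Def. 3.7).
    if not isinstance(dag, dict) or not dag:
        return {prefix} if prefix else set()
    out = set()
    for label, sub in dag.items():
        ext = prefix + ":" + label if prefix else label
        out |= fmt(sub, ext)
    return out
```

For example, fmt({'A': {'B': 'b', 'C': 'c'}, 'D': 'd'}) is {'A:B', 'A:C', 'D'}.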
Definition 3.8 A LNF feature structure is said to be
in factored normal form (FNF) if the following
properties hold:
(1) the formats of all layers are disjoint;
(2) paths originating in two distinct maximal AND-parts
of a layer n can merge only in a node belonging to
an AND-part in a layer n' such that n' < n.
Fig. 3. A typology of normal forms.
Fig. 3 shows the typology of normal forms. Note 
that the DNF is obviously in FNF. 
In the rest of the paper, we will study only the 
properties of FNF, in which formats are homogeneous. 
Definition 3.9 The format of a FNF feature structure
x, noted f(x), is the sequence <s0, ..., sn> of the formats
of each of its layers, in increasing order starting with the
root.
Definition 3.10 We will call sets of extended labels
dag-formats, and sequences <s0, ..., sn> of dag-
formats with all si disjoint, fs-formats.
Proposition 3.2 If two FNF feature structures have
the same DNF and the same format, they are equal.
4. RESTRUCTURING OPERATORS 
4.1 Factor and unfactor 
Let us first give a few auxiliary definitions.
Definition 4.1 Let x be a dag feature structure, and s
a dag-format. The spanning of x according to s, noted
span_s(x), is the greatest sub-dag of x such that all the
paths in span_s(x) have their extended labels in s.
Note that f(span_s(x)) ⊆ s.
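Under a nested-dict reading of dag feature structures (an illustrative convention of our own), the spanning of Definition 4.1 keeps the largest sub-dag whose maximal extended labels fall inside the given dag-format:

```python
def span(dag, s, prefix=""):
    # Greatest sub-dag of `dag` whose maximal extended labels lie in the
    # dag-format `s` (cf. Def. 4.1); dags are nested dicts, leaves atoms.
    out = {}
    for label, sub in dag.items():
        ext = prefix + ":" + label if prefix else label
        if isinstance(sub, dict) and sub:
            kept = span(sub, s, ext)
            if kept:                   # keep the arc only if something survives
                out[label] = kept
        elif ext in s:
            out[label] = sub
    return out
```

For example, span({'A': {'B': 'b', 'C': 'c'}, 'F': {'G': 'g'}}, {'A:B', 'F:G'}) keeps the paths A:B and F:G and drops A:C, so the format of the result is indeed included in s.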
Definition 4.2 A dag feature structure F is a
common factor of a feature structure x if the AND-part of
all the disjuncts at the top level of x contains F. A dag-
format s is said to span a common factor of x if the
spanning of the AND-part of all the disjuncts at the top
level of x according to s is a common factor.
Let us now define the factoring and unfactoring
operations. Informally, the factor operator extracts a
factor common to all the top-level disjuncts, and raises it
to the root level.
Definition 4.3 Let x be a FNF feature structure such
that f(x) = <s0, s1, s2, ..., sn> and s a dag-format. If
s spans a common factor F, the factoring of x according
to s, noted φ_s(x), is the FNF feature structure DNF-
equivalent to x with format <s0∪s', s1−s', s2, ..., sn>
where s' = f(F).
Definition 4.4 Let x be a FNF feature structure with
an AND-part A, such that f(x) = <s0, s1, s2, ..., sn>,
and s be a dag-format. If F = span_s(A), the unfactoring
of x according to s, noted φ̄_s(x), is the FNF feature
structure that is DNF-equivalent to x with the format
<s0−s', s1∪s', s2, ..., sn>, where s' = f(F).
Example. See Fig. 4 
Proposition 4.1 φ̄_s(φ_s(x)) = φ_s(φ̄_s(x)) = x
Proposition 4.2
(1) φ_s(φ_s'(x)) = φ_s'(φ_s(x)) = φ_{s∪s'}(x)
(2) φ̄_s(φ̄_s'(x)) = φ̄_s'(φ̄_s(x)) = φ̄_{s∪s'}(x)
Fig. 4. Factoring and unfactoring
4.2 Group and ungroup 
The factor operator requires that there is a common 
factor. In many cases there is no common factor; 
however, it is possible to define a group operator that 
first splits feature structures into groups of disjuncts that 
have common factors with respect to a given format, and 
then factors them. 
Definition 4.5 Let x be a FNF feature structure such
that f(x) = <∅, s1, s2, ..., sn>, let A1, ..., An be the
AND-parts of the top-level disjuncts of x, and s be a
dag-format. The grouping of x according to s, noted
γ_s(x), is the FNF feature structure DNF-equivalent to x
with format <∅, s', s1−s', s2, ..., sn> where s' =
∪_i f(span_s(Ai)).
Definition 4.6 Let x be a FNF feature structure such
that f(x) = <∅, s1, s2, ..., sn>, let A1, ..., An be the
AND-parts of the top-level disjuncts of x, s be a dag-
format, and s' = ∪_i f(span_s(Ai)). We will note γ̄_s(x)
the ungrouping of x according to s:
(1) if s' = s1 then γ̄_s(x) is the FNF feature structure
DNF-equivalent to x with format
<∅, s1∪s2, s3, ..., sn>;
(2) if s' ≠ s1 then γ̄_s(x) is the FNF feature structure
DNF-equivalent to x with format
<∅, s1−s', s'∪s2, s3, ..., sn>.
Example. See Fig. 5. 
Proposition 4.3 γ̄_s(γ_s(x)) = γ_s(γ̄_s(x)) = x
Proposition 4.4 The class of FNF feature structures
is closed under factoring, unfactoring, grouping and
ungrouping.
4.3 Format operator
Definition 4.7 Let S be a fs-format <s0, s1, ..., sn>.
The formatting of a DNF feature structure x according to
S, noted ν_S(x), is the result of the following sequence of
operations:
ν_S(x) = φ_{s0}(γ_{s1∪s0}(γ_{s2∪s1∪s0}(...(γ_{sn∪...∪s0}(x)))))
It is clear that ν_S(x) is in FNF, and is DNF-
equivalent to x.
Proposition 4.5 Any FNF feature structure x can be
reached from its DNF through a sequence of grouping
and factoring operations. More precisely, if x' =
DNF(x) then x = ν_{f(x)}(x').
Definition 4.8 Let S be a fs-format <s0, s1, ..., sn>.
The unformatting of a FNF feature structure x according
to S, noted ν̄_S(x), is the result of the following
sequence of operations:
ν̄_S(x) = γ̄_{sn∪...∪s0}(...(γ̄_{s2∪s1∪s0}(γ̄_{s1∪s0}(φ̄_{s0}(x)))))
Proposition 4.6 Any FNF feature structure x can be
transformed into its DNF through a sequence of
unfactoring and ungrouping operations. More
precisely, ν̄_{f(x)}(x) = DNF(x).
Proposition 4.7 ν_S(ν̄_S(x)) = ν̄_S(ν_S(x)) = x
5. SUBSUMPTION, UNIFICATION AND 
GENERALIZATION 
As mentioned in the introduction, the format of the result
of unification is not defined in the classical approach.
Our goal will be to define unification on FNF disjunctive
feature structures in such a way that the format of the
result is unique and predictable. Intuitively, when
feature descriptions have compatible formats (as in Fig.
6), it seems that unification should preserve it. On the
other hand, when two feature descriptions have 
completely incompatible formats (as in Fig. 1), the 
resulting format should be in DNF. When formats are 
only partially compatible, a limited amount of 
unfactoring should be performed, and the compatible 
part should be preserved in the result. These 
considerations lead us to define compatibility of formats, 
and to extend the notions of subsumption, unification, 
and generalization to feature structure formats. We then 
define unification and generalization on disjunctive 
feature structures in such a way that important properties 
hold. In particular, reduction to DNF, factoring, and
grouping are homomorphisms with respect to unification
(that is, DNF(x ⊔ y) = DNF(x) ⊔ DNF(y), γ_s(x ⊔ y)
= γ_s(x) ⊔ γ_s(y), etc.).
Fig. 5. Grouping and ungrouping
Fig. 6. Compatible formats
In what follows, we will call the classical
subsumption, unification, and generalization of dag feature
structures dag-subsumption, dag-unification and dag-
generalization (noted ≤dag, ⊔dag, ⊓dag, respectively). The
classical subsumption, unification, and generalization of
DNF feature structures will be called dnf-subsumption,
dnf-unification and dnf-generalization (noted ≤dnf, ⊔dnf,
⊓dnf, respectively).
5.1 Subsumption, unification, generalization 
of formats 
Definition 5.1 Let S1 be a fs-format <s1_0,
s1_1, ..., s1_n> and S2 be a fs-format <s2_0, s2_1, ..., s2_p>.
We will say that S1 subsumes S2 if each p in s1_i
belongs to some s2_j with i ≤ j, for all i in {1, n}. We
will note S1 ≤fmt S2.
Definition 5.2 Let S1 and S2 be two fs-formats. The
unification of S1 and S2, noted S1 ⊔fmt S2, is the
greatest lower bound of S1 and S2 according to the
format subsumption relation. The generalization of S1
and S2, noted S1 ⊓fmt S2, is the least upper bound of
S1 and S2 according to the format subsumption relation.
It is easy to prove that these bounds exist. They can
be built recursively. For example, let S1 = <s1_0,
s1_1, ..., s1_n> and S2 = <s2_0, s2_1, ..., s2_n> (for the
sake of simplicity, we will consider the shorter of S1
and S2 to be padded on the right with an appropriate
number of ∅'s, in order to ensure the same length). S =
S1 ⊔fmt S2 = <s0, s1, ..., sn> can be constructed
recursively:
(1) s_n = s1_n ∪ s2_n;
(2) s_i = (s1_i ∪ s2_i) − (s_{i+1} ∪ ... ∪ s_n), for all i, 0 ≤ i < n.
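This recursive construction can be transcribed directly. The sketch below is our own code (formats as Python lists of sets of extended labels); it reproduces the worked example of Section 5.3.

```python
def unify_formats(S1, S2):
    # S1 |_|fmt S2, built from the last layer backwards:
    #   s_n = s1_n U s2_n
    #   s_i = (s1_i U s2_i) - (s_{i+1} U ... U s_n)
    n = max(len(S1), len(S2))
    S1 = list(S1) + [set()] * (n - len(S1))   # pad the shorter format
    S2 = list(S2) + [set()] * (n - len(S2))   # with empty layers
    S = [set() for _ in range(n)]
    S[n - 1] = S1[n - 1] | S2[n - 1]
    for i in range(n - 2, -1, -1):
        later = set().union(*S[i + 1:])       # labels already placed deeper
        S[i] = (S1[i] | S2[i]) - later
    return S
```

With f(x) = <{A}, {B,C}, {D,E}, {F}, {G}, {H}> and f(y) = <{I}, {B,J}, {D,F}, {E,K}, {G}, {L}>, the example discussed in Section 5.3, this yields <{A,I}, {B,C,J}, {D}, {E,F,K}, {G}, {H,L}>.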
Definition 5.3 Let S1 be a fs-format <s1_0,
s1_1, ..., s1_n> and S2 a fs-format <s2_0, s2_1, ..., s2_p>.
We will say that S2 is a sub-format of S1 if s2_i is
included in s1_i for all i in {1, n}. We will say that S1
and S2 are compatible if both S1 and S2 are sub-formats
of the same format.
5.2 Subsumption, unification, generalization 
of disjunctive feature structures 
Definition 5.4 We will say that a FNF feature
structure x subsumes a FNF feature structure y, and
note x ≤ y, if
(1) x ≤dnf y
(2) f(x) ≤fmt f(y)
Definition 5.5 Let x and y be two FNF feature
structures. The unification of x and y, noted x ⊔ y, is
the greatest lower bound of x and y according to the
subsumption relation. The generalization of x and y,
noted x ⊓ y, is the least upper bound of x and y
according to the subsumption relation.
The following proposition states that x ⊔ y is dnf-
equivalent to the dnf-unification of the DNFs of x and y,
and the format of x ⊔ y is the unification of the formats
of x and y:
Proposition 5.1
(1) DNF(x ⊔ y) = DNF(x) ⊔dnf DNF(y)
(2) f(x ⊔ y) = f(x) ⊔fmt f(y)
As a result, the unification of x and y can be computed 
by completely unformatting both x and y, unifying 
them, and formatting the result according to the 
unification of their formats: 
Proposition 5.2
x ⊔ y = ν_{f(x) ⊔fmt f(y)}(ν̄_{f(x)}(x) ⊔dnf ν̄_{f(y)}(y))
(Dual proposition holds for generalization.) 
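The dnf-unification step amounts to unifying the disjuncts pairwise and discarding the failures. Here is a rough sketch of our own (dag feature structures as nested dicts, no reentrancy, and a quadratic pairwise loop rather than the paper's layered algorithm):

```python
FAIL = object()   # distinguished failure value

def dag_unify(x, y):
    # Unification of dag feature structures written as nested dicts / atoms.
    if x == y:
        return x
    if isinstance(x, dict) and isinstance(y, dict):
        z = dict(x)
        for k, v in y.items():
            r = dag_unify(z[k], v) if k in z else v
            if r is FAIL:
                return FAIL
            z[k] = r
        return z
    return FAIL                       # atom clash, or atom vs. structure

def dnf_unify(X, Y):
    # Unify the disjunct sets of two DNF feature structures pairwise,
    # keeping the consistent (and distinct) results.
    out = []
    for x in X:
        for y in Y:
            z = dag_unify(x, y)
            if z is not FAIL and z not in out:
                out.append(z)
    return out
```

For example, dnf_unify([{'A': 'a1'}, {'A': 'a2'}], [{'A': 'a1', 'B': 'b'}, {'A': 'a3'}]) keeps the single consistent pair, {'A': 'a1', 'B': 'b'}.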
Proposition 5.3 The class of FNF feature structures 
is closed under factoring, unfactoring, unification, and 
generalization. 
This follows directly from the definitions. 
Proposition 5.4
(1) γ_s(x ⊔ y) = γ_s(x) ⊔ γ_s(y)
(2) γ̄_s(x ⊔ y) = γ̄_s(x) ⊔ γ̄_s(y)
(3) φ_s(x ⊔ y) = φ_s(x) ⊔ φ_s(y)
(4) φ̄_s(x ⊔ y) = φ̄_s(x) ⊔ φ̄_s(y)
(Dual propositions hold for generalization.)
5.3 Algorithm 
Proposition 5.2 does not imply that complete 
unfactoring and re-factoring is the most efficient 
computation of unification and generalization. Because 
of the properties given in proposition 5.4, unification 
can be carried out layer by layer, and only partial 
unfactoring is needed (algorithm 5.1). In the extreme 
case, when the formats of x and y are compatible, no 
unfactoring is needed, and the procedure match-formats 
does nothing. 
Algorithm 5.1 Unification of FNF feature structures
function unify(x, y: feature-structure): feature-structure
  match-formats(x, y)
  // Unify AND-parts
  z.AND ← dag-unify(x.AND, y.AND)
  if z.AND = failure then return failure
  // Unify OR-parts
  z.OR ← unify-disjuncts(x.OR, y.OR)
  if z.OR = failure then return failure else return z
function unify-disjuncts(x, y: feature-structure):
    feature-structure
  // assume x.AND and y.AND are empty
  match-formats(x, y)
  k ← 0
  for each x.DISJ_i
    for each y.DISJ_j
      t ← dag-unify(x.DISJ_i.AND, y.DISJ_j.AND)
      if t ≠ failure then
        u ← unify-disjuncts(x.DISJ_i.OR, y.DISJ_j.OR)
        if u ≠ failure then
          k ← k + 1
          z.DISJ_k.AND ← t
          z.DISJ_k.OR ← u
  if k = 0 then return failure else return z
We will consider the complexity of this algorithm in
terms of the number of dag-unifications, which is the
only costly operation (O(n log(n)), where n is the total
number of symbols in the two dag feature structures --
see Aït-Kaci, 1984). We will first consider the case
where the formats are compatible. One dag-unification is
performed in the unify function, but the bulk of the dag-
unifications are performed in the unify-disjuncts
function. There are two nested loops, and the function is
applied recursively through all the layers. Therefore, in
the worst case, the algorithm requires O(d²) dag-
unifications, where d is the total number of disjuncts.
When the formats are not compatible, some 
unfactoring and ungrouping has to be performed by the 
match-formats function in order to force the formats to 
match. The number of operations can be limited if the 
two formats are partially compatible, due to the 
properties of FNF. Complete unformatting will be 
necessary only in cases where the two formats are 
completely incompatible. 
For example, if f(x) = <{A}, {B,C}, {D,E}, {F}, 
{G}, {H}>, and f(y) = <{I}, {B,J}, {D,F}, {E,K}, 
{G}, {L}>, the resulting format is <{A,I}, {B,C,J}, 
{D}, {E,F,K}, {G}, {H,L}>. The two first layers can 
be computed without unfactoring. Unfactoring is 
required for disjuncts at the next level, yielding the 
formats <{D}, {E,F}, {G}, {H}> and <{D},
{E,F,K}, {G}, {L}>, respectively. When this is
accomplished the formats match, and the algorithm can 
resume with no more unfactoring. 
It is clear that, in the worst case, when the
algorithm requires the complete unformatting of the two
feature structures, the total number of dag-unifications
grows exponentially with the number of disjuncts.
However, in most practical cases, the algorithm is likely
to perform better. We saw, in particular, that when the
two feature structures have completely compatible
formats, the complexity is only quadratic. There is
obviously a range of possible behaviors between these
two extremes.
It seems to us that in practical applications, 
disjunction is not random, but, instead, reflects some 
systematic linguistic properties. A high degree of 
compatibility among formats is therefore expected. It 
should also be noted that the algorithm can easily be 
modified so that only one feature structure is unfactored
and re-formatted into a format that is compatible with the 
format of the other. This is especially useful in the 
common situation in which a small feature structure, 
containing a small number of disjuncts (e.g. a 
constituent at a given stage of parsing) is matched 
against a very large feature structure (e.g. a grammar). 
In this case, the time required for unformatting and 
reformatting the "small" feature structure is negligible, 
and the overall number of dag-unifications grows 
linearly with the number of disjuncts in the "large" 
feature structure. 
6. CONCLUSION 
In this paper, we present a new mathematical 
framework in which disjunctive feature structures are 
defined as directed acyclic hypergraphs. Disjunction is
defined in the feature structure domain, and not at the 
syntactic level in feature descriptions. This enables us to 
study properties and specify operations (such as 
unification) and relations (such as subsumption) in terms 
of algebraic operations on (or relations among) 
hypergraphs rather than in syntactic terms. We illustrate
the expressive power of this framework by defining a 
class of disjunctive feature structures with interesting 
properties (factored normal form, or FNF), such as 
closure under factoring, unfactoring, unification, and 
generalization. Unification, in particular, has the 
intuitive appeal of preserving as much as possible the 
particular factoring of the disjunctive feature structures 
to be unified. We also show that unification in the FNF 
class can be extremely efficient in practical applications.
Acknowledgments -- The present research has been
partially funded by the GRECO-PRC Communication
Homme-Machine of the French Ministry of Research
and Technology and U.S.-French NSF/CNRS grant INT-
9016554 for collaborative research. The author would
like to thank Nancy Ide for her valuable comments and
help in the preparation of this paper.
REFERENCES 

AÏT-KACI, H. (1984). A New Model of Computation Based
on a Calculus of Type Subsumption. Ph.D. Thesis,
Univ. of Pennsylvania.

BERGE, C. (1970). Graphes et Hypergraphes. Paris:
Dunod. [translation: Graphs and Hypergraphs,
Amsterdam: North-Holland, 1973]

KASPER, R. T. (1987). A unification method for 
disjunctive feature descriptions. Proc. 25th Annual 
Meeting of the Association for Computational 
Linguistics. Stanford, California, 235-242. 

KASPER, R., ROUNDS, W. C. (1986). A logical semantics 
for feature structures. Proc. 24th Annual Meeting of 
the Association for Computational Linguistics. New 
York, 257-266. 
