STRUCTURE SHARING PROBLEM AND ITS SOLUTION 
IN GRAPH UNIFICATION 
Kiyoshi KOGURE 
NTT Basic Research Laboratories 
3-1 Morinosato-Wakamiya, Atsugi-shi, Kanagawa, 243-01 Japan
kogure@atom.ntt.jp
ABSTRACT 
The revised graph unification algorithms presented 
here are more efficient because they reduce the 
amount of copying that was necessary because of the 
assumption that data-structure sharing in inputs oc- 
curs only when feature-structure sharing occurs. 
1 INTRODUCTION 
Constraint-based linguistic frameworks use logical systems called feature logics (Kasper & Rounds, 1986; Shieber, 1989; Smolka, 1988), which describe linguistic objects by using logical formulas called feature descriptions that have as their models feature structures or typed feature structures. Shieber (1989) argued that if the canonical models of finite formulas of a feature logic were themselves finite, we could compute over them instead of theorem-proving over the formulas themselves. This would be advantageous if we had efficient algorithms for manipulating the canonical models.
The most important operation on models (feature structures or typed feature structures) is combining the information two models contain. This operation is traditionally called unification, although recently it has come to be more suitably called informational union. This unification operation is significant not only theoretically but also practically because the efficiency of systems based on constraint-based formalisms depends on the (typed) feature structure unification and/or feature description unification algorithms they use.¹ This dependency is especially crucial for monostratal formalisms (that is, formalisms which use only (typed) feature structures) such as HPSG (Pollard & Sag, 1987) and JPSG (Gunji, 1987).²
The efficiency of (typed) feature structure unification has been improved by developing algorithms that take as their inputs two directed graphs representing (typed) feature structures, copy all or part of them, and give a directed graph representing the unification result. These algorithms are thus called graph unification. Previous research has identified graph copying as a significant overhead and has attempted to reduce this overhead by lazy copying and structure sharing. Unification algorithms developed so far, however, including those allowing structure sharing, seem to contradict structure sharing because they assume the two input graphs never share their parts with each other. This "structure sharing" assumption prevents the initial data structures from sharing structures for representing linguistic principles and lexical information even though many lexical items share common information and such initial data structure sharing could significantly reduce the amount of data structures required, thus making natural language systems much more efficient. Furthermore, even if the structure sharing assumption holds initially, unification algorithms allowing structure sharing can yield situations that violate the assumption. The ways in which such unification algorithms are used are therefore restricted and this restriction reduces their efficiency.

¹ For example, the TASLINK natural language system uses 80% of the processing time for feature structure unification and other computations required by unification, i.e., feature structure pre-copying (Godden, 1990).
² For example, a spoken-style Japanese sentence analysis system based on HPSG (Kogure, 1989) uses 90%-98% of the processing time for feature structure unification.

Fig. 1: Matrix notation for a typed feature structure.
This paper proposes a solution to this "structure 
sharing problem" and provides three algorithms. Sec- 
tion 2 briefly explains typed feature structures, Sec- 
tion 3 defines the structure sharing problem, and Sec- 
tion 4 presents key ideas used in solving this problem 
and provides three graph unification algorithms that 
increase the efficiency of feature structure unification 
in constraint-based natural language processing. 
2 TYPED FEATURE STRUCTURES 
The concept of typed feature structures augments the concept of feature structures. A typed feature structure consists of a set of feature-value pairs in which each value is a typed feature structure. The set of type symbols is partially ordered by subsumption ordering $\leq_T$ and constitutes a lattice in which the greatest element $\top$ corresponds to 'no information' and the least element $\bot$ corresponds to 'over-defined' or 'inconsistency.' For any two type symbols $a$, $b$ in this lattice, their least upper bound and greatest lower bound are respectively denoted $a \vee_T b$ and $a \wedge_T b$.
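The lattice operations on type symbols can be illustrated with a minimal sketch. It assumes a tiny, hypothetical type hierarchy (the names `PARENTS`, `leq`, `meet`, `join`, and the type `pl` are illustrative and do not come from the paper) and computes bounds by brute force over the finite set of types.

```python
# Hypothetical finite type hierarchy, given as "type -> set of immediate supertypes".
# "top" carries no information; "bottom" stands for inconsistency.
PARENTS = {
    "top": set(),
    "agr": {"top"},
    "sg": {"agr"},
    "pl": {"agr"},
    "bottom": {"sg", "pl"},
}

def ancestors(t):
    """Return t together with every type above it in the ordering."""
    seen = {t}
    stack = [t]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def leq(a, b):
    """a <=_T b: a is at least as specific as b."""
    return b in ancestors(a)

def meet(a, b):
    """Greatest lower bound of a and b (brute force)."""
    lower = [t for t in PARENTS if leq(t, a) and leq(t, b)]
    for g in lower:
        if all(leq(t, g) for t in lower):
            return g
    raise ValueError("the hierarchy is not a lattice")

def join(a, b):
    """Least upper bound of a and b (brute force)."""
    upper = [t for t in PARENTS if leq(a, t) and leq(b, t)]
    for l in upper:
        if all(leq(l, t) for t in upper):
            return l
    raise ValueError("the hierarchy is not a lattice")
```

In this toy hierarchy the meet of `sg` and `pl` is `bottom` (the two types are inconsistent), while their join is `agr`.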
Typed feature structures are represented in matrix notation as shown in Fig. 1, where syn, agr, sg, and 3rd are type symbols; agree, num, per, and subj are feature symbols; and X is a tag symbol. A feature-address (that is, a finite, possibly empty, string of feature symbols) is used to specify a feature value of an embedded structure. In Fig. 1, for example, the structure at the feature-address agree · num, where '·' is the concatenation operator, is said to have sg as its type symbol. The root feature-address is denoted by 'ε'. To specify token-identity in matrix notation, a tag symbol is used: feature-address values with the same tag symbol are token-identical, and those feature-addresses with the token-identical value are said to corefer. In Fig. 1, the feature-addresses agree and subj · agree corefer.

Fig. 2: Graph representation of a typed feature structure.
A typed feature structure is also represented by a rooted, connected, directed graph within which each node corresponds to a typed feature structure and is labeled with a type symbol (and, optionally, a tag symbol) and each arc corresponds to a feature-value pair and is labeled with a feature symbol. Fig. 2 illustrates the graph representation of the typed feature structure whose matrix notation is shown in Fig. 1. In a graph representation, the values at coreferent feature-addresses (that is, token-identical values) are represented by the same node.
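The graph representation can be sketched as follows. This is only an illustration: the class `Node` and the helper `value_at` are hypothetical names, and the example graph is rebuilt from the textual description of Fig. 1, assuming that the value at subj also has the type symbol syn.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False: nodes are compared by identity (token-identity)
class Node:
    tsymbol: str
    arcs: dict = field(default_factory=dict)  # feature symbol -> Node

def value_at(root, address):
    """Follow a feature-address (a sequence of feature symbols) from the root."""
    node = root
    for feature in address:
        node = node.arcs[feature]
    return node

# The structure described for Fig. 1; the node `agr` plays the role of tag X.
agr = Node("agr", {"num": Node("sg"), "per": Node("3rd")})
root = Node("syn", {"agree": agr, "subj": Node("syn", {"agree": agr})})
```

Coreference then reduces to object identity: `value_at(root, ["agree"]) is value_at(root, ["subj", "agree"])` holds because both feature-addresses lead to the same node, and the empty address returns the root itself.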
The set of typed feature structures is also partially ordered by a subsumption ordering that is an extension of the subsumption ordering on the set of type symbols. A typed feature structure $t_1$ is less than or equal to $t_2$ (written as $t_1 \leq_t t_2$) if and only if $t_1$ is inconsistent (that is, if it includes the type symbol $\bot$) or (i) $t_1$'s type symbol $a_1$ is less than or equal to $t_2$'s type symbol $a_2$ ($a_1 \leq_T a_2$); (ii) each feature $f$ of $t_2$ exists in $t_1$ and has a value $t_{2,f}$ such that its counterpart $t_{1,f}$ is less than or equal to $t_{2,f}$; and (iii) each coreference relation holding in $t_2$ also holds in $t_1$.
This subsumption ordering serves as the basis for defining two lattice operations: generalization (the least upper bound or join) and unification (the greatest lower bound or meet).
Typed feature structures have been formalized in several ways, such as by using $\psi$-types (Aït-Kaci, 1986).
3 THE STRUCTURE SHARING 
PROBLEM
3.1 Graph Unification Algorithms 
The destructive unification algorithm presented by Aït-Kaci is the starting point in increasing the efficiency of graph unification. It is a node-merging process that uses the Union-Find algorithm, which was originally developed for testing finite automata equivalence (Hopcroft & Karp, 1971), in a manner very similar to that of the unification algorithm for rational terms (Huet, 1976). Given two root nodes of graphs representing (typed) feature structures, this algorithm simultaneously traverses a pair of input nodes with the same feature-address, putting them into a new and larger coreference class, and then returns the merged graph.

node structure
  tsymbol     (a type symbol)
  arcs        (a set of arc structures)
  generation  (an integer)
  forward     NIL | (a node structure)
  copy        NIL | (a node structure) | (a copydep structure)
arc structure
  label       (a feature symbol)
  value       (a node structure)
copydep structure
  generation  (an integer)
  deps        (a set of node and arc pairs)
Fig. 3: Data structures for nondestructive unification and LING unification.
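The node-merging idea behind destructive unification can be sketched as below. This is a simplified illustration, not the algorithm as published: the names `Node`, `find`, `unify_destructive`, and `flat_meet` are hypothetical, the `forward` field plays the role of the Union-Find parent link, and the type lattice is assumed to be flat (`"top"` above pairwise inconsistent atoms, with `None` standing for $\bot$).

```python
class Node:
    def __init__(self, tsymbol, arcs=None):
        self.tsymbol = tsymbol
        self.arcs = dict(arcs or {})  # feature symbol -> Node
        self.forward = None           # Union-Find link to the class representative

def find(node):
    """Return the representative of the node's coreference class."""
    while node.forward is not None:
        node = node.forward
    return node

def flat_meet(a, b):
    """Meet in an assumed flat lattice; None stands for the bottom element."""
    if a == "top":
        return b
    if b == "top":
        return a
    return a if a == b else None

def unify_destructive(n1, n2, meet=flat_meet):
    """Merge n2's class into n1's class in place; return False on inconsistency."""
    n1, n2 = find(n1), find(n2)
    if n1 is n2:
        return True
    tsymbol = meet(n1.tsymbol, n2.tsymbol)
    if tsymbol is None:
        return False
    n2.forward = n1              # union: n1 now represents both nodes
    n1.tsymbol = tsymbol
    for label, value in n2.arcs.items():
        if label in n1.arcs:     # same feature-address in both inputs
            if not unify_destructive(n1.arcs[label], value, meet):
                return False
        else:                    # feature present only in n2
            n1.arcs[label] = value
    return True

x = Node("top", {"f": Node("a")})
y = Node("top", {"f": Node("top"), "g": Node("b")})
ok = unify_destructive(x, y)
```

Because both inputs are modified, a failed unification leaves them partially merged, which is why the inputs must be copied beforehand whenever they are still needed.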
Since the destructive unification process modifies its input graphs, they must first be copied if their contents are to be preserved. Nondeterminism in parsing, for example, requires the preservation of graph structures not only for initial graphs representing lexical entries and phrase structure rules but also for those representing well-formed intermediate structures. Although the overhead for this copying is significant, it is impossible to represent a resultant unified graph without creating any new structures. Unnecessary copying, though, must be identified and minimized. Wroblewski (1987) defined two kinds of unnecessary copying, over-copying (copying structures not needed to represent resultant graphs) and early-copying (copying structures even though unification fails), but this account is flawed because the resultant graph is assumed to consist only of newly created structures even if parts of the inputs that are not changed during unification could be shared with the resultant graph. A more efficient unification algorithm would avoid this redundant copying (copying structures that can be shared by the input and resultant graphs) (Kogure, 1990). To distinguish structure sharing at the implementation level from that at the logical level (that is, coreference relations between feature-addresses), the former is called data-structure sharing and the latter is called feature-structure sharing (Tomabechi, 1992).
The key approaches to reducing the amount of structures copied are lazy copying and data-structure sharing. For lazy copying, Karttunen (1986) proposed a reversible unification that saves the original contents of the inputs into preallocated areas immediately before destructive modification, copies the resultant graph if necessary, and then restores the original contents by undoing all the changes made during unification. Wroblewski (1987), on the other hand, proposed a nondestructive unification with incremental copying. Given two graphs, Wroblewski's algorithm simultaneously traverses each pair of input nodes with the same feature-address and creates a common copy of the input nodes. The nondestructive unification algorithm for typed feature structures uses the data structures shown in Fig. 3.³ The algorithm connects an input node and its copy node with a copy link (that is, it sets the copy node as the input's copy field value). The link is meaningful during only one unification process and thus enables nondestructive modification.⁴ Using an idea similar to Karttunen's, Tomabechi (1991) proposed a quasi-destructive unification that uses node structures with fields for keeping update information that survives only during the unification process.⁵
Unification algorithms allowing data-structure sharing (DSS unification algorithms) are based on two approaches: the Boyer and Moore approach, which was originally developed for term unification in theorem-proving (Boyer & Moore, 1972) and was adopted by Pereira (1985); and the lazy copying suggested by Karttunen and Kay (1985). Recent lazy copying unification algorithms are based on Wroblewski's or Tomabechi's schema: Godden (1990) proposed a unification algorithm that uses active data structures, Kogure (1990) proposed a lazy incremental copy graph (LING) unification that uses dependency-directed copying, and Emele (1991) proposed a lazy-incremental copying (LIC) unification that uses chronological dereference. These algorithms are based on Wroblewski's algorithm, and Tomabechi (1992) has proposed a data-structure-sharing version of his quasi-destructive unification.
3.2 The Structure Sharing Problem 
The graph unification algorithms mentioned so far-- 
perhaps all those developed so far--assume that data- 
structure sharing between two input structures occurs 
only when feature-structure sharing occurs between 
feature-addresses they represent. This "structure 
sharing" assumption prevents data-structure sharing 
between initial data structures for representing lin- 
guistic principles and lexical information even though 
many lexical items share common information. For 
example, many lexical items in traditional syntactic
categories such as noun, intransitive verb, transitive
verb, and so on share most of their syntactic informa- 
tion and differ in their semantic aspects such as se- 
mantic sortal restriction. Such initial data-structure 
sharing could significantly reduce the amount of data 
structures required and could therefore reduce page- 
swapping and garbage-collection and make natural 
language processing systems much more efficient. 
³ For the nondestructive unification algorithm, the node structure takes as its copy field value either NIL or a node structure only.
⁴ In this algorithm each unification process has an integer as its process identifier and each node created in a process has the identifier as its generation field value. A copy link is meaningful only if its destination node has the current process identifier. Such a node is called 'current.'
⁵ The technique used to control the lifetime of update data is the same as that of Wroblewski's algorithm.

Furthermore, even if the structure sharing assumption holds initially, applying a DSS unification algorithm in natural language processing such as parsing and generation can give rise to situations that violate the assumption. Consider, for example, JPSG-based parsing. There are only a few phrase structure rules in this framework and the Complement-Head Construction rule of the form 'M → C H' is applied very frequently. For instance, consider constructing a structure of the form [VP2 NP2 [VP1 NP1 V]]. When the rule is applied, the typed feature structure for the rule is unified with the structure resulting from embedding the typed feature structure for NP1 at the feature-address for the complement daughter in the rule (e.g., dtrs · cdtr), and the unification result is then unified with the structure resulting from embedding the typed feature structure for V at the feature-address for the head daughter. Because not every substructure of the structure for the rule always changes during such a unification process, there may be some substructures shared by the structure for the rule and the structure for VP1. Thus, when constructing VP2 there may be unexpected and undesired data-structure sharing between the structures.
Let me illustrate what happens in such cases by using a simple example. Suppose that we use the nondestructive unification algorithm or one of its data-structure sharing versions, the LING or LIC algorithm. The nondestructive and LING unification algorithms use the data structures shown in Fig. 3, and the LIC algorithm uses the same data structures except that its node structure has no forward field. Consider unification of the typed feature structures $t_1$ and $t_2$ shown in Fig. 4(a). Suppose that $t_1$ and $t_2$ are respectively represented by the directed graphs in Fig. 4(b) whose root nodes are labeled by tag symbols $X_0$ and $X_4$. That is, $t_1$'s substructure at feature-address $f_2$ and $t_2$'s substructure at $f_1$ are represented by the same data structure while feature-structure sharing does not hold between them, and $t_1$'s substructure at $f_3$ and $t_2$'s substructure at $f_4$ are represented by the same data structure while feature-structure sharing does not hold between them. Each of the algorithms simultaneously traverses pairs of input nodes at the feature-addresses that both inputs have, from the root feature-address to leaf feature-addresses, makes a common copy of each pair to represent the unification result at that feature-address, and connects the input and output nodes with copy links. For any feature-address that only one of the inputs has, the nondestructive unification algorithm copies the subgraph whose root is the node for that feature-address and adds the copied subgraph to the output structure, whereas the LING and LIC algorithms make the node shared by the input and output structures. In the case shown in Fig. 4(b) the
root nodes of the inputs (the nodes with the tag symbols $X_0$ and $X_4$) are first treated by creating a common copy of them (i.e., the output node with $Y_0$), connecting the input and output nodes with copy links, and setting $b_0 = a_0 \wedge_T a_4$ as the copy's tsymbol value. Then the input nodes' arc structures are treated. Suppose that the pair of $f_1$ arcs is treated first. After the input nodes at feature-address $f_1$ are treated in the same manner as the root nodes, the pair of $f_2$ arcs is treated. In this case, $t_1$'s node at $f_2$ (labeled $X_2$) already has a copy link because the node is also used as $t_2$'s node at $f_1$, so that the destination node of the link is used as this feature-address's output node. After the common label arcs are treated, unique label arcs are treated. The nondestructive unification algorithm copies $t_1$'s $f_3$ and $t_2$'s $f_4$ arcs and adds them to the output root node, whereas the LING and LIC algorithms make the input and output structures share their destination nodes. Finally, the LING and LIC algorithms obtain graph $t_3$, shown in matrix notation in Fig. 4(c) together with the correct result. The nondestructive unification algorithm obtains the same typed feature structure. The reversible and the quasi-destructive unification algorithms are also unable to obtain the correct result for this example because these algorithms cannot represent two update nodes by using a single node. Thus, none of the efficient unification algorithms developed recently obtains the correct results for such a case. Avoiding such wrong unification results requires undesirable copying. We can, for example, avoid getting the wrong result by interleaving the application of any non-DSS unification algorithm between applications of a DSS unification algorithm, but such bypassing requires two unification programs and reduces the efficiency gain of DSS unification. This preclusion of useful data-structure sharing is referred to here as the "structure sharing" problem.

Fig. 4: An example of incorrect graph unification.
(a) Input typed feature structures.
(b) Snapshot of incremental graph unification allowing data-structure sharing.
(c) Wrong graph unification output ($t_3$) and the correct unification of the inputs ($t_1 \wedge_t t_2$), where
$$b_0 = a_0 \wedge_T a_4, \quad b_1 = a_1 \wedge_T a_2 \wedge_T a_5, \quad b_2 = a_1 \wedge_T a_2, \quad b_3 = a_2 \wedge_T a_5.$$
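The failure described above can be reproduced with a small sketch of incremental-copy unification that uses a single copy field. It is an illustration under stated assumptions, not the published procedure: all names are hypothetical, the forward field is omitted, type symbols are modeled as sets of atoms whose meet is set union (so inconsistency never arises), and the input graphs are rebuilt from the textual description of Fig. 4, assuming $t_2$'s node at $f_2$ has type $a_5$.

```python
class Node:
    def __init__(self, tsymbol, arcs=None, generation=0):
        self.tsymbol = frozenset(tsymbol)  # stand-in for a type symbol
        self.arcs = dict(arcs or {})       # feature symbol -> Node
        self.generation = generation
        self.copy = None                   # the single copy link

generation = 0  # identifier of the running unification process

def current(node):
    return node is not None and node.generation == generation

def deref(node):
    """Follow copy links that are meaningful in the current process."""
    while current(node.copy):
        node = node.copy
    return node

def copy_node(node):
    node = deref(node)
    if current(node):
        return node
    new = Node(node.tsymbol, generation=generation)
    node.copy = new
    for label, value in node.arcs.items():
        new.arcs[label] = copy_node(value)
    return new

def unify_aux(n1, n2):
    n1, n2 = deref(n1), deref(n2)
    if n1 is n2 and current(n1):
        return n1
    arcs1, arcs2 = dict(n1.arcs), dict(n2.arcs)
    tsymbol = n1.tsymbol | n2.tsymbol      # assumed meet: union of atoms
    if current(n1):
        out = n1
    elif current(n2):
        out = n2
    else:
        out = Node(tsymbol, generation=generation)
    for n in (n1, n2):
        if n is not out:
            n.copy = out                   # left and right copies are not told apart
    out.tsymbol = tsymbol
    for label in sorted(arcs1.keys() | arcs2.keys()):
        if label in arcs1 and label in arcs2:
            out.arcs[label] = unify_aux(arcs1[label], arcs2[label])
        else:
            only = arcs1[label] if label in arcs1 else arcs2[label]
            out.arcs[label] = copy_node(only)
    return out

def unify(n1, n2):
    global generation
    generation += 1
    return unify_aux(n1, n2)

# Inputs as described for Fig. 4(b): x2 and x3 are shared by both graphs.
x1, x2, x3, x5 = Node({"a1"}), Node({"a2"}), Node({"a3"}), Node({"a5"})
t1 = Node({"a0"}, {"f1": x1, "f2": x2, "f3": x3})
t2 = Node({"a4"}, {"f1": x2, "f2": x5, "f4": x3})
t3 = unify(t1, t2)
```

In the output, `t3.arcs["f1"]` and `t3.arcs["f2"]` are the same node carrying the atoms of $a_1$, $a_2$, and $a_5$ (the over-constrained $b_1$), and `f3` and `f4` also end up token-identical, which corresponds to the wrong result $t_3$ rather than $t_1 \wedge_t t_2$.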
It has been shown that all the DSS unification algorithms mentioned above are subject to this problem even if the structure sharing assumption holds initially. Non-DSS unification algorithms are also subject to the problem because their inputs are created by applying not only the unification operation but also operations such as embedding and extraction, in most implementations of which data-structure sharing occurs between their input and output structures. Even non-DSS unification algorithms must therefore take such inputs into account, and this requires undesirable copying.
4 A SOLUTION TO THE STRUCTURE
SHARING PROBLEM
4.1 Key Ideas 
The example in Section 3 suggests that the structure sharing problem has two sources, which concern not only the incremental copying approach but also other approaches. The first source is the way of recording update information. In the incremental copying approach, this corresponds to the way of copying structures. That is, while calculating $t_1 \wedge_t t_2$ the incremental copying process does not distinguish between the copies created as the substructures of the left input $t_1$ and the copies created as the substructures of the right input $t_2$. As a result, a copy node of $t_1$'s node at feature-address $p$ can be used as a copy node of $t_2$'s node at a different feature-address, and vice versa. In Fig. 4(b), for example, the copy of $t_2$'s node at $f_1$ is wrongly used as the copy of $t_1$'s node at $f_2$. This causes unexpected and wrong data-structure sharing in the resultant graph and this in turn causes unexpected and wrong feature-structure sharing in the resultant (typed) feature structure. In other approaches, such as the quasi-destructive approach, the source of the structure sharing problem is that each node structure has fields for keeping information on only two typed feature structures (one for the original and one for the result), whereas fields for keeping information on three typed feature structures are needed (one for the original and one for each of the two results).
One way to solve this problem is therefore to make each node keep information on three typed feature structures: in the incremental copying approach each node must have two copy fields, and in the quasi-destructive approach each node must have two sets of fields for updates.
node structure
  tsymbol     (a type symbol)
  arcs        (a set of arc structures)
  generation  (an integer)
  forward     NIL | (a node structure)
  lcopy       NIL | (a node structure)
  rcopy       NIL | (a node structure)
Fig. 5: The node structure for the revised nondestructive unification.

The second source of the structure sharing problem is the method of data-structure sharing between input and output structures. Unexpected and wrong data-structure sharing may result if a node shared by the left and right inputs is used as part of the left input, intended to be shared between the left input and output, at the same time it is used as part of the right input, intended to be shared between the right input and output. In Fig. 4(b), for example, $t_1$'s node at feature-address $f_3$ is shared as $t_3$'s node at the same feature-address, and the same node as $t_2$'s node at $f_4$ is shared as $t_3$'s node at the same feature-address. This problem can be solved easily by keeping information on data-structure sharing status; that is, by adding to the node structure a new field for this purpose and using it thus: when a unification algorithm makes a node shared (for example, between the left input and output), it records this information on the node; later when the algorithm attempts to make the node shared, it does this only if this data-structure sharing is between the left input and output.
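The sharing-status check just described can be sketched in a few lines. The names `Node`, `Unification`, and `try_share` are hypothetical, and the sketch assumes that the two status marks are created fresh for every unification process so that marks left over from earlier processes are ignored.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Node:
    tsymbol: str
    reuse: Optional[object] = None  # data-structure sharing status

class Unification:
    """One unification process with its own pair of status marks."""

    def __init__(self):
        self.lused = object()  # "shared between the left input and the output"
        self.rused = object()  # "shared between the right input and the output"

    def try_share(self, node, side):
        """Mark `node` as shared for `side`; refuse if the other side already shares it."""
        mine, other = ((self.lused, self.rused) if side == "left"
                       else (self.rused, self.lused))
        if node.reuse is other:
            return False       # the caller has to copy the node instead
        node.reuse = mine
        return True

process = Unification()
shared = Node("a3")
first = process.try_share(shared, "left")    # sharing with the left input is recorded
second = process.try_share(shared, "right")  # refused: already shared via the left input
```

A refusal signals that the node must be copied rather than shared, which is what prevents the same node from entering the output through both inputs.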
4.2 Algorithms 
This section first describes a non-DSS unification al- 
gorithm that discards the structure sharing assump- 
tion and thus permits initial data-structure sharing, 
and then it describes two DSS unification algorithms. 
Revised Nondestructive Unification
This algorithm uses, instead of the node structure shown in Fig. 3, the node structure in Fig. 5. That is, the algorithm uses two kinds of copy links: lcopy for the left input and rcopy for the right input.
The revised nondestructive unification procedure for typed feature structures is shown in Figs. 6 and 7. Given two root nodes of directed graphs, the top-level procedure Unify assigns a new unification process identifier, generation, and invokes Unify_Aux. This procedure first dereferences both input nodes. This dereference process differs from the original one in that it follows up forward and lcopy links for the left input node and forward and rcopy links for the right input node. This revised dereference process eliminates the first source of the structure-sharing problem. Then Unify_Aux calculates the meet of the type symbols. If the meet is $\bot$, which means inconsistency, it finishes by returning $\bot$. Otherwise Unify_Aux obtains the output node and sets the meet as its tsymbol value. The output node is created only when neither input node is current; otherwise the output node is a current input node. Then Unify_Aux treats arcs. This procedure assumes the existence of two procedures: Shared_Arc_Pairs and Complement_Arcs. The former gives two lists of arcs each of which contains arcs whose labels exist in both input nodes with the same arc label order; the latter gives one list of arcs whose labels are unique to the first input node. For each arc pair obtained by Shared_Arc_Pairs, Unify_Aux applies itself recursively to the value pair. And for each arc obtained by Complement_Arcs, it copies its value.
PROCEDURE Unify(node1, node2)
  generation ← generation + 1;
  return(Unify_Aux(node1, node2))
ENDPROCEDURE

PROCEDURE Unify_Aux(node1, node2)
  node1 ← Dereference_L(node1);
  node2 ← Dereference_R(node2);
  IF node1 = node2 AND Current_p(node1) THEN
    return(node1)
  ENDIF;
  newtsymbol ← node1.tsymbol ∧_T node2.tsymbol;
  IF newtsymbol = ⊥ THEN
    return(⊥)
  ENDIF;
  newnode ← Get_Out_Node(node1, node2, newtsymbol);
  (sarcs1, sarcs2) ← Shared_Arc_Pairs(node1, node2);
  carcs1 ← Complement_Arcs(node1, node2);
  carcs2 ← Complement_Arcs(node2, node1);
  FOR (sarc1, sarc2) IN (sarcs1, sarcs2) DO
    newvalue ← Unify_Aux(sarc1.value, sarc2.value);
    IF newvalue = ⊥ THEN
      return(⊥)
    ELSE
      newvalue ← Add_Arc(newnode, sarc1.label, newvalue);
      IF newvalue = ⊥ THEN
        return(⊥)
      ENDIF
    ENDIF
  ENDFOR;
  IF newnode ≠ node1 THEN
    FOR carc IN carcs1 DO
      newvalue ← Copy_Node_L(carc.value);
      newnode ← Add_Arc(newnode, carc.label, newvalue)
    ENDFOR
  ENDIF;
  IF newnode ≠ node2 THEN
    FOR carc IN carcs2 DO
      newvalue ← Copy_Node_R(carc.value);
      newnode ← Add_Arc(newnode, carc.label, newvalue)
    ENDFOR
  ENDIF;
  return(newnode)
ENDPROCEDURE

PROCEDURE Dereference_L(node)
  IF Node_p(node.forward) THEN
    return(Dereference_L(node.forward))
  ELSE IF Current_Node_p(node.lcopy) THEN
    return(Dereference_L(node.lcopy))
  ELSE
    return(node)
  ENDIF
ENDPROCEDURE
Fig. 6: The revised nondestructive unification procedure (1).

Let us compare the newly introduced cost and the
effect of this revision. This revised version differs from the original in that it uses two dereference procedures that are the same as the original dereference procedure except that they use different fields. Thus, on the one hand, the overhead introduced to this revision is only the use of one additional field of the node structure. On the other hand, although this revised version does not introduce new data-structure sharing, it can safely treat data-structure sharing in initial data structures. This can significantly reduce the amount of initial data structures required for linguistic descriptions, especially for lexical descriptions, and thus reduce garbage-collection and page-swapping.

PROCEDURE Get_Out_Node(node1, node2, tsymbol)
  IF Current_p(node1) AND Current_p(node2) THEN
    node2.forward ← node1;
    node1.tsymbol ← tsymbol;
    return(node1)
  ELSE IF Current_p(node1) THEN
    node2.rcopy ← node1;
    node1.tsymbol ← tsymbol;
    return(node1)
  ELSE IF Current_p(node2) THEN
    node1.lcopy ← node2;
    node2.tsymbol ← tsymbol;
    return(node2)
  ELSE
    newnode ← Create_Node();
    node1.lcopy ← newnode;
    node2.rcopy ← newnode;
    newnode.tsymbol ← tsymbol;
    return(newnode)
  ENDIF
ENDPROCEDURE
Fig. 7: The revised nondestructive unification procedure (2).
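The effect of separate lcopy and rcopy links can be checked with a compact sketch in the spirit of Figs. 6 and 7. It is not a transcription of the published procedure: the Python names are hypothetical, type symbols are modeled as sets of atoms whose meet is set union (so the $\bot$ case is left out), and the inputs are rebuilt from the textual description of Fig. 4, assuming $t_2$'s node at $f_2$ has type $a_5$.

```python
class Node:
    def __init__(self, tsymbol, arcs=None, generation=0):
        self.tsymbol = frozenset(tsymbol)  # stand-in for a type symbol
        self.arcs = dict(arcs or {})       # feature symbol -> Node
        self.generation = generation
        self.forward = None
        self.lcopy = None                  # copy link used for the left input
        self.rcopy = None                  # copy link used for the right input

generation = 0  # identifier of the running unification process

def current(node):
    return node is not None and node.generation == generation

def deref(node, link):
    """Follow forward links and the current copy links of one side only."""
    while True:
        if node.forward is not None:
            node = node.forward
        elif current(getattr(node, link)):
            node = getattr(node, link)
        else:
            return node

def copy_node(node, link):
    node = deref(node, link)
    if current(node):
        return node
    new = Node(node.tsymbol, generation=generation)
    setattr(node, link, new)
    for label, value in node.arcs.items():
        new.arcs[label] = copy_node(value, link)
    return new

def get_out_node(n1, n2, tsymbol):
    if current(n1) and current(n2):
        n2.forward = n1
        out = n1
    elif current(n1):
        n2.rcopy = n1
        out = n1
    elif current(n2):
        n1.lcopy = n2
        out = n2
    else:
        out = Node(tsymbol, generation=generation)
        n1.lcopy = out
        n2.rcopy = out
    out.tsymbol = tsymbol
    return out

def unify_aux(n1, n2):
    n1, n2 = deref(n1, "lcopy"), deref(n2, "rcopy")
    if n1 is n2 and current(n1):
        return n1
    arcs1, arcs2 = dict(n1.arcs), dict(n2.arcs)
    out = get_out_node(n1, n2, n1.tsymbol | n2.tsymbol)  # assumed meet: union
    for label in sorted(arcs1.keys() & arcs2.keys()):
        out.arcs[label] = unify_aux(arcs1[label], arcs2[label])
    if out is not n1:
        for label in sorted(arcs1.keys() - arcs2.keys()):
            out.arcs[label] = copy_node(arcs1[label], "lcopy")
    if out is not n2:
        for label in sorted(arcs2.keys() - arcs1.keys()):
            out.arcs[label] = copy_node(arcs2[label], "rcopy")
    return out

def unify(n1, n2):
    global generation
    generation += 1
    return unify_aux(n1, n2)

# Inputs as described for Fig. 4(b): x2 and x3 are shared by both graphs.
x1, x2, x3, x5 = Node({"a1"}), Node({"a2"}), Node({"a3"}), Node({"a5"})
t1 = Node({"a0"}, {"f1": x1, "f2": x2, "f3": x3})
t2 = Node({"a4"}, {"f1": x2, "f2": x5, "f4": x3})
result = unify(t1, t2)
```

Here `result.arcs["f1"]` and `result.arcs["f2"]` are distinct nodes carrying the atoms of $b_2$ and $b_3$ respectively, and `f3` and `f4` receive separate copies of the shared node, so the data-structure sharing in the inputs no longer leaks into the result.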
Revised LING Unification 
LING unification is based on nondestructive unification and uses copy-dependency information to implement data-structure sharing. For a unique label arc, instead of its value being copied, the value itself is used as the output value and copy-dependency relations are recorded to provide for later modification of shared structures. This algorithm uses a revised Copy_Node procedure that takes as its input two node structures (node1 and node2) and one arc structure, arc1, where node1 is the node to be copied. The structure arc1 is an arc to node1, and node2 is an ancestor node of node1 (that is, the node from which arc1 departs). The revised procedure is as follows: (i) if node1' (the dereference result of node1) is current, then Copy_Node returns node1' to indicate that the ancestor node2 must be copied immediately; otherwise, (ii) Copy_Arcs is applied to node1' and if it returns one or more arc copies, Copy_Node creates a new copy node and then adds to the new node the arc copies and arcs of node1' that are not copied, and returns the new node to indicate that the ancestor node must be copied immediately; otherwise, (iii) Copy_Node registers the copy-dependency between node1' and the ancestor node node2 (that is, it adds the pair consisting of the ancestor node node2 and the arc arc1 into the copy field of node1') and returns NIL to indicate that the ancestor need not be copied immediately.⁶ When a new copy of a node is needed later, this algorithm will copy structures by using the copy-dependency information in its copy field (in the revised Get_Out_Node procedure for the LING unification). It substitutes arcs with newly copied nodes for existing arcs. Thus the antecedent nodes are also copied.

⁶ In the LING unification algorithm, a node structure's copy field is used to keep either copy information or copy-dependency information. When the field keeps copy-dependency information, its value is a copydep structure consisting of an integer (the generation field) and a set of node and arc pairs (the deps field) (see Fig. 3). The technique used to control the lifetime of copy-dependency information is the same as that of copy information. That is, the deps field value is meaningful only when the generation value is equal to the unification process identifier.

PROCEDURE Copy_Node_L(node, arc, ancestor)
  node ← Dereference_L(node);
  IF Current_p(node) THEN
    return(node)
  ELSE IF node.reuse = rused THEN
    return(Simple_Copy_Node_L(node))
  ENDIF;
  newarcs ← Copy_Arcs_L(node);
  IF newarcs ≠ ∅ THEN
    newnode ← Create_Node();
    newnode.tsymbol ← node.tsymbol;
    node.lcopy ← newnode;
    FOR arc IN node.arcs DO
      newarc ← Find_Arc(arc.label, newarcs);
      IF Arc_p(newarc) THEN
        newvalue ← Add_Arc(newnode, arc.label, newarc.value)
      ELSE
        newvalue ← Add_Arc(newnode, arc.label, arc.value)
      ENDIF
    ENDFOR;
    return(newnode)
  ELSE IF Copydep_p(node.lcopy) AND
          node.lcopy.generation = generation THEN
    node.lcopy.deps ← node.lcopy.deps ∪ {(ancestor, arc)};
    node.reuse ← lused;
    return(NIL)
  ELSE
    copydep ← Create_Copydep();
    copydep.generation ← generation;
    copydep.deps ← {(ancestor, arc)};
    node.lcopy ← copydep;
    node.reuse ← lused;
    return(NIL)
  ENDIF
ENDPROCEDURE

PROCEDURE Copy_Arcs_L(node)
  newarcs ← ∅;
  FOR arc IN node.arcs DO
    newnode ← Copy_Node_L(arc.value, arc, node);
    IF Node_p(newnode) THEN
      newarc ← Create_Arc(arc.label, newnode);
      newarcs ← newarcs ∪ {newarc}
    ENDIF
  ENDFOR;
  return(newarcs)
ENDPROCEDURE
Fig. 8: The new revised Copy_Node procedure.

The revised LING unification is based on the revised nondestructive unification and uses a node structure consisting of the fields in the node structure shown in Fig. 5 and a new field reuse for indicating data-structure sharing status. When the top-level
unification procedure is invoked, it binds two new
symbols to the two variables lused and rused. That a node
structure has as its reuse field value the lused value
means that it is used as part of the left input, and that
it has as its reuse value the rused value means that it
is used as part of the right input. The revised LING
unification uses two new revised Copy_Node procedures,
Copy_Node_L (shown in Fig. 8) and the analogous
procedure Copy_Node_R. These procedures are
respectively used to treat the left and right inputs 
and they differ from the corresponding original pro- 
cedure in two places. First, instead of step (i) above, 
if node1' (the dereference result of node1) is current,
Copy_Node_L (or Copy_Node_R) returns node1' to
indicate that the ancestor, node2, must be copied
immediately. But if node1' has as its reuse field value
the rused (or lused) value, it creates a copy of the
whole subgraph whose root is node1' and returns the
copied structure, also to indicate that the ancestor
node must be copied immediately. Second, in step
(iii), they register data-structure sharing status; that
is, they set the lused (or rused) value to the reuse field
of node1' as well as register copy-dependency
information. This revised LING unification ensures safety
in data-structure sharing. 
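The reuse check described above can be illustrated with a short Python sketch. It is a simplification under assumptions, not the procedure of Fig. 8: the names Node, new_markers, full_copy, and share_or_copy are hypothetical, arcs are held in a dictionary, and dereferencing and copy-dependency registration are left out. It shows only that a node already shared through one input is copied in full when it is reached again through the other input, and that binding fresh symbols at each top-level call makes older marks stale.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)  # identity-based equality, so nodes can key a dict
class Node:
    tsymbol: str
    arcs: Dict[str, "Node"] = field(default_factory=dict)
    reuse: Optional[object] = None  # lused, rused, or a stale marker


def new_markers():
    """Fresh symbols per top-level unification; older marks become stale."""
    return object(), object()


def full_copy(node, memo=None):
    """Copy the whole subgraph under node, keeping its internal sharing."""
    memo = {} if memo is None else memo
    if node in memo:
        return memo[node]
    clone = Node(node.tsymbol)
    memo[node] = clone
    for label, value in node.arcs.items():
        clone.arcs[label] = full_copy(value, memo)
    return clone


def share_or_copy(node, own_marker, other_marker):
    """Return (structure, copied) for a node reached from one input."""
    if node.reuse is other_marker:
        # Already shared into the result via the other input: copy now.
        return full_copy(node), True
    node.reuse = own_marker
    return node, False


lused, rused = new_markers()
agr = Node("agr", {"num": Node("sg")})
left, copied_left = share_or_copy(agr, lused, rused)     # shared as-is
right, copied_right = share_or_copy(agr, rused, lused)   # copied in full
```

Here left is the original node, while right is a separate copy of the same subgraph, so the result never contains one data structure standing for two feature-addresses that are not meant to be shared.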
Again let us compare the newly introduced computational
costs and the effect of this revision. The
newly introduced costs are the additional cost of the
revised dereference procedures (which is the same as
in the previous one) and the cost of checking reuse
status. The former cost is small, as shown in the
discussion of the previous algorithm, and the latter cost
is also small. These costs are thus not significant
relative to the efficiency gain obtained by this revision.
Revised Quasi-Destructive Unification 
The structure-sharing version of quasi-destructive
unification keeps update information in the field
meaningful only during the unification. After a
successful unification is obtained, this algorithm copies
the unification result and attempts data-structure
sharing. This algorithm can be revised to ensure
safety in data-structure sharing by using a node
structure including two sets of fields for update information
and one reuse field and by checking node reuse status
while copying.
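The notion of update information that is meaningful only during one unification can be sketched with a generation stamp, in the same way that the deps field is valid only when its generation equals the unification process identifier. The Python fragment below is a minimal illustration of that mechanism alone; QDNode, set_forward, and dereference are hypothetical names, and the two sets of update fields and the reuse check of the revised algorithm are not modeled.

```python
class QDNode:
    """Node whose forward pointer is valid for one unification only."""

    def __init__(self, tsymbol):
        self.tsymbol = tsymbol
        self.forward = None    # temporary update information
        self.generation = -1   # unification in which forward was set


def set_forward(node, target, generation):
    """Record a temporary forwarding link stamped with the generation."""
    node.forward = target
    node.generation = generation


def dereference(node, generation):
    """Follow forward pointers stamped with the current generation."""
    while node.forward is not None and node.generation == generation:
        node = node.forward
    return node
```

Incrementing the generation counter before the next unification invalidates every stale pointer at once, so the inputs need not be restored explicitly.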
5 CONCLUSION 
The graph unification algorithms described in this paper
increase the efficiency of feature structure unification
by discarding the assumption that data-structure
sharing between two input structures occurs only
when feature-structure sharing occurs between the
feature-addresses they represent. All graph unification
algorithms proposed so far make this assumption
and are therefore required to copy all or part of their
input structures when there is a possibility of violating
it. This copying reduces their efficiency. This
paper analyzes this problem and points out key ideas
for solving it. Revised procedures for nondestructive
unification, LING unification, and quasi-destructive
unification have been developed. These algorithms
make the use of feature structures in constraint-based
natural language processing much more efficient. The
key ideas in this paper can also be used to make the
incremental graph generalization algorithm (Kogure,
1993) more efficient.
ACKNOWLEDGMENTS 
I thank Akira Shimazu, Mikio Nakano, and other colleagues
in the Dialogue Understanding Group at the
NTT Basic Research Laboratories for their encouragement
and thought-provoking discussions.
REFERENCES 
Aït-Kaci, H. (1986). An Algebraic Semantics Approach to
the Effective Resolution of Type Equations. J. of
Theor. Comp. Sci., 45, 293-351.
Boyer, R. S., & Moore, J. S. (1972). The Sharing of Structure
in Theorem-Proving Programs. In Meltzer, B.,
& Michie, D. (Eds.), Machine Intelligence Vol. 7,
chap. 6, pp. 101-116. Edinburgh University Press.
Emele, M. (1991). Unification with Lazy Non-Redundant
Copying. In Proc. of the 29th ACL, pp. 325-330.
Godden, K. (1990). Lazy Unification. In Proc. of the 28th
ACL, pp. 180-187.
Gunji, T. (1987). Japanese Phrase Structure Grammar.
Reidel.
Hopcroft, J. E., & Karp, R. M. (1971). An Algorithm for
Testing the Equivalence of Finite Automata. Tech.
Rep. TR-71-114, Dept. of Comp. Sci., Cornell University.
Huet, G. (1976). Résolution d'Equations dans des Langages
d'Ordre 1, 2, ..., ω. Ph.D. thesis, Université
de Paris VII.
Karttunen, L. (1986). D-PATR: A Development Environment
for Unification-Based Grammars. Tech. Rep.
CSLI-86-61, CSLI.
Karttunen, L., & Kay, M. (1985). Structure Sharing Representation
with Binary Trees. In Proc. of the 23rd
ACL, pp. 133-136.
Kasper, R. T., & Rounds, W. C. (1986). A Logical Semantics
for Feature Structure. In Proc. of the 24th
ACL.
Kogure, K. (1989). Parsing Japanese Spoken Sentences
based on HPSG. In Proc. of the Int. Workshop on
Parsing Technologies, pp. 132-141.
Kogure, K. (1990). Strategic Lazy Incremental Copy
Graph Unification. In Proc. of the 13th COLING,
Vol. 2, pp. 223-228.
Kogure, K. (1993). Typed Feature Structure Generalization
by Incremental Graph Copying. In Trost, H.
(Ed.), Feature Formalisms and Linguistic Ambiguity,
pp. 149-158. Ellis Horwood.
Pereira, F. C. N. (1985). Structure Sharing Representation
for Unification-Based Formalisms. In Proc. of the
23rd ACL, pp. 137-144.
Pollard, C., & Sag, I. (1987). An Information-Based
Syntax and Semantics Volume 1: Fundamentals.
CSLI Lecture Notes No. 13. CSLI.
Shieber, S. M. (1989). Constraint-Based Grammar
Formalisms: Parsing and Type Inference for Natural
and Computer Languages. Ph.D. thesis, Stanford
University.
Smolka, G. (1988). A Feature Logic with Subsorts.
LILOG 33, IBM Deutschland.
Tomabechi, H. (1991). Quasi-Destructive Graph Unification.
In Proc. of the 29th ACL, pp. 315-322.
Tomabechi, H. (1992). Quasi-Destructive Graph Unification
with Structure-Sharing. In Proc. of the 14th
COLING, pp. 440-446.
Wroblewski, D. A. (1987). Nondestructive Graph Unification.
In Proc. of the 6th AAAI, pp. 582-587.
