Quasi-Destructive Graph Unification 
Hideto Tomabechi 
Carnegie Mellon University                      ATR Interpreting Telephony Research Laboratories*
109 EDSH, Pittsburgh, PA 15213-3890             Seika-cho, Sorakugun, Kyoto 619-02, JAPAN
tomabech+@cs.cmu.edu
ABSTRACT 
Graph unification is the most expensive part 
of unification-based grammar parsing. It of- 
ten takes over 90% of the total parsing time 
of a sentence. We focus on two speed-up 
elements in the design of unification algo- 
rithms: 1) elimination of excessive copying 
by only copying successful unifications, 2) 
Finding unification failures as soon as possi- 
ble. We have developed a scheme to attain 
these two elements without expensive over- 
head through temporarily modifying graphs 
during unification to eliminate copying dur- 
ing unification. We found that parsing rel- 
atively long sentences (requiring about 500 
top-level unifications during a parse) using 
our algorithm is approximately twice as fast 
as parsing the same sentences using Wrob- 
lewski's algorithm. 
1. Motivation 
Graph unification is the most expensive part of 
unification-based grammar parsing systems. For ex- 
ample, in the three types of parsing systems currently 
used at ATR1, all of which use graph unification algorithms based on [Wroblewski, 1987], unification operations consume 85 to 90 percent of the total CPU time devoted to a parse.2 The number of unification opera-
tions per sentence tends to grow as the grammar gets 
larger and more complicated. An unavoidable paradox 
is that, as the natural language system gets larger and the coverage of linguistic phenomena increases, the writers of natural language grammars tend to rely more on deeper and more complex path equations (cycles and frequent reentrancy) to lessen the complexity of writing the grammar. As a result, we have seen that the number of unification operations increases rapidly as the coverage of the grammar grows, in contrast to the parsing algorithm itself, which does not seem to
*Visiting Research Scientist. Local email address: tomabech%atr-la.atr.co.jp@uunet.UU.NET.
1The three parsing systems are based on: 1. Earley's algorithm, 2. active chart parsing, 3. generalized LR parsing.
2In the large-scale HPSG-based spoken Japanese analy- 
sis system developed at ATR, sometimes 98 percent of the 
elapsed time is devoted to graph unification (\[Kogure, 1990\]). 
grow so quickly. Thus, it makes sense to speed up 
the unification operations to improve the total speed 
performance of the natural language systems. 
Our original unification algorithm was based on [Wroblewski, 1987], which was chosen in 1988 as the then fastest algorithm available for our application (HPSG-based unification grammar; three types of parsers (Earley, Tomita-LR, and active chart); unification with variables and cycles3), combined with Kasper's ([Kasper, 1987]) scheme for handling disjunctions. In
designing the graph unification algorithm, we have 
made the following observation which influenced the 
basic design of the new algorithm described in this 
paper: 
Unification does not always succeed. 
As we will see from the data presented in a later section, 
when our parsing system operates with a relatively 
small grammar, about 60 percent of the unifications 
attempted during a successful parse result in failure. 
If a unification fails, any computation performed and
memory consumed during the unification is wasted. As 
the grammar size increases, the number of unification 
failures for each successful parse increases 4. Without 
completely rewriting the grammar and the parser, it 
seems difficult to shift any significant amount of the 
computational burden to the parser in order to reduce 
the number of unification failures 5. 
Another problem that we would like to address in 
our design, which seems to be well documented in the 
existing literature is that: 
Copying is an expensive operation. 
The copying of a node is a heavy burden to the pars- 
ing system. \[Wroblewski, 1987\] calls it a "computa- 
tional sink". Copying is expensive in two ways: 1) it 
takes time; 2) it takes space. Copying takes time and 
space essentially because the area in the random access 
memory needs to be dynamically allocated which is an 
expensive operation. \[Godden, 1990\] calculates the 
computation time cost of copying to be about 67 per- 
3Please refer to [Kogure, 1989] for a trivial-time modification of Wroblewski's algorithm to handle cycles.
4We estimate over 80% of unifications to be failures in 
our large-scale speech-to-speech translation system under 
development. 
5Of course, whether that will improve the overall perfor- 
mance is another question. 
cent of the total parsing time in his TIME parsing sys- 
tem. This time/space burden of copying is non-trivial 
when we consider the fact that creation of unneces- 
sary copies will eventually trigger garbage collections 
more often (in a Lisp environment) which will also 
slow down the overall performance of the parsing sys- 
tem. In general, parsing systems are always short of 
memory space (such as large LR tables of Tomita-LR 
parsers and expanding tables and charts of Earley and active chart parsers6), and the marginal addition or sub-
traction of the amount of memory space consumed by 
other parts of the system often has critical effects on 
the performance of these systems. 
Considering the aforementioned problems, we pro- 
pose the following principles to be the desirable con- 
ditions for a fast graph unification algorithm: 
• Copying should be performed only for success- 
ful unifications. 
• Unification failures should be found as soon as 
possible. 
By way of definition we would like to categorize ex- 
cessive copying of dags into Over Copying and Early 
Copying. Our definition of over copying is the same as 
Wroblewski's; however, our definition of early copying 
is slightly different. 
• Over Copying: Two dags are created in order 
to create one new dag. - This typically happens 
when copies of two input dags are created prior 
to a destructive unification operation to build one 
new dag. (\[Godden, 1990\] calls such a unifica- 
tion: Eager Unification.). When two arcs point to 
the same node, over copying is often unavoidable 
with incremental copying schemes. 
• Early Copying: Copies are created prior to the 
failure of unification so that copies created since 
the beginning of the unification up to the point of 
failure are wasted. 
Wroblewski defines Early Copying as follows: "The 
argument dags are copied before unification started. If 
the unification fails then some of the copying is wasted
effort" and restricts early copying to cases that only 
apply to copies that are created prior to a unification. 
Restricting early copying to copies that are made prior 
to a unification leaves a number of wasted copies that 
are created during a unification up to the point of failure 
to be uncovered by either of the above definitions for 
excessive copying. We would like Early Copying to 
mean all copies that are wasted due to a unification fail- 
ure whether these copies are created before or during 
the actual unification operations. 
Incremental copying has been accepted as an effec- 
tive method of minimizing over copying and eliminat- 
6For example, our phoneme-based generalized LR parser 
for speech input is always running on a swapping space be- 
cause the LR table is too big. 
ing early copying as defined by Wroblewski. How- 
ever, while being effective in minimizing over copying 
(it over copies only in some cases of convergent arcs 
into one node), incremental copying is ineffective in 
eliminating early copying as we define it. 7 Incremen- 
tal copying is ineffective in eliminating early copying 
because when a graph unification algorithm recurses for shared arcs (i.e., the arcs with labels that exist in both input graphs), each recursive call into a shared arc is independent of the other recursive calls into other arcs. In other words, the recursive calls into shared arcs are non-deterministic and
there is no way for one particular recursion into a shared 
arc to know the result of future recursions into other 
shared arcs. Thus even if a particular recursion into 
one arc succeeds (with minimum over copying and no 
early copying in Wroblewski's sense), other arcs may 
eventually fail and thus the copies that are created in 
the successful arcs are all wasted. We consider it a 
drawback of incremental copying schemes that copies 
that are incrementally created up to the point of fail- 
ure get wasted. This problem will be particularly felt 
when we consider parallel implementations of incre- 
mental copying algorithms. Because each recursion 
into shared arcs is non-deterministic, parallel processes can be created to work concurrently on all arcs. In each of the processes created in parallel for each shared arc,
another recursion may take place creating more paral- 
lel processes. While some parallel recursive call into 
some arc may take time (due to a large number of sub- 
arcs, etc.) another non-deterministic call to other arcs 
may proceed deeper and deeper creating a large num- 
ber of parallel processes. In the meantime, copies are 
incrementally created at different depths of subgraphs 
as long as the subgraphs of each of them are unified 
successfully. This way, when a failure is finally de- 
tected at some deep location in some subgraph, other 
numerous processes may have created a large number 
of copies that are wasted. Thus, early copying will be 
a significant problem when we consider the possibility 
of parallelizing the unification algorithms as well. 
2. Our Scheme 
We would like to introduce an algorithm which ad- 
dresses the criteria for fast unification discussed in the 
previous sections. It also handles cycles without over 
copying (without any additional schemes such as those 
introduced by \[Kogure, 1989\]). 
As a data structure, a node is represented with eight 
fields: type, arc-list, comp-arc-list, forward, copy, 
comp-arc-mark, forward-mark, and copy-mark. Al- 
though this number may seem high for a graph node 
data structure, the amount of memory consumed is 
not significantly different from that consumed by other 
7'Early copying' will henceforth be used to refer to early 
copying as defined by us. 
algorithms. Type can be represented by three bits; 
comp-arc-mark, forward-mark, and copy-mark can be 
represented by short integers (i.e., fixnums); and comp-arc-list (just like arc-list) is a mere collection of pointers to memory locations. Thus this additional information is trivial in terms of memory cells consumed, and because of this data structure the unification algorithm itself can remain simple.
NODE                            ARC
+---------------+               +---------------+
| type          |               | label         |
+---------------+               +---------------+
| arc-list      |               | value         |
+---------------+               +---------------+
| comp-arc-list |
+---------------+
| forward       |
+---------------+
| copy          |
+---------------+
| comp-arc-mark |
+---------------+
| forward-mark  |
+---------------+
| copy-mark     |
+---------------+

Figure 1: Node and Arc Structures
The representation for an arc is no different from that 
of other unification algorithms. Each arc has two fields 
for 'label' and 'value'. 'Label' is an atomic symbol 
which labels the arc, and 'value' is a pointer to a node. 
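The following is a minimal sketch, in Common Lisp (the implementation language named in the Appendix), of how the node and arc records of Figure 1 might be declared. The defstruct layout, the slot defaults, and the starting value of the counter (footnote 8: counting begins at 10, with 9 reserved for permanent links) are our own illustrative choices, not the paper's actual code.

;; Sketch of the Figure 1 record layouts (illustrative, not the original code).
(defstruct node
  (type :bottom)        ; one of :atomic, :bottom, :complex
  arc-list              ; permanent arcs (or the atomic value for :atomic nodes)
  comp-arc-list         ; arcs valid only during the current unification
  forward               ; forwarding pointer (permanent or temporary)
  copy                  ; pointer to the copy made after a successful unification
  (comp-arc-mark 0)     ; time stamps compared against the global counter
  (forward-mark 0)
  (copy-mark 0))

(defstruct arc
  label                 ; an atomic symbol labeling the arc
  value)                ; always a pointer to a node

(defvar *unify-global-counter* 10)  ; 9 is reserved to mark permanent forwarding links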
The central notion of our algorithm is the depen- 
dency of the representational content on the global 
timing clock (or the global counter for the current 
generation of unification algorithms). This scheme 
was used in \[Wroblewski, 1987\] to invalidate the copy 
field of a node after one unification by incrementing a 
global counter. This is an extremely cheap operation 
but has the power to invalidate the copy fields of all 
nodes in the system simultaneously. In our algorithm, 
this dependency of the content of fields on global tim- 
ing is adopted for arc lists, forwarding pointers, and 
copy pointers. Thus any modification made, such as 
adding forwarding links, copy links or arcs during one 
top-level unification (unify0) to any node in memory 
can be invalidated by one increment operation on the 
global timing counter. During unification (in unify1)
and copying after a successful unification, the global 
timing ID for a specific field can be checked by compar- 
ing the content of mark fields with the global counter 
value and if they match then the content is respected; 
if not it is simply ignored. Thus the whole operation is 
a trivial addition to the original destructive unification 
algorithm (Pereira's and Wroblewski's unify1).
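Continuing the sketch above, the validity test and the one-step invalidation might look as follows; the helper names mark-current-p, current-arcs, and invalidate-temporary-changes are ours, introduced only for exposition.

;; A field's content counts only if its mark equals the current global counter.
(defun mark-current-p (mark)
  (= mark *unify-global-counter*))

;; The arcs of a :complex node, as seen through the counter: the permanent arcs
;; plus any comp-arcs whose time stamp is still current.
(defun current-arcs (dg)
  (append (node-arc-list dg)
          (when (mark-current-p (node-comp-arc-mark dg))
            (node-comp-arc-list dg))))

;; One increment invalidates every temporary arc, forwarding link, and copy
;; pointer in memory at once.
(defun invalidate-temporary-changes ()
  (incf *unify-global-counter*))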
We have two kinds of arc lists: 1) arc-list and comp-arc-list. Arc-list contains the arcs that are permanent (i.e., usual graph arcs) and comp-arc-list contains arcs
that are only valid during one graph unification oper- 
ation. We also have two kinds of forwarding links, 
i.e., permanent and temporary. A permanent forward- 
ing link is the usual forwarding link found in other 
algorithms (\[Pereira, 1985\], \[Wroblewski, 1987\], etc). 
Temporary forwarding links are links that are only valid 
during one unification. The currency of the temporary 
links is determined by matching the content of the mark 
field for the links with the global counter and if they 
match then the content of this field is respected 8. As 
in \[Pereira, 1985\], we have three types of nodes: 1) 
:atomic, 2) :bottom 9, and 3) :complex. :atomic type 
nodes represent atomic symbol values (such as Noun), 
:bottom type nodes are variables and :complex type 
nodes are nodes that have arcs coming out of them. 
Arcs are stored in the arc-list field. The atomic value 
is also stored in the arc-list if the node type is :atomic. 
:bottom nodes succeed in unifying with any nodes and 
the result of unification takes the type and the value 
of the node that the :bottom node was unified with. 
:atomic nodes succeed in unifying with :bottom nodes 
or :atomic nodes with the same value (stored in the 
arc-list). Unification of an :atomic node with a :com-
plex node immediately fails. :complex nodes succeed 
in unifying with :bottom nodes or with :complex nodes 
whose subgraphs all unify. Arc values are always nodes 
and never symbolic values because the :atomic and 
:bottom nodes may be pointed to by multiple arcs (just 
as in structure sharing of :complex nodes) depending 
on grammar constraints, and we do not want arcs to 
contain terminal atomic values. Figure 2 is the cen- 
tral quasi-destructive graph unification algorithm and 
Figure 3 shows the algorithm for copying nodes and 
arcs (called by unify0) while respecting the contents of 
comp-arc-lists. 
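The node-type rules described above can be summarized in a small hypothetical predicate (for exposition only; in the algorithm proper this dispatch is folded into unify1 of Figure 2, which also installs the forwarding links):

;; Hypothetical summary of the node-type rules; not part of the paper's pseudocode.
(defun types-compatible-p (dg1 dg2)
  (cond ((or (eq (node-type dg1) :bottom)
             (eq (node-type dg2) :bottom))
         t)                                                ; a variable unifies with anything
        ((and (eq (node-type dg1) :atomic)
              (eq (node-type dg2) :atomic))
         (equal (node-arc-list dg1) (node-arc-list dg2)))  ; atoms must carry the same value
        ((or (eq (node-type dg1) :atomic)
             (eq (node-type dg2) :atomic))
         nil)                                              ; atomic vs. complex fails immediately
        (t t)))                                            ; complex vs. complex: recurse on shared arcs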
The functions Complementarcs(dg1,dg2) and Intersectarcs(dg1,dg2) are similar to Wroblewski's algorithm and return the set-difference (the arcs with labels that exist in dg1 but not in dg2) and intersection (the arcs with labels that exist both in dg1 and
dg2) respectively. During the set-difference and set- 
intersection operations, the content of comp-arc-lists 
are respected as parts of arc lists if the comp-arc- 
marks match the current value of the global timing 
counter. Dereference-dg(dg) recursively traverses the 
forwarding link to return the forwarded node. In do- 
ing so, it checks the forward-mark of the node and 
if the forward-mark value is 9 (9 represents a perma- 
nent forwarding link) or its value matches the current 
8We do not have a separate field for temporary forwarding 
links; instead, we designate the integer value 9 to represent a 
permanent forwarding link. We start incrementing the global 
counter from 10 so whenever the forward-mark is not 9 the 
integer value must equal the global counter value to respect 
the forwarding link. 
9Bottom is called leaf in Pereira's algorithm. 
value of *unify-global-counter*, then the function re- 
turns the forwarded node; otherwise it simply returns 
the input node. Forward(dg1, dg2, :forward-type) puts (the pointer to) dg2 in the forward field of dg1. If the keyword in the function call is :temporary, the current value of the *unify-global-counter* is written in the forward-mark field of dg1. If the keyword is :permanent, 9 is written in the forward-mark field of dg1.
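Under the same assumptions as the earlier sketches, dereferencing and forwarding might be written as follows (again illustrative, not the ATR implementation):

;; Follow forwarding links that are either permanent (mark 9) or stamped with
;; the current global counter; stale links are simply ignored.
(defun dereference-dg (dg)
  (let ((fwd (node-forward dg)))
    (if (and fwd
             (or (= (node-forward-mark dg) 9)
                 (mark-current-p (node-forward-mark dg))))
        (dereference-dg fwd)
        dg)))

;; forward-dg(dg1, dg2, type): make dg1 point at dg2, either permanently or
;; only for the duration of the current top-level unification.
(defun forward-dg (dg1 dg2 type)
  (setf (node-forward dg1) dg2
        (node-forward-mark dg1)
        (if (eq type :permanent) 9 *unify-global-counter*)))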
Our algorithm itself does not require any permanent 
forwarding; however, the functionality is added be- 
cause the grammar reader module that reads the path 
equation specifications into dg feature-structures uses 
permanent forwarding to merge the additional gram- 
matical specifications into a graph structure10. The
temporary forwarding links are necessary to handle 
reentrancy and cycles. As soon as unification (at any 
level of recursion through shared arcs) succeeds, a tem- 
porary forwarding link is made from dg2 to dg1 (dg1 to dg2 if dg1 is of type :bottom). Thus, during unification, a node already unified by other recursive calls to unify1 within the same unify0 call has a temporary forwarding link from dg2 to dg1 (or dg1 to dg2). As
a result, if this node becomes an input argument node, 
dereferencing the node causes dg1 and dg2 to become
the same node and unification immediately succeeds. 
Thus a subgraph below an already unified node will not 
be checked more than once even if an argument graph 
has a cycle. Also, during copying done subsequently to 
a successful unification, two arcs converging into the same node will not cause over copying simply because if a node already has a copy then the copy is returned. For example, as a case that may cause over copies in other schemes for dg2 convergent arcs, let us consider the case when the destination node has a corresponding node in dg1 and only one of the convergent arcs has a corresponding arc in dg1. This destination node is already temporarily forwarded to the node in dg1 (since the unification check was successful prior to copying). Once a copy is created for the corresponding dg1 node and recorded in the copy field of dg1, every time a convergent arc in dg2 that needs to be copied points to its destination node, dereferencing the node returns the corresponding node in dg1 and, since a copy of it already exists, this copy is returned. Thus no duplicate copy is created11.
10We have been using Wroblewski's algorithm for the uni-
fication part of the parser and thus usage of (permanent) 
forwarding links is adopted by the grammar reader module 
to convert path equations to graphs. For example, permanent 
forwarding is done when a :bottom node is to be merged with 
other nodes. 
11Copying of dg2 arcs happens for arcs that exist in dg2 but not in dg1 (i.e., Complementarcs(dg2,dg1)). Such arcs are pushed to the comp-arc-list of dg1 during unify1 and are copied into the arc-list of the copy during subsequent copying. If there is a cycle or a convergence in arcs in dg1, or in arcs in dg2 that do not have corresponding arcs in dg1, then the mechanism is even simpler than the one discussed here. A copy is made once, and the same copy is simply returned every time another convergent arc points to the original node. This is because arcs are copied only from either dg1 or dg2.
QUASI-DESTRUCTIVE GRAPH UNIFICATION

FUNCTION unify-dg(dg1,dg2);
  result ← catch with tag 'unify-fail
           calling unify0(dg1,dg2);
  increment *unify-global-counter*;  ;; starts from 10¹²
  return(result);
END;

FUNCTION unify0(dg1,dg2);
  IF '*T* = unify1(dg1,dg2) THEN
     copy ← copy-dg-with-comp-arcs(dg1);
     return(copy);
END;

FUNCTION unify1(dg1-underef, dg2-underef);
  dg1 ← dereference-dg(dg1-underef);
  dg2 ← dereference-dg(dg2-underef);
  IF (dg1 = dg2)¹³ THEN
     return('*T*);
  ELSE IF (dg1.type = :bottom) THEN
     forward-dg(dg1,dg2,:temporary);
     return('*T*);
  ELSE IF (dg2.type = :bottom) THEN
     forward-dg(dg2,dg1,:temporary);
     return('*T*);
  ELSE IF (dg1.type = :atomic AND dg2.type = :atomic) THEN
     IF (dg1.arc-list = dg2.arc-list)¹⁴ THEN
        forward-dg(dg2,dg1,:temporary);
        return('*T*);
     ELSE throw¹⁵ with keyword 'unify-fail;
  ELSE IF (dg1.type = :atomic OR dg2.type = :atomic) THEN
     throw with keyword 'unify-fail;
  ELSE new ← complementarcs(dg2,dg1);
       shared ← intersectarcs(dg1,dg2);
       FOR EACH arc IN shared DO
          unify1(destination of the shared arc for dg1,
                 destination of the shared arc for dg2);
       forward-dg(dg2,dg1,:temporary);¹⁶
       dg1.comp-arc-mark ← *unify-global-counter*;
       dg1.comp-arc-list ← new;
       return('*T*);
END;

Figure 2: The Q-D Unification Functions
12The value 9 indicates a permanent forwarding link.
13Equal in the 'eq' sense. Because of forwarding and cycles, it is possible that dg1 and dg2 are 'eq'.
14Arc-list contains the atomic value if the node is of type :atomic.
15A catch/throw construct; i.e., immediately return to unify-dg.
16This will be executed only when all recursive calls into unify1 have succeeded. Otherwise, a failure would have caused an immediate return to unify-dg.
QUASI-DESTRUCTIVE COPYING

FUNCTION copy-dg-with-comp-arcs(dg-underef);
  dg ← dereference-dg(dg-underef);
  IF (dg.copy is non-empty AND
      dg.copy-mark = *unify-global-counter*) THEN
     return(dg.copy);¹⁷
  ELSE IF (dg.type = :atomic) THEN
     copy ← create-node();¹⁸
     copy.type ← :atomic;
     copy.arc-list ← dg.arc-list;
     dg.copy ← copy;
     dg.copy-mark ← *unify-global-counter*;
     return(copy);
  ELSE IF (dg.type = :bottom) THEN
     copy ← create-node();
     copy.type ← :bottom;
     dg.copy ← copy;
     dg.copy-mark ← *unify-global-counter*;
     return(copy);
  ELSE
     copy ← create-node();
     copy.type ← :complex;
     FOR ALL arc IN dg.arc-list DO
        newarc ← copy-arc-and-comp-arcs(arc);
        push newarc into copy.arc-list;
     IF (dg.comp-arc-list is non-empty AND
         dg.comp-arc-mark = *unify-global-counter*) THEN
        FOR ALL comp-arc IN dg.comp-arc-list DO
           newarc ← copy-arc-and-comp-arcs(comp-arc);
           push newarc into copy.arc-list;
     dg.copy ← copy;
     dg.copy-mark ← *unify-global-counter*;
     return(copy);
END;

FUNCTION copy-arc-and-comp-arcs(input-arc);
  label ← input-arc.label;
  value ← copy-dg-with-comp-arcs(input-arc.value);
  return a new arc with label and value;
END;

Figure 3: Node and Arc Copying Functions

17I.e., the existing copy of the node.
18Creates an empty node structure.
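For concreteness, here is one way Figures 2 and 3 might be rendered in Common Lisp on top of the earlier sketches. Complementarcs and intersectarcs are simplified label-matching versions, and we take one small liberty in the copying function: the copy pointer is recorded before the arcs are traversed (Figure 3 records it afterwards), which also protects the copying pass against cyclic inputs. This is an illustrative reconstruction, not the ATR code.

;; Arcs of dg1 whose labels do not occur in dg2 (comp-arc-lists with a current
;; time stamp are respected on both sides via current-arcs).
(defun complementarcs (dg1 dg2)
  (remove-if (lambda (a)
               (find (arc-label a) (current-arcs dg2) :key #'arc-label))
             (current-arcs dg1)))

;; Pairs (arc-in-dg1 . arc-in-dg2) for labels occurring in both nodes.
(defun intersectarcs (dg1 dg2)
  (loop for a1 in (current-arcs dg1)
        for a2 = (find (arc-label a1) (current-arcs dg2) :key #'arc-label)
        when a2 collect (cons a1 a2)))

;; Top level: run unify0, then invalidate all temporary changes with one increment.
(defun unify-dg (dg1 dg2)
  (let ((result (catch 'unify-fail (unify0 dg1 dg2))))
    (incf *unify-global-counter*)
    result))                          ; a fresh copy on success, nil on failure

(defun unify0 (dg1 dg2)
  (when (unify1 dg1 dg2)
    (copy-dg-with-comp-arcs dg1)))

;; The quasi-destructive check: only temporary forwarding links and comp-arc-lists
;; are written into the argument graphs, never permanent structure.
(defun unify1 (dg1-underef dg2-underef)
  (let ((dg1 (dereference-dg dg1-underef))
        (dg2 (dereference-dg dg2-underef)))
    (cond ((eq dg1 dg2) t)
          ((eq (node-type dg1) :bottom)
           (forward-dg dg1 dg2 :temporary) t)
          ((eq (node-type dg2) :bottom)
           (forward-dg dg2 dg1 :temporary) t)
          ((and (eq (node-type dg1) :atomic) (eq (node-type dg2) :atomic))
           (if (equal (node-arc-list dg1) (node-arc-list dg2))
               (progn (forward-dg dg2 dg1 :temporary) t)
               (throw 'unify-fail nil)))
          ((or (eq (node-type dg1) :atomic) (eq (node-type dg2) :atomic))
           (throw 'unify-fail nil))
          (t
           (let ((new (complementarcs dg2 dg1))
                 (shared (intersectarcs dg1 dg2)))
             (dolist (pair shared)
               (unify1 (arc-value (car pair)) (arc-value (cdr pair))))
             ;; reached only if every shared arc unified; a failure throws past this point
             (forward-dg dg2 dg1 :temporary)
             (setf (node-comp-arc-mark dg1) *unify-global-counter*
                   (node-comp-arc-list dg1) new)
             t)))))

;; Copying after success: respect comp-arc-lists and current forward links.
;; Recording the copy before descending into the arcs keeps convergent and
;; cyclic graphs from being copied more than once.
(defun copy-dg-with-comp-arcs (dg-underef)
  (let ((dg (dereference-dg dg-underef)))
    (if (and (node-copy dg) (mark-current-p (node-copy-mark dg)))
        (node-copy dg)
        (let ((copy (make-node :type (node-type dg))))
          (setf (node-copy dg) copy
                (node-copy-mark dg) *unify-global-counter*)
          (if (eq (node-type dg) :atomic)
              (setf (node-arc-list copy) (node-arc-list dg))
              (dolist (arc (current-arcs dg))
                (push (make-arc :label (arc-label arc)
                                :value (copy-dg-with-comp-arcs (arc-value arc)))
                      (node-arc-list copy))))
          copy))))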
Figure 4 shows a simple example of quasi- 
destructive graph unification with dg2 convergent arcs. 
The round nodes indicate atomic nodes and the rect- 
angular nodes indicate bottom (variable) nodes. First, 
top-level unify1 finds that each of the input graphs has arc-a and arc-b (shared). Then unify1 is recursively called. At step two, the recursion into arc-a locally succeeds, and a temporary forwarding link with time-stamp (n) is made from node []2 to node s. At the third step (recursion into arc-b), by the previous forwarding, node []2 already has the value s (by dereferencing).
Then this unification returns a success and a temporary forwarding link with time-stamp (n) is created from node []1 to node s. At the fourth step, since all recursive unifications (unify1s) into shared arcs succeeded, the top-level unify1 creates a temporary forwarding link with time-stamp (n) from dag2's root node to dag1's root node, sets arc-c (new) into the comp-arc-list of dag1, and returns success ('*T*). At the fifth step, a copy of dag1 is created respecting the content of the comp-arc-list and dereferencing the valid forward links. This copy is returned as the result of unification. At the last step (step six), the global timing counter is incremented (n ⇒ n+1). After this operation, temporary forwarding links and comp-arc-lists with time-stamp (< n+1) will be ignored. Therefore, the original dag1 and dag2 are recovered in constant time without costly reversing operations. (Also, note that recursions into shared arcs can be done in any order, producing the same result.)
[Figure 4 (diagram): six snapshots of dag1 and dag2 -- (1) unify1(dag1,dag2), SHARED = {a,b}; (2) recursion into arc-a, unify1(s, []2), temporary forward(n) link from []2 to s; (3) recursion into arc-b, unify1([]1, []2), temporary forward(n) link from []1 to s; (4) forward(n) link from dag2's root to dag1's root, comp-arc-list(n) of dag1 = {c}; (5) copy-dg-with-comp-arcs(dag1) producing the copy of dag1 (n); (6) counter incremented, dag1 and dag2 recovered.]

Figure 4: A Simple Example of Quasi-Destructive Graph Unification
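As a usage illustration, the dags of Figure 4 could be built and unified with the sketches above roughly as follows (hypothetical code; the node names follow the figure, with the Lisp symbols S and T standing in for the atoms s and t):

;; dag1:  root --a--> s (atomic)      dag2:  root --a--> []2 (bottom)
;;             --b--> []1 (bottom)           root --b--> []2 (same node: convergent arcs)
;;                                           root --c--> t (atomic)
(let* ((s-node (make-node :type :atomic :arc-list 's))
       (t-node (make-node :type :atomic :arc-list 't))
       (b1     (make-node :type :bottom))
       (b2     (make-node :type :bottom))
       (dag1   (make-node :type :complex
                          :arc-list (list (make-arc :label 'a :value s-node)
                                          (make-arc :label 'b :value b1))))
       (dag2   (make-node :type :complex
                          :arc-list (list (make-arc :label 'a :value b2)
                                          (make-arc :label 'b :value b2)
                                          (make-arc :label 'c :value t-node)))))
  ;; unify-dg returns a fresh copy whose arcs a and b share one atomic node (value s)
  ;; and whose arc c points to an atomic node (value t); the final increment of the
  ;; global counter makes the temporary links stale, so dag1 and dag2 are logically
  ;; unchanged.
  (unify-dg dag1 dag2))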
As we just saw, the algorithm itself is simple. The 
basic control structure of the unification is similar to 
Pereira's and Wroblewski's unify1. The essential difference between our unify1 and the previous ones is that our unify1 is non-destructive. This is because the complementarcs(dg2,dg1) are set into the comp-arc-list of dg1 and not into the arc-list of dg1. Thus, as soon as we increment the global counter, the changes made to dg1 (i.e., the addition of complement arcs into the comp-arc-list) vanish. As long as the comp-arc-mark value matches that of the global counter, the content of the comp-arc-list can be considered a part of the arc-list and therefore dg1 is the result of unification. Hence the name quasi-destructive graph unification. In order to create a copy for subsequent use, we only need to make a copy of dg1 before we increment the global counter, while respecting the content of the comp-arc-list of dg1.
Thus instead of calling other unification functions 
(such as unify2 of Wroblewski) for incrementally creating a copy node during a unification, we only need to create a copy after unification. Thus, if unification fails no copies are made at all (as in [Karttunen, 1986]'s scheme). Because unification that recurses into shared arcs carries no burden of incremental copy-
ing (i.e., it simply checks if nodes are compatible), as 
the depth of unification increases (i.e., the graph gets 
larger), the speed-up of our method should become conspicuous if a unification eventually fails. If all unifica-
tions during a parse are going to be successful, our 
algorithm should be as fast as or slightly slower than 
Wroblewski's algorithm 19. Since a parse that does not 
fail on a single unification is unrealistic, the gain from 
our scheme should depend on the amount of unification 
failures that occur during a unification. As the number 
of failures per parse increases and the graphs that failed 
get larger, the speed-up from our algorithm should be- 
come more apparent. Therefore, the characteristics of 
our algorithm seem desirable. In the next section, we 
will see the actual results of experiments which com- 
pare our unification algorithm to Wroblewski's algo- 
rithm (slightly modified to handle variables and cycles 
that are required by our HPSG based grammar). 
3. Experiments 
Table 1 shows the results of our experiments using an 
HPSG-based Japanese grammar developed at ATR for 
a conference registration telephone dialogue domain. 
19It may be slightly slower because our unification recurses
twice on a graph: once to unify and once to copy, whereas in 
incremental unification schemes copying is performed dur- 
ing the same recursion as unifying. Additional bookkeeping 
for incremental copying and an additional set-difference op- 
eration (i.e., complementarcs(dg1,dg2)) during unify2 may
offset this, however. 
'Unifs' represents the total number of unifications dur- 
ing a parse (the number of calls to the top-level 'unify-dg', and not 'unify1'). 'USrate' represents the ratio
of successful unifications to the total number of uni- 
fications. We parsed each sentence three times on a 
Symbolics 3620 using both unification methods and 
took the shortest elapsed time for both methods ('T' 
represents our scheme, 'W' represents Wroblewski's 
algorithm with a modification to handle cycles and 
variables20). Data structures are the same for both uni-
fication algorithms (except for additional fields for a 
node in our algorithm, i.e., comp-arc-list, comp-arc- 
mark, and forward-mark). The same functions are used to
interface with Earley's parser and the same subfunc- 
tions are used wherever possible (such as creation and 
access of arcs) to minimize the differences that are not 
purely algorithmic. 'Number of copies' represents the 
number of nodes created during each parse (and does 
not include the number of arc structures that are cre- 
ated during a parse). 'Number of conses' represents the 
amount of structure words consed during a parse. This 
number represents the real comparison of the amount 
of space being consumed by each unification algorithm 
(including added fields for nodes in our algorithm and
arcs that are created in both algorithms). 
We used Earley's parsing algorithm for the experi- 
ment. The Japanese grammar is based on HPSG anal- 
ysis (\[Pollard and Sag, 1987\]) covering phenomena 
such as coordination, case adjunction, adjuncts, con- 
trol, slash categories, zero-pronouns, interrogatives, 
WH constructs, and some pragmatics (speaker, hearer 
relations, politeness, etc.) (\[Yoshimoto and Kogure, 
1989\]). The grammar covers many of the important 
linguistic phenomena in conversational Japanese. The 
grammar graphs which are converted from the path 
equations contain 2324 nodes. We used 16 sentences 
from a sample telephone conversation dialog which 
range from very short sentences (one word, i.e., iie 
'no') to relatively long ones (such as soredehakochirakarasochiranitourokuyoushiwoookuriitashimasu 'In that case, we [speaker] will send you [hearer] the registration form.'). Thus, the number of (top-level) uni-
fications per sentence varied widely (from 6 to over 
500). 
20Cycles can be handled in Wroblewski's algorithm by checking whether an arc with the same label already exists when arcs are added to a node. If such an arc already exists, we destructively unify the node which is the destination of the existing arc with the node which is the destination of the arc being added. If such an arc does not exist, we simply add the arc ([Kogure, 1989]). Thus, cycles can be
handled very cheaply in Wroblewski's algorithm. Handling 
variables in Wroblewski's algorithm is basically the same as 
in our algorithm (i.e., Pereira's scheme), and the addition of 
this functionality can be ignored in terms of comparison to 
our algorithm. Our algorithm does not require any additional 
scheme to handle cycles in input dgs. 
sent#   Unifs   USrate     Elapsed time (sec)       Num of Copies        Num of Conses
                              T         W              T        W           T         W
  1        6     0.5        1.066     1.113           85       107        1231      1451
  2      101     0.35       1.897     2.899         1418      2285       15166     23836
  3       24     0.33       1.206     1.290          129       220        1734      2644
  4       71     0.41       3.349     4.102         1635      2151       17133     22943
  5      305     0.39      12.151    17.309         5529      9092       57405     93035
  6       59     0.38       1.254     1.601          608       997        6873     10763
  7        6     0.38       1.016     1.030           85       107        1175      1395
  8       81     0.39       3.499     4.452         1780      2406       18718     24978
  9      480     0.38      18.402    34.653         9466     15756       96985    167211
 10      555     0.39      26.933    47.224        11789     18822      119629    189997
 11      109     0.40       4.592     5.433         2047      2913       21871     30531
 12      428     0.38      13.728    24.350         7933     13363       81536    135808
 13      559     0.38      15.480    42.357         9976     17741      102489    180169
 14       52     0.38       1.977     2.410          745       941        8272     10292
 15       77     0.39       3.574     4.688         1590      2137       16946     22416
 16       77     0.39       3.658     4.431         1590      2137       16943     22413

Table 1: Comparison of our algorithm with Wroblewski's
4. Discussion: Comparison to Other 
Approaches 
The control structure of our algorithm is identical to 
that of [Pereira, 1985]. However, instead of storing changes to the argument dags in the environment, we store the changes in the dags themselves non-destructively. Because we do not use the environment,
the log(d) overhead (where d is the number of nodes 
in a dag) associated with Pereira's scheme that is re- 
quired during node access (to assemble the whole dag 
from the skeleton and the updates in the environment) 
is avoided in our scheme. We share the principle of 
storing changes in a restorable way with \[Karttunen, 
1986\]'s reversible unification and copy graphs only 
after a successful unification. Karttunen originally 
introduced this scheme in order to replace the less 
efficient structure-sharing implementations (\[Pereira, 
1985\], \[Karttunen and Kay, 1985\]). In Karttunen's 
method 21, whenever a destructive change is about to 
be made, the attribute value pairs 22 stored in the body 
of the node are saved into an array. The dag node struc- 
ture itself is also saved in another array. These values 
are restored after the top level unification is completed. 
(A copy is made prior to the restoration operation if 
the unification was a successful one.) The difference 
between Karttunen's method and ours is that in our al- 
gorithm, one increment to the global counter can invali- 
date all the changes made to nodes, while in Karttunen's 
algorithm each node in the entire argument graph that 
has been destructively modified must be restored sep- 
arately by retrieving the attribute-values saved in an 
21The discussion of Karttunen's method is based on the D-PATR implementation on Xerox 1100 machines ([Karttunen, 1986]).
22I.e., arc structures: 'label' and 'value' pairs in our vocabulary.
array and resetting the values into the dag structure 
skeletons saved in another array. In both Karttunen's 
and our algorithm, there will be a non-destructive (re- 
versible, and quasi-destructive) saving of intersection 
arcs that may be wasted when a subgraph of a partic- 
ular node successfully unifies but the final unification 
fails due to a failure in some other part of the argument 
graphs. This is not a problem in our method because the 
temporary change made to a node is performed as push- 
ing pointers into already existing structures (nodes) and 
it does not require entirely new structures to be created 
and dynamically allocated memory (which was neces- 
sary for the copy (create-node) operation).23 [Godden,
1990\] presents a method of using lazy evaluation in 
unification which seems to be one successful actual-
ization of \[Karttunen and Kay, 1985\]'s lazy evaluation 
idea. One question about lazy evaluation is that the ef- 
ficiency of lazy evaluation varies depending upon the 
particular hardware and programming language envi- 
ronment. For example, in CommonLisp, to attain a 
lazy evaluation, as soon as a function is delayed, a closure (or a structure) needs to be created, receiving a dynamic allocation of memory (just as in creating a copy
node). Thus, there is a shift of memory and associated 
computation consumed from making copies to making 
closures. In terms of memory cells saved, although 
the lazy scheme may reduce the total number of copies 
created, if we consider the memory consumed to create 
closures, the saving may be significantly canceled. In 
terms of speed, since delayed evaluation requires addi- 
tional bookkeeping, how schemes such as the one in- 
troduced by \[Godden, 1990\] would compare with non- 
lazy incremental copying schemes is an open question. 
Unfortunately Godden offers a comparison of his algo- 
23Although, in Karttunen's method, it may become rather expensive if the arrays require resizing during the saving operation of the subgraphs.
rithm with one that uses a full copying method (i.e. his 
Eager Copying) which is already significantly slower 
than Wroblewski's algorithm. However, no compari- 
son is offered with prevailing unification schemes such 
as Wroblewski's. With the complexity for lazy evalu- 
ation and the memory consumed for delayed closures 
added, it is hard to estimate whether lazy unification 
runs considerably faster than Wroblewski's incremen- 
tal copying scheme.24
5. Conclusion 
The algorithm introduced in this paper runs signifi- 
cantly faster than Wroblewski's algorithm using Ear- 
ley's parser and an HPSG based grammar developed 
at ATR. The gain comes from the fact that our algo- 
rithm does not create any over copies or early copies. 
In Wroblewski's algorithm, although over copies are 
essentially avoided, early copies (by our definition) 
are a significant problem because about 60 percent of 
unifications result in failure in a successful parse in 
our sample parses. The additional set-difference oper- 
ation required for incremental copying during unify2 
may also be contributing to the slower speed of Wrob- 
lewski's algorithm. Given that our sample grammar is 
relatively small, we would expect that the difference 
in the performance between the incremental copying 
schemes and ours will expand as the grammar size 
increases and both the number of failures25 and the
size of the wasted subgraphs of failed unifications be- 
come larger. Since our algorithm is essentially paral- 
lel, parallelization is one logical choice to pursue fur-
ther speedup. Parallel processes can be continuously 
created as unify1 recurses deeper and deeper without creating any copies, by simply looking for a possible failure of the unification (and preparing for successive copying in case unification succeeds). So far, we have
completed a preliminary implementation on a shared 
memory parallel hardware with about 75 percent of 
effective parallelization rate. With the simplicity of 
our algorithm and the ease of implementing it (com- 
pared to both incremental copying schemes and lazy 
schemes), combined with the demonstrated speed of 
the algorithm, the algorithm could be a viable alterna- 
tive to existing unification algorithms used in current 
24That is, unless some new scheme for reducing excessive copying is introduced, such as structure-sharing of an unchanged shared forest ([Kogure, 1990]). Even then, our
criticism of the cost of delaying evaluation would still be 
valid. Also, although different in methodology from the way 
suggested by Kogure for Wroblewski's algorithm, it is possi- 
ble to attain structure-sharing of an unchanged forest in our
scheme as well. We have already developed a preliminary 
version of such a scheme which is not discussed in this paper. 
25For example, in our large-scale speech-to-speech trans-
lation system under development, the USrate is estimated to 
be under 20%, i.e., over 80% of unifications are estimated to 
be failures. 
natural language systems. 
ACKNOWLEDGMENTS 
The author would like to thank Akira Kurematsu, 
Tsuyoshi Morimoto, Hitoshi Iida, Osamu Furuse, 
Masaaki Nagata, Toshiyuki Takezawa and other mem- 
bers of ATR and Masaru Tomita and Jaime Carbonell 
at CMU. Thanks are also due to Margalit Zabludowski 
and Hiroaki Kitano for comments on the final version 
of this paper and Takako Fujioka for assistance in im- 
plementing the parallel version of the algorithm. 
Appendix: Implementation 
The unification algorithms, Earley parser, and the
HPSG path equation to graph converter programs are 
implemented in CommonLisp on a Symbolics ma- 
chine. The preliminary parallel version of our uni- 
fication algorithm is currently implemented on a Se- 
quent/Symmetry closely-coupled shared-memory par- 
allel machine running Allegro CLiP parallel Common- 
Lisp. 

References 
[Godden, 1990] Godden, K. "Lazy Unification". In Proceedings of ACL-90, 1990.
[Karttunen, 1986] Karttunen, L. "D-PATR: A Development Environment for Unification-Based Grammars". In Proceedings of COLING-86, 1986. (Also, Report CSLI-86-61, Stanford University.)
[Karttunen and Kay, 1985] Karttunen, L. and M. Kay. "Structure Sharing with Binary Trees". In Proceedings of ACL-85, 1985.
[Kasper, 1987] Kasper, R. "A Unification Method for Disjunctive Feature Descriptions". In Proceedings of ACL-87, 1987.
[Kogure, 1989] Kogure, K. A Study on Feature Structures and Unification. ATR Technical Report TR-I-0032, 1988.
[Kogure, 1990] Kogure, K. "Strategic Lazy Incremental Copy Graph Unification". In Proceedings of COLING-90, 1990.
[Pereira, 1985] Pereira, F. "A Structure-Sharing Representation for Unification-Based Grammar Formalisms". In Proceedings of ACL-85, 1985.
[Pollard and Sag, 1987] Pollard, C. and I. Sag. Information-Based Syntax and Semantics. Vol. 1, CSLI, 1987.
[Yoshimoto and Kogure, 1989] Yoshimoto, K. and K. Kogure. Japanese Sentence Analysis by means of Phrase Structure Grammar. ATR Technical Report TR-I-0049, 1989.
[Wroblewski, 1987] Wroblewski, D. "Nondestructive Graph Unification". In Proceedings of AAAI-87, 1987.
