<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1045">
  <Title>Memory-Efficient and Thread-Safe Quasi-Destructive Graph Unification</Title>
  <Section position="3" start_page="4" end_page="4" type="intro">
    <SectionTitle>
2 Algorithm
</SectionTitle>
    <Paragraph position="0"> The key to the solution of all of the above-mentioned issues is to separate the scratch fields from the fields that actually make up the definition of the graph. The resulting data structures are shown in Figure 3.</Paragraph>
    <Paragraph position="1"> We have taken Tomabechi's quasi-destructive graph unification algorithm as the starting point (Tomabechi, 1995), because it is often considered to be the fastest unification algorithm for unification-based grammar parsing (see, e.g., op den Akker et al., 1995). [Figure caption, beginning lost: ... reusable scratch fields. In the permanent structures we use offsets. Scratch structures use index values (including arcs recorded in the comp-arc list). Our implementation derives offsets from index values stored in nodes.]</Paragraph>
    <Paragraph position="2"> We have separated the scratch fields needed for unification from the scratch fields needed for copying. We propose the following technique to associate scratch structures with nodes. We take an array of scratch structures. In addition, for each graph we assign each node a unique index number that corresponds to an element in the array. Different graphs typically share the same indexes. Since unification involves two graphs, we need to ensure that two nodes will not be assigned the same scratch structure. We solve this by interleaving the index positions of the two graphs. This mapping is shown in Figure 4. Obviously, the minimum number of elements in the table is two times the number of nodes of the largest graph. To reduce the table size, we allow certain nodes to be deprived of scratch structures. (For example, we do not forward atoms.) We denote this with a valuation function v, which returns 1 if the node is assigned an index and 0 otherwise.</Paragraph>
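The interleaving described above can be illustrated with a minimal Python sketch. The function names (scratch_slot, table_size, assign_indexes) and the encoding of the graph side as 0 or 1 are assumptions made for illustration; the paper only states that the two graphs receive even and odd positions and that v decides whether a node gets an index.

```python
def scratch_slot(node_index, side):
    """Map a per-graph node index to a slot in the shared scratch table.

    side is 0 for the left graph and 1 for the right graph (assumed encoding),
    so left nodes land on even slots and right nodes on odd slots.
    """
    return 2 * node_index + side


def table_size(largest_graph_nodes):
    # Two interleaved graphs need two slots per node index.
    return 2 * largest_graph_nodes


def assign_indexes(nodes, v):
    """Give consecutive indexes only to nodes for which v(node) == 1."""
    indexes = {}
    next_index = 0
    for node in nodes:
        if v(node) == 1:
            indexes[node] = next_index
            next_index += 1
    return indexes
```

Under these assumptions, two nodes that share index 3 in their respective graphs occupy slots 6 and 7, so they never collide, and nodes for which v returns 0 consume no index at all.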
    <Paragraph position="3"> We can associate the index with a node by including it in the node structure. For structure sharing, however, we have to use offsets between nodes (see Figure 4), because other- [...] [Figure caption, beginning lost: ... numbers with nodes. The numbers in the nodes represent the index number. Arcs are associated with offsets. Negative offsets indicate a reentrancy.]</Paragraph>
    <Paragraph position="4"> Offsets can be easily derived from index values in nodes. As storing offsets in arcs consumes more memory than storing indexes in nodes (more arcs may point to the same node), we store index values and use them to compute the offsets. For ease of reading, we present our algorithm as if the offsets were stored instead of computed. Note that the small index values consume much less space than the scratch fields they replace.</Paragraph>
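As a rough illustration of computing offsets from stored indexes, the following Python sketch keeps one small index per node and derives an arc's offset on demand. The Node class, the offset_of helper, and the arc labels are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    index: int                                # small per-graph index stored in the node
    arcs: dict = field(default_factory=dict)  # label -> child Node


def offset_of(parent, label):
    """Compute the arc's offset on demand instead of storing it in the arc."""
    child = parent.arcs[label]
    return child.index - parent.index


# A tiny graph in which two arcs point to the same node.
root = Node(0)
np_node = Node(1)
vp_node = Node(2)
root.arcs = {"subj": np_node, "pred": vp_node}
vp_node.arcs = {"agr": np_node}               # second arc to np_node
```

Here np_node is reached by two arcs but its index is stored only once, which is the memory argument made above. The offset of the "agr" arc comes out negative, matching the convention that negative offsets mark a reentrancy.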
    <Paragraph position="5"> The resulting algorithm is shown in Figure 5. It is very similar to the algorithm in (Tomabechi, 1991), but incorporates our indexing technique. Each reference to a node now consists not only of the address of the node structure but also of its index in the table. This is required because we cannot derive its table index from its node structure alone. The second argument of Copy indicates the next free index number. Copy returns references with an offset, allowing them to be directly stored in arcs. These offsets will be negative when Copy exits at line 2.2, resembling a reentrancy. Note that only AbsArc explicitly defines operations on offsets. AbsArc computes a node's index using its parent node's index and an offset.</Paragraph>
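A possible reading of AbsArc is sketched below in Python. The signature, the tuple layout of references and arcs, and the interleave parameter are assumptions; the factor 2 follows the paper's note that offsets are multiplied by 2 to account for interleaving, and it is assumed here to apply when table indexes of the two input graphs are interleaved.

```python
def abs_arc(parent_ref, arc, interleave=2):
    """Turn an arc (target node, relative offset) into an absolute reference.

    parent_ref is (node, table_index); arc is (target_node, offset).
    Scaling the offset by the interleave factor keeps the child on the
    same parity (same input graph) as its parent.
    """
    _parent, parent_index = parent_ref
    target, offset = arc
    return target, parent_index + interleave * offset
```

With these assumptions, a parent at an odd table index always yields children at odd table indexes, and a negative offset resolves to a node with a smaller table index than its parent.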
    <Paragraph position="6">
1. (dg, idx) ← Dereference(ref_in)
2. if v(dg) = 1 and cptab[idx].copy ≠ nil then
2.1. (dg1, idx1) ← cptab[idx].copy
2.2. return (dg1, idx1 - new_idx + 1)
3. newcopy ← new Node
4. newcopy.type ← dg.type
5. if v(dg) = 1 then cptab[idx].copy ← (newcopy, new_idx)
6. count ← v(newcopy) [e]
7. if dg.type = atomic then

[a] We assign even and odd indexes to the nodes of dg1 and dg2, respectively.
[b] Tables only need to be cleared up to the point where unification failed.
[c] Compare indexes to allow more powerful structure sharing. Note that indexes uniquely identify a node if v(n) = 1 holds for all nodes n.
[d] Note that we are multiplying the offset by 2 to account for the interleaved offsets of the left and right graph.
[e] We assume it is known at this point whether the new node requires an index number.
[f] Note that ref contains an index, whereas ref1 contains an offset.
[g] If the node was already copied (in which case it is &lt; 0), we need not reserve indexes.

fwtab and cptab, which represent the forward table and copy table, respectively, are defined as global variables. In order to be thread safe, each thread needs to have its own copy of these tables.</Paragraph>
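One way to give each thread its own tables is thread-local storage. The following Python sketch is an illustration only: the tables helper, the TABLE_SIZE constant, and the use of plain lists are assumptions, and the paper does not say how its implementation provides per-thread tables.

```python
import threading

TABLE_SIZE = 1024  # hypothetical capacity, not a value from the paper

_scratch = threading.local()


def tables():
    """Return this thread's private (fwtab, cptab) pair, creating it lazily."""
    if not hasattr(_scratch, "fwtab"):
        _scratch.fwtab = [None] * TABLE_SIZE
        _scratch.cptab = [None] * TABLE_SIZE
    return _scratch.fwtab, _scratch.cptab
```

Because the permanent graph structures carry no scratch fields, threads that call tables() can unify the same input graphs concurrently while writing only to their own forward and copy tables.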
    <Paragraph position="7"> Contrary to Tomabechi's implementation, we invalidate scratch fields by simply resetting them after a unification completes. This simplifies the algorithm. We only reset the table up to the highest index in use. As table entries are roughly filled in increasing order, there is little overhead for clearing unused elements. A nice property of the algorithm is that indexes identify from which input graph a node originates (even = left, odd = right). This information can be used, for example, to selectively share nodes in a structure sharing scheme. We can also specify additional scratch fields or additional arrays at hardly any cost. Some of these abilities will be used in the enhancements of the algorithm we will discuss next.</Paragraph>
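The reset strategy and the parity property can be sketched in Python as follows. The ScratchTable class and the origin function are hypothetical names, and tracking the highest written slot in a single counter is one assumed way to realize "reset up to the highest index in use".

```python
class ScratchTable:
    def __init__(self, size):
        self.slots = [None] * size
        self.highest = -1            # highest slot written since the last reset

    def set(self, index, value):
        self.slots[index] = value
        if index > self.highest:
            self.highest = index

    def reset(self):
        # Clear only the prefix that may have been touched.
        for i in range(self.highest + 1):
            self.slots[i] = None
        self.highest = -1


def origin(index):
    """Even table indexes belong to the left graph, odd ones to the right."""
    return "left" if index % 2 == 0 else "right"
```

In this sketch the cost of reset grows with the highest index used by the last unification rather than with the table capacity, and origin needs no lookup beyond the index itself.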
  </Section>
</Paper>