<?xml version="1.0" standalone="yes"?> <Paper uid="P00-1045"> <Title>Memory-Efficient and Thread-Safe Quasi-Destructive Graph Unification</Title> <Section position="4" start_page="4" end_page="7" type="metho"> <SectionTitle> 3 Enhancements </SectionTitle> <Paragraph position="0"> Structure Sharing Structure sharing is an important technique to reduce memory usage. We will adopt the same terminology as Tomabechi (1992). That is, we will use the term feature-structure sharing when two arcs in one graph converge to the same node in that graph (also referred to as reentrancy) and data-structure sharing when arcs from two different graphs converge to the same node.</Paragraph> <Paragraph position="1"> The conditions for sharing mentioned in (Tomabechi, 1992) are: (1) bottom and atomic nodes can be shared; (2) complex nodes can be shared unless they are modified. We need to add the following condition: (3) all arcs in the shared subgraph must have the same offsets as the subgraph that would have resulted from copying. A possible violation of this constraint is shown in Figure 6. As long as arcs are processed in increasing order of index number,3 this condition can only be violated in case of reentrancy. Basically, the condition can be violated when a reentrancy points past a node that is bound to a larger subgraph.</Paragraph> <Paragraph position="2"> 3This can easily be accomplished by fixing the order in which arcs are stored in memory. This is a good idea anyway, as it can speed up the ComplementArcs and IntersectArcs operations.</Paragraph> <Paragraph position="4"> be shared, as this would cause the arc labeled F to derive an index colliding with node q.</Paragraph> <Paragraph position="5"> Contrary to many other structure sharing schemes (like (Malouf et al., 2000)), our algorithm allows sharing of nodes that are part of the grammar. 
As nodes from the different input graphs are never assigned the same table entry, they are always bound independently of each other. (See the footnote for line 3 of Unify1.) The sharing version of Copy is similar to the variant in (Tomabechi, 1992). The extra check can be implemented straightforwardly by comparing the old offset with the offset for the new nodes. Because we derive the offsets from index values associated with nodes, we need to compensate for a difference between the index of the shared node and the index it should have in the new graph. We store this information in a specialized share arc. We need to adjust Unify1 to handle share arcs accordingly.</Paragraph> <Paragraph position="6"> Deferred Copying Just as we use a table for unification and copying, we also use a table for subsumption checking. Tomabechi's algorithm requires that the graph resulting from unification be copied before it can be used for further processing. This can result in superfluous copying when the graph is subsumed by an existing graph. Our technique allows subsumption to use the bindings generated by Unify1 in addition to its own table. This allows us to defer copying until we have completed subsumption checking.</Paragraph> <Paragraph position="7"> Packed Nodes With a straightforward implementation of our algorithm, we obtain a node size of 8 bytes.4 By dropping the concept of a fixed node size, we can reduce the size of atom and bottom nodes to 4 bytes.</Paragraph> <Paragraph position="8"> Type information can be stored in two bits.</Paragraph> <Paragraph position="9"> We use the two least significant bits of pointers (which are otherwise 0) to store this type information. Instead of using a pointer for the value field, we store nodes in place. Only for reentrancies do we still need pointers. Complex nodes require 8 bytes, as they include a pointer to the first node past their children (necessary for unification). 
This scheme requires some extra logic to decode nodes, but significantly reduces memory consumption.</Paragraph> </Section> <Section position="5" start_page="7" end_page="7" type="metho"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> We have tested our algorithm with a medium-sized grammar for Dutch. The system was implemented in Objective-C using a fixed-arity graph representation. We used a test set of 22 sentences of varying length. Usually, approximately 90% of the unifications fail. On average, graphs consist of 60 nodes. The experiments were run on a Pentium III 600EB (256 KB L2 cache) box with 128 MB of memory, running Linux.</Paragraph> <Paragraph position="1"> We tested both memory usage and execution time for various configurations. The results are shown in Figures 7 and 8. They include a version of Tomabechi's algorithm. The node size for this implementation is 20 bytes.</Paragraph> <Paragraph position="2"> For the proposed algorithm we have included several versions: a basic implementation, a packed version, a version with deferred copying, and a version with structure sharing.</Paragraph> <Paragraph position="3"> The basic implementation has a node size of 8 bytes; the others have a variable node size.</Paragraph> <Paragraph position="4"> Whenever applicable, we applied the same optimizations to all algorithms. We also tested the speedup on a dual Pentium II 266 MHz.5 Each processor was assigned its own scratch tables. Apart from that, no changes to the algorithm were required. 5These results are scaled to reflect the speedup relative to the tests run on the other machine.</Paragraph> <Paragraph position="5"> For more details on the multi-processor implementation, see (van Lohuizen, 1999).</Paragraph> <Paragraph position="6"> The memory utilization results show significant improvements for our approach.6 Packing decreased memory utilization by almost 40%. 
Structure sharing roughly halved this once more.7 The third condition prohibited sharing in less than 2% of the cases where it would be possible in Tomabechi's approach.</Paragraph> <Paragraph position="7"> increase execution times. Our algorithm even scrapes off roughly 7% of the total parsing time. This speedup can be attributed to improved cache utilization. We verified this by running the same tests with the cache disabled.</Paragraph> <Paragraph position="8"> This made our algorithm actually run slower than Tomabechi's algorithm. Deferred copying did not improve performance. The additional overhead of dereferencing during subsumption was not compensated for by the savings on copying. Structure sharing did not significantly alter performance either. Although this version uses less memory, it has to perform additional work.</Paragraph> <Paragraph position="9"> Running the same tests on machines with less memory showed a clear performance advantage for the algorithms using less memory, because paging could be avoided.</Paragraph> </Section> </Paper>