<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2159">
  <Title>An Efficient Parallel Substrate for Typed Feature Structures on Shared Memory Parallel Machines</Title>
  <Section position="3" start_page="968" end_page="969" type="metho">
    <SectionTitle>
2 Programmers' View
</SectionTitle>
    <Paragraph position="0"> From a programmer's point of view, the PSTFS mechanism is simple and natural, a result of careful design aimed at both high performance and ease of programming.</Paragraph>
    <Paragraph position="1"> Systems constructed on PSTFS include two different types of agents: control agents (CAs) and constraint solver agents (CSAs). As illustrated in Figure 1, CAs have overall control of a system, including control of parallelism, and they act as masters of CSAs. CSAs modify TFSs according to orders from CAs. Note that CAs can neither modify nor generate TFSs by themselves.</Paragraph>
    <Paragraph position="2"> PSTFS has been implemented by combining two existing programming languages: the concurrent object-oriented programming language ABCL/f (Taura, 1997) and the sequential programming language LiLFeS (Makino et al., 1998). CAs are written in ABCL/f, while CSAs are described mainly in LiLFeS.</Paragraph>
    <Paragraph position="3"> Figure 2 shows part of an example PSTFS program. The task of this code is to concatenate the first and last names in a given list. One of the CAs is called name-concatenator. This CA gathers pairs of first and last names by sending a CSA the message solve-constraint('name(?)'). When the CSA receives this message, the argument 'name(?)' is treated as a Prolog query in LiLFeS (footnote 1), according to the program of the CSA ((A) of Figure 2). There are several facts with the predicate 'name'. When the goal 'name(?)' is processed by a CSA, all the possible answers defined by these facts are returned. The obtained pairs are stored in the variable F in the name-concatenator ((C) in Figure 2).</Paragraph>
    <Paragraph position="4"> The next action of the name-concatenator agent is to create CAs (name-concatenator-sub) and to send the message solve to each created CA; the created CAs run in parallel.</Paragraph>
    <Paragraph position="5"> The message contains one of the TFSs in F.</Paragraph>
    <Paragraph position="6"> Each name-concatenator-sub asks a CSA to concatenate FIRST and LAST in a TFS. Then each CSA concatenates them using the definition [...] name-concatenator-sub that had asked it to do the job. Note that a name-concatenator-sub can ask any of the existing CSAs. All CSAs can basically perform concatenation in parallel and independently. Then, the name-concatenator waits for each name-concatenator-sub to return the concatenated names, and puts the return values into the variable R.</Paragraph>
    <Paragraph position="7"> The CA name-concatenator controls the overall process. It controls parallelism by creating CAs and sending messages to them. On the other hand, all operations on TFSs are performed by CSAs when they are asked to do so by CAs.</Paragraph>
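    <Paragraph> The division of labor in the name-concatenator example can be sketched in Python. This is only an illustrative sketch, not the ABCL/f and LiLFeS code of Figure 2: the class and function names, the sample names, and the use of a thread pool are all assumptions made for the illustration, and plain dictionaries stand in for TFSs. The control side decides what runs in parallel, while only the solver object creates or changes records.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the facts with predicate 'name' in a CSA program.
NAME_FACTS = [
    {"FIRST": "John", "LAST": "Smith"},
    {"FIRST": "Mary", "LAST": "Jones"},
]


class ConstraintSolverAgent:
    """Toy CSA: the only component allowed to create or modify records."""

    def solve_name_query(self):
        # Return fresh copies of all answers, as the query 'name(?)' would.
        return [dict(fact) for fact in NAME_FACTS]

    def concatenate(self, record):
        # Work on a copy so the caller's record is never modified.
        result = dict(record)
        result["FULL"] = result["FIRST"] + " " + result["LAST"]
        return result


def name_concatenator_sub(csa, record):
    """Toy sub-CA: delegates the actual work on the record to a CSA."""
    return csa.concatenate(record)


def name_concatenator(csas):
    """Toy CA: controls parallelism but never touches record contents."""
    found = csas[0].solve_name_query()            # plays the role of F
    with ThreadPoolExecutor(max_workers=len(csas)) as pool:
        futures = [
            pool.submit(name_concatenator_sub, csas[i % len(csas)], rec)
            for i, rec in enumerate(found)
        ]
        return [f.result() for f in futures]      # plays the role of R
```

Calling name_concatenator with a list of two ConstraintSolverAgent objects returns one record per name pair, each extended with a FULL field, and leaves the stored facts untouched.</Paragraph>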
    <Paragraph position="8"> Suppose that one is trying to implement a parsing system based on PSTFS. The distinction between CAs and CSAs roughly corresponds to the distinction between an abstract parsing schema and the application of phrase structure rules. Here, a parsing schema means a high-level description of a parsing algorithm in which the application of phrase structure rules is regarded as an atomic operation or a subroutine. This distinction is a minor factor in writing a sequential parser, but it has a major impact in a parallel environment.</Paragraph>
    <Paragraph position="9"> For instance, suppose that several distinct agents invoke applications of phrase structure rules against the same data simultaneously, and that the applications involve destructive operations on the data. This can cause an anomaly, since the agents will modify the original data in an unpredictable order and there is no way to maintain consistency. To avoid this anomaly, one has to determine what constitutes an atomic operation and provide a method that prevents the anomaly when atomic operations are invoked by several agents. In our framework, any action taken by a CSA is viewed as such an atomic operation, and it is guaranteed that no anomaly occurs even if CSAs concurrently perform operations on the same data. This is achieved by copying TFSs, an operation that does not require any destructive operations. The details are described in the next section. The other implication of the distinction between CAs and CSAs is that it enables efficient communication between agents in a natural way. During parsing in HPSG, TFSs with hundreds of nodes can be generated. Encoding such TFSs in a message and sending them efficiently is not trivial. PSTFS provides a communication scheme that enables efficient sending and receiving of such TFSs. This becomes possible because of the distinction between the agents. Since CAs cannot modify a TFS, CAs do not need to hold a real image of the TFSs. When CSAs return the results of computations to CAs, the CSAs send only the ID of a TFS. Only when the ID is passed to other CSAs and they try to modify the TFS with that ID does the actual transfer of the TFS's real image occur. Since the transfer is carried out only between CSAs, it can be performed directly and efficiently using the low-level representation of TFSs used in CSAs.
Note that if CAs were to modify TFSs directly, this scheme could not be used. [Footnote 1: LiLFeS supports definite clause programs, a TFS version of Horn clauses.]</Paragraph>
  </Section>
  <Section position="4" start_page="969" end_page="970" type="metho">
    <SectionTitle>
3 Architecture
</SectionTitle>
    <Paragraph position="0"> This section explains the inner structure of PSTFS, focusing on the execution mechanism of CSAs (see (Taura, 1997) for further details on CAs). A CSA is implemented by modifying the abstract machine for TFSs (i.e., LiAM), originally designed for executing LiLFeS (Makino et al., 1998).</Paragraph>
    <Paragraph position="1"> The important constraint in designing the execution mechanism for CSAs is that TFSs generated by CSAs must be kept unmodified, because the TFSs may be used by several agents in parallel. If a TFS were modified by one CSA without the other agents knowing, the expected results could not be obtained. Note that unification, which is a major operation on TFSs, is a destructive operation, and modifications are likely to occur while executing CSAs. Our execution mechanism handles this problem by having each CSA copy the TFSs generated by other CSAs every time it uses them. [Figure 4, panel (i): Copying from shared heap.]</Paragraph>
    <Paragraph position="2"> Though this may not look efficient at first glance, it is performed efficiently thanks to shared memory mechanisms and our copying methods.</Paragraph>
    <Paragraph position="3"> A CSA uses two different types of memory areas as its heap: a shared heap and a local heap.</Paragraph>
    <Paragraph position="5"> A local heap is used for temporary operations during the computation inside a CSA. A CSA cannot read or write the local heaps of other CSAs. A shared heap is used as a medium of communication between CSAs, and it is realized on shared memory. When a CSA completes a computation on TFSs, it writes the result on a shared heap. Since the shared heap can be read by any CSA, each CSA can read the results produced by any other CSA. However, the portion of a shared heap that a CSA can write to is limited, and no other CSA can write to that portion.</Paragraph>
    <Paragraph position="6"> Next, we look at the steps performed by a CSA when it receives a message from a CA.</Paragraph>
    <Paragraph position="7">  Note that the message only contains the IDs of the TFSs as described in the previous section. The IDs are realized as pointers on the shared heap.</Paragraph>
    <Paragraph position="8">
      1. Copy the TFSs pointed at by the IDs in the message from the shared heap to the local heap of the CSA ((i) in Figure 4).
      2. Process the query using LiAM and the local heap ((ii) in Figure 4).
      3. If the query has an answer, copy the result to the portion of the shared heap writable by the CSA, and keep the IDs of the copied TFSs ((iii) in Figure 4). If there is no answer for the query, go to Step 5.
      4. Invoke backtracking in LiAM and go to Step 2.</Paragraph>
    <Paragraph position="9">
      5. Send a message, including the kept IDs, back to the CA that requested the task.
      Note that, in Step 3, the results of the computation become readable by other CSAs. This procedure has the following desirable features.
      Simultaneous Copying: An identical TFS on a shared heap can be copied by several CSAs simultaneously. This is due to our shared memory mechanism and the property of LiAM that copying does not have any side effect on TFSs (footnote 2).</Paragraph>
    <Paragraph position="10"> Simultaneous/Safe Writing: CSAs can write on their own portion of the shared heap without the danger of accidental modification by other CSAs.</Paragraph>
    <Paragraph position="11"> Demand Driven Copying: As described in the previous section, the transfer of real images of TFSs is performed only after the IDs of the TFSs reach the CSAs requiring the TFSs. Redundant copying and sending of the TFSs' real images is reduced, and the transfer is performed efficiently by mechanisms originally provided by LiAM.</Paragraph>
    <Paragraph position="12"> With efficient data transfer in shared-memory machines, these features reduce the overhead of parallelization.</Paragraph>
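    <Paragraph> The five-step procedure above can be illustrated with a small single-threaded Python sketch. It is not the LiAM implementation: the SharedHeap class, the handle_message function, and the toy unify_destructively solver are hypothetical names introduced here, dictionaries stand in for TFSs, deep copies stand in for the copying between heaps, and a Python generator stands in for backtracking over answers.

```python
import copy
import itertools


class SharedHeap:
    """Toy shared heap: any CSA may read it; writes are tagged with an owner."""

    def __init__(self):
        self._cells = {}
        self._next_id = itertools.count()

    def write(self, owner, tfs):
        # Store a private deep copy; the returned ID stands in for a pointer.
        tfs_id = (owner, next(self._next_id))
        self._cells[tfs_id] = copy.deepcopy(tfs)
        return tfs_id

    def read_copy(self, tfs_id):
        # Readers always receive a copy, so stored structures stay unmodified.
        return copy.deepcopy(self._cells[tfs_id])


def handle_message(csa_name, shared, arg_ids, solve):
    """Steps 1-5 for one message; `solve` yields one answer per solution."""
    local_heap = [shared.read_copy(i) for i in arg_ids]      # Step 1
    kept_ids = []
    for answer in solve(local_heap):                         # Steps 2 and 4
        kept_ids.append(shared.write(csa_name, answer))      # Step 3
    return kept_ids                                          # Step 5


def unify_destructively(local_heap):
    """Toy solver: merge the second record into the first one in place."""
    left, right = local_heap
    for key, value in right.items():
        if left.setdefault(key, value) != value:
            return                      # feature clash: no answer
    yield left
```

Because the solver only ever sees local copies, its in-place merge cannot alter the records stored on the shared heap, and a CA that receives the returned IDs never needs to inspect the records themselves.</Paragraph>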
    <Paragraph position="13"> Note that copying in this procedure makes it possible to support non-determinism in NLP systems. For instance, during parsing, intermediate parse trees must be kept. In chart parsing for a unification-based grammar, generated edges are kept untouched, and destructive operations on the results must be done after copying them. The copying of TFSs in the above steps realizes such mechanisms in a natural way, as it is designed for efficient support for data sharing and destructive operations on shared heaps by parallel agents.</Paragraph>
    <Paragraph position="14"> Footnote 2: Actually, this is not trivial. Copying in Step 3 normalizes TFSs and stores them in a contiguous region on a shared heap. TFSs stored in such a way can be copied without any side effect.</Paragraph>
  </Section>
  <Section position="5" start_page="970" end_page="972" type="metho">
    <SectionTitle>
4 Application and Performance
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="970" end_page="970" type="sub_section">
      <SectionTitle>
Evaluation
</SectionTitle>
      <Paragraph position="0"> This section describes two different types of HPSG parsers implemented on PSTFS. One is designed for our Japanese grammar, and its algorithm is a parallel version of the CKY algorithm (Kasami, 1965). The other is a parser for an ALE-style grammar (Carpenter and Penn, 1994). The algorithms of both parsers are based on parallel parsing algorithms for CFG (Ninomiya et al., 1997; Nijholt, 1994; Grishman and Chitrao, 1988; Thompson, 1994). Both parsers are concise: each is written in fewer than 1,000 lines. Together with the high performance of the parsers, this indicates the ease of use, feasibility, and flexibility of PSTFS.</Paragraph>
      <Paragraph position="1"> For simplicity of discussion, we assume that HPSG consists of lexical entries and rule schemata. Lexical entries can be regarded as TFSs assigned to each word. A rule schema is a rule of the form $z \rightarrow a\,b\,c \ldots$, where $z, a, b, c$ are TFSs.</Paragraph>
    </Section>
    <Section position="2" start_page="970" end_page="971" type="sub_section">
      <SectionTitle>
4.1 Parallel CKY-style HPSG Parsing
Algorithm
</SectionTitle>
      <Paragraph position="0"> A sequential CKY parser for CFG uses a data structure called a triangular table. Let $F_{i,j}$ denote a cell in the triangular table. Each cell $F_{i,j}$ holds the set of non-terminal symbols in the CFG that can generate the word sequence from the $(i+1)$-th word to the $j$-th word of an input sentence. The sequential CKY algorithm computes each $F_{i,j}$ in a certain order.</Paragraph>
      <Paragraph position="1"> Our algorithm for a parallel CKY-style parser for HPSG computes each $F_{i,j}$ in parallel. Note that $F_{i,j}$ contains TFSs covering the word sequence from the $(i+1)$-th word to the $j$-th word, not non-terminals. We consider only rule schemata of the form $z \rightarrow a\,b$, where $z, a, b$ are TFSs. Parsing is started by a CA called $\mathcal{PARSER}$. $\mathcal{PARSER}$ creates cell-agents $C_{i,j}$ ($0 \le i &lt; j \le n$) and distributes them to processors on a parallel machine (Figure 5). Each $C_{i,j}$ computes $F_{i,j}$ in parallel. More precisely, $C_{i,j}$ with $j - i = 1$ looks up a dictionary and obtains lexical entries. $C_{i,j}$ with $j - i &gt; 1$ waits for messages including $F_{i,k}$ and $F_{k,j}$ for all $k$ ($i &lt; k &lt; j$) from other cell-agents. When $C_{i,j}$ receives $F_{i,k}$ and $F_{k,j}$ for an arbitrary $k$, $C_{i,j}$ computes TFSs by applying rule schemata to each pair of members of $F_{i,k}$ and $F_{k,j}$. The computed TFSs are considered to be mothers of members of $F_{i,k}$ and $F_{k,j}$, and they are added to $F_{i,j}$. Note that these applications of rule schemata are done in parallel in several CSAs (footnote 3). Finally, when the computation of $F_{i,j}$ (using $F_{i,k}$ and $F_{k,j}$ for all $k$ with $i &lt; k &lt; j$) is completed, $C_{i,j}$ distributes $F_{i,j}$ to the other agents waiting for $F_{i,j}$. Parsing is completed when the computation of $F_{0,n}$ is completed.</Paragraph>
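      <Paragraph> The data dependencies between cells can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the parser described above: atomic category symbols stand in for TFSs, a dictionary lookup stands in for applying rule schemata, and instead of message-driven cell-agents the sketch processes the table one span length at a time, computing all cells of the same span length in a thread pool. The function and parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_cky(words, lexicon, rules, workers=4):
    """Fill the triangular table; cells of equal span length run in parallel.

    lexicon maps a word to a set of categories; rules maps a pair (a, b)
    to the set of mothers z with z -> a b. Atomic symbols stand in for TFSs.
    """
    n = len(words)
    table = {}

    def compute_cell(i, j):
        if j - i == 1:
            # Lexical cell: look the word up in the dictionary.
            return set(lexicon.get(words[i], ()))
        found = set()
        for k in range(i + 1, j):
            # Combine every member of F(i,k) with every member of F(k,j).
            for a in table[i, k]:
                for b in table[k, j]:
                    found |= rules.get((a, b), set())
        return found

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for span in range(1, n + 1):
            cells = [(i, i + span) for i in range(n - span + 1)]
            # Cells of one span length only read cells of shorter spans.
            results = list(pool.map(lambda c: compute_cell(*c), cells))
            table.update(zip(cells, results))
    return table
```

With a three-word input the returned table has six cells, and the cell with key (0, 3) plays the role of $F_{0,n}$. Threads are used here only to show which cells are independent; in CPython they would not by themselves give the speed-ups reported for PSTFS.</Paragraph>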
      <Paragraph position="2"> We have done a series of experiments on a shared-memory parallel machine, a SUN Ultra Enterprise 10000 with 64 nodes (each node is a 250 MHz UltraSparc) and 6 GBytes of shared memory. The corpus consists of 879 random sentences from the EDR Japanese corpus (the average sentence length is 20.8) (footnote 4). The grammar we used is an underspecified Japanese HPSG grammar (Mitsuishi et al., 1998) consisting of 6 ID-schemata, 39 lexical entries (assigned to functional words), and 41 lexical-entry-templates (assigned to parts of speech). This grammar has wide coverage and high accuracy for real-world texts (footnote 5). Table 1 shows the results and a comparison with a parser written in LiLFeS. Figure 6 shows its speed-up. From Figure 6, we observe that the maximum speed-up reaches 12.4 times. The average parsing time is 85 msec per</Paragraph>
    </Section>
    <Section position="3" start_page="971" end_page="972" type="sub_section">
      <SectionTitle>
4.2 Chart-based Parallel HPSG
Parsing Algorithm for ALE
Grammar
</SectionTitle>
      <Paragraph position="0"> Next, we developed a parallel chart-based HPSG parser for an ALE-style grammar. The algorithm is based on a chart schema on which each agent throws active edges and inactive edges containing a TFS. When we regard the rule schemata as a set of rewriting rules in CFG, this algorithm is exactly the same as Thompson's algorithm (Thompson, 1994) and similar to PAX (Matsumoto, 1987). The main difference between the chart-based parser and our CKY-style parser is that the ALE-style parser supports n-branching trees.</Paragraph>
      <Paragraph position="1"> A parsing process is started by a CA called $\mathcal{PARSER}$. It creates word-position agents $\mathcal{P}_k$ ($0 &lt; k &lt; n$), distributes them to parallel processors, and waits for them to complete their tasks. The role of the word-position agent $\mathcal{P}_k$ is to collect edges adjacent to position $k$. A word-position agent has its own active edges and inactive edges. An active edge has the form $(i, z \rightarrow A \circ x B)$, where $A$ is a set of TFSs that have already been unified with existing constituents, $B$ is a set of TFSs that have not been unified yet, and $x$ is the TFS that can be unified with the constituent in an inactive edge whose left side is at position $k$. Inactive edges have the form $(k, x, j)$, where $k$ is the left-side position of the constituent $x$ and $j$ is the right-side position of the constituent $x$. That is, the set of all inactive edges whose left-side position is $k$ is collected by $\mathcal{P}_k$. [Footnote 6: Using 60 processors is worse than using 50 processors. In general, when the number of processes approaches or exceeds the number of existing processors, context switches between processes occur frequently on shared-memory parallel machines (many people can use the machines simultaneously). We believe the cause of the inefficiency when using 60 processors lies in such context switches.]</Paragraph>
      <Paragraph position="2"> [Table: test corpus.]
        Short-length sentences: kim believes sandy to walk; a person whom he sees walks; he is seen; he persuades her to walk.
        Long-length sentences:
        (1) a person who sees kim who sees sandy whom he tries to see walks
        (2) a person who sees kim who sees sandy who sees kim whom he tries to see walks
        (3) a person who sees kim who sees sandy who sees kim who believes her to tend to walk walks</Paragraph>
      <Paragraph position="3"> In our algorithm, $\mathcal{P}_k$ is always waiting for either an active edge or an inactive edge, and performs the following procedure when receiving an edge.</Paragraph>
      <Paragraph position="4"> * When $\mathcal{P}_k$ receives an active edge $(i, z \rightarrow A \circ x B)$, $\mathcal{P}_k$ preserves the edge and tries to find a constituent unifiable with $x$ in the set of inactive edges that $\mathcal{P}_k$ has already received. If the unification succeeds, a new active edge $(i, z \rightarrow A x \circ B)$ is created. If the dot in the new active edge reaches the end of the RHS (i.e. $B = \emptyset$), a new inactive edge is created and is sent to $\mathcal{P}_i$. Otherwise the new active edge is sent to $\mathcal{P}_j$.</Paragraph>
      <Paragraph position="5"> * When $\mathcal{P}_k$ receives an inactive edge $(k, x, j)$, $\mathcal{P}_k$ preserves the edge and tries to find a unifiable constituent on the right side of the dot in the set of active edges that $\mathcal{P}_k$ has already received. If the unification succeeds, a new active edge $(i, z \rightarrow A x \circ B)$ is created. If the dot in the new active edge reaches the end of the RHS (i.e. $B = \emptyset$), a new inactive edge is created and is sent to $\mathcal{P}_i$. Otherwise the new active edge is sent to $\mathcal{P}_j$.</Paragraph>
      <Paragraph position="6"> As long as word-position agents follow this behavior, they can run in parallel without any other restriction.</Paragraph>
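      <Paragraph> The behavior of the word-position agents can be simulated sequentially in Python. The sketch below rests on assumptions that are not taken from the paper: atomic category symbols stand in for TFSs, equality stands in for unification, a single message queue replaces the parallel agents, every rule has at least one daughter, and the chart is seeded with one lexical inactive edge per word plus one empty active edge per rule and position. The function name chart_parse and the edge tuples are hypothetical.

```python
from collections import deque


def chart_parse(categories, rules):
    """Sequentially simulate word-position agents P_0 .. P_n.

    categories[k] is the category of the word between positions k and k+1;
    rules is a list of (mother, daughters) pairs with non-empty daughters.
    """
    n = len(categories)
    active = [set() for _ in range(n + 1)]      # active edges kept by P_k
    inactive = [set() for _ in range(n + 1)]    # inactive edges kept by P_k
    queue = deque()                             # pending (agent, edge) messages

    # Seed the chart: lexical inactive edges and empty active edges.
    for k, cat in enumerate(categories):
        queue.append((k, ("inactive", k, cat, k + 1)))
    for k in range(n + 1):
        for mother, daughters in rules:
            queue.append((k, ("active", k, mother, tuple(daughters))))

    def combine(active_edge, inactive_edge):
        _, i, mother, rest = active_edge
        _, _, x, j = inactive_edge
        if rest[0] != x:
            return                              # "unification" fails
        if len(rest) == 1:                      # dot reached the end of the RHS
            queue.append((i, ("inactive", i, mother, j)))
        else:                                   # advance the dot, send to P_j
            queue.append((j, ("active", i, mother, rest[1:])))

    while queue:
        k, edge = queue.popleft()
        store = active[k] if edge[0] == "active" else inactive[k]
        if edge in store:
            continue                            # already received
        store.add(edge)                         # preserve the edge
        if edge[0] == "active":
            for other in list(inactive[k]):
                combine(edge, other)
        else:
            for other in list(active[k]):
                combine(other, edge)
    # Report inactive edges as (left position, category, right position).
    return {edge[1:] for cell in inactive for edge in cell}
```

Each received edge is first stored and then combined with every edge of the opposite kind already held by the same agent, so the result does not depend on the order in which messages arrive; this order independence is what allows the real agents to run in parallel.</Paragraph>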
      <Paragraph position="7"> We have done a series of experiments in the same machine settings as the experiments with the CKY-style HPSG parser. We measured both its speed-up and real parsing time, and we compared our parallel parser with the ALE system and a sequential parser on LiLFeS. The grammar we used is a sample HPSG grammar attached to the ALE system (footnote 7), which has 7 schemata and 62 lexical entries. The test corpus we used in this experiment is shown in the Table</Paragraph>
    </Section>
  </Section>
</Paper>