<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2133"> <Title>Shuly Wintner (1997) An Abstract Machine for Unification Grammars -with Application to an HPSG Grammar for Hebrew. Ph.D. thesis, the Technion - Israel Institute of</Title> <Section position="4" start_page="807" end_page="808" type="metho"> <SectionTitle> 3 Abstract Machine for Attribute- </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="807" end_page="807" type="sub_section"> <SectionTitle> Value Logics </SectionTitle> <Paragraph position="0"> The Abstract Machine for Attribute-Value Logics (AMAVL) is the unification engine of the LiLFeS system. AMAVL provides (1) an efficient in-memory representation of TFSs, and (2) compilation of TFSs into abstract machine code.</Paragraph> </Section> <Section position="2" start_page="807" end_page="807" type="sub_section"> <SectionTitle> 3.1 Representation of a TFS in Memory </SectionTitle> <Paragraph position="0"> AMAVL, like LiLFeS, requires all TFSs to be totally well-typed. In other words, (1) the types and features must be explicitly declared, (2) the appropriateness specifications between types and features must be properly declared, and (3) all TFSs must follow these appropriateness specifications. Provided these requirements are satisfied, AMAVL represents a TFS in memory efficiently.</Paragraph> <Paragraph position="1"> The in-memory representation of a TFS resembles its graph notation: a node is represented by n + 1 contiguous data cells, where n is the number of features outgoing from the node. The first data cell contains the type of the node, and the remaining data cells contain pointers to the values of the corresponding features.</Paragraph> <Paragraph position="2"> The merit of this representation is that feature names need not be stored in the in-memory representation of a TFS. 
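As an illustration of this fixed-offset layout, the following Python sketch stores each node as a type cell followed by one pointer cell per feature, and recovers a feature's offset from the type alone. The type and feature names (and the flat list used as a heap) are our own illustrative assumptions, not LiLFeS's actual memory layout.

```python
# Sketch (not the LiLFeS implementation): a totally well-typed TFS node
# stored as n+1 contiguous cells -- cell 0 holds the type, cells 1..n hold
# pointers (here: heap indices) to the values of that type's features.

# Appropriateness declarations: for each type, its fixed, ordered features.
# These hypothetical types stand in for a real type hierarchy.
APPROP = {
    "sign": ["PHON", "SYNSEM"],
    "list": ["HD", "TL"],
    "atom": [],
}

heap = []  # the global cell store

def new_node(type_name, feature_values):
    """Allocate n+1 contiguous cells and return the node's address."""
    feats = APPROP[type_name]
    assert len(feature_values) == len(feats), "must be totally well-typed"
    addr = len(heap)
    heap.append(type_name)        # cell 0: the type
    heap.extend(feature_values)   # cells 1..n: pointers to feature values
    return addr

def get_feature(addr, feat):
    """Feature names are never stored: the offset is computed from the type,
    which is possible only because the feature set is fixed per type."""
    type_name = heap[addr]
    offset = APPROP[type_name].index(feat) + 1  # statically known per type
    return heap[addr + offset]

a = new_node("atom", [])
lst = new_node("list", [a, a])
assert get_feature(lst, "HD") == a
```

Because the offset lookup depends only on the node's type, a compiler can resolve it at compile time, which is the point the text makes next.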
The requirements on a TFS guarantee that the kinds and number of features are statically defined and constant for a given type; we can therefore determine the offset of the pointer to a given feature merely by referring to the type of the node.</Paragraph> </Section> <Section position="3" start_page="807" end_page="808" type="sub_section"> <SectionTitle> 3.2 TFS as an Instruction Sequence </SectionTitle> <Paragraph position="0"> Unification is an operation defined between two TFSs. In most cases, however, one of the two TFSs is known in advance at compile time. We can therefore compile that TFS into a sequence of specialized unification code, as illustrated in Figure 1, rather than using a general unification routine. The compiled unification code is specialized for a given TFS and is therefore much more efficient than a general unifier, which has to traverse both TFSs each time unification occurs at run time.</Paragraph> <Paragraph position="1"> Many studies have been reported on compiling unification of Prolog terms (for example, the WAM (Aït-Kaci, 1991)). However, TFS unification is much more complex than Prolog-term unification, because (1) unification between different types may succeed due to the existence of a type hierarchy, and (2) features must be merged in the fixed-offset in-memory TFS representation.</Paragraph> <Paragraph position="2"> AMAVL compiles the type hierarchy and thus prepares for the complex situations described above. A TFS itself is compiled into a sequence of four kinds of instructions: ADDNEW (unification of TFS types), UNIFYVAR (creation of structure sharing), PUSH (feature traversal) and POP (end of a PUSH block). These instructions refer to the compiled type hierarchy if necessary.</Paragraph> <Paragraph position="3"> These operations are implemented following the original proposal of AMAVL. 
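To make the instruction sequence concrete, here is a small sketch (in Python, not LiLFeS's actual abstract-machine code) of how a compile-time-known TFS might be linearized into the four instruction kinds named above; the operand encoding and the variable-numbering scheme are our own illustrative assumptions.

```python
# Sketch of compiling a known TFS into a linear instruction sequence
# using the four instruction kinds from the text: ADDNEW, UNIFYVAR,
# PUSH, POP. A TFS is given as (type, {feature: sub_tfs}) nested tuples.

def compile_tfs(tfs, shared=None, code=None):
    """Linearize one TFS. Nodes reached a second time compile to
    UNIFYVAR so that structure sharing is re-created at run time."""
    if shared is None:
        shared, code = {}, []
    if id(tfs) in shared:
        code.append(("UNIFYVAR", shared[id(tfs)]))
        return code
    shared[id(tfs)] = len(shared)           # register for future sharing
    type_name, feats = tfs
    code.append(("ADDNEW", type_name))      # unify the node's type
    for feat, sub in feats.items():
        code.append(("PUSH", feat))         # descend into the feature
        compile_tfs(sub, shared, code)
        code.append(("POP",))               # end of the PUSH block
    return code

atom = ("atom", {})
tfs = ("list", {"HD": atom, "TL": atom})    # HD and TL share one node
print(compile_tfs(tfs))
# [('ADDNEW', 'list'), ('PUSH', 'HD'), ('ADDNEW', 'atom'), ('POP',),
#  ('PUSH', 'TL'), ('UNIFYVAR', 1), ('POP',)]
```

At run time such a sequence is interpreted against one argument TFS only, which is what makes it cheaper than a general unifier traversing both structures.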
We also added several other instructions in our AMAVL implementation, such as initialization instructions for successive unifications and combined instructions for reducing overhead.</Paragraph> </Section> </Section> <Section position="5" start_page="808" end_page="808" type="metho"> <SectionTitle> 4 LiLFeS System </SectionTitle> <Paragraph position="0"> The LiLFeS system is designed around two abstract machines: AMAVL for TFS representation and unification, and Warren's</Paragraph> </Section> <Section position="6" start_page="808" end_page="809" type="metho"> <SectionTitle> Abstract Machine (WAM) (Aït-Kaci, 1991) for </SectionTitle> <Paragraph position="0"> controlling the execution of definite clause programs.</Paragraph> <Paragraph position="1"> In this section we describe the current status of the LiLFeS system and the applications running on it.</Paragraph> <Paragraph position="2"> We then discuss the performance of LiLFeS in our experiments.</Paragraph> <Section position="1" start_page="808" end_page="808" type="sub_section"> <SectionTitle> 4.1 Current Status of the LiLFeS System </SectionTitle> <Paragraph position="0"> The LiLFeS system is implemented as a combination of an AMAVL/WAM emulator, a TFS compiler, and built-in support functions. They are all written in C++, with more than 25,000 lines of source code (see Table 1). The source code can be compiled with GNU C++, and we have confirmed operation on Sun SunOS4/Solaris, DEC Digital UNIX, and Microsoft Windows.</Paragraph> <Paragraph position="1"> We have several practical applications running on the LiLFeS system. We currently have several different HPSG parsers and HPSG grammars of Japanese and English, as follows: * An underspecified Japanese grammar developed by our group (Mitsuisi, 1998). The lexicon consists of TFSs, each of which has more than 100 nodes. 
The grammar can produce parse trees for 88% of the sentences in a corpus of real-world texts (the EDR Japanese corpus), and 60% of these are given correct parse trees. This grammar is used for the experiments in the next section.</Paragraph> <Paragraph position="2"> * XHPSG, an HPSG English grammar (Tateisi, 1997). The grammar is converted from the XTAG grammar (XTAG group, 1995), which has more than 300,000 lexical entries.</Paragraph> <Paragraph position="3"> * A naïve parser using a CYK-like algorithm.</Paragraph> <Paragraph position="4"> Although it uses a simple algorithm, the parser utilizes the full capabilities provided by LiLFeS, such as built-in predicates (TFS copy, array operations, etc.). * A parser based on the two-phase parsing algorithm (Torisawa and Tsujii, 1996). This algorithm compiles an HPSG grammar into two parts, its CFG skeleton and the remaining part, and parses a sentence in two phases. Although the parser is not a complete implementation of the algorithm, its efficiency benefits from two-phase parsing, which reduces the amount of unification. These parsers and grammars are used for the performance evaluations in the next section.</Paragraph> </Section> <Section position="2" start_page="808" end_page="809" type="sub_section"> <SectionTitle> 4.2 Performance Evaluation </SectionTitle> <Paragraph position="0"> We evaluated the performance of the LiLFeS system in three respects: the parsing performance of LiLFeS, comparison to other TFS systems, and comparison to Prolog systems.</Paragraph> <Paragraph position="1"> Table 2 shows the performance of the HPSG parsers on a real-world corpus. Even with the sophisticated algorithm, however, the parsing speed is 3.5 times slower than intended. To achieve our goal, we need a drastic improvement in performance.</Paragraph> <Paragraph position="2"> We therefore performed the following experiments to identify the problem.</Paragraph> <Paragraph position="3"> Table 3 shows the performance comparison to other TFS systems, ALE and ProFIT. 
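The two-phase scheme described above (CFG skeleton first, full constraints second) can be sketched as follows. The toy grammar, lexicon, and rule names are hypothetical, and phase 2 is only indicated in a comment; this illustrates the idea of restricting expensive unification to skeleton-licensed constituents, not the Torisawa and Tsujii implementation.

```python
# Phase 1 of a two-phase parser: plain CKY recognition over the CFG
# skeleton extracted from the grammar. Expensive TFS unification
# (phase 2) would then run only on constituents this chart licenses.

# Hypothetical toy skeleton grammar and lexicon.
RULES = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]
LEX = {"the": "Det", "dog": "N", "cat": "N", "sees": "V"}

def cky_skeleton(words):
    """CKY over the CFG skeleton: chart[i][j] holds the skeleton
    categories spanning words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEX[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for mid in range(i + 1, i + span):
                for lhs, r1, r2 in RULES:
                    if r1 in chart[i][mid] and r2 in chart[mid][i + span]:
                        chart[i][i + span].add(lhs)
    return chart

words = "the dog sees the cat".split()
chart = cky_skeleton(words)
assert "S" in chart[0][len(words)]
# Phase 2 (not shown) would perform full TFS unification, but only on
# the constituents recorded in this chart, reducing the unifications.
```

The saving comes from the skeleton phase pruning most candidate constituents cheaply before any feature structures are built.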
Two grammars are used in the experiments: &quot;Simple&quot; is a small HPSG-like grammar written by our group, while &quot;HPSG&quot; is the small-lexicon HPSG grammar distributed with the ALE package. In the &quot;Simple&quot; experiments, the LiLFeS system is far more efficient than ALE, but is outperformed by ProFIT. However, in the &quot;HPSG&quot; experiment, which involves much more complex TFSs than the &quot;Simple&quot; experiments, LiLFeS is clearly better than ProFIT.</Paragraph> <Paragraph position="4"> In contrast, with simple data LiLFeS is relatively inefficient. The experiments in Table 4, which compare LiLFeS to Prolog systems, show that the performance of LiLFeS is significantly worse than that of those Prolog systems.</Paragraph> <Paragraph position="5"> To summarize, the performance of LiLFeS is far more impressive when it has to handle complex TFSs. This fact indicates that the TFS engine in LiLFeS is efficient, but that the other parts, i.e. the parts concerning LiLFeS as a general logic programming system, are not yet efficient enough.</Paragraph> <Paragraph position="6"> These techniques form the basis of the latest Prolog systems. It is therefore expected that LiLFeS, augmented with these techniques, will become as efficient as commercially available Prolog systems.</Paragraph> </Section> <Section position="3" start_page="809" end_page="809" type="sub_section"> <SectionTitle> 5.2 Current Status of the LiLFeS Native-Code Compiler </SectionTitle> <Paragraph position="0"> We are developing the LiLFeS native-code compiler in LiLFeS itself, because LiLFeS is the language best suited to manipulating TFSs; low-level languages such as C are not appropriate for TFS manipulation.</Paragraph> <Paragraph position="1"> Currently, all of the basic components have been implemented. 
We are now working on further code optimization and on implementing built-in functions in the native-code compiler.</Paragraph> </Section> <Section position="4" start_page="809" end_page="809" type="sub_section"> <SectionTitle> 5.3 Performance Evaluation of the LiLFeS Native-Code Compiler </SectionTitle> <Paragraph position="0"> We evaluated the performance of the LiLFeS native-code compiler with the same experiments as in Section 4.2. The results of the experiments are shown in Table 5 and Table 6.</Paragraph> <Paragraph position="1"> The results of the native-code compiler are significantly better than those of the emulator-based LiLFeS system. In particular, the comparison to Prolog (Table 6) shows that the LiLFeS native-code compiler achieves a speedup of 20 to 30 times over emulator-based LiLFeS, and approaches the native-code compiler versions of commercial Prolog systems. We can say that the bottleneck of the emulator-based LiLFeS system is effectively eliminated.</Paragraph> <Paragraph position="2"> This means that, in order to improve the LiLFeS system as a whole, we have to incorporate various optimization techniques already used in recent Prolog implementations.</Paragraph> <Paragraph position="3"> Thus we decided to redesign and optimize the whole system. The next section describes this optimized LiLFeS.</Paragraph> </Section> </Section> <Section position="7" start_page="809" end_page="810" type="metho"> <SectionTitle> 5 LiLFeS Native-Code Compiler </SectionTitle> <Paragraph position="0"> We are currently developing a native-code compiler for LiLFeS in order to attain maximum performance. This section first describes the design policies of the compiler and then the current status of its implementation. 
The results of the performance evaluations of the native-code compiler are also presented.</Paragraph> <Section position="1" start_page="809" end_page="810" type="sub_section"> <SectionTitle> 5.1 Design Policies of the LiLFeS Native-Code Compiler </SectionTitle> <Paragraph position="0"> The design policies for the LiLFeS native-code compiler are: * Native code output. We chose native-code compilation for maximum efficiency. Although its development cost is high, the resulting efficiency will compensate for that cost.</Paragraph> <Paragraph position="1"> * Execution model close to a real machine. We designed the execution model by referring to the implementation of Aquarius Prolog (Van Roy, 1990), an optimizing native-code compiler for Prolog. Aquarius Prolog adopts an execution model whose instruction set is fine-grained and close to that of a real machine. As a result, the output code can be optimized down to the real-machine instruction level. In particular, we fully redesigned the AMAVL instructions as fine-grained instructions, which allow extensive optimization of compiled TFS code.</Paragraph> <Paragraph position="2"> * Static code analysis. The types of variables can be determined by analyzing the flow of data within a program. The result of this data-flow analysis will help to further optimize the code during compilation.</Paragraph> <Paragraph position="3"> The result of the comparison to other TFS systems (Table 5) shows a speedup of 3 to 5 times over the emulator-based LiLFeS. It is still slower than ProFIT with the SICStus native-code compiler in some experiments, though the difference is very small. We think the reason is the different traversal order used by ProFIT + SICStus (breadth-first) and by the LiLFeS native-code compiler (depth-first). 
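The traversal-order difference just mentioned can be illustrated with a small sketch: the same nested feature structure visited depth-first versus breadth-first yields different visit orders (and hence different memory-access patterns). The node and feature names below are made up for illustration.

```python
# Depth-first vs breadth-first traversal of a nested feature structure,
# given as (type, {feature: sub_structure}) tuples.
from collections import deque

FS = ("sign", {"SYNSEM": ("synsem", {"LOC": ("local", {})}),
               "PHON": ("list", {})})

def depth_first(fs):
    type_name, feats = fs
    order = [type_name]
    for sub in feats.values():          # fully explore each feature first
        order += depth_first(sub)
    return order

def breadth_first(fs):
    order, queue = [], deque([fs])
    while queue:
        type_name, feats = queue.popleft()
        order.append(type_name)         # visit all siblings before children
        queue.extend(feats.values())
    return order

print(depth_first(FS))    # ['sign', 'synsem', 'local', 'list']
print(breadth_first(FS))  # ['sign', 'synsem', 'list', 'local']
```

Which order is faster in practice depends on the shapes of the structures being unified, which is consistent with the small, experiment-dependent differences reported here.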
What is notable in these experiments is that the LiLFeS native-code compiler shows far better performance in the &quot;HPSG&quot; experiment than all the other systems. Since the &quot;HPSG&quot; experiment focuses on the efficiency of TFS handling, this means that the native-code compiler improves the TFS handling capability.</Paragraph> <Paragraph position="4"> We cannot yet perform the experiments on real-world text parsing, because the implementation of the native-code compiler is not yet complete. However, we can estimate the result from the experimental result on emulator-based LiLFeS (350 milliseconds with the sophisticated algorithm) and the speed ratio between emulator-based LiLFeS and the native-code compiler (a 3 to 5 times speedup). The estimated parsing time is 70 ms to 120 ms per sentence, so we can say that we will be able to achieve our goal of 100 ms in the near future.</Paragraph> </Section> </Section> </Paper>