<?xml version="1.0" standalone="yes"?>
<Paper uid="J97-3004">
<Title>An Efficient Implementation of the Head-Corner Parser</Title>
<Section position="2" start_page="0" end_page="428" type="abstr">
<SectionTitle> 1. Motivation </SectionTitle>
<Paragraph position="0"> In this paper I discuss in full detail the implementation of the head-corner parser.</Paragraph>
<Paragraph position="1"> But first I describe the motivations for this approach. I will start with the considerations that lead to the choice of a head-driven parser; I will then argue for Prolog as an appropriate language for the implementation of the head-corner parser.</Paragraph>
<Section position="1" start_page="0" end_page="426" type="sub_section">
<SectionTitle> 1.1 Head-driven Processing </SectionTitle>
<Paragraph position="0"> Lexicalist grammar formalisms, such as Head-driven Phrase Structure Grammar (HPSG), have two characteristic properties: (1) lexical elements and phrases are associated with categories that have considerable internal structure, and (2) instead of construction-specific rules, a small set of generic rule schemata is used. Consequently, the set of constituent structures defined by a grammar cannot be read off the rule set directly, but is defined by the interaction of the rule schemata and the lexical categories. Applying standard parsing algorithms to such grammars is unsatisfactory for a number of reasons. Earley parsing is intractable in general, as the rule set is simply too general. For some grammars, naive top-down prediction may even fail to terminate.</Paragraph>
<Paragraph position="1"> Shieber (1985) therefore proposes a modified version of the Earley parser, using restricted top-down prediction. While this modification often leads to better termination properties of the parsing method, in practice it easily leads to a complete trivialization of the top-down prediction step, and hence to inferior performance.</Paragraph>
<Paragraph position="2"> Bottom-up parsing is far more attractive for lexicalist formalisms, as it is driven by the syntactic information associated with lexical elements, but certain inadequacies remain. Most importantly, the selection of rules to be considered for application may not be very efficient. Consider, for instance, the following Definite Clause Grammar (DCG) rule:

    s([], Sem) --> Arg, vp([Arg], Sem).     (1)

A parser in which the application of a rule is driven by the left-most daughter, as it is for instance in a standard bottom-up active chart parser, will consider the application of this rule each time an arbitrary constituent Arg is derived. For a bottom-up active chart parser, this may lead to the introduction of large numbers of active items, most of which will be useless. For instance, if a determiner is derived, there is no need to invoke the rule, as there are simply no VPs selecting a determiner as subject. Parsers in which the application of a rule is driven by the right-most daughter, such as shift-reduce and inactive bottom-up chart parsers, encounter a similar problem for rules such as:

    vp(As, Sem) --> vp([Arg|As], Sem), Arg.     (2)

Each time an arbitrary constituent Arg is derived, the parser will consider applying this rule, and a search for a matching VP constituent will be carried out. Again, in many cases (if Arg is instantiated as a determiner or preposition, for instance) this search is doomed to fail, as a VP subcategorizing for a category Arg may simply not be derivable by the grammar. The problem may seem less acute than that posed by uninstantiated left-most daughters for an active chart parser, as only a search of the chart is carried out and no additional items are added to it. Note, however, that the amount of search required may grow exponentially if more than one uninstantiated daughter is present, as appears to be the case for some of the rule schemata used in HPSG.</Paragraph>
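<Paragraph> Such a rule might, purely for illustration, take the following form (a hypothetical rule, not drawn from any particular grammar; actual HPSG schemata are considerably more involved):

    % Hypothetical: a VP rule that discharges two subcategorized
    % arguments at once.  Arg1 and Arg2 are uninstantiated in the rule
    % itself, so a daughter-driven parser must search the chart for
    % every pair of adjacent constituents that might instantiate them.
    vp(As, Sem) --> vp([Arg1, Arg2|As], Sem), Arg1, Arg2.     (3)

With n derivable constituents in the chart, a parser driven by the Arg1 daughter may in the worst case attempt on the order of n * n instantiations of such a rule, and the number of combinations grows exponentially with the number of uninstantiated daughters.</Paragraph>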
<Paragraph position="5"> Several authors have suggested parsing algorithms that may be more suitable for lexicalist grammars. Kay (1989) discusses the concept of head-driven parsing. The key idea is that the linguistic concept head can be used to obtain parsing algorithms that are better suited for typical natural language grammars. Most linguistic theories assume that among the daughters introduced by a rule there is one daughter that can be identified as the head of that rule. There are several criteria for deciding which daughter is the head, two of which seem relevant for parsing. First of all, the head of a rule determines to a large extent what other daughters may or must be present, as the head selects the other daughters. Second, the syntactic category and morphological properties of the mother node are, in the default case, identical to the category and morphological properties of the head daughter. These two properties suggest that it may be possible to design a parsing strategy in which one first identifies a potential head of a rule before starting to parse the nonhead daughters. By starting with the head, important information about the remaining daughters is obtained. Furthermore, since the head is to a large extent identical to the mother category, effective top-down identification of a potential head should be possible.</Paragraph>
<Paragraph position="6"> In Kay (1989) two different head-driven parsers are presented. First, a head-driven shift-reduce parser is presented, which differs from a standard shift-reduce parser in that it considers the application of a rule (i.e., a reduce step) only if a category matching the head of the rule has been found. Furthermore, it may shift onto the parse stack elements that are similar to the active items (or "dotted rules") of active chart parsers. By using the head of a rule to determine whether a rule is applicable, the head-driven shift-reduce parser avoids the disadvantages of parsers in which either the left-most or right-most daughter is used to drive the selection of rules. Kay also presents a head-corner parser. The striking property of this parser is that it does not parse a phrase from left to right, but instead operates bidirectionally: it starts by locating a potential head of the phrase and then proceeds by parsing the daughters to the left and the right of the head. Again, this strategy avoids the disadvantages of parsers in which rule selection is uniformly driven by either the left-most or right-most daughter. Furthermore, by selecting potential heads on the basis of a head-corner table (comparable to the left-corner table of a left-corner parser), it may use top-down filtering to minimize the search space. This head-corner parser generalizes the left-corner parser. Kay's presentation is reminiscent of the left-corner parser as presented by Pereira and Shieber (1987), which itself is a version without memoization of the BUP parser (Matsumoto et al. 1983).</Paragraph>
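<Paragraph> The bidirectional regime can be made concrete with a small sketch. The following is a minimal head-corner recognizer for a toy context-free grammar; it is an illustration only, not the implementation developed in this paper: the predicate names rule/4, lex/2, and parse/2 are invented, categories are atoms rather than feature terms, the string is split naively with append/3, and there is neither a head-corner table nor memoization.

    % rule(Mother, LeftDs, Head, RightDs): LeftDs are the daughters to
    % the left of the head, listed from the head outward; RightDs are
    % the daughters to the right of the head, listed left to right.
    rule(s,  [np],  vp, []).      % s  --> np vp   (head: vp)
    rule(vp, [],    v,  [np]).    % vp --> v np    (head: v)
    rule(np, [det], n,  []).      % np --> det n   (head: n)

    lex(the,  det).
    lex(man,  n).
    lex(sees, v).
    lex(john, np).

    % parse(?Goal, +Words): Words can be analyzed as category Goal.
    % First locate a potential lexical head, then project it upward.
    parse(Goal, Words) :-
        append(Left, [Word|Right], Words),    % pick a head candidate
        lex(Word, Small),
        head_corner(Small, Goal, Left, Right).

    % head_corner(+Small, ?Goal, +Left, +Right): project Small upward
    % to Goal, consuming the words that remain to the left and to the
    % right of the material recognized so far.
    head_corner(Goal, Goal, [], []).
    head_corner(Small, Goal, Left0, Right0) :-
        rule(Mother, LeftDs, Small, RightDs),
        parse_left(LeftDs, Left0, Left),
        parse_right(RightDs, Right0, Right),
        head_corner(Mother, Goal, Left, Right).

    parse_left([], Left, Left).
    parse_left([D|Ds], Left0, Left) :-
        append(Left1, DWords, Left0),   % words adjacent to the head
        parse(D, DWords),
        parse_left(Ds, Left1, Left).

    parse_right([], Right, Right).
    parse_right([D|Ds], Right0, Right) :-
        append(DWords, Right1, Right0),
        parse(D, DWords),
        parse_right(Ds, Right1, Right).

The query ?- parse(s, [the, man, sees, john]). succeeds by projecting the verb sees upward to vp and then to s before the subject np is parsed. The naive string splitting and the absence of top-down filtering make this sketch very inefficient; the head-corner table mentioned above would restrict the choice of head candidates to categories that can actually be a head corner of the goal.</Paragraph>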
<Paragraph position="7"> Head-corner parsing has also been considered elsewhere. In Satta and Stock (1989), Sikkel and op den Akker (1992, 1993), and Sikkel (1993), chart-based head-corner parsing for context-free grammar is considered. It is shown that, in spite of the fact that bidirectional parsing seemingly leads to more overhead than left-to-right parsing, the worst-case complexity of a head-corner parser does not exceed that of an Earley parser. Some further variations are discussed in Nederhof and Satta (1994).</Paragraph>
<Paragraph position="8"> In van Noord (1991, 1993) I argue that head-corner parsing is especially useful for parsing with nonconcatenative grammar formalisms. In Lavelli and Satta (1991) and van Noord (1994), head-driven parsing strategies for Lexicalized Tree Adjoining Grammars are presented.</Paragraph>
<Paragraph position="9"> The head-corner parser is closely related to the semantic-head-driven generation algorithm (see Shieber et al. [1990] and references cited there), especially in its purely bottom-up incarnation.</Paragraph>
</Section>
<Section position="2" start_page="426" end_page="427" type="sub_section">
<SectionTitle> 1.2 Selective Memoization </SectionTitle>
<Paragraph position="0"> The head-corner parser is in many respects different from traditional chart parsers. An important difference follows from the fact that in the head-corner parser only larger chunks of computation are memoized. Backtracking still plays an important role in the implementation of search.</Paragraph>
<Paragraph position="1"> This may come as a surprise at first. Common wisdom is that although small grammars may be successfully treated with a backtracking parser, larger grammars for natural languages always require the use of a data structure such as a chart or a table of items to make sure that each computation is performed only once. In the case of constraint-based grammars, however, the cost associated with maintaining such a chart should not be underestimated. The memory requirements of an implementation of the Earley parser for a constraint-based grammar are often outrageous. Similarly, in an Earley deduction system too much effort may be spent on small portions of computation, which are inexpensive to (re-)compute anyway.</Paragraph>
<Paragraph position="2"> For this reason, I will argue for an implementation of the head-corner parser in which only large chunks of computation are memoized. In linguistic terms, I will argue for a model in which only maximal projections are memoized. The computation that is carried out in order to obtain such a chunk uses a depth-first backtrack search procedure. This solution dramatically improves the (average-case) memory requirements of the parser; moreover, it also leads to an increase in (average-case) time efficiency, especially in combination with goal weakening, because of the reduced overhead associated with the administration of the chart. In each of the experiments discussed in Section 7, the use of selective memoization with goal weakening outperforms standard chart parsers.</Paragraph>
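<Paragraph> The idea of selective memoization can be sketched in a few lines of Prolog. The wrapper below is a deliberately naive illustration, not the mechanism used in the actual parser: the names memo/1, solved/1, and solution/1 are invented for this sketch, and =@= is SWI-Prolog's variant test. The parser would then wrap only goals corresponding to maximal projections in memo/1, while all other goals are solved by plain backtrack search.

    :- dynamic solved/1, solution/1.

    % memo(+Goal): solve Goal through a table.  The first call with a
    % given goal pattern computes and records all solutions; later
    % calls with a variant of that pattern merely retrieve them.
    % (No provision is made here for goals that recursively depend on
    % themselves; a real implementation must be more careful.)
    memo(Goal) :-
        copy_term(Goal, Pattern),
        (   solved(P), P =@= Pattern      % a variant was solved before
        ->  true
        ;   forall(call(Goal), assertz(solution(Goal))),
            assertz(solved(Pattern))
        ),
        solution(Goal).

A maximal-projection goal would thus be called as, say, memo(parse(np, Words)), whereas the smaller goals inside it backtrack freely. Goal weakening, mentioned above, would in addition replace the recorded pattern by a more general term before the table is consulted, so that a single entry can serve many closely related goals.</Paragraph>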
</Section>
<Section position="3" start_page="427" end_page="427" type="sub_section">
<SectionTitle> 1.3 Why Prolog </SectionTitle>
<Paragraph position="0"> Prolog is a particularly useful language for the implementation of a head-corner parser for constraint-based grammars because:
* Prolog provides a built-in unification operation.
* Prolog provides a built-in backtrack search procedure; memoization can be applied selectively.
* Underspecification can be exploited to obtain results required by certain techniques for robust parsing.
* Prolog is a high-level language; this enables the application of partial evaluation techniques.</Paragraph>
<Paragraph position="4"> The first consideration does not deserve much further attention: we want to exploit the fact that the primary data structures of constraint-based grammars and the corresponding information-combining operation can be modeled by Prolog's first-order terms and unification.</Paragraph>
<Paragraph position="5"> As was argued above, Prolog backtracking is not used to simulate an iterative procedure that builds up a chart via side effects. On the contrary, Prolog backtracking is used truly for search. Of course, in order to make this approach feasible, certain well-chosen search goals are memoized. This is clean and logically well defined (consider, for example, Warren [1992]), even if our implementation in Prolog uses extra-logical predicates.</Paragraph>
<Paragraph position="6"> The third consideration is relevant only for robust parsing. In certain approaches to robust parsing, we are interested in the partial results obtained by the parser. To make sure that a parser is complete with respect to such partial results, it is often assumed that a parser must be applied that works exclusively bottom-up. In Section 6 it will be shown that the head-corner parser, which uses a mixture of bottom-up and top-down processing, can be applied in a similar fashion by using underspecification in the top goal. Clearly, underspecification is a concept that arises naturally in Prolog, as the example below illustrates.</Paragraph>
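<Paragraph> Purely as an illustration, consider again the hypothetical parse/2 recognizer sketched in Section 1.1 (the real parser manipulates feature terms rather than atomic categories). Underspecification amounts to calling the parser with an unbound, or partially instantiated, top goal:

    ?- parse(Cat, [the, man]).
    Cat = np.

Because the goal category is just a Prolog term, leaving it (partially) uninstantiated makes the parser enumerate all categories that span the given words; in the same way, a robust parser can collect partial analyses of ill-formed input.</Paragraph>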
<Paragraph position="7"> The fact that Prolog is a high-level language has a number of practical advantages related to the speed of development. A further advantage is that techniques such as partial evaluation can be applied. For example, I have successfully applied the Mixtus partial evaluator (Sahlin 1991) to the head-corner parser discussed below, obtaining an additional 20% speed increase. In languages such as C, partial evaluation does not seem to be possible, because the low-level nature of the language makes it impossible to recognize the concepts that are required.</Paragraph>
</Section>
<Section position="4" start_page="427" end_page="428" type="sub_section">
<SectionTitle> 1.4 Left-Corner Parsing and Head-Corner Parsing </SectionTitle>
<Paragraph position="0"> As the names suggest, there are many parallels between left-corner and head-corner parsing. In fact, head-corner parsing is a generalization of left-corner parsing. Many of the techniques that will be described in the following sections can be applied to a left-corner parser as well.</Paragraph>
<Paragraph position="1"> A head-corner parser for a grammar in which the left-most daughter of each rule is considered to be the head will effectively function as a left-corner parser. In such cases, the head-corner parser can be said to run in left-corner mode. Of course, in a left-corner parser certain simplifications are possible. Based on the experiments discussed in Section 7, it can be concluded that a specialized left-corner parser is only about 10% faster than a head-corner parser running in left-corner mode. This is an interesting result: a head-corner parser performs almost as well as a left-corner parser and, as some of the experiments indicate, often better.</Paragraph>
</Section>
<Section position="5" start_page="428" end_page="428" type="sub_section">
<SectionTitle> 1.5 Practical Relevance of Head-Corner Parsing: Efficiency and Robustness </SectionTitle>
<Paragraph position="0"> The head-corner parser is one of the parsers being developed as part of the NWO Priority Programme on Language and Speech Technology. An overview of the Programme can be found in Boves et al. (1995). An important goal of the Programme is the implementation of a spoken dialogue system for public transport information (the OVIS system). The language of the system is Dutch.</Paragraph>
<Paragraph position="1"> In the context of the OVIS system, it is important that the parser can deal with input from the speech recognizer. The interface between the speech recognizer and the parser consists of word-graphs. In Section 5, I show how the head-corner parser is generalized to deal with word-graphs.</Paragraph>
<Paragraph position="2"> Moreover, the nature of the application also dictates that the parser proceeds in a robust way. In Section 6, I discuss the OVIS robustness component, and I show that the use of a parser that includes top-down prediction is not an obstacle to robustness. In Section 7, I compare the head-corner parser with the other parsers implemented in the Programme for the OVIS application, and I show that the head-corner parser operates much faster than implementations of a bottom-up Earley parser and related chart-based parsers. Moreover, its space requirements are far more modest. The difference with a left-corner parser, which was derived from the head-corner parser, is small.</Paragraph>
<Paragraph position="3"> We performed similar experiments for the Alvey NL Tools grammar of English (Grover, Carroll, and Briscoe 1993) and the English grammar of the MiMo2 system (van Noord et al. 1991). From these experiments it can be concluded that selective memoization with goal weakening (as applied to head-corner and left-corner parsing) is substantially more efficient than conventional chart parsing. We conclude that, at least for some grammars, head-corner parsing is a good option.</Paragraph>
</Section>
</Section>
</Paper>