<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1031">
  <Title>Balancing Clarity and Efficiency in Typed Feature Logic through Delaying</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Motivation
</SectionTitle>
    <Paragraph position="0"> By convention, current HPSGs consist, at the very least, of a deductive backbone of extended phrase structure rules, in which each category is a description of a typed feature structure (TFS), augmented with constraints that enforce the principles of grammar. These principles typically take the form of statements, &quot;for all TFSs, ψ holds,&quot; where ψ is usually an implication. Historically, however, HPSG used a much richer set of formal descriptive devices, mostly on analogy to developments in the use of types and description logics in programming language theory (Aït-Kaci, 1984), which had served as the impetus for HPSG's invention (Pollard, 1998). These included logic-programming-style relations (Höhfeld and Smolka, 1988), a powerful description language in which expressions could denote sets of TFSs through the use of an explicit disjunction operator, and the full expressive power of implications, in which the antecedents of the above-mentioned principles could be arbitrarily complex. Early HPSG-based natural language processing systems faithfully supported large chunks of this richer functionality in spite of their inability to handle it efficiently, so much so that when the designers of the ERG set out to select formal descriptive devices for their implementation with the aim of &quot;balancing clarity and efficiency&quot; (Flickinger, 2000), they chose to include none of these amenities. The ERG uses only phrase-structure rules and type-antecedent constraints, pushing all would-be description-level disjunctions into its type system or rules. In one respect, this choice was successful, because it did at least achieve a respectable level of efficiency. But the ERG's selection of functionality has acquired an almost liturgical status within the HPSG community in the intervening seven years.</Paragraph>
    <Paragraph position="1"> Keeping this particular faith, moreover, comes at a considerable cost in clarity, as will be argued below.</Paragraph>
    <Paragraph position="2"> This paper identifies what precisely it is about this extra functionality that we miss (modularity, Section 2), determines what it would take at a minimum computationally to get it back (delaying, Section 3), and attempts to measure exactly how much that minimal computational overhead would cost (about 4 μs per delay, Section 4). This study has not been undertaken before; the ERG designers' decision was based largely on anecdotal accounts of performance relative to then-current implementations that had not been designed with the intention of minimizing this extra cost (indeed, the ERG baseline had not yet been devised).</Paragraph>
    <Paragraph position="3"> 2 Modularity: the cost in clarity</Paragraph>
    <Paragraph> Semantic types and inheritance serve to organize the constraints and overall structure of an HPSG grammar. This is certainly a familiar, albeit vague, justification from programming languages research, but the comparison between HPSG and modern programming languages essentially ends with this statement.</Paragraph>
    <Paragraph position="4"> Programming languages with inclusional polymorphism (subtyping) invariably provide functions or relations and allow these to be reified as methods within user-defined subclasses/subtypes. In HPSG, however, values of features must necessarily be TFSs themselves, and the only method (implicitly) provided by the type signature to act on these values is unification. In the absence of other methods and in the absence of an explicit disjunction operator, the type signature itself has the responsibility not only of declaring definitional subclass relationships, but also of expressing all other nondefinitional disjunctions in the grammar (as subtyping relationships). It must also encode the accoutrements needed to implement every other means of combination as unification, such as difference lists for appending lists, or the so-called qeq constraints of Minimal Recursion Semantics (Copestake et al., 2003) to encode semantic embedding constraints.</Paragraph>
    <Paragraph> [Figure 1: a portion of the ERG type signature for relative clauses, with types including hd-nexus-ph, hd-fill-ph, hd-comp-ph, hd-adj-ph, inter-cl, rel-cl, wh-rel-cl, non-wh-rel-cl, fin-hd-fill-ph, inf-hd-fill-ph, fin-wh-fill-rel-cl, inf-wh-fill-rel-cl, red-rel-cl and simp-inf-rel-cl.]</Paragraph>
    <Paragraph position="5"> Unification, furthermore, is an inherently nonmodular, global operation, because it can only be defined relative to the structure of the entire partial order of types (as a least upper bound). Of course, some partial orders are more modularizable than others, but the global form that type signatures must take is not an easy property to legislate without more local guidance.</Paragraph>
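    <Paragraph> As a minimal illustration (a sketch, not code from the paper), the Python below computes type unification as a least upper bound in a toy hierarchy in which more specific types sit higher in the order and bot is the most general type. The hierarchy encoding (a dict from each type to its immediate supertypes) and the names supertypes and unify are hypothetical. Because the least upper bound is found by scanning every type, inserting one new type elsewhere in the order changes the result of unifying two existing types whose own declarations did not change.

```python
from typing import Dict, Optional, Set, Tuple

Hierarchy = Dict[str, Tuple[str, ...]]  # type -> its immediate supertypes


def supertypes(t: str, parents: Hierarchy) -> Set[str]:
    """Return t together with every type it inherits from."""
    seen: Set[str] = set()
    stack = [t]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen


def unify(s: str, t: str, parents: Hierarchy) -> Optional[str]:
    """Least upper bound of s and t: their most general common subtype.

    Every type in the hierarchy is inspected, which is what makes the
    operation global. Returns None if s and t have no common subtype.
    """
    common = [u for u in parents if {s, t}.issubset(supertypes(u, parents))]
    # Keep only common subtypes that do not inherit from another common subtype.
    most_general = [u for u in common
                    if not any(v != u and v in supertypes(u, parents) for v in common)]
    if not most_general:
        return None
    if len(most_general) != 1:
        raise ValueError(f"{s} and {t} have no unique least upper bound")
    return most_general[0]


before = {"bot": (), "a": ("bot",), "b": ("bot",), "c": ("bot",), "ab": ("a", "b")}
# Insert a new type between a/b and ab; nothing declared on a or b changes.
after = dict(before, ab2=("a", "b"), ab=("ab2",))

print(unify("a", "b", before))  # ab
print(unify("a", "b", after))   # ab2
print(unify("a", "c", before))  # None
```

The point of the example is the second call: the answer for a and b moved because of a declaration that mentions neither of them directly.</Paragraph>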
    <Paragraph position="6"> The conventional wisdom in programming languages research is indeed that types are responsible for mediating the communication between modules. A simple type system such as HPSG's can thus only mediate very simple communication. Modern programming languages incorporate some degree of parametric polymorphism, in addition to subtyping, in order to accommodate more complex communication. To date, HPSG's use of parametric types has been rather limited, although there have been some recent attempts to apply them to the ERG (Penn and Hoetmer, 2003). Without this, one obtains type signatures such as Figure 1 (a portion of the ERG's signature for relative clauses), in which both the semantics of the subtyping links themselves (normally, subset inclusion) and the multi-dimensionality of the empirical domain's analysis erode into a collection of arbitrary naming conventions that are difficult to validate or modify.</Paragraph>
    <Paragraph position="7"> A more avant-garde view of typing in programming languages research, inspired by the Curry-Howard isomorphism, is that types are equivalent to relations, which is to say that a relation can mediate communication between modules through its arguments, just as a parametric type can through its parameters. The fact that we witness some of these mediators as types and others as relations is simply an intensional reflection of how the grammar writer thinks of them. In classical HPSG, relations were generally used as goals in some proof resolution strategy (such as Prolog's SLD resolution), but even this has a parallel in the world of typing. Using the type signature and principles of Figure 2, for example, we can perform proof resolution by attempting to sort resolve every TFS to a maximally specific type. This is actually consistent with HPSG's use of feature logic, although most TFS-based NLP systems do not sort resolve, because type inference under sort resolution is NP-complete (Penn, 2001).</Paragraph>
    <Paragraph position="9"> [Figure 2: …pend relation as sort resolution.]</Paragraph>
    <Paragraph position="11"> Phrase structure rules, on the other hand, while they can be encoded inside a logic programming relation, are more naturally viewed as algebraic generators. In this respect, they are more similar to the immediate subtyping declarations that grammar writers use to specify type signatures: both chart parsing and transitive closure are instances of all-sources shortest-path problems on the same kind of algebraic structure, called a closed semi-ring. The only notion of modularity ever proven to hold of phrase structure rule systems (Wintner, 2002), furthermore, is an algebraic one.</Paragraph>
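    <Paragraph> The shared algebraic structure can be sketched as follows, assuming an absorptive semiring (plus(one, x) equals one for every x), which makes every Kleene star trivial and lets a Floyd-Warshall-style loop compute the closure. The function closure and the toy types are hypothetical. The same routine yields the transitive closure of immediate subtyping declarations over the Boolean semiring and shortest paths over the (min, +) semiring; chart parsing, the paper's other instance, is not shown.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

W = TypeVar("W")


def closure(n: int, edges: Dict[Tuple[int, int], W],
            plus: Callable[[W, W], W], times: Callable[[W, W], W],
            zero: W, one: W) -> List[List[W]]:
    """All-sources path closure over a semiring, Floyd-Warshall style.

    Assumes the semiring is absorptive (plus(one, x) == one for every x), as
    both instances below are; then the Kleene star of every element is `one`
    and no explicit star operation is needed.
    """
    d = [[edges.get((i, j), zero) for j in range(n)] for i in range(n)]
    for i in range(n):
        d[i][i] = plus(d[i][i], one)  # the empty path
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = plus(d[i][j], times(d[i][k], d[k][j]))
    return d


# Boolean semiring: transitive closure of immediate subtype declarations.
types = ["bot", "sign", "phrase", "word"]
immediate = {(0, 1): True, (1, 2): True, (1, 3): True}  # (supertype, subtype)
subsumes = closure(len(types), immediate,
                   lambda a, b: a or b, lambda a, b: a and b, False, True)

# Tropical (min, +) semiring: shortest paths with non-negative weights.
dist = closure(3, {(0, 1): 4.0, (1, 2): 1.0, (0, 2): 7.0},
               min, lambda a, b: a + b, float("inf"), 0.0)

print(subsumes[0][2], subsumes[2][3])  # True False
print(dist[0][2])                      # 5.0
```

Only the choice of plus, times, zero and one differs between the two calls, which is the sense in which both problems live on the same algebraic structure.</Paragraph>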
    <Paragraph position="12"> 3 Delaying: the missing link of functionality</Paragraph>
    <Paragraph> If relations are used in the absence of recursive data structures, a grammar could be specified using relations, and the relations could then be unfolded off-line into relation-free descriptions. In this usage, relations are just macros, and not at all inefficient. Early HPSG implementations, however, used quite a lot of recursive structure where it did not need to be, and the structures they used, such as lists, buried important data deep inside substructures that made parsing much slower. Provided that grammar writers use more parsimonious structures, which is a good idea even in the absence of relations, there is nothing wrong with the speed of logic programming relations (Van Roy, 1990).</Paragraph>
    <Paragraph position="13"> Recursive datatypes are also prone to non-termination problems, however. This can happen when partially instantiated and potentially recursive data structures are submitted to a proof resolution procedure which explores the further instantiations of these structures too aggressively. Although this problem has received significant attention over the last fifteen years in the constraint logic programming (CLP) community, no true CLP implementation yet exists for the logic of typed feature structures (LTFS; Carpenter, 1992). Some aspects of general solution strategies, including incremental entailment simplification (Aït-Kaci et al., 1992), deterministic goal expansion (Doerre, 1993), and guard statements for relations (Doerre et al., 1996), have found their way into the less restrictive sorted feature constraint systems from which LTFS descended. The CUF implementation (Doerre et al., 1996), notably, allowed for delay statements to be attached to relation definitions, which would wait until each argument was at least as specific as some variable-free, disjunction-free description before resolving. In the remainder of this section, a method is presented for reducing delays on any inequation-free description, including variables and disjunctions, to the SICStus Prolog when/2 primitive (Section 3.4). This method takes full advantage of the restrictions inherent to LTFS (Section 3.1) to maximize run-time efficiency. In addition, by delaying calls to subgoals individually rather than the (universally quantified) relation definitions themselves,1 we can also use delays to postpone non-deterministic search on disjunctive descriptions (Section 3.3) and to implement complex-antecedent constraints (Section 3.2). As a result, this single method restores all of the functionality we were missing.</Paragraph>
    <Paragraph position="14"> For simplicity, it will be assumed that the target language of our compiler is Prolog itself. This is inconsequential to the general proposal, although implementing logic programs in Prolog certainly involves less effort.</Paragraph>
    <Paragraph position="15"> 1 Delaying relational definitions is a subcase of this functionality, which can be made more accessible through some extra syntactic sugar.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Restrictions inherent to LTFS
</SectionTitle>
      <Paragraph position="0"> LTFS is distinguished by its possession of appropriateness conditions that mediate the occurrence of features and types in TFSs. Appropriateness conditions stipulate, for every type, a finite set of features that can and must have values in TFSs of that type. This effectively forces TFSs to be finite-branching terms with named attributes. Appropriateness conditions also specify a type to which the value of an appropriate feature is restricted (a value restriction). These conditions make LTFS very convenient for linguistic purposes, because the combination of typing with named attributes allows for a very terse description language that can easily refer to a sparse amount of information in what are usually extremely large structures/records: Definition: Given a finite meet semi-lattice of types, Type, a fixed finite set of features, Feat, and a countable set of variables, Var, Φ is the least set of descriptions that contains: v, for every v ∈ Var; τ, for every τ ∈ Type; F: φ, for every F ∈ Feat and φ ∈ Φ; and φ ∧ ψ and φ ∨ ψ, for every φ, ψ ∈ Φ.</Paragraph>
      <Paragraph position="2"> A nice property of this description language is that every non-disjunctive description with a non-empty denotation has a unique most general TFS in its denotation. This is called its most general satisfier. We will assume that appropriateness guarantees that there is a unique most general type, Intro(F), to which a given feature, F, is appropriate. This is called unique feature introduction. Where unique feature introduction is not assumed, it can be added automatically in O(F·T) time, where F is the number of features and T is the number of types (Penn, 2001). Meet semi-latticehood can also be restored automatically, although this involves adding exponentially many new types in the worst case.</Paragraph>
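      <Paragraph> Unique feature introduction can be checked mechanically. The following minimal sketch assumes that each feature is declared on some types and inherited by all of their subtypes; Intro(F) is then the unique declaring type that does not inherit F from another declarer, and the check fails when there is more than one such type. The names PARENTS, DECLARED, appropriate and intro are hypothetical, and the automatic repair mentioned above is not shown.

```python
from typing import Dict, Set, Tuple

Hierarchy = Dict[str, Tuple[str, ...]]

# Hypothetical toy signature: type -> immediate supertypes ("bot" is most general).
PARENTS: Hierarchy = {
    "bot": (), "sign": ("bot",), "word": ("sign",), "phrase": ("sign",),
    "synsem": ("bot",), "local": ("bot",),
}
# Features declared directly on each type; subtypes inherit them.
DECLARED: Dict[str, Set[str]] = {"sign": {"SYNSEM"}, "synsem": {"LOC"}, "local": {"CAT"}}


def supertypes(t: str, parents: Hierarchy) -> Set[str]:
    """t together with every type it inherits from."""
    seen: Set[str] = set()
    stack = [t]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen


def appropriate(feature: str, t: str, parents: Hierarchy,
                declared: Dict[str, Set[str]]) -> bool:
    """A feature is appropriate to t if t or one of its supertypes declares it."""
    return any(feature in declared.get(s, set()) for s in supertypes(t, parents))


def intro(feature: str, parents: Hierarchy, declared: Dict[str, Set[str]]) -> str:
    """Intro(F): the unique most general type to which F is appropriate."""
    declarers = [t for t in parents if feature in declared.get(t, set())]
    # Most general declarers: those that do not inherit F from another declarer.
    roots = [t for t in declarers
             if not any(d != t and d in supertypes(t, parents) for d in declarers)]
    if len(roots) != 1:
        raise ValueError(f"unique feature introduction fails for {feature}: {roots}")
    return roots[0]


print(intro("SYNSEM", PARENTS, DECLARED))                  # sign
print(appropriate("SYNSEM", "phrase", PARENTS, DECLARED))  # True
```
</Paragraph>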
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Complex Antecedent Constraints
</SectionTitle>
      <Paragraph position="0"> It will be assumed here that all complex-antecedent constraints are implicitly universally quantified, and are of the form:</Paragraph>
      <Paragraph position="2"> where φ and ψ are descriptions from the core description language, Φ, and ρ is drawn from a definite clause language of relations, whose arguments are also descriptions from Φ. As mentioned above, the ERG uses the same form, but where φ can only be a type description, τ, and ρ is the trivial goal, true.</Paragraph>
      <Paragraph position="3"> The approach taken here is to allow for arbitrary antecedents, φ, but still to interpret the implications of principles using subsumption by φ, i.e., for every TFS (the implicit universal quantification is still there), either the consequent holds, or the TFS is not subsumed by the most general satisfier of φ. The subsumption convention dates back to the TDL (Krieger and Schäfer, 1994) and ALE (Carpenter and Penn, 1996) systems, and has earlier antecedents in work that applied lexical rules by subsumption (Krieger and Nerbonne, 1991). The ConTroll constraint solver (Goetz and Meurers, 1997) attempted to handle complex antecedents, but used a classical interpretation of implication and no deductive phrase-structure backbone, which created a very large search space with severe non-termination problems.</Paragraph>
      <Paragraph position="4"> Within CLP more broadly, there is some related work on guarded constraints (Smolka, 1994) and on inferring guards automatically by residuation of implicational rules (Smolka, 1991), but implicit universal quantification of all constraints seems to be unique to linguistics. In most CLP systems, constraints on a class of terms or objects must be explicitly posted to a store for each member of that class. If a constraint is not posted for a particular term, then it does not apply to that term.</Paragraph>
      <Paragraph position="5"> The subsumption-based approach is sound with respect to the classical interpretation of implication for those principles where the classical interpretation really is the correct one. For completeness, some additional resolution method (in the form of a logic program with relations) must be used. As is normally the case in CLP, deductive search is used alongside constraint resolution.</Paragraph>
      <Paragraph position="6"> Under these assumptions, our principles can be converted to: trigger(φ) ⇒ v ∧ whenfs((v = φ), ((v = ψ) ∧ ρ)). Thus, with an implementation of type-antecedent constraints and an implementation of whenfs/2 (Section 3.3), which delays the goal in its second argument until v is subsumed by (one of) the most general satisfier(s) of description φ, all that remains is a method for finding the trigger, the most efficient type antecedent to use, i.e., the most general one that will not violate soundness. trigger(φ) can be defined as follows:</Paragraph>
      <Paragraph position="8"> where ⊔ and ⊓ are respectively unification and generalization in the type semi-lattice.</Paragraph>
      <Paragraph position="9"> In this and the next two subsections, we use Figure 3 as a running example of the various stages of compilation of a typical complex-antecedent constraint, namely the Finiteness Marking Principle for German (1). This constraint is stated relative to the signature shown in Figure 4. The description to the left of the arrow in Figure 3 (1) selects TFSs whose substructure on the path SYNSEM:LOC:CAT satisfies two requirements: its HEAD value has type verb, and its MARKING value has type fin. The principle says that every TFS that satisfies that description must also have a SYNSEM:LOC:CAT:HEAD:VFORM value of type bse.</Paragraph>
      <Paragraph position="10"> To find the trigger in Figure 3 (1), we can observe that the antecedent is a feature value description (F: φ), so the trigger is Intro(SYNSEM), the unique introducer of the SYNSEM feature, which happens to be the type sign. We can then transform this constraint as above (Figure 3 (2)). The cons and goal operators in (2)-(5) are ALE syntax, used respectively to separate the type antecedent of a constraint from the description component of the consequent (in this case, just the variable, X), and to separate the description component of the consequent from its relational attachment. We know that any TFS subsumed by the original antecedent will also be subsumed by the most general TFS of type sign, because sign introduces SYNSEM.</Paragraph>
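      <Paragraph> The trigger computation can be sketched as a recursive function over descriptions. This is a hedged reconstruction rather than the paper's own definition: it assumes trigger(v) = ⊥ (written bot) for a variable, trigger(τ) = τ for a type, trigger(F: φ) = Intro(F), ⊔ (unification) for conjunctions and ⊓ (generalization) for disjunctions, each of which is sound because every TFS satisfying the description must have at least the returned type. The tuple encoding of descriptions, the toy signature and the names PARENTS and INTRO are hypothetical, and None stands for an unsatisfiable description.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical signature loosely modelled on the running example (Figure 4).
PARENTS: Dict[str, Tuple[str, ...]] = {
    "bot": (), "sign": ("bot",), "synsem": ("bot",), "local": ("bot",),
    "category": ("bot",), "head": ("bot",), "verb": ("head",), "noun": ("head",),
    "marking": ("bot",), "fin": ("marking",), "inf": ("marking",),
}
# Intro(F) for each feature, assuming unique feature introduction.
INTRO = {"SYNSEM": "sign", "LOC": "synsem", "CAT": "local",
         "HEAD": "category", "MARKING": "category"}


def supertypes(t: str) -> Set[str]:
    seen: Set[str] = set()
    stack = [t]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(PARENTS[x])
    return seen


def unify(s: Optional[str], t: Optional[str]) -> Optional[str]:
    """Type unification: most general common subtype, or None if inconsistent."""
    if s is None or t is None:
        return None
    common = [u for u in PARENTS if {s, t}.issubset(supertypes(u))]
    best = [u for u in common
            if not any(v != u and v in supertypes(u) for v in common)]
    return best[0] if len(best) == 1 else None


def generalize(s: Optional[str], t: Optional[str]) -> Optional[str]:
    """Type generalization: most specific common supertype."""
    if s is None:
        return t
    if t is None:
        return s
    common = supertypes(s).intersection(supertypes(t))
    return next(u for u in common if common.issubset(supertypes(u)))


def trigger(desc: tuple) -> Optional[str]:
    """Most general type that every TFS satisfying desc must have."""
    kind = desc[0]
    if kind == "var":                    # ("var", name): any TFS at all
        return "bot"
    if kind == "type":                   # ("type", t)
        return desc[1]
    if kind == "feat":                   # ("feat", F, sub): F must be appropriate
        return INTRO[desc[1]]
    if kind == "and":                    # ("and", d1, d2)
        return unify(trigger(desc[1]), trigger(desc[2]))
    if kind == "or":                     # ("or", d1, d2)
        return generalize(trigger(desc[1]), trigger(desc[2]))
    raise ValueError(f"unknown description {desc!r}")


# Antecedent of the Finiteness Marking Principle, Figure 3 (1).
fmp = ("feat", "SYNSEM", ("feat", "LOC", ("feat", "CAT",
       ("and", ("feat", "HEAD", ("type", "verb")),
               ("feat", "MARKING", ("type", "fin"))))))
print(trigger(fmp))                                          # sign
print(trigger(("or", ("type", "verb"), ("type", "noun"))))   # head
```
</Paragraph>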
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Reducing Complex Conditionals
</SectionTitle>
      <Paragraph position="0"> Let us now implement our delay predicate, whenfs(V=Desc,Goal). Without loss of generality, it can be assumed that the first argument is actually drawn from a more general conditional language, including those of the form Vi = Desci closed under conjunction and disjunction. It can also be assumed that the variables of each Desci are distinct. Such a complex conditional can easily be converted into a normal form in which each atomic conditional contains a non-disjunctive description.</Paragraph>
      <Paragraph position="1"> Conjunction and disjunction of atomic conditionals then reduce as follows (using the Prolog convention of comma for AND and semi-colon for OR):</Paragraph>
      <Paragraph position="3"> ; true)).</Paragraph>
      <Paragraph position="4"> The binding of the variable Trigger is necessary to ensure that Goal is only resolved once in case the goals for both conditionals eventually unsuspend. For atomic conditionals, we must thread two extra arguments, VsIn and VsOut, which track which variables have been seen so far. Delaying on atomic type conditionals is implemented by a special whentype/3 primitive (Section 3.4), and feature descriptions reduce using unique feature introduction:</Paragraph>
      <Paragraph position="6"> whentype(Intro,V, (farg(F,V,FVal), whenfs(FVal=Desc,Goal,VsIn, VsOut))).</Paragraph>
      <Paragraph position="7"> farg(F,V,FVal) binds FVal to the argument position of V that corresponds to the feature F once V has been instantiated to a type for which F is appropriate.</Paragraph>
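      <Paragraph> The conjunction and disjunction reductions can be mimicked outside Prolog. In this minimal sketch, a hypothetical Var class with a list of suspended goals stands in for Prolog variables and when/2, an atomic conditional simply waits for a variable to be bound, and a shared flag plays the role of the Trigger variable so that a disjunctive goal runs at most once. Type and feature conditionals (whentype/3, farg/3) are not modelled.

```python
from typing import Callable, List, Optional, Tuple, Union

Goal = Callable[[], None]


class Var:
    """Hypothetical stand-in for a Prolog variable that can carry suspended goals."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[object] = None
        self.suspended: List[Goal] = []

    def bind(self, value: object) -> None:
        self.value = value
        woken, self.suspended = self.suspended, []
        for goal in woken:
            goal()


def when_nonvar(v: Var, goal: Goal) -> None:
    """Atomic conditional, in the spirit of SICStus when(nonvar(V), Goal)."""
    if v.value is not None:
        goal()
    else:
        v.suspended.append(goal)


# A conditional is a Var, ("and", c1, c2) or ("or", c1, c2).
Cond = Union[Var, Tuple[str, object, object]]


def whenfs(cond: Cond, goal: Goal) -> None:
    if isinstance(cond, Var):
        when_nonvar(cond, goal)
    elif cond[0] == "and":
        # Conjunction: suspend on the first conjunct, then on the second.
        whenfs(cond[1], lambda: whenfs(cond[2], goal))
    elif cond[0] == "or":
        # Disjunction: both disjuncts suspend; the shared flag plays the role
        # of the Trigger variable, so the goal runs at most once.
        fired = [False]

        def once() -> None:
            if not fired[0]:
                fired[0] = True
                goal()

        whenfs(cond[1], once)
        whenfs(cond[2], once)
    else:
        raise ValueError(f"unknown conditional {cond!r}")


log: List[str] = []
x, y = Var("X"), Var("Y")
whenfs(("and", x, y), lambda: log.append("and"))
whenfs(("or", x, y), lambda: log.append("or"))
x.bind(1)
y.bind(2)
print(log)  # ['or', 'and']
```

Binding X releases the disjunctive goal immediately, while the conjunctive goal stays suspended until Y is bound as well.</Paragraph>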
      <Paragraph position="8"> In the variable case, whenfs/4 simply binds the variable when it first encounters it, but subsequent occurrences of that variable create a suspension using Prolog when/2, checking for identity with the previous occurrences. This implements a primitive delay on structure sharing (Section 3.4):</Paragraph>
      <Paragraph position="10"> VsOut=VsIn,V=X,call(Goal)).</Paragraph>
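      <Paragraph> The variable case can be sketched in the same spirit. Here a hypothetical Node class represents TFS nodes that unification merges by forwarding references; the first occurrence of a variable records its node in the returned VsOut dictionary, and later occurrences suspend until their node has become identical to the recorded one. Types and features are ignored, so this shows only the delay on structure sharing, not full TFS unification.

```python
from typing import Callable, Dict, List, Optional

Goal = Callable[[], None]


class Node:
    """Hypothetical TFS node; unification merges nodes by forwarding references."""

    def __init__(self) -> None:
        self.ref: Optional["Node"] = None
        self.watchers: List[Goal] = []


def deref(n: Node) -> Node:
    while n.ref is not None:
        n = n.ref
    return n


def unify_nodes(a: Node, b: Node) -> None:
    """Make a and b structure-shared (types and features are ignored here)."""
    ra, rb = deref(a), deref(b)
    if ra is rb:
        return
    ra.ref = rb
    woken = ra.watchers + rb.watchers
    ra.watchers, rb.watchers = [], []
    for w in woken:
        w()


def when_identical(a: Node, b: Node, goal: Goal) -> None:
    """Run goal once a and b become the same node (a delay on structure sharing)."""
    fired = [False]

    def check() -> None:
        if fired[0]:
            return
        ra, rb = deref(a), deref(b)
        if ra is rb:
            fired[0] = True
            goal()
        else:  # not yet shared: re-suspend on both current representatives
            ra.watchers.append(check)
            rb.watchers.append(check)

    check()


def whenfs_var(name: str, node: Node, goal: Goal,
               vs_in: Dict[str, Node]) -> Dict[str, Node]:
    """Variable case of whenfs/4: returns VsOut."""
    if name not in vs_in:
        # First occurrence: record which node the variable stands for.
        vs_out = dict(vs_in)
        vs_out[name] = node
        goal()
        return vs_out
    # Later occurrence: wait until this node is shared with the first one.
    when_identical(vs_in[name], node, goal)
    return vs_in


# Description F:X, G:X checked against a TFS whose F and G values are f and g.
events: List[str] = []
f, g = Node(), Node()
seen = whenfs_var("X", f, lambda: events.append("first X"), {})
whenfs_var("X", g, lambda: events.append("shared"), seen)
unify_nodes(f, g)
print(events)  # ['first X', 'shared']
```
</Paragraph>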
      <Paragraph position="11"> In practice, whenfs/2 can be partially evaluated by a compiler. In the running example, Figure 3, we can compile the whenfs/2 subgoal in (2) into simpler whentype/3 subgoals that delay until X reaches a particular type. The second case of whenfs/4 tells us that this can be achieved by successively waiting for the types that introduce each of the features SYNSEM, LOC and CAT. As shown in Figure 4, those types are sign, synsem and local, respectively (Figure 3 (3)).</Paragraph>
      <Paragraph position="12"> The description that CatVal is suspended on is a conjunction, so we successively suspend on each conjunct. The type that introduces both HEAD and MARKING is category (4). In practice, static analysis can greatly reduce the complexity of the resulting relational goals. In this case, static analysis of the type system tells us that all four of these whentype/3 calls can be eliminated (5), since X must be a sign in this context, synsem is the least appropriate type of any SYNSEM value, local is the least appropriate type of any LOC value, and category is the least appropriate type of any CAT value.</Paragraph>
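      <Paragraph> The compilation and static-elimination steps can be illustrated with a minimal sketch over a toy signature modelled on Figure 4. It unfolds a variable-free, disjunction-free description into (path, type) waits, one per feature introducer and type test, and then drops duplicates and any wait already guaranteed by the value restriction of the last feature on the path (or by the root type). The tuple encoding, the names compile_waits and eliminate, and the signature tables are hypothetical; a real static analysis could use more information than this conservative check.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical signature modelled on the running example (Figure 4).
PARENTS: Dict[str, Tuple[str, ...]] = {
    "bot": (), "sign": ("bot",), "synsem": ("bot",), "local": ("bot",),
    "category": ("bot",), "head": ("bot",), "verb": ("head",),
    "marking": ("bot",), "fin": ("marking",),
}
INTRO = {"SYNSEM": "sign", "LOC": "synsem", "CAT": "local",
         "HEAD": "category", "MARKING": "category"}
# Value restriction of each feature at its introducer.
RESTRICTION = {"SYNSEM": "synsem", "LOC": "local", "CAT": "category",
               "HEAD": "head", "MARKING": "marking"}

Path = Tuple[str, ...]
Wait = Tuple[Path, str]  # (path from the root, type to wait for)


def supertypes(t: str) -> Set[str]:
    seen: Set[str] = set()
    stack = [t]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(PARENTS[x])
    return seen


def compile_waits(desc: tuple, path: Path = ()) -> List[Wait]:
    """Unfold a variable-free, disjunction-free description into whentype-style waits."""
    kind = desc[0]
    if kind == "type":
        return [(path, desc[1])]
    if kind == "feat":
        feature, sub = desc[1], desc[2]
        # Wait for Intro(F) at this node, then descend into the value of F.
        return [(path, INTRO[feature])] + compile_waits(sub, path + (feature,))
    if kind == "and":
        return compile_waits(desc[1], path) + compile_waits(desc[2], path)
    raise ValueError(f"unsupported description {desc!r}")


def eliminate(waits: List[Wait], root_type: str) -> List[Wait]:
    """Drop duplicate waits and those the signature already guarantees."""
    kept = []
    for path, t in dict.fromkeys(waits):  # dedupe, keeping order
        known = RESTRICTION[path[-1]] if path else root_type
        if t not in supertypes(known):
            kept.append((path, t))
    return kept


fmp = ("feat", "SYNSEM", ("feat", "LOC", ("feat", "CAT",
       ("and", ("feat", "HEAD", ("type", "verb")),
               ("feat", "MARKING", ("type", "fin"))))))
waits = compile_waits(fmp)
print(len(dict.fromkeys(waits)))  # 6
print(eliminate(waits, "sign"))   # only the verb and fin waits remain
```

With X known to be a sign, the four introducer waits (sign, synsem, local, category) disappear, matching the elimination described above.</Paragraph>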
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Primitive delay statements
</SectionTitle>
      <Paragraph position="0"> The two fundamental primitives typically provided for Prolog terms, e.g., by SICStus Prolog when/2, are: (1) suspending until a variable is instantiated, and (2) suspending until two variables are equated or inequated. The latter corresponds exactly to structure-sharing in TFSs, and to shared variables in descriptions; its implementation was already discussed in the previous section. The former, if carried over directly, would correspond to delaying until a variable is promoted to a type more specific than ⊥, the most general type in the type semi-lattice. There are degrees of instantiation in LTFS, however, corresponding to long subtyping chains that terminate in ⊥. A more general and useful primitive in a typed language with such chains is suspending until a variable is promoted to a particular type. whentype(Type,X,Goal), i.e., delaying subgoal Goal until variable X reaches Type, is then the non-universally-quantified cousin of the type-antecedent constraints that are already used in the ERG.</Paragraph>
      <Paragraph position="1"> How whentype(Type,X,Goal) is implemented depends on the data structure used for TFSs, but Prolog-based implementations invariably build on the underlying Prolog implementation of when/2. In ALE, for example, TFSs are represented with reference chains that extend every time their type changes. One can simply wait for a variable position at the end of this chain to be instantiated, and then compare the new type to Type. Figure 3 (6) shows a schematic representation of a sign-typed TFS with SYNSEM value SynVal, and two other appropriate feature values. Acting upon this as its second argument, the corresponding definition of whentype(Type,X,Goal) in Figure 3 (7) delays on the variable in the extra, fourth argument position. This variable will be instantiated to a similar term when this TFS promotes to a subtype of sign.</Paragraph>
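      <Paragraph> A reference-chain implementation of whentype/3 can be sketched as follows. Each hypothetical Link holds a type and an initially unbound tail; promotion instantiates the tail with a new link and wakes the goals suspended on it, and whentype re-checks the current type each time it is woken. This mirrors the idea described above rather than ALE's actual term layout, and it simply never runs a goal whose target type has become unreachable.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Goal = Callable[[], None]

# Hypothetical toy hierarchy: type -> immediate supertypes.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "bot": (), "sign": ("bot",), "word": ("sign",), "phrase": ("sign",),
}


def supertypes(t: str) -> Set[str]:
    seen: Set[str] = set()
    stack = [t]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(PARENTS[x])
    return seen


class Link:
    """One link of a hypothetical reference chain: a type plus an unbound tail."""

    def __init__(self, type_: str) -> None:
        self.type = type_
        self.next: Optional["Link"] = None  # instantiated when the TFS is promoted
        self.waiting: List[Goal] = []       # goals suspended on the unbound tail


def current(tfs: Link) -> Link:
    """Follow the chain to its end, where the TFS's current type lives."""
    while tfs.next is not None:
        tfs = tfs.next
    return tfs


def promote(tfs: Link, new_type: str) -> None:
    """Extend the chain with a more specific type and wake goals on the old tail."""
    tail = current(tfs)
    if tail.type not in supertypes(new_type):
        raise ValueError(f"{new_type} is not a subtype of {tail.type}")
    tail.next = Link(new_type)
    woken, tail.waiting = tail.waiting, []
    for goal in woken:
        goal()


def whentype(type_: str, tfs: Link, goal: Goal) -> None:
    """Delay goal until tfs has been promoted to type_ (or a subtype of it)."""
    tail = current(tfs)
    if type_ in supertypes(tail.type):
        goal()
    elif any({type_, tail.type}.issubset(supertypes(u)) for u in PARENTS):
        # Still compatible: wait for the tail to be instantiated, then re-check.
        tail.waiting.append(lambda: whentype(type_, tfs, goal))
    # Otherwise type_ can never be reached, so the goal is never run.


log: List[str] = []
x = Link("bot")
whentype("phrase", x, lambda: log.append("phrase"))
promote(x, "sign")    # the goal re-suspends on the new tail
promote(x, "phrase")  # the goal runs
print(log)            # ['phrase']
```
</Paragraph>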
      <Paragraph position="2"> As described above, delaying until the antecedent of the principle in Figure 3 (1) is true or false ultimately reduces to delaying until various feature values attain certain types using whentype/3. A TFS may not have substructures that are specific enough to determine whether an antecedent holds or not. In this case, we must wait until it is known whether the antecedent is true or false before applying the consequent. If we reach a deadlock, where several constraints are suspended on their antecedents, then we must use another resolution method to begin testing more specific extensions of the TFS in turn. The choice among these other methods characterizes a true CLP solution for LTFS; all of them are enabled by the method presented in this paper. In the case of the signature in Figure 4, one of these methods may test whether a marking-typed substructure is consistent with either fin or inf. If it is consistent with fin, then this branch of the search may unsuspend the Finiteness Marking Principle on a sign-typed TFS that contains this substructure.</Paragraph>
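      <Paragraph> The search that breaks such a deadlock can be sketched, for a single suspended node, as branching on the maximally specific subtypes of that node, much as sort resolution does. The function names and the two-value marking hierarchy are hypothetical, and a real solver would also have to explore feature values and undo bindings between branches.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical fragment of the signature: marking has subtypes fin and inf.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "bot": (), "marking": ("bot",), "fin": ("marking",), "inf": ("marking",),
}


def supertypes(t: str) -> Set[str]:
    seen: Set[str] = set()
    stack = [t]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(PARENTS[x])
    return seen


def maximal_subtypes(t: str) -> List[str]:
    """Maximally specific types at or below t (the candidates of sort resolution)."""
    below = [u for u in PARENTS if t in supertypes(u)]
    # A type is maximal if it is not the immediate supertype of anything below t.
    return sorted(u for u in below if not any(u in PARENTS[v] for v in below))


def resolve_deadlock(node_type: str, awaited: str) -> Dict[str, bool]:
    """Branch on each maximally specific extension of a node of type node_type.

    For each branch, report whether a constraint suspended until the node
    reaches `awaited` would be released on that branch.
    """
    return {b: awaited in supertypes(b) for b in maximal_subtypes(node_type)}


print(resolve_deadlock("marking", "fin"))  # {'fin': True, 'inf': False}
```
</Paragraph>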
    </Section>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Measuring the cost of delaying
</SectionTitle>
    <Paragraph position="0"> How much of a cost do we pay for using delaying? In order to answer this question definitively, we would need to reimplement a large-scale grammar which was substantially identical in every way to the ERG but for its use of delay statements. The construction of such a grammar is outside the scope of this research programme, but we do have access to MERGE,2 which was designed to have the same extensional coverage of English as the ERG. Internally, MERGE is quite unlike the ERG. Its TFSs are far larger, because each TFS category carries inside it the phrase structure daughters of the rule that created it. It also has far fewer types, more feature values, a heavy reliance on lists, about a third as many phrase structure rules with daughter categories that are an average of 32% larger, and many more constraints. Because of these differences, this version of MERGE runs on average about 300 times slower than the ERG.</Paragraph>
    <Paragraph position="1"> On the other hand, MERGE uses delaying for all three of the purposes that have been discussed in this paper: complex antecedents, explicit whenfs/2 calls to avoid non-termination problems, and explicit whenfs/2 calls to avoid expensive non-deterministic searches. While there is currently no delay-free grammar to compare it to, we can pop open the hood on our implementation and measure delaying relative to other system functions on MERGE. Times were measured on an HP Omnibook XE3 laptop with an 850MHz Pentium II processor and 512MB of RAM, running SICStus Prolog 3.11.0 on Windows 98 SE.</Paragraph>
    <Paragraph position="2"> cost of delaying is on a par with other system functions such as constraint enforcement and relational goal resolution, delaying takes up between three and five times as large a percentage of sentence parse time, because it is called so often. This reflects, in part, design decisions of the MERGE grammar writers, but it also underscores the importance of having an efficient implementation of delaying for large-scale use. Even if delaying could be eliminated entirely from this grammar at no cost, however, a 6% reduction in parsing time would not, in the present author's view, warrant the loss of modularity in a grammar of this size.</Paragraph>
    <Paragraph position="3"> 2 The author sincerely thanks Kordula De Kuthy and Detmar Meurers for their assistance in providing the version of MERGE (0.9.6) and its test suite (1347 sentences, average word length 6.3, average chart size 410 edges) for this evaluation. MERGE is still under development.</Paragraph>
  </Section>
</Paper>