<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1009">
  <Title>Type-inheritance Combinatory Categorial Grammar</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Implementation Details
</SectionTitle>
    <Paragraph position="0"> I assume a basic familiarity with CCG. As is standard, TCCG encodes a small set of simplex syntactic categories (S, N, NP, PP, and CONJ) from which complex categories are built via slash operators. For example, eat is assigned the category (S\NP)/NP, i.e. eat is a function from an NP to its right to a function from an NP to its left to S.</Paragraph>
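    <Paragraph>The function reading of categories can be illustrated with a minimal Python sketch. It assumes a simple binary representation; the classes Atom and Complex and the functions forward_apply and backward_apply are hypothetical illustrations, not TCCG's LKB encoding. The category (S\NP)/NP first consumes an NP to its right, then an NP to its left, yielding S.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """A simplex category such as S, N, NP, PP or CONJ."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Complex:
    """A complex category built with a slash: result/arg or result\\arg."""
    result: "Cat"
    slash: str  # "/" seeks its argument to the right, "\\" to the left
    arg: "Cat"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Cat = Union[Atom, Complex]


def forward_apply(fn: Cat, arg: Cat) -> Optional[Cat]:
    """Forward application: X/Y followed by Y yields X."""
    if isinstance(fn, Complex) and fn.slash == "/" and fn.arg == arg:
        return fn.result
    return None


def backward_apply(arg: Cat, fn: Cat) -> Optional[Cat]:
    """Backward application: Y followed by X\\Y yields X."""
    if isinstance(fn, Complex) and fn.slash == "\\" and fn.arg == arg:
        return fn.result
    return None


S, NP = Atom("S"), Atom("NP")
eat = Complex(Complex(S, "\\", NP), "/", NP)  # (S\NP)/NP

vp = forward_apply(eat, NP)  # object NP on the right gives S\NP
sentence = backward_apply(NP, vp)  # subject NP on the left gives S
print(eat, vp, sentence)
```

Frozen dataclasses give structural equality for free, which is all this sketch needs in place of unification.</Paragraph>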
    <Paragraph position="1"> The basic rule-set is outlined in (1).1 [Footnote 1: $ stands for a list of categories (e.g. X$ could be X\Y, X/Y, (X\Y)/Z, etc.), \$ for a list of backward-slashed categories, and /$ for a list of forward-slashed categories. Subscripts indicate category identity, e.g. $₁ refers to the same list in all its uses in one category.] Note that &gt;B is generalized to allow for composition of n-ary functions (but currently only for a bounded n), and &gt;T is restricted to nominative subject NPs (the only place in English where it is important). Turning to encoding, I assume a sign-based packaging of syntactic and semantic information.2</Paragraph>
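    <Paragraph>Forward composition and restricted forward type-raising can be sketched in the same style. This minimal Python illustration covers only the binary case of &gt;B, not the generalized n-ary version, and it models the nominative restriction on &gt;T with a hypothetical case field that category equality ignores, a crude stand-in for unification. The names forward_compose and forward_type_raise are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    name: str
    # Case marking; compare=False makes NP[nom] == NP, a crude stand-in
    # for unification with an underspecified feature.
    case: Optional[str] = field(default=None, compare=False)

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Complex:
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Cat = Union[Atom, Complex]
S, NP = Atom("S"), Atom("NP")


def forward_compose(f: Cat, g: Cat) -> Optional[Cat]:
    """>B (binary case only): X/Y followed by Y/Z yields X/Z."""
    if (isinstance(f, Complex) and isinstance(g, Complex)
            and f.slash == "/" and g.slash == "/" and f.arg == g.result):
        return Complex(f.result, "/", g.arg)
    return None


def forward_type_raise(x: Cat) -> Optional[Cat]:
    """>T restricted to nominative NPs: NP[nom] becomes S/(S\\NP)."""
    if isinstance(x, Atom) and x.name == "NP" and x.case == "nom":
        return Complex(S, "/", Complex(S, "\\", x))
    return None


kim = Atom("NP", case="nom")
eats = Complex(Complex(S, "\\", NP), "/", NP)  # (S\NP)/NP

raised = forward_type_raise(kim)  # S/(S\NP)
kim_eats = forward_compose(raised, eats)  # S/NP, still awaiting an object
print(raised, kim_eats)
```

An accusative NP, or an NP with no case value, is simply not raised, mirroring the restriction to nominative subjects.</Paragraph>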
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Implementation Details (continued)
</SectionTitle>
    <Paragraph position="0"> Following Baldridge (2002), the root category is the final result of a category after all applications (e.g. S for a transitive verb (S\NP)/NP) and defines the morphosyntactic features of a category. Ignoring the details of the category type hierarchy, simplex categories are atomic types and complex categories are feature structures with a simplex result and a list of arguments, as illustrated in (3).</Paragraph>
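    <Paragraph>To make the result-plus-argument-list encoding concrete, here is a small Python sketch that computes the root category and flattens a slash-built category into a simplex result with an argument list. The representation and the function names root and flatten are assumptions for illustration; the list is ordered from the first-consumed argument to the last, an arbitrary choice that may differ from the ordering in (3).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Complex:
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[Atom, Complex]


def root(cat: Cat) -> Atom:
    """Follow results until a simplex category is reached."""
    while isinstance(cat, Complex):
        cat = cat.result
    return cat


def flatten(cat: Cat) -> Tuple[Atom, List[Tuple[str, Cat]]]:
    """Return (simplex result, arguments ordered first-consumed first)."""
    args = []
    while isinstance(cat, Complex):
        args.append((cat.slash, cat.arg))
        cat = cat.result
    return cat, args


S, NP = Atom("S"), Atom("NP")
transitive = Complex(Complex(S, "\\", NP), "/", NP)  # (S\NP)/NP
result, arguments = flatten(transitive)
print(root(transitive).name, [(slash, a.name) for slash, a in arguments])
```
</Paragraph>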
    <Paragraph position="1"> Finally, I briefly discuss how TCCG deals with the so-called "spurious ambiguity" of CCG. The combinatory power of CCG allows for a potentially exponential number of parses for a given reading of a single string.3 [Footnote 2: In this paper I ignore the semantics of TCCG. It is worth noting that I do not adopt the λ-calculus semantics typical of CCG but opt instead for the Minimal Recursion Semantics (MRS) (Copestake et al., 1999) native to the LKB.]</Paragraph>
    <Paragraph position="2"> [Footnote 3: However, the so-called "spurious" parses are in fact motivated by intonational and information-structural phrases, as argued by Steedman (2000), although TCCG does not implement any prosodic information.]</Paragraph>
    <Paragraph position="3"> A considerable amount of work has focused on spurious ambiguity and its effects on efficiency (see Karttunen 1986; see Vijay-Shankar and Weir 1990 for proof of a polynomial-time parsing algorithm and Clark and Curran 2004b for statistical models of CCG parsing); however, most of these solutions are parser-based. Rather than making proprietary modifications to the LKB's parser, I instead adopt Eisner's (1996) CCG normal form to eliminate spurious ambiguity. Eisner demonstrates that the parse forest assigned to a given string can be partitioned into semantic equivalence classes such that there is only one "canonical" (normal form) structure per equivalence class, where the normal form prefers application over B and right-branching &gt;B over left-branching &gt;B (and vice versa for &lt;B).4 These preferences are statable as constraints on what may serve as the primary functors of different combinators. I implement this by assigning one of the values in (4) to the feature NF. An NF value fc marks a sign as being the output of &gt;B, bc as the output of &lt;B, ot as a lexical item or the output of application, and tr as the output of T. The subtypes are disjunctive, so that fc-ot-tr is either a lexeme or the output of &gt;B, application, or T. Each combinator constrains the NF features of its output and daughters to specific values. For example, to prefer right-branching &gt;B over left-branching &gt;B, &gt;B is constrained as in (5), where bracketed values are NF values:
(5) (X/Y)[bc-ot-tr]  Y/Z  ⇒  (X/Z)[fc]
This constraint says that the output of &gt;B is marked fc and its left daughter is bc-ot-tr, i.e. it must be a lexical item or the output of &lt;B, application, or T, but not another &gt;B (marked fc), thus ruling out left-branching &gt;B in favor of right-branching &gt;B. The other combinators in (1) are constrained similarly. The cumulative effect is only one "canonical" parse for each reading of a given string.
For more discussion of the efficiency of this approach see Eisner (1996) and Clark and Curran (2004a). For purposes of TCCG, however, eliminating spurious ambiguity facilitates exploration of TCCG's hybrid nature by making direct comparisons possible between types of grammatical encoding in TCCG and more standard HPSG/CCG approaches, which I turn to next.</Paragraph>
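    <Paragraph>The effect of constraint (5) can be illustrated with a small Python sketch. It simplifies under stated assumptions: the NF hierarchy in (4) is modeled, hypothetically, as a mapping from each NF type to the set of atomic values it covers; only forward-slashed categories are represented; and only the &gt;B constraint from (5) is implemented. Given three lexical items X/Y, Y/Z and Z/W, the left-branching &gt;B derivation is blocked while the right-branching one succeeds.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encoding of the NF hierarchy in (4): each (possibly
# disjunctive) type is the set of atomic NF values it subsumes.
NF_TYPES = {
    "fc": {"fc"},  # output of forward composition (>B)
    "bc": {"bc"},  # output of backward composition
    "ot": {"ot"},  # lexical item or output of application
    "tr": {"tr"},  # output of type-raising
    "bc-ot-tr": {"bc", "ot", "tr"},
    "fc-ot-tr": {"fc", "ot", "tr"},
}


@dataclass(frozen=True)
class Sign:
    result: str  # X in X/Y (only forward slashes in this sketch)
    arg: str  # Y in X/Y
    nf: str  # atomic NF value carried by the sign


def satisfies(value: str, nf_type: str) -> bool:
    return value in NF_TYPES[nf_type]


def forward_compose(left: Sign, right: Sign) -> Optional[Sign]:
    """(X/Y)[bc-ot-tr]  Y/Z  =>  (X/Z)[fc], following constraint (5)."""
    if left.arg != right.result:
        return None  # the categories do not compose
    if not satisfies(left.nf, "bc-ot-tr"):
        return None  # left daughter is itself >B output: blocked
    return Sign(left.result, right.arg, "fc")


a = Sign("X", "Y", "ot")
b = Sign("Y", "Z", "ot")
c = Sign("Z", "W", "ot")

a_b = forward_compose(a, b)
left_branching = forward_compose(a_b, c) if a_b else None
b_c = forward_compose(b, c)
right_branching = forward_compose(a, b_c) if b_c else None
print(left_branching, right_branching)
```

Both derivations would yield X/W without the NF check; the check leaves exactly one of them.</Paragraph>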
    <Paragraph position="5"> [Footnote 4: … and are thus ignored. I do, however, augment Eisner's system by restricting T to occur only when needed for B.]</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 A Comparison of CCG and HPSG
</SectionTitle>
    <Paragraph position="0"> In this section I briefly review some major differences between CCG and HPSG. Both theories share roots in the same strand of lexicalist syntax, wherein grammatical information is lexically encoded and combination is category driven. While the two theories differ considerably in several fundamental ways, there are two key differences relevant to this discussion. The first is how categories are constructed. In CCG the restricted set of simplex categories, the means by which complex categories are built, and the generality of the combinators collectively yield a principled system that conforms strongly to the lexicalist assumption that all combinatory information is encoded categorially.</Paragraph>
    <Paragraph position="1"> HPSG, however, allows a wide range of simplex categories and places no restrictions on types of rules, allowing uneven divisions of combinatory information between categories and constructions. In principle a CCG-style category/combinatory system is possible in HPSG (as TCCG demonstrates), but in practice large-scale HPSGs tend to represent information heterogeneously, making certain cross-cutting generalizations difficult to state; this is largely a result of the directions HPSG has taken as a research program.</Paragraph>
    <Paragraph position="2"> The second relevant difference between these theories is how categories are structured relative to one another. Traditionally, CCG offers no grammatical tools to statically relate categories. Instead, these relationships are left implicit even when linguistically relevant, and are statable only meta-theoretically. HPSG has from its inception employed multiple inheritance type hierarchies (e.g. as in (4)), where some of the grammatical information for a particular sign is inherited from its immediate supertype, which itself inherits grammatical information from its own supertype, and all types share inherited information with their sisters. The result is a richly structured set of relationships between linguistic units that reduces redundancy and can be exploited to state grammatical and typological generalizations.</Paragraph>
    <Paragraph position="3"> As noted in §1, the respective advantages of these theories are compatible, and much previous work has exploited this fact. Use of unification (a core operation in HPSG) in CG dates back at least to Karttunen (1986, 1989), Uszkoreit (1986), and Zeevat (1988). Work on incorporating inheritance hierarchies into CCG is relatively more recent. Most notably, Villavicencio (2001) implements a hybrid CCG/HPSG grammar in the LKB for purposes of exploring a principles-and-parameters acquisition model, defining parameters in terms of underspecified type hierarchies that the learner makes more precise during the learning process.5 Moving beyond acquisition, Baldridge (2002) argues more generally for a type-hierarchy approach to the structure of a CCG lexicon so as to reduce redundancy and capture broader typological generalizations, although he does not explicitly flesh out this proposal.6 With TCCG I build directly on this previous work by applying Villavicencio's type inheritance techniques to the issues raised by Baldridge, addressing head-on the advantages of a hybrid approach and comparing it to prior HPSG and CCG analyses. In the following sections I outline several case studies of this approach.7 [Footnote 5: Note that TCCG employs a different type of CG than …]</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Advantages of TCCG over CCG
</SectionTitle>
    <Paragraph position="0"> I turn first to the use of type hierarchies and lexical mapping rules in TCCG and the elimination of redundancy this brings to CCG. Taking the hierarchy of verbal signs as my case study, consider the categories CCG assigns to various verb types (note that in TCCG CPs are categorially finite NPs): (6) (a) Intransitive (sleep): S\NP (b) Intransitive PP complement (speak (to)): … [Footnote 7: … primarily on Sag and Wasow (1999). This source was chosen for two reasons: (a) TCCG is primarily a proof-of-concept, and thus a relatively constrained textbook grammar is ideally suited to exploring the issues addressed here, and (b) a parallel HPSG implementation already exists that could provide for direct comparisons (although this is a matter of future work). However, development of TCCG has been informed by a wider range of work in CCG and HPSG, and the conclusions I draw are applicable to both theories at large.]</Paragraph>
    <Paragraph position="1"> Of course, several linguistically relevant relationships hold across these types, as shown in (7).  (7) (a) All verbs share morphosyntactic features.</Paragraph>
    <Paragraph position="2"> (b) All verbs have a leftward subject.</Paragraph>
    <Paragraph position="3"> (c) All verbs obey obliqueness hierarchies (NPs are closest to verbs, obliques further, modulo syntactic operations like heavy-NP shift).</Paragraph>
    <Paragraph position="4"> (d) All complements are rightward.</Paragraph>
    <Paragraph position="5"> (e) Barring morphosyntax, auxiliary and control verbs share a category.</Paragraph>
    <Paragraph position="6"> While these generalizations are of course derivable meta-theoretically from the categories in (6), there is no explicit mechanism in CCG for stating static relationships (there are mechanisms for deriving categories, which I discuss below). TCCG, however, captures (7) via a lexical type hierarchy; the subtype for transitive verbs is given in (8).8 Each sign in TCCG is assigned a type in such a hierarchy, where relevant generalizations in supertypes are inherited by subtypes. For example, the constraint that all verbs are rooted in S is stated on s-lxm, while the constraint that they all have leftward subjects is stated on verb-lxm, as in (9).</Paragraph>
    <Paragraph position="8"> Further specializations add more information; for example, tv-lxm adds the information that there is at least one additional item in the valence of the verb ((S\NP)/X$). This type hierarchy has several advantages. First, it significantly reduces redundancy, since each constraint relevant to multiple categories is (ideally) stated only once. Second, these types provide a locus for cross-linguistic typological generalizations, an advantage that goes beyond parsimony.</Paragraph>
    <Paragraph position="9"> For example, the slash-marking constraint on verb-lxm in (9) defines English as an SV language. For a language like Irish, this type could instead encode a general VS constraint (e.g. verb-lxm := S/NP$). Thus the type hierarchy provides an explicit means of encoding broad typological parameters not directly statable in CCG (see Bender et al. 2002 for further discussion and Villavicencio 2001 on the acquisition of word order parameters).</Paragraph>
    <Paragraph position="10"> [Footnote 8: I use the following type abbreviations: s-lxm=lexeme rooted in S, n-lxm=lexeme rooted in N, verb-lxm=verb, tv=transitive verb, rcv=control verb, cptv=CP complement transitive verb, dtv=ditransitive verb, ptv=PP complement transitive verb, stv=strictly transitive verb, orc=object control verb, orv=object raising verb, ocv=object equi verb.]</Paragraph>
    <Paragraph position="11"> However, even (6) does not exhaust the possible verbal categories, since each verb carries not just its "basic" category but also a cluster of other categories corresponding to various lexical operations. For example, give is associated with several categories, including but not limited to: (10) (a) Double object: ((S\NP)/NP)/NP (b) NP-PP complement: ((S\NP)/PP[to])/NP (c) Passivized double object, no agent: … categories redundantly, although frequently these relationships are described via meta-rules (as proposed, e.g., by Carpenter 1992 and assumed implicitly in Steedman 2000). For instance, the meta-rule for dative shift could be stated as in (11).</Paragraph>
    <Paragraph position="13"> This meta-rule simply says that any double-object verb also has a dative-shifted category.</Paragraph>
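    <Paragraph>A minimal Python sketch of the dative-shift mapping just described, assuming categories flattened to a root plus a list of atomic arguments (first-consumed first) and ignoring semantics. The function name dative_shift and the PP[to] marking are assumptions for illustration; the sketch mirrors the prose description, not the exact formulation of (11).

```python
from typing import List, Optional, Tuple

Cat = Tuple[str, List[Tuple[str, str]]]  # (root, [(slash, atomic arg), ...])


def dative_shift(cat: Cat) -> Optional[Cat]:
    """Map ((S\\NP)/NP)/NP to ((S\\NP)/PP[to])/NP; None for other inputs.

    Arguments are listed first-consumed first:
      double object (give Kim a book): recipient NP, theme NP, subject NP
      NP-PP complement (give a book to Kim): theme NP, PP[to], subject NP
    """
    root, args = cat
    if root != "S" or args != [("/", "NP"), ("/", "NP"), ("\\", "NP")]:
        return None
    _recipient, theme, subject = args  # the recipient is realized as PP[to]
    return root, [theme, ("/", "PP[to]"), subject]


give = ("S", [("/", "NP"), ("/", "NP"), ("\\", "NP")])  # ((S\NP)/NP)/NP
print(dative_shift(give))
print(dative_shift(("S", [("\\", "NP")])))  # intransitive: rule does not apply
```
</Paragraph>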
    <Paragraph position="14"> The meta-rule approach is of course similar to the lexical mapping rules common in much HPSG literature (cf. Flickinger 1987, inter alia), and in fact the rule in (11) is implemented as in (12).</Paragraph>
    <Paragraph position="16"> However, the difference between meta-rules and lexical rules is that the latter are first-class grammatical entities and can themselves be organized hierarchically in a way that eliminates redundancy and captures several linguistic generalizations. An illustrative example is the encoding of predicative XPs (Kim is happy/on time/the person who came).</Paragraph>
    <Paragraph position="17"> TCCG adopts the Pollard and Sag (1994) analysis that predicative (ad)nominals have the category (S\NP) and thus are compatible with the selectional restrictions of be ((S\NP)/(S\NP)). A simple solution for generating predicative XPs is to derive them via the lexical rules in (13): (13) (a) Predicative NPs: NP$₁ ⇒ (S\NP)$₁ (b) Predicative adnominals: N|N$₁ ⇒ (S\NP)$₁. These two rules clearly share a number of similarities that positing them independently does not capture. In TCCG, however, the type hierarchy captures these larger similarities: the rules for predicative NPs and predicative modifiers share a supertype that captures their common information, as in (14), where the type predicative has the two subtypes predicative-np and predicative-mod. The type predicative encodes the general Nom$ ⇒ (S\NP)$ form of the rules; predicative-np and predicative-mod merely further specify the daughter category as in (13).</Paragraph>
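    <Paragraph>The division of labor in (14) can be sketched with a Python class hierarchy, assuming a flattened root-plus-arguments encoding (first-consumed first, atomic arguments). The class names mirror the types in (14) but are illustrative only: the supertype Predicative builds the shared (S\NP)$ output, while each subtype states only which daughters it accepts and how many innermost arguments belong to the nominal core being replaced.

```python
from typing import List, Optional, Tuple

Args = List[Tuple[str, str]]  # (slash, atomic argument), first-consumed first


class Predicative:
    """predicative: Nom$1 => (S\\NP)$1, with the $1 arguments carried over."""
    core_size = 0  # how many innermost arguments belong to the nominal core

    def daughter_ok(self, root: str, args: Args) -> bool:
        raise NotImplementedError

    def apply(self, root: str, args: Args) -> Optional[Tuple[str, Args]]:
        if not self.daughter_ok(root, args):
            return None
        kept = args[: len(args) - self.core_size]  # the shared $1
        return "S", kept + [("\\", "NP")]


class PredicativeNp(Predicative):
    """predicative-np: NP$1 => (S\\NP)$1."""
    core_size = 0

    def daughter_ok(self, root: str, args: Args) -> bool:
        return root == "NP"


class PredicativeMod(Predicative):
    """predicative-mod: N|N$1 => (S\\NP)$1, for either slash direction."""
    core_size = 1

    def daughter_ok(self, root: str, args: Args) -> bool:
        return root == "N" and bool(args) and args[-1][1] == "N"


happy = ("N", [("/", "N")])  # N/N
on = ("N", [("/", "NP"), ("\\", "N")])  # (N\N)/NP, as in "on time"
print(PredicativeMod().apply(*happy), PredicativeMod().apply(*on))
print(PredicativeNp().apply("NP", []))
```

Only apply is shared; the subtypes differ in two small declarations, which is the point of stating the common form once.</Paragraph>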
    <Paragraph position="18"> Again, while many CCG approaches employ meta-rules, the type hierarchy of TCCG allows further generalizations even among such meta-rules. In sum, the use of type hierarchies and lexical rules results in a grammar where each lexical item has (ideally) one category, with shared information stated once. Additional categories are derived via mapping rules, themselves organized hierarchically, thus capturing a variety of cross-cutting generalizations.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Advantages of TCCG over HPSG
</SectionTitle>
    <Paragraph position="0"> TCCG of course adopts wholesale the type-inheritance, unification-based approach of HPSG, adding nothing new to the underlying framework.</Paragraph>
    <Paragraph position="1"> Nonetheless, by adopting a CCG-style syntax, TCCG makes possible more direct comparisons of the coverage and heavily lexical nature of standard CCG analyses to common HPSG approaches. Expanding on the coverage of Sag and Wasow (1999), TCCG implements CCG analyses of a wide range of unbounded dependency phenomena (e.g. pied-piping, relative clauses, p-gaps, *that-t effects; see Sag 1997, Ginzburg and Sag 2000 for well-worked-out HPSG analyses). More generally, TCCG implements CCG analyses of non-constituent coordination (e.g. right node raising and argument cluster coordination), largely unanalyzed in HPSG (although see Yatabe 2002, Crysmann 2003, Beavers and Sag to appear). These are all well-known advantages of CCG and I will not discuss them at length.</Paragraph>
    <Paragraph position="2"> In this section, however, I focus on how the fully lexical nature of TCCG simplifies the analysis of bare nominals, which in Ginzburg and Sag (2000) are analyzed constructionally: a plural/mass N-bar is pumped to an NP with appropriate semantics (although see Beavers 2003 for an alternative HPSG proposal without pumping). The motivation for a phrasal pumping rule is to ensure (a) that modifiers may modify the N-bar before the category is changed to NP and (b) that the added existential/generic quantifier outscopes all constituents of the N-bar. For instance, building the NP happy dogs from Cleveland lexically in HPSG would generate a lexical NP dogs that is incompatible with the constraints on modifiers like happy (which have N-bar MOD values) and would further prevent the added quantifier from outscoping the modifiers. However, a phrasal approach misses the broader generalization that these constructions are lexically triggered (by particular noun classes/inflection), and again spreads language-particular grammatical information heterogeneously between the lexicon and phrasal rules. At least in terms of parsimony, a lexical rule approach would be preferable, as it localizes the operation to one component of the grammar. CCG allows for such a fully lexical analysis of bare plurals. The relevant categories are shown in (15): (15) (a) Nouns: N$ (b) Attributive adjectives: N/N$ (c) Attributive prepositions: (N\N)/NP (d) Relativizers: (((N\N)/$₁)/(S/$₁/NP))$ (e) Determiners: NP/N. N, Adj, Rel, and P are all of the form N$, with only Det rooted in NP. Adapting Carpenter's (1992) meta-rule analysis of bare NPs to TCCG, I analyze bare nominals via a simple HPSG-style lexical rule of the form N$₁ ⇒ NP$₁, such that any (ad)nominal can be pumped to a function rooted in NP (adding the appropriate quantificational semantics), essentially making it a determiner.
Thus, when building a bare NP, the pumped category is necessarily the final functor, ensuring no category mismatches and the correct semantics, as shown in (16).9 A variety of other phenomena have been implemented lexically in TCCG without the use of additional syntactic rules beyond those assumed in §2, reducing the number of different kinds of syntactic and constructional rules common in HPSG analyses. Thus, TCCG validates and makes more accessible the possibilities of fully lexical CCG-style analyses in HPSG without modifying the underlying framework.</Paragraph>
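    <Paragraph>The bare-nominal rule N$₁ ⇒ NP$₁ and its interaction with modifiers can be sketched in Python. This is an illustrative simplification, assuming flattened categories (root plus atomic arguments, first-consumed first), only forward and backward application, and no semantics; the helper names pump, fapply and bapply are hypothetical. Pumping the outermost functor happy yields an NP for happy dogs from Cleveland, whereas pumping dogs too early leaves from Cleveland unable to attach.

```python
from typing import List, Optional, Tuple

Cat = Tuple[str, List[Tuple[str, str]]]  # (root, [(slash, atomic arg), ...])


def pump(cat: Cat) -> Optional[Cat]:
    """Bare-nominal lexical rule N$1 => NP$1 (quantifier semantics omitted)."""
    root, args = cat
    return ("NP", args) if root == "N" else None


def fapply(fn: Cat, arg: Cat) -> Optional[Cat]:
    """Forward application; the argument must be a saturated category."""
    root, args = fn
    if args and not arg[1] and args[0] == ("/", arg[0]):
        return root, args[1:]
    return None


def bapply(arg: Cat, fn: Cat) -> Optional[Cat]:
    """Backward application; the argument must be a saturated category."""
    root, args = fn
    if args and not arg[1] and args[0] == ("\\", arg[0]):
        return root, args[1:]
    return None


dogs = ("N", [])
happy = ("N", [("/", "N")])  # N/N
from_ = ("N", [("/", "NP"), ("\\", "N")])  # (N\N)/NP
cleveland = ("NP", [])

pp = fapply(from_, cleveland)  # N\N
nbar = bapply(dogs, pp)  # dogs from Cleveland: N
good = fapply(pump(happy), nbar)  # NP/N applied to N gives NP
bad = bapply(pump(dogs), pp)  # an NP cannot combine with N\N
print(good, bad)
```
</Paragraph>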
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Advantages over both HPSG and CCG
</SectionTitle>
    <Paragraph position="3"> One advantage over both HPSG and CCG comes in the treatment of modifiers. In most HPSG literature, modifiers form a heterogeneous class: due to the unconstrained possibilities of category formation, the HEAD category and the synsem in MOD are not inherently related and thus do not necessarily allow for any further generalizations. In CCG, however, modifiers all have the general form X|X$, where X is typically a basic category (Adjs are of category N/N$, Ps are N\N$, Advs are S\S$, ignoring VP-Advs). Yet this generalization is not codifiable in CCG terms, and each modifier must redundantly encode the same form. In TCCG, however, I posit a type xp-mod-lxm, given in (17), that characterizes these generalizations over modifiers of basic categories. Here the category and morphosyntactic features of the first argument are shared with the result, with the rest of the arguments left underspecified, capturing the general nature of modifiers in TCCG.10 The advantage of the type hierarchy here is that most of the relevant information about each kind of modifier is now stated only once. [Footnote 10: This is a simplification of the approach actually implemented in TCCG, which enriches the slash values of all categories with modalities indicating the "semantic" headedness of the category, following Baldridge (2002) and Kruijff (2001), providing further generalizations over modifiers, but the details are irrelevant for this discussion.]</Paragraph>
    <Paragraph position="4"> Subtypes of this type need only add relevant additional information; for instance, the supertype of all adjectives, adj-lxm, inherits from both xp-mod-lxm (meaning it is a modifier) and nom-lxm (meaning it is rooted in N), adding only the constraint that the slash in X|X$ be forward, as in (18).</Paragraph>
    <Paragraph position="6"> Transitive and intransitive subtypes of adj-lxm further specialize the $, and similar structuring of information occurs for all other modifier types.</Paragraph>
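    <Paragraph>The multiple inheritance of adj-lxm from xp-mod-lxm and nom-lxm can be approximated with Python's own multiple inheritance. The sketch is hypothetical and much simpler than (17) and (18): categories are flattened to a root plus atomic arguments (first-consumed first), so X|X$ corresponds to a root X whose last-consumed argument is also X, and feature sharing is reduced to equality of category names.

```python
from typing import List, Tuple

Args = List[Tuple[str, str]]  # (slash, atomic argument), first-consumed first


class XpModLxm:
    """xp-mod-lxm: X|X$, the modified category X is shared with the result."""
    @staticmethod
    def check(root: str, args: Args) -> bool:
        return bool(args) and args[-1][1] == root


class NomLxm:
    """nom-lxm: rooted in N."""
    @staticmethod
    def check(root: str, args: Args) -> bool:
        return root == "N"


class AdjLxm(XpModLxm, NomLxm):
    """adj-lxm: adds only that the slash of X|X$ is forward."""
    @staticmethod
    def check(root: str, args: Args) -> bool:
        return bool(args) and args[-1][0] == "/"


def satisfies(lex_type: type, root: str, args: Args) -> bool:
    """Check the constraints of a type and of all its supertypes."""
    return all(cls.check(root, args)
               for cls in lex_type.__mro__ if "check" in vars(cls))


happy = ("N", [("/", "N")])  # N/N
on = ("N", [("/", "NP"), ("\\", "N")])  # (N\N)/NP
print(satisfies(AdjLxm, *happy), satisfies(AdjLxm, *on))
print(satisfies(XpModLxm, *on))
```

The preposition on satisfies xp-mod-lxm and nom-lxm but fails the forward-slash constraint that adj-lxm adds.</Paragraph>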
    <Paragraph position="7"> Thus the commonalities and differences of a wide variety of modifiers are captured in terms of type hierarchies, potentially with typological advantages.</Paragraph>
    <Paragraph position="8"> In Romance languages such as Spanish, where adnominal modifiers are overwhelmingly post-head, the directionality constraint for adjectives in (18) could instead be stated as a default on a higher supertype of all adnominals (where the few exceptions lexically override the default). Again, these types of constraints are not possible in most HPSG or CCG implementations. CCG without type hierarchies lacks the language in which such generalizations can be stated. Instead, modifiers form a class only meta-theoretically, with shared information stated redundantly. On the other hand, most HPSG approaches do not offer a sufficiently constrained set of category types to state generalizations over modifiers. Generalizations over modifier classes must be stated heterogeneously as a combination of lexical marking and pre- and post-head adjunct constructions (or alternatively in terms of independent linear precedence rules (Kathol, 2000)). Thus combining these approaches yields possibilities not easily realized by either approach alone.</Paragraph>
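    <Paragraph>The Spanish case can be sketched with Python class attributes, where a value set on a supertype holds unless a subtype overrides it. This is only an analogy for default inheritance, not the LKB's mechanism for defaults; the class names are hypothetical, and the lexical examples assume that rojo 'red' is postnominal while gran 'great' is prenominal.

```python
from typing import List, Tuple

Args = List[Tuple[str, str]]  # (slash, atomic argument), first-consumed first


class AdnominalLxm:
    """Supertype of all adnominals; default: post-head, so N\\N$."""
    mod_slash = "\\"

    @classmethod
    def category(cls, extra: Args) -> Tuple[str, Args]:
        # Extra complements are consumed first; the modified N last.
        return "N", extra + [(cls.mod_slash, "N")]


class PostnominalAdjLxm(AdnominalLxm):
    """Inherits the default unchanged (e.g. rojo, 'red')."""


class PrenominalAdjLxm(AdnominalLxm):
    """Lexical exception overriding the default (e.g. gran, 'great')."""
    mod_slash = "/"


print(PostnominalAdjLxm.category([]), PrenominalAdjLxm.category([]))
```

The direction is stated once on the supertype; only the exceptional class mentions it again.</Paragraph>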
  </Section>
</Paper>