<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0111">
  <Title>Lexicalized Grammar 101</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 CS 533
</SectionTitle>
    <Paragraph position="0"> NLP at Rutgers is taught as part of the graduate artificial intelligence (AI) sequence in the computer science department. As a prerequisite, computer science students are expected to be familiar with prob-July 2002, pp. 77-84. Association for Computational Linguistics. Natural Language Processing and Computational Linguistics, Philadelphia, Proceedings of the Workshop on Effective Tools and Methodologies for Teaching abilistic and decision-theoretic modeling (including statistical classification, hidden Markov models and Markov decision processes) from the graduate-level AI foundations class. They might take NLP as a preliminary to research in dialogue systems or in learning for language and information--or simply to fulfill the breadth requirement of MS and PhD degrees.</Paragraph>
    <Paragraph position="1"> Students from a number of other departments frequently get involved in natural language research, however, and are also welcome in 533; on average, only about half the students in 533 come from computer science. Students from the linguistics department frequently undertake computational work as a way of exploring practical learnability as a constraint on universal grammar, or practical reasoning as a constraint on formal semantics and pragmatics.</Paragraph>
    <Paragraph position="2"> The course also attracts students from Rutgers's library and information science department, its primary locus for research in information retrieval and human-computer interaction. Ambitious undergraduates can also take 533 their senior year; most participate in the interdisciplinary cognitive science undergraduate major. 533 is the only computational course in natural language at Rutgers.</Paragraph>
    <Paragraph position="3"> Overall, the course is structured into three modules, each of which represents about fifteen hours of in-class lecture time.</Paragraph>
    <Paragraph position="4"> The first module gives a general overview of language use and dialogue applications. Lectures follow (Clark, 1996), but instill the practical methodology for specifying and constructing knowledge-based systems, in the style of (Brachman et al., 1990), into the treatment of communication. Concurrently, students explore precise descriptions of their intuitions about language and communication through a series of short homework exercises.</Paragraph>
    <Paragraph position="5"> The second module focuses on general techniques for linguistic representation and implementation, using TAGLET. With an extended TAGLET project, conveniently implemented in stages, we use basic tree operations to introduce Prolog programming, including data structures, recursion and abstraction much as outlined in (Sterling and Shapiro, 1994); then we write a simple chart parser with incremental interpretation, and a simple communicative-intent generator scaled down after (Stone et al., 2001).</Paragraph>
    <Paragraph position="6"> The third module explores the distinctive problems of specific applications in NLP, including spoken dialogue systems, information retrieval and text classification, spelling correction and shallow tagging applications, and machine translation. Jurafsky and Martin (2000) is our source-book. Concurrently, students pursue a final project, singly or in crossdisciplinary teams, involving a more substantial and potentially innovative implementation.</Paragraph>
    <Paragraph position="7"> In its overall structure, the course seems quite successful. The initial emphasis on clarifying intuitions about communication puts students on an even footing, as it highlights important ideas about language use without too much dependence on specialized training in language or computation. By the end of the class, students are able to build on the more specifically computational material to come up with substantial and interesting final projects. In Spring 2002 (the first time this version of 533 was taught), some students looked at utterance interpretation, response generation and graphics generation in dialogue interaction; explored statistical methods for word-sense disambiguation, summarization and generation; and quantified the potential impact of NLP techniques on information tasks. Many of these results represented fruitful collaborations between students from different departments.</Paragraph>
    <Paragraph position="8"> Naturally, there is always room for improvement, and the course is evolving. My presentation of TAGLET here, for example, represents as much a project for the next run of 533 as a report of this year's materials; in many respects, TAGLET actually emerged during the semester as a dynamic reaction to the requirements and opportunities of a six-week module on general techniques for linguistic representation and implementation.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Language and Computation in NLP
</SectionTitle>
    <Paragraph position="0"> In a survey course for a broad, research-oriented audience, like CS 533 at Rutgers, a module on linguistic representation must orient itself to central ideas about computation. 533 may be the first and last place linguistics or information science students encounter concepts of specification, abstraction, complexity and search in class-work. The students who attack interdisciplinary research with success will be the ones who internalize and draw on these concepts, not those who merely hack proficiently. At the same time, computer scientists also can benefit from an emphasis on computational fundamentals; it means that they are building on and reinforcing their expertise in computation in exploring its application to language. Nevertheless, NLP is not compiler construction. Programming assignments should always underline a worthwhile linguistic lesson, not indulge in implementation for its own sake.</Paragraph>
    <Paragraph position="1"> This perspective suggests a number of desiderata for the grammar formalism for a survey course in NLP.</Paragraph>
    <Paragraph position="2"> Tree rewriting. Students need to master recursive data-structures and programming. NLP directs our attention to the recursive structures of linguistic syntax. In fact, by adopting a grammar formalism whose primitives operate on these structures as first-class objects, we can introduce a rich set of relatively straightforward operations to implement, and motivate them by their role in subsequent programs.</Paragraph>
    <Paragraph position="3"> Lexicalization. Students need to distinguish between specification and implementation, and to understand the barriers of abstraction that underlie the distinction. Lexicalized grammars come with a ready notion of abstraction. From the outside, abstractly, a lexicalized grammar analyzes each sentence as a simple combination of atomic elements from a lexicon of options. Simultaneously, a concrete implementation can assign complex structures to the atomic elements (elementary trees) and implement complex combinatory operations.</Paragraph>
    <Paragraph position="4"> Strong competence implementation. Students need to understand how natural language must and does respond to the practical logic of physical realization, like all AI (Agre, 1997). Mechanisms that use grammars face inherent computational problems and natural grammars in particular must respond to these problems: students should undertake implementations which directly realize the operations of the grammar in parsing and generation. But these must be effective programs that students can build on--our time and interest is too scarce for extensive reimplementations.</Paragraph>
    <Paragraph position="5"> Simplicity. Where possible, linguistic proposals should translate readily to the formalism. At the same time, students should be able to adapt aspects of the formalism to explore their own judgments and ideas. Where possible, students should get intuitive and satisfying results from straightforward algorithms implemented with minimal bookkeeping and case analysis. At the same time, there is no reason why the formalism should not offer opportunities for meaningful optimization.</Paragraph>
    <Paragraph position="6"> We cannot expect any formalism to fare perfectly by all these criteria--if any does, it is a deep fact about natural language! Still, it is worth remarking just how badly these criteria judge traditional unification-based context-free grammars (CFGs), as presented in say (Pereira and Shieber, 1987). Data-structures are an afterthought in CFGs; CFGs cannot in principle be lexicalized; and, whatever their merits in parsing or recognition, CFGs set up a positively abysmal search space for meaningful generation tasks.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="1" type="metho">
    <SectionTitle>
4 TAGLET
</SectionTitle>
    <Paragraph position="0"> TAGLET is my response to the objectives motivated in Section 2 and outlined in Section 3. TAGLET represents my way of distilling the essential linguistic and computational insights of lexicalized tree-adjoining grammar--LTAG (Joshi et al., 1975; Schabes, 1990)--into a form that students can easily realize in end-to-end implementations.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.1 Overview
</SectionTitle>
      <Paragraph position="0"> Like LTAG, TAGLET analyzes sentences as a complex of atomic elements combined by two kinds of operations, complementation and modification.Abstractly, complementation combines a head with an argument which is syntactically obligatory and semantically dependent on the head. Abstractly, modification combines a head with an adjunct which is syntactically optional and need not involve any special semantic dependence. Crucially for generation, in a derivation, modification and complementation operations can apply to a head in any order, often yielding identical structures in surface syntax. This means the generator can provide required material first, then elaborate it, enabling use of grammar in high-level tasks such as the planning of referring expressions or the &amp;quot;aggregation&amp;quot; of related semantic material into a single complex sentence.</Paragraph>
      <Paragraph position="1"> Concretely, TAGLET operations are implemented by operations that rewrite trees. Each lexical element is associated with a fragmentary phrase- null structure tree containing a distinguished word called the anchor. For complementation, TAGLET adopts TAG's substitution operation; substitution replaces a leaf node in the head tree with the phrase structure tree associated with the complement. See Figure 1. For modification, TAGLET adopts the the sister-adjunction operation defined in (Rambow et al., 1995); sister-adjunction just adds the modifier subtree as a child of an existing node in the head tree--either on the left of the head (forward sisteradjunction) as in Figure 2, or on the right of the head (backward sister-adjunction). I describe TAGLET formally in Appendix A.</Paragraph>
      <Paragraph position="2"> TAGLET is equivalent in weak generative power to context-free grammar. That is, any language defined by a TAGLET also has a CFG, and any language defined by a CFG also has a TAGLET. On the other hand context-free languages can have derivations in which all lexical items are arbitrarily far from the root; TAGLET derived structures always have an anchor whose path to the root of the sentence has a fixed length given by a grammatical element. See Appendix B. The restriction seems of little linguistic significance, since any tree-bank parse induces a unique TAGLET grammar once you label which child of each node is the head, which are complements and which are modifiers. Indeed, since TAGLET thus induces bigram dependency structures from trees, this invites the estimation of probability distributions on TAGLET derivations based on  observed bigram dependencies; see (Chiang, 2000).</Paragraph>
      <Paragraph position="3"> To implement an effective TAGLET generator, you can perform a greedy head-first search of derivations guided by heuristic progress toward achieving communicative goals (Stone et al., 2001). Meanwhile, because TAGLET is context-free, you can easily write a CKY-style dynamic programming parser that stores structures recognized for spans of text in a chart, and iteratively combines structures in adjacent spans until the analyses span the entire sentence. (More complexity would be required for multiply-anchored trees, as they induce discontinuous constituents.) The simple requirement that operations never apply inside complements or modifiers, and apply left-to-right within a head, suffices to avoid spurious ambiguity. See Appendix C.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.2 Examples
</SectionTitle>
      <Paragraph position="0"> With TAGLET, two kinds of examples are instructive: those where TAGLET can mirror TAG, and those where it cannot. For the first case, consider an analysis of Chris loves Sandy madly by the trees of Figure 3. The final structure is:  For the second case, consider the embedded question who Chris thinks Sandy likes. The usual TAG analysis uses the full power of adjunction. TAGLET requires the use of one of the familiar context-free filler-gap analyses, as perhaps that suggested by the trees in Figure 4, and their composition:  The use of syntactic features amounts to an intermediate case. In TAGLET derivations (unlike in TAG) nodes accrete children during the course of a derivation but are never rewritten or split. Thus, we can decorate any TAGLET node with a single set of syntactic features that is preserved throughout the derivation. Consider the trees for he knows below:  When these trees combine, we can immediately unify the number Y of the verb with the pronoun's singular; we can immediately unify the case X of the pronoun with the nominative assigned by the verb:  The feature values will be preserved by further steps of derivation.</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.3 Building on TAGLET
</SectionTitle>
      <Paragraph position="0"> Semantics and pragmatics are crucial to NLP.</Paragraph>
      <Paragraph position="1"> TAGLET lets students explore meaty issues in semantics and pragmatics, using the unification-based semantics proposed in (Stone and Doran, 1997). We view constituents as referential, or better, indexical; we link elementary trees with constraints on these indices and conjoin the constraints in the meaning of a compound structure. This example shows how the strategy depends on a rich ontology:  The example also shows how the strategy lets us quickly implement, say, the constraint-satisfaction approaches to reference resolution or the plan-recognition approaches to discourse integration described in (Stone and Webber, 1998).</Paragraph>
    </Section>
    <Section position="4" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.4 Lectures and Assignments
</SectionTitle>
      <Paragraph position="0"> Here is a plan for a six-week TAGLET module. The first two weeks introduce data structures and recursive programming in Prolog, with examples drawn from phrase structure trees and syntactic combination; and discuss dynamic-programming parsers, with an aside on convenient implementation using Prolog assertion. As homework, students implement simple tree operations, and build up to definitions of substitution and modification for parsing and generation; they use these combinatory operations to write a CKY TAGLET parser.</Paragraph>
      <Paragraph position="1"> The next two weeks begin with lectures on the lexicon, emphasizing abstraction on the computational side and the idiosyncrasy of lexical syntax and the indexicality of lexical semantics on the linguistic side; and continue with lectures on semantics and interpretation. Meanwhile, students add reference resolution to the parser, and implement routines to construct grammars from tree-bank parses.</Paragraph>
      <Paragraph position="2"> The final two weeks cover generation as problemsolving, and search through the grammar. Students reuse the grammar and interpretation model they already have to construct a generator.</Paragraph>
      <Paragraph position="3"> 5Conclusion Important as they are, lexicalized grammars can be forbidding. Versions of TAG and combinatory categorial grammars (CCG) (Steedman, 2000), as presented in the literature, require complex book-keeping for effective computation. When I wrote a CCG parser as an undergraduate, it took me a whole semester to get an implemented handle on the metatheory that governs the interaction of (crossing) composition or type-raising with spurious ambiguity; I still have never written a TAG parser or a CCG generator. Variants of TAG like TIG (Schabes and Waters, 1995) or D-Tree grammars (Rambow et al., 1995) are motivated by linguistic or formal considerations rather than pedagogical or computational ones. Other formalisms come with linguistic assumptions that are hard to manage. Link grammar (Sleator and Temperley, 1993) and other pure dependency formalisms can make it difficult to explore rich hierarchical syntax and the flexibility of modification; HPSG (Pollard and Sag, 1994) comes with a commitment to its complex, rather bewildering regime for formalizing linguistic information as feature structures. Of course, you probably could refine any of these theories to a simple core--and would get something very like TAGLET.</Paragraph>
      <Paragraph position="4"> I strongly believe that this distillation is worth the trouble, because lexicalization ties grammar formalisms so closely to the motivations for studying language in the first place. For linguistics, this philosophy invites a fine-grained description of sentence syntax, in which researchers document the diversity of linguistic constructions within and across languages, and at the same time uncover important generalizations among them. For computation, this philosophy suggests a particularly concrete approach to language processing, in which the information a system maintains and the decisions it takes ultimately always just concern words. In taking TAGLET as a starting point for teaching implementation in NLP, I aim to expose a broad range of students to a lexicalized approach to the cognitive science of human language that respects and integrates both linguistic and computational advantages.</Paragraph>
  </Section>
</Paper>