<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1042">
  <Title>Building Deep Dependency Structures with a Wide-Coverage CCG Parser</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Most recent wide-coverage statistical parsers have used models based on lexical dependencies (e.g. Collins (1999), Charniak (2000)). However, the dependencies are typically derived from a context-free phrase structure tree using simple head percolation heuristics. This approach does not work well for the long-range dependencies involved in raising, control, extraction and coordination, all of which are common in text such as the Wall Street Journal.</Paragraph>
    <Paragraph position="2"> Chiang (2000) uses Tree Adjoining Grammar as an alternative to context-free grammar, and here we use another &quot;mildly context-sensitive&quot; formalism, Combinatory Categorial Grammar (CCG, Steedman (2000)), which arguably provides the most linguistically satisfactory account of the dependencies inherent in coordinate constructions and extraction phenomena. The potential advantage of using such an expressive grammar is that it facilitates recovery of such unbounded dependencies. As well as having a potential impact on the accuracy of the parser, recovering such dependencies may make the output more useful.</Paragraph>
    <Paragraph position="3"> CCG is unlike other formalisms in that the standard predicate-argument relations relevant to interpretation can be derived via extremely non-standard surface derivations. This affects how best to define a probability model for CCG, since the &quot;spurious ambiguity&quot; of CCG derivations may lead to an exponential number of derivations for a given constituent. In addition, some of the spurious derivations may not be present in the training data. One solution is to consider only the normal-form (Eisner, 1996a) derivation, which is the route taken in Hockenmaier and Steedman (2002b). (Another, more speculative, possibility is to treat the alternative derivations as hidden and apply the EM algorithm.) A further problem with the non-standard surface derivations is that the standard PARSEVAL performance measures over such derivations are uninformative (Clark and Hockenmaier, 2002). Such measures have been criticised by Lin (1995) and Carroll et al. (1998), who propose recovery of head dependencies characterising predicate-argument relations as a more meaningful measure.</Paragraph>
    <Paragraph position="4"> If the end result of parsing is interpretable predicate-argument structure or the related dependency structure, then the question arises: why build derivation structure at all? A CCG parser can directly build derived structures, including long-range dependencies. These derived structures can be of any form we like; for example, they could in principle be standard Penn Treebank structures.</Paragraph>
    <!-- Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 327-334. -->
    <Paragraph position="5"> Since we are interested in dependency-based parser evaluation, our parser currently builds dependency structures. Furthermore, since we want to model the dependencies in such structures, the probability model is defined over these structures rather than the derivation.</Paragraph>
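    <Paragraph> As a rough illustration of what dependency-based evaluation involves, the sketch below scores a set of predicted dependencies against a gold-standard set using precision, recall and F-score. It is not the evaluation procedure of this paper: the Dependency type, its field names, the relation labels and the toy sentence are all assumptions made for the example, and duplicate dependencies are collapsed because sets are used.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    """One head-dependent relation (hypothetical representation)."""
    head: str
    relation: str
    dependent: str


def evaluate(gold, predicted):
    """Return (precision, recall, f_score) over two collections of dependencies.

    Both inputs are converted to sets, so repeated dependencies count once.
    """
    gold_set, predicted_set = set(gold), set(predicted)
    correct = len(gold_set.intersection(predicted_set))
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data for "John likes Mary"; the labels are illustrative only.
gold = [
    Dependency("likes", "subj", "John"),
    Dependency("likes", "obj", "Mary"),
]
predicted = [
    Dependency("likes", "subj", "John"),
    Dependency("likes", "obj", "John"),  # wrong dependent for the object
]
precision, recall, f_score = evaluate(gold, predicted)
```

    Because each dependency is scored independently of the derivation that produced it, a measure of this kind is unaffected by which of several equivalent surface derivations the parser happens to build.</Paragraph>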
    <Paragraph position="6"> The training and testing material for this CCG parser is a treebank of dependency structures, which have been derived from a set of CCG derivations developed for use with another (normal-form) CCG parser (Hockenmaier and Steedman, 2002b).</Paragraph>
    <Paragraph position="7"> The treebank of derivations, which we call CCGbank (Hockenmaier and Steedman, 2002a), was in turn derived (semi-)automatically from the hand-annotated Penn Treebank.</Paragraph>
  </Section>
</Paper>