<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1043"> <Title>Generative Models for Statistical Parsing with Combinatory Categorial Grammar</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The currently best single-model statistical parser (Charniak, 1999) achieves Parseval scores of over 89% on the Penn Treebank. However, the grammar underlying the Penn Treebank is very permissive, and a parser can do well on the standard Parseval measures without committing itself on certain semantically significant decisions, such as predicting null elements arising from deletion or movement.</Paragraph> <Paragraph position="1"> The potential benefit of wide-coverage parsing with CCG lies in its more constrained grammar and its simple and semantically transparent capture of extraction and coordination.</Paragraph> <Paragraph position="2"> We present a number of models over syntactic derivations of Combinatory Categorial Grammar (CCG, see Steedman (2000) and Clark et al. (2002), this conference, for introduction), estimated from and tested on a translation of the Penn Treebank to a corpus of CCG normal-form derivations. CCG grammars are characterized by much larger category sets than standard Penn Treebank grammars, distinguishing for example between many classes of verbs with different subcategorization frames. As a result, the categorial lexicon extracted for this purpose from the training corpus has 1207 categories, compared with the 48 POS-tags of the Penn Treebank.</Paragraph> <Paragraph position="3"> On the other hand, grammar rules in CCG are limited to a small number of simple unary and binary combinatory schemata such as function application and composition. 
This results in a smaller and less overgenerating grammar than standard PCFGs (ca. 3,000 rules when instantiated with the above categories in sections 02-21, instead of more than 12,400 in the original Treebank representation (Collins, 1999)).</Paragraph> </Section> </Paper>