<?xml version="1.0" standalone="yes"?> <Paper uid="J04-4004"> <Title>© 2004 Association for Computational Linguistics Intricacies of Collins' Parsing Model</Title> <Section position="13" start_page="506" end_page="508" type="concl"> <SectionTitle> 9. Conclusion </SectionTitle> <Paragraph position="0"> We have documented what we believe is the complete set of heretofore unpublished details Collins used in his parser, such that, along with Collins' (1999) thesis, this article contains all information necessary to duplicate Collins' benchmark results.</Paragraph> <Paragraph position="1"> Indeed, these as-yet-unpublished details account for an 11% relative increase in error from an implementation including all details to a clean-room implementation of Collins' model. We have also shown a cleaner and equally well-performing method for the handling of punctuation and conjunction, and we have revealed certain other probabilistic oddities about Collins' parser. We have not only analyzed the effect of the unpublished details but also reanalyzed the effect of certain well-known details, revealing that bilexical dependencies are barely used by the model and that head choice is not nearly as important to overall parsing performance as once thought. Finally, we have performed experiments that show that the true discriminative power of lexicalization appears to lie in the fact that unlexicalized syntactic structures are generated conditioning on the headword and head tag. [Computational Linguistics Volume 30, Number 4] These results regarding the lack of reliance on bilexical statistics suggest that generative models still have room for improvement through the employment of bilexical-class statistics, that is, dependencies among head-modifier word classes, where such classes may be defined by, say, WordNet synsets. 
Such dependencies might finally be able to capture the semantic preferences that were thought to be captured by standard bilexical statistics, as well as to alleviate the sparse-data problems associated with standard bilexical statistics. This is the subject of our current research.</Paragraph> <Paragraph position="2"> Appendix: Complete List of Parameter Classes. This section contains tables for all parameter classes in Collins' Model 3, with appropriate modifications and additions from the tables presented in Collins' thesis. The notation is that used throughout this article. In particular, for notational brevity we</Paragraph> <Paragraph position="4"> to refer to the three items M</Paragraph> <Paragraph position="6"> that constitute some fully lexicalized modifying nonterminal and similarly M(t)_i to refer to the two items M</Paragraph> <Paragraph position="8"> that constitute some partially lexicalized modifying nonterminal. The (unlexicalized) nonterminal-mapping functions alpha and gamma are defined in Section 6.1. As a shorthand, g(M(t)</Paragraph> <Paragraph position="10"> The head-generation parameter class, P parameters includes words that are the heads of the observed roots of sentences (that is, the headword of the entire sentence). Also, note that there is no coord flag, as coordinating conjunctions are generated in the same way as regular modifying nonterminals when they are dominated by NPB. Finally, we</Paragraph> </Section> </Paper>