<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1019"> <Title>Partial Training for a Lexicalized-Grammar Parser</Title> <Section position="7" start_page="149" end_page="150" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> Our main result is that it is possible to train a CCG dependency model from lexical category sequences alone and still obtain parsing results which are only 1.3% worse in terms of labelled F-score than a model trained on complete data. This is a noteworthy result and demonstrates the significant amount of information encoded in CCG lexical categories.</Paragraph> <Paragraph position="1"> The engineering implication is that, because the dependency model can be trained without annotating recursive structures and needs only sequence information at the word level, it can be ported rapidly to a new domain (or language) by annotating new sequence data in that domain.</Paragraph> <Paragraph position="2"> One possible response to this argument is that, since the lexical category sequence contains so much syntactic information, the task of annotating category sequences must be almost as labour-intensive as annotating full derivations. To test this hypothesis fully would require suitable annotation tools and subjects skilled in CCG annotation, which we do not currently have access to.</Paragraph> <Paragraph position="3"> However, there is some evidence that annotating category sequences can be done very efficiently. Clark et al. (2004) describe a porting experiment in which a CCG parser is adapted for the question domain. The supertagger component of the parser is trained on questions annotated at the lexical category level only. The training data consists of over 1,000 annotated questions which took less than a week to create.
This suggests, as a very rough approximation, that 4 annotators could annotate 40,000 sentences (the size of the Penn Treebank) with lexical categories in a few months.</Paragraph> <Paragraph position="4"> Another advantage of annotating with lexical categories is that a CCG supertagger can be used to perform most of the annotation, with the human annotator only required to correct the mistakes made by the supertagger. An accurate supertagger can be bootstrapped quickly, leaving only a small number of corrections for the annotator. A similar procedure is</Paragraph> </Section> </Paper>