<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1111"> <Title>Prototype-Driven Grammar Induction</Title> <Section position="4" start_page="881" end_page="881" type="intro"> <SectionTitle> 2 Experimental Setup </SectionTitle> <Paragraph position="0"> Most of our experiments induced tree structures from the WSJ section of the English Penn treebank (Marcus et al., 1994), though see section 7.4 for an experiment on Chinese. To facilitate comparison with previous work, we extracted WSJ-10, the 7,422 sentences that contain 10 or fewer words after the removal of punctuation and null elements according to the scheme detailed in Klein (2005). We learned models on all or part of this data and compared their predictions to the manually annotated treebank trees for the sentences on which the model was trained. As in previous work, we begin with the part-of-speech (POS) tag sequences for each sentence rather than lexical sequences (Carroll and Charniak, 1992; Klein and Manning, 2002).</Paragraph> <Paragraph position="1"> Following Klein and Manning (2004), we report unlabeled bracket precision, recall, and F1. Note that under their metric, brackets of size 1 are omitted from the evaluation. Unlike that work, all of our induction methods produce trees labeled with symbols that are identified with treebank categories. Therefore, we also report labeled precision, recall, and F1, still ignoring brackets of size 1.</Paragraph> </Section> </Paper>