<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0708">
<Title>MDL-based DCG Induction for NP Identification</Title>
<Section position="8" start_page="66" end_page="66" type="concl">
<SectionTitle> 7 Conclusion </SectionTitle>
<Paragraph position="0"> We presented an MDL-based incremental DCG learner.</Paragraph>
<Paragraph position="1"> Experimental evaluation showed that estimation is possible using just raw sentences, but that better results are possible when an additional parsed corpus is used.</Paragraph>
<Paragraph position="2"> Evaluation also showed that this parsed corpus need not be that detailed, and that NP bracketing information produced results similar to those obtained using full WSJ parses. This final result seems counterintuitive, and merits further investigation.</Paragraph>
<Paragraph position="3"> Future work on the learner will be in three main directions: * Abandonment of the SCFG as the basis of the language model. We are considering either Abney's random fields [1] or Goodman's Probabilistic Feature Grammars [14] as a replacement. Apart from performance improvements, altering the model class should allow empirical investigation of the MDL claim that model classes can be evaluated in terms of compression. So, if we discover even more compact models using (say) Goodman's scheme than we could using our SCFG, we might deduce that this is the case. Naturally, lexicalisation would enter into any scheme entertained.</Paragraph>
<Paragraph position="4"> * Use of semantics in estimation. We have at our disposal a large grammar augmented with a compositional semantics [15]. Again, this should lead to better results.</Paragraph>
<Paragraph position="5"> * Prior weighting. As is well known, MDL-based learners sometimes improve when the prior is weighted with respect to the likelihood. Schemes such as Quinlan and Rivest's [24] fall outside of the coding framework and (effectively) replicate the training set. We intend to pursue encoding-based schemes that achieve the same purpose.</Paragraph>
</Section>
</Paper>