<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0111"> <Title>Two Questions about Data-Oriented Parsing*</Title> <Section position="5" start_page="137" end_page="137" type="concl"> <SectionTitle> Conclusion </SectionTitle> <Paragraph position="0"> In this paper we have addressed two previously neglected questions about the DOP model: (1) how does DOP perform when tested on unedited Penn Treebank data, and (2) how can DOP be used to directly parse word strings that contain unknown words? We have shown that although parse results are considerably lower on unedited data than on cleaned-up data, they are very competitive with, if not better than, those of other models. With respect to the parsing of word strings, we have shown that the difficulty of the problem lies not so much in unknown words as in previously unseen lexical categories of known words. We have given a novel method for parsing these words by estimating the probabilities of unknown subtrees. The method was tested on ATIS trees, obtaining results that, to the best of our knowledge, are not exceeded by other stochastic parsers. Moreover, the results of a less-than-optimal version of DOP on the Wall Street Journal corpus suggest that the approach can be successfully extended to larger domains. In future research, we will apply the full DOP model to WSJ word strings in order to compare our results with the best known parsers on this domain (Magerman, 1995; Collins, 1996).</Paragraph> </Section> </Paper>