<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0111"> <Title>Two Questions about Data-Oriented Parsing*</Title> <Section position="2" start_page="0" end_page="125" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The Data-Oriented Parsing (DOP) method suggested in Scha (1990) and developed in Bod (1992-1995) is a probabilistic parsing strategy which does not single out a narrowly predefined set of structures as the statistically significant ones. It accomplishes this by maintaining a large corpus of analyses of previously occurring utterances. New input is parsed by combining tree-fragments from the corpus; the frequencies of these fragments are used to estimate which analysis is the most probable one.</Paragraph> <Paragraph position="1"> In previous work, we tested the DOP method on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank (Marcus et al., 1993), achieving excellent test results (Bod, 1993a, b). This left, however, two important questions unanswered: (1) how does DOP perform if tested on unedited data, and (2) how can DOP be used for parsing word strings that contain unknown words? This paper addresses these questions. The rest of it is divided into three parts. In section 2 we give a short summary of the DOP method. In section 3 we address the first question: how does DOP perform on unedited data? In section 4 we deal with the question of how DOP can be used for parsing word strings that contain unknown words. This second question turns out to be the actual focus of the article, while the answer to the first question serves as a baseline.</Paragraph> <Paragraph position="2"> * This work was partially supported by the Netherlands Organization for Scientific Research (NWO).</Paragraph> </Section> </Paper>