File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/93/e93-1006_abstr.xml
Size: 1,432 bytes
Last Modified: 2025-10-06 13:47:40
<?xml version="1.0" standalone="yes"?> <Paper uid="E93-1006"> <Title>Using an Annotated Corpus as a Stochastic Grammar</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistical model in which the probability of a parse is equal to the sum of the probabilities of all its derivations. In (Scha, 1990) an informal introduction to DOP is given, while (Bod, 1992a) provides a formalization of the theory.</Paragraph> <Paragraph position="1"> In this paper we compare DOP with other stochastic grammars in the context of Formal Language Theory. It is proved that it is not possible to create for every DOP-model a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses.</Paragraph> <Paragraph position="2"> We show that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques. The model was tested on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy.</Paragraph> </Section> </Paper>