<?xml version="1.0" standalone="yes"?>
<Paper uid="E93-1006">
  <Title>Using an Annotated Corpus as a Stochastic Grammar</Title>
  <Section position="7" start_page="675" end_page="675" type="concl">
    <SectionTitle>
6 Conclusions and Future Research
</SectionTitle>
    <Paragraph position="0"> We have presented a language model that uses an annotated corpus as a stochastic grammar. We restricted ourselves to substitution as the only combination operation between corpus subtrees. A statistical parsing theory was developed, where one parse can be generated by different derivations, and where the probability of a parse is computed as the sum of the probabilities of all its derivations. It was shown that our model cannot always be described by a stochastic CFG. It turned out that the maximum probability parse can be estimated as accurately as desired in polynomial time by using Monte Carlo techniques. The method has been succesfully tested on a set of part-of-speech sequences derived from the ATIS corpus. It turned out that parsing accuracy improved if larger subtrees were used.</Paragraph>
    <Paragraph position="1"> We would like to extend our experiments to larger corpora, like the Wall Street Journal corpus. This might raise computational problems, since the number of subtrees becomes extremely large. Furthermore, in order to tackle the problem of data sparseness, the possibility of abstracting from corpus data should be included, but statistical models of abstractions of features and categories are not yet available.</Paragraph>
  </Section>
class="xml-element"></Paper>