<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1010"> <Title>Learning Stochastic Categorial Grammars</Title> <Section position="7" start_page="0" end_page="0" type="evalu"> <SectionTitle> 6 Experiments </SectionTitle> <Paragraph position="0"> Here, we report on a number of experiments showing that when there is a danger of overfitting taking place, MDL produces a quantitatively better SCG than does MLE.</Paragraph> <Paragraph position="1"> To evaluate the various lexica produced, we used the following metrics: * To measure a grammar's coverage, we note the number of tag sequences, drawn from a corpus of naturally occurring language, that the grammar generates. The higher the number, the better the grammar.</Paragraph> <Paragraph position="2"> * To measure a grammar's overgeneration, we note the number of ungrammatical strings, drawn from a source that randomly generates all strings up to some length, that the grammar generates. The lower the number, the better the grammar. That is, random sequences of tags, of a sufficient length, will have a low probability of being grammatically well-formed. * To measure the accuracy of the parses produced, we use the Grammar Evaluation Interest Group (GEIG) scheme (Harrison et al., 19). This compares unlabelled, manually produced parses with automatically produced parses in terms of recall (the ratio of matched brackets over all brackets in the manually produced parses), precision (the ratio of matched brackets in the manually produced parse over all brackets found by the parser) and crossing rates (the number of times a bracketed sequence produced by the parser overlaps with one in the manually produced parse, but neither is properly contained in the other). The higher the precision and recall, and the lower the crossing rates, the better the grammar.</Paragraph> <Paragraph position="3"> Throughout our experiments, we used the Brill part-of-speech tagger to create testing and training material (Brill, 1993). 
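As an illustration of the GEIG bracket metrics described above (a sketch, not the authors' implementation), each unlabelled parse can be represented as a set of (start, end) spans; the function name and span representation are assumptions introduced here:

```python
# Illustrative sketch of the GEIG (PARSEVAL-style) unlabelled bracket metrics.
# A parse is represented as a set of (start, end) spans; this representation
# and the helper name are assumptions, not the paper's implementation.

def bracket_metrics(gold, test):
    """Return (recall, precision, crossings) for two sets of (start, end) spans."""
    matched = gold & test
    recall = len(matched) / len(gold)      # matched brackets / all gold brackets
    precision = len(matched) / len(test)   # matched brackets / all test brackets
    # A test bracket "crosses" a gold bracket when the two spans overlap
    # but neither is properly contained in the other.
    crossings = sum(
        1
        for (ts, te) in test
        if any(ts < gs < te < ge or gs < ts < ge < te for (gs, ge) in gold)
    )
    return recall, precision, crossings
```

For example, with gold brackets {(0,5), (0,2), (2,5)} and test brackets {(0,5), (1,3)}, only (0,5) matches, and (1,3) crosses (0,2), giving recall 1/3, precision 1/2 and one crossing.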
Our trigram model was created using seven million words of tagged material drawn from the British National Corpus (BNC); training material consisted of 43,000 tagged sentences also taken from the BNC. For test material, we took 429 sentences taken from the Spoken English Corpus (SEC). To compute crossing rates, recall and precision figures, we used a program called Parseval to compare most probable parses with manually produced parses (232 trees in total taken from the SEC) (Harrison et al., 19). To measure overgeneration, we randomly generated 250 strings. From a manual inspection, these do appear to be ungrammatical. Here is an example randomly generated tag things being equal, we prefer the lexicon to be as small as possible. The larger the lexicon, the slower the parsing. As predicted by theory, the lexicon learnt using MLE is larger than the one learnt using MDL.</Paragraph> <Paragraph position="4"> Testing for coverage, we produced the results shown in figure 2. Again as predicted, lexicon A is Turning now to figure 3, we see that, with respect to the test set, neither lexicon overgenerates.</Paragraph> <Paragraph position="5"> MDL has lead to the estimation of a better lexicon than has MLE. Note that the actual figures are not as great as they might be. This follows from the fact that although categorial grammars assigned binary-branching trees to sentences, the test parses used to compute crossing rates were not restricted to being binary branching. Also, our learner used virtually no supervision (for example parsed corpora), and did not start with a given lexicon: learning using parsed corpora is substantiMly easier than learning from just a tagged text, whilst starting with a given, manually constructed lexicon is equivalent to learning with a good initial estimation of the target lexicon, which greatly increases the chance of successful learning. 
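The overgeneration test described above (sampling random tag sequences and counting how many the grammar accepts) can be sketched as follows; the tagset, the `accepts` predicate and all names here are hypothetical stand-ins, not the paper's code:

```python
import random

# Illustrative sketch of the overgeneration test: sample random tag sequences
# up to some maximum length and measure the fraction the grammar accepts.
# TAGSET and the accepts() predicate are hypothetical stand-ins.

TAGSET = ["NN", "VB", "DT", "JJ", "IN", "PRP"]

def random_tag_sequence(max_len, rng=random):
    """Sample a tag sequence of random length in [1, max_len]."""
    length = rng.randint(1, max_len)
    return [rng.choice(TAGSET) for _ in range(length)]

def overgeneration(accepts, n=250, max_len=10, seed=0):
    """Fraction of n random sequences the grammar accepts (lower is better)."""
    rng = random.Random(seed)
    sequences = (random_tag_sequence(max_len, rng) for _ in range(n))
    return sum(accepts(s) for s in sequences) / n
```

A grammar that rejects every random sequence scores 0.0 on this measure; since sufficiently long random tag sequences are unlikely to be well-formed, a low score indicates little overgeneration.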
However, the figures are sufficient for the purposes of our demonstration.</Paragraph> </Section> </Paper>