<?xml version="1.0" standalone="yes"?> <Paper uid="P93-1035"> <Title>Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach</Title> <Section position="6" start_page="261" end_page="263" type="evalu"> <SectionTitle> RESULTS </SectionTitle> <Paragraph position="0"> In the first experiment we ran, training and testing were done on the Texas Instruments Air Travel Information System (ATIS) corpus (HGD90; see footnote 8). In Table 1, we compare the results we obtained to results cited in (PS92) using the inside-outside algorithm on the same corpus. Accuracy is measured in terms of the percentage of noncrossing constituents in the test corpus, as described above.</Paragraph> <Paragraph position="1"> Our system was tested by using the training set to learn a set of transformations, and then applying these transformations to the test set and scoring the resulting output. In this experiment, 64 transformations were learned (compared with 4096 context-free rules and probabilities used in the inside-outside algorithm experiment). It is significant that we obtained comparable performance using a training corpus only 21% as large as that used to train the inside-outside algorithm.</Paragraph> <Paragraph position="2"> After applying all learned transformations to the test corpus, 60% of the sentences had no crossing constituents, 74% had fewer than two crossing constituents, and 85% had fewer than three. The mean sentence length of the test corpus was 11.3.</Paragraph> <Paragraph position="3"> In Figure 2, we have graphed percentage correct as a function of the number of transformations that have been applied to the test corpus. As the transformation number increases, overtraining sometimes occurs. 
In the current implementation of the learner, a transformation is added to the list if it results in any positive net change in the training set. (Footnote 8: In all experiments described in this paper, results are calculated on a test corpus which was not used in any way in either training the learning algorithm or in developing the system.)</Paragraph> <Paragraph position="4"> Toward the end of the learning procedure, transformations are found that affect only a very small percentage of training sentences. Since small counts are less reliable than large counts, we cannot reliably assume that these transformations will also improve performance in the test corpus.</Paragraph> <Paragraph position="5"> One way around this overtraining would be to set a threshold: specify a minimum level of improvement that must result for a transformation to be learned. Another possibility is to use additional training material to prune the set of learned transformations. [Figure caption, truncated in extraction: "... With Right-Linear Structure."]</Paragraph> <Paragraph position="6"> We next ran an experiment to determine what performance could be achieved if we dropped the initial right-linear assumption. Using the same training and test sets as above, sentences were initially assigned a random binary-branching structure, with final punctuation always attached high. Since there was less regular structure in this case than in the right-linear case, many more transformations were found: 147 in total. When these transformations were applied to the test set, a bracketing accuracy of 87.13% resulted.</Paragraph>
<Paragraph position="7"> The ATIS corpus is structurally fairly regular. To determine how well our algorithm performs on a more complex corpus, we ran experiments on the Wall Street Journal. Results from this experiment can be found in Table 2 (see footnote 9). Accuracy is again measured as the percentage of constituents in the test set which do not cross any Penn Treebank constituents (see footnote 10). As a point of comparison, in (SRO93) an experiment was done using the inside-outside algorithm on a corpus of WSJ sentences of length 1-15. Training was carried out on a corpus of 1,095 sentences, and an accuracy of 90.2% was obtained in bracketing a test set.</Paragraph> <Paragraph position="8"> In the corpus we used for the experiments of sentence length 2-15, the mean sentence length was 10.80. In the corpus used for the experiment of sentence length 2-25, the mean length was 16.82. As would be expected, performance degrades somewhat as sentence length increases.</Paragraph> <Paragraph position="9"> In Table 3, we show the percentage of sentences in the test corpus that have no crossing constituents, and the percentage that have only a very small number of crossing constituents (see footnote 11). In Table 4, we show the standard deviation measured from three different randomly chosen training sets of each sample size and randomly chosen test sets of 500 sentences each, as well as the accuracy as a function of training corpus size for sentences of length 2 to 20.</Paragraph> <Paragraph position="10"> Footnote 9: For sentences of length 2-15, the initial right-linear parser achieves 69% accuracy. For sentences of length 2-20, 63% accuracy is achieved, and for sentences of length 2-25, accuracy is 59%.</Paragraph> <Paragraph position="11"> Footnote 10: In all of our experiments carried out on the Wall Street Journal, the test set was a randomly selected set of 500 sentences.</Paragraph> <Paragraph position="12"> Footnote 11: For sentences of length 2-15, the initial right-linear parser parses 17% of sentences with no crossing errors, 35% with one or fewer errors, and 50% with two or fewer. For sentences of length 2-25, 7% of sentences are parsed with no crossing errors, 16% with one or fewer, and 24% with two or fewer.</Paragraph> <Paragraph position="13"> We also ran an experiment on WSJ sentences of length 2-15 starting with random binary-branching structures with final punctuation attached high. 
In this experiment, 325 transformations were found using a 250-sentence training corpus, and the accuracy resulting from applying these transformations to a test set was 84.72%.</Paragraph> <Paragraph position="14"> Finally, in Figure 3 we show the sentence length distribution in the Wall Street Journal corpus. While the numbers presented above allow us to compare the transformation learner with systems trained and tested on comparable corpora, these results are all based upon the assumption that the test data is tagged fairly reliably (manually tagged text was used in all of these experiments, as well as in the experiments of (PS92, SRO93)). When parsing free text, we cannot assume that the text will be tagged with the accuracy of a human annotator. Instead, an automatic tagger would have to be used to first tag the text before parsing. To address this issue, we ran one experiment where we randomly induced a 5% tagging error rate beyond the error rate of the human annotator. Errors were induced in such a way as to preserve the unigram part-of-speech tag probability distribution in the corpus. The experiment was run for sentences of length 2-15, with a training set of 1000 sentences and a test set of 500 sentences. The resulting bracketing accuracy was 90.1%, compared to 91.6% accuracy when using an unadulterated training corpus. Accuracy degraded only slightly when training on the corpus with adulterated part-of-speech tags, suggesting that high parsing accuracy could be achieved if the input were tagged automatically by a part-of-speech tagger.</Paragraph> </Section> </Paper>
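<!--
The final paragraph says that tagging errors were induced at a 5% rate while preserving the unigram part-of-speech tag distribution, but it does not give the procedure. The following is a minimal sketch of one way such noise might be induced, not the authors' method: each tag is selected with probability error_rate and replaced by a tag drawn from the corpus-wide unigram tag distribution, which leaves the expected tag frequencies unchanged. The function name inject_tag_noise, its parameters, and the representation of a sentence as a list of (word, tag) pairs are all assumptions made for illustration.

```python
import random
from collections import Counter


def inject_tag_noise(tagged_sentences, error_rate=0.05, seed=0):
    """Return a noisy copy of a corpus of (word, tag) sentences.

    Hypothetical procedure (an assumption, not taken from the paper):
    each tag is selected with probability `error_rate` and replaced by a
    tag sampled from the corpus-wide unigram tag distribution.
    """
    rng = random.Random(seed)

    # Unigram tag counts over the whole corpus serve as sampling weights.
    counts = Counter(tag for sent in tagged_sentences for _, tag in sent)
    tags = sorted(counts)
    weights = [counts[t] for t in tags]

    noisy = []
    for sent in tagged_sentences:
        new_sent = []
        for word, tag in sent:
            if rng.random() < error_rate:
                # The sampled tag may equal the original one.
                tag = rng.choices(tags, weights=weights, k=1)[0]
            new_sent.append((word, tag))
        noisy.append(new_sent)
    return noisy
```

Because a sampled tag can coincide with the original, the realized error rate in this sketch falls somewhat below error_rate; resampling until the tag differs would hit the target rate more closely but would no longer preserve the unigram distribution exactly.
-->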