<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1047">
  <Title>Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach</Title>
  <Section position="4" start_page="239" end_page="241" type="evalu">
    <SectionTitle>
3. RESULTS
</SectionTitle>
    <Paragraph position="0"> In the first experiment we ran, training and testing were done on the Texas Instruments Air Travel Information System (ATIS) corpus\[8\]. 9 In table 1, we compare results we obtained to results cited in \[ 12\] using the inside-outside algorithm on the same corpus. Accuracy is measured in terms of the percentage of noncrossing constituents in the test corpus, as described above. Our system was tested by using the training set to learn a set of transformations, and then applying these transformations to the test set and scoring the resulting output. In this experiment, 64 transformations were learned (compared with 4096 context-free rules and probabilities used in the i-o experiment). It is significant that we obtained comparable performance using a training corpus only 21% as large as that used to train the inside-outside algorithm.</Paragraph>
    <Paragraph position="1">  pus.</Paragraph>
    <Paragraph position="2"> After applying all learned transformations to the test corpus, 60% of the sentences had no crossing constituents, 74% had fewer than two crossing constituents, and 85% had fewer than three. The mean sentence length of the test corpus was 11.3. In figure 1, we have graphed percentage correct as a function of the number of transformations that have been applied to the test corpus. As the transformation number increases, overtraining sometimes occurs. In the current implementation of the learner, a transformation is added to the list if it results in any positive net change in the training set. Toward the end of the learning procedure, transformations are found that only affect a very small percentage of training sentences. Since small counts are less reliable than large counts, we cannot reliably assume that these transformations will also 9In all experiments described in this paper, results are calculated on a test corpus which was not used in any way in either training the learning algorithm or in developing the system.</Paragraph>
    <Section position="1" start_page="239" end_page="241" type="sub_section">
      <SectionTitle>
Linear Structure
</SectionTitle>
      <Paragraph position="0"> We next ran an experiment to determine what performance could be achieved if we dropped the initial right-linear assumption. Using the same training and test sets as above, sentences were initially assigned a random binary-branching structure, with final punctuation always attached high. Since there was less regular structure in this case than in the right-linear case, many more transformations were found, 147 transformations in total. When these transformations were applied to the test set, a bracketing accuracy of 87.13% resulted.</Paragraph>
      <Paragraph position="1"> The ATIS corpus is structurally fairly regular. To determine how well our algorithm performs on a more complex corpus, we ran experiments on the Wall Street Journal. Results from this experiment can be found in table 2.1deg Accuracy is again measured as the percentage of constituents in the test set which do not cross any Penn Treebank constituents. 1~ As a point of comparison, in \[14\] an experiment was done using the i-o algorithm on a corpus of WSJ sentences of length 1-15.</Paragraph>
      <Paragraph position="2"> Training was carried out on 1,095 sentences, and an accuracy of 90.2% was obtained in bracketing a test set.</Paragraph>
      <Paragraph position="3"> ldegFor sentences of length 2-15, the initial right-linear parser achieves 69% accuracy. For sentences of length 2-20, 63% accuracy is achieved and for sentences of length 2-25, accuracy is 59%.</Paragraph>
      <Paragraph position="4"> 11 In all of our experiments carried out on the Wall Street Journal, the test set was a randomly selected set of 500 sentences.</Paragraph>
      <Paragraph position="5">  punctuation attached high. In this experiment, 325 transformations were found using a 250-sentence training corpus, and the accuracy resulting from applying these transformations to a test set was 84.72%.</Paragraph>
      <Paragraph position="6"> Finally, in figure 2 we show the sentence length distribution in the Wall Street Journal corpus.</Paragraph>
      <Paragraph position="7">  In the corpus used for the experiments of sentence length 215, the mean sentence length was 10.80. In the corpus used for the experiment of sentence length 2-25, the mean length was 16.82. As would be expected, performance degrades somewhat as sentence length increases. In table 3, we show the percentage of sentences in the test corpus which have no crossing constituents, and the percentage that have only a very small number of crossing constituents 12.</Paragraph>
      <Paragraph position="8">  In table 4, we show the standard deviation measured from three different randomly chosen training sets of each sample size and randomly chosen test sets of 500 sentences each, as well as the accuracy as a function of training corpus size.</Paragraph>
      <Paragraph position="9">  We also ran an experiment on WSJ sentences of length 2-15 starting with random binary-branching structures with final 12For sentences of length 2-15, the initial right linear parser parses 17% of sentences with no crossing errors, 35% with one or fewer errors and 50% with two or fewer. For sentences of length 2-25, 7% of sentences are parsed with no crossing errors, 16% with one or fewer, and 24% with two or fewer.  Corpus.</Paragraph>
      <Paragraph position="10"> While the numbers presented above allow us to compare the transformation learner with systems trained and tested on Comparable corpora, these results are all based upon the assumption that the test data is tagged fairly reliably (manually tagged text was used in all of these experiments, as well in the experiments of \[12, 14\].) When parsing free text, we cannot assume that the text will be tagged with the accuracy of a human annotator. Instead, an automatic tagger would have to be used to first tag the text before parsing. To address this issue, we ran one experiment where we randomly induced a 5% tagging error rate beyond the error rate of the human annotator. Errors were induced in such a way as to preserve the unigram part of speech tag probability distribution in the corpus. The experiment was run for sentences of length 2-15, with a training set of 1000 sentences and a test set of 500 sentences. The resulting bracketing accuracy was 90.1%, compared to 91.6% accuracy when using an unadulterated corpus. Accuracy only degraded by a small amount when using the corpus with adulterated part of speech tags, suggesting that high parsing accuracy rates could be achieved if tagging of the input was done automatically by a tagger.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>