<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-1028">
  <Title>Experiments with Corpus-based LFG Specialization</Title>
  <Section position="4" start_page="204" end_page="206" type="metho">
    <SectionTitle>
3 Experimental Setup
</SectionTitle>
    <Paragraph position="0"> The experiments carried out to determine the effectiveness of corpus-based specialization were performed as illustrated in Figure 2. Two broad-coverage LFG grammars were used, one for French and one for English, both of which were developed within the Pargram project (Butt et al., 1999) during several years time. The French grammar consists of 133 rule schemata, the English grammar of 8.5 rule schemata.</Paragraph>
    <Paragraph position="1"> Each gralmnar is equipped with a treebank, which was developed for other purposes than grammar specialization. Each treebank was produced by letting the system parse a corpus of technical documentation. Any sentence that did not obtain any parse was discarded. At this point, the French corpus was reduced to 960 sentences, and the English corpus to 970. The average sentence length was 9 for French and 8 for English. For each sentence, a human expert then selected the most appropriate analysis among those returned by the parser.</Paragraph>
    <Paragraph position="2"> In the current experiments, each treebank was used to specialize the grammar it had been developed with. A set of 10-fold cross-validation experiments was carried out to measure several interesting quantities under different conditions. This means that, for each language, the corpus was randomly split into ten equal parts, and one tenth at a time was held out for testing while the remaining nine tenths were used to specialize the grammar, and the results were averaged over the ten runs.. For each grammar the average number of parses per sentence, the fraction of sentences which still received at least one parse (angparse) and the fraction of sentences for which the parse selected by the expert was still derived (coverage) were measured 1. The average CPU time required by parsing was also measured, and this was used to compute the speedup with respect to the original grammar.</Paragraph>
    <Paragraph position="3"> The thus established results constitute one data point in the trade-off between ambiguity reduction on one side, which is in turn related to parsing speed, and loss in coverage on the other. In order to determine other points of this trade-off, the same set. of experiments was performed where speciMization was inhibited for certain rule schemata. In particular, for each grammar, the two rule schemata that received the largest number of distinct expansions in the corpora were determined. These proved to be those associated with the LHS symbols 'VPverb\[main\]' and 'NP' for the French grammar, and 'VPv' and 'NPadj' for the English one. 2 The experiments were repeated while inhibiting specialization of first the scheme with the most expansions, and then the two most expanded schemata.</Paragraph>
    <Paragraph position="4"> Measures of coverage and speedup are important 1 As long as we are interested in preserving the f-structure assigned to sentences, this notion of coverage is stricter than necessary. The same f-structure can in fact be assigned by more than one parse, so that in some cases a sentence is considered out of coverage even if the specialized grammar assigns to it the correct f-structure.</Paragraph>
    <Paragraph position="5"> 2'VPv' and 'VPverb\[main\]' cover VPs headed by a main verb. 'NPadj' covers NPs with adjectives attached.</Paragraph>
    <Paragraph position="7"> interpretation as in regular expressions. A sub-expression enclosed in parenthesis is optional. Alternative sub-expressions are enclosed in curly brackets and separated by the &amp;quot;\[&amp;quot; sign. An &amp;quot;@&amp;quot; followed by an identifier is a macro expansion operator, and is eventually replaced by further functional descriptions.</Paragraph>
    <Paragraph position="8">  indicators of what can be achieved with this form of grammar pruning. However, they could potentially be misleading, since failure times for uncovered sentences might be considerably lower than their parsing times, had they not been out of coverage. If the pruned grammar fails more frequently on sentences which take longer to parse, the measured speedup might be artificiMly high. This is easily realized, as simply removing the hardest sentences froln the corpus would cause a decrease ill the average parsing time, and thus result in a speedup, without any pruning at all. To factor out the contribution of uncovered sentences fi'om the results, the performance of a two-stage architecture analogous to that of (Samuelsson and Rayner, 1991) was silnulated, in which the pruned grammar is attempted  first, and the sentence is passed on to the original unpruned grammar whenever the pruned grammar fails to return a parse (see Figure 3). The measured speedup of this simulated architecture, which preserves the anyparse measure of the original grammar, takes into account the contribution of uncovered sentences, as it penalizes sweeping difficult sentences under the carpet.</Paragraph>
  </Section>
  <Section position="5" start_page="206" end_page="206" type="metho">
    <SectionTitle>
4 Experimental Results
</SectionTitle>
    <Paragraph position="0"> The results of the experiments described in the section above are summarized in the table in Figure 4.</Paragraph>
    <Paragraph position="1"> The upper part of the table refers to experiments with the French grammar, the lower part to experiments with the English grammar. For each language, the first line presents data gathered for the original grammar for comparison with the pruned grammars. The figures in the second line were collected by pruning the grammar based on the whole corpus, and then testing on the corpus itself. The grammars obtained in this way contain 516 and 388 disjuncts -- corresponding to purely concatenative rules -- for French and English respectively. Anyparse and coverage are not, of course, relevant in this case, but the statistics on parsing time are, especially the one on the maximum parsing time. For each iteration in the 10-fold cross-validation experiment, the maximum parsing time was retained, and those ten times were eventually averaged. If pruning tended to leave sentences which take long to parse uncovered, then we would observe a significant difference between the average over ma.ximum times on the grammar trained and tested on the same corpus (which parses all sentences, including the hardest), and the average over maximum times for grammars trained and tested on different sets. The fact that this does not seem to be the case indicates that pruning does not penalize difficult sentences. Note also that the average number of parses per sentence is significantly smaller than with the full grammar, of almost a factor of 9 in the case of the French graminar. null The third line contains results for the fully pruned grammar. In the case of the French grammar a speedup of about 6 is obtained with a loss in coverage of 13%. The smaller speedup gained with the English grammar can be explained by the fact that here, the parsing times are lower in general, and that a non-negligible part of this time, especially that needed for morphological analysis, is unaffected by pruning. Even in the case of the English grammar, though, speedup is substantial (2.67). For both grammars, the reduction in the average maxinmm parsing time is particularly good, confirming our hypothesis that trimming the grammar by removing heavy constructs makes it considerably more efficient. A partially negative note comes from the average number of disjuncts in the prun.ed grainmars, which is 501 for French and 374 for English.</Paragraph>
    <Paragraph position="2"> Comparing this figures to the number of disjuncts in grammars pruned on the full corpus (516 and 388), we find that after training on nine tenths of the corpus, adding the last tenth still leads to an increase of 3-4% in the size of the resulting grammars. In other words, the marginal gain of further training examples is still significant after considering about 900 sentences, indicating that the training corpora are somewhat too small.</Paragraph>
    <Paragraph position="3"> The last two lines for each language show figures for grammars with pruning inhibited on the most variable and the two most variable symbols respectively. For both languages, inhibiting pruning on the most variable symbol has the expected effect of increasing both parsing time and coverage. Inhibiting pruning also on the second most variable symbol has ahnost no effect for French, and only a small effect for English.</Paragraph>
    <Paragraph position="4"> The table in Figure 5 summarizes the measures on the simulated two-stage architecture. For both languages the best trade-off, once the distribution of uncovered sentences has been taken into account, is achieved by the fully pruned grammars.</Paragraph>
  </Section>
class="xml-element"></Paper>