<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1014">
<Title>Parsing the WSJ using CCG and Log-Linear Models</Title>
<Section position="8" start_page="0" end_page="0" type="evalu">
<SectionTitle>8 Experiments</SectionTitle>
<Paragraph position="0"> Gold standard dependency structures were derived from section 00 (for development) and section 23 (for testing) by running the parser over the derivations in CCGbank, some of which the parser could not process. In order to increase the number of test sentences, and to allow a fair comparison with other CCG parsers, extra rules were encoded in the parser (but we emphasise these were only used to obtain the section 23 test data; they were not used to parse unseen data as part of the testing). This resulted in 2,365 dependency structures for section 23 (98.5% of the full section), and 1,825 (95.5%) dependency structures for section 00.</Paragraph>
<Paragraph position="2"> The first stage in parsing the test data is to apply the supertagger. We use the novel strategy developed in Clark and Curran (2004): first assign a small number of categories (approximately 1.4 on average) to each word, and increase the number of categories if the parser fails to find an analysis. We were able to parse 98.9% of section 23 using this strategy. Clark and Curran (2004) show that this supertagging method results in a highly efficient parser. For the normal-form model we returned the dependency structure for the most probable derivation, applying the two types of normal-form constraints described in Section 6. For the dependency model we returned the dependency structure with the highest expected labelled recall score. (Coordinate constructions can create multiple dependencies for a single argument slot; in this case the score for the multiple dependencies is the average of the individual scores.)</Paragraph>
<Paragraph position="3"> Following Clark et al. (2002), evaluation is by precision and recall over dependencies. For a labelled dependency to be correct, the first four elements of the dependency tuple must match exactly. For an unlabelled dependency to be correct, the heads of the functor and argument must appear together in some relation in the gold standard (in any order).</Paragraph>
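To make the adaptive supertagging strategy described above concrete, the following is a minimal Python sketch of the assign-then-widen loop. The supertagger and parser interfaces, and the particular beta thresholds, are hypothetical stand-ins for illustration, not the actual C&C implementation.

from typing import Callable, Optional, Sequence

def parse_with_adaptive_supertagging(
    words: Sequence[str],
    assign_categories: Callable[[Sequence[str], float], list],
    try_parse: Callable[[Sequence[str], list], Optional[object]],
    betas: Sequence[float] = (0.075, 0.03, 0.01, 0.005, 0.001),
) -> Optional[object]:
    """Start with a tight supertagger beam (few categories per word,
    roughly 1.4 on average) and widen it only if the parser fails.

    `assign_categories(words, beta)` is assumed to return, for each word,
    the categories whose probability is within `beta` of the best
    category for that word; `try_parse` is assumed to return a dependency
    structure, or None if no spanning analysis is found. Both interfaces
    and the beta values are illustrative assumptions.
    """
    for beta in betas:  # tightest beam first, progressively wider
        categories = assign_categories(words, beta)
        result = try_parse(words, categories)
        if result is not None:
            return result
    return None  # no analysis found even at the widest beam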
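The evaluation criteria just given can likewise be stated as code. In this sketch a dependency is assumed to be a tuple whose first four elements are the head of the functor, the lexical category, the argument slot, and the head of the argument; that tuple layout is an assumption made for illustration.

def evaluate_dependencies(gold_deps, test_deps):
    """Return (precision, recall, F-score) for labelled and unlabelled
    dependencies, per the criteria described above."""
    # Labelled: the first four elements of the tuple must match exactly.
    gold_lab = {d[:4] for d in gold_deps}
    correct_lab = sum(1 for d in test_deps if d[:4] in gold_lab)

    # Unlabelled: the functor head (element 0) and the argument head
    # (element 3) must appear together in some gold relation, in
    # either order.
    gold_unlab = {frozenset((d[0], d[3])) for d in gold_deps}
    correct_unlab = sum(1 for d in test_deps
                        if frozenset((d[0], d[3])) in gold_unlab)

    def prf(correct):
        p = correct / len(test_deps) if test_deps else 0.0
        r = correct / len(gold_deps) if gold_deps else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    return prf(correct_lab), prf(correct_unlab)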
<Paragraph position="4"> The results on section 00, using the feature sets described earlier, are given in Table 1, with similar results overall for the normal-form model and the dependency model. Since experimentation is easier with the normal-form model than with the dependency model, we present additional results for the normal-form model.</Paragraph>
<Paragraph position="5"> Table 2 gives the results for the normal-form model for various feature sets. The results show that each additional feature type increases performance. Hockenmaier (2003a) also found the dependencies to be very beneficial, in contrast to recent results from the lexicalised PCFG parsing literature (Gildea, 2001), but did not gain from the use of distance measures. One of the advantages of a log-linear model is that it is easy to include additional information, such as distance, as features.</Paragraph>
<Paragraph position="6"> The FINAL result in Table 2 is obtained by using a larger derivation space for training, created by taking more categories per word from the supertagger (2.9 on average), and hence using charts containing more derivations. (15 machines were used to estimate this model.) More investigation is needed to find the optimal chart size for estimation, but the results show a gain in accuracy.</Paragraph>
<Paragraph position="7"> Table 3 gives the results of the best performing normal-form model on the test set. The results of Clark et al. (2002) and Hockenmaier (2003a) are shown for comparison. The dependency set used by Hockenmaier contains some minor differences from the set used here, but "evaluating" our test set against Hockenmaier's gives an F-score of over 97%, showing the test sets to be very similar. The results show that our parser performs significantly better than that of Clark et al., demonstrating the benefit of derivation features and the use of a sound statistical model.</Paragraph>
<Paragraph position="8"> The results given so far have all used gold standard POS tags from CCGbank. Table 3 also gives the results if automatically assigned POS tags are used in the training and testing phases, using the C&C POS tagger (Curran and Clark, 2003). The performance reduction is expected, given that the supertagger relies heavily on POS tags as features.</Paragraph>
<Paragraph position="9"> More investigation is needed to properly compare our parser and Hockenmaier's, since there are a number of differences in addition to the models used: Hockenmaier effectively reads a lexicalised PCFG off CCGbank, and is able to use all of the available training data; she does not use a supertagger, but does use a beam search.</Paragraph>
<Paragraph position="10"> Parsing the 2,401 sentences in section 23 takes 1.6 minutes using the normal-form model, and 10.5 minutes using the dependency model. The difference is due largely to the normal-form constraints used by the normal-form parser. Clark and Curran (2004) show that the normal-form constraints significantly increase parsing speed and, in combination with adaptive supertagging, result in a highly efficient wide-coverage parser.</Paragraph>
<Paragraph position="11"> As a final oracle experiment we parsed the sentences in section 00 using the correct lexical categories from CCGbank. Since the parser uses only a subset of the lexical categories in CCGbank, 7% of the sentences could not be parsed; however, the labelled F-score for the parsed sentences was almost 98%. This very high score demonstrates the large amount of information in lexical categories.</Paragraph>
</Section>
</Paper>