<?xml version="1.0" standalone="yes"?> <Paper uid="W01-0904"> <Title>Translating Treebank Annotation for Evaluation</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Results </SectionTitle> <Paragraph position="0"> Here we provide an evaluation of the systems similar to that of others (Hockenmaier et al., 2000; Xia, 1999) for easy comparison. Both systems were used to translate C1 and C2. C2 is used for determining the coverage of the grammar used by the two systems. Both systems, at times, failed to translate examples (frequently due to annotation errors in the original treebank). The top-down system failed on 60 and 15 examples from C1 and C2 respectively. The bottom-up system failed on 66 and 15 examples from C1 and C2 respectively.</Paragraph> <Paragraph position="1"> Table 1 describes the types of categories used to translate C1 and the size of the lexicons generated. Categories containing variables were ignored, as they could usually be unified with an already existing category. With this in mind, the bottom-up algorithm extracted a more compact lexicon. The average category sizes (the number of slash operators in categories) are interesting, as they indicate the profligacy of the top-down algorithm in creating unwieldy categories, whereas the bottom-up approach uses smaller and, on inspection, more plausible categories. These results seem, in part, to vindicate the choice of a controlled bottom-up approach.</Paragraph> <Paragraph position="2"> Tables 2 and 3 present the results for both systems for the frequency distribution of categories (i.e. the number of categories that appeared with a particular frequency) and the frequency distribution of the number of categories for a word (i.e. the number of words that had a particular number of categories). The trends for both systems are similar. 
There are a large number of categories that appear very infrequently; these tend to be the larger, generated categories and often fit unusual circumstances, e.g. misannotation of the treebank or mistakes in the use of the heuristics. The bottom-up approach has many fewer of these categories, indicating the problem of propagating errors down the tree with the top-down approach. There are also a few exceptionally frequent categories: these are noun phrases, nouns, and some of the common verbs.</Paragraph> <Paragraph position="3"> The number of categories per word is similar, suggesting the approaches are similar in their ability to produce the variety of categories required for words.</Paragraph> <Paragraph position="4"> While these figures give some indication of the quality and compactness of the translation, it is useful to determine the coverage of the lexicon extracted from C1 by comparing it with a lexicon extracted from C2, and so determine the quality and generality of the lexicon that has been produced in the translation. Table 4 shows the comparison. Here entry means the C1 lexicon contains an entry identical to the C2 entry. kwkc means that the entry from C2 is not in the C1 lexicon, but both the word and the category are known. kwuc means the word is in the C1 lexicon, but the category is not. Finally, uw indicates that the word is not in the C1 lexicon. Despite a smaller lexicon and a smaller number of categories, the bottom-up system gives better coverage. Note especially that there are no unknown categories with the bottom-up approach and that the percentage of exact entries is much higher.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="evalu"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> The system presented provides a useful and accurate method for translating the annotation of the Penn Treebank into a CG annotation. 
Comparisons with an alternative approach suggest that the increased control provided by the system leads to a more accurate and compact translation, which is more linguistically plausible. Most importantly, the system is flexible enough to allow users to annotate corpora with the kind of CG they are interested in, which is vital when it is to be used for evaluation.</Paragraph> <Paragraph position="1"> It would be useful to expand the systems to work on the full treebank, i.e. including sentences with movement (see Hockenmaier et al. (2000) for discussion of a possible method). Correcting the annotation of the treebank during translation should also be investigated further.</Paragraph> </Section> </Paper>