<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1079"> <Title>Best Analysis Selection in Inflectional Languages</Title> <Section position="4" start_page="0" end_page="0" type="evalu"> <SectionTitle> 3 Results </SectionTitle> <Paragraph position="0"> This section presents the results of experiments with the stated figures of merit for the best analysis selection algorithm. First, we describe how the training data set was acquired from a standard dependency treebank for Czech. Then, we compare the running times of our parser with those of another available parser.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 The Training Set Acquisition </SectionTitle> <Paragraph position="0"> A common approach to acquiring statistical data for syntactic analysis is to learn the values from a fully tagged treebank training corpus. Building such corpora is tedious and expensive work that requires the cooperation of a team of linguists and computer scientists. At present, the only source of Czech treebank data is the Prague Dependency Tree Bank (PDTB) (Hajič, 1998), which includes dependency analyses of about 100,000 Czech sentences.</Paragraph> <Paragraph position="1">
Table 1: Precision estimates on real corpus data
Precision on sentences | Percentage
of 1-10 words | 86.9%
of 11-20 words | 78.2%
of more than 20 words | 63.1%
Overall precision | 79.3%
Sentences with mistakes in input | 8.0%

First, in order to exploit the data from PDTB, we have supplemented our grammar with a dependency specification for constituents. Thus the output of the analysis can be presented in the form of a pure dependency tree. At the same time, we unify the classes of derivation trees that correspond to one dependency structure.
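One way to picture the dependency specification for constituents is to mark one head child in every grammar rule; the dependency tree then follows by percolating lexical heads upwards, and derivations that yield the same set of head-dependent edges fall into one class. The sketch below illustrates this under stated assumptions: head marking is only one plausible encoding of the dependency specification, and the `Node` class, its `head` index, and the functions `lexical_head`, `dependencies`, and `group_by_dependency_structure` are hypothetical names, not part of the authors' parser.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical derivation-tree node (an assumption, not the parser's type).

    Leaves carry a word position; inner nodes name their head child by index,
    standing in for a dependency specification attached to each grammar rule.
    """
    label: str
    children: list = field(default_factory=list)
    head: int = 0   # index of the head child (ignored for leaves)
    word: int = -1  # word position (used only for leaves)


def lexical_head(node):
    """Follow head children down to the word that heads this constituent."""
    while node.children:
        node = node.children[node.head]
    return node.word


def dependencies(node):
    """Collect (head word, dependent word) edges induced by the head marking."""
    edges = set()
    if not node.children:
        return edges
    h = lexical_head(node)
    for i, child in enumerate(node.children):
        edges |= dependencies(child)
        if i != node.head:
            # A non-head child depends on the lexical head of its parent.
            edges.add((h, lexical_head(child)))
    return edges


def group_by_dependency_structure(derivations):
    """Unify derivations that yield the same dependency tree into one class."""
    classes = defaultdict(list)
    for d in derivations:
        classes[frozenset(dependencies(d))].append(d)
    return dict(classes)


def leaf(i):
    return Node(label=f"w{i}", word=i)


# Two bracketings of a hypothetical three-word sentence headed by word 1.
a = Node("S", [Node("X", [leaf(0), leaf(1)], head=1), leaf(2)], head=0)
b = Node("S", [leaf(0), Node("Y", [leaf(1), leaf(2)], head=0)], head=1)
classes = group_by_dependency_structure([a, b])
print(len(classes))  # prints 1: both derivations yield the edges {(1, 0), (1, 2)}
```

Using a frozen set of edges as the class key makes the grouping independent of how a derivation is bracketed; picking a single canonical representative from each class is a separate step.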
We then define a canonical form of the derivation, which selects from each class one representative that is used for assigning the edge probabilities.</Paragraph> <Paragraph position="2"> This technique enables us to relate the output of our parser to the PDTB data.</Paragraph> <Paragraph position="3"> However, the information in the dependency structures can be exploited further, and in an automatically controlled environment. For this purpose, we use pruning constraints: the syntactic analyser is given a set of strict limitations and passes on only the parses that comply with them. The constraints can either be supplied manually by linguists for a particular sentence or be obtained from the transformed dependency tree in PDTB.</Paragraph> <Paragraph position="4"> Table 1 summarizes the precision estimates computed on real corpus data. These measurements may understate the actual benefits of our approach, since an estimated 8% of the sentences in the input corpus contain mistakes.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Running Time Comparison </SectionTitle> <Paragraph position="0"> Comparing the efficiency of different parsers and parsing techniques provides a strong impulse for improving actual implementations. Since no other generally applicable and available NL parser for Czech exists, we have compared the running times of our syntactic analyser on the data provided at http://www.cogs.susx.ac.uk/lab/nlp/carroll/cfg-resources/.</Paragraph> <Paragraph position="1"> These web pages resulted from discussions at the Efficiency in Large Scale Parsing Systems Workshop at COLING'2000, where one of the main conclusions was the need for a data bank to standardize parser benchmarking. The best results reported to date on the standard data sets (the ATIS and PT grammars) are those of Robert C. Moore (Moore, 2000).
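One simple way to collect per-sentence running times over a set of test sentences is sketched below. It is a minimal illustration, assuming the parser can be called from Python: `time_parser` and `dummy_parse` are hypothetical names, `dummy_parse` merely splits its input, and the two sentences are placeholders rather than items from the benchmark data.

```python
import statistics
import time


def time_parser(parse, sentences, repeats=3):
    """Return the best-of-`repeats` wall-clock time (seconds) for each sentence.

    `parse` is any callable taking one sentence (a hypothetical interface);
    taking the minimum over several runs reduces noise from other processes.
    """
    timings = []
    for sentence in sentences:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            parse(sentence)
            best = min(best, time.perf_counter() - start)
        timings.append(best)
    return timings


def dummy_parse(sentence):
    # Hypothetical stand-in for a real parser: it only tokenizes the input.
    return sentence.split()


test_sentences = ["a placeholder test sentence", "another placeholder"]
times = time_parser(dummy_parse, test_sentences)
print(f"total {sum(times):.6f} s, mean {statistics.mean(times):.6f} s per sentence")
```

Taking the minimum over repeated runs reduces noise, but it cannot remove differences in hardware, so times measured on different machines remain only roughly comparable.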
The package contains only the test grammars and input sentences; a reference implementation of the parser is currently being prepared for release (Moore, personal communication).</Paragraph> <Paragraph position="2"> Since we could not run Moore's reference implementation on the same machine, the times mentioned above are not fully comparable (we assume that our tests were run on a slightly faster machine than Moore's). We are preparing a detailed comparison that will try to explain the differences in results when parsing with grammars of varying levels of ambiguity.</Paragraph> </Section> </Section> </Paper>