<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1046">
<Title>Using Parsed Corpora for Structural Disambiguation in the TRAINS Domain</Title>
<Section position="4" start_page="345" end_page="346" type="metho">
<SectionTitle> 3 Measure of Success </SectionTitle>
<Paragraph position="0"> One hope of this project is to make generalizations across corpora of different domains. Thus, the experiments included trials in which the 91-93 dialogs were used to predict the 95 dialogs, and vice versa.</Paragraph>
<Paragraph position="1"> Experiments on the effect of training and testing KANKEI on the same set of dialogs used cross validation: several trials were run, with a different part of the corpus held out each time. In all these cases, the use of partial patterns and word classes was varied in an attempt to determine their effect.</Paragraph>
<Paragraph position="2"> Tables 1, 2, and 3 show the results for the best parameter settings from these experiments.</Paragraph>
<Paragraph position="3"> Footnote 3: The 91-93 dialogs were used for training and the 95 dialogs for testing.</Paragraph>
<Paragraph position="4"> The rows labeled "% by Default" give the portion of the total success rate (last row) accounted for by KANKEI's default guess. The results of training on the 95 data and testing on the 93 data are not shown because the best results were no better than always attaching to the VP. Notice that all of these results involve either word classes or partial patterns. There is a difference of at least 30 attachments (1.9% accuracy) between the best results in these tables and the results that did not use word classes or partial patterns. Thus, it appears that at least one of these methods of generalization is needed for this high-dimensional space.
The 93 dialogs predicted attachments in the 95 test data with a success rate of 90.9%, which suggests that KANKEI is capable of making generalizations that are independent of the corpus from which they were drawn. The overall accuracy is high: the 95 data predicted itself with an accuracy of 92.2%, and the 93 data predicted itself with an accuracy of 92.4%.</Paragraph> </Section> </Paper>
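The cross-validation procedure described above (several trials, with a different part of the corpus held out each time) can be sketched as follows. This is a minimal illustration, not KANKEI's actual pattern matcher: `train_fn`, `predict_fn`, and the toy majority-class model used in the example are hypothetical stand-ins.

```python
from collections import Counter
from statistics import mean


def cross_validate(examples, train_fn, predict_fn, k=5):
    """Estimate accuracy by k-fold cross validation: on each trial,
    train on k-1 parts of the corpus and test on the held-out part."""
    # Partition the labeled examples into k interleaved folds.
    folds = [examples[i::k] for i in range(k)]
    accuracies = []
    for i, held_out in enumerate(folds):
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_fn(training)
        correct = sum(predict_fn(model, x) == y for x, y in held_out)
        accuracies.append(correct / len(held_out))
    return mean(accuracies)


# Toy stand-in for a learner: always guess the majority attachment
# seen in training (analogous to KANKEI's default guess).
def train_majority(data):
    return Counter(label for _, label in data).most_common(1)[0][0]


def predict_majority(model, _features):
    return model
```

For example, with eight VP-attached and two NP-attached toy instances, `cross_validate([("a", "VP")] * 8 + [("b", "NP")] * 2, train_majority, predict_majority, k=5)` returns 0.8, since the majority-class guess misses the held-out NP cases.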