<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0304">
<Title>Accenting unknown words in a specialized language</Title>
<Section position="7" start_page="0" end_page="0" type="evalu">
<SectionTitle>4 Results</SectionTitle>
<Paragraph position="0">The baseline for this task consists in leaving every e unaccented.</Paragraph>
<Paragraph position="1">On the accented part of the MeSH, it obtains an accuracy of 0.623, and on the test sample, 0.642. The Brill tagger learns 80 contextual rules with MeSH training (208 on ABU and 47 on CIM-SNOMED).</Paragraph>
<Paragraph position="2">The context method learns 1,832 rules on the MeSH training set (16,591 on ABU and 3,050 on CIM-SNOMED). Tables 5, 6 and 7 summarize the validation results obtained on the accented part of the MeSH. Set denotes the subset of words as explained in section 3.5. Cor. stands for the number of correctly accented words.</Paragraph>
<Paragraph position="3">Not surprisingly, the best global precision is obtained with MeSH training (table 6). The mixed context method obtains perfect precision, whereas Brill reaches 0.901 (table 5). ABU and CIM-SNOMED training also obtain good results (table 7), again better with the mixed context method (0.912-0.931) than with Brill (0.871-0.895). We performed the same tests with right and left contexts (table 6): precision can be as good for fully processed words (set CU) as with mixed contexts, but recall is always lower. The results of these two context variants are therefore not kept in the following tables. Both precision and recall are generally slightly better with the majority decision variant. If we concentrate on the fully processed words (CU), precision is always higher than the global result and than that of words with no decision (D2). The D2 class, whose words are left unaccented, generally obtains a precision well over the baseline. Partially processed words (D4) are always those with the worst precision.</Paragraph>
<Paragraph position="4">(Table residue; columns: training set, cor., recall, precision ± ci. Caption: MeSH training, 4054 words of accented MeSH.)</Paragraph>
<Paragraph position="5">Precision and recall for the unaccented part of the MeSH are shown in tables 8 and 9. The global results with the different training sets at the break-even point, with their confidence intervals, are not really distinguishable. They are clustered from 0.819±0.047 to 0.842±0.044, except the unambiguous decision method trained on the MeSH, which stands a bit lower at 0.800±0.049, and the Brill tagger trained on ABU (0.785). If we only consider fully processed words, precision can reach 0.884±0.043 (ICD-SNOMED training, majority decision), with a recall of 0.731 (or 0.876±0.043 / 0.758 with MeSH training, majority decision).</Paragraph>
<Paragraph position="6">Consensus combination of several methods (table 8) does increase precision, at the expense of recall. A precision/recall of 0.920±0.037/0.750 is obtained by combining Brill and the mixed context method (majority decision), with MeSH training for both methods. The same level of precision is obtained with other combinations, but with lower recalls.</Paragraph>
</Section>
</Paper>
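<!--
The ± values quoted with the break-even results are confidence intervals around a
precision estimated from a finite test set. The section does not state how they were
computed; a minimal sketch, assuming a standard normal-approximation (Wald) interval
p ± 1.96 * sqrt(p(1-p)/n) and a hypothetical sample size n, reproduces half-widths of
the same order. The helper name wald_interval and the value n = 260 are illustrative,
not taken from the paper.

    import math

    def wald_interval(p, n, z=1.96):
        # Normal-approximation (Wald) interval for a proportion p estimated from
        # n trials, at roughly 95% confidence (z = 1.96).
        # Assumption: one plausible reading of the "precision ± ci" columns;
        # the paper does not state its exact procedure.
        half_width = z * math.sqrt(p * (1 - p) / n)
        return p - half_width, p + half_width

    # Illustrative only: n = 260 is a hypothetical test-set size chosen so that the
    # half-width lands near the 0.047 reported alongside a precision of 0.819.
    low, high = wald_interval(0.819, 260)
    print(f"0.819 interval: [{low:.3f}, {high:.3f}]")  # roughly [0.772, 0.866]
-->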
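<!--
Consensus combination is described above only by its effect (higher precision, lower
recall). A minimal sketch of one way such a consensus can work, assuming each method
either proposes an accented form for a word or abstains, and the combination keeps a
form only when every deciding method agrees; the exact scheme used in the paper may
differ, and the helper name consensus and the example words are illustrative.

    from typing import Optional, Sequence

    def consensus(proposals: Sequence[Optional[str]]) -> Optional[str]:
        # Each element is the accented form proposed by one method (for example
        # Brill, or the mixed context method with majority decision), or None if
        # that method makes no decision for this word.
        # The combined system outputs a form only when all deciding methods agree;
        # otherwise it abstains as well, trading recall for precision.
        decided = [p for p in proposals if p is not None]
        if decided and all(p == decided[0] for p in decided):
            return decided[0]
        return None

    # Hypothetical examples on the accent-restoration task:
    print(consensus(["déséquilibre", "déséquilibre"]))  # agreement: form is accepted
    print(consensus(["déséquilibre", "desequilibré"]))  # disagreement: abstain (None)
-->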