File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/99/w99-0608_evalu.xml
Size: 5,514 bytes
Last Modified: 2025-10-06 14:00:40
<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0608"> <Title>Improving POS Tagging Using Machine-Learning Techniques</Title> <Section position="9" start_page="57" end_page="59" type="evalu"> <SectionTitle> 5 Experiments and Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="57" end_page="57" type="sub_section"> <SectionTitle> 5.1 Constructing and Evaluating Ensembles </SectionTitle> <Paragraph position="0"> First, the three types of ensembles were applied to the 19 selected ambiguity classes in order to decide which is the best in each case. The evaluation was performed by means of a 10-fold cross-validation, using the training corpus. The results confirm that all methods contribute to improving accuracy in almost all domains. The absolute improvement is modest, but the variance is generally very low, so the gain is statistically significant in the majority of cases. Summarizing, BAG wins in 8 cases, FCOMB in 9, and FSC in 2 (including the unknown-word class).</Paragraph> <Paragraph position="1"> These results are reported in table 4, in which the error rate of a single basic tree is compared to the results of the ensembles for each ambiguity class (figures are calculated by averaging the results of the ten folds). The last column presents the percentage of error reduction for the best method in each row.</Paragraph> <Paragraph position="2"> Second, CPD was applied to the 82 selected ambiguity classes, with positive results in 59 cases, of which 25 were statistically significant (again in a 10-fold cross-validation experiment). These 25 classes comprise 20,937 examples, and the error rate was reduced, on average, from 20.16% to 18.17%.</Paragraph> </Section> <Section position="2" start_page="57" end_page="59" type="sub_section"> <SectionTitle> 5.2 Tagging with the Enriched Model </SectionTitle> <Paragraph position="0"> Ensembles of classifiers were learned for the ambiguity classes explained in the previous sections, using the best technique in each case.</Paragraph> <Paragraph position="1"> These ensembles were included in the tree base used by the basic taggers of section 3, replacing the corresponding individual trees, and both taggers were tested again using the enriched model.</Paragraph> <Paragraph position="2"> At runtime, the combination of classifiers was done by averaging the results of each individual decision tree, as sketched below.</Paragraph>
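To make the runtime combination step concrete, here is a minimal Python sketch of probability averaging over an ensemble of decision trees. The tree interface (a predict_proba method returning a tag-to-probability mapping) and the context argument are hypothetical illustrations, not the paper's actual data structures.

    # Minimal sketch of runtime combination by averaging; assumes each tree
    # exposes a hypothetical predict_proba(context) -> {tag: probability} map.
    from collections import defaultdict

    def combine_by_averaging(trees, context):
        """Average the tag-probability distributions of all trees in the
        ensemble and return the most probable tag for this context."""
        totals = defaultdict(float)
        for tree in trees:
            for tag, prob in tree.predict_proba(context).items():
                totals[tag] += prob
        # Dividing by the number of trees yields the averaged distribution;
        # it does not change the argmax, but keeps the values interpretable.
        averaged = {tag: total / len(trees) for tag, total in totals.items()}
        return max(averaged, key=averaged.get)

Under this scheme an ensemble can be swapped in for a single tree transparently: a single tree is just the degenerate case with len(trees) == 1.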
<Paragraph position="3"> In order to test the relative improvement of each component, the inclusion of the ensembles is performed in three steps: 'CPD' stands for the ensembles for infrequent ambiguity classes, 'ENS' stands for the ensembles for frequent ambiguity classes and unknown words, and 'CPD+ENS' stands for the inclusion of both. Results are described in table 5.</Paragraph> <Paragraph position="4"> Some important conclusions are: * The best result of each tagger is significantly better than that of the corresponding basic version, and the accuracy consistently grows as more components are added.</Paragraph> <Paragraph position="5"> * The relative improvement of STT+ is lower than those of RTT and STT, suggesting that the better the tree-based model is, the less relevant the inclusion of n-gram information becomes.</Paragraph> <Paragraph position="6"> * The special treatment of low-frequency ambiguity classes results in a very small contribution, indicating that there is not much to gain from these classes, unless we were able to fix their errors in a much greater proportion than we actually did.</Paragraph> <Paragraph position="8"> * The price to pay for the enriched models is a substantial overhead in storage requirements and a decrease in tagging speed, which in the worst case is reduced by a factor of 5.</Paragraph> <Paragraph position="9"> In order to compare our results to others, we list in table 6 the results reported by several state-of-the-art POS taggers, tested on the WSJ corpus with the open vocabulary assumption. In that table, TBL stands for Brill's transformation-based error-driven tagger (Brill, 1995), ME stands for a tagger based on maximum entropy modelling (Ratnaparkhi, 1996), SPATTER stands for a statistical parser based on decision trees (Magerman, 1996), IGTREE stands for the memory-based tagger by Daelemans et al. (1996), and, finally, TComb stands for a tagger that works by combining three taggers, among them a statistical trigram-based one. Comparing to all the individual taggers, we observe that our approach reports the highest accuracy, and that it is comparable to that of TComb, obtained by the combination of three taggers. This is encouraging, since we have improved an individual POS tagger which could be further introduced as a better component in an ensemble of taggers.</Paragraph> <Paragraph position="10"> Unfortunately, the performance on unknown words is difficult to compare, since it strongly depends on the lexicon used. For instance, IGTREE does not include in the lexicon the numbers appearing in the training set, so any number in the test set is considered unknown (they report an unusually high percentage of unknown words: 5.5%, compared to our 2.25%).</Paragraph> <Paragraph position="11"> The fact that numbers are very easy to recognize could explain their outstanding results on tagging unknown words. ME also reports a higher percentage of unknown words, 3.2%, while TBL says nothing about this issue.</Paragraph>
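Because unknown-word figures hinge on what the lexicon contains, the following small Python sketch (with hypothetical token and lexicon inputs) illustrates how a design choice such as excluding numbers from the lexicon, as in the IGTREE setup, inflates the reported unknown-word rate:

    # Illustrative computation of the unknown-word percentage. The lexicon
    # contents and the option to treat numbers as out-of-lexicon are
    # assumptions, used only to show why rates such as 5.5% and 2.25% are
    # not directly comparable across taggers.
    def unknown_word_rate(test_tokens, lexicon, numbers_in_lexicon=True):
        def is_known(token):
            if not numbers_in_lexicon and token.replace('.', '', 1).isdigit():
                return False  # every number in the test set counts as unknown
            return token in lexicon
        unknown = sum(1 for token in test_tokens if not is_known(token))
        return 100.0 * unknown / len(test_tokens)

The same test set thus yields very different unknown-word rates under the two policies, which is why per-tagger figures should be read together with the lexicon policy that produced them.
</Section> </Section> </Paper>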