<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0608"> <Title>Improving POS Tagging Using Machine-Learning Techniques</Title> <Section position="10" start_page="59" end_page="59" type="concl"> <SectionTitle> 6 Conclusions and Further Work </SectionTitle> <Paragraph position="0"> In this paper, we have applied several ML techniques for constructing ensembles of classifiers to address the most representative and/or difficult cases of ambiguity within a decision-tree-based English POS tagger. As a result, the overall accuracy has been significantly improved.</Paragraph> <Paragraph position="1"> Compared with other approaches, our tagger performs better on the WSJ corpus, under the open-vocabulary assumption, than a number of state-of-the-art POS taggers, and similarly to another approach based on the combination of several taggers (see footnote 8).</Paragraph> <Paragraph position="2"> (Footnote 8: It must be said, however, that purely statistical or machine-learning-based approaches to POS tagging still significantly underperform some sophisticated manually constructed systems, such as the English shallow parser based on Constraint Grammars developed at the University of Helsinki (Samuelsson and Voutilainen, 1997).) The cost of this improvement has been quantified in terms of the storage requirements and speed of the resulting enriched taggers. Of course, there is a clear tradeoff between accuracy and efficiency, which should be resolved according to the user's needs. Although all the proposed techniques are fully automatic, the construction of appropriate ensembles requires significant human and computational effort.</Paragraph> <Paragraph position="3"> Several aspects deserve further study, including the methods used to construct the ensembles of decision trees and the way the ensembles are combined and included in the taggers. 
However, we are now more interested in experimenting with the inclusion of our tagger as a component in an ensemble of pre-existing taggers, in the style of (Brill and Wu, 1998; van Halteren et al., 1998).</Paragraph> <Paragraph position="4"> More generally, one may think that, after all the effort involved, the improvement achieved seems small. On this point, we think that we are moving very close to the best results achievable with fully statistical techniques, and that some kind of specific human knowledge should be jointly considered in order to achieve the next qualitative step. We also think that issues other than simple 'accuracy rates' are becoming more important for testing and evaluating the real utility of different tagging approaches. Such aspects, which should be studied in the near future, include the ability to adapt to new domains (tuning), the types of errors committed and their influence on the task at hand, and the language independence assumption.</Paragraph> </Section> </Paper>