<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1504">
<Title>Low-cost Named Entity Classification for Catalan: Exploiting Multilingual Resources and Unlabeled Data</Title>
<Section position="8" start_page="0" end_page="0" type="concl">
<SectionTitle> 6 Conclusions </SectionTitle>
<Paragraph position="0"> We have presented thorough experimental work on developing low-cost Named Entity classifiers for a language with no annotated resources available.</Paragraph>
<Paragraph position="1"> Several strategies to build a Catalan NEC system have been devised and evaluated. On the one hand, using only a small initial hand-tagged corpus, a supervised (AdaBoost) and a fully unsupervised (Greedy Agreement) learning algorithm have been compared. On the other hand, using existing resources for a similar language as a starting point, a bilingual classifier has been trained. In both cases, bootstrapping strategies have been tested.</Paragraph>
<Paragraph position="2"> The main conclusions drawn from the presented results are: - Given a small labelled data set, the supervised AdaBoost learning algorithm clearly outperforms the fully unsupervised Greedy Agreement algorithm, even when a large amount of unlabelled text is available.</Paragraph>
<Paragraph position="3"> - Supervised models trained on little annotated data do not easily profit from bootstrapping strategies, even when only high-confidence examples are used for retraining. Examples labelled by the unsupervised models, however, provide a complementary boost when bootstrapping.</Paragraph>
<Paragraph position="4"> - Multilingual models, trained with an automatically derived bilingual dictionary, significantly improve accuracy for the language with fewer annotated resources without significantly decreasing performance for the language with more data available. Retraining with unlabelled examples performs slightly better still, yielding a much more accurate classifier than one trained only on the Catalan labelled examples.</Paragraph>
</Section>
</Paper>
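The bootstrapping (self-training with high-confidence examples) discussed in the conclusions can be summarised by the following minimal sketch. It is an illustration only, not the paper's actual system: the names train_classifier, predict_proba, THRESHOLD and N_ROUNDS are hypothetical placeholders standing in for the AdaBoost (or Greedy Agreement) learner and its confidence scores.

    # Minimal self-training sketch, assuming a hypothetical train_classifier()
    # that returns a model with a predict_proba(features) -> {class: prob} method.
    THRESHOLD = 0.95   # assumed confidence cutoff for keeping auto-labelled examples
    N_ROUNDS = 5       # assumed number of bootstrapping rounds

    def bootstrap(labelled, unlabelled, train_classifier):
        """labelled: list of (features, ne_class); unlabelled: list of features."""
        model = train_classifier(labelled)
        pool = list(unlabelled)
        for _ in range(N_ROUNDS):
            confident, remaining = [], []
            for x in pool:
                probs = model.predict_proba(x)
                best = max(probs, key=probs.get)
                if probs[best] >= THRESHOLD:
                    confident.append((x, best))   # keep only high-confidence labels
                else:
                    remaining.append(x)
            if not confident:
                break                             # nothing new to retrain on
            labelled = labelled + confident       # grow the training set
            model = train_classifier(labelled)    # retrain on the extended set
            pool = remaining
        return model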