<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2005"> <Title>Exploiting Named Entity Taggers in a Second Language</Title> <Section position="6" start_page="28" end_page="28" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> Named entities (NEs) are widely used in natural language processing tasks. For instance, it has been shown that indexing NEs within documents can increase the precision of information retrieval systems (Mihalcea and Moldovan, 2001). NEs have also been applied to Question Answering (Mann, 2002; Pérez-Coutiño et al., 2004) and Machine Translation (Babych and Hartley, 2003). It is therefore important to have accurate named entity recognition (NER) systems; moreover, given the great variety of documents and languages for which such tools are desirable, these systems must also be robust and easy to port.</Paragraph> <Paragraph position="1"> In this work we have presented a method for performing named entity recognition. The method combines a hand-coded system with a set of lexical and orthographic features to train a machine learning algorithm. Apart from the hand-coded system, our method does not require any language-dependent features: we do not use lists of trigger words, nor do we use any gazetteer information.</Paragraph> <Paragraph position="2"> The only information used in this approach is extracted automatically from the documents, without human intervention. Even so, the results presented here are very encouraging. We achieved good accuracy for named entity classification (NEC) in Portuguese, where NEs had to be classified into 10 possible classes, by exploiting a hand-coded system for Spanish that targets only 4 classes. This provides evidence of the flexibility of our method. Additionally, our method outperforms the hand-coded system on NER in Spanish. Thus, our method has proven to be robust and easy to port to other languages. 
The only additional requirement for applying our method is a tokenizer for languages that do not separate words with white space; otherwise, the method can be applied straightforwardly.</Paragraph> <Paragraph position="3"> We are interested in applying this method to NER in English, in order to determine to what extent our system can achieve competitive results without language-dependent resources such as dictionaries and word lists. Another research direction is the adaptation of this method to cross-language NER.</Paragraph> <Paragraph position="4"> We are also very interested in exploring whether, by training a classifier on mixed-language corpora, we can perform NER in more than one language simultaneously.</Paragraph> </Section> </Paper>