File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/w03-1506_intro.xml
Size: 2,175 bytes
Last Modified: 2025-10-06 14:02:01
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1506"> <Title>Multi-Language Named-Entity Recognition System based on HMM</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> There is increasing demand for cross-language information retrieval. Thanks to the development of the World Wide Web, we can access information written not only in our native language but also in foreign languages. According to one report, English is the dominant language of web pages (76.6 %), followed by Japanese (2.77 %), German (2.28 %), Chinese (1.69 %), French (1.09 %), Spanish (0.81 %), and Korean (0.65 %) [1]. Internet users who are not fluent in English find this situation far from satisfactory; the many useful information sources in English are not open to them.</Paragraph> <Paragraph position="1"> To implement a multi-language information retrieval system, it is indispensable to develop multi-language text analysis techniques such as morphological analysis and named-entity recognition. These techniques are needed in many natural language processing applications such as machine translation, information retrieval, and information extraction.</Paragraph> <Paragraph position="2"> We developed a multi-language named-entity recognition system based on a hidden Markov model (HMM). The system is mainly intended for Japanese, Chinese, Korean, and English, but it can handle any other language for which training data are available. The system has a common analytical engine; only the lexical analysis rules and the statistical language model need to be changed to handle another language. Previous work on multi-language named-entity recognition has mainly targeted European languages [2]. As far as we know, our system is the first that can handle Asian languages.</Paragraph> <Paragraph position="3"> In the following sections, we first describe the system architecture and language model of our named-entity recognition system. 
We then describe the evaluation results of our system. Finally, we report preliminary experiments on the automatic construction of a bilingual named-entity dictionary.</Paragraph> </Section> </Paper>