<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1718"> <Title>Single Character Chinese Named Entity Recognition</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Research on named entity recognition (NER) has become very popular in recent years, owing to its wide range of applications and to the Message Understanding Conference (MUC), which provides a standard test-bed for NER evaluation. Recent research on English NER includes (Collins, 2002; Isozaki, 2002; Zhou, 2002; etc.). Chinese NER research includes (Liu, 2001; Zheng, 2000; Yu, 1998; Chen, 1998; Shen, 1995; Sun, 1994; Zhang, 1992; etc.). Among Chinese NEs there is a special kind, the single character named entity (SCNE), on which there has been little in-depth research. An SCNE is an NE composed of only one Chinese character, such as the location names &quot;中&quot; (zhong1, China) and &quot;俄&quot; (e2, Russia) in the phrase &quot;中俄贸易&quot; (zhong1-e2-mao4-yi4, trade between China and Russia). SCNEs are very common in written Chinese text. For instance, SCNEs account for 8.17% of all NE tokens according to our statistics on a 10MB corpus. However, owing to the lack of research, SCNEs are a major source of errors in NER. Among the three state-of-the-art systems available to us, the best F-scores for single character location (SCL) and single character person (SCP) recognition are 43.63% and 43.48%, respectively. This paper formulates SCNE recognition within the source-channel model framework. Our results are very encouraging: we achieve an F-score of 81.01% for SCL recognition and 68.02% for SCP recognition. An alternative view of the SCNE recognition problem is to formulate it as a classification task. For example, &quot;中&quot; is an SCNE in &quot;中俄贸易&quot;, but not in &quot;北京四中&quot; (bei3-jing1-si4-zhong1, Beijing No.4 High School). 
We construct two classifiers, based respectively on two statistical models: the maximum entropy (ME) model and the vector space model (VSM). We compare these two classifiers with the source-channel model and show that the source-channel model is slightly better. Finally, we compare the source-channel model with three other state-of-the-art NER systems.</Paragraph> <Paragraph position="1"> The remainder of this paper is structured as follows: Section 2 introduces the task of SCNE recognition and related work. Sections 3 and 4 present the source-channel model and the two classifiers for SCNE recognition, respectively. Section 5 presents experimental results and error analysis. Section 6 concludes the paper.</Paragraph> </Section> </Paper>