<?xml version="1.0" standalone="yes"?> <Paper uid="I05-3031"> <Title>Two-Phase LMR-RC Tagging for Chinese Word Segmentation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Chinese word segmentation is a non-trivial task because no explicit delimiters (like spaces in English) are used to separate words. As the task is an important precursor to many natural language processing systems, it has received considerable attention in the literature over the past decade (Wu and Tseng, 1993; Sproat et al., 1996). In this paper, we propose a statistical approach based on the work of Xue and Shen (2003), in which the Chinese word segmentation problem is first transformed into a tagging problem and a Maximum Entropy classifier is then applied to solve it. We further improve the scheme by introducing correctional treatments after the first round of tagging. Two different training methods are proposed to suit our scheme.</Paragraph> <Paragraph position="1"> The paper is organized as follows. In Section 2, we briefly discuss the scheme proposed by Xue and Shen (2003), followed by our additional work to improve its performance. Experimental and bakeoff results are presented in Section 3. Finally, we conclude the paper in Section 4.</Paragraph> </Section> </Paper>