<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0134"> <Title>Sydney, July 2006. (c)2006 Association for Computational Linguistics A Pragmatic Chinese Word Segmentation System</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A word is a logical semantic and syntactic unit in natural language. Unlike English, Chinese has no delimiter to mark word boundaries, so word segmentation, which transforms a Chinese character string into a word sequence, is a foundational step in most Chinese NLP tasks. It is a prerequisite for POS tagging, parsing, and further applications such as information extraction and question answering systems.</Paragraph> <Paragraph position="1"> Our system participated in the Third International Chinese Word Segmentation Bakeoff, which was held in 2006. Compared with our system in the previous bakeoff (Jiang 2005A), the system for the third bakeoff has been adjusted to achieve better pragmatic performance. This paper focuses on two sub-tasks: (1) basic word segmentation; (2) named entity recognition. We apply different approaches to these two tasks, and all the modules are integrated into a pragmatic system (ELUS).</Paragraph> </Section> </Paper>