<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0135"> <Title>NetEase Automatic Chinese Word Segmentation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0">Automatic Chinese Word Segmentation (WS) is the fundamental task of Chinese information processing [Liu, 2000]. Since many tasks depend on the automatic segmentation of Chinese words, and different Chinese NLP-enabled applications may have different requirements, different granularities of word segmentation may be needed. The key to accurate automatic word identification in Chinese lies in successfully resolving segmentation ambiguities and properly handling out-of-vocabulary (OOV) words, such as person names, place names and organization names.</Paragraph> <Paragraph position="1"> We have applied a corpus-based method to extract various language phenomena from real texts, and we have combined a statistical model with rules for Chinese word segmentation. This combination has increased segmentation precision by improving ambiguous phrase segmentation and out-of-vocabulary word recognition.</Paragraph> <Paragraph position="2"> In the second section of this paper, we describe a Chinese word segmentation system developed by NetEase and present our strategies for resolving ambiguous phrase segmentation and for identifying Chinese person names and place names. The third section analyzes the evaluation results.</Paragraph> </Section> </Paper>