<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0120"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics On Closed Task of Chinese Word Segmentation: An Improved CRF Model Coupled with Character Clustering and Automatically Generated Template Matching</Title> <Section position="6" start_page="136" end_page="136" type="concl"> <SectionTitle> 4 Conclusion </SectionTitle> <Paragraph position="0"> The contribution of this paper is twofold. First, we successfully apply the K-means algorithm to character clustering and develop several cluster set selection algorithms for our GS tagger. This significantly improves the handling of sentences containing non-Chinese words, as well as overall performance. Second, we develop a post-processing method that compensates for the weakness of ML-based Chinese word segmentation (CWS) on longer words.</Paragraph> </Section> </Paper>