<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1125"> <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics A Phonetic-Based Approach to Chinese Chat Text Normalization</Title> <Section position="10" start_page="999" end_page="999" type="concl"> <SectionTitle> 7 Conclusions </SectionTitle> <Paragraph position="0"> To address the sparse data problem and the dynamic problem in Chinese chat text normalization, this paper proposes phonetic mapping models that represent mappings between chat terms and standard words. Unlike character mappings, the phonetic mappings are constructed from an available standard Chinese corpus. We extend the source channel model by incorporating the phonetic mapping models. Three conclusions can be drawn from our experiments.</Paragraph> <Paragraph position="1"> Firstly, XSCM outperforms SCM when both are given the same training data. Secondly, XSCM consistently achieves higher performance on time-varying test sets. Thirdly, both SCM and XSCM perform best with the largest chat language training corpus.</Paragraph> <Paragraph position="2"> Some questions remain open regarding the optimal size of the chat language training corpus for XSCM. Does an optimal size exist? If so, what is it? These questions will be addressed in our future work. Moreover, broader context, such as discourse, will be considered in chat term normalization.</Paragraph> </Section> </Paper>