<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1725"> <Title>A Unicode based Adaptive Segmentor</Title> <Section position="7" start_page="3" end_page="3" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, the design and algorithms of a general-purpose Unicode-based segmentor are proposed. It is able to process Simplified and Traditional Chinese appearing in the same text. Sophisticated pre-processing and other auxiliary modules help segment text more accurately. User interactions and new modules can be easily added thanks to its modular design. A built-in new word extractor is also implemented for extracting new words from running text. This saves much training time, so the segmentor can be quickly adapted to new environments.</Paragraph> </Section> </Paper>