<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1107"> <Title>Chinese Chunking with another Type of Spec</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The chunking specification (spec) is a critical issue for automatic chunking.</Paragraph> <Paragraph position="1"> This paper proposes a solution to Chinese chunking with another type of spec, which is derived not from a complete syntactic tree but only from an unbracketed, POS-tagged corpus. Using this spec, a chunked dataset is built, and a hidden Markov model (HMM) is used to build the chunker. Transformation-based learning (TBL) error correction is then applied to further improve chunking performance. The average chunk length is about 1.38 tokens, the chunking F-measure reaches 91.13%, labeling accuracy alone reaches 99.80%, and the ratio of crossing brackets is 2.87%. We also find that the hardest part of Chinese chunking is identifying the chunk boundary inside</Paragraph> </Section> </Paper>