<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1107">
  <Title>Chinese Chunking with another Type of Spec</Title>
  <Section position="10" start_page="5" end_page="5" type="relat">
    <SectionTitle> 7 Related Work </SectionTitle>
    <Paragraph position="0"> For the chunking specification, the CoNLL-2000 shared task provides a program, chunklink, to extract chunks from the English Treebank. (Li, 2003) defines a similar Treebank-derived specification for Chinese and reports that manual checking is still needed to make the data consistent. Part of the Sparkle project concentrated on a specification based on unbracketed corpora of English, Italian, French and German (Carroll et al., 1997). (Zhou, 2002) defines base phrases for Chinese, which are similar to chunks, but his annotation and experiments use his own corpus.</Paragraph>
    <Paragraph position="1"> For chunking algorithms, once chunking is represented as a tagging problem, many machine learning (ML) methods have been applied with promising results, for example SVM (Kudoh and Matsumoto, 2001), memory-based learning (Bosch and Buchholz, 2002) and SNoW (Li and Roth), among others.</Paragraph>
    <Paragraph position="2"> Rule-based chunking (Kinyon, 2003) and approaches that combine rules with learning (Park and Zhang, 2003) have also been reported.</Paragraph>
    <Paragraph position="3"> For annotation, (Brants, 2000) reports an inter-annotator agreement of 98.57% for part-of-speech annotation and 92.43% for structural annotation, along with several consistency measures. (Xue et al., 2002) also address some issues related to building a large-scale Chinese corpus.</Paragraph>
  </Section>
</Paper>