<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1119">
<Title>A Semi-Supervised Approach to Build Annotated Corpus for Chinese Named Entity Recognition</Title>
<Section position="11" start_page="1" end_page="1" type="concl">
<SectionTitle> 5 Conclusion </SectionTitle>
<Paragraph position="0"> This paper presents a semi-supervised method that reduces the human effort needed to build an annotated corpus.</Paragraph>
<Paragraph position="1"> The method uses a small human-annotated subset of the corpus to boost the annotation quality of the entire corpus. We test this method on Gao's Chinese word segmentation system, which achieves state-of-the-art performance on the SIGHAN Bakeoff data sets (Gao et al., 2004).</Paragraph>
<Paragraph position="2"> Several conclusions can be drawn from our experiments. First, the obtained corpus is of high quality.</Paragraph>
<Paragraph position="3"> Second, considering the trade-off between the cost of human labor and the performance of the resulting segmenter, 20 million characters is the optimal size of the hand-annotated subset for boosting the 80-million-character training data.</Paragraph>
<Paragraph position="4"> Third, our method saves 62.5% of the human labor required for corpus annotation.</Paragraph>
</Section>
</Paper>