<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1119">
  <Title>A Semi-Supervised Approach to Build Annotated Corpus for Chinese Named Entity Recognition</Title>
  <Section position="11" start_page="1" end_page="1" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper presents a semi-supervised method that reduces the human effort needed to build an annotated corpus.</Paragraph>
    <Paragraph position="1"> The method uses a small hand-annotated subset to improve the annotation quality of the entire corpus. We test it on Gao's Chinese word segmentation system, which achieves state-of-the-art performance on the SIGHAN bakeoff data sets (Gao et al., 2004).</Paragraph>
    <Paragraph position="2"> Several conclusions can be drawn from our experiments: The obtained corpus is of high quality.</Paragraph>
    <Paragraph position="3"> A hand-annotated subset of 20 million characters is the optimal size for boosting the 80-million-character training data, considering the trade-off between the cost of human labor and the performance of the resulting segmenter.</Paragraph>
    <Paragraph position="4"> We save 62.5% of the human labor in corpus annotation.</Paragraph>
  </Section>
</Paper>