<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-3029">
  <Title>Maximal Match Chinese Segmentation Augmented by Resources Generated from a Very Large Dictionary for Post-Processing</Title>
  <Section position="7" start_page="548" end_page="548" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have reported our results on two open tracks of the Second International Chinese Word Segmentation Bakeoff, obtained with a production segmentation system that draws heavily on a large and unique dictionary. The dictionary is derived from processing a very large amount of synchronous textual data gathered from various Chinese speech communities and segmented according to a uniform standard. Our results show that the primary dictionary-based BMM segmentation alone contributes the most to our segmentation system, achieving over 95% recall and over 90% precision. We attribute this to the large size of the dictionary, although our uniform segmentation standard may not have realized its full potential, given that the test corpora follow different and changing standards. We also explored supplementary features offered by the large dictionary in post-processing, and the results improved incrementally.</Paragraph>
    <Paragraph position="1"> Hence our large dictionary, derived from our uniform treatment of synchronous data, is a useful resource and a good platform for further extension in various aspects of language engineering.</Paragraph>
  </Section>
</Paper>