<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1724">
  <Title>Integrating Ngram Model and Case-based Learning For Chinese Word Segmentation</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusion and future work
</SectionTitle>
    <Paragraph position="0"> We have presented our recent work for participation in ICWSB-1, which combines a general-purpose ngram model for probabilistic word segmentation with a case-based learning strategy for disambiguation. The ngram model is trained on available unsegmented texts using the EM algorithm, with the aid of Viterbi segmentation. The learning strategy acquires a set of context-dependent transformation rules to correct mistakes in the probabilistic segmentation of ambiguous substrings. This integrated approach demonstrates its effectiveness through outstanding performance on in-vocabulary (IV) word identification. With the bug and the false errors eliminated, its performance could be significantly better.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.1 Future work
</SectionTitle>
      <Paragraph position="0"> The above problem analysis points to two main directions for improvement in our future work: (1) out-of-vocabulary (OOV) word detection; (2) a better strategy for learning and applying transformation rules, so as to reduce their side effects. In addition, we are interested in studying the effectiveness of higher-order ngram models and variants of EM training for Chinese word segmentation.</Paragraph>
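      <Paragraph position="1"> As an illustration only, and not the system described in this paper, the Viterbi segmentation step that an ngram segmenter relies on can be sketched under a unigram simplification. The function name viterbi_segment, the parameters max_len and unk_prob, and the toy probabilities are all assumptions made for this sketch; a higher-order model would condition each word on its predecessors rather than score words independently.

```python
import math


def viterbi_segment(text, word_prob, max_len=4, unk_prob=1e-8):
    """Return the most probable segmentation of text under a unigram model.

    word_prob maps known words to probabilities (hypothetical values here).
    Unknown single characters receive the floor probability unk_prob, so
    every input string has at least one valid segmentation.
    """
    n = len(text)
    # best[i] = (log-probability of the best segmentation of text[:i],
    #            start index of the last word in that segmentation)
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            prob = word_prob.get(word)
            if prob is None:
                # Unknown multi-character strings are not proposed as words.
                if len(word) != 1:
                    continue
                prob = unk_prob
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    i = n
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]


# Toy probabilities, invented for this example only.
toy_probs = {"研究": 0.3, "生命": 0.3, "研究生": 0.1, "命": 0.05, "起源": 0.2}
print(viterbi_segment("研究生命起源", toy_probs))
```

With these toy values the ambiguous substring is resolved in favour of the higher-probability path, because the product of the word probabilities on that path exceeds that of the competing segmentation. In EM training, segmentations of this kind would be used to re-estimate the model parameters.</Paragraph>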
    </Section>
  </Section>
</Paper>