<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2056">
  <Title>Unsupervised Segmentation of Chinese Text by Use of Branching Entropy</Title>
  <Section position="10" start_page="433" end_page="434" type="concl">
    <SectionTitle>
8 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have reported an unsupervised Chinese segmentation method based on branching entropy. The method rests on the assumption that &quot;if the entropy of successive tokens increases, the location is at the context border.&quot; The entropies of n-grams were learned from an unsegmented 200-MB corpus, and segmentation was then performed directly according to this assumption on 1 MB of test data. We found that precision was as high as 90%, with recall around 80%.</Paragraph>
    <Paragraph position="1"> We also found a striking tendency for precision to remain high regardless of the size of the learning data.</Paragraph>
    <Paragraph position="2"> There are two important considerations for our future work. The first is to figure out how to combine the supervised and unsupervised methods. In particular, as the performance of the supervised methods could be insufficient for data that are not from newspapers, combining the supervised and unsupervised methods may achieve higher accuracy on general data. The second is to verify our basic assumption in other languages. In particular, we should undertake experimental studies in languages written with phonogram characters.</Paragraph>
  </Section>
</Paper>
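
The boundary assumption quoted in the conclusion can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single fixed context length `n`, treats unseen contexts as having zero entropy, and cuts wherever the entropy rises from one position to the next. The function names `train_branching_entropy` and `segment` are hypothetical, and the procedure in the paper may differ in its choice of n-gram lengths and decision rule.

```python
import math
from collections import Counter, defaultdict


def train_branching_entropy(corpus, n):
    """Map each n-character context to the entropy of the character following it.

    Assumption for this sketch: one fixed context length n.
    """
    successors = defaultdict(Counter)
    for i in range(len(corpus) - n):
        successors[corpus[i:i + n]][corpus[i + n]] += 1

    entropy = {}
    for context, counts in successors.items():
        total = sum(counts.values())
        entropy[context] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def segment(text, entropy, n):
    """Cut after index k when the entropy at k exceeds the entropy at k - 1."""
    # h[k] is the branching entropy of the n-gram ending at index k.
    # Unseen or too-short contexts get 0.0 (an assumption of this sketch).
    h = [
        entropy.get(text[k - n + 1:k + 1], 0.0) if k >= n - 1 else 0.0
        for k in range(len(text))
    ]

    pieces, start = [], 0
    # The last index is skipped so that no empty trailing piece is produced.
    for k in range(1, len(text) - 1):
        if h[k] > h[k - 1]:
            pieces.append(text[start:k + 1])
            start = k + 1
    pieces.append(text[start:])
    return pieces
```

In a toy corpus such as `"abxabyabz"`, the character after `a` is always `b` (entropy 0), while the character after `b` varies, so the entropy rises at `b` and a cut is placed after it. Real use would need a large unsegmented corpus, as the conclusion describes.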