<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0133"> <Title>Maximum Entropy Word Segmentation of Chinese Text</Title> <Section position="5" start_page="187" end_page="187" type="metho"> <SectionTitle> 3 Further testing </SectionTitle> <Paragraph position="0"> To get some idea of how each of our additions to Low et al.'s system contributed to our results, we ran a number of experiments with the gold-standard segmentations distributed after the completion of the bakeoff. We stripped out all of the additions and then added them back in one by one, segmenting and scoring the test data each time. We found that our system actually performed best with the implementation of the Viterbi algorithm (which raised F scores by an average of about 0.09 compared to simply choosing the most likely tag at each stage) but without any of the extra outcome-dependent or outcome-independent features. There were only two exceptions to this: * The system achieved slightly higher OOV recall rates for the MSRA and CITYU corpora with the place-char and deng-list features than without them.</Paragraph> <Paragraph position="1"> * The system achieved a very small increase in F score for the UPUC corpus with the place-char feature.</Paragraph> <Paragraph position="2"> Aside from these small differences, the model was best off without any of the features enumerated in Sections 1.3 and 1.4, obtaining the scores listed in Table 2. This is a surprising result, as in our testing the added features helped to improve the F scores and OOV recall rates of the system when dealing with the 2005 bakeoff data, even if only by a small amount in some cases.</Paragraph> <Paragraph position="3"> It should be noted that in our testing during development, even when we strove to create a system which matched as closely as possible the one described by Low et al. (2005), we were unable to achieve scores for the 2005 bakeoff data as high as their system did. Why this was the case remains a mystery to us.
It is possible that at least some of the gap is due to implementation differences; in particular, the maximum entropy toolkit we used and the training algorithms we chose seem likely sources of the disparity.</Paragraph> </Section> </Paper>
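The contrast the section draws between Viterbi decoding and simply choosing the most likely tag at each position can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a character-based tagger in the style of Low et al. with hypothetical tags b/m/e/s (begin, middle, end, single-character word) and per-character conditional probabilities `cond_probs[i][prev_tag][tag]`, as a maximum entropy model with the previous tag as a feature would produce.

```python
import math

# Hypothetical tag set for character-based segmentation (Low et al. style):
# b = begin, m = middle, e = end of a word, s = single-character word.
TAGS = ["b", "m", "e", "s"]

def viterbi(cond_probs):
    """Return the globally best tag sequence.

    cond_probs[i][prev][tag] is an assumed conditional probability
    P(tag | prev_tag, context) for character i; cond_probs[0] is
    indexed by a dummy previous tag "<s>".
    """
    n = len(cond_probs)
    # best[i][tag] = (log-prob of best path ending in tag at i, backpointer)
    best = [{} for _ in range(n)]
    for t in TAGS:
        best[0][t] = (math.log(cond_probs[0]["<s>"][t]), None)
    for i in range(1, n):
        for t in TAGS:
            score, back = max(
                (best[i - 1][p][0] + math.log(cond_probs[i][p][t]), p)
                for p in TAGS
            )
            best[i][t] = (score, back)
    # Trace back from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[n - 1][t][0])
    path = [tag]
    for i in range(n - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

def greedy(cond_probs):
    """Baseline: pick the single most likely tag at each position."""
    path = []
    prev = "<s>"
    for dist in cond_probs:
        prev = max(TAGS, key=lambda t: dist[prev][t])
        path.append(prev)
    return path
```

The two decoders can disagree whenever a locally best tag leads into a low-probability continuation; Viterbi maximizes the probability of the whole sequence, which is consistent with the average F-score gain of about 0.09 the section reports over the greedy baseline.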