<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1724">
  <Title>Integrating Ngram Model and Case-based Learning For Chinese Word Segmentation</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Performance and analysis
</SectionTitle>
    <Paragraph position="0"> The performance of our system in the bakeoff is presented in Table 1 in terms of precision (P), recall (R) and F score in percentages, where c denotes closed tests. Its IV word identification performance is remarkable.</Paragraph>
    <Paragraph position="1"> However, its overall performance is not in balance with this, due to the lack of a module for OOV word discovery. It gets only a small number of OOV words correct, and only by chance. The higher the OOV proportion in the test set, the worse its F score. The relatively high R_oov for the PKc track is mostly the result of number recognition with regular expressions.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Error analysis
</SectionTitle>
      <Paragraph position="0"> Most errors on IV words are due to the side-effect of the context-dependent transformation rules. The rules resolve most remaining ambiguities and correct many errors, but at the same time they also corrupt some proper segmentations. This side-effect is most likely to occur when there is inadequate context information to decide which rules to apply.</Paragraph>
      <Paragraph position="1"> There are two strategies to remedy, or at least alleviate, this side-effect: (1) retrain the probabilistic segmentation, a conservative strategy; or (2) incorporate Brill's error-driven learning, with several rounds of transformation rule extraction and application, so that mistakes caused by rules in earlier rounds can be corrected by rules in later rounds.</Paragraph>
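The second strategy, Brill-style error-driven learning over several rounds, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the rule representation, the gain criterion, and all names here are assumptions made for demonstration.

```python
# Hypothetical sketch of Brill-style error-driven rule learning for
# segmentation repair. Each round picks the candidate rule with the
# highest net error reduction on the training corpus, applies it, and
# repeats, so later rules can undo mistakes introduced by earlier ones.

def extract_best_rule(corpus, gold, candidate_rules):
    """Return the candidate rule with the highest net error reduction."""
    def net_gain(rule):
        fixed = corrupted = 0
        for sent, ref in zip(corpus, gold):
            out = rule(sent)
            if out == ref and sent != ref:
                fixed += 1          # rule repaired a wrong segmentation
            elif out != ref and sent == ref:
                corrupted += 1      # rule broke a correct segmentation
        return fixed - corrupted
    return max(candidate_rules, key=net_gain)

def learn_rules(corpus, gold, candidate_rules, rounds=5):
    """Run several extraction/application rounds; return learned rules."""
    learned = []
    for _ in range(rounds):
        rule = extract_best_rule(corpus, gold, candidate_rules)
        corpus = [rule(sent) for sent in corpus]  # apply to whole corpus
        learned.append(rule)
    return learned
```

Counting corrupted segmentations in the gain criterion is what lets the learner avoid rules whose side-effects outweigh the errors they fix, which is exactly the trade-off discussed above.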
      <Paragraph position="2"> However, even worse than the above side-effect is a bug in our disambiguation module: it always applies the first available rule, leading to many unexpected errors, each of which may result in more than one erroneous word. For instance, of the 430 errors made by the system in the SA closed test, some 70 are due to this bug. A number of representative examples of these errors are presented in Table 2, together with some false errors resulting from inconsistency in the standard segmentation.</Paragraph>
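The first-available-rule bug can be illustrated with a toy example. The rules and words below are invented for demonstration and are not from the paper; the point is only the contrast between taking the first matching rule and preferring the most specific one.

```python
# Hypothetical illustration of the reported bug: taking the first rule
# whose pattern matches, rather than the most specific applicable rule.
# Example words: 的 (particle), 目的地 (destination, one word).

rules = [
    ("的", "的 "),          # generic rule: split after 的
    ("目的地", "目的地 "),  # specific rule: keep 目的地 as one word
]

def segment_buggy(text):
    """Applies the first matching rule (the bug described above)."""
    for pattern, replacement in rules:
        if pattern in text:
            return text.replace(pattern, replacement)
    return text

def segment_fixed(text):
    """Prefers the longest (most specific) matching pattern."""
    matching = [r for r in rules if r[0] in text]
    if not matching:
        return text
    pattern, replacement = max(matching, key=lambda r: len(r[0]))
    return text.replace(pattern, replacement)
```

On input 目的地, the buggy version fires the generic 的 rule and splits the word in two, while the fixed version keeps it whole: one wrong rule choice yields more than one erroneous word, matching the error pattern described above.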
    </Section>
  </Section>
</Paper>