<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1311">
  <Title>Detection of Language (Model) Errors</Title>
  <Section position="4" start_page="91" end_page="92" type="metho">
    <SectionTitle>
3 Evaluation
</SectionTitle>
    <Paragraph position="0"> In the evaluation, the training data is the PH corpus and the test data consists of YZZK magazine articles (4+ Mbytes) downloaded from the Internet. In handwritten character recognition, the optimal number of candidates is 6 (Wong and Chan, 1995). For robustness, each recognized character in our evaluation is selected from 10 candidates.</Paragraph>
    <Paragraph position="1"> We measured performance in terms of recall, precision, and the reduction in manual effort when scanning the text for errors. Recall is the number of identified errors over the total number of errors. Precision is the number of identified errors over the total number of cases classified as errors. The saving in manual scanning is called the skip ratio, which is the number of blocks classified as correct over the total number of blocks. The recall and the skip ratio are more important than the precision because subsequent error correction (manual or automatic) can improve the recognition accuracy.</Paragraph>
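The three measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, field names, and the example counts are hypothetical and do not come from the evaluation, and the sketch assumes every block receives exactly one of two labels (error or correct), so that the blocks classified as correct are the total minus the flagged ones.

```python
from dataclasses import dataclass


@dataclass
class BlockCounts:
    """Hypothetical tallies over text blocks (illustrative names only)."""
    true_errors_flagged: int  # erroneous blocks that were classified as errors
    total_errors: int         # all erroneous blocks in the test data
    total_flagged: int        # all blocks classified as errors
    total_blocks: int         # all blocks in the test data


def recall(c):
    # identified errors over the total number of errors
    return c.true_errors_flagged / c.total_errors


def precision(c):
    # identified errors over all cases classified as errors
    return c.true_errors_flagged / c.total_flagged


def skip_ratio(c):
    # blocks classified as correct over the total number of blocks;
    # assumes each block is labelled either "error" or "correct"
    return (c.total_blocks - c.total_flagged) / c.total_blocks


# Made-up counts for illustration; these are NOT the paper's data.
example = BlockCounts(true_errors_flagged=160, total_errors=200,
                      total_flagged=300, total_blocks=1000)

print(f"recall={recall(example):.2f}",
      f"precision={precision(example):.2f}",
      f"skip_ratio={skip_ratio(example):.2f}")
```

A high skip ratio means a proofreader can pass over most blocks, while a high recall means few errors hide in the skipped blocks, which is why the text weights these two above precision.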
    <Paragraph position="2"> It is possible to combine recall and precision into a single value using the F measure (Van Rijsbergen, 1979), but the weight that sets their relative importance is subjective.</Paragraph>
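To make the subjectivity concrete, the sketch below uses the common weighted form of the F measure, F_beta = (1 + beta^2) P R / (beta^2 P + R), where beta greater than 1 favours recall. The function name and the choice of beta values are assumptions made for illustration; the recall and precision inputs are the Bayesian classifier figures reported in this section, and the resulting F values are computed here, not taken from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted F measure; beta above 1 gives recall more weight."""
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Recall and precision of the Bayesian classifier as reported in this section.
single = {"precision": 0.51, "recall": 0.83}  # one classifier
triple = {"precision": 0.60, "recall": 0.79}  # three classifiers

for beta in (1.0, 3.0):  # illustrative weights, chosen arbitrarily
    f_single = f_measure(single["precision"], single["recall"], beta)
    f_triple = f_measure(triple["precision"], triple["recall"], beta)
    print(f"beta={beta}: single={f_single:.3f} triple={f_triple:.3f}")
```

With these inputs, beta = 1 ranks the three-classifier configuration higher, whereas beta = 3 ranks the single classifier higher, so the preferred configuration depends on a weight that has to be chosen by judgment.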
    <Paragraph position="3"> Table 1 shows the classification performance of the Bayesian classifier. The recall of errors by the Bayesian classifier dropped slightly, from 83% using a single classifier to 79% using 3 classifiers, but the precision improved from 51% to 60%.</Paragraph>
    <Paragraph position="4"> Also, the skip ratio is 65%, which is much higher than the skip ratio of 0.1% obtained without the classifier. Although the MLP has a higher precision (80%), its recall is slightly lower than that of the Bayesian classifier. The skip ratio of the both  of the 3 types of classifiers in detecting language model errors.</Paragraph>
  </Section>
</Paper>