File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/96/w96-0108_concl.xml

Size: 1,402 bytes

Last Modified: 2025-10-06 13:57:41

<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0108">
  <Title>A Statistical Approach to Automatic OCR Error Correction in Context</Title>
  <Section position="6" start_page="95" end_page="95" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> The system we have created uses information from a variety of sources (letter n-grams, character confusion probabilities, and word-bigram probabilities) to perform automatic, context-based correction of word errors. It can correct both non-word errors and real-word errors. The system can also learn character confusion probability tables as it corrects OCR text, and it uses this information to improve its performance. Overall, for complete error correction (of both real-word and non-word errors), it achieved a 60.2% error-reduction rate.</Paragraph>
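A minimal sketch of how the three information sources named above might be combined, assuming a standard noisy-channel formulation. Each candidate correction w for an observed token is scored by log P(observed | w), taken from a character confusion table, plus log P(w | previous word), taken from a word-bigram model, and the highest-scoring candidate is chosen. The lexicon, the probability tables, the floor values, and the functions `letter_ngrams`, `candidates`, `channel_logprob`, `bigram_logprob`, `correct`, and `learn_confusions` are all hypothetical illustrations, not the authors' implementation. Using shared letter bigrams to propose candidates is likewise an assumption, and only same-length character substitutions are modeled (no insertions, deletions, or word-boundary errors).

```python
import math
from collections import Counter

# Hypothetical toy lexicon and probability tables (illustration only).
LEXICON = ["the", "then", "than", "cat", "car", "sat", "set"]

# P(observed char | intended char), keyed as (intended, observed).
CONFUSION = {
    ("e", "c"): 0.05,  # intended 'e' misread as 'c'
    ("a", "e"): 0.03,  # intended 'a' misread as 'e'
    ("t", "r"): 0.02,  # intended 't' misread as 'r'
}
P_CORRECT_CHAR = 0.95  # assumed chance a character is read correctly
P_OTHER_ERROR = 1e-4   # assumed floor for confusions missing from the table

# P(word | previous word), with an assumed floor for unseen bigrams.
BIGRAMS = {("the", "cat"): 0.02, ("the", "car"): 0.01, ("cat", "sat"): 0.1}
P_UNSEEN_BIGRAM = 1e-5


def letter_ngrams(word, n=2):
    """Letter n-grams of a word padded with '#' boundary markers."""
    padded = "#" + word + "#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def candidates(observed, min_shared=1):
    """Lexicon words sharing at least min_shared letter bigrams with observed."""
    grams = letter_ngrams(observed)
    return [w for w in LEXICON
            if len(grams.intersection(letter_ngrams(w))) >= min_shared]


def channel_logprob(observed, intended):
    """log P(observed | intended), modeling same-length substitutions only."""
    if len(observed) != len(intended):
        return -math.inf  # insertions/deletions are out of scope here
    total = 0.0
    for o, i in zip(observed, intended):
        if o == i:
            total += math.log(P_CORRECT_CHAR)
        else:
            total += math.log(CONFUSION.get((i, o), P_OTHER_ERROR))
    return total


def bigram_logprob(prev, word):
    """log P(word | prev) from the toy bigram table."""
    return math.log(BIGRAMS.get((prev, word), P_UNSEEN_BIGRAM))


def correct(prev, observed):
    """Return the candidate maximizing channel + bigram log-probability."""
    scores = {w: channel_logprob(observed, w) + bigram_logprob(prev, w)
              for w in candidates(observed)}
    return max(scores, key=scores.get) if scores else observed


def learn_confusions(pairs):
    """Re-estimate P(observed char | intended char) from (observed, corrected) pairs."""
    errors, totals = Counter(), Counter()
    for observed, corrected in pairs:
        if len(observed) != len(corrected):
            continue  # skip pairs this substitution-only model cannot align
        for o, i in zip(observed, corrected):
            totals[i] += 1
            if o != i:
                errors[(i, o)] += 1
    return {pair: n / totals[pair[0]] for pair, n in errors.items()}


print(correct("the", "cer"))  # car (non-word error)
print(correct("cat", "set"))  # sat (real-word error)
```

With these toy tables, "cer" after "the" is corrected to "car", while "set" after "cat" is a real word that the bigram context alone pushes toward "sat". The `learn_confusions` helper suggests how accepted corrections could be fed back to re-estimate the confusion table; a practical system would also need smoothing and an alignment that handles insertions and deletions.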
    <Paragraph position="1"> The techniques we have used are subject to certain systematic problems. Nevertheless, we believe they will prove useful not only for improving the quality of OCR processing but also for enhancing a variety of information retrieval applications.</Paragraph>
    <Paragraph position="2"> In future work, we plan to explore different heuristics for handling word-boundary problems and to incorporate other models of context representation, including both statistical language modeling (SLM) approaches, such as word trigram models, and simple discourse structures.</Paragraph>
  </Section>
</Paper>