<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1648">
  <Title>Language Modeling, and Shallow Morphology</Title>
  <Section position="7" start_page="412" end_page="412" type="concl">
    <SectionTitle>
5 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> The paper examined the correction of Arabic OCR text using single character and character segment based models, combined with language modeling and shallow morphological analysis.</Paragraph>
    <Paragraph position="1"> The effects of character position and smoothing were also examined. The results show that the character segment based model outperforms the single character based model.</Paragraph>
    <Paragraph position="2"> Further, the use of language modeling yielded improved error correction, particularly for the character segment based model. Accounting for character position and shallow morphological analysis had a negative impact on correction, while smoothing had a positive impact. Lastly, given a large in-domain corpus, extracting a correction dictionary and training a language model is a sufficient strategy for correcting OCR text in a morphologically rich language such as Arabic, yielding a 70% reduction in word error rate.</Paragraph>
    <Paragraph position="3"> For future work, a factor language model might prove beneficial for incorporating morphological information and other factors, such as part of speech tags, while overcoming training data sparseness problems. Also, it would be desirable to determine how large a corpus must be to generate a correction dictionary and to train a language model. Finally, word prediction might prove useful for cases where OCR grossly misrecognizes words.</Paragraph>
  </Section>
</Paper>