<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1648"> <Title>Language Modeling, and Shallow Morphology</Title> <Section position="7" start_page="412" end_page="412" type="concl"> <SectionTitle> 5 Conclusion and Future Work </SectionTitle> <Paragraph position="0"> This paper examined the correction of Arabic OCR text using single-character and character-segment models, combined with language modeling and shallow morphological analysis.</Paragraph> <Paragraph position="1"> The effects of character position and smoothing were also examined. The results show that the character-segment model outperforms the single-character model.</Paragraph> <Paragraph position="2"> Language modeling further improved error correction, particularly for the character-segment model. Accounting for character position and applying shallow morphological analysis both had a negative impact on correction, while smoothing had a positive impact. Lastly, given a large in-domain corpus, extracting a correction dictionary and training a language model is a sufficient strategy for correcting a morphologically rich language such as Arabic, yielding a 70% reduction in word error rate.</Paragraph> <Paragraph position="3"> For future work, a factored language model might prove beneficial for incorporating morphological information and other factors, such as part-of-speech tags, while overcoming training data sparseness. It would also be desirable to determine how large a corpus must be to generate a correction dictionary and train a language model. Finally, word prediction might prove useful for cases where OCR grossly misrecognizes words.</Paragraph> </Section> </Paper>