<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2136">
  <Title>Context-Based Spelling Correction for Japanese OCR</Title>
  <Section position="8" start_page="809" end_page="810" type="concl">
    <SectionTitle>
7 Discussion
</SectionTitle>
    <Paragraph position="0"> Previous works on Japanese OCR error correction are based on either the character trigram model or the part of speech bigram model. Their targets are printed characters, not handwritten characters.</Paragraph>
    <Paragraph position="1"> That is, they assume the underlying OCR's accuracy is over 90%. Moreover, their treatment of unknown words and short words is rather ad hoc.</Paragraph>
    <Paragraph position="2"> (Takao and Nishino, 1989) used part of speech bigram and best first search for OCR correction. They used heuristic templates for unknown words. (Ito and Maruyama, 1992) used part of speech bigram and beam search in order to get multiple candidates in their interactive OCR corrector. The proposed Japanese spelling correction method uses part of speech trigram and N-best search. This combination is theoretically and practically more accurate than previous methods. In addition, by using a statistical word model and context-based approximate word match, it becomes robust enough to handle very noisy texts, such as the output of FAX OCR systems.</Paragraph>
    <Paragraph position="3"> To improve the word correction accuracy, more powerful language models, such as word bigram, are required. (Jelinek, 1985) pointed out that &quot;POS (part of speech) classification is too crude and not necessarily suited to language modeling&quot;. However, it is too expensive to prepare a large manually segmented corpus of each target domain to compute the word bigram. Therefore, we are thinking of making a self-organized word segmentation method by generalizing the Forward-Backward algorithm for those languages that have no delimiter between words (Nagata, 1996).</Paragraph>
  </Section>
</Paper>