WORD CLASS DISCOVERY FOR POSTPROCESSING 
CHINESE HANDWRITING RECOGNITION 
Chao-Huang Chang 
E000/CCL, Building 11, Industrial Technology Research Institute 
Chutung, Hsinchu 31015, TAIWAN, R.O.C. 
Summary 
This article presents a novel Chinese class n-gram
model for contextual postprocessing of handwriting
recognition results. The word classes in the model
are automatically discovered by a corpus-based simulated
annealing procedure. Three other language models,
least-word, word-frequency, and the powerful inter-word
character bigram model, have been constructed
for comparison. Extensive experiments on large text
corpora show that the discovered class bigram model
outperforms the other three competing models.
1. INTRODUCTION 
Class-based language models (Brown et al., 1992) have
been proposed for dealing with two problems confronted
by the well-known word n-gram language models:
(1) data sparseness: the amount of training data
is insufficient for estimating the huge number of parameters;
and (2) domain robustness: the model is not
adaptable to new application domains. The classes
can be either linguistic categories or statistical word
clusters. The former include morphological features
(Lee L. et al., 1993), grammatical parts-of-speech
(Derouault and Merialdo, 1986; Church, 1989; Chang and
Chen, 1993a), and semantic categories. The latter are
word classes discovered by the computer using statistical
characteristics in very large corpora. Several groups
have recently been working on corpus-based
word class discovery, such as Brown et al. (1992),
Jardino and Adda (1993), Schutze (1993), and Chang
and Chen (1993b). However, the practical value of
word class discovery needs to be proved by real-world
applications. In this paper, we apply the discovered
word classes to language models for contextual
postprocessing of Chinese handwriting recognition.
The Chinese language has more than 10,000 character
categories. Therefore, the problem of Chinese
character recognition is very challenging and has
attracted many researchers. The field is usually divided
into three types: on-line recognition, printed character
recognition, and handwriting recognition, in order
of difficulty. Recognition systems have been reported
to have character accuracies ranging from 60%
to 99%, depending on the recognizer, the type of
text, and the producer. Misrecognitions and/or
rejections are hard to avoid due to the problems of
different fonts, characters with similar shapes, character
segmentation, different writers, and algorithmic
imperfections. Therefore, contextual postprocessing of the
recognition results is very useful both in reducing the
number of recognition errors and in saving time in
human proofreading.
Contextual postprocessing of character recognition
results is not novel: Shinghal (1983) and Sinha (1988)
proposed approaches for English; Sugimura and Saito
(1985) dealt with the reject correction of Japanese
character recognition; and several researchers (Chou
B. and Chang, 1992; Lee H. et al., 1993) presented
approaches for postprocessing Chinese character
recognition, just to name a few.
Three large text corpora have been used in the
experiments: 10-million-character 1991ud for collecting
character bigrams and word frequencies,
540-thousand-character day7 for word class discovery, and
92-thousand-character poli2 for evaluating postprocessing
language models.
A simulated annealing approach is used for discov- 
ering the statistical word classes in the training cor- 
pus. The discovery process converges to an optimal 
class assignment to the words, with a minimal perplex- 
ity for a predefined number of classes. The discovered 
word classes are then used in the class bigram language 
model for postprocessing. 
We have used a state-of-the-art Chinese handwriting
recognizer (Li et al., 1992) developed by ATC, CCL,
ITRI, Taiwan as the basis of our experiments. The
CCL/HCCR handwritten character database (5401
character categories, 200 samples each category) (Tu et
al., 1991) was automatically sorted according to
character quality (Chou S. and Yu, 1993). The recognizer
produces N best category candidates for each character
sample in the test part of the database. The
postprocessor then uses as its input the category candidates
for the poli2 corpus and chooses one of the candidates
for each character as its output.
For comparison, we have also implemented three 
other language models: a least-word model, a word- 
frequency model, and the powerful inter-word char- 
acter bigram model (Lee L. et al., 1993). We have 
conducted extensive experiments with the discovered
class bigram (changing the number of classes) and
these three competing models on character samples
of different quality. The experimental results show
that our discovered class bigram model outperforms
the three competing models.
2. WORD CLASS DISCOVERY 
We describe in this section the problem of corpus-based
word class discovery and the simulated annealing
approach for the problem.
2.1 The problem 
Let $T = w_1, w_2, \ldots, w_L$ be a text corpus with $L$ words;
$V = v_1, v_2, \ldots, v_{NV}$ be the vocabulary composed of the
$NV$ distinct words in $T$; and $C = C_1, C_2, \ldots, C_{NC}$ be
the set of classes, where $NC$ is a predefined number of
classes. The word class discovery problem can be
formulated as follows: Given $V$ and $C$ (with a fixed $NC$),
find a class assignment $\phi$ from $V$ to $C$ which
maximizes the estimated probability of $T$, $\hat{p}(T)$, according
to a specific probabilistic language model.
For a class bigram model, find $\phi : V \rightarrow C$ to maximize

$$\hat{p}(T) = \prod_{i=1}^{L} p(w_i \mid \phi(w_i)) \, p(\phi(w_i) \mid \phi(w_{i-1}))$$
Alternatively, perplexity (Jardino and Adda, 1993)
or average mutual information (Brown et al., 1992) can
be used as the characteristic value for optimization.
Perplexity, $PP$, is a well-known quality metric for
language models in speech recognition: $PP = \hat{p}(T)^{-1/L}$.
The perplexity for a class bigram model is:

$$PP = \exp\left(-\frac{1}{L} \sum_{i=1}^{L} \ln\big(p(w_i \mid \phi(w_i)) \, p(\phi(w_i) \mid \phi(w_{i-1}))\big)\right)$$

where $w_i$ is the $i$-th word in the text and $\phi(w_i)$ is
the class that $w_i$ is assigned to.
For class N-gram models with fixed NC, lower per- 
plexity indicates better class assignment of the words. 
The word class discovery problem is thus defined: find 
the class assignment of the words to minimize the per- 
plexity of the training text. 
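
As an illustration of the class bigram perplexity defined above, the following minimal Python sketch evaluates it for a given class assignment. It rests on assumptions not stated in the text: the probabilities are plain relative-frequency estimates taken from the text itself, and the sum starts at the second word because the first word has no predecessor class. The function name `class_bigram_perplexity` is hypothetical.

```python
import math
from collections import Counter

def class_bigram_perplexity(words, phi):
    """Perplexity of a class bigram model estimated from `words` itself.

    words: list of word tokens (the text T)
    phi:   dict mapping every word to its class (the assignment phi)
    Assumption: relative-frequency estimates, no smoothing.
    """
    classes = [phi[w] for w in words]
    word_count = Counter(words)
    class_count = Counter(classes)
    pair_count = Counter(zip(classes, classes[1:]))
    left_count = Counter(classes[:-1])  # class counts as left bigram element

    log_prob = 0.0
    # Start at the second token so every term has a predecessor class.
    for i in range(1, len(words)):
        w, c, prev = words[i], classes[i], classes[i - 1]
        p_word = word_count[w] / class_count[c]              # p(w_i | phi(w_i))
        p_class = pair_count[(prev, c)] / left_count[prev]   # p(phi(w_i) | phi(w_{i-1}))
        log_prob += math.log(p_word * p_class)
    return math.exp(-log_prob / (len(words) - 1))
```

With every word in a single class the value reduces to a unigram perplexity, which is a convenient sanity check when comparing assignments.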
2.2 The simulated annealing approach 
The word class discovery problem can be considered
as a combinatorial optimization problem to be solved
with a simulated annealing approach. Jardino and
Adda (1993) used the approach for automatically
classifying French and German words. The four components
(Kirkpatrick et al., 1983) of a simulated annealing
algorithm are (1) a specification of configuration,
(2) a random move generator for rearrangements
of the elements in a configuration, (3) a cost function
for evaluating a configuration, and (4) an annealing
schedule that specifies time and duration to decrease
the control parameter (or temperature). The configuration
is clearly the class assignment $\phi$ for the word
class discovery problem. The move generator is also
straightforward: randomly choosing a word to be
reassigned to a randomly chosen class. Perplexity can
serve as the cost function to evaluate the quality of
word classification. The Metropolis algorithm specifies
the annealing schedule. The discovery procedure is
thus: (1) Initialize: Assign the words randomly to the
predefined number of classes to have an initial
configuration; (2) Move: Reassign a randomly selected word
to a randomly selected class (Monte Carlo principle);
(3) Accept or Backtrack: If the perplexity is changed
within a controlled limit (decreases or increases within
limit), the new configuration is accepted; otherwise,
undo the reassignment (Metropolis algorithm, see
below); and (4) Loop: Iterate the above two steps until
the perplexity converges.
Metropolis algorithm (Jardino and Adda, 1993):
The original Monte Carlo optimization, which accepts a new
configuration only if the perplexity decreases, suffers
from the local minimum problem. Metropolis et al.
proposed in 1953 that a worse configuration can be
accepted according to the control parameter cp. The new
configuration is accepted if $\exp(\Delta PP / cp)$ is greater
than a random number between 0 and 1, where $\Delta PP$ is
the difference of perplexities for two consecutive steps.
cp is decreased geometrically (multiplied by an
annealing factor af) after a fixed number of iterations.
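
The discovery procedure and the Metropolis acceptance rule can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments. Several points are assumptions: the perplexity difference is taken as the previous value minus the new one (so an improvement is always accepted), probabilities are unsmoothed relative frequencies, the perplexity is recomputed from scratch after every move for clarity, and the best assignment seen is returned as a convenience. The names `discover_classes`, `trials_per_step`, and `steps` are hypothetical.

```python
import math
import random
from collections import Counter

def perplexity(words, phi):
    """Class bigram perplexity with relative-frequency estimates."""
    classes = [phi[w] for w in words]
    wc, cc = Counter(words), Counter(classes)
    pairs, left = Counter(zip(classes, classes[1:])), Counter(classes[:-1])
    logp = sum(
        math.log(wc[w] / cc[c] * pairs[(p, c)] / left[p])
        for w, c, p in zip(words[1:], classes[1:], classes)
    )
    return math.exp(-logp / (len(words) - 1))

def discover_classes(words, num_classes, cp=0.1, af=0.9,
                     trials_per_step=200, steps=30, seed=0):
    rng = random.Random(seed)
    vocab = sorted(set(words))
    # (1) Initialize: assign the words randomly to the classes.
    phi = {w: rng.randrange(num_classes) for w in vocab}
    pp = perplexity(words, phi)
    best_phi, best_pp = dict(phi), pp
    for _ in range(steps):
        for _ in range(trials_per_step):
            # (2) Move: reassign a random word to a random class.
            w = rng.choice(vocab)
            old_class = phi[w]
            phi[w] = rng.randrange(num_classes)
            new_pp = perplexity(words, phi)
            delta = pp - new_pp  # assumed sign: positive means improvement
            # (3) Accept or backtrack (Metropolis criterion).
            if delta >= 0 or math.exp(delta / cp) > rng.random():
                pp = new_pp
                if pp < best_pp:
                    best_phi, best_pp = dict(phi), pp
            else:
                phi[w] = old_class
        # Lower the control parameter after a fixed number of trials.
        cp *= af
    return best_phi, best_pp
```

A practical implementation would presumably update the word, class, and class-pair counts incrementally rather than rescanning the corpus on each trial, since the cost of each move dominates the running time.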
3. CONTEXTUAL POSTPROCESSING OF 
HANDWRITING RECOGNITION 
The problem of contextual postprocessing can be de- 
scribed as follows: The character recognizer produces 
top K candidates (with similarity score) for each char- 
acter in the input stream; the postprocessor then de- 
cides which of the K candidates is correct based on the 
context and a language model. Let the recognizer
produce the candidate matrix $M$ for the input sequence of
length $N$:

$$M = \begin{pmatrix}
C_{11} & C_{21} & C_{31} & \cdots & C_{N1} \\
C_{12} & C_{22} & C_{32} & \cdots & C_{N2} \\
\vdots & \vdots & \vdots &        & \vdots \\
C_{1K} & C_{2K} & C_{3K} & \cdots & C_{NK}
\end{pmatrix}$$

The postprocessor is to find the combination with the
highest probability according to the language model:

$$\hat{O} = O_1, O_2, \ldots, O_N = \arg\max_{O} P(O \mid M)$$

The overall probability can be divided into two
parts: pattern recognition probability and linguistic
probability, $P(O \mid M) = P_{PR}(O \mid M) \cdot P_{LM}(O \mid M)$. The
former is produced by the recognizer, while the latter
is defined by the language model.
This problem can be reformulated as one of finding
the optimal path in a word lattice, since the word is the
smallest meaningful unit in the Chinese language. The
word lattice is formed with the words proposed by a
word hypothesizer, which is composed of a dictionary
matcher and some lexical rules. Thus,
$P_{LM}(O \mid M) = \max_{\text{all paths}} P(\text{path})$, where a path is a word sequence
formed by a character combination in $M$.
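
To make the word lattice concrete, here is a minimal sketch of a dictionary matcher over a candidate matrix. It is only an illustration under stated assumptions: the lexical rules of the word hypothesizer are omitted, every single candidate character is admitted as a one-character word so that a path always exists, and words are limited to `max_len` characters. The function name `build_word_lattice` and the edge representation are hypothetical.

```python
def build_word_lattice(candidates, dictionary, max_len=4):
    """Return lattice edges (start, end, word) for a candidate matrix.

    candidates: list over the N positions; each item lists the K
                candidate characters proposed for that position
    dictionary: set of known multi-character words
    An edge (start, end, word) covers positions start .. end-1.
    """
    n = len(candidates)
    edges = []
    for start in range(n):
        partial = [""]
        for end in range(start + 1, min(start + max_len, n) + 1):
            # Extend every partial string by each candidate at this position.
            partial = [p + ch for p in partial for ch in candidates[end - 1]]
            for s in partial:
                # Assumption: single characters are always allowed as words.
                if end - start == 1 or s in dictionary:
                    edges.append((start, end, s))
    return edges
```

Any complete path through these edges from position 0 to position N picks exactly one candidate per character, which is what the postprocessor has to output.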
3.1 Least-word model (LW) 
A simple language model is based on a dictionary
(actually a wordlist). The characteristic function of the
model is the number of words in the word-lattice path.
The best path is simply the one with the least number of
words: $P_{LM}(O \mid M) = (-1) \times \#\text{words-in-the-path}$. This
is similar to the principle of Maximum Matching in
Chinese word segmentation.
3.2 Word-frequency model (WF)
Another simple model is based on the word frequencies
of the words in the word-lattice path. This can be
considered as a word unigram language model. The
path probability is the product of word probabilities
of the words in the path.
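
Both the least-word and the word-frequency model score a path as a sum of per-word terms, so the best path can be found with one dynamic-programming pass over the lattice. The sketch below assumes lattice edges of the form (start, end, word) and a lattice in which the final position is reachable; the function names and the probability floor for unknown words are hypothetical choices, not part of the original system.

```python
import math

def best_path(n, edges, score):
    """Highest-scoring path from position 0 to position n.

    edges: iterable of (start, end, word) with start < end
    score: function word -> additive score (higher is better)
    """
    best = [(-math.inf, None)] * (n + 1)  # (score, (previous position, word))
    best[0] = (0.0, None)
    # Sorting by start position visits the lattice in topological order.
    for start, end, word in sorted(edges):
        if best[start][0] == -math.inf:
            continue
        s = best[start][0] + score(word)
        if s > best[end][0]:
            best[end] = (s, (start, word))
    words, pos = [], n
    while pos > 0:
        pos, word = best[pos][1]
        words.append(word)
    return words[::-1], best[n][0]

def least_word_score(word):
    """LW model: each word costs one unit, so fewer words score higher."""
    return -1.0

def make_word_freq_score(word_prob, floor=1e-6):
    """WF model: log unigram probability; `floor` is an assumed fallback."""
    return lambda word: math.log(word_prob.get(word, floor))
```

Working with log probabilities turns the product of word probabilities into a sum and avoids numerical underflow on long sentences.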
3.3 Inter-word character bigram model (IWCB)
Lee L. et al. (1993) recently presented a novel idea
called word-lattice-based Chinese character bigram for
Chinese language modeling. Basically, they approximate
the effect of word bigrams by applying character
bigrams to the boundary characters of adjacent words.
The approach is simple and very effective. It can also
be considered as one of the class-based bigram models,
using morphological features: the first and last characters
of a word. We have implemented a variation of
the model, called inter-word character bigram model.
Word probabilities and Chinese character bigrams were
built from the 10-million-character UD corpus. The
path probability is computed as the product of word
probabilities and inter-word character bigram
probabilities of the words in the path. This model is one of
the best among the existing Chinese language models,
and has been successfully applied to Chinese homophone
disambiguation and linguistic decoding (Lee L.
et al., 1993).
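
The IWCB path probability described above can be sketched in a few lines. The sketch is illustrative only: it works in the log domain, represents the character bigram table as a dictionary keyed by (last character of the previous word, first character of the next word), and uses an assumed probability floor for unseen entries. The function name `iwcb_log_prob` is hypothetical.

```python
import math

def iwcb_log_prob(path, word_prob, char_bigram, floor=1e-6):
    """Log probability of a word path under an IWCB-style model.

    path:        list of words forming one lattice path
    word_prob:   dict word -> probability
    char_bigram: dict (prev_char, next_char) -> bigram probability
    floor:       assumed fallback for unseen words or bigrams
    """
    logp = 0.0
    for i, word in enumerate(path):
        logp += math.log(word_prob.get(word, floor))
        if i > 0:
            # Boundary characters of the two adjacent words.
            boundary = (path[i - 1][-1], word[0])
            logp += math.log(char_bigram.get(boundary, floor))
    return logp
```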
3.4 Discovered class bigram model
Our novel language model uses the word classes
discovered by the simulated annealing procedure as the
basis of a class bigram language model. The number of
classes (NC) can be selected according to the size of the
training corpus.
Every word in the training corpus is assigned to a
certain class after the training process converges with
a minimal perplexity. Thus, we can store the class
indices in the corresponding lexical entries in the
dictionary. Words in a word-lattice path are then
automatically mapped to the class indices through dictionary
look-up. The path probability is thus the product of
lexical probabilities and contextual class bigram
probabilities, as in a usual class bigram language model.
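
A minimal sketch of this path score follows, assuming a lexicon that stores, for each word, its class index together with the lexical probability of the word given that class. The data layout, the function name `class_bigram_log_prob`, and the floor for unseen class pairs are hypothetical; the smoothing actually used in the experiments is not reproduced here.

```python
import math

def class_bigram_log_prob(path, lexicon, class_bigram, floor=1e-6):
    """Log probability of a word path under a class bigram model.

    path:         list of words forming one lattice path
    lexicon:      dict word -> (class_index, p(word | class))
    class_bigram: dict (prev_class, class) -> p(class | prev_class)
    floor:        assumed fallback for unseen class pairs
    """
    logp = 0.0
    prev_class = None
    for word in path:
        cls, p_word = lexicon[word]  # dictionary look-up of the class index
        logp += math.log(p_word)
        if prev_class is not None:
            logp += math.log(class_bigram.get((prev_class, cls), floor))
        prev_class = cls
    return logp
```

Because only an NC x NC class table is needed besides the lexicon, the contextual part of the model stays small even for a large vocabulary.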
4. EXPERIMENTAL RESULTS
4.1 The corpora and word bigrams
The 1991 UD newspaper corpus (1991ud) of approximately
10,000,000 characters has been used for collecting
the character bigrams and word frequencies used in
the IWCB model. A subcorpus of 1991ud, day7, was
used for word class discovery.
The subcorpus is first segmented automatically into
sentences, then into words by our Viterbi-based word
identification program VSG. Statistics of the day7
subcorpus are summarized: 42,537 sentences, 23,977 word
types (3,377 1-character, 16,004 2-character, 2,461
3-character, 2,135 4-character), and 355,347 word-tokens
(189,838 1-character, 150,267 2-character, 10,783
3-character, 4,460 4-character).
A simple program is then used for counting the word
collocation frequencies for the 23,977 x 23,977 word
bigram, in which only 203,304 entries are nonzero.
After that, the full word bigram is stored in compressed
form.
The simulated annealing procedure is very
time-consuming; that is why we have used the smaller day7
rather than the original 1991ud corpus for word class
discovery. For example, it took 201.2 CPU hours on
a DEC 3000/500 AXP workstation to classify 23,977
words into 200 classes with 50,000 trials in each of 416
iterations, using the day7 corpus.
An independent set of news abstract articles, poli2,
was collected for evaluating the performance of
language models. poli2 is different from day7 in both
publisher and time period. poli2 contains 6,930
sentences or 92,710 Chinese characters.
4.2 Handwriting recognition
We have used a state-of-the-art Chinese handwriting
recognizer (Li et al., 1992) developed by ATC, CCL,
ITRI, Taiwan as the basis of our experiments. The
CCL/HCCR handwritten character database (5401
character categories, 200 samples each category) (Tu
et al., 1991) was first automatically sorted according
to character quality (Chou S. and Yu, 1993), then was
divided into two parts: the odd-rank samples for training
the recognizer, the even-rank samples as held-out
test data.
We have used for our experiments three sets of character
samples, CQ10, CQ20, and CQ30, which are the
samples with quality ranks 10, 20, and 30, respectively.
The recognition results are summarized in Table 1(a).
The table shows, for each rank position, the number of
character samples whose correct character category was
placed at that rank by the recognizer. There are, for example, 5,270
character samples ranked 1, 105 ranked 2, 15 ranked 3,
..., and 4 ranked after 10, for CQ10. The error rates, in
terms of character categories, would be 2.43%, 3.48%,
and 4.07% for CQ10, CQ20, and CQ30, respectively.
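
The category error rates quoted above follow directly from the rank counts in Table 1(a): a sample is an error whenever the correct category is not ranked first. The short check below reproduces the three figures; the variable and function names are illustrative only.

```python
# Rank counts from Table 1(a); rows are ranks 1, 2, 3, 4, 5, 6, 7-10, >10.
rank_counts = {
    "CQ10": [5270, 105, 15, 2, 3, 2, 0, 4],
    "CQ20": [5213, 133, 20, 11, 2, 7, 0, 15],
    "CQ30": [5181, 162, 29, 7, 5, 3, 3, 11],
}

def top1_error_rate(counts):
    """Percentage of samples whose correct category is not ranked first."""
    total = sum(counts)
    return 100.0 * (total - counts[0]) / total

category_error = {name: round(top1_error_rate(c), 2)
                  for name, c in rank_counts.items()}
```

Each column sums to 5401, one test sample per character category.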
4.3 Word class discovery
The day7 subcorpus was used for discovering word
classes. The initial configuration is: Words with
frequency less than m (currently set to 6) are assigned
to Class-0, the unseen word class (Jardino and Adda,
1993); punctuation marks are assigned to a special
class, Class-1; 1-4 character number words are
assigned to Classes 2-5, respectively; and all other words are
assigned to Class-6. The word-types assigned to the
six special classes (classes 0-5) are not subject to
reassignment. The control parameter cp is initially set to
0.1 and the annealing factor af to 0.9.
Table 1: Handwriting Recognition Results

rank     CQ10     CQ20     CQ30
1        5270     5213     5181
2         105      133      162
3          15       20       29
4           2       11        7
5           3        2        5
6           2        7        3
7-10        0        0        3
>10         4       15       11
(a) Number of Correct Character Categories

rank     CQ10     CQ20     CQ30
1       90778    88924    89699
2        1451     2994     2112
3         178      168      399
4           2       86       38
5         135        0      199
6          64       95       62
7-10        0        0        4
>10        50      391      145
out        52       52       52
(b) Number of Correct Characters in poli2

We have conducted a number of experiments with
different predefined numbers of classes NC. The automatic
discovery procedure stops when the perplexity
converges or the control parameter approaches zero.
The converged perplexities range from 670 to 1200,
depending on NC. Classifications with higher NC have
lower training set perplexities. However, we have to be
careful about the problem of overtraining due to
insufficient training data. See Chang and Chen (1993b) for
discussion of the problem.
A statistical language model must be able to deal 
with the problem of unseen words and bigrams, in real- 
world applications. We adopt a simple linear smooth- 
ing scheme, similar to Jardino and Adda (1993). The 
interpolation parameters ct and ¢? are set to 1 - 10 -'~ 
and 0.1, respectively. 
4.4 Contextual postprocessing
The poli2 corpus of 92,710 Chinese characters was
used for evaluating the performance of contextual
postprocessing. The recognition results for the three sets of
character samples were used as the basis of evaluation.
Table 1(b) shows the recognition results in terms of the
poli2 corpus. The corpus contains 52 uncommon
characters which do not belong to any of the 5401 character
categories. The table shows, for each rank position, the
number of characters in the corpus whose correct character
was placed at that rank by the recognizer. For example, there are
90,778 characters ranked 1, 1,451 ranked 2, 178 ranked
3, ..., and 50 ranked after 10, in terms of the CQ10
samples. The recognition error rate for CQ10 would
be 2.08% without contextual postprocessing. The error
rate for CQ20, 4.08%, is higher than that for CQ30,
3.25%, because some very common characters
in the CQ20 samples are misrecognized. We set the
number of candidates K to 6 in the experiments, as a
tradeoff for better performance. Therefore, the
characters ranked after 6 and the 52 uncommon characters
are impossible to recover using the postprocessor. The
best results a language model can achieve thus correspond
to error rates of 0.11%, 0.48%, and 0.22% for CQ10, CQ20,
and CQ30, respectively.
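
The error rates before postprocessing and the lower bounds with K = 6 can be checked against the counts in Table 1(b). In this short sketch a character counts as unrecoverable when its correct candidate is ranked after 6 or when it lies outside the 5401 categories; the names used are illustrative.

```python
# Counts from Table 1(b); rows are ranks 1, 2, 3, 4, 5, 6, 7-10, >10, out.
char_counts = {
    "CQ10": [90778, 1451, 178, 2, 135, 64, 0, 50, 52],
    "CQ20": [88924, 2994, 168, 86, 0, 95, 0, 391, 52],
    "CQ30": [89699, 2112, 399, 38, 199, 62, 4, 145, 52],
}
K = 6  # number of candidates passed to the postprocessor

def error_rates(counts, k=K):
    """Return (error rate without postprocessing, best achievable rate) in %."""
    total = sum(counts)
    no_postprocessing = 100.0 * (total - counts[0]) / total
    # The first k rows are ranks 1..k; everything after them cannot be fixed.
    unrecoverable = 100.0 * sum(counts[k:]) / total
    return round(no_postprocessing, 2), round(unrecoverable, 2)

summary = {name: error_rates(c) for name, c in char_counts.items()}
```

Each column sums to 92,710, the number of characters in the evaluation corpus.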
The changes the postprocessor makes can be classified
into three types: wrong-to-correct (XO),
correct-to-wrong (OX), and wrong-to-wrong (XX). In the XO
type, a wrong character (i.e., a recognition error) is
corrected; in the OX type, a correct character is changed
to a wrong one; and in the XX type, a wrong character
is changed to another wrong one. The
performance of the postprocessor can be evaluated as
the net gain, #XOs - #OXs.
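
The three change types and the net gain can be counted as in the following sketch, which compares the reference text, the recognizer's top-ranked output, and the postprocessor's output position by position. The function name `classify_changes` and the use of equal-length character sequences are assumptions made for illustration.

```python
def classify_changes(truth, recognized, postprocessed):
    """Count XO, OX, and XX changes and the net gain (#XO - #OX).

    truth:         reference characters
    recognized:    top-ranked candidates from the recognizer
    postprocessed: characters chosen by the postprocessor
    All three sequences are assumed to have the same length.
    """
    counts = {"XO": 0, "OX": 0, "XX": 0}
    for t, r, p in zip(truth, recognized, postprocessed):
        if r == p:
            continue              # the postprocessor made no change here
        if p == t:
            counts["XO"] += 1     # wrong-to-correct
        elif r == t:
            counts["OX"] += 1     # correct-to-wrong
        else:
            counts["XX"] += 1     # wrong-to-wrong
    counts["gain"] = counts["XO"] - counts["OX"]
    return counts
```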
Table 2: Postprocessing Results for the CQ10, CQ20, CQ30
Character Samples

Model          XO     OX    XX   Gain   ER(%)
No Grammar      0      0     0      0    3.14
Least Word   1713   1361    67    351    2.76
Word Freq.   2417    702   149   1714    1.29
IWCB         2563    668   204   1895    1.10
NC = 50      2349    201   134   2148    0.82
NC = 100     2354    201   133   2153    0.81
NC = 150     2351    192   128   2159    0.81
NC = 200     2355    212   131   2143    0.82
NC = 250     2361    240   135   2120    0.85
NC = 300     2348    232   141   2116    0.86
NC = 500     2317    311   153   2006    0.97
Table 2 summarizes the experimental results of
postprocessing for the three sets of character samples. The
columns XO, OX, XX, and Gain list the average numbers
of characters in types XO, OX, XX, and XO-OX,
respectively. The last column ER lists the overall
error rates after postprocessing with the various
language models. The No Grammar row lists the error
rates without postprocessing; the rows Least Word,
Word Freq., and IWCB show the results for the
Least-Word, Word-Frequency, and Inter-word Character
Bigram models; and the NC rows show the results for
discovered class bigram models with different numbers
of classes. We observe from Table 2 that:
• Our discovered class bigram model outperformed
the other three models in general. The order of
performance is: NC = 200 > IWCB > WF >
LW. The average error rates are: Recognizer:
3.14%, LW: 2.76%, WF: 1.29%, IWCB: 1.10%, and
NC = 200: 0.82%.
In other words, our NC = 200 reduced the error
rate by 73.89%, while IWCB reduced it by 64.97%,
WF by 58.92%, and LW by 12.10%. Note that an
average of 0.27% of the characters are always wrong;
that is, the least error rate is 0.27%. Excluding
these characters, the NC = 200 model reduced
the error rate by 80.84%!
• The Least-word model is not sufficient (it has
negative gain for CQ10), and the Word-frequency
model is much better, reducing the error rates by
more than fifty percent.
• Our model outperformed the powerful IWCB
model, except for CQ20. The difference in CQ20
performance is just 0.05%, while our model
outperformed IWCB by much larger margins, 0.51%
and 0.43%, for CQ10 and CQ30, respectively.
Besides, the storage requirement of our model is
much less than that of the IWCB model.
• The IWCB model usually corrects more errors
than ours, while it also commits many more OX
mistakes.
• The optimal NC values for the discovered class
bigram models are 200 for CQ10 and CQ20, and
150 for CQ30. This is consistent with the common
rule of thumb: the size of training data should
be at least ten times the number of parameters,
which suggests an NC value of 189 for the size of
the day7 corpus (355,347 words).
The NC = 500 models are apparently overtrained,
which is consistent with the evaluation of
test set perplexities we discussed in Chang and
Chen (1993b).
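
The percentages quoted in the observations above can be reproduced from the ER column of Table 2, as the following sketch shows. The rule-of-thumb calculation rests on an assumption that the text does not spell out: that the parameter count of a class bigram model is taken as roughly NC squared, so that ten times NC squared should not exceed the number of training word tokens. All names are illustrative.

```python
import math

baseline = 3.14        # average error rate without postprocessing (%)
unrecoverable = 0.27   # average share of characters no model can fix (%)
error_rates = {"LW": 2.76, "WF": 1.29, "IWCB": 1.10, "NC=200": 0.82}

def reduction(er, base=baseline, floor=0.0):
    """Relative error-rate reduction in percent, measured above `floor`."""
    return 100.0 * (base - er) / (base - floor)

reductions = {name: round(reduction(er), 2)
              for name, er in error_rates.items()}
reduction_excluding = round(
    reduction(error_rates["NC=200"], floor=unrecoverable), 2)

# Rule of thumb, assuming about NC**2 class bigram parameters:
# 10 * NC**2 should not exceed the number of training word tokens.
training_tokens = 355_347
suggested_nc = round(math.sqrt(training_tokens / 10))
```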
5. CONCLUDING REMARKS 
We have proposed using automatically discovered word
classes in Chinese class n-gram models for contextual
postprocessing of handwriting recognition results.
Three other language models have been constructed for
comparison. Extensive experiments on large text
corpora show that the discovered class bigram language
model has outperformed all three competing models,
including the powerful inter-word character bigram
model. Future work includes (1) applying the discovered
class bigram models to linguistic decoding in a Chinese
speech recognizer; and (2) studying other automatic
discovery approaches.
Acknowledgements 
Thanks are due to the Chinese Handwriting Recognition
group, ATC/CCL/ITRI, for the character recognizer,
especially Y.-C. Lai for preparing the recognition
results. This paper is a partial result of the project no.
37112100 conducted by the ITRI under sponsorship of
the Ministry of Economic Affairs, R.O.C.
References 
Brown, P.F., V.J. Della Pietra, P.V. de Souza, J.C.
  Lai, and R.L. Mercer (1992). Class-based n-gram
  models of natural language. Computational
  Linguistics, 18, pp. 467-479.
Chang, C.-H. and C.-D. Chen (1993a). HMM-based
  part-of-speech tagging for Chinese corpora. In
  Proc. of the Workshop on Very Large Corpora
  (WVLC-1), Columbus, Ohio, USA, pp. 40-47.
Chang, C.-H. and C.-D. Chen (1993b). Automatic
  clustering of Chinese characters and words. In
  Proc. of ROCLING VI, Chitou, Nantou, Taiwan,
  pp. 57-78.
Chou, B.-H. and J.S. Chang (1992). Applying
  language modeling to Chinese character recognition.
  In Proc. of ROCLING V, Taipei, Taiwan, pp.
  261-286. (in Chinese).
Chou, S.-L. and S.-S. Yu (1993). Sorting qualities
  of handwritten Chinese characters for setting up
  a research database. In Proc. of ICDAR-93,
  Tsukuba, Japan, pp. 474-477.
Church, K. (1989). A stochastic parts program and
  noun phrase parser for unrestricted text. In Proc.
  of ICASSP-89, Glasgow, Scotland, pp. 695-698.
Derouault, A. and B. Merialdo (1986). Natural
  language modeling for phoneme-to-text transcription.
  IEEE Trans. PAMI, 8, pp. 742-749.
Jardino, M. and G. Adda (1993). Automatic word
  classification using simulated annealing. In Proc.
  of ICASSP-93, II, Minneapolis, Minnesota, USA,
  pp. 41-44.
Kirkpatrick, S., C.D. Gelatt, Jr., and M.P. Vecchi
  (1983). Optimization by simulated annealing.
  Science, 220, pp. 671-680.
Lee, H.-J., C.-H. Tung, and C.-H. Chang Chien
  (1993). A Markov language model in Chinese
  text recognition. In Proc. of ICDAR-93,
  Tsukuba, Japan, pp. 72-75.
Lee, L.-S. et al. (1993). Golden Mandarin (II) - an
  improved single-chip real-time Mandarin dictation
  machine for Chinese language with very large
  vocabulary. In Proc. of ICASSP-93, II, Minneapolis,
  Minnesota, USA, pp. 503-506.
Li, T.-F., S.-S. Yu, H.-F. Sun, and S.-L. Chou (1992).
  Handwritten Chinese character recognition using
  Bayes rule. In Proc. of ICCPCOL-92, Florida,
  USA, pp. 406-411.
Schutze, H. (1993). Part-of-speech induction from
  scratch. In Proc. of ACL-93, Columbus, Ohio,
  USA, pp. 251-258.
Shinghal, R. (1983). A hybrid algorithm for contextual
  text recognition. Pattern Recognition, 16,
  pp. 251-267.
Sinha, R. and B. Prasada (1988). Visual text recognition
  through contextual processing. Pattern
  Recognition, 21, pp. 463-479.
Sugimura, S. and T. Saito (1985). A study of rejection
  correction for character recognition based on
  binary n-gram. IEICE Japan, J68-D, pp. 64-71.
  (in Japanese).
Tu, L.-T. et al. (1991). Recognition of handprinted
  characters by feature matching. In Proc. of 1991
  First National Workshop on Character Recognition,
  Hsinchu, Taiwan, pp. 166-175.
