Corpus-based Adaptation Mechanisms 
for Chinese Homophone Disambiguation 
Chao-Huang Chang 
E000/CCL, Building 11, industria! Technology Research Institute 
Chutung, Hsinchu 31015, Taiwan, R.O.C. 
E-mail: changch©eOsun3, col. itri. org. tw 
Abstract 
Based on the concepts of bzd~rectwnal converswn 
and automahc evaluatzon, we propose two user. 
adaptation mechanzsms, character-preference learn. 
in9 and pseudo-word learning, for resolving Chinese 
homophone ambiguities in syllable-to.character con- 
version. The 1991 Umted Daily corpus of approxi- 
mately 10 million Chinese characters ts used for ex- 
traction of 10 reporter-specific article databases and 
.\[or computat,on of word frequencies and character hi- 
grams. Ezpemments show that ~0.5 percent (testing 
sets) to 71.8 percent (trammg sets) of conversion er. 
rots can be eliminated through the proposed mecha- 
nisms. These concepts are thus very useful tn ap- 
phcattons such as Chinese znput methods and speech 
recognition systems. 
1 Introduction 
Corpus-based Chinese NLP research has been very 
active in the recent years as more and more computer 
readable Chinese corpora are available. Reported 
corpus-based NLP applications \[10\] include machine 
translation, word segmentation, character recogni- 
tion, text classification, lexicography, and spelling 
checker. In this paper, we will describe our work on 
adaptive Chinese homophone disambiguation (also 
known as phonetic-input-to-character conversion or 
phonetic decoding) using part of the 1991 United 
Daily (UD) corpus of approximately 10 million Chi- 
nese characters (Hanzi). 
It requires a coding method, structural or phonetic, 
to input Chinese characters into a computer, since 
there are more than I0,000 of them in common use. 
In the literature \[3,7\], there are several hundred dif- 
ferent coding methods for this purpose. For most 
users, phonetic coding (Pinyin or Bopomofo) is the 
choice. To input a Chinese character, the user sim- 
ply keys in its corresponding phonetic code. It is 
easy to learn, but suffers from the homophone prob- 
lem, i.e., a phonetic code corresponding to several 
different characters. Therefore, the user needs to 
choose the desired character from a (usually long) 
list of candidate characters. It is inefficient and an- 
noying. So, automatic homophone disambiguation is 
highly desirable. Several disambiguation approaches 
have been reported in the literature \[3, 7\]. Some of 
them have even been realized in commercial input 
methods, e.g., ttanin, WangXing, Going. However, 
the accuracies of these disambiguators are not sat- 
isfactory. In this paper, we propose a corpus-based 
adaptation method for improving the accuracy of ho- 
mophone disambiguation. 
For homophone disambiguation, what we need as 
input is syllable (phonetic code) corpora instead of 
text corpora. For adaptation, what we need is per- 
sonal corpora instead of general corpora (such as 
the UD corpus). Thus, we first design a selec- 
tion procedure to extract articles by individual re- 
porters. Ten personal corpora were set up in this 
way. An additional domain-specific corpus, trans- 
lated AP news, was built up similarly. Then, we 
design a highly-reliable (99.7% correct) character-to- 
syllable converter \[I\] to transfer the text corpora into 
syllable corpora. 
Our baseline disambiguator is rather conventional, 
composed of a word-lattice searching module, a path 
scorer, and a lexicon-driven word hypothesizer. Us- 
ing the original text corpora and the correspond- 
ing syllable corpora, we propose a user-adaptation 
method, applying the concept of bidirectional con- 
version \[I\] and automatic evaluation \[2\]. The adapta- 
tion method includes two parts: character-preference 
learning and pseudo word learning. Given a per- 
sonal corpus (i.e., sample text), the adaptation pro- 
94 
cedure is able to produce a user-specific character- 
preference model and a pseudo word lexicon auto- 
matically. Then the system can use the user-specific 
parameters in the two models for improving the con- 
version accuracy. 
Extensive experiments have been conducted for (1) 
ten sets of local-news articles (one set per reporter) 
and (2) translated international news from AP News. 
Each set is divided into two subsets: one for train- 
ing, the other for testing. The character accuracy of 
the b&seline version is 93.46% on average. With the 
proposed adaptation method, the augmented version 
increases the accuracy to 98.16~ for the training sets 
and to 94.80% for the test sets. In other words, 71.8% 
and 20.5% of the errors have been eliminated, respec- 
tively. The results are encouragiug for us to further 
pursue corpus-based adaptive learning methods for 
Chinese phonetic input and language modeling for 
speech recognition. 
2 Homophone Disambiguation 
Mandarin Chinese has approximately 1300 syllables, 
13,051 commonly used characters, and more than 
100,000 words. Each character is pronounced as a 
syllable. Thus, it is clear that there are many sylla- 
bles are shared by numbers of characters. Actually, 
some syllables correspond to more than 100 charac- 
ters, e.g.. tile syllable \[yi4\] corresponds to 125 char- 
acl, ers, ,~, Jilt, ~, ~¢, ~, ~, etc. Thus, homophone 
(character) disambiguation is difficult but important 
in Chinese phonetic input methods and speech recog- 
nition systems. 
The problem of homophone disambiguation can be 
defined as how to convert a sequence of syllables S = 
sl, s2 ..... sn (usually a sentence or a clause) into a cor- 
responding sequence of characters C = cl,e~,...,cn 
correctly. Here, each si stands for one of the 1300 
Chinese syllables and each c, one of the 13,051 char- 
acters. 
Fortunately, when the characters are grouped into 
words (the smallest meaningful unit), the homophone 
problem is lessened. The number of homophone poly- 
syllables is much less than that of homophone char- 
acters. (A Chinese word is usually composed of 1 to 
4 characters.) For the disamhiguation, longer words 
are usually correct and preferred. Thus, the ho- 
mophone disambiguation problem is usually formu- 
lated as a word-lattice optimal path finding prob- 
lem. (Note that there is the problem of unknown 
words, especially personal names, compound words, 
and acronyms, which are not registered in the lexi- 
con.) 
For example, a sequence of three syllables sl, s2, 
s3 involves six possible subsequences sl, s2, s3, sl-s2, 
s2-s3, sl-s2-s3, which can correspond to some words. 
Each subsequence could correspond to more than one 
word, especially in the case of monosyllables. Accord- 
ingly, a word lattice is formed by the words with one 
of the six subsequences as pronunciation. See Figure 
1 for a sample word lattice. 
Note that syllables are chosen as input units 
instead of word-sized units used in systems like 
TianMa. The major reason is: Chinese is a mono- 
syllabic language; characters/syllables are the most 
natural units, while "words" are not well-defined in 
Chinese. It is difficult for people to segment the words 
correctly and consistently, especially according to the 
dictionary provided by the system. This is also the 
reason why newer intelligent Chinese input methods 
in Taiwan like Hanin, WangXing, and Going, all use 
syllables (for a sentence) as input units. In addi- 
tion, our target system is an isolated-syllables speech 
recognition system. 
3 The Baseline System 
The proposed system (Figure 2) is composed of 
a baseline system plus two new features: character- 
preference learning (CPL) and pseudo word learning 
(PWL). 
The baseline syllable-to-character converter con- 
sists of three components: (1) a word hypothesizer0 
(2) a word-lattice search algorithm, and (3) a score 
function. The basic model used in our system is: 
(-1) a Viterbi search algorithm, (2) a lexicon-based 
word hypothesizer, and (3) a score function consider- 
ing word length and word frequency. 
The word hypothesizer matches the current input 
syllable candidates with the lexical entries in the 
lexicon (7,953 1-character words, 25,567 2-character, 
12,216 3-character, 12,419 4-character, 558,155 words 
totally). All matched words are proposed as word 
hypotheses forming the word lattice. Currently, we 
consider only those words with at most four syllables 
(only less than 0.1% of words contain five or more 
syllables). In addition, Determinative-Measure (DM) 
95 
\[jing3\] \[fangl\] \[huai2\] \[yi2\] \[quan2\]\[an4\] \[you3\] \[yin3\] \[qing2\] 
........................................ ° .... 
Figure 1: A Sample Word Lattice (,: correct words ...... : more monosyllabic words) 
Chinese 
l..exicon 
Chinese 
Syllables 
I-- 
_l -I 
Pseudo Word 
Lexicon 
Character 1 Preference 
Table 
l _--I 
Word 
Hypothesizer 
I 
Score 
Function 
I 
Adaptive Word Lattice Search Algorithm 
Figure 2: The Overall Structure 
Characters 
96 
compounds are proposed dynamically, i.e., not stored 
in the lexicon. 
Viterbi search is a well-known algorithm for op- 
timal path-finding problems. The word lattice for a 
whole clause (delimited by a punctuation) is searched 
using the dynamic-programnfing-style Viterbi algo- 
rithm. 
The score function is defined as follows: If a path P 
is composed of n words uq ..... w,, and two assumed 
clause delimiters w0 and wn+l, the path score for P is 
the sum of word scores for the n words and inter-word 
link scores for the n+l word links (n-1 between-word 
links and 2 boundary links). 
t,ath~core( P) =   wordscore( u,i ) 
sml 
+ ~ linkscore(wl, wi+l) 
i=O 
The word score of a word is based on the word fre- 
quency statistics computed by counting the number 
of occurrences the word appears in the 10-million- 
character UD corpus. The word frequency is mapped 
into an integral score by taking its logarithm value 
and truncating the value to an integer. Lee et al. \[11\] 
recently presented a novel idea called word-lattzce- 
based Chinese character bigram for Chinese language 
modeling. Basically, they approximate the effect of 
word bigrams by applying character bigrams to the 
boundary characters of adjacent words. The ap- 
proach is simple (easy to implement) and very ef- 
fective. Following the idea, we built a Chinese char- 
acter bigram based on the UD corpus and used it 
to compute inter-word link scores. For two adjacent 
words, the last. character of the first word and the 
first character of the second word are used to consult 
the character bigram which recorded the number of 
occurrences in the UD corpus. Inter-word link scores 
are then computed similarly to word scores. 
4 Bidirectional Conversion 
and Automatic Evaluation 
Here, we will only briefly review the concepts of bidi- 
rectional conversion and automatic evaluation \[I,2\]. 
For more details, see the cited papers. 
Homophone disambiguation can be considered as a 
process of syllable-to-character ($2C) conversion, Its 
reverse process, character-to-syllable (C2S) conver- 
sion, is also nontrivial. There are more than 1000 
characters, so-called Poyinzi (homographs), with 
multiple pronunciations. However, a high-accuracy 
C2S converter is achievable. Using an n-gram looka- 
head scheme, we have designed such a converter with 
99.71c~ accuracy. Because of the high accuracy, the 
C2S converter can be used to convert a text corpus 
to a syllable corpus automatically. The two processes 
together form a bidirectional conversion model. The 
i~oint is: If we ignore the 0.29% error (could be re- 
duced ifa better C2S system is used), many applica- 
tions of the model appear. 
We have applied the bidirectional model to auto- 
matic evaluation of language models for speech recog- 
nition. A more straightforward application is auto- 
matic evaluation of the $2C converter. A text is con- 
verted into a syllable sequence, which then is con- 
verted back to an output text. Comparing the input 
text with the output, we can compute the accuracy 
of the $2C converter automatically. 
5 Corpus-based 
Mechanisms 
Adaptation 
In the following, we describe how to apply the model 
to user-adaptation of homophone disambiguator. 
5.1 Character-Preference Learning 
Everyone has his own preference for characters and 
words. A chemist might use the special characters for 
chemical elements frequently. Different people uses 
a different set of proper names that are usually not 
stored in the lexicon. In this section, we propose an 
adaptation method based on the bidirectional conver- 
sion model. 
From a sample text given by the user, the system 
first converts it to a sequence of syllables. Then, the 
baseline system is used to convert them back to Chi- 
nese characters. After that, we can compare them 
with the input to obtain the error records. From 
the comparison report, we will derive three indices 
for each character in the character set (say, 13,051 
characters in the Big-5 coding used in Taiwan): A- 
count, B-count, and C-count. A-count is defined a.s 
97 
the number of times thai the character is misrecog- 
nized. B-count the number of times it is wrougly used. 
while C-count the number of times it is correctly rec- 
ognized. For example, if the user wants to input the 
string ~I~N~ and keys in the corresponding syllables 
\[li3\]\[zhenl\]\[zhenl\] while the output is ?-~t.~t~, the in- 
dices would be: A(~)=0, B(~)=0, C(~)=I, A(~ 
)=2, B(~)=0, C(~)=0, A({.~)=0, B({~)=2, C(\].~ 
)=0. From these indices, we propose a character- 
preference learmng procedure: 
1. Convert the given sample text 1¢ intoa syllable 
file 10, using the character-to-syllable converter. 
Let the baseline version be I~j. Run V c with l, to 
obtain an output 0 °. From \]¢ and 0 °, compute 
the initial accuracy a °. 
2. Initialize the 13051-entry character-preference 
table CPT ° to zeroes. Set n to 1. 
3. From Ic and O n-l. compute the A. B, C indices 
for each character. 
4. For each character c, add to the corresponding 
entry in CP'1 'n-I a preference score (according 
to a preference adjustment function pf of A(c), 
B(c), C(c)) to form CPT n 
5. Form a new version V" of the syllable-to- 
character converter by considering CPT'. Run 
V n with 1., to obtain a new output O n . 
6. From \]c and O", compute the new accuracy rate 
a n " 
7. If a n > a '~-l, set n to n + 1 and repeat steps 3.- 
6. Otherwise, stop and let CPT"-\] be the final 
CPT for the user. 
Adjustment Functions 
In step 4, the adjustment function pf is a function of 
A(c), B(c), C(c). Several versions have been tried in 
our experiments. Three of them are: 
+l if.4(c)- B(c) > 0 
pI(c) = -1 if A(c)- B(c) < 0 (1) 
0 otherwise 
vf(c) = 
+I if A(c)- B(c) + C(c) > 0 
-I if A(c) - B(c) + C(c) < 0 
0 otherwise 
(2) 
+1 ifA(c)- B(c)+C(c) > 0 
pf(c) = 0 otherwise (3) 
Pf (1) is intuitive but easily suffers from over- 
training since it only considers error cases. To avoid 
the problem, we devise a new pf (2) taking the correct 
cases into account. After trying several combinations 
of A, B, C for pf, we observe that positive learning 
(3) is most effective, i.e., achieving the highest accu- 
racy. Therefore, in the current implementation, pf 
(3) is used. 
5.2 Pseudo Word Learning 
'the second adaptation mechanism is to treat error 
N-grams as new words (called pseudo words). An er- 
ror N-gram is defined as a sequence of N characters 
in which at least (N - 1) characters are wrongly con- 
verted (from syllables) by the system. (In practice, 
2 < N < 4.) For example, if \[fan4\]\[zhen4\]\[he2\] (to 
input ffill\]~$fl) is converted to ~E~, three pseudo 
words are produced: t:~, ~$\[1, and $~$~. There 
are two modes for generating pseudo words: corpus 
training and interactive correction. In the former, 
the user-specific text corpus (or simply a sample text) 
is used for generating the pseudo word lexicon (PW 
lexicon), applying the concept of bidirectional con- 
version. In the latter, pseudo words are produced 
through user corrections in an interactive input pro- 
cess. Both modes can be used at the same time. 
In the following, we will describe how to build, 
maintain, and use the user-specific PW lexicon. The 
PW lexicon stores the M (lexicon size) pseudo words 
that are produced or referenced in the most recent pe- 
riod. It is structurally exactly the same as the general 
lexicon, containing Hanzi, phonetic code, and word 
frequency. The word frequency of a new PW is set to 
/0 (3 in the implementation) and incremented by one 
when referenced. Once the word frequency exceeds 
an upper bound F, the PW would be considered as a 
real word and no longer liable to replacement. 
The procedure is: 
I. Segment the sample text into clauses (separated 
by punctuations). For each clause It, do steps 
2.--4. 
2. Convert the clause into syllable sequence I0 us- 
ing C2S, then convert I0 back to a character se- 
98 
quence Oc using baseline $2C. For each character 
Cn in 0~. do steps 3-4. 
3. Compare Cn with the corresponding input char- 
acter. Set the error flag if different. 
4. If a pseudo word ending with Cn is found (ac- 
cording to error flags) then (1) increment the 
word frequency if it is already in the PW lexi- 
con, and check the upper bound F; (2) replace 
the old entry and set frequency to f0 if the lexi- 
con has a homophone PW: (3) add a new entry 
if the lexicon has vacancies: (4) otherwise, re- 
move one of the entries that have the least word 
frequency and add the new PW. 
5. We have a new PW lexicon after the above steps 
are done. 
We observe that 3-character pseudo words are very 
useful for dealing with the unregistered proper name 
problem, which is a significant source of conversion 
errors. The reasons are: ( 1 ) A large part. of unknown 
word~ in news articles are proper names, especially 
three-character personal names; (2) It is not practi- 
cal to store all the proper names beforehand; (3) The 
proper names usually contain uncommon characters 
which are difficult to convert from syllables. There- 
fore, the user (or author) can have a personalized PW 
lexicon which contains unregistered proper names he 
will use, simply by providing a sample text. 
The parameters for both CPL and PWL can be 
trained by the bidirectional learning procedure. The 
only input the user needs to provide is a sample text 
similar to the texts he wants to input by" the phonetic- 
input-to-character converter. The phonetic input file 
will be automatically generated by the character-to- 
syllable converter. 
pus with more than 10 million characters. First, 
collect character-trigrams after the word \[~:~ 
(ji4zhe3,'reporter') and sort them according to the 
number of occurrences. Most of these trigrams ha W 
pen to be names of reporter. We use the top-10 
names as the basis for selecting articles. Then, search 
the names in the corpus in order to built the article 
databases for the 10 reporters. The AP News corpus 
is built in a similar way (searching for the word 
I\[~:~± rnei31ian2she4). Table 1 lists some statistics for 
the article databases. The first column lists the set 
names, the second column the numbers of articles in 
the set, the third column the numbers of characters, 
and the fourth column the numbers of pronounceable 
characters. 
Set 
\]wy 
lkq 
Ilg 
lxd 
yft 
yxj 
fzh 
tdc 
lsq 
ljn 
ap 
{ #Articles #Char1 ~Char2 
83 35,000 31,163 
52 22,424 20,214 
47 16,178 14,677 
49 19,429 17,368 
44 18,647 16,423 
45 14,791 13,132 
45 17,154 15,299 
44 15,323 13,638 
44 18,804 17,1'50 
44 19,065 17,122 
408 122,554 109,218 
Table 1: The Article Databases 
Each corpus is then divided into two parts accord- 
ing to publication date: a training set and a test- 
ing set. For example, the corpus lwy is divided into 
lvy-1 and lwy-2. 
In the following, we show the experimental results 
for training sets and testing sets, respectively. 
6 Experimental Results 
6.1 The Corpora 
Eleven sets of newspaper articles are extracted from 
the 1991 United Daily News Corpus (kindly provided 
by United Informatics. Inc., Taiwan). Ten of them 
are by specific reporters, i.e., one set per reporter. 
The other is translated AP News. These corpora are 
used to validate the proposed adaptation techniques. 
We design an extraction procedure to select ar- 
ticles written by a specific reporter from the cor- 
6.2 Training Sets 
Table 2 shows the adaptation results for the train- 
ing sets. The RI column lists the accuracy rates for 
the baseline system, while the R2 column lists those 
for the adapted (or personalized) system. To avoid 
the problem of over-training, we train the the system 
only by two iterations in practice. More iterations 
can improve the performance for training sets but 
hurt the performance for testing sets. The average 
character accuracy rate is improved by 4.68% (from 
93.48% to 98.16%). That is, 71.8 percent of errors 
are eliminated. 
99 
Se, Re I ne-n  I no.o I 
Iwy-1 94.32 98.35 4.03 70.9% 
lkq-1 93.07 97.85 4.78 68.9% 
llg-1 94.19 98.42 4.23 72.8% 
lxd-I 95.49 98.53 3.04 67.4% 
yft-1 94.75 98.32 3.57 68.0% 
yxj-1 91.83 98.23 6,40 78.3% 
fzh-I 91.20 98.15 i 6.95 78.9% 
tdc-I 92.13 97.71 5.58 70.9% 
lsq-1 93.63 98.23 4.60 72.2~, 
ljn-1 93.64 98.09! 4.45 69.9% 
ap-1 94.04 97.86 \] 3.82 64.0% 
Ave. 93.48 98.16 I 4.68 71.8% 
Table 2: Accuracy Rates for Training Sets 
6.3 Testing Sets 
Table 3 shows tile results for the testing sets. The 
average accuracy is improved by 1.34c~. (from 93.46% 
to 94.80%). That is, 20.5 percent of errors are elimi- 
nated. 
Set I RI Re RE.R1\] Rat,o 
lwy-2 94.06 95.00 0.94 15.8% 
ikq-2 93.81 95.95 2.14 34.6% 
ilg-2 93.38 94.18 0.80 12.1% 
lxd-2 95.97 96.42 0.45 11.2% 
yft-2 95.21 96.17 0.96 20.0% 
yxj-2 91.67 93.93 2.26 27.1% 
fzh-2 91.28 92.29 i.01 11.6% 
tdc-2 92.25 93.58 1.33 17.2% 
lsq-2 92.86 94.27 1.41 19.7% 
ljn-2 93.66 95.52 1.86 29.2% 
ap-2 i 93.93 95.54 1.61 26.5% 
Ave. I 93.46 94.80 \[ 1.34 20.5% 
Table 3: Accuracy Rates for Testing Sets 
7 Related Work 
The study of phonetic-input-to-character conversion 
has been quite active m the recent years. There are 
two different approaches for the problem: dictionary- 
based and statistics-based. 
Matsushita (Taipei) developed a Chinese word- 
string input system, ltanin, as a new input method 
(Chen \[4\]) in which phonetic symbols are continu- 
ously converted to Chinese characters through dic- 
tionary lookup. Commercial systems TianMa and 
WangXing (ETch Corp.) also belong to this type. In 
the mainland, there have been several groups involv- 
ing in similar projects \[14,15\] although most of them 
are pinyin-based and word-based. 
In the statistics-based school are relaxation tech- 
niques (Fan and Tsai \[6\] ), character bigrams with 
dynamic programming (Sproat \[12\]), constraint sat- 
isfaction approaches (JS Chang \[3\]), and zero-order 
or first-order Markov models (Gu et aL \[7\]). 
Ni \[9\] mentioned a so-called self-learning capabil- 
ity for his Chinese PC called LX-PC. However, the 
method is (1) let the user define new words during 
the input process (2) dynamically adjust the word 
frequency of used words. Chen \[4\] also proposed a 
learning function that uses a learning file to store 
user-selected characters and words and the character 
before them. The entries in the learning file are fa- 
vored over those in the regular dictionary. Lua and 
Gall \[8\] describe a simple error-correcting mechanism: 
increase the usage frequency of the desired word by 
1 unit when the user corrects the system's output. 
These methods are either manual adaptation or sim- 
ple word frequency counting. 
Recently, Su el at. \[5,13\] proposed a discrimina- 
tion oriented adaptive learning procedure for vari- 
ous problems, e.g., speech recognition, part-of-speech 
tagging, and word segmentation. The basic idea is: 
When an error is made, i.e., the first candidate is not 
correct, adjust the parameters in the score function 
based on subspace projection. The parameters for 
the correct candidate are increased, while those for 
the first candidate are decreased, both in an amount 
decided by the difference between the scores of the 
two candidates. This process continues until the cor- 
rect candidate becomes the new first candidate; that 
is, the score of the correct candidate is greater than 
that of the old first one. Our learning procedure is dif- 
ferent from theirs because (I) ours is increment-based 
while theirs is projection-based, (2) ours is not dis- 
crimination oriented, (3) ours is coarse-grained learn- 
ing while theirs is fine-grained, and (4) the applica- 
tion domain is different. 
8 Concluding Remarks 
We have presented two corpus-based adaptation 
mechanisms for Chinese homophone disambiguation: 
character-preference learning and pseudo-word learn- 
I00 
ing. Experimental results show that the error rates 
have been reduced significantly. This proves yet an- 
other success of corpus-based NLP research. 
Future works include (l) more experiments using 
various texts, (2) study on more effective adjustment 
functions for CPL, (3) study on weighting of differ- 
ent lengths of pseudo words, (4) adaptation based on 
other parameters, e.g., parts-of-speech, semantic cat- 
egories, and (5) application to linguistic decoding for 
speech recognition. 
Acknowledgements 
The author is grateful to the Chi.ese Lexicon group 
(CCL/1TRI) for the 90,000-word lexicon. This paper 
is a partial result of the project no. 37H2100 con- 
dueled by llw ITRI ,,,,d,'r Sl,O..-c~rshi P of the Minister 
of Economic Affairs, R.O.C. 

References 
\[1\] C.-H. Chang. Bidirectional conversion between 
Mandarin syllables and Chinese characters. In 
Proc. o\] 1992 lnleT~attonal ConJerence on Com- 
puter Processing of Chinese and Oriental La~- 
guages, Florida, USA, 1992. 
\[2\] C.-H. Chang. Design and evaluation of language 
models for Chinese speech recognition. In Proc. 
of 1992 CMEX Workshop on Chinese Speech 
Recognition, Taipei, Taiwan, November 1992. 
\[3\] J.-S. Chang, S.-D. Chen, and C.-D. Chen. 
Conversion of phonetic-input to Chinese text 
through constriant satisfaction. In Proc. 1991 
ICCPCOL. pages 30-36. Taipei. 1991. 
\[4\] S.-I, Chen, C,-T. Chang, J.-J. Kuo, and M.-S. 
Hsieh. The continuous conversion algorithm of 
Chinese character's phonetic symbols to Chinese 
character. In Proc. of National Computer Sym- 
posium, pages 437..-442, 1987. 
\[5\] T.-H. Chiang, J.-S. Chang, M.-Y. Lin, and K.-Y. 
Su. Statistical models for word segmentation and 
unknown word resolution In Proc. ROCLING 
1'. pages 121 146. Taipei. Taiwaln. September 
1992. 
\[6\] C.-K. Fan and W.-H. Tsai. Relaxation-based 
word identification for removing the ambiguity 
in phonetic Chinese input, lnt. J. of Pattern 
Recognition and Artificial Intelligence, 4(4):651- 
666, 1990. 
\[7\] H.-Y. Gu, C.-Y. Tseng, and L.-S. Lee. Markov 
modeling of Mandarin Chinese for decoding the 
phonetic sequence into Chinese characters. Com- 
puter Speech and Language, 5:363-377, 1991. 
\[8\] L.-S. Lee et al. Golden Mandarin (II) - an im- 
proved single-chip real-time Mandarin dictation 
machine for Chinese language with very large vo- 
cabulary. In Proc. 1993 ICASSP, pages II:503- 
506, April 1993. 
\[9\] K.T. Lua and K.W. Gan. A touch-typing Pinyin 
input system. Computer Processing of Chinese 
Oriental Languages, 6(1):85-94, June 1992. 
\[10\] G. Ni. A Chinese PC features intelligent Hanzi 
input. In Proc. 1986 Int. Conf. on Chinese Com- 
puting, pages 155-159, August 1986. 
\[I1\] ROC Computational Linguistics Society. Proc. 
of Workshop on Corpus.based Researches and 
Techniques for Natural Language Processing, 
Taipei, Taiwan, September 1992. 
\[12\] R. Sproat. An application of statistical optimiza- 
tion with dynamic programming to phonetic- 
input-to-character conversion for Chinese. In 
Proc. ROCMNG III, pages 3779-390, September 
1990. 
\[13\] K.-Y. Su and C.-H. Lee. Robustness and dis- 
crimination oriented speech recognition using 
weighted HMM and subspace projection ap- 
proaches. In Proc. ICASSP91, pages 541-544, 
Toronto, Ontario, Canada, May 1991. 
\[14\] X. Wang, K. Wang, and X. Bai. Separating syl- 
lables and characters into words in natural lan- 
guage understanding. Journal of Chinese In/or- 
matron Processing, 5(3):48-58, 1991. 
\[t5\] X. Zhong. A multiple phrase pinyin/Hanzi con- 
version mechanism. Journal of Chinese Infor- 
mation Processing, 4(2):55-64, 1990. 
