Unsupervised Training for Overlapping Ambiguity Resolution in  
Chinese Word Segmentation 
Mu Li, Jianfeng Gao, Changning Huang 
Microsoft Research, Asia  
Beijing 100080, China 
{t-muli,jfgao,cnhuang}@microsoft.com 
Jianfeng Li 
University of Science and Technology of China 
Hefei, 230027, China 
jackwind@mail.ustc.edu.cn 
Abstract 
This paper proposes an unsupervised 
training approach to resolving overlap-
ping ambiguities in Chinese word seg-
mentation. We present an ensemble of 
adapted Naïve Bayesian classifiers that 
can be trained using an unlabelled Chi-
nese text corpus. These classifiers differ 
in that they use context words within 
windows of different sizes as features. 
The performance of our approach is 
evaluated on a manually annotated test set. 
Experimental results show that the pro-
posed approach achieves an accuracy of 
94.3%, rivaling the rule-based and super-
vised training methods. 
1 Introduction 
Resolving segmentation ambiguities is one of the 
fundamental tasks for Chinese word segmentation, 
and has received considerable attention in the re-
search community. Word segmentation ambigui-
ties can be roughly classified into two classes: 
overlapping ambiguity (OA), and combination am-
biguity (CA). In this paper, we focus on the meth-
ods of resolving overlapping ambiguities.  
Consider a Chinese character string ABC, if it 
can be segmented into two words either as AB/C 
or A/BC depending on different context, ABC is 
called an overlapping ambiguity string (OAS). For 
example, given a Chinese character string “g2520g3281
g7389” (ge4-guo2-you3), it can be segmented as either 
“g2520 | g3281g7389” (each state-owned) in Sentence (1) of 
Figure 1, or “g2520g3281 | g7389” (every country has) in 
Sentence (2).  
(1) g3324 (in) | g2520 (each) | g3281g7389 (state-owned) | g1237
g1006 (enterprise)  |  g1025 (middle)  
(in each state-owned enterprise) 
(2) g3324 (in) | g1166g7447 (human rights) | g19394g20076 (prob-
lem) | g990 (on) | g2520g3281 (every country) | g7389 
(have) | g1861g2528g9869 (common ground) 
(Regarding human rights, every country has 
some common ground) 
 
Figure 1.  Overlapping  ambiguities of Chinese 
character string “g2520g3281g7389” 
Our method of resolving overlapping ambigui-
ties contains two procedures. One is to construct an 
ensemble of Naïve Bayesian classifiers to resolve 
ambiguities. The other is an unsupervised method 
for training the Naïve Bayesian classifiers which 
compose the ensemble. The main issue of the un-
supervised training is how to eliminate the nega-
tive impact of the OA errors in the training data. 
Our solution is to identify all OASs in the training 
data and replace them with a single special token. 
By doing so, we actually remove the portion of 
training data that are likely to contain OA errors. 
The classifiers are then trained on the processed 
training data. 
Our approach is evaluated on a manually anno-
tated test set with 5,759 overlapping segmentation 
ambiguities. Experimental results show that an ac-
curacy of 94.3% is achieved.  
This remainder of this paper is structured as fol-
lows: Section 2 reviews previous work. Section 3 
defines overlapping ambiguous strings in Chinese. 
Section 4 describes the evaluation results. Section 
5 presents our conclusion. 
g3622 g4054 g6573 
2 Previous Work 
Previous methods of resolving overlapping am-
biguities can be grouped into rule-based ap-
proaches and statistical approaches.  
Maximum Matching (MM) based segmentation 
(Huang, 1997) can be regarded as the simplest 
rule-based approach, in which one starts from one 
end of the input sentence, greedily matches the 
longest word towards the other end, and repeats the 
process with the rest unmatched character se-
quences until the entire sentence is processed. If 
the process starts with the beginning of the sen-
tence, it is called Forward Maximum Matching 
(FMM). If the process starts with the end of the 
sentence, it is called Backward Maximum Match-
ing (BMM). Although it is widely used due to its 
simplicity, MM based segmentation performs 
poorly in real text. 
Zheng and Liu (1997) use a set of manually 
generated rules, and reported an accuracy of 81% 
on an open test set. Swen and Yu (1999) presents a 
lexicon-based method. The basic idea is that for 
each entry in a lexicon, all possible ambiguity 
types are tagged; and for each ambiguity types, a 
solution strategy is used. They achieve an accuracy 
of 95%. Sun (1998) demonstrates that most of the 
overlapping ambiguities can be resolved without 
taking into account the context information. He 
then proposes a lexicalized rule-based approach. 
His experiments show that using the 4,600 most 
frequent rules, 51% coverage can be achieved in an 
open test set. 
Statistical methods view the overlapping 
ambiguity resolution as a search or classification 
task. For example, Liu (1997) uses a word unigram 
language model, given all possible segmentations 
of a Chinese character sequence, to search the best 
segmentation with the highest probability. Similar 
approach can be traced back to Zhang (1991). But 
the method does not target to overlapping 
ambiguities. So the disambiguation results are not 
reported. Sun (1999) presents a hybrid method 
which incorporates empirical rules and statistical 
probabilities, and reports an overall accuracy of 
92%. Li (2001) defines the word segmentation dis-
ambiguation as a binary classification problem. Li 
then uses Support Vector Machine (SVM) with 
mutual information between each Chinese charac-
ter pair as a feature. The method achieves an accu-
racy of 92%. All the above methods utilize a su-
pervised training procedure. However, a large 
manually labeled training set is not always avail-
able. To deal with the problem, unsupervised ap-
proaches have been proposed. For example, Sun 
(1997) detected word boundaries given an OAS 
using character-based statistical measures, such as 
mutual information and difference of t-test. He 
reported an accuracy of approximately 90%. In his 
approach, only the statistical information within 4 
adjacent characters is exploited, and lack of word-
level statistics may prevent the disambiguation 
performance from being further improved.  
3 Ensemble of Naïve Bayesian Classifier 
for Overlapping Ambiguity Resolution 
3.1 Problem Definition 
We first give the formal definition of overlapping 
ambiguous string (OAS) and longest OAS.  
An OAS is a Chinese character string O that 
satisfies the following two conditions: 
a) There exist two segmentations Seg
1
 and Seg
2
 
such that
2211
, SegwSegw ∈∈∀ , where Chinese 
words w
1
 and w
2
 are different from either literal 
strings or positions; 
b)
2211
, SegwSegw ∈∈∃ , where w
1
 and w
2
 
overlap.  
The first condition ensures that there are 
ambiguous word boundaries (if more than one 
word segmentors are applied) in an OAS. In the 
example presented in section 1, the string “g2520g3281
g7389” is an OAS but “g2520g3281g7389g1237g1006” is not because 
the word “g1237g1006” remains the same in both FMM 
and BMM segmentations of “g2520 | g3281g7389 | g1237g1006” and 
“g2520g3281 | g7389 | g1237g1006”. The second condition indicates 
that the ambiguous word boundaries result from 
crossing brackets. As illustrated in Figure 1, words 
“g2520g3281” and “g3281g7389” form a crossing bracket.  
The longest OAS is an OAS that is not a sub-
string of any other OAS in a given sentence. For 
example, in the case “g10995g8975g8712g5191” (sheng1-huo2-
shui3-ping2, living standard), both “g10995g8975g8712” and 
“g10995g8975g8712g5191” are OASs, but only “g10995g8975g8712g5191” is the 
longest OAS because “g10995g8975g8712” is a  substring of 
“g10995g8975g8712g5191”. In this paper, we only consider the 
longest OAS because both left and right bounda-
ries of the longest OAS are determined.  
Furthermore, we constrain our search space 
within the FMM segmentation O
f
 and BMM seg-
mentation O
b
 of a given longest OAS. According 
to Huang (1997), two important properties of OAS 
has been identified: (1) if the FMM segmentation 
is the same as its BMM segmentation (O
f
 = O
b
), for 
example “ g6640g13046g5353g6818” (sou1-suo3-yin3-qing2, 
Search Engine), the probability that the MM seg-
mentation is correct is 99%;  Otherwise, (2) if the 
FMM segmentation differs from its BMM segmen-
tation (O
f
 ≠ O
b
 ), for example “g2520g3281g7389”, the prob-
ability that at least one of the MM segmentation is 
correct is also 99%. So such a strategy will not 
lower the coverage of our approach. 
Therefore, the overlapping ambiguity resolution 
can be formulized as a binary classification prob-
lem as follows: 
Given a longest OAS O and its context feature 
set C, let G(Seg, C) be a score function of Seg  for 
},{
bf
OOseg ∈ , the overlapping ambiguity reso-
lution task is to make the binary decision: 



<
>
=
),(),(
),(),(
COGCOGO
COGCOGO
seg
bfb
bff
 
(1) 
Note that O
f
 = O
b
 means that both FMM and 
BMM arrive at the same result. The classification 
process can then be stated as: 
a) If O
f
 = O
b
, then choose either segmentation 
result since they are same; 
b) Otherwise, choose the one with the higher 
score G according to Equation (1). 
For example, in the example of “g6640g13046g5353g6818”, if 
O
f
 = O
b  
= “g6640g13046 | g5353g6818”, then “g6640g13046 | g5353g6818” is se-
lected as the answer. In another example of “g2520g3281
g7389” in sentence (1) of Figure 1, O
f
 = “g2520g3281 | g7389”, 
O
b  
= “g2520 | g3281g7389”. Assume that C = {g3324, g1237g1006}, i.e., 
we used a context window of size 3; then the seg-
mentation “ g2520g3281 | g7389 ” is selected if 
>}),{,"|(" g1237g1006g3324g7389g2520g3281G }),{,"|(" g1237g1006g3324g3281g7389g2520G , 
otherwise “g2520 | g3281g7389” is selected. 
3.2 Naïve Bayesian Classifier for Overlapping 
Ambiguity Resolution 
Last section formulates the overlapping ambi-
guity resolution of an OAS O as the binary classi-
fication between O
f
 and O
b
. This section describes 
the use of the adapted Naïve Bayesian Classifier 
(NBC) (Duda and Hart, 1973) to address problem. 
Here, we use the words around O within a window 
as features, with w
-m
…w
-1
 denoting m words on the 
left of the O and w
1
…w
n
 denoting n words on the 
right of the O. Naïve Bayesian Classifier assumes 
that all the feature variables are conditionally inde-
pendent. So the joint probability of observing a set 
of context features C = {w
-m
…w
-1, 
w
1
…w
n
} of a 
segmentation Seg (O
f
 or O
b
) of O is as follows: 
∏
−−=
−−
=
nmi
i
nm
SegwpSegp
Segwwwwp
,...1,1,...
1,1
)|()(
),,...,(
 
(2) 
Assume that Equation (2) is the score function 
in Equation (1) G, we then have two parameters to 
be estimated: p(Seg) and p(w
i
|Seg). Since we do 
not have enough labeled training data, we then re-
sort to the redundancy property of natural language. 
Due to the fact that the OAS occupies only in a 
very small portion of the entire Chinese text, it is 
feasible to estimate the word co-occurrence prob-
abilities from the portion of corpus that contains no 
overlapping ambiguities. Consider an OASg1461g5527g3332 
(xin4-xin1-de, confidently). The correct segmenta-
tion would be “g1461g5527 | g3332”, if g1817g9397 (cong1-man3, 
full of) were its context word. We note thatg1817g9397 
appears as the left context word of g1461g5527 in both 
strings g1817g9397g1461g5527g3332 and g1817g9397g1461g5527g2656g2203g8680 (g2203g8680, 
yong3-qi4, courage). While the former string con-
tains an OAS, the latter does not. We then remove 
all OAS from the training data, and estimate the 
parameters using the training data that do not con-
tain OAS. In experiments, we replace all longest 
OAS that has O
f
 ≠ O
b
 with a special token [GAP]. 
Below, we refer to the processed corpus as 
tokenized corpus. 
Note that Seg is either the FMM or the BMM 
segmentation of O, and all OASs (including Seg) 
have been removed from the tokenized corpus, 
thus there are no statistical information available to 
estimate p(Seg) and p(w
-m
…w
-1
,w
1
…w
n
|Seg) based 
on the Maximum Likelihood Estimation (MLE) 
principle. To estimate them, we introduce the fol-
lowing two assumptions. 
1) Since the unigram probability of each word w 
can be estimated from the training data, for a 
given segmentation Seg=w
s1
…w
sk
, we assume 
that each word w of Seg is generated inde-
pendently. The probability p(Seg) is approxi-
mated by the production of the word unigram 
probabilities: 
∏
∈
=
Segw
i
i
wpSegp )()(  
(3) 
2) We also assume that left and right context word 
sequences are only conditioned on the leftmost 
and rightmost words of Seg, respectively. 
)()(
)...,(),...(
)|...()|...(
)|...,...(
1
111
111
11
sks
nsksm
sknsm
nm
wpwp
wwwpwwwp
wwwpwwwp
Segwwwwp
−−
−−
−−
=
=  
(4) 
where the word sequence probabilities P(w
-m
, …, 
w
-1
, w
s1
) and P(w
sk
,w
1
, …, w
n
) are decomposed as 
productions of trigram probabilities. We used a 
statistical language model toolkit described in (Gao 
et al, 2002) to build trigram models based on the 
tokenized corpus.  
Although the final language model is trained 
based on a tokenized corpus, the approach can be 
regarded as an unsupervised one from the view of 
the entire training process: the tokenized corpus is 
automatically generated by an MM based segmen-
tation tool from the raw corpus input with neither 
human interaction nor manually labeled data re-
quired. 
3.3 Ensemble of Classifiers and Majority 
Vote 
Given different window sizes, we can obtain dif-
ferent classifiers. We then combine them to 
achieve better results using the so-called ensemble 
learning (Peterson 2000). Let NBC(l, r)  denote the 
classifier with left window size l and right window 
size r. Given the maximal window size of 2, we 
then have 9 classifiers, as shown in Table 1.  
 
 L = 0 l = 1 l = 2 
r = 0 NBC(0, 0) NBC(1, 0) NBC(2,0) 
r = 1 NBC(0,1) NBC(1,1) NBC(2,1) 
r = 2 NBC(0,2) NBC(1,2) NBC(2,2) 
Table 1. Bayesian classifiers in the ensemble 
 
The ensemble learning suggests that the ensemble 
classification results are based on the majority vote 
of these classifiers: The segmentation that is se-
lected by most classifiers is chosen.  
4 Experiments and Discussions 
4.1 Settings 
We evaluate our approach using a manually anno-
tated test set, which was selected randomly from 
People’s Daily news articles of year 1997, contain-
ing approximate 460,000 Chinese characters, or 
247,000 words. In the test set, 5759 longest OAS 
are identified. Our lexicon contains 93,700 entries. 
4.2 OAS Distribution 
We first investigate the distribution of different 
types of OAS in the test set. In our approach, the 
performance upper bound (i.e. oracle accuracy) 
cannot achieve 100% because not all the OASs’ 
correct segmentations can be generated by FMM 
and BMM segmentation. So it is very useful to 
know to what extent our approach can deal with 
the problem. 
The results are shown in Table 2. We denote 
the entire OAS data set as C, and divide it into two 
subsets A and B according to the type of OAS. It 
can be seen from the table that in data set A 
(O
f
=O
b
), the accuracy of MM segmentation 
achieves 98.8% accuracy. Meanwhile, in data set B 
(O
f
 ≠ O
b
) the oracle recall of candidates proposed 
by FMM and BMM is 95.7% (97.2% in the entire 
data set C). The statistics are very close to those 
reported in Huang (1997). 
 
O
f 
= O
b
 = COR 
2731 
47.42% 
A 
OAS 
Of = Ob 
2763 
47.98% 
O
f 
= O
b
 ≠ COR 
32 
0.56% 
O
f
  = COR ∨ O
b
 = COR 
2866 
49.77% 
B 
OAS 
Of ≠ Ob 
2996 
52.02% 
O
f
  ≠ COR ∧ O
b
 ≠ COR  
130 
2.26% 
Table 2. Distribution of OAS in the test set 
 
Here are some examples for the overlapping 
ambiguities that cannot be covered by our ap-
proach. For errors resulting from O
f  
= O
b
 ≠ COR, 
 
a typical example in the literature is g13479g2524g6116g2010g4388g7114 
(jie2-he2-cheng2-fen1-zi3-shi2, g13479g2524 | g6116 | g2010g4388 | 
g7114). For errors caused by O
f
 ≠ O
b
  and O
f
  ≠ COR ∧ 
O
b
 ≠ COR, g1998g10628g3324g1002g13438(g1055g2033) (chu1-xian4-zai4-
shi4-ji4, g1998g10628 | g3324 | g1002g13438) serves as a good exam-
ple. These two types of errors are usually com-
posed of several words and need a much more 
complicated search process to determine the final 
correct output. Since the number of such errors is 
very small, they are not target of our approach in 
this paper. 
4.3 Experimental Results of Ensemble of Na-
ïve Bayesian Classifiers 
The classifiers are trained from the People’s Daily 
news articles of year 2000, which contain over 24 
million characters. The training data is tokenized. 
That is, all OAS with O
f
 ≠ O
b
 are replaced with the 
token [GAP]. After tokenization, there are 
16,078,000 tokens in the training data in which 
203,329 are [GAP], which is 1.26% of the entire 
training data set. Then a word trigram language 
model is constructed on the tokenized corpus, and 
each Bayesian classifier is built given the language 
model. 
 
 l = 0 l = 1 l = 2 
r = 0 88.73% 88.85% 88.95% 
r = 1 89.09% 89.39% 89.39% 
r = 2 88.95% 89.39% 89.35% 
Table 3. Accuracy of each individual classifier 
 
Table 3 shows the accuracy of each classifier 
on data set B. The performance of the ensemble 
based on majority vote is 89.79% on data set B, 
and the overall accuracy on C is 94.13%. The en-
semble consistently outperforms any of its mem-
bers. Classifiers with both left and right context 
features perform better than the others because 
they are capable to segment some of the context 
sensitive OAS. For example, contextual informa-
tion is necessary to segment the OAS “ g11487g2500
g990”(kan4-tai2-shang4, on the stand) correctly in 
both following sentences: 
g1332 | g11487 | g2500g990 | g18039g1022 | g9448g2604 
(Look at the performer in the stage) 
g12461 | g3324 | g7380g20652 | g980 | g4630 | g11487g2500 | g990 
(Stand in the highest stand) 
Both Peterson (2000) and Brill (1998) found 
that the ultimate success of an ensemble depends 
on the assumption that classifiers to be combined 
make complementary errors. We investigate this 
assumption in our experiments, and estimate the 
oracle accuracy of our approach. Result shows that 
only 6.0% (180 out of 2996) of the OAS in data set 
B that is classified incorrectly by all the 9 classifi-
ers. In addition, we can see from Table 2, that 130 
instances of these 180 errors are impossible to be 
correct because neither O
f
 nor O
b
 is the correct 
segmentation. Therefore, the oracle accuracy of the 
ensemble is 94.0%, which is very close to 95.7%, 
the theoretical upper bound of our approach in data 
set B described in Section 4.2. However, our ma-
jority vote based ensemble only achieves accuracy 
close to 90%. This analysis thus suggests that fur-
ther improves can be made by using more powerful 
ensemble strategies. 
4.4 Lexicalized Rule Based OAS Disambigua-
tion 
We also conduct a series of experiments to evalu-
ate the performance of a widely used lexicalized 
rule-based OAS disambiguation approach. As re-
ported by Sun (1998) and Li (2001), over 90% of 
the OAS can be disambiguated in a context-free 
way. Therefore, simply collecting large amount of 
correctly segmented OAS whose segmentation is 
independent of its context would yield pretty good 
performance. 
We first collected 730,000 OAS with O
f
 ≠ O
b
 
from 20 years’ People’s Daily corpus which con-
tains about 650 million characters. Then approxi-
mately 47,000 most frequently occurred OASs 
were extracted. For each of the extracted OAS, 20 
sentences that contain it were randomly selected 
from the corpus, and the correct segmentation is 
manually labeled. 41,000 lexicalized disambigua-
tion rules were finally extracted from the labeled 
data, whose either MM segmentation (O
f
 or O
b
) 
gains absolute majority, over 95% in our experi-
ment. The rule set covers approximately 80% oc-
currences of all the OASs in the training set, which 
is very close to that reported in Sun (1998). Here is 
a sample rule extracted: g1461g5527g3332 => g1461g5527 | g3332. It 
means that among the 20 sentences that contain the 
character sequence “g1461g5527g3332”, at least 19 of them 
are segmented as “g1461g5527 | g3332”. 
The performance of the lexicalized rule-based 
approach is shown in Table 4, where for compari-
son we also include the performance of using only 
FMM or BMM segmentation algorithm. 
 
Accuracy   
 
Data set B Data set C 
FMM 49.44% 73.12%
BMM 46.31% 71.51% 
Rule + FMM 83.10% 90.65% 
Rule + BMM 84.43% 91.33% 
NBC(0, 0) 88.73% 93.70% 
Ensemble 89.79% 94.13% 
Table 4. Performance comparison 
 
In Table 4, Rule + FMM means if there is no 
rule applicable to an OAS, FMM segmentation will 
be used. Similarly, Rule + BMM means that BMM 
segmentation will be used as backup. We can see 
in Table 4 that rule-based systems outperform their 
FMM and BMM counterparts significantly, but do 
not perform as well as our method, even when no 
context feature is used. This is because that the 
rules can only cover about 76% of the OASs in the 
test set with precision 95%, and FMM or BMM 
performs poorly on the rest of the OASs. Although 
the precision of these lexicalized rules is high, the 
room for further improvements is limited. For ex-
ample, to achieve a higher coverage, say 90%, 
much more manually labeled training data (i.e. 
81,000 OAS) are needed.  
5 Conclusion and Future work 
Our contributions are two-fold. First, we propose 
an approach based on an ensemble of adapted na-
ïve Bayesian classifiers to resolving overlapping 
ambiguities in Chinese word segmentation. Second, 
we present an unsupervised training method of 
constructing these Bayesian classifiers on an unla-
beled training corpus. It thus opens up the possibil-
ity for adjusting this approach to a large variety of 
applications. We perform evaluations using a 
manually annotated test set. Results show that our 
approach outperforms a lexicalized rule-based sys-
tem. Future work includes investigation on how to 
construct more powerful classifier for further im-
provements. One promising way is combining our 
approach with Sun’s (1997), with a core set of con-
text free OASs manually labeled to improve accu-
racy. 
Acknowledgements 
We would like to thank Wenfeng Yang and Xiao-
dan Zhu for helpful discussions on this project and 
Wenfeng’s excellent work on the lexicalized dis-
ambiguation rule set construction.  

References 
Eric Brill and Wu Jun. 1998. Classifier combination for 
improved lexical disambiguation. In Proceedings of 
the 36
th
 Annual Meeting of the Association for Com-
putational Linguistics, Montreal. CA. 
Richard Duda and Peter Hart. 1973. Pattern Classifica-
tion and Scene Analysis. Wiley, New York, NY. 
Jianfeng Gao and Joshua Goodman, Li Mingjing, Lee 
Kai-Fu. 2002. Toward a unified approach to statisti-
cal language modeling for Chinese. ACM Transac-
tions on Asian Language Information Processing, 1 
(1):3-33. 
Changning Huang. 1997. Segmentation problem in Chi-
nese Processing. (In Chinese) Applied Linguistics. 
1:72-78. 
Rong Li, Shaohui Liu, Shiwei Ye and Zhongzhi Shi. 
2001. A Method of Crossing Ambiguities in Chinese 
word Segmentation Based on SVM and k-NN. (In 
Chinese) Journal of Chinese Information Processing, 
15(6): 13-18. 
Ting Liu, Kaizhu Wang and Xinghai Jiang. 1997. The 
Maximum Probability Segmentation Algorithm of 
Ambiguous Character Strings. (In Chinese) Lan-
guage Engineering. Tsinghua University Press. 
pp.182-187. 
Ted Pedersen. 2000. A Simple Approach to Building 
Ensembles of Naive Bayesian Classifiers for Word 
Sense Disambiguation. In Proceedings of the First 
Annual Meeting of the North American Chapter of 
the Association for Computational Linguistics. Seat-
tle, WA. pp. 63-69 
Maosong Sun, Changning Huang and Benjamin K. Tsou. 
1997. Using Character Bigram for Ambiguity Reso-
lution In Chinese Words Segmentation. (In Chinese) 
Computer Research and Development, 34(5): 332-
339 
Maosong Sun and Zhengping Zuo. 1998. Overlapping 
ambiguity in Chinese Text. (In Chinese) Quantita-
tive and Computational Studies on the Chinese 
Language, HK, pp. 323-338 
Maosong Sun, Zhengping Zuo and Changning Huang. 
1999. Algorithm for solving 3-character crossing 
ambiguities in Chinese word segmentation. (In Chi-
nese) Journal of Tsinghua University Science and 
Technology, 39(5). 
Bing Swen and Shiwen Yu. 1999. A Graded Approach 
for the Efficient Resolution of Chinese Word Seg-
mentation Ambiguities. In Proceedings of 5
th
 Natural 
Language Processing Pacific Rim Symposium. pp. 
19-24 
Junsheng Zhang, Zhida Chen and Shunde Chen. 1991. 
Constraint Satisfaction and Probabilistic Chinese 
Segmentation. (In Chinese) In Proceedings of 
ROCLING IV (R.O.C. Computational Linguistics 
Conference). pp. 147-165 
Jiaheng Zheng and Kaiying Liu. 1997. The Research of 
Ambiguity Word – Segmentation Technique for the 
Chinese Text. (In Chinese) Language Engineering, 
Tsinghua University Press, pp. 201-206 
