Chinese and Japanese Word Segmentation Using Word-Level and
Character-Level Information
Tetsuji Nakagawa
Corporate Research and Development Center
Oki Electric Industry Co., Ltd.
2¡5¡7 Honmachi, Chuo-ku, Osaka 541-0053, Japan
nakagawa378@oki.com
Abstract
In this paper, we present a hybrid method
for Chinese and Japanese word segmentation.
Word-level information is useful for analysis
of known words, while character-level informa-
tion is useful for analysis of unknown words,
and the method utilizes both these two types
of information in order to effectively handle
known and unknown words. Experimental re-
sults show that this method achieves high over-
all accuracy in Chinese and Japanese word seg-
mentation.
1 Introduction
Word segmentation in Chinese and Japanese is
an important and difficult task. In these lan-
guages, words are not separated by explicit delim-
iters, and word segmentation must be conducted
first in most natural language processing applica-
tions. One of the problems which makes word seg-
mentation more difficult is existence of unknown
(out-of-vocabulary) words. Unknown words are de-
fined as words that do not exist in a system’s dictio-
nary. The word segmentation system has no knowl-
edge about these unknown words, and determining
word boundaries for such words is difficult. Accu-
racy of word segmentation for unknown words is
usually much lower than that for known words.
In this paper, we propose a hybrid method for
Chinese and Japanese word segmentation, which
utilizes both word-level and character-level infor-
mation. Word-level information is useful for anal-
ysis of known words, and character-level informa-
tion is useful for analysis of unknown words. We
use these two types of information at the same time
to obtain high overall performance.
This paper is organized as follows: Section 2
describes previous work on Chinese and Japanese
word segmentation on which our method is based.
Section 3 introduces the hybrid method which com-
bines word-level and character-level processing.
Section 4 shows experimental results of Chinese and
Japanese word segmentation. Section 5 discusses
related work, and Section 6 gives the conclusion.
2 Previous Work on Word Segmentation
Our method is based on two existing methods for
Chinese or Japanese word segmentation, and we ex-
plain them in this section.
2.1 The Markov Model-Based Method
Word-based Markov models are used in English
part-of-speech (POS) tagging (Charniak et al.,
1993; Brants, 2000). This method identifies POS-
tags T = t1;:::;tn, given a sentence as a word se-
quence W = w1;:::;wn, where n is the number
of words in the sentence. The method assumes that
each word has a state which is the same as the POS
of the word and the sequence of states is a Markov
chain. A state t transits to another state s with prob-
ability P(sjt), and outputs a word w with probabil-
ity P(wjt). From such assumptions, the probability
that the word sequence W with parts-of-speech T is
generated is
P(W;T) =
nY
i=1
P(witijw0t0 :::wi¡1ti¡1);
’
nY
i=1
P(wijti)P(tijti¡1); (1)
where w0(t0) is a special word(part-of-speech) rep-
resenting the beginning of the sentence. Given a
word sequence W, its most likely POS sequence ˆT
can be found as follows:
ˆT = argmax
T
P(TjW);
= argmax
T
P(W;T)
P(W) ;
= argmax
T
P(W;T);
’ argmax
T
nY
i=1
P(wijti)P(tijti¡1): (2)
The equation above can be solved efficiently by the
Viterbi algorithm (Rabiner and Juang, 1993).
In Chinese and Japanese, the method is used
with some modifications. Because each word in a
Figure 1: Example of Lattice Used in the Markov Model-Based Method
sentence is not separated explicitly in Chinese and
Japanese, both segmentation of words and identifi-
cation of the parts-of-speech tags of the words must
be done simultaneously. Given a sentence S, its
most likely word sequence ˆW and POS sequence
ˆT can be found as follows where W ranges over the
possible segments of S (w1¢¢¢wn = S):
( ˆW; ˆT) = argmax
W;T
P(W;TjS);
= argmax
W;T
P(W;T;S)
P(S) ;
= argmax
W;T
P(W;T;S);
= argmax
W;T
P(W;T);
’ argmax
W;T
nY
i=1
P(wijti)P(tijti¡1): (3)
The equation above can be solved using the Viterbi
algorithm as well.
The possible segments of a given sentence are
represented by a lattice, and Figure 1 shows an ex-
ample. Given a sentence, this method first con-
structs such a lattice using a word dictionary, then
chooses the best path which maximizes Equation
(3).
This Markov model-based method achieves high
accuracy with low computational cost, and many
Japanese word segmentation systems adopt it
(Kurohashi and Nagao, 1998; Matsumoto et al.,
2001). However, the Markov model-based method
has a difficulty in handling unknown words. In the
constructing process of a lattice, only known words
are dealt with and unknown words must be handled
with other methods. Many practical word segmen-
tation systems add candidates of unknown words to
Tag Description
B The character is in the beginning of a word.
I The character is in the middle of a word.
E The character is in the end of a word.
S The character is itself a word.
Table 1: The ‘B, I, E, S’ Tag Set
the lattice. The candidates of unknown words can be
generated by heuristic rules(Matsumoto et al., 2001)
or statistical word models which predict the proba-
bilities for any strings to be unknown words (Sproat
et al., 1996; Nagata, 1999). However, such heuris-
tic rules or word models must be carefully designed
for a specific language, and it is difficult to properly
process a wide variety of unknown words.
2.2 The Character Tagging Method
This method carries out word segmentation by tag-
ging each character in a given sentence, and in
this method, the tags indicate word-internal posi-
tions of the characters. We call such tags position-
of-character (POC) tags (Xue, 2003) in this paper.
Several POC-tag sets have been studied (Sang and
Veenstra, 1999; Sekine et al., 1998), and we use the
‘B, I, E, S’ tag set shown in Table 1 1.
Figure 2 shows an example of POC-tagging. The
POC-tags can represent word boundaries for any
sentences, and the word segmentation task can be
reformulated as the POC-tagging task. The tagging
task can be solved by using general machine learn-
ing techniques such as maximum entropy (ME)
models (Xue, 2003) and support vector machines
(Yoshida et al., 2003; Asahara et al., 2003).
1The ‘B, I, E, S’ tags are also called ‘OP-CN, CN-CN, CN-
CL, OP-CL’ tags (Sekine et al., 1998) or ‘LL, MM, RR, LR’
tags (Xue, 2003).
Figure 2: Example of the Character Tagging Method: Word boundaries are indicated by vertical lines (‘j’).
This character tagging method can easily han-
dle unknown words, because known words and un-
known words are treated equally and no other ex-
ceptional processing is necessary. This approach is
also used in base-NP chunking (Ramshaw and Mar-
cus, 1995) and named entity recognition (Sekine et
al., 1998) as well as word segmentation.
3 Word Segmentation Using Word-Level
and Character-Level Information
We saw the two methods for word segmentation
in the previous section. It is observed that the
Markov model-based method has high overall ac-
curacy, however, the accuracy drops for unknown
words, and the character tagging method has high
accuracy for unknown words but lower accuracy
for known words (Yoshida et al., 2003; Xue, 2003;
Sproat and Emerson, 2003). This seems natural be-
cause words are used as a processing unit in the
Markov model-based method, and therefore much
information about known words (e.g., POS or word
bigram probability) can be used. However, un-
known words cannot be handled directly by this
method itself. On the other hand, characters are
used as a unit in the character tagging method. In
general, the number of characters is finite and far
fewer than that of words which continuously in-
creases. Thus the character tagging method may be
robust for unknown words, but cannot use more de-
tailed information than character-level information.
Then, we propose a hybrid method which com-
bines the Markov model-based method and the char-
acter tagging method to make the most of word-
level and character-level information, in order to
achieve high overall accuracy.
3.1 A Hybrid Method
The hybrid method is mainly based on word-level
Markov models, but both POC-tags and POS-tags
are used in the same time and word segmentation
for known words and unknown words are conducted
simultaneously.
Figure 3 shows an example of the method given
a Japanese sentence “ ”,
where the word “ ”(person’s name) is an un-
known word. First, given a sentence, nodes of
lattice for known words are made as in the usual
Markov model-based method. Next, for each char-
acter in the sentence, nodes of POC-tags (four nodes
for each character) are made. Then, the most likely
path is searched (the thick line indicates the correct
path in the example). Unknown words are identified
by the nodes with POC-tags. Note that some transi-
tions of states are not allowed (e.g. from I to B, or
from any POS-tags to E), and such transitions are
ignored.
Because the basic Markov models in Equation
(1) are not expressive enough, we use the following
equation instead to estimate probability of a path in
a lattice more precisely:
P(W;T) =
nY
i=1
P(witijw0t0 :::wi¡1ti¡1);
’
nY
i=1
f‚1P(wijti)P(ti)
+‚2P(wijti)P(tijti¡1)
+‚3P(wijti)P(tijti¡2ti¡1)
+‚4P(witijwi¡1ti¡1)g;
(‚1 +‚2 +‚3 +‚4 = 1): (4)
The probabilities in the equation above are esti-
mated from a word segmented and POS-tagged cor-
pus using the maximum-likelihood method, for ex-
ample,
P(wijti) =
8<
:
f(wi;ti)P
w f(w;ti)
(f(wi;ti) > 0),
0:5P
w f(w;ti)
(f(wi;ti) = 0), (5)
where f(w;t) is a frequency that the word w with
the tag t occurred in training data. Unseen events
in the training data are handled as they occurred 0.5
times for smoothing. ‚1;‚2;‚3;‚4 are calculated
by deleted interpolation as described in (Brants,
2000). A word dictionary for a Markov model-
based system is often constructed from a training
corpus, and no unknown words exist in the training
corpus in such a case. Therefore, when the param-
eters of the above probabilities are trained from a
training corpus, words that appear only once in the
training corpus are regarded as unknown words and
decomposed to characters with POC-tags so that
statistics about unknown words are obtained2.
2As described in Equation (5), we used the additive smooth-
ing method which is simple and easy to implement. Although
there are other more sophisticated methods such as Good-
Turing smoothing, they may not necessarily perform well be-
cause the distribution of words is changed by this operation.
Figure 3: Example of the Hybrid Method
In order to handle various character-level fea-
tures, we calculate word emission probabilities for
POC-tags by Bayes’ theorem:
P(wijti)
= P(tijwi;ti 2 TPOC)P(wi;ti 2 TPOC)P(t
i)
;
= P(tijwi;ti 2 TPOC)
P
t2TPOC P(wi;t)
P(ti) ; (6)
where TPOC = fB;I;E;Sg, wi is a character and
ti is a POC-tag. In the above equation, P(ti) and
P(wi;t) are estimated by the maximum-likelihood
method, and the probability of a POC tag ti, given
a character wi (P(tijwi;ti 2 TPOC)) is estimated
using ME models (Berger et al., 1996). We use the
following features for ME models, where cx is the
xth character in a sentence, wi = ci0 and yx is the
character type of cx (Table 2 shows the definition of
character types we used):
(1) Characters (ci0¡2, ci0¡1, ci0, ci0+1, ci0+2)
(2) Pairs of characters (ci0¡2ci0¡1, ci0¡1ci0,
ci0¡1ci0+1, ci0ci0+1, ci0+1ci0+2)
(3) Character types (yi0¡2, yi0¡1, yi0, yi0+1, yi0+2)
(4) Pairs of character types (yi0¡2yi0¡1, yi0¡1yi0,
yi0¡1yi0+1, yi0yi0+1, yi0+1yi0+2)
Parameters of ME are trained using all the words in
training data. We use the Generalized Iterative Scal-
ing algorithm (Darroch and Ratcliff, 1972) for pa-
rameter estimation, and features that appeared less
than or equal to 10 times in training data are ignored
in order to avoid overfitting.
What our method is doing for unknown words
can be interpreted as follows: The method exam-
ines all possible unknown words in a sentence, and
probability for an unknown word of length k, wi =
Character Type Description
Alphabet Alphabets
Numeral Arabic and Chinese numerals
Symbol Symbols
Kanji Chinese Characters
Hiragana Hiragana (Japanese scripts)
Katakana Katakana (Japanese scripts)
Table 2: Character Types
cj ¢¢¢cj+k¡1 is calculated as:
P(witijh) (7)
=
8>
<
>:
P(cjSjh) (k = 1);
P(cjBjh)Qj+k¡2l=j+1 P(clIjh)P(cj+k¡1Ejh)
(k > 1);
where h is a history of the sequence. In other words,
the probability of the unknown word is approxi-
mated by the product of the probabilities of the com-
posing characters, and this calculation is done in the
framework of the word-level Markov model-based
method.
4 Experiments
This section gives experimental results of Chinese
and Japanese word segmentation with the hybrid
method. The following values are used to evaluate
the performance of word segmentation:
R : Recall (The number of correctly segmented
words in system’s output divided by the num-
ber of words in test data)
P : Precision (The number of correctly segmented
words in system’s output divided by the num-
ber of words in system’s output)
F : F-measure (F = 2£R£P=(R+P))
Rknown : Recall for known words
Runknown : Recall for unknown words
Corpus # of Training Words # of Testing Words # of Words Rate of
(known/unknown) in Dictionary Unknown Words
AS 5,806,611 11,985 (11,727/ 258) 146,212 0.0215
HK 239,852 34,955 (32,463/2,492) 23,747 0.0713
PK 1,121,017 17,194 (16,005/1,189) 55,226 0.0692
RWCP 840,879 93,155 (93,085/ 70) 315,602 0.0008
Table 3: Statistical Information of Corpora
4.1 Experiments of Chinese Word
Segmentation
We use three Chinese word-segmented corpora, the
Academia Sinica corpus (AS), the Hong Kong City
University corpus (HK) and the Beijing University
corpus (PK), all of which were used in the First
International Chinese Word Segmentation Bake-
off (Sproat and Emerson, 2003) at ACL-SIGHAN
2003.
The three corpora are word-segmented corpora,
but POS-tags are not attached, therefore we need to
attach a POS-tag (state) which is necessary for the
Markov model-based method to each word. We at-
tached a state for each word using the Baum-Welch
algorithm (Rabiner and Juang, 1993) which is used
for Hidden Markov Models. The algorithm finds
a locally optimal tag sequence which maximizes
Equation (1) in an unsupervised way. The initial
states are randomly assigned, and the number of
states is set to 64.
We use the following systems for comparison:
Bakeoff-1, 2, 3 The top three systems participated
in the SIGHAN Bakeoff (Sproat and Emerson,
2003).
Maximum Matching A word segmentation sys-
tem using the well-known maximum matching
method.
Character Tagging A word segmentation system
using the character tagging method. This sys-
tem is almost the same as the one studied by
Xue (2003). Features described in Section 3.1
(1)–(4) and the following (5) are used to esti-
mate a POC tag of a character ci0, where tx is
a POC-tag of the xth character in a sentence:
(5) Unigram and bigram of previous POC-
tags (ti0¡1;ti0¡2ti0¡1)
All these systems including ours do not use any
other knowledge or resources than the training data.
In this experiments, word dictionaries used by the
hybrid method and Maximum Matching are con-
structed from all the words in each training corpus.
Statistical information of these data is shown in Ta-
ble 3. The calculated values of ‚i in Equation (4)
are shown in Table 4.
Corpus ‚1 ‚2 ‚3 ‚4
AS 0.037 0.178 0.257 0.528
HK 0.048 0.251 0.313 0.388
PK 0.055 0.207 0.242 0.495
RWCP 0.073 0.105 0.252 0.571
Table 4: Calculated Values of ‚i
The results are shown in Table 5. Our system
achieved the best F-measure values for the three
corpora. Although the hybrid system’s recall val-
ues for known words are not high compared to the
participants of SIGHAN Bakeoff, the recall values
for known words and unknown words are relatively
well-balanced. The results of Maximum Matching
and Character Tagging show the trade-off between
the word-based approach and the character-based
approach which was discussed in Section 3. Max-
imum Matching is word-based and has the higher
recall values for known words than Character Tag-
ging on the HK and PK corpus. Character Tagging
is character-based and has the highest recall values
for unknown words on the AS, HK and PK corpus.
4.2 Experiments of Japanese Word
Segmentation
We use the RWCP corpus, which is a Japanese
word-segmented and POS-tagged corpus.
We use the following systems for comparison:
ChaSen The word segmentation and POS-tagging
system based on extended Markov models
(Asahara and Matsumoto, 2000; Matsumoto et
al., 2001). This system carries out unknown
word processing using heuristic rules.
Maximum Matching The same system used in the
Chinese experiments.
Character Tagging The same system used in the
Chinese experiments.
As a dictionary for ChaSen, Maximum Matching
and the hybrid method, we use IPADIC (Matsumoto
and Asahara, 2001) which is attached to ChaSen.
Statistical information of these data is shown in Ta-
ble 3. The calculated values of ‚i in Equation (4)
are shown in Table 4.
Corpus System R P F Rknown Runknown
Hybrid method 0.973 0.971 0.972 0.979 0.717
Bakeoff-1 0.966 0.956 0.961 0.980 0.364
AS Bakeoff-2 0.961 0.958 0.959 0.966 0.729
Bakeoff-3 0.944 0.945 0.945 0.952 0.574
Maximum Matching 0.917 0.912 0.915 0.938 0.000
Character Tagging 0.962 0.959 0.960 0.966 0.744
Hybrid method 0.951 0.948 0.950 0.969 0.715
Bakeoff-1 0.947 0.934 0.940 0.972 0.625
HK Bakeoff-2 0.940 0.908 0.924 0.980 0.415
Bakeoff-3 0.917 0.915 0.916 0.936 0.670
Maximum Matching 0.908 0.830 0.867 0.975 0.037
Character Tagging 0.917 0.917 0.917 0.932 0.728
Hybrid method 0.957 0.952 0.954 0.970 0.774
Bakeoff-1 0.962 0.940 0.951 0.979 0.724
PK Bakeoff-2 0.955 0.938 0.947 0.976 0.680
Bakeoff-3 0.955 0.938 0.946 0.977 0.647
Maximum Matching 0.930 0.883 0.906 0.974 0.020
Character Tagging 0.932 0.931 0.931 0.943 0.786
Table 5: Performance of Chinese Word Segmentation
Corpus System R P F Rknown Runknown
Hybrid method 0.993 0.994 0.993 0.993 0.586
RWCP ChaSen 0.991 0.992 0.991 0.991 0.243
Maximum Matching 0.880 0.918 0.898 0.880 0.100
Character Tagging 0.972 0.968 0.970 0.972 0.629
Table 6: Performance of Japanese Word Segmentation
The results are shown in Table 63. Compared to
ChaSen, the hybrid method has the comparable F-
measure value and the higher recall value for un-
known words (the difference is statistically signif-
icant at 95% confidence level). Character Tagging
has the highest recall value for unknown words as
in the Chinese experiments.
5 Discussion
Several studies have been conducted on word seg-
mentation and unknown word processing. Xue
(2003) studied Chinese word segmentation using
the character tagging method. As seen in the pre-
vious section, this method handles known and un-
known words in the same way basing on character-
level information. Our experiments showed that the
method has quite high accuracy for unknown words,
but accuracy for known words tends to be lower than
other methods.
3In this evaluation, Rknown and Runknown are calculated
considering words in the dictionary as known words. Words
which are in the training corpus but not in the dictionary are
handled as unknown words in the calculations. The number of
known/unknown words of the RWCP corpus shown in Table 3
is also calculated in the same way.
Uchimoto et al. (2001) studied Japanese word
segmentation using ME models. Although their
method is word-based, no word dictionaries are
used directly and known and unknown words are
handled in a same way. The method estimates how
likely a string is to be a word using ME. Given a
sentence, the method estimates the probabilities for
every substrings in the sentence. Word segmenta-
tion is conducted by finding a division of the sen-
tence which maximizes the product of probabilities
that each divided substring is a word. Compared
to our method, their method can handle some types
of features for unknown words such as “the word
starts with an alphabet and ends with a numeral” or
“the word consists of four characters”. Our method
cannot handle such word-level features because un-
known words are handled by using a character as
a unit. On the other hand, their method seems to
have a computational cost problem. In their method,
unknown words are processed by using a word as
a unit, and the number of candidates for unknown
words in a sentence which consists of n characters
is equal to n(n + 1)=2. Actually, they did not con-
sider every substrings in a sentence, and limited the
length of substrings to be less than or equal to five
characters. In our method, the number of POC-
tagged characters which is necessary for unknown
word processing is equal to 4n, and there is no lim-
itation for the length of unknown words.
Asahara et al. (2003) studied Chinese word seg-
mentation based on a character tagging method
with support vector machines. They preprocessed
a given sentence using a word segmenter based on
Markov models, and the output is used as features
for character tagging. Their method is a character-
based method incorporating word-level information
and that is reverse to our approach. They did not use
some of the features we used like character types,
and our method achieved higher accuracies com-
pared to theirs on the AS, HK and PK corpora (Asa-
hara et al., 2003).
6 Conclusion
In this paper, we presented a hybrid method for
word segmentation, which utilizes both word-level
and character-level information to obtain high ac-
curacy for known and unknown words. The method
combines two existing methods, the Markov model-
based method and character tagging method. Ex-
perimental results showed that the method achieves
high accuracy compared to the other state-of-the-art
methods in both Chinese and Japanese word seg-
mentation. The method can conduct POS tagging
for known words as well as word segmentation, but
tagging identified unknown words is left as future
work.
Acknowledgements
This work was supported by a grant from the Na-
tional Institute of Information and Communications
Technology of Japan.

References

Masayuki Asahara and Yuji Matsumoto. 2000. Ex-
tended Models and Tools for High-performance Part-
of-Speech Tagger. In Proceedings of the 18th Inter-
national Conference on Computational Linguistics,
pages 21–27.

Masayuki Asahara, Chooi Ling Goh, Xiaojie Wang, and
Yuji Matsumoto. 2003. Combining Segmenter and
Chunker for Chinese Word Segmentation. In Pro-
ceedings of the 2nd SIGHAN Workshop on Chinese
Language Processing, pages 144–147.

Adam L. Berger, Stephen A. Della Pietra, and Vincent
J. Della Pietra. 1996. A Maximum Entropy Approach
to Natural Language Processing. Computational Lin-
guistics, 22(1):39–71.

Thorsten Brants. 2000. TnT — A Statistical Part-
of-Speech Tagger. In Proceedings of ANLP-NAACL
2000, pages 224–231.

Eugene Charniak, Curtis Hendrickson, Neil Jacobson,
and Mike Perkowitz. 1993. Equations for Part-of-
Speech Tagging. In Proceedings of the Eleventh Na-
tional Conference on Artificial Intelligence, pages
784–789.

J. Darroch and D. Ratcliff. 1972. Generalized iterative
scaling for log-linear models. The annuals of Mathe-
matical Statistics, 43(5):1470–1480.

Sadao Kurohashi and Makoto Nagao. 1998. Japanese
Morphological Analysis System JUMAN version 3.61.
Department of Informatics, Kyoto University. (in
Japanese).

Yuji Matsumoto and Masayuki Asahara. 2001. IPADIC
User’s Manual version 2.2.4. Nara Institute of Sci-
ence and Technology. (in Japanese).

Yuji Matsumoto, Akira Kitauchi, Tatsuo Yamashita,
Yoshitaka Hirano, Hiroshi Matsuda, Kazuma
Takaoka, and Masayuki Asahara. 2001. Morpholog-
ical Analysis System ChaSen version 2.2.8 Manual.
Nara Institute of Science and Technology.

Masaki Nagata. 1999. A Part of Speech Estimation
Method for Japanese Unknown Words using a Statis-
tical Model of Morphology and Context. In Proceed-
ings of the 37th Annual Meeting of the Association for
Computational Linguistics, pages 227–284.

Lawrence R. Rabiner and Biing-Hwang Juang. 1993.
Fundamentals of Speech Recognition. PTR Prentice-
Hall.

Lance Ramshaw and Mitch Marcus. 1995. Text Chunk-
ing using Transformation-Based Learning. In Pro-
ceedings of the 3rd Workwhop on Very Large Corpora,
pages 88–94.

Erik F. Tjong Kim Sang and Jorn Veenstra. 1999. Rep-
resenting Text Chunks. In Proceedings of 9th Confer-
ence of the European Chapter of the Association for
Computational Linguistics, pages 173–179.

Satoshi Sekine, Ralph Grishman, and Hiroyuki Shinnou.
1998. A Decision Tree Method for Finding and Clas-
sifying Names in Japanese Texts. In Proceedings of
the 6th Workshop on Very Large Corpora, pages 171–177.

Richard Sproat and Thomas Emerson. 2003. The First
International Chinese Word Segmentation Bakeoff. In
Proceedings of the Second SIGHAN Workshop on Chi-
nese Language Processing, pages 133–143.

Richard Sproat, Chilin Shih, William Gale, and Nancy
Chang. 1996. A Stochastic Finite-State Word-
Segmentation Algorithm for Chinese. Computational
Linguistics, 22(3):377–404.

Kiyotaka Uchimoto, Satoshi Sekine, and Hitoshi Isahara.
2001. The Unknown Word Problem: a Morphological
Analysis of Japanese Using Maximum Entropy Aided
by a Dictionary. In Proceedings of the 2001 Confer-
ence on Empirical Methods in Natural Language Pro-
cessing, pages 91–99.

Nianwen Xue. 2003. Chinese Word Segmentation as
Character Tagging. International Journal of Compu-
tational Linguistics and Chinese, 8(1):29–48.

Tatsumi Yoshida, Kiyonori Ohtake, and Kazuhide Ya-
mamoto. 2003. Performance Evaluation of Chinese
Analyzers with Support Vector Machines. Journal
of Natural Language Processing, 10(1):109–131. (in
Japanese).
