A New Chinese Natural Language Understanding Architecture
Based on Multilayer Search Mechanism
Wanxiang Che Ting Liu Sheng Li
School of Computer Science and Technology
Harbin Institute of Technology
P.O. Box 321, HIT
Harbin China, 150001
fcar, tliu, lsg@ir.hit.edu.cn
Abstract
A classical Chinese Natural Language Under-
standing (NLU) architecture usually includes
several NLU components which are executed
with some mechanism. A new Multilayer Search
Mechanism (MSM) which integrates and quan-
tifles these components into a uniform multi-
layer treelike architecture is presented in this
paper. The mechanism gets the optimal re-
sult with search algorithms. The components
in MSM afiect each other. At last, the per-
formance of each component is enhanced. We
built a practical system { CUP (Chinese Under-
standing Platform) based on MSM with three
layers. By the experiments on Word Segmen-
tation, a better performance was achieved. In
theory the normal cascade and feedback mech-
anism are just some special cases of MSM.
1 Introduction
At present a classical Chinese NLU architec-
ture usually includes several components, such
as Word Segmentation (Word-Seg), POS Tag-
ging, Phrase Analysis, Parsing, Word Sense Dis-
ambiguation (WSD) and so on. These compo-
nents are executed one by one from lower layers
(such as Word-Seg, POS Tagging) to higher lay-
ers (such as Parsing, WSD) to form a kind of
cascade mechanism. But when people build a
NLU system based on these complex language
analysis, it is a very serious problem since the
errors of each layer component are multiplied.
With more and more analysis components, the
flnal result becomes too bad to be applicable.
Another problem is that the components in
the system afiect each other when people build a
practical but toy NLU system. Here the toy sys-
tem means that each component is ideal enough
with perfect input. But in fact, on the one hand
the lower layer components need the informa-
tion of higher layer components; on the other
hand the incorrect analysis of lower layers must
reduce the accuracy of higher layers. In Chinese
Word-Seg component, many segmentation am-
biguities which cannot be solved using only lexi-
cal information. In order to improve the perfor-
mance of Word-Seg, we have to use some syntax
and even semantic information. Without cor-
rect Word-Seg results, however the syntax and
semantic parser cannot obtain a correct analy-
sis. It is a chain debts problem.
People have tried to solve the error-multiplied
problem by integrating multi-layers into a uni-
form model (Gao et al., 2001; Nagata, 1994).
But with the increasing number of integrated
layers, the model becomes too complex to build
or solve.
The feedback mechanism (Wu and Jiang,
1998) helps to use the information of high lay-
ers to control the flnal result. If the analysis
at feedback point cannot be passed, the whole
analysis will be denied. This mechanism places
too much burden on the function of feedback
point. This leads to the problems that a correct
lower layer result may be rejected or an error
result may be accepted.
We propose a new Multilayer Search Mecha-
nism (MSM) to solve the problems mentioned
above. Based on the mechanism, we build a
practical Chinese NLU platform { CUP (Chi-
nese Understanding Platform). Section 2 intro-
duces the background and architecture of the
new mechanism and how to build it up. Exper-
imental results with CUP is given in Section 3.
In Section 4, we discuss why the new mechanism
gets better results than the old ones. Conclu-
sions and the some future work follow in Sec-
tion 5.
2 Multilayer Search Mechanism
The novel Multilayer Search Mechanism (MSM)
integrates and quantifles NLU components into
a uniform multilayer treelike platform, such as
Word-Seg, POS Tagging, Parsing and so on.
These components afiect each other by comput-
ing the flnal score and then get better results.
2.1 Background
Considering a Chinese sentence, the sen-
tence analysis task can be formally deflned
as flnding a set of word segmentation se-
quence (W), a POS tagging sequence (POS),
a syntax dependency parsing tree (DP) and
so on which maximize their joint probability
P(W;POS;DP;¢¢¢). In this paper, we assume
that there are only three layers W, POS and
DP in MSM. It is relatively straightforward,
however, to extend the method to the case for
which there are more than three layers. There-
fore, the sentence analysis task can be described
as flnding a triple < W;POS;DP > that max-
imize the joint probability P(W;POS;DP).
< W;POS;DP >= arg max
W;POS;DP
P(W;POS;DP)
The joint probability distribution
P(W;POS;DP) can be written in the fol-
lowing form using the chain rule of probability:
P(W;POS;DP)
=P(W)P(POSjW)P(DPjW;POS)
Where P(W) is considered as the probabil-
ity of the word segmentation layer, P(POSjW)
is the conditional probability of POS Tag-
ging with a given word segmentation result,
P(DPjW;POS) is the conditional probability
of a dependency parsing tree with a given word
segmentation and POS Tagging result similarly.
So the form of < W;POS;DP > can be trans-
formed into:
< W;POS;DP >
= arg max
W;POS;DP
P(W;POS;DP)
= arg max
W;POS;DP
P(W)P(POSjW)P(DPjW;POS)
= arg max
W;POS;DP
logP(W)+logP(POSjW)
+logP(DPjW;POS)
= arg min
W;POS;DP
¡logP(W)¡logP(POSjW)
¡logP(DPjW;POS)
We consider that each inversion of probability’s
logarithm at the last step of the above equation
is a score given by a component (Such as Word-
Seg, POS Tagging and so on). So at last, we flnd
an n-tuple < W;POS;DP;¢¢¢ > that minimizes
the last score Sn of a sentence analysis result
with n layers. Sn is deflned as:
Sn = s1 +s2 +¢¢¢+sn (1)
si denotes the score of the ith layer compo-
nent.
2.2 The Architecture of Multilayer
Search Mechanism
Because there are lots of analysis results at each
layer, it’s a combinatorial explosion problem to
flnd the optimal result. Assuming that each
component produces m results for an input on
average and there are n layers in a NLU system,
the flnal search space is mn. With the increas-
ing of n, it’s impossible for a system to flnd the
optimal result in the huge search space.
The classical cascade mechanism uses a
greedy algorithm to solve the problem. It only
keeps the optimal result at each layer. But if
it’s a fault analysis result for the optimal result
at a layer, it’s impossible for this mechanism to
flnd the flnal correct analysis result.
To overcome the di–culty, we build a new
Multilayer Search Mechanism (MSM). Difierent
from the cascade mechanism, MSM maintains a
number of results at each component, so that
the correct analysis should be included in these
results with high probability. Then MSM tries
to use the information of all layer components
to flnd out the correct analysis result. Difierent
from the feedback mechanism, the acceptance
of an analysis is not based on a higher layer
components alone. The lower layer components
provide some information to help to flnd the
correct analysis result as well.
According to the above idea, we design the
architecture of MSM with multilayer treelike
structure. The original input is root and the
several analysis results of the input become
branches. Iterating this progress, we get a big-
ger analysis tree. Figure 1 gives an analysis ex-
ample of a Chinese sentence \��
�	��7
� b" (He likes beautiful  owers). For the in-
put sentence, there are several Word-Seg results
withscores(thelowerthebetter). Thenforeach
of Word-Seg results, there are several POS Tag-
ging results, too. And for each of POS Tagging
result, the same thing happens. So we get a big
tree structure and the correct analysis result is a
path in the tree from the root to the leaf except
for there is no correct analysis result in some
analysis components.
A search algorithm can be used to flnd out the
correct analysis result among the lowest score in
the tree. But because each layer cannot give the
exact score in Equation 1 as the standard score
and the ability of analysis are difierent with
difierent layers, we should weight every score.
Then the last score is the linear weighted sum
(Equation 2).
Sn = w1s1 +w2s2 +¢¢¢+wnsn (2)
si denotes the score of the ith layer compo-
nent which we will introduce in Section 3; wi
denotes the weight of the ith layer components
which we will introduce in the next section.
In order to get the optimal result, all kinds
of tree search algorithms can be used. Here
the BEST-FIRST SEARCH Algorithm (Rus-
sell and Norvig, 1995) is used. Figure 2 shows
the main algorithm steps.
POS Tag
Word-Seg
……
他喜爱美丽的鲜花。
56.2他 喜爱 美丽 的 鲜花。 87.3他 喜 爱美 丽 的 鲜花。
108.3他 喜爱 美丽 的 鲜花。
  r      v        a      u      n
187.4他 喜爱 美丽 的 鲜花。
  b      v        a      u      n
Figure 1: An Example of Multilayer Search
Mechanism
1. Add the initial node (starting point) to the
queue.
2. Compare the front node to the goal state. If
they match then the solution is found.
3. If they do not match then expand the front
node by adding all the nodes from its links.
4. If all nodes in the queue are expanded then the
goal state is not found (e.g.there is no solution).
Stop.
5. According to Equation 2 evaluate the score of
expanded nodes and reorder the nodes in the
queue.
6. Go to step 2.
Figure 2: BEST-FIRST SEARCH Algorithm
2.3 Layer Weight
We should flnd out a group of appropriate
w1;w2;¢¢¢;wn in Equation 2 to maximize the
number of the optimal paths in MSM which can
get the correct results. They are expressed by
W⁄.
W⁄ = arg max
W
ObjFun(minSn) (3)
Here W⁄ is named as Whole Layer Weight.
ObjFun(⁄) denotes a function to value the re-
sultthatagroupofW canget. Herewecancon-
sider that the performance of each layer is pro-
portional to the last performance of the whole
system in MSM. So it maybe the F-Score of
Word-Seg, precision of POS Tagging and so on.
minSn returns the optimal analysis results with
the lowest score.
Here, the F-Score of Word-Seg can be deflned
as the harmonic mean of recall and precision of
Word-Seg. That is to say:
Seg:F-Score = 2⁄Seg:Pre⁄Seg:RecSeg:Pre+Seg:Rec
Seg:Pre = #words correctly segmented#words segmented
Seg:Rec = #words correctly segmented#words in input texts
Finding out the most suitable group of W
is an optimization problem. Genetic Algo-
rithms (GAs) (Mitchell, 1996) is just an adap-
tive heuristic search algorithm based on the evo-
lutionary ideas of natural selection and genetics
to solve optimization problems. It exploits his-
torical information to direct the search into the
region of better performance within the search
space.
To use GAs to solve optimization prob-
lems (Wall, 1996) the following three questions
should be answered:
1. How to describ genome?
2. What is the objective function?
3. Which controlling parameters to be se-
lected?
A solution to a problem is represented as a
genome. The genetic algorithm then creates a
population of solutions and applies genetic op-
erators such as mutation and crossover to evolve
the solutions in order to flnd the best one(s) af-
ter several generations. The numbers of popula-
tion and generation are given by controlling pa-
rameters. The objective function decides which
solution is better than others.
In MSM, the genome is just the group of W
which can be denoted by real numbers between
0 and 1. Because the result is a linear weighted
sum, weshouldnormalizetheweightstoletw1+
w2+¢¢¢+wn = 1. The objective function is just
ObjFun(⁄) in Equation 3. Here the F-Score
of Word-Seg is used to describe it. We set the
genetic generations as 10 and the populations in
one generation as 30. The Whole Layer Weight
shows in the row of WLW in Table 4. The F-
Score of Word-Seg shows as Table 3.
We can see that the Word-Seg layer gets an
obviously large weight. So the flnal result is
inclined to the result of Word-Seg.
2.4 Self Confldence
Our analysis indicates that the method of
weighting a whole layer uniformly cannot re-
 ect the individual information of each sen-
tence to some component. So the F-Score of
Word-Seg drops somewhat comparing with us-
ing Only Word-Seg. For example, the most
sentences which have ambiguities in Word-Seg
component are still weighted high with Word-
Seg layer weight. Then the flnal result may still
be the same as the result of Word-Seg compo-
nent. It is ambiguous, too. So we must use a
parameter to decrease the weight of a compo-
nent with ambiguity. It is used to describe the
analysis ability of a component for an input. We
name it as Self Confldent (SC) of a component.
It is described by the difierence between the flrst
and the second score of a component. Then the
bigger SC of a component, the larger weight of
it.
There are lots of methods to value the difier-
ence between two numbers. So there are many
kinds of deflnitions of SC. We use A and B to
denotetheflrstandthesecondscoreofacompo-
nent respectively. Then the SC can be deflned
as B¡A, BA and so on. We must select the bet-
ter one to represent SC. The better means that
a method which gets a lower Error Rate with a
threshold t⁄ which gets the Minimal Error Rate.
t⁄ = arg min
t
ErrRate(t)
ErrRate(t) denotes the Error Rate with the
threshold t. An error has two deflnitions:
† SC is higher than t but the flrst result is
fault
† SC is lower than t but the flrst result is
right
Then the Error Rate is the ratio between the
error number and the total number of sentences.
Table 2 is the comparison list between difier-
ent deflnitions of SC and their Minimal Error
Rate of Word-Seg. By this table we select B¡A
as the last SC because it gets the minimal Min-
imal Error Rate within the difierent deflnitions
of SC.
SC is added into Equation 2 to describe the
individual information of each sentence inten-
sively. Equation 4 shows the new score method
of a path.
Sn = w1sc1s1 +w2sc2s2 +¢¢¢+wnscnsn (4)
sci denotes the SC of a component in the ith
layer.
3 Experimental Results
3.1 Score of Components
We build a practical system CUP (Chinese
Understanding Platform) based on MSM with
three layers { Word-Seg, POS Tagging and
Parsing. Each component not only provides the
n-best analysis result, but also the score of each
result.
In the Word-Seg component, we use the uni-
gram model (Liu et al., 1998) to value difierent
results of Word-Seg. So the score of a result is:
ScoreWord¡Seg = ¡logP(W) = ¡XlogP(wi)
wi denotes the ith word in the Word-Seg re-
sult of a sentence.
In the POS Tagging component the classical
Markov Model (Manning and Sch˜utze, 1999) is
used to select the n-best POS results of each
Word-Seg result. So the score of a result is:
ScorePOS =¡logP(POSjW)
=¡log P(WjPOS)P(POS)P(W)
=¡XlogP(wijti)¡XlogP(tijti¡1)
+logP(W)
ti denotes the POS of the ith word in a Word-
Seg result of a sentence.
In the Parsing component, we use a Chinese
Dependency Parser System developed by HIT-
IRLab1. The score of a result is:
ScoreParsing =¡logP(DPjW;POS)
=¡log P(W;POS;DP)P(W;POS)
=¡XlogP(lij)
+logP(W;POS)
lij denotes a link between the ith and jth
word in a Word-Seg and POS Tagging result
of a sentence.
Table 1 gives the one and flve-best results of
each component with a correct input. The test
data comes from Beijing Univ. and Fujitsu Chi-
nese corpus (Huiming et al., 2000). The F-Score
is used to value the performance of the Word-
Seg, Precision to POS Tagging and the correct
rate of links to Parsing.
Table 1: The flve-best results of each compo-
nent
1-best 5-best
Word-Seg 87.83% 94.45%
POS Tag 85.34% 93.28%
Parsing 80.25% 82.13%
3.2 Self Confldence Selection
InordertoselectabetterSC,wetestallkindsof
deflnition form to calculate their Minimal Error
Rate. For example B¡A, BA and so on. A and B
denote the flrst and the second score of a com-
ponent respectively. Table 2 shows the relation-
ship between deflnition forms of SC and their
Minimal Error Rate. Here, we experimented
with the flrst and the second Word-Seg results
of more than 7100 Chinese sentences.
3.3 F-Score of Word-Seg
The result of Word-Seg is used to test our
system’s performance, which means that the
ObjFun(⁄) returns the F-Score of Word-Seg.
There are 1,500 sentences as training data
and 500 sentences as test data. Among these
data about 10% sentences have ambiguities and
the others come from Beijing Univ. and Fujitsu
1The Parser has not been published still.
Chinese corpus (Huiming et al., 2000). In CUP
the flve-best results of each component are se-
lected. Table 3 lists the F-Score of Word-Seg.
They use Only Word-Seg (OWS), Whole Layer
Weight (WLW), SC (SC) and FeedBack mecha-
nism (FB) separately. Using the feedback mech-
anism means that the last analysis result of a
sentence is decided by the Parsing. We select
the result which has the lowest score of Pars-
ing. Table 4 shows the weight distributions in
WLW and SC weighting methods.
3.4 The E–ciency of CUP
The e–ciency test of CUP was done with 7112
sentences with 20 Chinese characters averagely.
It costs 58.97 seconds on a PC with PIV 2.0
CPU and 512M memory. The average cost of a
sentence is 0.0083 second.
4 Discussions
According to Table 1, we can see that the per-
formance of each component improved with the
increasing of the number of results. But at the
same time, the processing time must increase.
So we should balance the e–ciency and efiec-
tiveness with an appropriate number of results.
Thus, it’s more possible for CUP to flnd out the
correct analysis than the original cascade mech-
anism if we can invent an appropriate method.
We deflne SC as B ¡A which gets the mini-
mal Minimal Error Rate with the analysis of the
Table 2: SC and Minimal Error Rate
Deflnition Form of SC Minimal
Error Rate
1
A ¡
1
B 23.85%B ¡A 21.07%
B
A 23.98%B
A ¡
A
B 23.98%B¡A
length of a sentence 24.12%B¡A
length of a sentence+100 23.71%
Table 3: F-Score of Word-Seg
OWS WLW SC FB
F-Score 86.99% 85.80% 88.13% 80.72%
Table 4: Layer Weight
1-layer 2-layer 3-layer
In WLW 0.84 0.12 0.04
In SC 0.44 0.40 0.16
Table 2. Take the case of Word Segmentation:
B ¡A = X
i
logP(wAi )¡X
j
logP(wBj )
It’s just the difierence between logarithms of
difierent word results’ probability of the flrst
and the second result of Word Segmentation.
Table 3 shows that MSM using SC gets a bet-
ter performance than other methods. For a Chi-
nese sentence \0/b"+� b". (There
are some drinks under the table). The CUP
gets the correct analysis { \0/n//ndb/v
"/u+/m/q�/n b/w". But the cascade
and feedback mechanism’s result is \0/n/
b/v"/u+/m/q�/n b/w".
The cascade mechanism uses the Only Word-
Seg result. In this method P(/b) is more
than P(/) ⁄ P(b). At the same time, the
wrong analysis is a grammatical sentence and
is accepted by Parsing. These create that these
two mechanisms cannot get the correct result.
But the MSM synthesizes all the information of
Word-Seg, POS Tagging and Parsing. Finally
it gets the correct analysis result.
Now, CUP integrates three layers and its e–-
ciency is high enough for practical applications.
5 Conclusions and Future Work
A new Chinese NLU architecture based on Mul-
tilayer Search Mechanism (MSM) integrates al-
most all of NLU components into a uniform
multilayer treelike platform and quantifles these
components to use the search algorithm to flnd
out the optimal result. Thus any component
can be added into MSM conveniently. They
only need to accept an input and give several
outputs with scores. By experiments we can see
that a practical system { CUP based on MSM
improves the performance of Word-Seg to a cer-
tain extent. And its e–ciency is high enough for
most practical applications.
The cascade and the feedback mechanism are
JUST the special cases of MSM. If greedy algo-
rithm is used at each layer to expand the result
with the lowest score, MSM becomes the cas-
cade mechanism. If the weight of each layer
except the feedback point is set 0, the MSM be-
comes the feedback mechanism.
In the future we are going to add the Phrase
Analysis, WSD (Word Sense Disambiguation)
and Semantic Analysis components into CUP,
because it is impossible to analyze some sen-
tences correctly without semantic understand-
ing and the Phrase Analysis helps to en-
hance the performance of Parsing. At last,
CUP becomes a whole Chinese NLU platform
with Word-Seg, POS Tagging, Phrase Analy-
sis, Parsing, WSD and Semantic Analysis, six
components from lower layers to higher layers.
Under the framework of MSM, it becomes very
easy to add these components.
With the increasing of layers the handle speed
must decrease. So some heuristic search algo-
rithms will be used to improve the speed of
searching while enhancing the speed of each
component. Under the MSM framework, we can
do these easily.
The performance of each component should
be improved in the future. At least, it is impos-
sible for MSM to flnd out the correct analysis
result if there is a component which cannot give
a correct result within n-best results with a cor-
rect input. In addition, we are going to evalu-
ate the performance of each component not just
Word-Seg only.
6 Acknowledgements
We thank Liqi Gao and Zhuoran Wang provide
the Word Segmentation tool, Wei He provide
the POS Tagging tool and Jinshan Ma provide
the Parser tool for us. We acknowledge Dekang
Lin for his valuable comments on the earlier ver-
sions of this paper. This work was supported by
NSFC 60203020.

References
Shan Gao, Yan Zhang, Bo Xu, ChengQing
Zong, ZhaoBing Han, and RangShen Zhang.
2001. The research on integrated chinese
words segmentation and labeling based on tri-
gram statistic model. In Proceedings of IJCL-
2001, Tai Yuan, Shan Xi, China.
Duan Huiming, Song Jing, Xu Guowei,
Hu Guoxin, and Yu Shiwen. 2000. The de-
velopment of a large-scale tagged chinese cor-
pus and its applications. Applied Linguistics,
(2):72{77.
Ting Liu, Yan Wu, and Kaizhu Wang. 1998.
The problem and algorithm of maximal prob-
ability word segmentation. Journal of Harbin
Institute of Technology, 30(6):37{41.
Christopher D. Manning and Hinrich Sch˜utze.
1999. Foundations of Statistical Natural Lan-
guage Processing. The MIT Press, Cam-
bridge, Massachusetts.
Melanie Mitchell. 1996. An Introduction to
Genetic Algorithms. The MIT Press, Cam-
bridge, Massachusetts.
Masaaki Nagata. 1994. A stochastic japanese
morphological analyzer using a forward-dp
backward-A* n-best search algorithm. In
Proceedings of the 15th International Con-
ference on Computational Linguistics, pages
201{207.
Stuart Russell and Peter Norvig. 1995. Artifl-
cial Intelligence: A Modern Approach. Pren-
tice Hall Series in Artiflcial Intelligence, En-
glewood Clifis, NJ, USA.
Matthew Wall. 1996. GAlib: A C++ Li-
brary of Genetic Algorithms components.
http://lancet.mit.edu/ga/.
Andi Wu and Zixin Jiang. 1998. Word segmen-
tation in sentence analysis. In Proceedings
of the 1998 International Conference on Chi-
nese Information Processing, pages 169{180,
Beijing, China.
