Syntactic Ambiguity Resolution Using A Discrimination 
and Robustness Oriented Adaptive Learning Algorithm 
Tung-Hui Chiang, Yi-Chung Lin and Keh-Yih Su 
Department of Electrical Engineering 
National Tsing Hua University 
Hsinchu, Taiwan 300, R.O.C. 
E-Mail: thchiang@ee.nthu.edu.tw
Topic Area: Computational Methods (Statistical), Application (NLP) 
Abstract 
In this paper, a discrimination and robustness oriented adaptive learning procedure is proposed to deal with the task of syntactic ambiguity resolution. Owing to insufficient training data and the approximation error introduced by the language model, traditional statistical approaches, which resolve ambiguities indirectly and implicitly through maximum likelihood estimation, fail to achieve high performance in real applications. The proposed method remedies these problems by adjusting the parameters to maximize the accuracy rate directly. To make the proposed algorithm robust, the possible variations between the training corpus and the real tasks are also taken into consideration by enlarging the separation margin between the correct candidate and its competing members. Significant improvement has been observed in the test. The accuracy rate of syntactic disambiguation is raised from 46.0% to 60.62% by using this novel approach.
1. Introduction 
Ambiguity resolution has long been a focus of natural language processing. Many rule-based approaches have been proposed in the past. However, when applied to large-scale applications, they usually fail to offer satisfactory performance. A huge amount of fine-grained knowledge is required to solve the ambiguity problem, and it is quite difficult for humans to acquire such knowledge and keep it consistent in a rule-based approach \[Su 90a\].
Probabilistic approaches attack these problems by providing a more objective measure of the preference for a given interpretation. These approaches acquire the huge amount of fine-grained knowledge (the parameters, in statistical terms) from the corpus automatically. The uncertainty in linguistic phenomena is resolved on a more solid basis if a probabilistic approach is adopted. Moreover, the knowledge acquired by the statistical method is always consistent, because it is acquired by jointly considering all the data in the corpus at the same time. Hence, the time for knowledge acquisition and the cost of maintaining consistency are significantly reduced by adopting probabilistic approaches.
To resolve the problems resulting from syntactic ambiguities, a unified statistical approach to ambiguity resolution has been proposed by Su \[Su 88, 92b\]. In that approach, all knowledge sources, including lexical, syntactic and semantic knowledge, are encoded by a unified probabilistic score function with a uniform formulation. This uniform probabilistic score function has been successfully applied in spoken language processing \[Su 90b, 91b, 92a\] and machine translation systems \[Chen 91\] to integrate different knowledge sources for ambiguity resolution.
In implementing this unified probabilistic score function, the values of the score functions are estimated from the data in the training corpus. However, because the training data are insufficient and the model knowledge is incomplete, the statistical variations between the training corpus and the real application are usually not covered by this approach. Therefore, the performance on the test set is sometimes poor in real applications.
To enhance the discrimination capability and robustness of the proposed score function, a discrimination-oriented adaptive learning procedure is proposed in this paper. The robustness of this adaptive learning procedure is then enhanced by enlarging the margin between the correct candidate and its confusing candidates, so as to achieve maximum separation between different candidates.
Since the implementation of this adaptive learning procedure is based on the uniform probabilistic score function, we will first briefly review the unified probabilistic score function. Readers interested in the details of the uniform probabilistic score function are referred to \[Chen 91, Su 91b, 92a, 92b\].
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 352 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
2. Overview of Uniform Probabilistic 
Score Function 
2.1. General Definition
A Score Function for a given syntactic tree, say $Syn_j$, is defined as follows:
$$Score(Syn_j) \equiv P(Syn_j, Lex_j \mid w_1^n) \qquad (1)$$
where $w_1^n$ is the input word sequence, $w_1^n = \{w_1, w_2, \ldots, w_n\}$, and $Lex_j$ is the corresponding lexical string, i.e., the part-of-speech sequence $\{c_{j1}, c_{j2}, \ldots, c_{jn}\}$. By applying the multiplication theorem of probability, $P(Syn_j, Lex_j \mid w_1^n)$ can be restated as follows.
$$P(Syn_j, Lex_j \mid w_1^n) = P(Syn_j \mid Lex_j, w_1^n) \times P(Lex_j \mid w_1^n) = S_{syn}(Syn_j) \times S_{lex}(Lex_j) \qquad (2)$$
The two components, $S_{syn}(Syn_j)$ and $S_{lex}(Lex_j)$, in the above formula are called the Syntactic Score Function and the Lexical Score Function, respectively. The original score function, i.e., $P(Syn_j, Lex_j \mid w_1^n)$, is then called the Integrated Score Function.
Next, we assume that the information in the word sequence $w_1^n$ required for syntactic ambiguity resolution has percolated to the lexical interpretation $Lex_j$, so that little additional information can be provided by $w_1^n$ for disambiguating the syntactic interpretation $Syn_j$ once the lexical interpretation $Lex_j$ is given. Thus, the syntactic score can be approximated as shown in Eq.(3):
$$S_{syn}(Syn_j) = P(Syn_j \mid Lex_j, w_1^n) \approx P(Syn_j \mid Lex_j). \qquad (3)$$
The integrated score function $P(Syn_j, Lex_j \mid w_1^n)$ is then approximated as follows.
$$P(Syn_j, Lex_j \mid w_1^n) \approx P(Syn_j \mid Lex_j) \times P(Lex_j \mid w_1^n). \qquad (4)$$
Such a formulation allows us to use both lexical and syntactic knowledge in assigning a preference measure to a syntactic tree. In the real computation, the log operation is used to convert multiplications into additions. The following equation shows the final form used in the real application.
$$\log P(Syn_j, Lex_j \mid w_1^n) = \log S_{syn}(Syn_j) + \log S_{lex}(Lex_j). \qquad (5)$$
2.2. Lexical Score Function
Let $c_{k_1}^{k_n}$ denote the k-th sequence of lexical categories (parts of speech) corresponding to the word sequence $w_1^n$. The Lexical Score Function can be expressed as follows \[Chen 91, Su 92b\]:
$$S_{lex}(Lex_k) = P(Lex_k \mid w_1^n) = \prod_{i=1}^{n} P(c_{k_i} \mid c_{k_1}^{k_{i-1}}, w_1^n), \qquad (6)$$
where $c_{k_i}$ is the lexical category of $w_i$. Several forms \[Gars 87, Chur 88, Su 92b\] for $P(c_{k_i} \mid c_{k_1}^{k_{i-1}}, w_1^n)$ were proposed to simplify the computation. For example, \[Chur 88\] approximated $P(c_{k_i} \mid c_{k_1}^{k_{i-1}}, w_1^n)$ by $P(c_{k_i} \mid c_{k_{i-1}}) \times P(c_{k_i} \mid w_i)$. A general nonlinear smoothing form \[Chen 91\] described in Eq.(7) is adopted in this paper:
$$g(P(c_{k_i} \mid c_{k_1}^{k_{i-1}}, w_1^n)) \approx \lambda\, g(P(c_{k_i} \mid w_i)) + (1-\lambda)\, g(P(c_{k_i} \mid c_{k_1}^{k_{i-1}})), \qquad (7)$$
where $\lambda$ is the lexical weight ($\lambda = 0.6$ is used in the current setup), and $g$ is a transform function ($\log(\cdot)$ is used in this paper). Hence, given both Eq.(6) and Eq.(7), the following formula is derived:
$$\log S_{lex}(Lex_k) = \sum_{i=1}^{n} \left\{ \lambda \log P(c_{k_i} \mid w_i) + (1-\lambda) \log P(c_{k_i} \mid c_{k_1}^{k_{i-1}}) \right\} \qquad (8)$$
Note that the above generalized form reduces to the formulation of \[Chur 88\] when the transform function is the log function and $\lambda$ is 0.5.
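The smoothing of Eq.(8) can be illustrated with a minimal sketch. It assumes that the category history is truncated to the immediately preceding category and that a sentence-start marker `<s>` is available; the function name, the marker and the probability tables are hypothetical and serve only as illustration.

```python
import math

def log_lexical_score(words, cats, p_cat_given_word, p_cat_given_prev, lam=0.6):
    """Eq.(8) with g = log: for each word, add
    lam * log P(c_i | w_i) + (1 - lam) * log P(c_i | previous category)."""
    total = 0.0
    prev = "<s>"  # assumed sentence-start marker, not part of the paper
    for word, cat in zip(words, cats):
        total += lam * math.log10(p_cat_given_word[(cat, word)])
        total += (1.0 - lam) * math.log10(p_cat_given_prev[(cat, prev)])
        prev = cat
    return total

# Hypothetical probabilities for a two-word input, for illustration only.
p_cw = {("pron", "I"): 0.6, ("vt", "saw"): 0.7}
p_cc = {("pron", "<s>"): 0.5, ("vt", "pron"): 0.4}

score = log_lexical_score(["I", "saw"], ["pron", "vt"], p_cw, p_cc)
word_only = log_lexical_score(["I", "saw"], ["pron", "vt"], p_cw, p_cc, lam=1.0)
```

With `lam=1.0` only the word-to-category term contributes, while `lam=0.5` weights both terms equally.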
2.3. Syntactic Score Function 
To show the computing mechanism for the syntactic score, we take the syntax tree in Fig.1 as an example. The syntax tree is decomposed into a number of phrase levels. Each phrase level (also called a sentential form) consists of a set of symbols (terminal or nonterminal) which can derive all the terminal symbols in a sentence. Let label $t_i$ in Fig.1 be the time index for each state transition of an LR parser, and $L_i$ be the i-th phrase level. Thus, a transition from phrase level $L_i$ to phrase level $L_{i+1}$ is equivalent to a reduce action at time $t_i$.
Figure 1. The decomposition of a syntax tree into phrase levels.
The syntactic score of the syntax tree in Fig.1 is then defined as
$$S_{syn}(Syn_A) \equiv P(L_8, L_7, \ldots, L_2 \mid L_1) = \prod_{i=2}^{8} P(L_i \mid L_1^{i-1}) \approx \prod_{i=2}^{8} P(L_i \mid L_{i-1}) \qquad (9)$$
where $Syn_A$ is the parse tree, and $L_1$ through $L_8$ represent different phrase levels. Note that the product terms in the last formula correspond to the rightmost derivation sequence in a general LR parser \[Su 91c\], with left and right contexts taken into account. Therefore, such a formulation is especially useful for a generalized LR parsing algorithm, in which context-sensitive processing power is desirable.
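The last form of Eq.(9) amounts to summing log transition probabilities between adjacent phrase levels. The following minimal sketch shows that computation; the phrase levels and the transition probabilities are hypothetical (a simplified four-word example, not the tree of Fig.1), and each phrase level is represented as a tuple of symbols.

```python
import math

def log_syntactic_score(levels, p_transition):
    """Last form of Eq.(9): sum of log P(L_i | L_{i-1}) over adjacent levels."""
    return sum(
        math.log10(p_transition[(cur, prev)])
        for prev, cur in zip(levels, levels[1:])
    )

# Hypothetical phrase levels obtained by successive reductions.
LEVELS = [
    ("pron", "v", "art", "n"),
    ("NP", "v", "art", "n"),
    ("NP", "v", "NP"),
    ("NP", "VP"),
    ("S",),
]
# Hypothetical probabilities P(current level | previous level).
P_TRANSITION = {
    (LEVELS[1], LEVELS[0]): 0.5,
    (LEVELS[2], LEVELS[1]): 0.8,
    (LEVELS[3], LEVELS[2]): 0.9,
    (LEVELS[4], LEVELS[3]): 1.0,
}

syn_score = log_syntactic_score(LEVELS, P_TRANSITION)
```

Working in the log domain turns the product of Eq.(9) into a sum, which is also the form in which the scores are later adjusted.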
Although the context-sensitive model in the above equation provides the ability to deal with intra-level context-sensitivity, it fails to catch inter-level correlation. In addition, the formulation of Eq.(9) gives rise to a normalization problem for ambiguous syntax trees with different numbers of nodes. One way to relieve this problem is to compact multiple highly correlated phrase levels into one when evaluating the syntactic scores. The formulation is expressed as follows \[Su 91c\]:
$$S_{syn}(Syn_A) \approx P(L_8, L_7, L_6 \mid L_5) \times P(L_5 \mid L_4) \times P(L_4, L_3 \mid L_2) \times P(L_2 \mid L_1)$$
$$\approx P(L_8 \mid L_5) \times P(L_5 \mid L_4) \times P(L_4 \mid L_2) \times P(L_2 \mid L_1) \qquad (10)$$
Because the number of shifts, i.e., the number of terms in Eq.(10), is always the same for all ambiguous syntax trees, the normalization problem is resolved. Moreover, it provides a way to consider both intra-level context-sensitivity and inter-level correlation of the underlying context-free grammar. With such a score function, the capability of context-sensitive parsing (in the probabilistic sense) can be achieved with a context-free grammar.
3. Discrimination and Robustness Oriented
Adaptive Learning
3.1. Concepts of Adaptive Learning
The general idea of adaptive learning is to adjust the model parameters (in this paper, the lexical scores and syntactic scores) to achieve the desired criterion (in our case, minimizing the error rate). To explain clearly how adaptive learning works, we take the sentence "I saw a man." as an example. The lexical categories (i.e., parts of speech) and the corresponding log scores for each word are listed in Table 1.

| Word | Category (part of speech) | log P(c\|w) |
|------|---------------------------|-------------|
| I    | pron (pronoun)            | -0.22 |
| I    | n (noun)                  | -0.39 |
| saw  | vi (intransitive verb)    | -0.52 |
| saw  | vt (transitive verb)      | -0.16 |
| a    | art (article)             | -0.02 |
| a    | prep (preposition)        | -1.30 |
| man  | n (noun)                  | 0 |

Table 1 Categories for words and their log word-to-category scores.

In Table 1, the log word-to-category score, $\log P(c \mid w)$, for each word is estimated from the training corpus by calculating relative frequencies. For example, suppose that in the training corpus the word "I" is used as a pronoun 60 times and as a noun 40 times. Then the log word-to-category scores are calculated as follows.
$$\log_{10} P(pron \mid I) = \log_{10}\frac{60}{60+40} = -0.22, \qquad \log_{10} P(n \mid I) = \log_{10}\frac{40}{60+40} = -0.39. \qquad (11)$$
In this example, there are 2×2×2×1 = 8 possible ways to assign lexical categories to the input sentence. When these 8 possible lexical sequences are parsed, only four of them are accepted by our parser. They are listed as follows:
1. pron vt art n 
2. n vt art n 
3. pron vi prep n 
4. n vi prep n. 
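The relative-frequency estimate and the count of category assignments can be checked with a short sketch. The counts for "I" and the category lists come from the example above; the variable names are illustrative only. Note that $\log_{10} 0.4 \approx -0.398$, which the example reports as -0.39.

```python
import itertools
import math

# Counts for the word "I" as given in the example (60 pronoun, 40 noun).
count_I = {"pron": 60, "n": 40}
total = sum(count_I.values())
log_p_I = {cat: math.log10(n / total) for cat, n in count_I.items()}

# Candidate categories for each word, as listed in Table 1.
categories = {
    "I": ["pron", "n"],
    "saw": ["vi", "vt"],
    "a": ["art", "prep"],
    "man": ["n"],
}
sentence = ["I", "saw", "a", "man"]

# Every way of assigning one category to each word: 2 * 2 * 2 * 1 sequences.
sequences = list(itertools.product(*(categories[w] for w in sentence)))
```

Which of these sequences survive depends on the grammar; the sketch only enumerates them and does not model the parser.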
The syntactic scores of different parse trees are 
then calculated according to Eq.(10). A parse tree 
corresponding to the lexical sequence "\[pron vt art 
n\]" is drawn below as an example. 
(S (NP pron) (VP v (NP art n)))
The log syntactic scores for those four grammatical 
inputs are computed and listed in Table 2. 

| Input lexical sequence | Log syntactic score |
|------------------------|---------------------|
| \[pron vt art n\]   | (-0.7)+(-0.3)+(-0.3)+(-0.2) = -1.5 |
| \[n vt art n\]      | (-0.2)+(-0.3)+(-0.3)+(-0.2) = -1.0 |
| \[pron vi prep n\]  | (-0.7)+(-0.7)+(-0.4)+(-0.3) = -2.1 |
| \[n vi prep n\]     | (-0.2)+(-0.7)+(-0.4)+(-0.3) = -1.6 |

Table 2 Log syntactic scores of the grammatical input lexical sequences.

According to Eq.(5), the total log integrated score ($\log S_{lex} + \log S_{syn}$) for each parsed sentence hypothesis is calculated. For example, the log lexical score for "I/\[pron\] saw/\[vt\] a/\[art\] man/\[n\]" is (-0.22) + (-0.16) + (-0.02) + 0 = -0.40. Finally, the log integrated scores for the above grammatical inputs are listed as follows:
candidate 1. log integrated score = (-0.40) + (-1.5) = -1.90: I/\[pron\] saw/\[vt\] a/\[art\] man/\[n\]
candidate 2. log integrated score = (-0.57) + (-1.0) = -1.57: I/\[n\] saw/\[vt\] a/\[art\] man/\[n\]
candidate 3. log integrated score = (-2.04) + (-2.1) = -4.14: I/\[pron\] saw/\[vi\] a/\[prep\] man/\[n\]
candidate 4. log integrated score = (-2.21) + (-1.6) = -3.81: I/\[n\] saw/\[vi\] a/\[prep\] man/\[n\]
Among these four candidates, candidate 1 is regarded as the desired selection by linguists. However, our decision criterion selects the candidate with the highest integrated score, i.e., candidate 2, I/\[n\] saw/\[vt\] a/\[art\] man/\[n\], which results in a decision error in this case.
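The decision step can be reproduced with a small sketch that recomputes the integrated scores from the values in Table 1 and Table 2 and picks the highest one. The dictionary and function names are illustrative only.

```python
# log P(c|w) from Table 1.
LOG_LEX = {
    ("I", "pron"): -0.22, ("I", "n"): -0.39,
    ("saw", "vi"): -0.52, ("saw", "vt"): -0.16,
    ("a", "art"): -0.02, ("a", "prep"): -1.30,
    ("man", "n"): 0.0,
}
# Log syntactic scores from Table 2, keyed by lexical sequence.
LOG_SYN = {
    ("pron", "vt", "art", "n"): -1.5,
    ("n", "vt", "art", "n"): -1.0,
    ("pron", "vi", "prep", "n"): -2.1,
    ("n", "vi", "prep", "n"): -1.6,
}
WORDS = ("I", "saw", "a", "man")

def integrated_score(cats):
    """Eq.(5): log lexical score plus log syntactic score."""
    lex = sum(LOG_LEX[(w, c)] for w, c in zip(WORDS, cats))
    return lex + LOG_SYN[cats]

scores = {cats: round(integrated_score(cats), 2) for cats in LOG_SYN}
top = max(scores, key=scores.get)  # sequence selected by the decision rule
```

Here `top` is the noun reading of "I" (candidate 2), with a score of -1.57 against -1.90 for the desired candidate 1; candidate 3 evaluates to (-2.04) + (-2.1) = -4.14.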
To remedy this error, an adaptive learning procedure is adopted to adjust the score values, including lexical and syntactic scores, iteratively until the integrated score of the correct candidate (i.e., candidate 1) rises to the highest rank. In this paper, the parameters adjusted by the adaptive learning procedure are the log scores, including $\log P(c_{k_i} \mid w_i)$, $\log P(c_{k_i} \mid c_{k_{i-1}})$ and $\log P(L_i \mid L_{i-1})$. The amount of adjustment in each iteration depends on the misclassification distance, which is defined as the difference between the score of the correct candidate and that of the top candidate. (In the above example, distance = (score of correct candidate) - (score of top candidate) = (-1.90) - (-1.57) = -0.33.) From iteration to iteration, the parameters (both lexical and syntactic scores) are adjusted so that the integrated score of the correct candidate is increased, and the integrated score of the wrong candidate is decreased at the same time. The learning procedure for a sentence is stopped when the correct candidate of this sentence is selected. To make the explanation of this adaptive learning procedure clear, we assume that the lexical scores are unchanged during learning. That is, only the parameters of the syntactic scores are adjusted. The details of adaptive learning for adjusting syntactic scores are listed as follows:
Initialization
candidate 1 (Δ): syntactic score = \[-0.7 -0.3 -0.3 -0.2\] = -1.5, log integrated score = -1.90;
candidate 2 (*): syntactic score = \[-0.2 -0.3 -0.3 -0.2\] = -1.0, log integrated score = -1.57;
candidate 3: syntactic score = \[-0.7 -0.7 -0.4 -0.3\] = -2.1, log integrated score = -4.14;
candidate 4: syntactic score = \[-0.2 -0.7 -0.4 -0.3\] = -1.6, log integrated score = -3.81;
Iteration 1
candidate 1 (Δ): syntactic score = \[-0.5 -0.3 -0.3 -0.2\] = -1.3, log integrated score = -1.70;
candidate 2 (*): syntactic score = \[-0.3 -0.3 -0.3 -0.2\] = -1.1, log integrated score = -1.67;
candidate 3: syntactic score = \[-0.5 -0.7 -0.4 -0.3\] = -1.9, log integrated score = -3.94;
candidate 4: syntactic score = \[-0.3 -0.7 -0.4 -0.3\] = -1.7, log integrated score = -3.91;
Iteration 2
candidate 1 (*, Δ): syntactic score = \[-0.2 -0.3 -0.3 -0.2\] = -1.0, log integrated score = -1.40; (stop learning)
candidate 2: syntactic score = \[-0.6 -0.3 -0.3 -0.2\] = -1.4, log integrated score = -1.97;
candidate 3: syntactic score = \[-0.2 -0.7 -0.4 -0.3\] = -1.6, log integrated score = -3.64;
candidate 4: syntactic score = \[-0.6 -0.7 -0.4 -0.3\] = -2.0, log integrated score = -4.21;
(where * denotes the top candidate, and Δ denotes the desired candidate)
It is clear that after the second iteration, the parameters have been adjusted so that the desired candidate (i.e., candidate 1) is selected.
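The listing can be replayed with a short sketch that recomputes the integrated scores at each step from the fixed lexical scores and the listed syntactic terms. It relies on an observation from the listing rather than on a statement in the text: only the first syntactic term changes, and candidates beginning with the same category (pron for candidates 1 and 3, n for candidates 2 and 4) always carry the same value for it. All names are illustrative.

```python
# Fixed log lexical scores of candidates 1 to 4.
LEX = [-0.40, -0.57, -2.04, -2.21]
# Sum of the three syntactic terms that stay unchanged in the listing.
REST = [-0.8, -0.8, -1.4, -1.4]
# Category of the first word in each candidate.
INITIAL = ["pron", "n", "pron", "n"]
# Value of the first syntactic term at each step, read off the listing.
FIRST_TERM = {
    "initialization": {"pron": -0.7, "n": -0.2},
    "iteration 1": {"pron": -0.5, "n": -0.3},
    "iteration 2": {"pron": -0.2, "n": -0.6},
}

def integrated(step):
    """Log integrated scores of the four candidates at the given step."""
    first = FIRST_TERM[step]
    return [round(LEX[k] + first[INITIAL[k]] + REST[k], 2) for k in range(4)]

def top_candidate(step):
    """1-based number of the candidate with the highest integrated score."""
    scores = integrated(step)
    return scores.index(max(scores)) + 1
```

Calling `top_candidate` for the three steps gives candidate 2, candidate 2 and candidate 1, so the desired candidate becomes the top one only after the second iteration.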
3.2. Procedure of Discrimination Learning 
Since a correct decision depends only on the correct rank ordering of the integrated scores of all ambiguities, not on their actual values, a discrimination-oriented approach should directly pursue correct rank ordering. To derive the discrimination function, the probability scores mentioned above are first jointly considered. Then, a discrimination-oriented function, namely $g(\cdot)$, is defined as a measurement of the above-mentioned score functions, so that it preserves the correct rank ordering well \[Su 91a\]. Here, $g(\cdot)$ is chosen as the weighted sum of log
lexical and log syntactic scores, i.e.,
$$g(Syn_k) = w_{lex} \cdot \log S_{lex}(Lex_k) + w_{syn} \cdot \log S_{syn}(Syn_k) = \sum_{i=1}^{n} \left[ w_{lex}\, \lambda_{lex}(i) + w_{syn}\, \lambda_{syn}(i) \right] \qquad (12)$$
where $\lambda_{lex}(i) = \log P(c_{k_i} \mid c_{k_1}^{k_{i-1}}, w_1^n)$ and $\lambda_{syn}(i) = \log P(L_i \mid L_{i-1})$ stand for the log lexical score and the log syntactic score of the i-th word for the k-th syntactic ambiguity, respectively. In addition, $w_{lex}$ and $w_{syn}$ correspond to the weights of the lexical and syntactic scores, respectively.
If the parse tree of a sentence is misselected, the parameters (i.e., the lexical and the syntactic scores) are adjusted via the proposed adaptive learning procedure. Otherwise, no parameters are adjusted. When misselection occurs, the misclassification distance, $d_{sc}$, is less than zero. This misclassification distance is defined as the difference between the log integrated score of the correct candidate and that of the top one. A specific term of the syntactic score components of the correct candidate in the (t+1)-th iteration, say $\lambda_{syn}^{(t+1)}(j)$, is adjusted as follows:
$$\lambda_{syn}^{(t+1)}(j) = \lambda_{syn}^{(t)}(j) + \Delta\lambda_{syn}^{(t)}(j), \quad d_{sc} < 0. \qquad (13)$$
At the same time, the corresponding term of the syntactic score components of the top candidate is adjusted according to the following formula:
$$\lambda_{syn}^{(t+1)}(j) = \lambda_{syn}^{(t)}(j) - \Delta\lambda_{syn}^{(t)}(j), \quad d_{sc} < 0, \qquad (14)$$
where $\Delta\lambda_{syn}^{(t)}(j)$ is the amount of adjustment. This value is represented as
$$\Delta\lambda_{syn}^{(t)}(j) = \ldots \qquad (15)$$
(the right-hand side of Eq.(15) is not legible in this copy of the text)
where $d_0$ is a constant which stands for a window size, and $\epsilon$ is the learning constant for controlling the speed of convergence. The learning rule for adjusting the lexical scores can be represented in a similar manner. Notice that only the parameters of the top candidate and those of the correct candidate are adjusted when misselections occur; the parameters of the other wrong candidates are not adjusted in this adaptive learning procedure. From Eq.(13), (14) and (15), it is clear that the score of the correct candidate will increase and that of the wrong candidate will decrease from iteration to iteration until the correct candidate is selected. For brevity, the detailed derivations of the above adaptive learning procedure are not given here. Interested readers can contact the authors for details.
3.3. Robustness Issues 
Since it is easy to improve the performance on a training set by adopting a model with a large number of parameters, the error rate measured on the training set frequently turns out to be over-optimistic. Moreover, the parameters estimated from the training corpus may differ considerably from those obtained from real applications. These phenomena may occur due to factors such as finite sample size, style mismatch, or domain mismatch. To achieve better performance in the real application, one must deal with the possible mismatch of parameters, or statistical variation, between the training corpus and the real application. One way to achieve this goal is to enlarge the inter-class distance so as to achieve maximum separation \[Su 91a\] between the correct candidate and the other candidates. That is, this approach provides a tolerance zone between different candidates to allow for possible data scattering in the real application.
Traditional adaptive learning methods \[Amar 67, Kata 90\] stop adjusting parameters once the input pattern has been correctly classified. However, if we stop adjusting parameters as soon as the observations in the training corpus are correctly classified, the distance between the correct candidate and the other ambiguities may still be too small. The resulting system is thus vulnerable to modeling errors and to statistical variations between the training corpus and the real application. Su \[Su 91a\] has proposed a robust learning procedure which continues to enlarge the margin between the correct candidate and the top one, even if the syntax tree of the sentence has been correctly selected. That is, parameter adjustment stops only when the distance between the correct candidate and the others exceeds a given threshold. The learning rules in Eq.(13) and (14) are then modified as follows.
If $d_{sc} \le \delta$, where $\delta$ is a preset margin, the syntactic score in the (t+1)-th iteration for the correct candidate is adjusted according to the following formula:
$$\lambda_{syn}^{(t+1)}(j) = \begin{cases} \lambda_{syn}^{(t)}(j) + \Delta\lambda_{syn}^{(t)}(j), & d_{sc} \le \delta, \\ \lambda_{syn}^{(t)}(j), & \text{otherwise.} \end{cases} \qquad (16)$$
And the syntactic score of the top candidate is adjusted as follows:
$$\lambda_{syn}^{(t+1)}(j) = \begin{cases} \lambda_{syn}^{(t)}(j) - \Delta\lambda_{syn}^{(t)}(j), & d_{sc} \le \delta, \\ \lambda_{syn}^{(t)}(j), & \text{otherwise.} \end{cases} \qquad (17)$$
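The difference between the basic rule and the margin-based rule can be sketched on the running example "I saw a man." The sketch below is not the authors' implementation: a constant `step` stands in for the adjustment amount of Eq.(15), the competitor is taken to be the best-scoring candidate other than the correct one, and only parameters not shared by the two candidates are moved. The function name, the parameter keys and the margin value 0.5 are assumptions made for illustration.

```python
def learn(params, candidates, correct, margin=0.0, step=0.1, max_iter=100):
    """Raise the correct candidate's parameters and lower the competitor's
    until the correct candidate leads by more than `margin`.

    candidates maps a name to (fixed_score, list_of_parameter_keys).
    margin=0.0 mimics the basic rule; margin>0 mimics the robust rule.
    """
    params = dict(params)  # work on a copy

    def score(name):
        fixed, keys = candidates[name]
        return fixed + sum(params[k] for k in keys)

    for iteration in range(max_iter):
        rival = max((n for n in candidates if n != correct), key=score)
        distance = score(correct) - score(rival)
        if distance > margin:
            return params, iteration
        for key in candidates[correct][1]:
            if key not in candidates[rival][1]:
                params[key] += step  # constant step: an assumption
        for key in candidates[rival][1]:
            if key not in candidates[correct][1]:
                params[key] -= step
    return params, max_iter

# First syntactic term for pron-initial and n-initial candidates.
PARAMS = {"pron": -0.7, "n": -0.2}
# Fixed part = log lexical score + the three unchanged syntactic terms.
CANDIDATES = {
    1: (-1.20, ["pron"]),
    2: (-1.37, ["n"]),
    3: (-3.44, ["pron"]),
    4: (-3.61, ["n"]),
}

basic_params, basic_iters = learn(PARAMS, CANDIDATES, correct=1)
robust_params, robust_iters = learn(PARAMS, CANDIDATES, correct=1, margin=0.5)
```

Under these assumptions the basic rule stops as soon as candidate 1 overtakes candidate 2, whereas the margin-based rule keeps adjusting until the lead exceeds the preset margin, which is the behaviour described for Eq.(16) and Eq.(17).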
4. Simulations 
The following experiments are conducted to investigate the advantage of the proposed discrimination and robustness oriented adaptive learning procedure. In the experiments, 4,000 sentences extracted from IBM technical manuals are first associated with their corresponding correct category sequences and correct parse trees by linguists. The corpus is then partitioned into a training corpus of 3,200 sentences and a test set of 800 sentences. Next, the lexical and syntactic probabilities are estimated from the data in the training corpus. Afterwards, the sentences in the test set are used to evaluate the performance of the proposed algorithm using the estimated lexical and syntactic probabilities. This integrated score function approach using the estimated probabilities is considered the baseline system. The performance of discrimination oriented adaptive learning with and without robustness enhancement is then evaluated. The accuracy rates of syntactic ambiguity resolution for the training corpus and the test set are summarized in Table 3. (Note that the top candidate is selected from all possible parses allowed by the grammar of the system; therefore, the baseline performance is evaluated under a highly ambiguous environment.)

|                              | Training Corpus | Test Set |
|------------------------------|-----------------|----------|
| Baseline                     | 79.75 | 46.00 |
| + Basic version of learning  | 95.50 | 56.88 |
| + Robust version of learning | 96.03 | 60.62 |

Table 3 Accuracy rate (in %) of syntactic disambiguation

Table 3 shows that the syntax tree accuracy rate is improved from 46% to 56.88% using the basic version of the discrimination oriented adaptive learning procedure. This significant improvement shows the superiority of the adaptive learning procedure for dealing with the disambiguation task. Furthermore, when the robust version of the learning procedure is adopted, the performance is improved further (from 56.88% to 60.62%). This means that the robustness of the learning procedure is indeed enhanced by enlarging the distance between the correct candidate and other candidates. Moreover, adaptive learning improves not only the accuracy rate of syntax trees but also that of lexical sequences. In this paper, a lexical sequence is regarded as "correct" only if all the lexical categories in a sentence perfectly match those selected by linguists. In other words, we are measuring the "sentence accuracy rate" in contrast to the "word accuracy rate" adopted in \[Chur 88, Gars 87\]. Table 4 shows that the basic version of the adaptive learning procedure improves the sentence accuracy rate of lexical sequences by about 5 percentage points (from 77.12% to 82.38%). Again, with the robust version of learning, the accuracy rate of lexical sequences is greatly enhanced, to 87.88%.

|                              | Training Corpus | Test Set |
|------------------------------|-----------------|----------|
| Baseline                     | 91.41 | 77.12 |
| + Basic version of learning  | 98.91 | 82.38 |
| + Robust version of learning | 98.53 | 87.88 |

Table 4 Sentence accuracy rate (in %) of lexical sequences

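The reported test-set figures can be turned into absolute gains and relative error reductions with a few lines of arithmetic. The accuracy values are the ones reported in the text; the relative error reduction is an additional, derived quantity that the paper itself does not report, and the function names are illustrative.

```python
# Test-set accuracy rates (in %) as reported in the text.
TEST_ACCURACY = {
    "syntax tree": {"baseline": 46.00, "basic": 56.88, "robust": 60.62},
    "lexical sequence": {"baseline": 77.12, "basic": 82.38, "robust": 87.88},
}

def gain(task, version):
    """Absolute gain over the baseline, in percentage points."""
    row = TEST_ACCURACY[task]
    return round(row[version] - row["baseline"], 2)

def error_reduction(task, version):
    """Relative reduction of the baseline error rate, in percent."""
    row = TEST_ACCURACY[task]
    base_error = 100.0 - row["baseline"]
    return round(100.0 * (row[version] - row["baseline"]) / base_error, 1)
```

For syntax trees, `gain` gives 10.88 points for the basic version and 14.62 points for the robust version, which matches the "about 10%" and "over 14%" improvements quoted in the summary when these are read as percentage points.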
The behavior of each iteration of the adaptive learning process is shown in Figure 2. From this figure, we can conclude that if the robustness issues are not considered during learning, the performance on the test set decreases as the training process goes on. This is the phenomenon of over-tuning. However, by forcing the learning procedure to continue until the separation between the correct candidate and the top one exceeds the desired margin, the performance on the test set can be further improved, and no degradation is observed.
Figure 2. Syntax tree accuracy rate versus iterations for the basic and robust versions of adaptive learning.
5. Summary 
Because of insufficient training data and the approximation error introduced by the language model, traditional statistical approaches, which resolve ambiguities indirectly and implicitly through maximum likelihood estimation, fail to achieve high performance in real applications. To overcome these problems, adaptive learning is proposed to pursue the goal of minimizing the discrimination error directly. The performance of syntactic ambiguity resolution is significantly improved using the discrimination oriented analysis. In addition, the sentence accuracy rate of the lexical sequences is also improved. Moreover, the performance is further enhanced by using the robust version of the learning procedure, which enlarges the margin between the correct candidate and its competing candidates. The final results show that, using the basic version of learning, the syntax tree selection accuracy rate is improved by about 10 percentage points (from 46% to 56.88%), and the total improvement is over 14 percentage points using the robust version of learning. Also, the sentence accuracy rate for lexical sequences is improved from 77.12% to 82.38% and 87.88% using the basic and robust versions of the learning procedure, respectively.

References

\[Amar 67\] Amari S., "A theory of adaptive pattern classifiers," IEEE Trans. on Electronic Computers, Vol. EC-16, pp. 299-307, June 1967.

\[Chen 91\] Chen S.-C., J.-S. Chang, J.-N. Wang, and K.-Y. Su, "ArchTran: A Corpus-based Statistics-oriented English-Chinese Machine Translation System," Proc. of Machine Translation Summit III, Washington, D.C., U.S.A., July 1-3, 1991.

\[Chur 88\] Church K., "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text," ACL Proceedings of 2nd Conference on Applied Natural Language Processing, pp. 136-143, Austin, Texas, U.S.A., 9-12 Feb. 1988.

\[Gars 87\] Garside R., G. Leech, and G. Sampson, "The Computational Analysis of English: A Corpus-Based Approach," London: Longman, 1987.

\[Kata 90\] Katagiri S., and C.-H. Lee, "A Generalized Probabilistic Descent Method," Proc. Acoust. Soc. of Japan, 2-p-6, pp. 141-142, Nagoya, Sept. 1990.

\[Su 88\] Su K.-Y., and J.-S. Chang, "Semantic and Syntactic Aspects of Score Function," Proc. COLING-88, Vol. 2, pp. 642-644, 12th Int. Conf. on Comput. Linguistics, Budapest, Hungary, 22-27 Aug. 1988.

\[Su 90a\] Su K.-Y., and J.-S. Chang, "Some Key Issues in Designing MT Systems," Machine Translation, vol. 5, no. 4, pp. 265-300, 1990.

\[Su 90b\] Su K.-Y., T.-H. Chiang, and Y.-C. Lin, "A Unified Probabilistic Score Function for Integrating Speech and Language Information in Spoken Language Processing," Proceeding of 1990 International Conference on Spoken Language Processing, pp. 901-904, Kobe, Japan, 19-22 Nov. 1990.

\[Su 91a\] Su K.-Y., and C.-H. Lee, "Robustness and Discrimination Oriented Speech Recognition Using Weighted HMM and Subspace Projection Approaches," Proc. ICASSP-91, pp. 541-544, Toronto, Canada, 14-17 May 1991.

\[Su 91b\] Su K.-Y., T.-H. Chiang, and Y.-C. Lin, "A Robustness and Discrimination Oriented Score Function for Integrating Speech and Language Processing," Proceeding of the 2nd European Conference on Speech Communication and Technology, Genova, Italy, pp. 207-210, Sep. 24-26, 1991.

\[Su 91c\] Su K.-Y., J.-N. Wang, M.-H. Su, and J.-S. Chang, "GLR Parsing with Scoring," in Tomita, Masaru (ed.), Generalized LR Parsing, Chapter 7, pp. 93-112, Kluwer Academic Publishers, 1991.

\[Su 92a\] Su K.-Y., T.-H. Chiang, and Y.-C. Lin, "An Unified Framework to Incorporate Speech and Language Information in Spoken Language Processing," to appear in the Proceeding of IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP-92, San Francisco, California, U.S.A., March 23-26, 1992.

\[Su 92b\] Su K.-Y., J.-S. Chang, and Y.-C. Lin, "A Unified Approach to Disambiguation Using A Uniform Formulation of Probabilistic Score Function," in preparation.
