Hybrid Neuro and Rule-Based Part of Speech Taggers 
Qing Ma, Masaki Murata, Kiyotaka Uchimoto, Hitoshi Isahara 
Communic~tion s l{esea.rch Labora.tory 
Ministry of Posts a.nd Telecommm~ications 
588-2, lwa,oka,, Nishi-ku, Kobe 6511-2/192, 3a, pa,n 
{qma, murata., uchimoto, isa.hara}{))crl.go.jp 
Abstract 
A hybrid system R)r tagging part of speech is 
descril)ed that consists of a neuro tagger and 
a rule-based correcter. The neuro tagger is 
an initia.1--state a.nnotator tha.t uses difl'ertnt 
h_,,ngths of contexts based on longe, st context l)ri- 
ority. Its inputs a.re weighted 1)y information 
gains tha.t are obtained by information ma.xi- 
mization. The rule-1)ased correcter is construct- 
ed by a. sol; of trm~sfc)rma.tion rules to xna.ke Ul) 
for the shortcomings o\[' the nou17o tagger. Corn- 
puter experiments show that ahnost 20% of the 
errors ma.de by the neuro tagger a.re correct- 
ed by the, st trans\[orma.tion rules, so tha.t the 
hybrid system ca.n reach a.n a,tcura.cy of 95.5% 
counting only the ambiguous words and 99.1% 
counting all words when a. small Thai corpus 
with 22,311 a mbig;uous words is used t))v tra.in- 
ing. This a(;cu racy is far higher than that using 
an IIMM and is also higher tha.n that using a. 
rule-1)ased model. 
1 Introduction 
Many pa.rt of speech (POS) tatters proposed 
so far (e.g., Brill, 1994; Meria.ldo, 1994; l)aele- 
marls, el. al., 1996; and Schmid, 1994) ha.re 
achieved a. high accura.ey partly because a. very 
large amount of dal,~ was used to 1;rain them 
(e.g., on the order of 1,000,000 words for \]'hl- 
glish). For ma.ny other la.nguages (e.g., Thai, 
which we treat in this paper)~ however, it is not 
as easy to cremate \]a.rge corpora from which lm:ge 
amounts of tr~fining data can be extra.cted. It is 
therefore desirable to construct a practic;d tag- 
ger tha.t needs as little training d a.t;a~ as possible. 
A multi-neuro tagger (Ma a.nd ls~hara, 11998) 
and its slimmed-down version called the ela.s- 
tic neuro tagger (Ma, el; al., 1999), which have 
high genera.lizing ability and therefore are good 
at dealing with the problems of data sp~u:se- 
hess, were proposed to satist~y this requh:ement. 
These taggers perform POS tagging using difl'er- 
ent lengths of corltexts I)~.~sed on longest context 
prk)rity, and each element of tile input is weight- 
ed with information gains (Quinla.n, 1993) for 
retlecting that tile elements of the input h~ve 
different rtlevances in t~Gging. They ha.d a tag- 
ging accuracy of 94.4% (counting only the am- 
biguous words in part of speech) in computer ex- 
periments when a. small 'l'ha.i corpus with 22,311 
am biguous words was u se(l for tr~fi n ing. This ~(:- 
curacy is bu" higher thml t\]lat; USillg tile hidden 
Marker model (IIMM), the main approach to 
\])art o\[ speech tagging, ~nd is ~dso higher t,\]lan 
tha.t using a. rule-based mode\]. 
Neuro taggers, however, htwe several crucial 
shortcomings. First, even in the case where the 
POS of a word is uniquely determined by the 
word on its left, for example, a neural net will 
also try to perlbrnl tagging based on tile com- 
plete context. As a result, even for" when the 
word on tile left; is the same, the tagging result~ 
s will be difl'erent if the complete contexts are 
different, rl'ha, t is~ the neuro tagger carl hard- 
ly acquire the rules with single inputs. Fur- 
thermore, although lexica.l in\[brma.tion is very 
ilnport~ult in t~gging, it is difficult for: neural 
nets to use it becmme doing so would make the 
network enorlnous. That is, the neuro tagger 
ca.nnot acquire (;lit rules with lexical informs> 
tion. Additionally, Imca.use of convergence and 
509 
over-training l)roblems, it is impossible and also 
not advisM)le to train neural nets to an a.ccura,- 
cy of 100%. The training should be stopped at 
an appropriate level of a.ceuracy. Consequently, 
neural nets may not acquire some usefnl rules. 
To make up for these shortcomings of the 
neuro tagger, we introduce in this pa.per a rule- 
based corrector as tile post-processor and con- 
struct a hyl)rid system. The rule-based cot- 
rector is constructed by a set of transforma- 
tion rules, which is acqnired by transforma£ion- 
based error-driven learning (Brill, 1.994:) from 
training corpus using a set of templates. The 
templates are designed to SUl)l)ly the rules that 
the neuro tagger can hardly acquire. Actual- 
ly, by examining the transformation rules ac- 
quired in the computer experiments, the 99.9% 
of them are exactly those that the neuro tagger 
can hardly acquire~ even when using a template 
set including those for generating the'rules that 
the neuro tagger can easily acquire. This rein- 
forces onr expectation that the rule-based ap- 
proach is a well-suited method to cope with the 
shortcomings of the neuro t~gger. Computer ex- 
periments shows thai; about 200/0 of errors made 
by the neuro tagger can be corrected by using 
these rules and that the hybrid system ca.n reach 
an accuracy of 95.5% counting only the aml)ign- 
ous words and 99.1% counting all words in l, he 
testing corpus, when tile same corpus described 
above is used for training. 
2 POS Tagging Problems 
In this paper, suppose there is a lexicon V, 
where the POSs that can be served by each word 
are list.ed, and tiler(; is a set of POSs, l?. That is, 
unknown words that do not exist in the lexicon 
are not dealt with. The POS tagging problem 
is thus to find a string of POSs T = T172..-% 
(ri C F, i = 1,-..,s) by following procedure 
~o when sentence W = wlw2...w.~ (wi C V, 
i = 1,.-.,s) is given. 
#:W ~ -+ rt, (1) 
where t is the index of the target word (the word 
to be tagged), and W t is a word sequence with 
length l + 1 + r (:entered on the target word: 
lie t : wt_ 1 .... io t • . . Wt+r~ (2) 
where t - 1 > 1, t + r _< s. 'l'agging ca.n thus be 
regarded as a classification problem 1) 3, replacing 
the POS with (;lass and can therefore be handled 
by using neural nets. 
3 Hybrid System 
Our hybrid system (Fig. J) consists of a neuro 
tagger, which is used as an initial-state an nota- 
i;or, ~nd a rule-based corrector, which corrects 
the outputs of the neuro tagger. When a word 
seque,~ce W t \[see l~q. (2)\] is given, the neuro 
tagger outl)ut a tagging result rN(Wt) for tile 
target word wt at first. The rule-based correc- 
tor then corrects the output of the neuro tagger 
as a fine tuner and gives tile final tagging result 
Ncuro Tagger Rule-Based 
Corrcctor 
Figure 1: Hybrid neuro and rule-based tagger. 
3.1 Neuro tagger 
As shown in Fig. 2, the neuro tagger consists 
of a three-layer I)erceptron with elastic input. 
This section mainly descril)es the construction 
of inl)ut and output of the nenro tagger, and 
the elasticity by which it; becomes possible to 
use variable length of context for tagging. For 
details of the architecture of l)erceptron see e.g., 
Haykin, 1994 and for details of the features of 
the neuro tagger see Ma and isahara, 1998 and 
Ma, et aJ., 1999. 
lnl)ut IPT is constructed fi'om word se- 
quence W t \[Eq. (2)\], which is centered on target 
word wt and has length l + 1 + r: 
IPT = (iptt_l,..', iph,..., iph+.,.), (3) 
provided that input length l+ J+r has elasticity, 
as described a,t tile end of this section. When 
woM wis given in position x (x = t-l,...,t+r), 
510 
OPT 
iptt_ I ...... il)\[t_ I ipl t 
11"1" 
il)lt+l ...... i\]Ht+r 
Figure 2: Neuro tagger. 
element ipt, of input IPT is a. weighted pattern, 
defined as 
ipt.,: = :/.,,. (e,,,,('-,,'e,." ,q,,~), (4) 
where g;,; is the inIbrmation gain which can be 
obtained using information theory (for details 
see Ma and lsahara., 11998) and 7 is the number 
of tyl)es of POSs. l\[' w is a word that apl)ears 
in the tra.ining data, then ea.ch bit e,,,i can be 
obtained: 
= P,,ot,(/I,,,), (s) 
where l>'rob(ril'w) is a prior l>robal)ility of r i 
tha.t (;he woM 'w can take,. It is estitnated from 
the t raining (la,ta: 
C _ (<, "') 
c'(,4 ' 
where C'(r (,w) is the number of lfimes both r: 
at++d w al)pea, r , a,nd C(w) is the number oi' times 
w appears in the training data. 1\[ w is a word 
that does not at)pear ill (,he training data~ then 
each t)it c,,,,i is obtained: 
~ i\['r i is a. candidate (7) 
e.,,,," = (} otherwise, 
where 7,, is the number of P()Ss that the word 
'w Call ta.ke. Output OPT is defined as 
OFT= ,o+), (s) 
provi<led that the output OI)T is decoded as 
S r/ ifOi=l &O/=0forj¢i YN(Wt) 
Unknown otherwise, 
(.9) 
where rN(W,) is the £a.gging result obtained by 
the neuro tagger. 
There is more inforlnation available for con- 
structing the input for words on the left, be- 
cause they have already been tagged. In the 
tagging phase, instead of using (4:)-(6), the in- 
put can be constructed simply as 
i/,,_4 = . oPT(-i), (1()) 
where i = 1,...,1, and O I)T(-i) means the out- 
put of the tagger for the ith word before the 
target word. ltowever, in the training process, 
the out;put of the tagger is not alway.a correct 
a.nd cannot be ted back to the inputs directly. 
Instead, a weighted awerage of the actual output 
a.nd tlm desired output is used: 
iptt_i = 9t-i ' (wol,T " 0 PT (- i) + WlOJ,:s " I) liS), 
(1.1) 
where 1)l':,q' is the desired output, 
o:,:,5' : (&,/)2,..., 
whose bits are defined as 
\] iI' r i is a desired answer 
I)i = 0 otherwise, (la) 
and WOl,'r and w/)l,\],q' are respecLh:ely de\[(ned as 
1'\]013J 
~,:o1'~ .... (.14) 
1JACT' 
a,nd 
'w>l,;,s, = :1 - wopT, (15) 
where \]@,uo and \]'JAC'T are the objective and 
actual errors. Thus, at the beginning of train- 
ing, the weighting of the desired output is largo. 
It decreases to zero during training. 
Elastic inputs are used in the neuro tagger 
so that the length of COlltext is variable in tag- 
ging based on longest context priority. In (te~ 
tail, (l, r) is initially set as large as possible for 
tagging. If rN('Wi) = Unknown, then (1, r) is 
reduced by some constant interval. This l)ro- 
cess is repeated until rN(W~) 7 k Unknown or 
(1, r) = (0,0). On the other hand, to nmke the 
same set of connection weights of the neu ro tag- 
ger with the largest (1,'r) ava.ilable a.s lnuch as 
511 
possible when using short inputs for tagging, in 
training phase the neuro tagger is regarded as a 
neural network that has gradually grown fi'om 
small one. The training is therefore performed 
step by step from small networks to large ones 
(for details see Ma, et al. 1999). 
3.2 Rule-based eorreetor 
Even when the POS of a word can be deter- 
mined with certainty by only the word on the 
left, for example, tile neuro tagger still tries to 
tag based on the complete context. That is, 
in general, what tile neuro tagger can easily 
acquire by learning is the rules whose condi- 
tional parts are constructed by all inpttts iptx 
(x = t - l,...,t + r) that are .joined with all 
AND logical operator, i.e., (iptt-t & "'" iptt & 
• .. iptt+~, -+ OPT). In other words, it is (lit: 
ticult for tile neuro tagger to learn rules whose 
conditional parts are constructed by only a sin- 
gle input like (ipt,. --+ OPT) ~). Also, although 
lexical information is very important in tagging, 
it is difficult for the neuro tagger to use it, be- 
cause doing so would make the network euof 
mous. Tha.t is, the neuro tagger cannot acquire 
rules whose conditional parts consist of lexical 
information like (w -4 OPT), (w&r -4 OPT), 
and (w~w2 --+ OPT), where w, Wl, and w2 are 
words and 7- is tile POS. Furthermore, because 
of convergence and over-training 1)rol)lems, it is 
iml)ossible and also not advisable to train net> 
ral nets to all accuracy of 100%. The training 
should be stopped at an apl)rol)riate level of a.c- 
curacy. Thus, neural net may not acquire some 
useful rules. 
The transfbrmation rule-based corrector 
makes up for these crucial shortcomings. 
The rules are acquired Dora a training col 
pus using a set of transformation templates 
by transformation-based error-driven learning 
(Brill, 1994). Tile templates are constructed 
using only those that supply the rules that tile 
nenro tagger can hardly acquire, i.e., are those 
1)The neuro tagger can also learn this kind of rules 
because it can tag tile word using only ipt, (the input 
of tile target word), ill the case of reducing tile (I, r) to 
(0,0), as described in Sec. a.l. The rules with single 
input described here, however, are a more general case, 
ill which the input call be ipt,~ (~: = t - 1,..., t + r). 
for acquiring the rules with single input, with 
lexical information, and with AND logica.1 in- 
put of POSs and lexical information. The set of 
templa,tes is shown in Table 112). 
According to the learning procedure shown 
in Table 2, an ordered list of transformation 
rules are acquired by applying the template set 
to a training corpus, which had ah'eady been 
tagged by the neuro tagger. After tile trans- 
formation rules are acquired, a corl)us is tagged 
as tbllows. It is first tagged by the neuro tag- 
ger. The tagged corpus is then corrected by 
using the ordered list of transformation rules. 
The correction is a repetitive process applying 
the rules ill order to the corptlS, which is then 
updated, until all rules have been applied. 
4 Experimental Results 
Data: For our computer experiments, we used 
tile same Thai corpus used by Ma et al. (1999). 
Its 10,d52 sentences were randomly divided in- 
to two sets: one with 8,322 sentences for trail> 
ing and the other with 2,1.30 sentences for test- 
int. The training set contained 12d,331 word- 
s, of which 22,311 were ambiguous; the testing 
set contained a4,5~14 words, of which 6,717 were 
ambiguous. For training tile n euro tagger, only 
the ambiguous wor(ls in the. training set were 
used. For training the HMM, all tilt words in 
the training set were used. In both cases, all the 
words in tile training set were used to estimate 
Prob(rilw), tim probability of "c i that wor(I w 
can be (for details on the HMM, see Ma, et al., 
1999). In the corpus, 4:7 types of POSs are de- 
fined (Charoenporn et al., 1997); i.e., 7 = 47. 
Neuro tagger: The neuro tagger was con- 
structed by a three-layer perceptron whose 
input-middle-outI)ut layers had p z, 2 7 units, 
respectively, where p = 7 × (1 + I + r). The 
(l + 1 + r) had tile following elasticity. In train- 
ing, tile (I, r) was increased step by step as (71,1) 
-+ (u,2) (a,2) (a,a) a,d gra,dual 
training fl'om a small to a large network was 
pertbrmed. Ill tagging, on the other hand, the 
2)To see whether this set is suitable, a immloer of ad- 
ditional experiments were conducted using various sets 
of templates. The details are described in Sec. 4. 
512 
r ~ I a,1)lc 1: Set o\[' templa.tes for tra.ns\[orln;~tion rules 
Change t;ag v a to t;ag v ° when: 
(single inlm|;) 
(input ('onsists of a POS) 
1. left (right) word is tagged v. 
2. second left (right) word is tagged r. 
3. third left (right) word is ta.gged r. 
(inI)n|; consist;s of a word) 
4. ta.rget word is ~. 
5. left (right) word is w. 
6. second left, (right) word is w. 
(AND logical inpu¢ ot" words) 
7. l, arget word is 'UO 1 ~tlld left (right) word is wu. 
8. left; (right) word is u,1 and second lcfl, (,'ight) word is w2. 
9. left, word is w~ a.nd right, word is 'wu. 
(AND logical in.lint; of POS and words) 
10. ta.rget word is uq and left (right) word is llaggod r. 
:11. left (righl;) word is .w~ and left. (right) word is tagged r. 
12. ta.rget word is w~, left (right) word is ,w.,, and left (right) word is tagged r. 
Ta,1)le 2: l)roetdure for learning transi'orma,tion rules 
1. Apply neuro taggtr to training corpus, which is then updated. 
2. Compare tagged results with desired ones and find errors. 
3. Ma.teh templates l'or all errors and obtain set of tra.nsformation rules. 
d. Select rule in corpus with the maximum value of' (cnl,_qood-h. cnl,_bad), where 
cnZ_qood: number that transforms incorrect tags to correct elliS: 
c'nl._bad: number that transforms correct tags to incorrect ones, 
h: weight to control the strict, hess of generating 1;he rule. 
5. Apply selecttd rule to training corpus, which is then updated. 
6. Append selected rule to ordered list o1" trausl'orma.tion rules. 
7. Ih'4)eal; steps :2 through (j until no such rule can I)e selected, i.e., c'n,t_good- 
h,. cnl,_bad < O. 
(l, 'r) was inversely reduce(l ste l) by step as (3,3) 
-+ (3,2) (2,2) (2,:1) O,:l) (:l,o) 
(0,0) a.s needed, provided tJlat the number of 
units in the middle layer was kept a.t the ma.xi- 
Ill lllll vahle. 
Rule-based correttor: The parameter h in 
the tw~Juat;ion function (cnl,_9ood - h, . c'M._bad) 
used in 1;he learning procedure (Table 2) is a 
weight to control the strictness of generating a. 
rule. IF It is large, the weight of cnt_bad is la.rge 
and the possibility of generating incorrect rules 
is reduced. By regarding the neuro tagger as ~d- 
ready having high accuracy and using tile rule-- 
based correcter as a fine tuner, weight h. was set 
to a. large vahm, 100. Applying |;lit templates 
Co the training corptm, which had already been 
tagged 1) 5, the neuro ta.gger, we obta.ined a.n or- 
dered list; of 520 transfbrmation rules. '.l'~d)le 3 
shows the first 15 transfbrmation rules. 
Results: Table 4 shows the resull;s of I)()S tag- 
ging for the testing data.. In addition to the 
accuracy o\[" the neuro tagger and hybrid sys- 
tem, the ta.ble also shows tile accuracy of a, bast- 
line model, the IIMM, and a rule-based model 
\['or comparison. The baseline model is one that 
performs tagging without using the contextual 
inlorma.tion; instead, it performs ta.gging using 
only f'requency informa.tion: the proba.bility of 
P()S that; tach word can be. The rule-based 
model, to be exact, is also a hybrid system con- 
513 
'l'a.1)le 3: First 15 transfbrmation rules 
No. From To Cond i t i on 
1 PREL 
2 PREL 
3 Unknown 
4 XVHI4 
5 VATT 
6 Unknown 
7 NCI4N 
8 VATT 
O PREL 
i0 VST~ 
ii VfiTT 
12 NCMN 
13 NCHN 
14 Unknown 
15 NCNN 
RPRE left word is punctuation and right tuord is 5~gu 
RPRE left yard is ~ 
RDVN left ~ord Ls tagged XVfiE 
XVBH left word is II~D 
flDVN left word is ~ 
VRTT left word is tagged PREL 
RPRE left word is ua 
VSTfi left word is ~q~ 
RPRE right word is ~gu and second right word is a~q;J 
RDVN target word is ~t~4 
ADVN target word is ~4~ 
RPRE target word is n14 and left word is eentluu 
RPRE left word is ~tt and left ward is tagged NCHN 
fiDVN third left word is tagged WCT 
CNIT taPget ~ord is nn~ 
where PREL: Relative Pronoun, RPRE: Preposition, fiDVN: fidverb with normal form .... 
Table d: Results of POS ta,gging for testing data* 
model baseline IIMM rule-based lleuro hybrid 
accuracy 0.836 0.891 0.935 0.944 0.955 
*Accurac9 was determiued only for am lfiguous words. 
sisting of an initial-state annotator and a set of 
transformation rules. As the initial-state anno- 
b~tor, however, the baseline model is used in- 
stea.d of' the neuro tagger. And, its rule set. has 
1,177 transformation rules acquired h'om a more 
general teml)late set, which is described at the 
end of this section. The reason for using a gener- 
al template set is that the sol; of tra.nsibrma.tion 
rules in the rule-based model should be the main 
annotator, not a fine post-processing tuner. For 
the same reason, the parameter to control the 
strictness of generating a rule, h, was set to a 
small value, \], so that a larger number of rules 
were generated. 
As shown in the table, the accuracy of the 
nenro tagger was far higher than that of the 
HMM and higher than that of the rule-based 
model. The accuracy of the rule-based mod- 
el, on the other hand, was also far higher than 
that of the IIMM, ~lthough it was inferior to 
that of the neuro tagger. The accuracy of the 
hybrid system was 1.1% higher than that of the 
neuro tagger. Actually, the rule-based corrector 
corrected 88.4% and 19.7% of the errors made 
by the neuro tagger for the training and testing 
data, respectively. 
Because the template set shown in Table 1 
was designed only to make up for the short- 
comings of the neuro tagger, tile set is smal- 
l compared to that used by Brill (1994). To 
see whether this set is la.rge enough for our sys- 
tem, we perlbrmed two additional experiments 
in which (\]) a sol; constructed 193' adding the 
templates with OR logical input of words to the 
original set and (2) a, set constructed 1)5' fnrther 
adding the templates with AND and OR logi- 
cal inputs of POSs to the set of case (1) were 
used. The set used in case (2) inclnded the set 
used by Brill (\]994) and all the nets nsed in our 
experiments. It was also used for acquiring the 
transformation rules in the rule-based model. 
The experimental results show that compared 
to the original case, the accuracy in case (1) 
was improved very little and the accuracy in 
case (2) was also improved only 0.03%. These 
results show that the original set is nearly la.rge 
enough for our system. 
To see whether tile set is snitable tbr our 
system, we performed ~tn additional experimen- 
t using the original set in which the templa.tes 
with OR logical inputs were used instead of the 
templates with AND logical inputs. The accu- 
racy dropped by 0.1%. Therefore, tile templates 
with AND logical inputs are more suitable than 
514 
those with O11 logical inputs. 
We also performed an experiment using a 
template set without lexical intbrmation. In this 
case, l;he accuracy dropl)ed by 0.9%, indicating 
that lexical informatioll is important in tagging. 
To determine the effect o1' using a. large h, 
for generating rules, we per\['ormed an experi- 
ment with h = 1. In this case, the accuracy 
dropped by only 0.045%, an insignifica.nt differ- 
ence compared to the case of h, = 100. 
By examining the acquired rules that were 
obtained by al)plying the most COml)lete tem- 
plate set, i.e., the set used in case (2) described 
above, we found that 99.9% of them were those 
that can be obtained by a.pl)lying the original 
set of templates, rl'ha.t is, the acquired rules 
were almost those that are dif\[icult \['or the neu- 
re tagger to acquire. '.l'his rein forced our expec- 
tat;ion that the rule-based al)l)roach is a well- 
suited method to cope with the shortcoming of 
the neuro tagger. 
Finally, il, should 1)e noted that ill the liter- 
atures, tile tagging a.ccuracy is usua.lly delined 
by counting a.ll tile words regardless of whether 
they are a.nlbiguous or not. If we used this dell- 
nil:ion, t\]le accura.cy of our hybrid system would 
be 99.1%. 
5 Conclusion 
To collstruct a 1)tactical tagger that needs as 
little training data. a.s possible, neuro taggers, 
which have high generalizing al)ility and there- 
fore a.re good at dealing with the problems ofda~ 
ta. sl)a,rseness, have been proposed so fa.r. Neu- 
re tatters, however, have crucial shortcomings: 
they ca.nnot utilize lexical information; they 
have trouble learning rules with single inputs; 
and they cannot learn training data to an ac~ 
curacy of 100%. To make up for these short- 
comings, we introduced a rule-based correcter, 
which is constructed by a. set of trans\[brma.tion 
rules obtained by error-driven learning, for post 
1)recessing and constructed a hybrid tagging 
system, l{y examining the transtbrma.tion rules 
acquired in the computer experiments, we found 
that 1;he 99.9% of them were those that; the neu- 
re tagger can hardly acquire, even when using a. 
template set including t;hose for generating the 
rules that the neuro tagger can easily acquire. 
This reinlbrced our expecta.tion that the rule- 
based approach is a well-suited method to cope 
with the shortcoming of the neuro tagger. Com- 
puter experiments showed that 19.7% of the er- 
rors made by the neuro tagger were corrected 
by the tra.nslbrmation rules, so the hybrid sys- 
tem rea.ched an accuracy of 95.5% counting only 
the ambiguous words and 99.\]% counting all the 
words in the testing data, when a small corpus 
with only 22,311 ambiguous words was used tbr 
train int. ~l'h is ind icates thai; ou r tagging ,qystem 
can nearly reach a pra.ctica.l level in terms of tag- 
ging accuracy even when a small Thai corpus is 
used tbr tra.ining. This kind of tagging system 
can be used to constructs multilingua.1 corpora 
that include s in which large corpora 
have not yet been constructed. 

References 

l~rill, E.: Transfornmtion-based error-driven lca.rn- 
ing and natural  processing: ~ case s- 
tudy ill 1)art-of-sl)eech tagging, Computational 
Li~g'uistics, Vol. 21, No. 4, pp. 543-565, 199~1. 

Cha.roenporll, T., Sornlertlanlva.nich, V., ~md Isa- 
hara, 11.: Building a la.rge Thai text corpus 
parl; of speech tagged corpus: OI{CIlll), Pro< 
Natural Language Processi~fl Pacific lNm ,5'gn~- 
po.du'm \[997, Phuket, Thailand, pp. 509-5\]2, 
1997. 

I)aelemans, W., Z~wrel, a., Berck, P., and C,i/lis, S.: 
MI3'I': A m<mlory-based pm-t of speech tagger- 
genera.tot, P'roc. /tl.h Workshop on Very Large 
Co,'po~zl, Copenhagen, l)em na.rk, pp. 1-1+1, 199(5. 

l\]aykin, S.: Neural Nchvorlcs, Macmillan College 
Publishing Coral)any, Inc., 199/t. 

Ma, Q. and lsahm'a., H.: A multi-neuro tagger us- 
ing variable lengths of contexts, Prec. COLING- 
ACL'g8, Montreal, pp. 802-806, 1998. 

Ma, Q., Uchimoto, K., Mura.ta, M., and 1sahara H.: 
F, lastic neural networks tbr part of speech tag: 
ging, Prec. IJCNN'99, Washington, \])C., pp. 
2991-2996, 1999. 

Meriaklo, B.: Tagging English text with a proba- 
bilistic model, Computational Linguistics, Vo\]. 
20, No. 2, pp. 1.55-171, 19(.)4. 

Quinla.n, 3.: G'~.5: Programs Jot Machine Learning, 
San Mateo, CA: Morgan Kaufinann, 1993. 

Schmid, 1t.: l'art-of-speech tagging with neural net- 
works, Prec. COLING'94, Kyoto, Japan, pp. 
172-176, 1994. 
