Committee-based Decision Making 
in Probabilistic Partial Parsing 
INUI Takashi* and INUI Kentaro *t 
• Department of Artificial Intelligence, Kyushu Institute of Technology 
• PRESTO, Japan Science and Technology Corporation 
{t_inui, inui}@plut o. ai. kyutech, ac. jp 
Abstract 
This paper explores two directions ibr the 
next step beyond the state of the art of statis- 
tical parsing: probabilistic partial parsing and 
committee-based decision making. Probabilis- 
tic partial parsing is a probabilistic extension of 
the existing notion of partial parsing~ which en- 
ables fine-grained arbitrary choice on the trade- 
off between accuracy and coverage. Committee- 
based decision making is to combine the out- 
puts from different systems to make a better 
decision. While varions committee-based tech- 
niques for NLP have recently been investigated, 
they would need to be fln'ther extended so as 
to be applicable to probabilistic partial pars- 
ing. Aiming at this coupling, this paper gives 
a general fl'amework to committee-based deci- 
sion making, which consists of a set of weight- 
ing flmctions and a combination function, and 
discusses how it can be coupled with probabilis- 
tic partial parsing. Our ext)eriments have so far 
been producing promising results. 
1 Introduction 
There have been a number of attempts to use 
statistical techniques to improve parsing perfor- 
mance. While this goal has been achieved to a 
certain degree given the increasing availability 
of large tree banks, the remaining room tbr the 
improvement appears to be getting saturated 
as long as only statistical techniques are taken 
into account. This paper explores two directions 
tbr the next step beyond the state of the art of 
statistical parsing: probabilistic partial parsing 
and committee-based decision making. 
Probabilistic partial parsing is a probabilistic 
extension of the existing notion of partial pars- 
ing ( e.g. (Jensen et al., 1993)) where a parser 
selects as its output only a part of the parse tree 
that are probabilistically highly reliable. This 
decision-making scheme enables a fine-grained 
arbitrary choice on the trade-off between ac- 
curacy and coverage. Such trade-oil is impor- 
tant since there are various applications that re- 
quire reasonably high accuracy even sacrificing 
coverage. A typical example is the t)araI)hras- 
ing task embedded in summarization, sentence 
simplification (e.g. (Carroll et al., 1998)), etc. 
Enabling such trade-off" choice will make state- 
of the-art parsers of wider application. Partial 
parsing has also been proven useflll ibr boot- 
strapping leanfing. 
One may suspect that the realization of par- 
tial parsing is a trivial matter in probabilistic 
parsing just because a probabilistic parser in- 
herently has the notion of "reliability" and thus 
has the trade-off:' between accuracy and cover- 
age. However, there has so far been surpris- 
ingly little research focusing on this matter and 
ahnost no work that evaluates statistical parsers 
according to their coverage-accuracy (or recall- 
precision) curves. Taking the significance of 
partial parsing into account, therefi)re in this 
paper, we evaluate parsing perfbrmance accord- 
ing tO coverage-accuracy cnrves. 
Committee-based decision making is to con> 
bine the outputs from several difl'erent systems 
(e.g. parsers) to make a better decision. Re- 
cently, there have been various attempts to at)- 
ply committee-based techniques to NLP tasks 
such as POS tagging (Halteren et al., 1998; 
Brill et al., 1998), parsing (Henderson and 
Brill, 1999), word sense disambiguation (Peder- 
sen, 2000), machine translation (lh'ederking and 
Nirenburg, 1994), and speech recognition (Fis- 
cus, 1997). Those works empirically demon- 
strated that combining different systems often 
achieved significant improvelnents over the pre- 
vious best system. 
In order to couple those committee-based 
348 
schemes with t)robat)ilistic t)artial parsing, how- 
ever, Olle would still need to make a fllrther ex- 
tension. Ainling at this coupling, ill this t)at)er, 
we consider a general framework of (:ommil, tee- 
based decision making that consists of ~ set 
of weighting flmctions mid a combination flmc- 
tion, and (lis('uss how that Kalnework enal)les 
the coupling with t)robal)ilistic t)artial t)m:sing. 
To denionstr~te how it works, we ret)ort the re- 
sults of our t)arsing exl)eriments on a Japanese 
tree bank. 
2 Probabilistic partial parsing 
2.1 Dependency probability 
In this t)at)er, we consider the task of (le(:id- 
ing the det)endency structure of a Jat);mese in- 
put sentence. Note that, while we restrict ore: 
discussion to analysis of Jat)anese senl;(;nc(;s in 
this t)~l)er, what we present l)elow should also 
t)e strnightfi?rwardly ?xt)plical)h~ to more wide- 
ranged tasks such as English det)endency anal- 
ysis just like the t)roblem setting considered t)y 
Collins (1996). 
Givell ;m inl)ut sentence ,s as a sequence, of 
B'unset,su-t)hrases (BPs) J, lq b2 ... lh~, our task 
is to i(tent, i\[y their inter-BP del)endency struc- 
t,,e n = l,j)l,: = ',,,}, where 
(tenot;es that bi (let)on(Is on (or modities) bj. 
Let us consider a dependency p'roba, bility (I)P): 
P('r(bi, bj)l.s'), a t)rol)al)ility l;lu~t 'r(bi, b:j) hohts 
in a Given senl:ence s: Vi. Ej P(','(51, t,j)l.4 = a. 
2.2 Estimation of DPs 
Some of the state-of:the-art 1)rol)at)ilis(;ic bm- 
guage inodels such as the l)ottomu t) models 
P(l~,l.,.) propos,,d by Collins (1:)96) and Fujio 
et al. (1998) directly estimate DPs tbr :~ given 
int)ut , whereas other models su('h as PCFO- 
t)ased tel)down generation mod(;ls P(H,,,s) do 
not, (Charnink, 1997; Collins, 1997; Shir~fi et ~rl., 
1998). If the latter type of mod(,'ls were totally 
exchlded fronl any committee, our commit;tee- 
based framework would not work well in I)rac- 
lice. Fortm:ately, how(:ver, even tbr such a 
model, one can still estimate l)l?s in the follow- 
ing way if the rood(;1 provides the n-best del)en- 
1A bunsctsu phrase (BP) is a chunk of words (-on- 
sist;ing of a content word (noun, verl), adjective, etc.) 
accoml)mfied by sonic flmctional word(s) (i)arti(:le, mlx- 
iliary, etc.). A .lai)anes(' sentc'nce can 1)c analyzed as a 
sequence of BPs, which constitutes an inter-BP deI)en- 
dency structure 
dency structure candidates cout)led with prot)- 
abilistic scores. 
Let Ri be the i-th best del)endency st;ruct;ure 
(i = 1,..., 'n) of ;~ given input ,s' according to a 
given model, and h;t ~H l)e a set; of H,i. Then, 
,.,u, l,e csl;ima|;ed by the following 
ai)l)l"OXilnation equation: 
./)F 
• 7~H P(',(b,z, (1) 
where P'R.u is the probal)ilit;y mass of H, E 7~Lr, 
and prn. is the probability mass of R ~ ~H 
that suppori;s 'r(bi, bj). Tile approximation er- 
ror c is given 1)y c < l;r~--1%, where l),p,, is 1;t2(; 
-- l~p~ ' 
prol)abilil;y mass of all the dependency struc- 
ture candidates for s (see (Peele, 1993) for the 
l?roof). This means that the al)t)roximation er- 
ror is negligil)le if P'R,, is sut\[iciently close to 
1),R, which holds for a reasonably small mlmt)er 
'n in lnOSt cases in practical statistical parsing. 
2.3 Coverage-accuracy curves 
We then conside, r the task of selecting depen- 
dency relations whose estimated probability is 
higher I:han a (:e|:i;ain l;hreshoht o- (0 < a < 1). 
When (r is set 1;o be higher (closer to 1.0), t;he 
accuracy is cxt)ected to become higher, while 
the coverage is ext)ecl;ed to become lowe,:, and 
vi(:e versm Here, (;over~ge C* and a,(;ctlra(;y A 
are defined as follows: 
# of the. decided relations C 
# of nil the re, lations in I;\]le t;est so,}i2 )/~ 
# of the COl'rectly decided relati°n~3~vJ A 
# of the decided relations 
Moving the threshohl cr from 1.0 down to- 
ward 0.0, one (:an obtain a coverage-a(:cura(:y 
(:urve (C-A curve). In 1)rol)al)ilistic t)artial pars- 
ing, we ewflunte the t)erforman('e of a model 
~mcording to its C-A curve. A few examt)les 
are shown in Figure 1, which were obtained 
in our ext)erim(mt (see Section 4). Ot)viously, 
Figure 1 shows that model A outt)erformed the 
or, her two. To summarize a C-A cIlrve, we use 
the ll-t)oint average of accuracy (l l-t)oint at:- 
curacy, hereafl;er), where the eleven points m'e 
C = 0.5, 0.55,..., 1.0. The accuracy of total 
parsing correst)onds to the accuracy of the t)oint 
in a C-A curve where C = 1.0. We call it total 
~ccuracy to distinguish it from l\]-l)oint at:el> 
racy. Not;('. that two models with equal achieve- 
349 
! 
A 
0.95 
0.9 
0.85 
0.8 
0 0.2 0.4 0.6 0.8 1 
coverage 
Figure 1: C-A curves 
meuts in total accuracy may be different in ll- 
point accuracy. In fact, we found such cases in 
our experiments reported below. Plotting C-A 
curves enable us to make a more fine-grained 
perfbrmance evaluation of a model. 
3 Committee-based probabilis- 
tic partial parsing 
We consider a general scheme of comnfittee- 
based probabilistic partial parsing as illustrated 
in Figure 2. Here we assume that each connnit- 
tee member M~ (k = 1,..., m) provides a DP 
matrix PM~(r(bi, bj)ls ) (bi, bj E s) tbr each in- 
put 8. Those matrices are called inlmt matrices, 
and are give:: to the committee as its input. 
A committee consists of a set of weighting 
functions and a combination flmction. The role 
assigned to weighting flmctions is to standardize 
input matrices. The weighting function associ- 
ated with model Mk transforms an input ma- 
trix given by MI~ to a weight matrix WaG- The 
majority flmction then combines all the given 
weight matrices to produce an output matrix O, 
which represents the final decision of the con> 
mittee. One can consider various options for 
both flmctions. 
3.1 Weighting functions 
We have so far considered the following three 
options. 
Simple The simplest option is to do nothing: 
~a~ = PA~ (,.(b~, bj)l~) (4) ij 
o Mk where wij is the (i,j) element of I/VMk. 
Normal A bare DP may not be a precise es- 
timation of the actual accuracy. One can see 
this by plotting probability-accuracy curves (P- 
A curves) as shown in Figure 3. Figure 3 shows 
that model A tends to overestimate DPs, while 
/ 
O}llllll\[ttct ~ based decision nlakilll~ 
i 
i " i -- =" ,, 
models itlpllt i weight matrices i .................................. : 
matrices "tV l:: "~Vt~iglt|iJI g \]"u n ¢1\]Oll 
CF: Ct3mhinatlan I:mlcl\[.n 
Figure 2: Committee-based probabilistic partial 
parsing 
0,9 
;,~ 
0,8 
= 
0.7 
0.6 
C 
0.5 
0,5 0.6 0.7 0.8 0.9 
dependency probability 
Figure 3: P-A curves 
model C tends to underestimate DPs. This 
lneans that if A and C give different answers 
with the same DP, C's answer is more likely 
to be correct. Thus, it is :sot necessarily a 
good strategy to simply use give:: bare DPs in 
weighted majority. To avoid this problem, we 
consider the tbllowing weighting flmction: 
w~J k =@lkAM~(PMk(','(bi, b:i)l.s)) (5) 
where AMk (P) is the function that returns the 
expected accuracy of Mk's vote with its depen- 
Mk dency probability p, and oz i is a normalization 
factor. Such a function can be trained by plot- 
ting a P-A curve fbr training data. Note that 
training data should be shared by all the com- 
mittee members. In practice, tbr training a P-A 
curve, some smoothing technique should be ap- 
plied to avoid overfitting. 
Class The standardization process in the 
above option Normal can also be seen as an 
effort for reducing the averaged cross entropy 
of the model on test, data. Since P-A curves 
tend to defi~,r not only between different mod- 
els but also between different problem classes, 
if one incorporates some problem classification 
into (5), the averaged cross entropy is expected 
350 
to be reduced fllrther: 
'w~ ~ =/~';'~AM.%(i)M~(r(b~,bj)l,s)) (6) 
where AMkcl, i (P) is the P-A curve of model Mk 
only tbr the problems of class Cb~ in training 
data, and flMk is a normalization factor. For i 
probleln classification, syntactie/lexieal features 
of bi may be useful. 
3.2 Combining functions 
For combination flmctions, we have so far con- 
sidered only simple weighted voting, which av- 
erages the given weight matrices: 
1;I, Mk 1 v-" Mk 
• = -- 2_, ~'J'U (7) °iJ 'm, 
h=l 
where o.i.f/~:_ is the (i, j) element of O. 
Note that the committee-based partial pars- 
ing frmnework t)resented here can be see, n as 
a generalization of the 1)reviously proposed 
voting-based techniques in the following re- 
spects: 
(a) A committee a(:(:epts probabilistically para- 
meterized votes as its intmt. 
(d) A committee ac(:el)ts multil)le voting (i.e. it; 
allow a comnfittee menfl)er to vote not only 
to the 1)est-scored calMi(late trot also to all 
other potential candidates). 
((:) A. (:ommittee 1)rovides a metals tbr standard- 
izing original votes. 
(b) A committee outl)uts a 1)rot)abilisti(" distri- 
bution representing a tinal decision, which 
constitutes a C-A curve. 
For examt)le, none of simple voting techniques 
for word class tagging t)roposed 1)y van Hal- 
teren et al. (1998) does not accepts multiple 
voting. Henderson and Brill (1999) examined 
constituent voting and naive Bayes classifi(:a- 
lion for parsing, ol)taining positive results ibr 
each. Simple constituent voting, however, does 
not accept parametric votes. While Naive Bayes 
seems to partly accept l)arametric multit)le vot- 
ing, it; does not consider either sl;andardization 
or coverage/accuracy trade-off. 
4 Experiments 
4.1 Settings 
We conducted eXl)erinmnts using the tbllow- 
ing tive statistical parsers: 
Table 1: The total / l l-t)oint accuracy achieved 
1)y each individual model 
total 11-point 
A 0.8974 0.9607 
B 0.8551 0.9281 
C 0.8586 0.9291 
D 0.8470 0.9266 
E 0.7885 0.8567 
• KANA (Ehara, 1998): a bottom-up model 
based oll maxinmm entropy estimation, 
Since dependency score matrices given by 
KANA have no probabilistic semantics, we 
normalized them tbr each row using a cer- 
tain function manually tuned for this parser. 
• CI\]AGAKE (Fujio et al., 1998): an exten- 
sion of the bottom-up model proposed by 
Collins (Collins, 1996). 
• Kanaymna's parser (Kanayama et al., 1999): 
a l)o|,tom-up model coupled with an HPSG. 
• Shirai's parser (Shirai et al., 1998): a top- 
down model incorporating lexical collocation 
statistics. Equation (1) was used tbr estimat- 
ing DPs. 
• Peach Pie Parser (Uchilnoto et al., 1999): 
a bottom-up model based on maximum en- 
tropy estimation. 
Note that these models were developed flfily 
independently of ea('h other, and have siglfifi- 
Calltly different (:haracters (Ii)r a comparison of 
their performance, see %tble 1). In what Jbl- 
lows, these models are referred to anonymously. 
For the source of the training/test set, we 
used the Kyoto corpus (ver.2.0) (Kurohashi et 
al., 1.997), which is a collection of Japanese 
newspaper articles mmotated in terms of word 
boundaries, POS tags, BP boundaries, and 
inter-BP dependency relations. The corpus 
originally contained 19,956 sentences. To make 
the training/test sets~ we tirst removed all the 
sentences that were rejected by any of the above 
five parsers (3,146 sentences). For the remain- 
ing 16,810 sentences, we next checked the con- 
sistency of the BP boundaries given by the 
parsers since they had slightly different crite- 
ria tbr BP segmentation fl'om each other. In 
this process, we tried to recover as many in- 
consistent boundaries as possible. For example, 
we tbund there were quite a few cases where 
a parser recoglfized a certain word sequence ms 
a single BP, whereas some other parser recog- 
nized the same sequence as two BPs. In such 
351 
0.965 
\[ASimple DNormal \[\]Class 
0.96 
0.955 
0.95 
A : 0.9607 \ 
\ 
na ¢o ~ ~0 
0.975 \[ 
e eg 
Figure 4: ll-point accuracy: A included 
0.96 ', \[.~Normal mClass 
0.95 
0.94 
0.93 
0.92 
9291 f 
5 g a0 
Figure 5: l 1-point accuracy: B/C included 
a case, we regarded that sequence as a single 
BP under a certain condition. An a result;, we 
obtained 13,990 sentences that can be accepted 
by all the parsers with all the BP boundaries 
consistent 2 We used thin set tbr training and 
evaluation. 
For cloned tests, we used 11,192 sentences 
(66,536 BPs a) for both training and tests. 
For open tests, we conducted five-fold cross- 
validation on the whole sentence set. 
2In the BP concatenation process described here, 
quite a few trivial dependency relations between neigl,- 
boring BPs were removed from the test set. This made 
our test set slightly more difficult tlmn what it should 
have 1)cert. 
3This is the total nmnber of BPs excluding the right- 
most two BPs for each sentence. Since, in Jal)anese, a 
BP ahvays depends on a BP following it, the right-most 
BP of a sentence does not (lei)(tnd on any other BP, and 
the second right-most BP ahvays depends on the right- 
most BP. Therefore, they were not seen as subjects of 
evahmtion. 
0.97 
DSimple \[21Normal mClass 
0.965 
0.97 
0.96 
0.955 
Figure 6: ll-point accuracy: +KNP 
For the classification of problems, we man- 
ually established the following twelve (:lasses, 
each of which is defined in terms of a certain 
nlol:phological pattern of depending BPs: 
1.. nonfinal BP wit, h a case marker "'wa (topic)" 
2. nominal BP with a case marker "no (POS)" 
3. nominal BP with a case marker "ga (NOM)" 
4. nominal BP with a case marker % (ACC)" 
5. nonlinal BP with a case marker "hi (DAT)" 
6. nominal BP with a case marker "de (LOC/...)" 
nominal BP (residue) 
adnominal verbal BP 
verbal BP (residue) 
adverb 
adjective 
residue 
4.2 Results and discussion 
Table 1 shown the total/ll-point accuracy of 
each individual model. The performance of each 
model widely ranged from 0.96 down to 0.86 
in ll-point accuracy. Remember that A is the 
optimal model, and there are two second-best 
models, B and C, which are closely comparable. 
In what tbllows, we use these achievements ms 
the baseline for evaluating the error reduction 
achieved by organizing a committee. 
The pertbrmanee of various committees is 
shown in Figure 4 and 5. Our primary inter- 
est here is whether the weighting functions pre- 
sented above effectively contribute to error re- 
duction. According to those two figures, al- 
though the contribution of the flmction Nor- 
mal were nor very visible, the flmction Class 
consistently improved the accuracy. These re- 
sults can be a good evidence tbr the important 
role of weighting flmctions in combining parsers. 
7. 
8. 
9. 
1(). 
11. 
12. 
352 
0.96 
0.94 
0.92 t ~ t II, \[iJ 
Figure 7: Single voting vs. Multiple voting 
While we manually tmill: the 1)roblem classiti('a- 
l;ion in our ext)erimen|;, autom;~I;ic (:lassitication 
te.chniques will also 1)e obviously worth consid- 
ering. 
We l;hen e.on(tucted another exl)e, rime.nI; to ex- 
amine the, et\['e(-l;s of muli;it)le voting. One (:an 
sl;raighi;forwardly sinn|late a single-voting com- 
nlil;tee by ret)lacing wij in equal;ion (7) with w~. i 
given by: 
, { wi.i (if' j = m'g m~xk 'wit~) 
=_ 0 (o|;he.wise) (S) 
The resull;s are showll in Figure 7, which 
corot)ares l;he original multi-voting committees 
and l;he sinmlai;e(t single-voi:ing (:olmnil;l;ees. 
Clearly, in our se|;tings, multil)le voting signif- 
icanl;ly oul;pertbrmed single vol;ing 1)arti(:ul~rly 
when t;he size of a ('ommii;tee is small. 
The nexl; issues are whel;her ~ (:Omlnil;te,(', al- 
ways oul;perform its indivi(tmd memt)ers, mtd if 
not;, what should be (-onsidered in organizing a 
commii;i;ee. Figure 4 and 5 show |;hal; COllllllil;- 
tees nol; ilmlu(ling t;he ot)timal model A achieved 
extensive imt)rovemenl;s, whereas the merit of 
organizing COlmnitl;ees including A is not very 
visible. This can be t)arl, ly attrilml;ed to the 
fa.ct that the corot)el;once of the, individual mem- 
l)ers widely diversed, and A signiti(:md;ly OUtl)er- 
forms the ol:her models. 
Given l,he good error reduct;ion achieved 
by commit, tees containing comt)ar~ble meml)ers 
sueh ~s BC, BD a, nd B@I), however, it should t)e 
reasonable 1;o eXl)ect thai; a (:omlnil,l,e,e includ- 
ing A would achieve a significant imt)rovement; if 
anol;her nearly ol)t;ilnal model was also incorl)o- 
0.8 
v 
0,7 
0.fi 
0.5 0.6 |1,7 {).g 0.9 
dependency probability 
Figure 8: P-A curves: +KNP 
rated. To empirically prove this assmnpl;ion, we, 
conduct;ed anot;her experiment, where we add 
another parser KNP (Kurohashi el; al., 1 !)94:) 1;o 
each commil;|;ee that apt)ears in Figure 4. KNI? 
is much closer to lnodel A in l;ol;al accuracy 
t;han t;he other models (0.8725 in tol;al accu- 
racy). However, il; does not provide. DP rea- 
l;rices since it is designed in a rule-l)ased fash- 
ion the current; version of KNP 1)rovides only 
the t)esl;-t)referrext parse t;ree for ea(:h inl)Ul; sen- 
tence without ~my scoring annotation. We l;hus 
let KNP 1;o simply vol;e its l;ol;al aeem:aey. Tim 
results art; shown in lqgure 6. This time all l;he 
commil;tees achieved significant improvemenl;s, 
wil;h |;he m~ximum e, rror re(hu:|;ion rate up l;o '3~%. 
As suggested 1)y |;he. re, suits of t;his exl)erimenl; 
with KNP, our scheme Mlows a rule-based 11011- 
t)~r;m,el:ric p~rse.r t;o pb~y in a eommil;l;e.e pre- 
serving it;s ~d)ilit:y t;o oui;t)ul; t)aralnel;rie I)P ma- 
(;ri(:es. To 1)ush (;he ~u'gumen(; fl,rl;her, SUl)pose 
;~ 1)lausil)le sil;ual;ion where we have ;m Ol)l;imal 
l)ut non-1)arametrie rule-based parser and sev- 
eral suboptimal si;atistical parsers. In su('h ~ 
case, our commil;teeA)ased scheme may t)e able 
l;o organize a commi|,tee that can 1)rovide l)P 
lnatri(:es while preserving the original tol;al ac- 
curacy of the rule-b~sed parser. To set this, we 
conducted another small experiment, where, we 
combined KNP with each of C and D, 1)oth of 
whi(:h are less compe.tent than KNP. The result- 
ing (:ommil;l;ees successflflly t)rovided reasonal)le 
P-A curves as shown in Figure 8, while even 
further lint)roving the original |;ol;al at:curacy of 
KNP (0.8725 to 0.8868 tbr CF and 0.8860 for 
DF). Furthermore, t;he COmlnittees also gained 
the 11-point accuracy over C and D (0.9291 to 
353 
0.9600 tbr CF and 0.9266 to 0.9561 for DF). 
These. results suggest that our committee-based 
scheme does work even if the most competent 
member of a committee is rule-based and thus 
non-parametric. 
5 Conclusion 
This paper presented a general committee- 
based frmnework that can be coupled with prob- 
abilistic partial parsing. In this framework, a 
committee accepts parametric multiple votes, 
and then standardizes them, and finally pro- 
vides a probabilistic distribution. We presented 
a general method for producing probabilistic 
multiple votes (i.e. DP matrices), which al- 
lows most of the existing probabilistic models 
for parsing to join a committee. Our experi- 
ments revealed that (a) if more than two compa- 
rably competent models are available, it is likely 
to be worthwhile to combine them, (b) both 
multit)le voting and vote standardization effec- 
tively work in committee-based partial parsing, 
(c) our scheme also allows a non-parametric 
rule-based parser to make a good contribution. 
While our experiments have so far been produc- 
ing promising results, there seems to be much 
room left for investigation and improvement. 
Acknowledgments 
We would like to express our special thanks to 
all the creators of the parsers used here for en- 
abling ~fll of this research by providing us their 
systems. We would also like to thank the re- 
viewers tbr their suggestive comments. 

References 

Brill, E. and J. Wu. Classifier Combination for ha- 
proved Lexical Disambiguation. In Proc. of the 
17th COLING, pp.191-195, 1998. 

Carroll, J. ,G. Minnen, Y. Cmming, S. Devlin and 
J. Tait. Practical Simplification of English News- 
paper Text to Assist Aphasic Readers. In Prvc. of 
AAAI-98 Workshop on Integrating Artificial In- 
telligence and Assistive Technology,1998. 

Charniak, E. Statistical parsing with a context- 
free grammar and word statistics. In Prvc. of the 
AAAI, pp.598 603, 1997. 

Collins, M. J. A new statistical parser based on bi- 
grmn lexical dependencies. In Proc. of the 3~th 
ACL, pp.184-191, 1996. 

Collins, M. J. Three generative, lexicalised models 
for statistical parsing. In Proc. of the 35th A CL, 
pp.16-23, 1997. 

Ehara, T. Estinlating the consistency of Japanese 
dependency relations based on the maximam en- 
trot)y modeling. Ill Proc. of the/~th Annual Meet- 
ing of The Association of Natural Language Pro- 
cessing, 1)t).382-385, 1998. (In Japanese) 

Fiscus, J. G. A post-processing system to yield re- 
duced word error rates: Recognizer output voting 
error reduction (ROVER). In EuroSpccch, 1997. 

Fk'ederking, R. and S. Nirenburg. Three heads are 
better titan one. In Proc. of the dth ANLP, 1994. 

Fujio, M. and Y. Matsmnoto. Japmmse dependency 
structure analysis based on lexicalized statistics. 
In Proc. of the 3rd EMNLP, t)I).87-96, 1998. 

Henderson, J. C. and E. Brill. Exploiting Diver- 
sity in Natural Language Processing: Combining 
Parsers. In Proc. of the 1999 Joint SIGDAT Con- 
fcrcncc on EMNLP and I/LC, pt).187--194. 

Jensen, K., G. E. Heidorn, and S. D. Richardson, 
editors, natural  processing: The PLNLP 
AppTvach. Kluwer Academic Publishers, 1993. 

Kanayama, H., K. Torisawa, Y. Mitsuisi, and 
J. Tsujii. Statistical Dependency Analysis with 
an HPSG-based Japanese Grainmar. In Proc. of 
the NLPRS, pp.138-143, 1999. 

Kurohashi, S. and M. Nagao. Building a Jat)anese 
parsed corpus while lint)roving tile parsing system. 
In Proc. of NLPRS, pp.151-156, 1997. 

Kurohashi, S. and M. Nagao. KN Parser : Japanese 
Dependency/Case Structure Analyzer. in Proc. of 
Th.e httcrnational Worksh.op on Sharablc Natural 
Lang'aagc Rcso'arccs, pp.48-55, 1994. 

Poole, D. Average-case analysis of a search algo- 
rithm fl)r estimating prior and 1)ostcrior probabil- 
ities in Bayesian networks with extreme 1)rot)abil- 
ities, thc i3th LICAL pp.606 612, 1993. 

Pedersen, T. A Simple AI)l)roach to Building En- 
sembles of Naive Bayesian Classifiers for Word 
Sense Dismnbiguation In Proc. of the NAACL, 
pp.63-69, 2000. 

Shirai, K., K. hmi, T. Tokunaga and H. Tanaka 
An empirical evaluation on statistical 1)arsing 
of Japanese sentences using a lexical association 
statistics, thc 3rd EMNLP, pp.80-87, 1998. 

Uchimoto, K., S. Sekine, and H. Isahara. Japanese 
dependency structure analysis based on maxi- 
mum entopy models. In Proc. of thc 13th EACL, 
pp.196-203, 1999. 

van Halteren, H., J. Zavrel, and W. Daelemans. hn- 
t)roving data driven wordclass tagging 1)y system 
combination. In Proc. of the 17th COLING, 1998. 
