A Hybrid Japanese Parser with Hand-crafted Grammar and 
Statistics 
Hiroshi Kanayama†, Kentaro Torisawa‡*,
Yutaka Mitsuishi* and Jun'ichi Tsujii‡§
† Tokyo Research Laboratory, IBM Japan, Ltd.
1623-14 Shimo-tsuruma, Yamato-shi, Kanagawa 242-8502, Japan
‡ Department of Information Science, Graduate School of Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
* Information and Human Behavior, PRESTO, Japan Science and Technology Corporation
Kawaguchi Hon-cho 4-1-8, Kawaguchi-shi, Saitama 332-0012, Japan
§ CCL, UMIST, U.K.
{kanayama, torisawa, mitsuisi, tsujii}@is.s.u-tokyo.ac.jp
Abstract 
This paper describes a hybrid parsing method for
Japanese which uses both a hand-crafted grammar
and a statistical technique. The key feature
of our system is that in order to estimate the likelihood
for a parse tree, the system uses information
taken from alternative partial parse trees generated
by the grammar. This utilization of alternative
trees enables us to construct a new statistical
model called the Triplet/Quadruplet Model. We
show that this model can capture a certain tendency
in Japanese syntactic structures and this point
contributes to improvement of parsing accuracy on
a shallow level. We report that, with an underspecified
HPSG-based grammar and a maximum entropy
estimation, our parser achieved high accuracy:
88.6% accuracy in dependency analysis of the EDR
annotated corpus, and that it outperformed other
purely statistical parsing methods on the same corpus.
This result suggests that proper treatment of
hand-crafted grammars can contribute to parsing
accuracy on a shallow level.
1 Introduction 
There have been many attempts to combine hand-crafted
high-level grammars, such as FB-LTAG,
HPSG and LFG, and statistical disambiguation
techniques to obtain precise linguistic structures
(Schabes, 1992; Abney, 1996; Carroll et al., 1998).
One evident advantage of this approach over purely
statistical parsing techniques is that grammars can
provide precise semantic representations. However,
considering that remarkable parsing accuracy on a
shallow level has been achieved by purely statistical
techniques (e.g. Ratnaparkhi (1997)), it may be
thought more reasonable to use high-level grammars
just for postprocessing which maps results of shallow
syntactic analyses onto deep analyses.
† This work was conducted while the first author was a
graduate student at Univ. of Tokyo.
Figure 1: A tree M with a non-head daughter NH and
a head daughter H.
In this work we propose that hand-crafted high-level
grammars can be useful in shallow-level analyses
and statistical models. In our framework, grammars
are used to obtain precise features for probability
estimation, which are difficult to obtain without a
grammar, and we show that such features contribute
to high parsing accuracy on a shallow level.

In this paper, the most preferable parse trees are
chosen with a statistical model. In our method, the
likelihood value L(M) of a (partial) tree M in Figure 1
is defined as in (1):

L(M) =def L(NH) × L(H) × P(n → h)    (1)

where NH is M's non-head daughter (whose lexical
head is n), H is the head daughter (whose lexical
head is h), and P(n → h) is the probability of n
being related to h. For a single lexical item W, L(W)
is defined as 1.0.
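The recursive definition in (1) can be sketched as follows. The nested-tuple tree encoding and the `prob` callback are illustrative assumptions for this sketch, not the system's actual data structures:

```python
def lexical_head(tree):
    """Lexical head of a subtree: the head of its head daughter.
    Trees are encoded as (non_head, head) pairs; a leaf is a string."""
    return tree if isinstance(tree, str) else lexical_head(tree[1])

def likelihood(tree, prob):
    """Equation (1): L(M) = L(NH) * L(H) * P(n -> h), with L(W) = 1.0
    for a single lexical item W. `prob(n, h)` supplies P(n -> h)."""
    if isinstance(tree, str):
        return 1.0
    non_head, head = tree
    return (likelihood(non_head, prob) * likelihood(head, prob)
            * prob(lexical_head(non_head), lexical_head(head)))
```

For instance, with a constant `prob` of 0.5, a tree containing two head relations scores 0.5 × 0.5 = 0.25.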
In most models already proposed, the probability
P(n → h) is calculated with the conditional probability (2):

P(n → h) =def P(T | Φn, Φh, Δn,h)    (2)

where T indicates that the dependency is true; Φn
and Φh are attributes of n and h, respectively. And
Δn,h, the distance between the two words, is widely
used, because this attribute is believed to strongly
affect whether those two words are going to be related.
In contrast, in the statistical model proposed in
this paper, P(n → h) depends not only on the attributes
of the tree M, but also on alternative trees
[Figure 2 schematic: for an input sentence < ... n ... h1 ... hi ... hl ... >, partial trees M1, ..., Ml in which n modifies h1, ..., hl respectively.]

Figure 2: Partial trees whose non-head daughter's lexical
head is n.
Figure 3: Transformation from a tree to a dependency.
l' and r' denote the bunsetsus l and r belong to, respectively.
in the parse forest generated by the grammar. More
precisely, when P(n → h) is calculated, we consider
partial trees whose non-head daughter's lexical head
is n, as displayed in Figure 2. Here alternative possible
hk (k = 1, ..., l) are taken into consideration,
and ordered according to their distance to n. We
call such a set of hk modification candidates, and all
modification candidates are placed together in the
conditional part of the probability as in (3). Now
assume h = hi.

P(i | Φn, Φh1, Φh2, ..., Φhl)    (3)

where "i" indicates the ith candidate among the
modification candidates. Equation (3) shows two
important properties of our model. One point lies in
the new distance metric. (3) is the probability that n
chooses the ith candidate as the modifiee among the
modification candidates which are ordered according
to their distance to n. Thus, we no longer require
the distance metric Δn,h; instead we use the relative
position among the modification candidates, which
works as an attribute of the modification. The other
point is the use of the attributes of the alternative
parse trees, that is, attributes of the modifier and all
its modification candidates are considered simultaneously.
We show that these techniques sophisticate
our model, by providing linguistic examples in Section 3.2.
In practice, however, treating all candidates is not 
feasible because of data-sparseness. We therefore 
apply a strategy of restricting the modification can- 
didates to at most three. The strategy and its justi- 
fication are discussed in Section 3.1. 
Applying the strategy to the equation (3), we obtain
equations (4) and (5):

P(n → hi) =def P(i | Φn, Φh1, Φh2)    (i = 1, 2)    (4)

P(n → hi) =def P(i | Φn, Φh1, Φh2, Φhl)    (i = 1, 2, l)    (5)

When there are only two candidates, equation (4)
is used; otherwise, equation (5) is used. Our statistical
model is called the Triplet/Quadruplet Model,
which was named after the number of constituents
in the conditional parts of the equations.
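In code, the case split between (4) and (5) might look like the following sketch; `model3` and `model4` stand in for the two estimated models (they are hypothetical interfaces, not the actual estimator API), and candidate positions are 0-indexed here:

```python
def dependency_prob(model3, model4, phi_n, cand_phis, i):
    """Probability that n modifies its i-th candidate (0-indexed).
    Two candidates -> Triplet Model, equation (4); three candidates ->
    Quadruplet Model, equation (5). Each model maps the modifier's
    attributes plus all candidates' attributes to a distribution over
    candidate positions."""
    if len(cand_phis) == 2:
        dist = model3(phi_n, *cand_phis)
    elif len(cand_phis) == 3:
        dist = model4(phi_n, *cand_phis)
    else:
        raise ValueError("candidates must already be restricted to 2 or 3")
    return dist[i]
```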
We report that our parsing framework achieved 
high accuracy (88.6%) in dependency analysis of 
Japanese with a combination of an underspecified 
HPSG-based Japanese grammar, SLUNG (Mitsu- 
ishi et al., 1998) and the maximum entropy method 
(Berger et al., 1996). Moreover, the resulting parse 
trees generated by our hybrid parser are legitimate 
trees in terms of given hand-crafted grammars, and 
we are expecting that we can enjoy advantages provided
by high-level grammar formalisms, such as
construction of semantic structures.
In the above explanation, we used the notion of 
lexical heads for the estimation of probabilities of 
trees for the sake of simplicity. But, in the present
implementation, we use bunsetsus instead of lexical
heads, and a relation on a tree is converted to a
bunsetsu-dependency as shown in Figure 3. A bunsetsu
is a basic syntactic unit in Japanese. It consists
of a content word and some functional morphemes
such as a particle.
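The tree-to-dependency conversion of Figure 3 can be sketched as below; the mapping from lexical items to bunsetsu ids is assumed to be supplied by the segmenter:

```python
def to_bunsetsu_dependency(l, r, bunsetsu_of):
    """Convert a head relation l -> r between lexical items on a parse
    tree into a dependency l' -> r' between the bunsetsus they belong
    to (Figure 3); a relation inside a single bunsetsu yields none."""
    l_prime, r_prime = bunsetsu_of[l], bunsetsu_of[r]
    return None if l_prime == r_prime else (l_prime, r_prime)
```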
In Section 2, we describe some existing statistical
parsers, and the Japanese grammar which we
adopted. Section 3 describes our statistical method
and its advantages in detail. We report experimental
results in Section 4.
2 Background 
In this section, we describe several models for
Japanese dependency analysis and works on statistical
approaches with grammars. Next, we introduce
SLUNG, the HPSG-based Japanese grammar which
is used in our hybrid parser.
2.1 Previous Dependency Analysis Models 
of Japanese 
Several statistical models for Japanese dependency
analysis which do not utilize a hand-crafted grammar
have been proposed. We evaluate the accuracy
of bunsetsu-dependencies as they do, thus here we
introduce them for comparison. All models introduced
below are based on the likelihood value of the
dependency between two bunsetsus. But they differ
from each other in the attributes or outputs which
are considered when a likelihood value is calculated.

There are some models which calculate the likelihood
values of a dependency between bunsetsu i and
j as in (6), such as a decision tree model (Haruno et
al., 1998), a maximum entropy model (Uchimoto et
al., 1999), and a model based on distance and lexical
information (Fujio and Matsumoto, 1998). Attributes
Φi and Φj consist of a part-of-speech (POS), a lexical
item, presence of a comma, and so on. And Δi,j
is the number of intervening bunsetsus between i and
j.

P(i → j) =def P(T | Φi, Φj, Δi,j)    (6)

However, these models fail to reflect contextual
information because attributes of the surrounding
bunsetsus are not considered.
Uchimoto et al. (2000) proposed a model using
posterior context. The model utilizes not only
attributes about bunsetsus i, j but also attributes
about all bunsetsus (including j) which follow bunsetsu i.
That is, instead of learning two output values
"T (true)" or "F (false)" for the dependency between
two bunsetsus, three output values are used
for learning: the bunsetsu i is "bynd (dependent on
a bunsetsu beyond j)", "dpnd (dependent on the
bunsetsu j)" or "btwn (dependent on a bunsetsu between
i and j)". The probability is calculated by
multiplying probabilities for all bunsetsus which follow
bunsetsu i as in (7). They report that this kind
of contextual information improves accuracy. However,
the model has to assume the independency of
all the random variables, which may cause some errors.

P(i → j) =def Π_{i<k<j} P(bynd | Φi, Φk, Δi,k)
    × P(dpnd | Φi, Φj, Δi,j) × Π_{k>j} P(btwn | Φi, Φk, Δi,k)    (7)
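Equation (7) can be sketched as follows; `p(label, i, k)` is a hypothetical classifier standing in for P(label | Φi, Φk, Δi,k):

```python
def posterior_context_prob(i, j, n, p):
    """Posterior-context score for 'bunsetsu i depends on bunsetsu j'
    (equation (7)): each k strictly between i and j must be labeled
    'bynd', j itself 'dpnd', and each k after j 'btwn'. Independence
    of all these classifications is assumed, as noted above.
    `n` is the number of bunsetsus in the sentence."""
    score = p("dpnd", i, j)
    for k in range(i + 1, j):
        score *= p("bynd", i, k)
    for k in range(j + 1, n):
        score *= p("btwn", i, k)
    return score
```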
The difference between our model and these previous
models is discussed in Section 3.
2.2 Statistical Approaches with a Grammar
There have been many proposals for statistical
frameworks particularly designed for parsers with
hand-crafted grammars (Schabes, 1992; Briscoe and
Carroll, 1993; Abney, 1996; Inui et al., 1997). The
main issue in this type of research is how to assign
likelihoods to a single linguistic structure generated
by a grammar. Some of them (Briscoe and Carroll,
1993; Inui et al., 1997) treat information on contexts,
but the contextual information is derived only from
a structure to which the parser is trying to assign
a likelihood value. Then, the major difference between
their method and ours is that we consider the
attributes of alternative linguistic structures generated
by the grammar in order to determine the likelihood
for linguistic structures.
2.3 SLUNG : Japanese Grammar 
The Japanese grammar which we adopted, SLUNG 
(Mitsuishi et al., 1998), is an HPSG-based under- 
specified grammar. It consists of 8 rule schemata, 
48 lexical templates for POSs and 105 lexical entries 
for functional words. As can be seen from these figures,
the grammar does not contain detailed lexical
information that needs intensive labor for development.
However, it is precise in the sense that it
achieves 83.7% dependency accuracy with a simple
heuristics² for the EDR annotated corpus, and it
can produce at least one parse tree for 98.4% of the
sentences in the EDR annotated corpus. We use the
grammar for generating parse tree forests, and our
Triplet/Quadruplet Model is used for picking up a
single tree from a forest.
3 The Hybrid Parsing Method 
This section describes the procedure of parsing with
the Triplet/Quadruplet Model. Our hybrid parsing
method proceeds as follows:

• At the beginning, dependency structures are
obtained from trees generated by SLUNG. For
each bunsetsu, modification candidates are enumerated,
and if there are four or more candidates,
they are restricted to three. The heuristic
used in this process is described in Section 3.1.

• Then, with the Triplet/Quadruplet Model and
maximum entropy estimation, probabilities of
the dependencies are calculated. Section 3.2
discusses the characteristics and advantages of
the model.
• Finally, the most preferable trees for the whole 
sentence are selected. 
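The three steps above can be sketched as the greedy loop below. The `candidates` and `prob` interfaces are assumptions for illustration (standing in for SLUNG's parse forest and the maximum entropy models), and the per-modifier argmax is a simplification: the actual system selects the most preferable whole tree from the forest.

```python
def select_dependencies(candidates, prob):
    """Greedy sketch of the parsing procedure. `candidates` maps each
    modifier bunsetsu to its grammatically licensed, already restricted
    (<= 3) modification candidates; `prob(i, cands)` returns one
    probability per candidate via equations (8)/(9). For each modifier,
    pick the candidate with the highest probability."""
    return {i: max(zip(prob(i, cands), cands))[1]
            for i, cands in candidates.items()}
```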
3.1 Restriction of Modification Candidates 
Kanayama et al. (1999) report that when modification
candidates are enumerated according to
SLUNG, 98.6% of the correct modifiees are in one of
the following three positions among the candidates:
the nearest one from the modifier, the second nearest
one, and the farthest one.

As a consequence, we can simplify the problem
by considering only these three candidates and discarding
the other candidates, with only 1.4% potential
errors. We therefore assume that the number of
modification candidates is always three or less.

This idea is similar to that of Sekine (2000)'s
study, which restricts the candidates to five, but in
his case, without a grammar.
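The restriction can be sketched in one step; the candidate list is assumed to be ordered by distance from the modifier:

```python
def restrict_candidates(cands):
    """Keep at most three modification candidates: the nearest, the
    second nearest, and the farthest, per the 98.6% coverage figure
    reported above."""
    return list(cands) if len(cands) <= 3 else [cands[0], cands[1], cands[-1]]
```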
3.2 The Triplet/Quadruplet Model 
The Triplet/Quadruplet Model calculates the likelihood
of the dependency between bunsetsu i and
bunsetsu cn, P(i → cn), with the formulas (8) and
(9), where cn denotes the nth candidate among bunsetsu
i's candidates; Φi denotes some attributes of
i; and Φcn denotes attributes of cn (including attributes
between i and cn).

P(i → cn) =def P(n | Φi, Φc1, Φc2)    (n = 1, 2)    (8)

P(i → cn) =def P(n | Φi, Φc1, Φc2, Φcl)    (n = 1, 2, l)    (9)

² This heuristics is a Japanese version of a left-association
rule; see (Mitsuishi et al., 1998) for detail.
As (8) and (9) suggest, the model considers attributes
of the modifier bunsetsu and attributes of all
modification candidates simultaneously in the conditional
parts of the probabilities. Moreover, what is
calculated is not the probability of "whether the dependency
is correct (T, see Formula (6))", but the
probability of "which of the given candidates is chosen
as the modifiee (n = 1, 2, or l)". These characteristics
imply the following two advantages.

Advantage 1  A new distance metric. The correct
modifiee can be chosen by considering relative
position among grammatically licensed candidates,
instead of the absolute distance between
bunsetsus.

Advantage 2  Treating alternative trees. The candidates
are taken into consideration simultaneously.
But because the modification candidates
are restricted to at most three, we considerably
avoid data-sparseness problems.

Below we discuss these advantages in order. These
advantages clarify the differences from previous
models described in Section 2.1, and are empirically
confirmed through the experiments in Section 4.
3.2.1 Advantage 1 : A new distance metric
As discussed in Section 2.1, the distance metric Δi,j
used in previous statistical methods was obtained
simply by counting intervening words or bunsetsus
between i and j. On the other hand, we use the relative
position among the modification candidates as
the distance metric. The following examples illustrate
a difference between those two types of metric.
The correct modifiee of kare-ga is hashiru-no-wo in
both (10a) and (10b).

(10) a. kare-ga hashiru-no-wo mita koto
        he-SUBJ run see fact
        (the fact that I saw him run)
     b. kare-ga yukkuri hashiru-no-wo mita koto
        he-SUBJ slowly run see fact
        (the fact that I saw him run slowly)

In previous models, (10a) and (10b) would yield

Pa(kare-ga → hashiru-no-wo) = P(T | kare-ga, hashiru-no-wo, Δ1)
Pb(kare-ga → hashiru-no-wo) = P(T | kare-ga, hashiru-no-wo, Δ2)

respectively, where Δ1 = 1 and Δ2 = 2. Then, the
two probabilities above do not have the same value
in general.
Our grammar does not allow the dependency
"kare-ga → yukkuri" for (10b). The modification
candidates of kare-ga are hashiru-no-wo and mita,
hence (8) gives the probabilities between kare-ga and
hashiru-no-wo as follows, in both examples.

Pa(kare-ga → hashiru-no-wo)
= Pb(kare-ga → hashiru-no-wo)
= P(1 | kare-ga, hashiru-no-wo, mita)

Thus, P(kare-ga → hashiru-no-wo) has the same value
for both examples. Our interpretation of this difference
is summarized as follows. The word yukkuri is
an adverb modifying the verb hashiru. Our linguistic
intuition tells us that the presence of such an adverb
should not affect the strength of the dependency
between kare-ga and hashiru-no-wo. According to
this intuition, the existence of the adverb should be
considered as a noise. Our model allows us to ignore
such a noise in learning from the annotated corpus, while
previous models are affected by such noisy elements.
3.2.2 Advantage 2 : Treating alternative
trees or contextual information
Consider the following examples.

(11) a. Taro-no kawaii musume
        NP Adj NP
        Taro-POSS pretty daughter
        (Taro's pretty daughter)
     b. Taro-no yuujin-no musume
        NP NP NP
        Taro-POSS friend-POSS daughter
        (Taro's friend's daughter)

Contrary to the previous examples, Taro-no in
(11) modifies different modification candidates. In
example (11a), "Taro-no → musume" is the correct
dependency while "Taro-no → musume" is not correct
in (11b). This difference is caused by the bunsetsu
between Taro-no and musume, kawaii (Adj)
in (11a) and yuujin-no (NP) in (11b). Actually, the
grammar allows Taro-no to depend on either of these
types of words. Thus, in our model,

Pa(Taro-no → musume) = P(2 | Taro-no, kawaii, musume)
Pb(Taro-no → musume) = P(2 | Taro-no, yuujin-no, musume)

Then, P(Taro-no → musume) has different values
for the two examples. In the annotated corpus,
P(2 | Taro-no, kawaii, musume) tends to have a high
value since kawaii is an adjective. However, since
yuujin-no is an NP, P(2 | Taro-no, yuujin-no, musume)
tends to have a low value.
Now consider previous models.

Pb(Taro-no → musume) = P(T | Taro-no, musume, 2)

Then, contrary to our model, P(Taro-no → musume)
has exactly the same value for both examples. The
outcome is determined by

Pa(Taro-no → kawaii) = P(T | Taro-no, kawaii, 1)
Pb(Taro-no → yuujin-no) = P(T | Taro-no, yuujin-no, 1)

In text corpora, P(T | Taro-no, yuujin-no, 1) tends
to be high, and consequently, P(T | Taro-no, musume,
2) is very small. These values will make the correct
prediction for (11b) as yuujin-no will be favored over
musume. However, for (11a), these models are likely
to incorrectly favor kawaii over musume. This is
because P(T | Taro-no, musume, 2), being very small, is
likely to be smaller than P(T | Taro-no, kawaii, 1).
4 Experiments and Discussion 
This section reports a series of parsing experiments
with our model, and gives some discussion.

4.1 Environments
We used the EDR Japanese Corpus (EDR, 1996)
for training and evaluation of parsing accuracy. The
EDR Corpus is a Japanese treebank which consists
of 208,157 sentences from newspapers and magazines.
We used 192,778 sentences for training, 6,744
for pre-analysis (as reported in Section 3.1), and
3,372 for testing³.
With triplets constituted of a modifier and two
modification candidates extracted from the learning
corpus, the Triplet Model is constructed. With
the quadruplets constituted of a modifier and three
candidates, the Quadruplet Model is constructed.
These models are estimated by the ChoiceMaker
Maximum Entropy Estimator (Borthwick, 1999).

The features for the estimation are listed in Table 1.
The values partially follow other researches,
e.g. Uchimoto et al. (1999), and JUMAN's outputs
are used for POS classification. Mainly the head of
the bunsetsu (the rightmost morpheme in a bunsetsu,
except for those whose major POS is "peculiar", "auxiliary
verb", "particle", "suffix" or "copula") and the type of
the bunsetsu (the rightmost morpheme in a bunsetsu
except for those whose major POS is "peculiar") are used
as the attributes. We show the meaning of some
features below.

POS  JUMAN's minor POS (for both "head" and
"type").

particle, adverb  Frequent words: 26 particles and
69 adverbs.

head lex  264 lexical forms regardless of their POS.

type lex  70 suffixes or auxiliary verbs.

inflection  6 types of inflection: "normal", "adverbial",
"adnominal", "te-form", "ta-form", and
"others".

The column "variation" in Table 1 denotes the
number of possible values for the feature. "Valid
features" indicates the number of features which appeared
three times or more in the training corpus.
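Assembling maximum entropy events from such attributes might look like the sketch below. The attribute names and the particular combination feature are illustrative assumptions in the spirit of Table 1, not the system's actual feature inventory:

```python
def event_features(modifier, candidates):
    """Build string-valued features for one modification event:
    atomic attributes of the modifier, atomic attributes of each
    candidate (considered per candidate position, as in Table 1),
    and one example combination feature per candidate."""
    feats = []
    for name in ("head_pos", "type_pos", "particle", "inflection"):
        feats.append(f"mod_{name}={modifier.get(name)}")
    for k, cand in enumerate(candidates):
        for name in ("head_pos", "type_pos", "head_lex"):
            feats.append(f"cand{k}_{name}={cand.get(name)}")
        # illustrative combination feature: modifier particle x candidate head POS
        feats.append(f"comb{k}={modifier.get('particle')}x{cand.get('head_pos')}")
    return feats
```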
4.2 Results 
With our model and the features described above,
the accuracy shown in Table 2 is achieved. We evaluate
the following two types of accuracy:

³ 5,263 sentences were removed because the order of the
words in the annotation differed from that in the original
sentences.
 #  Feature type                        Variation   Valid features
                                                    Trip.    Quad.
 1  Head POS of modifier                24          42       64
 2  Type POS of modifier                34          66       90
 3  Particle of modifier                27          47       73
 4  Adverb of modifier                  70          131      193
 5  Type lex of modifier                71          110      225
 6  Inflection of modifier              6           12       18
 7  Whether modifier has a comma        2           4
 8  Head POS of modifiee                24          70       158
 9  Type POS of modifiee                34          96       231
10  Head lex of modifiee                265         1164     2597
11  Particle of modifiee                27          92       204
12  Type lex of modifiee                71          210      454
13  Inflection of modifiee              6           24       53
14  Whether modifiee has a comma        2           8        18
15  Whether modifiee has "wa"           2           8        18
16  Whether modifiee has "to"           2           6        17
17  # of commas between two bunsetsus   4           16       36
18  # of "wa" between two bunsetsus     3           12       27
19  2 x 8                               816         1187     2727
20  2 x 7 x 14                          136         380      879
21  3 x 10                              7905        6465     13403
22  2 x 9                               1156        1213     3198
23  3 x 11                              729         618      1637
24  2 x 11                              918         1025     2494
25  2 x 12                              2414        1483     3514
26  2 x 3 x 7 x 8 x 18                  132192      1331     3058
27  1 x 2 x 6 x 8 x 13                  705024      6605     14700
    Total                                           22425

Table 1: Used features. Features from 8 to 27 are related
to the modifiee, thus they are considered for each
candidate. Features from 19 to 27 are combination features
(e.g. "2 x 8" combines features 2 and 8).
In-coverage sentences   Bunsetsu accuracy   88.55% (23078/26062)
                        Sentence accuracy   46.90% (1560/3326)
All sentences           Bunsetsu accuracy   88.33% (23350/26436)
                        Sentence accuracy   46.35% (1563/3372)

Table 2: Results of parsing with the Triplet/Quadruplet
Model.
Bunsetsu accuracy  The percentage of bunsetsus
whose modifiee is correctly identified. The denominator
includes all bunsetsus except for the
last bunsetsu of a sentence.

Sentence accuracy  The percentage of sentences
whose dependencies are perfectly correct.

"In-coverage sentences" is the accuracy for the
sentences for which SLUNG could generate parse
trees. We give the accuracy for "All sentences" too,
by partially parsing sentences which SLUNG fails to
parse. The coverage of SLUNG is about 99%, thus
high accuracy is achieved even for "All sentences".
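The two accuracy definitions above can be computed as follows; each sentence is represented as the list of gold or predicted modifiee indices for every bunsetsu but the last:

```python
def accuracies(gold, predicted):
    """Return (bunsetsu accuracy, sentence accuracy): the fraction of
    non-final bunsetsus whose modifiee is correctly identified, and the
    fraction of sentences whose dependencies are all correct."""
    correct = total = perfect = 0
    for g, p in zip(gold, predicted):
        hits = sum(1 for gi, pi in zip(g, p) if gi == pi)
        correct += hits
        total += len(g)
        perfect += (hits == len(g))
    return correct / total, perfect / len(gold)
```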
Moreover, we conducted a series of experiments
in order to evaluate the contribution of each characteristic
in our parsing model. The parsing schemes
used are the four in Table 3. Major differences
among them are (I) whether a grammar is used,
(II) whether modification candidates are restricted
to three, and (III) whether a previous pair model
with Formula (6) or the Triplet/Quadruplet Model
with Formulas (8), (9) was used.
Model                G   R   F   Bunsetsu accuracy
W/O Grammar          -   -   P   86.70% (22594/26062)
W/O Restriction      +   -   P   87.37% (22770/26062)
Pair                 +   +   P   87.67% (22849/26062)
Triplet/Quadruplet   +   +   T   88.55% (23078/26062)

Table 3: Bunsetsu accuracies for four models. Column
"G" indicates whether the grammar is used, "R" indicates
whether the modification candidates are restricted
to three, and "F" denotes the formula; "P" is the pair
formula (6), and "T" is the Triplet/Quadruplet formulas
(8), (9).

W/O Grammar Model  This model does not use
a grammar. Likelihood values for dependencies
are calculated for all bunsetsus that follow
a modifier bunsetsu. Formula (6) is used, and
as a distance metric Δi,j, the number of bunsetsus
between the modifier and the modifiee⁴
is combined with all features. In general lines,
this model corresponds to models such as (Fujio
and Matsumoto, 1998; Haruno et al., 1998;
Uchimoto et al., 1999).

W/O Restriction Model  Modification candidates
are restricted by SLUNG. The remaining
is the same as the W/O Grammar Model.

Pair Model  Modification candidates are restricted
to three, in the way described in Section 3.1.
The remaining is the same as the W/O Grammar
Model.

Triplet/Quadruplet Model  This is the model
proposed in the paper. Modification candidates
are restricted to three, and Formula (8) or (9)
is used.
From the result shown in Table 3, we can say
our method contributes to the improvement of our
parser, because of the following reasons:

• The Triplet/Quadruplet Model outperforms the
Pair Model by 0.9%. Both of them restrict
modification candidates to three, but the accuracy
got higher when all candidates are considered
simultaneously. It is because of the two
advantages described in Section 3.2.

• The Pair Model outperforms the W/O Restriction
Model by 0.3%. Thus the restriction of
modification candidates does not reduce the accuracy.

• The W/O Restriction Model outperforms the
W/O Grammar Model by 0.7%. This means
that the use of a grammar as a preprocessor
works well to pick up possible modifiees.

We found that many structures similar to the
ones described in Section 3.2 appeared in the EDR

⁴ Three values: "1", "from 2 to 5", "6 or more" are distinguished.
In-coverage sentences   Bunsetsu accuracy   87.08% (8299/9530)
                        Sentence accuracy   44.70% (493/1103)

Table 4: Accuracy for the Kyoto University Corpus.
corpus. Our Triplet/Quadruplet model could treat
these structures precisely as we intended. This is the
main factor that contributed to the improvement of
the overall parsing accuracy.

Based on the above experiments, we can say that
our approach to use the grammar as a preprocessor
before the calculation of the probabilities is appropriate
for the improvement of parsing accuracy.
4.3 Comparison to other models 
4.3.1 Models using the EDR corpus 
There are several works which use the EDR corpus
for evaluation. The decision tree model (Haruno et
al., 1998) achieves around 85%, the integrated model
of lexical/syntactic information (Shirai et al., 1998)
achieves around 86%, and the lexicalized statistical
model (Fujio and Matsumoto, 1999) achieves 86.8%
in bunsetsu accuracy. Our model outperforms all of
them by 2 or 3%.

4.3.2 Models using the Kyoto corpus
Shirai et al. (1998) used the Kyoto University text
corpus (Kurohashi and Nagao, 1997) for evaluation
and achieved around 86%. Uchimoto et al. (2000)
also used the Kyoto corpus, and their accuracy was
87.9%. For comparison, we applied our method to
the same 1,246 sentences that Uchimoto et al. (2000)
used. The result is shown in Table 4.
Our result is worse than theirs. The reason is
thought to be as follows:

• We use the EDR corpus for training. Although
we used around 24 times the amount of training
data that Uchimoto et al. used, our training
data lead to errors in the analysis of the Kyoto
Corpus, because of differences in the annotation
schemes adopted.

• Uchimoto et al. used the correct morphological
analyses, but we used JUMAN. Sometimes this
may cause errors.

• The grammar SLUNG was designed for the
EDR corpus, and some types of structures in
the Kyoto Corpus are not allowed.

Clearly, our parser should be improved to overcome
these problems and compared with other works directly.
4.4 Discussion and Future Work
The following are some observations about the speed
of our parser. Existing statistical parsers are quite
efficient compared to grammar-based systems. Particularly,
our system used an HPSG-based grammar,
whose speed is said to be slow. However, recent advances
in HPSG parsing (Torisawa et al., 2000) enabled
us to obtain a unique parse tree with our system
in 0.5 sec. on average for sentences in the EDR
corpus.

Future work shall extend SLUNG so that semantic
representations are produced. Carroll et al. (1998)
discussed the precision of argument structures. We
believe that the focus of our study will shift from a
shallow level to such a deeper level for our final aim,
realization of intelligent natural language processing
systems.
5 Conclusion 
We presented a hybrid parsing scheme that uses a
hand-crafted grammar and a statistical technique.
As in other hybrid parsing methods, the statistical
technique is used for picking up the most preferable
parse tree from the parse forest generated by the
grammar. The difference from other works is that
the precise contextual information needed to estimate
the likelihood of a parse tree is obtained from
alternative parse trees generated by the grammar,
and that such contextual information from alternative
trees enables us to construct our new statistical
model called the Triplet/Quadruplet model. We
have shown that these points contributed to substantial
improvement of parsing accuracy in Japanese dependency
analysis, through a series of experiments
using an HPSG-based Japanese grammar SLUNG
and the maximum entropy method.

References 

Steven Abney. 1996. Stochastic attribute-value
grammars. The Computation and Language E-Print
Archive, October.

Adam L. Berger, Stephen A. Della Pietra, and Vincent
J. Della Pietra. 1996. A maximum entropy
approach to natural language processing. Computational
Linguistics, 22(1):39-71.

Andrew Borthwick. 1999. ChoiceMaker maximum
entropy estimator. ChoiceMaker Tech., Inc. Email
borthwic@cs.nyu.edu for information.

Ted Briscoe and John Carroll. 1993. Generalized
probabilistic LR parsing of natural language (corpora)
with unification-based grammars. Computational
Linguistics, 19(1):25-59.

John Carroll, Guido Minnen, and Ted Briscoe. 1998.
Can subcategorisation probabilities help a statistical
parser? In Proc. of the 6th ACL/SIGDAT
Workshop on Very Large Corpora, pages 118-126.

EDR. 1996. EDR (Japan Electronic Dictionary Research
Institute, Ltd.) dictionary version 1.5 technical
guide. Second edition is available via
http://www.iijnet.or.jp/edr/E_TG.html.

Masakazu Fujio and Yuji Matsumoto. 1998.
Japanese dependency structure analysis based on
lexicalized statistics. In Proc. of the 3rd Conference
on Empirical Methods in Natural Language
Processing, pages 88-96.

Masakazu Fujio and Yuji Matsumoto. 1999. Statistical
syntactic analysis based on co-occurrence
probability of words. In Proc. of 5th Workshop
of Natural Language Processing, pages 71-78. (in
Japanese).

Masahiko Haruno, Satoshi Shirai, and Yoshifumi
Ooyama. 1998. Using decision trees to construct
a practical parser. In Proc. COLING-ACL '98,
pages 505-511.

Kentaro Inui, Virach Sornlertlamvanich, Hozumi
Tanaka, and Takenobu Tokunaga. 1997. A new
probabilistic LR language model for statistical
parsing. Technical Report TR97-0005, Dept. of
Computer Science, Tokyo Institute of Technology.

Hiroshi Kanayama, Kentaro Torisawa, Yutaka Mitsuishi,
and Jun'ichi Tsujii. 1999. Statistical dependency
analysis with an HPSG-based Japanese
grammar. In Proc. 5th NLPRS, pages 138-143.

Sadao Kurohashi and Makoto Nagao. 1997. Kyoto
University text corpus project. In Proc. of 3rd
Annual Meeting of Natural Language Processing,
pages 115-118. (in Japanese).

Yutaka Mitsuishi, Kentaro Torisawa, and Jun'ichi
Tsujii. 1998. HPSG-style underspecified Japanese
grammar with wide coverage. In Proc. COLING-ACL
'98, pages 876-880, August.

Adwait Ratnaparkhi. 1997. A linear observed time
statistical parser based on maximum entropy
models. In Proc. of the Empirical Methods in Natural
Language Processing Conference.

Yves Schabes. 1992. Stochastic lexicalized tree-adjoining
grammars. In Proc. 14th COLING,
pages 426-432.

Satoshi Sekine. 2000. Japanese dependency analysis
using a deterministic finite state transducer. In
Proc. COLING 2000. (this proceedings).

Kiyoaki Shirai, Kentaro Inui, Takenobu Tokunaga,
and Hozumi Tanaka. 1998. A framework of integrating
syntactic and lexical statistics in statistical
parsing. Journal of Natural Language Processing,
5(3). (in Japanese).

Kentaro Torisawa, Kenji Nishida, Yusuke Miyao,
and Jun'ichi Tsujii. 2000. An HPSG parser with
CFG filtering. Journal of Natural Language Engineering.
(to appear).

Kiyotaka Uchimoto, Satoshi Sekine, and Hitoshi Isahara.
1999. Japanese dependency structure analysis
based on maximum entropy models. In Proc.
9th EACL, pages 196-203.

Kiyotaka Uchimoto, Masaki Murata, Satoshi Sekine,
and Hitoshi Isahara. 2000. Dependency model using
posterior context. In Proc. of the Sixth International
Workshop on Parsing Technologies.
