Backward Beam Search Algorithm 
for Dependency Analysis of Japanese 
Satoshi Sekine 
Computer Science Department 
New York University 
715 Broadway, 7th floor 
New York, NY 10003, USA 
sekine@cs, nyu. edu 
Kiyotaka Uchimoto Hitoshi Isahara 
Communications Research Laboratory 
588-2 Iwaoka, Iwaoka-cho, Nishi-ku, 
Kobe, Hyogo, 651-2492, Japan 
\[uchimoto, isahara\] @crl. go. j p 
Abstract 
Backward beam search tbr dependency analy- 
sis of Japanese is proposed. As dependencies 
normally go fl'om left to right in Japanese, it is 
effective to analyze sentences backwards (from 
right to left). The analysis is based on a statisti- 
cal method and employs a bemn search strategy. 
Based on experiments varying the bemn search 
width, we found that the accuracy is not sen- 
sitivc to the bemn width and even the analysis 
with a beam width of 1 gets ahnost the stone de- 
pendency accuracy as the best accuracy using a 
wider bemn width. This suggested a determin- 
istic algorithm for backwards Japanese depen- 
dency analysis, although still the bemn search 
is eitbctive as the N-best sentence accuracy is 
quite high. The time of analysis is observed to 
be quadratic in the sentence length. 
1 Introduction 
Dependency analysis is regarded as one of the 
standard methods of Japanese syntactic anal- 
ysis. The Japanese dependency structure is 
usually represented by the relationship between 
phrasal units called 'bunsetsu'. A bunsetsu nsu- 
ally contains one or more content words, like a 
noun, verb or adjective, and zero or more func- 
tion words, like a postposition (case marker) 
or verb/noun sul~\[ix. The relation between two 
bunsetsu has a direction front a dependent to 
its head. Figure 1 shows examples of 1)unsetsu 
and dependencies. Each bunsetsu is separated 
by "I"" The first segment "KARE-HA" consists 
of two words, KARE (He) and HA (subject case 
marker). The numbers in the "head" line show 
the head ID of the corresponding bunsetsus. 
Note that the last segment does not have a head, 
and it is the head bunsetsu of the sentence. The 
task of the Japanese dependency analysis is to 
find the head ID for each bunsetsu. 
The analysis proposed in this paper has two 
conceptual steps. In the first step, dependency 
likelihoods are calculated for all possible pairs 
of bunsetsus. In the second step, an optimal de- 
pendency set for the entire sentence is retrieved. 
In this paper, we will mainly discuss the second 
step, a method fbr finding an optimal depen- 
dency set. In practice, the method proposed in 
this paper should be able to be combined with 
any systems which calculate dependency likeli- 
hoods. 
It is said that Japanese dependencies have the 
tbllowing characteristics1: 
(1) Dependencies are directed from left to right 
(2) Dependencies don't cross 
(3) Each seglnent except the rightmost one has 
only one head 
(4) In many cases, the left; context is not nec- 
essary to determine a dependency 
The analysis method proposed in this paper as- 
sumed these characteristics and is designed to 
utilize them. Based on these assumptions, we 
can analyze a sentence backwards (from right 
to left) in an efficient manner. There are two 
merits to this approach. Assume that we are 
analyzing the M-th segment of a sentence of 
length N and analysis has already been done 
for the (M + 1)-th to N-th segments (M < N). 
The first merit is that the head of the depen- 
dency of the M-th segment is one of the seg- 
1Of course, there are several exceptions (S.Shirai, 
1998), but the frequencies of such exceptions are neg- 
ligible compared to the current precision of the system. 
We believe those exceptions have to be treated when the 
problems we are facing at the moment are solved. As- 
sumption (4) has not been discussed very much, but our 
investigation with humans showed that it is true in more 
titan 90°./0 of the cases. 
754 
ID i 2 3 4 5 6 
KARE-HA \[ FUTATABI I PAI-W0 \[ TSUKURI, I KANOJO-NI I 0KUTTA. 
(He-subj) (again) (pie-obj) (made ,) (to her) (present) 
Head 6 4 4 6 6 - 
Translation: He made a pie again and presented it to her. 
Figure 1: Exmnt)le a JaI)anese sentence, 1)unsetsus and det)endencies 
ments between M + 1 and N (because of as- 
sumption 1), which are already analyzed. Be- 
cause of this, we don't have to kce 1) a huge lnlln- 
1)er of possible analyses, i.e. we can avoid some- 
thing like active edges in a chart parser, or mak- 
ing parallel stacks in GLR parsing, as we can 
make a decision at this time. Also, we can use 
the beam search mechanism, 1)y keet)ing only a 
certain nmnl)er of.analysis candidates at (',ach 
segment. The width of the 1)(;am search can 1)c, 
easily tuned and the memory size of the i)ro- 
(:ess is l)rot)ortional to the 1)roduct of the inl)ut 
sentence length and tile boron search width. 
The other merit is that the possit)le heads 
of tile d(~l)en(lency can t)e narrowed down 1)c- 
cause of the ~ssuml)tion of non-crossing det)en- 
(lencies (assumption 2). For exani1)le , if the 
K-th seglll(;nl; dCl)ends on the L-tll segnient 
(A4 < \]~ <~ L), then the \]~J-th segillent (:~l~n't 
depend on any segments between 1~ and L. 
According to our experilnent, this reduced the 
numl)er of heads to consider to less than 50(X~. 
The te(:hnique of backw~trd analysis of 
,lal)anese sentences has 1)een used in rule-based 
methods, for example (Fujita, 1988). How- 
ever, there are several difficulties with rule- 
based methods. First the rules are created by 
hmnans, so it is difficult to have wide cover- 
age and keel) consistency of the rules. Also, it 
is difficult to incorporate a scoring scheme in 
rule-1)ased methods. Many such met;hods used 
hem'isties to make deterministic decisions (and 
backtracking if it; fails in a sear(:hing) rather 
l;han using a scoring scheme. However, the com- 
1)ination of the backward analysis and the sta- 
tistical method has very strong advantages, one 
of which is the 1)emn search. 
2 Statistic framework 
We. coin|lined tile backward beam search strat- 
egy with a statistical dependency analysis. 'rile 
det~fil of our statistic framework is described 
ill (Uehimoto et al., 1999). There have been 
a lot of prol)OS~fls for statistical analysis, in 
ninny s, in particular in English and 
Japanese (Magerman, 1995) (Sekine and Grish- 
man, 1995) (Collins, 1997) (I/atnal)arkhi, 1997) 
(K.Shirai et.al, 1998) (Fujio and Matsnlnoto, 
1998) (Itaruno ct.al, 1997)(Ehara, 1998). One 
of the most advance(t systems in English is l)ro- 
posed 1)y I{atnaparkhi. It, uses the Maximum 
Entropy (ME) model and both of the accuracy 
and the speed of the system arc among the best 
ret)ortcd to date. Our system uses the ME 
model, too. in the ME model, we define a set 
el! \]2~,atlll'eS which arc thought to l)e uscflfl in 
del)ealden(:y analysis, and it: learns the weights 
of the R~atures fl'om training data. Our t~ntttres 
in(:lude part-of-st)eech, inflections, lexical items, 
the existence of a contain or bra(:ket 1)etween 
the segments, and the distmme between the seg- 
ments. Also, confl)inations of those features are 
used as additional fe, atures. The system eal- 
(:ulates the probabilities of dependencies based 
on the model, which is trained using a training 
corpus. The probability of an entire sentence is 
derived from the 1)roduct of tile probal)ilities of 
all the dependencies in the sentence. We choose 
the analysis with the highest probafl)ility to be 
the analysis of the sentence. Although the ac- 
curacy of the analyzer is not the main issue of 
the t)al)er, as any types of models which use de- 
1)endency 1)rol)al)ilities can be iml)lelnented by 
our method, the 1)ertbrmance ret)orted in (Uchi- 
lnoto et al., 1999) is one of the best results re- 
ported by statistic~flly based systems. 
755 
3 Algorithm 
In this section, the analysis algorithm will be de- 
scribed. First the algorithm will be illustrated 
using an example, then the algorithm will be 
formally described. The main characteristics of 
the algorithm are the backward analysis and the 
beam search. The sentence "KARE-HA FUTATABI PAI-W\[I 
TSUKURI, KANOJ0-NI 0KUTTA. (He made a pie 
again and presented it to her)" is used as an in- 
put. We assume the POS tagging and segmen- 
tation analysis have been done correctly before 
starting the process. The border of each seg- 
ment is shown by "1". In the figures, the head of 
the dependency for each segment is represented 
by the segment number shown at the top of each 
segment. 
<Initial> 
ID 1 2 3 4 5 6 
RARE-HA \[ FUTATABI \[ PAI-WO \[ TSUKURI, \[ KANOJO-NI I OKUTTA. 
(He-subj) (again) (pie-obj) (made ,) (to her) (present) 
................................................................. 
Algorithm 
1. Analyze np to the second segment from the 
end 
The last segment has no dependency, so we 
don't have to analyze it. The second seg- 
ment fl'om the end always depends on the 
last segment. So the result up to the sec- 
end segment from the end looks like the 
following. 
<Up to the second segment from the end> 
ID 1 2 3 4 5 6 
KARE-HA I FUTATABI \[ PAI-WO I TSUKURI, I KANOJO-NI I OKFITA. 
(He-subj) (again) (pie-obj) (made ,) (to her) (present) 
Cand 6 
................................................................. 
. The third segment from the end 
This segment ("TSUKURI," ) has two depen- 
dency candidates. One is the 5th segment 
("KANOJ0-NI") and the other is the 6th seg- 
ment ("0KUTTA"). Now, we use the proba- 
bilities calculated using the ME model in 
order to assign probabilities to the two can- 
didates (Candl and Cand2 in the following 
figure). Let's assume the probabilities 0.1 
and 0.9 respectively as an example. At the 
tail of each analysis, the total probability 
(the product of the probabilities of all de- 
pendencies) is shown. The candidates are 
sorted by the total probability. 
. 
<Up to the third segment from the end> 
ID 1 2 3 4 5 6 
KARE-HA I FUTATABI I PAI-WO I TSUKURI, I KANOJO-HI I OKUITA. 
(He-subj) (again) (pie-obj) (made ,) (to her) (present) 
Candl 6 0 - (0.9) 
Cand2 5 6 - (0.I) 
................................................................. 
The tburth segment from the end 
For each of the two candidates created at 
the previous stage, the dependencies of the 
fburth segment from the end ("PAI-W0") 
will be analyzed. For Candl, the segment 
can't have a dependency to the fifth seg- 
ment ("KANOJ0-1gI"), because of the non- 
crossing assmnption. So the probabili- 
ties of the dependencies only to the fourth (Candi-1) 
and the sixth (Candi-2) seg- 
ments are calculated. In the example, these 
probabilities are assmned to be 0.6 and 0.4. 
A similar analysis is conducted for Cand2 
(here probabilities are assumed to be 0.5, 
0.1 and 0.4) and three candidates are cre- 
ated (Cand2-1, Cand2-2 and Cand2-3). 
<Up to the fourth segment from the end> 
ID 1 2 3 4 5 6 
RARE-HA I FUTATABI I PAI-WO I TSUKURI, I KANOJO-NI I OKUTTA. 
(He-subj) (again) (pie-obj) (made ,) (to her) (present) 
C~dt-i 4 6 6 - (0.64) 
Candl-2 6 6 6 - (0.30) 
Cand2-1 4 5 6 - (0.05) 
Cand2-2 6 5 6 - (0.04) 
Caud2-3 5 5 6 - (0.01) 
................................................................. 
As tile analysis proceeds, a large number 
(almost L!) of candidates will he created. 
However, by linfiting the number of candi- 
dates at each stage, the total nmnber of 
candidates can be reduced. This is the 
beam search, one of the characteristics of 
the algorithm. By observing the analyses 
in the example, we can e~sily imagine that 
this beam search may not cause a serious 
problem in performance, because the candi- 
dates with low probabilities may be incor- 
rect anyway. For instance, when we set the 
beam search width = 3, then Canal2-2 and 
Cand2-3 in the figure will be discarded at 
this stage, and hence won't be used in the 
following analyses. The relationship of the 
beam search width and the accuracy oh- 
served in our experiments will be reported 
in the next section. 
756 
. Up to the, first segment 
The analyses are conducted in the, same 
way up to the first segment. For example, 
the result of tile analysis tbr the entire sell- 
tence will be shown below. (Appropriate, 
probabilities are used.) 
4.2 Beam search width and accuracy 
In this subsection, the relationship between the 
beam width and the accuracy is discussed. In 
principle, the wider the beam search width, the 
more analyses can be retained and the better 
the accuracy cml be expected. However, the re- 
.................................................................. sult is somewhat different froan tile expectation. 
<Up to the first segment> 
ID 1 2 3 4 5 6 
KARE-IIA \[ FUTATABI \[ PAl-W0 { TSUKURI, \[ KANOJ0-NI \[ 0KUTTA. 
(Ile-subj) (again) (pie-obj) (made ,) (to her) (present) 
Candl 6 4 4 6 0 - (0. ii) 
Cand2 4 4 6 6 6 - (0.09) 
Cand3 6 4 6 5 6 - (0.05) ................................................................. 
Now, the formal algorithm is described induc- 
tiveJy in Figure 3. The order of the analysis is 
quadratic ill the length of the sentence. 
4 Experiments 
In this section, experiments and evaluations will 
be reported. We use the Kyoto University Cor- 
pus (version 2) (Kurohashi el.el, 1{)97), a hand 
created Japanese corpus with POS-tags, bun- 
setsu segments and dependency information. 
The sentences in the articles from January 1, 
1994 to January 8, 1994 (7,960 sentences) a.re 
used t'or tim training of the ME model, and 
the sente, nccs in the artMes of Janum'y 9, 1994: 
(1,246 sentences) are used for the ewduation. 
The seid;ences ill the articles of Jalluary 10, 1994 
are kept for future evaluations. 
4.1 Basic Result 
The evahlation result of our systenl is shown ill 
Table 1. The experiment uses the correctly seg- 
mente(1 and 1)art-oSsl)eet'h tagger1 sentences of 
the Kyoto University corpus. The bealn search 
width is sol; to 1, in other words, the systeln runs 
deterministically. Here, 'dependency accuracy' 
Table 1: lBvaluation 
Dependency accuracy 
Sentence accuracy 
Average analysis time 
87.14% (9814/11263) 
40.60% 0503/1239) 
0.03 sec 
is the percentage of correctly analyzed depen- 
dencies out of all dependencies. 'Sentence accu- 
racy' is the i)ercentage of the sentences in which 
all the dependencies are analyzed correctly. 
Table 2 shows the dependency accuracy and 
sentence accuracy for bemn widths 1 through 
20. The difference is very small, but the best 
Table 2: Relationship between beam width and 
accuracy 
Bemn width Dependency Sentence 
Accuracy Accuracy 
1 
2 
3 
4 
5 
6 
7 
10 
15 
20 
87.14 
87.16 
87.20 
87.1.5 
87.14 
87.16 
87.20 
87.20 
86.21 
86.21 
40.60 
40.76 
40.76 
40.68 
40.60 
40.60 
40.60 
40.60 
40.60 
40.60 
accuracy is obtained when the beain width is 11 
(fbr the dependency accuracy), and 2 and 3 (tbr 
the sentence accuracy). This proves that there 
are cases where the analysis with the highest 
product of probabilities is not correct, but the 
analysis decide(1 at each stage is correct. This is 
a very interesting result of our experiment, and 
it is related to assulnption 4 regarding Japanese 
dependency, lnentioned earlier. 
This suggests that when we analyze a 
.Japanese sentence backwards, we can do it de- 
terministically without great loss of accuracy. 
Table 3 shows where the mlalysis with bemn 
width 1 appears among the analyses with bealn 
width 200. It shows that most deterministic 
analyses appear as tile best analysis in the non- 
deterministic analyses. Also, mnong the deter- 
aninistic analyses which are correct (503 Sell- 
tences), 498 sentences (99.0%) have the same 
mmlysis at the best rank in the 200-beam-width 
analyses. (Followed by 3 sentences at the see-. 
end, 1 sentence each at the third and fifth rank.) 
It means that in most of the cases, the mmlysis 
757 
<Variable> 
Length: 
W: 
C\[len\]: 
Length of the input sentence in segments 
The beam search width 
Candidate list; C for each segment keeps 
the top W partial analyses from that segment 
to the last segment. 
<Initial Operation> 
The second segment from the end depends on the last segment. 
This analysis is stored in C\[Length-l\]. 
<Inductive Operation> 
Assume the analysis up to the (M+l)-th segment has been finished. 
For each candidate ~c ' in C\[M+i\], do the following operation. 
Compute the possible dependencies of the M-th segment compatible 
with 'c'. For each dependency, create a new candidate Cd~ by 
adding the dependency to 'c'. Calculate the probability of 'd'. 
If C\[M\] has fewer than W entries, add ~d ~ to C\[M\]; 
else if the probability of Cd~ > the probability of the least 
probable entry of C\[M\], replace this entry by 'd'; 
else ignore 'd ' 
When the operation finishes for all candidates in C\[M+i\], 
proceed to the analysis of the (M-l)-th segment. 
Repeat the operation until the first segment is analyzed. 
The best analysis for the sentence is the best candidate in 
C\[1\]. 
Figure 2: Formal Algorithln 
with the highest probability at each stage also 
has the highest probability as a whole. This is 
related to assumption 4. The best analysis with 
the left context and the best analysis without 
tile left context are the same 95% of the time in 
general, and 99% of the time if the analysis is 
correct. These numbers are much higher than 
our human experinmnt mentioned in the ear- 
lier footnote (note that the number here is the 
percentage in terms of sentences, and the num- 
ber in the footnote is the percentage in terms of 
segnmnts.) It means that we may get good ac- 
curacy even without left contexts in analyzing 
Japanese dependencies. 
4.3 N-Best accuracy 
As we can generate N-best results, we measured 
N-best sentence accuracy. Figure 3 shows the 
N-best accuracy. N-best accuracy is the per- 
centage of tile sentences which have the correct 
analysis among its top N analyses. By setting 
a large beam width, we can observe N-best ac- 
curacy. The table shows the N-best accuracy 
when the beam width is set, to 20. When we set 
N = 20, 78.5% of the sentences have the cor- 
rect analysis in the top 20 analyses. If we have 
758 
Rank 1 
\]5"e<luc, ncy 1175 
(%) (.{},5.8) 
Rank 11 
Frequen(:y 1 (%) 
Table 3: The rank of the deterministic analysis 
2 3 4: 5 6 7 8 
20 11 8 4 2 1 2 
(1.6) (0.9) (0.6) (0.3) (I).2)(0.1)(0.2) 
12 j,5 1(i 17 18 
0 o 1 0 J J 
(0.1) ({}.1) (0.1) (0.1) 
9 10 
0 3 
(02) 
19 20 and more 
0 8 
(0.6) 
80 
70 
60 
50 
40 
30 
Sent;once Accuracy 
.53% 
#, 40.60% 
I I I I I I I I I I I-~TE 
0 5 10 11.5 20 
N 
Figure 3: N-best sentenc(~ Accuracy 
an ideal sysl;(ml for finding th(~ COl'lCCi; mmlysis 
a,lnOllg£ th(;ln~ which maS, 11.%O SCllltl,lll;ic O1" COll- 
l,(;x{; inforlllt~I;io\]\]~ we can have a v(Ty a(:(;Hr~d;e 
an alyzer. 
\~TC Call llltl,l((; two interesting observations 
trom the result. The ac(:uracy of the 1--best 
mmlysis is about 40%, which is more tlm.n half 
of t, he accura(:y of 20-1)est analysis. This shows 
that although the system is not 1)erfb, ct, the 
computation of the 1)rolml)ilities is t)rol)ably 
good in order l;o find the correct mmlysis at the 
top rank. 
The other point is that the accm'aey is sat- 
urated at m'omM 80%. Iml)rovemel,t over 80% 
seelns very dit\[icult even if we use a very large 
bemn width W. (lf we set; W to the number 
of all possible combinations, which means al- 
most L! for sentence length L, we (21M gC{; 100(~0 
N-best accm'aey, lint this is not worth eonsidel'- 
ing.) This suggests tlmt wc h~we missed some- 
thing important. In part;icular, from our inves- 
tigation of the result, we believe that (:oordinate 
structure is one of the most important factors 
to iml)rove the accuracy. This remains one area 
of fllturc work. 
4.4 Speed of the analysis 
Based on the f'(n'nml algorithm, the analysis 
tinle can be estimated as t)rot)orl;ional to the 
square, of the inl)ut sentence length. Figure 4: 
shows the relationshi I) between the analysis 
time and the sentence length when wc set the 
beam width to 1. We use a Sun Ultra10 ma- 
chine and the process size is about 8M byte. 
We can see that the actual analyzing time al- 
Analysis time (see.) 
0.3 
0.2 */" 
0 ~~ , r , , , 
0 10 20 30 40 
Sentence length 
\]?igure 4: \]~.elationshi 1) between sentence length 
and mmlyzing time 
most follows the quadratic curve. The ~verage 
amflysis time is 0.03 second and the ~werage sen- 
tence lengl:h is 10 segments. The analysis time 
for the longest sentence (41 segments) is 0.29 
second. W\; have not ot)l;imized the In'ogram in 
terms of speed aim there is room to shrink /;he 
process size. 
759 
5 Conclusion 
In this paper, we proposed a statistical Jttpanese 
dependency analysis method which processes a 
sentence backwards. As dependencies normally 
go from left to right in Japanese, it is eflhctive 
to analyze sentences backwards (from right to 
left). In this paper, we proposed a Japanese de- 
pendency analysis which combines a backward 
analysis and a statistical method. It can nat- 
urally incorporate a beam search strategy, an 
effective way of limiting the search space in the 
backwm'd analysis. We observed that the best 
perfbrmances were achieved when the width is 
very small. Actually, 95% of the analyses ob- 
tained with bemn width=l were the stone as 
the best analyses with beam width=20. The 
analysis time was proportional to the square of 
the sentence length (nmnber of segments), as 
was predicted from the algorithm. The average 
analysis time was 0.03 second (average sentence 
length was 10.0 bunsetsus) and it took 0.29 sec- 
end to analyze the longest sentence, which has 
41 segments. This method can be ~tpplied to 
various s which haw~ the stone or simi- 
lar characteristics of dependencies, for example 
Koran, Turkish etc. 

References 

Adam Berger and Harry Printz. 1998 : "A 
Comparison of Criteria for Maximum En- 
tropy / Mininmm Divergence Feature Selec- 
tion". Proceedings of the EMNLP-98 97-106 

Michael Collins. 1997 : "Three Generative, 
Lexicalized Models for Statistical Parsing". 
Proceedings of the ACL-97 16-23 

Terumasa Ehara. 1998 : "CMculation of 
Japanese dependency likelihood based on 
Maximmn Entropy model". Proceedings of 
the ANLP, Japan 382-385 

Masakazn t51jio and Yuuji Matsumoto. 1998 
: "Japanese Dependency Structure Analysis 
based on Lexicalized Statistics". Proceedings 
of the EMNLP-98 87-96 

Katsuhiko Fujita. 1988 : "A Trial of determin- 
istic dependency analysis". Proceedings of the 
Japanese Artificial Intelligence Annual meet- 
in9 399-402 

Masahiko Haruno and Satoshi Shirai and Yoshi- 
fumi Ooyama. 1998 : "Using Decision Trees 
to Construct a Practical Parser". Proceedings 
qf the the COLING/A CL-98 505-511 

Sadao Kurohashi and Makoto Nagao. 1994 : 
"KN Parser : ,J~tpanese Dependency/Case 
Structure Analyzer". Proceedings of The In- 
ternational Workshop on Sharable Natural 
Language Resources 48-55 

Sadao Kurohashi and Makoto Nagao. 1997 : 
"Kyoto University text corpus project". Pro- 
ceedings of the ANLP, Japan 115-118 

David Magerman. 1995 : "Statistical Decision- 
Tree Models for Parsiug". Proceedings of the 
ACL-95 276-283 

Adwait I/,atnaparkhi. 1997 : "A Linear Ob- 
served Time Statistical Parser Based on 
Maximum Entropy Models". Proceedings o.f 
EMNLP-97 

Satoshi Sekine and Ralph Grishman. 1995 : "A 
Corpus-based Probabilistic Grammar with 
Only Two Non-terminals". Proceedings of the 
IWPT-95 216-223 

Satoshi Shirai. 1998 : "Heuristics and its lira- 
itation". Jowrnal o\[ the ANLP, Japan Vol.5 
No.l, 1-2 

Kiyoaki Shirai, Kentaro Inui, Takenobu 'lbku- 
naga and Hozunli Tanaka. 1998 : "An Em- 
pirical Evaluation on Statistical Parsing of 
Japanese Sentences Using Lexical Association 
Statistics". P'roceedings of EMNLP-98 80-86 

Kiyotaka Uchimoto, Satoshi Sekine, Hitoshi 
Isahara. 1999 : "Jat)anese Dependency 
Structm'e Analysis Based on Maximum En- 
tropy Models". P~vceedings o\[ the EACL-99 
pp196-203 
