Analyzing Dependencies of Japanese Subordinate Clauses 
based on Statistics of Scope Embedding Preference 
Takehito Utsuro, Shigeyuki Nishiokayama, Masakazu Fujio, Yuji Matsumoto 
Graduate School of Information Science, Nara Institute of Science and Technology 
8916-5, Takayama-cho, Ikoma-shi, Nara, 630-0101, JAPAN 
E-mail: utsuro @is. aist-nara, ac.jp, URL: http://cl, aist-nara, ac.jp/-utsuro/ 
Abstract 
This paper proposes a statistical method for 
learning dependency preference of Japanese 
subordinate clauses, in which scope embedding 
preference of subordinate clauses is exploited 
as a useful information source for disambiguat- 
ing dependencies between subordinate clauses. 
Estimated dependencies of subordinate clauses 
successfully increase the precision of an existing 
statistical dependency analyzer. 
1 Introduction 
In the Japanese , since word order in a 
sentence is relatively free compared with Euro- 
pean s, dependency analysis has been 
shown to be practical and effective in both rule- 
based and stochastic approaches to syntactic 
analysis. In dependency analysis of a Japanese 
sentence, among various source of ambiguities 
in a sentence, dependency ambiguities of sub- 
ordinate clauses are one of the most problem- 
atic ones, partly because word order in a sen- 
tence is relatively free. In general, dependency 
ambiguities of subordinate clauses cause scope 
ambiguities of subordinate clauses, which result 
in enormous number of syntactic ambiguities of 
other types of phrases such as noun phrases. 1 
1In our preliminary corpus analysis using the stochas- 
tic dependency analyzer of Fujio and Matsumoto (1998), 
about 30% of the 210,000 sentences in EDR bracketed 
corpus (EDR, 1995) have dependency ambiguities of sub- 
ordinate clauses, for which the precision of chunk (bun- 
setsu) level dependencies is about 85.3% and that of sen- 
tence level is about 25.4% (for best one) ~ 35.8% (for 
best five), while for the rest 70% of EDR bracketed cor- 
pus, the precision of chunk (bunsetsu) level dependencies 
is about 86.7% and that of sentence level is about 47.5% 
(for best one) ~ 60.2% (for best five). In addition to 
that, when assuming that those ambiguities of subor- 
dinate clause dependencies are initially resolved in some 
way, the chunk level precision increases to 90.4%, and the 
sentence level precision to 40.6% (for best one) ~ 67.7% 
(for best five). This result of our preliminary analysis 
110 
In the Japanese linguistics, a theory of Mi- 
nami (1974) regarding scope embedding pref- 
erence of subordinate clauses is well-known. 
Minami (1974) classifies Japanese subordinate 
clauses according to the breadths of their scopes 
and claim that subordinate clauses which inher- 
ently have narrower scopes are embedded within 
the scopes of subordinate clauses which inher- 
ently have broader scopes (details are in sec- 
tion 2). By manually analyzing several raw cor- 
pora, Minami (1974) classifies various types of 
Japanese subordinate clauses into three cate- 
gories, which are totally ordered by the embed- 
ding relation of their scopes. In the Japanese 
computational linguistics community, Shirai et 
al. (1995) employed Minami (1974)'s theory on 
scope embedding preference of Japanese sub- 
ordinate clauses and applied it to rule-based 
Japanese dependency analysis. However, in 
their approach, since categories of subordinate 
clauses are obtained by manually analyzing 
a small number of sentences, their coverage 
against a large corpus such as EDR bracketed 
corpus (EDR, 1995) is quite low. 2 
In order to realize a broad coverage and high 
performance dependency analysis of Japanese 
sentences which exploits scope embedding pref- 
erence of subordinate clauses, we propose a 
corpus-based and statistical alternative to the 
rule-based manual approach (section 3). 3 
clearly shows that dependency ambiguities of subordi- 
nate clauses are among the most problematic source of 
syntactic ambiguities in a Japanese sentence. 
2In our implementation, the coverage of the categories 
of Shirai et al. (1995) is only 30% for all the subordinate 
clauses included in the whole EDR corpus. 
~Previous works on statistical dependency analysis in- 
clude Fujio and Matsumoto (1998) and Haruno et al. 
(1998) in Japanese analysis as well as Lafferty et al. 
(1992), Eisner (1996), and Collins (1996) in English anal- 
ysis. In later sections, we discuss the advantages of our 
approach over several closely related previous works. 
Table 1: Word Segmentation, POS tagging, and Bunsetsu Segmentation of A Japanese Sentence 
Word Segmentation 
POS (+ conjugation form) 
Tagging 
Bunsetsu Segmentation 
Tenki ga yoi kara dekakeyou 
noun case- adjective predicate- verb 
particle (base) conjunctive-particle (volitional) 
Tenki-ga yoi-kara dekakeyou 
(Chunking) 
English Translation weather subject fine because let's go out 
(Because the weather is fine, let's go out.) 
First, we formalize the problem of decid- 
ing scope embedding preference as a classifi- 
cation problem, in which various types of lin- 
guistic information of each subordinate clause 
are encoded as features and used for deciding 
which one of given two subordinate clauses has 
a broader scope than the other. As in the case of 
Shirai et al. (1995), we formalize the problem of 
deciding dependency preference of subordinate 
clauses by utilizing the correlation of scope em- 
bedding preference and dependency preference 
of Japanese subordinate clauses. Then, as a sta- 
tistical learning method, we employ the decision 
list learning method of Yarowsky (1994), where 
optimal combination of those features are se- 
lected and sorted in the form of decision rules, 
according to the strength of correlation between 
those features and the dependency preference 
of the two subordinate clauses. We evaluate 
the proposed method through the experiment 
on learning dependency preference of Japanese 
subordinate clauses from the EDR bracketed 
corpus (section 4). We show that the pro- 
posed method outperforms other related meth- 
ods/models. We also evaluate the estimated de- 
pendencies of subordinate clauses in Fujio and 
Matsumoto (1998)'s framework of the statisti- 
cal dependency analysis of a whole sentence, in 
which we successfully increase the precisions of 
both chunk level and sentence level dependen- 
cies thanks to the estimated dependencies of 
subordinate clauses. 
2 Analyzing Dependencies between 
Japanese Subordinate Clauses 
based on Scope Embedding 
Preference 
2.1 Dependency Analysis of A 
Japanese Sentence 
First, we overview dependency analysis of a 
Japanese sentence. Since words in a Japanese 
sentence are not segmented by explicit delim- 
iters, input sentences are first word segmented, 
111 
Phrase Structure 
Scope of Subordin.~.ff..f.~... 
( !(ffenki-ga) (yO~:ra)) \[ (dekakeyou)) 
t 
Dependency (modification) Relation 
Figure 1: An Example of Japanese Subordinate 
Clause (taken from the Sentence of Table 1) 
part-of-speech tagged, and then chunked into a 
sequence of segments called bunsetsus. 4 Each 
chunk (bunsetsu) generally consists of a set of 
content words and function words. Then, de- 
pendency relations among those chunks are es- 
timated, where most practical dependency ana- 
lyzers for the Japanese  usually assume 
the following two constraints: 
1. Every chunk (bunsetsu) except the last one 
modifies only one posterior chunk (bun- 
setsu). 
2. No modification crosses to other modifica- 
tions in a sentence. 
Table 1 gives an example of word segmenta- 
tion, part-of-speech tagging, and bunsetsu seg- 
mentation (chunking) of a Japanese sentence, 
where the verb and the adjective are tagged 
with their parts-of-speech as well as conjuga- 
tion forms. Figure i shows the phrase structure, 
the bracketing, 5 and the dependency (modifica- 
tion) relation of the chunks (bunsetsus) within 
the sentence. 
4Word segmentation and part-of-speech tagging are 
performed by the Japanese morphological analyzer 
Chasen (Matsumoto et al., 1997), and chunking is done 
by the preprocessor used in Fujio and Matsumoto (1998). 
5The phrase structure and the bracketing are shown 
just for explanation, and we do not consider them 
but consider only dependency relations in the analysis 
throughout this paper. 
A Japanese subordinate clause is a clause whose head chunk satisfies the following properties. 
1. The 
(a) 
(b) 
2. The 
(a) 
(b) 
(c) 
(d) 
(e) 
(f) 
(g) 
(h) 
content words part of the chunk (bunsetsu) is one of the following types: 
A predicate (i.e., a verb or an adjective). 
nouns and a copula like "Noun1 dearu" (in English, "be Noun1"). 
function words part of the chunk (bunsetsu) is one of the following types: 
Null. 
Adverb type such as "Verbl ippou-de" (in English, "(subject) Verb1 ..., on the other hand,"). 
Adverbial noun type such as "Verb1 tame" (in English, "in order to Verb1"). 
FormM noun type such as "Verb1 koto" (in English, gerund "Verbl-ing"). 
Temporal noun type such as "Verb1 mae" (in English, "before (subject) Verb1 ..."). 
A predicate conjunctive particle such as "Verbl ga" (in English, "although (subject) Verbl ...,"). 
A quoting particle such as "Verbl to (iu)" (in English, "(say) that (subject) Verbl ..."). 
(a),,~(g) followed by topic marking particles and/or sentence-final particles. 
Figure 2: Definition of Japanese Subordinate Clause 
2.2 Japanese Subordinate Clause 
The following gives the definition of what we call 
a "Japanese subordinate clause" throughout this 
paper. A clause in a sentence is represented as 
a sequence of chunks. Since the Japanese lan- 
guage is a head-final , the clause head 
is the final chunk in the sequence. A grammati- 
cal definition of a Japanese subordinate clause is 
given in Figure 2. 6 For example, the Japanese 
sentence in Table 1 has one subordinate clause, 
whose scope is indicated as the shaded rectangle 
in Figure 1. 
2.3 Scope Embedding Preference of 
Subordinate Clauses 
We introduce the concept of Minami (1974)'s 
classification of Japanese subordinate clauses 
by describing the more specific classification by 
Shirai et al. (1995). From 972 newspaper 
summary sentences, Shirai et al. (1995) man- 
ually extracted 54 clause final function words 
of Japanese subordinate clauses and classified 
them into the following three categories accord- 
ing to the embedding relation of their scopes. 
Category A: Seven expressions representing 
simultaneous occurrences such as "Verb1 
SThis definition includes adnominal or noun phrase 
modifying clauses "Clause1 (NP1)" (in English, rela- 
tive clauses "(NP1) that Clause1"). Since an adnom- 
inal clause does not modify any posterior subordinate 
clauses, but modifies a posterior noun phrase, we regard 
adnominal clauses only as modifees when considering de- 
pendencies between subordinate clauses. 
to-tomoni (Clause2)" and "Verbl nagara 
(Clause2)". 
Category B: 46 expressions representing 
cause and discontinuity such as "Verb1 
te (Clause2)" (in English "Verbl and 
(Clause2)") and "Verb1 node" (in English 
"because (subject) Verb1 ...,"). 
Category C: One expression representing in- 
dependence, "Verb1 ga" (in English, "al- 
though (subject) Verb1 ...,"). 
The category A has the narrowest scope, while 
the category C has the broadest scope, i.e., 
Category A -4 Category B -4 Category C 
where the relation '-<~ denotes the embedding 
relation of scopes of subordinate clauses. Then, 
scope embedding preference of Japanese subor- 
dinate clauses can be stated as below: 
Scope Embedding Preference of 
Japanese Subordinate Clauses 
1. A subordinate clause can be embedded within 
the scope of another subordinate clause which 
inherently has a scope of the same or a broader 
breadth. 
2. A subordinate clause can not be embedded 
within the scope of another subordinate clause 
which inherently has a narrower scope. 
For example, a subordinate clause of 'Category 
B' can be embedded within the scope of another 
subordinate clause of 'Category B' or 'Category 
C', but not within that of 'Category A'. Figure 3 
112 
(a) Category A -.< Category C 
Scopes of Subordinate Clauses 
Category C 
boil- polite/past- scorch- perfect-polite/past-period stir up-with although- comma 
( Although • boiled it with stirring it up, it had got scorched. ) 
(b) Category C P- Category A 
Scopes of Subordinate C~ 
r Catego 
c ) ) 
boil- polite scorch fear- sbj exist- polite- hot_fwe-over stir_up-with (volitional)-period 
although- comma 
( Although there is some fear of its getting scorched, let's boil it with stirring it up over a hot\]we. ) 
Figure 3: Examples of Scope Embedding of Japanese Subordinate Clauses 
(a) gives an example of an anterior Japanese 
subordinate clause ( "kakimaze-nagara", Cate- 
gory A), which is embedded within the scope 
of a posterior one with a broader scope ("ni- 
mashita-ga-,", Category C). Since the poste- 
rior subordinate clause inherently has a broader 
scope than the anterior, the anterior is embed- 
ded within the scope of the posterior. On the 
other hand, Figure 3 (b) gives an example of 
an anterior Japanese subordinate clause ("ari- 
masu-ga-,', Category C), which is not embed- 
ded within the scope of a posterior one with a 
narrower scope ( "kakimaze-nagara", Category 
A). Since the posterior subordinate clause in- 
herently has a narrower scope than the anterior, 
the anterior is not embedded within the scope 
of the posterior. 
2.4 Preference of Dependencies 
between Subordinate Clauses based 
on Scope Embedding Preference 
Following the scope embedding preference of 
Japanese subordinate clauses proposed by Mi- 
nami (1974), Shirai et al. (1995) applied it 
to rule-based Japanese dependency analysis, 
and proposed the following preference of decid- 
ing dependencies between subordinate clauses. 
Suppose that a sentence has two subordinate 
clauses Clausez and Clause2, where the head 
vp chunk of Clauses precedes that of Clause2. 
Dependency Preference of Japanese 
Subordinate Clauses 
1. The head vp chunk of Clause1 can modify that 
of Clause2 if Clause2 inherently has a scope of 
the same or a broader breadth compared with 
that of Clause1. 
2. The head vp chunk of Clausez can not mod- 
ify that of Clause2 if Clause2 inherently has a 
narrower scope compared with that of Clause1. 
3 Learning Dependency Preference 
of Japanese Subordinate Clauses 
As we mentioned in section 1, the rule-based 
approach of Shirai et al. (1995) to analyz- 
ing dependencies of subordinate clauses using 
scope embedding preference has serious limi- 
tation in its coverage against corpora of large 
size for practical use. In order to overcome 
the limitation of the rule-based approach, in 
this section, we propose a method of learning 
dependency preference of Japanese subordinate 
clauses from a bracketed corpus. We formalize 
the problem of deciding scope embedding pref- 
erence as a classification problem, in which var- 
ious types of linguistic information of each sub- 
ordinate clause are encoded as features and used 
for deciding which one of given two subordinate 
clauses has a broader scope than the other. As 
a statistical learning method, we employ the de- 
cision list learning method of Yarowsky (1994). 
113 
Table 2: Features of Japanese Subordinate Clauses 
Feature Type # of Feat .... Each Binary Feature 
Punctuation 2 with-comma, without-comma 
Grammatical adverb, adverbial-noun, formal-noun, temporal-noun, 
(some features have distinction 17 quoting-particle, copula, predicate-conjunctive-particle, 
of chunk-final/middle) topic-marking-particle, sentence-final-particle 
12 Conjugation form of 
chunk-final conjugative word 
Lexical (lexicalized forms of 
'Grammatical' features, 
with more than 
9 occurrences 
in EDR corpus) 
235 
stem, base, mizen, ren'you, rental, conditional, 
imperative, ta, tari, re, conjecture, volitional 
adverb (e.g., ippou-de, irai), adverbial-noun (e.g., tame, baai) 
topic-marking-particle (e.g., ha, mo), quoting-particle (to), 
predicate-conjunctive-particle (e.g., ga, kara), 
temporal-noun (e.g., ima, shunkan), formal-noun (e.g., koto), 
copula (dearu), sentence-final-particle (e.g., ka, yo) 
3.1 The Task Definition 
Considering the dependency preference of 
Japanese subordinate clauses described in sec- 
tion 2.4, the following gives the definition of our 
task of deciding the dependency of Japanese 
subordinate clauses. Suppose that a sen- 
tence has two subordinate clauses Clause1 and 
Clause2, where the head vp chunk of Clausel 
precedes that of Clause2. Then, our task of de- 
ciding the dependency of Japanese subordinate 
clauses is to distinguish the following two cases: 
1. The head vp chunk of Clausez modifies that of 
Clause2. 
2. The head vp chunk of Clause1 does not modify 
that of Clause2, but modifies that of another 
subordinate clause or the matrix clause which 
follows Clause2. 
Roughly speaking, the first corresponds to the 
case where Clause2 inherently has a scope of the 
same or a broader breadth compared with that 
of Clause1, while the second corresponds to the 
case where Clause2 inherently has a narrower 
scope compared with that of Clause1.7 
3.2 Decision List Learning 
A decision list (Yarowsky, 1994) is a sorted list 
of the decision rules each of which decides the 
value of a decision D given some evidence E. 
Each decision rule in a decision list is sorted 
TOur modeling is slightly different from those of other 
standard approaches to statistical dependency analy- 
sis (Collins, 1996; Fujio and Matsumoto, 1998; Haruno 
et al., 1998) which simply distinguish the two cases: the 
case where dependency relation holds between the given 
two vp chunks or clauses, and the case where dependency 
relation does not hold. In contrast to those standard ap- 
proaches, we ignore the case where the head vp chunk 
of Clause1 modifies that of another subordinate clause 
which precedes Clause2. This is because we assume that 
this case is more loosely related to the scope embedding 
preference of subordinate clauses. 
in descending order with respect to some pref- 
erence value, and rules with higher preference 
values are applied first when applying the deci- 
sion list to some new test data. 
First, let the random variable D represent- 
ing a decision varies over several possible values, 
and the random variable E representing some 
evidence varies over '1' and '0' (where '1' de- 
notes the presence of the corresponding piece 
of evidence, '0' its absence). Then, given some 
training data in which the correct value of the 
decision D is annotated to each instance, the 
conditional probabilities P(D =x \[ E= 1) of ob- 
serving the decision D = x under the condition 
of the presence of the evidence E (E = 1) are 
calculated and the decision list is constructed 
by the following procedure. 
1. For each piece of evidence, calculate the likeli- 
hood ratio of the conditional probability of a de- 
cision D = xl (given the presence of that piece 
of evidence) to the conditional probability of 
the rest of the decisions D =-,xl: 
P(D=xl I E=I) 
l°g2 P(D='~xl \[E=I) 
Then, a decision list is constructed with pieces 
of evidence sorted in descending order with re- 
spect to their likelihood ratios, s 
2. The final line of a decision list is defined as 'a 
default', where the likelihood ratio is calculated 
as the ratio of the largest marginal probability 
of the decision D = xl to the marginal proba- 
Syarowsky (1994) discusses several techniques for 
avoiding the problems which arise when an observed 
count is 0. Among those techniques, we employ the sim- 
plest one, i.e., adding a small constant c~ (0.1 < c~ < 
0.25) to the numerator and denominator. With this 
modification, more frequent evidence is preferred when 
there exist several evidences for each of which the con- 
ditional probability P(D=x \[ E=I) equals to 1. 
114 
(a) An Example Sentence with Chunking, Bracketing, and Dependency Relations 
Subordinate Clauses 
Clause2 
~"i:" .............................. ar i..,,:..:.,,,,,:i:,,:~:::,i::::i::ti:, :~::~ f(~/o:,a,,~( Cs..0 (~you~,ai,-taa-toi-)) (k~s~-ga)) (dete-tu u-.) ) 
I 
raise-price 10%-~ -although 3%- emphatic_.au×iliay 
- comma _verb (te-form) -comma involuntary dealer-charge-of case- sbj happen-will/may- ~dod 
(If the tax rate is 10%, the dealers will raise price, but, because it is 3%, there will happen to be the cases that the dealers pay the tax. ) 
(b) Feature Expression of Head VP Chunk of Subordinate Clauses 
Head VP Chunk of Subordinate Clause Feature Set 
Seg 1 : "neage-suru-ga-," 
Se92 : "3%-ha-node-," 
Feature Set 
~-z = ~f with-comma, predicate-conjunctive-particle(chunk-final), k 
predicate-conjunctive-particle(chunk-final)-"ga" } 
~'2 = I with-comma, chunk-final-conjugative-word-te-form } 
(c) Evidence-Decision Pairs for Decision List Learning 
Evidence E (E= 1) (feature names are abbreviated) 
El 
with-comma I with-comma 
re-form 
with-comma, te-form 
with-comma 
... 
with-comma 
.°. 
with-comma 
°.. 
with-comma 
... 
with-comma 
with-comma 
pred-conj-particle(final) 
.o° 
with-comma, pred-conj-particle(final) 
... 
pred-conj-particle(final)- "ga" 
.°. 
with-comma, pred-conj-particle(final)-"ga" 
°°. 
Decision D 
"beyond" 
"beyond" 
"beyond" 
"beyond" 
... 
"beyond" 
"beyond" 
... 
"beyond" 
°., 
Figure 4: An Example of Evidence-Decision Pair of Japanese Subordinate Clauses 
bility of the rest of the decisions D-=--xz: final conjugative word: used when the chunk- 
P(D=xl) 
l°g2 P(D="xl) 
The 'default' decision of this final line is D-= xz 
with the largest marginal probability. 
3.3 Feature of Subordinate Clauses 
Japanese subordinate clauses defined in sec- 
tion 2.2 are encoded using the following four 
types of features: i) Punctuation: represents 
whether the head vp chunk of the subordinate 
clause is marked with a comma or not, ii) Gram- 
matical: represents parts-of-speech of function 
words of the head vp chunk of the subordi- 
nate clause, 9 iii) Conjugation form of chunk- 
9Terms of parts-of-speech tags and conjugation forms 
are borrowed from those of the Japanese morphological 
ana/ysis system Chasen (Matsumoto et al., 1997). 
final word is conjugative, iv) Lexical: lexicalized 
forms of 'Grammatical' features which appear 
more than 9 times in EDR corpus. Each fea- 
ture of these four types is binary and its value 
is '1' or '0' ('1' denotes the presence of the cor- 
responding feature, '0' its absence). The whole 
feature set shown in Table 2 is designed so as to 
cover the 210,000 sentences of EDR corpus. 
3.4 Decision List Learning of 
Dependency Preference of 
Subordinate Clauses 
First, in the modeling of the evidence, we con- 
sider every possible correlation (i.e., depen- 
dency) of the features of the subordinate clauses 
listed in section 3.3. Furthermore, since it is 
necessary to consider the features for both of the 
given two subordinate clauses, we consider all 
115 
the possible combination of features of the an- 
terior and posterior head vp chunks of the given 
two subordinate clauses. More specifically, let 
Seg\] and Seg2 be the head vp chunks of the 
given two subordinate clauses (Segl is the ante- 
rior and Seg2 is the posterior). Also let 9Vl and 
9r2 be the sets of features which Segl and Seg2 
have, respectively (i.e., the values of these fea- 
tures are '1'). We consider every possible subset 
F1 and F2 of ~-1 and ~2, respectively, and then 
model the evidence of the decision list learning 
method as any possible pair (F1, F2)3 ° 
Second, in the modeling of the decision, we 
distinguish the two cases of dependency rela- 
tions described in section 3.1. We name the first 
case as the decision "modify", while the second 
as the decision "beyond". 
3.5 Example 
Figure 4 illustrates an example of transforming 
subordinate clauses into feature expression, and 
then obtaining training pairs of an evidence and 
a decision from a bracketed sentence. Figure 4 
(a) shows an example sentence which contains 
two subordinate clauses Clause1 and Clause2, 
with chunking, bracketing, and dependency re- 
lations of chunks. Both of the head vp chunks 
Segl and Seg2 of Clause1 and Clause2 modify 
the sentence-final vp chunk. As shown in Fig- 
ure 4 (b), the head vp chunks Segl and Seg2 
have feature sets ~'1 and ~'2, respectively. Then, 
every possible subsets F1 and F2 of ~1 and 
~2 are considered, n respectively, and training 
pairs of an evidence and a decision are collected 
as in Figure 4 (c). In this case, the value of the 
decision D is "beyond", because Segl modifies 
the sentence-final vp chunk, which follows Seg 2. 
1°Our formalization of the evidence of decision list 
learning has an advantage over the decision tree learn- 
ing (Quinlan, 1993) approach to feature selection of de- 
pendency analysis (Haruno et al., 1998). In the feature 
selection procedure of the decision tree learning method, 
the utility of each feature is evaluated independently, 
and thus the utility of the combination of more than one 
features is not evaluated directly. On the other hand, in 
our formalization of the evidence of decision list learn- 
ing, we consider every possible pair of the subsets F1 and 
Fz, and thus the utility of the combination of more than 
one features is evaluated directly. 
lXSince the feature 'predicate-conjunctive- 
particle(chunk-final)' subsumes 'predicate-conjunctive- 
particle(chunk-final)-"ga", they are not considered 
together as one evidence. 
i . \ 
Coverage (Model (b)) ....... " ...... \~ \ P~OurModet) --.-- "~.. \ \ 
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0,85 0.9 0.95 
Lower Bound of P(DIE ) 
Figure 5: Precisions and Coverages of Deciding 
Dependency between Two Subordinate Clauses 
100 ,,,~ , , ' Our Model , ' 
~_~ Model (a) ......... ...... ~ Model (b) ........... 
95 
90 
ii ........ ~,. 
"m.. 
80 
75 .... "i "": .......... '~. , 
0 20 40 60 80 100 
Coverage (%) 
Figure 6: Correlation of Coverages and Precisions 
4 Experiments and Evaluation 
We divided the 210,000 sentences of the whole 
EDR bracketed Japanese corpus into 95% train- 
ing sentences and 5~0 test sentences. Then, 
we extracted 162,443 pairs of subordinate 
clauses from the 199,500 training sentences, and 
learned a decision list for dependency prefer- 
ence of subordinate clauses from those pairs. 
The default decision in the decision list is 
D ="beyond", where the marginal probability 
P(D = "beyond") = 0.5378, i.e., the baseline 
precision of deciding dependency between two 
subordinate clauses is 53.78 %. We limit the fre- 
quency of each evidence-decision pair to be more 
than 9. The total number of obtained evidence- 
decision pairs is 7,812. We evaluate the learned 
decision list through several experiments. 12 
First, we apply the learned decision list to 
deciding dependency between two subordinate 
clauses of the 5% test sentences. We change 
the threshold of the probability P(D I E) 13 in 
12Details of the experimental evaluation will be pre- 
sented in Utsuro (2000). 
I~P( D I E) can be used equivalently to the likelihood 
116 
the decision list and plot the trade-off between 
coverage and precision. 14 As shown in the plot 
of "Our Model" in Figure 5, the precision varies 
from 78% to 100% according to the changes of 
the threshold of the probability P(D I E). 
Next, we compare our model with the other 
two models: (a) the model learned by apply- 
ing the decision tree learning method of Haruno 
et al. (1998) to our task of deciding depen- 
dency between two subordinate clauses, and (b) 
a decision list whose decisions are the following 
two cases, i.e., the case where dependency rela- 
tion holds between the given two vp chunks or 
clauses, and the case where dependency relation 
does not hold. The model (b) corresponds to a 
model in which standard approaches to statis- 
tical dependency analysis (Collins, 1996; Fujio 
and Matsumoto, 1998; Haruno et al., 1998) are 
applied to our task of deciding dependency be- 
tween two subordinate clauses. Their results 
are also in Figures 5 and 6. Figure 5 shows that 
"Our Model" outperforms the other two mod- 
els in coverage. Figure 6 shows that our model 
outperforms both of the models (a) and (b) in 
coverage and precision. 
Finally, we examine whether the estimated 
dependencies of subordinate clauses improve 
the precision of Fujio and Matsumoto (1998)'s 
statistical dependency analyzer. 15 Depending 
on the threshold of P(D \[ E), we achieve 
0.8,,~1.8% improvement in chunk level precision, 
and 1.6~-4.7% improvement in sentence level, is 
5 Conclusion 
This paper proposed a statistical method for 
learning dependency preference of Japanese 
ratio. 
14Coverage: the rate of the pairs of subordinate clauses 
whose dependencies are decided by the decision list, 
against the total pairs of subordinate clauses, Precision: 
the rate of the pairs of subordinate clauses whose depen- 
dencies are correctly decided by the decision list, against 
those covered pairs of subordinate clauses. 
15Fujio and Matsumoto (1998)'s lexicalized depen- 
dency analyzer is similar to that of Collins (1996), where 
various features were evaluated through performance 
test and an optimal feature set was manually selected. 
16The upper bounds of the improvement in chunk level 
and sentence level precisions, which are estimated by 
providing Fujio and Matsumoto (1998)'s statistical de- 
pendency analyzer with correct dependencies of subor- 
dinate clauses extracted from the bracketing of the EDR 
corpus, are 5.1% and 15%, respectively. 
subordinate clauses, in which scope embed- 
ding preference of subordinate clauses is ex- 
ploited. We evaluated the estimated dependen- 
cies of subordinate clauses through several ex- 
periments and showed that our model outper- 
formed other related models. 

References 

M. Collins. 1996. A new statistical parser based on 
bigram lexical dependencies. In Proceedings of the 
34th Annual Meeting of ACL, pages 184-191. 

EDR (Japan Electronic Dictionary Research Insti- 
tute, Ltd.). 1995. EDR Electronic Dictionary 
Technical Guide. 

J. Eisner. 1996. Three new probabilistic models for 
dependency parsing: An exploration. In Proceedings of the 16th COLING, pages 340-345. 

M. Fujio and Y. Matsumoto. 1998. Japanese dependency structure analysis based on lexicalized 
statistics. In Proceedings of the 3rd Conference on 
Empirical Methods in Natural Language Processing, pages 88--96. 

M. Haruno, S. Shirai, and Y. Oyama. 1998. Using decision trees to construct a practical parser. 
In Proceedings of the 17th COLING and the 36th 
Annual Meeting of ACL, pages 505-511. 

J. Lafferty, D. Sleator, and D. Temperley. 1992. 
Grammatical trigrams: A probabilistic model of 
link grammar. In Proceedings of the AAAI Fall 
Symposium: Probabilistic Approaches to Natural 
Language, pages 89-97. 

Y. Matsumoto, A. Kitauchi, T. Yamashita, 
O. Imaichi, and T. Imamura. 1997. Japanese 
morphological analyzer ChaSen 1.0 users manual. 
Information Science Technical Report NAIST-IS-TR9?007, Nara Institute of Science and Technology. (in Japanese). 

F. Minami. 1974. Gendai Nihongo no Kouzou. 
Taishuukan Shoten. (in Japanese). 

J. R. Quinlan. 1993. CJ.5: Programs for Machine 
Learning. Morgan Kaufmann. 

S. Shirai, S. Ikehara, A. Yokoo, and J. Kimura. 1995. 
A new dependency analysis method based on 
semantically embedded sentence structures and 
its performance on Japanese subordinate clauses. 
Transactions of Information Processing Society of 
Japan, 36(10):2353-2361. (in Japanese). 

T. Utsuro. 2000. Learning preference of dependency between Japanese subordinate clauses and 
its evaluation in parsing. In Proceedings of the Pad 
International Conference on Language Resources 
and Evaluation. (to appear). 

D. Yarowsky. 1994. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 
32nd Annual Meeting of ACL, pages 88-95.
