The Selection of the Most Probable Dependency Structure in 
Japanese Using Mutual Information 
Eduardo de Paiva Alves 
University of Electro-Communications
1-5-1 Chofugaoka, Chofu-shi, Tokyo, Japan
ealves@phaeton.cs.uec.ac.jp 
Abstract 
We use a statistical method to select the
most probable structure, or parse, for a given
sentence. It takes as input the dependency
structures generated for the sentence by
a dependency grammar, finds all
modifier-particle-modificant relations,
calculates the mutual information of each
relation, and chooses the structure for which
the product of the mutual information of
its relations is the highest.
1 Introduction 
Computer Aided Instruction (CAI) systems are im- 
portant and effective tools, especially for teaching 
foreign languages. Many students of Japanese as a
foreign language are aware of the Computer Assisted
Technical Reading System (CATERS), which provides
helpful information for reading texts in science and
technology fields (Kano and Yamamoto, 1995).
One of the difficulties in learning Japanese lies in
recognizing dependency relations in Japanese sentences,
because the language allows relatively free word order.
Take an example from a leading newspaper:
We would like to expect a prompt study of the causes, based 
on a national investigation 
To understand this sentence it is necessary to
know that the word meaning 'investigation' modifies
'based' but not 'expect', and that 'based' modifies
'study' but not 'cause'.
CATERS is useful because it provides such infor- 
mation through several user-friendly functions. 
As effective as it is for foreign students, however,
the texts in CATERS are fixed and the dependency
structure of every sentence in them is hand-coded.
This inability to handle new text seriously limits
its general applicability and extensibility.
This paper describes a method for selecting the
right, or most probable, structure for a Japanese
sentence among the multiple structures generated
by Restricted Dependency Grammar (RDG) (Fukumoto,
1992). If this method works, its results
will be quite valuable for facilitating the develop-
ment of new texts for CAI systems like CATERS.
2 Background 
As pointed out earlier, the dependency relations
between elements in Japanese sentences are fairly
complicated due to relatively free word order. RDG is
designed to determine dependency relations among 
words and phrases in sentences. To do so, it clas- 
sifies the phrases according to grammatical cate- 
gories and syntactic attributes. However, it fails to 
reject semantically unacceptable dependency struc- 
tures. The inevitable consequence is that RDG often 
produces multiple parses even for a simple sentence. 
Kurohashi and Nagao (1993) try to determine the 
dependency relations of a sentence by means of using 
sample sentences. When the sentence is structurally
ambiguous, they determine its structure by compar-
ing it to structurally similar patterns taken from a
manually generated set of examples and calculating
similarity values.
Our method, in contrast, uses a statistical ap-
proach to select the most probable structure, or parse,
of a given sentence. It takes as input the dependency
structures generated by RDG for a sentence, finds all
modifier-particle-modificant relations, calculates
their mutual information, and chooses the structure
for which the product of the mutual information of
its relations is the highest.
In order to calculate the mutual information
for any modifier-particle-modificant pattern, we use
the Conceptual Dictionary1 (CD) to build a tax-
onomic hierarchy of the modifiers which occur
1The Co-occurrence Dictionary and Conceptual Dic-
tionary used in the process are part of a set of machine-
readable Japanese dictionaries compiled by the Japan
Electronic Dictionary Research Institute (EDR, 1993).
The Conceptual Dictionary is a set of graphs consisting
of 400,000 concepts and a number of taxonomic as well as
functional relations between them. The Co-occurrence
Dictionary consists of a list of 1,100,000 dependency re-
lations (modifier, particle and modificant) taken from a
corpus. Each entry includes syntactic information, con-
cept identifiers (a numerical code) and the number of
occurrences in the corpus.
with the particle-modificant sub-pattern in the Co-
occurrence Dictionary (COD). The mutual informa-
tion for any pattern is the maximum mutual infor-
mation between the sub-pattern and the concepts in
the taxonomic hierarchy which generalize the modi-
fier in the pattern.
Resnik and Hearst (1993) use a similar approach
to calculate preferences for prepositional phrase at-
tachment. While they use data on word groups, our
method uses word co-occurrence data directly to es-
timate the preferences, using the CD to identify the
most adequate grouping for each relation.
While Kurohashi and Nagao compare the sentence
with a single sample of patterns, we use all occur-
rences of the pattern in the COD to calculate the mu-
tual information. Our approach automatically ex-
tracts the occurrences from the dictionary as well as
builds the taxonomic hierarchy. Unlike Kurohashi
and Nagao (1993), who use only verb and adjec-
tive patterns, we cover all dependency relations.
3 Selecting the Most Probable 
Structure 
RDG identifies all possible dependency structures 
which consist of modifier-modificant relations be- 
tween elements in a sentence. The arcs in the fol- 
lowing example show modifier-modificant relations 
which can be combined into six different dependency 
structures. 
[Figure: arcs over the phrases glossed 'national', 'investigation', 'based', 'cause', and 'prompt study', showing the modifier-modificant relations that combine into six dependency structures.]
Our objective is to develop a method that automat-
ically selects the correct dependency structure, or at
least the one with the highest probability of being
correct. We evaluate the various
possible structures according to the mutual infor- 
mation between modifiers and particle-modificants. 
In some cases there is no particle and the modifi- 
cant directly precedes the modifier (see example in 
section 3.2). To calculate the mutual information 
for each relation, we obtain from the COD the con-
ceptual identifiers (a numerical code) for the mod-
ifiers that appear with the particle-modificant and
the number of their occurrences in the corpus. If the 
pattern is not present, backing off, we search this in-
formation for the modificant only. For each of those
concept identifiers we obtain from the CD all gen- 
eralizers (concept identifiers that express a similar 
meaning in a more general way) and build a taxo- 
nomic hierarchy with them. Using the number of 
occurrences obtained, we calculate the mutual infor- 
mation for the concepts in the taxonomic hierarchy. 
We also build a taxonomic hierarchy for the modi- 
fier that appears with the particle-modificant in the 
sentence. Then comparing these two taxonomic hi- 
erarchies (one for the modifiers in the COD, one for 
the modifiers in the sentence), we look for the con-
cept identifier common to both hierarchies that has
the highest mutual information. This is the mutual 
information for the relation itself. For each depen- 
dency structure we calculate a score by multiplying 
the mutual information for all ambiguous relations 
(the non-ambiguous do not contribute to the evalua- 
tion). The dependency structure with highest prob- 
ability of being correct is the one with the highest 
score. Since all structures have the same number of
relations, this multiplication reflects the likelihood
of the structure.
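Assuming a precomputed mutual-information table, the scoring just described can be sketched as follows; the relation triples and MI values here are invented for illustration, not taken from the COD:

```python
# Hypothetical MI values for ambiguous (modifier, particle, modificant)
# relations; in the method these come from COD counts and the CD
# taxonomic hierarchy.
MI = {
    ("investigation", "ni", "based"): 3.2,
    ("investigation", "ni", "expect"): 0.4,
    ("based", "", "study"): 2.1,
    ("based", "", "cause"): 0.3,
}

def score(structure):
    """Product of the MI values of the structure's ambiguous relations."""
    product = 1.0
    for relation in structure:
        product *= MI[relation]
    return product

def most_probable(structures):
    """The candidate dependency structure with the highest score."""
    return max(structures, key=score)

# Two candidate structures for the introductory example sentence.
candidates = [
    [("investigation", "ni", "based"), ("based", "", "study")],
    [("investigation", "ni", "expect"), ("based", "", "cause")],
]
```

Here most_probable(candidates) returns the first structure, since 3.2 x 2.1 exceeds 0.4 x 0.3.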
3.1 The Algorithm 
The process described above can be written in algo-
rithmic form as follows:
1. Select the ambiguous relations (those with more 
than one modificant) for each structure. 
2. Search the COD for the particle-modificant sub-
pattern, in the corresponding positions. If there
is no entry, search for the modificant only.
3. Obtain from the COD the concept identifiers for 
the modificant (there may be multiple mean- 
ings) and the concept identifiers with the num- 
ber of their occurrences in the corpus for 
the modifiers which occur with the particle- 
modificant pattern. 
4. For each modificant concept identifier, build a
taxonomic hierarchy with its modifiers, using the
CD to find the generalizer for each concept iden-
tifier.
5. Calculate the mutual information2
for all the concept identifiers in the taxonomic
hierarchies.
6. For the modifiers in the sentence, extract their 
concept identifiers from COD and build the tax- 
onomic hierarchies using CD to find the gener- 
alizers for each concept identifier. 
7. For each relation (modifier-particle-modificant
pattern), search for the concept identifier that gen-
eralizes the modifier word and has maximum
mutual information. This value is the mutual
information for the relation.
8. For each dependency structure, multiply the 
mutual information of its ambiguous depen- 
dency relations to obtain the score for that 
structure. 
9. Arrange the structures according to their scores. 
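Steps 4 to 7 amount to climbing generalizer links and taking the maximum mutual information over the concepts that generalize a modifier. A minimal sketch, with an invented child-to-parent map standing in for the CD and invented per-concept MI values:

```python
# Toy generalizer map (concept -> its more general concept) and MI values;
# all names and numbers here are invented stand-ins for CD/COD data.
GENERALIZER = {"person": "human", "human": "living thing", "force": "abstract"}
CONCEPT_MI = {"person": 3.61, "human": 1.20, "living thing": 0.10, "force": 0.69}

def ancestors(concept):
    """The concept itself plus every generalizer above it (step 6)."""
    chain = [concept]
    while concept in GENERALIZER:
        concept = GENERALIZER[concept]
        chain.append(concept)
    return chain

def relation_mi(modifier_concepts):
    """Step 7: maximum MI over the concepts generalizing the modifier."""
    best = 0.0
    for concept in modifier_concepts:
        for generalizer in ancestors(concept):
            best = max(best, CONCEPT_MI.get(generalizer, 0.0))
    return best
```

For a modifier read as the concept 'person', relation_mi(["person"]) picks 'person' itself (3.61) rather than the weaker generalizers above it.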
2The mutual information tells how much information
one outcome gives about the other and is given by the
formula:
I(w1, w2) = ln( P(w1, w2) / (P(w1) P(w2)) )    (1)
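Estimated from corpus frequencies, formula (1) can be computed as in this sketch; the count arguments are hypothetical:

```python
import math

def mutual_information(n_joint, n_w1, n_w2, n_total):
    """I(w1, w2) = ln( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated as relative frequencies over n_total corpus events."""
    p_joint = n_joint / n_total
    p_w1 = n_w1 / n_total
    p_w2 = n_w2 / n_total
    return math.log(p_joint / (p_w1 * p_w2))
```

A positive value means the pair co-occurs more often than chance; a negative value, less often.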
3.2 Examples 
The following figure shows the output from RDG for 
a given sentence. The arrows in the figure indicate 
the dependency relations. 
[Figure: RDG output for a sentence whose phrases are glossed 'work', 'people', 'stress', 'structure', 'innovation', 'progress', 'grow worse'; the arrows indicate the dependency relations.]
The ambiguous relations involve the modificants
glossed in the figure. Accordingly, the occurrences for
the modificants in these relations are extracted from
the COD, obtaining a list of modifier concept identifiers
with the number of their occurrences. Note that in
the patterns glossed 'working person' and 'working
stress', the modificant precedes the modifier. The
following figure shows some modifiers for 'work'
with their number of occurrences.
[Figure: modifiers occurring with 'work' and their counts, e.g. person 32, woman 18, mother 6.]
Next, the taxonomic hierarchy for each particle-
modificant is built and the mutual information cal-
culated for each concept identifier. An extract of the
hierarchy for 'work' is shown in the following figure.
[Figure: extract of the taxonomic hierarchy for 'work', with concepts such as 'life', 'abstract', 'product', 'human or similar', 'live body', and 'relative to action', and mutual information values including person (3.61) and force (3.40).]
person force 
Next, the generalizers for the modifiers in the sentence
are searched in the hierarchies for their modificants
to obtain the mutual information for the relations.
For 'working person' it happened to be the
concept 'person' itself, with a mutual information
of 3.61. For 'working stress' the match
occurred for 'force', giving a mutual information
of 0.69.
The mutual information is then multiplied for all the
dependency relations in each structure. For the example
sentence, the mutual information values for the ambiguous
relations are as follows:
[Figure: mutual information values for the ambiguous relations of the example sentence.]
From this the algorithm selects the parse with the
highest score, which is drawn in thick lines. The next
figure shows the result for the first example sentence.
[Figure: the selected parse, with mutual information values 1.60 and 3.40, over phrases glossed 'sudden', 'relation', 'deep', 'heart disease', 'pressure', 'more than 10%', 'was'.]
4 Results and Evaluation 
We have applied our method to 35 sentences taken
from a leading newspaper and included with the RDG
software. The average number of dependency struc-
tures per sentence is 8.68. The method selected the
correct structure for 25 sentences; for 8 more
sentences, the correct structure was ranked second
most probable.
In another experiment, we parsed 70 sentences us-
ing a grammar similar to the one used in Kurohashi
and Nagao (1993). Our method selected the most
likely relation among the multiple relations generated
in 95% of the cases.
Although the size of the test data is small, we can
say that our method provides a way to identify the most
probable structure more efficiently than RDG alone. Since
the sentences used were extracted from a newspaper,
it is also general in its applicability. It can therefore
be used in preparing teaching materials, such as the
structures used by a CAI system like CATERS,
saving the instructor from hand-coding them. In future
work we shall extract the co-occurrences directly
from the corpora and use other grouping techniques
to replace the CD.
5 Acknowledgments 
I am thankful to my thesis advisor Dr. T. Furugori 
and the anonymous referees for their suggestions and 
comments. 

References 
Fukumoto, F.; Sano, H.; Saitoh, Y.; and Fukumoto,
J. 1992. A Framework for Dependency Gram-
mar Based on the Word's Modifiability Level -
Restricted Dependency Grammar. In Trans. IPS
Japan, 33(10) (in Japanese).
Resnik, P. and Hearst, M. 1993. Structural Ambiguity
and Conceptual Relations. In Proceedings of the
Workshop on Very Large Corpora: Academic and
Industrial Perspectives. Ohio State University.
Japan Electronic Dictionary Research Institute, Ltd.
1993. EDR Electronic Dictionary Specifications
Guide (in Japanese).
Kano, C. and Yamamoto, H. 1995. A System for
Reading Scientific and Technical Texts: Class-
room, Instruction and Evaluation. In Jinbunka-
gaku to Computer, 27(1) (in Japanese).
Kurohashi, S. and Nagao, M. 1993. Structural
Disambiguation in Japanese by Evaluating Case
Structures Based on Examples in Case Frame Dic-
tionary. In Proceedings of IWPT'93.
