Fertilization of Case Frame Dictionary
for Robust Japanese Case Analysis
Daisuke Kawaharay and Sadao Kurohashiyz
yGraduate School of Information Science and Technology, University of Tokyo
zPRESTO, Japan Science and Technology Corporation (JST)
fkawahara,kurog@kc.t.u-tokyo.ac.jp
Abstract
This paper proposes a method of fertilizing a
Japanese case frame dictionary to handle com-
plicated expressions: double nominative sen-
tences, non-gapping relation of relative clauses,
and case change. Our method is divided into
two stages. In the first stage, we parse a large
corpus and construct a Japanese case frame dic-
tionary automatically from the parse results. In
the second stage, we apply case analysis to the
large corpus utilizing the constructed case frame
dictionary, and upgrade the case frame dictio-
nary by incorporating newly acquired informa-
tion.
1 Introduction
To understand a text, it is necessary to find out
relations between words in the text. What is
required to do so is a case frame dictionary. It
describes what kinds of cases each verb has and
what kinds of nouns can fill a case slot. Since
these relations have millions of combinations,
it is difficult to construct a case frame dictio-
nary by hand. We proposed a method to con-
struct a Japanese case frame dictionary auto-
matically by arranging large volumes of parse
results by coupling a verb and its closest case
component (Kawahara and Kurohashi, 2001).
This case frame dictionary, however, could not
handle complicated expressions: double nomi-
native sentences, non-gapping relation of rela-
tive clauses, and case change.
This paper proposes a method of fertiliz-
ing the case frame dictionary to handle these
complicated expressions. We take an iterative
method which consists of two stages. This
means gradual learning of what is understood
by an analyzer in each stage. In the first stage,
we parse a large raw corpus and construct a
Japanese case frame dictionary automatically
from the parse results. This is the method pro-
posed by (Kawahara and Kurohashi, 2001). In
the second stage, we apply case analysis to the
large corpus utilizing the constructed case frame
dictionary, and upgrade the case frame dictio-
nary by incorporating newly acquired informa-
tion.
We conducted a case analysis experiment
with the upgraded case frame dictionary, and
its evaluation showed effectiveness of the fertil-
ization process.
2 Japanese Grammar
We introduce Japanese grammar briefly in this
section.
Japanese is a head-final language. Word or-
der does not play a case-marking role. Instead,
postpositions function as case markers (CMs).
The basic structure of a Japanese sentence is as
follows:
(1) kare
he
ga
nom-CM
hon
book
wo
acc-CM
kaku
write
(he writes a book)
ga and wo are postpositions which mean nom-
inative and accusative, respectively. kare ga
and hon wo are case components, and kaku is a
verb1.
There are two phenomena that case markers
are hidden.
A modifying clause is left to the modified
noun in Japanese. In this paper, we call a
noun modified by a clause clausal modifiee.
A clausal modifiee is usually a case component
for the verb of the modifying clause. There is,
however, no case marker for their relation.
1In this paper, we call verbs, adjectives, and
noun+copulas as verbs for convenience.
(2) hon
book
wo
acc-CM
kaita
write
hito
person
(the person who wrote the book)
(3) kare
he
ga
nom-CM
kaita
write
hon
book
(a book which he wrote)
In (2), hito ‘person’ has ga ‘nominative’ rela-
tion to kaita ‘write’. In (3), hon ‘book’ has wo
‘accusative’ relation to kaita ‘write’.
There are some non case-marking postposi-
tions, such as wa and mo. They topicalize or
emphasize noun phrases. We call them topic
markers (TMs) and a phrase followed by one
of them TM phrase.
(4) kare
he
wa
TM
hon
book
wo
acc-CM
kaita
write
(he wrote a book)
(5) kare
he
ga
nom-CM
hon
book
mo
TM
kaita
write
(he wrote a book also)
In (4), wa is interpreted as ga ‘nominative’. In
(5), mo is interpreted as wo ‘accusative’.
3 Construction of the initial case
frame dictionary
This section describes how to construct the ini-
tial case frame dictionary. This is the first stage
of our two-stage approach, and is performed by
the method proposed by (Kawahara and Kuro-
hashi, 2001). In the rest of this section, we de-
scribe this approach in detail.
The biggest problem in automatic case frame
construction is verb sense ambiguity. Verbs
which have different meanings should have dif-
ferent case frames, but it is hard to disam-
biguate verb senses very precisely. To deal
with this problem, we distinguish predicate-
argument examples, which are collected from
a large corpus, by coupling a verb and its
closest case component. That is, examples
are not distinguished by verbs such as naru
‘make/become’ and tsumu ‘load/accumulate’,
but by couples such as “tomodachi ni naru”
‘make a friend’, “byouki ni naru” ‘become
sick’, “nimotsu wo tsumu” ‘load baggage’, and
“keiken wo tsumu” ‘accumulate experience’.
This process makes separate case frames
which have almost the same meaning or usage.
For example, “nimotsu wo tsumu” ‘load bag-
gage’ and “busshi wo tsumu” ‘load supply’ are
separate case frames. To merge these similar
case frames and increase coverage of the case
frame, we cluster the case frames.
We employ the following procedure for the
automatic case frame construction:
1. A large raw corpus is parsed by a Japanese
parser, and reliable predicate-argument ex-
amples are extracted from the parse re-
sults. Nouns with a TM such as wa or
mo and clausal modifiees are discarded, be-
cause their case markers cannot be under-
stood by syntactic analysis.
2. The extracted examples are bundled ac-
cordingtotheverbanditsclosestcasecom-
ponent, making initial case frames.
3. The initial case frames are clustered using
a similarity measure, resulting in the final
case frames. The similarity is calculated by
using NTT thesaurus.
We constructed a case frame dictionary from
newspaper articles of 20 years (about 20,000,000
sentences).
4 Target expressions
The following expressions could not be handled
with the initial case frame dictionary shown in
section 3, because of lack of information in the
case frame.
Non-gapping relation
This is the case in which the clausal modifiee
is not a case component of the verb in the modi-
fying clause, but is semantically associated with
the clause.
(6) kare ga
he
syudoken wo
initiative
nigiru
have
kaigi
meeting
(the meeting in which he has the initiative)
In this example, kaigi ‘meeting’ is not a case
component of nigiru ‘have’, and there is no case
relation between kaigi and nigiru. We call this
relation non-gapping relation.
Double nominative sentence
This is the case in which the verb has two
nominatives in sentences such as the following.
(7) kuruma
car
wa
TM
engine ga
engine
yoi
good
(the engine of the car is good)
In this example, wa plays a role of nominative,
so yoi ‘good’ subcategorizes two nominatives:
kuruma ‘car’ and engine. We call this outer
nominative outer ga and this sentence double
nominative sentence.
Case change
In Japanese, to express the same meaning,
we can use different case markers. We call this
phenomenon case change.
(8) Tom ga
Tom
Mary
Mary
no
of
shiji wo
support
eta
derive
(Tom derived his support from Mary)
In this example, Mary has kara ‘from’ relation
to eta ‘derive’. In this paper, we handle case
change related to no ‘of’, such as (no, kara).
The following is an example that outer nom-
inative is related to no case.
(9) kuruma no
car
engine ga
engine
yoi
good
(the engine of the car is good)
The outer nominative of (7) can be nominal
modifier of the inner nominative like this ex-
ample. This is case change of (no, outer ga).
There is a different case from the above that
an NP with no modifying a case component
does not have a case relation to the verb.
(10) kare ga
he
kaigi no
meeting
syudoken wo
initiative
nigiru
have
(he has the initiative in the meeting)
In this example, kaigi ‘meeting’ has a no rela-
tion to syudoken ‘initiative’, but does not have a
case relation to nigiru ‘have’. This example is a
transformation of (6), and includes case change
of (no, non-gapping).
5 Fertilization of case frame
dictionary
We construct a fertilized case frame dictionary
from the initial case frame dictionary shown in
section 3, to handle the complicated expressions
described in section 4.
We apply case analysis to a large corpus using
the dictionary, collect information which could
not be acquired by a mere parsing, and upgrade
the case frame dictionary.
The procedure is as follows (figure 1):
1. The initial case frames are acquired by the
method shown in section 3.
2. Case analysis utilizing the case frames ac-
quired in phase 1 is applied to a large cor-
pus, and examples of outer nominative are
collected from case analysis results.
3. Case analysis utilizing the case frames ac-
quired in phase 2 is applied to the large
corpus, and examples of non-gapping rela-
tion are collected similarly.
4. Case similarities are judged to handle case
change.
5.1 Case analysis based on the initial
case frame dictionary
Case analysis of TM phrases and clausal modi-
fiees is indebted to a case frame dictionary. This
section describes an example of case analysis
utilizing the initial case frame dictionary.
(11) sono
that
hon
book
wa
TM
kare ga
he
tosyokan
library
de
in
yonda
read
(he read that book in the library)
Case analysis of this example chooses the fol-
lowing case frame “tosyokan de yonda” ‘read in
the library’ (“*” in the case frame means the
closest CM.).
CM examples input
read
nom person, child, ¢¢¢ he
acc book, paper, ¢¢¢ book
loc* library, house, ¢¢¢ library
kare ‘he’ and tosyokan ‘library’ correspond to
nominative and locative, respectively, according
to the surface cases. The case marker of TM
phrase “hon wa” ‘book (TM)’ cannot be under-
stood by the surface case, but it is interpreted
as wo ‘accusative’ because of the matching be-
tween “hon wa” ‘book (TM)’ and the accusative
case slot of the case frame (underlined in the
case frame).
gano fueru
consumer
bank
company{ }
outer
ga{ }
corporation
bank
Japan { }
effect
result
profit}{
(1) case frames based on
    parsing
(2) collection of
    outer nominative
(3) collection of
    non-gapping relation
(4) case similarity judgement
Figure 1: Outline of our method
5.2 Collecting examples of outer
nominative
In the initial case frame construction described
in section 3, the TM phrase was discarded, be-
cause its case marker could not be understood
by parsing. In the example (7), “engine ga yoi”
‘the engine is good’ is used to build the initial
case frame, but the TM phrase “kuruma wa”
‘the car’ is not used.
Case analysis based on the initial case frame
dictionary tells a case of a TM phrase. Corre-
spondencetoouternominativecannotbeunder-
stood by the case slot matching, but indirectly.
If the TM cannot correspond to any case slots of
the initial case frame, the TM can be regarded
as outer nominative. For example, in the case
of (7), since the case frame of “engine ga yoi”
‘the engine is good’ has only nominative which
corresponds to “engine”, the TM of “kuruma
wa” cannot correspond to any case slots and is
recognized as outer nominative. On the other
hand, in the case of (11), the TM of hon wa
is recognized as accusative, because hon ‘book’
is similar to the examples of the accusative slot.
We can distinguish and collect outer nominative
examples in this way.
We apply the following procedure to each sen-
tence which has both a TM and ga. To reduce
the influence of parsing errors, the collection
process of these sentences is done under the con-
dition that a TM phrase has no candidates of
its modifying head without its verb.
1. We apply case analysis to a verb which is a
head of a TM phrase. If the verb does not
have the closest case component and can-
not select a case frame, we quit processing
this sentence and proceed to the next sen-
tence. In this phase, the TM phrase is not
made correspondence with a case of the se-
lected case frame.
2. If the case frame does not have any cases
which have no correspondence with the
case components in the input, the TM can-
not correspond to any case slots and is
regarded as outer nominative. This TM
phrase is added to outer nominative exam-
ples of the case frame.
The following is an example of this process.
(12) nagai
long
sumo
sumo
wa
TM
ashi-koshi ni
legs and loins
futan ga
burden
kakaru
impose
(long sumo imposes a burden on legs and
loins)
Case analysis of this example chooses the fol-
lowing case frame “futan ga kakaru” ‘impose a
burden’.
CM examples input
impose nom* burden burdendat heart, legs, loins, ¢¢¢ legs and loins
futan ‘burden’ and ashi-koshi ‘legs and loins’
correspond to nominative and dative of the case
frame, respectively, and sumo corresponds to no
case marker. Accordingly, the TM of “sumo
wa” is recognized as outer nominative, and
sumo is added to outer nominative examples of
the case frame “futan ga kakaru”.
This process made outer nominative of 15,302
case frames (of 597 verbs).
5.3 Collecting examples of non-gapping
relation
Examples of non-gapping relation can be col-
lected in a similar way to outer nominative.
When a clausal modifiee has non-gapping re-
lation, it should not be similar to any exam-
ples of any cases in the case frame, because the
constructed case frames have examples of only
casesexceptfornon-gappingrelation. Fromthis
point of view, we apply the following procedure
to each example sentence which contains a mod-
ifying clause. To reduce the influence of pars-
ing errors, the collection process of example sen-
tences is done under the condition that a verb
in a clause has no candidates of its modifying
head without its clausal modifiee (“¢¢¢ [modify-
ing verb] N1 no N2” is not collected).
1. We apply case analysis to a verb which is
contained by a modifying clause. If the
verb does not have the closest case compo-
nentandcannotselectacaseframe, wequit
processing this sentence and proceed to the
next sentence. In this phase, the clausal
modifiee is not made correspondence with
a case of the selected case frame.
2. If the similarity between the clausal modi-
fiee and examples of any cases which have
no correspondence with input case com-
ponents does not exceed a threshold, this
clausal modifiee is added to examples of
non-gapping relation in the case frame. We
set the threshold 0.3 empirically.
The following is an example of this process.
(13) gyomu
business
wo itonamu
carry on
menkyo
license
wo syutoku-shita
get
(` got a license to carry on business)
Case analysis of this example chooses the follow-
ingcaseframe“fgyomu, businessgwo itonamu”
‘carry on f work, business g’.
CM examples input
carry on nom bank, company, ¢¢¢ -acc* work, business business
Nominative of this case frame has no corre-
spondence with a case component of the in-
put, so the clausal modifiee, menkyo ‘license’,
is checked whether it can correspond to nom-
inative case examples. In this case, the sim-
ilarity between menkyo ‘license’ and examples
of nominative is not so high. Consequently, the
relation of menkyo ‘license’ is recognized as non-
gapping relation, and menkyo is added to exam-
ples of non-gapping relation in the case frame
“fgyomu, businessg wo itonamu”.
(14) ihouni
illegally
denwa
telephone
gyomu
business
wo
itonande-ita
carry on
utagai
suspect
(suspect that ` carried on telephone busi-
ness illegally)
In this case, the above case frame is also se-
lected. Since utagai ‘suspect’ is not similar to
the nominative case examples, it is added to
case examples of non-gapping relation in the
case frame.
This process made non-gapping relation of
23,094 case frames (of 637 verbs).
Collecting examples of non-gapping rela-
tion for all the case frames
Non-gapping relation words which have wide
distribution over verbs can be considered to
have non-gapping relation for all the verbs or
case frames. We add these words to examples
of non-gapping relation of all the case frames.
For example, 5 verbs have menkyo ‘license’ (ex-
ample (13)) in their non-gapping relation, and
381 verbs have utagai ‘suspect’ (example (14)).
We, consequently, judge utagai has non-gapping
relation for all the case frames. We call such a
word global non-gapping word.
We treated words which have non-gapping re-
lation for more than 100 verbs as global non-
gapping words. We acquired 128 global non-
gapping words, and the following is the exam-
ples of them (in English).
possibility, necessity, result, course, case,
thought, schedule, outlook, plan, chance,
¢¢¢
