Japanese Case Frame Construction by Coupling the Verb
and its Closest Case Component
Daisuke Kawahara
Graduate School of Informatics, Kyoto University
Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501,
Japan
kawahara@pine.kuee.kyoto-u.ac.jp
Sadao Kurohashi
Graduate School of Informatics, Kyoto University
Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501,
Japan
kuro@pine.kuee.kyoto-u.ac.jp
ABSTRACT
This paper describes a method to construct a case frame dictionary automatically from a raw corpus. The main problem is how to handle the diversity of verb usages. To deal with verb usages, we collect predicate-argument examples, which are distinguished by the verb and its closest case component, from the parsed results of a corpus. Since these couples multiply to millions of combinations, it is difficult to make a wide-coverage case frame dictionary from a small corpus such as an analyzed corpus. We, however, use a raw corpus, so this problem can be addressed. Furthermore, we cluster and merge predicate-argument examples which do not have different usages but belong to different case frames because of different closest case components. We also report on an experimental result of case structure analysis using the constructed case frame dictionary.
1. INTRODUCTION
Syntactic analysis, or parsing, has been a main objective in Natural Language Processing. In the case of Japanese, however, syntactic analysis cannot clarify the relations between words in sentences because of several troublesome characteristics of Japanese, such as scrambling, omission of case components, and disappearance of case markers. Therefore, in Japanese sentence analysis, case structure analysis is an important issue, and a case frame dictionary is necessary for the analysis.
Some research institutes have constructed Japanese case
frame dictionaries manually [2, 3]. However, it is quite expensive, or almost impossible, to construct a wide-coverage case frame dictionary by hand.
Others have tried to construct a case frame dictionary
automatically from analyzed corpora. However, existing syntactically analyzed corpora are too small to learn a dictionary from, since case frame information consists of relations between nouns and verbs, which multiply to millions of combinations. Based on this consideration, we took an unsupervised learning strategy for Japanese case frame construction¹.
To construct a case frame dictionary from a raw corpus,
we parse the raw corpus first, but parse errors are problematic in this case. However, if we use only reliable modifier-head relations to construct the case frame dictionary, this problem can be addressed. Verb sense ambiguity is more problematic. Since verbs can have different cases and case components depending on their meanings, verbs which have different meanings should have different case frames. To deal with this problem, we collect predicate-argument examples, which are distinguished by the verb and its closest case component, and cluster them. That is, examples are not distinguished by verbs such as naru ‘make, become’ and tsumu ‘load, accumulate’, but by couples such as tomodachi ni naru ‘make a friend’, byouki ni naru ‘become sick’, nimotsu wo tsumu ‘load baggage’, and keiken wo tsumu ‘accumulate experience’. Since these couples multiply to millions of combinations, it is difficult to make a wide-coverage case frame dictionary from a small corpus such as an analyzed corpus. We, however, use a raw corpus, so this problem can be addressed. The clustering process merges examples which do not have different usages but belong to different case frames because of different closest case components.
2. VARIOUS METHODS FOR CASE FRAME
CONSTRUCTION
We employ the following procedure of case frame construction from a raw corpus (Figure 1):

1. A large raw corpus is parsed by KNP [5], and reliable modifier-head relations are extracted from the parse results. We call these modifier-head relations examples.

2. The extracted examples are distinguished by the verb and its closest case component. We call these data example patterns.

3. The example patterns are clustered based on a thesaurus. We call the output of this process example case frames, which is the final result of the system.
We call the words which compose case components case examples, and a group of case examples a case example group. In Figure 1, nimotsu ‘baggage’, busshi ‘supply’, and keiken ‘experience’ are case examples, and {nimotsu ‘baggage’, busshi ‘supply’} (of the wo case marker in the first example case frame of tsumu ‘load, accumulate’) is a case example group. A case component therefore consists of a case example and a case marker (CM).

¹In English, several unsupervised methods have been proposed [7, 1]. However, our task differs from those in that combinations of nouns and verbs must be collected in Japanese.

[Figure 1 appears here, contrasting I. examples, II. co-occurrences, III. a merged frame, example case frames, and IV. semantic case frames, using examples of tsumu ‘load, accumulate’.]
Figure 1: Several methods for case frame construction.
Let us now discuss several methods of case frame construction, as shown in Figure 1.
First, examples (I of Figure 1) can be used individually,
but this method cannot solve the sparse data problem. For
example,
(1) kuruma ni nimotsu wo tsumu
car dat-CM baggage acc-CM load
(load baggage onto the car)
(2) truck ni busshi wo tsumu
truck dat-CM supply acc-CM load
(load supply onto the truck)
even if these two examples occur in a corpus, it cannot be judged whether the expression “kuruma ni busshi wo tsumu” (load supply onto the car) is allowed or not.
Secondly, examples can be decomposed into binomial relations (II of Figure 1). These co-occurrences are utilized by statistical parsers, and can address the sparse data problem. In this case, however, verb sense ambiguity becomes a serious problem. For example,
(3) kuruma ni nimotsu wo tsumu
car dat-CM baggage acc-CM load
(load baggage onto the car)
(4) keiken wo tsumu
experience acc-CM accumulate
(accumulate experience)
from these two examples, three co-occurrences (“kuruma ni tsumu”, “nimotsu wo tsumu”, and “keiken wo tsumu”) are extracted. They, however, allow the incorrect expression “kuruma ni keiken wo tsumu” (load experience onto the car / accumulate experience onto the car).
Thirdly, examples can be simply merged into one frame (III of Figure 1). However, the information content of such a frame is equivalent to that of the co-occurrences (II of Figure 1), so verb sense ambiguity becomes a problem as well.
We distinguish examples by the verb and its closest case
component. Our method can address the two problems
above: verb sense ambiguity and sparse data.
On the other hand, semantic markers can be used as case components instead of case examples. We call these semantic case frames (IV of Figure 1). Constructing semantic case frames by hand leads to the problem mentioned in Section 1. Utsuro et al. constructed semantic case frames from a corpus [8]. There are three main differences from our approach: they use an annotated corpus, depend deeply on a thesaurus, and did not resolve verb sense ambiguity.
3. COLLECTING EXAMPLES
This section explains how to collect the examples shown in Figure 1. In order to improve the quality of the collected examples, reliable modifier-head relations are extracted from the parsed corpus.
3.1 Conditions of case components
When examples are collected, case markers, case examples, and case components must satisfy the following conditions.
Conditions of case markers
Case components which have the following case markers (CMs) are collected: ga (nominative), wo (accusative), ni (dative), to (with, that), de (optional), kara (from), yori (from), he (to), and made (to). We also handle compound case markers such as ni-tsuite ‘in terms of’, wo-megutte ‘concerning’, and others.

In addition to these cases, we introduce a time case marker. Case components which belong to the class <time> (see below) and contain a ni, kara, or made CM are merged into the time CM. This is because what matters is whether a verb deeply relates to time or not, not the distinction between surface CMs.
Generalization of case examples
Case examples which have definite meanings are generalized. We introduce the following three classes, and use these classes instead of words as case examples.

<time>
• nouns which mean time
  e.g. asa ‘morning’, haru ‘spring’, rainen ‘next year’
• case examples which contain a unit of time
  e.g. 1999nen ‘year’, 12gatsu ‘month’, 9ji ‘o’clock’
• words which are followed by the suffix mae ‘before’, tyu ‘during’, or go ‘after’ and do not have the semantic marker <place> in the thesaurus
  e.g. kaku mae ‘before ··· write’, kaigi go ‘after the meeting’
<quantity>
• numerals
e.g. ichi ‘one’, ni ‘two’, juu ‘ten’
• numerals followed by a numeral classifier², such as tsu, ko, and nin. They are expressed as pairs of the class <quantity> and a numeral classifier: <quantity>tsu, <quantity>ko, and <quantity>nin.
  e.g. 1tsu → <quantity>tsu
       2ko → <quantity>ko
<clause>
• quotations (“··· to” ‘that ···’) and expressions which function as quotations (“··· koto wo” ‘that ···’)
  e.g. kaku to ‘that ··· write’,
       kaita koto wo ‘that ··· wrote’
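For illustration, the generalization above can be sketched as a small normalizer. The word lists and suffix checks below are simplified stand-ins of ours for the thesaurus-based tests the paper uses (for instance, the <place> exclusion is omitted), and the romanized string matching is only illustrative.

```python
import re

# A simplified sketch of the generalization step on romanized case examples.
# Word lists and suffix checks are illustrative stand-ins for thesaurus lookups.

TIME_NOUNS = {"asa", "haru", "rainen"}       # 'morning', 'spring', 'next year'
TIME_UNITS = {"nen", "gatsu", "ji"}          # units of time: year, month, o'clock
TIME_SUFFIXES = {"mae", "tyu", "go"}         # 'before', 'during', 'after'
NUMERALS = {"ichi", "ni", "juu"}             # 'one', 'two', 'ten'
CLASSIFIERS = {"tsu", "ko", "nin"}           # numeral classifiers

def generalize(example: str) -> str:
    """Map a case example to <time>, <quantity>, or <quantity>+classifier;
    examples without a definite meaning are returned unchanged."""
    if example in TIME_NOUNS:
        return "<time>"
    if example in NUMERALS or example.isdigit():
        return "<quantity>"
    m = re.fullmatch(r"\d+([a-z]+)", example)
    if m:
        suffix = m.group(1)
        if suffix in TIME_UNITS:             # e.g. 1999nen, 12gatsu, 9ji
            return "<time>"
        if suffix in CLASSIFIERS:            # e.g. 1tsu -> <quantity>tsu
            return "<quantity>" + suffix
    tokens = example.split()
    if len(tokens) == 2 and tokens[1] in TIME_SUFFIXES:
        return "<time>"                      # e.g. kaigi go 'after the meeting'
    return example
```

For example, `generalize("1999nen")` yields `"<time>"` and `generalize("2ko")` yields `"<quantity>ko"`, while an ordinary noun passes through unchanged.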
Exclusion of ambiguous case components
We do not use the following case components:
• Case components which contain topic markers (TMs) or clausal modifiers do not have surface case markers, so we do not use them. For example, in

  sono giin wa ··· wo teian-shita.
  the assemblyman TM acc-CM proposed

wa is a topic marker and giin wa ‘assemblyman TM’ depends on teian-shita ‘proposed’, but there is no case marker for giin ‘assemblyman’ in relation to teian-shita ‘proposed’. In

  ··· wo teian-shiteiru giin ga ···
  acc-CM proposing assemblyman

“··· wo teian-shiteiru” is a clausal modifier and teian-shiteiru ‘proposing’ depends on giin ‘assemblyman’, but there is no case marker for giin ‘assemblyman’ in relation to teian-shiteiru ‘proposing’.
• Case components which contain a ni or de case marker are sometimes used adverbially. Since they have an optional relation to their verbs, we do not use them.
  e.g. tame ni ‘because of’,
       mujouken ni ‘unconditionally’,
       ue de ‘in addition to’
For example,
30nichi ni souri daijin ga
30th on prime minister nom-CM
sono 2nin ni
those two people dat-CM
syou wo okutta
award acc-CM gave
(On the 30th, the prime minister gave awards to those two people.)

²Most nouns must take a numeral classifier when they are quantified in Japanese. An English equivalent is ‘piece’.
from this sentence, the following example is acquired:
<time>:time-CM daijin:ga
minister:nom-CM
<quantity>nin:ni syou:wo okuru
people:dat-CM award acc-CM give
3.2 Conditions of verbs
We collect examples not only for verbs, but also for adjectives and noun+copulas³. However, when a verb is followed by a causative auxiliary or a passive auxiliary, we do not collect examples, since the case pattern is changed.
3.3 Extraction of reliable examples
When examples are extracted from automatically parsed results, the problem is that the parsed results inevitably contain errors. Thus, to decrease the influence of such errors, we discard modifier-head relations whose parse accuracies are low and use only reliable relations.
KNP employs the following heuristic rules to determine the head of a modifier:

HR1 KNP narrows the scope of the head by finding a clear boundary of clauses in a sentence. When there is only one candidate verb in the scope, KNP determines this verb to be the head of the modifier.

HR2 Among the candidate verbs, verbs which rarely take case components are excluded.

HR3 KNP determines the head according to the preference: a modifier which is not followed by a comma depends on the nearest candidate, and a modifier with a comma depends on the second nearest candidate.

Our approach trusts HR1 but not HR2 and HR3. That is, modifier-head relations which are decided by HR1 (there is only one candidate head in the scope) are extracted as examples, but relations to which HR2 or HR3 is applied are not extracted. The following examples illustrate the application of these rules.
(5) kare wa kai-tai hon wo
he TM want to buy book acc-CM
takusan mitsuketa node,
a lot found because
Tokyo he okutta.
Tokyo to sent
(Because he found a lot of books which he wants to buy, he sent them to Tokyo.)
In this example, an example which can be extracted without ambiguity is “Tokyo he okutta” ‘sent φ to Tokyo’ at the end of the sentence. In addition, since node ‘because’ is analyzed as a clear boundary of clauses, the only head candidate of hon wo ‘book acc-CM’ is mitsuketa ‘found’, and this is also extracted.
Verbs excluded from the head candidates by HR2 can still be heads, so we do not use the examples to which HR2 is applied. For example, when there is a strong verb right after an adjective, this adjective tends not to be the head of a case component, so it is excluded from the head candidates.

³In this paper, we use ‘verb’ instead of ‘verb/adjective or noun+copula’ for simplicity.
(6) Hi no mawari ga hayaku
ﬁre of spread nom-CM rapidly
sukuidase-nakatta.
could not save
(The fire spread rapidly, so φ1 could not save φ2.)
In this example, the correct head of mawari ga ‘spread’ is hayaku ‘rapidly’. However, since hayaku ‘rapidly’ is excluded from the head candidates, the head of mawari ga ‘spread’ is analyzed incorrectly.
We show an example of the process HR3:
(7) kare ga shitsumon ni
    he nom-CM question dat-CM
    sentou wo kitte kotaeta.
    lead acc-CM take answered
(He took the lead to answer the question.)
In this example, the head candidates of shitsumon ni ‘question’ are kitte ‘take’ and kotaeta ‘answered’. According to the preference “modify the nearer head”, KNP incorrectly decides that the head is kitte ‘take’. As in this example, when there are many head candidates, the decided head is not reliable, so we do not use examples in this case.
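The extraction policy amounts to keeping a relation only when HR1 alone decides it. A minimal sketch, with tuple shapes that are our own simplification rather than KNP's internal representation:

```python
# A minimal sketch of the reliable-example extraction policy: a modifier-head
# relation is kept only when clause-boundary narrowing (HR1) leaves exactly one
# head candidate; anything HR2 or HR3 would have to decide is discarded.
# The tuple shapes here are illustrative, not KNP's internal representation.

def reliable_head(candidates):
    """Return the head only when it is unambiguous within the clause scope."""
    if len(candidates) == 1:
        return candidates[0]      # HR1 alone decides -> reliable
    return None                   # HR2/HR3 would be needed -> discard

def extract_examples(relations):
    """relations: (case_component, head_candidates_in_scope) pairs."""
    return [(comp, reliable_head(cands))
            for comp, cands in relations
            if reliable_head(cands) is not None]
```

Under this policy, hon wo in sentence (5) has the single candidate mitsuketa and is kept, while shitsumon ni in sentence (7) has two candidates and is dropped.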
We extracted reliable examples from the Kyoto University Corpus [6], which is a syntactically annotated corpus, and evaluated their accuracy. The accuracy of all the case examples which have the target cases was 90.9%, and the accuracy of the reliable examples was 97.2%. Accordingly, this process is very effective.
4. CONSTRUCTION OF EXAMPLE CASE
FRAMES
As shown in Section 2, when examples whose verbs have different meanings are merged, a case frame which allows incorrect expressions is created. So, for verbs with different meanings, different case frames should be acquired.

In most cases, the important case component which decides the sense of a verb is the one closest to the verb; that is, the verb sense ambiguity can be resolved by coupling the verb and its closest case component. Accordingly, we distinguish examples by the verb and its closest case component. We call the case marker of the closest case component the closest case marker.

The number of example patterns of a verb is equal to the number of its distinct closest case components. That is, example patterns which have almost the same meaning are handled individually, as follows:
(8) jugyoin:ga kuruma:ni
worker:nom-CM car:dat-CM
nimotsu:wo tsumu
baggage:acc-CM load
(9) {truck,hikoki}:ni
{truck,airplane}:dat-CM
busshi:wo tsumu
supply:acc-CM load
In order to merge example patterns that have almost the same meaning, we cluster example patterns. The final example case frames consist of the example pattern clusters. The details of the clustering are described in the following section.

[Figure 2 appears here, showing the similarity calculation between the two example patterns of tsumu in (8) and (9): the similarities between case example groups, the ratio of common cases, and the resulting similarity between the example patterns.]
Figure 2: Example of calculating the similarity between example patterns (numerals in the lower right of examples represent their frequencies).
4.1 Similarity between example patterns
The clustering of example patterns is performed using the similarity between example patterns. This similarity is based on the similarities between case examples and the ratio of common cases. Figure 2 shows an example of calculating the similarity between example patterns.
First, the similarity between two case examples e_1, e_2 is calculated using the NTT thesaurus as follows:

  sim_e(e_1, e_2) = \max_{x \in s_1, y \in s_2} sim(x, y)

  sim(x, y) = \frac{2L}{l_x + l_y}

where x, y are semantic markers, and s_1, s_2 are the sets of semantic markers of e_1, e_2 respectively⁴. l_x, l_y are the depths of x, y in the thesaurus, and the depth of their lowest (most specific) common node is L. If x and y are in the same node of the thesaurus, the similarity is 1.0, the maximum score under this criterion.
Next, the similarity between two case example groups E_1, E_2 is the normalized sum of the similarities of their case examples:

  sim_E(E_1, E_2) = \frac{\sum_{e_1 \in E_1} \sum_{e_2 \in E_2} \sqrt{|e_1||e_2|} \; sim_e(e_1, e_2)}{\sum_{e_1 \in E_1} \sum_{e_2 \in E_2} \sqrt{|e_1||e_2|}}

where |e_1|, |e_2| represent the frequencies of e_1, e_2 respectively.
The ratio of common cases of example patterns F_1, F_2 is calculated as follows:

  cs = \sqrt{\frac{\sum_{i=1}^{n} |E_{1cc_i}| + \sum_{i=1}^{n} |E_{2cc_i}|}{\sum_{i=1}^{l} |E_{1c1_i}| + \sum_{i=1}^{m} |E_{2c2_i}|}}

⁴In many cases, nouns have many semantic markers in the NTT thesaurus.
where the cases of example pattern F_1 are c1_1, c1_2, ···, c1_l, the cases of example pattern F_2 are c2_1, c2_2, ···, c2_m, and the common cases of F_1 and F_2 are cc_1, cc_2, ···, cc_n. E_{1cc_i} is the case example group of cc_i in F_1; E_{2cc_i}, E_{1c1_i}, and E_{2c2_i} are defined in the same way. The square root in this equation decreases the influence of the frequencies.
The similarity between F_1 and F_2 is the product of the ratio of common cases and the similarities between the case example groups of the common cases of F_1 and F_2:

  score = cs \cdot \frac{\sum_{i=1}^{n} \sqrt{w_i} \; sim_E(E_{1cc_i}, E_{2cc_i})}{\sum_{i=1}^{n} \sqrt{w_i}}

  w_i = \sum_{e_1 \in E_{1cc_i}} \sum_{e_2 \in E_{2cc_i}} \sqrt{|e_1||e_2|}

where w_i is the weight of the similarity between case example groups.
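As a concrete sketch, the formulas above can be implemented directly. Here a semantic marker is represented as a path from the thesaurus root, so depths and the lowest common node can be read off; this toy representation and the example-pattern layout are our own assumptions (the paper uses the NTT thesaurus).

```python
import math

# Sketch of the Section 4.1 similarity. A semantic marker is a path from the
# thesaurus root, e.g. ("concrete", "vehicle", "car"); a case example is a
# tuple of such paths (its set of markers); a case example group maps case
# examples to frequencies; an example pattern maps a case marker to a case
# example group. All of this layout is our own assumption.

def sim_markers(x, y):
    """sim(x, y) = 2L / (l_x + l_y): L is the depth of the lowest common node."""
    L = 0
    for a, b in zip(x, y):
        if a != b:
            break
        L += 1
    return 2.0 * L / (len(x) + len(y))

def sim_e(s1, s2):
    """Similarity of two case examples: the best pair of their markers."""
    return max(sim_markers(x, y) for x in s1 for y in s2)

def sim_E(E1, E2):
    """Similarity of two case example groups, weighted by sqrt(|e1||e2|)."""
    num = den = 0.0
    for e1, f1 in E1.items():
        for e2, f2 in E2.items():
            w = math.sqrt(f1 * f2)
            num += w * sim_e(e1, e2)
            den += w
    return num / den

def score(F1, F2):
    """Similarity of two example patterns: cs times the weighted sim_E sum."""
    common = set(F1) & set(F2)
    total = sum(f for F in (F1, F2) for E in F.values() for f in E.values())
    shared = sum(f for c in common for E in (F1[c], F2[c]) for f in E.values())
    cs = math.sqrt(shared / total)                      # ratio of common cases
    num = den = 0.0
    for c in common:
        w_i = sum(math.sqrt(f1 * f2)
                  for f1 in F1[c].values() for f2 in F2[c].values())
        num += math.sqrt(w_i) * sim_E(F1[c], F2[c])
        den += math.sqrt(w_i)
    return cs * num / den
```

With the Figure 2 patterns, the ga case of (8) has no counterpart in (9), so only the ni and wo case example groups contribute to the weighted sum.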
4.2 Selection of semantic markers of example
patterns
The similarities between example patterns are deeply influenced by the semantic markers of the closest case components. So, when the closest case components have semantic ambiguities, a problem arises. For example, when clustering the example patterns of awaseru ‘join, adjust’, the pair of example patterns (te ‘hand’, kao ‘face’)⁵ is created with the common semantic marker <part of an animal>, and (te ‘method’, syouten ‘focus’) is created with the common semantic marker <logic, meaning>. From these two pairs, the pair (te ‘hand’, kao ‘face’, syouten ‘focus’) is created, though <part of an animal> is not similar to <logic, meaning> at all.

To address this problem, we select one semantic marker for the closest case component of each example pattern, in order of the similarity between example patterns, as follows:
1. In order of the similarity of a pair (p, q) of two example patterns, we select semantic markers of the closest case components n_p, n_q of p, q. The selected semantic markers s_p, s_q maximize the similarity between n_p and n_q.

2. The similarities of example patterns related to p, q are recalculated.

3. These two processes are iterated while there are pairs of two example patterns whose similarity is higher than a threshold.
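The selection can be sketched as follows. For brevity, this sketch performs a single pass in decreasing similarity order rather than the full recalculate-and-iterate loop of steps 2 and 3, and the data shapes are our own.

```python
# A one-pass sketch of the semantic-marker selection: pairs are visited in
# decreasing order of example-pattern similarity, and each pattern's closest
# case component gets the marker that maximizes the pair's marker similarity.
# The full method recalculates similarities and iterates; we omit that here.

def select_markers(pairs, sim_markers):
    """pairs: (p, markers_of_p, q, markers_of_q, similarity) tuples."""
    selected = {}
    for p, s_p, q, s_q, _ in sorted(pairs, key=lambda t: -t[4]):
        # choose the marker pair with the highest marker similarity
        x, y = max(((x, y) for x in s_p for y in s_q),
                   key=lambda xy: sim_markers(*xy))
        selected.setdefault(p, x)   # a stronger pair's choice stays fixed
        selected.setdefault(q, y)
    return selected
```

In the awaseru example, te first pairs with kao under <part of an animal>, and the later, weaker pair with syouten can no longer override that choice, so the inconsistent three-way cluster is avoided.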
4.3 Clustering procedure
The followingis the clustering procedure:
1. Elimination of example patterns which occur infrequently
The target example patterns of the clustering are those whose closest case components occur more frequently than a threshold. We set this threshold to 5.

⁵Example patterns are represented by their closest case components.
2. Clustering of example patterns which have the same closest CM
(a) Similarities between pairs of two example patterns which have the same closest CM are calculated, and the semantic markers of the closest case components are selected. These two processes are iterated as mentioned in Section 4.2.
(b) Each example pattern pair whose similarity is higher than some threshold is merged.

3. Clustering of all the example patterns
The example patterns output by step 2 are clustered. In this phase, it is not considered whether the closest CMs are the same or not. The following example patterns have almost the same meaning, but they are not merged by step 2 because of their different closest CMs. This clustering can merge these example patterns.
(10) {busshi,kamotsu}:wo
{supply,cargo}:acc-CM
truck:ni tsumu
truck:dat-CM load
(11) {truck,hikoki}:ni
{truck,airplane}:dat-CM
{nimotsu,busshi}:wo tsumu
{baggage,supply}:acc-CM load
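A greedy version of the whole procedure might look like the sketch below; the pattern representation, the `sim` callback, and the greedy best-pair-first strategy are our simplifications, not the paper's exact algorithm.

```python
# A greedy sketch of the clustering procedure. An example pattern is a dict
# {"freq": closest-component frequency, "cases": {CM: {example: freq}}};
# `sim` is a similarity function over two such patterns (Section 4.1).

FREQ_THRESHOLD = 5     # step 1: drop patterns with rare closest case components
SIM_THRESHOLD = 0.80   # merge threshold actually used in Section 6

def merge(p, q):
    """Union the case example groups of two patterns, summing frequencies."""
    cases = {cm: dict(group) for cm, group in p["cases"].items()}
    for cm, group in q["cases"].items():
        slot = cases.setdefault(cm, {})
        for ex, f in group.items():
            slot[ex] = slot.get(ex, 0) + f
    return {"freq": p["freq"] + q["freq"], "cases": cases}

def cluster(patterns, sim):
    """Repeatedly merge the most similar pattern pair above the threshold."""
    patterns = [p for p in patterns if p["freq"] >= FREQ_THRESHOLD]
    while True:
        best, pair = SIM_THRESHOLD, None
        for i in range(len(patterns)):
            for j in range(i + 1, len(patterns)):
                s = sim(patterns[i], patterns[j])
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            return patterns
        i, j = pair
        merged = merge(patterns[i], patterns[j])
        patterns = [p for k, p in enumerate(patterns) if k not in (i, j)]
        patterns.append(merged)
```

The loop terminates because every merge reduces the number of patterns by one; phase 2 corresponds to calling `cluster` per closest CM, and phase 3 to one final call over all surviving patterns.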
5. SELECTION OF OBLIGATORY CASE
MARKERS
If the frequency of a CM is much lower than that of the other CMs, it might have been collected because of parsing errors, or it might have little relation to its verb. So, we set the threshold for the CM frequency to 2√mf, where mf is the frequency of the most frequent CM. If the frequency of a CM is less than this threshold, it is discarded. For example, suppose the most frequent CM of a verb is wo, occurring 100 times, and the frequency of the ni CM of the verb is 16; the ni CM is discarded, since 16 is less than the threshold, 20.

However, since we can say that all verbs have a ga (nominative) CM, ga CMs are not discarded. Furthermore, if an example case frame does not have a ga CM, we supplement its ga case with the semantic marker <person>.
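This selection step is easy to state in code. The 2√mf threshold and the <person> supplement follow the text; the function name and dict layout are our own.

```python
import math

# Obligatory-CM selection (Section 5): discard CMs rarer than 2*sqrt(mf),
# where mf is the frequency of the most frequent CM; ga is always kept, and
# a missing ga slot is supplemented with the semantic marker <person>.
# The function name and dict layout are our own.

def select_case_markers(cm_freqs):
    mf = max(cm_freqs.values())
    threshold = 2 * math.sqrt(mf)
    kept = {cm: f for cm, f in cm_freqs.items()
            if cm == "ga" or f >= threshold}
    if "ga" not in kept:
        kept["ga"] = "<person>"    # supplemented marker instead of a frequency
    return kept
```

With wo observed 100 times and ni 16 times, the threshold is 20, so ni is discarded and the missing ga slot is supplemented.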
6. CONSTRUCTED CASE FRAME DICTIO-
NARY
We applied the above procedure to the Mainichi Newspaper Corpus (9 years, 4,600,000 sentences). We set the clustering threshold to 0.80. The criterion for setting this threshold is that case frames which have different case patterns or different meanings should not be merged into one case frame. Table 1 shows examples of the constructed example case frames.

From the corpus, example case frames of 71,000 verbs were constructed; the average number of example case frames per verb is 1.9; the average number of case slots per verb is 1.7; the average number of example nouns in a case slot is 4.3. The clustering led to a 47% decrease in the number of example case frames.
Table 1: Examples of the constructed case frames (* marks the closest CM).

verb                 CM         case examples
kau1 'buy'           ga         person, passenger
                     wo*        stock, land, dollar, ticket
                     de         shop, station, yen
kau2                 ga         treatment, welfare, postcard
                     wo*        anger, disgust, antipathy
  ...
yomu1 'read'         ga         student, prime minister
                     wo*        book, article, newspaper
yomu2                ga         <person>
                     wo         talk, opinion, brutality
                     de*        newspaper, book, textbook
yomu3                ga         <person>
                     wo*        future
  ...
tadasu1 'examine'    ga         member, assemblyman
                     wo*        opinion, intention, policy
                     ni-tsuite  problem, <clause>, bill
tadasu2 'improve'    ga         chairman, oneself
                     wo*        position, form
  ...
kokuchi1 'inform'    ga         doctor
                     ni*        the said person
kokuchi2             ga         colleague
                     wo*        infection, cancer
                     ni*        patient, family
sanseida1 'agree'    ga         <person>
                     ni*        opinion, idea, argument
sanseida2            ga         <person>
                     ni*        <clause>
As shown in Table 1, example case frames of noun+copulas such as sanseida ‘positiveness+copula (agree)’, and of compound case markers such as ni-tsuite ‘in terms of’ for tadasu ‘examine’, are acquired.
7. EXPERIMENTS AND DISCUSSION
Since it is hard to evaluate the dictionary statically, we use the dictionary in case structure analysis and evaluate the analysis results. We used 200 sentences of the Mainichi Newspaper Corpus as a test set. We analyzed the case structures of the sentences using the method proposed by [4]. As the evaluation of the case structure analysis, we checked whether the cases of ambiguous case components (topic markers and clausal modifiers) are correctly detected or not. The evaluation results are presented in Table 2. The baseline is the result of assigning a vacant case in the order of ga, wo, and ni. When we do not consider parsing errors in evaluating the case detection, the accuracy of our method for topic markers was 96% and that for clausal modifiers was 76%. The baseline accuracy for topic markers was 91% and that for clausal modifiers was 62%. Thus our method is superior to the baseline.
Table 2: The accuracy of case detection.

                          correct case   incorrect case   parsing
                          detection      detection        error
our method
    topic marker          85             4                13
    clausal modifier      48             15               2
baseline
    topic marker          81             8                13
    clausal modifier      39             24               2
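The reported percentages follow from the counts in Table 2 when the sentences lost to parsing errors are excluded from the denominator:

```python
# Reproducing the accuracies quoted above from the Table 2 counts,
# with parsing errors excluded from the denominator.

def accuracy(correct, incorrect):
    return correct / (correct + incorrect)

# our method
assert round(accuracy(85, 4), 2) == 0.96    # topic markers
assert round(accuracy(48, 15), 2) == 0.76   # clausal modifiers
# baseline
assert round(accuracy(81, 8), 2) == 0.91    # topic markers
assert round(accuracy(39, 24), 2) == 0.62   # clausal modifiers
```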
The following are examples of the analysis results⁶:

(1) ookurasyo wa (©ga) ginko ga tsumitate-teiru ryuhokin (©wo) no torikuzushi wo mitomeru houshin (×ni)† wo kimeta.
    the Ministry of Finance TM / bank nom-CM / deposit / reserve fund of / consume acc-CM / consent / policy acc-CM / decide
(The Ministry of Finance decided the policy of consenting to consume the reserve fund which the banks have deposited.)

(2) korera no gyokai wa (×wo)‡ seijiteki hatsugenryoku ga tsuyoi toiu tokutyo ga aru.
    these / industry TM / political / voice nom-CM / strong / characteristic nom-CM / have
(These industries have the characteristic of strong political voice.)
Analysis errors are mainly caused by two phenomena. The first is clausal modifiers which have no case relation to their modifiees, such as “··· wo mitomeru houshin” ‘policy of consenting ···’ († above). The second is verbs which take two ga ‘nominative’ case markers (one of which is wa superficially), such as “gyokai wa ··· toiu tokutyo ga aru” ‘industries have the characteristic of ···’ (‡ above). Handling these phenomena is an area of future work.
8. CONCLUSION
We proposed an unsupervised method to construct a case frame dictionary by coupling the verb and its closest case component. We obtained a large case frame dictionary, which covers 71,000 verbs. Using this dictionary, we can detect the cases of ambiguous case components accurately. We plan to exploit this dictionary for anaphora resolution in the future.
9. ACKNOWLEDGMENTS
The research described in this paper was supported in part by JSPS-RFTF96P00502 (The Japan Society for the Promotion of Science, Research for the Future Program).
⁶Words marked with © are correctly analyzed; those marked with × are not. The detected CMs are shown with the marks.
10. REFERENCES
[1] T. Briscoe and J. Carroll. Automatic extraction of subcategorization from corpora. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 356–363, 1997.
[2] S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo, H. Nakaiwa, K. Ogura, Y. Ooyama, and Y. Hayashi, editors. Japanese Lexicon. Iwanami Publishing, 1997.
[3] Information-Technology Promotion Agency, Japan. Japanese Verbs: A Guide to the IPA Lexicon of Basic Japanese Verbs. 1987.
[4] S. Kurohashi and M. Nagao. A method of case structure analysis for Japanese sentences based on examples in case frame dictionary. IEICE Transactions on Information and Systems, E77-D(2), 1994.
[5] S. Kurohashi and M. Nagao. A syntactic analysis method of long Japanese sentences based on the detection of conjunctive structures. Computational Linguistics, 20(4), 1994.
[6] S. Kurohashi and M. Nagao. Building a Japanese parsed corpus while improving the parsing system. In Proceedings of the First International Conference on Language Resources & Evaluation, pages 719–724, 1998.
[7] C. D. Manning. Automatic acquisition of a large subcategorization dictionary from corpora. In Proceedings of the 31st Annual Meeting of ACL, pages 235–242, 1993.
[8] T. Utsuro, T. Miyata, and Y. Matsumoto. Maximum entropy model learning of subcategorization preference. In Proceedings of the 5th Workshop on Very Large Corpora, pages 246–260, 1997.
