Japanese Case Structure Analysis 
by Unsupervised Construction of a Case Frame Dictionary 
Daisuke Kawahara, Nobuhiro Kaji and Sadao Kurohashi 
Graduate School of Intbrm~tics, Kyoto University 
Yoshida-Honmachi, S~kyo-ku, Kyoto, 606-8501, Japan 
{kawahara, kaj i, kuro }@p inc. kuee. kyoto-u, ac. jp 
Abstract 
In Japanese, case structure analysis is very im- 
t)ortant to handle several troublesome charac- 
teristics of Japanese snch as scrambling, onfis- 
sion of ease components, mid disappearance of 
case markers. However, fi)r lack of a wide- 
coverage ease frame dictionary, it has been dif- 
ficult to perfornl case structure analysis accu- 
rat;ely. Although several methods to construct 
a ease fl'mne dictionary from analyzed corpora 
have been proposed, they cannot avoid data 
sparseness 1)rol)lem. This paper proposes an un- 
supervised method of constructing a case frame 
dictionary from an enormous raw corpus by us- 
ing a robust and accurate parser. It also pro- 
rides a case structure analysis method based on 
the constructed dictionary. 
1 Introduction 
Syntactic analysis, or parsing has been a main 
objective in Natural Language Processing. In 
case of Jat)anese , however, syntactic analysis 
cannot clarify relations between words ill sen- 
tences because of several troublesome character- 
istics of Japanese such as scrambling, omission 
of case components, and disappearance of case 
markers. Therefore, in Japanese sentence analy- 
sis, case structure analysis is an important issue, 
and a case frame dictionary is necessary for the 
analysis. 
Some research institutes have constructed 
Japanese case frmne dictiouaries manually (Ike- 
hara et al., 1997; Infbrmation-Technology Pro- 
motion Agency, Japan, 1987). However, it is 
quite expensive, or almost impossible to con- 
struct a wide-coverage ease fl'anm dictionary by 
hand. 
Others have tried to construct a case fl'mne 
dictionary automatically from analyzed corpora 
(Utsuro et al., 1998). However, existing syntac- 
tically analyzed corpora are too small to learn a 
dictionary, since case fl'ame iuformation consists 
of relations between nouns and verbs, which rnul- 
tiplies to millions of combinations. 
Based on such a consideration, we took the 
fbllowing unsupervised learning strategy to the 
.Japanese case structure analysis: 
1. At first, a robust and accurate parser is de- 
veloped, which does not utilize a case fl'mne 
dictionary, 
2. a very large corI)us is parsed by the parser, 
3. reliable noun-verb relations are extracted 
from the parse results, and a case frmne dic- 
tionary is constructed from them, and 
4. the dictionary is utilized for case structure 
analysis. 
2 Characteristics of Japanese 
 and necessity of case 
structure analysis 
In Japanese, postpositions function as case 
markers ((Ms) mid a verb is final in a sentence. 
The basic structure of a Japanese sentence is as 
fbllows: 
(1) kate 9a coat wo ki~'u. 
he nominative-CM coat accusative-CM wear 
(lie wears a coat) 
A clause modifier is left to the modified noun 
as follows: 
(2) kate 9 a kite-iru coat 
lie nom-CM wear coat 
(the coat he wears) 
The modified noun followed by a postposition 
then becomes a case component of a matrix verb. 
The typical structure of a Japanese complex sen- 
tence is as fbllows: 
432 
(3) boush, i no irv wa kitc-ir'u 
hat of color tol)ic-marker wear 
coat ni a'wa~'cr"~,. 
coal; dative-CM harmonize 
(c/) harmonizes the color of his/her hat with 
the coat he/she wears) 
In terms of autolnatic analysis, the problen> 
atic characteristics of Japanese sentences can be 
summarized as follows: 
1. Case componenl;s are often scrambled or 
omitted. 
2. Case-marking postpositions disappear when 
case components are accompanied by topic- 
markers or other special 1)ostpositions 
meaning 'just', 'also' and others. 
cx) karv 'wa coat me ti:itc-iv'u. 
he tol}iC-lna.rker coat also wear 
(Ile wears a coat a.lso) 
3. A noun modified 1)y a clause is usually a case 
component for the verb of the mo(litlying 
clause. However, there is no case-marker for 
their relation. In case of sentence 3, there is 
no case-marker for coat in relation to kite- 
ir'u 'wear'. Note that 'hi (dative-CM) of coat: 
ni does not show the case to kitc-ivu 'wear', 
lint to awascr'v, 'harmonize'. 
4. Sentence 3 exhibits a typical structural am- 
1)iguity in a ,lalmnese sentence. That is, 
ir'o "~va 'color topic-marker' possit)ly modi- 
ties kit, c-iru 'wear' or awa.scv'u qmrmonize'. 
In English, sentence structure is rather rigid, 
and word order (the position in relation to the 
verb) clearly defines cases. In Japanese, how- 
ew% the l)roblem 1 above makes word order use- 
less, and CMs constitute the only int'ormation for 
detecting cases. 
Nevertheless, CMs often disapl)ear because of 
the problems 2 and 3, whidl means that sim- 
ple syntactic analysis cmmot clari(5~ cases sui\[i- 
cientl> For eXalnple, given an inlmt sentence: 
(4) har'c wa Dcv, tsch,-go me hano, sv,. 
he topic-marker Germall also sl)eak 
(he speaks German also) 
a simple syntactic analysis just detects both kar'c 
'he' and Dcutsch-go 'German' modifies \]ta'aas~t 
'speak', but tells nothing about which is subject 
and object. This analysis result is not sufficient 
for subsequent NLP applieations like Japanese 
to English machine l;rmmlation. 
Then, what we need to do is a case structure. 
analysis based on a case fl'ame dictiolmry, or a 
subcat, of each verb as follows: 
hanasu 'speak': 
ga (nora) ks're 'he', hire 'person' 
'we (ace) cigo 'English', kotoba '' 
~;i?'U, ~%vear': 
ga (nora) kavc qm', hil, o 'person' 
we (ace) fuhu 'cloth', coat 'coat' 
a'tlJasel'~t 'harlllonize': 
ga (nora) kar'c 'he', hito 'person' 
'we (ace) ire ~color' 
ni (dat) fltku 'cloth' 
Consultation of such a dictionary can easily find 
that kar'c 'he' is a nomilmtive case and Dev, tsch,- 
90 'German' is an accusative (:as(', in the sentence 
4. 
Furthermore, a (:ase frame dictionary Call solve 
the problem 4 above, that is, some part of struc- 
tural ambiguity in sentences. In case of sentence 
3, a t)r()l)er head for 'ir'o wa 'color topic-marker' 
(:all })e selected by consulting case slots of kir'u 
~wear' and those of a'wascru 'harmonize'. 
3 Unsupe, rvised construction of a 
case fralne dictionary 
This s(x:tion explains how to construct a case 
fralll(*, dictionary fl'om corl)ora autonmtica.lly. 
As mentioned in the introduction section, it; 
is quite expensive, or ahnost ilnl)ossible to con- 
struct a wide-coverage case frame dictionary by 
lmnd. In Japanese,, some noun q- copula works 
like an adjective. For example, sa~tsei da 'posi- 
tiveness + Colmla' can take 9a case and 'hi case. 
However, such case frames are rarely covered t)y 
the existing handmade dictionaries 1. 
Fm'thermore, existing halldmade dictionaries 
cover typical obligatory cases like ga (nomina- 
tive), wo (accusative), ni (dative), but do not 
cover compound case markers such as ni-kandz.itc 
'in terms of', 'wo-rncqutte 'concerning' and oth- 
ers. 
Then, we tried to construct an example-based 
case frmne dictionary from corpora, which de- 
lOut method collects case frames not only tbr verbs, 
but also tbr adjectives mM nouns-kcopula. In this paper, 
we use 'verb' instead of 'w;rb/adjective. or llOllll -{- copula.' 
for simplicity. 
433 
Table 1: The accuracy of KNP. 
'wa~ 7tto clause clause 
9 a 'we ni ka'r'a rr~.ade ?lori topic- modif~ying modifying 
noln. ace. dative from to from marker verbs nouIIS 
'lbtal 
91.2% 97.7% 94.2% 83.8% 85.3% 82.8% 88.0% 84.3% 95.5% 91.3% 
scribes what kind of cases each verb has and 
what kind of nouns can fill a case slot. Very large 
syntactically analyzed corpora could be useful to 
construct such a dictionary. However, corpus an- 
notation costs very much and existing analyzed 
corpora are too small from the view point of case 
frame learning. For exmnple, in Kyoto Univer- 
sity Corpus which consists of about 40,000 ana- 
lyzed sentences of newspaper articles, very basic 
verbs like tetsudau 'help' or v, ketsv, kcr'u 'accept' 
appear only 10 times or 15 times respectively. It 
is obvious that such small data are insufficient 
for automatic case frmne learning. That is, case 
frame learning must be done from enormous un- 
analyzed corpora, in unsupervised way 2. 
3.1 Good parser 
NLP research group at Kyoto University has 
been developing a robust and accurate parsing 
system, KNP, over the last ten yem's (Kurohashi 
and Nagao, 1994; Kurohashi and Nagao, 1998). 
This parser has the following advantages: 
• .Japanese is an agglutinative , and 
several Nnction words (auxiliary verbs, suf- 
fixes, and postpositions) often appear to- 
gether and in many cases compositionality 
does not hold among them. KNP treats 
such function words careflflly and precisely. 
• KNP detects scopes of coordination struc- 
tures well based on their parallelism. 
• KNP employs several heuristic rules to pro- 
duce mfique parses for the input sentences. 
The accuracy of KNP is shown in Table 1, 
which counted whether each phrase modifies a 
proper head or not. The overall accuracy was 
around 90%, and the accuracy concerning case 
components varies from 82% to 98%. 
21n English, several unsupervised methods have been 
proposed (Manning, 1993; Briscoe and Carroll, 19!)7). 
However, as mentioned in Section 3, automatic Japanese 
case analysis is much harder than English. 
We can collect pairs of verbs and case compo- 
nents from the automatic analyses of large cor- 
pora by KNP. 
3.2 Coping with two problems 
The quality of automatic case frame learning 
could be negatively influenced by the %llowing 
two problems: 
Word sense ambiguity: A verb sometimes 
has w~rious usages and possibly has several 
case frames depending on its usages. 
Structural ambiguity: KNP performs fairly 
well, but automatic parse results inevitably 
contt~in errors. 
The tbllowing sections explain how to solve 
these problems. 
3.2.1 Word sense ambiguity 
If a verb has two or more meanings and their 
case fl'ame patterns differ, we htwe to disam- 
biguate the sense of each occurrence of the verb 
in a corpus first, and collect case components for 
each sense respectively. However, unsupervised 
word sense disambiguation of fl'ee texts is one of 
the most ditficult problems in NLP. At the very 
begimfing, even the definition of word senses is 
open to question. 
To cope with this problem, we made a very 
simple but usefltl assumption: a light verb has 
diffbrent case frames det)ending on its main case 
component; an ordinary verb has a unique case 
frmne even if it has two or more meanings. For 
example, the case frmne of the verb narn 'be- 
come' differs depending on its ni (dative) case 
as %llows: 
... ga b?}ouki ni na'ru 
nora. become ill 
• .. ga ... to tornodachi ni na'r"u 
nora. with become a fliend 
In most cases, the main case components are 
placed just in front of the light verbs so that 
the automatic parser can detect their relations 
434 
Tal/le 2: EXmnl)les of the constructed ease frames. 
verl)s 
t, aS?t\]gCl"~l, 
'help' 
yomu 
l'ead' 
case lnarkel"s 
(1,o111) 
,,,,o (it(:(:) 
'r~,i (dat) 
ae (op) 
.qa (no\]n) 
'wo (at(;) 
hi, (dat) 
& (o10 
example nomls 
husband, person, child, staff, I, SUSl)eet, faculty, ... 
.jol), shol) , farmwork, preparation, election, move, ... 
son, friend, ambassador, meml~er, thank, holid~\y, ... 
volunte(,r, aft'air, otfice, rewar(l, house, headquarters, ... 
lX;rson, \], chihl, adult, parent, teacher, ... 
newspaper, book, magazine, article, nov(J, letter, ... 
chiht, person, daughter, teacher, student, reader, ... 
newspaper, book, magazine, library, classroom, bathroom, ... 
reliably. Therefore, as for five major and trou- 
t)lesome light verbs (.~'.,r'u 'do', 'nwr'u, q)ceomo?, 
ar'u 'is ...', iu ~s~w', nai 'not'), their case fl'mnes 
are distinguished depending (m their left neigh- 
bouring case components. For other verbs, we 
aSStlllle a \]lnique ease frame. 
3.2.2 Structural ambiguity 
As shown in '_\['~dfle 1, KNP detects heads of case 
conlt~onents in faMy high accuracy. However, 
in order to collect nmch reliable data, we dis- 
carded moditier-hcad relations in the aul;onmti- 
tally Imrsed corpora in the following cases: 
• When CMs of ease conqxments disappear 
because oi" topic markers or others. 
• When the verb is followed 1)y a causative 
auxiliary o1' a passive auxiliary, l;he case tm.t- 
t(:rn is e\]mnged and the 1;race in KNI' is not 
so rclial)le. 
Based on the conditions al)ove, case compo- 
nents of each verb are collected froln the 1)arscd 
corpora, and the collected data arc considered as 
case frames of verbs. However, if the flcquency 
of a CM is very low compared to other CMs, it 
might t)e collected because of parse errors. So, 
we set the threshold for the CM flequency as 
2~, where m.f means the frequency of the 
1nest folln(t ChJ. if the fl'equeney of ~t CM is less 
tlmn the threshold, it is discarded, l.~br exalnple, 
suppose the most frequent CM fin' a verb is we, 
100 times, and the frequency of ni CM tbr the 
verb is 1.6, ni CM is discarded (since it is less 
than the threshold, 20). 
a.3 Constructed case frmne dictionary 
We applied the al)ow', procedure to Mainichi 
Newst)al)er Corpus (7 years, 3,600,00(} sen- 
tences). Fronl the cortms , case franws of 23,497 
verbs are constructed; the average number of 
ease slots of a verb is 2.8; the average munber 
of cxanqflc nouns in a (:as(: slot is 33.6. Table 2 
shows exmnlfles of constructed ease Dames. 
Although the constructed data look apl)ropri- 
ate in most cases, it is hard to evaluate a (lictio- 
nary statica.ll> In the next section, we use the 
dictiomu'y in case structure analysis and eval- 
uate the analysis result, wlfich also im\])lies an 
cvahu~.ti(m of the dictionary itself. 
4 Case structure analysis using the 
constructed case frame dictionary 
4.1. Matching of an input sentence and 
a case frallle 
'Jl~e basic 1)ro(:cdure in ('ase strucl;ul"e analysis is 
lo match an inlml sentence with a case frame, 
aS show11 ill lqgUl'C, 1. 
The matching of case conq)onenl:s in an input 
and case slots in a case fl'alllO is (\[Olle Oll the 
following conditions: 
I. When a ease component has a CM, it must 
be assigned to 1;11o case slot with the same 
CM. 
. When a case COml)Onent does nol: have a 
CM, it can 1)e assigned to the 9a, we, or ni 
CM slot. 
. ()nly one case component can be assigned 
to a case slot (unique case assiglmmnt con- 
straint). 
The conditions above may produce nmltil)le 
matching patterns, and to select the proper one 
alllOng {,llclll, 11Oll118 of case COlllpon(',lltS al'o COlll- 
pared with examph',s in case slots of the (tictio- 
nary. 
435 
syorui wa . 
(5) document topic-marker / 
ka,'e .i .___1 
! 
,,e 1 
~" walashila 
" hand 
\[ (1 handed the document to him.) 
WaRlSU 'hand' 
ga defendam, president .... 
we money, nlelllO, bribe .... 
ni person, suspect, ... 
de affair, office, room .... 
(6) Deutsch-go me 
/ Gcl'nlan also q 
hHII(ISII 
speak "-7 
{';cq/;i'i;ir 
a teacher who speaks also German) 
~ professor, president 1 
ni person, friend .... 
-- we reason English Japanese 
to (sentence) 
Figure 1: Matching of an inl)ut sentence and a case fl:ame. 
Even though a 3,600,000 sentences corpus was 
used for learning, examples in case slots are still 
sparse, and an input noun mostly does not match 
exactly an example in the dictionary. Then, a 
thesaurus is employed to solve this problem. 
In our experiments, NTT Semantic Feature 
Dictionary (Ikehara et al., 1997) is employed as 
a thesaurus. Suppose we calculate the silnilar- 
ity between Wl and w2, their depth is dl and d2 
in the thesaurus, and the depth of their lowest 
(most specitic) common node is de, the similarity 
score between them is calculated as follows: 
= (4 × + 
If W 1 and w2 are in the same node of the the- 
saurus, the similarity is 1.0, the maximum score 
based on this criteria. If Wl and w2 are identical, 
the similarity is 1.0, of course. 
The score of case assigmnent is the best sim- 
ilarity between the input noun and examples in 
the case slots. The score of a matching pattern 
is the sum of scores of case assignments in it. If 
two or more patterns meet the above conditions, 
one which has the best score is selected as a final 
result. 
In the case of sentence 5 in Figure 1, karc 7ti 
'he dativc-CM' is assigned to the ni case slot. 
Then, syorui wa 'document topic-marker' can be 
assigned to the ga or wo case slot. By calculating 
similarity between syorui and 9a-slot examples 
and wo-slot exmnples, it; is considered to be as- 
signed to the wo slot. 
In case of sentence 6, none of the case compo- 
nents has a CM. Based on similarity calculation, 
Deutsch,-go is assigned to 'wo, sensei is assigned 
to ga. 
4.2 Parsing with case structure analysis 
A complex sentence which contains a clausal 
modifier exhit)its a typical structural ambiguity 
of Japanese; case components left to a verb of 
a clausal modifier, Vc, possibly modify V~: or a 
matrix verb Vm. 
For example, in sentence 3, ir'o 'w~L 'color 
topic-inarker' possibly modifies kite-iru 'wear' or 
(l,~l)(\],Sel'~l, qlariilonize'. 
KNP, a rule-based parser, handles this type of 
ambiguity ~s follows. If a case component is fol- 
lowed by a comma, it is treated as modif\[ying Vm ; 
if not, it is treated as modif\[ying 1~:. Although 
this heuristic rule usually explains real data very 
well, sentence 3 will be analyzed incorrectly. 
Parsing which utilizes a case frame dictionary 
can consider which is a proper head, V~ or Vm, 
tbr an ambiguous case compolmnt by comparing 
examples in the case slots of V~ and 14~. Such 
a consideration nmst be done considering wlmt 
other case components modifly Vc and Vm, since 
the assigned case slot of a case component might 
differ depending on the candidate structure of 
the sentence due to the unique case assignment 
constraint. 
Therefore, it is necessary to expand the struc- 
tural ambiguity and consider all the possible 
structures fbr an input. So, we calculate the 
matching score of all pairs of case components 
and verbs in all possible structures of the sen- 
tence, and select the best structure based on the 
436 
boushi tic; ire wa 
hat color 
bott.~\]ti no 
hal ~,~ 
~---__ ire wa 
color-- I 
C(;¢lt It\[ 
COat ~5"¢rll 
harlllOlliZe 
We ClOth, tllli\]'/)l'lll, CI)la .... ~ WO \] pOWCI', face, }l}ind 
~ .I I de party, oily, home .... ni I prcl)lcnce, cItlth .... 
kite-it'lt Co(It I1i HWCLTCI'll 
weal" coat hllrnlolliZC 
hat 
ire wa -2 (distance penalty) 
c°l°r ~ \] 
kite-iru \[ 
co.t ,,i ~ 
~ COat IdilA't!rll 
kiru 'wear' ~.\ awa,~emt 'harllloilizc' 
2 ,:2,,,,,,6 .... 
Figure 2: Parsing with east structure analysis. 
sum of the matching scores in it. 
Since the heuristic rule employed ill KNP is 
actually very useful, we in(:orporate it, that is, 
l)enalty score is imposed to the modifier-hea(l re- 
la.tion depending on the distraint between ~t mod- 
ifi(;l" and a head. If a moditier is not followed by 
a comma, the penalty score, 0,-2,-4,-6, ... is 
imposed when a moditler modifie.s the first (nea.r- 
est), second, third, tburth, ... verbs ill a sentence 
respectively; if with a comma, the tmnalty score, 
-2, 0, -2, -4, ... is impose& 
For example, sentence 3 was analyzed t)y our 
method as shown ill Figure 2. Since the simi- 
lm'ity score between fro ~color' a.nd the 'we-slot 
of uwa.s'cr'u hmunonize is nmch larger t;\]iall theft 
l)etween ire 'eoloff and the ga-slot of lci'r'u. 'wear', 
the correct structure of the selltellee was de- 
tected (the right-lmnd parse of Figure 2). Note 
that, furthermore, both the ease of ire ill reb> 
tion to awascru 'harmonize', and the case of coal, 
in relation to kite-iru 'wear' were dete(:ted cor- 
rectly. 
Structm'al ambiguities often cause a combina- 
torial explosion when a sentence is long. How- 
ever, by detecting the SeOl)eS of coordinate struc- 
tures 1)e%rehand, which off;ell aPl)ear in long 
'l'td)le 3: The at:curacy of case detection. 
(;orre(:I; ill(:orl'e(;t \])arsing 
ease case 
error (lel;e(:l;ion (tete(:tion 
topic-marleer 82 13 5 
clausal modifier 73 18 9 
senl;ences, we can reasonably limit the possil)le 
sl;ructures of the sentence. 
The ~werage analysis speed of tile ext)criments 
described in the next section was about 50 sen- 
tenets/aria. 'File tinm-oul, of one rain. was only 
employed to 7 out of 4,272 test Selltellces. 
4.3 Experilnents and discussion 
We used 4,272 sentences of Kyoto University col 
pus as a test set. We parsed them by our new 
lnethod (Figure 3 shows several examt)les) and 
cheekc, d two 1)oints: case detection of mnbiguous 
case (-omponents and syntactic analysis. 
First, we randomly selected ambiguous ease 
components: 100 l,ol)ic-markcA case components 
all(t 100 (:ase coral)orients moditied by clausal 
437 
ookllrasyo ha 
the Treasury 
3gatsuki kes:~an de 
settlenlellt ill March ,, 
.l'hintakuginkotl kakukmt ga I impr<n'ed by case iajbmtatiml 
each trust bank 
tsttmitareteirtt 
save lip 
tokubetsu t3,uhokil~ ,1o 
specially reserved money 
mrikuzushi wo 
collstllllptioll 
gai,,'yoltha 
the Foreign Minislcr 
m 
tnitonwru 
allow 
hm~shhula. 
policy 
mikka ni 
on tile third \] 
4 Mexico ga Mexico 
\[Iglppyolt shiRl imln'ored hy cave i@~tmltli¢~*i 
illltl(RlnCC ~ ~\] 
i@lre Ixmshi mulo • 
prevention o1" inl\]alion \[ 
I 
L'eizai misaku ni 
fimmclal pllllcy 
t,wtite 
al~(Rl\[ 
st'Lvlltttl'i ~hifa 
cxphdn 
Figure 3: Exmnt)les of the mmlysis results. 
modifiers, and checked whether their cases were 
correctly detected or not. As shown in Table 3, 
the accuracy of the analysis was fairly good: that 
tbr topic-markers was 82% and that tbr clausal 
modifiers was 73%. 
Then, we compared the parse results of our 
method with those of the original KNP. As a re- 
sult, 565 modifier-head relations differed; in 260 
cases, our method was correct and the original 
KNP was incorrect (by considering the struc- 
tures in the Kyoto University Corpus as a golden 
standard); in 224 cases, vice versa. That is, 
our method was superior to KNP by 36 cases, 
and increased the overall accuracy from 89.8% 
to 89.9%. Since the heuristic rule used in KNP 
is very strong, the improvement was not big. 
The improvement of the accuracy, though small, 
is valuable, because the accuracy around 90% 
seems close to the ceiling of this task. 
5 Conclusion 
We proposed an unsupervised construction 
method of a case frame dictionary. We obtained 
a large case fl'alne dictionary, which consists 
of 23,497 verbs. Using this dictionary, we can 
detect ambiguous case components accurately. 
Also since our method employs unsupervised dic- 
tionary learning, it can be easily scaled up. 

References 

Ted Briscoe and John Carroll. 1997. Automatic 
extraction of subcategorization from corpora. 
In Prvccedings of ANLP-97. 

Satoru Ikehara, Masahiro Miyazaki, Satoshi 
Shirai, Akio Yokoo, Hiromi Nakaiwa, Ken- 
tarou Ogura, and Yoshiflmfi Oyama Yoshi- 
hiko Hayashi, editors. 1997. Japanese Lexi- 
con. Iwananfi Publishing. 

Information-q~chnology Promotion Agency, 
,Japan. 1987. Japanese Verbs : A Guide to 
the H~A Lea:icon of Basic ,Japa~tcsc Verbs. 

S. Kurohashi and M. Nagao. 1994. A syntac- 
tic analysis method of long japanese sentences 
based on the detection of conjunctive struc- 
tures. Computational Linguistics, 20(4). 

S. Kurohashi and M. Nagao. 1998. Build- 
ing a jal)anese parsed corpus while improv- 
ing the t)arsing system. In Prvcccdin.qs of" Th.c 
Fir;st h~,tcr'national Co't@r~ncc on Lwnguage 
R.csources 64 Evaluation, pages 719 724. 

Christopher D. Maturing. 1993. Automatic ac- 
quisition of a large snbcategorization dictio- 
nary froln corpora. In Pr'occcding s of A CL-93. 

Takehito Utsuro, Takashi Miyata, and Yuji Mat- 
sumoto. 1998. General-to-simeific model se- 
lection tbr subcategorization preference. In 
Proceedings of th.c 17th International ConJ'cr- 
cncc on Computational Li'n.quistics and the 
36th Annual Mectin.q of the Association for 
Computational Lin.quistics. 
