Bunsetsu Identification Using Category-Exclusive Rules 
Masaki Murata Kiyotaka Uchimoto Qing Ma Hitoshi Isahara 
Communications Research Laboratory, Ministry of Posts and Telecommunications
588-2, Iwaoka, Nishi-ku, Kobe, 651-2492, Japan
tel: +81-78-969-2181  fax: +81-78-369-2189  http://www-karc.crl.go.jp/ips/murata
{murata,uchimoto,qma,isahara}@crl.go.jp
Abstract 
This paper describes two new bunsetsu identification methods using supervised learning. Since Japanese syntactic analysis is usually done after bunsetsu identification, bunsetsu identification is important for analyzing Japanese sentences. In experiments comparing the four previously available machine-learning methods (decision tree, maximum-entropy method, example-based approach and decision list) and two new methods using category-exclusive rules, the new method using the category-exclusive rules with the highest similarity performed best.
1 Introduction 
This paper is about machine learning methods for identifying bunsetsus, which correspond to English phrasal units such as noun phrases and prepositional phrases. Since Japanese syntactic analysis is usually done after bunsetsu identification (Uchimoto et al., 1999), identifying bunsetsus is important for analyzing Japanese sentences. The conventional studies on bunsetsu identification¹ have used hand-made rules (Kameda, 1995; Kurohashi, 1998), but bunsetsu identification is not an easy task. Conventional studies used many hand-made rules developed at the cost of many man-hours. Kurohashi, for example, made 146 rules for bunsetsu identification (Kurohashi, 1998).
In an attempt to reduce the number of man-hours, we used machine-learning methods for bunsetsu identification. Because it was not clear which machine-learning method would be the one most appropriate for bunsetsu identification, we tried a variety of them. In this paper we report experiments comparing four machine-learning methods (decision tree, maximum entropy, example-based, and decision-list methods) and our new methods using category-exclusive rules.
¹Bunsetsu identification is a problem similar to chunking (Ramshaw and Marcus, 1995; Sang and Veenstra, 1999) in other languages.
2 Bunsetsu identification problem 
We conducted experiments on the following supervised learning methods for identifying bunsetsu:

• Decision tree method
• Maximum entropy method
• Example-based method (use of similarity)
• Decision list (use of probability and frequency)
• Method 1 (use of exclusive rules)
• Method 2 (use of exclusive rules with the highest similarity).
In general, bunsetsu identification is done after morphological and before syntactic analysis. Morphological analysis corresponds to part-of-speech tagging in English. Japanese syntactic structures are usually represented by the relations between bunsetsus, which correspond to phrasal units such as a noun phrase or a prepositional phrase in English. So, bunsetsu identification is important in Japanese sentence analysis.

In this paper, we identify a bunsetsu by using information from a morphological analysis. Bunsetsu identification is treated as the task of deciding whether to insert a "|" mark to indicate the partition between two bunsetsus, as in Figure 1. Therefore, bunsetsu identification is done by judging whether a partition mark should be inserted between two adjacent morphemes or not. (We do not use the inserted partition mark in the following analysis in this paper for the sake of simplicity.)
Our bunsetsu identification method uses the morphological information of the two preceding and two succeeding morphemes of an analyzed space between two adjacent morphemes. We use the following morphological information:

(i) Major part-of-speech (POS) category,²
(ii) Minor POS category or inflection type,
(iii) Semantic information (the first three-digit number of a category number as used in "BGH" (NLRI, 1964)),

²Part-of-speech categories follow those of JUMAN (Kurohashi and Nagao, 1998).
boku ga        (I) + nominative-case particle
| bunsetsu wo  (bunsetsu) + objective-case particle
| matomeageru  (identify)
(I identify bunsetsu.)

Figure 1: Example of identified bunsetsus
             bun          wo             kugiru       .
             (sentence)   (obj)          (divide)
Major POS    Noun         Particle       Verb         Symbol
Minor POS    Normal Noun  Case-Particle  Normal Form  Punctuation
Semantics    x            None           217          x
Word         x            wo             kugiru       x

((I) divide sentences)

Figure 2: Information used in bunsetsu identification
(iv) Word (lexical information).

For simplicity we do not use the "Semantic information" and "Word" in either of the two outside morphemes.

Figure 2 shows the information used to judge whether or not to insert a partition mark in the space between two adjacent morphemes, "wo (obj)" and "kugiru (divide)," in the sentence "bun wo kugiru. ((I) divide sentences)."
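To make this windowing step concrete, here is a minimal sketch of collecting the two preceding and two succeeding morphemes around a candidate space. The `Morpheme` fields mirror information (i)-(iv); the field names and the sentence-boundary padding are our own illustration, not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Morpheme:
    major_pos: str   # (i) major POS category
    minor_pos: str   # (ii) minor POS category or inflection type
    semantic: str    # (iii) first three digits of the BGH category number
    word: str        # (iv) lexical form

# Padding for spaces near the sentence boundary (our assumption).
PAD = Morpheme("BOS/EOS", "BOS/EOS", "x", "x")

def window(morphemes: List[Morpheme], i: int) -> Tuple[Morpheme, ...]:
    """Return the (far-left, left, right, far-right) morphemes around the
    space between morphemes[i] and morphemes[i + 1]."""
    def get(j: int) -> Morpheme:
        return morphemes[j] if 0 <= j < len(morphemes) else PAD
    return get(i - 1), get(i), get(i + 1), get(i + 2)

# The space between "wo" and "kugiru" in "bun wo kugiru ." (cf. Figure 2):
sent = [
    Morpheme("Noun", "Normal Noun", "x", "bun"),
    Morpheme("Particle", "Case-Particle", "None", "wo"),
    Morpheme("Verb", "Normal Form", "217", "kugiru"),
    Morpheme("Symbol", "Punctuation", "x", "."),
]
far_left, left, right, far_right = window(sent, 1)
```

At the sentence edges the padding morpheme simply stands in for the missing context.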
3 Bunsetsu identification process for 
each machine-learning method 
3.1 Decision-tree method

In this work we used the program C4.5 (Quinlan, 1995) for the decision-tree learning method. The four types of information, (i) major POS, (ii) minor POS, (iii) semantic information, and (iv) word, mentioned in the previous section were also used as features with the decision-tree learning method. As shown in Figure 3, the number of features is 12 (2 + 4 + 4 + 2) because we do not use (iii) semantic information and (iv) word information from the two outside morphemes.
In Figure 2, for example, the value of the feature 
'the major POS of the far left morpheme' is 'Noun.' 
3.2 Maximum-entropy method

The maximum-entropy method is useful with sparse data conditions and has been used by many researchers (Berger et al., 1996; Ratnaparkhi, 1996; Ratnaparkhi, 1997; Borthwick et al., 1998; Uchimoto et al., 1999). In our maximum-entropy experiment we used Ristad's system (Ristad, 1998). The analysis is performed by calculating the probability of inserting or not inserting a partition mark, from the output of the system. Whichever probability is higher is selected as the desired answer.

In the maximum-entropy method, we use the same four types of morphological information, (i) major POS, (ii) minor POS, (iii) semantic information, and (iv) word, as in the decision-tree method. However, it does not consider combinations of features. Unlike the decision-tree method, as a result, we had to combine features manually.
First we considered combinations of the individual pieces of morphological information. Because there were four types of information, the total number of combinations was 2^4 - 1. Since this number is large and intractable, we considered that (i) major POS, (ii) minor POS, (iii) semantic information, and (iv) word information gradually become more specific in this order, and we combined the four types of information in the following way:
Information A: (i) major POS
Information B: (i) major POS and (ii) minor POS
Information C: (i) major POS, (ii) minor POS and (iii) semantic information
Information D: (i) major POS, (ii) minor POS, (iii) semantic information and (iv) word    (1)
We used only Information A and B for the two outside morphemes because we did not use semantic and word information there, in the same way as in the decision-tree method.

Next, we considered the combinations of each type of information. As shown in Figure 4, the number of combinations was 64 (2 × 4 × 4 × 2).

For data sparseness, in addition to the above combinations, we considered the cases in which, first, one of the two outside morphemes was not used; secondly, neither of the two outside ones was used; and thirdly, only one of the two middle ones was used. The number of features used in the maximum-entropy method is 152, which is obtained as follows:³
³When we extracted features from all of the articles of January 1, 1995 of the Kyoto University corpus (the number of spaces between morphemes was 25,814) by using this method, the number of types of features was 1,534,701.
Far left morpheme:  Major POS, Minor POS (2)
Left morpheme:      Major POS, Minor POS, Semantic Information, Word (4)
Right morpheme:     Major POS, Minor POS, Semantic Information, Word (4)
Far right morpheme: Major POS, Minor POS (2)

Figure 3: Features used in the decision-tree method
Far left morpheme:  Information A, Information B (2)
Left morpheme:      Information A, B, C, D (4)
Right morpheme:     Information A, B, C, D (4)
Far right morpheme: Information A, Information B (2)

Figure 4: Features used in the maximum-entropy method

No. of features = 2 × 4 × 4 × 2
                + 2 × 4 × 4
                + 4 × 4 × 2
                + 4 × 4
                + 4
                + 4
                = 152
In Figure 2, the feature that uses Information B in the far left morpheme, Information D in the left morpheme, Information C in the right morpheme, and Information A in the far right morpheme is "Noun: Normal Noun; Particle: Case-Particle: none: wo; Verb: Normal Form: 217; Symbol". In the maximum-entropy method we used for each space 152 features such as this one.
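These back-off combinations can be enumerated directly. The following sketch reproduces the count of 152; representing each pattern as a tuple of information levels, with `None` for an unused morpheme, is our own encoding, not the paper's.

```python
from itertools import product

# Information levels A-D from Eq. (1); outer morphemes use only A and B,
# and None marks a morpheme that the pattern does not use at all.
OUTER = ["A", "B"]
INNER = ["A", "B", "C", "D"]
NONE = [None]

pattern_sets = [
    product(OUTER, INNER, INNER, OUTER),  # full window:       2*4*4*2 = 64
    product(NONE,  INNER, INNER, OUTER),  # far left unused:     4*4*2 = 32
    product(OUTER, INNER, INNER, NONE),   # far right unused:  2*4*4   = 32
    product(NONE,  INNER, INNER, NONE),   # both outer unused:   4*4   = 16
    product(NONE,  INNER, NONE,  NONE),   # left middle only:        4
    product(NONE,  NONE,  INNER, NONE),   # right middle only:       4
]
patterns = [p for s in pattern_sets for p in s]
print(len(patterns))  # 152
```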
3.3 Example-based method (use of 
similarity) 
An example-based method was proposed by Nagao (Nagao, 1984) in an attempt to solve problems in machine translation. To resolve a problem, it uses the most similar example. In the present work, the example-based method impartially used the same four types of information (see Eq. (1)) as used in the maximum-entropy method.

To use this method, we must define the similarity of an input to an example. We use the 152 patterns from the maximum-entropy method to establish the level of similarity. We define the similarity S between an input and an example according to which one of these 152 levels is the matching level, as follows. (The equation reflects the importance of the two middle morphemes.)
S = s(m-1) × s(m+1) × 10,000 + s(m-2) × s(m+2)    (2)

Here m-1, m+1, m-2, and m+2 refer respectively to the left, right, far left, and far right morphemes, and s(x) is the morphological similarity of a morpheme x, which is defined as follows:

s(x) = 1 (when no information of x is matched)
       2 (when Information A of x is matched)
       3 (when Information B of x is matched)
       4 (when Information C of x is matched)
       5 (when Information D of x is matched)    (3)
Figure 5 shows an example of the levels of similarity. When a pattern matches Information A of all four morphemes, such as "Noun; Particle; Verb; Symbol", its similarity is 40,004 (2 × 2 × 10,000 + 2 × 2). When a pattern matches only the left morpheme, such as " ; Particle: Case-Particle: none: wo; ; ", its similarity is 50,001 (5 × 1 × 10,000 + 1 × 1).

The example-based method extracts the example with the highest level of similarity and checks whether or not that example is marked. A partition mark is inserted in the input data only when the example is marked. When multiple examples have the same highest level of similarity, the selection of the best example is ambiguous. In this case, we count the number of marked and unmarked spaces in all of the examples and choose the larger.
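Equations (2) and (3) can be sketched directly; encoding the matching level of each morpheme as `None` or `"A"` through `"D"` is our own convention.

```python
from typing import Optional

# Eq. (3): 1 if no information matched, 2-5 for Information A-D.
LEVEL_SCORE = {None: 1, "A": 2, "B": 3, "C": 4, "D": 5}

def s(level: Optional[str]) -> int:
    """Morphological similarity of one morpheme."""
    return LEVEL_SCORE[level]

def similarity(far_left, left, right, far_right) -> int:
    """Eq. (2): the 10,000 factor makes the two middle morphemes dominate."""
    return s(left) * s(right) * 10_000 + s(far_left) * s(far_right)

print(similarity("A", "A", "A", "A"))     # 40004: "Noun; Particle; Verb; Symbol"
print(similarity(None, "D", None, None))  # 50001: only the left morpheme, at level D
```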
3.4 Decision-list method (use of probability and frequency)

The decision-list method was proposed by Rivest (Rivest, 1987), in which the rules are not expressed as a tree structure as in the decision-tree method, but are expanded by combining all the features, and are stored in a one-dimensional list. A priority order is defined in a certain way and all of the rules are arranged in this order. The decision-list method searches for rules from the top of the list and analyzes a particular problem by using only the first applicable rule.

                s(x)   m-2          m-1            m+1          m+2
                       bun          wo             kugiru       .
                       (sentence)   (obj)          (divide)
No information  1      -            -              -            -
Information A   2      Noun         Particle       Verb         Symbol
Information B   3      Normal Noun  Case-Particle  Normal Form  Punctuation
Information C   4      x            None           217          x
Information D   5      x            wo             kugiru       x

Figure 5: Example of levels of similarity
In this study we used in the decision-list method the same 152 types of patterns that were used in the maximum-entropy method.

To determine the priority order of the rules, we referred to Yarowsky's method (Yarowsky, 1994) and Nishiokayama's method (Nishiokayama et al., 1998) and used the probability and frequency of each rule as measures of this priority order. When multiple rules had the same probability, the rules were arranged in order of their frequency.
Suppose, for example, that Pattern A "Noun: Normal Noun; Particle: Case-Particle: none: wo; Verb: Normal Form: 217; Symbol: Punctuation" occurs 13 times in a learning set and that ten of the occurrences include the inserted partition mark. Suppose also that Pattern B "Noun; Particle; Verb; Symbol" occurs 123 times in a learning set and that 90 of the occurrences include the mark.

This example is recognized by the following rules:

Pattern A => Partition 76.9% (10/13),  Freq. 13
Pattern B => Partition 73.2% (90/123), Freq. 123
Many similar rules were made and were then listed in order of their probabilities and, for any one probability, in order of their frequencies. This list was searched from the top and the answer was obtained by using the first applicable rule.
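As a sketch of this ordering, the rules below reuse the Pattern A and Pattern B statistics above; the tuple layout for rules is our assumption.

```python
# (pattern, category, probability, frequency) for the Section 3.4 example.
rules = [
    ("B", "partition", 90 / 123, 123),
    ("A", "partition", 10 / 13, 13),
]

# Arrange by probability, then by frequency within equal probability.
decision_list = sorted(rules, key=lambda r: (r[2], r[3]), reverse=True)

def classify(applicable_patterns, default="non-partition"):
    """Scan from the top of the list; the first applicable rule decides."""
    for pattern, category, _, _ in decision_list:
        if pattern in applicable_patterns:
            return category
    return default

print(classify({"A", "B"}))  # Pattern A (76.9%) outranks Pattern B (73.2%)
```

Here both rules happen to predict "partition"; the point is that the 76.9% rule is consulted first.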
3.5 Method 1 (use of category-exclusive rules)

So far, we have described the four existing machine learning methods. In the next two sections we describe our methods.

It is reasonable to consider the 152 patterns used in three of the previous methods. Now, let us suppose that the 152 patterns from the learning set yield the statistics of Figure 6. "Partition" means that the rule determines that a partition mark should be inserted in the input data and "non-partition" means that the rule determines that a partition mark should not be inserted.

Suppose that when we solve a hypothetical problem Patterns A to G are applicable. If we use the decision-list method, only Rule A is used, which is applied first, and this determines that a partition mark should not be inserted. For Rules B, C, and D, although the frequency of each rule is lower than that of Rule A, the sum of the frequencies of the rules is higher, so we think that it is better to use Rules B, C, and D than Rule A. Method 1 follows this idea, but we do not simply sum up the frequencies. Instead, we count the number of examples used in Rules B, C, and D and judge the category having the largest number of examples that satisfy the pattern with the highest probability to be the desired answer.

For example, suppose that in the above example the number of examples satisfying Rules B, C, and D is 65. (Because some examples overlap in multiple rules, the total number of examples is actually smaller than the total of the frequencies of the three rules.) In this case, among the examples used by the rules having 100% probability, the number of examples of partition is 65, and the number of examples of non-partition is 34. So, we determine that the desired answer is to partition.
A rule having 100% probability is called a category-exclusive rule because all the data satisfying it belong to one category, which is either partition or non-partition. Because for any given space the number of rules used can be as large as 152, category-exclusive rules are applied often.⁴ Method 1 uses all of these category-exclusive rules, so we call it the method using category-exclusive rules.

Solving problems by using rules whose probabilities are not 100% may result in wrong solutions. Almost all of the traditional machine learning methods solve problems by using rules whose probabilities

⁴The ratio of the spaces analyzed by using category-exclusive rules is 99.30% (16864/16983) in Experiment 1 of Section 4. This indicates that almost all of the spaces are analyzed by category-exclusive rules.
Rule A: Pattern A => probability of non-partition 100% (34/34),    Frequency 34
Rule B: Pattern B => probability of partition     100% (33/33),    Frequency 33
Rule C: Pattern C => probability of partition     100% (25/25),    Frequency 25
Rule D: Pattern D => probability of partition     100% (19/19),    Frequency 19
Rule E: Pattern E => probability of partition     81.3% (100/123), Frequency 123
Rule F: Pattern F => probability of partition     76.9% (10/13),   Frequency 13
Rule G: Pattern G => probability of non-partition 57.4% (310/540), Frequency 540

Figure 6: An example of rules used in Method 1
are not 100%. By using such methods, we cannot hope to improve accuracy. If we want to improve accuracy, we must use category-exclusive rules. There are some cases, however, for which, even if we take this approach, category-exclusive rules are rarely applied. In such cases, we must add new features to the analysis to create a situation in which many category-exclusive rules can be applied.

However, it is not sufficient to use category-exclusive rules. There are many meaningless rules which happen to be category-exclusive only in a learning set. We must consider how to eliminate such meaningless rules.
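A sketch of Method 1's counting step follows. The assumption that each rule stores the ids of the training examples it covers is ours; it lets overlapping examples be counted once via set union, as the Figure 6 discussion requires.

```python
def method1(applicable_rules):
    """applicable_rules: list of (category, probability, example_ids).
    Among the rules with the highest probability (ideally the 100%
    category-exclusive ones), the category covering the most distinct
    training examples wins."""
    best = max(p for _, p, _ in applicable_rules)
    covered = {}
    for category, p, ids in applicable_rules:
        if p == best:
            covered.setdefault(category, set()).update(ids)
    return max(covered, key=lambda c: len(covered[c]))

# The Figure 6 situation: Rule A (non-partition, 34 examples) against
# Rules B, C, D (partition), whose 33 + 25 + 19 occurrences overlap down
# to 65 distinct examples.  Rule E is ignored because it is not exclusive.
rules = [
    ("non-partition", 1.0, set(range(0, 34))),        # Rule A
    ("partition", 1.0, set(range(100, 133))),         # Rule B, freq 33
    ("partition", 1.0, set(range(125, 150))),         # Rule C, freq 25
    ("partition", 1.0, set(range(146, 165))),         # Rule D, freq 19
    ("partition", 100 / 123, set(range(200, 300))),   # Rule E, freq 123
]
print(method1(rules))  # partition: 65 distinct examples beat 34
```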
3.6 Method 2 (using category-exclusive rules with the highest similarity)

Method 2 combines the example-based method and Method 1. That is, it combines the method using similarity and the method using category-exclusive rules in order to eliminate the meaningless category-exclusive rules mentioned in the previous section.

Method 2 also uses 152 patterns for identifying bunsetsu. These patterns are used as rules in the same way as in Method 1. Desired answers are determined by using the rule having the highest probability. When multiple rules have the same probability, Method 2 uses the value of the similarity described in the section on the example-based method and analyzes the problem with the rule having the highest similarity. When multiple rules have the same probability and similarity, the method takes the examples used by the rules having the highest probability and the highest similarity, and chooses the category with the larger number of examples as the desired answer, in the same way as in Method 1.

However, when category-exclusive rules having a frequency of more than one exist, the above procedure is performed after eliminating all of the category-exclusive rules having a frequency of one. In other words, category-exclusive rules having a frequency of more than one are given a higher priority than category-exclusive rules having a frequency of only one but a higher similarity. This is because category-exclusive rules having a frequency of only one are not so reliable.
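The whole cascade can be sketched as follows; the rule tuple layout and the toy numbers are our own illustration, not data from the paper.

```python
def method2(applicable_rules):
    """applicable_rules: list of (category, probability, similarity,
    frequency, example_ids) tuples."""
    # Demote frequency-1 exclusive rules when a more frequent exclusive
    # rule also applies (Section 3.6).
    exclusive = [r for r in applicable_rules if r[1] == 1.0]
    if any(r[3] > 1 for r in exclusive):
        applicable_rules = [r for r in applicable_rules
                            if not (r[1] == 1.0 and r[3] == 1)]
    # Keep the rules with the highest probability, then the highest similarity.
    best_p = max(r[1] for r in applicable_rules)
    rules = [r for r in applicable_rules if r[1] == best_p]
    best_s = max(r[2] for r in rules)
    rules = [r for r in rules if r[2] == best_s]
    # Final tie-break as in Method 1: most distinct covered examples wins.
    covered = {}
    for category, _, _, _, ids in rules:
        covered.setdefault(category, set()).update(ids)
    return max(covered, key=lambda c: len(covered[c]))

# A frequency-1 exclusive rule loses to a more frequent exclusive rule,
# despite having the higher similarity:
rules = [
    ("partition", 1.0, 50001, 1, {0}),
    ("non-partition", 1.0, 40004, 6, set(range(1, 7))),
]
print(method2(rules))  # non-partition
```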
4 Experiments and discussion 
In our experiments we used the Kyoto University text corpus (Kurohashi and Nagao, 1997), which is a tagged corpus made up of articles from the Mainichi newspaper. All experiments reported in this paper were performed using articles dated from January 1 to 5, 1995. We obtained the correct information on morphology and bunsetsu identification from the tagged corpus.

The following experiments were conducted to determine which supervised learning method achieves the highest accuracy rate.

• Experiment 1
  Learning set: January 1, 1995
  Test set: January 3, 1995
• Experiment 2
  Learning set: January 4, 1995
  Test set: January 5, 1995

Because we used Experiment 1 in making Method 1 and Method 2, Experiment 1 is a closed data set for Method 1 and Method 2. So, we performed Experiment 2.
The results are listed in Tables 1 to 4. We used KNP2.0b4 (Kurohashi, 1997) and KNP2.0b6 (Kurohashi, 1998), which are bunsetsu identification and syntactic analysis systems using many hand-made rules, in addition to the six methods described in Section 3. Because KNP is not based on a machine learning method but on many hand-made rules, "Learning set" and "Test set" in the tables have no meaning for the KNP results. In the experiments with KNP, we also used morphological information in the corpus. The "F" in the tables indicates the F-measure, which is the harmonic mean of a recall and a precision. A recall is the fraction of correctly identified partitions out of all the partitions. A precision is the fraction of correctly identified partitions out of all the spaces which were judged to have a partition mark inserted.
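As a small arithmetic check of this definition, the sketch below computes F from a recall and a precision; the inputs are Method 2's figures on the Experiment 1 test set.

```python
def f_measure(recall: float, precision: float) -> float:
    """F-measure: the harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)

print(round(f_measure(98.88, 99.45), 2))  # 99.16
```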
Tables 1 to 4 show the following results:

• In the test set the decision-tree method was a little worse than the maximum-entropy
569 
Table 1: Results of learning set of Experiment 1

Method            F        Recall    Precision
Decision Tree     99.58%   99.66%    99.51%
Maximum Entropy   99.20%   99.35%    99.06%
Example-Based     99.98%   100.00%   99.97%
Decision List     99.98%   100.00%   99.97%
Method 1          99.98%   100.00%   99.97%
Method 2          99.98%   100.00%   99.97%
KNP 2.0b4         99.23%   99.78%    98.69%
KNP 2.0b6         99.73%   99.77%    99.69%

The number of spaces between two morphemes is 25,814. The number of partitions is 9,523.
Table 3: Results of learning set of Experiment 2

Method            F        Recall    Precision
Decision Tree     99.70%   99.71%    99.69%
Maximum Entropy   99.07%   99.23%    98.92%
Example-Based     99.99%   100.00%   99.98%
Decision List     99.99%   100.00%   99.98%
Method 1          99.99%   100.00%   99.98%
Method 2          99.99%   100.00%   99.98%
KNP 2.0b4         98.94%   99.50%    98.39%
KNP 2.0b6         99.47%   99.47%    99.48%

The number of spaces between two morphemes is 27,665. The number of partitions is 10,143.
Table 2: Results of test set of Experiment 1

Method            F        Recall    Precision
Decision Tree     98.87%   98.67%    99.08%
Maximum Entropy   98.90%   98.75%    99.06%
Example-Based     99.02%   98.69%    99.36%
Decision List     98.95%   98.43%    99.48%
Method 1          98.98%   98.54%    99.43%
Method 2          99.16%   98.88%    99.45%
KNP 2.0b4         99.13%   99.72%    98.54%
KNP 2.0b6         99.66%   99.68%    99.64%

The number of spaces between two morphemes is 16,983. The number of partitions is 6,166.
Table 4: Results of test set of Experiment 2

Method            F        Recall    Precision
Decision Tree     98.50%   98.51%    98.49%
Maximum Entropy   98.57%   98.55%    98.59%
Example-Based     98.82%   98.71%    98.93%
Decision List     98.75%   98.27%    99.23%
Method 1          98.79%   98.54%    99.43%
Method 2          98.90%   98.65%    99.15%
KNP 2.0b4         99.07%   99.43%    98.71%
KNP 2.0b6         99.51%   99.40%    99.61%

The number of spaces between two morphemes is 32,304. The number of partitions is 11,756.
method. Although the maximum-entropy method has a weak point in that it does not learn the combinations of features, we could overcome this weakness by making almost all of the combinations of features to produce a higher accuracy rate.
• The decision-list method was better than the maximum-entropy method in this experiment.

• The example-based method obtained the highest accuracy rate among the four existing methods.

• Although Method 1, which uses the category-exclusive rule, was worse than the example-based method, it was better than the decision-list method. One reason for this was that the decision-list method chooses rules randomly when multiple rules have identical probabilities and frequencies.

• Method 2, which uses the category-exclusive rule with the highest similarity, achieved the highest accuracy rate among the supervised learning methods.

• The example-based method, the decision-list method, Method 1 and Method 2 obtained accuracy rates of about 100% for the learning set. This indicates that these methods are especially strong for learning sets.

• The two methods using similarity (the example-based method and Method 2) were always better than the other methods, indicating that the use of similarity is effective if we can define it appropriately.

• We carried out experiments by using KNP, a system that uses many hand-made rules. The F-measure of KNP was highest in the test set.

• We used two versions of KNP, KNP 2.0b4 and KNP 2.0b6. The latter was much better than the former, indicating that the improvements made by hand are effective. But the maintenance of rules by hand has a limit, so the improvements made by hand are not always effective.
The above experiments indicate that Method 2 is best among the machine learning methods.⁵

In Table 5 we show some cases which were partitioned incorrectly with KNP but correctly with

⁵In these experiments, the differences were very small. But we think that the differences are significant to some extent because we performed both Experiment 1 and Experiment 2, the data we used are a large corpus containing several tens of thousands of morphemes tagged objectively in advance, and a difference of about 0.1% is large when precisions are around 99%.
Table 5: Cases when KNP was incorrect and Method 2 was correct

kotsukotsu [NEED] gamanshi
(steadily) (be patient with)
(... be patient with ... steadily)

yoyuu wo | motte [WRONG] shirizoke
(enough strength) obj (have) (beat off)
(... beat off ... having enough strength)

kaisha wo | gurupu-wake [WRONG] suru
(company) obj (grouping) (do)
(... do grouping of companies)
Method 2. A partition with "NEED" indicates that KNP missed inserting the partition mark, and a partition with "WRONG" indicates that KNP inserted the partition mark incorrectly. In the test set of Experiment 1, the F-measure of KNP2.0b6 was 99.66%. The F-measure increases to 99.83% under the assumption that when KNP2.0b6 or Method 2 is correct, the answer is correct. Although the accuracy rate for KNP2.0b6 was high, there were some cases in which KNP partitioned incorrectly and Method 2 partitioned correctly. A combination of Method 2 with KNP2.0b6 may be able to improve the F-measure.

The only previous research resolving bunsetsu identification by machine learning methods is the work by Zhang (Zhang and Ozeki, 1998). The decision-tree method was used in this work. But this work used only a small amount of information for bunsetsu identification⁶ and did not achieve high accuracy rates. (The recall rate was 97.6% (=2502/(2502+62)), the precision rate was 92.4% (=2502/(2502+205)), and the F-measure was 94.9%.)
5 Conclusion

To solve the problem of accurate bunsetsu identification, we carried out experiments comparing four existing machine-learning methods (decision-tree method, maximum-entropy method, example-based method and decision-list method). We obtained the following order of accuracy in bunsetsu identification:

Example-Based > Decision List > Maximum Entropy > Decision Tree

We also described a new method which uses category-exclusive rules with the highest similarity. This method performed better than the other learning methods in our experiments.

⁶This work used only the POS information of the two morphemes of an analyzed space.

References

Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra. 1996. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22(1):39-71.

Andrew Borthwick, John Sterling, Eugene Agichtein, and Ralph Grishman. 1998. Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 152-160.

Masayuki Kameda. 1995. Simple Japanese analysis tool QJP. The Association for Natural Language Processing, the 1st National Convention, pages 349-352. (in Japanese).

Sadao Kurohashi and Makoto Nagao. 1997. Kyoto University text corpus project. pages 115-118. (in Japanese).

Sadao Kurohashi and Makoto Nagao. 1998. Japanese Morphological Analysis System JUMAN version 3.5. Department of Informatics, Kyoto University. (in Japanese).

Sadao Kurohashi. 1997. Japanese Dependency/Case Structure Analyzer KNP version 2.0b4. Department of Informatics, Kyoto University. (in Japanese).

Sadao Kurohashi. 1998. Japanese Dependency/Case Structure Analyzer KNP version 2.0b6. Department of Informatics, Kyoto University. (in Japanese).

Makoto Nagao. 1984. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. Artificial and Human Intelligence, pages 173-180.

Shigeyuki Nishiokayama, Takehito Utsuro, and Yuji Matsumoto. 1998. Extracting preference of dependency between Japanese subordinate clauses from corpus. IEICE-WGNLC98-11, pages 31-38. (in Japanese).

NLRI (National Language Research Institute). 1964. Word List by Semantic Principles. Syuei Syuppan. (in Japanese).

J. R. Quinlan. 1995. Programs for Machine Learning.

Lance A. Ramshaw and Mitchell P. Marcus. 1995. Text chunking using transformation-based learning. In Proceedings of the Third Workshop on Very Large Corpora, pages 82-94.

Adwait Ratnaparkhi. 1996. A Maximum Entropy Model for Part-Of-Speech Tagging. In Proceedings of Empirical Methods in Natural Language Processing, pages 133-142.

Adwait Ratnaparkhi. 1997. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models. In Proceedings of Empirical Methods in Natural Language Processing.

Eric Sven Ristad. 1998. Maximum Entropy Modeling Toolkit, Release 1.6 beta. http://www.mnemonic.com/software/memt.

Ronald L. Rivest. 1987. Learning Decision Lists. Machine Learning, 2:229-246.

Erik F. Tjong Kim Sang and Jorn Veenstra. 1999. Representing text chunks. In EACL'99.

Kiyotaka Uchimoto, Satoshi Sekine, and Hitoshi Isahara. 1999. Japanese dependency structure analysis based on maximum entropy models. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 196-203.

David Yarowsky. 1994. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In 32nd Annual Meeting of the Association for Computational Linguistics, pages 88-95.

Yujie Zhang and Kazuhiko Ozeki. 1998. The application of classification trees to bunsetsu segmentation of Japanese sentences. Journal of Natural Language Processing, 5(4):17-33.
