Named Entity Chunking Techniques 
in Supervised Learning for Japanese Named Entity Recognition 
Manabu Sassano 
Fujitsu Laboratories, Ltd. 
4-4-1, Kamikodanaka, Nakahara-ku, 
Kawasaki 211-8588, Japan 
sassano@flab.fujitsu.co.jp
Takehito Utsuro 
Department of Information
and Computer Sciences,
Toyohashi University of Technology
Tenpaku-cho, Toyohashi 441-8580, Japan
utsuro@ics.tut.ac.jp
Abstract 
This paper focuses on the issue of named entity chunking in Japanese
named entity recognition. We apply the supervised decision list
learning method to Japanese named entity recognition. We also
investigate and incorporate several named-entity noun phrase chunking
techniques and experimentally evaluate and compare their performance.
In addition, we propose a method for incorporating richer contextual
information as well as patterns of constituent morphemes within a
named entity, which have not been considered in previous research, and
show that the proposed method outperforms these previous approaches.
1 Introduction 
It is widely agreed that named entity recognition is an important step
for various applications of natural language processing such as
information retrieval, machine translation, information extraction and
natural language understanding. In the English language, the task of
named entity recognition is one of the tasks of the Message
Understanding Conference (MUC) (e.g., MUC-7 (1998)) and has been
studied intensively. In the Japanese language, several recent
conferences, such as MET (Multilingual Entity Task, MET-1 (Maiorano,
1996) and MET-2 (MUC, 1998)) and IREX (Information Retrieval and
Extraction Exercise) Workshop (IREX Committee, 1999), focused on named
entity recognition as one of their contest tasks, thus promoting
research on Japanese named entity recognition.
In Japanese named entity recognition, it is quite common to apply
morphological analysis as a preprocessing step and to segment the
sentence string into a sequence of morphemes. Then, hand-crafted
pattern matching rules and/or statistical named entity recognizers are
applied to recognize named entities. It is often the case that named
entities to be recognized have different segmentation boundaries from
those of morphemes obtained by the morphological analysis. For
example, in our analysis of the IREX workshop's training corpus of
named entities, about half of the named entities have segmentation
boundaries that are different from the result of morphological
analysis by a Japanese morphological analyzer BREAKFAST (Sassano et
al., 1997) (section 2). Thus, in Japanese named entity recognition,
among the most difficult problems is how to recognize named entities
whose segmentation boundaries do not match those of the morphemes
obtained by morphological analysis. Furthermore, in almost 90% of the
cases of those segmentation boundary mismatches, named entities to be
recognized can be decomposed into several morphemes as their
constituents. This means that the problem of recognizing named
entities in those cases can be solved by incorporating techniques of
base noun phrase chunking (Ramshaw and Marcus, 1995).
In this paper, we focus on the issue of named entity chunking in
Japanese named entity recognition. First, we take a supervised
learning approach rather than a hand-crafted rule based approach,
because the former is more promising than the latter with respect to
the amount of human labor it requires, as well as its adaptability to
a new domain or a new definition of named entities. In general,
creating training data for supervised learning is somewhat easier than
creating pattern matching rules by hand. Next, we apply Yarowsky's
method for supervised decision list learning¹ (Yarowsky, 1994) to
¹ We choose the decision list learning method as the
Table 1: Statistics of NE Types of IREX

                    frequency (%)
NE Type           Training         Test
ORGANIZATION       3676 (19.7)     361 (23.9)
PERSON             3840 (20.6)     338 (22.4)
LOCATION           5463 (29.2)     413 (27.4)
ARTIFACT            747  (4.0)      48  (3.2)
DATE               3567 (19.1)     260 (17.2)
TIME                502  (2.7)      54  (3.5)
MONEY               390  (2.1)      15  (1.0)
PERCENT             492  (2.6)      21  (1.4)
Total             18677           1510
Japanese named entity recognition, into which we incorporate several
noun phrase chunking techniques (sections 3 and 4) and experimentally
evaluate their performance on the IREX workshop's training and test
data (section 5). As one of those noun phrase chunking techniques, we
propose a method for incorporating richer contextual information as
well as patterns of constituent morphemes within a named entity,
compared with those considered in the previous research (Sekine et
al., 1998; Borthwick, 1999), and show that the proposed method
outperforms these approaches.
2 Japanese Named Entity 
Recognition 
2.1 Task of the IREX Workshop 
The task of named entity recognition of the IREX workshop is to
recognize the eight named entity types in Table 1 (IREX Committee,
1999). The organizer of the IREX workshop provided 1,174 newspaper
articles which include 18,677 named entities as the training data. In
the formal run (general domain) of the workshop, the participating
systems were requested to recognize 1,510 named entities included in
the held-out 71 newspaper articles.
2.2 Segmentation Boundaries of 
Morphemes and Named Entities 
In the work presented here, we compare the segmentation boundaries of
named entities in the IREX workshop's training corpus with those of
supervised learning technique mainly because it is easy to implement
and quite straightforward to extend a supervised learning version to a
minimally supervised version (Collins and Singer, 1999; Cucerzan and
Yarowsky, 1999). We also reported in (Utsuro and Sassano, 2000) the
experimental results of a minimally supervised version of Japanese
named entity recognition.
Table 2: Statistics of Boundary Match vs. Mismatch of Morphemes (M)
and Named Entities (NE)

Match/Mismatch                          freq. of NE Tags (%)
1 M to 1 NE                             10480 (56.1)
n (≥ 2) Ms to 1 NE      n = 2            4557 (24.4)
                        n = 3            1658  (8.9)
                        n ≥ 4             960  (5.1)
                        subtotal         7175 (38.4)
other boundary mismatch                  1022  (5.5)
Total                                   18677
morphemes which were obtained through morphological analysis by a
Japanese morphological analyzer BREAKFAST (Sassano et al., 1997).²
Detailed statistics of the comparison are provided in Table 2. Nearly
half of the named entities have boundary mismatches against the
morphemes and also almost 90% of the named entities with boundary
mismatches can be decomposed into more than one morpheme. Figure 1
shows some examples of such cases.³
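The two proportions quoted above ("nearly half" and "almost 90%") can be re-derived from the counts in Table 2. The following is only a small arithmetic check; the variable names are ours and are not part of the original analysis.

```python
# Counts copied from Table 2 (named entity tags in the IREX training corpus).
one_to_one = 10480                                   # 1 morpheme to 1 named entity
multi = {"n=2": 4557, "n=3": 1658, "n>=4": 960}      # n morphemes to 1 named entity
other = 1022                                         # other boundary mismatch

total = one_to_one + sum(multi.values()) + other
decomposable = sum(multi.values())    # mismatches decomposable into morphemes
mismatched = decomposable + other     # all boundary mismatches


def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(total)                          # all named entity tags
print(pct(mismatched, total))         # share of entities with a boundary mismatch
print(pct(decomposable, mismatched))  # share of mismatches that are decomposable
```

Under these counts the mismatched share comes to 43.9% of all named entities, and 87.5% of the mismatched entities are decomposable into several morphemes.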
3 Chunking and Tagging Named 
Entities 
In this section, we formalize the problem of named entity chunking in
Japanese named entity recognition. We describe a novel technique as
well as those proposed in the previous works on named entity
recognition. The novel technique incorporates richer contextual
information as well as patterns of constituent morphemes within a
named entity, compared with the techniques proposed in previous
research on named entity recognition and base noun phrase chunking.
3.1 Task Definition 
First, we will provide our definition of the task
of Japanese named entity chunking. Suppose
² The set of part-of-speech tags of BREAKFAST consists of about 300
tags. BREAKFAST achieves 99.6% part-of-speech accuracy against
newspaper articles.
³ In most cases of the "other boundary mismatch" in Table 2, one or
more named entities have to be recognized as a part of a correctly
analyzed morpheme and those cases are not caused by errors of
morphological analysis. One frequent example of this type is a
Japanese verbal noun "hou-bei (visiting United States)" which consists
of two characters "hou (visiting)" and "bei (United States)", where
"bei (United States)" has to be recognized as <LOCATION>. We believe
that boundary mismatches of this type can be easily solved by
employing a supervised learning technique such as the decision list
learning method.
Table 3: Encoding Schemes of Named Entity Chunking States

Named Entity Tag                      <ORG>          <------ <LOC> ------>  <LOC>
Morpheme Sequence        ...  M       M       M      M       M       M      M       M  ...
Inside/Outside Encoding       O       ORG_I   O      LOC_I   LOC_I   LOC_I  LOC_B   O
Start/End Encoding            O       ORG_U   O      LOC_S   LOC_C   LOC_E  LOC_U   O
[2 Morphemes to 1 Named Entity]

<ORGANIZATION>
... Roshia     gun    ...
    (Russian)  (army)

<PERSON>
... Murayama     Tomiichi      shushou          ...
    (last name)  (first name)  (prime minister)

[3 Morphemes to 1 Named Entity]

<TIME>
... gozen  ku      ji        ...
    (AM)   (nine)  (o'clock)

<ARTIFACT>
... hokubei          jiyuu-boueki  kyoutei  ...
    (North America)  (free trade)  (treaty)

Figure 1: Examples of Boundary Mismatch of Morphemes and Named
Entities
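that a sequence of morphemes is given as below:

\[
\underbrace{\cdots\, M^{L}_{-k} \cdots M^{L}_{-1}}_{\text{Left Context}}
\;\;
\underbrace{M^{NE}_{1} \cdots M^{NE}_{i} \cdots M^{NE}_{m}}_{\text{Named Entity}}
\;\;
\underbrace{M^{R}_{1} \cdots M^{R}_{l} \,\cdots}_{\text{Right Context}}
\]

(Current Position: $M^{NE}_{i}$)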
Then, given that the current position is at the morpheme $M^{NE}_{i}$,
the task of named entity chunking is to assign a chunking state (to be
described in Section 3.2) as well as a named entity type to the
morpheme $M^{NE}_{i}$ at the current position, considering the
patterns of surrounding morphemes. Note that in the supervised
learning phase we can use the chunking information on which morphemes
constitute a named entity, and which morphemes are in the left/right
contexts of the named entity.
3.2 Encoding Schemes of Named 
Entity Chunking States 
In this paper, we evaluate the following two schemes of encoding
chunking states of named entities. Examples of these encoding schemes
are shown in Table 3.
3.2.1 Inside/Outside Encoding 
The Inside/Outside scheme of encoding chunking states of base noun
phrases was studied in Ramshaw and Marcus (1995). This scheme
distinguishes the following three states:

O: the word at the current position is outside any base noun phrase.
I: the word at the current position is inside some base noun phrase.
B: the word at the current position marks the beginning of a base noun
   phrase that immediately follows another base noun phrase.

We extend this scheme to named entity chunking by further
distinguishing each of the states I and B into eight named entity
types.⁴ Thus, this scheme distinguishes 2 × 8 + 1 = 17 states.
3.2.2 Start/End Encoding 
The Start/End scheme of encoding chunking states of named entities was
employed in Sekine et al. (1998) and Borthwick (1999). This scheme
distinguishes the following four states for each named entity type:

S: the morpheme at the current position marks the beginning of a named
   entity consisting of more than one morpheme.
C: the morpheme at the current position marks the middle of a named
   entity consisting of more than one morpheme.
E: the morpheme at the current position marks the ending of a named
   entity consisting of more than one morpheme.
U: the morpheme at the current position is a named entity consisting
   of only one morpheme.

The scheme also considers one additional state for the position
outside any named entity:

O: the morpheme at the current position is outside any named entity.

Thus, in our setting, this scheme distinguishes 4 × 8 + 1 = 33 states.
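As an illustration of the two encoding schemes, the sketch below assigns chunking states to a morpheme sequence whose named entity spans are already known. It is our own minimal reading of Sections 3.2.1 and 3.2.2, not the implementation used in the experiments; the function name `encode_states` and the `(start, end, type)` span representation are assumptions introduced here.

```python
def encode_states(num_morphemes, entities, scheme):
    """Assign a chunking state to every morpheme position.

    entities: non-overlapping (start, end_exclusive, ne_type) spans,
              sorted by start position (hypothetical representation).
    scheme:   "inside_outside" or "start_end".
    """
    states = ["O"] * num_morphemes
    prev_end, prev_type = None, None
    for start, end, ne_type in entities:
        for k in range(start, end):
            if scheme == "inside_outside":
                # B is used only when this entity immediately follows
                # another entity of the same type (footnote 4).
                follows_same = prev_end == start and prev_type == ne_type
                tag = "B" if (k == start and follows_same) else "I"
            elif scheme == "start_end":
                if end - start == 1:
                    tag = "U"          # single-morpheme entity
                elif k == start:
                    tag = "S"          # first morpheme
                elif k == end - 1:
                    tag = "E"          # last morpheme
                else:
                    tag = "C"          # middle morpheme
            else:
                raise ValueError("unknown scheme: " + scheme)
            states[k] = ne_type + "_" + tag
        prev_end, prev_type = end, ne_type
    return states


# The eight-morpheme example of Table 3: one ORG, then a three-morpheme
# LOC immediately followed by a one-morpheme LOC.
entities = [(1, 2, "ORG"), (3, 6, "LOC"), (6, 7, "LOC")]
inside_outside = encode_states(8, entities, "inside_outside")
start_end = encode_states(8, entities, "start_end")
print(" ".join(inside_outside))
print(" ".join(start_end))
```

For the Table 3 example, the two printed rows should coincide with the Inside/Outside and Start/End rows of that table.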
3.3 Preceding/Subsequent Morphemes
as Contextual Clues
In this paper, we evaluate the following two
models of considering preceding/subsequent
⁴ We allow the state x_B for a named entity type x only when the
morpheme at the current position marks the beginning of a named entity
of the type x that immediately follows a named entity of the same type
x.
morphemes as contextual clues to named entity chunking/tagging. Here
we provide a basic outline of these models, and the details of how to
incorporate them into the decision list learning framework will be
described in Section 4.2.2.
3.3.1 3-gram Model
In this paper, we refer to the model used in Sekine et al. (1998) and
Borthwick (1999) as a 3-gram model. Suppose that the current position
is at the morpheme $M_{0}$, as illustrated below. Then, when assigning
a chunking state as well as a named entity type to the morpheme
$M_{0}$, the 3-gram model considers the preceding single morpheme
$M_{-1}$ as well as the subsequent single morpheme $M_{1}$ as the
contextual clue.

\[
\underbrace{\cdots\, M_{-1}}_{\text{Left Context}}
\;\;
\underbrace{M_{0}}_{\text{Current Position}}
\;\;
\underbrace{M_{1} \,\cdots}_{\text{Right Context}}
\qquad (1)
\]
The major disadvantage of the 3-gram model is that in the training
phase it does not take into account whether or not the
preceding/subsequent morphemes constitute one named entity together
with the morpheme at the current position.
3.3.2 Variable Length Model
In order to overcome this disadvantage of the 3-gram model, we propose
a novel model, namely the "Variable Length Model", which incorporates
richer contextual information as well as patterns of constituent
morphemes within a named entity. In principle, as part of the training
phase this model considers which of the preceding/subsequent morphemes
constitute one named entity together with the morpheme at the current
position. It also considers several morphemes in the left/right
contexts of the named entity. Here we restrict this model to
explicitly considering the cases of named entities of the length up to
three morphemes and only implicitly considering those longer than
three morphemes. We also restrict it to considering two morphemes in
both left and right contexts of the named entity.
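\[
\underbrace{\cdots\, M^{L}_{-2} M^{L}_{-1}}_{\text{Left Context}}
\;\;
\underbrace{M^{NE}_{1} \cdots M^{NE}_{i} \cdots M^{NE}_{m(\le 3)}}_{\text{Named Entity}}
\;\;
\underbrace{M^{R}_{1} M^{R}_{2} \,\cdots}_{\text{Right Context}}
\qquad (2)
\]

(Current Position: $M^{NE}_{i}$)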
4 Supervised Learning for Japanese 
Named Entity Recognition 
This section describes how to apply the decision list learning method
to chunking/tagging named entities.
4.1 Decision List Learning 
A decision list (Rivest, 1987; Yarowsky, 1994) is a sorted list of
decision rules, each of which decides the value of a decision D given
some evidence E. Each decision rule in a decision list is sorted in
descending order with respect to some preference value, and rules with
higher preference values are applied first when applying the decision
list to some new test data.
First, the random variable D representing a decision varies over
several possible values, and the random variable E representing some
evidence varies over '1' and '0' (where '1' denotes the presence of
the corresponding piece of evidence, '0' its absence). Then, given
some training data in which the correct value of the decision D is
annotated to each instance, the conditional probabilities
$P(D = x \mid E = 1)$ of observing the decision D = x under the
condition of the presence of the evidence E (E = 1) are calculated and
the decision list is constructed by the following procedure.
1. For each piece of evidence, we calculate the log of likelihood
   ratio of the largest conditional probability of the decision
   $D = x_1$ (given the presence of that piece of evidence) to the
   second largest conditional probability of the decision $D = x_2$:

   \[
   \log_2 \frac{P(D = x_1 \mid E = 1)}{P(D = x_2 \mid E = 1)}
   \]

   Then, a decision list is constructed with pieces of evidence sorted
   in descending order with respect to their log of likelihood ratios,
   where the decision of the rule at each line is $D = x_1$ with the
   largest conditional probability.⁵
⁵ Yarowsky (1994) discusses several techniques for avoiding the
problems which arise when an observed count is 0. From among those
techniques, we employ the simplest one, i.e., adding a small constant
α (0.1 < α < 0.25) to the numerator and denominator. With this
modification, more frequent evidence is preferred when several
evidence candidates exist with the same
2. The final line of a decision list is defined as 'a default', where
   the log of likelihood ratio is calculated from the ratio of the
   largest marginal probability of the decision $D = x_1$ to the
   second largest marginal probability of the decision $D = x_2$:

   \[
   \log_2 \frac{P(D = x_1)}{P(D = x_2)}
   \]

   The 'default' decision of this final line is $D = x_1$ with the
   largest marginal probability.
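The two-step procedure above can be sketched as follows. This is a minimal illustration under our own assumptions, not the code used in the experiments: the function names, the representation of an instance as a (decision, evidence set) pair, the toy evidence strings, and the smoothing value α = 0.2 are all hypothetical. Because both conditional probabilities share the same denominator (the count of the evidence), the ratio is computed directly from smoothed co-occurrence counts.

```python
import math
from collections import Counter, defaultdict

DEFAULT = None  # marker for the final 'default' rule (our convention)


def learn_decision_list(instances, alpha=0.2):
    """instances: iterable of (decision, set_of_evidence) pairs.

    Returns rules as (log_likelihood_ratio, evidence, decision), sorted
    in descending order of the ratio, with the default rule last.
    """
    by_evidence = defaultdict(Counter)
    marginal = Counter()
    for decision, evidence in instances:
        marginal[decision] += 1
        for e in evidence:
            by_evidence[e][decision] += 1

    def ratio_and_best(counts):
        ranked = counts.most_common(2)
        best, c1 = ranked[0]
        c2 = ranked[1][1] if len(ranked) > 1 else 0
        # alpha is added to numerator and denominator so that a zero
        # count for the second-best decision does not divide by zero.
        return math.log2((c1 + alpha) / (c2 + alpha)), best

    rules = []
    for e, counts in by_evidence.items():
        score, best = ratio_and_best(counts)
        rules.append((score, e, best))
    rules.sort(key=lambda rule: rule[0], reverse=True)

    score, best = ratio_and_best(marginal)
    rules.append((score, DEFAULT, best))
    return rules


def classify(rules, evidence):
    """Apply the first (highest-ranked) rule whose evidence is present."""
    for score, e, decision in rules:
        if e is DEFAULT or e in evidence:
            return decision, score


# Toy training data (invented for illustration only).
train = [
    ("LOC_U", {"M0=bei", "T0=noun"}),
    ("LOC_U", {"M0=bei", "T0=noun"}),
    ("O", {"T0=noun"}),
    ("O", {"T0=noun"}),
    ("O", {"T0=noun"}),
    ("O", {"T0=verb"}),
]
rules = learn_decision_list(train)
print(classify(rules, {"M0=bei", "T0=noun"})[0])
print(classify(rules, {"T0=adjective"})[0])
```

In this toy example the lexical evidence "M0=bei" outranks the part-of-speech evidence "T0=noun", so the first query is decided by the more specific rule, while the second query matches no evidence and falls through to the default rule.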
4.2 Decision List Learning for 
Chunking/Tagging Named Entities 
4.2.1 Decision 
For each of the two schemes of encoding chunking states of named
entities described in Section 3.2, as the possible values of the
decision D, we consider exactly the same categories of chunking states
as those described in Section 3.2.
4.2.2 Evidence 
The evidence E used in the decision list learning is a combination of
the features of preceding/subsequent morphemes as well as the morpheme
at the current position. The following describes how to form the
evidence E for both the 3-gram model and variable length model.
3-gram Model
The evidence E represents a tuple $(F_{-1}, F_{0}, F_{1})$, where
$F_{-1}$ and $F_{1}$ denote the features of the immediately
preceding/subsequent morphemes $M_{-1}$ and $M_{1}$, respectively, and
$F_{0}$ the feature of the morpheme $M_{0}$ at the current position
(see Formula (1) in Section 3.3.1). The definitions of the possible
values of those features $F_{-1}$, $F_{0}$, and $F_{1}$ are given
below, where $M_i$ denotes the morpheme itself (i.e., including its
lexical form as well as part-of-speech), $C_i$ the character type
(i.e., Japanese (hiragana or katakana), Chinese (kanji), numbers,
English alphabets, symbols, and all possible combinations of these) of
$M_i$, and $T_i$ the part-of-speech of $M_i$:

\[
F_{-1} ::= M_{-1} \mid (C_{-1}, T_{-1}) \mid T_{-1} \mid \text{null}
\]
unsmoothed conditional probability $P(D = x \mid E = 1)$. Yarowsky's
training algorithm also differs somewhat in his use of the ratio
$\frac{P(D = x_1 \mid E = 1)}{P(\neg(D = x_1) \mid E = 1)}$, which is
equivalent in the case of binary classifications, and also by the
interpolation between the global probabilities (used here) and the
residual probabilities further conditional on higher-ranked patterns
failing to match in the list.
\[
F_{1} ::= M_{1} \mid (C_{1}, T_{1}) \mid T_{1} \mid \text{null}
\]
\[
F_{0} ::= M_{0} \mid (C_{0}, T_{0}) \mid T_{0}
\]
As the evidence E, we consider each possible combination of the values
of those three features.
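To make the size of this evidence space concrete, the sketch below enumerates every combination of the three features for one morpheme position. The dictionary representation of a morpheme (keys `lex`, `pos`, `ctype`), the tag names, and the example part-of-speech values are assumptions made for illustration; they are not BREAKFAST output.

```python
from itertools import product


def feature_values(morpheme, allow_null):
    """Possible values of one feature F for a morpheme.

    morpheme: dict with hypothetical keys 'lex', 'pos', 'ctype'.
    """
    values = [
        ("M", morpheme["lex"], morpheme["pos"]),    # the morpheme itself
        ("CT", morpheme["ctype"], morpheme["pos"]), # (character type, POS)
        ("T", morpheme["pos"]),                     # POS only
    ]
    if allow_null:
        values.append(None)  # 'null': the context morpheme is ignored
    return values


def trigram_evidence(prev, cur, nxt):
    """All (F_-1, F_0, F_1) tuples; only the context features may be null."""
    return list(product(feature_values(prev, True),
                        feature_values(cur, False),
                        feature_values(nxt, True)))


# Illustrative morphemes around "gun" in "Roshia gun ..." (tags invented).
prev = {"lex": "Roshia", "pos": "proper-noun", "ctype": "katakana"}
cur = {"lex": "gun", "pos": "noun", "ctype": "kanji"}
nxt = {"lex": "ga", "pos": "particle", "ctype": "hiragana"}

evidence = trigram_evidence(prev, cur, nxt)
print(len(evidence))  # 4 * 3 * 4 combinations
```

Each position therefore contributes 4 × 3 × 4 = 48 pieces of evidence under the 3-gram model, ranging from the fully lexicalized triple down to the part-of-speech of the current morpheme alone.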
Variable Length Model 
The evidence E represents a tuple $(F_L, F_{NE}, F_R)$, where $F_L$
and $F_R$ denote the features of the morphemes $M^{L}_{-2} M^{L}_{-1}$
and $M^{R}_{1} M^{R}_{2}$ in the left/right contexts of the current
named entity, respectively, and $F_{NE}$ the features of the morphemes
$M^{NE}_{1} \cdots M^{NE}_{i} \cdots M^{NE}_{m(\le 3)}$ constituting
the current named entity (see Formula (2) in Section 3.3.2). The
definitions of the possible values of those features $F_L$, $F_{NE}$,
and $F_R$ are given below, where $F^{NE}_{j}$ denotes the feature of
the j-th constituent morpheme $M^{NE}_{j}$ within the current named
entity, and $M^{NE}_{i}$ is the morpheme at the current position:

\[
\begin{aligned}
F_L &::= M^{L}_{-2} M^{L}_{-1} \mid M^{L}_{-1} \mid \text{null} \\
F_R &::= M^{R}_{1} M^{R}_{2} \mid M^{R}_{1} \mid \text{null} \\
F_{NE} &::= F^{NE}_{i} F^{NE}_{i+1} F^{NE}_{i+2}
        \mid F^{NE}_{i-1} F^{NE}_{i} F^{NE}_{i+1}
        \mid F^{NE}_{i-2} F^{NE}_{i-1} F^{NE}_{i} \\
       &\quad\; \mid F^{NE}_{i} F^{NE}_{i+1}
        \mid F^{NE}_{i-1} F^{NE}_{i}
        \mid F^{NE}_{i} \qquad (3) \\
F^{NE}_{j} &::= M^{NE}_{j} \mid (C^{NE}_{j}, T^{NE}_{j}) \mid T^{NE}_{j}
\end{aligned}
\]
As the evidence E, we consider each possible combination of the values
of those three features, except that the following three restrictions
are applied.
1. In the cases where the current named entity consists of up to three
   morphemes, as the possible values of the feature $F_{NE}$ in the
   definition (3), we consider only those which are consistent with
   the requirement that each morpheme $M^{NE}_{j}$ is a constituent of
   the current named entity. For example, suppose that the current
   named entity consists of three morphemes, where the current
   position is at the middle of those constituent morphemes as below:

   \[
   \underbrace{\cdots\, M^{L}_{-2} M^{L}_{-1}}_{\text{Left Context}}
   \;\;
   \underbrace{M^{NE}_{i-1} M^{NE}_{i} M^{NE}_{i+1}}_{\text{Named Entity}}
   \;\;
   \underbrace{M^{R}_{1} M^{R}_{2} \,\cdots}_{\text{Right Context}}
   \qquad (4)
   \]

   (Current Position: $M^{NE}_{i}$)

   Then, as the possible values of the feature $F_{NE}$, we consider
   only the following four:

   \[
   F_{NE} ::= F^{NE}_{i-1} F^{NE}_{i} F^{NE}_{i+1}
          \mid F^{NE}_{i-1} F^{NE}_{i}
          \mid F^{NE}_{i} F^{NE}_{i+1}
          \mid F^{NE}_{i}
   \]
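2. In the cases where the current named entity consists of more than
   three morphemes, only the three constituent morphemes are regarded
   as within the current named entity and the rest are treated as if
   they were outside the named entity. For example, suppose that the
   current named entity consists of four morphemes as below:

   \[
   \underbrace{\cdots\, M^{L}_{-2} M^{L}_{-1}}_{\text{Left Context}}
   \;\;
   \underbrace{M^{NE}_{1} M^{NE}_{2} M^{NE}_{3} M^{NE}_{4}}_{\text{Named Entity}}
   \;\;
   \underbrace{M^{R}_{1} M^{R}_{2} \,\cdots}_{\text{Right Context}}
   \]

   (Current Position: one of the constituent morphemes)

   In this case, the fourth constituent morpheme $M^{NE}_{4}$ is
   treated as if it were in the right context of the current named
   entity as below:

   \[
   \underbrace{\cdots\, M^{L}_{-2} M^{L}_{-1}}_{\text{Left Context}}
   \;\;
   \underbrace{M^{NE}_{1} M^{NE}_{2} M^{NE}_{3}}_{\text{Named Entity}}
   \;\;
   \underbrace{M^{NE}_{4} M^{R}_{1} \,\cdots}_{\text{Right Context}}
   \]

   (Current Position: one of the constituent morphemes)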
3. As the evidence E, among the possible combinations of the values of
   the three features $F_L$, $F_{NE}$, and $F_R$, we only accept those
   in which the positions of the morphemes are continuous, and reject
   those discontinuous combinations. For example, in the case of
   Formula (4) above, as the evidence E, we accept the combination
   $(M^{L}_{-1},\, M^{NE}_{i-1} M^{NE}_{i},\, \text{null})$, while we
   reject $(M^{L}_{-1},\, M^{NE}_{i} M^{NE}_{i+1},\, \text{null})$.
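Restrictions 1 and 3 can be read as constraints on which morpheme positions may appear together in one piece of evidence. The sketch below enumerates the allowed position templates for a named entity of up to three morphemes. It is our own interpretation, offered with the following assumptions: positions inside the entity are numbered 1..m, the labels "L-1", "NE2", "R1" and the function names are invented here, the choice among $M$, $(C, T)$, and $T$ for each constituent is left out, and restriction 2 (entities longer than three morphemes) is not handled.

```python
def ne_windows(i, lo, hi, max_len=3):
    """Contiguous windows (start, end), inclusive, of length <= max_len
    that contain position i and stay within [lo, hi]."""
    windows = []
    for start in range(max(lo, i - max_len + 1), i + 1):
        for end in range(i, min(hi, start + max_len - 1) + 1):
            windows.append((start, end))
    return windows


def allowed_templates(i, m):
    """(left, entity, right) position templates for current position i
    inside a named entity of m <= 3 morphemes."""
    left_options = [(), ("L-1",), ("L-2", "L-1")]
    right_options = [(), ("R1",), ("R1", "R2")]
    templates = []
    for start, end in ne_windows(i, 1, m):          # restriction 1
        for left in left_options:
            if left and start != 1:                 # restriction 3: no gap
                continue                            # between left context
            for right in right_options:             # and entity window
                if right and end != m:              # likewise on the right
                    continue
                entity = tuple("NE%d" % j for j in range(start, end + 1))
                templates.append((left, entity, right))
    return templates


# Formula (4): three-morpheme entity, current position in the middle.
print(ne_windows(2, 1, 3))           # the four windows of restriction 1
templates = allowed_templates(2, 3)
print((("L-1",), ("NE1", "NE2"), ()) in templates)   # accepted in the text
print((("L-1",), ("NE2", "NE3"), ()) in templates)   # rejected in the text
```

With no entity boundary to respect (for example, `ne_windows(5, 3, 7)`), the same function yields the six alternatives of definition (3), which is the situation described for positions outside any named entity in Section 4.3.1.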
4.3 Procedures for Training and 
Testing 
Next we will briefly describe the entire processes of learning the
decision list for chunking/tagging named entities as well as applying
it to chunking/tagging unseen named entities.
4.3.1 Training 
In the training phase, at the positions where the corresponding
morpheme is a constituent of a named entity, as described in Section
4.2, each allowable combination of features is considered as the
evidence E. On the other hand, at the positions where the
corresponding morpheme is outside any named entity, the way the
combination of features is considered is different in the variable
length model, in that restriction 1 in the previous section is no
longer applied. Therefore, all the possible values of the feature
$F_{NE}$ in Definition (3) are accepted. Finally, the frequency of
each decision D and evidence E is counted and the decision list is
learned as described in Section 4.1.
4.3.2 Testing 
When applying the decision list to chunking/tagging unseen named
entities, first, at each morpheme position, the combination of
features is considered as in the case of the non-entity position in
the training phase. Then, the decision list is consulted and all the
decisions of the rules with a log of likelihood ratio above a certain
threshold are recorded. Finally, as in the case of previous research
(Sekine et al., 1998; Borthwick, 1999), the most appropriate sequence
of the decisions that are consistent throughout the whole sequence is
searched for. By consistency of the decisions, we mean requirements
such as that the decision representing the beginning of some named
entity type has to be followed by that representing the middle of the
same entity type (in the case of Start/End encoding). Also, in our
case, the appropriateness of the sequence of the decisions is measured
by the sum of the log of likelihood ratios of the corresponding
decision rules.
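One way to realize this search is a Viterbi-style dynamic program over the recorded candidate decisions. The sketch below is an assumption-laden illustration for the Start/End encoding, not the procedure actually used: the function names and the per-position dictionaries of candidate states and scores are hypothetical, and we additionally allow a beginning to be followed by an ending of the same type, which the description above does not spell out.

```python
def split_state(state):
    """'PERSON_S' -> ('PERSON', 'S'); 'O' -> (None, 'O')."""
    if state == "O":
        return None, "O"
    ne_type, tag = state.rsplit("_", 1)
    return ne_type, tag


def consistent(prev, cur):
    """Start/End consistency between two adjacent decisions."""
    p_type, p_tag = split_state(prev)
    c_type, c_tag = split_state(cur)
    if p_tag in ("S", "C"):
        # An open entity must continue or end with the same type.
        return c_tag in ("C", "E") and c_type == p_type
    # After O, E, or U no entity is open.
    return c_tag in ("O", "S", "U")


def best_sequence(candidates):
    """candidates: one dict {state: log-likelihood ratio} per position.

    Returns the consistent state sequence with the largest sum of
    scores, or None if no consistent sequence exists.
    """
    paths = {"O": (0.0, [])}  # virtual state before the first morpheme
    for options in candidates:
        new_paths = {}
        for state, score in options.items():
            best = None
            for prev, (total, seq) in paths.items():
                if consistent(prev, state):
                    cand = (total + score, seq + [state])
                    if best is None or cand[0] > best[0]:
                        best = cand
            if best is not None:
                new_paths[state] = best
        paths = new_paths
    # The sequence must not end inside an unfinished entity.
    finals = [v for s, v in paths.items()
              if split_state(s)[1] in ("O", "E", "U")]
    return max(finals, key=lambda v: v[0])[1] if finals else None


# Invented scores for three morphemes, e.g. "Murayama Tomiichi shushou".
candidates = [
    {"PERSON_S": 3.0, "O": 1.0},
    {"PERSON_E": 2.0, "LOC_U": 2.5},
    {"O": 1.5},
]
print(best_sequence(candidates))
```

In this example the locally best decision at the second position (LOC_U) cannot follow PERSON_S, so the search prefers the consistent sequence PERSON_S, PERSON_E, O, whose total score is higher than that of the alternative O, LOC_U, O.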
5 Experimental Evaluation 
We experimentally evaluate the performance of the supervised learning
for Japanese named entity recognition on the IREX workshop's training
and test data. We compare the results of the combinations of the two
encoding schemes of named entity chunking states (the Inside/Outside
and the Start/End encoding schemes) and the two approaches to
contextual feature design (the 3-gram and the Variable Length models).
For each of those combinations, we search for an optimal threshold of
the log of likelihood ratio in the decision list. The performance of
each combination measured by F-measure (β = 1) is given in Table 4.
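For reference, the F-measure with weight β combines precision P and recall R as (β² + 1)PR / (β²P + R). The helper below is a generic sketch of that standard formula; the function name and the counts in the example are invented and are not results from our experiments.

```python
def f_measure(num_correct, num_system, num_gold, beta=1.0):
    """F-measure from counts of correct, system-output, and gold entities."""
    if num_correct == 0:
        return 0.0
    precision = num_correct / num_system
    recall = num_correct / num_gold
    b2 = beta * beta
    return (b2 + 1.0) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: 60 correct out of 80 proposed, 100 in the key.
print(round(100.0 * f_measure(60, 80, 100), 1))
```

With β = 1 the measure reduces to the harmonic mean of precision and recall.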
In this evaluation, we exclude the named entities with "other boundary
mismatch" in Table 2. We also classify the system output according to
the number of constituent morphemes of each named entity and evaluate
the performance for each subset of the system output. For each subset,
we compare the performance of the four combinations of {3-gram,
Variable Length} × {Inside/Outside, Start/End} and mark the highest
performance with an asterisk.

Table 4: Evaluation Results Measured by F-measure (β = 1)

                                      n Morphemes to 1 Named Entity
                                 n ≥ 1   n = 1   n = 2   n = 3   n ≥ 2   n ≥ 3   n ≥ 4
3-gram           Inside/Outside   72.9    75.9    79.7    51.4    69.4    42.5    29.2
                 Start/End        72.7    76.6    79.6    43.7    68.1    37.8    29.6
Variable Length  Inside/Outside   74.3*   77.6*   80.0*   55.5*   70.9*   49.9*   41.0
                 Start/End        72.1    77.0    75.6    51.5    67.2    48.6    43.6*
Several remarkable points of these results of performance comparison
can be stated as below:

• Among the four combinations, the Variable Length Model with
  Inside/Outside Encoding performs best in total (n ≥ 1) as well as in
  the recognition of named entities consisting of more than one
  morpheme (n = 2, 3, n ≥ 2, 3).

• In the recognition of named entities consisting of more than two
  morphemes (n = 3, n ≥ 3, 4), the Variable Length Model performs
  significantly better than the 3-gram model. This result clearly
  supports the claim that our modeling of the Variable Length Model
  has an advantage in the recognition of long named entities.

• In general, the Inside/Outside encoding scheme performs slightly
  better than the Start/End encoding scheme, even though the former
  distinguishes considerably fewer states than the latter.
6 Conclusion 
In this paper, we applied the supervised decision list learning method
to Japanese named entity recognition, into which we incorporated
several noun phrase chunking techniques and experimentally evaluated
their performance. We showed that a novel technique that we proposed
outperformed those using previously considered contextual features.
7 Acknowledgments 
This research was carried out while the authors were visiting scholars
at the Department of Computer Science, Johns Hopkins University. The
authors would like to thank Prof. David Yarowsky of Johns Hopkins
University for his invaluable support of this research.

References 

A. Borthwick. 1999. A Japanese named entity recognizer constructed by
a non-speaker of Japanese. In Proc. of the IREX Workshop, pages
187-193.

M. Collins and Y. Singer. 1999. Unsupervised models of named entity
classification. In Proc. of the 1999 Joint SIGDAT Conference on
Empirical Methods in Natural Language Processing and Very Large
Corpora, pages 100-110.

S. Cucerzan and D. Yarowsky. 1999. Language independent named entity
recognition combining morphological and contextual evidence. In Proc.
of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural
Language Processing and Very Large Corpora, pages 90-99.

IREX Committee, editor. 1999. Proceedings of the IREX Workshop. (in
Japanese).

S. Maiorano. 1996. The multilingual entity task (MET): Japanese
results. In Proc. of TIPSTER PROGRAM PHASE II, pages 449-451.

MUC. 1998. Proceedings of the 7th Message Understanding Conference,
(MUC-7).

L. Ramshaw and M. Marcus. 1995. Text chunking using
transformation-based learning. In Proc. of the 3rd Workshop on Very
Large Corpora, pages 83-94.

R. L. Rivest. 1987. Learning decision lists. Machine Learning,
2:229-246.

M. Sassano, Y. Saito, and K. Matsui. 1997. Japanese morphological
analyzer for NLP applications. In Proc. of the 3rd Annual Meeting of
the Association for Natural Language Processing, pages 441-444. (in
Japanese).

S. Sekine, R. Grishman, and H. Shinnou. 1998. A decision tree method
for finding and classifying names in Japanese texts. In Proc. of the
6th Workshop on Very Large Corpora, pages 148-152.

T. Utsuro and M. Sassano. 2000. Minimally supervised Japanese named
entity recognition: Resources and evaluation. In Proc. of the 2nd
International Conference on Language Resources and Evaluation, pages
1229-1236.

D. Yarowsky. 1994. Decision lists for lexical ambiguity resolution:
Application to accent restoration in Spanish and French. In Proc. of
the 32nd Annual Meeting of ACL, pages 88-95.
