Japanese Zero Pronoun Resolution based on
Ranking Rules and Machine Learning
Hideki Isozaki and Tsutomu Hirao
NTT Communication Science Laboratories
Nippon Telegraph and Telephone Corporation
2-4 Hikaridai, Seika-cho, Souraku-gun, Kyoto, Japan, 619-0237
{isozaki,hirao}@cslab.kecl.ntt.co.jp
Abstract
Anaphora resolution is one of the most
important research topics in Natural Lan-
guage Processing. In English, overt pro-
nouns such as she and definite noun
phrases such as the company are anaphors
that refer to preceding entities (an-
tecedents). In Japanese, anaphors are of-
ten omitted, and these omissions are called
zero pronouns. There are two major ap-
proaches to zero pronoun resolution: the
heuristic approach and the machine learn-
ing approach. Since we have to take var-
ious factors into consideration, it is diffi-
cult to find a good combination of heuris-
tic rules. Therefore, the machine learn-
ing approach is attractive, but it requires
a large amount of training data. In this
paper, we propose a method that com-
bines ranking rules and machine learning.
The ranking rules are simple and effective,
while machine learning can take more fac-
tors into account. Our experimental results show
that this combination gives better performance
than either of the two previous approaches.
1 Introduction
Anaphora resolution is an important research topic
in Natural Language Processing. For instance,
machine translation systems should identify an-
tecedents of anaphors (such as he or she) in the
source language to achieve better translation quality
in the target language.
We are now studying open-domain question an-
swering systems¹, and we expect QA systems to
benefit from anaphora resolution. Typical QA sys-
tems try to answer a user’s question by finding rel-
evant phrases from large corpora. When a correct
answer phrase is far from the keywords given in
the question, the systems will not succeed in find-
ing the answer. If the system can correctly resolve
anaphors, it will find keywords or answers repre-
sented by anaphors, and the chances of finding the
answer will increase. From this motivation, we are
developing our system toward the ability to resolve
anaphors in full-text newspaper articles.
In Japanese, anaphors are often omitted and these
omissions are called zero pronouns. Since they do
not give any hints (e.g., number or gender) about an-
tecedents, automatic zero pronoun resolution is dif-
ficult. In this paper, we focus on resolving zero
pronouns, which we shorten to 'zeros' for simplicity.
Most studies on Japanese zero pronoun resolution
have not tried to resolve zeros in full-text newspa-
per articles. They have discussed simple sentences
(Kameyama, 1986; Walker et al., 1994; Yamura-
Takei et al., 2002), dialogues (Yamamoto et al.,
1997), stereotypical lead sentences of newspaper ar-
ticles (Nakaiwa and Ikehara, 1993), intrasentential
resolution (Nakaiwa and Ikehara, 1996; Ehara and
Kim, 1996) or organization names in newspaper ar-
ticles (Aone and Bennett, 1995).
There are two approaches to the problem: the
heuristic approach and the machine learning ap-
¹http://trec.nist.gov/data/qa.html
proach. The Centering Theory (Grosz et al., 1995)
is important in the heuristic approach. Walker
et al. (1994) proposed forward center ranking for
Japanese. Kameyama (1986) emphasized the im-
portance of a property-sharing constraint. Okumura
and Tamura (1996) experimented on the roles of
conjunctive postpositions in complex sentences.
However, these improvements are not sufficient
for resolving zeros accurately. Murata and Na-
gao (1997) proposed complicated heuristic rules that
take various features of antecedents and anaphors
into account. We have to take even more factors into
account, but it is difficult to maintain such heuris-
tic rules. Therefore, recent studies employ machine
learning approaches. However, it is also difficult to
prepare a sufficient number of annotated corpora.
In this paper, we propose a method that com-
bines these two approaches. Heuristic ranking rules
give a general preference, while a machine learn-
ing method excludes inappropriate antecedent can-
didates. Our experimental results show that the
proposed method performs better than either of
the two approaches alone.
Before giving a description of our methodology,
we briefly introduce the grammar of the Japanese
language here. A Japanese sentence is a sequence
of bunsetsus: b1, ..., bn. A bunsetsu is a sequence
of content words (e.g., nouns, adjectives, and
verbs) followed by zero or more functional words
(e.g., particles and auxiliary verbs):
b = w1, ..., wk, f1, ..., fm. A bunsetsu modifies one of
the following bunsetsus. A particle (joshi) marks the
grammatical case of the noun phrase immediately
before it. For example, ga is nominative (subject),
wo is accusative (object), ni is dative (object2), and
wa marks a topic.
Tomu ga / Bobu ni / hon wo / okutta.
Tom=subj / Bob=object2 / book=object / sent
(Tom sent a book to Bob.)
Bunsetsu dependency is represented by a list of
bunsetsu pairs (modifier, modified). For instance,
⟨(1, 4), (2, 3), (3, 4), (4, −1)⟩ indicates that there
are four bunsetsus in this sentence and that the first
bunsetsu modifies the fourth bunsetsu and so on.
The last bunsetsu modifies no bunsetsu, which is
indicated by −1.
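A minimal Python sketch of this representation (a hypothetical helper, not part of the paper's system); bunsetsus are numbered from 1, and −1 marks the final bunsetsu, which modifies nothing:

```python
def modified_by(deps, i):
    """Return the index of the bunsetsu that bunsetsu i modifies (-1 = none)."""
    return dict(deps)[i]

# "Tomu ga / Bobu ni / hon wo / okutta."
deps = [(1, 4), (2, 3), (3, 4), (4, -1)]
print(modified_by(deps, 1))  # the first bunsetsu modifies the fourth: 4
```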
It takes a long time to construct high-quality an-
notated data, and we want to compare our results
with conventional methods. Therefore, we obtained
Seki’s data (Seki et al., 2002a; Seki et al., 2002b),
which are based on the Kyoto University Corpus²
2.0. These data are divided into two groups: gen-
eral and editorial. General contains 30 general news
articles, and editorial contains 30 editorial articles.
According to his experiments, editorial is harder
than general. Perhaps this is caused by the differ-
ence in rhetorical styles and the lengths of articles.
The average number of sentences in an editorial ar-
ticle is 28.7, while that in a general article is 13.9.
However, we found problems in his data. For
instance, the data contained ambiguous antecedents
like dou-shi (the same person) or dou-sha (the same
company) as correct antecedents. We replaced these
‘correct answers’ with their explicit names. We also
removed zeros in quoted sentences because they are
quite different from other sentences.
In addition, we decided to use the output of
ChaSen 2.2.9³ and CaboCha 0.34⁴ instead of the
morphological information and the dependency in-
formation provided by the Kyoto Corpus since clas-
sification of the joshi (particles) in the Corpus was
not satisfactory for our purpose. Since CaboCha
was trained on Kyoto Corpus 3.0, CaboCha's depen-
dency output is very similar to that of the Corpus.
2 Methodology
In this paper, we combine heuristic ranking rules and
machine learning. First, we describe how we ex-
tract possible antecedents (candidates). Second, we
describe the rule-based ranking system and the ma-
chine learning system. Finally, we describe how to
combine these two methods.
Following Seki and other studies, we consider only
anaphors of noun phrases. We assume that zeros
are already detected. We also assume zeros are lo-
cated at the starting point of a bunsetsu that contains
a yougen (a verb, an adjective, or an auxiliary verb).
From now on, we use ‘verb’ instead of ‘yougen’ for
readability. A zero’s bunsetsu is a bunsetsu that con-
tains the zero. We further assume that each zero’s
grammatical case is already determined by a zero
detector and represented by corresponding particles.
²http://pine.kuee.kyoto-u.ac.jp/nl-resource/courpus-e.html
3http://chasen.aist-nara.ac.jp/
4http://cl.aist-nara.ac.jp/˜taku-ku/software/cabocha/
If a zero is the subject of a verb, its case is repre-
sented by the particle ga. If it is an object, it is rep-
resented by wo. If it is an object2, it is represented
by ni. We consider only these three cases. We call
such a particle the zero's particle.
Since complex sentences are hard to analyze, each
sentence is automatically split at conjunctive post-
positions (setsuzoku joshi) (Okumura and Tamura,
1996; Ehara and Kim, 1996). In order to distinguish
the original complex sentence and the simpler sen-
tences after the split, we call the former just a ‘sen-
tence’ and the latter ‘post-split sentences’. When a
conjunctive postposition appears in a relative clause,
we do not split the sentence at that position. In the
examples below, we split the first sentence at ‘and’
but do not split the second sentence at ‘and’.
She bought the book and sold it to him.
She bought the book that he wrote and sold.
A zero’s sentence is the (original) sentence that
contains the zero. From now on, z stands for a zero
and c stands for a candidate of z's antecedent. z's
particle is denoted ZP, and CP stands for c's next
word, which is c's particle or a punctuation symbol.
2.1 Enumeration of possible antecedents
Candidates (possible antecedents) are enumerated
on the fly by using the following method.
1. We extract a content word sequence
w1, ..., wk as a candidate c if it is fol-
lowed by a case marker (kaku-joshi, e.g., ga,
wo), a topic marker (wa or mo), or a period.

2. If c's wk is a verb, an adjective, an auxiliary
verb, an adverb, or a relative pronoun
(ChaSen's meishi-hijiritsu, e.g., koto (what he
did) and toki (when she married)), c is ex-
cluded. (If wk is a closing quotation mark,
wk-1 is checked instead.)

3. If c's wk is a pronoun or an adverbial noun (a
noun that can also be used as an adverb, i.e.,
ChaSen's meishi-fukushi-kanou), c is excluded.

4. If c is dou-shi (the person), it is replaced by
the latest person name. If c is dou-sha (the
company), it is replaced by the latest organi-
zation name. If c is dou+suffix, it is replaced
by the latest candidate that has the same suffix.
For this task, we use a named entity recognizer
(Isozaki and Kazawa, 2002).
The first step extracts a content word sequence
from a bunsetsu. The second step excludes verb
phrases, adjective phrases, and clauses. As a re-
sult, we obtain only noun phrases. The third step ex-
cludes adverbial expressions like kotoshi (this year).
The fourth step resolves anaphors like definite noun
phrases in English. We should also resolve pro-
nouns, but we did not because useful pronouns are
rare in newspaper articles.
In addition, we register a resolved zero as a new
candidate. If z's antecedent is determined to be
some candidate c, a new candidate c′ is created for
future zeros. c′ is a copy of c except that c′'s
particle is ZP and c′'s location is z's location. In
the training phase of the machine learning approach,
we consider a correct answer as c. Then, we can
remove far candidates from the list.
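The copying step above can be sketched as follows (the Candidate record and its field names are assumptions for illustration, not the paper's actual data structures):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Candidate:           # assumed record; field names are illustrative
    words: tuple           # content word sequence w1, ..., wk
    particle: str          # the particle (or punctuation) following it
    location: int          # index of the post-split sentence

def register_resolved_zero(antecedent, zero_particle, zero_location):
    # copy of the antecedent, but with the zero's particle and location,
    # so later zeros can pick it up as a candidate
    return replace(antecedent, particle=zero_particle, location=zero_location)
```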
In this way, our zero resolver creates a ‘general
purpose’ candidate list. However, some of the can-
didates are inappropriate for certain zeros. A verb
usually does not have the same entity in two or more
cases (Murata and Nagao, 1997). Therefore, our
resolver excludes candidates that are filled in other
cases of the verb. When a verb has two or more
zeros, we resolve ga first, and its best candidate is
excluded from the candidates of wo or ni.
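Step 1 of the enumeration can be sketched as follows (the token format and helper name are hypothetical; the real system works on ChaSen/CaboCha output, and the POS-based exclusions of steps 2 and 3 are omitted here):

```python
CASE_MARKERS = {"ga", "wo", "ni"}     # kaku-joshi (subset)
TOPIC_MARKERS = {"wa", "mo"}

def extract_candidate(content_words, following):
    """Step 1: keep a bunsetsu's content words as a candidate only when
    followed by a case marker, a topic marker, or a period."""
    if following in CASE_MARKERS | TOPIC_MARKERS | {"."}:
        return " ".join(content_words)
    return None

print(extract_candidate(["Tomu"], "ga"))   # Tomu
print(extract_candidate(["okutta"], ","))  # None
```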
2.2 Ranking rules
Various heuristics have been reported in past litera-
ture. Here, we use the following heuristics.
1. Forward center ranking (Walker et al., 1994):
(topic > empathy > subject > object2 > object
> others).
2. Property-sharing (Kameyama, 1986): If a zero
is the subject of a verb, its antecedent is perhaps
a subject in the antecedent’s sentence. If a zero
is an object, its antecedent is perhaps an object.
3. Semantic constraints (Yamura-Takei et al.,
2002; Yoshino, 2001): If a zero is the sub-
ject of ‘eat,’ its antecedent is probably a per-
son or an animal, and so on. We use Nihongo
Goi Taikei (Ikehara et al., 1997), which has
14,730 English-to-Japanese translation patterns
for 6,103 verbs, to check the acceptability of a
candidate. Goi Taikei also has 300,000 words
in about 3,000 semantic categories. (See Ap-
pendix A for details.)
4. Demotion of candidates in a relative clause
(rentai shuushoku setsu): Usually, Japanese ze-
ros do not refer to noun phrases in relative
clauses (Ehara and Kim, 1996). (See Appendix
B for details.)
Since sentences in newspaper articles are often
complex and relative clauses are sometimes nested,
we refine this rule in the following way.
• A candidate's relative clause is the inmost rel-
ative clause that contains the candidate.
• A relative clause finishes at the noun modified
by the clause.
• If z appears before the finishing noun of c's rel-
ative clause, the clause is still unfinished at z.
Otherwise, the clause is already finished.
• A quoted clause (with or without quotation
marks “ ”) indicated by the quotation marker ‘to’
(‘that’ in ‘He said that she is . . . ’) is also re-
garded as a relative clause.
• We demote c after c's relative clause finishes.
It is not clear how to combine the above heuris-
tics consistently. Here, we sort the candidates in
a lexicographical order based on the above fea-
tures of candidates. For instance, we can use
a lexicographically increasing order defined by
⟨Vi, Re, Ag, Di, Sa⟩, where
• Vi (for violation) is 1 if the candidate violates
the semantic constraint. Otherwise, Vi is 0.
• Re (for relative) is 1 if the candidate is in a rel-
ative clause that has already finished before z.
Otherwise, Re is 0.
• Ag (for agreement) is 0 if CP=ZP holds. (Since
most of wa and mo are subjects, they are re-
garded as ga here.) Otherwise, Ag is 1.
• Di (for distance) is a non-negative integer that
represents the number of post-split sentences
between c and z. If a candidate's Di is larger
than maxDi, it is removed from the candidate
list.
• Sa (for salience) is 0 if CP is wa, 1 if CP is ga,
2 if CP is ni, 3 if CP is wo, and 4 otherwise.
We did not implement empathy because it
makes the program more complex, and empathy
verbs are rare in newspaper articles.
For instance, ⟨0, 0, 0, 2, 1⟩ < ⟨1, 0, 0, 0, 0⟩ holds.
The first-ranked (lexicographically smallest) candi-
date is regarded as the best candidate. We employ
lexicographical ordering because it seems the sim-
plest way to rank candidates. We put Vi in the
first place because Vi was often regarded as a con-
straint in the past literature. We put Ag before
Sa because Kameyama's method was better than
Walker's in Okumura and Tamura (1996). There-
fore, ⟨Vi, ..., Ag, ..., Sa, ...⟩ is expected to be a good
ordering. The above ordering is an instance of this.
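Because Python compares tuples lexicographically, the ranking rule reduces to an ordinary sort over (Vi, Re, Ag, Di, Sa) tuples; a minimal sketch with made-up candidates:

```python
def rank(candidates):
    """candidates: (name, (Vi, Re, Ag, Di, Sa)) pairs; smaller tuple = better."""
    return sorted(candidates, key=lambda c: c[1])

cands = [("A", (1, 0, 0, 0, 0)),   # violates the semantic constraint
         ("B", (0, 0, 0, 2, 1)),   # two sentences back, CP = ga
         ("C", (0, 0, 1, 0, 3))]   # same sentence, but CP differs from ZP
print([name for name, _ in rank(cands)])  # ['B', 'C', 'A']
```

The semantic-constraint violator A sinks to the bottom even though its other features are perfect, because Vi occupies the first position of the tuple.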
2.3 Machine Learning
Although we can consider various other features
for zero pronoun resolution, it is difficult to com-
bine these features consistently. Therefore, we
use machine learning. Support Vector Machines
(SVMs) have shown good performance in various
tasks in Natural Language Processing (Kudo and
Matsumoto, 2001; Isozaki and Kazawa, 2002; Hi-
rao et al., 2002).
Yoshino (2001) and Iida et al.(2003b) also applied
SVM to Japanese zero pronoun resolution, but the
usefulness of each feature was not clear. Here, we
add features for complex sentences and analyze use-
ful features by examining the weights of features.
We use the following features of c as well as CP.
CSem c's semantic categories. (See Appendix A.)
CPPOS CP's part-of-speech (POS) tags (rough
and detailed).
CPOS The POS tags of the last word of c.
Siblings When CP is wa or mo, it is not clear
whether c is a subject. However, a verb rarely has
the same entity in two or more cases. Therefore, if c
modifies a verb that has a subject, c is not a subject.
In the next example, hon is an object of katta.
Ano / hon wa / Tomu ga / katta.
that / book=topic / Tom=subj / bought
(As for that book, Tom bought it.)
In order to learn such things, we use as c's features
the sibling case markers that modify the same verb.
We also use the following features of z as well as
ZP.
Conjunct The latest conjunctive postposition in
the sentence and its classification (Okumura and
Tamura, 1996; Yoshimoto, 1986).
ZSem Semantic categories of the verb that z mod-
ifies. We use them only when the verb is sahen
meishi + ‘suru.’ Sahen meishi is a kind of noun that
can be an object of the verb ‘suru’ (do) (e.g., ‘shop-
ping’ in ‘do the shopping’).
We also use the following relations between z and
c as well as Ag, Vi, and Di.
Relative Whether c is in a relative clause.
Unfinished Whether the relative clause is unfin-
ished at z.
Intra (for intrasentential coreference) Whether c
explicitly appears in z's sentence.
Sometimes it is difficult to distinguish cataphora
from anaphora. Even if an antecedent appears in a
preceding sentence, it is sometimes easier to find a
candidate after z, as illustrated by the case of ‘his’
in the next English example.
Bob and John separately drove to Charlie’s
house. . . . Since his car broke down, John made a
phone call.
Even if Di = 0 holds, Intra does not necessarily
hold because we introduce resolved zeros as new
candidates.
Parallel Whether c appears in a clause parallel to
a clause in which a zero appears. This will be useful
for the resolution of a zero as with ‘it’ in the next
English sentence.
He turned on the TV set and she turned it off.
Immediate Whether c's bunsetsu appears imme-
diately before z's. In the following sentence, a can-
didate ryoushin is located immediately before the
zero.
Kare no / ryoushin wa / (z ga) / ikiteiru to / shinjiteiru.
he+’s / parents=topic / (z=subj) / alive+that / believe
(His parents believe that (z) is still alive.)
Here, we represent all of the above features by a
boolean value: 0 or 1. Semantic categories can be
represented by a 0/1 vector whose i-th component
corresponds to the i-th semantic category. Similarly,
POS tags can be represented by a 0/1 vector whose
i-th component corresponds to the i-th POS tag. On
the other hand, Di has a non-negative integer value.
We also encode the distance by a 0/1 vector whose
i-th component corresponds to the fact that the dis-
tance is i. The distance has an upper bound maxDi.
In this way, we can represent a candidate by a
boolean feature vector. A candidate c_j's feature vec-
tor is denoted x_j. If a boolean feature appears only
once in the given data, we remove the feature from
the feature vectors.
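A minimal sketch of this 0/1 encoding (the feature inventory here is a toy subset; the real vectors also include CSem, the POS tags, and the other features above, and features seen only once are removed):

```python
MAX_DI = 3
PARTICLES = ["wa", "ga", "wo", "ni", "other"]   # toy CP inventory

def encode(cp, di, vi, ag):
    vec = [1 if cp == p else 0 for p in PARTICLES]           # CP, one-hot
    vec += [1 if di == d else 0 for d in range(MAX_DI + 1)]  # Di, one-hot
    vec += [vi, ag]                                          # already boolean
    return vec

print(encode("ga", 0, 0, 0))  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```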
The training data comprise the set of pairs
⟨(y_j, x_j)⟩, where y_j is 1 if c_j is a correct antecedent
of a zero. Otherwise, y_j is −1. By using the train-
ing data, SVM finds a decision function f(x) =
Σ_j y_j α_j K(x, s_j) + b, where x is the feature vector
of a candidate c and the s_j are support vectors selected
from the training data. α_j is a constant, and K(·, ·) is
called a kernel function. If f(x) > 0 holds, x is
classified as a correct antecedent.
2.4 Combinations
Here, we use the following method to combine the
ordering and SVM.
1. Sort candidates by using the lexicographical or-
der.
2. Classify each candidate by using SVM in this
order.
3. If f(x_j) is positive, stop there and sort the eval-
uated candidates by f(x_j) in decreasing order.
4. If no candidate satisfies f(x_j) > 0, return the
best candidate in terms of f(x_j).
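The four steps above can be sketched as follows (toy decision values; `combine` and the candidate format are illustrative names, not the paper's code):

```python
def combine(ranked, f):
    """ranked: candidates in rule-based order; f: SVM decision function."""
    evaluated = []
    for c in ranked:
        score = f(c)
        evaluated.append((score, c))
        if score > 0:            # step 3: stop at the first positive one
            break
    # steps 3-4: best decision value among the evaluated candidates
    # (all candidates, if none was positive)
    return max(evaluated)[1]

scores = {"A": -0.5, "B": 0.3, "C": 0.9}   # toy decision values
print(combine(["A", "B", "C"], lambda c: scores[c]))  # B
```

Note that C is never evaluated even though its decision value is higher: once the rule-based order reaches a positive candidate (B), the walk stops, so the heuristic ranking retains priority over the raw SVM scores.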
3 Results
We conducted leave-one(-article)-out experiments.
For each article, 29 other articles were used for
training. Table 1 compares the scores of the above
methods. ‘First’ picks up the first candidate given
by a given lexicographical ordering. The acronym
‘vrads’ stands for the lexicographical ordering
⟨Vi, Re, Ag, Di, Sa⟩. ‘Best’ picks up the best can-
didate in terms of f(x) without checking whether it
Table 1: Percentage of correctly resolved zeros
† = The combination is worse than ‘first’ or ‘best.’
* = (Seki et al., 2002a), ‡ = (Seki et al., 2002b)

                general                        editorial
       first   mem    svm1   svm2     first   mem    svm1   svm2
best           51.0   56.8   55.9             43.4   45.1   45.1
vrads  64.3    53.0†  58.5†  66.3     45.3    44.0†  45.9   47.3
vards  64.0    53.0†  58.5†  66.0     45.9    44.2†  45.9   46.9
rvads  63.4    51.0†  58.5†  66.3     44.4    43.4†  46.1   47.5
avrds  62.8    53.0†  58.5†  66.0     44.2    44.0†  45.9   46.9
vrdsa  55.9    53.0†  58.5   65.7     43.4    44.0   45.9   48.6
adsvr  53.0    51.0†  57.9   62.8     43.8    43.4†  46.3   48.6
davrs  39.5    53.0   57.6   62.5     34.6    44.2   46.1   50.2
Seki   54.0*   50.7‡                  39.8*
is positive. Consequently, it is independent of the
ordering (unless two or more candidates have the
best value). ‘Svm1’ uses the ordinary SVM (Vap-
nik, 1995), while ‘svm2’ uses a modified SVM for
unbalanced data (Morik et al., 1999), which gives
a large penalty to misclassification of a minority (=
positive) example.⁵ In general, svm2 accepts more
candidates than svm1. According to this table, svm1
is too severe, excluding good candidates along with
bad ones. We also tried the maximum entropy
model⁶ (mem) and C4.5, but they were also too severe.
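The penalty ratio used by the modified SVM of Morik et al. (1999) can be computed directly from the training labels; a minimal sketch:

```python
def penalty_ratio(labels):
    """C+/C- of the modified SVM: #negative / #positive examples."""
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == -1)
    return neg / pos

# e.g., one correct antecedent among eight candidates of a zero
print(penalty_ratio([1, -1, -1, -1, -1, -1, -1, -1]))  # 7.0
```

Since correct antecedents are a small minority among the enumerated candidates, this ratio is well above 1, and misclassifying a positive example costs correspondingly more.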
When we use SVM, we have to choose a good
kernel for better performance. Here, we used the
linear kernel (K(x, s) = x · s) for SVM because it
was best according to our preliminary experiments.
We set maxDi at 3 because it gave the best results.
The table also shows Seki’s scores for reference,
but it is not fair to compare our scores with Seki’s
scores directly because our data is slightly different
from Seki’s. The number of zeros in general in our
data is 347, while Seki resolved 355 detected ze-
ros in (Seki et al., 2002a) and 404 in (Seki et al.,
2002b). The number of zeros in our editorial is
514, while (Seki et al., 2002a) resolved 498 detected
zeros. In order to overcome the data sparseness,
⁵An ordinary SVM minimizes ||w||²/2 + C Σ_j ξ_j, while
the modified SVM minimizes ||w||²/2 + C+ Σ_{j: y_j = +1} ξ_j
+ C− Σ_{j: y_j = −1} ξ_j, where C+/C− = (number of negative exam-
ples)/(number of positive examples).
⁶http://www2.crl.go.jp/jt/a132/members/mutiyama/software.html
Seki used unannotated articles to get co-occurrence
statistics. Without the data, their scores degraded
about 5 points. We have not conducted experiments
that use unannotated corpora; this task is our future
work.
As we expected, instances of ⟨Vi, ..., Ag, ..., Sa, ...⟩
show good performance. Without SVMs, ‘vrads’
is the best for general in the table. It is interest-
ing that such a simple ordering gives better perfor-
mance than SVMs. However, the combination of
‘vrads’ and ‘svm2’ (= vrads+svm2) gives even bet-
ter results. In general, ‘X+svm2’ is better than ‘first’
and ‘X+svm1’ for each ordering X. With SVM, ‘davrs+svm2’ gave the
best result for editorial. Editorial articles some-
times use anthropomorphism (e.g., The report says
. . . ) that violates semantic constraints. Therefore,
‘vrads’ does not work well for such cases.
Table 2 shows the weights of the above features
determined by svm2 for a fold of the leave-one-
out experiment of ‘vrads+svm2.’ The weights can
be given by rewriting f(x[1], ..., x[d]) as
w[0] + Σ_i w[i]·x[i]. This table shows that Kameyama’s
property-sharing (Ag), semantic violation (Vi), can-
didate’s particle (CP), and distance (Di) are very
important features. Our new features Parallel, Un-
finished, and Intra also obtained relatively large
weights. Semantic categories ‘suggestions’ and ‘re-
port’ reflect the fact that some articles use anthro-
pomorphism. These weights will be useful to de-
sign better heuristic rules. The fact that Unfinished’s
weight almost cancels Relative’s weight justifies the
Table 2: Weights of features

       general                        editorial
+1.22  Ag=0                   +0.67  Ag=0
+0.47  ZP=ni                  +0.82  Parallel
+0.39  concrete∈CSem          +0.48  Di=0
+0.36  CP=ga                  +0.38  Intra
+0.36  Intra                  +0.37  CP=ga
+0.34  agents∈CSem            +0.33  suggestion∈CSem
+0.33  CP=wa                  +0.33  report∈CSem
+0.23  Di=0                   +0.32  agents∈CSem
+0.09  Parallel               +0.24  concrete∈CSem
+0.07  Unfinished             +0.24  Unfinished
−0.06  Relative               +0.24  CP=wa
−0.21  CP=mo                  −0.20  CPPOS=‘case marker’
−0.23  CP=no                  −0.23  Relative
−0.32  ZP=wo                  −0.40  CP=no
−0.36  Di=3                   −0.49  Di=3
−0.79  Vi=1                   −0.89  Vi=1
definition of Re.
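Since the linear kernel was used, f(x) = Σ_j y_j α_j (x · s_j) + b collapses to w · x + b with w = Σ_j y_j α_j s_j, so per-feature weights like those in Table 2 can be read off directly; a toy sketch (made-up support vectors, not the real model):

```python
def linear_weights(coefs, support_vectors):
    """coefs[j] = y_j * alpha_j; support_vectors[j] = s_j (0/1 lists).
    Returns w = sum_j y_j * alpha_j * s_j, one weight per feature."""
    dim = len(support_vectors[0])
    return [sum(c * sv[i] for c, sv in zip(coefs, support_vectors))
            for i in range(dim)]

w = linear_weights([0.75, -0.5], [[1, 0, 1], [0, 1, 1]])
print(w)  # [0.75, -0.5, 0.25]
```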
4 Discussion
Yoshino (2001) used an ordinary SVM with
K(x, y) = (1 + x · y)². He tried to find use-
ful features by feature elimination. Since features
ful features by feature elimination. Since features
are not completely independent, removing a heav-
ily weighted feature does not necessarily degrade the
system’s performance. Hence, feature elimination is
more reliable for reducing the number of features.
However, feature elimination takes a long time. On
the other hand, feature weights can give rough guid-
ance. According to the table, our new features (Par-
allel, Unfinished, and Intra) obtained relatively large
weights. This implies their importance. When we
eliminated these three features, vrads+svm2’s score
for editorial dropped by 4 points. Therefore, combi-
nations of these three features are useful.
Recently, Iida et al. (2003a) proposed an SVM-
based tournament model that compares two candi-
dates and selects the better one. We would like to
compare or combine their method with our method.
For further improvement, we have to make the mor-
phological analyzer and the dependency analyzer
more reliable because they make many mistakes
when they process complex sentences.
SVM has often been criticized as being too slow.
However, the above data were small enough for the
state-of-the-art SVM programs. The number of ex-
amples in each set of training data was about 5,000–
6,100, and each training phase took only 5–18 sec-
onds on a 2.4-GHz Pentium 4 machine.
5 Conclusions
In order to make Japanese zero pronoun resolu-
tion more reliable, we have to maintain complicated
heuristic rules or prepare a large amount of training
data. In order to alleviate this problem, we com-
bined simple lexicographical orderings and SVMs.
It turned out that a simple lexicographical ordering
performed better than SVM, but their combination
gave even better performance. By examining feature
weights, we found that features for complex sen-
tences are important in zero pronoun resolution. We
confirmed this by feature elimination.

References
Chinatsu Aone and Scott William Bennett. 1995. Evalu-
ating automated and manual acquisition of anaphora
resolution strategies. In Proc. of ACL-1995, pages
122–129.
Terumasa Ehara and Yeun-Bae Kim. 1996. Zero-subject
resolution by probabilistic model (in Japanese). Jour-
nal of Natural Language Processing, 3(4):67–86.
Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein.
1995. Centering: A framework for modelling the lo-
cal coherence of discourse. Computational Linguis-
tics, 21(2):203–226.
Tsutomu Hirao, Hideki Isozaki, Eisaku Maeda, and Yuji
Matsumoto. 2002. Extracting important sentences
with support vector machines. In Proc. of COLING-
2002, pages 342–348.
Ryu Iida, Kentaro Inui, Hiroya Takamura, and Yuji Mat-
sumoto. 2003a. Incorporating contextual cues in
trainable models for coreference resolution (to ap-
pear). In Proc. of EACL Workshop on the Computa-
tional Treatment of Anaphora.
Ryu Iida, Kentaro Inui, Hiroya Takamura, and Yuji
Matsumoto. 2003b. One method for resolving
Japanese zero pronouns with machine learning model
(in Japanese). In IPSJ SIG-NL 154.
Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio
Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi
Ooyama, and Yoshihiko Hayashi. 1997. Goi-Taikei —
A Japanese Lexicon (in Japanese). Iwanami Shoten.
Hideki Isozaki and Hideto Kazawa. 2002. Efficient sup-
port vector classifiers for named entity recognition. In
Proc. of COLING-2002, pages 390–396.
Megumi Kameyama. 1986. A property-sharing con-
straint in centering. In Proc. of ACL-1986, pages 200–
206.
Taku Kudo and Yuji Matsumoto. 2001. Chunking with
support vector machines. In Proc. of NAACL-2001,
pages 192–199.
Katharina Morik, Peter Brockhausen, and Thorsten
Joachims. 1999. Combining statistical learning with
a knowledge-based approach — a case study in inten-
sive care monitoring. In Proc. of ICML-1999, pages
268–277.
Masaki Murata and Makoto Nagao. 1997. An estimate
of referents of pronouns in Japanese sentences using
examples and surface expressions (in Japanese). Jour-
nal of Natural Language Processing, 4(1):41–56.
Hiromi Nakaiwa and Satoru Ikehara. 1993. Zero pro-
noun resolution in a Japanese to English machine
translation system using verbal semantic attributes (in
Japanese). Transaction of the Information Processing
Society of Japan, 34(8):1705–1715.
Hiromi Nakaiwa and Satoru Ikehara. 1996. Intrasenten-
tial resolution of Japanese zero pronouns using prag-
matic and semantic constraints (in Japanese). Journal
of Natural Language Processing, 3(4):49–65.
Manabu Okumura and Kouji Tamura. 1996. Zero pro-
noun resolution based on centering theory. In Proc. of
COLING-1996, pages 871–876.
Kazuhiro Seki, Atsushi Fujii, and Tetsuya Ishikawa.
2002a. Japanese zero pronoun resolution using a prob-
abilistic model (in Japanese). Journal of Natural Lan-
guage Processing, 9(3):63–85.
Kazuhiro Seki, Atsushi Fujii, and Tetsuya Ishikawa.
2002b. A probabilistic method for analyzing Japanese
anaphora integrating zero pronoun detection and reso-
lution. In Proc. of COLING-2002, pages 911–917.
Vladimir N. Vapnik. 1995. The Nature of Statistical
Learning Theory. Springer.
Marilyn Walker, Masayo Iida, and Sharon Cote. 1994.
Japanese discourse and the process of centering. Com-
putational Linguistics, 20(2):193–233.
Kazuhide Yamamoto, Eiichiro Sumita, Osamu Furuse,
and Hitoshi Iida. 1997. Ellipsis resolution in dia-
logues via decision-tree learning. In Proc. of NLPRS-
1997, pages 423–428.
Mitsuko Yamura-Takei, Miho Fujiwara, Makoto Yoshie,
and Teruaki Aizawa. 2002. Automatic linguistic anal-
ysis for language teachers: The case of zeros. In Proc.
of COLING-2002, pages 1114–1120.
Kei Yoshimoto. 1986. Study of Japanese zero pronouns
in discourse processing (in Japanese). In IPSJ SIG
notes, NL-56-4, pages 1–8.
Keiichi Yoshino. 2001. Anaphora resolution of Japanese
zero pronouns using machine learning (in Japanese).
Master’s thesis, Nara Institute of Science and Technol-
ogy.
