An NP-Cluster Based Approach to Coreference Resolution
Xiaofeng Yangyz Jian Suy Guodong Zhouy Chew Lim Tanz
yInstitute for Infocomm Research
21 Heng Mui Keng Terrace,
Singapore, 119613
fxiaofengy,sujian,zhougdg
@i2r.a-star.edu.sg
z Department of Computer Science
National University of Singapore,
Singapore, 117543
fyangxiao,tanclg@comp.nus.edu.sg
Abstract
Traditionally, coreference resolution is done
by mining the reference relationships be-
tween NP pairs. However, an individual NP
usually lacks adequate description informa-
tion of its referred entity. In this paper,
we propose a supervised learning-based ap-
proach which does coreference resolution by
exploring the relationships between NPs and
coreferential clusters. Compared with indi-
vidual NPs, coreferential clusters could pro-
vide richer information of the entities for bet-
ter rules learning and reference determina-
tion. The evaluation done on MEDLINE
data set shows that our approach outper-
forms the baseline NP-NP based approach
in both recall and precision.
1 Introduction
Coreference resolution is the process of linking
as a cluster1 multiple expressions which refer
to the same entities in a document. In recent
years, supervised machine learning approaches
have been applied to this problem and achieved
considerable success (e.g. Aone and Bennett
(1995); McCarthy and Lehnert (1995); Soon et
al. (2001); Ng and Cardie (2002b)). The main
idea of most supervised learning approaches is to
recast this task as a binary classiflcation prob-
lem. Speciflcally, a classifler is learned and then
used to determine whether or not two NPs in a
document are co-referring. Clusters are formed
by linking coreferential NP pairs according to a
certain selection strategy. In this way, the identi-
flcation of coreferential clusters in text is reduced
to the identiflcation of coreferential NP pairs.
One problem of such reduction, however,
is that the individual NP usually lacks ade-
quate descriptive information of its referred en-
tity. Consequently, it is often di–cult to judge
whether or not two NPs are talking about the
1In this paper the term \cluster" can be interchange-
ably used as \chain", while the former better emphasizes
the equivalence property of coreference relationship.
same entity simply from the properties of the
pair alone. As an example, consider the pair of a
non-pronoun and its pronominal antecedent can-
didate. The pronoun itself gives few clues for the
reference determination. Using such NP pairs
would have a negative in uence for rules learn-
ing and subsequent resolution. So far, several
efiorts (Harabagiu et al., 2001; Ng and Cardie,
2002a; Ng and Cardie, 2002b) have attempted to
address this problem by discarding the \hard"
pairs and select only those confldent ones from
the NP-pair pool. Nevertheless, this eliminat-
ing strategy still can not guarantee that the NPs
in \confldent" pairs bear necessary description
information of their referents.
In this paper, we present a supervised
learning-based approach to coreference resolu-
tion. Rather than attempting to mine the ref-
erence relationships between NP pairs, our ap-
proach does resolution by determining the links
of NPs to the existing coreferential clusters. In
our approach, a classifler is trained on the in-
stances formed by an NP and one of its possi-
ble antecedent clusters, and then applied dur-
ing resolution to select the proper cluster for an
encountered NP to be linked. As a coreferen-
tial cluster ofiers richer information to describe
an entity than a single NP in the cluster, we
could expect that such an NP-Cluster framework
would enhance the resolution capability of the
system. Our experiments were done on the the
MEDLINE data set. Compared with the base-
line approach based on NP-NP framework, our
approach yields a recall improvement by 4.6%,
with still a precision gain by 1.3%. These results
indicate that the NP-Cluster based approach is
efiective for the coreference resolution task.
The remainder of this paper is organized as
follows. Section 2 introduces as the baseline the
NP-NP based approach, while Section 3 presents
in details our NP-Cluster based approach. Sec-
tion 4 reports and discusses the experimental re-
sults. Section 5 describes related research work.
Finally, conclusion is given in Section 6.
2 Baseline: the NP-NP based
approach
2.1 Framework description
We built a baseline coreference resolution sys-
tem, which adopts the common NP-NP based
learning framework as employed in (Soon et al.,
2001).
Each instance in this approach takes the form
of ifNPj, NPig, which is associated with a fea-
ture vector consisting of 18 features (f1 » f18) as
described in Table 2. Most of the features come
from Soon et al. (2001)’s system. Inspired by the
work of (Strube et al., 2002) and (Yang et al.,
2004), we use two features, StrSim1 (f17) and
StrSim2 (f18), to measure the string-matching
degree of NPj and NPi. Given the following sim-
ilarity function:
Str Simlarity(Str1;Str2) = 100£jStr1 \Str2jStr
1
StrSim1 and StrSim2 are computed
using Str Similarity(SNPj, SNPi) and
Str Similarity(SNPi, SNPj), respectively. Here
SNP is the token list of NP, which is obtained
by applying word stemming, stopword removal
and acronym expansion to the original string as
described in Yang et al. (2004)’s work.
During training, for each anaphor NPj in a
given text, a positive instance is generated by
pairing NPj with its closest antecedent. A set
of negative instances is also formed by NPj and
each NP occurring between NPj and NPi.
When the training instances are ready, a clas-
sifler is learned by C5.0 algorithm (Quinlan,
1993). Duringresolution, eachencounterednoun
phrase, NPj, is paired in turn with each preced-
ing noun phrase, NPi. For each pair, a test-
ing instance is created as during training, and
then presented to the decision tree, which re-
turns a confldence value (CF)2 indicating the
likelihood that NPi is coreferential to NPj. In
our study, two antecedent selection strategies,
Most Recent First (MRF) and Best First (BF),
are tried to link NPj to its a proper antecedent
with CF above a threshold (0.5). MRF (Soon
et al., 2001) selects the candidate closest to the
anaphor, while BF (Aone and Bennett, 1995; Ng
2The confldence value is obtained by using the
smoothed ratio p+1t+2, where p is the number of positive
instances and t is the total number of instances contained
in the corresponding leaf node.
and Cardie, 2002b) selects the candidate with
the maximal CF.
2.2 Limitation of the approach
Nevertheless, the problem of the NP-NP based
approach is that the individual NP usually lacks
adequate description information about its re-
ferred entity. Consequently, it is often di–cult
to determine whether or not two NPs refer to
the same entity simply from the properties of
the pair. See the the text segment in Table 1,
for example,
[1 A mutant of [2 KBF1/p50] ], unable to
bind to DNA but able to form homo- or [3 het-
erodimers] , has been constructed.
[4 This protein] reduces or abolishes the DNA
binding activity of wild-type proteins of [5 the
same family ([6 KBF1/p50] , c- and v-rel)].
[7 This mutant] also functions in vivo as a
transacting dominant negative regulator:...
Table 1: An Example from the data set
In the above text, [1 A mutant of KBF1/p50],
[4 This protein] and [7 This mutant] are anno-
tated in the same coreferential cluster. Accord-
ing to the above framework, NP7 and its closest
antecedent, NP4, will form a positive instance.
Nevertheless, such an instance is not informa-
tive in that NP4 bears little information related
to the entity and thus provides few clues to ex-
plain its coreference relationship with NP7.
In fact, this relationship would be clear if [1 A
mutant of KBF1/p50], the antecedent of NP4,
is taken into consideration. NP1 gives a de-
tailed description of the entity. By comparing
the string of NP7 with this description, it is ap-
parent that NP7 belongs to the cluster of NP1,
and thus should be coreferential to NP4. This
suggests that we use the coreferential cluster,
instead of its single element, to resolve an NP
correctly. In our study, we propose an approach
which adopts an NP-Cluster based framework to
do resolution. The details of the approach are
given in the next section.
3 The NP-Cluster based approach
Similar to the baseline approach, our approach
also recasts coreference resolution as a binary
classiflcation problem. The difierence, however,
is that our approach aims to learn a classifler
which would select the most preferred cluster,
instead of the most preferred antecedent, for an
encountered NP in text. We will give the frame-
work of the approach, including the instance rep-
Features describing the relationships between NPj and NPi
1. DefNp 1 1 if NPj is a deflnite NP; else 0
2. DemoNP 1 1 if NPj starts with a demonstrative; else 0
3. IndefNP 1 1 if NPj is an indeflnite NP; else 0
4. Pron 1 1 if NPj is a pronoun; else 0
5. ProperNP 1 1 if NPj is a proper NP; else 0
6. DefNP 2 1 if NPi is a deflnite NP; else 0
7. DemoNP 2 1 if NPi starts with a demonstrative; else 0
8. IndefNP 2 1 if NPi is an indeflnite NP; else 0
9. Pron 2 1 if NPi is a pronoun; else 0
10. ProperNP 2 1 if NPi is a proper NP; else 0
11. Appositive 1 if NPi and NPj are in an appositive structure; else 0
12. NameAlias 1 if NPi and NPj are in an alias of the other; else 0
13. GenderAgree 1 if NPi and NPj agree in gender; else 0
14. NumAgree 1 if NPi and NPj agree in number; else 0
15. SemanticAgree 1 if NPi and NPj agree in semantic class; else 0
16. HeadStrMatch 1 if NPi and NPj contain the same head string; else 0
17. StrSim 1 The string similarity of NPj against NPi
18. StrSim 2 The string similarity of NPi against NPj
Features describing the relationships between NPj and cluster Ck
19. Cluster NumAgree 1 if Ck and NPj agree in number; else 0
20. Cluster GenAgree 1 if Ck and NPj agree in gender; else 0
21. Cluster SemAgree 1 if Ck and NPj agree in semantic class; else 0
22. Cluster Length The number of elements contained in Ck
23. Cluster StrSim The string similarity of NPj against Ck
24. Cluster StrLNPSim The string similarity of NPj against the longest NP in Ck
Table 2: The features in our coreference resolution system (Features 1 » 18 are also used in the
baseline system using NP-NP based approach)
resentation, the training and the resolution pro-
cedures, in the following subsections.
3.1 Instance representation
An instance in our approach is composed of three
elements like below:
ifNPj, Ck, NPig
where NPj, like the deflnition in the baseline,
is the noun phrase under consideration, while Ck
is an existing coreferential cluster. Each cluster
could be referred by a reference noun phrase NPi,
a certain element of the cluster. A cluster would
probably contain more than one reference NPs
and thus may have multiple associated instances.
For a training instance, the label is positive if
NPj is annotated as belonging to Ck, or negative
if otherwise.
In our system, each instance is represented as
a set of 24 features as shown in Table 2. The
features are supposed to capture the properties
of NPj and Ck as well as their relationships. In
the table we divide the features into two groups,
one describing NPj and NPi and the other de-
scribing NPj and Ck. For the former group, we
just use the same features set as in the baseline
system, while for the latter, we introduce 6 more
features:
Cluster NumAgree, Cluster GenAgree
and Cluster SemAgree: These three fea-
tures mark the compatibility of NPj and Ck
in number, gender and semantic agreement,
respectively. If NPj mismatches the agreement
with any element in Ck, the corresponding
feature is set to 0.
Cluster Length: The number of NPs in the
cluster Ck. This feature re ects the global
salience of an entity in the sense that the more
frequently an entity is mentioned, the more im-
portant it would probably be in text.
Cluster StrSim: This feature marks the string
similarity between NPj and Ck. Suppose
SNPj is the token set of NPj, we compute
the feature value using the similarity function
Str Similarity(SNPj, SCk), where
SCk = S
NPi2Ck
SNPi
Cluster StrLNPSim: It marks the string
matching degree of NPj and the noun phrase
in Ck with the most number of tokens. The
intuition here is that the NP with the longest
string would probably bear richer description in-
formation of the referent than other elements in
the cluster. The feature is calculated using the
similarity function Str Similarity(SNPj, SNPk),
where
NPk = arg maxNP
i2Ck
jSNPij
3.2 Training procedure
Given an annotated training document, we pro-
cess the noun phrases from beginning to end.
ForeachanaphoricnounphraseNPj, weconsider
its preceding coreferential clusters from right to
left3. For each cluster, we create only one in-
stance by taking the last NP in the cluster as
the reference NP. The process will not terminate
until the cluster to which NPj belongs is found.
To make it clear, consider the example in Ta-
ble 1 again. For the noun phrase [7 This mu-
tant], the annotated preceding coreferential clus-
ters are:
C1: f ..., NP2, NP6 g
C2: f ..., NP5 g
C3: f NP1, NP4 g
C4: f ..., NP3 g
Thus three training instances are generated:
if NP7, C1, NP6 g
if NP7, C2, NP5 g
if NP7, C3, NP4 g
Among them, the flrst two instances are la-
belled as negative while the last one is positive.
After the training instances are ready, we use
C5.0 learning algorithm to learn a decision tree
classifler as in the baseline approach.
3.3 Resolution procedure
The resolution procedure is the counterpart of
the training procedure. Given a testing docu-
ment, for each encountered noun phrase, NPj,
we create a set of instances by pairing NPj with
each cluster found previously. The instances are
presented to the learned decision tree to judge
the likelihood that NPj is linked to a cluster.
The resolution algorithm is given in Figure 1.
As described in the algorithm, for each clus-
ter under consideration, we create multiple in-
stances by using every NP in the cluster as the
reference NP. The confldence value of the cluster
3We deflne the position of a cluster as the position of
the last NP in the cluster.
algorithm RESOLVE (a testing document d)
ClusterSet = ;;
//suppose d has N markable NPs;
for j = 1 to N
foreach cluster in ClusterSet
CFcluster = max
NPi2cluster
CFi(NPj;cluster;NPi)
select a proper cluster, BestCluster, according
to a ceterin cluster selection strategy;
if BestCluster != NULL
BestCluster = BestCluster[fNPjg;
else
//create a new cluster
NewCluster = f NPj g;
ClusterSet = ClusterSet [ fNewClusterg;
Figure 1: The clusters identiflcation algorithm
is the maximal confldence value of its instances.
Similar to the baseline system, two cluster selec-
tion strategies, i.e. MRF and BF, could be ap-
plied to link NPj to a proper cluster. For MRF
strategy, NPj is linked to the closest cluster with
confldence value above 0.5, while for BF, it is
linkedtotheclusterwiththemaximalconfldence
value (above 0.5).
3.4 Comparison of NP-NP and
NP-Cluster based approaches
Asnotedabove, theideaoftheNP-Clusterbased
approach is difierent from the NP-NP based ap-
proach. However, due to the fact that in our
approach a cluster is processed based on its refer-
ence NPs, the framework of our approach could
be reduced to the NP-NP based framework if
the cluster-related features were removed. From
this point of view, this approach could be con-
sidered as an extension of the baseline approach
by applying additional cluster features as the
properties of NPi. These features provide richer
description information of the entity, and thus
make the coreference relationship between two
NPs more apparent. In this way, both rules
learning and coreference determination capabili-
ties of the original approach could be enhanced.
4 Evaluation
4.1 Data collection
Our coreference resolution system is a compo-
nent of our information extraction system in
biomedical domain. For this purpose, an anno-
tated coreference corpus have been built 4, which
4The annotation scheme and samples are avail-
able in http://nlp.i2r.a-star.edu.sg/resources/GENIA-
coreference
MRF BF
Experiments R P F R P F
Baseline 80.2 77.4 78.8 80.3 77.5 78.9
AllAnte 84.4 70.2 76.6 85.7 71.4 77.9
Our Approach 84.4 78.2 81.2 84.9 78.8 81.7
Table 3: The performance of difierent coreference resolution systems
consists of totally 228 MEDLINE abstracts se-
lected from the GENIA data set. The aver-
age length of the documents in collection is 244
words. One characteristic of the bio-literature
is that pronouns only occupy about 3% among
all the NPs. This ratio is quite low compared
to that in newswire domain (e.g. above 10% for
MUC data set).
A pipeline of NLP components is applied to
pre-process an input raw text. Among them,
NE recognition, part-of-speech tagging and text
chunking adopt the same HMM based engine
with error-driven learning capability (Zhou and
Su, 2002). The NE recognition component
trained on GENIA (Shen et al., 2003) can
recognize up to 23 common biomedical entity
types with an overall performance of 66.1 F-
measure (P=66.5% R=65.7%). In addition, to
remove the apparent non-anaphors (e.g., em-
bedded proper nouns) in advance, a heuristic-
based non-anaphoricity identiflcation module is
applied, which successfully removes 50.0% non-
anaphors with a precision of 83.5% for our data
set.
4.2 Experiments and discussions
Our experiments were done on flrst 100 docu-
ments from the annotated corpus, among them
70 for training and the other 30 for testing.
Throughout these experiments, default learning
parameters were applied in the C5.0 algorithm.
The recall and precision were calculated auto-
matically according to the scoring scheme pro-
posed by Vilain et al. (1995).
In Table 3 we compared the performance of
difierent coreference resolution systems. The
flrst line summarizes the results of the baseline
system using traditional NP-NP based approach
as described in Section 2. Using BF strategy,
Baseline obtains 80.3% recall and 77.5% preci-
sion. These results are better than the work by
Castano et al. (2002) and Yang et al. (2004),
which were also tested on the MEDLINE data
set and reported a F-measure of about 74% and
69%, respectively.
In the experiments, we evaluated another NP-
NP based system, AllAnte. It adopts a similar
learning framework as Baseline except that dur-
ing training it generates the positive instances by
paring an NP with all its antecedents instead of
only the closest one. The system attempts to use
such an instance selection strategy to incorpo-
rate the information from coreferential clusters.
But the results are nevertheless disappointing:
although this strategy boosts the recall by 5.4%,
the precision drops considerably by above 6% at
the same time. The overall F-measure is even
lower than the baseline systems.
The last line of Table 3 demonstrates the re-
sults of our NP-Cluster based approach. For BF
strategy, the system achieves 84.9% recall and
78.8% precision. As opposed to the baseline sys-
tem, the recall rises by 4.6% while the precision
still gains slightly by 1.3%. Overall, we observe
the increase of F-measure by 2.8%.
The results in Table 3 also indicate that the
BF strategy is superior to the MRF strategy.
A similar flnding was also reported by Ng and
Cardie (2002b) in the MUC data set.
To gain insight into the difierence in the per-
formance between our NP-Cluster based system
and the NP-NP based system, we compared the
decision trees generated in the two systems in
Figure 2. In both trees, the string-similarity
features occur on the top portion, which sup-
ports the arguments by (Strube et al., 2002)
and (Yang et al., 2004) that string-matching is a
crucial factor for NP coreference resolution. As
shown in the flgure, the feature StrSim 1 in left
treeiscompletelyreplacedbytheCluster StrSim
and Cluster StrLNPSim in the right tree, which
means that matching the tokens with a cluster
is more reliable than with a single NP. More-
over, the cluster length will also be checked when
the NP under consideration has low similarity
against a cluster. These evidences prove that
the information from clusters is quite important
for the coreference resolution on the data set.
The decision tree visualizes the importance of
the features for a data set. However, the tree is
learned from the documents where coreferential
clusters are correctly annotated. During resolu-
HeadMatch = 0:
:...NameAlias = 1: 1 (22/1)
: NameAlias = 0:
: :...Appositive = 0: 0 (13095/265)
: Appositive = 1: 1 (15/4)
HeadMatch = 1:
:...StrSim_1 > 71:
:...DemoNP_1 = 0: 1 (615/29)
: DemoNP_1 = 1:
: :...NumAgree = 0: 0 (5)
: NumAgree = 1: 1 (26)
StrSim_1 <= 71:
:...DemoNP_2 = 1: 1 (12/2)
DemoNP_2 = 0:
:...StrSim_2 <= 77: 0 (144/17)
StrSim_2 > 77:
:...StrSim_1 <= 33: 0 (42/11)
StrSim_1 > 33: 1 (38/11)
HeadMatch = 1:
:...Cluster_StrSim > 66: 1 (663/36)
: Cluster_StrSim <= 66:
: :...StrSim_2 <= 85: 0 (140/14)
: StrSim_2 > 85:
: :...Cluster_StrLNPSim > 50: 1 (16/1)
: Cluster_StrLNPSim <= 50:
: :...Cluster_Length <= 5: 0 (59/17)
: Cluster_Length > 5: 1 (4)
HeadMatch = 0:
:...NameAlias = 1: 1 (22/1)
NameAlias = 0:
:...Appositive = 1: 1 (15/4)
Appositive = 0:
:...StrSim_2 <= 54:
:..
StrSim_2 > 54:
:..
Figure 2: The resulting decision trees for the NP-NP and NP-Cluster based approaches
Features R P F
f1»21 80.3 77.5 78.9
f1»21;f22 84.1 74.4 79.0
f1»21;f23 84.7 78.8 81.6
f1»21;f24 84.3 78.0 81.0
f1»21;f23;f22 84.9 78.6 81.6
f1»21;f23;f24 84.9 78.9 81.8
f1»21;f23;f24;f22 84.9 78.8 81.7
Table 4: Performance using combined features
(fi refers to the i(th) feature listed in Table 2)
tion, unfortunately, the found clusters are usu-
ally not completely correct, and as a result the
features important in training data may not be
also helpful for testing data. Therefore, in the
experiments we were concerned about which fea-
tures really matter for the real coreference res-
olution. For this purpose, we tested our system
using difierent features and evaluated their per-
formanceinTable4. Herewejustconsideredfea-
ture Cluster Length (f22), Cluster StrSim (f23)
and Cluster StrLNPSim (f24), as Figure 2 has
indicated that among the cluster-related features
only these three are possibly efiective for resolu-
tion. Throughout the experiment, the Best-First
strategy was applied.
As illustrated in the table, we could observe
that:
1. Without the three features, the system is
equivalent to the baseline system in terms
of the same recall and precision.
2. Cluster StrSim (f23) is the most efiective
as it contributes most to the system per-
formance. Simply using this feature boosts
the F-measure by 2.7%.
3. Cluster StrLNPSim (f24) is also efiective by
improving the F-measure by 2.1% alone.
When combined with f23, it leads to the
best F-measure.
4. Cluster Length (f22) only brings 0.1% F-
measure improvement. It could barely
increase, or even worse, reduces the F-
measure when used together with the the
other two features.
5 Related work
To our knowledge, our work is the flrst
supervised-learning based attempt to do coref-
erence resolution by exploring the relationship
between an NP and coreferential clusters. In the
heuristic salience-based algorithm for pronoun
resolution, Lappin and Leass (1994) introduce
a procedure for identifying anaphorically linked
NP as a cluster for which a global salience value
is computed as the sum of the salience values of
its elements. Cardie and Wagstafi (1999) have
proposed an unsupervised approach which also
incorporates cluster information into considera-
tion. Their approach uses hard constraints to
preclude the link of an NP to a cluster mismatch-
ing the number, gender or semantic agreements,
while our approach takes these agreements to-
gether with other features (e.g. cluster-length,
string-matching degree,etc) as preference factors
for cluster selection. Besides, the idea of cluster-
ing can be seen in the research of cross-document
coreference, where NPs with high context simi-
laritywouldbechainedtogetherbasedoncertain
clustering methods (Bagga and Biermann, 1998;
Gooi and Allan, 2004).
6 Conclusion
In this paper we have proposed a supervised
learning-based approach to coreference resolu-
tion. Rather than mining the coreferential re-
lationship between NP pairs as in conventional
approaches, our approach does resolution by ex-
ploring the relationships between an NP and the
coreferential clusters. Compared to individual
NPs, coreferential clusters provide more infor-
mation for rules learning and reference determi-
nation. In the paper, we flrst introduced the con-
ventional NP-NP based approach and analyzed
its limitation. Then we described in details the
framework of our NP-Cluster based approach,
including the instance representation, training
and resolution procedures. We evaluated our ap-
proach in the biomedical domain, and the experi-
mental results showed that our approach outper-
forms the NP-NP based approach in both recall
(4.6%) and precision (1.3%).
While our approach achieves better perfor-
mance, there is still room for further improve-
ment. For example, the approach just resolves
an NP using the cluster information available so
far. Nevertheless, the text after the NP would
probably give important supplementary infor-
mation of the clusters. The ignorance of such
information may afiect the correct resolution of
the NP. In the future work, we plan to work out
more robust clustering algorithm to link an NP
to a globally best cluster.

References

C. Aone and S. W. Bennett. 1995. Evaluating
automated and manual acquistion of anaphora
resolution strategies. In Proceedings of the
33rd Annual Meeting of the Association for
Compuational Linguistics, pages 122-129.

A. Bagga and A. Biermann. 1998. Entity-based
cross document coreferencing using the vector
space model. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguisticsthe 17th International Conference on Computational Linguistics, pages
79-85.

C. Cardie and K. Wagstafi. 1999. Noun phrase
coreference as clustering. In Proceedings of
the Joint Conference on Empirical Methods in
NLP and Very Large Corpora.

J. Castano, J. Zhang, and J. Pustejovsky. 2002.
Anaphora resolution in biomedical literature.
In International Symposium on Reference Resolution, Alicante, Spain.

C. Gooi and J. Allan. 2004. Cross-document
coreference on a large scale corpus. In Proceedings of 2004 Human Language Technology
conference / North American chapter of the
Association for Computational Linguistics an-
nual meeting.

S. Harabagiu, R. Bunescu, and S. Maiorano.
2001. Text knowledge mining for coreference
resolution. In Proceedings of the 2nd Annual Meeting of the North America Chapter of
the Association for Compuational Linguistics,
pages 55-62.

S. Lappin and H. Leass. 1994. An algorithm
for pronominal anaphora resolution. Computational Linguistics, 20(4):525{561.

J. McCarthy and Q. Lehnert. 1995. Using de-
cision trees for coreference resolution. In Pro-
ceedings of the 14th International Conference
on Artiflcial Intelligences, pages 1050{1055.

V. Ng and C. Cardie. 2002a. Combining sample selection and error-driven pruning for machine learning of coreference rules. In Proceedings of the conference on Empirical Methods
in Natural Language Processing, pages 55{62,
Philadelphia.

V. Ng and C. Cardie. 2002b. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th Annual
Meeting of the Association for Computational
Linguistics, pages 104{111, Philadelphia.

J. R. Quinlan. 1993. C4.5: Programs for ma-
chine learning. Morgan Kaufmann Publishers,
San Francisco, CA.

D. Shen, J. Zhang, G. Zhou, J. Su, and
C. Tan. 2003. Efiective adaptation of hidden markov model-based named-entity recognizer for biomedical domain. In Proceedings of
ACL 03 Workshop on Natural Language Processing in Biomedicine, Japan.

W. Soon, H. Ng, and D. Lim. 2001. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521-544.

M. Strube, S. Rapp, and C. Muller. 2002. The
in uence of minimum edit distance on reference resolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 312{319, Philadelphia.

M. Vilain, J. Burger, J. Aberdeen, D. Connolly,
and L. Hirschman. 1995. A model-theoretic
coreference scoring scheme. In Proceedings of
the Sixth Message understanding Conference
(MUC-6), pages 45-52, San Francisco, CA.
Morgan Kaufmann Publishers.

X. Yang, G. Zhou, J. Su, and C. Tan. 2004. Improving noun phrase coreference resolution by
matching strings. In Proceedings of the 1st International Joint Conference on Natural Language Processing, Hainan.

G. Zhou and J. Su. 2002. Named Entity recognition using a HMM-based chunk tagger. In
Proceedings of the 40th Annual Meeting of
the Association for Computational Linguistics, Philadelphia.
