Importance of Pronominal Anaphora resolution in Question
Answering systems
Jos#13e L. Vicedo and Antonio Ferr#13andez
Departamento de Lenguajes y Sistemas Inform#13aticos
Universidad de Alicante
Apartado 99. 03080 Alicante, Spain
fvicedo,antoniog@dlsi.ua.es
Abstract
The main aim of this paper is
to analyse the e#0Bects of applying
pronominal anaphora resolution to
Question Answering #28QA#29 systems.
For this task a complete QA system
has been implemented. System eval-
uation measures performance im-
provements obtained when informa-
tion that is referenced anaphorically
in documents is not ignored.
1 Introduction
Open domain QA systems are de#0Cned as
tools capable of extracting the answer to
user queries directly from unrestricted do-
main documents. Or at least, systems that
can extract text snippets from texts, from
whose content it is possible to infer the an-
swer to a speci#0Cc question. In both cases,
these systems try to reduce the amount of
time users spend to locate a concrete infor-
mation.
This work is intended to achievetwo princi-
pal objectives. First, we analyse several docu-
ment collections to determine the level of in-
formation referenced pronominally in them.
This study gives us an overview about the
amount of information that is discarded when
these references are not solved. As second ob-
jective, we try to measure improvements of
solving this kind of references in QA systems.
With this purpose in mind, a full QA system
has been implemented. Bene#0Cts obtained by
solving pronominal references are measured
by comparing system performance with and
without taking into account information ref-
erenced pronominally. Evaluation shows that
solving these references improves QA perfor-
mance.
In the following section, the state-of-the-
art of open domain QA systems will be sum-
marised. Afterwards, importance of pronom-
inal references in documents is analysed.
Next, our approach and system components
are described. Finally, evaluation results are
presented and discussed.
2 Background
Interest in open domain QA systems is quite
recent. We had little information about this
kind of systems until the First Question An-
swering Trackwas held in last TREC confer-
ence #28TRE, 1999#29. In this conference, nearly
twenty di#0Berent systems were evaluated with
very di#0Berent success rates. We can clas-
sify current approaches into two groups: text-
snippet extraction systems and noun-phrase
extraction systems.
Text-snippet extraction approaches are
based on locating and extracting the most rel-
evant sentences or paragraphs to the query by
supposing that this text will contain the cor-
rect answer to the query. This approach has
been the most commonly used by participants
in last TREC QA Track. Examples of these
systems are #28Moldovan et al., 1999#29 #28Singhal
et al., 1999#29 #28Prager et al., 1999#29 #28Takaki,
1999#29 #28Hull, 1999#29 #28Cormack et al., 1999#29.
After reviewing these approaches, we can
notice that there is a general agreement
about the importance of several Natural Lan-
guage Processing #28NLP#29 techniques for QA
task. Pos-tagging, parsing and Name En-
tity recognition are used by most of the sys-
tems. However, few systems apply other NLP
techniques. Particularly, only four systems
model some coreference relations between en-
tities in the query and documents #28Morton,
1999#29#28Breck et al., 1999#29 #28Oard et al., 1999#29
#28Humphreys et al., 1999#29. As example, Mor-
ton approach models identity, de#0Cnite noun-
phrases and non-possessive third person pro-
nouns. Nevertheless, bene#0Cts of applying
these coreference techniques have not been
analysed and measured separately.
The second group includes noun-phrase ex-
traction systems. These approaches try to
#0Cnd the precise information requested by
questions whose answer is de#0Cned typically by
a noun phrase.
MURAX is one of these systems #28Kupiec,
1999#29. It can use information from di#0Berent
sentences, paragraphs and even di#0Berentdoc-
uments to determine the answer #28the most rel-
evant noun-phrase#29 to the question. However,
this system does not take into account the
information referenced pronominally in docu-
ments. Simply, it is ignored.
With our system, wewant to determine the
bene#0Cts of applying pronominal anaphora res-
olution techniques to QA systems. Therefore,
we apply the developed computational sys-
tem, Slot Uni#0CcationParser for Anaphora res-
olution #28SUPAR#29 over documents and queries
#28Ferr#13andez et al., 1999#29. SUPAR's architec-
ture consists of three independent modules:
lexical analysis, syntactic analysis, and a reso-
lution module for natural  processing
problems, such as pronominal anaphora.
For evaluation, a standard based IR system
and a sentence-extraction QA system have
been implemented. Both are based on Salton
approach #281989#29. After IR system retrieves
relevant documents, our QA system processes
these documents with and without solving
pronominal references in order to compare #0C-
nal performance.
As results will show, pronominal anaphora
resolution improves greatly QA systems per-
formance. So, we think that this NLP tech-
nique should be considered as part of any
open domain QA system.
3 Importance of pronominal
information in documents
Trying to measure the importance of informa-
tion referenced pronominally in documents,
wehave analysed several text collections used
for QA task in TREC-8 Conference as well
as others used frequently for IR system test-
ing. These collections were the following: Los
Angeles Times #28LAT#29, Federal Register #28FR#29,
Financial Times #28FT#29, Federal Bureau Infor-
mation Service #28FBIS#29, TIME, CRANFIELD,
CISI, CACM, MED and LISA. This analy-
sis consists on determining the amount and
type of pronouns used, as well as the number
of sentences containing pronouns in each of
them. As average measure of pronouns used
in a collection, we use the ratio between the
quantity of pronouns and the number of sen-
tences containing pronouns. Thismeasure ap-
proximates the level of information that is ig-
nored if these references are not solved. Fig-
ure 1 shows the results obtained in this anal-
ysis.
As we can see, the amount and type of pro-
nouns used in analysed collections vary de-
pending on the subject the documents talk
about. LAT, FBIS, TIME and FT collections
are composed from news published in di#0Ber-
ent newspapers. The ratio of pronominal ref-
erence used in this kind of documents is very
high #28from 35,96#25 to 55,20#25#29. These doc-
uments contain a great number of pronomi-
nal references in third person #28he, she, they,
his, her, their#29 whose antecedents are mainly
people's names. In this type of documents,
pronominal anaphora resolution seems to be
very necessary for a correct modelling of rela-
tions between entities. CISI and MED collec-
tions appear ranked next in decreasing ratio
level order. These collections are composed
by general comments about document man-
aging, classi#0Ccation and indexing and doc-
uments extracted from medical journals re-
spectively. Although the ratio presented by
these collections #2824,94#25 and 22,16#25#29 is also
high, the most important group of pronominal
references used in these collections is formed
by "it" and "its" pronouns. In this case,
TEXT COLLECTION LAT FBIS TIME FT CISI MED CACM LISA FR CRANFIELD
Pronoun type
HE, SHE, THEY 38,59% 29,15% 31,20% 26,20% 15,38% 15,07% 8,59% 12,24% 13,31% 6,54%
HIS, HER, THEIR 25,84% 21,54% 35,01% 20,52% 22,96% 21,46% 15,69% 31,03% 20,70% 10,35%
IT, ITS 26,92% 39,60% 22,43% 46,68% 52,11% 57,41% 67,61% 47,86% 61,06% 79,76%
HIM, THEM 7,04% 7,08% 7,82% 4,44% 6,38% 3,96% 4,87% 6,30% 3,45% 1,60%
HIM, HER,IT(SELF), THEMSELVES 1,61% 2,63% 3,54% 2,17% 3,17% 2,10% 3,25% 2,57% 1,48% 1,75%
Pronouns in Sentences
Containing 0  pronouns 44,80% 48,09% 51,37% 64,04% 75,06% 77,84% 79,06% 83,79% 84,92% 90,95%
Containing 1 pronoun 30,40% 31,37% 29,46% 23,07% 17,17% 15,02% 17,54% 13,01% 11,64% 8,10%
Containing 2 pronouns 14,94% 12,99% 12,26% 8,54% 5,27% 4,75% 2,79% 2,56% 2,57% 0,85%
Containing +2 pronouns 9,86% 7,55% 6,90% 4,34% 2,51% 2,39% 0,60% 0,64% 0,88% 0,09%
Ratio of pronominal reference 55,20% 51,91% 48,63% 35,96% 24,94% 22,16% 20,94% 16,21% 15,08% 9,05%
Figure 1: Pronominal references in text collections
antecedents of these pronominal references
are mainly concepts represented typically by
noun phrases. It seems again important solv-
ing these references for a correct modelling
of relations between concepts expressed by
noun-phrases. The lowest ratio results are
presented by CRANFIELD collection with a
9,05#25. The reason of this level of pronominal
use is due to text contents. This collection is
composed by extracts of very high technical
subjects. Between the described percentages
we #0Cnd the CACM, LISA and FR collections.
These collections are formed by abstracts and
documents extracted from the Federal Regis-
ter, from the CACM journal and from Library
and Information Science Abstracts, respec-
tively. As general behaviour, we can notice
that as more technical document contents be-
come, the pronouns "it" and "its" become the
most appearing in documents and the ratio
of pronominal references used decreases. An-
other observation can be extracted from this
analysis. Distributionof pronounswithinsen-
tences is similar in all collections. Pronouns
appear scattered through sentences contain-
ing one or two pronouns. Using more than
two pronouns in the same sentence is quite
infrequent.
After analysing these results an important
question may arise. Is it worth enough to
solve pronominal references in documents? It
would seem reasonable to think that resolu-
tion of pronominal anaphora would only be
accomplished when the ratio of pronominal
occurrence exceeds a minimum level. How-
ever, we have to take into account that the
cost of solving these references is proportional
to the number of pronouns analysed and con-
sequently, proportional to the amount of in-
formation a system will ignore if these refer-
ences are not solved.
As results above state, it seems reason-
able to solve pronominal references in queries
and documents for QA tasks. At least, when
the ratio of pronouns used in documents rec-
ommend it. Anyway, evaluation and later
analysis #28section 5#29 contribute with empiri-
cal data to conclude that applying pronom-
inal anaphora resolution techniques improve
QA systems performance.
4 Our Approach
Our system is made up of three modules. The
#0Crst one is a standard IR system that retrieves
relevant documents for queries. The second
module will manage with anaphora resolution
in both, queries and retrieved documents. For
this purpose we use SUPAR computational
system #28section 4.1#29. And the third one is
a sentence-extraction QA system that inter-
acts with SUPAR module and ranks sentences
from retrieved documents to locate the an-
swer where the correct answer appears #28sec-
tion 4.2#29.
For the purpose of evaluation an IR sys-
tem has been implemented. This system is
based on the standard information retrieval
approach to document ranking described in
Salton #281989#29. For QA task, the same ap-
proach has been used as baseline but using
sentences as text unit. Each term in the query
and documents is assigned an inverse docu-
ment frequency #28idf #29 score based on the same
corpus. This measure is computed as:
idf#28t#29=log#28
N
df#28t#29
#29 #281#29
where N is the total number of documents
in the collection and df#28t#29 is the number of
documents which contains term t. Query ex-
pansion consists of stemming terms using a
version of the Porter stemmer. Document and
sentence similarityto the querywas computed
using the cosine similarity measure. The LAT
corpus has been selected as test collection due
to his high level of pronominal references.
4.1 Solving pronominal anaphora
In this section, the NLP Slot Uni#0Ccation
Parser for Anaphora Resolution #28SUPAR#29
is brie#0Dy described #28Ferr#13andez et al., 1999;
Ferr#13andez et al., 1998#29. SUPAR's architec-
ture consists of three independent modules
that interact with one other. These modules
are lexical analysis, syntactic analysis, and a
resolution module for Natural Language Pro-
cessing problems.
Lexical analysis module. This module
takes each sentence to parse as input, along
with a tool that provides the system with all
the lexical information for each word of the
sentence. This tool may be either a dictio-
nary or a part-of-speech tagger. In addition,
this module returns a list with all the neces-
sary information for the remaining modules
as output. SUPAR works sentence by sen-
tence from the input text, but stores informa-
tion from previous sentences, which it uses in
other modules, #28e.g. the list of antecedents of
previous sentences for anaphora resolution#29.
Syntactic analysis module. This mod-
ule takes as input the output of lexical analy-
sis module and the syntactic information rep-
resented by means of grammatical formalism
Slot Uni#0Ccation Grammar #28SUG#29. It returns
what is called slot structure, which stores all
necessary information for following modules.
One of the main advantages of this system is
that it allows carrying out either partial or
full parsing of the text.
Module of resolution of NLP prob-
lems. In this module, NLP problems
#28e.g. anaphora, extra-position, ellipsis or PP-
attachment#29 are dealt with. It takes the slot
structure #28SS#29 that corresponds to the parsed
sentence as input. The output is an SS in
which all the anaphors have been resolved. In
this paper, only pronominal anaphora resolu-
tion has been applied.
The kinds of knowledge that are going to
be used in pronominal anaphora resolution in
this paper are: pos-tagger, partial parsing,
statistical knowledge, c-command and mor-
phologic agreement as restrictions and several
heuristics such as syntactic parallelism, pref-
erence for noun-phrases in same sentence as
the pronoun preference for proper nouns.
We should remark that when wework with
unrestricted texts #28as it occurs in this paper#29
we do not use semantic knowledge #28i.e. a
tool suchasWorNet#29. Presently, SUPAR re-
solves both Spanish and English pronominal
anaphora with a success rate of 87#25 and 84#25
respectively.
SUPAR pronominal anaphora resolution
di#0Bers from those based on restrictions and
preferences, since the aim of our preferences
is not to sort candidates, but rather to dis-
card candidates. That is to say, preferences
are considered in a similar way to restrictions,
except when no candidate satis#0Ces a prefer-
ence, in which case no candidate is discarded.
For example in sentence: "Rob was asking us
about John. Ireplied that Peter saw John yes-
terday. James also saw him." After applying
the restrictions, the following list of candi-
dates is obtained for the pronoun him: #5BJohn,
Peter, Rob#5D, which are then sorted according
to their proximity to the anaphora. If pref-
erence for candidates in same sentence as the
anaphora is applied, then no candidate satis-
#0Ces it, so the following preference is appliedon
the same list of candidates. Next, preference
for candidates in the previous sentence is ap-
plied and the list is reduced to the following
candidates: #5BJohn, Peter#5D. If syntactic par-
allelism preference is then applied, only one
candidate remains, #5BJohn#5D, which will be the
antecedent chosen.
Each kind of anaphora has its own set of
restrictions and preferences, although they all
follow the same general algorithm: #0Crst come
the restrictions, after which the preferences
are applied. For pronominal anaphora, the
set of restrictions and preferences that apply
are described in Figure 2.
Procedure SelectingAntecedent ( INPUT L: ListOfCandidates,
                     OUTPUT Solution: Antecedent )
Apply restrictions to L with a result of L1
Morphologic agreement
C-command constraints
Semantic consistency
Case of:
NumberOfElements (L1) = 1
Solution = TheFirstOne (L1)
NumberOfElements (L1) = 0
Exophora or cataphora
NumberOfElements (L1) > 1
Apply preferences to L1 with a result of L2
1) Candidates in the same sentence as anaphor.
2) Candidates in the previous sentence
3) Preference for proper nouns.
4) Candidates in the same position as the anaphor
with reference to the verb (before or after).
5) Candidates with the same number of parsed
constituents as the anaphora
6) Candidates that have appeared with the verb of
the anaphor more than once
7) Preference for indefinite NPs.
Case of:
NumberOfElements (L2) = 1
       Solution = TheFirstOne (L2)
NumberOfElements (L2) > 1
Extract from L2 in L3 those candidates that have
been repeated most in the text
If NumberOfElements (L3) > 1
Extract from L3 in L4 those candidates that
have appeared most with the verb of the
anaphora
Solution = TheFirstOne (L4)
Else
Solution = TheFirstOne (L3)
EndIf
EndCase
EndCase
EndProcedure
Figure 2: Pronominal anaphora resolution al-
gorithm
The following restrictions are #0Crst applied
to the list of candidates: morphologic agree-
ment, c-command constraints and semantic
consistency. Thislistis sorted by proximityto
the anaphor. Next, if after applying restric-
tions there is still more than one candidate,
the preferences are then applied, in the order
shown in this #0Cgure. This sequence of prefer-
ences #28from 1 to 7#29 stops when, after having
applied a preference, only one candidate re-
mains. If after applying preferences there is
still more than one candidate, then the most
repeated candidates
1
in the text are extracted
from the list after applying preferences. After
this is done, if there is still more than one can-
didate, then those candidates that have ap-
peared most frequently with the verb of the
anaphor are extracted from the previous list.
Finally, if after having applied all the previ-
ous preferences, there is still more than one
candidate left, the #0Crst candidate of the re-
sulting list, #28the closest one to the anaphor#29,
is selected.
4.2 Anaphora resolution and QA
Our QA approach provides a second level of
processing for relevant documents: Analysing
matching documents and Sentence ranking.
Analysing Matching Documents. This
step is applied over the best matching docu-
ments retrieved from the IR system. These
documents are analysed by SUPAR module
and pronominal references are solved. As re-
sult, each pronoun is associated with the noun
phrase it refers to in the documents. Then,
documents are split into sentences as basic
text unit for QA purposes. This set of sen-
tences is sent to the sentence ranking stage.
Sentence Ranking. Each term in the
query is assigned a weight. This weight is
the sum of inverse document frequency mea-
sure of terms based on its occurrence in the
LAT collection described earlier. Each docu-
ment sentence is weighted the same way. The
only di#0Berence with baseline is that pronouns
are given the weight of the entity they refer
to. As we only want to analyse the e#0Bects
of pronominal reference resolution, no more
changes are introduced in weighting scheme.
For sentence ranking, cosine similarity is used
between query and document sentences.
5 Evaluation
For this evaluation, several people unac-
quainted with this work proposed 150 queries
1
Here, we mean that #0Crstly we obtain the maxi-
mum number of repetitions for an antecedentinthe
remaining list. After that, we extract from that list
the antecedents that have this value of repetition.
whose correct answer appeared at least once
into the analysed collection. These queries
were also selected based on their expressing
the user's information need clearly and their
being likely answered in a single sentence.
First, relevant documents for each query
were retrieved using the IR system described
earlier. Only the best 50 matching docu-
ments were selected for QA evaluation. As
the document containing the correct answer
was included into the retrieved sets for only
93 queries #28a 62#25 of the proposed queries#29,
the remaining 57 queries were excluded for
this evaluation.
Once retrieval of relevant document sets
was accomplished for each query, the sys-
tem applied anaphora resolution algorithm to
these documents. Finally, sentence matching
and ranking was accomplished as described in
section 4.2 and the system presented a ranked
list containing the 10 most relevant sentences
to each query.
For a better understandingof evaluation re-
sults, queries were classi#0Ced into three groups
depending on the following characteristics:
#0F Group A. There are no pronominal ref-
erences in the target sentence #28sentence
containing the correct answer#29.
#0F Group B. The information required as
answer is referenced via pronominal
anaphora in the target sentence.
#0F Group C. Any term in the query is ref-
erenced pronominally in the target sen-
tence.
Group A was made up by 37 questions.
Groups B and C contained 25 and 31 queries
respectively. Figure 3 shows examples of
queries classi#0Ced into groups B and C.
Evaluation results are presented in Figure
4 as the number of target sentences appear-
ing into the 10 most relevant sentences re-
turned by the system for each query and also,
the number of these sentences that are con-
sidered a correct answer. An answer is con-
sidered correct if it can be obtained by sim-
ply looking at the target sentence. Results
Question: “Who is the village head man of Digha ?”
Answer: “He is the sarpanch, or village head man of
Digha, a hamlet or mud-and-straw huts  10
miles from ...”
Group B Example
Anaphora resolution: Ram Bahadu
Question: “What did Democrats propose for low-income
families?”
Answer: “They also want to provide small subsidies for
low-income families in which both parents work
at outside jobs.”
Group C Example
Anaphora resolution: Democrats
Figure 3: Group B and C query examples
are classi#0Ced based on question type intro-
duced above. The number of queries pertain-
ing to each group appears in the second col-
umn. Third and fourth columns show base-
line results #28without solving anaphora#29. Fifth
and sixthcolumnsshow resultsobtained when
pronominal references have been solved.
Results show several aspects we have to
takeinto account. Bene#0Cts obtained from ap-
plying pronominal anaphora resolution vary
depending on question type. Results for
group A and B queries show us that relevance
to the query is the same as baseline system.
So, it seems that pronominal anaphora res-
olution does not achieve any improvement.
This is true only for group A questions. Al-
though target sentences are ranked similarly,
for group B questions, target sentences re-
turned by baseline can not be considered as
correct because we do not obtain the an-
swer by simply looking at returned sentences.
The correct answer is displayed only when
pronominal anaphora is solved and pronom-
inal references are substituted by the noun
phrase they refer to. Only if pronominal ref-
erences are solved, the user will not need to
read more text to obtain the correct answer.
For noun-phrase extraction QA systems the
improvement is greater. If pronominal ref-
erences are not solved, this information will
                    Baseline              Anaphora solved
Answer Type      Number Target included Correct answer Target included Correct answer
A 37 (39,78%) 18 (48,65%) 18 (48,65%) 18 (48,65%) 18 (48,65%)
B 25 (26,88%) 12 (48,00%) 0 (0,00%) 12 (48,00%) 12 (48,00%)
C 31 (33,33%) 9 (29,03%) 9 (29,03%) 21 (67,74%) 21 (67,74%)
A+B+C 93 (100,00%) 39 (41,94%) 27 (29,03%) 51 (54,84%) 51 (54,84%)
Figure 4: Evaluation results
not be analysed and probably a wrong noun-
phrase will be given as answer to the query.
Results improve again if we analyse group
C queries performance. These queries have
the following characteristic: some of the
query terms were referenced via pronominal
anaphora in the relevant sentence. When
this situation occurs, target sentences are re-
trieved earlier in the #0Cnal ranked list than in
the baseline list. This improvement is because
similarity increases between query and target
sentence when pronouns are weighted with
the same score as their referring terms. The
percentage of target sentences obtained in-
creases 38,71 points #28from 29,03#25 to 67,74#25#29.
Aggregate results presented in Figure 4
measure improvement obtained considering
the system as a whole. General percentage
of target sentences obtained increases 12,90
points #28from 41,94#25 to 54,84#25#29 and the level
of correct answers returned by the system in-
creases 25,81 points #28from 29,03#25 to 54,84#25#29.
At thispointwe need to considerthe follow-
ing question: Will these results be the same
for any other question set? Wehave analysed
test questions in order to determine if results
obtained depend on question test set. We ar-
gue that a well-balanced query set would have
a percentage of target sentences that contain
pronouns #28PTSC#29 similar to the pronominal
reference ratio of the text collection that is
being queried. Besides, we suppose that the
probability of #0Cnding an answer in a sentence
is the same for all sentences in the collec-
tion. Comparing LAT ratio of pronominal
reference #2855,20#25#29 with the question test set
PTSC we can measure how a question set can
a#0Bect results. Our question set PTSC value
is a 60,22#25. We obtain as target sentences
containing pronouns only a 5,02#25 more than
expected when test queries are randomly se-
lected. In order to obtain results according to
awell-balanced question set, we discarded #0Cve
questions from both groups B and C. Figure 5
shows that results for this well-balanced ques-
tion set are similarto previous results. Aggre-
gate results show that general percentage of
target sentences increases 10,84 points when
solving pronominal anaphora and the level
of correct answers retrieved increases 22,89
points #28instead of 12,90 and 25,81 obtained
in previous evaluation respectively#29.
As results show, we can say that pronom-
inal anaphora resolution improves QA sys-
tems performance in several aspects. First,
precision increases when query terms are ref-
erenced anaphorically in the target sentence.
Second, pronominal anaphora resolution re-
duces the amount of text a user has to read
when the answer sentence is displayed and
pronominal references are substituted with
their coreferent noun phrases. And third,
for noun phrase extraction QA systems it is
essential to solve pronominal references if a
good performance is pursued.
6 Conclusions and future research
The analysis of information referenced
pronominally in documents has revealed to
be important to tasks where high level of
recall is required. We have analysed and
measured the e#0Bects of applying pronominal
anaphora resolution in QA systems. As
results show, its application improves greatly
QA performance and seems to be essential in
some cases.
Three main areas of future work have ap-
peared while investigation has been devel-
oped. First, IR system used for retrieving
relevant documents has to be adapted for QA
                    Baseline              Anaphora solved
Answer Type      Number Target included Correct answer Target included Correct answer
A 37 (39,78%) 18 (48,65%) 18 (48,65%) 18 (48,65%) 18 (48,65%)
B 20 (21,51%) 10 (50,00%) 0 (0,00%) 10 (50,00%) 10 (50,00%)
C 26 (27,96%) 9 (34,62%) 9 (34,62%) 18 (69,23%) 18 (69,23%)
A+B+C 83 (89,25%) 37 (44,58%) 27 (32,53%) 46 (55,42%) 46 (55,42%)
Figure 5: Well-balanced question set results
tasks. The IR used, obtained the document
containing the target sentence only for 93 of
the 150 proposed queries. Therefore, its preci-
sion needs to be improved. Second, anaphora
resolution algorithm has to be extended to
di#0Berent types of anaphora such as de#0Cnite
descriptions, surface count, verbal phrase and
one-anaphora. And third, sentence ranking
approach has to be analysed to maximise the
percentage of target sentences included into
the 10 answer sentences presented by the sys-
tem.
References
Eric Breck, John Burger, Lisa Ferro, David House,
Marc Light, and Inderjeet Mani. 1999. A Sys
Called Quanda. In Eighth Text REtrieval Con-
ference #28TRE, 1999#29.
Gordon V. Cormack, Charles L. A. Clarke,
Christopher R. Palmer, and Derek I. E.
Kisman. 1999. Fast Automatic Passage Rank-
ing #28MultiText Experiments for TREC-8#29. In
Eighth Text REtrieval Conference #28TRE, 1999#29.
Antonio Ferr#13andez, Manuel Palomar, and Lidia
Moreno. 1998. Anaphora resolution in unre-
striced texts with partial parsing. In 36th An-
nual Meeting of the Association for Computa-
tional Linguistics and 17th International Con-
ference on Computational Lingustics COLING-
ACL.
Antonio Ferr#13andez, Manuel Palomar, and Lidia
Moreno. 1999. An empirical approach to Span-
ish anaphora resolution. To appear in Machine
Translation.
David A. Hull. 1999. Xerox TREC-8 Question
Answering Track Report. In Eighth Text RE-
trieval Conference #28TRE, 1999#29.
Kevin Humphreys, Robert Gaizauskas, Mark
Hepple, and Mark Sanderson. 1999. University
of She#0Eeld TREC-8 Q&A System. In Eighth
Text REtrieval Conference #28TRE, 1999#29.
Julian Kupiec, 1999. MURAX: Finding and Or-
ganising Answers from Text Search, pages 311#7B
331. Kluwer Academic, New York.
Dan Moldovan, Sanda Harabagiu, Marius Pasca,
Rada Mihalcea, Richard Goodrum, Roxana
G^#10rju, and Vasile Rus. 1999. LASSO: A Tool
for Sur#0Cng the Answer Net. In Eighth Text RE-
trieval Conference #28TRE, 1999#29.
Thomas S. Morton. 1999. Using Coreference in
Question Answering. In Eighth Text REtrieval
Conference #28TRE, 1999#29.
Douglas W. Oard, Jianqiang Wang, Dekang Lin,
and Ian Soboro#0B. 1999. TREC-8 Experiments
at Maryland: CLIR, QA and Routing. In
Eighth Text REtrieval Conference #28TRE, 1999#29.
John Prager, Dragomir Radev, Eric Brown, Anni
Coden, and Valerie Samn. 1999. The Use of
Predictive Annotation for Question Answering.
In Eighth Text REtrieval Conference #28TRE,
1999#29.
Gerard A. Salton. 1989. Automatic Text Process-
ing: The Transformation, Analysis, and Re-
trieval of Information by Computer. Addison
Wesley, New York.
Amit Singhal, Steve Abney, Michiel Bacchiani,
Michael Collins, Donald Hindle, and Fernando
Pereira. 1999. ATT at TREC-8. In Eighth Text
REtrieval Conference #28TRE, 1999#29.
Toru Takaki. 1999. NTT DATA: Overview of sys-
tem approach at TREC-8 ad-hoc and question
answering. In Eighth Text REtrieval Confer-
ence #28TRE, 1999#29.
TREC-8. 1999. Eighth Text REtrieval Confer-
ence.
