Experiments with Open-Domain Textual Question Answering 
Sanda M. Harabagiu and Marius A. Pa~ca and Steven J. Maiorano 
Department of Computer Science and Engineering 
Southern Methodist University 
Dallas, TX 75275-0122 
{sanda, marius, steve}@renoir, seas. smu. edu 
Abstract 
This paper describes the integration of several 
knowledge-based natural  processing tech- 
niques into a Question Answering system, capable 
of mining textual answers from large collections of 
texts. Surprizing quality is achieved when several 
lightweight knowledge-based NLP techniques com- 
I)lement mostly shallow, surface-based approaclms. 
1 Background 
The last decade has witnessed great advances and 
interest in the area of Information Extraction (IE) 
fi'om real-world texts. Systems that participated in 
the TIPSTER MUC competitions have been quite sue- 
cessflll at extracting information from newswire rues- 
sages and filling templates with inforInation pertain- 
ing to events or situations of interest. Typically, the 
templates model queries regarding who did what to 
wh, om, when and where, and eventually why. 
Recently, a new trend in information processing 
from texts has emerged. Textual Question Answer- 
ing (Q/A) aims at; identit~ying the answer of a ques- 
tion in large collections of on-line documents. In- 
stead of extracting all events of interest and their re- 
lated entities, a Q/A system higtflights only a short 
piece of text, accounting for the answer. Moreover, 
questions are expressed in natural , are not 
constrained to a specific domain and are not limited 
to the six question types sought by IE systems (i.e. 
wh, ol (lid what.2 to whoma, when4 and wh, eres, and 
eventually whya). 
In open-domain Q/A systems, the finite-state 
technology and domain knowledge that made IE sys- 
tems successful are replaced by a combination of (1) 
kuowledge-based question processing, (2) new forms 
of text indexing and (3) lightweight abduction of 
queries. More generally, these systems coml)ine cre- 
atively components of tile NLP basic research ill- 
frastructure developed in the 80s (e.g. the compu- 
tational theory of Q/A reported in (Lehnert 1978) 
and tim theory of abductive interpretation of texts 
reported in (Hobbs et a1.1993)) with other shallow 
teclmiques that make possil)le the open-domain pro- 
cessing on real-world texts. 
Tim idea of building open-domain Q/A systems 
that perform on real-world document collections was 
initiated by the eighth Text REtrieval Conference 
(TREC-8), by organizing the first competition of an- 
swering fact-based questions such as "Who came up 
with the name, El Nino?'. Resisting the tempta- 
tion of merely porting and integrating existing IE 
and IR technologies into Q/A systems, the develop- 
ers of the TREC Q/A systems have not only shalmd 
new processing methods, but also inspired new re- 
search in the challenging iutegration of surface-text- 
based methods with knowledge-based text inference. 
In particular, two clear knowledge processing needs 
are presented: (1) capturing the semantics of open- 
domain questions and (2) justifying the correctness 
of answers. 
In this paper, we present our experiments with 
integrating knowledge-based NLP with shallow pro- 
cessing techniques for these two aspects of Q/A. Our 
research was motivated by the need to enhance the 
precision of an implemented Q/A system and by the 
requirement to preI)are it for scaling to more com- 
plex questions than those t)resented in the TREC 
competition. In the remaining of the paper, we de- 
scribe a Q/A architecture that allows the integra- 
tion of knowledge-based NLP processing with shal- 
low processing and we detail tlmir interactions. Sec- 
tion 2 presents the flmctionality of several knowledge 
processing modules and describes tile NLP tech- 
niques for question and answer processing. Section 
3 explains the semantic and logical interactions of 
processing questions and answers whereas Sectiou 
4 higlllights the inference aspects that inlplement 
the justification option of a Q/A system. Section 
5 presents the results and the evaluations whereas 
Section 6 concludes tim paper. 
2 The NLP Techniques 
Surprising quality for open-donlain textual Q/A can 
be achieved when several lightweight knowledge- 
based NLP techniques eomt)lenmnt mostly shallow, 
surface-based approaches. The processing imposed 
by Q/A systems must be distinguished, oi1 the one 
band, from IR techniques, that locate sets of doc- 
292 
,: :,. 
Queslion lIoeumcnls Answer(s) 
Question ' Qul , ~._ k~\[ 
Semantic Logi ~~ l,"LTransfimnati°n.- )j~J ('rransI 
, -\[ - (Question ~ /\/\/\ ..... "'/ c/,-,.,..,. \[! 
/ Ot, estion \ ~=g~ Taxonomics ~/Expanded 
/ V ~.~ Expanded Word Classes \[ Question \])- 
f-~-:~2q ._1,'~ Expansion I / / P ~~¢\\ 
k.~J.j~J qSxpandcd 
Knowledge-lhrwd OuestioH l'roces,dng 
) 
tW 
Shallow Document l'roces.d~tg Kmm,ledge-Based Answer l'rocessing 
Figure 1: An architecture tbr knowh;dge-l)ased Question/Answering 
uments ('ontaining the required information, based 
on keywords tech, niques. Q/A systems are presented 
with natural  questions, far richer in seman- 
tics than a set of keywords eventually st, ru('flured 
around some, oi)erators. Ihnthernmre, the outtlut 
of Q/A systems is either the actual answer identi- 
fied in a text or small text; fragments containing the 
answer. This eliminates the user's trouble of tind- 
ing the required inibrnlation in sometime.s large sets 
of retrieved do(-uments. Op(m-donmin Q/A systems 
must also l)e distinguished, on the other hand, h'om 
IE syst(;ms that model the inforlnation need through 
(latal)as(; t(;mt)lates , thus less naturally than a tex- 
tual answer. Moreovei', open-domain IE is still ditli- 
(:ult to achieve, beeause its linguistic t)atCern re(:og- 
nition relies on domain-dependent lexico-semantie 
knowledge. 
To t)e able to satisf~y tile ol)en-donmin constraints, 
textual Q/A systems replace the linguistic pattern 
matching capabilities of IE systems with methods 
that rely (m the recognitioil of tile question type and 
of the e.'rpectcd answer type. Generally, this informa- 
tion is available by accessing a classification based 
on the question stem (i.e. what, how much, who) 
and the head of the first nOml phrase of the ques- 
ti(m. Question 1)rocessing also includes the identifi- 
cation of the question keywords. Empirical methods, 
based on a set of ordered heuristics ot)erating on the 
phrasal parse of the question, extract keywords that 
are passed to the search engine. The overall preci- 
sion of tile Q/A system depends also on th(, recogni- 
tion of the question focus, since the answer extrac- 
tion, suet:ceding the IR phase, is centered around 
the question focus. Unl})rtmmtely, eml)irical ninth- 
ods fl)r t'oeus recognition are hard to develop without 
the availability of richer semantic knowledge. 
S1)eeial requir(nnents are set Oil the documeid; pro- 
(:essing COml)Olmnt of a Q/A system. To speed-u l) 
the answer extraction, the search engine returns only 
those 1)aragrai)hs from a document that contain all 
queried keywords. The paragraphs are ordered to 
1)roInote the, eases when the keywords not only art; as 
close as 1)ossibh~', lint also t)reserve the syntactic de- 
1)enden(:ies re(:ognized in the question. Answers are 
('.xtra(:ted whenever the question topic and the m> 
swer tyI)e are recognized iil a 1)aragraph. Thereafl;er 
the answers :/1(; scored 1)ased on several bag-of-words 
hem'isties. Throughout all this 1)roeessing, the NLP 
te(:hniques are limited to (21) named entity recogni- 
tion; (b) semantic classification of the question tyt)e, 
l/ased oil information 1)rovided by an off-line ques- 
tion taxononly 21.i1(t senmntic class intbrmation avail- 
able from WordNet (Felll)mml 1998); mid (c) phrasal 
parsing produced by enhancing Brill's part-of-sl)eech 
tagger with some rules tbr phrase tbrmation. 
Ilowever simt/le, this technology surl)asses 75% 
precision on trivia questions, as posed in the TREC- 
8 (:ompetition (of. (Moldovan et al.1999)). An im- 
pressive improvenmnt of 14% is achieved when more 
knowledge-intensive NLP techniques are ai)plied a.t 
both question and answer processing level. Figure 1 
illustrates the architecture of a system that has en- 
hanced Q/A performance. 
As represented in Figure 1, all three modules 
of the Q/A system preserve the shallow processing 
eomi/onents that determine good t)erformanee. In 
t;t1(', Quest, ion Processing module, the Question Class 
re(:ognizer, working against a taxonomy of questions, 
293 
still constitutes the central processing that takes 
place at this stage. However, a far richer representa- 
tion of the quest;ion classes is employed. To be able 
to classify against the new question taxonomy each 
question is first flflly parsed and transfommd into 
a semantic representation that captures all relation- 
ships between I)hrase heads. 
The recognition of the question class is based on 
the comparison of the question smnantic representa- 
tion with the semantic representation of the nodes 
from tlm question taxonomy. Taxonomy nodes en- 
code also the answer type, the question focus and 
the semantic class of question keywords. Multiple 
sets of keywords are generated based on their seman- 
tic class, all pertaining to the stone original ques- 
tion. This thature enables the search engine to re- 
trieve multiple sets of documents, pertaining to mul- 
tit)le sets of answers, that are extracted, combined 
and ranked based on several heuristics, reported in 
(Moklovan et a1.1999). This process of obtaining 
multiple sets of answers increases l;he likelihood of 
finding the correct answer. 
However, the big boost in the precision of the 
knowledge-based Q/A system is provided by the op- 
tion of enabling the justification of the extracted an- 
swer. All extracted answers are parsed and trans- 
formed in semantic representations. Thereafter, 
both semantic transformations for questions and an- 
swers are translated into logic forms and presented 
to a simplified theoreln prover. The proof back- 
chains Dora the question to the answer, its trace 
generating a justification. The prover may access 
a set of abduction rules that relax the justification 
process. Whenever an answer cmmot l)e 1)roven, it 
is discarded. This option solves multiple situations 
when the correct answer is not ranked as the first 
return, due to stronger surface-text-based indicators 
in some other answers, which unfortunately are not 
correct. 
This architecture allows for simple integration of 
semantic and axiomatic knowledge sources in a Q/A 
system and determines efficient interaction of text- 
surface-based and knowledge-based NLP techniques. 
3 Interactions 
Three main interactions between text-surface-based 
and knowledge-based NLP techniques are designed 
in our Q/A architecture: 
1. When multiple sets of question keywords are passed 
to the search engine, increasing the chance of find- 
ing the text paragraph containing the answer. 
2. When the question focus and the answer type, re- 
suiting from the knowledge-based processing of the 
question, are used in the extraction of the answer, 
based on several empirical scores. 
3. When the justification option of the Q/A system is 
available. Instead of returning answers scored by 
some empirical measures, a proof of the correctness 
of the answer is produced, by accessing the logical 
transformations of the question and the answer, as 
well as axioms encoding world knowledge. 
All these interactions depend on two Nctors: (1) 
the l;ransformations of the question or answer into 
semantic or logical representations; and (2) the avail- 
ability of knowledge resources, e.g. the question tax- 
onomy and the world knowledge axioms. The avail- 
ability of new, high-performace parsers that operate 
on real world texts determines the transformation 
into semantic and logic formulae quite simple. In ad- 
dition, the acquisition of question taxonomies is alle- 
viated by machine learning techlfiques inspired from 
bootstrapping methods that learn linguistic patterns 
and semantic dictionaries for IE (of. (Riloff and 
Jones, 1999)). World knowledge axioms can also be 
easily derived by processing the gloss (lefinitions of 
WordNel; (Fellbaunl 1998). 
a.1 Semantic and Logic Transformations 
Semantic Transtbrmations 
Instead of t)roducing only a phrasal parse for the 
question and answer, we lnake use of one of the new 
statistical parsers of large real-world text coverage 
(Collins, 1996). The parse trees produced by such a 
parser can be easily translated into a seinantic repre- 
sentation that (1) comprises all the phrase beads and 
(2) captures their int,er-relationships by anonymous 
links. Figure 2 illustrates both the I)arse tree and 
the associated semantic representation of a TI{EC-8 
question. 
Question: Why did I)avid Koresh ask the FBI for a word processor? 
Parse: 
SBAP.Q 
-- - ..... SQ 
/ //------ v,, 
// // 4- --~ --Pl' 
i / //\ / 
WRB VBI) NNP NN\[' VB l)T NNI' IN DT NN NN 
I I I I I I I I I l I 
Why did David Koresh ask the Fill for a word processor 
Semantic representation: 
~ask word ~.. 
REASON I)avid%/Y ---~"~\ processor 
Koresh FBI 
Figure 2: Question semantic transformation 
The actual transformation into semantic repre- 
sentation of a question or an answer is obtained 
as a by-product of the parse tree traversal. Ini- 
tially, all leaves of the parse tree are classified as 
294 
.@@nodes or no'n-skipnodes. All n(mns, non-auxiliary 
verbs, adjectives a.nd adverl)s are categorized as 
non-skitmodes. All the other h~aves are skipnodes. 
Bottom-u 1) trav('.rsal of tim 1)arse tree (:ntails tlm 
t)roi)agation of leaf labels wh('amver l;hc 1)arcnt nod(; 
has more than one non-skipnod(; child. A rule based 
on the syntactic category (it' th(.' father selects one of 
the childr(m to 1)ropagatc its label a,t the next level 
in the tree. The winning node will then be consid- 
ered linked to all the other fornmr siblings thai; al'e 
non-skilmodes. The prot)agation (:ontimms mltil the 
l)arse 1;l'(~.c root receives a label, and thus a scmanti(" 
gral)h is (:rc;tl;(;(l as a 1)y-1)rodu(:t. Part of th('. la- 
bel i)roI)agation, we also (:onsider that whenever all 
('hildr(;n of a non-terminal are skilmo(l(;,% the parent; 
becomes a. skipnode as well. 
Figure 3 rel)r(~'s(;nts the lal)el I)rOl)agation for th(; 
1)arse tree of the question l'el)res(mt(M in Figure 2. 
The labels of Korcsh, ask, FB1 and processor are 
l)rot)agated to tlw, next level. This entails t;hat Ko- 
r'c.s'h is linked to David, ask to I,'BI and procc',s'sor mM 
procc, ssor to word. As a.@ be(:(/m(!s the lab(:l of th(; 
tree to(it, it is a.lso linked to I~ds'A,9ON, the qu(~sti()n 
type (l('~I;(n'lnin(',d l)y tlm question st('m: wh, y. The 
lal)el 1)rot)agation rules are id(mtical t<) the rules fl)r 
mapping from trees to d(~t)(m(hm(:y s\[.rlt(:l;lll.'cs llSC,(l 
my Mi('ha(,q Collins (of. (Collins, 199(5)). These rules 
i(lenti\[}, the head-child, and pl'Ol)agatc its label up 
in the tree. 
/_ > (/ vV (>k) 
wllm)w' /~ Nl' (K,,,,,~IO' ' ' i , /' NI' \[ I:IH,I 
/ / ~ ~ / / \ / ' \ 
WRB VBI) NNP NNP , ', VII I)T NNP, IN I)T NN NN ) 
I I I I/', I I I~" I I I I ,' 
Wily did l)avid Korcsh ask file \[:BI for a word lUl)ccssiw 
Figur('. 3: \])~II'S(~ t,l'(~.(, ~ l;ra, v(',rsa\[ 
I'l'( processor !- 
// NI~ I)rOCCssor I / , . 
Logical Transformations 
The logical formulae in whi(:h questions or answers 
arc translated are inspired \[)y the notal;i(m prol/os('.d 
in (ltobbs, 1.986-1.) and implemented in TACITUS 
(ttobbs, 1986-2). 
Based on the davidsonian tl"eatmenl; of action sen- 
|;(?llC(;S, in which events are tr(~at, ed as individuals, 
every question and every answer are transform('xl in 
a tirst-order t)redicatc tbrmula for which (1) verbs 
are mapped in 1)red\[cares vcvb(e,x,y,z,...) with the. 
(:onvention thai; variable e rel)res(;nts the evc'nl;ual- 
i(;y of that acl;ion or even(; (;o take place, wh('a'eas (;lm 
othcu arguments (e.g. z, y, z, ...) repr(~,s(mt l;lm t)rcd- 
icate argmnents of the verb; (2) nouns arc mal)l)ed 
into their lexicalized predicates; raM, (3) mo(lificws 
have the same argument as the predicate they mod- 
i\[y. For (;Xalnpl('~, l;he qu(~si;ion illustrated in Figure 2 
has l;he following logical fornl transforlnation (LFT) : 
\[REASON (x) ~David (y) ~Koresh (y) ~ask (e, x, y, z, p) 
~FBI (z) {qproces sot (p) ~word (p) \] 
The process of trmlslat;ing a sema.ntic ret)resenta.- 
tion into a logic form has the following steps: 
1. For each node in the semantic rcprcscntation, 
create a prcdicatc with a distinct argument. 
2.a. If it noun and (tit ad, jective predicate arc linlccd 
tht:y should have, the sam(" argument. 
2.b. Tltc .qamc fin" verbs and adverbs, pairs of lwuns 
or an adjective and an advcrb. 
3. l,b'r each verb predicate, add aT~lumcnts corrc.sponding 
to each predicate to which it is directly 
linked in the semantic representation. 
Predicate argunmnts (:an be identified because tim 
soma,hi;it l'ol)res(.~nl;al;ion using &nollymous relations 
represenl;s uniformly adjuncts mM thematic roles. 
Ilowevel:, sl;e 1) 2 of the l;l"mtsl~d;ion procedure l'eCOg- 
nixes the adjuncts, making predicate argmnenl;s the 
remaining (:()nn(~(:tions of tlm v(n'}) in t;}l(, ~ semal~I;ic 
rq)resentation. 
3.2 Question Taxonomy 
The question taxonomy rel)rcscnl;s each question 
nod(', as a quintuple: (1) a setnan(;ic rcpr(~s(!ni;ation 
of a qucsLion; (2) th(; question type; (3) the m~swer 
typ(,; (4) the question focus an,l (5) the question 
keywoMs. By using over 1500 questions provided by 
\]lcme(lia, as well as otlmr 2000 quest;ions retrieved 
from FAQFinder, we have b(!en abh ~. to learn classiti- 
cation rules mM buihl a (:oinplex question taxonomy. 
\ ..... , \[ \]I1SIIIIICe I 
,,/- ;,,,\__ 
// ,, 
.2_ 
\[Artwo,k jl LI.ocation 1 
Figure 4: A snapshot of the top Question Taxonomy 
Initially, we stm'tcd with a seed hit;rarchy of 25 
question classes, manually built, in which all the se- 
mantic classes of the nodes fl'om the semantic repre- 
sentations were decided oil-line, by a hmnan expert. 
300 questions were processed to create this seed hi- 
erarchy. Figure 4 illustrates some of the nodes of 
295 
the top of this hierarchy. Later, as 500 more ques- 
tions were considered, we started classifying them 
semi-automatically, using the following two steps: 
(1) first a hmnan wonld decide the semantic class of 
each node in the semantic representation of the new 
question; (2) then a classification procedure would 
decide whether the question belongs to one of the 
existing classes or a new class should be considered. 
To be able to classify a new question in one of the ex- 
isting question classes, two conditions must be satis- 
fled: (a) all nodes fi'om the taxonomy question must 
correspond to new question nodes with the same 
semantic classes; and (b) unifyable nodes must be 
linked in the same way in both representations. The 
hierarchy grew to 68 question nodes. 
Later, 2700 more questions were classified fully 
automatically. To decide the semantic classes of the 
nodes, we used the WordNet semantic hierarchies, 
by simply assigning to each semantic representation 
node the same class as that of any other question 
term from its WordNet hiera.rchy. 
The semantic representation, having the same for- 
mat for questions and answers, is a case fi'ame with 
anonymous relations, that allows the unification of 
the answer to the question regardless of the case re- 
lation. Figure 5 illustrates tbur nodes fi'om the ques- 
tion taxonomy, two for the "currency" question tyI)e 
attd two for the "person name" question type. The 
Figure also represents the mappings of four TREC- 
8 questions in these hier~rchy nodes. The mappings 
are represented by dashed arcs. In this Figm'c, the 
nodes front the semantic representations that con- 
rain a question mark are place holders for the ex- 
pected answer type. 
An additional set of classification rules is assoei- 
ated with this taxonomy, hfitially, all rules are based 
on the recognition of the question stem and of the 
answer type, obtained with class intbrmation from 
WordNet. However we could learn new rules when 
inorphologieal and semantic variations of the seman- 
tic nodes arc allowed. Moreover, along with the new 
rules, we enrich the taxonomy, because often the new 
questions unify only partially with the current tax- 
enemy. All semantic and morphologic variations of 
the semantic representation nodes are grouped into 
word classes. Several of the word classes we used are 
listed in Table 1. 
\[I Word Ulass Words l\] 
Value words "monetary value", "money", "price" 
Expenditure words "spend", "buy", "rent", "invest" 
Creation words "author", "designer", ainvent~... 
Table 1: Examples of word classes. 
The bootstrapping algorithm that learns new 
classification rules and new classes of questions 
is based on an intbrmation extraction measure: 
scorc(rulei)--Ai * lo.q2(Ni), where Ai stands for the 
Class Name: currency 
\[ ~l O7: What debts did Qintex 
, ...... ..  rouploav : 
Class Name." Ilalne persoll 
Q32:Who received Ihe Will 
? Rogers Award in 1989? 
conlpctilioll " 
HFI2 semalltic reprd~ - ~ representation 
Class Nanle: author 
Q92: Who released 1he lnlernet 
'~ worm in the late 1980s? 
el'talc "" 
| .2~ ...... I 1980s1 
~c ~Bfi-) ~ semanttc representation _ 
Figure 5: Mapping Questions in the Taxonomy 
number of different lexicon entries for the answer 
type of the question, whereas Ni = Ai/Fi, where Fi 
is the number of different focus categories classified. 
The steps of the bootstrapping algorithm are: 
1. Retrieve concepts morphologically//semantically related 
to the semantic representations 
2. Apply the classification rules to all questions that 
contain any newly retrieved concepts. 
3. New_Classificatiton_Rules={} 
MUTUAL BOOTSTRAPPING LOOP 
4. Score. all new classification rules 
5. best_CR=thc highest scoring classification rule 
6. Add bcst_CR to the classification rules 
7. Add the questions classified by best_CR to the taxonomy 
8. Goto step 4 three times. 
9. Discard all new rules but the best scoring three. 
10. Goto 3. until the Q/A performance improves. 
296 
4 The Justification Option 
A Q/A system that provides with the option of jus- 
til~ving the answer has the advantage that erroneous 
answers can be ruled out syst(,'matieally. In our quest 
of enh~mcing the precision of a Q/A system by incof 
porating additional knowledge, we fount1 this option 
very helpflH. However~ the generation of justifica- 
tions for ol)en-domain textual Q/A systems poses 
some challenges. First;, we needed to develol) a very 
efficient prover, operating on logical form transfer- 
mat;ions. Our 1)rool'q are backchaining Do\]n the qlles- 
tions through a mixture of axioms. We use thl'ee 
forlllS of axioms: (1) axioms derived fl'om the facts 
stated in I;he l;eXtllal allswel; (2) ~/XiOlllS ro4)resent- 
ing world knowledge,; and (3) axioms (let;ermined by 
coreference resohlt;ion in the atiswer text;. For ex- 
alni)le , some of the axioms eml)loyed 1;o prove the 
answer to the TREC~8 question (26: Why did David 
Kor'csh, ask th, c 1,7\]1 for a 'wo~'d p~vccssor?" are: 
SET 1 
Mr(71):= null. Koresh(71):=null. word(72):=null. 
processor(72):=null, sent(77 76 78 71):=null. 
................................................ 
SET 2 
David(1):=Mr(1). REASON(5):= enable(5 3 6). 
SET 3 
FBI (i) : =null. 
................................................ 
The first set represents fa(',l;s extraete(| through 
LFT \]~re(licat(',s of the textual answer: "0'¢;(',~' t\]t(: 
'mc(~k(m,d My Kovcsh, .s'c'nt a vcq',,c.st fo'r a 'u;ord proce, s- 
so'r to c',,ablc h,i'H~, to 'record his vt:'~;clatio'~,s". The sec- 
Ol~d set ret)r(',senl;s worl(1 knowledge axioms that; we 
acquired senfi-auto\]na.tically. For instance we know 
that David is a male name, thus tlmt person cmt be 
addressed with Mr.. Similarly, events are (real>led 
for some reason. The third set of axioms represent 
the fact that the t,'l~l is in the context of the text 
answer. To be noted that the axioms derived from 
the answer have construct argument, s, relnesentc, d by 
convention with numl)ers larger titan 70. All the 
other arguments are variables. 
Q52 \¥ho invented the road trallic cone,? 
Answer ST~iliT~g proudly for the caTl~cras , Gover~tor 
(shallow Pete Wilson, US TraTtsportatioTt Secretary 
methods) l,'edcrico l)eT~a aTtd Mayor l~ichavd l~iovda~t 
removed ~t half- dozeTt phtstic or~t~tgc coTtcs 
firm the road~oay aTtd the first ca~'s passed 
I Answer David Mor~la~, the comp~tTty's maTtagiT~ 9 
(kb-based di~'ector rt~d iT~?~eltlo~" of the plastic 
methods) co~te cycle, collects them. 
Table 2: Examples of trot)roved answer correctness. 
Tim justification of this answer is t)rovided by the 
r~ ( following proof I;r~lee. \] h ', 1)rover ;tttelllI)tS t;o 1)rove 
the LFT of the question (QLF) corre(:t 1} 3, proving 
from left to right each term of QLF. 
->Answer:0ver the weekend Mr Koresh sent a request for a word 
processor to enable him to record his revelations. 
->~LF:David(l)~Koresh(1)"word(2)^processor(2)'FBI(4) ~ 
ask(3 4 2 i 5)'_REASDN(5)'_PER(1)*0RG(4) 
->ALF:Mr(71)"Koresh(71)'word(72)'processor(72)'revelations(74)" 
record(Z3 74 75)^enable(75 73 76)~request(76)'sent(77 76 78 71) 
^weekend(78)"PEK(71)'DATE(?8) 
-->Proving:David(1)^Koresh(1)^word(2)'processor(2)^FBI(4)" 
ask(3 4 2 i 5)'REASON(5)'_PER(1)'_0RG(4) 
There are i target axioms. Selected axiom: David(1):= Mr(1). 
Unifying: i to i. Ok 
--> Proving: Mr(1)'Koresh(J)~word(2)^processor(2)'FBl(4) 
^ask(3 4 2 i 5)^_REASDN(5)'_PER(1)^_0RG(4) 
There are i target axioms. Selected axiom: Mr(71):= null. 
Unifying: i to 71. Dk 
--> Proving: Koresh(71)*word(2)-processor(2)~FBl(4) " 
ask(3 4 2 Yl 5)* REASDN(5)^_PER(71)'_0RG(4) 
There are i target axioms. Selected axiom: Koresh(71):= null. 
Unifying: Y1 to 71. Dk 
--> Proving: word(2)'processor(2)^FBl(4)~ask(3 4 2 71 5)* 
_REASON(5)^_PER(71)'0RG(4) 
There are i target axioms. Selected axiom: word(72):= null, 
Unifying: 2 to 72. Ok 
--> Proving: processor(Y2)'FBl(4)^ask(3 4 72 71 5)^REASON(5) 
^_PER(71)^_DRG(4) 
There are i target axioms. Selected axiom: processor(72):= null. 
Unifying: ?2 to 72. Ok 
--> Proving: FBl(4)'ask(3 4 72 71 5)^REASON(5)^_PER(71)'_0RG(4) 
There are i target axioms, Selected axiom: FBI(1):= null. 
Unifying: 4 to i. Ok 
--> Proving: ask(3 4 72 71 5)~REASON(5)^_PER(ZI)^_BRG(4) 
There are 2 target axioms. Selected axiom: ask(l 2 3 4 5):= 
sent(l 6 7 4)'request(5). 
Unifying: i to 2. 3 to i. 5 to 5. 71 to 4. 72 to 3. Ok 
--> Proving: sent(l 6 7 ?I)^request(6)^_REASON(5)^_PER(YI)^_0RG(2) 
There are I target axioms, Selected axiom: sent(Z7 76 78 71):= null. 
Unifying: I to 77. 6 to Y6. Y to 78. 71 to 71. Ok 
--> Proving: request(76)^REASON(5)'_PER(71)^_0RG(2) 
There are i target axioms. Selected axiom: request(76):= null. 
Unifying: 76 to 76. Ok 
--> Proving: _REASON(5)°PER(71)°8}~G(2) 
There are 1 target axioms. Selected axiom: _REASON(5):= enable(5 3 6). 
Unifying: 5 to 5. flk 
--> Proving: enable(5 3 6)^_PER(YI)-_flRG(2) 
There are i target axioms. Selected axiom: enable(75 73 76): = null. 
Unifying: 3 to ?3. 5 to 75. 6 to 76. Dk 
--> Proving: _PER(71)^_ORG(2) 
There are 3 target axioms. Selected axiom: _PER(71):= null. 
Unifying: 71 to gl. 0k 
--> Proving: _ORG(2) 
There are i target axioms. Selected axiom: _0RG(1):= FBI(1). 
Unifying: 2 to i. Ok 
--> Proving: nullJl\]J We found:Success. 
.................................................................. 
There are cases when our simple prover fails to 
prove a. correct answer. We have notice(1 that this 
hal)pens 1)ecause in the answer semantic representa- 
tion, st)me concepts that are connected in the ques- 
tion semantic representation are no longer directly 
linked. This is due to the f~mt that there are either 
parser errors or there are new syntactic dependen- 
cies between the two concepts. To acconmmdm;e 
this situation, we allow diflhrent constants that are 
arguments of the sanle predicate to be unifiable. The 
special cases in which this relaxation of the unifica- 
tion i)roeedul'e is allowed constitute our abduction 
rltles. 
297 
5 Evaluation 
Both qualitative and quautitative evaluation of the 
integration of surface text-based and knowledge- 
based methods for Q/A is imposed. Quantitatively, 
Tal)le 3 summarizes the scores obtained when only 
shallow methods were emI)loyed, in contrast with 
the results when knowledge-based methods were in- 
tegrated. We have sepm'ately measured the effect 
of tile integration of the knowledge-based methods 
at question processing and answer processing level. 
We have also evaluated the precision of the sys- 
tern when both integrations were implemented. The 
results were the first five answes's returned within 
250 bytes of text, when approximatively half mil- 
lion TREC documents are mined. Wc have used the 
200 questions from TREC-8, mid tile correct answers 
provided by NIST. The performance was measured 
both with the NIST scoring method employed in the 
TREC-8 and by simply assigning a score of 1 tbr the 
question having a correct answer, regardless of its 
position. 
Percentage of NIST score 
correct answers 
in top 5 returns 
Tezt-suTJ'acc-bascd 77.7% 64.5% 
Knowledge-based 83.2% 71.5% 
Question Processing 
(only) 
~l~:rt-su@zce-bascd 77.7% 73% 
only 'with Answer 
Justification 
Knowledge-based 89.5% 84.75% 
Question Pwces.sin9 
with Answer 
Justification 
Table 3: Accuracy t)erformance 
When using the NIST scoring method to eval- 
uate an individual answer, we used only six 
values:(1, .5, .33, .25, .2, 0), representing the score the 
answer's question obtains. If the first answer is cor- 
rect, it obtains a score of 1, if the second one is cor- 
rect, it is scored with .5, if the third one is correct, 
tile score becomes .aa, if the fourth is correct, the 
score is .25 and if the fifth one is correct, the score 
is .2. Otherwise, it is scored with 0. No credit is 
given if multiple answers are correct. Table 3 shows 
that both knowledge-based methods enhanced the 
precision, regardless of the scoring method. 
To further evaluate the contribution of tim justi- 
ficat, ion option, we evaluated separately the preci- 
sion of the prover tbr those questions for which tile 
surface-text-based methods of our system, when op- 
erating alone, emmet find correct answers. We had 
45 TREC-8 questions for which the evaluation of the 
prover was performed. Table 4 summarizes the ac- 
curacy of the prover. 
hlcorrect ~ilswors 
(no knowledge) 
Correct m~swers 
(KB-based) 
Incorrect answers 
(KB-based) 
P rovell 
con'ect 
3 
127 
l'roven Precision 
incorrect 
210 98.5% 
5 96.2% 
38 90.04% 
Table 4: Prover performmme 
Qualitatively, we find that the integration of 
knowledge-based methods is very beneficial. Table 2 
illustrates tile correct answer obtained with these 
methods, in contrast to tile incorrecl, answer pro- 
vided when only the shallow techniques m'e al)plied. 
6 Conclusions 
\Ve believe that the lmrfol'nmnce of a Q/A system 
del)ends on the knowledge sources it employs. In 
this paper we trove presented the effect of tile in- 
tegration of knowledge derived from question tax- 
enemies and produced by answer justifications on 
the Q/A precision. Our knowledge-based methods 
are lightweight, since, we do not generate precise se- 
mantic rel)resentations of questions or answo.rs, but 
mere attproximations determismd by syntactic de- 
1)en(lencies. Fm'thermore, our prover operates on 
very simple logical representations, its which syntac- 
tic and semantic ambiguities are completely ignored. 
Nevertheless, we have shown that these approxima- 
tions are functional, since we implemented a prover 
that justifies answers with high precision. Similarly, 
our knowledge-t)ased question l)rocessillg is a nlel'e 
combination of word class information and syntactic 
dep endencies. 

References 

Michael Collins. A New Statistical Parser Based on Bigram 
Lexical Dependencies. In Proceedings of the 2dst Annual 
Meeting of the Association for Computational Linguistics, 
ACL-96 pages 184 19t, 1996. 

Christlane Fellbaum (Ed). WordNet - An Electronic Lexical 
l)atabase. MIT Press, 1998. 

Jerry R. IIobbs. Discourse and Inference. Unpublished 
malmscrlpt, 1986. 

Jerry R. Hobbs. Overview of the TACITUS Project In Computational Linguistics, 12:(3), 1986. 

Jerry ltobbs, Mark StickeI, Doug Appelt, and Paul Mm'- 
tin. Interpretation as abduction. Artificial Intelligence, 63, 
pages 69 142, 1993. 

YVendy Lehnert. The processing of question answering. 
Lawrence Erlbaum Publishers, 1978. 

l)an Moldovan, Sanda llarabagiu, Marius Pasta, Rada Mi- 
halcea, Richard Goodrum, Roxana Gh:iu alK1 Vasile ll.us. 
Lasso: a tool for surfing the answer net. In Proceedings of 
TREC-8, 1999. 

Ellen Riloff and Rosie Jones. Learning l)ictionaries for hffor- 
mation Extraction by Multi-Level Bootstrat)plng. In Pro- 
cccdings of the 16th National Conference on Artificial In- 
telligence, AAAI-99, 1999. 
