Proceedings of the Workshop on Task-Focused Summarization and Question Answering, pages 32–39,
Sydney, July 2006. c©2006 Association for Computational Linguistics
Using Scenario Knowledge in Automatic Question Answering
Sanda Harabagiu and Andrew Hickl
Language Computer Corporation
1701 North Collins Boulevard
Richardson, Texas 75080 USA
sanda@languagecomputer.com
Abstract
This paper describes a novel framework
for using scenario knowledge in open-
domain Question Answering (Q/A) appli-
cations that uses a state-of-the-art textual
entailment system (Hickl et al., 2006b) in
order to discover textual information rele-
vant to the set of topics associated with a
scenario description. An intrinsic and an
extrinsic evaluation of this method is pre-
sented in the context of an automatic Q/A
system and results from several user sce-
narios are discussed.
1 Introduction
Users of today’s automatic question-answering
(Q/A) systems generally have complex informa-
tion needs that cannot be satis ed by asking single
questions in isolation. When users interact with
Q/A systems, they often formulate sets of queries
that they believe will help them gather the infor-
mation that needed to perform one or more spe-
ci c tasks. While human users are generally able
to identify their information needs independently,
the information needs of organizations are often
presented in the form of short prose descriptions
 known as scenarios  which outline the range
of knowledge sought by a customer in order to
achieve a speci c outcome or to accomplish a par-
ticular task. (An example of one scenario is pre-
sented in Figure 1.)
Recent work in Q/A has sought to use in-
formation derived from these kinds of scenar-
ios in order to retrieve sets of answers that are
more relevant  and responsive  to a customer’s
information needs. While (Harabagiu et al.,
2005) used topic signatures (Lin and Hovy, 2000;
Scenario Description
The customer has commissioned a research project looking at the
impact of the outsourcing of American jobs on the United States’
relationship with India. After conducting research on U.S.
companies currently doing business in India, the customer wants
to know why American corporations have sought to outsource jobs
to India, the types of economic advantages that American companies
could gain from relocating to India, and the kinds of economic or
political inducements that India has offered to American companies
looking to outsource jobs there. The customer is not interested
in demographic information on Indian employees of American  rms.
Table 1: Example of a User Scenario.
Harabagiu, 2004) computed automatically from
collections of documents relevant to a scenario in
order to approximate the semantic content of a
scenario, (Narayanan and Harabagiu, 2004) em-
ployed formal models of the interrelated events,
actions, states, and relations implicit to a sce-
nario in order to produce  ne-grained, context-
sensitive inferences that could be used to answer
questions. Scenario knowledge was also included
in the form of axiomatic logic transformation de-
veloped in (Moldovan et al., 2003). Under this
approach, information extracted from the scenario
narrative is converted to logical axioms that can
used in conjunction with a logic prover in order
justify answers returned for questions.
In this paper, we propose that scenario-relevant
passages in natural language texts can be identi ed
by recognizing a semantic relation, known as con-
textual entailment (CE), that exists between a text
passage and one of a set of subquestions that are
conventionally implied by a scenario. Under this
model, we expect that a scenario S can be consid-
ered to contextually entail a passage t, when there
exists at least one subquestion q derived from S
that textually entails the passage t. We show that
by using a state-of-the-art textual entailment sys-
tem (Hickl et al., 2006b), we can provide Q/A sys-
tems with another mechanism for approximating
the inference between questions and relevant an-
swers. We show how each of these cases of con-
32
textual entailment can be computed and how it can
be used in the intrinsic and extrinsic evaluation of
a Q/A system.
The remainder of the paper is organized in the
following way. Section 2 introduces our notion of
contextual entailment and provides a framework
for recognizing instances of CE between scenar-
ios and both questions and answers. Section 3 de-
scribes the textual entailment system used at the
core of our CE system. Sections 4 and 5 describe
separate frameworks for intrinsically and extrinsi-
cally evaluating the impact of CE on current Q/A
systems. Section 6 presents results from our evalu-
ations, and Section 7 summarizes our conclusions
2 Recognizing Contextual Entailment
We de ne contextual entailment (CE) as a direc-
tional relation that exists between a text passage t
and one of a set of implicit subquestions q that can
be derived from a user’s interpretation of a sce-
nario. Informally, we consider that a scenario S
contextually entails a passage t when there exists
at least one subquestion q implied by S that can be
considered to entail t.
We expect that the meaning of an information-
seeking scenario S can be represented as a ques-
tion under discussion (QUD) QS, which denotes a
partially-ordered set of subquestions (q ∈ QS) that
represent the entire set of questions that could po-
tentially be asked in order to gather information
relevant to S. Taken together, we expect these
subquestions to represent the widest possible con-
strual of a user’s information need given S.
Entailment?
Contextual
Entailment?
Contextual
CASE 1 CASE 2 CASE 3
Answer
Question Question
User Scenario
Entailment?
Answer
Question
User Scenario
Entailment?
Answer
User Scenario
Entailment?
Contextual Entailment? Contextual Entailment?
Figure 1: Three types of Contextual Entailment
We believe the set of subquestions implied by
QS can be used to test whether a text passage is
relevant to S. Since the formal answerhood re-
lation between a question and its answer(s) can
be cast in terms of (logical) entailment (Groe-
nendijk, 1999; Lewis, 1988), we believe that sys-
tems for recognizing textual entailment (Dagan et
al., 2005) could be used in order to identify those
text passages that should be considered when gath-
ering information related to a scenario. Based on
these assumptions, we expect that the set of text
passages that are textually entailed by subques-
tions derived from a scenario represent informa-
tion that is more likely to be relevant to the overall
topic of the scenario as a whole.
We expect that there are three types of contex-
tual entailment relationships that could prove use-
ful for automatic Q/A systems. First, as illustrated
in Case 1 in 1, CE could exist between a scenario
and one of the set of answers returned by a Q/A
system in response to a user’s question. Second,
as in Case 2, CE could be established directly be-
tween a scenario and the question asked by the
user. Finally, as in Case 3, CE could be established
both between a scenario and a user’s question as
well as between a scenario and one of the answers
returned by the Q/A system for that question.
Figure 2 provides examples of each of these
three types of contextual entailment using the sce-
nario presented in Figure 1.
CASE 1:
Scenario:
companies could gain from relocating to India
the types of economic advantages that American 
Answer:
Question:
GE and Dell have reported earnings growth after
outsourcing jobs to both Indonesia and India
What U.S. companies are outsourcing jobs to
Indonesia?
Textual Entailment Contextual Entailment
Scenario:
companies could gain from relocating to India
the types of economic advantages that American 
Answer:
Question:
Scenario:
companies could gain from relocating to India
the types of economic advantages that American 
Answer:
Question:
certain types of jobs to India?
How could U.S. companies profit from moving
How could U.S. companies benefit by moving jobs
to India?
Outsourcing jobs to India saved the carrier $25 
million, enabling it to turn a profit for the first time.
Despite public opposition to outsourcing jobs to
India, political support has never been higher.
CASE 3:
CASE 2: S does not entail A, Q entails A, S entails Q
S entails A, Q entails A, S does not entail Q
S entails A, Q entails A, S entails Q
Figure 2: Examples of Contextual Entailment.
In Case 1, the scenario textually entails the
meaning of the answer passage, as earnings
growth from outsourcing necessarily represents
one of the types of economic advantages that can
be derived from outsourcing. However, the sce-
nario cannot be seen as entailing the user’s ques-
tion, as the user’s interest in job outsourcing in
Indonesia cannot be interpreted as being part of
the topics that are associated with the scenario.
In this case, recognition of contextual entailment
would allow systems to be sensitive to the types of
33
scenario-relevant information that is encountered
 even when the user asks questions that are not
entailed by the scenario itself. We expect that this
type of contextual entailment would allow systems
to identify scenario-relevant knowledge through-
out a user’s interaction with a system, regardless
of topic of a user’s last query.
In Case 2, the user’s question is entailed by
the scenario, but no corresponding entailment re-
lationship can be established between the scenario
and the answer passage identi ed by the Q/A sys-
tem as an answer to the question. While political
support may be interpretable as one of the bene ts
realized by companies that outsource, it cannot be
understood as one of the economic advantages of
outsourcing. Here, recognizing that contextual en-
tailment could not be established between the sce-
nario and the answer  but could be established
between the scenario and the question  could be
used to signal the Q/A system to consider addi-
tional answers before moving on to the user’s next
question. By identifying contextual entailment
relationships between answers and elements in a
scenario, systems could perform valuable forms of
answer validation that could be used to select only
the most relevant answers for a user’s considera-
tion.
Finally, in Case 3, entailment relationships exist
between the scenario and both the user’s question
and the returned answer, as saving $25 million can
be considered to be both an economic advantage
and one of the ways that companies pro t from
outsourcing. In this case, the establishment of con-
textual entailment could be used to inform topic
models that could be used to identify and extract
other similarly relevant passages for the user.
In order to capture these three types of CE re-
lationships, we developed the architecture for rec-
ognizing contextual entailment illustrated in Fig-
ure 3.
This architecture includes three basic types of
modules: (1) a Context Discovery module, which
identi es passages relevant to the concepts men-
tioned in a scenario, (2) a Textual Entailment mod-
ule, which recognizes implicational relationships
between passages, and (3) a Entailment Merg-
ing module, which ranks relevant passages ac-
cording to their relevance to the scenario itself.
In Context Discovery, document retrieval queries
are  rst extracted from each sentence found in a
scenario. Once a set of documents has been as-
User
Scenario
Query Extraction
Documents
Relevant
Documents
Irelevant
Topic Signatures
Signature 
Answer
Signature 
Answer
Signature 
Answer
Signature 
Answer
Signature 
Answer
Entailment
Textual
Question/Answer
Scenario Context
Merging Textual Entailment Results
Contextual Entailment Decision / Confidence
Figure 3: Contextual Entailment Architecture.
sembled, topic signatures (Lin and Hovy, 2000;
Harabagiu 2004) are computed which identify the
set of topic-relevant concepts  and relations be-
tween concepts  that are found in the relevant set
of documents. Weights associated with each set of
topic signatures are then used to extract a set of
relevant sentences  referred to as topic answers  
from each relevant document. Once a set of topic
answers have been identi ed, each topic answer is
paired with a question submitted by a user and sent
to the Textual Entailment system described in Sec-
tion 2. Topic answers that are deemed to be pos-
itive entailments of the user question are assigned
a con dence value by the TE system and are then
sent to a Entailment Merging module, which uses
logistic regression in order to rank passages ac-
cording to their expected relevance to the user sce-
nario. Here, logistic regression is used to  nd a set
of coef cients bj (where 0 ≤ j ≤ p) in order to  t
a variable x to a logistic transformation of a prob-
ability q.
logit(q) = log q1 −q = b0 +
psummationdisplay
j=1
bjxj + e
We believe that since logistic regression uses a
maximum likelihood method, it is a suitable tech-
nique for normalizing across range of con dence
values output by the TE system.
34
Coreference
Coreference
NEAliasing
Concept
Textual
Input 1
Textual
Input 2
Lexical Alignment
Paraphrase Acquisition
Alignment Module
WWW
Training
Corpora 
Classifier
YES
NO
Features
Alignment
Dependency
Features
Paraphrase
Features
Semantic/
Pragmatic
Features
Feature Extraction
Classification Module
Lexico−Semantic
PoS/ NER
Synonyms/
Antonyms Normalization
Syntactic
Semantic
Temporal
Parsing
Modality Detection Speech Act Recognition
Pragmatics
Factivity Detection Belief Recognition
Preprocessing
Figure 4: Textual Entailment Architecture.
3 Recognizing Textual Entailment
Recent work in computational seman-
tics (Haghighi et al., 2005; Hickl et al., 2006b;
MacCartney et al., 2006) has demonstrated the
viability of supervised machine learning-based
approaches to the recognition of textual en-
tailment (TE). While these approaches have
not incorporated the forms of structured world
knowledge featured in many logic-based TE sys-
tems, classi cation-based approaches have been
consistently among the top-performing systems
in both the 2005 and 2006 PASCAL Recognizing
Textual Entailment (RTE) Challenges (Dagan et
al., 2005), with the best systems (such as (Hickl
et al., 2006b)) correctly identifying instances of
textual entailment more than 75% of the time.
The architecture of our TE system is presented
in Figure 4.1 Pairs of texts are initially sent to a
Preprocessing Module, which performs syntactic
and semantic parsing of each sentence, resolves
coreference, and annotates entities and predicates
with a wide range of lexico-semantic and prag-
1For more information on the TE system described in this
section, please see (Hickl et al., 2006b) and (Harabagiu and
Hickl, 2006).
matic information, including named entity infor-
mation, synonymy and antonymy information, and
polarity and modality information.
Once preprocessing is complete, texts are then
sent to an Alignment Module, which uses lexi-
cal alignment module in conjunction with a para-
phrase acquisition module in order to determine
the likelihood that pairs of elements selected from
each sentence contain corresponding information
that could be used in recognizing textual entail-
ment. Lexical Alignment is performed using a
Maximum Entropy-based classi er which com-
putes an alignment probability p(a) equal to the
likelihood that a term selected from one text cor-
responds to an element selected from another text.
Once these pairs of corresponding elements are
identi ed, alignment information is then used in
order to extract portions of texts that could be
related via one or more phrase-level alternations
or  paraphrases . In order to acquire these al-
ternations, the most likely pairs of aligned ele-
ments were then sent to a Paraphrase Acquisition
module, which extracts sentences that contain in-
stances of both aligned elements from the World
Wide Web.
Output from these two modules are then com-
bined in a  nal Classi cation Module, which uses
features derived from (1) lexico-semantic prop-
erties, (2) semantic dependencies, (3) predicate-
based features (including polarity and modality),
(4) lexical alignment, and (5) paraphrase acquisi-
tion in order learn a decision tree classi er capable
of determining whether an entailment relationship
exists for a pair of texts.
4 Intrinsic Evaluation of Contextual
Entailment
Since we believe CE is intrinsic to the Q/A task,
we have evaluated the impact of contextual en-
tailment on our Q/A system in two ways. First,
we compared the quality of the answers produced,
with and without contextual entailment. Second,
we evaluated the quality of the ranked list of para-
graphs against the list of entailed paragraphs iden-
ti ed by the CE system and the set of relevant an-
swers identi ed by the Q/A system. This compar-
ison was performed for each of the three cases of
entailment presented in Figure 2.
We have explored the impact of knowledge
derived from the user scenario through different
forms of contextual entailment by comparing the
35
Topic Signatures Relevant Answers
Generation
AUTO−QUAB
List of QuestionsQuestionSimilarity
ENTAILMENT
CONTEXTUAL
ENTAILMENT
CONTEXTUAL
Processing
Question
Module
(QP)
Keywords
Passage
Retrieval
Module
(PR)
Ranked List of Paragraphs
List of Entailed Paragraphs
Module
Answer
Processing
(AP)
Question
User
Scenario
Answer
Set1
Answer
Set2
Module
Answer
Processing
(AP)
Answer
Set3
Documents
Figure 5: Framework for Intrinsic Evaluation of Contextual Entailment in Q/A.
results of such knowledge integration in a Q/A
system against the usage of scenario knowledge
reported in (Harabagiu et al., 2005).
Topic signatures, derived from the user scenario
and from documents are used to establish text pas-
sages that are relevant to the scenario, and thus
constitute relevant answers. For each such an-
swer, one or multiple questions were built auto-
matically with the method reported in (Harabagiu
et al., 2005). When a new question was asked, its
similarity to any of the questions generated based
on the knowledge of the scenario is computed, and
its corresponding answer is provided as an answer
for the current question as well. Since the ques-
tions are ranked by similarity to the current ques-
tion, the answers are also ranked and produce the
Answer Set1 illustrated in Figure 5.
When a Q/A system is used for answering the
question, the scenario knowledge can be used in
two ways. First, the keywords extracted by the
Question Processing module can be enhanced with
concepts from the topic signatures to produce a
ranked list of paragraphs, resulting from the Pas-
sage Retrieval Module. These passages together
with the question and the user scenario are used
in one of the contextual entailment con gurations
to derive a list of entailed paragraphs from which
the Answer Processing module can extract the an-
swer set 2 illustrated in Figure 5. In another way,
the ranked list of paragraphs is passed to the An-
swer Processing module, which provides a set of
ranked answers to the contextual entailment con-
 gurations to derive a list of entailed answers, rep-
resented as answer set 3 in Figure 5. We evalu-
ate the quality of each set of answers, and for the
answer set 2 and 3, we produce separate evalua-
tion for each con guration for the contextual en-
tailment.
5 Extrinsic Evaluation of Contextual
Entailment
Questions asked in response to a user scenario
tend to be complex. Following work in (Hickl
et al., 2004), we believe complex questions can
be answered in one of two ways: either by
(1) using techniques (similar to the ones pro-
posed in (Harabagiu et al., 2006)) for automati-
cally decomposing complex questions into sets of
informationally-simpler questions, or by (2) us-
ing a multi-document summarization (MDS) sys-
tem (such as the one described in (Lacatusu et al.,
2006)) in order to assemble a ranked list of pas-
sages which contain information that is potentially
relevant to the user’s question.
First, we expect that contextual entailment can
be used to select the decompositions of a complex
question that are most closely related to a scenario.
By assigning more con dence to the decomposi-
tions that are contextually entailed by a scenario,
systems can select a set of answers that are rel-
evant to both the user scenario  and the user’s
question. In contrast, contextual entailment can be
used in conjunction with the output of a MDS sys-
tem: once a summary has been constructed from
the passages retrieved for a query, contextual en-
36
User
Scenario
Question
Keyword Extraction
System
Question AnsweringEntailmentContextual
Question Decomposition Documents
Entailment
Contextual
Multi−Document
Summarization
System
Candidate
Sub−Questions
Query
RelevantDocuments
Candidate Answers
Summary
Answers
Ranked
Figure 6: Framework for Extrinsic Evaluation of Contextual Entailment in Q/A.
tailment can be used to select the most relevant
sentences from the summary.
The architecture of this proposed system is il-
lustrated in Figure 6.
When using contextual entailment for selecting
question decompositions, we rely on the method
reported in (Harabagiu et al., 2006) which gener-
ates questions by using a random walk on a bipar-
tite graph of salient relations and answers. In this
case, the recognition of entailment between ques-
tions operates as a  lter, forcing questions that are
not entailed by any of the signature answers de-
rived from the scenario context (see Figure 3) to
be dropped from consideration.
When entailment information is used for re-
ranking candidate answers, the summary is added
to the scenario context, each summary sentence
being treated akin to a signature answer. We be-
lieve that the summary contains the most informa-
tive information from both the question and the
scenario, since the queries that produced it orig-
inated both in the question and in the scenario. By
adding summary sentences to the scenario context,
we have introduced a new dimension to the pro-
cessing of the scenario. The contextual entailment
is based on the textual entailments between any of
the texts from the scenario context and any of the
candidate answers.
6 Experimental Results
In this section, we present preliminary results from
four sets of experiments which show how forms of
textual  and contextual  entailment can enhance
the quality of answers returned by an automatic
Q/A system.
Questions used in these experiments were gath-
ered from human interactions with the interactive
Q/A system described in (Hickl et al., 2006a). A
total of 6 users were asked to spend approximately
90 minutes gathering information related to three
different information-gathering scenarios similar
to the one in Table 1. Each user researched two
different scenarios, resulting in a total of 12 to-
tal research sessions. Once all research sessions
were completed, linguistically well-formed ques-
tions were extracted from the system logs for each
session for use in our experiments; ungrammatical
questions or keyword-style queries were not used
in our experiments. Table 2 presents a breakdown
of the total number of questions collected for each
of the 6 scenarios.
Scenario Name Users Total Qs Avg. Q/Session σ2
S1. India Outsourcing 4 45 11.25 2.50
S2. Chinese WMD Proliferation 4 38 9.50 6.45
S3. Libyan Bioweapons Programs 4 63 15.75 2.22
Total 12 146 12.17 1.23
Table 2: Questions Collected from User Experi-
ments.
In order to evaluate the performance of our Q/A
system under each of the experimental conditions
described below, questions were re-submitted to
the Q/A system and the top 10 answers were re-
trieved. Two annotators were then tasked with
judging the correctness  or  relevance  of each
returned answer to the original question. If the an-
swer could be considered to provide either a com-
plete or partial answer to the original question, it
was marked as correct; if the answer contained in-
formation that could not be construed as an answer
to the original question, it was marked as incor-
rect.
6.1 Textual Entailment
Following (Harabagiu and Hickl, 2006), we used
TE information in order to  lter answers identi ed
by the Q/A system that were not entailed by the
user’s original question. After  ltering, the top-
ranked entailed answer (as determined by the Q/A
system) was returned as the system’s answer to the
question. Results from both a baseline version and
a TE-enhanced version of our Q/A system are pre-
sented in Table 4.
Although no information from the scenario was
used in this experiment, performance of the Q/A
37
S1 S2 S3 Total
# of Questions 45 38 63 146
baseline top 1 8 (17.78%) 6 (15.79%) 11 (17.46%) 25 (17.12%)
TE top 1 10 (22.22%) 8 (21.05%) 16 (25.40%) 34 (23.29%)
baseline top 5 17 (37.78%) 16 (42.11%) 27 (42.86%) 60 (41.10%)
TE top 5 20 (44.44%) 17 (44.74%) 32 (50.79%) 69 (47.26%)
Table 3: Impact of Textual Entailment on Q/A.
system increased by more than 6% over the base-
line system for each of the three scenarios. These
results suggest that TE can be used effectively in
order to boost the percentage of relevant answers
found in the top answers returned by a system:
by focusing only on answers that are entailed by
a user’s question, we feel that systems can better
identify passages that might contain information
relevant to a user’s information need.
6.2 Contextual Entailment
In order to evaluate the performance of our con-
textual entailment system directly, two annota-
tors were tasked with identifying instances of CE
amongst the passages and answers returned by
our Q/A system. Annotators were instructed to
mark a passage as being contextually entailed by
a scenario only when the passage could be rea-
sonably expected to be associated with one of the
subtopics they believed to be entailed by the com-
plex scenario. If the passage could not be associ-
ated with the extension of any subtopic they be-
lieved to be entailed by the scenario, annotators
were instructed to mark the passage as not being
contextually entailed by the scenario. For evalua-
tion purposes, only examples that were marked by
both annotators were considered as valid examples
of CE.
Annotators were tasked with evaluating three
types of output from our Q/A system: (1) the
ranked list of passages retrieved by our system’s
Passage Retrieval module, (2) the list of passages
identi ed as being CE by the scenario, and (3) the
set of answers marked as being CE by the scenario
(AnsSet3). Results from the annotation of these
passages are presented in Table 4.
S1 S2 S3 Total
# %Rel # %Rel # %Rel # %Rel
Ranked Paragraphs 450 40.4% 380 31.3% 630 42.5% 1460 39.3%
Entailed Paragraphs 112 46.5% 87 44.8% 149 52.4% 348 48.6%
Answer Set 3 304 44.4% 188 39.9% 322 49.1% 814 45.2%
Table 4: Distribution of CE.
Annotators marked 39.3% of retrieved passages
as being CE by one of the three scenarios. This
number increased substantially when only pas-
sages identi ed by the CE system were consid-
ered, as annotators judged 48.6% of CE passages
and 45.2% of CE- ltered answers to be valid in-
stances of contextual entailment.
6.3 Intrinsic Evaluation
In order to evaluate the impact of CE on a Q/A sys-
tem, we compared the quality of answers produced
(1) when no CE information was used (AnsSet1),
(2) when CE information was used to select a
list of entailed paragraphs that were submitted to
an Answer Processing module (AnsSet2), and (3)
when CE information was used directly to select
answers (AnsSet3). Results from these three ex-
periments are presented in Table 5.
S1 S2 S3 Total
# of Questions 45 38 63 146
AnsSet1 top 1 12 (26.67%) 9 (23.68%) 19 (30.16%) 40 (27.39%)
AnsSet2 top 1 16 (35.56%) 11 (28.95%) 26 (41.27%) 53 (36.30%)
AnsSet3 top 1 14 (31.11%) 15 (39.47%) 31 (49.21%) 60 (41.09%)
AnsSet1 top 5 21 (46.67%) 17 (44.74%) 30 (47.62%) 68 (46.58%)
AnsSet2 top 5 24 (53.33%) 18 (47.37%) 35 (55.55%) 77 (52.74%)
AnsSet3 top 5 29 (64.44%) 20 (52.63%) 39 (61.90%) 88 (60.27%)
Table 5: Intrinsic Evaluation of CE on Q/A Per-
formance.
As with the TE-based experiments described in
Section 7.1, we found that the Q/A system was
more likely to return at least one relevant an-
swer among the top-ranked answers when con-
textual entailment information was used to either
rank or select answers. When CE was used to
rank passages for Answer Processing (AnsSet2),
accuracy increased by nearly 9% over the base-
line (AnsSet1), while accuracy increased by al-
most 14% overall when CE was used to select an-
swers directly (AnsSet3).
6.4 Extrinsic Evaluation
In order to evaluate the performance of the frame-
work illustrated in Figure 6, we compared the per-
formance of a question-focused MDS system ( rst
described in (Lacatusu et al., 2006)) that did not
use CE with a similar system that used CE to rank
passages for a summary answer.
When CE was not used, sentences identi ed by
the system’s Q/A and MDS system for each ques-
tion were combined and ranked based on number
of question keywords found in each sentence. In
the CE-enabled system (analogous to the system
depicted in Figure 6), only the sentences that were
contextually entailed by the scenario were consid-
ered; sentences were then ranked using the real-
valued entailment con dence computed by the CE
system for each sentence. Results from this sys-
tem are presented in Table 6.
Although the CE-enabled system was more
likely to return a scenario-relevant sentence in top
38
S1 S2 S3 Total
# of Questions 45 38 63 146
Without CE top 1 14 (31.11%) 15 (39.47%) 31 (49.21%) 60 (41.09%)
With CE top 1 20 (44.44%) 16 (42.11%) 32 (50.79%) 68 (48.23%)
Without CE top 5 29 (64.44%) 20 (52.63%) 39 (61.90%) 88 (60.27%)
With CE top 5 29 (64.44%) 21 (55.26%) 40 (63.49%) 90 (61.64%)
Table 6: Extrinsic Evaluation.
position (48.23%) than the system that did not
use CE (41.09%), differences between the systems
were much less apparent when the top 5 answers
returned by each system were compared.
7 Conclusions
This paper introduced a new form of textual entail-
ment, known as contextual entailment, which can
be used to recognize scenario-relevant information
in both the questions users ask and in the answers
that automatic Q/A systems return. In addition
to outlining a framework for recognizing contex-
tual entailment in texts, we showed that contextual
entailment information can signi cantly enhance
the quality of answers returned by a Q/A system
in response to users’ questions about a particular
scenario. In our evaluations, we found that using
contextual entailment allowed Q/A systems to im-
prove their accuracy by more than 10% overall.
8 Acknowledgments
This material is based upon work funded in whole
or in part by the U.S. Government and any opin-
ions,  ndings, conclusions, or recommendations
expressed in this material are those of the authors
and do not necessarily re ect the views of the U.S.
Government.

References
Ido Dagan, Oren Glickman, and Bernardo Magnini. 2005.
The PASCAL Recognizing Textual Entailment Challenge.
In Proceedings of the PASCAL Challenges Workshop.
Jeroen Groenendijk. 1999. The logic of interrogation: Clas-
sical version. In Proceedings of the Ninth Semantics and
Linguistics Theory Conference (SALT IX), Ithaca, NY.
Aria Haghighi, Andrew Ng, and Christopher Manning. 2005.
Robust textual inference via graph matching. In Pro-
ceedings of Human Language Technology Conference and
Conference on Empirical Methods in Natural Language
Processing, Vancouver, British Columbia, Canada, Octo-
ber.
Sanda Harabagiu and Andrew Hickl. 2006. Methods
for using textual entailment in open-domain question-
answering. In Proceedings of the Joint International
Conference on Computational Linguistics and Annual
Meeting of the Association for Computational Linguistics
(COLING-ACL 2006), Sydney, Australia.
Sanda Harabagiu, Andrew Hickl, John Lehmann, and Dan
Moldovan. 2005. Experiments with Interactive Question-
Answering. In Proceedings of the 43rd Annual Meeting of
the Association for Computational Linguistics (ACL’05).
Sanda Harabagiu, Finley Lacatusu, and Andrew Hickl. 2006.
Answering complex questions with random walk models.
In 2006 ACM SIGIR Conference on Research and Devel-
opment in Information Retrieval (SIGIR), Seattle, WA.
Sanda Harabagiu. 2004. Incremental Topic Representations.
In Proceedings of the 20th COLING Conference, Geneva,
Switzerland.
Andrew Hickl, John Lehmann, John Williams, and Sanda
Harabagiu. 2004. Experiments with Interactive Question-
Answering in Complex Scenarios. In Proceedings of the
Workshop on the Pragmatics of Question Answering at
HLT-NAACL 2004, Boston, MA.
Andrew Hickl, Patrick Wang, John Lehmann, and Sanda
Harabagiu. 2006a. FERRET: Interactive Question-
Answering for Real-World Environments. In Proceed-
ings of the Joint International Conference on Computa-
tional Linguistics and Annual Meeting of the Association
for Computational Linguistics (COLING-ACL 2006) In-
teractive Presentations Program, Sydney, Australia.
Andrew Hickl, John Williams, Jeremy Bensley, Kirk Roberts,
Bryan Rink, and Ying Shi. 2006b. Recognizing Textual
Entailment with LCC’s Groundhog System. In Proceed-
ings of the Second PASCAL Challenges Workshop, Syd-
ney, Australia.
Finley Lacatusu, Andrew Hickl, Kirk Roberts, Ying Shi,
Jeremy Bensley, Bryan Rink, Patrick Wang, and Lara Tay-
lor. 2006. LCC’s GISTexter at DUC 2006: Multi-Strategy
Multi-Document Summarization. In Proceedings of the
2006 Document Understanding Conference (DUC 2006),
New York, New York.
David Lewis. 1988. Relevant Implication. Theoria,
54(3):161 174.
Chin-Yew Lin and Eduard Hovy. 2000. The auto-
mated acquisition of topic signatures for text summariza-
tion. In Proceedings of the 18th COLING Conference,
Saarbrcurrency1ucken, Germany.
Bill MacCartney, Trond Grenager, Marie-Catherine de Marn-
effe, Daniel Cer, and Christopher D. Manning. 2006.
Learning to recognize features of valid textual entail-
ments. In Proceedings of the Joint Human Language
Technology Conference and Annual Meeting of the North
American Chapter of the Association for Computational
Linguistics (HLT-NAACL 2006), New York, New York.
Dan Moldovan, Christine Clark, Sanda Harabagiu, and Steve
Maiorano. 2003. COGEX: A Logic Prover for Question
Answering. In Proceedings of HLT/NAACL-2003.
Srini Narayanan and Sanda Harabagiu. 2004. Question An-
swering based on Semantic Structures. In Proceedings of
COLING-2004.
